Automatic vs. human question answering over multimedia meeting recordings
Quoc Anh Le¹,∗, Andrei Popescu-Belis²
¹ University of Namur, Belgium
² Idiap Research Institute, Martigny, Switzerland
lequocanh@student.fundp.ac.be, andrei.popescu-belis@idiap.ch
Abstract
Information access in meeting recordings can be assisted by
meeting browsers, or can be fully automated following a
question-answering (QA) approach. An information access task
is defined, aiming at discriminating true vs. false parallel state-
ments about facts in meetings. An automatic QA algorithm is
applied to this task, using passage retrieval over a meeting tran-
script. The algorithm scores 59% accuracy for passage retrieval,
while random guessing is below 1%, but only scores 60% on
combined retrieval and question discrimination, for which hu-
mans reach 70%–80% and the baseline is 50%. The algorithm
clearly outperforms humans for speed, at less than 1 second
per question, vs. 1.5–2 minutes per question for humans. The
degradation on ASR compared to manual transcripts still yields
lower but acceptable scores, especially for passage identifica-
tion. Automatic QA thus appears to be a promising enhance-
ment to meeting browsers used by humans, as an assistant for
relevant passage identification.
Index Terms: question answering, passage retrieval, meeting
browsers, evaluation, meeting recording
1. Introduction
Accessing the content of multimedia recordings remains a chal-
lenge for current systems despite significant progress in compo-
nent technologies, such as automatic speech recognition (ASR),
speaker diarization, or summarization. Technology progress
does not guarantee that meeting browsers using such compo-
nents will actually be more efficient in helping users to access
specific information in meeting recordings.
This paper focuses on tools that facilitate access to specific
bits of information from a meeting, as opposed to abstracting
information over an entire meeting. The paper describes an au-
tomatic question answering (QA) system designed to pass the
Browser Evaluation Test (BET) [1, 2], an evaluation method
originally intended for human users of a browser. The scores of
humans using meeting browsers on the BET task, in terms of
speed and precision, are compared with those of our automatic
QA system. The results demonstrate the utility of automatic QA
techniques as an assistant tool for meeting browsers.
The paper is organized as follows. Section 2 briefly de-
scribes the BET task and evaluation protocol. Section 3 de-
scribes the automatic QA system, which operates in two stages:
passage identification, followed by true/false statement discri-
mination. Section 4 gives the scores of our QA system in sev-
eral conditions, and compares them with those of humans us-
ing meeting browsers. This comparison shows that humans
still outperform the system in terms of precision, but are by far
slower, and that the best use of the QA system would be as an
assistant for relevant passage identification.
∗Work performed while at Idiap Research Institute.
2. The Browser Evaluation Test (BET)
The main quality aspects to be evaluated for interactive soft-
ware are often categorized as effectiveness – the extent to which
a browser helps users to accomplish their task, efficiency – the
speed with which the task is accomplished, and user satisfac-
tion. Systematic evaluation of QA systems started with the
TREC-8 QA task in 1999 [3], using a database of 1,337 ques-
tions from multiple sources, of which 200 were selected for the
campaign. At TREC 2003, the test set of questions contained
413 questions (3 types: factoid, list, definition) drawn from
AOL and MSNSearch logs [4]. The 2003 QA track had a pas-
sage retrieval task, which also inspired our approach. An eval-
uation task for interactive QA was also proposed at iCLEF, the
Interactive track of the Cross-Language Evaluation Forum [5];
systems-plus-humans were evaluated for accuracy over a large
set of questions defined by the experimenters.
The Browser Evaluation Test (BET) [1, 2] is a procedure
to collect browser-independent ‘questions’ about a meeting and
to use them for evaluating a browser’s capacity to help hu-
mans answer them. The questions are in fact pairs of par-
allel true/false statements which are constructed by neutral
‘observers’ who view a meeting, write down ‘observations of
interest’ about the meeting, and then create the false counterpart of
each observation. Experimenters then gather similar observa-
tions into groups. The importance of the group is derived from
the observers’ own rating of importance and from the size of
each group. Examples of the first (most important) observations
(true/false statements), for IB4010 and IS1008c respectively, are:
• “The group decided to show The Big Lebowski” vs.
“The group decided to show Saving Private Ryan”.
• “According to the manufacturers, the casing has to be
made out of wood” vs. “According to the manufacturers,
the casing has to be made out of rubber”.
Three meetings from the AMI Corpus [6], in English,
were selected for the observation collection procedure: IB4010,
IS1008c, and ISSCO-Meeting 024, resulting in 129, 58 and 158
final pairs of true/false observations. These figures are in the
same range as those from the TREC QA campaigns, though the
data set containing the answers is considerably smaller here.
To answer the BET questions, human subjects are required
to view the pairs of BET statements in sequence and to decide,
using a meeting browser, which statement from the pair is true,
and which one is false. The time allowed for each meeting is
typically half the duration of the meeting: that is, ca. 24 min. are
allowed for IB4010, and 13 min. for IS1008c. The performance
of subjects-plus-browsers is measured by the precision of the
answers, which is related to effectiveness, and by their speed,
which is related to efficiency.
The BET results of half a dozen meeting browsers are dis-
cussed in a technical report [7]. Average time per question
varies from about 1.5 minutes to 4 minutes (with no prior train-
ing); most subjects-plus-browsers require on average ca. 2 min-
utes per question. Precision – against a 50% baseline in the BET
application – generally stays in the 70%–80% range, with high-
est values reaching 90%. The standard deviations are somewhat
smaller for precision than for speed, but for both metrics indi-
vidual performance varies substantially.
3. Automatic BET Question Answering
We have designed a question answering system aimed at dis-
criminating between pairs of BET statements using manual or
automatic (ASR) meeting transcripts. The system has an ar-
chitecture that is inspired by current QA systems [8], with a
number of specificities due to the nature of the data and of the
task.
The system proceeds in three stages. The first stage is the
pre-processing of the pair of BET questions (true/false paral-
lel statements) and of the meeting transcript. The second stage
aims to identify separately for each of the questions the passage
of the transcript that is most likely to contain the answer to it,
using a complex score based on lexical similarity. The third
stage compares the two BET statements based on the paragraph
found for each question, and hypothesizes which one is true and
which one is false.
3.1. Lexical Pre-processing
The transcript and the questions are first prepared for the lexi-
cal matching procedure used in passage identification. Initially,
abbreviated words are converted into full forms (we’ve → we
have), numeric forms into text forms (34 → thirty four), upper
case letters into lower case ones, and punctuation and stop-
words are removed. Apart from frequent function words such
as prepositions, adverbs or conjunctions, the list of stopwords
includes many pronouns (personal, possessive, and demonstra-
tive) because BET statements are generally formulated using
indirect speech with third person pronouns, while the transcript
contains utterances in direct form, with first and second per-
son pronouns, leading to lexical mismatches if these are not re-
moved.
The remaining words from the BET question are lemma-
tized using WordNet, and stemmed using the Snowball imple-
mentation of Porter’s stemmer. The words from the meeting
transcript are processed in the same way, and the name of the
speaker of each utterance is included in the data, as names are
often quoted in the BET questions. In addition, each word of
the transcript is associated with a list of its synonyms from WordNet
that have the same part of speech (this is determined using the
QTag POS tagger). The reason synonyms are added only to the
transcript, and not to the BET questions, is that adding them for
both question and transcript words would significantly decrease
the predictive power of the lexical matching.
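
To make this pipeline concrete, the following sketch illustrates the
preprocessing in Python. It is not the authors' code: it assumes NLTK's
WordNet lemmatizer, WordNet synsets and Snowball stemmer, uses toy
contraction and stopword lists, and omits the number-to-text conversion
and the QTag-based part-of-speech filtering of synonyms.

```python
# Illustrative preprocessing sketch (assumptions: NLTK with the WordNet
# corpus and the Snowball stemmer; CONTRACTIONS and STOPWORDS are toy lists).
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer, SnowballStemmer

CONTRACTIONS = {"we've": "we have", "it's": "it is"}          # toy table
STOPWORDS = {"the", "a", "of", "and", "he", "she", "it",      # incl. pronouns,
             "his", "her", "this", "that", "they", "them"}    # as in Sec. 3.1

lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")

def preprocess(text, expand_synonyms=False):
    """Lowercase, expand contractions, drop stopwords, lemmatize and stem.
    Synonym expansion (WordNet) is only applied to transcript words."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    tokens = []
    for raw in text.split():
        word = raw.strip(".,!?;:'()")
        if not word or word in STOPWORDS:
            continue
        lemma = lemmatizer.lemmatize(word)
        synonyms = set()
        if expand_synonyms:
            for synset in wn.synsets(lemma):
                synonyms.update(l.name().lower() for l in synset.lemmas())
        tokens.append({"lemma": lemma, "stem": stemmer.stem(lemma),
                       "synonyms": synonyms})
    return tokens
```

On the transcript side, the speaker name of each utterance would also be
attached to its tokens, as described above.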
3.2. Passage Identification
Unlike many cases in QA evaluation in which the size of the
passage to be retrieved is fixed, here the goal is to retrieve a
passage from the transcript that is most likely to help discrim-
inate between the two BET statements, regardless of its size –
though a very large passage is of course unlikely to be helpful. For
that, a window of fixed size is moved over the entire transcript,
in steps of fixed size, and for each position a lexical similarity
score between the words in the window and the words of the
candidate BET question is computed. The window that yields
the highest score is the returned passage. The score is computed
from the number of words from the passage that match words
in the candidate question, as in [8]. However, this method is
particularized here for the BET task on meetings as follows:
1. If a word from the question matches the name of the
speaker of the passage, then it receives the highest pos-
sible score value (e.g., a value of 4.0).
2. If a word from the question matches a word from the
passage (in terms of lemmas), and this word is spoken
by a speaker mentioned in the question, then it receives
the second highest score value (e.g., 2.5).
3. Otherwise, if a word from the question matches a word
from the passage (lemmas), then it receives the “normal”
score value (e.g., 1.0).
4. If a word (lemma) from the question matches one of the
synonyms of a word from the passage, then it receives a
low score value (e.g., 0.5).
The numeric values listed above were set by the authors
based on intuition about the importance of each type of match, and
might not be optimal for this task. No automatic optimization
(statistical learning) was attempted, because the amount of data
was insufficient for training and test.
The total score of the passage is the sum of the scores for
each matching word. If a word appears several times in the
question and/or in the passage, then the number of matches
counted for it is capped by the smaller of its two numbers of
occurrences, pairs of matching words being discarded as they
are matched.
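
The weighted sliding-window matching could be sketched as follows. The
weights follow the four rules above and the occurrence cap just described;
the token dictionaries (with 'lemma', 'speaker' and 'synonyms' fields) are
assumptions carried over from the preprocessing sketch.

```python
# Sketch of the sliding-window passage scoring (Sec. 3.2). Weights follow
# the paper; data structures are assumptions, not the authors' code.
from collections import Counter

W_SPEAKER_NAME, W_SPEAKER_WORD, W_LEMMA, W_SYNONYM = 4.0, 2.5, 1.0, 0.5

def score_window(question_lemmas, window):
    speakers = {t["speaker"] for t in window}
    remaining = Counter(t["lemma"] for t in window)    # cap repeated matches
    score = 0.0
    for q in question_lemmas:
        if q in speakers:                              # rule 1: speaker name
            score += W_SPEAKER_NAME
            continue
        match = next((t for t in window
                      if t["lemma"] == q and remaining[t["lemma"]] > 0), None)
        if match is not None:
            remaining[q] -= 1
            if match["speaker"] in question_lemmas:    # rule 2: mentioned speaker
                score += W_SPEAKER_WORD
            else:                                      # rule 3: plain lemma match
                score += W_LEMMA
        elif any(q in t["synonyms"] for t in window):  # rule 4: synonym match
            score += W_SYNONYM
    return score

def best_passage(question_lemmas, transcript, window_size, step):
    """Slide a fixed-size window over the transcript (a list of tokens) and
    return the highest-scoring span as (start, end) plus its score."""
    best_span, best_score = (0, min(window_size, len(transcript))), float("-inf")
    for start in range(0, max(1, len(transcript) - window_size + 1), step):
        window = transcript[start:start + window_size]
        s = score_window(question_lemmas, window)
        if s > best_score:
            best_span, best_score = (start, start + window_size), s
    return best_span, best_score
```

In the actual system, the window size and step are proportional to the
question length (see Section 3.4) rather than fixed absolute values.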
To find the best matching paragraph, the system must some-
times choose between paragraphs with the same score. In this
case, it applies the method described above a second time, but
using bigrams of words instead of individual words. If two para-
graphs still have the same score, trigrams of words are used, and
so on until a difference is found (or a random draw if not). A
possible heuristic that has yet to be explored is to favor match-
ing paragraphs that are toward the end of a meeting, as this is
the place where “important” ideas are more likely to appear.
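
The tie-breaking on equal scores can be pictured as below (a simplified
sketch: overlap is recomputed with increasing n-gram order, falling back
to a random draw if the tie persists).

```python
# Sketch of the n-gram tie-breaking between equally scored passages.
import random

def ngrams(words, n):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def break_tie(question_words, tied_passages, max_n=5):
    """tied_passages: word lists of passages with equal unigram scores."""
    candidates = list(tied_passages)
    q = list(question_words)
    for n in range(2, max_n + 1):
        overlaps = [len(ngrams(q, n) & ngrams(p, n)) for p in candidates]
        top = max(overlaps)
        candidates = [p for p, o in zip(candidates, overlaps) if o == top]
        if len(candidates) == 1:
            return candidates[0]
    return random.choice(candidates)   # random draw if still tied
```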
3.3. Discrimination of True/False Statements
If a retrieved passage is indeed the discriminating one for the
true or the false candidate, then inspiration from work on entail-
ment could be used to find the true statement. The method used
here is simpler, and is based on the value of the matching score
computed above: the true statement is considered to be the one
with the highest matching score. If the best passage scores us-
ing word matching are equal for the two candidate statements,
then the position of the matched words in the text is used to
recompute the scores, favoring passages where matched words
are in the same order in both transcript and question. If scores
are still the same, bigrams of words are used, and so on.
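
Putting the two stages together, the decision rule might look as follows.
The sketch reuses best_passage from the passage-identification sketch and
approximates the order-based fallback with a longest-common-subsequence
count, which is an assumption rather than the authors' exact recomputation;
the further bigram fallback is omitted.

```python
# Sketch of the true/false discrimination (Sec. 3.3). The order-based
# fallback is approximated here; the bigram fallback is omitted.
def lcs_length(a, b):
    """Number of order-preserving matches between two word sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1],
                                                               dp[i + 1][j])
    return dp[len(a)][len(b)]

def pick_true_statement(statement_a, statement_b, transcript, win, step):
    """Return 0 if statement_a is hypothesized true, 1 otherwise."""
    span_a, score_a = best_passage(statement_a, transcript, win, step)
    span_b, score_b = best_passage(statement_b, transcript, win, step)
    if score_a != score_b:
        return 0 if score_a > score_b else 1
    # Tie: favor the statement whose matched words keep the transcript order.
    lemmas_a = [t["lemma"] for t in transcript[span_a[0]:span_a[1]]]
    lemmas_b = [t["lemma"] for t in transcript[span_b[0]:span_b[1]]]
    return 0 if lcs_length(statement_a, lemmas_a) >= lcs_length(statement_b,
                                                                lemmas_b) else 1
```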
3.4. Optimization of Parameters
Most of the parameters of the algorithm could be adjusted using
statistical learning (optimization) if enough data were available.
Given the relatively low number of BET questions available,
for two transcribed meetings, we only experimented with brute-
force optimization of the window size and window step in the
QA algorithm above.
Table 1: Accuracy of passage retrieval and of true/false statement discrimination
for the two BET meetings. Standard deviation (stdev) is computed using 5-fold
cross-validation.

                             Passage retrieval                 Statement discrimination
                             IB4010          IS1008c           IB4010          IS1008c
Condition                    Accuracy Stdev  Accuracy Stdev    Accuracy Stdev  Accuracy Stdev
Random                       0.0033   n/a    0.0078   n/a      0.50     n/a    0.50     n/a
Unigram matching             0.27     0.15   0.54     0.21     0.37     0.14   0.36     0.21
N-gram matching              0.32     0.15   0.50     0.19     0.43     0.17   0.42     0.11
N-gram matching + speakers   0.55     0.14   0.62     0.16     0.57     0.06   0.64     0.18
Quite early in the development of the algorithm, it appeared
useful to have a window size and step which are proportional
to the length of a question rather than fixed at the same value
for all questions. Using 5-fold cross-validation, all sizes from
1 to 13 times the length of a BET question for windows and
steps were tested to find optimal values (the evaluation metric
is stated in the next section, 4.1). The results point to a set
of optimal values rather than a unique value, and the lowest
ones for window size are: for IB4010, 10 times the question
size, with a step of 3 times the question size. For IS1008c,
these values are respectively 12 and 1, but a window size of 4
times the question length is nearly optimal too, and was selected
because a narrower window is more helpful for discrimination.
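
Such a brute-force search can be sketched as below; retrieval_accuracy is a
hypothetical helper returning passage-retrieval accuracy on a subset of
questions for given window and step multipliers, and the 1-to-13 range
follows the text above.

```python
# Sketch of the brute-force search over window and step sizes, expressed as
# multiples of the question length, with 5-fold cross-validation.
# `retrieval_accuracy(questions, transcript, win_mult, step_mult)` is a
# hypothetical helper returning passage-retrieval accuracy on one fold.
from itertools import product
from statistics import mean, stdev

def grid_search(questions, transcript, retrieval_accuracy, max_mult=13):
    folds = [questions[i::5] for i in range(5)]         # simple 5-way split
    results = []
    for win_mult, step_mult in product(range(1, max_mult + 1), repeat=2):
        per_fold = [retrieval_accuracy(fold, transcript, win_mult, step_mult)
                    for fold in folds]
        results.append((mean(per_fold), stdev(per_fold), win_mult, step_mult))
    results.sort(reverse=True)                           # best average first
    return results
```

Because several settings reach nearly the same cross-validated accuracy,
returning the full ranking makes it easy to prefer the narrower windows
among the near-optimal settings, as done above for IS1008c.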
4. Results for Automatic BET QA
4.1. Evaluation Methods
In order to assess the correctness of a retrieved passage, it
is compared to a reference passage annotated by hand, which is
the minimal passage that is sufficient to discriminate a BET pair
of true/false statements. If the retrieved passage and the correct
one have a non-empty intersection (at least one word), then the
retrieved passage is considered to be correct.
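
In other words, correctness reduces to a non-empty overlap between two
word-index spans, for instance (a sketch, treating passages as (start, end)
intervals):

```python
# Sketch of the correctness criterion: a retrieved passage counts as correct
# if it overlaps the hand-annotated reference passage by at least one word.
def passage_is_correct(retrieved, reference):
    """Both arguments are (start, end) word-index spans, end exclusive."""
    start = max(retrieved[0], reference[0])
    end = min(retrieved[1], reference[1])
    return end - start >= 1
```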
To assess the correctness of discrimination (second process-
ing stage), it is of course sufficient to check whether the true
statement was correctly found or not.
There are 116 pairs of true/false statements for IB4010 and
50 pairs for IS1008c, because only those that were actually
shown to humans (after cleaning of the list by BET experi-
menters) were used in our experiments.
To better assess the scores that follow, we categorized and
counted BET questions of two types. The “simple” questions
are those that have an explicit answer in some passage, while the
“deductive” ones are those that require some reasoning, based on
one or more passages, in order to find the answer. Our labeling
showed that for IB4010, 74 BET questions out of 116 (64%) are
simple and the remaining 42 (36%) are deductive. For IS1008c,
46 questions out of 50 (92%) are simple and only 4 (8%) are
deductive. These figures provide an upper bound for the per-
formance of our system, which cannot be expected to answer
deductive questions at this stage, though it can still find the cor-
responding passages.
The baseline score for passage retrieval can be defined by
random draw, and is thus dependent on the size of the window
and its step. With the current values of these sizes, the likeli-
hoods of finding the correct passage by chance are less than 1%
for each meeting. The baseline score for true/false discrimina-
tion is 50%.
4.2. Results on Manual Transcripts
Results for the two processing stages of the automatic BET QA
algorithm are given in Table 1 above, for three variants of the
algorithm: (1) allowing only unigram matching when comput-
ing the similarity score, with no weighting of speaker-specific
words; (2) with N-gram matching and still no weighting; and
(3) with the additional weighting of matched words spoken by a
speaker mentioned in the question, as explained in Section 3.2
above, second bullet point.
The passage retrieval component obtains very good results
compared with the chances of randomly locating the correct
passage, with accuracy scores of 0.55±0.14 for IB4010 and
0.62±0.16 for IS1008c.¹ These results also bear comparison
with overall human-plus-browser precision scores, shown to be
in the 70%–80% range, though they remain clearly below them.
As for speed, the automatic QA system is of course much faster
than humans-plus-browsers, requiring less than 1 second per
question (vs. 2–4 minutes).
When combined with the question discrimination, the per-
formance degrades to 0.57±0.06 accuracy for IB4010 and
0.64±0.18 for IS1008c (with respect to the 50% baseline). A
more informative comparison can be made by noting that a perfect
discrimination module would still be clueless on passages that
are wrongly identified (45% and 38%), and so its score could
only reach at most 77.5% for IB4010 and 81% for IS1008c.²
The fact that the actual scores are clearly lower than these the-
oretical values shows that the algorithm needs improvement for
this stage.
In order to answer the question: “do humans and our QA
system have difficulties on the same BET statements?”, a de-
tailed comparison, by question, of the automatic scores with
those of humans using the TQB browser [2] is shown in Ta-
ble 2. For humans using TQB, the table shows the scores of a
group of 14 people without training, and the scores of an equiv-
alent group after training on one meeting [2, 7]. The amount of
available data does not allow a full statistical study of the dif-
ferences between humans and the QA system based on correla-
tions, but the observation of the first eight questions for each
meeting seem to suggest some correlation between the what
humans and the QA system find difficult, e.g. question #1 for
IB4010, and questions #5 and #6 for IS1008c.
¹Confidence intervals at the 95% level are obtained through 5-fold
cross-validation.
²That is, if question discrimination were perfect, it would work on
the correctly retrieved passages and would reach 50% accuracy on the
others, so the expected scores would be 0.55 × 100% + 0.45 × 50% =
77.5% for IB4010 and 0.62 × 100% + 0.38 × 50% = 81% for IS1008c.
Table 2: Human (with TQB browser) vs. automatic results for
the first eight questions of the meetings (P: accuracy of passage
retrieval; D: accuracy of statement discrimination).

Question        Precision of humans           QA System
number          no training  with training    P     D
IB4010:    1    0.93         0.71             0     0
           2    0.93         1.00             1     1
           3    0.71         1.00             1     1
           4    0.86         0.86             1     1
           5    1.00         0.93             0     1
           6    0.93         1.00             1     1
           7    0.93         0.71             1     1
           8    0.71         0.79             1     1
     Average    0.88         0.88             0.75  0.88
IS1008c:   1    0.86         0.93             1     1
           2    0.67         0.86             1     1
           3    0.82         0.93             1     1
           4    0.89         0.93             1     1
           5    0.63         0.69             1     0
           6    0.67         0.73             0     0
           7    1.00         0.82             1     0
           8    0.67         0.64             0     1
     Average    0.77         0.81             0.75  0.63
4.3. Results on ASR and Summaries
The automatic BET QA system was also applied to the diarized
ASR from the same meetings, which is available with the AMI
Corpus [6], as a sample of the output of the AMI ASR sys-
tem [9]. As expected, the scores decrease on ASR with respect
to the manual transcript, but remain in a similar range.
For IB4010, passage retrieval accuracy drops to 0.46±0.13
from 0.55±0.14, and discrimination accuracy drops to
0.52±0.09 from 0.57±0.06 (thus getting very close to the
baseline score of 50%). For IS1008c, passage retrieval drops
to 0.60±0.33 from 0.62±0.16, and discrimination drops to
0.56±0.19 from 0.64±0.18. Our QA system thus appears to be
robust with respect to the quality of the ASR. In particular, pas-
sage retrieval scores remain well above the baseline: the system
finds the right passage for more than half of the BET questions.
We also assessed the degradation when automatic sum-
maries [10] over the ASR are used instead of the full ASR.
The goal is first to test the robustness of our system, but also
to assess summarization quality. We indeed believe that this
approach could provide an indirect measure of the quality of a
summary: the higher the quality of the summary, the lower the
degradation when shorter and shorter summaries are used. We
compared extractive summaries of various sizes with a random
extract and with a gold standard summary from the AMI Cor-
pus, but the initial results did not show significant differences
between the various conditions.
5. Conclusion and Future Work
This paper has described an automatic system for passing a QA-
based browser evaluation task, the BET, and has compared its
results with those of human subjects using meeting browsers.
Although significantly above the baseline, the performance of
automatic QA appeared to be well below that of humans for
the overall precision on the entire task, i.e. the discrimination
of true vs. false statements. However, the meeting-specific QA
algorithm introduced here displays two clear advantages over
humans: it retrieves the correct passage for more than 50% of
the questions, in a very short time: less than 1 second per
question.
These results suggest that an improvement to meeting
browsers – at least for the fact-finding task – is to enhance them
with an automatic passage retrieval function, which points the
user from the start to a potentially relevant section of a meeting,
depending on the question that was formulated. If the section is
not relevant, then the user starts using the meeting browser as
before. But if the section is indeed relevant, as in more than half
of the cases, then the user can easily reason upon it (if needed)
to extract the answer, in a more reliable way than the method
proposed here. Future work on entailment could also help to
improve the automation of this second stage.
6. Acknowledgments
This work has been supported by the Swiss National Science
Foundation, through the IM2 National Center of Competence
in Research, and by the European IST Program, through the
AMIDA Integrated Project FP6-0033812. The first author was
supported by the AMI Training Program during his internship
at Idiap (Aug. 2008 – Jan. 2009).
7. References
[1] P. Wellner, M. Flynn, S. Tucker, and S. Whittaker, “A meeting
browser evaluation test,” in CHI 2005, Portland, OR, 2005, pp.
2021–2024.
[2] A. Popescu-Belis, P. Baudrion, M. Flynn, and P. Wellner, “To-
wards an objective test for meeting browsers: the BET4TQB pilot
experiment,” in Machine Learning for Multimodal Interaction IV,
ser. LNCS 4892, A. Popescu-Belis, H. Bourlard, and S. Renals,
Eds. Berlin: Springer, 2008, pp. 108–119.
[3] E. M. Voorhees, “The TREC question answering track,” Natural
Language Engineering, vol. 7, no. 4, pp. 361–378, 2001.
[4] ——, “Overview of the TREC 2003 question answering track,” in
TREC 2003 (12th Text REtrieval Conference), ser. NIST Special
Publication 500-255, Gaithersburg, MD, 2003, pp. 54–68.
[5] J. Gonzalo, P. Clough, and A. Vallin, “Overview of the CLEF
2005 interactive track,” in Accessing Multilingual Information
Repositories (CLEF 2005 Revised Selected Papers), ser. LNCS
4022, C. Peters et al., Eds. Berlin: Springer, 2006, pp. 251–262.
[6] J. Carletta et al., “The AMI Meeting Corpus: A pre-
announcement,” in Machine Learning for Multimodal Interaction
II, ser. LNCS 3869, S. Renals and S. Bengio, Eds. Berlin:
Springer, 2006, pp. 28–39.
[7] A. Popescu-Belis, “Comparing meeting browsers using a task-
based evaluation method,” Idiap Research Institute, Research Re-
port Idiap-RR-11-09, June 2009.
[8] M. Light, G. S. Mann, E. Riloff, and E. Breck, “Analyses for eluci-
dating current question answering technology,” Natural Language
Engineering, vol. 7, no. 4, pp. 325–342, 2001.
[9] T. Hain et al., “The development of the AMI system for the tran-
scription of speech in meetings,” in Machine Learning for Multi-
modal Interaction II, ser. LNCS 3869, S. Renals and S. Bengio,
Eds. Berlin/Heidelberg: Springer-Verlag, 2006, pp. 344–356.
[10] G. Murray, S. Renals, and J. Carletta, “Extractive summarization
of meeting recordings,” in Interspeech 2005, Lisbon, 2005, pp.
593–596.

Mais conteúdo relacionado

Mais procurados

IRJET- Cross-Domain Sentiment Encoding through Stochastic Word Embedding
IRJET- Cross-Domain Sentiment Encoding through Stochastic Word EmbeddingIRJET- Cross-Domain Sentiment Encoding through Stochastic Word Embedding
IRJET- Cross-Domain Sentiment Encoding through Stochastic Word EmbeddingIRJET Journal
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsEditor IJCATR
 
Sources of errors in distributed development projects implications for colla...
Sources of errors in distributed development projects implications for colla...Sources of errors in distributed development projects implications for colla...
Sources of errors in distributed development projects implications for colla...Bhagyashree Deokar
 
An Examination of the Bloom Filter and its Application in Preventing Weak Pas...
An Examination of the Bloom Filter and its Application in Preventing Weak Pas...An Examination of the Bloom Filter and its Application in Preventing Weak Pas...
An Examination of the Bloom Filter and its Application in Preventing Weak Pas...Editor IJCATR
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2Saman Sara
 
Sentiment Features based Analysis of Online Reviews
Sentiment Features based Analysis of Online ReviewsSentiment Features based Analysis of Online Reviews
Sentiment Features based Analysis of Online Reviewsiosrjce
 
Document Summarization
Document SummarizationDocument Summarization
Document SummarizationPratik Kumar
 
Project sentiment analysis
Project sentiment analysisProject sentiment analysis
Project sentiment analysisBob Prieto
 
Conceptual similarity measurement algorithm for domain specific ontology[
Conceptual similarity measurement algorithm for domain specific ontology[Conceptual similarity measurement algorithm for domain specific ontology[
Conceptual similarity measurement algorithm for domain specific ontology[Zac Darcy
 
IRJET- Opinion Targets and Opinion Words Extraction for Online Reviews wi...
IRJET-  	  Opinion Targets and Opinion Words Extraction for Online Reviews wi...IRJET-  	  Opinion Targets and Opinion Words Extraction for Online Reviews wi...
IRJET- Opinion Targets and Opinion Words Extraction for Online Reviews wi...IRJET Journal
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarizationAbdelaziz Al-Rihawi
 
A scoring rubric for automatic short answer grading system
A scoring rubric for automatic short answer grading systemA scoring rubric for automatic short answer grading system
A scoring rubric for automatic short answer grading systemTELKOMNIKA JOURNAL
 
Lexical Analysis to Effectively Detect User's Opinion
Lexical Analysis to Effectively Detect User's Opinion   Lexical Analysis to Effectively Detect User's Opinion
Lexical Analysis to Effectively Detect User's Opinion dannyijwest
 
Supervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithmSupervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithmIJSRD
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWJournal For Research
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276IJMER
 
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Miningmlaij
 

Mais procurados (20)

Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
 
IRJET- Cross-Domain Sentiment Encoding through Stochastic Word Embedding
IRJET- Cross-Domain Sentiment Encoding through Stochastic Word EmbeddingIRJET- Cross-Domain Sentiment Encoding through Stochastic Word Embedding
IRJET- Cross-Domain Sentiment Encoding through Stochastic Word Embedding
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic Relations
 
Sources of errors in distributed development projects implications for colla...
Sources of errors in distributed development projects implications for colla...Sources of errors in distributed development projects implications for colla...
Sources of errors in distributed development projects implications for colla...
 
An Examination of the Bloom Filter and its Application in Preventing Weak Pas...
An Examination of the Bloom Filter and its Application in Preventing Weak Pas...An Examination of the Bloom Filter and its Application in Preventing Weak Pas...
An Examination of the Bloom Filter and its Application in Preventing Weak Pas...
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2
 
Sentiment Features based Analysis of Online Reviews
Sentiment Features based Analysis of Online ReviewsSentiment Features based Analysis of Online Reviews
Sentiment Features based Analysis of Online Reviews
 
Document Summarization
Document SummarizationDocument Summarization
Document Summarization
 
Project sentiment analysis
Project sentiment analysisProject sentiment analysis
Project sentiment analysis
 
Conceptual similarity measurement algorithm for domain specific ontology[
Conceptual similarity measurement algorithm for domain specific ontology[Conceptual similarity measurement algorithm for domain specific ontology[
Conceptual similarity measurement algorithm for domain specific ontology[
 
IRJET- Opinion Targets and Opinion Words Extraction for Online Reviews wi...
IRJET-  	  Opinion Targets and Opinion Words Extraction for Online Reviews wi...IRJET-  	  Opinion Targets and Opinion Words Extraction for Online Reviews wi...
IRJET- Opinion Targets and Opinion Words Extraction for Online Reviews wi...
 
Text summarization
Text summarization Text summarization
Text summarization
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarization
 
IJET-V3I1P1
IJET-V3I1P1IJET-V3I1P1
IJET-V3I1P1
 
A scoring rubric for automatic short answer grading system
A scoring rubric for automatic short answer grading systemA scoring rubric for automatic short answer grading system
A scoring rubric for automatic short answer grading system
 
Lexical Analysis to Effectively Detect User's Opinion
Lexical Analysis to Effectively Detect User's Opinion   Lexical Analysis to Effectively Detect User's Opinion
Lexical Analysis to Effectively Detect User's Opinion
 
Supervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithmSupervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithm
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276
 
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Mining
 

Destaque

ExamView Item Banking step-by-step
ExamView Item Banking step-by-stepExamView Item Banking step-by-step
ExamView Item Banking step-by-stepMohamed Bahr
 
Top python interview question and answer
Top python interview question and answerTop python interview question and answer
Top python interview question and answerAnkita Singh
 
An Automatic Approach to Translate Use Cases to Sequence Diagrams
An Automatic Approach to Translate Use Cases to Sequence DiagramsAn Automatic Approach to Translate Use Cases to Sequence Diagrams
An Automatic Approach to Translate Use Cases to Sequence DiagramsMohammed Misbhauddin
 
Fuzzy Genetic Algorithm
Fuzzy Genetic AlgorithmFuzzy Genetic Algorithm
Fuzzy Genetic AlgorithmPintu Khan
 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case DiagramAshesh R
 
Uses of Computers in Education
Uses of Computers in EducationUses of Computers in Education
Uses of Computers in EducationAllana Delgado
 
Computer Uses in different areas
Computer Uses in different areasComputer Uses in different areas
Computer Uses in different areasswatikakade
 
Application Of Computers
Application Of ComputersApplication Of Computers
Application Of ComputersLUZ PINGOL
 

Destaque (11)

ExamView Item Banking step-by-step
ExamView Item Banking step-by-stepExamView Item Banking step-by-step
ExamView Item Banking step-by-step
 
Top python interview question and answer
Top python interview question and answerTop python interview question and answer
Top python interview question and answer
 
An Automatic Approach to Translate Use Cases to Sequence Diagrams
An Automatic Approach to Translate Use Cases to Sequence DiagramsAn Automatic Approach to Translate Use Cases to Sequence Diagrams
An Automatic Approach to Translate Use Cases to Sequence Diagrams
 
Fuzzy Genetic Algorithm
Fuzzy Genetic AlgorithmFuzzy Genetic Algorithm
Fuzzy Genetic Algorithm
 
Os Question Bank
Os Question BankOs Question Bank
Os Question Bank
 
Case tools
Case toolsCase tools
Case tools
 
Case tools
Case toolsCase tools
Case tools
 
Use Case Diagram
Use Case DiagramUse Case Diagram
Use Case Diagram
 
Uses of Computers in Education
Uses of Computers in EducationUses of Computers in Education
Uses of Computers in Education
 
Computer Uses in different areas
Computer Uses in different areasComputer Uses in different areas
Computer Uses in different areas
 
Application Of Computers
Application Of ComputersApplication Of Computers
Application Of Computers
 

Semelhante a Automatic vs. human question answering over multimedia meeting recordings

Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningIRJET Journal
 
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESA PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESKula Sekhar Reddy Yerraguntla
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval forWaheeb Ahmed
 
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
Conceptual Similarity Measurement Algorithm For Domain Specific OntologyConceptual Similarity Measurement Algorithm For Domain Specific Ontology
Conceptual Similarity Measurement Algorithm For Domain Specific OntologyZac Darcy
 
Répondre à la question automatique avec le web
Répondre à la question automatique avec le webRépondre à la question automatique avec le web
Répondre à la question automatique avec le webAhmed Hammami
 
A Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationA Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationDarian Pruitt
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...ijcsit
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...AIRCC Publishing Corporation
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
 
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...IRJET Journal
 
Characterization of Open-Source Applications and Test Suites
Characterization of Open-Source Applications and Test Suites Characterization of Open-Source Applications and Test Suites
Characterization of Open-Source Applications and Test Suites ijseajournal
 
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment AnalysisIRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment AnalysisIRJET Journal
 
Application of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsApplication of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsijcsa
 
Efficient instant fuzzy search with proximity ranking
Efficient instant fuzzy search with proximity rankingEfficient instant fuzzy search with proximity ranking
Efficient instant fuzzy search with proximity rankingShakas Technologies
 
Online Examination and Evaluation System
Online Examination and Evaluation SystemOnline Examination and Evaluation System
Online Examination and Evaluation SystemIRJET Journal
 
An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...TELKOMNIKA JOURNAL
 
Co extracting opinion targets and opinion words from online reviews based on ...
Co extracting opinion targets and opinion words from online reviews based on ...Co extracting opinion targets and opinion words from online reviews based on ...
Co extracting opinion targets and opinion words from online reviews based on ...redpel dot com
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approachGarima Nanda
 

Semelhante a Automatic vs. human question answering over multimedia meeting recordings (20)

D15 1051
D15 1051D15 1051
D15 1051
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASESA PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
A PARTICLE SWARM OPTIMIZATION TECHNIQUE FOR GENERATING PAIRWISE TEST CASES
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
 
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
Conceptual Similarity Measurement Algorithm For Domain Specific OntologyConceptual Similarity Measurement Algorithm For Domain Specific Ontology
Conceptual Similarity Measurement Algorithm For Domain Specific Ontology
 
Répondre à la question automatique avec le web
Répondre à la question automatique avec le webRépondre à la question automatique avec le web
Répondre à la question automatique avec le web
 
A Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference AnnotationA Pilot Study On Computer-Aided Coreference Annotation
A Pilot Study On Computer-Aided Coreference Annotation
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
 
Novel Scoring System for Identify Accurate Answers for Factoid Questions
Novel Scoring System for Identify Accurate Answers for Factoid QuestionsNovel Scoring System for Identify Accurate Answers for Factoid Questions
Novel Scoring System for Identify Accurate Answers for Factoid Questions
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzer
 
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
 
Characterization of Open-Source Applications and Test Suites
Characterization of Open-Source Applications and Test Suites Characterization of Open-Source Applications and Test Suites
Characterization of Open-Source Applications and Test Suites
 
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment AnalysisIRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
 
Application of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsApplication of hidden markov model in question answering systems
Application of hidden markov model in question answering systems
 
Efficient instant fuzzy search with proximity ranking
Efficient instant fuzzy search with proximity rankingEfficient instant fuzzy search with proximity ranking
Efficient instant fuzzy search with proximity ranking
 
Online Examination and Evaluation System
Online Examination and Evaluation SystemOnline Examination and Evaluation System
Online Examination and Evaluation System
 
An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...An evolutionary optimization method for selecting features for speech emotion...
An evolutionary optimization method for selecting features for speech emotion...
 
Co extracting opinion targets and opinion words from online reviews based on ...
Co extracting opinion targets and opinion words from online reviews based on ...Co extracting opinion targets and opinion words from online reviews based on ...
Co extracting opinion targets and opinion words from online reviews based on ...
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approach
 

Mais de Lê Anh

Spark docker
Spark dockerSpark docker
Spark dockerLê Anh
 
Presentation des outils traitements distribues
Presentation des outils traitements distribuesPresentation des outils traitements distribues
Presentation des outils traitements distribuesLê Anh
 
Lap trinh java hieu qua
Lap trinh java hieu quaLap trinh java hieu qua
Lap trinh java hieu quaLê Anh
 
Cahier de charges
Cahier de chargesCahier de charges
Cahier de chargesLê Anh
 
Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013Lê Anh
 
Lequocanh
LequocanhLequocanh
LequocanhLê Anh
 
These lequocanh v7
These lequocanh v7These lequocanh v7
These lequocanh v7Lê Anh
 
Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam Lê Anh
 
Poster WACAI 2012
Poster WACAI 2012Poster WACAI 2012
Poster WACAI 2012Lê Anh
 
ICMI 2012 Workshop on gesture and speech production
ICMI 2012 Workshop on gesture and speech productionICMI 2012 Workshop on gesture and speech production
ICMI 2012 Workshop on gesture and speech productionLê Anh
 
Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)Lê Anh
 
IEEE Humanoids 2011
IEEE Humanoids 2011IEEE Humanoids 2011
IEEE Humanoids 2011Lê Anh
 
ACII 2011, USA
ACII 2011, USAACII 2011, USA
ACII 2011, USALê Anh
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012Lê Anh
 
Mid-term thesis report
Mid-term thesis reportMid-term thesis report
Mid-term thesis reportLê Anh
 
Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)Lê Anh
 
Nao Tech Day
Nao Tech DayNao Tech Day
Nao Tech DayLê Anh
 
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-RobotJournée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-RobotLê Anh
 
Người Ảo
Người ẢoNgười Ảo
Người ẢoLê Anh
 

Mais de Lê Anh (19)

Spark docker
Spark dockerSpark docker
Spark docker
 
Presentation des outils traitements distribues
Presentation des outils traitements distribuesPresentation des outils traitements distribues
Presentation des outils traitements distribues
 
Lap trinh java hieu qua
Lap trinh java hieu quaLap trinh java hieu qua
Lap trinh java hieu qua
 
Cahier de charges
Cahier de chargesCahier de charges
Cahier de charges
 
Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013
 
Lequocanh
LequocanhLequocanh
Lequocanh
 
These lequocanh v7
These lequocanh v7These lequocanh v7
These lequocanh v7
 
Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam
 
Poster WACAI 2012
Poster WACAI 2012Poster WACAI 2012
Poster WACAI 2012
 
ICMI 2012 Workshop on gesture and speech production
ICMI 2012 Workshop on gesture and speech productionICMI 2012 Workshop on gesture and speech production
ICMI 2012 Workshop on gesture and speech production
 
Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)
 
IEEE Humanoids 2011
IEEE Humanoids 2011IEEE Humanoids 2011
IEEE Humanoids 2011
 
ACII 2011, USA
ACII 2011, USAACII 2011, USA
ACII 2011, USA
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012
 
Mid-term thesis report
Mid-term thesis reportMid-term thesis report
Mid-term thesis report
 
Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)
 
Nao Tech Day
Nao Tech DayNao Tech Day
Nao Tech Day
 
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-RobotJournée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
 
Người Ảo
Người ẢoNgười Ảo
Người Ảo
 

Último

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Automatic vs. human question answering over multimedia meeting recordings

Systematic evaluation of QA systems started with the TREC-8 QA task in 1999 [3], using a database of 1,337 questions from multiple sources, of which 200 were selected for the campaign. At TREC 2003, the test set contained 413 questions of three types (factoid, list, definition), drawn from AOL and MSNSearch logs [4]. The 2003 QA track included a passage retrieval task, which also inspired our approach. An evaluation task for interactive QA was also proposed at iCLEF, the interactive track of the Cross-Language Evaluation Forum [5]; systems-plus-humans were evaluated for accuracy over a large set of questions defined by the experimenters.

The Browser Evaluation Test (BET) [1, 2] is a procedure to collect browser-independent ‘questions’ about a meeting and to use them to evaluate a browser’s capacity to help humans answer them. The questions are in fact pairs of parallel true/false statements constructed by neutral ‘observers’ who view a meeting, write down ‘observations of interest’ about it, and then create the false counterpart of each observation. Experimenters then gather similar observations into groups. The importance of a group is derived from the observers’ own rating of importance and from the size of the group. Examples of the most important observations (true/false statements) for IB4010 and IS1008c, respectively, are:

• “The group decided to show The Big Lebowski” vs. “The group decided to show Saving Private Ryan”.
• “According to the manufacturers, the casing has to be made out of wood” vs. “According to the manufacturers, the casing has to be made out of rubber”.

Three meetings from the AMI Corpus [6], in English, were selected for the observation collection procedure: IB4010, IS1008c, and ISSCO-Meeting 024, resulting in 129, 58 and 158 final pairs of true/false observations. These figures are in the same range as those from the TREC QA campaigns, though the data set containing the answers is considerably smaller here.

To answer the BET questions, human subjects view the pairs of BET statements in sequence and decide, using a meeting browser, which statement of each pair is true and which is false. The time allowed for each meeting is typically half the duration of the meeting: ca. 24 minutes for IB4010 and 13 minutes for IS1008c. The performance of subjects-plus-browsers is measured by the precision of the answers, which is related to effectiveness, and by their speed, which is related to efficiency.

The BET results of half a dozen meeting browsers are discussed in a technical report [7]. Average time per question varies from about 1.5 minutes to 4 minutes (with no prior training); most subjects-plus-browsers require on average ca. 2 minutes per question. Precision – against a 50% baseline in the BET application – generally stays in the 70%–80% range, with the highest values reaching 90%. The standard deviations are somewhat smaller for precision than for speed, but for both metrics individual performance varies substantially.
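As a purely illustrative sketch of this task setup (the representation and names are ours, not part of the BET tooling), a pair of parallel statements and the two subjects-plus-browsers metrics could be modeled as follows:

```python
from dataclasses import dataclass

@dataclass
class BETPair:
    """One BET 'question': two parallel statements, exactly one of which is true."""
    true_statement: str
    false_statement: str
    importance: float  # derived from the observers' ratings and the group size

def precision(correct: int, answered: int) -> float:
    """Share of answered pairs for which the true statement was identified (50% baseline)."""
    return correct / answered if answered else 0.0

def avg_time_per_question(total_seconds: float, answered: int) -> float:
    """Average answering speed, in seconds per question."""
    return total_seconds / answered if answered else 0.0

pair = BETPair("The group decided to show The Big Lebowski",
               "The group decided to show Saving Private Ryan",
               importance=1.0)
# e.g. 12 questions answered in the 24 minutes allowed for IB4010
print(precision(9, 12), avg_time_per_question(24 * 60, 12))
```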
3. Automatic BET Question Answering

We have designed a question answering system aimed at discriminating between pairs of BET statements using manual or automatic (ASR) meeting transcripts. Its architecture is inspired by current QA systems [8], with a number of specificities due to the nature of the data and of the task.

The system proceeds in three stages. The first stage is the pre-processing of the pair of BET questions (true/false parallel statements) and of the meeting transcript. The second stage identifies, separately for each statement, the passage of the transcript that is most likely to contain the answer to it, using a composite score based on lexical similarity. The third stage compares the two BET statements based on the passage found for each of them, and hypothesizes which one is true and which one is false.

3.1. Lexical Pre-processing

The transcript and the questions are first prepared for the lexical matching procedure used in passage identification. Initially, abbreviated words are converted into full forms (we’ve → we have), numeric forms into text forms (34 → thirty four), and upper-case letters into lower-case ones; punctuation and stopwords are removed. Apart from frequent function words such as prepositions, adverbs or conjunctions, the list of stopwords includes many pronouns (personal, possessive, and demonstrative), because BET statements are generally formulated in indirect speech with third-person pronouns, while the transcript contains utterances in direct form, with first- and second-person pronouns, leading to lexical mismatches if these are not removed.

The remaining words from the BET question are lemmatized using WordNet, and stemmed using the Snowball implementation of Porter’s stemmer. The words from the meeting transcript are processed in the same way, and the name of the speaker of each utterance is included in the data, as names are often quoted in the BET questions. In addition, each word of the transcript is associated with a list of its WordNet synonyms that have the same part of speech (determined using the QTag POS tagger). Synonyms are added only to the transcript, and not to the BET questions, because adding them on both sides would significantly decrease the predictive power of the lexical matching.
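A minimal sketch of this pre-processing is given below, using NLTK as a stand-in for the tools cited above (WordNet lemmatization, Snowball stemming). The contraction table, the omission of number-to-word conversion, and the use of WordNet POS constants instead of QTag output are our simplifications; all names are ours.

```python
import re
from nltk.corpus import stopwords, wordnet
from nltk.stem import SnowballStemmer, WordNetLemmatizer

# Hypothetical contraction table; extend as needed. Number-to-word conversion is omitted here.
CONTRACTIONS = {"we've": "we have", "don't": "do not", "it's": "it is"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "my", "your", "his",
            "her", "its", "our", "their", "this", "that", "these", "those"}
STOP = set(stopwords.words("english")) | PRONOUNS

lemmatize = WordNetLemmatizer().lemmatize
stem = SnowballStemmer("english").stem

def normalize(text):
    """Lower-case, expand contractions, drop punctuation, remove stopwords and pronouns,
    then lemmatize and stem the remaining words."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [stem(lemmatize(t)) for t in tokens if t not in STOP]

def same_pos_synonyms(word, wn_pos):
    """WordNet synonyms of a transcript word, restricted to a given part of speech
    (the paper determines the part of speech with the QTag tagger)."""
    return {l.lower() for s in wordnet.synsets(word, pos=wn_pos) for l in s.lemma_names()}
```

In such a setup, question words would only go through normalize(), while transcript words would additionally carry their same_pos_synonyms(), as described above.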
3.2. Passage Identification

Unlike many QA evaluation settings in which the size of the passage to be retrieved is fixed, the goal here is to retrieve the passage of the transcript that is most likely to help discriminate between the two BET statements, regardless of its size – although a very large passage is of course unlikely to be helpful.

For that, a window of fixed size is moved over the entire transcript, in steps of fixed size, and for each position a lexical similarity score between the words in the window and the words of the candidate BET question is computed. The window that yields the highest score is returned as the passage. The score is computed from the number of words of the passage that match words of the candidate question, as in [8], but the method is adapted to the BET task on meetings as follows:

1. If a word from the question matches the name of the speaker of the passage, it receives the highest possible score value (e.g., 4.0).
2. If a word from the question matches a word from the passage (in terms of lemmas), and this word is spoken by a speaker mentioned in the question, it receives the second highest score value (e.g., 2.5).
3. Otherwise, if a word from the question matches a word from the passage (lemmas), it receives the “normal” score value (e.g., 1.0).
4. If a word (lemma) from the question matches one of the synonyms of a word from the passage, it receives a low score value (e.g., 0.5).

These numeric values were set by the authors based on intuition about the importance of each type of match, and might not be optimal for this task. No automatic optimization (statistical learning) was attempted, because the amount of data was insufficient for training and testing.

The total score of a passage is the sum of the scores of its matching words. If a word appears several times in the question and/or in the passage, the number of matches counted for it is limited to the smaller of its two occurrence counts, by discarding pairs of matching words as they are matched.

To find the best matching passage, the system must sometimes choose between passages with the same score. In this case, it applies the method described above a second time, using word bigrams instead of individual words. If two passages still have the same score, trigrams are used, and so on, until a difference is found (or a random draw is made if none is). A possible heuristic that has yet to be explored is to favor matching passages toward the end of a meeting, as this is where “important” ideas are more likely to appear.

3.3. Discrimination of True/False Statements

If a retrieved passage is indeed the discriminating one for the true or the false candidate, then inspiration from work on textual entailment could be used to find the true statement. The method used here is simpler, and is based on the matching score computed above: the true statement is taken to be the one with the higher matching score. If the best passage scores using word matching are equal for the two candidate statements, then the positions of the matched words are used to recompute the scores, favoring passages where the matched words appear in the same order in the transcript and in the question. If the scores are still equal, word bigrams are used, and so on.
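The sketch below illustrates the scoring and discrimination stages of Sections 3.2 and 3.3, assuming the transcript has already been pre-processed into (speaker, token, synonyms) triples. The weights are the example values listed above and the default window/step multipliers are the IB4010 values reported in Section 3.4; the n-gram tie-breaking and word-order fallback are omitted, and all function and variable names are ours.

```python
from collections import Counter

W_SPEAKER_NAME, W_WORD_OF_SPEAKER, W_WORD, W_SYNONYM = 4.0, 2.5, 1.0, 0.5

def window_score(q_tokens, q_speakers, window):
    """Sum of match weights between a question and one window of the transcript;
    each occurrence of a question word can be consumed at most once."""
    score, remaining = 0.0, Counter(q_tokens)
    for spk, token, synonyms in window:
        candidates = []
        for q, left in remaining.items():
            if left <= 0:
                continue
            if q == spk:                      # question word names the speaker of the passage
                candidates.append((W_SPEAKER_NAME, q))
            elif q == token:                  # lemma match, weighted up if the speaker is named in the question
                candidates.append((W_WORD_OF_SPEAKER if spk in q_speakers else W_WORD, q))
            elif q in synonyms:               # match through a WordNet synonym only
                candidates.append((W_SYNONYM, q))
        if candidates:
            weight, q = max(candidates)       # keep the best-weighted match for this transcript word
            score += weight
            remaining[q] -= 1                 # discard the matched pair
    return score

def best_passage(q_tokens, q_speakers, transcript, size_mult=10, step_mult=3):
    """Slide a window over the transcript (sizes proportional to the question length,
    cf. Section 3.4) and return (score, (start, end)) for the best window."""
    size = max(1, size_mult * len(q_tokens))
    step = max(1, step_mult * len(q_tokens))
    best = (float("-inf"), (0, 0))
    for start in range(0, max(1, len(transcript) - size + 1), step):
        window = transcript[start:start + size]
        best = max(best, (window_score(q_tokens, q_speakers, window), (start, start + len(window))))
    return best

def pick_true(statement_a, statement_b, transcript):
    """Section 3.3, simplified: the statement with the higher best-passage score is declared true."""
    score_a = best_passage(statement_a["tokens"], statement_a["speakers"], transcript)[0]
    score_b = best_passage(statement_b["tokens"], statement_b["speakers"], transcript)[0]
    return "a" if score_a >= score_b else "b"
```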
3.4. Optimization of Parameters

Most of the parameters of the algorithm could be adjusted by statistical learning (optimization) if enough data were available. Given the relatively small number of BET questions available for the two transcribed meetings, we only experimented with brute-force optimization of the window size and window step of the QA algorithm above.

Quite early in the development of the algorithm, it appeared useful to make the window size and step proportional to the length of a question, rather than fixed at the same value for all questions. Using 5-fold cross-validation, all sizes from 1 to 13 times the length of a BET question, for both window and step, were tested to find optimal values (the evaluation metric is stated in Section 4.1). The results point to a set of optimal values rather than a unique one. The lowest values for the window size are: for IB4010, 10 times the question length, with a step of 3 times the question length; for IS1008c, respectively 12 and 1, but a window size of 4 times the question length is nearly optimal too, and was selected because a narrower window is more helpful for discrimination.

Table 1: Accuracy of passage retrieval and of true/false statement discrimination for the two BET meetings. Standard deviation (stdev) is computed using 5-fold cross-validation.

                              Passage retrieval               Statement discrimination
                              IB4010          IS1008c         IB4010          IS1008c
Condition                     Acc.    Stdev   Acc.    Stdev   Acc.    Stdev   Acc.    Stdev
Random                        0.0033  n/a     0.0078  n/a     0.50    n/a     0.50    n/a
Unigram matching              0.27    0.15    0.54    0.21    0.37    0.14    0.36    0.21
N-gram matching               0.32    0.15    0.50    0.19    0.43    0.17    0.42    0.11
N-gram matching + speakers    0.55    0.14    0.62    0.16    0.57    0.06    0.64    0.18

4. Results for Automatic BET QA

4.1. Evaluation Methods

To assess the correctness of a retrieved passage, it is compared to a hand-annotated reference passage, namely the minimal passage that is sufficient to discriminate a BET pair of true/false statements. If the retrieved passage and the reference one have a non-empty intersection (at least one word), the retrieved passage is considered correct. To assess the correctness of discrimination (the second processing stage), it is of course sufficient to check whether the true statement was correctly identified.

There are 116 pairs of true/false statements for IB4010 and 50 pairs for IS1008c, because only those that were actually shown to humans (after cleaning of the list by the BET experimenters) were used in our experiments. To better interpret the scores that follow, we categorized and counted BET questions of two types. The “simple” questions have an explicit answer in some passage, while the “deductive” ones require some reasoning, based on one or more passages, to find the answer. Our labeling showed that for IB4010, 74 BET questions out of 116 (64%) are simple and the remaining 42 (36%) are deductive; for IS1008c, 46 questions out of 50 (92%) are simple and only 4 (8%) are deductive. These figures provide an upper bound on the performance of our system, which cannot be expected to answer deductive questions at this stage, though it can still find the corresponding passages.

The baseline score for passage retrieval can be defined by random draw, and thus depends on the window size and step. With the current values, the likelihood of finding the correct passage by chance is below 1% for each meeting. The baseline score for true/false discrimination is 50%.
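A small sketch of this correctness criterion, assuming passages are represented as (start, end) word offsets into the transcript; the representation and names are ours.

```python
def passage_correct(retrieved, reference):
    """A retrieved passage counts as correct if it shares at least one word position
    with the hand-annotated reference passage (non-empty intersection)."""
    r_start, r_end = retrieved
    g_start, g_end = reference
    return max(r_start, g_start) < min(r_end, g_end)

def retrieval_accuracy(retrieved_passages, reference_passages):
    """Fraction of questions for which the retrieved passage overlaps its reference."""
    hits = sum(passage_correct(r, g) for r, g in zip(retrieved_passages, reference_passages))
    return hits / len(reference_passages)
```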
4.2. Results on Manual Transcripts

Results for the two processing stages of the automatic BET QA algorithm are given in Table 1 above, for three variants of the algorithm: (1) allowing only unigram matching when computing the similarity score, with no weighting of speaker-specific words; (2) with N-gram matching and still no weighting; and (3) with the additional weighting of matched words spoken by a speaker mentioned in the question, as explained in Section 3.2 above (second item of the scoring scheme).

The passage retrieval component obtains very good results compared with the chance of randomly locating the correct passage, with accuracy scores of 0.55±0.14 for IB4010 and 0.62±0.16 for IS1008c¹. These results can also be compared with overall human-plus-browser precision scores, shown to be in the 70%–80% range, though they remain clearly below them. As for speed, the automatic QA system is of course much faster than humans-plus-browsers, requiring less than 1 second per question (vs. 2–4 minutes).

When combined with question discrimination, the performance degrades to 0.57±0.06 accuracy for IB4010 and 0.64±0.18 for IS1008c (against the 50% baseline). A more informative comparison is obtained by noting that a perfect discrimination module would still be clueless on the passages that are wrongly identified (45% and 38%), so its score could reach at most 77.5% for IB4010 and 81% for IS1008c². The fact that the actual scores are clearly lower than these theoretical values shows that this stage of the algorithm still needs improvement.

To answer the question “do humans and our QA system have difficulties on the same BET statements?”, a detailed comparison, question by question, of the automatic scores with those of humans using the TQB browser [2] is shown in Table 2. For humans using TQB, the table shows the scores of a group of 14 people without training, and the scores of an equivalent group after training on one meeting [2, 7]. The amount of available data does not allow a full correlation-based statistical study of the differences between humans and the QA system, but the observation of the first eight questions of each meeting suggests some correlation between what humans and the QA system find difficult, e.g. question #1 for IB4010, and questions #5 and #6 for IS1008c.

¹ Confidence intervals at the 95% level are obtained through 5-fold cross-validation.
² That is, if question discrimination were perfect, it would succeed on the correctly retrieved passages and would reach 50% accuracy on the others, so the expected scores would be 0.55 × 100% + 0.45 × 50% = 77.5% for IB4010 and 0.62 × 100% + 0.38 × 50% = 81% for IS1008c.
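The upper bound computed in footnote 2 can be restated compactly as a function of the passage retrieval accuracy p (a rewriting of the footnote, not an additional result):

```latex
\[
  \mathrm{Acc}_{\max} \;=\; p \cdot 1 \;+\; (1 - p) \cdot 0.5,
  \qquad
  \text{IB4010: } 0.55 + 0.45 \cdot 0.5 = 0.775,
  \qquad
  \text{IS1008c: } 0.62 + 0.38 \cdot 0.5 = 0.81 .
\]
```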
Table 2: Human (with TQB browser) vs. automatic results for the first eight questions of each meeting (P: accuracy of passage retrieval; D: accuracy of statement discrimination).

              Precision of humans                 QA system
Question      no training     with training      P       D
IB4010:
1             0.93            0.71               0       0
2             0.93            1.00               1       1
3             0.71            1.00               1       1
4             0.86            0.86               1       1
5             1.00            0.93               0       1
6             0.93            1.00               1       1
7             0.93            0.71               1       1
8             0.71            0.79               1       1
Average       0.88            0.88               0.75    0.88
IS1008c:
1             0.86            0.93               1       1
2             0.67            0.86               1       1
3             0.82            0.93               1       1
4             0.89            0.93               1       1
5             0.63            0.69               1       0
6             0.67            0.73               0       0
7             1.00            0.82               1       0
8             0.67            0.64               0       1
Average       0.77            0.81               0.75    0.63

4.3. Results on ASR and Summaries

The automatic BET QA system was also applied to the diarized ASR output for the same meetings, which is available with the AMI Corpus [6] as a sample of the output of the AMI ASR system [9]. As expected, the scores decrease on ASR with respect to the manual transcript, but remain in a similar range. For IB4010, passage retrieval accuracy drops from 0.55±0.14 to 0.46±0.13, and discrimination accuracy drops from 0.57±0.06 to 0.52±0.09 (thus getting very close to the 50% baseline). For IS1008c, passage retrieval drops from 0.62±0.16 to 0.60±0.33, and discrimination from 0.64±0.18 to 0.56±0.19. Our QA system thus appears to be robust with respect to ASR quality. In particular, passage retrieval scores remain well above the baseline: the system finds the right passage for more than half of the BET questions.

We also assessed the degradation when automatic summaries [10] of the ASR are used instead of the full ASR. The goal is first to test the robustness of our system, but also to assess summarization quality. We believe that this approach could provide an indirect measure of the quality of a summary: the higher the quality of the summary, the lower the degradation as shorter and shorter summaries are used. We compared extractive summaries of various sizes with a random extract and with a gold-standard summary from the AMI Corpus, but the initial results did not show significant differences between the conditions.

5. Conclusion and Future Work

This paper has described an automatic system for passing a QA-based browser evaluation task, the BET, and has compared its results with those of human subjects using meeting browsers. Although significantly above the baseline, the performance of automatic QA remains clearly below that of humans for overall precision on the entire task, i.e. the discrimination of true vs. false statements. However, the meeting-specific QA algorithm introduced here displays two clear advantages over humans: it retrieves the correct passage for more than 50% of the questions, and it does so in a very short time, less than 1 second per question.

These results suggest that an improvement to meeting browsers – at least for the fact-finding task – is to enhance them with an automatic passage retrieval function, which points the user from the start to a potentially relevant section of a meeting, depending on the question that was formulated. If the section is not relevant, the user simply continues using the meeting browser as before. But if the section is indeed relevant, as in more than half of the cases, the user can easily reason upon it (if needed) to extract the answer, more reliably than the method proposed here. Future work on textual entailment could also help to automate this second stage.
6. Acknowledgments

This work has been supported by the Swiss National Science Foundation, through the IM2 National Center of Competence in Research, and by the European IST Program, through the AMIDA Integrated Project FP6-0033812. The first author was supported by the AMI Training Program during his internship at Idiap (Aug. 2008 – Jan. 2009).

7. References

[1] P. Wellner, M. Flynn, S. Tucker, and S. Whittaker, “A meeting browser evaluation test,” in CHI 2005, Portland, OR, 2005, pp. 2021–2024.
[2] A. Popescu-Belis, P. Baudrion, M. Flynn, and P. Wellner, “Towards an objective test for meeting browsers: the BET4TQB pilot experiment,” in Machine Learning for Multimodal Interaction IV, ser. LNCS 4892, A. Popescu-Belis, H. Bourlard, and S. Renals, Eds. Berlin: Springer, 2008, pp. 108–119.
[3] E. M. Voorhees, “The TREC question answering track,” Natural Language Engineering, vol. 7, no. 4, pp. 361–378, 2001.
[4] E. M. Voorhees, “Overview of the TREC 2003 question answering track,” in TREC 2003 (12th Text REtrieval Conference), ser. NIST Special Publication 500-255, Gaithersburg, MD, 2003, pp. 54–68.
[5] J. Gonzalo, P. Clough, and A. Vallin, “Overview of the CLEF 2005 interactive track,” in Accessing Multilingual Information Repositories (CLEF 2005 Revised Selected Papers), ser. LNCS 4022, C. Peters et al., Eds. Berlin: Springer, 2006, pp. 251–262.
[6] J. Carletta et al., “The AMI Meeting Corpus: A pre-announcement,” in Machine Learning for Multimodal Interaction II, ser. LNCS 3869, S. Renals and S. Bengio, Eds. Berlin: Springer, 2006, pp. 28–39.
[7] A. Popescu-Belis, “Comparing meeting browsers using a task-based evaluation method,” Idiap Research Institute, Research Report Idiap-RR-11-09, June 2009.
[8] M. Light, G. S. Mann, E. Riloff, and E. Breck, “Analyses for elucidating current question answering technology,” Natural Language Engineering, vol. 7, no. 4, pp. 325–342, 2001.
[9] T. Hain et al., “The development of the AMI system for the transcription of speech in meetings,” in Machine Learning for Multimodal Interaction II, ser. LNCS 3869, S. Renals and S. Bengio, Eds. Berlin/Heidelberg: Springer-Verlag, 2006, pp. 344–356.
[10] G. Murray, S. Renals, and J. Carletta, “Extractive summarization of meeting recordings,” in Interspeech 2005, Lisbon, 2005, pp. 593–596.