Automatic Key Term Extraction and Summarization from Spoken Course Lectures

Speaker: Yun-Nung Chen 陳縕儂
Advisor: Prof. Lin-Shan Lee 李琳山
National Taiwan University

Master Defense, National Taiwan University

Introduction
Target: extract key terms and summaries from course lectures
Key Term
O Indexing and retrieval
O The relations between key terms and segments of documents
Summary
O Efficiently understand the document
Both are related to document understanding and the semantics of the document
Both are “Information Extraction”

Definition
O Key Term
  O Higher term frequency
  O Core content
O Two types
  O Keyword
    O Ex. “語音” (“speech”)
  O Key phrase
    O Ex. “語言模型” (“language model”)

Automatic Key Term Extraction
[Figure: system flow. Archive of spoken documents (original spoken documents) → speech signal → ASR → ASR transcriptions → Phrase Identification (Branching Entropy) → Key Term Extraction (Feature Extraction; Learning Methods: 1) AdaBoost, 2) Neural Network) → Key terms, e.g. “entropy”, “acoustic model”]
First, branching entropy is used to identify phrases.
Then, learning methods extract key terms based on a set of features.

Branching Entropy
How to decide the boundary of a phrase?
[Figure: the phrase “hidden Markov model” with its neighboring words in the corpus, such as “represent”, “is”, “can”, “of”, “in”]
O Inside the phrase
O Boundary of the phrase
Branching entropy is defined to decide possible boundaries
O Definition of Right Branching Entropy
  O Probability of xi given X
  O Right branching entropy for X
O Decision of Right Boundary
  O Find the right boundary, located between X and xi, from the right branching entropy of X
Implemented using a PAT tree
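
The right branching entropy described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: p(xi | X) is estimated as the count of X followed by xi divided by the count of X followed by any word, the entropy is the usual Shannon entropy of that distribution, and a plain dictionary stands in for the PAT tree. The boundary test (entropy rises relative to the shorter prefix) is also an assumption, since the exact criterion is not preserved in this transcript. The names `right_branching_entropy` and `is_right_boundary` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def right_branching_entropy(sentences, max_len=3):
    """Map each word sequence X (up to max_len words) to the entropy of
    the distribution of the word that follows it in the corpus."""
    followers = defaultdict(Counter)
    for words in sentences:
        for start in range(len(words)):
            for length in range(1, max_len + 1):
                end = start + length
                if end >= len(words):  # no following word left
                    break
                followers[tuple(words[start:end])][words[end]] += 1
    entropy = {}
    for prefix, counts in followers.items():
        total = sum(counts.values())
        entropy[prefix] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy

def is_right_boundary(entropy, phrase):
    """Assumed criterion: entropy after the phrase exceeds the entropy
    after the phrase with its last word removed."""
    return entropy[phrase] > entropy[phrase[:-1]]

corpus = [
    "hidden Markov model is used".split(),
    "hidden Markov model can represent speech".split(),
    "the hidden Markov model of speech".split(),
]
h = right_branching_entropy(corpus)
```

In this toy corpus “hidden” is always followed by “Markov” and “hidden Markov” by “model”, so both have zero entropy (inside the phrase), while “hidden Markov model” is followed by three different words, so its entropy is higher (boundary of the phrase).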

Extract prosodic, lexical, and semantic features for each candidate term

Feature Extraction
O Prosodic features
  O Extracted from the first occurrence of each candidate term

Feature Name       Feature Description
Duration (I – IV)  normalized duration (max, min, mean, range)
Pitch (I – IV)     F0
Energy (I – IV)    energy

O Duration: speakers tend to use longer duration to emphasize key terms
  O 4 values are used for the duration of the term
  O The duration of phone “a” is normalized by the average duration of phone “a”
O Pitch: higher pitch may represent significant information
O Energy: higher energy emphasizes important information
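
The four duration values can be illustrated as follows. This is a minimal sketch, assuming each candidate term comes with a list of (phone, duration) pairs from a forced alignment and that per-phone average durations have been computed over the corpus beforehand. The function name `duration_features` and the sample numbers are hypothetical.

```python
def duration_features(phones, avg_phone_duration):
    """Return max, min, mean, and range of the normalized phone durations.

    phones: list of (phone label, duration in seconds) for one term.
    avg_phone_duration: dict mapping phone label to its average duration.
    """
    # Normalize each phone by the average duration of the same phone.
    normalized = [dur / avg_phone_duration[ph] for ph, dur in phones]
    return {
        "max": max(normalized),
        "min": min(normalized),
        "mean": sum(normalized) / len(normalized),
        "range": max(normalized) - min(normalized),
    }

# Hypothetical alignment: phone "a" is 20% longer than its average.
feats = duration_features(
    [("a", 0.12), ("n", 0.06)],
    {"a": 0.10, "n": 0.06},
)
```

The same four statistics could be computed over F0 and energy values, if the pitch and energy features follow the same pattern.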

Feature Extraction
O Lexical features

Feature Name  Feature Description
TF            term frequency
IDF           inverse document frequency
TFIDF         tf * idf
PoS           the PoS tag

Using some well-known lexical features for each candidate term
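
The TF, IDF, and TFIDF features can be sketched as below. The slides do not say which variants were used, so this sketch assumes raw corpus-wide term frequency and idf = log(N / df); the function name `tf_idf` and the toy documents are hypothetical.

```python
import math

def tf_idf(term, documents):
    """Return (tf, idf, tf * idf) for a term over tokenized documents."""
    tf = sum(doc.count(term) for doc in documents)       # total occurrences
    df = sum(1 for doc in documents if term in doc)      # documents containing it
    idf = math.log(len(documents) / df) if df else 0.0   # assumed idf variant
    return tf, idf, tf * idf

docs = [
    ["entropy", "is", "high"],
    ["the", "model", "is", "good"],
    ["entropy", "is", "entropy"],
]
entropy_feats = tf_idf("entropy", docs)
is_feats = tf_idf("is", docs)
```

Here “is” occurs in every document, so its idf and TFIDF are zero, while “entropy” occurs in two of three documents and keeps a positive weight.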

Feature Extraction
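O Semantic features
  O Probabilistic Latent Semantic Analysis (PLSA)
    O Latent Topic Probability
Key terms tend to focus on limited topics
[Figure: PLSA model linking documents D1, D2, ..., Di, ..., DN to terms t1, t2, ..., tj, ..., tn through latent topics T1, T2, ..., Tk, ..., TK, with probabilities P(Tk | Di) and P(tj | Tk)]
Di: documents, Tk: latent topics, tj: terms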

Feature Extraction
O Semantic features
  O Latent Topic Probability
    O LTP (I - III): Latent Topic Probability (mean, variance, standard deviation)
    O The three values describe a probability distribution
    [Figure: latent topic probability distributions of a non-key term and a key term]
  O Latent Topic Significance
    O Within-topic to out-of-topic ratio (within-topic freq. / out-of-topic freq.)
    O LTS (I - III): Latent Topic Significance (mean, variance, standard deviation)
    [Figure: latent topic significance of a non-key term and a key term]
  O Latent Topic Entropy
    O LTE: term entropy for latent topic
    O Non-key term: higher LTE
    O Key term: lower LTE
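
The three semantic features can be sketched as follows. The exact formulas are not preserved in this transcript, so the definitions here are assumptions: LTP statistics are taken over P(Tk | t) for a term t, LTE is the Shannon entropy of that distribution, and LTS for topic Tk is the ratio of within-topic frequency, sum over documents of n(t, d) * P(Tk | d), to out-of-topic frequency, sum over documents of n(t, d) * (1 - P(Tk | d)). All function names and numbers are hypothetical.

```python
import math
from statistics import mean, pvariance, pstdev

def ltp_features(topic_probs):
    """Mean, variance, and standard deviation of P(Tk | t)."""
    return mean(topic_probs), pvariance(topic_probs), pstdev(topic_probs)

def latent_topic_entropy(topic_probs):
    """Entropy of P(Tk | t); lower when the term focuses on few topics."""
    return -sum(p * math.log(p) for p in topic_probs if p > 0)

def latent_topic_significance(term_counts, doc_topic_probs, k):
    """Within-topic to out-of-topic frequency ratio for topic k.

    term_counts: dict mapping document id to n(t, d).
    doc_topic_probs: dict mapping document id to the list P(Tk | d).
    """
    within = sum(n * doc_topic_probs[d][k] for d, n in term_counts.items())
    outside = sum(n * (1 - doc_topic_probs[d][k]) for d, n in term_counts.items())
    return within / outside if outside > 0 else float("inf")

key_term = [0.9, 0.05, 0.05]       # concentrated on one topic
non_key_term = [1 / 3, 1 / 3, 1 / 3]  # spread evenly over topics
lts = latent_topic_significance(
    {"d1": 4, "d2": 1},
    {"d1": [0.8, 0.2], "d2": [0.5, 0.5]},
    k=0,
)
```

With these toy distributions the key term has the lower entropy, matching the “Lower LTE” label above.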

Using supervised approaches to extract key terms

Learning Methods
O Adaptive Boosting (AdaBoost)
O Neural Network
Automatically adjust the weights of features to train a classifier

Experiments
O Corpus
  O NTU lecture corpus
  O Mandarin Chinese with embedded English words
  O Single speaker
  O 45.2 hours
O ASR System
  O Bilingual AM with model adaptation [1]
  O LM with adaptation using random forests [2]

Language      Mandarin  English  Overall
Char Acc (%)  78.15     53.44    76.26

[1] Ching-Feng Yeh, “Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery,” Master Thesis, 2011.
[2] Chao-Yu Huang, “Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests,” Master Thesis, 2011.

Experiments
O Reference Key Terms
  O Annotations from 61 students who have taken the course
  O If an annotator labeled 150 key terms, each of them receives a score of 1/150 and all other terms receive 0
  O Rank the terms by the sum of all scores given by all annotators for each term
  O Choose the top N terms from the list
    O N is the average number of key terms labeled per annotator
  O N = 154 key terms
    O 59 key phrases and 95 keywords
O Evaluation
  O 3-fold cross validation
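
The reference key term selection above can be sketched as a short function. This is an illustration under stated assumptions: each annotator's labels are given as a list of terms, duplicates within one annotator are ignored, N is rounded to the nearest integer, and ties are broken alphabetically. The function name `reference_key_terms` and the toy annotations are hypothetical.

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """Rank terms by summed annotator scores and keep the top N,
    where N is the average number of terms labeled per annotator."""
    scores = defaultdict(float)
    for terms in annotations:
        unique = set(terms)
        for term in unique:
            # An annotator who labeled k terms gives each one 1/k.
            scores[term] += 1.0 / len(unique)
    n = round(sum(len(set(t)) for t in annotations) / len(annotations))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n]

annotations = [
    ["entropy", "acoustic model"],
    ["entropy", "language model", "viterbi", "hmm"],
]
top_terms = reference_key_terms(annotations)
```

A term chosen by an annotator who labeled few terms counts for more than one chosen by an annotator who labeled many, which is the effect of the 1/150 weighting described above.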

Experiments
O Feature Effectiveness
  O Neural network for keywords from ASR transcriptions

Features       Pr     Lx     Sm     Pr+Lx  Pr+Lx+Sm
F-measure (%)  20.78  42.86  35.63  48.15  56.55

Pr: Prosodic, Lx: Lexical, Sm: Semantic

Each set of these features alone gives F1 from 20% to 42%
Prosodic features and lexical features are additive
Three sets of features are all useful

Experiments
O Overall Performance (Keywords & Key Phrases)

F-measure (%)
Method                              ASR    Manual
N-Gram + TFIDF (Baseline)           23.44  32.19
Branching Entropy + TFIDF           52.60  55.84
Branching Entropy + AdaBoost        57.68  62.39
Branching Entropy + Neural Network  62.70  67.31

[Figure: each bar is split into key phrase and keyword portions]

Branching entropy performs well
Performance on manual transcriptions is slightly better than on ASR transcriptions
Supervised learning using neural network gives the best results

Introduction
O Extractive Summary
  O Important sentences in the document
O Computing Importance of Sentences
  O Statistical Measure, Linguistic Measure, Confidence Score, N-Gram Score, Grammatical Structure Score
O Ranking Sentences by Importance and Deciding Ratio of Summary
This work proposes a better statistical measure of a term
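
The last step above, ranking sentences by importance and filling a summary up to a given ratio, can be sketched as a greedy selection. This is a minimal sketch, assuming the ratio is measured in words, sentences that would exceed the budget are skipped, and the summary keeps the original sentence order. The function name `extract_summary` and the example scores are hypothetical.

```python
def extract_summary(sentences, scores, ratio=0.1):
    """Pick the highest-scoring sentences within a word budget.

    sentences: list of tokenized sentences (lists of words).
    scores: importance score per sentence, same order as sentences.
    ratio: summary length as a fraction of the document length.
    """
    budget = ratio * sum(len(s) for s in sentences)
    chosen, used = [], 0
    for idx in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        if used + len(sentences[idx]) > budget:
            continue  # assumed policy: skip sentences that do not fit
        chosen.append(idx)
        used += len(sentences[idx])
    # Restore document order for readability.
    return [sentences[i] for i in sorted(chosen)]

doc = [
    "branching entropy finds phrase boundaries".split(),
    "this is filler".split(),
    "key terms matter".split(),
]
summary = extract_summary(doc, scores=[0.9, 0.5, 0.8], ratio=0.5)
```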

Statistical Measure of a Term
O LTE-Based Statistical Measure (Baseline)
O Key-Term-Based Statistical Measure
  O Considering only key terms (ti ∈ key)
  O Weighted by LTS of the term
[Figure: a term linked to latent topics ..., Tk-1, Tk, Tk+1, ...]
Key terms can represent core content of the document
Latent topic probability can be estimated more accurately

Importance of the Sentence
O Original Importance
  O LTE-based statistical measure
  O Key-term-based statistical measure
O New Importance
  O Considering original importance and similarity to other sentences
Sentences similar to more sentences should get higher importance

Random Walk on a Graph
O Idea
  O Sentences similar to more important sentences should be more important
O Graph Construction
  O Node: sentence in the document
  O Edge: weighted by similarity between nodes
O Node Score
  O Interpolating two scores
    O Normalized original score of sentence Si
    O Scores propagated from neighbors according to edge weight p(j, i)
  O The result is the score of Si in the k-th iteration
Nodes connecting to more nodes with higher scores should get higher scores

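O Topical Similarity between Sentences
  O Edge weight sim(Si, Sj): (sentence i → sentence j)
  O Latent topic probability of the sentence
  O Using Latent Topic Significance
[Figure: sentences Si and Sj linked through latent topics ..., Tk-1, Tk, Tk+1, ..., with the terms ti, tj, tk of the sentences weighted by LTS]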

O Scores of Sentences
  O Converged equation
  O Matrix form
  O Solution: dominant eigenvector of P′
O Integrated with Original Importance
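
The random walk scoring can be sketched as an iterative update. This is a minimal sketch, assuming the update v(i) = (1 - alpha) * r(i) + alpha * sum over j of p(j, i) * v(j), where r holds the normalized original scores and p(j, i) is the similarity from Sj to Si normalized over the outgoing edges of Sj. Repeating the update approximates the converged solution instead of solving for the eigenvector directly. The function name `random_walk_scores`, the value alpha = 0.9, and the fixed iteration count are assumptions, not taken from the slides.

```python
def random_walk_scores(original, sim, alpha=0.9, iterations=100):
    """Interpolate original sentence scores with scores propagated
    from neighboring sentences over a similarity graph."""
    n = len(original)
    total = sum(original)
    r = [s / total for s in original]  # normalized original scores
    # p[j][i]: edge weight from sentence j to i, normalized per source j.
    p = []
    for j in range(n):
        out = sum(sim[j][i] for i in range(n) if i != j)
        p.append([
            sim[j][i] / out if i != j and out > 0 else 0.0
            for i in range(n)
        ])
    v = r[:]
    for _ in range(iterations):
        v = [
            (1 - alpha) * r[i] + alpha * sum(p[j][i] * v[j] for j in range(n))
            for i in range(n)
        ]
    return v

# Sentence 0 is similar to both other sentences; 1 and 2 are unrelated.
sim = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
scores = random_walk_scores([1.0, 1.0, 1.0], sim)
```

Although all three sentences start with the same original score, sentence 0 ends up with the highest score because it is connected to more sentences.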

Automatic Summarization

Experiments
O Same Corpus and ASR System
  O NTU lecture corpus
O Reference Summaries
  O Two human-produced reference summaries for each document
  O Ranking sentences from “the most important” to “of average importance”
O Evaluation Metric
  O ROUGE-1, ROUGE-2, ROUGE-3
  O ROUGE-L: Longest Common Subsequence (LCS)
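
The evaluation metrics can be illustrated with a simplified computation. This is a sketch only: it computes recall-oriented n-gram overlap against a single reference and the raw LCS length, and it omits the options of the official ROUGE toolkit (multiple references, stemming, F-scores). The function names `rouge_n` and `lcs_length` are hypothetical.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    def ngrams(words):
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

reference = "the hidden markov model is trained".split()
candidate = "hidden markov model is used".split()
r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
lcs = lcs_length(candidate, reference)
```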

Evaluation
[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores at summarization ratios of 10%, 20%, and 30%, comparing LTE, LTE + RW, Key, and Key + RW on ASR and manual transcriptions]
Key-term-based statistical measure is helpful
Random walk can help the LTE-based statistical measure
Random walk can also help the key-term-based statistical measure
Topical similarity can compensate for recognition errors
Key-term-based statistical measure and random walk using topical similarity are useful for summarization

Conclusions
Automatic Key Term Extraction
• The performance can be improved by
  ▫ Identifying phrases by branching entropy
  ▫ Prosodic, lexical, and semantic features together
Automatic Summarization
• The performance can be improved by
  ▫ Key-term-based statistical measure
  ▫ Random walk with topical similarity
    - Compensating for recognition errors
    - Giving higher scores to sentences topically similar to more important sentences
    - Considering all sentences in the document

Published Papers:
[1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, “Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features,” in Proceedings of SLT, 2010.
[2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, “Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms,” in Proceedings of InterSpeech, 2011.
