Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification

Abeed Sarker (1,2), Diego Mollá (1,2), Cécile Paris (1,2)

(1) Macquarie University and (2) CSIRO ICT Centre
Sydney, Australia

IJCNLP 2013, Nagoya, Japan

Contents
Sentence Polarity for Evidence Based Medicine
Feasibility Study
Automatic Polarity Classification
Results

EBM Sentence Polarity, Sarker, Mollá, Paris



Evidence Based Medicine

http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/



The Ultimate Goal



Sentence Polarity for EBM
The Task
Given a context intervention, determine the polarity of a
sentence returned by an automatic summariser.

Figure (pipeline): question Q → IR → documents (doc1, doc2, doc3) → summarisers → key sentences (s11, s12, s21, s22, s31, s32) → polarity detectors (+/−) → multisummariser → recommendations (drug1: +, drug2: +, drug3: −).


Sentence Polarity in Context
Different contexts may determine different polarities

Sentence fragment
The present study demonstrated that the combination of
cimetidine with levamisole is more effective than cimetidine alone
and is a highly effective therapy ...

Polarities in Context
cimetidine with levamisole: recommended.
cimetidine alone: not recommended.
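Because polarity is judged per context, one sentence can yield several classification instances. The sketch below makes this concrete with the fragment above; the class name `PolarityInstance` and the label strings are illustrative assumptions, not the authors' data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolarityInstance:
    """One classification instance: a sentence judged relative to one context."""
    context: str   # the context intervention (drug or drug combination)
    sentence: str  # the sentence returned by the summariser
    label: str     # hypothetical label set: "recommended" / "not recommended"


sentence = ("The present study demonstrated that the combination of cimetidine "
            "with levamisole is more effective than cimetidine alone and is a "
            "highly effective therapy ...")

# The same sentence is duplicated once per context, each copy with its own label.
instances = [
    PolarityInstance("cimetidine with levamisole", sentence, "recommended"),
    PolarityInstance("cimetidine alone", sentence, "not recommended"),
]

for inst in instances:
    print(f"{inst.context}: {inst.label}")
```

Treating each copy as an independent instance is what allows a single classifier to assign conflicting polarities to the same sentence.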



Related Work
Related tasks
Sentiment analysis
Semantic orientation
Opinion mining
Subjectivity

Typical approaches use statistical classifiers (e.g. SVM) trained on bag-of-words features.

Closely Related
Niu et al. (2005, 2006): polarity classification of medical sentences into four categories (positive, negative, neutral, no outcome).
Our approach allows the same sentence to have multiple polarities, one per context.


Feasibility Study



Data and Annotation
Initial corpus
456 clinical questions sourced from the Journal of Family Practice.

Polarity annotations
589 sentences from 33 questions annotated.
Bottom-line answers.
Key sentences extracted by the QSpec summariser.


Example of annotations
Question
What is the most effective beta-blocker for heart failure?

Bottom-line answer
Three beta-blockers (carvedilol, metoprolol, and bisoprolol) reduce mortality in chronic heart failure caused by left ventricular systolic dysfunction, when used in addition to diuretics and angiotensin converting enzyme (ACE) inhibitors.

Contextual polarities
carvedilol: recommended; metoprolol: recommended; bisoprolol: recommended.


Analysis I
Inter-annotator agreement (124 sentences)
Cohen's kappa: κ = 0.85 (almost perfect agreement).

Agreement between annotated sentences and bottom-line summaries
Interventions with positive polarity that are mentioned in the bottom-line summary: 177.
Polarity agreement: 95.5%.
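Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal stdlib sketch for reference; the toy labels are made up (the 124 doubly annotated sentences are not reproduced here), and the slides do not say which tool was used to compute κ.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example: p_o = 0.75, p_e = 0.5, so kappa = 0.5.
print(cohen_kappa(["+", "+", "-", "-"], ["+", "-", "-", "-"]))
```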



Analysis II

But do we have enough interventions?
Of the 109 unique interventions listed in the bottom-line summaries, 99 appear in the annotated sentences.
Recall = 90.8%.
Ignoring missing abstracts: recall = 96.1%.
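The first recall figure is the fraction of bottom-line interventions that also occur in the annotated sentences:

```latex
\mathrm{Recall}
  = \frac{\#\{\text{bottom-line interventions found in annotated sentences}\}}
         {\#\{\text{unique bottom-line interventions}\}}
  = \frac{99}{109} \approx 0.908
```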



Automatic Polarity Classification



Approach
Train a statistical classifier (SVM).
Input: a context and a sentence (the same sentence may appear several times, each time with a different context).
Output: the polarity.

Features
1. Word n-grams
2. Change Phrases
3. UMLS Semantic Types
4. Negations
5. PIBOSO Category
6. Synset Expansion
7. Context Windows
8. Dependency Chains
9. Other Features
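The slides do not say which SVM implementation was used. As a self-contained stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on sparse feature dictionaries, which is one way the feature groups listed above could be combined into a single vector per (context, sentence) instance. The function names, the `lam` and `epochs` values, the feature names in the toy data and the +1/−1 encoding (positive vs non-positive, as in the results) are all assumptions.

```python
import random


def train_pegasos(examples, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 regularisation) trained with Pegasos.

    examples: list of (features, y), features a dict {name: value},
    y = +1 (positive) or -1 (non-positive). No bias term is learned;
    add a constant feature if one is needed.
    """
    rng = random.Random(seed)
    data = list(examples)
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Shrink every weight: gradient step on the L2 term.
            scale = 1.0 - eta * lam
            for f in w:
                w[f] *= scale
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1 if score >= 0.0 else -1


# Hypothetical toy instances using change-phrase and context-window style names.
train = [
    ({"more-good": 1.0, "effective_AFTER": 1.0}, 1),
    ({"more-good": 1.0, "survival_AFTER": 1.0}, 1),
    ({"more-bad": 1.0, "adverse_AFTER": 1.0}, -1),
    ({"more-bad": 1.0, "toxicity_AFTER": 1.0}, -1),
]
w = train_pegasos(train)
print(predict(w, {"more-good": 1.0}), predict(w, {"more-bad": 1.0}))
```

Sparse dictionaries keep the sketch short; with thousands of n-gram features a library SVM on sparse matrices would be the practical choice.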



Description of Features I
1 Word n-grams
n = 1, 2
Lowercased, stop words removed, stemmed (Porter).
Context words (strings matching the provided contexts) replaced with the generic string 'CONTEXT'.
Disorder terms (UMLS semantic types) replaced with the generic string 'DISORDER'.
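A minimal sketch of this preprocessing, under stated assumptions: regex tokenisation, a tiny illustrative stop-word list, and a caller-supplied list of disorder phrases standing in for the UMLS semantic-type lookup. Porter stemming is left out to keep the sketch dependency-free (a stemmer such as NLTK's `PorterStemmer` could be applied to each token), and forming bigrams after stop-word removal is itself an assumption.

```python
import re

# Illustrative stop-word list only; a real system would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "with", "than", "that"}


def ngram_features(sentence, contexts, disorder_terms, n_values=(1, 2)):
    """Unigram/bigram features with CONTEXT and DISORDER placeholders."""
    text = sentence.lower()
    # Replace longer phrases first so that sub-phrases do not pre-empt them.
    # Placeholders stay upper-case, so they cannot match the lower-cased terms.
    for phrase in sorted(contexts, key=len, reverse=True):
        text = text.replace(phrase.lower(), " CONTEXT ")
    for phrase in sorted(disorder_terms, key=len, reverse=True):
        text = text.replace(phrase.lower(), " DISORDER ")
    tokens = [t for t in re.findall(r"[A-Za-z0-9-]+", text) if t not in STOP_WORDS]
    features = set()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features.add(" ".join(tokens[i:i + n]))
    return features


f = ngram_features("Carvedilol reduces mortality in heart failure.",
                   contexts=["carvedilol"], disorder_terms=["heart failure"])
print(sorted(f))
```

Plain substring replacement can also match inside longer words; matching on token boundaries would be more robust.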



Description of Features II
2 Change Phrases
Expanded the Niu et al. (2005) groups of good, bad, more and less words.
Features used: more-good, more-bad, less-good, less-bad.
Context window of 4 words.
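One plausible reading of the change-phrase features, sketched under assumptions: a feature such as more-good fires when a "more" or "less" word and a "good" or "bad" word occur within 4 tokens of each other. The four word sets below are tiny illustrative stand-ins for the expanded Niu et al. (2005) groups, which the slides do not list.

```python
# Illustrative stand-ins for the expanded word groups (not the real lists).
MORE = {"more", "increase", "increases", "increased", "higher", "greater"}
LESS = {"less", "reduce", "reduces", "reduced", "decrease", "decreases",
        "decreased", "lower", "fewer"}
GOOD = {"effective", "efficacy", "benefit", "beneficial", "survival", "improvement"}
BAD = {"mortality", "death", "deaths", "adverse", "pain", "risk", "toxicity"}


def change_phrase_features(tokens, window=4):
    """Return the set of more-good / more-bad / less-good / less-bad features."""
    tokens = [t.lower() for t in tokens]
    feats = set()
    for i, tok in enumerate(tokens):
        if tok in MORE:
            direction = "more"
        elif tok in LESS:
            direction = "less"
        else:
            continue
        # Look for a good/bad word within `window` tokens on either side.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            if tokens[j] in GOOD:
                feats.add(f"{direction}-good")
            elif tokens[j] in BAD:
                feats.add(f"{direction}-bad")
    return feats


print(change_phrase_features("carvedilol reduces mortality in chronic heart failure".split()))
```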

3 UMLS semantic types
Used all UMLS semantic types as binary features.



Description of Features III
4 Negations
Three alternative negation sources (compared in the results):
Niu et al. (2005).
BioScope corpus.
NegEx.
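The slides do not show how negation enters the feature vector. A common simple scheme, sketched here as an assumption and much simpler than the actual NegEx algorithm, prefixes tokens that follow a negation trigger with NEG_, until a fixed window runs out or a scope-breaking token appears. The trigger list, the breaker list and the window of 5 are illustrative.

```python
NEG_TRIGGERS = {"no", "not", "without", "neither", "nor"}     # illustrative subset
SCOPE_BREAKERS = {".", ",", ";", "but", "however"}            # end a negation scope


def mark_negation(tokens, window=5):
    """Prefix tokens inside a (simplified) negation scope with 'NEG_'."""
    out = []
    remaining = 0  # how many more tokens the current scope covers
    for tok in tokens:
        low = tok.lower()
        if low in NEG_TRIGGERS:
            out.append(tok)
            remaining = window
        elif low in SCOPE_BREAKERS:
            out.append(tok)
            remaining = 0
        elif remaining > 0:
            out.append("NEG_" + tok)
            remaining -= 1
        else:
            out.append(tok)
    return out


print(mark_negation(["levamisole", "did", "not", "reduce", "mortality", ".",
                     "cimetidine", "helped"]))
```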

5 PIBOSO categories
Population, Intervention, Background, Outcome, Study design, Other.
Used the Kim et al. (2011) classifier.



Description of Features IV
6 Synset Expansion
Use WordNet to expand synonyms.

7 Context Windows
Terms within a 3-word window on either side of the context-drug terms.
The string 'BEFORE' is appended to terms that precede the context.
The string 'AFTER' is appended to terms that follow it.
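A sketch of the context-window features, assuming tokenised input, lower-cased terms, and an underscore as separator (the slides only say that 'BEFORE' or 'AFTER' is appended). Multi-word contexts are matched as a token sequence.

```python
def context_window_features(tokens, context_tokens, window=3):
    """Tag terms within `window` tokens before/after each context mention."""
    ctx = [t.lower() for t in context_tokens]
    k = len(ctx)
    low = [t.lower() for t in tokens]
    feats = []
    for i in range(len(low) - k + 1):
        if low[i:i + k] != ctx:
            continue
        # Terms preceding the context mention.
        for j in range(max(0, i - window), i):
            feats.append(low[j] + "_BEFORE")
        # Terms following the context mention.
        for j in range(i + k, min(len(low), i + k + window)):
            feats.append(low[j] + "_AFTER")
    return feats


sent = "the combination of cimetidine with levamisole is more effective".split()
print(context_window_features(sent, ["levamisole"]))
```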



Description of Features V
8 Dependency chains
Used the GDep parser.
For each intervention, follow the dependencies using this rule:
1. Move up the dependency chain until a verb or the root is found.
2. Move down the dependencies and collect all terms.

The string 'DEP' is appended to the collected terms.
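The two-step rule can be sketched over a parse given as head indices and POS tags. GDep output is not read here; the input format (0-based tokens, `heads` with -1 for the root, verb tags starting with "VB"), the hand-made example parse, and the reading of step 2 as "collect the whole subtree below the node reached in step 1" are assumptions.

```python
def dependency_chain_terms(tokens, heads, pos, intervention_idx):
    """Collect DEP-tagged terms for one intervention via 'up to verb, then down'.

    tokens: words; heads[i]: index of token i's head, or -1 for the root;
    pos[i]: POS tag of token i (verbs assumed to start with 'VB').
    Assumes a well-formed tree (no cycles).
    """
    # Step 1: move up the chain until a verb or the root is reached.
    node = intervention_idx
    while heads[node] != -1 and not pos[node].startswith("VB"):
        node = heads[node]
    # Step 2: move down from that node and collect every term in its subtree.
    children = {i: [] for i in range(len(tokens))}
    for i, h in enumerate(heads):
        if h != -1:
            children[h].append(i)
    collected, stack = [], [node]
    while stack:
        cur = stack.pop()
        collected.append(cur)
        stack.extend(children[cur])
    return [tokens[i] + "_DEP" for i in sorted(collected)]


# Hand-made parse of "Carvedilol reduces mortality but aspirin increases bleeding".
tokens = ["Carvedilol", "reduces", "mortality", "but", "aspirin", "increases", "bleeding"]
heads = [1, -1, 1, 1, 5, 1, 5]
pos = ["NNP", "VBZ", "NN", "CC", "NN", "VBZ", "NN"]
print(dependency_chain_terms(tokens, heads, pos, intervention_idx=4))
```

For "aspirin" the chain stops at "increases", so only that clause is collected, which separates it from the "reduces mortality" clause.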

9 Other features
Context-intervention position.
Summary sentence position.
Presence of modals, comparatives, superlatives.


Results



Results with SVM Classifier

Training: 85% of annotated data (2008 sentences).
Test: 15% of annotated data (354 sentences).

Feature sets      Accuracy (%)   95% CI       F-score (Positive)   F-score (Non-positive)
1,2,3,4 (Niu)     76.0           71.2–80.4    0.58                 0.83
1–6               78.5           73.8–82.8    0.64                 0.85
All (Niu)         83.9           79.7–87.6    0.71                 0.89
All (BioScope)    84.7           80.5–88.9    0.74                 0.89
All (NegEx)       84.5           80.2–88.1    0.73                 0.89
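The slides do not state how the 95% confidence intervals were obtained. For orientation only, a Wilson score interval for an accuracy of k correct out of n test sentences can be computed as below. Because the method behind the table is unknown, the bounds need not match it: an accuracy of 76.0% on 354 sentences corresponds to 269 correct, for which this sketch gives roughly 71.3–80.1 rather than the 71.2–80.4 reported.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(correct, total, confidence=0.95):
    """Wilson score interval for a binomial proportion such as test accuracy."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


lo, hi = wilson_interval(269, 354)
print(f"{lo:.3f} to {hi:.3f}")
```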


Impact of Training Size on Classification Results

Results appear likely to improve with more training data.



Towards Generation of Bottom-line Recommendations

Used the 33 questions from our preliminary analysis.
Compared automatic polarities of interventions with manual
annotations of bottom-line summaries.

Results
Recall    Precision    F1
0.62      0.82         0.71

We might get better results with more training data.
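The scores above follow the usual set-based definitions; scoring over (intervention, polarity) pairs, as in the sketch below, is an assumption about how the comparison was carried out. As a consistency check, F1 = 2PR / (P + R) = 2(0.82)(0.62) / (0.82 + 0.62) ≈ 0.71, matching the slide. The toy sets are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1, e.g. over (intervention, polarity) pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def f1_from(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical toy sets: two of three predicted pairs are correct.
predicted = {("carvedilol", "recommended"), ("metoprolol", "recommended"),
             ("aspirin", "recommended")}
gold = {("carvedilol", "recommended"), ("metoprolol", "recommended"),
        ("bisoprolol", "recommended")}
print(precision_recall_f1(predicted, gold))
print(round(f1_from(0.82, 0.62), 2))
```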



Conclusions
http://web.science.mq.edu.au/˜diego/medicalnlp/

There is strong agreement between the polarity of interventions in clinical abstracts and their polarity in bottom-line summaries.
An SVM classifier with a range of features, including context features, achieves better results than classifiers without context features.
More training data will probably lead to better results.

Bottom-line conclusions
Polarity classification of abstract sentences may help EBM
summarisation.
More data are needed.
