Extraction of Drug-Drug Interactions from Biomedical Texts
1. Isabel Segura-Bedmar, Paloma Martínez, María Herrero-Zazo
Universidad Carlos III de Madrid, SPAIN
SemEval-2013 Task 9:
Extraction of Drug-Drug Interactions
from Biomedical Texts
2. Outline
Motivation
Previous Work: DDIExtraction 2011
New in DDIExtraction 2013
The DDI corpus
Tasks
Task 9.1: Drug Name Recognition and Classification
Task 9.2: Drug-Drug Interaction Extraction
Conclusions
3. What is a Drug-Drug Interaction (DDI)?
Motivation
A DDI occurs when a drug influences the
level or the activity of another drug.
A DDI can be beneficial, but most times
DDIs are dangerous for patients and can
increase healthcare costs.
Medical literature is the most effective
source for the detection of DDIs.
4. Information Extraction
We thank the team at the Humboldt-Universitaet zu Berlin for making available a visualization of the DDI corpus using Stav:
http://corpora.informatik.hu-berlin.de/, https://github.com/TsujiiLaboratory/stav
5. Previous Work: DDIExtraction 2011
Automatic extraction of drug-drug
interactions from texts.
Dataset: a collection of 579 documents
from DrugBank.
DDIs annotated by a pharmacist,
Drugs automatically annotated.
F1 ranged between 0.16 and 0.66.
Previous Work
6. New in SemEval Task 9
Task 9.1: Drug Name Recognition and
Classification.
Task 9.2: DDI Detection and Classification.
The DDI corpus:
double size: 1,025 annotated documents, 18,502
pharmacological substances and 5,028 DDIs.
Drugs and DDIs were manually annotated by two
pharmacists.
Available annotation guidelines and Inter-Annotator
agreement.
Two different text sources:
MedLine
DrugBank.
7. Tasks
Task 9.1: Drug Name Recognition and
Classification.
Task 9.2: Drug-Drug Interaction Extraction
Tasks
8. Task 9.1 - Drug Classification
drug type for generic drugs
(e.g. heparin, ibuprofen, methotrexate).
brand type for trade drugs
(e.g. Espidifen, aspirin).
group type for groups of drugs
(e.g. analgesics, anticoagulants).
drug_n type for active substances not approved for human use
(e.g. picrotoxin, heroin).
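As a rough illustration of the four entity types, a drug mention could be stored as a typed character span. The class names, field names and the find_mention helper below are hypothetical and are not the corpus format; the example sentence is taken from the DDI Classification slide, and tagging both names as the drug type is an assumption based on the definition of generic drugs above.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    DRUG = "drug"      # generic drugs, e.g. heparin
    BRAND = "brand"    # trade names, e.g. Espidifen
    GROUP = "group"    # groups of drugs, e.g. analgesics
    DRUG_N = "drug_n"  # active substances not approved for human use


@dataclass(frozen=True)
class DrugMention:
    start: int         # offset of the first character
    end: int           # offset one past the last character
    text: str
    type: EntityType


def find_mention(sentence: str, text: str, type_: EntityType) -> DrugMention:
    """Locate the first occurrence of `text` and wrap it as a typed span."""
    start = sentence.index(text)  # raises ValueError if the name is absent
    return DrugMention(start, start + len(text), text, type_)


sentence = "Lansoprazole may decrease the absorption of enoxacin."
mentions = [
    find_mention(sentence, "Lansoprazole", EntityType.DRUG),  # assumed type
    find_mention(sentence, "enoxacin", EntityType.DRUG),      # assumed type
]
```

Keeping offsets end-exclusive means sentence[m.start:m.end] recovers the mention text directly.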
9. Task 9.1 - Teams
Team       Affiliation                                           Approach
LASIGE     Lisbon University                                     Conditional Random Fields
UEM_UC3M   European University, Carlos III University of Madrid  Ontology-based approach
UMCC_DLSI  Matanzas University, Alicante University              J48 classifier
Uturku     Turku University                                      SVM classifier (TEES system)
WBI        Humboldt University of Berlin                         Conditional Random Fields
10. Task 9.1 Evaluation
Recognition (regardless of type):
Exact-boundary matching (EXACT).
Partial-boundary matching (PARTIAL).
Recognition and classification:
Exact-boundary + type matching
(STRICT).
Partial-boundary + type matching
(TYPE).
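The four criteria can be read as combinations of a boundary test and a type test. The sketch below is one possible reading, assuming that "partial" means any character overlap and that each criterion is a yes/no decision; the official scorer may differ (for example by giving partial credit), and the Span class and matches function are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    type: str   # "drug", "brand", "group" or "drug_n"


def same_boundaries(gold: Span, pred: Span) -> bool:
    return (gold.start, gold.end) == (pred.start, pred.end)


def overlaps(gold: Span, pred: Span) -> bool:
    # Assumption: sharing at least one character counts as a partial match.
    return pred.start < gold.end and gold.start < pred.end


def matches(gold: Span, pred: Span, criterion: str) -> bool:
    if criterion == "EXACT":
        return same_boundaries(gold, pred)
    if criterion == "PARTIAL":
        return overlaps(gold, pred)
    if criterion == "STRICT":
        return same_boundaries(gold, pred) and gold.type == pred.type
    if criterion == "TYPE":
        return overlaps(gold, pred) and gold.type == pred.type
    raise ValueError(f"unknown criterion: {criterion}")


# Hypothetical gold mention and system output: the system found a longer
# span and chose the wrong type.
gold = Span(0, 9, "group")
pred = Span(0, 13, "drug")
results = {c: matches(gold, pred, c)
           for c in ("EXACT", "PARTIAL", "STRICT", "TYPE")}
```

In this toy case only PARTIAL is satisfied, which is the situation where a system identifies a substance but fails to classify it.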
11. Task 9.1 - Overview of the results
Groups and substances not approved are
more difficult than drugs and brands:
brand names: short and unique.
generic names: no ambiguity because they
are simplified chemical names.
group names can be ambiguous (eg.
anticoagulant, anti-retroviral, etc)
group names: many variants and
abbreviations.
12. Task 9.1 - Overview of the results
Drug-n type was the most difficult
type:
very scarce in DrugBank (less than 1%).
less clearly defined in guidelines.
Systems are able to identify, but fail to
classify them.
13. Tasks
Task 9.1: Drug Name Recognition and
Classification.
Task 9.2: Drug-Drug Interaction
Extraction
14. Task 9.2: Drug-Drug Interaction (DDI) Extraction
Gold annotations for drugs are provided to
teams both for training and test datasets.
15. Task 9.2: Drug-Drug Interaction (DDI) Extraction
Gold annotations for drugs are provided to
teams both for training and test datasets.
Detect DDIs and classify them.
16. Task 9.2: Drug-Drug Interaction (DDI) Extraction
Gold annotations for drugs are provided to
teams both for training and test datasets.
Detect DDIs and classify them.
[Figure labels: EFFECT, EFFECT, MECHANISM]
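Because the gold drug mentions are given, detection is often framed as a decision over every pair of drugs that appear together in a sentence. The sketch below shows only that enumeration step; it assumes this pairwise framing, and candidate_pairs is a hypothetical helper rather than part of any participating system.

```python
from itertools import combinations


def candidate_pairs(drug_mentions):
    """Return every unordered pair of drug mentions from one sentence."""
    return list(combinations(drug_mentions, 2))


sentence = ("Additive CNS depression may occur when antihistamines "
            "are administered with barbiturates.")
gold_drugs = ["antihistamines", "barbiturates"]  # provided to the systems

pairs = candidate_pairs(gold_drugs)

# A sentence with n drug mentions yields n * (n - 1) / 2 candidate pairs.
three = candidate_pairs(["drug_a", "drug_b", "drug_c"])  # placeholder names
```

Each candidate pair would then be passed to a classifier that returns one of the four DDI types or no interaction.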
17. DDI Classification
mechanism type for interactions describing the way the interaction occurs.
Example: Lansoprazole may decrease the absorption of enoxacin.
effect type for interactions describing the consequence of the interaction.
Example: Additive CNS depression may occur when antihistamines are administered with barbiturates.
advice type for interactions describing a recommendation or advice.
Example: Patients taking isoniazid and disulfiram concomitantly should be closely monitored.
int type for mentions of interactions without any additional information.
Example: Clopidogrel interacts with omeprazole.
18. Task 9.2 Teams
Team           Affiliation                                      Approach
FBK-irst       FBK-irst, Italy                                  Hybrid kernel + scope of negations and semantic roles
NIL_UCM        Complutense University of Madrid, Spain          SVM classifier
SCAI           Fraunhofer SCAI, Germany                         SVM classifier
UC3M           Carlos III University of Madrid, Spain           Shallow Linguistic Kernel
UCOLORADO_SOM  University of Colorado, School of Medicine, USA  SVM classifier
Uturku         Turku University, Finland                        SVM classifier (TEES system)
UWM_TRIADS     University of Wisconsin, USA                     Two-stage SVM
WBI_DDI        Humboldt University of Berlin                    Ensemble of kernels
20. Task 9.2 - Overview of the results
Detection: significant improvement over 2011:
66% F1 (2011) vs. 82% F1 (2013)
In DrugBank:
Int DDI type is the most difficult (54% F1).
Mechanism, effect and advice types show
similar F1 (70%).
In MedLine, results for effect and mechanism
types are considerably lower due to the
complexity of sentences describing these DDIs.
Non-linear kernel-based methods outperform
linear SVMs.
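The per-type figures above are F1 scores. As a minimal sketch of how such scores could be computed from labeled candidate pairs, assuming the standard definition of precision, recall and F1 (the official evaluation script may differ in detail), and using made-up toy labels rather than task data:

```python
from collections import Counter

DDI_TYPES = ("mechanism", "effect", "advice", "int")


def f1_per_type(gold: dict, pred: dict) -> dict:
    """gold and pred map a candidate pair id to a DDI type or "none"."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for pair in gold.keys() | pred.keys():
        g = gold.get(pair, "none")
        p = pred.get(pair, "none")
        if p != "none" and p == g:
            tp[p] += 1
        else:
            if p != "none":
                fp[p] += 1  # predicted a type the gold does not have
            if g != "none":
                fn[g] += 1  # missed or mislabeled a gold interaction
    scores = {}
    for t in DDI_TYPES:
        precision = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        recall = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        total = precision + recall
        scores[t] = 2 * precision * recall / total if total else 0.0
    return scores


# Toy labels, invented for illustration only.
gold = {"p1": "effect", "p2": "mechanism", "p3": "none", "p4": "effect"}
pred = {"p1": "effect", "p2": "effect", "p3": "none", "p4": "none"}
scores = f1_per_type(gold, pred)
```

Here the effect type has one true positive, one false positive and one false negative, so its precision, recall and F1 are all 0.5, while the other types score 0.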
21. Conclusion
13 teams from 7 different countries.
In both tasks, the results on DrugBank are
considerably better than the ones on MedLine.
Best F1:
          Task 9.1 Drug NERC               Task 9.2 Extraction of DDIs
          Recognition   Recognition +      Detection   Detection +
                        Classification                 Classification
DrugBank  90%           87%                82%         67%
MedLine   80%           58%                53%         42%
22. Conclusion
13 teams from 7 different countries.
In both tasks, the results on DrugBank are
considerably better than the ones on MedLine.
Task 9.1:
Best system (WBI): conditional random field
+ the training dataset extended with the
test dataset for task 9.2.
Most difficult: groups and drug-n.
Task 9.2:
There is much room to improve.
23. Future of the task
Include new types of texts:
prescription drug documents,
health records,
texts from social media about DDIs and
adverse drug events.
No plans for annotating new documents.
Goal of the next DDIExtraction:
Create a silver standard DDI corpus.
To annotate effect, mechanism, drug dosages,
etc.
Similar to CALBC challenge.
24. Acknowledgments
This work was supported by the Regional
Government of Madrid under the Research Network
MA2VICMR [S2009/TIC-1542] and by the Spanish
Ministry of Education under the project
MULTIMEDICA [TIN2010-20644-C03-01].
We thank all participants for their efforts and congratulate
them on their interesting work.
We thank the Uturku team, who provided TEES analyses for the
training and test datasets.
We thank the WBI team, who made available a visualization
of the DDI corpus using Stav.
Hello, my name is Isabel Segura-Bedmar, from Universidad Carlos III de Madrid, and I am here with Paloma Martínez.
We have organized Task 9, on the extraction of drug-drug interactions from biomedical texts.
Time: 0.15 min
Now I am going to describe the task, the participating systems and their results.
DEFINITION – DANGEROUS – DATABASES
I will start by explaining what a drug-drug interaction is. An interaction occurs when a drug influences the level or the activity of another drug.
Unfortunately, many interactions are very dangerous.
Clinicians use drug databases to avoid DDIs. However, these databases are not comprehensive, because many interactions are only described in medical journals or in drug safety reports.
Time: 0.30 min.
INFORMATION EXTRACTION - EXAMPLE
We think that information extraction can help to improve the early detection of drug interactions and to reduce the time healthcare professionals spend reading all the published information about DDIs.
For example, in these sentences, an IE system could identify the drugs (marked in color) and then extract their interactions (the pairs of drugs connected by the DDI label).
Time: 1 min
DDIEXTRACTION 2011 – DDI CORPUS – DRUGS NO HUMAN REVIEW – 10 TEAMS AND F1 RANGE
In 2011, we organized a first challenge to promote research on this topic.
With this goal, we created a corpus annotated with DDIs by a pharmacist. However, drugs were annotated by the MetaMap tool without any manual review.
10 teams participated in the task, and the F1-measures ranged between 0.16 and 0.66.
Time: 1 min.
NEW TASK 9.1 – AND CLASSIFY DDIS + DOUBLE SIZE, MANUALLY ANNOTATED, GUIDELINES AND IAA + MEDLINE IN ADDITION TO DRUGBANK
OK, so what is new?
In this new challenge, we propose a new task: the recognition and classification of pharmacological substances.
The initial task, detection of DDIs, is extended to include the classification of DDIs.
Moreover, we have improved the DDI corpus. In particular, we have almost doubled its size.
Also, in this new version, both drugs and DDIs are manually annotated by two different pharmacists.
We created annotation guidelines and also measured the inter-annotator agreement.
The new corpus contains MedLine abstracts in addition to the DrugBank documents.
Time: 2 min
So, now I am going to describe the first task.
In this task, systems should be able to identify the drugs and then classify them with the right types.
Time: 0.15 sg.
In particular, we propose four different types of pharmacological substances:
Drug type for generic drugs, for example, heparin.
Brand type for commercial names, for example, aspirin.
Group type for groups of drugs, for example, analgesics.
And drug_n for active substances not approved for human use, for example, heroin.
Time: 0.30 sg.
5 – SUPERVISED – ONLY ONE DICTIONARIES – BEST SYSTEM
5 teams participated, from 5 different countries.
Most teams used supervised methods such as CRF, SVM or J48 classifiers.
Only one team developed a system based on dictionaries and rules to classify the pharmacological substances.
The best team was Humboldt University of Berlin. Its system was based on CRF.
We proposed four different evaluations. The first two, exact and partial boundary matching, evaluate the drug recognition task, that is, without considering the types.
The latter two evaluate the overall task, recognition and classification.
Unfortunately, we do not have enough time to discuss each evaluation, so in this presentation we will only present the results for the overall task.
We also calculated the F-measure for each entity type to find out which types of substances are more difficult than others.
The results showed that groups and substances not approved for human use are more difficult than drugs and brands.
This may be because brand names are usually short and unique and, in general, generic drug names are not ambiguous because they are simplified chemical names, and therefore they are unique.
On the other hand, group names have many variants and abbreviations, and can also be ambiguous (for example, anticoagulant is both a group and its effect).
Finally, substances not approved for human use are the most difficult type.
This is because this type is very scarce in DrugBank, and therefore there are few examples for training.
We also think that the definition of this type may be somewhat confusing.
Indeed, the systems were able to identify these substances, but failed to classify them.
Now, I am going to describe the second task, about the extraction of DDIs.
In this task, the gold annotations for pharmacological substances are provided for the training and test datasets.
Time: 0.15 sg.
Then, systems should be able to detect which pairs of drugs are interacting.
Time: 0.15 sg.
And then to classify them.
Time: 0.15 sg.
We propose four types to classify DDIs:
Mechanism type for DDIs describing the way the interaction occurs.
Effect type for DDIs describing the consequence of the interaction.
Advice type for DDIs providing a recommendation to avoid a possible interaction.
And the int type for sentences describing an interaction without providing any additional information.
Time: 0.30 sg.
8 – BUILT ON SVM – NON-LINEAR KERNELS – BEST ON DRUGBANK – BEST ON MEDLINE
8 teams participated in the task.
Most systems were built on SVM.
Three of them used non-linear kernel methods, and the rest used linear SVMs.
In DrugBank, the two best teams were FBK-irst, from Italy, and the team from Humboldt University.
These two teams were also the two best teams in 2011.
In MedLine, the best team was the Fraunhofer SCAI team.
OVERALL TASK MORE DIFFICULT THAN ONLY DETECTION.
DEC (82) vs (67) DEC&CLA on DRUGBANK. FBK
DEC (53) vs (42) DEC&CLA on MEDLINE. FBK, SCAI.
CONCLUSION: MORE DIFFICULT FOR MEDLINE
Here you can see the results for the detection task (bars in dark blue) and for the overall task, detection and classification of DDIs (bars in light blue).
The overall task is more difficult than detection alone.
For example, in DrugBank, the best F1 was 82% for detection and only 67% for the overall task (a difference of 15 points between the two tasks).
The best team was FBK.
In MedLine, the best F1 was 53% for detection and only 42% for the overall task.
We can conclude that the extraction of DDIs is more difficult on MedLine than on DrugBank.
IMPROVEMENT – MOST DIFFICULT TYPE – MECHANISM AND EFFECT MORE EXAMPLES FOR TRAINING – ADVICE SIMILAR TEXT PATTERNS – WORSE MEDLINE – KERNELS OUTPERFORM LINEAR CLASSIFIERS
OK, let me give some highlights of the results.
First, there is an important improvement in the results of the detection task over 2011: the best F1 in 2011 was only 66% and now it is 82%, an increase of 16 points.
In DrugBank, the most difficult type is int, because it is the least frequent type and therefore there are few examples for training.
The rest of the types show similar performance, around 70% F1.
Mechanism and effect reach 70% F1, we think, because they are the most common types and therefore have more examples for training.
Advice, on the other hand, is less frequent; however, sentences describing advice usually follow very similar text patterns.
In MedLine, the results are lower, mainly due to the complexity of the sentences.
As in 2011, kernel methods outperform linear classifiers.
NUM TEAMS – TASK MORE DIFFICULT MEDLINE
OK, the main conclusions of the task are:
A total of 13 teams from 7 different countries participated in the two tasks, 5 in drug name recognition and 8 in the extraction of DDIs.
Both tasks are more difficult for MedLine texts than for DrugBank texts.
In Task 9.1, the difference in F1 between the two datasets is almost 30 points. We think that this is because DrugBank contains very few instances of the most difficult type, substances not approved for human use.
In the detection and classification of DDIs, the difference is about 25 points (67% vs. 42%). We think that this is because DrugBank sentences are shorter and simpler, and therefore less difficult to process than MedLine sentences, which are usually long and complex.
OK, finally, let me remind you that:
In the first task, the best system was based on CRF and used the training dataset extended with the test dataset of the DDI extraction task.
In general, the systems obtain good performance on DrugBank; however, the results on MedLine are considerably lower.
In the second task, there is much room for improvement, particularly on MedLine, because the best F1 was only 42% for detection and classification of DDIs.
We would like to improve the task in several ways, for example by including new types of documents such as health records or prescription drug documents.
However, we have no plans to annotate more documents, because annotation is very expensive.
For this reason, we plan to organize a third challenge in which the participating systems collaborate to create a silver corpus.
Also, in this new challenge, we would like to annotate additional features (such as the effect, the mechanism or the drug dosages), because they are very important for determining the clinical significance of a DDI.
We think that this new challenge can be similar to the CALBC challenge.
I have to say that the task was funded by the MULTIMEDICA and MAVIR projects.
We would like to thank all teams, and in particular two teams:
The Uturku team, who provided TEES analyses for the datasets.
And the team from Humboldt University of Berlin, who provided a visualization of the DDI corpus using the Stav tool.
Thank you for your attention. If you have questions or comments