Although hospitals have accumulated a tremendous number of radiology images and reports, using them to build data-hungry deep-learning models remains challenging, partially because manually annotating a dataset at large scale is costly. Here, we propose using text mining to automatically generate a weakly-labeled dataset for detecting thoracic diseases in radiology images. This work represents the first attempt to unify text mining with computer vision for medical imaging analysis in the era of deep learning.
Text Mining Radiology Reports for Deep Learning Radiology Images
1. Yifan Peng1, Xiaosong Wang2, Le Lu2, Mohammadhadi Bagheri2, Ronald Summers2, Zhiyong Lu1
1 NCBI/NLM/NIH
2 CC/NIH
Twitter: #AMIA2017
Methods for Identification, Classification, and Association using EHR Data
S23
2. All Start with Data
• The availability of well-labeled data is key for large-scale machine learning, e.g., deep learning
• Hospitals have accumulated a large number of raw radiology images and reports
• Conventional ways of collecting image labels are NOT applicable:
• security and privacy issues
• labeling requires domain-specific medical knowledge
[Figure: large-scale natural image datasets vs. a large-scale medical image dataset]
AMIA 2017 | amia.org
4. A Sample Entry
[Image: chest X-ray]
Report:
findings: pa and lateral views of the chest demonstrate significantly improved bilateral lower lung field interstitial markings compatible with linear atelectasis. unchanged right 9th rib fracture peripherally. unchanged ossification left coracoacromial ligament. the cardiac and mediastinal contours are stable.
impression: improved bilateral lower lung field linear atelectasis.
Label: Atelectasis
6. Challenges
Negative and equivocal findings may indicate the absence of findings mentioned within the radiology report.

Findings: right internal jugular catheter remains in place. Large metastatic lung mass in the lateral left upper lobe is again noted. No infiltrate or effusion. Extensive surgical clips again noted left axilla.
Impression: no significant change.

Reason for exam (entered by ordering clinician into cris): bilateral pneumonia. No change in the tracheostomy tube or right internal jugular venous catheter. Unchanged bilateral alveolar infiltrates, fluid in the right minor fissure, lucency at the right costophrenic angle suggesting pneumothorax. Overall, no significant change.
7. Related Work
Chapman W, et al. A simple algorithm for identifying negated findings and diseases in
discharge summaries. Journal of Biomedical Informatics. 2001;34:301-310.
Harkema H, et al. ConText: an algorithm for determining negation, experiencer, and
temporal status from clinical reports. Journal of biomedical informatics. 2009;42:839-851.
Mutalik P, et al. Use of general-purpose negation detection to augment concept indexing
of medical documents: a quantitative study using the UMLS. Journal of the American
Medical Informatics Association. 2001;8:598-609.
Sohn S, Wu S, Chute C. Dependency parser-based negation detection in clinical
narratives. AMIA Summits on Translational Science Proceedings. 2012;2012:1-8.
Mehrabi S, et al. DEEPEN: A negation detection system for clinical text incorporating
dependency relation into NegEx. Journal of Biomedical Informatics. 2015;54:213-219.
8. Related Work
Ogren P, et al. Constructing evaluation corpora for automated clinical named entity
recognition. In Proceedings of the Sixth International Conference on Language
Resources and Evaluation (LREC'08). 2008;28-30.
Uzuner O, South BR, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in
clinical text. Journal of the American Medical Informatics Association. 2011;18:552-556.
Suominen H, et al. Overview of the ShARe/CLEF eHealth evaluation lab 2013. In
International Conference of the Cross-Language Evaluation Forum for European
Languages. 2013;212-231.
Albright D, et al. Towards comprehensive syntactic and semantic annotations of the
clinical narrative. Journal of the American Medical Informatics Association. 2013;20:922-
930.
etc.
9. Our Overall Method
1. MetaMap and DNorm were used to map every mention of keywords in a report to a unique concept ID in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)
• MetaMap (Aronson et al., 2010)
• DNorm (Leaman and Lu, 2014)
2. Remove negative and equivocal findings within the radiology report
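The two steps above can be sketched as a toy pipeline. Everything here is illustrative: the hypothetical MENTION_TO_CONCEPT table and NEGATION_CUES list stand in for MetaMap/DNorm and the dependency-based negation rules, and the concept IDs are placeholders, not real SNOMED-CT codes.

```python
# Toy version of the two-pass labeling pipeline (illustrative only).
# A hypothetical keyword table stands in for MetaMap/DNorm, and a few
# surface cues stand in for the dependency-based negation rules.

# Placeholder mention -> concept-ID table (IDs are NOT real SNOMED-CT codes)
MENTION_TO_CONCEPT = {
    "atelectasis": "C-ATEL",
    "effusion": "C-EFFU",
    "pneumothorax": "C-PNTX",
}

NEGATION_CUES = ("no ", "without ", "ruled out")

def label_report(report):
    """Pass 1: find concept mentions; pass 2: drop negated sentences."""
    labels = set()
    for sentence in report.lower().split("."):
        if any(cue in sentence for cue in NEGATION_CUES):
            continue  # pass 2: skip negative/equivocal findings
        for mention, concept_id in MENTION_TO_CONCEPT.items():
            if mention in sentence:
                labels.add(concept_id)
    return labels

report = "Improved bilateral linear atelectasis. No infiltrate or effusion."
print(label_report(report))  # "atelectasis" kept; negated "effusion" dropped
```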
10. Negation and Uncertainty Detection
• Utilize the universal dependency graph to define patterns:
• a directed graph
• vertices are labeled with information such as the word, part-of-speech, and word lemma
• edges represent typed dependencies from the governor to its dependent and are labeled with the dependency type
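The dependency-graph structure just described can be sketched with plain Python: vertices carry the word, part-of-speech, and lemma; edges run from governor to dependent and carry the dependency type. The parse of "no effusion is seen" below is hand-built for illustration; a real pipeline would obtain it from a dependency parser.

```python
# Minimal labeled directed graph for the hand-built parse of
# "no effusion is seen" (dependency labels are illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    word: str
    pos: str
    lemma: str

no = Vertex("no", "DET", "no")
effusion = Vertex("effusion", "NOUN", "effusion")
seen = Vertex("seen", "VERB", "see")

# Edges run governor -> dependent, labeled with the dependency type.
edges = [
    (effusion, "neg", no),          # "no" negates "effusion"
    (seen, "nsubjpass", effusion),  # "effusion" is the passive subject
]

def dependents(graph, governor, dep_type):
    """All dependents reached from a governor via a given edge label."""
    return [dep for gov, label, dep in graph if gov == governor and label == dep_type]

# Finding the negation attached to "effusion":
negations = dependents(edges, effusion, "neg")
```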
11. Sample Rules
• Defined rules on the dependency graphs by utilizing the dependency label and direction information
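Two hypothetical rules of this kind, operating on (governor, dependency-type, dependent) lemma triples; the actual NegBio rule set is richer, and these patterns are simplified illustrations.

```python
# Two simplified rules over (governor, dependency_type, dependent) lemma
# triples; the real NegBio rules are defined on full dependency graphs.

def is_negated(edges, finding):
    """Hypothetical rule set: is the finding lemma negated?"""
    for governor, dep_type, dependent in edges:
        # Rule 1: the finding governs a "neg" edge ("no effusion")
        if governor == finding and dep_type == "neg":
            return True
        # Rule 2: the finding is the object of "without"
        if governor == "without" and dep_type == "pobj" and dependent == finding:
            return True
    return False

# "no effusion is seen"
parse = [("effusion", "neg", "no"), ("see", "nsubjpass", "effusion")]
print(is_negated(parse, "effusion"))  # True: rule 1 fires
```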
12. Experiments
Three benchmarking corpora
Dataset                         Reports   Positives   Negatives
OpenI (Demner-Fushman et al.)    3,851      1,354         -
ChestX-ray (Wang et al.)           900      2,131         -
• Demner-Fushman D, Kohli M, Rosenman M, et al. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association. 2015;23:304-310.
13. Results on OpenI and ChestX-ray

                           OpenI                ChestX-ray
                           P     R     F        P     R     F
DNorm/MetaMap              13.8  85.7  23.8     72.3  95.7  82.4
DNorm/MetaMap + Negation   89.8  85.0  87.3     94.4  94.4  94.4
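For reference, the precision/recall/F-score columns follow the standard definitions computed from true-positive, false-positive, and false-negative counts (the counts below are made up for illustration, not the paper's):

```python
# Standard precision/recall/F1 from TP/FP/FN counts (counts here are
# hypothetical, not the paper's).
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf(tp=90, fp=10, fn=10)
print(f"P={p:.1%} R={r:.1%} F={f:.1%}")  # P=90.0% R=90.0% F=90.0%
```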
14. NIH Chest X-ray Dataset
One of the largest chest x-ray datasets publicly available to the scientific community:
• 112,120 frontal-view X-ray images
• 30,805 unique patients
https://nihcc.app.box.com/v/ChestXray-NIHCC
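In the public release, each image's mined findings are encoded as a pipe-separated string such as "Atelectasis|Effusion", with "No Finding" for images without any of the target diseases. Below is a small sketch of turning such strings into binary label vectors; the 8-disease list matches the ChestX-ray8 setup described in this talk.

```python
# Each image's mined findings come as a pipe-separated string; convert
# to a fixed-order binary vector over the 8 target diseases.
DISEASES = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
            "Mass", "Nodule", "Pneumonia", "Pneumothorax"]

def to_vector(finding_string):
    """'Atelectasis|Effusion' -> binary vector over DISEASES."""
    found = set(finding_string.split("|"))
    return [1 if disease in found else 0 for disease in DISEASES]

print(to_vector("Atelectasis|Effusion"))  # [1, 0, 1, 0, 0, 0, 0, 0]
print(to_vector("No Finding"))            # [0, 0, 0, 0, 0, 0, 0, 0]
```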
15. Multi-label Classification and Localization
[Figure: multi-label classification and localization framework]
17. Radiology Image Classification
• Training (70%), validation (10%), and testing (20%) splits
• The multi-label CNN architecture is implemented using the Caffe framework
• The ImageNet pre-trained models, i.e., AlexNet, GoogLeNet, VGGNet-16 and ResNet-50, are obtained from the Caffe model zoo
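The multi-label setup differs from single-label ImageNet classification mainly in the output layer and loss: each disease gets an independent sigmoid output trained with per-label binary cross-entropy. A minimal sketch in plain Python (the original work used Caffe; the logits and targets below are made up):

```python
# Sketch of the multi-label objective: one independent sigmoid output
# per disease, trained with binary cross-entropy per label. (Plain
# Python for clarity; the original work used Caffe.)
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent label outputs."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# 8 made-up disease logits for one image; 1 = label mined from the report
logits = [2.0, -1.5, 0.3, -2.2, 1.1, -0.7, -3.0, 0.0]
targets = [1, 0, 0, 0, 1, 0, 0, 0]
loss = multilabel_bce(logits, targets)  # lower is better
```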
18. Multi-label Disease Classification Results
Wang X, Peng Y, Lu L, Bagheri M, Lu Z, Summers R. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
19. Conclusion and Future Work
• We propose a new algorithm (NegBio), based on dependency graphs, to determine negative and equivocal findings in radiology reports
• We provide one of the largest chest x-ray datasets publicly available to the scientific community
• We explore the combination of text mining with radiology imaging analysis in the era of deep learning

Future work
• Explore NegBio's applicability to clinical texts beyond radiology reports
• The current results suggest that building fully-automated, high-precision CAD systems remains challenging
20. Acknowledgment
This work was supported by the Intramural Research Program of the National Institutes of Health, at the National Library of Medicine and the Clinical Center. We are also grateful to Robert Leaman for his editorial comments. We thank NVIDIA Corporation for the GPU donation.
The motivation for this project is straightforward. In general computer vision, neural networks and deep learning have been used with great success on many image processing tasks, such as image classification, object detection, and caption generation. But we rarely see deep-learning applications of computer vision in the clinical domain. The reason is probably that we lack a large-scale medical image dataset to meet the data-hungry needs of deep learning.
For natural images, we can use crowd-sourcing, but that is not applicable to X-ray images because of security and privacy issues; labeling X-rays also usually requires domain knowledge.
Although hospitals have accumulated a large number of raw radiology images and reports, generating labels for a large-scale dataset remains challenging. In this project, we provide a text-mining method to automatically generate labels from radiology reports, and we show that we can successfully train deep-learning models using this dataset.
The figure shows the overview of our approach. We have raw images and reports from Picture Archiving and Communication Systems. We mine labels from the reports and use the labeled images to train deep learning models for multi-label classification. In this talk, I will focus on the first step: how we constructed the labels.
The target of this slide is to find diseases and findings in the clinical report. Including atelectasis, we focus mainly on 8 diseases, such as mass, nodule, and effusion.
Unlike other text, clinical text contains many negative or equivocal findings. By negative findings, we refer to findings that were ruled out by the radiologist, such as "no XXX". By equivocal findings, we refer to findings the radiologist is suspicious of, such as "suggesting obstructive lung disease".
Since they may indicate the absence of findings mentioned within the radiology report, identifying them is as important as identifying positive findings; otherwise, information extraction algorithms that do not distinguish negative and equivocal findings from positive ones may return many irrelevant results. Even though many natural language processing applications developed in recent years successfully extract findings mentioned in medical reports, discriminating between positive, negative, and equivocal findings remains challenging.
We use a two-pass approach. In the first pass, we use named-entity recognition tools to detect findings in the report and normalize them to a unique ID in SNOMED-CT.
MetaMap is a knowledge-intensive, rule-based approach that maps biomedical text to the UMLS Metathesaurus. DNorm is a machine-learning method, developed by our group, for disease recognition and normalization. In the second pass, we remove negative and equivocal findings from the reports.
The motivation for using dependency graphs is that fewer rules can capture more text variants.
The slide shows several rules that are frequently matched in the text.
To test the performance of NegBio, we used benchmark corpora; OpenI is one of the largest corpora in which positive findings are annotated.
With negation and uncertainty detection, we can detect and remove more negative cases; as a result, the precision of positive-finding detection increases.
The NIH Clinical Center recently released over 100,000 anonymized chest x-ray images and their corresponding data to the scientific community. We hope the release will allow researchers across the country and around the world to freely access the dataset and improve their ability to teach computers to detect and diagnose disease.
We used the labeled images to train deep learning models for multi-label classification, and we hope these results can serve as a baseline.