Although hospitals have accumulated a tremendous number of radiology images and reports, using them to build data-hungry deep-learning models remains challenging, partially because manually annotating a dataset at large scale is costly. Here, we propose using text mining to automatically generate a weakly-labeled dataset for detecting thoracic diseases in radiology images. This work represents the first attempt to unify text mining with computer vision for medical imaging analysis in the era of deep learning.
Text Mining Radiology Reports for Deep Learning Radiology Images
1. Yifan Peng1, Xiaosong Wang2, Le Lu2, Mohammadhadi Bagheri2, Ronald Summers2, Zhiyong Lu1
1 NCBI/NLM/NIH
2 CC/NIH
Twitter: #AMIA2017
Methods for Identification, Classification, and Association using EHR Data
S23
2. All Start with Data
• The availability of well-labeled data is key for large-scale machine learning, e.g., deep learning
• Hospitals have accumulated a large number of raw radiology images and reports
• Conventional ways of collecting image labels are NOT applicable:
• security and privacy issues
• labeling requires domain-specific medical knowledge
[Figure: large-scale natural image datasets vs. a large-scale medical image dataset]
AMIA 2017 | amia.org
4. A Sample Entry
[Image: chest X-ray]
Report:
findings: pa and lateral views of the chest demonstrate significantly improved bilateral lower lung field interstitial markings compatible with linear atelectasis. unchanged right 9th rib fracture peripherally. unchanged ossification left coracoacromial ligament. the cardiac and mediastinal contours are stable.
impression: improved bilateral lower lung field linear atelectasis.
Label: Atelectasis
6. Challenges
Negative and equivocal findings may indicate the absence of findings mentioned within the radiology report.

Findings: right internal jugular catheter remains in place. Large metastatic lung mass in the lateral left upper lobe is again noted. No infiltrate or effusion. Extensive surgical clips again noted left axilla.
Impression: no significant change.

Reason for exam (entered by ordering clinician into cris): bilateral pneumonia. No change in the tracheostomy tube or right internal jugular venous catheter. Unchanged bilateral alveolar infiltrates, fluid in the right minor fissure, lucency at the right costophrenic angle suggesting pneumothorax. Overall, no significant change.
7. Related Work
Chapman W, et al. A simple algorithm for identifying negated findings and diseases in
discharge summaries. Journal of Biomedical Informatics. 2001;34:301-310.
Harkema H, et al. ConText: an algorithm for determining negation, experiencer, and
temporal status from clinical reports. Journal of biomedical informatics. 2009;42:839-851.
Mutalik P, et al. Use of general-purpose negation detection to augment concept indexing
of medical documents: a quantitative study using the UMLS. Journal of the American
Medical Informatics Association. 2001;8:598-609.
Sohn S, Wu S, Chute C. Dependency parser-based negation detection in clinical
narratives. AMIA Summits on Translational Science Proceedings. 2012;2012:1-8.
Mehrabi S, et al. DEEPEN: A negation detection system for clinical text incorporating
dependency relation into NegEx. Journal of Biomedical Informatics. 2015;54:213-219.
8. Related Work
Ogren P, et al. Constructing evaluation corpora for automated clinical named entity
recognition. In Proceedings of the Sixth International Conference on Language
Resources and Evaluation (LREC'08). 2008;28-30.
Uzuner O, South BR, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in
clinical text. Journal of the American Medical Informatics Association. 2011;18:552-556.
Suominen H, et al. Overview of the ShARe/CLEF eHealth evaluation lab 2013. In
International Conference of the Cross-Language Evaluation Forum for European
Languages. 2013;212-231.
Albright D, et al. Towards comprehensive syntactic and semantic annotations of the
clinical narrative. Journal of the American Medical Informatics Association. 2013;20:922-
930.
etc.
9. Our Overall Method
1. MetaMap and DNorm were used to map every mention of keywords in a report to a unique concept ID in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)
• MetaMap (Aronson et al., 2010)
• DNorm (Leaman and Lu, 2014)
2. Remove negative and equivocal findings within the radiology report
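The two steps above can be sketched as a toy pipeline. Everything here is illustrative: the hypothetical MENTION_TO_CONCEPT table and NEGATION_CUES list stand in for MetaMap/DNorm and the dependency-based negation rules, and the concept IDs are placeholders, not real SNOMED-CT codes.

```python
# Toy version of the two-pass labeling pipeline (illustrative only).
# A hypothetical keyword table stands in for MetaMap/DNorm, and a few
# surface cues stand in for the dependency-based negation rules.

# Placeholder mention -> concept-ID table (IDs are NOT real SNOMED-CT codes)
MENTION_TO_CONCEPT = {
    "atelectasis": "C-ATEL",
    "effusion": "C-EFFU",
    "pneumothorax": "C-PNTX",
}

NEGATION_CUES = ("no ", "without ", "ruled out")

def label_report(report):
    """Pass 1: find concept mentions; pass 2: drop negated sentences."""
    labels = set()
    for sentence in report.lower().split("."):
        if any(cue in sentence for cue in NEGATION_CUES):
            continue  # pass 2: skip negative/equivocal findings
        for mention, concept_id in MENTION_TO_CONCEPT.items():
            if mention in sentence:
                labels.add(concept_id)
    return labels

report = "Improved bilateral linear atelectasis. No infiltrate or effusion."
print(label_report(report))  # "atelectasis" kept; negated "effusion" dropped
```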
10. Negation and Uncertainty Detection
• Utilize the universal dependency graph to define patterns:
• a directed graph
• vertices are labeled with information such as the word, part-of-speech, and word lemma
• edges represent typed dependencies from the governor to its dependent and are labeled with the dependency type
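The dependency-graph structure just described can be sketched with plain Python: vertices carry the word, part-of-speech, and lemma; edges run from governor to dependent and carry the dependency type. The parse of "no effusion is seen" below is hand-built for illustration; a real pipeline would obtain it from a dependency parser.

```python
# Minimal labeled directed graph for the hand-built parse of
# "no effusion is seen" (dependency labels are illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    word: str
    pos: str
    lemma: str

no = Vertex("no", "DET", "no")
effusion = Vertex("effusion", "NOUN", "effusion")
seen = Vertex("seen", "VERB", "see")

# Edges run governor -> dependent, labeled with the dependency type.
edges = [
    (effusion, "neg", no),          # "no" negates "effusion"
    (seen, "nsubjpass", effusion),  # "effusion" is the passive subject
]

def dependents(graph, governor, dep_type):
    """All dependents reached from a governor via a given edge label."""
    return [dep for gov, label, dep in graph if gov == governor and label == dep_type]

# Finding the negation attached to "effusion":
negations = dependents(edges, effusion, "neg")
```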
11. Sample Rules
• Defined rules on the dependency graphs by utilizing the dependency label and direction information
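Two hypothetical rules of this kind, operating on (governor, dependency-type, dependent) lemma triples; the actual NegBio rule set is richer, and these patterns are simplified illustrations.

```python
# Two simplified rules over (governor, dependency_type, dependent) lemma
# triples; the real NegBio rules are defined on full dependency graphs.

def is_negated(edges, finding):
    """Hypothetical rule set: is the finding lemma negated?"""
    for governor, dep_type, dependent in edges:
        # Rule 1: the finding governs a "neg" edge ("no effusion")
        if governor == finding and dep_type == "neg":
            return True
        # Rule 2: the finding is the object of "without"
        if governor == "without" and dep_type == "pobj" and dependent == finding:
            return True
    return False

# "no effusion is seen"
parse = [("effusion", "neg", "no"), ("see", "nsubjpass", "effusion")]
print(is_negated(parse, "effusion"))  # True: rule 1 fires
```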
12. Experiments
Three benchmarking corpora
Dataset                         Reports   Positives   Negatives
OpenI (Demner-Fushman et al.)    3,851      1,354         -
ChestX-ray (Wang et al.)           900      2,131         -
• Demner-Fushman D, Kohli M, Rosenman M, et al. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association. 2015;23:304-310.
13. Results on OpenI and ChestX-ray

                           OpenI                ChestX-ray
                           P     R     F        P     R     F
DNorm/MetaMap              13.8  85.7  23.8     72.3  95.7  82.4
DNorm/MetaMap + Negation   89.8  85.0  87.3     94.4  94.4  94.4
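For reference, the precision/recall/F-score columns follow the standard definitions computed from true-positive, false-positive, and false-negative counts (the counts below are made up for illustration, not the paper's):

```python
# Standard precision/recall/F1 from TP/FP/FN counts (counts here are
# hypothetical, not the paper's).
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf(tp=90, fp=10, fn=10)
print(f"P={p:.1%} R={r:.1%} F={f:.1%}")  # P=90.0% R=90.0% F=90.0%
```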
14. NIH Chest X-ray Dataset
One of the largest chest x-ray datasets publicly available to the scientific community:
• 112,120 frontal-view X-ray images
• 30,805 unique patients
https://nihcc.app.box.com/v/ChestXray-NIHCC
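In the public release, each image's mined findings are encoded as a pipe-separated string such as "Atelectasis|Effusion", with "No Finding" for images without any of the target diseases. Below is a small sketch of turning such strings into binary label vectors; the 8-disease list matches the ChestX-ray8 setup described in this talk.

```python
# Each image's mined findings come as a pipe-separated string; convert
# to a fixed-order binary vector over the 8 target diseases.
DISEASES = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
            "Mass", "Nodule", "Pneumonia", "Pneumothorax"]

def to_vector(finding_string):
    """'Atelectasis|Effusion' -> binary vector over DISEASES."""
    found = set(finding_string.split("|"))
    return [1 if disease in found else 0 for disease in DISEASES]

print(to_vector("Atelectasis|Effusion"))  # [1, 0, 1, 0, 0, 0, 0, 0]
print(to_vector("No Finding"))            # [0, 0, 0, 0, 0, 0, 0, 0]
```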
15. Multi-label Classification and Localization
[Figure: multi-label classification and localization framework]
17. Radiology Image Classification
• Training (70%), validation (10%), and testing (20%) splits
• The multi-label CNN architecture is implemented using the Caffe framework
• The ImageNet pre-trained models, i.e., AlexNet, GoogLeNet, VGGNet-16 and ResNet-50, are obtained from the Caffe model zoo
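The multi-label setup differs from single-label ImageNet classification mainly in the output layer and loss: each disease gets an independent sigmoid output trained with per-label binary cross-entropy. A minimal sketch in plain Python (the original work used Caffe; the logits and targets below are made up):

```python
# Sketch of the multi-label objective: one independent sigmoid output
# per disease, trained with binary cross-entropy per label. (Plain
# Python for clarity; the original work used Caffe.)
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent label outputs."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# 8 made-up disease logits for one image; 1 = label mined from the report
logits = [2.0, -1.5, 0.3, -2.2, 1.1, -0.7, -3.0, 0.0]
targets = [1, 0, 0, 0, 1, 0, 0, 0]
loss = multilabel_bce(logits, targets)  # lower is better
```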
18. Multi-label Disease Classification Results
Wang X, Peng Y, Lu L, Bagheri M, Lu Z, Summers R. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
19. Conclusion and Future Work
• We propose a new algorithm (NegBio), based on dependency graphs, to determine negative and equivocal findings in radiology reports
• We provide one of the largest chest x-ray datasets publicly available to the scientific community
• We explore the combination of text mining with radiology imaging analysis in the era of deep learning

Future work
• Explore NegBio's applicability to clinical texts beyond radiology reports
• The current results suggest that building fully-automated, high-precision CAD systems remains challenging
20. Acknowledgment
This work was supported by the Intramural Research Program of the National Institutes of Health, at the National Library of Medicine and the Clinical Center. We are also grateful to Robert Leaman for his editorial comments. We thank NVIDIA Corporation for the GPU donation.
The motivation for this project is straightforward. In general computer vision, neural networks and deep learning have been used with great success on many image processing tasks, such as image classification, object detection, and caption generation. But we rarely see deep-learning applications of computer vision in the clinical domain. The reason is probably that we lack a large-scale medical image dataset to meet the data-hungry needs of deep learning.
For natural images, we can use crowd-sourcing, but that is not applicable to X-ray images because of security and privacy issues; labeling X-rays also usually requires domain knowledge.
Although hospitals have accumulated a large number of raw radiology images and reports, generating labels for a large-scale dataset remains challenging. In this project, we provide a text-mining method to automatically generate labels from radiology reports, and we show that we can successfully train deep-learning models using this dataset.
The figure shows the overview of our approach. We have raw images and reports from Picture Archiving and Communication Systems. We mine labels from the reports and use the labeled images to train deep learning models for multi-label classification. In this talk, I will focus on the first step: how we constructed the labels.
The target of this slide is to find diseases and findings in the clinical report. Including atelectasis, we focus mainly on 8 diseases, such as mass, nodule, and effusion.
Unlike other text, clinical text contains many negative or equivocal findings. By negative findings, we refer to findings that were ruled out by the radiologist, such as "no XXX". By equivocal findings, we refer to findings the radiologist is suspicious of, such as "suggesting obstructive lung disease".
Since they may indicate the absence of findings mentioned within the radiology report, identifying them is as important as identifying positive findings; otherwise, information extraction algorithms that do not distinguish negative and equivocal findings from positive ones may return many irrelevant results. Even though many natural language processing applications developed in recent years successfully extract findings mentioned in medical reports, discriminating between positive, negative, and equivocal findings remains challenging.
We use a two-pass approach. In the first pass, we use named-entity recognition tools to detect findings in the report and normalize them to a unique ID in SNOMED-CT.
MetaMap is a knowledge-intensive, rule-based approach that maps biomedical text to the UMLS Metathesaurus. DNorm is a machine-learning method, developed by our group, for disease recognition and normalization. In the second pass, we remove negative and equivocal findings from the reports.
The motivation for using dependency graphs is that fewer rules can capture more text variants.
The slide shows several rules that are frequently matched in the text.
To test the performance of NegBio, we used benchmark corpora; OpenI is one of the largest corpora in which positive findings are annotated.
With negation and uncertainty detection, we can detect and remove more negative cases; as a result, the precision of positive-finding detection increases.
The NIH Clinical Center recently released over 100,000 anonymized chest x-ray images and their corresponding data to the scientific community. We hope the release will allow researchers across the country and around the world to freely access the dataset and improve their ability to teach computers to detect and diagnose disease.
We used the labeled images to train deep learning models for multi-label classification, and we hope these results can serve as a baseline.