This document provides an overview of deep learning and its applications in medical image analysis. It begins with an introduction to the speaker and their background in biomedical image analysis. It then discusses machine learning and how deep learning uses neural networks with many layers to automatically determine useful features from data. Convolutional neural networks are described as being well-suited for image analysis. Several examples of deep learning applications in medical images are given, including brain MRI segmentation, detection of prostate cancer in ultrasound images, and the speaker's own work on neonatal brain injury assessment from MRI scans. Resources for getting started with deep learning are also listed.
1. A World-Leading Science Foundation Ireland Research Centre
Deep Learning for Medical
Image Analysis
Keelin Murphy
July 3rd, 2017
2. About Me
BA Mathematics TCD
Software Industry 4-5 years
MSc + PhD in Biomedical Image Analysis
UCC (INFANT Research Centre)
Utrecht Medical Center, the Netherlands
7. How does it learn?
Machine Learning
Everything Else
(“Conventional” Machine Learning)
Neural Networks
(AKA Deep Learning)
Support Vector Machines
Random Forests
Gradient Boosting
Linear Classifiers
Nearest Neighbour Classifiers
……..
“Hand-crafted” Features
13. Deep Learning
(Artificial) Neural networks with lots of hidden layers (deep)
The network determines what features are useful
Lost favour until around 2006-2012, when it was revived by:
- Large amounts of data online
- GPU and distributed processing
Source: Alexander Del Toro Barba
https://www.linkedin.com/pulse/how-artificial-intelligence-revolutionizing-finance-del-toro-barba
14. Neural Networks: Auto-features!
[Figure: a network with an Input Layer, Hidden Layers and an Output Layer; each neuron (perceptron) is a matrix of weights. For an input image the output neurons give class scores, e.g. Dog 0.1, Cat 0.2, Penguin 0.7 (or 0.3, 0.4, 0.3 earlier in training). During TRAINING these are compared with the TRUTH (Dog 0.0, Cat 0.0, Penguin 1.0); the per-class ERROR is back-propagated and the weights are updated to minimize the errors.]
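As a rough sketch (not code from the talk), the forward pass and error computation described on this slide can be written in a few lines of NumPy, with made-up class scores:

```python
import numpy as np

# Raw network outputs (logits) for one image; classes: Dog, Cat, Penguin
logits = np.array([0.5, 1.2, 2.4])

# Softmax turns the scores into probabilities that sum to 1
probs = np.exp(logits) / np.exp(logits).sum()

# One-hot truth: the image is actually a Penguin
truth = np.array([0.0, 0.0, 1.0])

# Per-class error, as on the slide; back-propagation adjusts the
# weights in the direction that shrinks this error
error = probs - truth
```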
18. Deep Neural Networks
Simplest Neural Network Example:
[Figure: an input vector x = (x1, x2, x3, x4) feeds a hidden layer of 2 neurons, n1 and n2. Neuron n1 has weights w11..w14 and bias b1 and computes f(W1x + b1); neuron n2 has weights w21..w24 and bias b2 and computes f(W2x + b2), where f is the activation function (non-linearity). A Softmax Function in the Output Layer turns the two hidden outputs into probabilities for Dog and Cat.]
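A minimal NumPy sketch of this forward pass, with made-up weights and inputs (ReLU is assumed here as the activation f; the slide does not specify one):

```python
import numpy as np

def relu(z):                                # the activation f (non-linearity)
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0, 0.1])         # inputs x1..x4 (invented values)
W = np.array([[0.2, -0.4, 0.1, 0.3],        # w11..w14 (neuron n1)
              [0.5,  0.2, -0.1, 0.0]])      # w21..w24 (neuron n2)
b = np.array([0.1, -0.2])                   # biases b1, b2

hidden = relu(W @ x + b)                    # f(W1x + b1) and f(W2x + b2)
probs = np.exp(hidden) / np.exp(hidden).sum()   # softmax over Dog, Cat
```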
19. Deep Neural Networks
Simplest Neural Network Example (training):
[Figure: the same two-neuron network during TRAINING. The softmax outputs for Dog and Cat are compared with the TRUTH (Dog 1.0, Cat 0.0); the resulting ERRORs are back-propagated and the weights updated to minimize them.]
20. Deep Neural Networks
Network error is measured by a loss function, L (also called a cost function).
Back-propagation with Gradient Descent: choose weight changes which move us "downwards" in the loss function, L.
[Figure: the loss L plotted as a surface over two weights, w11 and w12, with the error descending towards a minimum.]
21. Deep Neural Networks
Gradient Descent is the basis for many more sophisticated optimization methods.
Optimizer = method of updating the weights based on the loss.
Examples: Adam, Adagrad, Adamax, RMSProp, etc.
See also http://sebastianruder.com/optimizing-gradient-descent/index.html
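A toy illustration of plain gradient descent (the one-weight loss and all values are invented for illustration, not from the talk):

```python
# Toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1            # initial weight and learning rate
for _ in range(100):
    w -= lr * grad(w)       # one plain gradient-descent step
# w has now descended close to the minimum at w = 3
```

Optimizers like Adam follow the same idea but adapt the step size per weight using running statistics of past gradients.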
23. Deep Neural Networks
What about image analysis?
For a (small) 256 x 256 RGB image, a fully connected model has problems:
•256 x 256 x 3 weights PER neuron in the first hidden layer!
•Flattening the input to a vector loses spatial information
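The arithmetic behind the weight count, as a quick check:

```python
# Weights needed by ONE fully-connected neuron in the first hidden
# layer, for a 256 x 256 image with 3 colour channels
weights_per_neuron = 256 * 256 * 3
print(weights_per_neuron)   # 196608
```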
24. Convolutional Neural Networks
[Figure: an input layer (image) feeding a first hidden layer with 5 channels (features). By Aphex34 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45659236]
CNN model :
•Neurons arranged in blocks
•Each neuron connects to a small region of the input (receptive field)
•Neurons in the same channel share the same weights
•Weight-sharing -> detection of similar features across the image
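An illustrative NumPy implementation of the weight-sharing idea: one small kernel slid across the whole image, so every output pixel is produced by the same few weights (a sketch, not any framework's API):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel over the image (valid padding):
    every output pixel is computed with the SAME weights."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5x5 "image"
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]], dtype=float)         # edge-detecting filter
fmap = conv2d(image, kernel)                        # 3x3 feature map
```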
26. Convolutional Neural Networks
Adapted from : http://benanne.github.io/images/architecture.png
MaxPool: reduces dimensionality, helps prevent overfitting
A "dropout" layer could also be added to help with overfitting
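A minimal NumPy sketch of 2x2 max pooling, keeping only the largest value in each block:

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling with stride 2: keep only the largest value in
    each 2x2 block, quartering the number of activations."""
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]], dtype=float)
pooled = maxpool2x2(x)    # [[4, 8], [9, 7]]
```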
42. Medical Imaging Applications: Malignancy determination in prostate ultrasound
Azizi et al, "Detection of prostate cancer using temporal sequences of ultrasound data: a large clinical feasibility study", 2016
43. Medical Imaging Applications: Detection of tuberculosis in chest X-ray
Kim et al, "Deconvolutional Feature Stacking for Weakly-Supervised Semantic Segmentation", 2016
44. Medical Imaging Applications: Detecting patterns of interstitial lung disease (CT)
Gao et al, "Multi-label Deep Regression and Unordered Pooling for Holistic Interstitial Lung Disease Detection", 2016
45. Medical Imaging Applications: Segmentation of white-matter hyperintensities (MRI)
Ghafoorian et al, "Location Sensitive Deep Convolutional Neural Networks for Segmentation of White Matter Hyperintensities", 2016
46. INFANT Perinatal Research
Pregnancy – e.g. diagnostic testing, improved monitoring
Neonates – e.g. newborn health monitoring, nutrition, brain injury
www.infantcentre.i
47. Hypoxic Ischemic Encephalopathy
Oxygen Deprivation during Birth
Cause of brain injury
2-5 cases per 1000 live births
Wide range of severities and outcomes
Which part of the brain is injured, and how severely?
52. The Neural Network
(25 Subjects – per pixel Classification – round-robin 5 subjects training)
Fully Convolutional Network with dilated convolutions
Trained on image patches with Data Augmentation
3 x Hidden Convolutional Layers (32 features, 64 features, 96 features)
Loss function : Binary Cross-entropy
Optimizer : Adam
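The network uses dilated convolutions; as a hedged NumPy sketch of the idea (not the network's actual implementation), dilation inserts gaps between the kernel taps, enlarging the receptive field without adding weights:

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation):
    """3x3 convolution with gaps ('dilation') between the kernel taps.
    A 3x3 kernel with dilation d covers (2d + 1) pixels per side while
    still using only 9 weights."""
    kh, kw = kernel.shape
    span_h, span_w = (kh - 1) * dilation, (kw - 1) * dilation
    H, W = image.shape
    out = np.zeros((H - span_h, W - span_w))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i+span_h+1:dilation, j:j+span_w+1:dilation]
            out[i, j] = (patch * kernel).sum()
    return out

image = np.ones((9, 9))
kernel = np.ones((3, 3))
fmap = dilated_conv2d(image, kernel, dilation=2)  # each output sums 9 taps
```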
53. Brain Tissue Segmentation
Segment 8 tissue types
in anatomical scans
NeoBrainS12 Public Challenge
2 x subjects fully labelled (training)
5 x subjects no labels (test)
56. The Neural Network
(2 training Subjects – per pixel classification)
Fully Convolutional Network with dilated convolutions
Trained on image patches with Data Augmentation
Deep residual network with 11 convolutional layers stacked with batch-normalization and 'ReLU' activation layers.
Loss function : Categorical Cross-entropy
Optimizer : Adam
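A much-simplified sketch of the residual idea used here (a dense layer stands in for the convolutions; this is not the actual 11-layer network):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W, b):
    """One residual unit: the layer learns a correction F(x) that is
    ADDED to its input via a skip connection, which eases gradient
    flow through deep stacks of layers."""
    Fx = relu(x @ W + b)      # the learned transformation F(x)
    return x + Fx             # skip connection: output = x + F(x)

x = np.array([1.0, -2.0, 0.5])
W = np.zeros((3, 3))          # with all-zero weights F(x) = 0 ...
b = np.zeros(3)
y = residual_block(x, W, b)   # ... so the block is simply the identity
```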
57. Getting Started with Deep Learning
Preferably with GPU
e.g. NVIDIA GTX
see also http://timdettmers.com/2017/04/09/which-gpu-for-deep-learning/
Coding Frameworks:
Caffe (Berkeley AI)
Theano (University of Montreal)
Tensorflow (Google)
PyTorch (or Torch)
Higher Level:
Lasagne (layered on Theano)
Keras (layered on Tensorflow/Theano)
Also check out:
Deep learning for JVM (Java, Scala, Hadoop, Spark) https://deeplearning4j.org/
Packages in e.g. Matlab & R (extensive list: http://deeplearning.net/software_links/)
Sigmoids saturate gradients during back-propagation
ReLU is fast to compute and works well in many cases; use caution with the learning rate, as incorrect settings can lead to dead neurons which always output 0.
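A quick numerical check of these two notes (pure Python, with an invented input value):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):          # derivative s(z) * (1 - s(z)); at most 0.25
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):             # derivative: 1 for z > 0, else 0
    return 1.0 if z > 0 else 0.0

# For a large input the sigmoid gradient has 'saturated' (vanished) ...
print(sigmoid_grad(10.0))     # ~4.5e-05
# ... while the ReLU gradient is still 1, so errors propagate undiminished
print(relu_grad(10.0))        # 1.0
```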
We want to convert the output values from the hidden layer into categorical probabilities.
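The softmax function does exactly this; a minimal NumPy version:

```python
import numpy as np

def softmax(z):
    """Convert raw outputs into categorical probabilities.
    Subtracting max(z) first keeps exp() numerically stable."""
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# p sums to 1 and the largest input gets the largest probability
```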
L shown as a function of weights
Regular Neural Nets don’t scale well to full images. In CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully-connected neuron in a first hidden layer of a regular Neural Network would have 32*32*3 = 3072 weights. This amount still seems manageable, but clearly this fully-connected structure does not scale to larger images. For example, an image of more respectable size, e.g. 200x200x3, would lead to neurons that have 200*200*3 = 120,000 weights. Moreover, we would almost certainly want to have several such neurons, so the parameters would add up quickly! Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.
In a CNN, the neurons in a layer are only connected to a small region of the layer before it, instead of to all of the neurons in a fully-connected manner.
Central pixel replaced with a weighted sum of itself and surrounding pixels
Dropout – disable some neurons on occasion, so that the others learn to function better independently - helps overfitting
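A minimal "inverted dropout" sketch in NumPy (the 1/(1-rate) scaling is a common convention, assumed here rather than taken from the talk):

```python
import numpy as np

def dropout(x, rate, rng):
    """Disable each neuron with probability `rate` during training;
    surviving activations are scaled by 1/(1-rate) so the expected
    activation is unchanged ('inverted dropout')."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(10)
y = dropout(x, rate=0.5, rng=rng)   # roughly half the entries become 0
```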
Medical imaging modalities to discuss (pros/cons of each):
Anisotropy
Grayscale
Lack of annotations
Imaging protocols which ten years ago might have generated 50 images may now produce thousands of images, all requiring expert examination.
The human reader: error-prone (tired, distracted, bad day, inexperienced); subjective (lots of disagreement!); qualitative ("quite big", "fairly severe", 2D line measures).
The algorithm: tireless (can work 24/7); objective (repeatable results, 100% agreement); quantitative ("10% bigger than average").
White-matter hyperintensities are a common finding on brain MR images of patients diagnosed with small vessel disease (SVD) [1], multiple sclerosis [2], Parkinsonism [3], stroke [4], Alzheimer's disease [5] and dementia [6]. They are associated with various measures of decline and are not well understood.
These frameworks can use primitive functions that NVIDIA has developed specifically for deep learning on GPUs, called cuDNN.
Torch is based on the Lua language.