Eye deep

Eye-Deep
Detecting Diabetes
with Convolutional Neural Networks
team o_O
Mathis Antony
sveitser@gmail.com
Stephan Brüggemann
https://github.com/sveitser/kaggle_diabetic
https://www.kaggle.com/c/diabetic-retinopathy-detection

Intro
● Supervised Learning
○ Training set (features + labels) and test set (only features)
○ Training
■ learn relationship between features and labels (on training set)
○ Testing
■ predict labels from test data and measure performance
● Deep learning
○ Deep → many layers
○ Concepts not “new”
■ More data (internet)
■ More computational power (GPUs)
■ advancements in the field
■ great open source software

Neurons
Complete neuron cell diagram
Mariana Ruiz Villarreal
https://en.wikipedia.org/wiki/Neuron

ReLU: max(x, 0)
Leaky ReLU: max(x/100, x)
x (sum of inputs)
y (output)
Rectified Linear Unit: ReLU
inputs
output
1. sum inputs
2. activation function
Artificial Neurons

Forward Pass on Toy Neural Network
output
input
tail age
weights 1 -1 -2 31 2
-1 2 -2weights
tail? = yes | age = 3 | grumpiness = 10
loss/error:
● (prediction - truth)2
● → (11 - 10)2
= 1
1 3
1·1 - 2·3 = -5 -1·1 + 3·3 = 8 1·1 + 2·2 = 5
1·5 + 2·8 - 2·5 = 11
● Creature grumpiness
● Features: “has tail?”, age
● Target: grumpiness
● Loss: mean squared error

Gradient Descent
1. Compute derivative of loss
function with respect to weights
2. Update weights
η: learning rate

Training
1. Initialize weights randomly
2. Until happy, repeat
a. Forward pass through network (make prediction)
b. Calculate error
c. Backward propagation of errors (backprop)
d. Update weights
● Done in mini batches
● One batch in memory at a time if necessary
● Libraries provide almost everything

Image Convolutions
-1 -1 -1
-1 8 -1
-1 -1 -1
deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
-1 0 -1
0 -1 0
-1 0 -1
filter

Max Pooling
pool size 3
1 1 1 1 7
0 2 1 1 2
1 2 4 0 1
2 4 5 5 6
2 4 1 4 2
4 74 7
5 65
stride 2
max pooling
pool size 3
stride 2

from sklearn.datasets import load_digits
d = load_digits()
X = d.images
# reshape to n_samples, n_channels, n_x, n_y
# and convert to 32-bit (to train on GPU)
X = X.reshape((-1, 1, 8, 8)).astype('f4')
# standardize
X = (X - X.mean()) / X.std()
# convert target to 32-bit int
y = d.target.astype('i4')
from lasagne import layers
from lasagne.nonlinearities import softmax
my_layers = [
(layers.InputLayer, {'shape': (None, 1, 8, 8)}),
(layers.Conv2DLayer, {'num_filters': 64,
'filter_size': (3, 3)}),
(layers.MaxPool2DLayer, {'pool_size': (3, 3),
'stride': (2, 2)}),
(layers.DenseLayer, {'num_units': 20}),
(layers.DenseLayer, {'num_units': 10,
'nonlinearity': softmax}),
]
from nolearn.lasagne import NeuralNet
net = NeuralNet(my_layers, verbose=1,
max_epochs=20,
update_learning_rate=0.02)
# train network
net.fit(X, y)
# make predictions
y_pred = net.predict(X)

$python nn_example.py
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled)
## Layer information
# name size
--- ---------- ------
0 input0 1x8x8
1 conv2d1 64x6x6
2 maxpool2d2 64x3x3
3 dense3 20
4 dense4 10
epoch train loss valid loss train/val valid acc dur
------- ------------ ------------ ----------- ----------- -----
1 2.17409 1.94451 1.11807 0.55025 0.09s
2 1.67648 1.35972 1.23297 0.64005 0.09s
3 1.03381 0.75170 1.37530 0.89149 0.10s
4 0.56765 0.41487 1.36825 0.90712 0.10s
5 0.33763 0.27013 1.24991 0.94387 0.09s
...
18 0.02589 0.07183 0.36048 0.98438 0.10s
19 0.02325 0.07053 0.32962 0.98438 0.09s
20 0.02129 0.06951 0.30625 0.98698 0.10s

Problem
● input data
○ high resolution color retinal images
○ training set: 35126 images
○ test set: 53576 images
● target
○ stage of diabetic retinopathy
■ 0 No DR
■ 1 Mild
■ 2 Moderate
■ 3 Severe
■ 4 Proliferative DR
● Highly unbalanced dataset

Metric
● Quadratic (Weighted) Cohen’s kappa (κ)
○ Agreement between rating of two parties
■ 0 agreement by chance
■ 0 - 0.2 poor
■ ...
■ 0.8 - 1.0 very good
■ 1 total agreement
● “Weighted” → Ordinal classification problem
● “Less penalty for classifying a 0 as a 1 than as a 2”
● Our “solution”:
○ Regression with mean squared error
○ thresholding at [0.5, 1.5, 2.5, 3.5]

Dataset
stage 0 stage 1 stage 2 stage 3 stage 4

What are we looking for?
Saiprasad Ravishankar, Arpit Jain, Anurag Mittal
IEEE Conf. on Computer Vision Pattern Recognition (CVPR) 2009 http://ieeexplore.ieee.
org/xpl/articleDetails.jsp?arnumber=5206763

Preprocessing
● Simple heuristics to isolate and crop foreground
● Resize to 512 pixel squares
● Standardize each channel (RGB) to have zero mean and unit variance
● That’s it!
● But, training large networks requires a lot of data.

Augmentation
● Problem: “Small” Dataset
● Artificially increase size of dataset
○ translation
○ rotation
■ can become the bottleneck for large images
○ flipping
○ shearing
○ stretching
○ color augmentation*

Layer Types
● ☑Convolutional Layer (find features)
● ☑Max Pooling Layer (find features + reduce size)
● ☑Fully Connected Layer (prediction from features)
● Dropout Layer (model averaging, against overfitting)
○ Zero half the neurons
○ Network becomes different for each mini batch
5 2 8 1 9 2 2 5
5 0 0 1 9 0 2 0
3 5
5
2 1
2
● Maxout Layer
○ Take maximum value over 2 Neurons

● Input image to many tiny “images” (feature maps) a few pixels wide.
● Extract features on the way through the network.
● Layers with stride 2 halve width and height of feature maps.
● Handy “Units”
○ 2 - 4 convolutional layers with small filters (2 x 2 to 5 x 5)
○ followed by max pooling layer with stride 2 and pool size 3
● Add ReLUs (or similar)
● 1 or 2 fully connected layers with dropout at the end
● Weight decay for convolutional layers.
● “If it doesn’t overfit, you should probably it bigger”.
● In competition:
○ Tiny features → larger input images: 64 → 128 → 256 → 512 (→ 768)
○ More and more convolution and pooling layers
Network Architecture

Network Architecture Convolution (4x4)
Pooling (3x3, stride 2)
Dropout
Maxout
Fully Connected
32
64
128
1024
256
512
1024
512
512

Training
● Deep networks (many layers) are sometimes hard to train.
● Initialization strategy is very important.
● learning rate:
a. Find largest value for loss still converges
b. When loss doesn’t decrease, decrease learning rate by factor of 5 or 10
● Use “Adam” optimizer or “Nesterov Momentum”.
● In competition
a. Dynamic resampling to deal with class imbalance.
b. Train smaller network and use learned weights to initialize bigger
networks.
c. 200 - 250 epochs
d. ~ 2 days to train one network

What does it “see”?
input (stage 1) 5x5 pixel occluded prediction overlay
Visualizing and Understanding Convolutional Networks
Matthew D Zeiler, Rob Fergus
http://arxiv.org/abs/1311.2901

What does it “see”?
input (stage 1)
Visualizing and Understanding Convolutional Networks
Matthew D Zeiler, Rob Fergus
http://arxiv.org/abs/1311.2901

Feature Extraction
● Output of any layer can be used as features
● Could use pretrained networks for feature extraction (unless kaggle)
output of last pooling layer → features
● Original score: κ 0.79 (~ rank 13 on final kaggle leaderboard)
● Features of last pooling layer:
○ Blend Network
■ features → FC 32 → maxout → FC 32 → maxout → output
○ κ ~ 0.80 (~ rank 12)
○ fully connected layers in our convolutional network not well trained

Test Time Averaging (TTA)
● From winners of kaggle plankton competition early 2015:
https://github.com/benanne/kaggle-ndsb
● Average output of last pooling layer over multiple augmentations for
each eye
● Use mean and standard deviation of each feature
● Same blend network
○ features → FC 32 → maxout → FC 32 → maxout → output
● with TTA mean κ ~ 0.81 (~ rank 11)
● with TTA mean + standard deviation κ ~ 0.815 (~ rank 10)

● use TTA features from left and right eye and blend
○ [features of this eye, features of patient’s other eye, left eye indicator]
■ left: [left eye features, right eye features, 1] → left eye label
■ right: [right eye features, left eye features, 0] → right eye label
○ mean, standard deviation, indicator: 8193 features
● Train Blend Network: κ → ~ 0.84 (~ rank 2 - 3)
“Per Patient” Blend
● both eyes for each patient in dataset
● some images look very “different”
● correlation between labels for left and
right eye is very high: ρ ~ 0.85
right eye label
for patients
with left eye
label 3

Final Result
● Ensembling
○ averaged results from 2 similar network architectures and 3 sets of
weights each: κ → ~ 0.845
HK Electric wins too

Thank you
Q&A
● edu:
○ https://mitpress.mit.edu/books/introduction-machine-learning
○ https://www.coursera.org/course/neuralnets
● code:
○ https://github.com/Lasagne/Lasagne
○ https://github.com/dnouri/nolearn
○ https://github.com/BVLC/caffe
○ https://github.com/dmlc/mxnet
○ https://github.com/tensorflow/tensorflow
○ http://keras.io/
○ https://github.com/scikit-learn/scikit-learn

Eye deep

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (12)

Semelhante a Eye deep

Semelhante a Eye deep (20)

Último

Último (20)

Eye deep