2. Hypothesis
Main: Features of both the lesion and the contralateral
breast can help predict why cancer stops in some patients
and spreads in others.
Sub: A merging of features can create an even
better predictor than the original features themselves.
5. Feature Extraction Process
1. Cut ROI
2. Identify Center
3. Segmentation
4. Overlay
5. Calculate Features
3 Segmentation Types:
-Region Growing
~Uses Mass Classify
-Snake & RGI
~Use a specific Matlab function (runSnakeByList)
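As a rough illustration of the region-growing idea only, the sketch below grows a region outward from a seed pixel, adding 4-connected neighbors whose intensity is close to the seed's. It is written in Python rather than Matlab, and the function name `region_grow`, the `tolerance` rule, and the toy ROI values are assumptions made for this example; it is not the Mass Classify program.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose
    intensity is within `tolerance` of the seed pixel's intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbors (up, down, left, right).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Hypothetical 5x5 ROI: a bright blob near the center plus one
# isolated bright pixel in the corner that should not be included.
roi = [
    [10, 10, 10, 10, 10],
    [10, 80, 85, 10, 10],
    [10, 82, 90, 88, 10],
    [10, 10, 86, 10, 10],
    [10, 10, 10, 10, 80],
]
lesion = region_grow(roi, seed=(2, 2), tolerance=15)
print(sorted(lesion))
```

The seed plays the role of the identified center from step 2, and the returned pixel set is the kind of mask that would be overlaid on the ROI before features are calculated.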
8. Applied Analysis Programs/Methods
● Two-feature scatterplots
○ For Malignant (106) and Benign (76)
○ For Invasive, Non-invasive, and Benign
● ROC Analysis
● Correlation Tests
● Linear Discriminant Analysis (LDA)
10. Malignant vs. Benign
Feature Plots
● Wrote a script that loops through all of the
features and plots every pair against one
another.
○ Wrote a second script that takes two specific
features as input and automatically creates a plot with the
two feature names as the title.
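The pairing logic described above can be sketched in a few lines. This is a Python illustration, not the original Matlab script; the helper name `feature_pairs` and the title format are assumptions, and the actual plotting call is left out so the example stays self-contained.

```python
from itertools import combinations

def feature_pairs(feature_names):
    """Yield (name_a, name_b, title) for every unordered pair of features."""
    for name_a, name_b in combinations(feature_names, 2):
        yield name_a, name_b, f"{name_a} vs. {name_b}"

# Three feature names taken from the result tables in this deck.
names = ["FWHM ROI", "Diameter", "Margin Sharpness"]
pairs = list(feature_pairs(names))
for name_a, name_b, title in pairs:
    # A plotting routine would draw name_a against name_b here,
    # using `title` as the plot title.
    print(title)
```

With n features this produces n(n-1)/2 plots, so each pair is drawn once rather than twice.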
13. ROC Analysis
Purpose: To determine how well each individual feature can predict the truth
(Malignant vs. Benign, or Invasive vs. Non-Invasive).
Sensitivity (TPF) = The fraction of patients with
the disease correctly identified as “positive.”
Specificity (1 - FPF) = The fraction of patients without
the disease correctly identified as “negative.”
AUC = The area under the ROC curve: the probability that the
classifier scores a randomly chosen positive case higher than a
randomly chosen negative case.
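The three definitions above can be made concrete with a small Python sketch. The function names, the toy scores, and the 0.5 threshold are assumptions for illustration; the AUC is computed directly from its pairwise-ranking definition rather than by integrating a plotted curve.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a case positive when its score is >= threshold.
    labels: 1 = disease present, 0 = disease absent."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive case
    has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical feature values for three positive and three negative cases.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = auc(scores, labels)
print(sens, spec, area)
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity traces out the ROC curve itself.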
15. ROC Analysis
● Wrote code that automatically plots the ROC curve of
a feature, either iteratively over all features or for one selected feature.
○ The code prints the AUC value on the graph and
saves it in a separate matrix.
26. Correlation Test
Purpose:
To see which features are similar to each other in order to create the
most effective feature merging.
R²: The coefficient of determination, a statistical measure of how close the
data are to the fitted regression line. The higher the number, the better the
linear fit (the stronger the correlation).
P: The probability of getting a correlation at least this extreme if the two
features were actually uncorrelated. The lower the number, the stronger the
evidence that the correlation is real.
28. Correlation Test
● Used the Matlab function “corr”, which returns both the
correlation coefficients, R, and the P values between two
features.
○ Created a 77 x 77 matrix of the R² and P values
for each feature combination, for both benign and
malignant cases.
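To show what goes into each cell of such a matrix, here is a minimal Python sketch that computes the Pearson correlation coefficient R for every feature pair and stores R². It stands in for the Matlab “corr” call only for the R part; the P values are not computed here, and the function names and the three toy feature columns are assumptions for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared_matrix(features):
    """features: one list of values per feature, all over the same cases.
    Returns a k x k matrix of R^2 values."""
    k = len(features)
    return [[pearson_r(features[i], features[j]) ** 2 for j in range(k)]
            for i in range(k)]

# Hypothetical values of three features over five cases.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],   # nearly a linear function of the first
    [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related to the first
]
r2 = r_squared_matrix(columns)
for row in r2:
    print([round(v, 3) for v in row])
```

Running this once on the malignant cases and once on the benign cases would give the two matrices described above; pairs with R² near 1 are candidates for keeping only one of the two features.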
29. Correlation Test Results (R²)
* The other 7 feature pair linear regression coefficients (R²) were <= .226 *
Feature Pair   Malignant Coeff.   Benign Coeff.
4 & 9          .9760              .8402
8 & 3          .6515              .6683
8 & 15         .5356              .4825
30. Correlation Test Results (P)
* The other 6 feature pair P values were >= .022 *
Feature Pair   Malignant P-value   Benign P-value
4 & 9          4.419 x 10^-86      3.461 x 10^-31
8 & 3          1.510 x 10^-25      2.076 x 10^-19
8 & 15         5.062 x 10^-19      3.411 x 10^-12
15 & 3         2.711 x 10^-7       2.460 x 10^-5
33. Linear Discriminant Analysis
Purpose:
● To create a more robust merged feature model that is not dependent on
the specific cases given.
Procedure:
● A “round-robin” (leave-one-out) process: each time, one case is left out and the best
features are selected from the remaining cases.
○ The selected features are then merged and applied to the one case
that was left out, and a score is given for how well those combined
features predicted the correct outcome.
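The round-robin procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the analysis actually run: it uses two fixed features per case instead of re-selecting the best features on every round, and the function names and the eight toy cases are made up for the example. Each left-out case is scored with a two-class linear discriminant fitted on all the other cases, and the AUC of those scores measures the merged feature.

```python
def class_stats(rows):
    """Mean vector and 2x2 scatter (sxx, sxy, syy) of two-feature rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows)
    syy = sum((r[1] - my) ** 2 for r in rows)
    return (mx, my), (sxx, sxy, syy)

def fit_lda(rows, labels):
    """Fisher discriminant: weights = inverse(pooled covariance) * (m1 - m0)."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    (m1x, m1y), (a1, b1, c1) = class_stats(pos)
    (m0x, m0y), (a0, b0, c0) = class_stats(neg)
    dof = len(rows) - 2
    a, b, c = (a1 + a0) / dof, (b1 + b0) / dof, (c1 + c0) / dof
    det = a * c - b * b
    dx, dy = m1x - m0x, m1y - m0y
    weights = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    midpoint = ((m1x + m0x) / 2, (m1y + m0y) / 2)
    return weights, midpoint

def leave_one_out_scores(rows, labels):
    """Score each case with a discriminant trained on all other cases."""
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        (w0, w1), (cx, cy) = fit_lda(train_rows, train_labels)
        scores.append(w0 * (rows[i][0] - cx) + w1 * (rows[i][1] - cy))
    return scores

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical two-feature cases: four "malignant" (1), four "benign" (0).
rows = [(5, 5), (6, 5), (5, 6), (6, 6), (1, 1), (2, 1), (1, 2), (2, 2)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = leave_one_out_scores(rows, labels)
print(auc(scores, labels))
```

Because every score comes from a model that never saw that case, the resulting AUC is less tied to the specific cases given than an AUC computed on the training data itself.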
34. Linear Discriminant Analysis
Consistently Selected Features
for Malignant vs. Benign with all 77 features:
Feature #   Feature Name       AUC Value Alone
4           FWHM ROI           0.7472
15          Diameter           0.6730
16          Circularity        0.5549
3           Margin Sharpness   0.6717
59          Beta 4             0.5440
AUC value of discriminant scores (new pseudo features): .7655
35. Linear Discriminant Analysis
Consistently Selected Features
of just the 32 lesion features:
● Most commonly selected were: 3 (Margin Sharpness), 4 (FWHM
ROI), 15 (Diameter), 16 (Circularity)
● The Az (AUC) value of the discriminant scores was .7745
42. t-test
Purpose:
• To determine whether the difference between two groups’ averages
is likely due to random chance.
• We used it to double-check our ROC results.
Procedure:
• Compares the average values of two groups, evaluates the standard
deviation of the values from the average, and then computes the statistical
significance of the difference between the two group averages.
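The procedure above corresponds to a two-sample t statistic, sketched here in Python. The function name, the pooled-variance form, and the toy values are assumptions for illustration. The Python standard library has no t-distribution, so the P-value below uses a normal approximation, which is only reasonable for large groups; an exact value would come from a statistics package.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def pooled_t_test(group_a, group_b):
    """Two-sample t statistic with pooled variance.
    Returns (t, degrees of freedom, approximate two-sided P-value)."""
    na, nb = len(group_a), len(group_b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    # Assumption: treat t as standard normal (large-sample shortcut).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, dof, p_approx

# Hypothetical values of one feature for two small groups of cases.
malignant = [5.1, 4.9, 5.6, 5.8, 6.0]
benign = [4.1, 4.0, 4.6, 4.4, 3.9]
t, dof, p = pooled_t_test(malignant, benign)
print(t, dof, p)
```

The sign of t indicates which group has the larger average, which is one way to read off a trend for the malignant cases.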
43. t-test Results and Trends
*P-value: The probability of seeing a difference at least this large between
the two data sets if it were due to random variation alone.
Feature #   Feature Name         Malignant Trend   t-test P-value
4           FWHM ROI                               7.6131 x 10^-9
9           FWHM Margin                            9.5087 x 10^-9
8           Texture (STD div)                      9.4544 x 10^-6
3           Margin Sharpness                       1.1289 x 10^-5
15          Diameter                               4.4803 x 10^-5
6           FWHM Grown                             1.0707 x 10^-4
1           FWHM Border                            1.7025 x 10^-4
5           Radial Grad ROI                        1.4082 x 10^-3
10          Radial Grad Margin                     3.0365 x 10^-3
44. t-test Results and Trends (cont.)
Feature #   Feature Name          Malignant Trend   t-test P-value
29          Max Corr. Coeff.                        3.4371 x 10^-3
30          Sum Average                             5.2348 x 10^-3
28          IMC2                                    8.2497 x 10^-3
21          Difference Entropy                      9.3717 x 10^-3
22          Difference Variance                     9.4370 x 10^-3
19          Contrast                                1.1036 x 10^-2
45. Conclusions
• The merged features did perform better than the original features
on their own (shown by a larger AUC value).
• The top features for the prediction of cancer (based on ROC
analysis, t-test, and LDA) were FWHM ROI (4), Diameter (15), and
Margin Sharpness (3), followed by FWHM Margin (9), Texture STD (8),
and Beta 4 (59).
• FWHM ROI (4) should be used instead of FWHM Margin (9) because
the two are highly correlated (R² = .9760 for malignant cases).
46. Conclusions
• The features of the contralateral breast combined with the lesion
features are less effective predictors than the lesion features alone.
• The ability to differentiate between invasive and non-invasive based
on features appears promising but is so far inconclusive.
47. Difficulties
• The Non-Invasive data set has only 11 cases.
• The Malignant and Benign cases we used are
not in the lab database.
• The new data set has not yet come in.
48. Moving Forward
• Would use classifier programs other than LDA to experiment
with adding or removing merged features (particularly with
contralateral and lesion combinations).
• Once the new data comes in, would re-run ROC Analysis for
Invasive vs. Non-invasive, and would use LDA to merge the most
promising features.
• Would begin to look at other predictors such as ER and PR status, genetic
information (HER2 (ERBB2), BRCA1/2 mutation), and age.
49. The Intern Experience
Skills/Information Learned:
-More comfortable with Linux (new commands)
-Creating Matlab code for efficiency
-Understand ROC curves and the meaning of AUC values
-When, why, and how to use LDA
-How to perform feature extraction on mammograms
-Learned different segmentation types
-Understand more about statistical analysis (P, R², t-test, etc.)
-Comfortable dealing with large matrices and data sets
-Understand fundamental differences between malignant and benign lesions
-Analyze the meaning of results (look for trends, standard error, histogram distribution, etc.)
-Learned about different types of cancer (IDC, DCIS, metastatic, etc.)
50. The Intern Experience
Lab Environment:
-Appreciate how computers have revolutionized Radiology
-Thinking creatively/proactively about a project
-Having another intern on the same project but working independently
-Asking questions is important
-Lab meetings
-Lecture series