Summer 2015 Internship
Medical Physics Department of Radiology
Dr. Maryellen L. Giger
TAYLOR MARTELL
Risk Assessment
Hypothesis
Main: Features of both the lesion and the contralateral
breast can help predict why cancer stops in some patients
and spreads in others.
Sub: A merging of features can create an even
better predictor than the original features themselves.
Feature Extraction Process
1. Cut ROI
2. Identify Center
3. Segmentation
4. Overlay
5. Calculate Features
3 Segmentation Types:
- Region Growing
  ~ Uses Mass Classify
- Snake & RGI
  ~ Use a specific Matlab function (runSnakeByList)
Applied Analysis Programs/Methods
● Two Feature scatterplots
○ For Malignant (106) and Benign (76)
○ For Invasive, Non-invasive, and Benign
● ROC Analysis
● Correlation Tests
● Linear Discriminant Analysis (LDA)
Malignant vs. Benign
Feature Plots
● Wrote iterative code that loops through all of the features and plots
every pair against one another.
○ Wrote code that takes two specific features as input and
automatically creates a plot titled with the two feature names.
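The pairwise loop described on this slide can be sketched as follows. This is a minimal Python illustration rather than the original Matlab code: the helper name `pair_titles` and the short feature list are hypothetical, and the plotting call itself is left out.

```python
from itertools import combinations


def pair_titles(feature_names):
    """Return (i, j, title) for every unordered pair of features."""
    pairs = []
    for i, j in combinations(range(len(feature_names)), 2):
        # The title uses the two feature names, as on the slide.
        title = f"{feature_names[i]} vs. {feature_names[j]}"
        pairs.append((i, j, title))
    return pairs


# Hypothetical subset of the feature list, used only for illustration.
names = ["Margin Sharpness", "FWHM ROI", "Diameter"]
for i, j, title in pair_titles(names):
    print(i, j, title)
```

With all 77 features, a loop like this would visit 77 x 76 / 2 = 2926 distinct pairs.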
Feature Scatter Plot Example
Invasive vs. noninvasive Feature Plots
ROC Analysis
Purpose: To determine how well each individual feature can predict the truth
(Malignant vs. Benign, or Invasive vs. Non-Invasive).
Sensitivity (TPF) = The fraction of patients with
the disease correctly identified as “positive.”
Specificity = The fraction of patients without
the disease correctly identified as “negative.”
AUC = The area under the ROC curve: the probability that the classifier
ranks a randomly chosen positive case above a randomly chosen
negative case.
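The three definitions above can be checked on a small example. The sketch below is plain Python with made-up feature values. The function names are hypothetical, and it assumes that a larger feature value points toward the positive (disease) class.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a case positive when its score is >= threshold.

    labels: 1 = disease present, 0 = disease absent.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc_by_pairs(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up feature values, for illustration only.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(scores, labels, 0.5))
print(auc_by_pairs(scores, labels))
```

In this toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9.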
ROC Analysis
● Wrote code that automatically plots the ROC curve of a feature, either
iterating over all features or for one selected feature.
○ The code prints the AUC value on the graph and
saves it in a separate matrix.
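A rough Python sketch of that workflow is shown here. It builds an empirical ROC curve by sweeping a threshold, integrates it with the trapezoid rule, and stores one AUC per feature. The names `roc_points`, `trapezoid_auc`, `feature_a`, and `feature_b` are hypothetical, and the data are made up. The original Matlab code may have used a fitted rather than an empirical curve, so its values could differ.

```python
def roc_points(scores, labels):
    """Return (FPF, TPF) points, sweeping the threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up values for two hypothetical features over the same six cases.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "feature_a": [0.9, 0.8, 0.4, 0.7, 0.3, 0.2],
    "feature_b": [0.1, 0.2, 0.6, 0.3, 0.7, 0.8],
}

# One AUC per feature, kept separately from the plots.
auc_table = {name: trapezoid_auc(roc_points(vals, labels))
             for name, vals in features.items()}
print(auc_table)
```

An AUC below 0.5, as for `feature_b` here, means the feature trends in the opposite direction: lower values point toward the positive class.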
ROC Graph Example
*Used to differentiate
Malignant and Benign.
ROC Analysis Results
Top 5 Features for Malignant vs. Benign

Feature #   Feature Name        AUC Value   95% Confidence Interval
4           FWHM ROI            0.7472      [.6778, .8166]
9           FWHM Margin         0.7456      [.6764, .8147]
8           Texture (STD div)   0.7112      [.6369, .7855]
15          Diameter            0.6730      [.5966, .7494]
3           Margin Sharpness    0.6717      [.5925, .7509]

[Malignant Trend column: graphic]
Full Width at Half Maximum (FWHM)
The distance between the two points on the curve at which the function
reaches half of its maximum value.
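To make the definition concrete, here is a small Python sketch that measures the FWHM of a sampled, single-peaked curve by linear interpolation. It is only an illustration of the definition. The function name `fwhm` is hypothetical, and the slides do not say which curve the lab's features are measured on or how the crossings are located.

```python
def fwhm(xs, ys):
    """Width of a single-peaked sampled curve at half of its maximum.

    Assumes xs is increasing and that the curve drops below half of the
    maximum on both sides of the peak; raises ValueError otherwise.
    """
    half = max(ys) / 2.0
    peak = ys.index(max(ys))

    # Walk left from the peak to the first sample below half-maximum.
    i = peak
    while i > 0 and ys[i] >= half:
        i -= 1
    # Walk right from the peak to the first sample below half-maximum.
    j = peak
    while j < len(ys) - 1 and ys[j] >= half:
        j += 1
    if ys[i] >= half or ys[j] >= half:
        raise ValueError("curve does not fall below half-maximum on both sides")

    # Linear interpolation of the two half-maximum crossings.
    left = xs[i] + (half - ys[i]) * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])
    right = xs[j - 1] + (half - ys[j - 1]) * (xs[j] - xs[j - 1]) / (ys[j] - ys[j - 1])
    return right - left


# A triangle with base 4 and height 2: its width at half height is 2.
print(fwhm([0, 1, 2, 3, 4], [0, 1, 2, 1, 0]))
```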
FWHM Neighborhoods
A) Within the Grown Region. Feature Name: FWHM Grown (6)
B) Along the Extracted Margin. Feature Name: FWHM Margin (9)
C) Within a rectangular segment. Feature Name: FWHM ROI (4)
D) In the Surrounding Periphery. Feature Name: FWHM Border (1)
Radial Gradient and Radial Angle
FWHM Benign Lesion Example
FWHM Malignant Lesion Example
Correlation Test
Purpose:
To see which features are similar to each other in order to create the
most effective feature merging.
R²: The coefficient of determination, a statistical measure of how close the
data are to the fitted regression line. The higher the number, the better the
line fits the data (the stronger the correlation).
P: The probability of observing a correlation at least this strong if the two
features were actually uncorrelated. The lower the number, the stronger the
evidence that the correlation is real.
Correlation Test
● used the Matlab function “corr”, which returns both the
correlation coefficients, R, and the P values between two
features.
Correlation Test
● Used the Matlab function “corr”, which returns both the
correlation coefficients, R, and the P values between two
features.
○ Created a 77 x 77 matrix of the R² and P values
for every feature combination, for both benign and
malignant cases.
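The R² matrix can be sketched in plain Python as shown here. This illustrates the idea and does not reproduce Matlab's “corr”. The function names and the three short feature columns are hypothetical, and the P values are omitted because they need a t-distribution that the Python standard library does not provide.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared_matrix(columns):
    """columns: list of feature columns, each a list of per-case values."""
    k = len(columns)
    return [[pearson_r(columns[i], columns[j]) ** 2 for j in range(k)]
            for i in range(k)]


# Three made-up feature columns over four cases.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # exactly proportional to the first column
    [1.0, 3.0, 2.0, 4.0],
]
for row in r_squared_matrix(columns):
    print([round(v, 4) for v in row])
```

With 77 feature columns the same call gives the 77 x 77 matrix, computed once for the benign cases and once for the malignant cases.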
Correlation Test Results (R²)

Feature Pair   Malignant Coeff.   Benign Coeff.
4 & 9          .9760              .8402
8 & 3          .6515              .6683
8 & 15         .5356              .4825

* The other 7 feature-pair coefficients were <= .226 *
Correlation Test Results (P)

Feature Pair   Malignant P-value   Benign P-value
4 & 9          4.419 x 10^-86      3.461 x 10^-31
8 & 3          1.510 x 10^-25      2.076 x 10^-19
8 & 15         5.062 x 10^-19      3.411 x 10^-12
15 & 3         2.711 x 10^-7       2.460 x 10^-5

* The other 6 feature-pair P values were >= .022 *
Features 4 & 9 Correlation
Features 8 & 3 Correlation
Linear Discriminant Analysis
Purpose:
● To create a more robust merged-feature model that does not depend on
the specific cases given.
Procedure:
● A “round-robin” (leave-one-out) process: each time, one case is left out
and the best features are selected from the remaining cases.
○ The selected features are then merged and applied to the one case
that was left out, and a score is given for how well those combined
features predicted the correct outcome.
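The round-robin idea can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the lab's code. It merges two fixed features with a Fisher linear discriminant and skips the per-round feature selection step. All function names and data values are hypothetical.

```python
def fit_lda_2d(rows, labels):
    """Fisher linear discriminant for two features.

    Returns weights w so that score = w[0]*x0 + w[1]*x1, with larger
    scores pointing toward class 1.
    """
    groups = {0: [], 1: []}
    for row, y in zip(rows, labels):
        groups[y].append(row)
    means = {y: [sum(r[d] for r in g) / len(g) for d in (0, 1)]
             for y, g in groups.items()}

    # Pooled within-class covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for y, g in groups.items():
        for r in g:
            d0, d1 = r[0] - means[y][0], r[1] - means[y][1]
            s[0][0] += d0 * d0
            s[0][1] += d0 * d1
            s[1][0] += d1 * d0
            s[1][1] += d1 * d1
    dof = len(rows) - 2
    s = [[v / dof for v in row] for row in s]

    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [means[1][0] - means[0][0], means[1][1] - means[0][1]]
    # w = S^-1 * (mean1 - mean0), using the closed-form 2 x 2 inverse.
    w0 = (s[1][1] * diff[0] - s[0][1] * diff[1]) / det
    w1 = (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det
    return [w0, w1]


def leave_one_out_scores(rows, labels):
    """Score each case with a discriminant trained on all the other cases."""
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        w = fit_lda_2d(train_rows, train_labels)
        scores.append(w[0] * rows[i][0] + w[1] * rows[i][1])
    return scores


# Made-up two-feature data: class 0 (e.g. benign) and class 1 (e.g. malignant).
rows = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0],
        [4.0, 5.0], [5.0, 4.0], [5.0, 6.0], [6.0, 5.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(leave_one_out_scores(rows, labels))
```

The resulting discriminant scores act as a single merged feature, so an ROC analysis can be run on them in the same way as on any individual feature.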
Linear Discriminant Analysis
Consistently selected features for Malignant vs. Benign with all 77 features:

Feature #   Feature Name       AUC Value Alone
4           FWHM ROI           0.7472
15          Diameter           0.6730
16          Circularity        0.5549
3           Margin Sharpness   0.6717
59          Beta 4             0.5440

AUC value of the discriminant scores (new pseudo-features): .7655

Linear Discriminant Analysis
Consistently selected features for Malignant vs. Benign with just the 32 lesion features:
● Most commonly selected were: 3 (Margin Sharpness), 4 (FWHM
ROI), 15 (Diameter), 16 (Circularity)
● The Az value (AUC) of the discriminant scores was .7745
ROC Analysis Results
Top 5 Features for Invasive vs. Non-Invasive

Feature #   Feature Name       AUC Value   95% Confidence Interval
49          Balance 2          .6649       [.5806, .7492]
46          Balance 1          .6492       [.5643, .7341]
50          Skewness           .6492       [.5651, .7333]
3           Margin Sharpness   .6267       [.5475, .7059]
59          Beta 4             .6213       [.5390, .7036]

[Invasive Trend column: graphic]
Beta 4 Examples
Histograms of Balance 1 (Invasive, Non-Invasive)
Histograms of Beta 4 (Invasive, Non-Invasive)
t-test
Purpose:
• To determine whether the difference between two groups’ averages
could be due to random chance.
• We used it to double-check our ROC results.
Procedure:
• Compares the average values of two groups, evaluates the standard
deviation of the values from the average, and then computes the statistical
significance of the difference between the two group averages.
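The procedure above corresponds to a two-sample t statistic. Here is a minimal Python sketch, assuming the equal-variance (pooled) form, since the slides do not say which variant was used. The function name `two_sample_t` and the data are hypothetical. The P-value would then come from a t-distribution with the returned degrees of freedom, which is not computed here.

```python
from math import sqrt


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Sums of squared deviations from each group's own average.
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    dof = na + nb - 2
    pooled_var = (ssa + ssb) / dof
    t = (ma - mb) / sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, dof


# Made-up values of one feature for two groups of cases.
group_a = [5.0, 6.0, 7.0, 8.0]
group_b = [1.0, 2.0, 3.0, 4.0]
t, dof = two_sample_t(group_a, group_b)
print(round(t, 4), dof)
```

A larger absolute t for the same degrees of freedom corresponds to a smaller P-value.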
t-test Results and Trends
*P-value: The probability of seeing a difference at least this large
between the two data sets if only random variation were at work.

Feature #   Feature Name         t-test P-value
4           FWHM ROI             7.6131 x 10^-9
9           FWHM Margin          9.5087 x 10^-9
8           Texture (STD div)    9.4544 x 10^-6
3           Margin Sharpness     1.1289 x 10^-5
15          Diameter             4.4803 x 10^-5
6           FWHM Grown           1.0707 x 10^-4
1           FWHM Border          1.7025 x 10^-4
5           Radial Grad ROI      1.4082 x 10^-3
10          Radial Grad Margin   3.0365 x 10^-3

[Malignant Trend column: graphic]
t-test Results and Trends (cont.)

Feature #   Feature Name          t-test P-value
29          Max Corr. Coeff.      3.4371 x 10^-3
30          Sum Average           5.2348 x 10^-3
28          IMC2                  8.2497 x 10^-3
21          Difference Entropy    9.3717 x 10^-3
22          Difference Variance   9.4370 x 10^-3
19          Contrast              1.1036 x 10^-2

[Malignant Trend column: graphic]
Conclusions
• The merged features did perform better than the original features
on their own (shown by a larger AUC value).
• The top features for the prediction of cancer (based on ROC
analysis, t-test, and LDA) were FWHM ROI (4), Diameter (15), and
Margin Sharpness (3), followed by FWHM Margin (9), Texture STD (8),
and Beta 4 (59).
• FWHM ROI (4) should be used instead of FWHM Margin (9) because
the two are highly correlated.
• The features of the contralateral breast combined with the lesion
features are less effective predictors than the lesion features alone.
• The ability to differentiate between invasive and non-invasive based
on features appears promising but is so far inconclusive.
Difficulties
• The Non-Invasive data set has only 11 cases.
• The Malignant and Benign cases we used are
not in the lab database.
• The new data set has not yet come in.
Moving Forward
• Would use different classifier programs besides LDA to experiment
with adding or removing merged features (particularly with
contralateral and lesion combinations).
• Once the new data comes in, would re-perform ROC analysis for
Invasive vs. Non-invasive, and would use LDA to merge the most
promising features.
• Would begin to look at other predictors such as ER and PR, genetic
information (HER2 (ERBB2), BRCA1/2 mutation), and age.
The Intern Experience
Skills/Information Learned:
-More comfortable with Linux (new commands)
-Creating Matlab code for efficiency
-Understand ROC curves and the meaning of the AUC value
-When, why, and how to use LDA
-How to perform feature extraction on mammograms
-Learned different segmentation types
-Understand more about statistical analysis (P, R², t-test, etc.)
-Comfortable dealing with large matrices and data sets
-Understand fundamental differences between malignant and benign lesions
-Analyze the meaning of results (look for trends, standard error, histogram distribution, etc.)
-Learned about different types of cancer (IDC, DCIS, metastatic, etc.)
The Intern Experience
Lab Environment:
-Appreciate how computers have revolutionized Radiology
-Thinking creatively/proactively about a project
-having another intern on the same project but working independently
-asking questions is important
-lab meetings
-lecture series
THANK YOU!

Editor's Notes

  1. using the “Display Image” Function