Predicting Breast Cancer

Max Mitchell
Amanda Rollason
Cheannette Smith
AGENDA
• Understanding the Problem
• Exploring the Data
    - Frequency Tables
    - Graphs
• Modeling with Logistic Regression
• Validating the Predictive Model
THE PROBLEM
Of women whose mammogram interpretation leads to a
breast biopsy, 70% actually have benign outcomes, so the
positive predictive value of a biopsy recommendation is
only about 30%.
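
The 70% figure describes women who were already sent to biopsy, so it is a statement about positive predictive value (PPV) rather than sensitivity. A minimal sketch of the difference follows. The counts are hypothetical, chosen only to match the 70%/30% split, and the number of missed cancers is an assumption because the slide does not report it.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of positive calls (biopsy recommendations) that are truly malignant."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly malignant cases that received a positive call."""
    return true_pos / (true_pos + false_neg)


# Hypothetical cohort of 100 biopsied women matching the 70% benign figure.
malignant_biopsied = 30  # true positives
benign_biopsied = 70     # false positives (unnecessary biopsies)
missed_cancers = 5       # false negatives: assumed, not given in the source

ppv = positive_predictive_value(malignant_biopsied, benign_biopsied)
sens = sensitivity(malignant_biopsied, missed_cancers)
print(f"PPV = {ppv:.2f}, sensitivity = {sens:.2f}")
```

PPV depends only on the biopsied women, while sensitivity also needs the cancers that were never biopsied, which is why the two numbers can differ widely.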




         Understand   Explore   Model   Validate
A SOLUTION
Better diagnostic tools for physicians would help
increase the sensitivity and specificity of
mammogram interpretations, while reducing the
number of unnecessary breast biopsies.




THE DATA SOURCE
• Dr. Rüdiger Schulz-Wendtland, M. Elter, and T. Wittenberg;
  data from the Institute of Radiology, University of
  Erlangen-Nuremberg, Germany
• Data collected between 2003 and 2006
• Published in Medical Physics in 2007: “The Prediction
  of Breast Cancer Biopsy Outcomes Using Two CAD
  Approaches that Both Emphasize an Intelligible
  Decision Process”
• Database donated to the University of California, Irvine
  for its machine learning repository
THE CASE-CONTROL STUDY
• 961 observations represent the outcomes for
  women who had already had breast biopsies
    - 516 benign cases
    - 445 malignant cases
• Two physicians reviewed the mammograms
    - not knowing that each woman had already had a biopsy
      for the suspected mass
    - not knowing the outcome of her biopsy
• Computer-aided diagnosis (CAD) systems were
  run on the full-field digital mammograms
THE VARIABLES
• 1 binary response, the SEVERITY of the mass lesion
    - Malignant = 1
    - Benign = 0
• 1 continuous predictor, AGE
• 1 semi-ordinal predictor, the BI-RADS assessment
• 3 nominal predictors, the CAD output
    - SHAPE
    - DENSITY
    - MARGIN
THE MAMMOGRAM DATA
[Figure]
Breast Imaging Reporting and Data System (BI-RADS)

CATEGORY  ASSESSMENT                       CLINICAL RECOMMENDATION
   0      Assessment Incomplete            Need to review prior studies and/or complete additional imaging
   1      Negative                         Continue routine screening
   2      Benign                           Continue routine screening
   3      Probably Benign Finding          Short-term follow-up at 6 months, then every 6 to 12 months for 1 to 2 years
   4      Suspicious Abnormality           Perform biopsy, preferably needle biopsy
   5      Highly Suspicious of Malignancy  Biopsy and treatment, as necessary
   6      Known Biopsy-Proven Malignancy   Assure that treatment is completed
SHAPE
  1 = ROUND
  2 = OVAL
  3 = LOBULAR
  4 = IRREGULAR

MARGIN
  1 = CIRCUMSCRIBED
  2 = MICROLOBULATED
  3 = OBSCURED
  4 = ILL-DEFINED
  5 = SPICULATED
DENSITY
  1 = high
  2 = iso
  3 = low
  4 = fat-containing
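
The code books for SHAPE, MARGIN, and DENSITY, together with the response and the BI-RADS range, can be gathered into one record type. This is only a sketch: the class name `MammogramCase`, its field names, and the validation rules are illustrative assumptions, and the margin and density values in the example are made up.

```python
from dataclasses import dataclass

# Code books as listed on the slides.
SHAPE_LABELS = {1: "round", 2: "oval", 3: "lobular", 4: "irregular"}
MARGIN_LABELS = {1: "circumscribed", 2: "microlobulated", 3: "obscured",
                 4: "ill-defined", 5: "spiculated"}
DENSITY_LABELS = {1: "high", 2: "iso", 3: "low", 4: "fat-containing"}


@dataclass(frozen=True)
class MammogramCase:
    """One observation; the class and field names are hypothetical."""
    age: int
    birads: int    # BI-RADS assessment category, 0 to 6
    shape: int
    margin: int
    density: int
    severity: int  # 1 = malignant, 0 = benign

    def __post_init__(self) -> None:
        # Reject any code that is not in the corresponding code book.
        checks = {
            "birads": (self.birads, range(0, 7)),
            "shape": (self.shape, SHAPE_LABELS),
            "margin": (self.margin, MARGIN_LABELS),
            "density": (self.density, DENSITY_LABELS),
            "severity": (self.severity, (0, 1)),
        }
        for name, (value, allowed) in checks.items():
            if value not in allowed:
                raise ValueError(f"{name}={value} is not a valid code")

    def describe(self) -> str:
        outcome = "malignant" if self.severity == 1 else "benign"
        return (f"age {self.age}, BI-RADS {self.birads}, "
                f"{SHAPE_LABELS[self.shape]} shape, "
                f"{MARGIN_LABELS[self.margin]} margin, "
                f"{DENSITY_LABELS[self.density]} density, {outcome}")


# Margin and density here are invented for illustration.
case = MammogramCase(age=62, birads=4, shape=4, margin=5, density=3, severity=1)
print(case.describe())
```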
EXPLORING THE DATA
[Figures: frequency tables and graphs]
LOGISTIC REGRESSION MODELS
• There is a binary response variable.
• There are more than three predictors, so
  frequency tables alone will be inadequate.
• The predictors are both numerical and
  categorical.
• Some of the categorical variables are ordinal.
PRE-MODELING
MODEL MAIN EFFECTS – Mostly Categorical
(a trailing "c" marks a predictor entered as categorical)
  1     AGE BIRADSc SHAPEc MARGINc DENSITYc
  2     AGE BIRADSc
  3     AGE SHAPEc MARGINc DENSITYc
  4     AGE SHAPEc MARGINc
  5     AGE BIRADSc SHAPEc MARGINc
  6     AGE BIRADSc SHAPEc

PRE-MODELING
MODEL MAIN EFFECTS – Mostly Numerical
  7     AGE BIRADS SHAPE MARGIN DENSITYc
  8     AGE BIRADS
  9     AGE SHAPE MARGIN DENSITYc
  10    AGE SHAPE MARGIN
  11    AGE BIRADS SHAPE MARGIN
  12    AGE BIRADS SHAPE
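
The two families of candidate models differ in how a coded predictor enters the model. The sketch below contrasts the two for SHAPE, reading the "c" suffix as categorical (dummy) coding. That reading, the choice of level 1 as the reference level, and the function names are all assumptions; the slides do not say which parameterization the software used.

```python
SHAPE_LEVELS = [1, 2, 3, 4]  # round, oval, lobular, irregular


def dummy_code(value: int, levels: list[int], reference: int) -> list[int]:
    """Return 0/1 indicators for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"{value} is not one of {levels}")
    return [1 if value == level else 0 for level in levels if level != reference]


def numerical_row(age: int, shape: int) -> list[int]:
    """SHAPE as a single numeric term: one slope shared by all shape codes."""
    return [age, shape]


def categorical_row(age: int, shape: int) -> list[int]:
    """SHAPEc as indicators: a separate effect per shape (reference = 1, assumed)."""
    return [age] + dummy_code(shape, SHAPE_LEVELS, reference=1)


print(numerical_row(50, 4))    # [50, 4]
print(categorical_row(50, 4))  # [50, 0, 0, 1]
```

Numeric coding spends one parameter on SHAPE and assumes equally spaced effects across codes, while categorical coding spends three and assumes nothing about spacing.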
MODELING THE DATA
MODEL   TERMS
  A     AGE BIRADSc SHAPEc MARGINc DENSITYc
  B     AGE BIRADSc SHAPEc MARGINc
  C     AGE BIRADSc SHAPEc
  D     AGE BIRADS SHAPE
  E     AGE BIRADS SHAPE AGE×BIRADS SHAPE×BIRADS
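
Model E adds two interaction terms to the numeric main effects of Model D. A small sketch of how one observation expands into those terms is shown here; the function name and the dictionary layout are illustrative, not taken from the slides.

```python
def model_e_terms(age: float, birads: int, shape: int) -> dict[str, float]:
    """Expand one observation into the terms of Model E (all numeric coding)."""
    return {
        "intercept": 1.0,
        "AGE": float(age),
        "BIRADS": float(birads),
        "SHAPE": float(shape),
        "AGE*BIRADS": float(age * birads),      # interaction: product of the two values
        "SHAPE*BIRADS": float(shape * birads),  # interaction: product of the two values
    }


terms = model_e_terms(age=62, birads=4, shape=4)
print(terms)
```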
LOGISTIC REGRESSION – Training
MODEL                 p-cutoff   Sensitivity   Specificity   AIC     AUC = c
A Bc Sc Mc Dc (QCS)   0.414      0.845         0.838         475.3   0.910
A Bc Sc Mc            0.363      0.878         0.815         507.6   0.906
A Bc Sc               0.364      0.875         0.815         534.7   0.902
A B S                 0.419      0.844         0.813         563.4   0.888
A B S AB SB           0.438      0.873         0.809         544.0   0.899
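
Sensitivity and specificity in the table are tied to a probability cutoff: a case is called malignant when its predicted probability reaches the cutoff. The sketch below shows that calculation. The probabilities and labels are toy values, not the study data, and treating a probability equal to the cutoff as positive is an assumption.

```python
def sensitivity_specificity(probabilities, labels, cutoff):
    """Classify each case at the cutoff and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities; label 1 = malignant, 0 = benign.
probs = [0.05, 0.20, 0.40, 0.45, 0.60, 0.35, 0.50, 0.70, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

sens, spec = sensitivity_specificity(probs, labels, cutoff=0.438)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Lowering the cutoff trades specificity for sensitivity, which is why each model in the table is reported with its own cutoff.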
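LOGISTIC REGRESSION
$$\operatorname{logit}(\hat{\pi}) = \alpha + \beta_1\,\mathrm{AGE} + \beta_2\,\mathrm{BIRADS} + \beta_3\,\mathrm{SHAPE} + \beta_4\,(\mathrm{AGE} \times \mathrm{BIRADS}) + \beta_5\,(\mathrm{BIRADS} \times \mathrm{SHAPE})$$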
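
For reference, the standard relations that turn a fitted logit into odds and into a probability of malignancy are written out below. The symbol \hat{\pi} for the estimated probability is an assumed reading of the notation on the slide.

```latex
\text{odds} = \frac{\hat{\pi}}{1-\hat{\pi}} = e^{\operatorname{logit}(\hat{\pi})},
\qquad
\hat{\pi} = \frac{\text{odds}}{1+\text{odds}}
          = \frac{e^{\operatorname{logit}(\hat{\pi})}}{1+e^{\operatorname{logit}(\hat{\pi})}}
```

A logit of 0 corresponds to odds of 1 and a probability of 0.5.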
LOGISTIC REGRESSION – Validation
MODEL                 Sensitivity   Specificity   AUC 95% CI
A Bc Sc Mc Dc (QCS)   0.866         0.863         (0.822, 0.907)
A Bc Sc Mc            0.858         0.850         (0.812, 0.897)
A Bc Sc               0.846         0.844         (0.802, 0.887)
A B S                 0.835         0.848         (0.798, 0.884)
A B S AB SB           0.816         0.870         (0.800, 0.928)
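
The AUC (the c statistic) can be read as the share of malignant/benign pairs in which the malignant case receives the higher predicted probability. A minimal sketch of that pairwise definition follows, using toy values rather than the study data. It does not reproduce the confidence intervals in the table, since the slides do not say how those were computed.

```python
def concordance_auc(probabilities, labels):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    Ties count as half a concordant pair.
    """
    positives = [p for p, y in zip(probabilities, labels) if y == 1]
    negatives = [p for p, y in zip(probabilities, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5
    return score / (len(positives) * len(negatives))


# Toy predicted probabilities; label 1 = malignant, 0 = benign.
probs = [0.05, 0.20, 0.40, 0.45, 0.60, 0.35, 0.50, 0.70, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

auc = concordance_auc(probs, labels)
print(f"AUC = {auc:.2f}")
```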
Comparing ROC curves
[Figure: ROC curves]
Example 1
AGE = 42 | BIRADS = 2 | SHAPE = Oval = 2

Logit = -34.1514 + 0.2398(42) + 6.8365(2) + 4.0423(2)
        - 0.0441(42)(2) - 0.7842(2)(2)
      = -8.903
Odds = e^(-8.903) = 0.0001

→ Patient most likely does not have a malignant lesion.

TRUE. She had multiple cutaneous neurofibromas. They
are benign, so there is no evidence of malignancy. The
reader recommended that she should have a normal
interval screening follow-up in 12 months.
Example 2
AGE = 62 | BIRADS = 4 | SHAPE = Irregular = 4

Logit = -34.1514 + 0.2398(62) + 6.8365(4) + 4.0423(4)
        - 0.0441(62)(4) - 0.7842(4)(4)
      = 1.5162
Odds = e^(1.5162) = 4.55

→ Patient most likely does have a malignant lesion.

TRUE. She had invasive ductal carcinoma, so there was
evidence of malignancy. The reader saw that she had a
suspicious abnormality and recommended a core needle
biopsy.
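
The two worked examples can be replayed in a few lines. The sketch below uses the coefficients exactly as printed on the slides; the function and dictionary names are illustrative. Evaluated with these rounded coefficients, the logits come to roughly -9.16 and 0.75 rather than the -8.903 and 1.5162 shown. The slide values may have been computed from unrounded coefficients, but that is a guess. Either way the odds fall well below 1 for Example 1 and above 1 for Example 2, so the classifications agree.

```python
import math

# Coefficients as printed in the worked examples.
COEFFICIENTS = {
    "intercept": -34.1514,
    "AGE": 0.2398,
    "BIRADS": 6.8365,
    "SHAPE": 4.0423,
    "AGE*BIRADS": -0.0441,
    "BIRADS*SHAPE": -0.7842,
}


def predict(age: float, birads: int, shape: int) -> tuple[float, float, float]:
    """Return (logit, odds, probability of malignancy) for one patient."""
    b = COEFFICIENTS
    logit = (b["intercept"]
             + b["AGE"] * age
             + b["BIRADS"] * birads
             + b["SHAPE"] * shape
             + b["AGE*BIRADS"] * age * birads
             + b["BIRADS*SHAPE"] * birads * shape)
    odds = math.exp(logit)
    probability = odds / (1.0 + odds)
    return logit, odds, probability


logit1, odds1, prob1 = predict(age=42, birads=2, shape=2)  # Example 1
logit2, odds2, prob2 = predict(age=62, birads=4, shape=4)  # Example 2

for name, (lg, od, pr) in {"Example 1": (logit1, odds1, prob1),
                           "Example 2": (logit2, odds2, prob2)}.items():
    print(f"{name}: logit = {lg:.4f}, odds = {od:.4f}, probability = {pr:.4f}")
```

Reporting the probability alongside the odds makes the comparison with a model's p-cutoff direct.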
Conclusion
• Readers’ interpretation alone (BIRADS) isn’t
  sufficient.
• Computer-aided diagnosis systems (SHAPE,
  MARGIN, and DENSITY) alone aren’t sufficient.
• AGE needs to be considered when determining
  whether a breast biopsy is warranted.
• AGE, BIRADS, and SHAPE did the most to improve
  sensitivity and specificity.
For the Future
• Incorporate other CAD tools.
    - MRI tests
    - Ultrasound examinations
• Explore results of other modeling methods.
    - Decision trees
    - Bootstrapping
• Educate patients regarding the imperfect process
  of mammogram interpretation.
Questions?



Editor's Notes

  Slides 1-9: Cheannette
  Slides 10-18: Amanda
  Slides 19-27: Max