SlideShare uma empresa Scribd logo
1 de 19
Baixar para ler offline
Using ANOVA to evaluate
differential protein expression
          Shaun Broderick
      The Ohio State University
              OARDC
The research question: Which proteins are
differentially expressed during flower senescence?
 • Senescence is an active stage of flower development
 • Senescence activates the degradation of floral proteins
    – Proteases, nucleases, electron transport proteins,
      lipoxygenases
 • Remobilization of nutrients
    – transmembrane proteins (i.e. pumps)




                Figure Credit: Bai et al., 2010 © 2010 The Author(s).
Why petunias?
Petunias serve as a model organism for studying flower senescence because they have a
                           well-defined senescence pattern




                         Figure Credit: Bai et al., 2010 © 2010 The Author(s).
Experimental design for protein
               analysis
• Treatments: pollinated and unpollinated
• Senescence cycle is completed in 72 hours—
  corollas will be collected at 0, 24, 48 and 72 hours
• 8 corollas will be pooled from at least 3 plants for
  each time point
• Three 2D gels will be poured in each rep
• Total soluble protein will be extracted from
  corollas and separated on 2D gel
• Protein spots will be cut from gel and sequenced
2D-gel electrophoresis: Finding
    differentially expressed proteins
• Extract all soluble proteins from senescing
  corollas
• Extract all soluble proteins from non-
  senescing corollas of the same developmental
  stage
• Proteins can be separated by their physical
  properties of net charge and molecular weight
Protein changes visualized by 2D gels
             2nd separated by mass
             (smaller proteins move faster)
                                              1st separated
                                              by their
                                              isoelectric
                                              point (i.e.
                                              charge)
Pollinated vs. unpollinated proteins
         Pollinated                  Unpollinated




Potential problem: Variation in protein extracts will
change protein patterns and amounts;
how do we set an objective cut-off for “differential
expression”?
Data analysis overview
•   Use SAS to rearrange a data set
•   Evaluate distribution of data
•   Transform data
•   Estimate an experiment-wide cutoff
Analyzing the data
• What analyses would be most appropriate?
• Do the data fit the assumptions for ANOVA
  (normality)?
• Transform data (either in Excel or SAS)
• How will the data need to be structured?
• What are the main factors in the model?
• Statistically determine the cutoff point (i.e. least
  significant difference [LSD]) for statistical
  significance
Are the data normally distributed?
Useful SAS code for visualizing the data


options formdlim='~';
data one;
  title 'Bai Protein Expression data - sequenced Proteins';
  infile 'C:Documents and SettingsjoneslabDesktopSeqProt.csv'
dlm=',' firstobs=2 ;
  input protein $ SSP WT01 WT02 WT03 WTU241 WTU242 WTU243 WTU481
WTU482 WTU483 WTU721 WTU722 WTU723 HAP241 HAP242 HAP243 HAP481
HAP482 HAP483 HAP721 HAP722 HAP723;
Proc univariate normal plot;
        var WT01 WT02 WT03 WTU241 WTU242 WTU243;
        run;
Are the data normally distributed?
options formdlim='~';
data one;
  title 'Bai Protein Expression data - sequenced Proteins';
  infile 'C:Documents and SettingsjoneslabDesktopSeqProt.csv'
dlm=',' firstobs=2 ;
  input protein $ SSP WT01 WT02 WT03 WTU241 WTU242 WTU243 WTU481
WTU482 WTU483 WTU721 WTU722 WTU723 HAP241 HAP242 HAP243 HAP481
HAP482 HAP483 HAP721 HAP722 HAP723;
Proc univariate normal plot;
        var WT01 WT02 WT03 WTU241 WTU242 WTU243;
        run;




The data are not normally distributed. The histogram should appear as a bell shaped
curve turned on its side. The normal probability plot should appear as a straight line.
These data require transformation.
options formdlim='~';
                       Transformed via log + 1
data one;
  title 'Bai Protein Expression data - sequenced Proteins';
  infile 'C:Documents and SettingsjoneslabDesktopSeqProt.csv' dlm=',' firstobs=2 ;
  input protein $ SSP WT01 WT02 WT03 WTU241 WTU242 WTU243 WTU481 WTU482 WTU483 WTU721 WTU722 WTU723 HAP241 HAP242
HAP243 HAP481 HAP482 HAP483 HAP721 HAP722 HAP723;
lnWT01 = log(WT01 + 1);
lnWT02 = log(WT02 + 1);
lnWT03 = log(WT03 + 1);                                                              Data now fit the
lnWTU241 = log(WTU241 + 1);                                                          ANOVA assumption of
lnWTU242 = log(WTU242 + 1);
lnWTU243 = log(WTU243 + 1);                                                          normality.
lnWTU481 = log(WTU481 + 1);
lnWTU482 = log(WTU482 + 1);
lnWTU483 = log(WTU483 + 1);
lnWTU721 = log(WTU721 + 1);
lnWTU722 = log(WTU722 + 1);
lnWTU723 = log(WTU723 + 1);
lnHAP241 = log(HAP241 +1);
lnHAP242 = log(HAP242 +1);
lnHAP243 = log(HAP243 +1);
lnHAP481 = log(HAP481 +1);
lnHAP482 = log(HAP482 +1);
lnHAP483 = log(HAP483 +1);
lnHAP721 = log(HAP721 + 1);
lnHAP722 = log(HAP722 + 1);
lnHAP723 = log(HAP723 + 1);
Proc univariate normal plot;
            var lnWT01 lnWT02 lnWT03 lnWTU241 lnWTU242 lnWTU243;
            run;
Structuring data for SAS


EXP   =   lnWT01; treatment =   0; time =    0; Rep = 1;    output;
EXP   =   lnWT02; treatment =   0; time =    0; Rep = 2;    output;
EXP   =   lnWT03; treatment =   0; time =    0; Rep = 3;    output;
EXP   =   lnWTU241; treatment   = 1; time    = 24; Rep =    1; output;
EXP   =   lnWTU242; treatment   = 1; time    = 24; Rep =    2; output;
EXP   =   lnWTU243; treatment   = 1; time    = 24; Rep =    3; output;
EXP   =   lnWTU481; treatment   = 1; time    = 48; Rep =    1; output;
EXP   =   lnWTU482; treatment   = 1; time    = 48; Rep =    2; output;
EXP   =   lnWTU483; treatment   = 1; time    = 48; Rep =    3; output;
EXP   =   lnWTU721; treatment   = 1; time    = 72; Rep =    1; output;
EXP   =   lnWTU722; treatment   = 1; time    = 72; Rep =    2; output;
EXP   =   lnWTU723; treatment   = 1; time    = 72; Rep =    3; output;
EXP   =   lnHAP241; treatment   = 2; time    = 24; rep =    1; output;
EXP   =   lnHAP242; treatment   = 2; time    = 24; rep =    2; output;
EXP   =   lnHAP243; treatment   = 2; time    = 24; rep =    3; output;
EXP   =   lnHAP481; treatment   = 2; time    = 48; rep =    1; output;
EXP   =   lnHAP482; treatment   = 2; time    = 48; rep =    2; output;

                   The baseline for protein quantities is at 0 hours after pollination.
SAS Command: Proc GLM
  Main effects (all fixed within the model):
  • protein (protein spots on the gel)
  • treatment (unpollinated/pollinated)
  • time (after pollination)
  • rep (3, 2D gels per treatment)

Proc glm;
       class protein treatment time rep;
       model EXP = protein treatment time rep rep*protein
       rep*treatment rep*time treat*protein time*protein
       time*treatment;
Proc varcomp method = REML;
       class protein treatment time rep;
       model EXP = protein treatment time rep rep*protein
       rep*treatment rep*time treat*protein time*protein
       time*treatment;
quit;
Output of Proc GLM
The GLM Procedure
Dependent Variable: EXP
                                             Sum of
   Source                       DF          Squares       Mean Square     F Value   Pr > F
   Model                       989      36342.17871          36.74639       20.22   <.0001
   Error                      1926       3499.80085           1.81713
   Corrected Total            2915      39841.97956
                                                                        Mean Square Error (MSE)
                          R-Square      Coeff Var        Root MSE       EXP Mean
                          0.912158       17.39963        1.348011       7.747356

   Source                       DF      Type III SS       Mean Square     F Value   Pr > F
   protein                     139      22324.65836         160.60905       88.39   <.0001
   treatment                     1        519.20558         519.20558      285.73   <.0001
   time                          2        108.58773          54.29387       29.88   <.0001
   Rep                           2          0.35196           0.17598        0.10   0.9077
   Protein*Rep                 278         33.08495           0.11901        0.07   1.0000
   treatment*Rep                 2          2.07059           1.03530        0.57   0.5658
   time*Rep                      4          1.39061           0.34765        0.19   0.9430
   Protein*treatment           139       5433.79977          39.09208       21.51   <.0001
   Protein*time                278       4550.96421          16.37037        9.01   <.0001
   treatment*time                2        216.32514         108.16257       59.52   <.0001

             Sources in red were found to be significant. Rep is not significant.
Appropriate cut off
• The data are inherently noisy
• The cut off (or least significant difference,
  LSD) should be based on the variance of the
  data, or Mean Square Error and the level of
  statistical significance
• The LSD can be calculated with e[t×√(MSE/n)]/2,
  where t is 1.96 for P = 0.05, 2.576 for P =
  0.01, and 3.291 for P = 0.001 and n is the
  number of replicates
Use Excel for determining LSD
     This is calculated and            Choose the appropriate α-
     given in the SAS output           level for the experiment; n = 3.
                                       LSD = [t*(SQRT(MSE/n))]/2

            MSE             LSD              Fold             α-level
            1.817           0.762            2.144            p = 0.05
            1.817           1.002            2.725            p = 0.01
            1.817           1.281            3.599            p = 0.001

         The LSD must be “untransformed” using Fold = (LSD)e
         This can be done in excel with Fold=power(2.718, LSD)

Why is it a fold difference?
A log transformation was necessary to fit normality assumptions. Log data are only
interpretable as fold differences
What parameters are used to select an appropriate significance level?
For a more conservative value, you would select a smaller α-level. This reduces the chance
of a Type I error
Summary
1. Determine if the data fit the assumptions
2. Transform as needed
3. Restructure the data in SAS
4. Determine the main effects for the
   experiment
5. Use Proc GLM to determine the MSE
6. Calculate the cutoff (LSD) for statistical
   significance
References Cited
• Bai, S., B. Willard, L. J. Chapin, M. T. Kinter, D. M. Francis, A. D.
  Stead, and M. L. Jones. 2010. Proteomic analysis of
  pollination-induced corolla senescence in petunia. Journal of
  Experimental Botany 61:1089-1109. (Available online at:
  http://dx.doi.org/10.1093/jxb/erp373) (verified 20 Oct 2011).

• Kerr, M. K., M. Martin, and G. A. Churchill. 2000. Analysis of
  variance for gene expression microarray data. Journal of
  Computational Biology 7:819-837. (Available online at:
  http://dx.doi.org/10.1089/10665270050514954) (verified 20
  Oct 2011).

Mais conteúdo relacionado

Semelhante a Using Anova to Evaluate Differential Protein Expression

Pham,Nhat_ResearchPoster
Pham,Nhat_ResearchPosterPham,Nhat_ResearchPoster
Pham,Nhat_ResearchPosterNhat Pham
 
Repeated events analyses
Repeated events analysesRepeated events analyses
Repeated events analysesMike LaValley
 
Presenter Disclosure Information
Presenter Disclosure InformationPresenter Disclosure Information
Presenter Disclosure Informationdrucsamal
 
Development and validation of rp hplc method for simultaneous
Development and validation of rp hplc method for simultaneousDevelopment and validation of rp hplc method for simultaneous
Development and validation of rp hplc method for simultaneouschandu chatla
 
Binomial and Poission Probablity distribution
Binomial and Poission Probablity distributionBinomial and Poission Probablity distribution
Binomial and Poission Probablity distributionPrateek Singla
 
Slides faod - the other mitochondrial energy diseases - vockley
Slides   faod - the other mitochondrial energy diseases - vockleySlides   faod - the other mitochondrial energy diseases - vockley
Slides faod - the other mitochondrial energy diseases - vockleymitoaction
 
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassayExpt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassayVISHALJADHAV100
 
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassayExpt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassayVISHALJADHAV100
 
Tables and Formulas for Sullivan, Statistics Informed Decisio.docx
Tables and Formulas for Sullivan, Statistics Informed Decisio.docxTables and Formulas for Sullivan, Statistics Informed Decisio.docx
Tables and Formulas for Sullivan, Statistics Informed Decisio.docxmattinsonjanel
 
Calculate and Interpret Pharmacokinetic Parameters of a Given Drug
Calculate and Interpret Pharmacokinetic Parameters of a Given DrugCalculate and Interpret Pharmacokinetic Parameters of a Given Drug
Calculate and Interpret Pharmacokinetic Parameters of a Given DrugShivankan Kakkar
 

Semelhante a Using Anova to Evaluate Differential Protein Expression (20)

Pham,Nhat_ResearchPoster
Pham,Nhat_ResearchPosterPham,Nhat_ResearchPoster
Pham,Nhat_ResearchPoster
 
Lncntrst
LncntrstLncntrst
Lncntrst
 
Repeated events analyses
Repeated events analysesRepeated events analyses
Repeated events analyses
 
Static Models of Continuous Variables
Static Models of Continuous VariablesStatic Models of Continuous Variables
Static Models of Continuous Variables
 
Two dependent samples (matched pairs)
Two dependent samples (matched pairs) Two dependent samples (matched pairs)
Two dependent samples (matched pairs)
 
Presenter Disclosure Information
Presenter Disclosure InformationPresenter Disclosure Information
Presenter Disclosure Information
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Proteinomics
ProteinomicsProteinomics
Proteinomics
 
Development and validation of rp hplc method for simultaneous
Development and validation of rp hplc method for simultaneousDevelopment and validation of rp hplc method for simultaneous
Development and validation of rp hplc method for simultaneous
 
Binomial and Poission Probablity distribution
Binomial and Poission Probablity distributionBinomial and Poission Probablity distribution
Binomial and Poission Probablity distribution
 
Slides faod - the other mitochondrial energy diseases - vockley
Slides   faod - the other mitochondrial energy diseases - vockleySlides   faod - the other mitochondrial energy diseases - vockley
Slides faod - the other mitochondrial energy diseases - vockley
 
Statistics
StatisticsStatistics
Statistics
 
Student’s t test
Student’s  t testStudent’s  t test
Student’s t test
 
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassayExpt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
 
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassayExpt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
Expt. 1 Bioassay of serotonin using rat fundus strip by three point bioassay
 
Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014
 
Statistics
StatisticsStatistics
Statistics
 
Tables and Formulas for Sullivan, Statistics Informed Decisio.docx
Tables and Formulas for Sullivan, Statistics Informed Decisio.docxTables and Formulas for Sullivan, Statistics Informed Decisio.docx
Tables and Formulas for Sullivan, Statistics Informed Decisio.docx
 
Calculate and Interpret Pharmacokinetic Parameters of a Given Drug
Calculate and Interpret Pharmacokinetic Parameters of a Given DrugCalculate and Interpret Pharmacokinetic Parameters of a Given Drug
Calculate and Interpret Pharmacokinetic Parameters of a Given Drug
 
DNA Microarray
DNA MicroarrayDNA Microarray
DNA Microarray
 

Último

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 

Último (20)

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Using Anova to Evaluate Differential Protein Expression

  • 1. Using ANOVA to evaluate differential protein expression Shaun Broderick The Ohio State University OARDC
  • 2. The research question: Which proteins are differentially expressed during flower senescence? • Senescence is an active stage of flower development • Senescence activates the degradation of floral proteins – Proteases, nucleases, electron transport proteins, lipoxygenases • Remobilization of nutrients – transmembrane proteins (i.e. pumps) Figure Credit: Bai et al., 2010 © 2010 The Author(s).
  • 3. Why petunias? Petunias serve as a model organism for studying flower senescence because they have a well-defined senescence pattern Figure Credit: Bai et al., 2010 © 2010 The Author(s).
  • 4. Experimental design for protein analysis • Treatments: pollinated and unpollinated • Senescence cycle is completed in 72 hours— corollas will be collected at 0, 24, 48 and 72 hours • 8 corollas will be pooled from at least 3 plants for each time point • Three 2D gels will be poured in each rep • Total soluble protein will be extracted from corollas and separated on 2D gel • Protein spots will be cut from gel and sequenced
  • 5. 2D-gel electrophoresis: Finding differentially expressed proteins • Extract all soluble proteins from senescing corollas • Extract all soluble proteins from non- senescing corollas of the same developmental stage • Proteins can be separated by their physical properties of net charge and molecular weight
  • 6. Protein changes visualized by 2D gels 2nd separated by mass (smaller proteins move faster) 1st separated by their isoelectric point (i.e. charge)
  • 7. Pollinated vs. unpollinated proteins Pollinated Unpollinated Potential problem: Variation in protein extracts will change protein patterns and amounts; how do we set an objective cut-off for “differential expression”?
  • 8. Data analysis overview • Use SAS to rearrange a data set • Evaluate distribution of data • Transform data • Estimate an experiment-wide cutoff
  • 9. Analyzing the data • What analyses would be most appropriate? • Do the data fit the assumptions for ANOVA (normality)? • Transform data (either in Excel or SAS) • How will the data need to be structured? • What are the main factors in the model? • Statistically determine the cutoff point (i.e. least significant difference [LSD]) for statistical significance
  • 10. Are the data normally distributed? Useful SAS code for visualizing the data options formdlim='~'; data one; title 'Bai Protein Expression data - sequenced Proteins'; infile 'C:Documents and SettingsjoneslabDesktopSeqProt.csv' dlm=',' firstobs=2 ; input protein $ SSP WT01 WT02 WT03 WTU241 WTU242 WTU243 WTU481 WTU482 WTU483 WTU721 WTU722 WTU723 HAP241 HAP242 HAP243 HAP481 HAP482 HAP483 HAP721 HAP722 HAP723; Proc univariate normal plot; var WT01 WT02 WT03 WTU241 WTU242 WTU243; run;
  • 11. Are the data normally distributed? options formdlim='~'; data one; title 'Bai Protein Expression data - sequenced Proteins'; infile 'C:Documents and SettingsjoneslabDesktopSeqProt.csv' dlm=',' firstobs=2 ; input protein $ SSP WT01 WT02 WT03 WTU241 WTU242 WTU243 WTU481 WTU482 WTU483 WTU721 WTU722 WTU723 HAP241 HAP242 HAP243 HAP481 HAP482 HAP483 HAP721 HAP722 HAP723; Proc univariate normal plot; var WT01 WT02 WT03 WTU241 WTU242 WTU243; run; The data are not normally distributed. The histogram should appear as a bell shaped curve turned on its side. The normal probability plot should appear as a straight line. These data require transformation.
  • 12. options formdlim='~'; Transformed via log + 1 data one; title 'Bai Protein Expression data - sequenced Proteins'; infile 'C:Documents and SettingsjoneslabDesktopSeqProt.csv' dlm=',' firstobs=2 ; input protein $ SSP WT01 WT02 WT03 WTU241 WTU242 WTU243 WTU481 WTU482 WTU483 WTU721 WTU722 WTU723 HAP241 HAP242 HAP243 HAP481 HAP482 HAP483 HAP721 HAP722 HAP723; lnWT01 = log(WT01 + 1); lnWT02 = log(WT02 + 1); lnWT03 = log(WT03 + 1); Data now fit the lnWTU241 = log(WTU241 + 1); ANOVA assumption of lnWTU242 = log(WTU242 + 1); lnWTU243 = log(WTU243 + 1); normality. lnWTU481 = log(WTU481 + 1); lnWTU482 = log(WTU482 + 1); lnWTU483 = log(WTU483 + 1); lnWTU721 = log(WTU721 + 1); lnWTU722 = log(WTU722 + 1); lnWTU723 = log(WTU723 + 1); lnHAP241 = log(HAP241 +1); lnHAP242 = log(HAP242 +1); lnHAP243 = log(HAP243 +1); lnHAP481 = log(HAP481 +1); lnHAP482 = log(HAP482 +1); lnHAP483 = log(HAP483 +1); lnHAP721 = log(HAP721 + 1); lnHAP722 = log(HAP722 + 1); lnHAP723 = log(HAP723 + 1); Proc univariate normal plot; var lnWT01 lnWT02 lnWT03 lnWTU241 lnWTU242 lnWTU243; run;
  • 13. Structuring data for SAS EXP = lnWT01; treatment = 0; time = 0; Rep = 1; output; EXP = lnWT02; treatment = 0; time = 0; Rep = 2; output; EXP = lnWT03; treatment = 0; time = 0; Rep = 3; output; EXP = lnWTU241; treatment = 1; time = 24; Rep = 1; output; EXP = lnWTU242; treatment = 1; time = 24; Rep = 2; output; EXP = lnWTU243; treatment = 1; time = 24; Rep = 3; output; EXP = lnWTU481; treatment = 1; time = 48; Rep = 1; output; EXP = lnWTU482; treatment = 1; time = 48; Rep = 2; output; EXP = lnWTU483; treatment = 1; time = 48; Rep = 3; output; EXP = lnWTU721; treatment = 1; time = 72; Rep = 1; output; EXP = lnWTU722; treatment = 1; time = 72; Rep = 2; output; EXP = lnWTU723; treatment = 1; time = 72; Rep = 3; output; EXP = lnHAP241; treatment = 2; time = 24; rep = 1; output; EXP = lnHAP242; treatment = 2; time = 24; rep = 2; output; EXP = lnHAP243; treatment = 2; time = 24; rep = 3; output; EXP = lnHAP481; treatment = 2; time = 48; rep = 1; output; EXP = lnHAP482; treatment = 2; time = 48; rep = 2; output; The baseline for protein quantities is at 0 hours after pollination.
  • 14. SAS Command: Proc GLM Main effects (all fixed within the model): • protein (protein spots on the gel) • treatment (unpollinated/pollinated) • time (after pollination) • rep (3, 2D gels per treatment) Proc glm; class protein treatment time rep; model EXP = protein treatment time rep rep*protein rep*treatment rep*time treat*protein time*protein time*treatment; Proc varcomp method = REML; class protein treatment time rep; model EXP = protein treatment time rep rep*protein rep*treatment rep*time treat*protein time*protein time*treatment; quit;
  • 15. Output of Proc GLM The GLM Procedure Dependent Variable: EXP Sum of Source DF Squares Mean Square F Value Pr > F Model 989 36342.17871 36.74639 20.22 <.0001 Error 1926 3499.80085 1.81713 Corrected Total 2915 39841.97956 Mean Square Error (MSE) R-Square Coeff Var Root MSE EXP Mean 0.912158 17.39963 1.348011 7.747356 Source DF Type III SS Mean Square F Value Pr > F protein 139 22324.65836 160.60905 88.39 <.0001 treatment 1 519.20558 519.20558 285.73 <.0001 time 2 108.58773 54.29387 29.88 <.0001 Rep 2 0.35196 0.17598 0.10 0.9077 Protein*Rep 278 33.08495 0.11901 0.07 1.0000 treatment*Rep 2 2.07059 1.03530 0.57 0.5658 time*Rep 4 1.39061 0.34765 0.19 0.9430 Protein*treatment 139 5433.79977 39.09208 21.51 <.0001 Protein*time 278 4550.96421 16.37037 9.01 <.0001 treatment*time 2 216.32514 108.16257 59.52 <.0001 Sources in red were found to be significant. Rep is not significant.
  • 16. Appropriate cut off • The data are inherently noisy • The cut off (or least significant difference, LSD) should be based on the variance of the data, or Mean Square Error and the level of statistical significance • The LSD can be calculated with e[t×√(MSE/n)]/2, where t is 1.96 for P = 0.05, 2.576 for P = 0.01, and 3.291 for P = 0.001 and n is the number of replicates
  • 17. Use Excel for determining LSD This is calculated and Choose the appropriate α- given in the SAS output level for the experiment; n = 3. LSD = [t*(SQRT(MSE/n))]/2 MSE LSD Fold α-level 1.817 0.762 2.144 p = 0.05 1.817 1.002 2.725 p = 0.01 1.817 1.281 3.599 p = 0.001 The LSD must be “untransformed” using Fold = (LSD)e This can be done in excel with Fold=power(2.718, LSD) Why is it a fold difference? A log transformation was necessary to fit normality assumptions. Log data are only interpretable as fold differences What parameters are used to select an appropriate significance level? For a more conservative value, you would select a smaller α-level. This reduces the chance of a Type I error
  • 18. Summary 1. Determine if the data fit the assumptions 2. Transform as needed 3. Restructure the data in SAS 4. Determine the main effects for the experiment 5. Use Proc GLM to determine the MSE 6. Calculate the cutoff (LSD) for statistical significance
  • 19. References Cited • Bai, S., B. Willard, L. J. Chapin, M. T. Kinter, D. M. Francis, A. D. Stead, and M. L. Jones. 2010. Proteomic analysis of pollination-induced corolla senescence in petunia. Journal of Experimental Botany 61:1089-1109. (Available online at: http://dx.doi.org/10.1093/jxb/erp373) (verified 20 Oct 2011). • Kerr, M. K., M. Martin, and G. A. Churchill. 2000. Analysis of variance for gene expression microarray data. Journal of Computational Biology 7:819-837. (Available online at: http://dx.doi.org/10.1089/10665270050514954) (verified 20 Oct 2011).