Statistics for Next Generation Sequencing (RNA-Seq)
Distribution?
• 25,000 genes, each with counts over several samples
     • 2 conditions, each with several replicates

• Recall: log-Normal for microarrays
     • Based on fits to actual data with many replicates

• No equivalent data for RNA-Seq
     • So go back to first principles
RNA-Seq Setting
RNA-Seq Counts Distribution
Hypergeometric Distribution
Simplifying the Hypergeometric Distribution
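The derivation on these slides is not preserved in this transcript. As a rough sketch of the usual argument, assume reads are drawn without replacement from a large pool of fragments: when the pool is much larger than the number of reads and one gene's share of the pool is small, the hypergeometric probabilities are close to Poisson probabilities with λ = reads × (gene's fraction of the pool). The pool size, fragment count and read depth below are made-up illustration values.

```python
from math import comb, exp, factorial

def hypergeom_pmf(k, pool, gene_fragments, reads):
    """P(exactly k of `reads` draws hit the gene), sampling without replacement."""
    ways_hit = comb(gene_fragments, k)
    ways_miss = comb(pool - gene_fragments, reads - k)
    return ways_hit * ways_miss / comb(pool, reads)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical numbers, chosen only for illustration.
POOL = 100_000        # fragments in the library
GENE_FRAGMENTS = 50   # fragments that come from the gene of interest
READS = 2_000         # fragments actually sequenced
LAM = READS * GENE_FRAGMENTS / POOL  # expected count for the gene

def max_abs_gap(max_k=10):
    """Largest pointwise difference between the two distributions for k <= max_k."""
    return max(
        abs(hypergeom_pmf(k, POOL, GENE_FRAGMENTS, READS) - poisson_pmf(k, LAM))
        for k in range(max_k + 1)
    )

if __name__ == "__main__":
    for k in range(5):
        h = hypergeom_pmf(k, POOL, GENE_FRAGMENTS, READS)
        p = poisson_pmf(k, LAM)
        print(f"k={k}: hypergeometric={h:.4f} poisson={p:.4f}")
```

The gap shrinks further as the sequenced fraction of the pool and the gene's share both get smaller.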
The Poisson Distribution
• λ is both the mean and the variance
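A quick numerical check of this property, using the standard Poisson probabilities P(X = k) = e^(-λ) λ^k / k!. The value λ = 4.2 is arbitrary, and the sum is truncated at k = 100, where the remaining tail is negligible for this λ.

```python
from math import exp

def poisson_pmf_table(lam, max_k):
    """Return [P(X=0), ..., P(X=max_k)] via the recurrence p_k = p_(k-1) * lam / k."""
    probs = [exp(-lam)]
    for k in range(1, max_k + 1):
        probs.append(probs[-1] * lam / k)
    return probs

def mean_and_variance(probs):
    """Mean and variance of a distribution given as a list of P(X = k)."""
    mean = sum(k * p for k, p in enumerate(probs))
    second_moment = sum(k * k * p for k, p in enumerate(probs))
    return mean, second_moment - mean * mean

LAM = 4.2  # arbitrary rate chosen for the check
MEAN, VARIANCE = mean_and_variance(poisson_pmf_table(LAM, max_k=100))

if __name__ == "__main__":
    print(f"lambda={LAM} mean={MEAN:.6f} variance={VARIANCE:.6f}")
```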
The Poisson Distribution (Wikipedia)
Examples of quantities modelled as Poisson:
• The number of soldiers killed by horse-kicks each year in each corps in
  the Prussian cavalry. This example was made famous by a book by Ladislaus
  Josephovich Bortkiewicz (1868–1931).
• The number of yeast cells used when brewing Guinness beer. This example was
  made famous by William Sealy Gosset (1876–1937).
• The number of phone calls arriving at a call centre per minute.
• The number of goals in sports involving two competing teams.
• The number of deaths per year in a given age group.
• The number of jumps in a stock price in a given time interval.
• Under an assumption of homogeneity, the number of times a web server is
  accessed per minute.
• The number of mutations in a given stretch of DNA after a certain amount of
  radiation.
• The proportion of cells that will be infected at a given multiplicity of infection.
Is Mean = Variance for NGS?
– Variance ∝ Mean²
[Figure, log scale: the white line is the Poisson line]
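One way to run this check on a count table is to compute, for each gene, the mean and the variance across replicates and compare the two. The counts below are invented for illustration; with real data, each row would hold one gene's read counts across the replicates of one condition.

```python
from statistics import mean, variance

# Invented counts for illustration: gene -> read counts in 4 replicates of one condition.
COUNTS = {
    "geneA": [6, 14, 9, 11],
    "geneB": [95, 140, 80, 125],
    "geneC": [950, 1400, 800, 1250],
}

def mean_variance_table(counts_by_gene):
    """Return (gene, mean, variance, variance / mean) for each gene."""
    rows = []
    for gene, replicates in counts_by_gene.items():
        m = mean(replicates)
        v = variance(replicates)  # sample variance, n - 1 denominator
        rows.append((gene, m, v, v / m))
    return rows

if __name__ == "__main__":
    for gene, m, v, ratio in mean_variance_table(COUNTS):
        print(f"{gene}: mean={m:.1f} variance={v:.1f} variance/mean={ratio:.2f}")
```

Under a Poisson model the last column would stay near 1 for every gene; a ratio that grows with the mean is the over-dispersion pattern.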
Why this Over-Dispersion?

• The Poisson model only models technical variation, not biological variation

• Biological variation induces more variance than captured by the Poisson model
   – No reason for difference from microarrays, where SD ∝ Mean
     (or Variance ∝ Mean²)
[Figure: SD vs Mean for Microarrays]
Handling Over-Dispersion
What Distribution is X?

• Log-Normal, as for arrays?

• The combination of log-Normal and Poisson doesn't have a neat closed form (i.e., formula)

• So assume a Gamma distribution
   – Poisson + Gamma -> Negative Binomial
   – Traditionally used to handle over-dispersion
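A small simulation of the Poisson + Gamma construction, under the assumption that each replicate's Poisson rate is itself drawn from a Gamma distribution (the biological variation) and the count is then Poisson given that rate (the technical variation). The mean of 10 and the shape of 2 are arbitrary, and the Poisson sampler is Knuth's simple multiplication method, which is adequate only for modest rates.

```python
import random
from math import exp
from statistics import mean, variance

def poisson_sample(lam, rng):
    """Knuth's method: multiply uniforms until the product drops to e^-lam or below."""
    threshold = exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def gamma_poisson_sample(mu, shape, rng):
    """Draw a rate from Gamma(shape, scale=mu/shape), then a Poisson count at that rate."""
    rate = rng.gammavariate(shape, mu / shape)  # mean mu, variance mu^2 / shape
    return poisson_sample(rate, rng)

def simulate(mu=10.0, shape=2.0, n=20_000, seed=1):
    """Return the sample mean and variance of n Gamma-Poisson counts."""
    rng = random.Random(seed)
    draws = [gamma_poisson_sample(mu, shape, rng) for _ in range(n)]
    return mean(draws), variance(draws)

if __name__ == "__main__":
    m, v = simulate()
    print(f"sample mean={m:.2f} sample variance={v:.2f}")
    print("theory: mean=10, variance = mu + mu^2/shape = 60")
```

A pure Poisson with the same mean would have variance 10; the extra μ²/shape term comes entirely from the Gamma-distributed rate.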
The Gamma Distribution
[Figure annotation: control on right tail]
The Negative Binomial Distribution
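The slide's own formula is not preserved here. For reference, a sketch of the Negative Binomial probabilities in a mean/shape parameterization, which is an assumption about notation: with mean μ and Gamma shape r, P(X = k) = Γ(k + r) / (k! Γ(r)) · (r / (r + μ))^r · (μ / (r + μ))^k, and the variance is μ + μ²/r. The code checks these moments numerically for arbitrary values μ = 10, r = 2.

```python
from math import exp, lgamma, log

def neg_binomial_pmf(k, mu, r):
    """P(X = k) for a Negative Binomial with mean mu and shape r (computed in log space)."""
    log_p = (
        lgamma(k + r) - lgamma(k + 1) - lgamma(r)
        + r * log(r / (r + mu))
        + k * log(mu / (r + mu))
    )
    return exp(log_p)

def moments(mu, r, max_k=2000):
    """Total probability, mean and variance, summing the pmf up to max_k."""
    probs = [neg_binomial_pmf(k, mu, r) for k in range(max_k + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    second_moment = sum(k * k * p for k, p in enumerate(probs))
    return total, mean, second_moment - mean * mean

if __name__ == "__main__":
    total, mean, var = moments(mu=10.0, r=2.0)
    print(f"total={total:.6f} mean={mean:.4f} variance={var:.4f}")
```

For large μ the μ²/r term dominates, which matches the Variance ∝ Mean² behaviour noted earlier.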
Estimating Parameters
[Figure: curve fit]
• For each gene, estimate the mean across replicates, and then estimate the
  variance from the curve fit above
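The fitted curve itself is not preserved in this transcript. As one possible sketch, assume the curve has the Negative Binomial form variance = mean + α · mean² with a single α shared by all genes, fit α by least squares on per-gene (mean, variance) pairs, and then read each gene's variance off the curve from its mean. The shared-α form, the unweighted fit and the toy counts are all assumptions for illustration.

```python
from statistics import mean, variance

def fit_alpha(means, variances):
    """Least-squares alpha in  variance - mean = alpha * mean^2  (no intercept)."""
    numerator = sum((v - m) * m * m for m, v in zip(means, variances))
    denominator = sum(m ** 4 for m in means)
    return max(numerator / denominator, 0.0)  # clamp: negative alpha has no meaning here

def curve_variance(m, alpha):
    """Variance read off the fitted curve for a gene with mean m."""
    return m + alpha * m * m

# Invented counts for illustration: gene -> read counts across replicates.
COUNTS = {
    "geneA": [6, 14, 9, 11],
    "geneB": [95, 140, 80, 125],
    "geneC": [950, 1400, 800, 1250],
}

GENE_MEANS = {gene: mean(reps) for gene, reps in COUNTS.items()}
GENE_VARS = {gene: variance(reps) for gene, reps in COUNTS.items()}
ALPHA = fit_alpha(list(GENE_MEANS.values()), list(GENE_VARS.values()))
CURVE_VARS = {gene: curve_variance(m, ALPHA) for gene, m in GENE_MEANS.items()}

if __name__ == "__main__":
    print(f"alpha={ALPHA:.4f}")
    for gene in COUNTS:
        print(f"{gene}: mean={GENE_MEANS[gene]:.1f} "
              f"raw variance={GENE_VARS[gene]:.1f} curve variance={CURVE_VARS[gene]:.1f}")
```

Reading the variance from the curve rather than from each gene's few replicates borrows information across genes; note that an unweighted fit like this one is dominated by the highest-expressed genes.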
Bias Correction
Thank You

Introduction to statistics iii
