Budget friendly sample sizes for genomics research
Biostatistician, bioinformatician
Ognjen Milicevic, MD
Why do you need a biostatistician?
Common biostatistics tasks
● Cleaning and transforming data
● Data description
● Statistical testing
● Tabulation and visualization
● Bioinformatics (applied statistics for genomics)
● Post-hoc power calculations
● Complain they weren't consulted earlier
Post-hoc sample size / power analysis
● Due to convenience, we justify choices already made
● Find the similar effect size in literature
● Use the posterior distribution as prior
● Set the desired power (80-100%)
● Adjust as needed for dropout, loss, margin-of-error
● Obtain the sample size you already have
Make a wish, biostatistician
Dear bioinformatician, how many samples do we need to sequence to investigate...
NO CONVENIENCE!
● Not routinely done
● Effect size unknown
● Literature not helpful
● Multiple unknown genes
● Distribution is complex
● ...
RNA sequencing around the internet
DATA SCIENCE OF RNA SEQUENCING
Natural variability of RNA per gene
De Torrente et al. (2020)
Surprisingly, the expression of less than 50% of all genes
was Normally-distributed, with other distributions including
Gamma, Bimodal, Cauchy, and Lognormal also
represented.
Liu et al. (2019)
Based on the analysis of a group of real gene expression
profiles, this study reveal that the primary density
distributions of the real profiles are normal/log-normal and
t distributions, accounting for 80% and 19% respectively.
20K+ genes
Representing RNAs with fragments
Gamma-Poisson distribution
Count and normalize to quantify (TPM)
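The "count and normalize" step can be sketched as follows. This is a minimal illustration of the standard TPM (transcripts per million) definition; the gene names, counts and lengths are made-up values, not data from the talk.

```python
def tpm(counts, lengths_kb):
    """Convert raw fragment counts to TPM.

    counts     -- fragments assigned to each gene
    lengths_kb -- gene (transcript) lengths in kilobases
    """
    # Step 1: length-normalize, so long genes are not favoured
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    # Step 2: scale so that all genes in the sample sum to one million
    total = sum(rates)
    return [r / total * 1_000_000 for r in rates]


# Hypothetical three-gene sample
genes = ["geneA", "geneB", "geneC"]
counts = [10, 20, 30]
lengths_kb = [1.0, 4.0, 2.0]

tpm_values = tpm(counts, lengths_kb)
for name, value in zip(genes, tpm_values):
    print(f"{name}: {value:.1f} TPM")
```

Because every sample sums to the same total, TPM values are comparable across genes within a sample; comparing across samples still requires the statistical modelling discussed later.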
Overview of the pipeline
[Diagram. Pipeline steps: tissue sample, chemical preparation, sequencing. Layers of variability: effect between groups, inter-individual variation in RNA, batch effects, representation variability.]
Count matrix and metadata
Each gene is an independent outcome
LAYERS UPON LAYERS OF VARIABILITY
So, what about those sample sizes?
COVID-19 RNA characterization
Example project
RNA characterization of COVID-19 (2021) - Plan
● Total RNA – virus and host (human)
● Nasopharyngeal swabs and blood samples
● Paired design (samples taken on hospital admission and at discharge)
● 18 individuals, total of 72 samples
● Which biological pathways are affected? (DEG)
● What can we say about the viral load? (metagenomics)
Estimating sample size for RNA
● Theoretical models with assumed distributions
● Parameters inferred from previous datasets
● R-packages: RNASeqDesign, PROPER, powsimR, ssizeRNA
● Web tool: RNASeqSampleSize
● Results vary between tools
● If cost is not relevant, choose the most conservative (largest)
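To make "theoretical models with assumed distributions" concrete, here is a minimal sketch of one widely used closed-form approximation for negative binomial (Gamma-Poisson) counts. It is not taken from the slides and is not necessarily what the listed packages compute; the fold change, depth and coefficient of variation below are assumed values chosen only for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(fold_change, depth, cv, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect a fold change for one gene.

    fold_change -- effect size to detect (e.g. 2 for a doubling)
    depth       -- average read count for the gene
    cv          -- biological coefficient of variation
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Poisson counting noise (1/depth) plus biological variability (cv^2)
    variance_term = 1 / depth + cv ** 2
    n = 2 * (z_alpha + z_beta) ** 2 * variance_term / math.log(fold_change) ** 2
    return math.ceil(n)


# Assumed inputs: 2-fold change, 20 reads per gene, CV of 0.4
n_single_gene = samples_per_group(fold_change=2, depth=20, cv=0.4)

# A crude Bonferroni-style alpha for 20,000 genes shows the cost of multiple testing
n_genome_wide = samples_per_group(fold_change=2, depth=20, cv=0.4, alpha=0.05 / 20_000)

print(n_single_gene, n_genome_wide)
```

The answer depends heavily on the assumed variability and effect size, which is one reason different tools return different estimates.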
Proposed approach
● Perform one estimate and use it
● Remove unwanted variability (batch effect)
● Reduce variability with paired design
● Use meaningful metadata
● Filter the genes
Remove unwanted variability
A number of methods based on singular value decomposition (SVD) remove high-level batch effects without tracing them to specific interpretable variables.
Housekeeping or control genes can be used as markers.
● SVA
● RUVSeq
These methods produce new surrogate variables.
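The idea behind surrogate variables can be sketched on a toy example. This is not the SVA or RUVSeq algorithm; it is a simplified illustration, assuming a noiseless made-up matrix with one hidden batch factor, that removes the known group means and takes the dominant direction of the leftover variation by power iteration.

```python
import math


def group_residuals(matrix, groups):
    """Subtract each gene's per-group mean, leaving variation the design does not explain."""
    residuals = []
    for row in matrix:
        means = {}
        for g in set(groups):
            vals = [x for x, gg in zip(row, groups) if gg == g]
            means[g] = sum(vals) / len(vals)
        residuals.append([x - means[g] for x, g in zip(row, groups)])
    return residuals


def top_right_singular_vector(matrix, iterations=50):
    """Power iteration on R^T R: dominant sample-space direction of the residuals."""
    n_samples = len(matrix[0])
    v = [1.0] + [0.0] * (n_samples - 1)  # arbitrary start vector
    for _ in range(iterations):
        scores = [sum(r * x for r, x in zip(row, v)) for row in matrix]  # R v
        v = [sum(s * row[j] for s, row in zip(scores, matrix)) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Made-up data: 4 genes x 8 samples, with a hidden batch the analyst did not record
groups = [0, 0, 1, 1, 0, 0, 1, 1]          # known design (e.g. control vs case)
hidden_batch = [1, 1, 1, 1, -1, -1, -1, -1]
base, effect, loading = [5, 8, 3, 6], [2, 0, -1, 0], [1.5, -1.0, 0.5, 2.0]
expression = [
    [base[g] + effect[g] * groups[s] + loading[g] * hidden_batch[s] for s in range(8)]
    for g in range(4)
]

residuals = group_residuals(expression, groups)
surrogate = top_right_singular_vector(residuals)

# Remove each gene's projection onto the surrogate variable
corrected = []
for row, res in zip(expression, residuals):
    score = sum(r * v for r, v in zip(res, surrogate))
    corrected.append([x - score * v for x, v in zip(row, surrogate)])

print([round(v, 3) for v in surrogate])
print([round(x, 3) for x in corrected[0]])
```

In this toy case the surrogate variable recovers the hidden batch exactly because batch is balanced across groups. When batch is confounded with the group of interest, no mathematical correction can separate the two.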
Colleague quote:
"Once I see batch effects, I can correct them mathematically, but I
never trust that dataset again."
Batch effects work against collaborative science!
Paired design - taking control samples from the same patients after resolution or before the event.
● Increases power
● Not all analysis frameworks can take advantage of it
● Sometimes biologically difficult
● Halves the degrees of freedom (DF)
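The trade-off between power and degrees of freedom can be seen in a small worked example. The sketch below uses a plain t statistic on made-up numbers for a single gene (18 individuals with very different baselines and a consistent shift of about 1 unit), not the GLM framework or the data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic on within-individual differences; DF = n - 1."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def unpaired_t(before, after):
    """Pooled-variance two-sample t statistic; DF = n1 + n2 - 2."""
    n1, n2 = len(before), len(after)
    pooled_var = ((n1 - 1) * stdev(before) ** 2 + (n2 - 1) * stdev(after) ** 2) / (n1 + n2 - 2)
    t = (mean(after) - mean(before)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up expression values: baselines spread from 5 to 22, shift of 1 +/- 0.2
before = [5.0 + i for i in range(18)]
after = [b + 1.0 + (0.2 if i % 2 == 0 else -0.2) for i, b in enumerate(before)]

t_paired, df_paired = paired_t(before, after)
t_unpaired, df_unpaired = unpaired_t(before, after)

print(f"paired:   t = {t_paired:.2f}, DF = {df_paired}")
print(f"unpaired: t = {t_unpaired:.2f}, DF = {df_unpaired}")
```

Pairing cancels the inter-individual spread, so the same shift gives a far larger test statistic even though the degrees of freedom drop from 34 to 17.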
Meaningful metadata
Gender and age are always potentially relevant.
Collect metrics of sample quality (before and after sequencing).
Disease subtypes can be a covariate or group variable.
Good metadata also helps choose which samples to sequence when only a subset can be sequenced.
Filter genes
Multiple testing correction has to cover 20K+ genes.
Remove mostly unexpressed genes to reduce the number of tests.
A priori removal (before testing, independent of the group labels) is allowed.
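The effect of filtering on the multiple testing burden can be sketched with the Benjamini-Hochberg procedure. The p-values, mean counts and the expression threshold of 10 below are made-up assumptions for illustration; the filter uses only expression level, never the test result.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (false discovery rate) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes as (mean count, raw p-value):
# 4 expressed genes plus 16 essentially unexpressed ones
genes = [(250, 0.001), (120, 0.004), (80, 0.012), (40, 0.03)] + [(0.5, 0.9)] * 16

# Without filtering: correct across all 20 tests
adj_all = benjamini_hochberg([p for _, p in genes])
sig_all = sum(q < 0.05 for q in adj_all)

# With a priori filtering on expression (assumed threshold: mean count >= 10)
kept = [(c, p) for c, p in genes if c >= 10]
adj_filtered = benjamini_hochberg([p for _, p in kept])
sig_filtered = sum(q < 0.05 for q in adj_filtered)

print(f"significant without filtering: {sig_all}")
print(f"significant with filtering:    {sig_filtered}")
```

The raw p-values are identical in both runs; only the number of tests being corrected for changes.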
Results
● edgeR GLM
● Nasal DEG p<0.05: 40 (paired) / 51 (unpaired)
● Blood DEG p<0.05: 76 (paired) / 2 (unpaired)
● Every parameter choice changes the results
● Validation?
Annotation representation testing – Panther.db
● Annotation is a subset of genes
● Multiple available annotation sets (structure, function, pathway...)
● We only use significant genes
● Overrepresentation test – chi-square to compare observed and expected frequencies
● Enrichment test – Mann-Whitney to test randomness of ranks
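The overrepresentation test can be sketched as a chi-square test on a 2x2 table (significant or not, annotated or not). This is a hand-rolled illustration, not the PANTHER implementation, and the counts are made up. It applies no continuity correction; for small expected counts an exact test would be the safer choice.

```python
import math


def overrepresentation_chi2(n_genes, n_annotated, n_significant, n_overlap):
    """Chi-square test (1 DF) for overlap between a gene list and an annotation set."""
    observed = [
        [n_overlap, n_significant - n_overlap],
        [n_annotated - n_overlap, n_genes - n_significant - n_annotated + n_overlap],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_genes
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    expected_overlap = n_significant * n_annotated / n_genes
    return expected_overlap, chi2, p_value


# Hypothetical numbers: 20,000 genes, an annotation of 1,000 genes,
# 100 significant genes, 15 of which carry the annotation
expected_overlap, chi2, p_value = overrepresentation_chi2(20_000, 1_000, 100, 15)
print(f"expected {expected_overlap:.1f}, observed 15, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

Each annotation set is tested this way, so the resulting p-values need their own multiple testing correction.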
Molecular function in blood (PAIRED)
● Increased immunoglobulin binding
● Reduced smell (in blood!)
● Reduced oxygen binding and carrier activity
● We consider the result validated
Takeaways of the study
● Study rescued by pairing
● No batch to correct
● Almost no metadata
● Smaller signal in blood
● Specific tissue (nasal) more robust
WHAT HAPPENED?
Data science implications
Reduced individual variation
[Pipeline diagram repeated, with an added label "Intra"]
Reduced batch effects
[Pipeline diagram repeated, with an added label "Intra"]
Easier to control for batches
● Pairing absorbs a proportion of batch effects
● Usually 8 lanes in a flowcell
● Focus on pairs instead of whole samples
● Aggregation of datasets easier
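One way to "focus on pairs" when planning a run is to keep both samples of an individual on the same lane, so that lane-level batch effects cancel within the pair. The sketch below is a hypothetical layout helper; the patient IDs and the round-robin scheme are assumptions, not the layout used in the study.

```python
def assign_pairs_to_lanes(individuals, timepoints=("admission", "discharge"), n_lanes=8):
    """Place all samples of one individual on the same lane, round-robin across lanes."""
    layout = {lane: [] for lane in range(1, n_lanes + 1)}
    for index, person in enumerate(individuals):
        lane = index % n_lanes + 1
        for timepoint in timepoints:
            layout[lane].append((person, timepoint))
    return layout


# Hypothetical IDs for 18 individuals, one tissue, two timepoints each
individuals = [f"P{i:02d}" for i in range(1, 19)]
layout = assign_pairs_to_lanes(individuals)

for lane, samples in layout.items():
    people = sorted({person for person, _ in samples})
    print(f"lane {lane}: {len(samples)} samples from {', '.join(people)}")
```

With this layout a lane effect shifts both members of a pair equally, so it drops out of the within-pair comparison.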
Technical downsides of pairing
● Loss of half of the degrees of freedom
● Many frameworks cannot use it as easily as GLM-based ones
● RNA is used for other analyses:
○ SUPPA2 for alternative splicing
○ Building an empirical distribution from all pairs of samples
○ Restricting these to matched pairs would drastically reduce the number of observations
SHOULD WE ALWAYS PAIR?
Medical implications
Tissue implications
● Specific tissues have robust signatures without pairing
● Blood reflects many tissues:
○ Weaker signal
○ Local changes reflected
● Systemic effects are found only in blood
● Always available for sampling (minimally invasive)
● Blood analysis benefits from pairing
Utility implications
● Paired designs are easier to aggregate into meta-studies (robust to batch effects)
● Blood controls can be used as unpaired controls for other studies (if healthy enough)
● Solves the problem of finding controls
● If controls are taken after resolution, their health is questionable (e.g. long COVID)
● Some chronic diseases cannot be caught early and never resolve, so pairing is impossible
Example – cardiovascular events
● We are interested in markers of plaque progression/instability
● Patient checkup and sampling every X months
● Sequencing is expensive, sampling and storing is not
● Sequence only the previous two samples before the event
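The "store everything, sequence selectively" rule can be sketched as a small selection helper. The dates and function name are hypothetical; the only rule taken from the slide is to pick the two most recent stored samples that precede the event.

```python
from datetime import date


def samples_to_sequence(sampling_dates, event_date, n_before=2):
    """Return the n most recent sampling dates that fall before the event."""
    earlier = sorted(d for d in sampling_dates if d < event_date)
    return earlier[-n_before:]


# Hypothetical patient: samples stored at routine checkups, then an event occurs
stored = [
    date(2021, 1, 10),
    date(2021, 4, 12),
    date(2021, 7, 9),
    date(2021, 10, 11),
    date(2022, 1, 14),  # taken after the event, so not eligible
]
event = date(2021, 11, 2)

selected = samples_to_sequence(stored, event)
print([d.isoformat() for d in selected])
```

Only two of the five stored samples are sequenced, and together they form a within-patient pair leading up to the event.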
Example – neurodegenerative disease (ALS)
● We cannot predict the disease (10% familial)
● Patient available for sampling once diseased
● Sequence patient samples taken sufficiently far apart
● We cannot find the root cause of ALS, as we are not catching the initial event
● We can find signatures of neuronal suffering and death, which is an actionable point
● Generalizes to all chronic diseases
Example – cancer
● For DNA, tumor is matched with blood sample control
● For RNA, we need the normal surrounding tissue
● Sampling the healthy normal target tissue may be problematic
● Tissue margin – potential normal sample
● Admixture of tumor in normal reduces the signal (but not critically for RNA)
Many thanks to...
● Institute for Biocides and Medical Ecology for providing the samples and sequencing
● HTEC Group for providing computational resources and support
● School of Medicine, University of Belgrade for supporting research
● Thanks to DSC organizers for the invite
● Last but not least...
...THANK YOU FOR LISTENING!
ognjen.milicevic@med.bg.ac.rs
ognjen.milicevic@htecgroup.com
ognjen011@gmail.com

Call Girls Laxmi Nagar 9999965857 Cheap and Best with original Photosparshadkalavatidevi7
 
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...ggsonu500
 
2025 Inpatient Prospective Payment System (IPPS) Proposed Rule
2025 Inpatient Prospective Payment System (IPPS) Proposed Rule2025 Inpatient Prospective Payment System (IPPS) Proposed Rule
2025 Inpatient Prospective Payment System (IPPS) Proposed RuleShelby Lewis
 
EMS and Extrication: Coordinating Critical Care
EMS and Extrication: Coordinating Critical CareEMS and Extrication: Coordinating Critical Care
EMS and Extrication: Coordinating Critical CareRommie Duckworth
 
FAMILY in sociology for physiotherapists.pptx
FAMILY in sociology for physiotherapists.pptxFAMILY in sociology for physiotherapists.pptx
FAMILY in sociology for physiotherapists.pptxMumux Mirani
 

Último (20)

Single Assessment Framework - What We Know So Far
Single Assessment Framework - What We Know So FarSingle Assessment Framework - What We Know So Far
Single Assessment Framework - What We Know So Far
 
Call Girls in Adil Nagar 7001305949 Free Delivery at Your Door Model
Call Girls in Adil Nagar 7001305949 Free Delivery at Your Door ModelCall Girls in Adil Nagar 7001305949 Free Delivery at Your Door Model
Call Girls in Adil Nagar 7001305949 Free Delivery at Your Door Model
 
Russian Call Girls Mohan Nagar | 9711199171 | High Profile -New Model -Availa...
Russian Call Girls Mohan Nagar | 9711199171 | High Profile -New Model -Availa...Russian Call Girls Mohan Nagar | 9711199171 | High Profile -New Model -Availa...
Russian Call Girls Mohan Nagar | 9711199171 | High Profile -New Model -Availa...
 
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
 
Hi,Fi Call Girl In Marathahalli - 7001305949 with real photos and phone numbers
Hi,Fi Call Girl In Marathahalli - 7001305949 with real photos and phone numbersHi,Fi Call Girl In Marathahalli - 7001305949 with real photos and phone numbers
Hi,Fi Call Girl In Marathahalli - 7001305949 with real photos and phone numbers
 
9711199012 Najafgarh Call Girls ₹5.5k With COD Free Home Delivery
9711199012 Najafgarh Call Girls ₹5.5k With COD Free Home Delivery9711199012 Najafgarh Call Girls ₹5.5k With COD Free Home Delivery
9711199012 Najafgarh Call Girls ₹5.5k With COD Free Home Delivery
 
Call Girls Nandini Layout - 7001305949 Escorts Service with Real Photos and M...
Call Girls Nandini Layout - 7001305949 Escorts Service with Real Photos and M...Call Girls Nandini Layout - 7001305949 Escorts Service with Real Photos and M...
Call Girls Nandini Layout - 7001305949 Escorts Service with Real Photos and M...
 
Air-Hostess Call Girls Shanti Nagar - Call 7001305949 Rs-3500 with A/C Room C...
Air-Hostess Call Girls Shanti Nagar - Call 7001305949 Rs-3500 with A/C Room C...Air-Hostess Call Girls Shanti Nagar - Call 7001305949 Rs-3500 with A/C Room C...
Air-Hostess Call Girls Shanti Nagar - Call 7001305949 Rs-3500 with A/C Room C...
 
Gurgaon DLF Phase 5 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Fe...
Gurgaon DLF Phase 5 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Fe...Gurgaon DLF Phase 5 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Fe...
Gurgaon DLF Phase 5 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Fe...
 
Russian Call Girl Chandapura Dommasandra Road - 7001305949 Escorts Service 50...
Russian Call Girl Chandapura Dommasandra Road - 7001305949 Escorts Service 50...Russian Call Girl Chandapura Dommasandra Road - 7001305949 Escorts Service 50...
Russian Call Girl Chandapura Dommasandra Road - 7001305949 Escorts Service 50...
 
Russian Call Girls Sadashivanagar | 7001305949 At Low Cost Cash Payment Booking
Russian Call Girls Sadashivanagar | 7001305949 At Low Cost Cash Payment BookingRussian Call Girls Sadashivanagar | 7001305949 At Low Cost Cash Payment Booking
Russian Call Girls Sadashivanagar | 7001305949 At Low Cost Cash Payment Booking
 
Call Girls Service Bommasandra - Call 7001305949 Rs-3500 with A/C Room Cash o...
Call Girls Service Bommasandra - Call 7001305949 Rs-3500 with A/C Room Cash o...Call Girls Service Bommasandra - Call 7001305949 Rs-3500 with A/C Room Cash o...
Call Girls Service Bommasandra - Call 7001305949 Rs-3500 with A/C Room Cash o...
 
Call Girls Dwarka 9999965857 Cheap & Best with original Photos
Call Girls Dwarka 9999965857 Cheap & Best with original PhotosCall Girls Dwarka 9999965857 Cheap & Best with original Photos
Call Girls Dwarka 9999965857 Cheap & Best with original Photos
 
Housewife Call Girls Nandini Layout - Phone No 7001305949 For Ultimate Sexual...
Housewife Call Girls Nandini Layout - Phone No 7001305949 For Ultimate Sexual...Housewife Call Girls Nandini Layout - Phone No 7001305949 For Ultimate Sexual...
Housewife Call Girls Nandini Layout - Phone No 7001305949 For Ultimate Sexual...
 
Globalny raport: „Prawdziwe piękno 2024" od Dove
Globalny raport: „Prawdziwe piękno 2024" od DoveGlobalny raport: „Prawdziwe piękno 2024" od Dove
Globalny raport: „Prawdziwe piękno 2024" od Dove
 
Call Girls Laxmi Nagar 9999965857 Cheap and Best with original Photos
Call Girls Laxmi Nagar 9999965857 Cheap and Best with original PhotosCall Girls Laxmi Nagar 9999965857 Cheap and Best with original Photos
Call Girls Laxmi Nagar 9999965857 Cheap and Best with original Photos
 
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
Gurgaon Sector 68 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few ...
 
2025 Inpatient Prospective Payment System (IPPS) Proposed Rule
2025 Inpatient Prospective Payment System (IPPS) Proposed Rule2025 Inpatient Prospective Payment System (IPPS) Proposed Rule
2025 Inpatient Prospective Payment System (IPPS) Proposed Rule
 
EMS and Extrication: Coordinating Critical Care
EMS and Extrication: Coordinating Critical CareEMS and Extrication: Coordinating Critical Care
EMS and Extrication: Coordinating Critical Care
 
FAMILY in sociology for physiotherapists.pptx
FAMILY in sociology for physiotherapists.pptxFAMILY in sociology for physiotherapists.pptx
FAMILY in sociology for physiotherapists.pptx
 

[DigiHealth 22] Budget friendly sample sizes for genomics research - Ognjen Milicevic

  • 1. Budget friendly sample sizes for genomics research Biostatistician, bioinformatician Ognjen Milicevic, MD
  • 2. Why do you need a biostatistician?
  • 3. Common biostatistics tasks ● Cleaning and transforming data ● Data description ● Statistical testing ● Tabulation and visualization ● Bioinformatics (applied statistics for genomics) ● Post-hoc power calculations ● ...
  • 4. Common biostatistics tasks ● Cleaning and transforming data ● Data description ● Statistical testing ● Tabulation and visualization ● Bioinformatics (applied statistics for genomics) ● Post-hoc power calculations ● Complain they weren't consulted earlier
  • 5.
  • 6. Post-hoc sample size / power analysis ● Due to convenience, we justify choices already made ● Find the similar effect size in literature ● Use the posterior distribution as prior ● Set the desired power (80-100%) ● Adjust as needed for dropout, loss, margin-of-error ● Obtain the sample size you already have
  • 8. Dear bioinformatician, how many samples do we need to sequence to investigate...
  • 9. NO CONVENIENCE! ● Not routinely done ● Effect size unknown ● Literature not helpful ● Multiple unknown genes ● Distribution is complex ● ...
  • 10. RNA sequencing around the internet
  • 11. DATA SCIENCE OF RNA SEQUENCING
  • 12. Natural variability of RNA per gene De Torrente et al. (2020) Surprisingly, the expression of less than 50% of all genes was Normally-distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented. Liu et al. (2019) Based on the analysis of a group of real gene expression profiles, this study reveal that the primary density distributions of the real profiles are normal/log-normal and t distributions, accounting for 80% and 19% respectively. 20K+ genes
  • 13. Representing RNAs with fragments Gamma-Poisson distribution Count and normalize to quantify (TPM)
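The "count and normalize" step on slide 13 can be sketched as follows. This is a minimal illustration of the standard TPM definition, not the speaker's pipeline; the function name `tpm` and the toy counts and gene lengths are assumptions made for the example.

```python
def tpm(counts, lengths_bp):
    """Convert raw fragment counts to transcripts per million (TPM)."""
    # Step 1: divide each count by gene length in kilobases.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Step 2: rescale so that the sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical toy sample: three genes with made-up counts and lengths.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 1000]
values = tpm(counts, lengths_bp)
```

The first two genes receive the same TPM although their raw counts differ by a factor of two, because the second gene is twice as long.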
  • 14. Overview of the pipeline Effect between groups Inter-individual variation in RNA Batch effects Representation variability Tissue sample Chemical preparation Sequencing
  • 15. Count matrix and metadata Each gene is an independent outcome
  • 16. LAYERS UPON LAYERS OF VARIABILITY So, what about those sample sizes?
  • 18. RNA characterization of COVID-19 (2021) - Plan ● Total RNA – virus and host (human) ● Nasopharyngeal swabs and blood samples ● Paired design (on admission to and discharge from hospital) ● 18 individuals, total of 72 samples ● Which biological pathways are affected? (DEG) ● What can we say about the viral load? (metagenomics)
  • 19. Estimating sample size for RNA ● Theoretical models with assumed distributions ● Parameters inferred from previous datasets ● R packages: RNASeqDesign, PROPER, powsimR, ssizeRNA ● Web tool: RNASeqSampleSize ● Results vary between tools ● If cost is not relevant, choose the most conservative (largest) estimate
  • 20. Proposed approach ● Perform one estimate and use it ● Remove unwanted variability (batch effect) ● Reduce variability with paired design ● Use meaningful metadata ● Filter the genes
  • 21. ● Remove unwanted variability ● Paired design ● Meaningful metadata ● Filter genes A number of methods based on SVD remove high level batch effects without specifically tracing them to interpretable variables. One can use housekeeping or control genes as markers. • SVA • RUVseq These methods produce new surrogate variables. Colleague quote: "Once I see batch effects, I can correct them mathematically, but I never trust that dataset again."
  • 22. Batch effects work against collaborative science!
  • 23. ● Remove unwanted variability ● Paired design ● Meaningful metadata ● Filter genes Paired design - taking control samples from patients after resolution or before the event. ● Increases power ● Not all analysis frameworks can take advantage of it ● Sometimes biologically difficult ● Reduces DF by half
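The claim on slide 23 that pairing increases power can be illustrated with a textbook normal-approximation sample size formula for a single gene. This is a rough sketch, not one of the RNA-seq tools named in the talk: the effect size `delta`, standard deviation `sigma`, and within-subject correlation `rho` are hypothetical inputs, and real count data would need a count-based model.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the two-sided critical value and the power quantile."""
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)


def n_unpaired(delta, sigma, alpha=0.05, power=0.8):
    """Samples needed PER GROUP for a two-group comparison."""
    z = _z_sum(alpha, power)
    return ceil(2 * (z * sigma / delta) ** 2)


def n_paired(delta, sigma, rho, alpha=0.05, power=0.8):
    """PAIRS needed when both conditions are measured in each subject."""
    z = _z_sum(alpha, power)
    # The variance of a within-pair difference shrinks as rho grows.
    sigma_diff = sigma * (2 * (1 - rho)) ** 0.5
    return ceil((z * sigma_diff / delta) ** 2)


# Hypothetical gene: effect of one standard deviation.
per_group = n_unpaired(delta=1.0, sigma=1.0)
pairs = n_paired(delta=1.0, sigma=1.0, rho=0.5)
```

Under these assumptions the paired design needs fewer sequenced samples in total, and the saving grows with `rho`. With `rho = 0` the benefit disappears, while the paired test still has roughly half the degrees of freedom, which matches the downside listed on the slide.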
  • 24. ● Remove unwanted variability ● Paired design ● Meaningful metadata ● Filter genes Gender and age can always be relevant. Collect metrics of sample quality (before and after sequencing). Disease subtypes can be a covariate or group variable. Metadata helps in choosing which samples to sequence when only a subset can be sequenced.
  • 25. ● Remove unwanted variability ● Paired design ● Meaningful metadata ● Filter genes Multiple testing correction for 20K+ genes. Remove mostly unexpressed genes. A priori removal is allowed.
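Slide 25's point about filtering before multiple testing correction can be sketched as follows. The slide does not name a correction method, so the Benjamini-Hochberg procedure used here is an assumption, as are the gene names, mean counts, p-values, and the expression threshold of 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genes: (name, mean count across samples, raw p-value).
genes = [
    ("A", 250.0, 0.001), ("B", 120.0, 0.02), ("C", 0.4, 0.6),
    ("D", 0.1, 0.9), ("E", 80.0, 0.03), ("F", 0.2, 0.75),
]

# Correct across all genes, then across expressed genes only.
adj_all = benjamini_hochberg([p for _, _, p in genes])
expressed = [g for g in genes if g[1] >= 1.0]  # hypothetical threshold
adj_filtered = benjamini_hochberg([p for _, _, p in expressed])

hits_all = sum(a < 0.05 for a in adj_all)
hits_filtered = sum(a < 0.05 for a in adj_filtered)
```

In this toy data, removing the mostly unexpressed genes lowers the number of tests and lets more genes pass the 0.05 threshold. The filter uses only mean expression, not the group labels, which is what keeps a priori removal legitimate.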
  • 26. Results ● edgeR GLM ● Nasal DEG p<0.05: 40 (paired) / 51 (unpaired) ● Blood DEG p<0.05: 76 (paired) / 2 (unpaired) ● Every parameter choice changes results ● Validation?
  • 27. Annotation representation testing – Panther.db ● Annotation is a subset of genes ● Multiple available annotation sets (structure, function, pathway...) ● We only use significant genes ● Overrepresentation test – chi-square to compare observed and expected frequencies ● Enrichment test – Mann-Whitney to test randomness of ranks
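The overrepresentation test on slide 27 compares how often an annotation appears among the significant genes with how often it appears among the rest. A minimal sketch with a 2x2 chi-square test (one degree of freedom, no continuity correction) is shown below; the function name and all gene counts are hypothetical and do not come from the study.

```python
from math import erfc, sqrt


def overrepresentation_chi2(sig_in, sig_out, bg_in, bg_out):
    """Chi-square test (1 df) on a 2x2 table of gene counts.

    sig_in / sig_out: significant genes inside / outside the annotation.
    bg_in / bg_out:   non-significant genes inside / outside it.
    """
    n = sig_in + sig_out + bg_in + bg_out
    numerator = n * (sig_in * bg_out - sig_out * bg_in) ** 2
    denominator = ((sig_in + sig_out) * (bg_in + bg_out)
                   * (sig_in + bg_in) * (sig_out + bg_out))
    chi2 = numerator / denominator
    # Survival function of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 10 of 76 significant genes carry the annotation,
# against 190 of the remaining 9924 genes.
chi2, p_value = overrepresentation_chi2(10, 66, 190, 9734)
```

When the expected count in a cell is small, as it often is for narrow annotation sets, an exact test is commonly preferred over the chi-square approximation.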
  • 28. Molecular function in blood (PAIRED) ● Increased immunoglobulin binding ● Reduced smell (in blood!) ● Reduced oxygen binding and carrier activity ● We consider the result validated
  • 29. Takeaways of the study ● Study rescued by pairing ● No batch to correct ● Almost no metadata ● Smaller signal in blood ● Specific tissue (nasal) more robust
  • 31. Reduced individual variation Effect between groups Inter-individual variation in RNA Batch effects Representation variability Tissue sample Chemical preparation Sequencing Intra
  • 32. Reduced batch effects Effect between groups Inter-individual variation in RNA Batch effects Representation variability Tissue sample Chemical preparation Sequencing Intra
  • 33. Easier to control for batches ● Pairing absorbs a proportion of batch effects ● Usually 8 lanes in a flowcell ● Focus on pairs instead of whole samples ● Aggregation of datasets easier
  • 34. Technical downsides of pairing ● Loss of half the DF ● Many frameworks cannot use it as easily as GLM-based ones ● RNA is used for other analyses: ○ SUPPA2 for alternative splicing ○ Building empirical distribution from all pairs of samples ○ If pairing were implemented, it would drastically reduce the number of observations
  • 35. SHOULD WE ALWAYS PAIR? Medical implications
  • 36. Tissue implications ● Specific tissues have robust signatures without pairing ● Blood reflects many tissues: ○ Weaker signal ○ Local changes reflected ● Systemic effects are found only in blood ● Always available for sampling (minimally invasive) ● Blood analysis benefits from pairing
  • 37. Utility implications ● Paired designs are easier to aggregate into meta-studies (robust to batch effects) ● Blood controls can be used as unpaired controls for other studies (if healthy enough) ● Solves the problem of finding controls ● If controls are taken after resolution, their health is questionable (long COVID) ● Some chronic diseases cannot be caught early or ever resolved, so pairing is impossible
  • 38. Example – cardiovascular events ● We are interested in markers of plaque progression/instability ● Patient checkup and sampling every X months ● Sequencing is expensive, sampling and storing is not ● Sequence only the previous two samples before the event
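The selection rule on slide 38 (store every sample, sequence only the last two before the event) can be sketched as follows. The function name, the sampling dates, and the event date are hypothetical and serve only to illustrate the rule.

```python
from datetime import date


def samples_to_sequence(sample_dates, event_date, k=2):
    """Pick the k most recent stored samples taken before the event."""
    if k <= 0:
        return []
    before = sorted(d for d in sample_dates if d < event_date)
    return before[-k:]


# Hypothetical quarterly checkups and an event in September.
stored = [date(2021, 1, 10), date(2021, 4, 12),
          date(2021, 7, 9), date(2021, 10, 11)]
chosen = samples_to_sequence(stored, event_date=date(2021, 9, 1))
```

Only the April and July samples would be sequenced here; the earlier sample stays in storage and the sample taken after the event is excluded.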
  • 39. Example – neurodegenerative disease (ALS) ● We cannot predict the disease (10% familial) ● Patient available for sampling once diseased ● Sequence patients sufficiently apart ● We cannot find the root cause of ALS, as we are not catching the initial event ● We can find signatures of neuronal suffering and death, which is an actionable point ● Generalizes to all chronic diseases
  • 40. Example – cancer ● For DNA, tumor is matched with blood sample control ● For RNA, we need the normal surrounding tissue ● Sampling the healthy normal target tissue may be problematic ● Tissue margin – potential normal sample ● Admixture of tumor in normal reduces the signal (but not critically for RNA)
  • 41. Many thanks to... ● Institute for Biocides and Medical Ecology for providing the samples and sequencing ● HTEC Group for providing computational resources and support ● School of Medicine, University of Belgrade for supporting research ● Thanks to DSC organizers for the invite ● Last but not least...
  • 42. ...THANK YOU FOR LISTENING!

Editor's Notes

  1. Hello, my name is Ognjen Milicevic from Belgrade, Serbia. Because of my mixed medical and engineering background, today I chose to tackle an interdisciplinary subject -