7. When to contact the bioinformatician?
“To call in the statistician after the experiment is done is no more
than asking him to perform a post-mortem examination: he may
be able to say what the experiment died of.” Ronald Fisher
8. Why? Am I wasting your time? Never.
Are you really just a control freak? Ok, maybe.
The GIGO principle of computer science:
Garbage In, Garbage Out
10. Considerations for study design
• Sample preparation impacts which technologies you can use
• Biological hypothesis should drive technology selection
• “Off-label” use of a technology impacts its protocol (e.g., TCR
sequence, virus, or splice variant detection from bulk or single-cell
RNA-seq)
• Design the study to anticipate technical artifacts that may impact
data quality (e.g., library, sequencing run, batch, technician, date
of processing, age of sample).
Measure twice, cut once
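One way to anticipate batch artifacts at design time is to spread every biological group evenly across batches. The sketch below is a minimal illustration, not a prescribed protocol: the function name `assign_batches`, the sample IDs, and the group labels are all hypothetical, and it assumes batch is the only technical factor being balanced.

```python
import random
from collections import defaultdict

def assign_batches(samples, n_batches, seed=0):
    """Assign samples to batches so each group is spread across batches.

    samples: list of (sample_id, group) pairs (hypothetical identifiers).
    Returns a dict mapping sample_id -> batch index (0-based).
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    offset = 0  # carried across groups to keep batch sizes even
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # randomize order within each group
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (i + offset) % n_batches
        offset += len(ids)
    return assignment

# Toy example: two trial arms, six samples each, three batches.
samples = [(f"A{i}", "ArmA") for i in range(6)] + [(f"B{i}", "ArmB") for i in range(6)]
assignment = assign_batches(samples, n_batches=3)
```

In this toy example every batch receives two samples from each arm, so arm and batch are not confounded.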
18. Study design and data cleaning are the most
critical parts of any analysis
19. We can mathematically correct for known
batch effects in data with good study designs
20. We can correct for batch effects if we know
they are there
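As a rough illustration of correcting a known batch effect, the sketch below assumes a purely additive batch shift and a balanced design (each batch contains the same mix of groups). Under those assumptions, centering each batch on the overall mean removes the shift while preserving the group difference. The function name `center_batches` and the toy values are hypothetical. Real analyses typically use more general approaches, such as including batch as a covariate in a linear model.

```python
from statistics import mean

def center_batches(values, batches):
    """Remove an additive batch shift by centering each batch on the grand mean.

    Assumes a balanced design: every batch has the same group composition.
    """
    grand = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]

# Toy data: treated is 2 units above control; batch 2 adds a shift of 5.
groups  = ["ctrl", "ctrl", "trt", "trt", "ctrl", "ctrl", "trt", "trt"]
batches = [1, 1, 1, 1, 2, 2, 2, 2]
values  = [10, 10, 12, 12, 15, 15, 17, 17]

corrected = center_batches(values, batches)
ctrl_mean = mean(v for v, g in zip(corrected, groups) if g == "ctrl")
trt_mean = mean(v for v, g in zip(corrected, groups) if g == "trt")
effect = trt_mean - ctrl_mean  # group difference after correction
```

The same centering applied to a design where each group sits in its own batch would erase the group difference along with the batch shift, which is why the correction depends on a good design.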
21. Recognizing confounded designs
• Trial Arm A in one batch and trial Arm B in another
• Pre-treatment in one batch and post-treatment in another
• Responders in one batch and non-responders in another
• Designs can get complicated. E.g., what do you do if you have
multiple tissue sites from multiple individuals and you want to
compare both site and individual differences?
We love to help
during design!
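A quick check for the confounded patterns listed above is to look at which conditions appear in each batch. The sketch below is a minimal illustration with a hypothetical function name (`design_status`) and hypothetical labels. It only checks a single condition against a single batch variable, so more complicated designs (for example, site and individual together) need a fuller review.

```python
from collections import defaultdict

def design_status(conditions, batches):
    """Classify how a condition is distributed across batches.

    Returns "fully confounded" if every batch holds a single condition,
    "partially confounded" if only some batches do, and "crossed" otherwise.
    """
    levels_per_batch = defaultdict(set)
    for condition, batch in zip(conditions, batches):
        levels_per_batch[batch].add(condition)

    n_single = sum(1 for levels in levels_per_batch.values() if len(levels) == 1)
    if n_single == len(levels_per_batch):
        return "fully confounded"
    if n_single > 0:
        return "partially confounded"
    return "crossed"

# Trial Arm A in one batch and Arm B in another: arm cannot be separated from batch.
bad = design_status(["ArmA", "ArmA", "ArmB", "ArmB"], [1, 1, 2, 2])

# Both arms present in every batch: the batch effect can be estimated.
good = design_status(["ArmA", "ArmB", "ArmA", "ArmB"], [1, 1, 2, 2])
```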
23. Bioinformatics as a team sport and best
practices
• Early consultation for sample
preparation, technology selection,
and study design
• Interactive collaboration during
data preprocessing and cleaning
• Reproducible scripts to include as
manuscript supplements or online
to document analysis steps
• Open source software for
dissemination of any new
algorithms employed in analysis
24. Summary
• It is never too early to contact your friendly neighborhood
bioinformatician. We can consult on:
• Sample preservation
• Technology selection
• Study design
• Analysis plan and preprocessing
• Data parasiting
• Coordinating with the bioinformatician and the sequencing core
during data generation minimizes costs and maximizes data quality