Considerations in bioinformatics
analyses and study design
Elana J Fertig
Johns Hopkins University
Why study design and bioinformatics
pipelines?
Let’s team up to avoid painful power
calculation discussions
• https://www.youtube.com/watch?v=PbODigCZqL8
Is there a boundary between standard
bioinformatics and AI/data science?
Data science and statistics are a continuum
that must work together for best analyses
How does bioinformatics work?
When to contact the bioinformatician?
“To call in the statistician after the experiment is done is no more
than asking him to perform a post-mortem examination: he may
be able to say what the experiment died of.” Ronald Fisher
Why? Am I wasting your time? Never.
Are you really just a control freak? Ok, maybe.
The GIGO principle of computer science:
Garbage In Garbage Out
Best analyses come from good data cleaning
and study design
Considerations for study design
• Sample preparation impacts which technologies you can use
• Biological hypothesis should drive technology selection
• “Off-label” use of technologies impacts technology protocols (e.g.,
TCR sequence, virus, or splice variant detection from bulk or
scRNA-seq)
• Design the study to anticipate technical artifacts that may impact
data quality (e.g., library, sequencing run, batch, technician, date
of processing, age of sample, etc.).
Measure twice, cut once
Core coordination minimizes off-label analysis costs
Off-label informatics tools work on raw data
Published bladder cancer microarray data set
Leek et al. 2010
Even large consortia datasets like TCGA have
batch effects
Fortin et al., 2014
Design studies to avoid confounding technical
artifacts and biological covariates
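
One way to act on this at the bench is to spread each biological group evenly across batches before any samples are processed. The sketch below is only an illustration: the function name `assign_batches`, the sample IDs, and the group labels are all hypothetical, and a real study would usually balance several covariates at once.

```python
import random
from collections import Counter

def assign_batches(samples, n_batches, seed=0):
    """Randomly assign samples to batches, balancing each biological group.

    samples: list of (sample_id, group) tuples (hypothetical IDs and labels).
    Returns a dict mapping sample_id -> batch index.
    """
    rng = random.Random(seed)
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)

    assignment = {}
    offset = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)
        # Deal shuffled samples round-robin so every batch receives
        # a similar share of each group.
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (i + offset) % n_batches
        offset += len(ids)
    return assignment

# Illustrative cohort: 6 responders and 6 non-responders, 3 batches.
samples = [(f"R{i}", "responder") for i in range(6)] + \
          [(f"N{i}", "non_responder") for i in range(6)]
batches = assign_batches(samples, n_batches=3)
group_of = dict(samples)
counts = Counter((batches[s], group_of[s]) for s in batches)
for key in sorted(counts):
    print(key, counts[key])
```

Each batch ends up with the same number of samples from each group, so a batch difference cannot masquerade as a responder difference.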
Batch effects change the correlation structure
between genes
Leek et al. 2010
Batch effects change the correlation structure
between genes
Leek et al. 2010
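
A small simulation can make this point concrete. The sketch below is not taken from Leek et al.; it assumes two genes that are truly independent and an arbitrary technical offset (`BATCH_SHIFT`) added to both genes in the second half of the samples.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

rng = random.Random(0)
N = 200
BATCH_SHIFT = 4.0  # assumed technical offset applied to batch 2

# Two genes simulated as independent noise: no biological correlation.
gene_a = [rng.gauss(0, 1) for _ in range(N)]
gene_b = [rng.gauss(0, 1) for _ in range(N)]

# The second half of the samples is processed in batch 2.
in_batch2 = [i >= N // 2 for i in range(N)]
shifted_a = [v + BATCH_SHIFT * b for v, b in zip(gene_a, in_batch2)]
shifted_b = [v + BATCH_SHIFT * b for v, b in zip(gene_b, in_batch2)]

r_clean = pearson(gene_a, gene_b)
r_batch = pearson(shifted_a, shifted_b)
print(f"without batch shift: r = {r_clean:.2f}")
print(f"with batch shift:    r = {r_batch:.2f}")
```

The shared offset alone is enough to make unrelated genes look strongly co-expressed, which is why network or clustering analyses are sensitive to uncorrected batches.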
Study design and data cleaning are the most
critical part of any analysis
We can mathematically correct for known
batch effects in data with good study designs
We can correct for batch effects if we know
they are there
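
As a rough illustration of what "correcting for a known batch" can mean, the sketch below compares a naive treated-versus-control difference with one computed within each batch and then averaged. It is a simplified stand-in, not ComBat or a full linear model, and the effect size, batch offsets, and sample counts are all assumed values for a single simulated gene.

```python
import random
from statistics import mean

rng = random.Random(1)
TRUE_EFFECT = 2.0                                # assumed treatment effect
BATCH_SHIFT = {"batch1": 0.0, "batch2": 5.0}     # assumed technical offsets
# Unbalanced but not confounded: both batches contain both conditions.
N_PER_CELL = {("batch1", 0): 15, ("batch1", 1): 5,
              ("batch2", 0): 5, ("batch2", 1): 15}

rows = []  # (batch, treated, expression value)
for (batch, treated), n in N_PER_CELL.items():
    for _ in range(n):
        value = BATCH_SHIFT[batch] + TRUE_EFFECT * treated + rng.gauss(0, 0.5)
        rows.append((batch, treated, value))

def naive_effect(rows):
    """Treated minus control, ignoring batch entirely."""
    treated = [v for _, t, v in rows if t == 1]
    control = [v for _, t, v in rows if t == 0]
    return mean(treated) - mean(control)

def batch_adjusted_effect(rows):
    """Average of the treated-minus-control differences within each batch."""
    diffs = []
    for batch in sorted({b for b, _, _ in rows}):
        treated = [v for b, t, v in rows if b == batch and t == 1]
        control = [v for b, t, v in rows if b == batch and t == 0]
        diffs.append(mean(treated) - mean(control))
    return mean(diffs)

naive = naive_effect(rows)
adjusted = batch_adjusted_effect(rows)
print(f"naive estimate:          {naive:.2f}")
print(f"batch-adjusted estimate: {adjusted:.2f}")
```

The adjustment only works because every batch contains both conditions. If a batch held only treated or only control samples, the within-batch difference would be undefined.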
Recognizing confounded designs
• Trial Arm A in one batch and trial Arm B in another
• Pre-treatment in one batch and post-treatment in another
• Responders in one batch and non-responders in another
• Designs can get complicated. E.g., what do you do if you have
multiple tissue sites from multiple individuals and you want to
compare both site and individual differences?
We love to help
during design!
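
A quick check on a planned sample sheet can flag the fully confounded cases listed above before any sequencing is done. This is a minimal sketch with a hypothetical helper, `confounded_with_batch`, and made-up batch and arm labels. It only detects complete confounding, so partially unbalanced designs still need a closer look.

```python
from collections import defaultdict

def confounded_with_batch(batches, groups):
    """Return True when every batch contains only one biological group.

    In that case group differences cannot be separated from batch
    differences, whatever correction is applied afterwards.
    """
    seen = defaultdict(set)
    for batch, group in zip(batches, groups):
        seen[batch].add(group)
    return all(len(members) == 1 for members in seen.values())

# Hypothetical sample sheets with four samples each.
batch_labels = ["b1", "b1", "b2", "b2"]
arms_confounded = ["A", "A", "B", "B"]   # Arm A in one batch, Arm B in another
arms_crossed = ["A", "B", "A", "B"]      # both arms present in every batch

print(confounded_with_batch(batch_labels, arms_confounded))
print(confounded_with_batch(batch_labels, arms_crossed))
```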
What should we do?
Leek et al. 2010
Bioinformatics as a team sport and best
practices
• Early consultation for sample
preparation, technology selection,
and study design
• Interactive collaboration during
data preprocessing and cleaning
• Reproducible scripts to include as
manuscript supplements or online
to document analysis steps
• Open source software for
dissemination of any new
algorithms employed in analysis
Summary
• It is never too early to contact your friendly neighborhood
bioinformatician; we can consult on:
  • Sample preservation
  • Technology selection
  • Study design
  • Analysis plan and preprocessing
  • Data parasiting
• Coordinated collaboration in the data generation process and with
the sequencing core minimizes costs and maximizes data quality

Bioinformatics workflows and study design
