Data and Science in
Biomedical Research
Sean Davis, MD, PhD
National Cancer Institute, NIH
July 12, 2016
https://watson.nci.nih.gov/~sdavis/
@seandavis12
https://github.com/seandavi
Views my own
-Omics in context
• That which is not measurable is not science. —
Unknown
• That which is not physics is stamp collecting. — Ernest
Rutherford?
• To every action there is always opposed an equal
reaction. — Isaac Newton
• When we have found how the nucleus of atoms is built
up we shall have found the greatest secret of all —
except life. — Ernest Rutherford
[Figure: normal karyotype vs. tumor karyotype]
That which is not physics
is stamp collecting.
Measuring -omes…
Historical Perspectives
Microarray Databases
Data Availability
NCBI GEO
Historical Perspectives
Microarray Data Deposition
Physiol Genomics 20: 153–156, 2005.
The Human Genome Project
Your Nature Paper
To every action there is
always opposed an equal
reaction.
Integrative, large-scale projects begin to investigate
interrelated biological processes.
The Central Dogma
phenotype
Gene Copy Number
Sequence Variation
Chromatin Structure and Function
Gene Expression
Transcriptional Regulation
DNA Methylation
Patient and Population Characteristics
The Cancer Genome Atlas
(TCGA)
• https://gdc-portal.nci.nih.gov/
• https://gdc-portal.nci.nih.gov/projects/g
Big Data
Costs…
a lot
Measure and Understand
Incentivize with appropriate
business models
Organize, democratize,
and value data
National Cancer Institute
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
National Institutes of Health
NCI Cancer Genomics Cloud
Pilots (and Genomic Data
Commons)
Tanja Davidsen, Ph.D.
Center for Biomedical Informatics and Information Technology (CBIIT)
National Cancer Institute
May 12, 2015
• Goal to unify fragmentary repositories at NCI
• TCGA, TARGET and CGCI have their own data repositories
(DCCs)
• Sequencing data: BAM files at CGHub, while VCF/MAF files are at the DCCs
Center for Cancer Genomics (CCG) Genomic Data Commons (GDC)
• Harmonize diverse standards
• BAMs aligned to various references
• Mutations are called by various tools
Genomic Data Commons (GDC)
• University of Chicago, PI: Dr. Robert Grossman
• Go-live date: late spring 2016
• Not a commercial cloud: data are free to download
Genomic Data Commons (GDC)
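Beyond the web portal, the GDC can also be queried programmatically. The sketch below is a minimal, hedged illustration, assuming the current public GDC REST endpoint (https://api.gdc.cancer.gov, which differs from the 2016 portal URL on the TCGA slide) and its JSON filter syntax; the helper names `build_projects_query` and `project_ids` are hypothetical. It only builds a request URL for TCGA projects and parses a response dictionary, so it runs without network access.

```python
import json
from urllib.parse import urlencode

# Assumption: current public GDC REST API base URL (not shown on the slides).
GDC_API = "https://api.gdc.cancer.gov"


def build_projects_query(program, fields, size=100):
    """Build a GDC /projects query URL restricted to one program, e.g. "TCGA".

    Hypothetical helper; the filter uses the GDC JSON filter shape
    {"op": ..., "content": {"field": ..., "value": [...]}}.
    """
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),  # filters travel as a JSON string
        "fields": ",".join(fields),      # comma-separated fields to return
        "size": size,                    # maximum number of hits per page
        "format": "JSON",
    }
    return f"{GDC_API}/projects?{urlencode(params)}"


def project_ids(response_body):
    """Extract project IDs from a decoded /projects response (data -> hits)."""
    return [hit["project_id"] for hit in response_body["data"]["hits"]]


if __name__ == "__main__":
    url = build_projects_query("TCGA", ["project_id", "primary_site"])
    print(url)
    # Hand-made, abbreviated response body used only to exercise the parser.
    sample = {"data": {"hits": [{"project_id": "TCGA-BRCA"}, {"project_id": "TCGA-LUAD"}]}}
    print(project_ids(sample))
```

To query the live service, one could pass the URL to `urllib.request.urlopen` and decode the body with `json.load`; pagination and error handling are deliberately left out of this sketch.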
Standard Model of Computational Analysis
Local Data
Locally Developed Software
Publicly Available Software
Local storage and compute resources
Network
Download
Public Data
Co-located Compute & Data
API
Data Access
Security
Resource Access
Core Data
(TCGA)
User Data
Computational
Capacity
Standard tools
User uploaded tools
The Cloud Pilots in Context
QA/QC
Validation
Aggregation
Authoritative NCI Reference Data Set
Data Coordinating Center
NCI Genomic Data Commons
NCI Clouds
High Performance
Computing
Search/Retrieve
Download
Analysis
Project Schedule and Deliverables
• Selection
• Design/Build I (6 months): initial design and development
• Design/Build II (9 months): completion of design, development, and implementation
• Evaluation (9 months): provide cloud to researchers; NCI evaluations; community evaluations
NCI Cloud Pilots Demo
• http://rpubs.com/seandavi/NCICloudRNA
• https://cgc.sbgenomics.com/
• https://bigquery.cloud.google.com/
• https://github.com/isb-cgc/examples-R
When we have found how the
nucleus of atoms is built up we
shall have found the greatest
secret of all — except life.
Future directions…
New Technologies
At the interface of new technologies, basic and
translational research, and multi-scale modeling.
Precision Medicine
Prime Time, Literally
Data Engineering to Speed
Cancer Research
• RTCGA Toolbox repackages TCGA data into reusable, fully documented analysis packages
• Adds value by including a general set of tools for TCGA data mining and integration
• Relies on and extends the largest open-source biological software project, Bioconductor, enabling thousands of scientists to do cancer-related, data-driven research more easily
Data Sharing
in Action
• Powering clinical and translational
research using advanced databasing
and open data principles
• Validating biomarkers
• Drug repositioning/repurposing
• Adding new mutations to existing
drug labels
• Identifying new drug targets
• Could provide the evidence base
necessary to support reimbursement
for next-generation sequence-based
testing by payers
Direct Clinical Applications
Internet of Things
Potential to fundamentally change the way we interact with
research subjects, patients, and the general population.
2016 07 12_purdue_bigdatainomics_seandavis
Speaker Notes

  1. The first karyotypes were produced in 1956. Shown here is a comparison of a normal female karyotype with a karyotype from a tumor. By 1960, karyotyping of a cancer genome had revealed the Philadelphia chromosome. The Philadelphia chromosome, now known to produce the BCR-ABL fusion protein, was not targeted by an approved drug until 2001, more than 40 years later, when Gleevec (imatinib) became available. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will provide a deeper understanding of the biology of cancer, yield prognostic and diagnostic markers that improve patient-specific treatment, and identify promising targets for directed drug therapy.
  2. Since Knudson’s famous two-hit hypothesis, our understanding of cancer as a genetic disease has progressed to the realization that cancer is rarely the result of a single gene gone awry; more likely, it reflects a complex interaction of multiple processes in the genome, including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. To improve patient outcomes, we need to understand not only which genes are involved in a given type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary, and it is now becoming possible.