The document discusses the Genomic Data Commons (GDC), an effort by the National Cancer Institute to standardize and simplify submission of genomic cancer data and make it accessible to researchers according to FAIR principles. The GDC stores raw genomic data from over 50,000 cancer cases and associated clinical information to support discovery, validation of new therapies, and precision oncology. It aims to foster data sharing, reuse, and collaboration across cancer studies to advance understanding and treatment of cancer.
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Cancer Research Data Ecosystem
1. The Genomic Data Commons
Data Sharing and
Cancer Research Knowledge Ecosystem
Warren Kibbe, PhD
warren.kibbe@nih.gov
@wakibbe
September 26th, 2016
2. 2
Cancer Data Sharing
Genomic Data Commons
Cancer Data Ecosystem
• Support open science
• Support data reusability
• Precision Oncology
• Improve patient access to clinical
trials
Reduce the risk, improve early detection, outcomes and survivorship in cancer
3. Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem
The era of precision oncology is predicated on the
integration of research, care, and molecular medicine
and the availability of data for modeling, risk analysis,
and optimal care
The Genomic Data Commons is a
completely new way of making
scientific data available to the cancer
research community
4. 4
The Cancer Genomic Data Commons
(GDC) is an existing effort to standardize
and simplify submission of genomic data to
NCI and follow the principles of FAIR –
Findable, Accessible, Attributable,
Interoperable, Reusable, and Provide
Recognition.
The GDC is part of the NIH Big Data to
Knowledge (BD2K) initiative and an
example of the NIH Commons
Genomic Data Commons
Microattribution, nanopublications, tracking the
use of data, annotation of data, use of
algorithms, supports the data /software
/metadata life cycle to provide credit and
analyze impact of data, software, analytics,
algorithm, curation and knowledge sharing
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
5. Cancer Research Data Ecosystem – Cancer Moonshot BRP
Well characterized
Research Data Sets Cancer Cohorts Patient Data
EHR, Lab Data, Imaging,
PROs, Smart Devices,
Decision Support
Learning from every
cancer patient
Active research
participation
Research information
donor
Clinical Research
Observational Studies
Proteogenomics
Imaging Data
Clinical Trials
Discovery
Patient-engaged
Research
Surveillance
Big Data
Implementation Research
SEERGDC
6. 6
Cancer Research Data Commons Ecosystem
Genomic
Data Commons
Data Validation and
Harmonization
Imaging
Data Commons
Proteomics
Data Commons
Clinical Data
Commons
(Cohorts /
Indiv.)
SEER
(Populations)
Data Contributors and Consumers
Researchers PatientsClinician
s
Institutions
7. Why is the GDC important?
A data ecosystem?
How does this move cancer research forward?
8. 8
Tumor, Cancer, and Metastasis:
(Length-scale and Time-scale Matter)
“…>90% of deaths are caused by disseminated disease or metastasis…”
Gupta et. al., Cell, 2006 and Siegel et. al. CA Cancer J Clin, Jan/Feb 2016
5 year Relative Survival Rates (2016 report of 2005-2011 data)
10. 10
(10,000+ patient tumors and increasing)
Courtesy of P. Kuhn (USC)
2006-2015:
A Decade of Illuminating the
Underlying Causes of Primary
Untreated Tumors Omics
Characterization
14. What about Data Sharing?
Making data available for discovery, validation, new therapies
Working toward a National Learning Heath System for Cancer
Maximizing the impact, reuse, and reproducibility of cancer research
We need to change the incentives for data sharing
15. 15
NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-
management/nci-policies/genomic-data
Requires public sharing of genomic data sets
27. Development of the NCI Genomic Data Commons (GDC)
To Foster the Molecular Diagnosis and Treatment of Cancer
GDC
Bob Grossman PI
Univ. of Chicago
Ontario Inst. Cancer Res.
Leidos
Institute of Medicine
Towards Precision Medicine
2011
28.
29.
30. Discovery of Cancer Drivers With 2% Prevalence
Lung adeno.
+ 2,900
Colorectal
+ 1,200
Ovarian
+ 500
Lawrence et al, Nature 2014
Power Calculation for Cancer Driver Discovery
Need to resequence >100,000 tumors to
identify all cancer drivers at >2% prevalence
31. GDC Content
GDC
TCGA 11,353 cases
TARGET 3,178 cases
Current
Foundation Medicine 18,000 cases
Cancer studies in dbGAP ~4,000 cases
Coming soon
NCI-MATCH ~5,000 cases
Clinical Trial Sequencing Program ~3,000 cases
Planned (1-3 years)
Cancer Driver Discovery Program ~5,000 cases
Human Cancer Model Initiative ~1,000 cases
APOLLO – VA-DoD ~8,000 cases
~58,000 cases
32. What Makes GDC Special?
Stores raw genomic data, allowing continuous reanalysis as
computation methods and genome annotations improve
NCI commitment to maintain long-term storage of cancer
genomic data in the GDC with free access to researchers
Utilizes shared bioinformatic pipelines to facilitate cross-study
comparisons and integrated analysis of multiple data types
Maintains harmonized clinical data in a highly structured and
extensible schema
Enables researchers to comply with the NIH Genomic Data
Sharing policy as well as journal requirements for data sharing
GDC
The explanatory power of data in the GDC will grow over time as
it accrues more cases => GDC will promote precision
oncology
33. Other Cancer Data Sharing Efforts
Signature Efforts Data
BRCA Challenge
Somatic variant sharing
Isolated genetic variants
No raw sequencing data
Precision medicine questions
Somatic variant sharing
Panel gene resequencing
Clinical response
Clinical trial
Public-private partnerships
Comprehensive genomics
Detailed clinical
phenotype data
Clinical trial access
Clinical/genomic data
aggregation
EHR data
Clinical sequencing
Clinical oncology standards
EHR data
Clinical sequencing
34. GDC
Towards a Cancer Knowledge System
Continue genomic investigations of cancer
=> Need > 100,000 cases analyzed
=> Embrace all genomic platforms
=> Relationship of relapse and primary biopsies
Incorporate associated clinical annotations
=> Clinical trial data
=> Observational, longitudinal standard-of-care data
=> N-of-1 clinical data
Promote and curate biological investigations of
cancer genetic variants
=> Driver vs. passenger mutations
=> Multiple phenotypic assays
=> Alterations in regulatory pathways – proteomics
=> Mechanisms of therapeutic resistance
=> Functional genomic investigations
Integrative models for high-dimensional data
35. GDC
Utility of a Cancer Knowledge System
Identify
low-frequency
cancer drivers
Define genomic
determinants of response
to therapy
Compose clinical trial
cohorts sharing
targeted genetic lesions
Cancer
information
donors
Genomic
Data
Commons
36. 36
Genomic Data Commons / Cloud Pilot Ecosystem– Near Term
Data Submission,
Harmonization,
Visualization &
Download
Broad FireCloud
ISB CGC
Researchers
APIs
Web Interface
Researchers
Contributors
Authentication
& Authorization
thru eRA Commons
and dbGaP
SBG CGC
Commons
@ AWS
Commons
@ GCP
Commons
@ Azure
New algorithms,
tools, pipelines,
visualizationsDockStore
Genomic
Data Commons
37. 37
Support the Precision Medicine Initiative
• Expand data model to include
other data (e.g. imaging and
proteomics)
• Allow easy publication of
persistent links to data,
annotations, algorithms, tools,
workflows
• Measure usage and impact
• Change incentives for public
contributions
The Genomic Data Commons and Cloud Pilots
38. Cancer data ecosystem
Well characterized
research data sets Cancer cohorts Patient data
EHR, lab data, imaging,
PROs, smart devices,
decision support
Learning from every
cancer patient
Active research
participation
Researchinformation
donor
Clinical Research
Observational studies
Proteogenomics
Imaging data
Clinical trials
Discovery
Patient engaged
Research
Surveillance
Big Data
Implementation research
SEER
39. GDC Acknowledgements
NCI Center for Cancer Genomics Univ. of Chicago
Bob Grossman
Allison Heath
Mike Ford
Zhenyu Zhang
Ontario Institute for Cancer Research
Lou Staudt
Zhining Wang
Martin Ferguson
JC Zenklusen
Daniela Gerhard
Deb Steverson
Vincent Ferretti
'Francois Gerthoffert
JunJun Zhang
Leidos Biomedical Research
Mark Jensen
Sharon Gaheen
Himanso Sahni
NCI NCI CBIIT
Tony Kerlavage
Tanya Davidsen
40. CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D - Seven Bridges – http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D –Project Officer
• Juli Klemm, Ph.D – COR, Broad Institute
• Tanja Davidsen, Ph.D – COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA – COR, Seven Bridges Genomics
GDC Principal Investigator
• Robert Grossman, Ph.D - University of Chicago
• Allison Heath, Ph.D - University of Chicago
• Vincent Ferretti, Ph.D - Ontario Institute for Cancer Research
Cancer Genomics Project Teams
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.