Grand Rounds at the Siteman Cancer Center at Washington University. Highlighting the Genomic Data Commons and the National Cancer Data Ecosystem defined by the Cancer Moonshot Blue Ribbon Panel
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
National Cancer Data Ecosystem and Data Sharing
1. NCI Support for a National Cancer Data
Ecosystem and Data Sharing
Warren Kibbe, PhD
warren.kibbe@nih.gov
@wakibbe
February 6th, 2017
2. 2
To develop the knowledge base
that will lessen the burden of
cancer in the United States and
around the world.
NCI Mission
3. 3
Cancer Statistics
In 2016 there were an estimated
1,700,000 new cancer cases
and
600,000 cancer deaths
- American Cancer Society 2015
Cancer remains the second most common cause of death in the U.S.
- Centers for Disease Control and Prevention 2015
4. 4
Understanding Cancer
Precision medicine will lead to fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment and clinical
presentation and direct effective, evidence-based
prevention and treatment.
5. 5
(10,000+ patient tumors and increasing)
Courtesy of P. Kuhn (USC)
2006-2015:
A Decade of Illuminating the
Underlying Causes of Primary
Untreated Tumors Omics
Characterization
Cancer is a grand challenge
Deep biological understanding
Advances in scientific methods
Advances in instrumentation
Advances in technology
Data and computation
Cancer Research and Care generate
detailed data that is critical to
create a learning health system for cancer
Requires:
7. 7
(10,000+ patient tumors and increasing)
Courtesy of P. Kuhn (USC)
2006-2015:
A Decade of Illuminating the
Underlying Causes of Primary
Untreated Tumors Omics
Characterization
8. 8
Tumor, Cancer, and Metastasis:
(Length-scale and Time-scale Matter)
“…>90% of deaths are caused by disseminated disease or metastasis…”
Gupta et. al., Cell, 2006 and Siegel et. al. CA Cancer J Clin, Jan/Feb 2016
5 year Relative Survival Rates (2016 report of 2005-2011 data)
13. 13
The Beau Biden
Cancer Moonshot
How do we enable meaningful,
patient-centered and patient-level
data sharing for cancer and
promote access to clinical trials for
all Americans?
14. 14
Goals of the Beau Biden Cancer Moonshot
Accelerate progress in cancer, including prevention &
screening
From cutting edge basic research to wider uptake of standard
of care
Encourage greater cooperation and collaboration
Within and between academia, government, and private sector
Enhance data sharing
(Presidential Memo 2016)
15. 15
A Few Beau Biden Cancer Moonshot Milestones
• Announced by Former President Obama at the State of the Union January 12, 2016
• Blue Ribbon Panel convened at AACR, April 18, 2016
• Genomic Data Commons went public June 6, 2016
• Vice President’s Cancer Moonshot Summit – June 29, 2016
• Rethinking Clinical Trial Search – Open API at https://clinicaltrialsapi.cancer.gov
• Blue Ribbon Panel recommendations – accepted by the National Cancer Advisory
Board on September 7th, 2016
• Cancer Moonshot Task Force and BRP recommendations sent to President on October
17th, 2016 https://www.cancer.gov/research/key-initiatives/moonshot-cancer-
initiative/milestones and released at https://cancer.gov/brp
• 21st Century Cures Act funding the Beau Biden Cancer Moonshot passed 94-5 by the
Senate on December 8 and signed by Former President Obama December 13, 2016.
17. Blue Ribbon Panel Recommendations
Network for Direct Patient Engagement
Cancer Immunotherapy Translational Science Network
Therapeutic Target Identification to Overcome Drug Resistance
A National Cancer Data Ecosystem for Sharing and Analysis
Fusion Oncoproteins in Childhood Cancers
Symptom Management Research
Prevention and Early Detection – Implementation of Evidence-based Approaches
Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
Generation of 4D Human Tumor Atlas
Development of New Enabling Cancer Technologies
18. 18
Relationship Between Bypass Budget and Blue Ribbon
Panel Report
Bypass Budget addresses NCI’s
entire research portfolio
Lays out the plan for NCI’s
continued investment in cancer
research
Cancer Moonshot is a unique
opportunity to enhance cancer
research in specific areas that are
poised for acceleration
The BRP report made 10 bold,
yet feasible, recommendations
that will fast-track initiatives if
infused with Moonshot funding
20. 20
Changing the conversation around data sharing
How do we find data, software, standards?
How can we make data, annotations, software, metadata accessible?
How do we increase data reuse? Reproducibility?
How do we make more data machine readable?
NIH Data Commons
NCI Genomic Data Commons
National Cancer Data Ecosystem
Data Commons co-locate data, storage and computing infrastructure, and
frequently used tools for analyzing and sharing data to create an
interoperable resource for the research community.
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, A Case for Data Commons Towards Data Science as a
Service, to appear. Source of image: Interior of one of Google’s Data Center, www.google.com/about/datacenters/.
21. Cancer Research Data Ecosystem – Cancer Moonshot BRP
Well characterized
research data sets Cancer cohorts Patient data
EHR, Lab Data, Imaging,
PROs, Smart Devices,
Decision Support
Learning from every
cancer patient
Active research
participation
Research information
donor
Clinical Research
Observational studies
Proteogenomics
Imaging data
Clinical trials
Discovery
Patient engaged
Research
Surveillance
Big Data
Implementation research
SEERGDC
22. 22
Principles for the Cancer Research Data Ecosystem
• Open Science. Supporting Open Access, Open Data, Open Source, and
Data Liquidity for the cancer community
• Standardization through CDEs and Case Report Forms
• Interoperability by exposing existing knowledge through appropriate
integration of ontologies, vocabularies and taxonomies
• Consistency of curation and capture of evidence (tissue types, recurrence,
therapy – including genomic, epigenomic, etc context)
• Sustainable models for informatics infrastructure, services, data, metadata
23. 23
Cancer Data Sharing
& Data Commons
• Support open science
• Support data reusability
• Aligned with Cancer Moonshot
• Part of Precision Medicine
• Reduce Health Disparities
• Improve patient access to clinical
trials
• Work toward a learning National
Cancer Data Ecosystem
Reduce the risk, improve early detection, outcomes and survivorship in cancer
24. 24
Data Commons Structure
DICOM, AIM
Amazon
Google
IBM
Imaging
Validator
Q/A
Proteomic
Validator
Q/A
Clinical Phenotype
Validator
Q/A
MOD Phenotype
Validator
Q/A
Pathology
Radiology
Mass
Spectrometry
Array
Data
Commons
Security
Visualization
Authentication
& Authorization
Genomic
Validator
Q/A Germline Pipelines
DNA, RNA Pipelines
EMRs, Clinical
Trials
Azure
Data Contributors and Consumers
Researchers PatientsCliniciansInstitutions
NCI Thesaurus
caDSR
NLM UMLS
RxNorm
LOINC
SNOMED
25. 25
The Cancer Genomic Data Commons
(GDC) is an existing effort to standardize
and simplify submission of genomic data
to NCI and follow the principles of FAIR
– Findable, Accessible, Attributable,
Interoperable, Reusable, and Provide
Recognition.
The GDC is part of the NIH Big Data to
Knowledge (BD2K) initiative and an
example of the NIH Commons
Genomic Data Commons
Microattribution, nanopublications, tracking the use of
data, annotation of data, use of algorithms, supports
the data /software /metadata life cycle to provide
credit and analyze impact of data, software, analytics,
algorithm, curation and knowledge sharing
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
26. NCI Genomic Data Commons
The GDC went live on June 6, 2016 with approximately 4.1 PB of data.
This includes:
2.6 PB of legacy data
1.5 PB of “harmonized” data
577,878 files about 14194 cases (patients), in 42 cancer types, across 29 primary
sites.
10 major data types, ranging from Raw Sequencing Data, Raw Microarray Data, to
Copy Number Variation, Simple Nucleotide Variation and Gene Expression.
Data are derived from 17 different experimental strategies, with the major ones being
RNA-Seq, WXS, WGS, miRNA-Seq, Genotyping Array and Expression Array.
Foundation Medicine announced the release of 18,000 genomic profiles to the
GDC at the Cancer Moonshot Summit.
31. Recovery
rate
(% true
positives) A0F0
SomaticSniper 81.1% 76.5%
VarScan 93.9% 84.3%
MuSE 93.1% 87.3%
All Three 96.4% 91.2%
GDC variant calling
pipelines
Wash U
Baylor
Broad
GDC Data Harmonization
Multiple pipelines needed to recover all variants
32. Development of the NCI Genomic Data Commons (GDC)
To Foster the Molecular Diagnosis and Treatment of Cancer
GDC
Bob Grossman PI
Univ. of Chicago
Ontario Inst. Cancer Res.
Leidos
Institute of Medicine
Towards Precision Medicine
2011
33.
34.
35. Discovery of Cancer Drivers With 2% Prevalence
Lung adeno.
+ 2,900
Colorectal
+ 1,200
Ovarian
+ 500
Lawrence et al, Nature 2014
Power Calculation for Cancer Driver Discovery
Need to resequence >100,000 tumors to
identify all cancer drivers at >2% prevalence
36. What Makes GDC Special?
Stores raw genomic data, allowing continuous reanalysis as
computation methods and genome annotations improve
NCI commitment to maintain long-term storage of cancer
genomic data in the GDC with free access to researchers
Utilizes shared bioinformatic pipelines to facilitate cross-study
comparisons and integrated analysis of multiple data types
Maintains harmonized clinical data in a highly structured and
extensible schema
Enables researchers to comply with the NIH Genomic Data
Sharing policy as well as journal requirements for data sharing
GDC
The explanatory power of data in the GDC will grow over time as
it accrues more cases => GDC will promote precision
oncology
37. Other Cancer Data Sharing Efforts
Signature Efforts Data
BRCA Challenge
Somatic variant sharing
Isolated genetic variants
No raw sequencing data
Precision medicine questions
Somatic variant sharing
Panel gene resequencing
Clinical response
Clinical trial
Public-private partnerships
Comprehensive genomics
Detailed clinical
phenotype data
Clinical trial access
Clinical/genomic data
aggregation
EHR data
Clinical sequencing
Clinical oncology standards
EHR data
Clinical sequencing
38. GDC
Utility of a Cancer Knowledge System
Identify
low-frequency
cancer drivers
Define genomic
determinants of response
to therapy
Compose clinical trial
cohorts sharing
targeted genetic lesions
Cancer
information
donors
Genomic
Data
Commons
39. Towards a Cancer Knowledge System
Continue genomic investigations of cancer
• Need > 100,000 cases analyzed
• Embrace all genomic platforms
• Relationship of relapse and primary biopsies
Incorporate associated clinical annotations
• Clinical trial data
• Observational, longitudinal standard-of-care data
• N-of-1 clinical data
Promote and curate biological investigations of cancer genetic variants
• Driver vs. passenger mutations
• Multiple phenotypic assays
• Alterations in regulatory pathways – proteomics
• Mechanisms of therapeutic resistance
• Functional genomic investigations
Integrative models for high-dimensional data
Genomic
Data
Commons
40. 40
Support the Precision Medicine Initiative
• Expand data model to include
other data (e.g. imaging and
proteomics)
• Allow easy publication of
persistent links to data,
annotations, algorithms, tools,
workflows
• Measure usage and impact
• Change incentives for public
contributions
The Genomic Data Commons and Cloud Pilots
41. 41
PMI – Oncology, the GDC and the Cloud Pilots Goals
Support precision medicine-focused clinical research
Enable researchers to deposit well-annotated
(Interoperable) genomic data sets with the GDC
Provide a single source (and single dbGaP access
request!) to Find and Access these data
Enable effective analysis and meta-analysis of these data
without requiring local downloads – data Reuse
Understand Contributions, Assess value through usage,
and give Attribution to all users
42. 42
PMI – Oncology, the GDC and the Cloud Pilots Goals
Provide a data integration platform to allow multiple data
types, multi-scalar data, temporal data from cancer models
and patients through open APIs
Work with the Global Alliance for Genomics and Health
(GA4GH) to define the next generation of secure,
flexible, meaningful, interoperable, lightweight
interfaces – open APIs
Engage the cancer research community in evaluating
the open APIs for ease of use and effectiveness
43. GDC Acknowledgements
NCI Center for Cancer Genomics Univ. of Chicago
Bob Grossman
Allison Heath
Mike Ford
Zhenyu Zhang
Ontario Institute for Cancer Research
Lou Staudt
Zhining Wang
Martin Ferguson
JC Zenklusen
Daniela Gerhard
Deb Steverson
Vincent Ferretti
'Francois Gerthoffert
JunJun Zhang
Leidos Biomedical Research
Mark Jensen
Sharon Gaheen
Himanso Sahni
NCI NCI CBIIT
Tony Kerlavage
Tanya Davidsen
44. CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D - Seven Bridges – http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D –Project Officer
• Juli Klemm, Ph.D – COR, Broad Institute
• Tanja Davidsen, Ph.D – COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA – COR, Seven Bridges Genomics
GDC Principal Investigator
• Robert Grossman, Ph.D - University of Chicago
• Allison Heath, Ph.D - University of Chicago
• Vincent Ferretti, Ph.D - Ontario Institute for Cancer Research
Cancer Genomics Project Teams
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
45. Cancer Research Data Ecosystem – Cancer Moonshot BRP
Well characterized
research data sets Cancer cohorts Patient data
EHR, Lab Data, Imaging,
PROs, Smart Devices,
Decision Support
Learning from every
cancer patient
Active research
participation
Research information
donor
Clinical Research
Observational studies
Proteogenomics
Imaging data
Clinical trials
Discovery
Patient engaged
Research
Surveillance
Big Data
Implementation research
SEERGDC
46. 46
Data Commons Structure
DICOM, AIM
Amazon
Google
IBM
Imaging
Validator
Q/A
Proteomic
Validator
Q/A
Clinical Phenotype
Validator
Q/A
MOD Phenotype
Validator
Q/A
Pathology
Radiology
Mass
Spectrometry
Array
Data
Commons
Security
Visualization
Authentication
& Authorization
Genomic
Validator
Q/A Germline Pipelines
DNA, RNA Pipelines
EMRs, Clinical
Trials
Azure
Data Contributors and Consumers
Researchers PatientsCliniciansInstitutions
NCI Thesaurus
caDSR
NLM UMLS
RxNorm
LOINC
SNOMED
49. Rethinking Clinical Trials Search
Engaging the Presidential Innovation Fellows
Create an Application Programming Interface (API) for Clinical Trials
Create an example search interface based on the API
Create a twitter feed for all new clinical trials
Incorporation of these innovations into cancer.gov
2/7/2017
50. 50
Rethinking and Enhancing Clinical Trial Search: June, 2016
• Initial Release of an API (Application Programming Interface) (API)1, developed by the
Presidential Innovation Fellows, for testing. This tool, found at
https://clinicaltrialsapi.cancer.gov, makes publicly available trial registration information
from the CTRP database, currently found on cancer.gov, assessable to third-party
innovators so that they can build new digital tools tailored to the clinical trial search
needs of their users.
• Launch of @NCICancerTrials on Twitter and dissemination of clinical trial information
via GovDelivery: https://public.govdelivery.com/accounts/USNIHNCI/subscriber/new
• Changes the Cancer.gov Website to enhance clinical trial searching
1A set of protocols designed to provide communication between a software application and a computer
operating system or between applications.
51. Rethinking Clinical Trial Search – Next Steps
• Cancer.gov
- Work with the CTAC Clinical Trials Informatics Working Group (CTIWG) on
the design on a “front end” to the API for use on the Cancer.gov website.
- This will allow search and retrieval of information that is currently available on
Cancer.gov directly from NCI’s Clinical Trials Reporting Program
- The CTIWG will provide input regarding design and usability of the
Cancer.gov website, as well as:
- Prioritization of requested enhancements (e.g., structured eligibility criteria)
• Other websites and/or providers of clinical trial search
- Test API and use publicly assessable CTRP data for use in their systems.
62. 62
NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-
management/nci-policies/genomic-data
Requires public sharing of genomic data sets