Newer, faster, cheaper molecular assays are driving biomedical research. I discuss the history of biomedical data, including concepts of data sharing and hypothesis-driven vs. hypothesis-generating research, and the potential to make biomedical research much more integrated through smart, creative, and open use of technologies and more flexible, longitudinal studies.
2016 07 12_purdue_bigdatainomics_seandavis
1. Data and Science in
Biomedical Research
Sean Davis, MD, PHD
National Cancer Institute, NIH
July 12, 2016
https://watson.nci.nih.gov/~sdavis/
@seandavis12
https://github.com/seandavi
Views my own
2. -Omics in context
• That which is not measurable is not science. —
Unknown
• That which is not physics is stamp collecting. — Ernest
Rutherford?
• To every action there is always opposed an equal
reaction. — Isaac Newton
• When we have found how the nucleus of atoms is built
up we shall have found the greatest secret of all —
except life. — Ernest Rutherford
20. To every action there is
always opposed an equal
reaction.
Integrative, large-scale projects begin to investigate
interrelated biological processes.
24. The Cancer Genome Atlas
(TCGA)
• https://gdc-portal.nci.nih.gov/
• https://gdc-portal.nci.nih.gov/projects/g
25. Big Data
Costs…
a lot
Measure and Understand
Incentivize with appropriate
business models
Organize, democratize,
and value data
26. National Cancer Institute
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
National Institutes of Health
NCI Cancer Genomics Cloud
Pilots (and Genomic Data
Commons)
Tanja Davidsen, Ph.D.
Center for Biomedical Informatics and Information Technology (CBIIT)
National Cancer Institute
May 12, 2015
27. • Goal to unify fragmentary repositories at NCI
• TCGA, TARGET and CGCI have their own data repositories
(DCCs)
• Sequencing data: BAM files at CGHub, while VCF/MAF files
are at the DCC
Center for Cancer Genomics (CCG) Genomic Data
Commons (GDC)
28. • Harmonize diverse standards
• BAMs aligned to various references
• Mutations are called by various tools
Genomic Data Commons (GDC)
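As a toy illustration of why the harmonization described in the bullets above matters, the sketch below shows one small piece of the problem: the same variant reported by two tools under different naming conventions is only recognized as shared once the call keys are normalized. The function names, the tuple layout, and the example coordinates are hypothetical, and this is not the GDC pipeline. It also assumes all calls are already on the same reference build; calls made against different references would need realignment or coordinate conversion first.

```python
from collections import defaultdict


def normalize_variant(chrom, pos, ref, alt):
    """Return a canonical (chrom, pos, ref, alt) key for one mutation call.

    Hypothetical convention: no "chr" prefix, upper-case names and alleles,
    mitochondrial chromosome written as "MT", position as an integer.
    """
    chrom = str(chrom).strip()
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    chrom = chrom.upper()
    if chrom == "M":
        chrom = "MT"
    return (chrom, int(pos), str(ref).upper(), str(alt).upper())


def merge_calls(calls_by_tool):
    """Map each normalized variant to the set of tools that reported it."""
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for call in calls:
            support[normalize_variant(*call)].add(tool)
    return dict(support)


# Made-up calls: two tools describe the same variant in different styles.
example_calls = {
    "tool_a": [("chr1", 12345, "A", "T")],
    "tool_b": [("1", "12345", "a", "t"), ("chrM", 100, "G", "C")],
}
merged = merge_calls(example_calls)
for variant, tools in sorted(merged.items()):
    print(variant, sorted(tools))
```

Without the normalization step, the first call from each tool would be counted as two different mutations.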
29. • University of Chicago, PI: Dr. Robert Grossman
• Go-live date: late spring 2016
• Not a commercial cloud: Free to download data
Genomic Data Commons (GDC)
30. Standard Model of Computational Analysis
Local Data
Locally Developed Software
Publicly Available
Software
Local storage and
compute resources
Network
Download
Public Data
31. Co-located Compute & Data
API
Data Access
Security
Resource Access
Core Data
(TCGA)
User Data
Computational
Capacity
Standard tools
User uploaded tools
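One way to see the motivation for co-locating compute with data, rather than downloading public data to local storage, is to estimate how long a bulk download would take. The sketch below is simple arithmetic under stated assumptions: the dataset size and link speed are illustrative placeholders, not figures from the slides, and `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(dataset_bytes, bandwidth_bits_per_s, efficiency=1.0):
    """Estimate the hours needed to move a dataset over a network link.

    dataset_bytes: size of the data to download, in bytes.
    bandwidth_bits_per_s: nominal link speed, in bits per second.
    efficiency: fraction of the nominal bandwidth actually achieved (0 < e <= 1).
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth_bits_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = dataset_bytes * 8 / (bandwidth_bits_per_s * efficiency)
    return seconds / 3600


TB = 10**12  # one terabyte in bytes (decimal convention)

# Illustrative assumption: 100 TB of sequence data over a 1 Gbit/s link.
hours = transfer_hours(100 * TB, 1e9)
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
```

Under these assumed numbers the download alone takes more than nine days at full line rate, before any analysis starts, and every group that wants the data pays that cost again.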
32. The Cloud Pilots in Context
QA/QC
Validation
Aggregation
Authoritative NCI
Reference Data Set
Data Coordinating Center
NCI Genomic Data Commons
NCI Clouds
High Performance
Computing
Search/Retrieve
Download
Analysis
33. Project Schedule and Deliverables
Selection
Design/
Build I
Design/Build II Evaluation
6 Months
Initial Design and
Development
9 Months
Completion of Design,
Development and
Implementation
9 Months
Provide cloud to
researchers
NCI evaluations
Community evaluations
39. Data Engineering to Speed
Cancer Research
• RTCGA Toolbox repackages TCGA data into reusable, fully-documented
analysis packages
• Adds value by including a general set of tools for TCGA data mining and
integration
• Relies on and extends the largest open-source biological software project,
Bioconductor, enabling thousands of scientists to more easily do cancer-related,
data-driven research
40. Data Sharing
in Action
• Powering clinical and translational
research using advanced databasing
and open data principles
• Validating biomarkers
• Drug repositioning/repurposing
• Adding new mutations to existing
drug labels
• Identifying new drug targets
• Could provide the evidence base
necessary to support reimbursement
for next-generation sequencing-based
testing by payers
42. Internet of Things
Potential to fundamentally change the way we interact with
research subjects, patients, and the general population.
Editor's Notes
The first karyotypes were produced in 1956. Shown here is a comparison of a karyotype from a normal female and one from a tumor. By 1960, a karyotype of a cancer genome had revealed the presence of the Philadelphia chromosome. Now known to give rise to the BCR-ABL fusion protein, it was not until 33 years later, in 1993, that a drug, Gleevec, became available that targeted the fusion product. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will provide a deeper understanding of the biology of cancer, to develop prognostic and diagnostic markers to improve patient-specific treatments, and to find promising targets for directed drug therapy.
Since Knudson's famous hypothesis proposing the two-hit model, our understanding of cancer as a genetic disease has progressed to the realization that cancer is rarely a function of a single gene gone awry, but probably represents a complex interaction of multiple processes in the genome, including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. To produce better patient outcomes, it is vital to understand not only which genes are involved in a certain type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary and is now becoming possible.