EMBL Australian Bioinformatics Resource AHM - Data Commons
1. BD2K and why bioinformatics matters
relevance to Australia
EMBL - Australia AHM 2016
Vivien Bonazzi
Senior Advisor for Data Science Technologies
ADDS (Associate Director for Data Science) Office
Office of the Director (OD)
National Institutes of Health (NIH)
2. The NIH Data Commons
Digital Ecosystems for using and sharing FAIR Data
5. Convergence of factors
Mountains of Data
Increasing need and support for Data sharing
Availability of digital technologies and
infrastructures that support Data at scale
8. https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
9. Recommendation #4: A national cancer data ecosystem for sharing and analysis.
Create a National Cancer Data Ecosystem to collect, share, and interconnect a broad
array of large datasets so that researchers, clinicians, and patients will be able to both
contribute and analyze data, facilitating discovery that will ultimately improve patient
care and outcomes.
12. Challenges with Biomedical Data
The Journal Article is the end goal
Data is a means to an end (low value)
Data is not FAIR
Findable, Accessible, Interoperable, Reusable
Limited e-infrastructures to support FAIR data
15. How do we find data, software, standards?
How can we make (large) data, annotations, software,
metadata accessible?
How do we reuse data, tools and standards?
How do we make more data machine readable?
How do we leverage existing digital technologies, systems and
infrastructures?
How do we collaborate?
How do we enable a digital ecosystem?
Changing the conversation around
Data sharing and access
NIH Data Commons
16. Data Commons
enabling data driven science
Enable investigators to leverage all possible data and tools
in the effort to accelerate biomedical discoveries, therapies
and cures
by
driving the development of data infrastructure and data
science capabilities through collaborative research and
robust engineering
Matthew Trunnell, FHC
18. Developing a Data Commons
Treats products of research – data, methods, papers etc.
as digital objects
These digital objects exist in a shared virtual space
• Find, Deposit, Manage, Share, and Reuse data,
software, metadata and workflows
Digital object compliance through FAIR principles:
• Findable
• Accessible (and usable)
• Interoperable
• Reusable
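The idea of digital object compliance can be made concrete with a small sketch. The following Python is only an illustration: the `DigitalObject` fields and the rules in `fair_report` are assumptions chosen for this example, not a schema or compliance test defined by the NIH Data Commons.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for one research product (data set, tool, workflow)."""
    identifier: str = ""                          # persistent unique ID (Findable)
    metadata: dict = field(default_factory=dict)  # descriptive metadata (Findable)
    access_url: str = ""                          # retrieval location (Accessible)
    data_format: str = ""                         # shared, documented format (Interoperable)
    license: str = ""                             # terms of reuse (Reusable)
    provenance: str = ""                          # how the object was produced (Reusable)


def fair_report(obj: DigitalObject) -> dict:
    """Return a coarse pass/fail flag per FAIR principle (illustrative rules only)."""
    return {
        "findable": bool(obj.identifier) and bool(obj.metadata),
        "accessible": obj.access_url.startswith(("http://", "https://")),
        "interoperable": bool(obj.data_format),
        "reusable": bool(obj.license) and bool(obj.provenance),
    }


def is_compliant(obj: DigitalObject) -> bool:
    """An object counts as compliant here only if every principle passes."""
    return all(fair_report(obj).values())


# Placeholder values, not real identifiers or URLs.
complete = DigitalObject(
    identifier="uid:example-0001",
    metadata={"title": "Example data set"},
    access_url="https://commons.example/objects/0001",
    data_format="VCF",
    license="CC-BY-4.0",
    provenance="Derived from an example sequencing run",
)
incomplete = DigitalObject(identifier="uid:example-0002")

print(is_compliant(complete))    # True
print(fair_report(incomplete))   # every flag is False: no metadata, URL, format or license
```

A real compliance check would be far richer (controlled vocabularies, resolvable identifiers, machine-readable licences), but the per-principle report shows how compliance could be assessed object by object.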
19. The Data Commons
is a framework
that supports
FAIR data access and sharing
and
fosters the development
of a digital ecosystem
https://datascience.nih.gov/commons
20. The Data Commons Framework
Compute Platform: Cloud
Services: APIs, Containers, Indexing
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
App store/User Interface
PaaS
SaaS
IaaS
https://datascience.nih.gov/commons
22. Current Data Commons Pilots
Explore feasibility of the Commons Framework
Facilitate collaboration and interoperability
Making large and/or high impact NIH funded data sets and tools
accessible in the cloud
Developing Data and Software indexing methods
Leveraging BD2K Efforts: bioCADDIE and others.
Collaborating with external groups
Provide access to cloud (IaaS) and PaaS/SaaS via credits
Connecting credits to the grants system
27. Cloud Credits Model
Dollar-denominated NIH credits to use
cloud resources (IaaS) and services (PaaS/SaaS)
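A minimal sketch of how such a credit balance might be tracked is shown below. Everything in it is hypothetical: the `CreditAccount` class, the grant identifier and the provider names are invented for illustration and do not describe the actual NIH credits pilot.

```python
from dataclasses import dataclass, field
from decimal import Decimal

SERVICE_TYPES = {"IaaS", "PaaS", "SaaS"}


@dataclass
class CreditAccount:
    """Hypothetical dollar-denominated credit balance linked to a grant."""
    grant_id: str
    balance: Decimal
    history: list = field(default_factory=list)

    def charge(self, provider: str, service_type: str, amount: Decimal) -> Decimal:
        """Deduct a charge and return the remaining balance."""
        if service_type not in SERVICE_TYPES:
            raise ValueError(f"unknown service type: {service_type}")
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount
        self.history.append((provider, service_type, amount))
        return self.balance


# Placeholder grant ID and provider names.
account = CreditAccount("GRANT-0001", Decimal("1000.00"))
account.charge("provider-a", "IaaS", Decimal("250.00"))   # raw compute and storage
account.charge("provider-b", "SaaS", Decimal("100.00"))   # hosted analysis tool
print(account.balance)  # 650.00
```

Using `Decimal` rather than floats keeps the dollar amounts exact, and recording each charge with its service type gives the usage accounting that a credits scheme would need.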
28. The Data Commons Framework
Compute Platform: Cloud
Services: APIs, Containers, Indexing
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
App store/User Interface
PaaS
SaaS
IaaS
https://datascience.nih.gov/commons
31. Considerations
Metrics – Understanding of and accounting for data usage patterns
Cost
• Cloud Storage
• Pay for use cloud compute (NIH credits pilot)
• Indirect costs for cloud
Hybrid Clouds – Institution (private) and commercial (public) clouds
Managing Open vs Controlled access data
• Auth: single sign on - dreams/nightmares?
Archive vs working copies of data, and versioning
Interoperability with other Commons (clouds)
32. Standards – Metadata, UIDs, APIs
Discoverability – Finding digital objects across clouds
Interfaces – For users with different needs and capabilities
Consent – Reconsenting data, Dynamic consents?
Policies
• Data sharing policies that are useful and effective
• Keep pace with use of technology (e.g. dbGaP data in the Cloud)
Incentives
• Access to, and shareability of, FAIR Data as part of NIH grant review
criteria
Governance – Community involvement in governance models
Sustainability – Long term support
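The Standards and Discoverability items above (UIDs, APIs, finding digital objects across clouds) can be illustrated with a toy index. This is a sketch under stated assumptions: `ObjectIndex`, the UID strings and the cloud names are hypothetical, and it is not the indexing method developed by bioCADDIE or the Commons pilots.

```python
from collections import defaultdict


class ObjectIndex:
    """Hypothetical cross-cloud index: one unique ID maps to every known copy."""

    def __init__(self):
        self._locations = defaultdict(set)  # uid -> {(cloud, url), ...}
        self._keywords = defaultdict(set)   # keyword -> {uid, ...}

    def register(self, uid, cloud, url, keywords=()):
        """Record that a copy of the object `uid` lives at `url` in `cloud`."""
        self._locations[uid].add((cloud, url))
        for word in keywords:
            self._keywords[word.lower()].add(uid)

    def locate(self, uid):
        """Return all known (cloud, url) copies of an object, sorted."""
        return sorted(self._locations.get(uid, ()))

    def search(self, keyword):
        """Return the UIDs tagged with a keyword (case-insensitive), sorted."""
        return sorted(self._keywords.get(keyword.lower(), ()))


# Placeholder UIDs, clouds and URLs.
index = ObjectIndex()
index.register("uid:0001", "cloud-a", "https://cloud-a.example/objects/0001",
               keywords=["genome", "marsupial"])
index.register("uid:0001", "cloud-b", "https://cloud-b.example/objects/0001")
index.register("uid:0002", "cloud-b", "https://cloud-b.example/objects/0002",
               keywords=["genome"])

print(index.search("Genome"))    # ['uid:0001', 'uid:0002']
print(index.locate("uid:0001"))  # copies in cloud-a and cloud-b
```

Keying everything on a stable UID, rather than on a URL, is what lets the same object be found regardless of which cloud holds a copy.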
34. Relevance to Australia
The value of Australian Data
Unique flora and fauna
e.g. marsupials
Indigenous Australians
Understanding of genomic structure – health & disease
Medicinal products
Making this data (securely) available
With high quality annotation and metadata
Attributions to original authors
On the cloud
Via open standard APIs
Aggregation of data via an Australia-wide Commons?
36. Summary
We need an unprecedented level of convergence and
collaboration to drive biomedical science to the next level.
Supporting this model of data-intensive collaborative science
requires a shift in academic research culture and new
investments in data infrastructure and capabilities.
Matthew Trunnell, FHC
37. Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Matthew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
The mission of the Office of Science and Technology Policy is threefold:
1) to provide the President and his senior staff with accurate, relevant, and timely scientific and technical advice on all matters of consequence;
2) to ensure that the policies of the Executive Branch are informed by sound science;
3) to ensure that the scientific and technical work of the Executive Branch is properly coordinated so as to provide the greatest benefit to society.
A detailed description of the Commons Framework can be found at: https://datascience.nih.gov/commons