How to find data for your research
Presented by Fiona Nielsen at the International Conference on Genomics 2016 (www.icg-11.org), in the session Data Sharing and Analysis, chaired by Laurie Goodman, editor-in-chief of GigaScience
ICG-11 - genomic data projects around the world - nov 5 2016
1. ICG-11: Genomic Data Projects around the World
- How to find data for your research
Fiona Nielsen – November 4th 2016
2. We are always looking for data
Genetics, cancer, rare disease research
We need access to the right data at the right time
DNA interpretation requires lots of data
3. How much data do you need to publish a paper?
2001: 1 human genome
2012: 1000 Genomes (1,092 genomes, since increased to ~2,500)
2015: UK10K; Icelandic population (2,636 + 100k imputed); The Cancer Genome Atlas (~11,000 genomes)
2016: ExAC consortium (65,000 exomes); gnomAD (~126,000 exomes)
2020: ?
4. Data is not easy to find and access
FRAGMENTED: poor visibility of available genomic data
ADMIN BURDEN: huge overhead to manage data access
BAD CULTURE: lack of data sharing habits in research culture
5. Finding and accessing data can take months
Time spent data scouting per project
[Chart: shares of 40%, 48% and 11% across the categories < 1 week, 1-3 months and 6+ months, in the order extracted]
6. Why the barrier?
Barriers
• Difficult to find data, let alone find the RIGHT data
• Time-consuming and difficult to apply for access to data
• Complicated and laborious to submit data to public repositories
http://blog.repositive.io/tag/data-access/
http://blog.repositive.io/tag/data-sharing/
10. We have identified hundreds of data sources
Universities – or repositories affiliated with a university.
Projects/consortia – have a specific purpose or aim, often focussed on a specific research question or disease.
Public repositories – allow download and upload of data from multiple institutions.
Companies – for-profit organisations making data available for free or as a service.
Biobanks – many have sequence data for their biological samples.
Researchers know on average 4-5 data sources.
More data sources appear every day; to date we have identified 350+.
11. And indexed them on the Repositive platform
Simpler workflow for data access
Discover and access
Efficient search, see related results
Find colleagues & their data interests
Co-annotate data & community feedback
Free to use: http://discover.repositive.io
12. Platform launched in Sept 2016
1 million+ human genomic datasets indexed
13. Platform launched in Sept 2016
177k whole exomes
213k whole genomes
2,400 23andMe samples
14. Platform launched in Sept 2016
Using Repositive: 61+ countries, 426+ research organisations
PDX Consortium with AstraZeneca
15. Data sources across the globe
[Figure: charts of data source counts by location. Bar chart categories: GB FI NL FR DE CH EE BE DK ES SI IE SE; CA MD MA WA NY TX AZ DC NJ NC PA UT TN CO IN FL LA VA IL ME OH MO MI SC OR. Unlabelled counts (11, 155, 2, 2, 4, 4, 7, 780, and six values of 1) could not be matched to locations.]
Geolocation of 278 data sources analysed, found by tracking the IP address of the source.
These include:
Public repositories
Universities
Companies
Biobanks
Research consortia
18. Machines & Data sources
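[Figure: counts of machines and data sources by region; region labels not recoverable. Values as extracted: 947, 5600, 88, 660, 26, 68, 50, 62, 3, 25, 0, 0.]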
23 International
Interesting site to look at:
http://omicsmaps.com/stats
19. More efficient data access
• Repositive is supporting the whole research workflow
• Faster, more efficient data discovery
• Streamlining data access applications
• Developing technology for efficient data access
• Setting up pre-competitive data sharing agreements
• Running workshops and training programmes
Read about our pre-competitive PDX data resource in collaboration with AstraZeneca: http://repositive.io/pdx
20. Building upon best practices
MAKE DATA DISCOVERABLE
SIMPLIFY WORKFLOWS
CONTRIBUTE TO COMMUNITY
DNAdigest and Repositive – Connecting the world of genomic data
http://www.tinyurl.com/plos-biology-repositive
First 30 data sources listed here:
21. Connecting the world of genomic data
Visit us at: http://repositive.io
Or tweet us @repositiveio
Free to use: http://discover.repositive.io
Fiona Nielsen, CEO
Email us: info@repositive.io
Editor's Notes
Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data
Because interpretation requires LOTS of data.
And although data exists around the world, it is siloed, and even when available, it is not accessible.
This is Jenn, a genetic researcher and our target customer, who seeks to interpret data from genetic diseases and cancer.
She needs data from other patients to compare with and interpret Mabel's DNA.
She also has data available in her own lab, but she cannot share it because of concerns about how to handle secure access to sensitive data and data governance, e.g. vetting of users.
Data is fragmented in unconnected silos, which makes it very difficult to discover.
Tracking data and working with data access requests is a time-consuming and bureaucratic exercise.
It is difficult to build a user community without best practices and tools or platforms where users can share their data experience and findings.
This is further confounded by the data being highly fragmented, siloed in repositories and institutions around the world.
There are many public repositories, but it can be hugely confusing to know where to look for the right kind of data.
The Repositive platform is an online community and marketplace connecting data consumers with data providers.
On Repositive, Jenn has:
• Easy, interactive search
• A faster data access workflow
• Easy access to new data collaborators
• Feedback on data from the community and colleagues, to help assess data quality and utility
The Repositive platform and technology will remove barriers to data sharing and will incentivise users to explore, contribute and collaborate in alignment with best practices.