Data management, data sharing: the SysMO-SEEK Story
1. Data sharing, data management: the SysMO-SEEK Story
Professor Carole Goble FREng FBCS CITP
University of Manchester, UK
carole.goble@manchester.ac.uk
2. 13 teams
91 institutes, 300 scientists
Multi-site, multi-disciplinary
Each three year duration
Data generation
Data consumption
Data analysis
Data management:
Local – Shared – Long term
Pan European
Systems Biology
http://www.sysmo.net
4. Legacy: Own data solutions. Wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.
Suspicion: Extreme caution over sharing. Modellers vs experimentalist tribalism.
Dynamics: Many institutions, many projects, overlapping memberships, changing membership. Projects ending, starting, carrying on the same, carrying on differently.
Skills: Expert scientists, inexpert informaticians. Few resources.
Data: Patchy standards, incomparable data, afterthought.
6. Data mine-ing
“my impression of researchers, and I can
criticize myself in this, is that we’re much
more interested in sharing data when we
mean sharing somebody else’s as opposed
[to] sharing ours.”
E-infrastructure - taking forward the strategy, RIN report, 2010
8. “It’s not ready yet”
“I need to get (another) publication first”
“We don’t have the resources or skills to prepare
it for others, esp. now we finished that project”
“It’s faster/easier to do it myself, and will keep the
credit/control too”
“It’s not described enough to be usable”
“I don’t trust the quality. It’s not reliable enough. It’s
too noisy.”
“Others won’t use it properly.”
“It’s not worth my while”
“They are my competitors!!”
10. 2. Preparation for Use
Curation
Standards
Reusability
Reproducibility
Accountability & Quality
Data discipline
Silo busting
11. CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
Minimum Information for Biological and Biomedical Investigations http://www.mibbi.org/index.php/MIBBI_portal
Metadata Minefield
14. Blue Collar Science
John Quackenbush
Difficult and time consuming
Poor Credit or Reward
Shabby Career Paths & Prospects
15. 3. Credit Crisis
• Reward sharing, curation and reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and standards too.
• Technical (DataCite.org).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.
16. 4. Infrastructure, Capability & Capacity
• Three year PhD/project cycle
• Local data control
• Realistic paths to adoption by busy people.
• Spreadsheets, wikis, catalogues and yellow pages.
• Content and Tools
18. 6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded
• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres
Free. Like Puppies
20. A Partnership
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies
• But the community credit crisis continues…
21. Summary
• Science is a complex social activity undertaken by tribes of people and dominated by trust issues.
• Infrastructure has to be there and fit for purpose but it's not the real problem.
• Need a cultural shift (on all sides) that truly honours data.
Editor's Notes
Sharing without fear
Some excuses
Data management is free like puppies are free
E-Lab and Taverna, all my software: elephants (elephant in the room, blind men and elephants, danger of being white elephants?)
SysMO
And other e-Science projects
Each of these applies to all our projects. Just one of them is not enough. Not even for Taverna.
To sustain it as a service we must sustain the software and the content in its repositories