1. Software Sustainability Institute
www.software.ac.uk
Software – a different
kind of research object?
http://dx.doi.org/10.6084/m9.figshare.5459542
3rd October 2017, Lancaster Data Conversations, Lancaster
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606 | N.ChueHong@software.ac.uk
Slides licensed under CC-BY where indicated.
3. The research community relies on software
Do you use research software?
What would happen to your research without software?
Survey of researchers from 15 Russell Group universities, conducted by the SSI between August and October 2014.
406 respondents covering a representative range of funders, disciplines and seniority.
56% develop their own software.
71% have no formal software training.
6. Repeatability of published microarray gene expression analyses
56% of analyses could not be repeated,
of which 30% were because of software issues.
50% did not state software version, 39% did not provide raw data.
Only 11% could be reproduced satisfactorily.
Ioannidis et al. Nature Genetics, 41, 2009
doi:10.1038/ng.295
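One of the gaps reported above, not stating the software version, is cheap to close. A minimal sketch, assuming the analysis runs as a Python script; the function name `describe_environment` and the package list are hypothetical and should be replaced with whatever your analysis actually imports:

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return the interpreter, platform and package versions used for a run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly rather than silently skipping it.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only an illustration; list the packages your analysis uses.
    print(json.dumps(describe_environment(["pip"]), indent=2))
```

Saving this output next to the results gives a reader the version information that half of the surveyed analyses left out.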
7. Repeatability in Computer Science
Of 401 papers in ACM Computer Science journals and proceedings,
only 85 provided a link to software.
For 176 the software could not be obtained.
Collberg, Proebsting, Warren, University of Arizona TR 14-04, 2015
http://reproducibility.cs.arizona.edu/v2/RepeatabilityTR.pdf
8. Errors due to bioinformatics pipeline
The results presented in the Report “Ancient Ethiopian genome reveals
extensive Eurasian admixture throughout the African continent” were
affected by a bioinformatics error, which was identified because of open science.
Llorente et al. Science, 350, 6262
doi:10.1126/science.aad2879
11. Authorship Lifecycle
[Cycle diagram with stages: Identify, Cite, Reuse, Research, Index]
Papers, data and software are all research outputs of a continuous cycle.
With software, technology makes it easier to track, but not reward.
We cannot separate papers, data and software when we release research.
http://openresearchsoftware.metajnl.com
12. The current process
Steps: Start research; Write software; Use software; Produce results; Publish research paper (which mentions software and data); Release data; Release software.
This process is simple but does not reward production or reuse of good software and data.
It also has a long contribution cycle.
13. A better process?
Steps: Start research; Identify existing software; Write software; Adapt/extend software; Use software; Produce results; Publish research paper (which references software and data papers); Release data; Release software; Publish software paper; Publish data paper.
Software and data papers are needed as proxies for rewarding reuse.
But this process enables a shorter contribution cycle for data and software.
14. Boundary
What do we choose to identify:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
http://dx.doi.org/10.6084/m9.figshare.1497930
17. Authorship
• Which authors have had what impact on each version of the software?
• Who had the largest contribution to the scientific results in a paper?
http://beyond-impact.org/?p=175
OGSA-DAI project statistics from Ohloh
http://dx.doi.org/10.6084/m9.figshare.1497930
19. The Software Sustainability Institute
A national facility for cultivating better, more sustainable, research software to enable world-class research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
Supported by EPSRC Grant EP/H043160/1 + EPSRC/ESRC/BBSRC grant EP/N006410/1
20. [Slide text not recoverable]
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/
Special Issue on Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production", CSCW 2013.
21. Research Culture Needs Changing
“This particular project was something I wrote a
couple years ago to help me out with a
workflow… I’d put it up on Github, so that others
could potentially use it or use the code. So I went
to see what people were saying about this
project. It seemed like I’d done something
fundamentally wrong, so stupid that it
flabbergasts someone... So of course I start
sobbing. Then I see these people’s follower
count, and I sob harder. I can’t help but think of
potential future employers that are no longer
potential.”
http://www.software.ac.uk/blog/2013-01-25-haters-gonna-hate-why-you-shouldnt-be-ashamed-releasing-your-code
22. Research Culture Needs Changing
Our research culture presents barriers
but few incentives to sharing code
• There is a fear of being “found out” for poor
code, but no encouragement or resources to
improve software engineering skills
• There is no reward for publishing code in the
current system of metrics. Researchers fear
being “scooped” or losing ability to publish.
• Many organisations do not understand how to
exploit open source licenses
25. Research Software Workflow
Develop: developed and versioned using a code repository
Share: published via a code repository or website
Preserve: deposited in a digital repository with the paper / for preservation
26. Good Enough Practices To Please Your Future Self
• Data:
Save and backup raw data
Create analysis-friendly data
Record your processing steps
Anticipate the need to use multiple tables, and use a unique identifier for each record
Submit data to a repository and get a DOI
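A minimal sketch of three of the data practices above: leave the raw data untouched, give each record a unique identifier, and record the processing steps. The data, column names and the function `make_analysis_table` are made up for illustration:

```python
import csv
import io

# Made-up example data standing in for a raw file that is never edited.
RAW_CSV = """site,species,count
A,robin,3
A,wren,
B,robin,5
"""


def make_analysis_table(raw_text, log):
    """Build an analysis-friendly table, leaving the raw text untouched."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    log.append(f"read {len(rows)} raw rows")

    cleaned = []
    for number, row in enumerate(rows, start=1):
        if row["count"] == "":
            # Every decision that changes the data goes into the log.
            log.append(f"dropped raw row {number}: missing count")
            continue
        cleaned.append({
            "record_id": f"rec-{number:04d}",  # unique identifier per record
            "site": row["site"],
            "species": row["species"],
            "count": int(row["count"]),
        })
    log.append(f"kept {len(cleaned)} rows")
    return cleaned


processing_log = []
table = make_analysis_table(RAW_CSV, processing_log)
```

Because the identifier is derived from the raw row number, a record in the cleaned table can always be traced back to its source line.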
27. Good Enough Practices To Please Your Future Self
• Software:
Document for your future self:
• Brief descriptive comment at the start of your code
• Provide a simple example or test data set
• Give functions and variables meaningful names
• Make dependencies and requirements explicit
Learn to be modular
• Break programs into functions
• Don’t duplicate functionality
• Search for well-maintained libraries that do what you need
Make it accessible in the future
• Make the license explicit
• Keep track of changes
• Submit code to a reputable DOI-issuing repository
Good Enough Practices in Scientific Computing: https://doi.org/10.1371/journal.pcbi.1005510
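As an illustration only, a small script that follows several of the practices above: a descriptive comment at the start, explicit requirements, meaningful names, small functions, reuse of a library routine, and an example data set. The gene names and values are invented:

```python
"""Compute the mean expression level per gene from (gene, level) pairs.

Requirements: Python 3.8 or later, standard library only.
Licence: state yours here (for example in a LICENSE file).
"""
from collections import defaultdict
from statistics import mean


def group_levels_by_gene(measurements):
    """Collect every measured level under its gene name."""
    levels_by_gene = defaultdict(list)
    for gene, level in measurements:
        levels_by_gene[gene].append(level)
    return dict(levels_by_gene)


def mean_level_per_gene(measurements):
    """Return {gene: mean level}, reusing statistics.mean from the library."""
    grouped = group_levels_by_gene(measurements)
    return {gene: mean(levels) for gene, levels in grouped.items()}


# A simple example data set that doubles as a smoke test.
EXAMPLE_MEASUREMENTS = [("geneA", 2.0), ("geneA", 4.0), ("geneB", 1.0)]

if __name__ == "__main__":
    print(mean_level_per_gene(EXAMPLE_MEASUREMENTS))
```

Keeping the example data in the file means a colleague can run the script and see it work before pointing it at real data.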
28. What you can do now
• Make sure you’re using version control
• Write a README file that describes how you can get your code up and running, and give it to a colleague to try out
What it does, requirements / dependencies, simple example of use and input + output data
• Ask a collaborator to contribute a new piece of functionality, and get feedback on the process
• Talk to your library / IT services about the services they offer
29. Get some training
Software Carpentry (admin@software-carpentry.org): teach basic lab skills for scientific computing so that researchers can do more in less time and with less pain.
Data Carpentry (admin@datacarpentry.org): teach basic concepts, skills and tools for working more effectively with data. Workshops are designed for people with little to no prior computational experience.
Open source learning that can be tailored to disciplines.
“Train the trainers”: building a capable base of instructors.
31. Interested in more?
• Publish a software paper
http://bit.ly/softwarejournals
• Easily archive your GitHub code and make it citable
GitHub to Zenodo
GitHub to FigShare
• Software Citation Implementation WG
https://www.force11.org/group/software-citation-implementation-working-group
32. Literate Programming
• Traditional papers are just advertisements
A literate computing document is the research
• The technology is out there
Jupyter notebooks
Mathematica
R Markdown
knitr
MATLAB Live Scripts
33. LIGO Example
• LIGO Paper: http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.061102
• LIGO Notebook: https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.ipynb
34. SSI Fellows 2018 / CW18
• SSI Fellowships
Deadline: 9th October 2017
£3000 bursary to be a research software advocate
Join a network of great people working to improve
• Collaborations Workshop 2018
Cardiff, 26-28th March 2018
Theme: “Culture Change and Productivity”
The un-conference that most participants would
recommend to their colleagues
35. Without data it’s difficult to validate results.
But without code, we waste the opportunity to advance science.
These slides: http://dx.doi.org/10.6084/m9.figshare.5459542
“The only way to publish software in a scientifically robust manner is to share source code, and that means publishing via the internet in an open-access/open-source fashion.”
Warren Lyford DeLano, Creator of PyMOL, 2005
37. Software: Helping the community to develop software that meets the needs of reliable, reproducible, and reusable research
Policy: Collecting evidence on the community’s software use & sharing with stakeholders
Training: Delivering essential software skills to researchers via CDTs, institutions & doctoral schools
Community: Bringing together the right people to understand and address topical issues
Outreach: Exploiting our platform to enable engagement, delivery & uptake
39. Find out more about the SSI
• Community Engagement (Lead: Shoaib Sufi)
Fellowship Programme
Events and Workshops
• Consultancy (Lead: Steve Crouch)
Open Call for Projects / Collaborations
Software Evaluation
• Policy and Publicity (Lead: Simon Hettrick)
Case Studies / Policy Campaigns
Software and Research Blog
• Training (Lead: Aleksandra Nenadic)
Software Carpentry and Data Carpentry (300+ students/year)
Guides and Top Tips
• Journal of Open Research Software (Editor: Neil Chue Hong)
• Collaboration between universities of Edinburgh, Manchester, Oxford and Southampton
Supported by EPSRC Grant EP/H043160/1 + EPSRC/ESRC/BBSRC grant EP/N006410/1
41. Research Culture Needs Changing
But there’s still a lot to be done
• Software Assessment
• Software Management Plans
• Group Identifiers
Software project teams – encompassing contributors
Software products – across versions
• Machine readable references
Software papers solve the credit problem
The reference problem is still hard
• Where is software mentioned and can we find it?
42. Research Culture Needs Changing
Mechanisms are becoming available
• Roles
Project Credit http://credit.casrai.org/
Transitive Credit http://doi.org/10.5334/jors.be
• Mechanisms
Software papers http://bit.ly/softwarejournals
Software citation https://doi.org/10.7717/peerj-cs.86
• Tools
Researcher Identifiers e.g. ORCID http://orcid.org/
Alt-Metrics e.g. ImpactStory http://impactstory.org/
• Metadata
CodeMeta http://codemeta.github.io/
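To make the metadata point concrete, here is a hedged sketch of a CodeMeta-style record built in Python. It assumes the CodeMeta 2.0 JSON-LD context and a handful of its commonly used terms; every value describes a hypothetical project, and the term list should be checked against http://codemeta.github.io/ before use:

```python
import json

# All values below are placeholders for a hypothetical project.
codemeta_record = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis",
    "version": "0.1.0",
    "license": "https://spdx.org/licenses/MIT",
    "codeRepository": "https://example.org/example-analysis",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"}
    ],
}


def to_codemeta_json(record):
    """Serialise the record as the text of a codemeta.json file."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_codemeta_json(codemeta_record))
```

A file like this, kept in the repository alongside the code, gives indexing and citation tools a machine-readable description of the software.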
43. Research Culture Needs Changing
Software Referencing needs
• Where is software referenced in publications?
• How can we understand its influence?
• How can we choose between software?
Howison, Bullard 2015. DOI: 10.1002/asi.23538
Editor's Notes
In a related study, 100% of respondents to a survey from Europe said they used research software. Asked what would happen to their research without it: 2.5% no effect, 7.5% possible but difficult, 90% impossible. This survey has more inherent bias because of the way it was conducted.
Euro survey responses – Develop own software (90%), no formal training (57.5%)
No training = 15%
Formal + Self-Taught/Formal = 42.5%
Self-taught only = 42.5%
Study by Nangia and Katz
https://arxiv.org/pdf/1706.06527.pdf
January – March 2016, 173 pieces of software mentioned
32 of 40 papers mention software
6 packages mentioned in 4 or more papers: Pymol, R, Chimera, Coot, Matlab, PHENIX
26 packages mentioned in 2 or more papers
Note that R packages feature heavily, along with visualisation tools
This process utilises the existing mechanisms for credit based on citation.
Is it more important to sustain the software that this workflow references, or the workflow itself?
At what level do you reference, at what level do you deposit?
Made more difficult than data because of the fluidly changing collaborative nature of software development – not just adding to the contributor pool
The Software Sustainability Institute can help with: software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
Drawing on pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
Providing services for research software users and developers
Developing research community interactions and capacity
Promoting research software best practice and capability
Victoria Stodden has done a lot of work looking at the barriers to sharing