This document discusses publicly-funded research software, algorithms, and workflows. It argues that software is fundamentally different from data and requires different policies regarding public access. It notes that a large portion of research is software-intensive, yet software faces sustainability issues such as "software collapse" if it is not actively maintained. The document recommends that funding agencies incentivize open source software and long-term maintenance through funding and career incentives. It suggests defaulting to open source models while allowing other options if justified, with the goal that software remains useful beyond the initial funding period.
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Public: Why and Why Not
1. Requiring Publicly-Funded Software,
Algorithms, and Workflows to be Made
Public: Why and Why Not
OECD, 15 October 2019
Daniel S. Katz
(d.katz@ieee.org, http://danielskatz.org, @danielskatz)
Assistant Director for Scientific
Software & Applications, NCSA
Research Associate Professor,
CS, ECE, iSchool
2. Why do we care about research software?
• Examining funding
• ~20% of NSF projects over 11 years topically discuss software in their
abstracts ($10b) [1]
• 2 of 3 main ECP areas are research software (~$4b)
• Examining publications
• Software intensive projects are a majority of current publications [2]
• Most-cited papers are methods and software [3]
• Asking researchers [4-6]
• >90% of US/UK researchers use research software
• ~65% would not be able to do their research without it
• ~50% develop software as part of their research
[1] Collected from http://www.dia2.org in 2017
[2] Nangia & Katz, 10.1109/eScience.2017.78
[3] “Top 100-cited papers of all time,” 10.1038/514550a
[4] Hettrick, http://bit.ly/2B8y6Iz
[5] Hettrick et al., 10.5281/zenodo.14809
[6] Nangia & Katz, 10.6084/m9.figshare.5328442.v1
3. Software (vs data) properties
• Software and data are fundamentally different
• Software is executable, data is not
• Data provides evidence, software provides a tool
• Software is a creative work, data are facts or observations
• Copyright applies to software but not data; different licenses are appropriate
• Software suffers from software collapse
• Software is not a one-time effort, it must be sustained
• Development, production, and maintenance are human-intensive
• Personal aside: FAIR was created for data; work is still needed to
decide whether it can be applied to software and, if so, how to
apply it
Katz, et al., https://doi.org/10.7287/peerj.preprints.2630v1
4. Background
• Now at University of Illinois
• Assistant Director for Scientific Software & Applications, NCSA
• Research Associate Professor, CS, ECE, iSchool
• From 2012-2016, I ran the Software Infrastructure for Sustained
Innovation at NSF
• Led the writing of NSF documents
• Software Vision and Strategy Report
• Implementation of Software Vision
• Funded about US$30m in software projects/year
• 2/3 of funding under my control from Cyberinfrastructure Office
• 1/3 raised under agreement of Science & Engineering Divisions
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
5. NSF Support for Infrastructure Software
• Some software intended for research
• Funded by many parts of NSF,
sometimes explicitly, often implicitly
• Intended for use by developer
• Other software intended as
infrastructure
• Funded by many parts of NSF, often
Office of Cyberinfrastructure (OCI),
almost always explicitly
• Intended for use by community
• NSF’s Software Infrastructure for
Sustained Innovation (SI2) focused
on research infrastructure projects
6. SI2 Review Criteria
• Standard NSF Criteria
• Intellectual Merit – advancing knowledge
• Generally not direct knowledge advances made by project; usually indirect based
on how the software would be used by others
• Broader Impacts – benefitting society
• Some of the additional SI2 review criteria
• Fill a recognized need and advance research capabilities?
• Security, trustworthiness, reproducibility, and usability are integrated?
• User interaction, community-driven approach?
• Leverage & interoperate with other software?
• Appropriate and justified license?
• Sustainability of software beyond award?
https://www.nsf.gov/pubs/2016/nsf16532/nsf16532.htm
7. SI2 licensing and sustainability
• Goal: software that has impact beyond the lifetime of the award
• How
• Ask proposers to provide a sustainability plan
• Open source as default, but not required
• Proposers make a case for the best way to achieve sustainability
• In some fields (e.g., chemistry), may include integration into commercial packages
with low-cost licenses for academic research
• Over time, sustainability plans improved
• Realization that putting the software on GitHub is not a sustainability plan
• But still no clear model that works in all cases
• And few cases where sustainability path and success were clear
8. Software collapse
• Software eventually stops working if it is not actively maintained
• Structure of computational science software stacks:
1. Project-specific software (developed by researchers): software to do a computation using
building blocks from the lower levels: scripts, workflows, computational notebooks, small
special-purpose libraries & utilities
2. Discipline-specific software (developed by developers & researchers): tools & libraries that
implement disciplinary models & methods
3. Scientific infrastructure (developed by developers): libraries & utilities used for research in many
disciplines
4. Non-scientific infrastructure (developed by developers): operating systems, compilers, and
support code for I/O, user interfaces, etc.
• Software builds on & depends on software in all layers below it; any change below may
cause collapse
• Note: Containers freeze software; this can stop collapse but also prevents bug fixes, new
algorithms, adaptations for new hardware, etc.; too long a freeze can kill software
K. Hinsen, “Dealing With Software Collapse,” 2019. https://doi.org/10.1109/MCSE.2019.2900945
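The layered dependency argument above can be made concrete with a small sketch. The package names and their layer assignments below are hypothetical and purely illustrative; they are not taken from the talk or from Hinsen's article. The function simply walks the dependency graph upward to list everything that a change in a lower layer puts at risk.

```python
from collections import deque

# Hypothetical four-layer stack (names are illustrative assumptions).
# Each entry maps a package to the packages it depends on.
DEPENDS_ON = {
    "analysis_notebook": ["climate_model_lib"],  # layer 1: project-specific
    "climate_model_lib": ["numeric_lib"],        # layer 2: discipline-specific
    "numeric_lib": ["compiler"],                 # layer 3: scientific infrastructure
    "compiler": [],                              # layer 4: non-scientific infrastructure
}


def at_risk(changed, depends_on):
    """Return every package that directly or transitively depends on `changed`."""
    # Invert the edges: for each package, who uses it?
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk upward through the stack.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents[current]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


if __name__ == "__main__":
    # A change at the bottom layer puts every layer above it at risk.
    print(sorted(at_risk("compiler", DEPENDS_ON)))
    # A change at the top layer affects nothing else.
    print(sorted(at_risk("analysis_notebook", DEPENDS_ON)))
```

In this toy model, a change to the lowest layer flags all three packages above it, which is the sense in which any change below may cause collapse.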
9. Software Sustainability
• Software sustainability is the capacity of the software to endure
• Will the software continue to be available in the future, on new platforms, meeting
new needs?
• Software sustainability ≡ sufficient ∆ software state
• Sufficient to deal with: software collapse, bugs, new features needed
• ∆ software state = (human effort in − human effort out − friction) × efficiency
• Software stops being sustained when human effort out > human effort in over some time
• Human effort ⇆ $
• All human effort works (community open source)
• All $ (salary) works (commercial software, grant funded projects)
• Combined is hard, equation is not completely true, humans are not purely rational
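As a worked illustration of the ∆ software state relation on this slide, here is a minimal sketch. The units (say, person-months per year), the numeric values, and the `required_delta` threshold are assumptions made for the example; the slide itself notes that the equation is not completely true.

```python
def delta_software_state(effort_in, effort_out, friction, efficiency):
    """Change in software state, following the slide's relation:
    (human effort in - human effort out - friction) * efficiency."""
    return (effort_in - effort_out - friction) * efficiency


def is_sustained(delta, required_delta):
    """Sustainability means the change is sufficient to cover collapse,
    bugs, and needed features (`required_delta` is a hypothetical threshold)."""
    return delta >= required_delta


if __name__ == "__main__":
    # Illustrative numbers only: 10 units of effort arrive, 4 leave,
    # 2 are lost to friction, and half of the remainder is effective.
    healthy = delta_software_state(effort_in=10, effort_out=4, friction=2, efficiency=0.5)
    print(healthy, is_sustained(healthy, required_delta=1.5))

    # When effort out exceeds effort in, the change is negative
    # and the software stops being sustained.
    declining = delta_software_state(effort_in=3, effort_out=4, friction=0, efficiency=1.0)
    print(declining, is_sustained(declining, required_delta=1.5))
```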
10. What can funding agencies do?
• Human effort ⇆ $
• All human effort works (community open source)
• All $ (salary) works (commercial software, grant funded projects)
• Combined is hard, equation is not completely true, humans are not
purely rational
• Provide incentives to support community contributions
• Provide funds to directly support software
11. Publicly-funded software
• Goal is funding software that is useful to a community over time,
not just during the period of public funding
• Personal aside: reproducibility also is a function of time, not an absolute
• Leads to options for each software package
• Make software public, commit to pay for maintenance/support
• Make software public, software developers grow community that
performs maintenance/support (as needed to sustain the software for
their own needs)
• Make software commercial, use sales/service to pay for
maintenance/support
12. Recommendations for publicly-funded software
• Let the developers/proposers state what they will do as part of
requesting funds
• Open source as default
• Take this into account when making decisions about what to fund
• Commit to reasonable maintenance funding, not tied to novel
research by the maintainers
• Support policy to provide incentives for community contributions
• Career paths, e.g., Research Software Engineers
• Credit, e.g. software citation, to include software in decisions such as
hiring, promotion, grants
• Overall: software is not data; policies must be carefully considered
https://rse.ac.uk
Smith, Katz, Niemeyer et al. 10.7717/peerj-cs.86
13. Recommendations for algorithms and workflows
• Algorithms
• If algorithms are executable, treat them the same as software
• If not, treat them the same as papers
• Workflows
• Can be data (e.g. DAG) or software (e.g. program)
• Treat software workflows as software
• Treat data workflows as data, and
• Ideally treat software that generates data workflows as software
Katz, https://danielskatzblog.wordpress.com/2018/01/08/expressing-workflows-as-code-vs-data/
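The distinction between a workflow expressed as data and one expressed as software can be sketched as follows. The three steps (`fetch`, `clean`, `summarize`) and the small runner are hypothetical, invented for illustration; the sketch assumes Python 3.9+ for the standard-library `graphlib` module. The same computation is written twice: once as a DAG that a generic engine interprets, and once as an ordinary program.

```python
from graphlib import TopologicalSorter


# Hypothetical steps shared by both forms of the workflow.
def fetch():
    return [3, 1, 2]


def clean(values):
    return sorted(values)


def summarize(values):
    return sum(values) / len(values)


# Workflow as data: a DAG mapping each step to its predecessors.
# It describes what depends on what, but is not itself executable.
WORKFLOW_DAG = {"fetch": [], "clean": ["fetch"], "summarize": ["clean"]}
STEPS = {"fetch": fetch, "clean": clean, "summarize": summarize}


def run_data_workflow(dag, steps):
    """Generic engine (software) that interprets a data workflow."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        inputs = [results[dep] for dep in dag[name]]
        results[name] = steps[name](*inputs)
    return results


# Workflow as software: the ordering is encoded directly in the program.
def run_software_workflow():
    return summarize(clean(fetch()))


if __name__ == "__main__":
    print(run_data_workflow(WORKFLOW_DAG, STEPS)["summarize"])
    print(run_software_workflow())
```

Under the slide's recommendations, `WORKFLOW_DAG` would be treated as data, while `run_software_workflow` and the engine `run_data_workflow` would be treated as software.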