Presented at the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE). Part of Supercomputing 2013 (SC13) in Denver, Colorado.
2. Outline
• My Perspective/Bias
• Motivation
• Experiences providing ingredients to the recipe:
– The role of journals
– The role of institutions
– The role of the community
– The role of funders
– A business model
11/17/13
WSSSPE
2
3. My Perspective/Bias
• Basic scientist in the biomedical sciences
• Have not coded anything for years
• Built computing infrastructure
• Manage software project teams of ~10 people
• Formed 4 software-based companies
• 15 years with a community resource – the PDB
• Helped to establish communities – PLOS, FORCE11, DELSA, NIF
• University administrator
• Journal co-founder
4. Motivation – The Good News
• Those iconic DNA and protein representations were drawn by hand
• Molecular graphics emerged to automate this process
• Today, cell contents are drawn by hand
• Automating that conceptualization is just one next step
We are at the beginning of what software will bring to the life sciences
6. Thinking on Software Back in 2008
• Costs too much
• Is located in silos
• Does not foster reproducibility
• Is poorly maintained – is unsustainable
• Does not meet the needs of 21st-century biology
• Is a major time waster
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008. 4(7): e1000136
7. What Got Me Thinking More
• Software development in science has improved thanks to open source, GitHub, etc., but for the most part remains arcane
• Software (and data) atrophy is a problem
• There is much we can learn from the app model:
– Consistent, intuitive user interface
– Common calling interface
– App store – ratings, commentary, etc.
8. The Protein Data Bank (PDB)
9. The Protein Data Bank (PDB)
• The single community-owned worldwide repository containing structures of publicly accessible biological macromolecules
• A resource used by ~300,000 individuals per month
• A resource distributing worldwide the equivalent of ¼ of the Library of Congress each month
• A bicoastal resource
• 1TB
10. PDB: Looking Back Over the Past 15 Years – In General
• Everything was harder and took longer than we thought
• There are a lot of politics associated with data and software
• Emphasis has shifted from archive, to archive + analytical tool, to archive + analytical tool + educational tool
• Consequently, outreach is our most important yet least understood activity today
• Staffing needed to change accordingly
• It has become a worldwide enterprise
• Prorated, our budget has decreased
11. PDB: Looking Back Over the Past 15 Years – Infrastructure
• It took about 5 years to achieve, and subsequently sustain, 99.99% uptime
• We have gone through 3 distinct code refreshes; another is needed
– Object model / Perl CGI
– Enterprise Java
– Code rewrite (Enterprise Java)
Bluhm et al. 2011 Quality Assurance
doi: 10.1093/database/bar003
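For a sense of scale, 99.99% uptime leaves a downtime budget of 0.01% of the measurement window. Assuming a 365-day year (the slide does not state the window), that is under an hour of total outage per year:

```latex
\[
(1 - 0.9999) \times 365 \times 24 \times 60\ \text{min}
  = 0.0001 \times 525\,600\ \text{min}
  \approx 52.6\ \text{min per year}
\]
```

On the same assumption, a 30-day month allows roughly 4.3 minutes of downtime.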
12. PDB: Looking Back Over the Past 15 Years – Open Source
• Only considered in the past 7 years or so
• Had “PDB in a Box” but abandoned it
• New components are now made available through BioJava and GitHub
• We don’t really make enough use of community contributions
13. PDB: Trends Today
• Constant demand for better performance
• Use of Web services is increasing
• Widgets have not taken off
• Mobile use is increasing fast
• PDB 2.0 services are in demand
14. PDBMobile
Objective: PDB Data Access On-The-Go
• Fast, low-bandwidth data access
• iPhone version in production (~10,000 users)
• Android version in beta
• HTML5-based web application
• Client-side database stores data for offline access
• Tight integration with MyPDB
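The offline-access idea in the last two bullets can be sketched as a cache-first lookup: answer from the local database when possible, and go to the network only for entries not yet stored. PDBMobile is an HTML5 app, so this is a stand-in, not its implementation: Python's `sqlite3` substitutes for browser-side storage, and the `OfflineCache` class, its table layout and the `fetch` callback are all hypothetical.

```python
from __future__ import annotations

import sqlite3
from typing import Callable, Optional


class OfflineCache:
    """Hypothetical cache-first store; sqlite3 stands in for client-side storage."""

    def __init__(self, fetch: Callable[[str], str], path: str = ":memory:") -> None:
        # `fetch` is a hypothetical network call mapping a PDB ID to summary text.
        self._fetch = fetch
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS entries (pdb_id TEXT PRIMARY KEY, summary TEXT)"
        )

    def get(self, pdb_id: str, online: bool = True) -> Optional[str]:
        key = pdb_id.upper()  # normalize IDs so "1abc" and "1ABC" share one row
        row = self._db.execute(
            "SELECT summary FROM entries WHERE pdb_id = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # served locally, so this works with no connection
        if not online:
            return None  # not cached and no network: nothing to show
        summary = self._fetch(key)  # cache miss while online: fetch and store
        self._db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)", (key, summary))
        self._db.commit()
        return summary
```

Once an entry has been viewed online, later lookups for it succeed even with `online=False`, which is the behavior an on-the-go client needs.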
15. PDB Sustainability
• It’s easier when the data are seen as vital to the scientific enterprise
• Quality breeds trust, which breeds support
• The community must be involved in every major decision
• Different people/skills are needed at different time points
• The Google bus is inevitable – make allowances for it
16. Sustainability Through the Private Sector
17. Founded 4 Companies
• ViSoft Inc.
• Protein Vision Inc.
• Film Frontiers
• SciVee Inc.
18. Sustainability Through Companies
• Making a business from scientific software alone is very rare – founders tend to overvalue everything; customers undervalue it
• Be at the right place on the technology adoption curve
• Need to provide value-add – either through content (again, rare for science) or services (increasingly likely, but needs a special skill set)
• Technology transfer offices (TTOs) do not understand the value (or lack of value) of scientific software – be prepared
19. Journals & Sustainability
20. The Role of Journals
• Journals can help elevate the value of software and software developers
• However, they propagate a broken reward system
• They provide quality control through peer review
• They provide a copy of record
21. Example: PLOS Computational Biology Software Articles – Requirements
• Outstanding open source software of exceptional importance that has been shown to provide new biological insights, either as part of the software article or published elsewhere
• The software must already be widely adopted, or show promise of wide adoption by a broad community of users
• No enhancements published
• The software must be downloadable anonymously in source code form and licensed under an OSI-approved license
• Must be documented and testable
• A presubmission inquiry determines suitability
23. The PLOS/Mozilla Experiment
• How much scientific software can be reviewed by non-specialists, and how often is domain expertise required?
• How much effort does this take compared to reviews of other kinds of software, and to reviews of papers themselves?
• How useful do scientists find these reviews?
24. Institutions Can Sustain Developers and Software
25. University 2.0 Is Yet to Happen – Demand Appears to Be There
26. Institutions Underrate Software as Scholarship, But There Is a Glimmer of Hope – But You Must Do Your Bit
PLoS Comp. Biol. 7(1) e1002001
27. Your Responsibility for Software as Scholarship
• Make it easy for software developers to quantify the use and perceived value of software
• Explain the impact you have had to reviewers who do not understand its value
• Software is frequently more valuable than a research article – don’t hide that
• Make the costs and sustainability issues clear to institutions
28. The Academic Institution’s Responsibility for Software as Scholarship
• Accept alternative metrics
• Encourage individual departments to put forward promotion files that reflect the value of software to their domain
• Educate the academic promotions committee
29. Funders & Sustainability
30. NIH As An Example
http://acd.od.nih.gov/Data%20and%20Informatics%20Working%20Group%20Report.pdf
33. Features of the Software Catalog (Maybe)
• Driven by the community
• Registration service
• Rating service
• Discovery service
• Long-term sustainability?
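To make the three services concrete, here is a minimal in-memory sketch of how registration, rating and discovery might fit together. The slide specifies no design, so the `SoftwareCatalog` and `Entry` names, the 1–5 star scale and tag-based search are all hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Entry:
    """One registered software package (hypothetical schema)."""
    name: str
    description: str
    tags: set[str]
    ratings: list[int] = field(default_factory=list)

    def average_rating(self) -> float:
        # Unrated packages score 0.0 so they sort after any rated package.
        return mean(self.ratings) if self.ratings else 0.0


class SoftwareCatalog:
    """Hypothetical community catalog with registration, rating and discovery."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, name: str, description: str, tags: list[str]) -> Entry:
        # Registration service: each package name may be registered only once.
        if name in self._entries:
            raise ValueError(f"{name!r} is already registered")
        entry = Entry(name, description, {t.lower() for t in tags})
        self._entries[name] = entry
        return entry

    def rate(self, name: str, stars: int) -> None:
        # Rating service: community members award 1 to 5 stars.
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._entries[name].ratings.append(stars)

    def discover(self, tag: str) -> list[str]:
        # Discovery service: case-insensitive tag match, best rated first.
        hits = [e for e in self._entries.values() if tag.lower() in e.tags]
        hits.sort(key=lambda e: e.average_rating(), reverse=True)
        return [e.name for e in hits]


if __name__ == "__main__":
    catalog = SoftwareCatalog()
    catalog.register("viewer-a", "3D structure viewer", ["visualization"])
    catalog.register("aligner-b", "sequence aligner", ["alignment"])
    catalog.register("viewer-c", "web-based viewer", ["Visualization", "web"])
    catalog.rate("viewer-a", 3)
    catalog.rate("viewer-c", 5)
    print(catalog.discover("visualization"))  # ['viewer-c', 'viewer-a']
```

Ranking discovery results by community rating is one way to tie the rating service back to the "driven by the community" goal; persistence, identity and moderation are left out of this sketch.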
34. The Role of Funders
• There needs to be more cross-talk between agencies – both national and international
• Funders can help train institutions, not just individuals
• Better specification of the software enterprise
• Less “build it and they will come” – more grassroots, application-driven support, but managed
35. The 3D Virtual Cell & FORCE11 Communities
37. Sustainability Lessons from the 3D Virtual Cell
• There remains a minimal requirement for funding, even with a vibrant community – how?
• Communities still need champions and a vision
• Self-organization is not an option
• Members must like each other – face-to-face meetings are needed