This presentation was provided by Kristi Holmes of Northwestern University during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
Holmes "Institutional Infrastructure for Data Sharing"
1. Institutional Infrastructure
for Data Sharing
Effective Data Management
NISO Virtual Conference
Kristi Holmes, PhD
Galter Health Sciences Library
Northwestern University
@kristiholmes
29 Sept 2021
Adapted from illustration by Andrew Russell
2. Empowering a collaborative, equitable ecosystem
Where are we going?
How do we get there?
Who’s along for the ride?
The big picture...
3. What do we miss when science isn’t equitable?
Image credits: https://www.proboat.com/2017/08/lyman-morse-builds-new-wood-glue/
and https://pixabay.com/photos/island-tropical-blue-waters-5783440/
4. Open Science at UNESCO. Towards a UNESCO Recommendation on Open Science. Available at
https://en.unesco.org/sites/default/files/open_science_brochure_en.pdf
Openness
Critical for access, innovation, and sustainability
BENEFITS
CHALLENGES
INCENTIVES
COLLABORATION
RECIPROCITY
5. Empowering a collaborative, equitable data ecosystem
Tools – both social and technical
Understanding, recognizing, and incentivizing
6. Personas
It is critical to understand and support the team
PERSONAS FEATURES
● 14 one-page profiles of key roles in translational
research, including 2 patient profiles
● Evidence-based through systematic review of
literature, job descriptions, and interviews. Two
empathetic patient profiles informed by literature
● Sample use cases informed by project
experience
USE PERSONAS
● Download the Profiles & User Guidebook
● Sample Use Cases: read ours and contribute your own
● Submit feedback via the Personas Evaluation Form
Expanding Personas based on feedback: AI/ML,
Bioethics/research, community health educator, Diversity
professional, Medical Education Coordinator, Regulatory
professional
Lead: Sara Gonzales
Collaborators: NYU, OHSU, UVA, and Northwestern
Gonzales S, O’Keefe L, Gutzman K, Viger G, Wescott A, Farrow B, Heath AP, Kim MC, Taylor D, Champieux R, Yen PY, Holmes K.
Personas for the translational workforce. Journal of Clinical and Translational Science, 1-8. https://doi.org/10.1017/cts.2020.2
7. Bahlai, C. A., et al. (2019). Open Science Isn't Always Open to All Scientists: Current efforts to make research more
accessible and transparent can reinforce inequality within STEM professions. American Scientist, 107(2). Available at
https://www.americanscientist.org/article/open-science-isnt-always-open-to-all-scientists
Illustration by Tom Dunne
Research isn’t open for everyone
(and there are a lot of incentives to keep it closed)
8. Institutional
perspectives &
new models
Team Scientists. Northwestern University Feinberg School of Medicine Faculty Affairs Office.
Available at https://www.feinberg.northwestern.edu/fao/for-administrators/team-scientists/index.html
9. Team Scientist Faculty Track Survey Results
SATISFIED
Overall satisfaction with current position 74%
Opportunity to collaborate with other faculty 90%
Sense of contributing to important research 83%
Contributions are acknowledged via co-authorships 80%
Promotion process is clear and transparent 68%
Fall 2017 survey response rate: 81%
Fall 2021 survey is about to launch!
▪ 2015: a new “Team Scientist” track was established
within our regular faculty lines to better value such
scientists’ contributions
▪ Collaborative effort between NUCATS, Vice Dean for
Faculty Affairs at Feinberg, and relevant stakeholders
▪ Collaborative scientists who span content disciplines at
NU now have several distinct pathways for promotion
with clear metrics through our tenure-eligible,
non-tenure-eligible, and research faculty lines
▪ All faculty identify critical references and roles, going
beyond publications and grants to tell a story of
meaningful impact
▪ Contribution roles are key!
Institutional perspectives
New models: Northwestern’s Team Scientist Faculty Track
NUCATS Team: Keith Herzog, Mohammad Hosseini, Emily Traw
10. DataClinic
The Galter DataClinic employs a primary care model for data
management and analysis. In addition to providing training and best
practices support, we offer free consultations for researchers with data
issues, including data management and sharing.
Data Education & Community Engagement
The Galter Library actively promotes reproducibility and open science
best practices by organizing and hosting special events featuring
cutting-edge organizations and software tools in the field.
Innovations
Galter Health Sciences Library and Learning Center is dedicated to
building a collaborative clinical and translational research data
infrastructure.
Institutional perspectives
The Galter Library DataLab
Lead: Matt Carson, with Pamela Shaw, Sara Gonzales
11. Rule 1: Don’t try to own everything
Rule 2: Leverage champions to get buy-in from stakeholders
Rule 3: Have a sustainability plan (and find funding)
Rule 4: Hire a team, and support them
Rule 5: Recognize and elevate data, software, and workflow contributions
Rule 6: Focus on interdisciplinarity, but don’t overdilute
Rule 7: Emphasize responsible data science
Rule 8: Establish a set of guiding principles
Rule 9: Engage with external communities
Rule 10: Leverage core service groups
Check out Simple Rules and other resources
Institutional perspectives
Ten simple rules for starting (and sustaining) an
academic data science initiative
Parker MS, Burgess AE, Bourne PE (2021) Ten simple rules for starting (and sustaining) an academic data
science initiative. PLoS Comput Biol 17(2): e1008628. https://doi.org/10.1371/journal.pcbi.1008628
12. 1. Define achievable milestones, whether those milestones are for an established time period or for a task.
Milestones are defined by what works best for your team.
2. Show visible interim results within each milestone. If a milestone is one week, identify what tasks each
team member will be working on during that week.
3. Be realistic and don’t go it alone. Milestones can be difficult to achieve, so be sure to accurately assess
how much time you need to complete each task. If a roadblock appears, don’t hesitate to discuss it as a
team and reassign work as needed (if time- or skill-related). Call on help when needed.
4. Break through roadblocks. Don’t let the small details slow you down. Be sure to know who makes the
decisions when disagreement occurs.
5. Encourage open, positive communication among team members and stakeholders.
6. Implement intentional checkpoints to ensure ethical practices.
7. Celebrate victories and work together to understand challenges.
8. Take time to do it right the first time. Scope is secondary. Be sure to understand the data, the process,
and work the plan accordingly.
9. Above all else, focus on the user and the mission.
Institutional perspectives
Successful priorities and lessons learned
Adapted from: Davis V, et al. "Implementing VIVO and Filling it with Life." VIVO: A Semantic Portal for Scholarly Networking Across
Disciplinary Boundaries (Synthesis Lectures on the Semantic Web). Morgan & Claypool Publishers, 2012.
14. The InvenioRDM project. An international, interdisciplinary collaboration to build a user-focused,
turn-key, next-gen research repository. Available at https://inveniosoftware.org/products/rdm/
Your generalist institutional repository
A powerful tool for open science and data management and sharing
next-generation!
15. Final NIH Policy for Data Management and Sharing
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
We’re leveraging InvenioRDM as a strong foundation at NU. Here’s why.
Behind the scenes
▪ Discoverability. Leverages metadata standards and extensible, customizable metadata. The
powerful Elasticsearch full-text search engine retrieves, facets, sorts, and filters searches
with ease.
▪ Scalability. Invenio is fast. Designed to manage 100+ million records and petabytes of files. All
data can be archived regardless of size.
▪ Technology. Modular underlying technology (Python, Flask) widely supported. Invenio is
JSON-native and provides RESTful APIs to make it easy to build apps on top of the framework.
▪ Ethical metrics. Industry standard usage statistics for record pages with all tracking completely
anonymized.
▪ Easy. Turn-key research data management platform & index can be easily deployed in the local
environment. A SaaS model for service is available via TIND (CERN spinoff). Customize the look and feel
to our local environment.
▪ A robust OS community: Large team of developers & active open source community.
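The JSON-native, RESTful design noted above can be sketched in a few lines of Python. This is a minimal illustration, not Northwestern's integration: the host `repository.example.edu` is hypothetical, the sample response is hand-written and trimmed to a few fields, and the DOI in it is made up. It assumes the records search endpoint at `/api/records` with `q`, `size`, and `page` parameters and a `hits.hits` list in the response.

```python
import json
from urllib.parse import urlencode

# Hypothetical host, used only for illustration.
BASE_URL = "https://repository.example.edu"


def build_search_url(query, size=10, page=1, base_url=BASE_URL):
    """Build a records search URL (assumes the /api/records endpoint)."""
    params = urlencode({"q": query, "size": size, "page": page})
    return f"{base_url}/api/records?{params}"


def summarize_hits(payload):
    """Pull id, title, and DOI out of a search response, tolerating gaps."""
    results = []
    for hit in payload.get("hits", {}).get("hits", []):
        metadata = hit.get("metadata", {})
        doi = hit.get("pids", {}).get("doi", {}).get("identifier")
        results.append(
            {"id": hit.get("id"), "title": metadata.get("title"), "doi": doi}
        )
    return results


# Hand-written sample shaped like a search response; all values are made up.
SAMPLE_RESPONSE = json.loads(
    """
    {
      "hits": {
        "total": 1,
        "hits": [
          {
            "id": "abcd1-efgh2",
            "metadata": {"title": "Example dataset"},
            "pids": {"doi": {"identifier": "10.1234/example.abcd1"}}
          }
        ]
      }
    }
    """
)

if __name__ == "__main__":
    print(build_search_url("health equity", size=5))
    for row in summarize_hits(SAMPLE_RESPONSE):
        print(row)
```

In a real script the URL would be fetched with an HTTP client and the body passed to `summarize_hits`. Parsing is kept separate from fetching here so the sketch runs without network access.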
User-focused features
▪ Research, shared. Securely share and preserve data
records and a wide range of research types with
collaborators, with easy dissemination to the community.
▪ Communities. Create and curate communities (e.g.,
workshop, project, lab, or journal).
▪ Integrates with best-practice tools and workflows.
GitHub, Jupyter Notebooks, Binder, and more.
▪ Compliance-friendly. Comply with data sharing
mandates* and acknowledge your funders.
▪ Get credit & be cited. Get a DOI to make records easily
and uniquely citable. Pre-formatted citation text makes it
easy to cite your work and be cited. Contributor roles
allow recognition of the whole team.
16. How did this collaboration start?
(and what about Zenodo?!)
What motivated the InvenioRDM project?
• Some organizations tried to reuse the
existing open source Zenodo source code
• Other orgs tried to use the Invenio
Framework to build an RDM repository from
scratch
• Several orgs tried to make the same
modifications but had no easy way of
sharing their changes
All these groups came together to create a
collaborative open source project and grow a
sustainable community. Zenodo is also moving to InvenioRDM ;)
Two goals of InvenioRDM project:
18. InvenioRDM’s records are made findable through each being issued a Digital Object Identifier (DOI), and through their metadata being indexed and made searchable immediately.
Metadata in InvenioRDM are accessible because they are retrievable using a standardized communications protocol which is free and universally implementable.
InvenioRDM leverages metadata encoding (JSON) and vocabulary (FundRef, OpenAIRE, COAR Resource Types, etc.) standards to ensure maximum interoperability for records describing digital assets.
Ensuring the reusability of digital assets deposited in InvenioRDM is key and is achieved through assigning licenses and establishing provenance through registering users.
Support for FAIRer science
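One way to picture these FAIR points is a small pre-deposit check over a record's JSON. The sketch below is a simplification: the function name `fair_gaps` is hypothetical, the field names (`pids.doi`, `metadata.resource_type`, `metadata.rights`, `metadata.creators`) are assumptions loosely modeled on an InvenioRDM-style record, and the example values are made up. Accessibility is not checked, because it depends on the retrieval protocol and not on the record's content.

```python
def fair_gaps(record):
    """Return a list of FAIR-related gaps found in a record dictionary.

    Field names are assumptions for illustration, not a validated schema.
    """
    gaps = []
    metadata = record.get("metadata", {})

    # Findable: a DOI plus descriptive metadata that can be indexed.
    if not record.get("pids", {}).get("doi", {}).get("identifier"):
        gaps.append("findable: no DOI")
    if not metadata.get("title"):
        gaps.append("findable: no title to index")

    # Interoperable: a resource type drawn from a controlled vocabulary.
    if not metadata.get("resource_type", {}).get("id"):
        gaps.append("interoperable: no resource type")

    # Reusable: a license and recorded creators for provenance.
    if not metadata.get("rights"):
        gaps.append("reusable: no license")
    if not metadata.get("creators"):
        gaps.append("reusable: no creators recorded")

    return gaps


# Made-up example records.
COMPLETE_RECORD = {
    "pids": {"doi": {"identifier": "10.1234/example.abcd1"}},
    "metadata": {
        "title": "Example dataset",
        "resource_type": {"id": "dataset"},
        "rights": [{"id": "cc-by-4.0"}],
        "creators": [{"person_or_org": {"name": "Doe, Jane"}}],
    },
}
UNLICENSED_RECORD = {
    "pids": COMPLETE_RECORD["pids"],
    "metadata": {
        key: value
        for key, value in COMPLETE_RECORD["metadata"].items()
        if key != "rights"
    },
}

if __name__ == "__main__":
    print(fair_gaps(COMPLETE_RECORD))
    print(fair_gaps(UNLICENSED_RECORD))
```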
19. Upcoming:
Communities
& Collections
Community: Define a research group, department,
event, or other collaborative unit; official and ad hoc
groups supported
Collection: Create multiple Collections under the
umbrella of a Community.
Collections bring together related groupings of files to
communicate process, enable sharing of results, and
support publication, compliance and reproducibility
Resource types are customizable for each instance as
needed.
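The Community and Collection relationship described here can be sketched as a small data model. This is a conceptual illustration only, not InvenioRDM's actual data model (the feature is described as upcoming). The class and method names are hypothetical, and the example names reuse labels that appear on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collection:
    """A related grouping of records under a Community."""
    title: str
    record_ids: List[str] = field(default_factory=list)


@dataclass
class Community:
    """A research group, department, event, or other collaborative unit."""
    name: str
    official: bool = True  # both official and ad hoc groups are supported
    collections: Dict[str, Collection] = field(default_factory=dict)

    def add_collection(self, title):
        """Create the Collection if needed and return it."""
        return self.collections.setdefault(title, Collection(title))

    def all_record_ids(self):
        """List each record once, even if it sits in several Collections."""
        seen = []
        for collection in self.collections.values():
            for record_id in collection.record_ids:
                if record_id not in seen:
                    seen.append(record_id)
        return seen


if __name__ == "__main__":
    study = Community(name="XYZ Clinical Study")
    study.add_collection("Protocols").record_ids.append("rec-1")
    study.add_collection("Datasets and Analyses").record_ids.extend(
        ["rec-1", "rec-2"]
    )
    print(study.all_record_ids())
```

Keeping Collections inside a Community mirrors the "umbrella" wording above, and letting one record appear in several Collections is a design assumption made for this sketch.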
Example labels: XYZ Clinical Study; Phenotype Definitions
Phenotype Definitions: Definitions, Characterizations, Evaluations, Metadata, Dissemination Strategy
Clinical Studies: Research Proposals, Protocols, Data Management Plans, Methods Descriptions, Measures, Case Reports, Datasets and Analyses
This example page from Zenodo shows the approach; sample records highlight application in biomedicine
20. Early-Career Scholars: I’m just getting started on my research career. I want to showcase my work and demonstrate my expertise and
collaborations. Our repository gives me a way to make all of my research efforts findable, and the metrics are helpful for reporting to leadership.
The secure “Funding Community” collection of example proposals from other investigators at my university was a huge help to me as I prepared
and submitted my proposal!
Basic Science: My research group uses the repository to support reproducible science by packaging our data and methods together.
Everything gets a unique identifier and versioning is supported. Our lab prioritizes science communication. Our graduate students set up a Lab
Community to enhance engagement and dissemination of data, protocols, tools, handbooks, lay summaries, presentations, training materials,
and more. We like that we can credit all contributors and present our work through an attractive resource that can be easily updated.
Translation to Practice: My team wants to find out about clinical trial opportunities to offer patients all options for treatment. It is important to us
to openly share the latest research with patients. The Communities feature gives us a way to make these materials openly available, packaged in a
cohesive and attractive manner. As our team updates these resources, we can upload the new versions and track access.
Population Health and Health Equity: Our large multi-site cancer health equity project uses InvenioRDM to collaborate with our
community-based partners and credit these partners. We can share materials from community health events, project materials, training
materials, annual reports, and lay summaries of research. InvenioRDM helps us to be better partners, accountable to collaborators and the
community. We can create communities of practice to integrate theories, data, techniques, and tools. We want to make our research
discoverable to the wider community, helping to spark new collaborations and opportunities for dissemination.
Art and creative works: The same scientific research that generates new understanding and innovation also brings unexpected and often
breathtaking beauty. Every year the university hosts a “Science in Society” scientific images contest. These visually stunning images captivate
the imagination and deepen the connection between science and art. Each image in the collection originates from contemporary research in a
wide range of fields, including medicine, genetics, chemistry, and engineering. Judged by an interdisciplinary panel of local artists, scientists and
community leaders, the winning images are digitally housed in a “Community” and have been displayed throughout the Chicagoland area,
including Navy Pier, Harold Washington Library, Evanston Public Library, the Noyes Cultural Arts Center, and the Museum of Science and
Industry.
Adaptation of Illustration by Tom Dunne
User Stories to highlight use of the InvenioRDM Communities feature
21. InvenioRDM in action (coming soon!)
NNLM Beacon: An openly available online repository of outputs from NNLM
Northwestern Prism
Images: Wireframes
Leads: Karen Gutzman & Matt Carson, with many others
22. Resources & Links
Official InvenioRDM site:
https://inveniosoftware.org/products/rdm/
Roadmap: https://inveniosoftware.org/roadmap/
GitHub:
https://github.com/inveniosoftware/invenio-app-rdm
Documentation: https://inveniordm.docs.cern.ch/
Install your own instance:
https://inveniordm.docs.cern.ch/
Get a demo! Contact InvenioRDM Community
Coordinator Sara Gonzales to arrange
23. “Hail the Maintainers: innovation is overvalued, maintenance often matters more.” Aeon. Available at
https://aeon.co/essays/innovation-is-overvalued-maintenance-often-matters-more
themaintainers.org
@The_Maintainers
(great perspectives)
24. Thank you!
@kristiholmes
kristi.holmes@northwestern.edu
Partnerships & Collaborations
• Galter Health Sciences Library
• Presented projects: Sara Gonzales, Karen Gutzman, Matt Carson, Joelen Pastva, Pamela Shaw,
Guillaume Viger, Mohammad Hosseini, Galter Library Digital Initiatives team, Galter DataLab, and more
• InvenioRDM open source community and the team at CERN
• Personas collaboration: NYU, OHSU, UVA, Northwestern
• Northwestern University Clinical and Translational Sciences Institute
• The Network of the National Library of Medicine (NNLM)
• Support the NNLM National Evaluation Center & clinical data project
• The Data Discovery Collaboration
Support
• Northwestern University Clinical and Translational Sciences Institute UL1TR001422 (NIH/NCATS)
• NNLM National Evaluation Center (U24LM013751) (NIH/NLM)
• Chicago Cancer Health Equity Collaborative (ChicagoCHEC) U54CA202995, U54CA202997, U54CA203000 (NIH/NCI)
• Enhancing clinical trials participation through library partnerships G08LM012688 (NIH/NLM)
• Health For All: Advancing Library-Academic Medical Center Partnerships to Navigate Wellness and Scale
Preventive Services Access G08LM013188 (NIH/NLM)
Andrew Russell and Lee Vinsel. Let's Get Excited about Maintenance. NYTimes