Eric Chen, Cornell; NSF Data Management Plan Case Studies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
RDAP 16: I built it. They came. Now what? (Panel 2, Sustainability)
1. Research Data Management Service Group: Cross-institution response to NSF Data Management Plan. Eric Chen, Analyst Consultant for Data-Driven Science, Cornell University. March 31st, 2011
2. Responding to NSF DMP: Background; Cross-institution response; Current activities; Next steps
5. How to respond? Existing data management service providers: Cornell University Library; Center for Advanced Computing; Institutional Review Board; Office of Sponsored Programs; Vice Provost for Research; Cornell Inst. for Social & Econ. Research; DISCOVER Research Service Group; Cornell IT; Weill Cornell Medical College IT
6. RDMSG: Research Data Management Service Group (RDMSG); a virtual organization; comprises existing campus data management service providers
7. RDMSG (organization chart). Sponsors and advisors: Vice Provost for Research; University Librarian; Faculty Advisory Board. RDMSG Virtual Organization: Management Group (Management Council; Staff Coordinator); Implementation teams (Services assessment; Outreach and training; others as appropriate). Service Providers: CISER; CIT; CAC; CUL; others as appropriate
44. Information Sessions: 3 sessions about the NSF DMP requirement; “common-sense” interpretation; over 100 participants from 60 different campus affiliations; over 30 questions
45. Information Sessions: What counts as data? Does that include raw data? I publish my findings in journals, so I already share my data. How do we budget for data storage and access beyond the end of a grant? For how long should data be archived? What about the additional burden on reviewers? What if your research plans change so that the original DMP is no longer relevant?
Source material is from Gail Steinhart, who could not be here in person today. Outline: background (brief), why we’re doing this; Cornell context and the structure we came up with to support these requirements; early results; observations (challenges, questions).
Before the NSF DMP requirement was announced, a group of us had already been exploring how we could cooperate to support the needs of data-driven science (our other head start). The focus of the DRSG has been twofold: assessing researchers’ needs with respect to CI and data management (through a series of interviews), and running pilot projects on support for data-driven science. While not specifically focused on data management planning, it had already brought together many of the right groups on campus that would make up the cross-institution response.
NSF announced the DMP requirement, which led to the question of how to respond to researchers’ need to create a plan.
Like many of our peers, Cornell has many groups that provide data management services: central campus IT, research computing, and library computing, to name a few. The conversation started with the question of how a researcher would navigate the various service groups to create a data management plan. How do we present them with a single point of contact so they don’t have to navigate something that looks like this? At Cornell, this is what it could look like for a researcher attempting to piece together the services they need to develop a robust data management plan. (We have left out some smaller, more specialized service providers: SRI, CBSU, CCTEC…)
Solution: a new virtual organization (VO), meant to address two main issues: (1) the distribution of resources, making that appear seamless to researchers by providing a single point of contact for assistance with data management; and (2) identifying and filling gaps in services in a coordinated way. This comes out of a proposal we submitted to the Vice Provost for Research and the University Librarian; it is available on the website.
Here is an overview of the virtual organization. At the top we have the sponsors of the group, which include the Vice Provost for Research, the University Librarian, and the Faculty Advisory Board. The VO itself comprises a management group, made up of the management council (reps from major service providers; these are people who can allocate staff and resources to get work done) and a staff coordinator (that’s me!) to hold it all together. Implementation teams are charged by the management group to get work done for the RDMSG. And of course there are all of the existing service providers, which may do some work related to the RDMSG but also provide services outside of the RDMSG. One thing to note about this new organization is that there is no new money: participants have all accepted this structure and the work that comes with it as consistent with their mission and purpose.
Think of it as a concierge service for data management. Wikipedia, re: hotel concierges: “a concierge is often expected to "achieve the impossible", dealing with any request a guest may have, no matter how strange, relying on an extensive list of contacts with local merchants and service providers.”
Just to give you an idea of some of the things this new group has been working on. Survey: to better understand the potential impact on existing services, and to identify service gaps. Website (still early on that) and a single-point-of-contact email address. Small and growing pool of consultants to field help requests; these are people with particular subject or IT expertise. Emails to the ‘help’ email address generate a help ticket, and a triage process routes help requests to a qualified consultant. Ran three information sessions for prospective PIs and other interested staff; total attendance was over 100, with slides and video of one session on the web. Convened the faculty advisory board. Agreed to establish implementation teams to work in a handful of areas that support the RDMSG.
We haven’t had a chance to really look at all of the information yet, but can share a few preliminary results. Left: gives a sense of who responded to the survey. Right: significant interest in support for data management planning (and some uncertainty).
“Not sure” is the most consistent winner when PIs were asked which approach(es) they would use to share data.
One kind of thing we can get out of this exercise: most respondents weren’t sure about using the library’s institutional repository (IR); several commented that they don’t know what it is. Look at “yes” (blue) by size bins. The IR has a limit of ~50MB per upload, and without special intervention from a sysadmin, files are uploaded one at a time by the contributor. Several researchers are planning to use the IR for some sizable data collections, so we anticipate managing expectations with respect to services and redirecting users to more appropriate services (which could become a problem when redirecting from free to fee services).
FAQs (bold are big winners), persistently asked questions, and comments. A lot of basic questions and a lot of uncertainty.
Lack of detail: will NSF provide any additional guidance, feedback, examples? We have to advise without having seen good and bad DMPs. Would like to hear more on this from NSF (note directorate guidelines are often nearly as vague as the umbrella policy). Longer-term needs: it is a challenge for us to come up with business models / cost structures to support them. Princeton has one, but it addresses only simple storage, not preservation. Business models for preservation are an active area of research and complicated (see the blue ribbon panel report), so this is a tough problem. Interim solutions: good to have movement in this area; if it takes a policy/requirement to make that happen, that’s good. Have to balance requirements with good decision making (which can take some time). Concern: interim adoption of mediocre solutions, which can be difficult to migrate or reverse engineer.