ESIP Federation: Community-Driven, Collaborative Governance
Carol Beaton Meyer
Presentation at Research Data Access & Preservation Summit
22 March 2012
2. About ESIP
Formed in 1998 by NASA
~140 Members (2012)
Representing multiple agencies, sectors and science domains
Forum for Practitioners to Exchange Knowledge, Share
Technologies and Collaborate on Research
ESIP Ethos
Community-Driven (self-forming groups)
Open
Collaborative
Participatory
Innovative
3. What Drives ESIP Community?
Desire for Interoperability
Best Practices (from consensus)
Leverage Expertise and Resources of Others
4. Community-Driven Data Management
Case Study 1: Data Identifiers
Used Testbed to Evaluate Various Identifier Schemes
DOI, LSID, OID, PURL, ARK, UUID, XRI, Handles, URI/URN/URL
Evaluated each against set criteria
http://bit.ly/wSJFCA
Governance:
Community-defined problem
Developed review criteria
Each identifier reviewed against established criteria
Community analysis of results
Published Results in Journal of Earth Science Informatics, Volume 4,
Number 3, 139-160, DOI: 10.1007/s12145-011-0083-6
Implementation/Additional Testing Through California Digital Library
5. Community-Driven Data Management
Case Study 2: Data Citation Guidelines
Citation Guidelines for Data Providers
Governance
Community-need identified
Used existing best practices (IPY Citation work) as baseline
Iterative process within Data Preservation & Stewardship Cluster
Broader ESIP community review (input sought & provided)
Guidelines (best practices) adopted by ESIP Assembly, January 2012
Core Elements
Author(s)
Release Date
Title
Version
Archive and/or Distributor
Locator/Identifier
Access Date and Time
Full Guidelines: http://bit.ly/q0sz80
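The core elements above map naturally onto a small record type. The following is a minimal sketch in Python, not the format prescribed by the guidelines: the field names, the element order, the punctuation, and the example values (authors, title, archive, DOI) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class DataCitation:
    """One field per core element listed on the slide (names are assumptions)."""
    authors: List[str]      # Author(s)
    release_date: str       # Release Date
    title: str              # Title
    version: str            # Version
    archive: str            # Archive and/or Distributor
    locator: str            # Locator/Identifier
    accessed: datetime      # Access Date and Time

    def format(self) -> str:
        # Join authors and drop a trailing period so the separator
        # added below does not produce a double period.
        author_part = "; ".join(self.authors).rstrip(".")
        accessed_part = self.accessed.strftime("%Y-%m-%d %H:%M")
        # Ordering and punctuation here are illustrative assumptions.
        return (
            f"{author_part}. {self.release_date}. {self.title}. "
            f"Version {self.version}. {self.archive}. {self.locator}. "
            f"Accessed {accessed_part}."
        )


# Hypothetical example values; none of these refer to a real data set.
example = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    release_date="2011",
    title="Example Sea Ice Extent",
    version="2.1",
    archive="Example Data Center",
    locator="https://doi.org/10.0000/example",
    accessed=datetime(2012, 3, 22, 14, 30),
)
print(example.format())
```

Keeping the access date as a full timestamp rather than a date string reflects the slide's "Access Date and Time" element; a provider could choose a coarser resolution.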
6. Community-Driven Data Management
Case Study 3: Discovery Conventions
Data centers using different discovery services
OpenSearch, DataCasting, ServiceCasting Services
Issues: interoperability, differing standards, distributed orgs
Goal: develop usable and simple solutions that leverage
existing standards, conventions and technologies, and that have a
high likelihood of voluntary adoption
Governance – adopted by community (bottom up approach)
Submit
Review
Revisions
Vote
Ratify/Reject
Recommendations for Adoption
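The governance steps above can be read as a simple state machine. The sketch below is one possible model in Python, under stated assumptions: the stage names, the `Proposal` class, and the allowed transitions (in particular the loop from revision back to review) are hypothetical and are not taken from ESIP's actual process documents.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"
    IN_REVIEW = "in review"
    REVISED = "revised"
    VOTING = "voting"
    RATIFIED = "ratified"
    REJECTED = "rejected"
    RECOMMENDED = "recommended for adoption"


# Allowed transitions between stages (assumed, for illustration only).
ALLOWED = {
    Stage.SUBMITTED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.REVISED},
    # Assumption: a revised proposal may go back to review or on to a vote.
    Stage.REVISED: {Stage.IN_REVIEW, Stage.VOTING},
    Stage.VOTING: {Stage.RATIFIED, Stage.REJECTED},
    Stage.RATIFIED: {Stage.RECOMMENDED},
    Stage.REJECTED: set(),
    Stage.RECOMMENDED: set(),
}


class Proposal:
    """Tracks a hypothetical convention proposal through the stages."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.SUBMITTED
        self.history = [Stage.SUBMITTED]

    def advance(self, new_stage: Stage) -> None:
        # Refuse any step that skips or reorders the process.
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage
        self.history.append(new_stage)


proposal = Proposal("Hypothetical discovery convention")
for stage in (Stage.IN_REVIEW, Stage.REVISED, Stage.VOTING,
              Stage.RATIFIED, Stage.RECOMMENDED):
    proposal.advance(stage)
print([s.value for s in proposal.history])
```

Recording the full history, rather than only the current stage, mirrors the bottom-up approach in which each step of a proposal remains visible to the community.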
7. ESIP Community Resources
ESIP Governance -
wiki.esipfed.org/index.php/Federation_Documents
Wiki Workspace – wiki.esipfed.org
ESIP Commons – coming Spring 2012
Next ESIP Meetings
July 17-20, 2012 in Madison, Wisconsin
January 9-11, 2013 in Washington, DC
Join the community discussions
http://esipfed.org/collaboration-areas
10. Community-Driven Data Management
Case Study: Data Management Short Course
Two-year Volunteer Effort
Phase 1 – aimed at scientists
Phase 2 – aimed at data managers
Drew expertise from ESIP community
Course Outline
The case for data stewardship
Data management plans
Local data management
Preservation strategies
Responsible data use
Governance
Community-identified need/opportunity
Trial workshop at 2010 AGU
Defined scope of content
Volunteers drafted short modules
Peer review and editorial review
http://bit.ly/pTpe1k
11. Community-Driven Data Management
Case Study: Data Stewardship Principles
Preservation and Stewardship Cluster Considered:
Existing member data policies (NASA, NOAA, etc.)
Other organizations’ policies (CODATA, GEO, etc.)
Data Creators, Data Intermediaries and Data Users
Consensus Document
Home institution policies supersede
Room for commercialization
Adopted in January 2012
http://bit.ly/xOWS7e
Editor's Notes
Funders: Principally NASA & NOAA, some EPA, a little from NSF thru EarthCube
Started out with 24 members
ESIP provides:
Forum for open, science data-centric community collaboration
Voluntary participation
Community of Practice – practitioners share expertise & technologies
Trusted authority, built by the community
Infrastructure for community collaboration
Collaborative workspace on web (Drupal, wiki, listservs)
Communications (coordinated, ad hoc, open)
Governance
Formal (constitution, bylaws, strategic plan)
Informal (cluster-based governance for consensus building)
Ethos: Results Network effect – productive connections made thru ESIP that likely would not have been made
ESIP provides community coordination to support interoperability at the data, systems, human and organization level. ESIP works through informal and formal structures, depending on what’s needed at a given moment
Community-driven activity of the ESIP Federation’s Data Preservation and Stewardship Cluster
Different parts of the community were using different identifiers for their data & community wanted to know which identifiers worked best for different data types.
Criteria:
Technical Value - Scalability, Security, Standards, Interoperable, Compatible with Naming Conventions, Require a registry?, Dependence on a naming authority (longevity of naming authority institution), Longevity of technology used
User Value - Will publishers allow it in citation?, Does identifier have any additional trust value?, Does the identifier have meaning? (Should identifiers be transparent or opaque?)
Archival Value - How maintainable is the identification scheme when data migrates from one archive to another?, Cost associated with identifier?, Does the identification scheme handle data that is not on the web? What about physical objects?
Looked at:
Using DOIs for Entire Datasets (Results: http://bit.ly/zCY24T)
Using DOIs for Components Within Datasets (Results: http://bit.ly/yZeIbT)
Purposes of data citation:
To aid scientific reproducibility through direct, unambiguous reference to the precise data used in a particular study. (This is the paramount purpose and also the hardest to achieve.)
To provide fair credit for data creators or authors, data stewards, and other critical people in the data production and curation process.
To ensure scientific transparency and reasonable accountability for authors and stewards.
To aid in tracking the impact of a data set and the associated data center through reference in scientific literature.
To help data authors verify how their data are being used.
To help future data users identify how others have used the data.
Elements:
Author(s)--the people or organizations responsible for the intellectual work to develop the data set. The data creators.
Release Date--when the particular version of the data set was first made available for use (and potential citation) by others.
Title--the formal title of the data set.
Version--the precise version of the data used. Careful version tracking is critical to accurate citation.
Archive and/or Distributor--the organization distributing or caring for the data, ideally over the long term.
Locator/Identifier--this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
Access Date and Time--because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
Submission of new proposals
Forum to review proposals
Author revision based on feedback
Voting on change proposals
Ratification or rejection by editors
To maintain an open community process, all steps are posted to the mailing list and/or wiki.
Funded by NOAA
Funded by NOAA
Responsive to the perception that there was little training for scientists who have to do data management