The CONUL Research Group sponsored a workshop on the certification of digital repositories, focusing on the CoreTrustSeal framework. The workshop was presented by Dr John B Howard of University College Dublin, a founding member of the CoreTrustSeal Board of Directors.
The workshop reviewed the concept of "trust" in the context of data management and Open Science, identifying the stakeholders and the issues that make trustworthiness a significant concern for managers of digital repository services. It gave an overview of initiatives and services that provide a basis for the certification of digital repositories, including the European Framework for Audit & Certification of Digital Repositories.
The workshop focused on the CoreTrustSeal, a framework that merges the previously separate Data Seal of Approval and ICSU World Data System assessment and certification approaches. Attendees were introduced to the assessment process and requirements, including a walk-through of the 17 assessment categories in the CTS questionnaire.
Core Trust Seal for Trustworthy Data Repositories, 2018-04-19
1. CoreTrustSeal for
Trustworthy Data Repositories
John B Howard, CoreTrustSeal Board of Directors
University Librarian, University College Dublin
john.b.howard@ucd.ie
CTS Introduction and Workshop, Dublin, March X, 2018
2. Workshop Outline
• Trust: What does it mean, how does it apply?
• Who are the stakeholders, what are the issues?
• Repository certification:
• Certification frameworks
• European Framework for Audit & Certification of Digital
Repositories
• CoreTrustSeal
• Background
• Certification process
• Guidelines - a quick review
3. Perspectives on Trust
• Trust in common parlance
• Repository context
• May represent a technical matter of compliance (OAIS Reference
Model, or a certification framework)
• Transparency with regard to policy & operations
• Engagement with the designated community
• End-user context
• Good will, understanding the needs of the designated community;
value proposition; reputation
• Transparency
• Confidence, reliability, persistence/permanence
• Endorsement: peers within the designated community, certification
frameworks
5. Understanding stakeholder communities
• Focus on data with regard to Open Science, with
implications of scientific interest
• Most certified repositories hold observational,
experimental, or simulated data and data analytics
• Prevailing definitions of data open the door to
designated communities that are heterogeneous
• Current DataCite “resource types”:
Collection, Dataset, Event, Film, Image,
InteractiveResource, PhysicalObject, Service, Software,
Sound, Text
6. Certification affirms trustworthiness,
it does not bestow trust
• Forming a strategy to engender trust in repository
services
• Understand the OAIS model and how it applies to your
situation
• Identify your designated community (communities)
• Evaluate your actions as a repository: how do you add value,
support reliability
• Evaluate your mission with regard to the long-term usability
and cite-ability of assets held, be able to demonstrate
awareness of preservation challenges
• Evaluate the transparency of your documentation
9. Various approaches
• Task Force on Archiving of Digital Information (Commission on
Preservation and Access and Research Libraries Group (RLG), 2001)
• OCLC
• TRAC Trustworthy Repositories Audit & Certification
• Consultative Committee for Space Data Systems (CCSDS)
• RAC Repositories Audit & Certification
• Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE)
• DRAMBORA
• nestor AG Vertrauenswürdige Archive / Zertifizierung
• DIN 31644/nestor-Siegel
• Data Archiving and Networked Services (DANS)
• Data Seal of Approval
• Primary Trustworthy Digital Repository Authorisation Body (ISO-PTAB)
• ISO 16363
10. European Framework for Audit &
Certification of Digital Repositories (2010)
Three different tiers of certification processes that build
upon each other
http://www.trusteddigitalrepository.eu/Trusted%20Digital%20Repository.html
12. Formal certification:
ISO 16363
• “Trusted Digital Repository (TDR) Checklist”
• Based on Open Archival Information System (OAIS) and
Trusted Repository Audit and Certification (TRAC)
• Over 100 metrics
• Test audits 2011 by PTAB (Primary Trustworthy Digital
Repository Authorisation Body)
• Full external auditing process
• ISO 16919: Requirements for bodies providing audit
and certification of candidate trustworthy digital
repositories
http://www.iso16363.org/
15. Partnership
Goals
• Realizing efficiencies
• Simplifying assessment options
• Stimulating more certifications
Outcomes
• Common catalogue of requirements for core repository
assessment
• Common procedures for assessment
• One new certification body, replacing DSA and WDS
certification
• New, not-for-profit Stichting based in NL
18. Orientation to CTS Assessment
Key documents
• An Introduction to the Core Trustworthy Data
Repositories Requirements
• Core Trustworthy Data Repositories Requirements
• Glossary
• CoreTrustSeal Extended Guidance v1.0
See also:
Video capture of Extended Guidance Webinar
19. CoreTrustSeal requirements
• Organisational infrastructure (6)
• Digital object management (8)
• Technology (2)
• Endorsed RDA output
• EC recognition as an ICT technical specification
• Self assessment (publicly available)
• Peer review
• 3 year seal period
• Certification mandatory for regular members of WDS
20. Organisational infrastructure
• R1. The repository has an explicit mission to provide access to and preserve
data in its domain.
• R2. The repository maintains all applicable licenses covering data access and
use and monitors compliance.
• R3. The repository has a continuity plan to ensure ongoing access to and
preservation of its holdings.
• R4. The repository ensures, to the extent possible, that data are created,
curated, accessed, and used in compliance with disciplinary and ethical
norms.
• R5. The repository has adequate funding and sufficient numbers of qualified
staff managed through a clear system of governance to effectively carry out
the mission.
• R6. The repository adopts mechanism(s) to secure ongoing expert guidance
and feedback (either in-house, or external, including scientific guidance, if
relevant).
21. Digital object management
• R7. The repository guarantees the integrity and authenticity of the data.
• R8. The repository accepts data and metadata based on defined criteria to ensure
relevance and understandability for data users.
• R9. The repository applies documented processes and procedures in managing
archival storage of the data.
• R10. The repository assumes responsibility for long-term preservation and
manages this function in a planned and documented way.
23. Certification procedure
• Self assessment based on requirements
• Online tool available
• Extended guidance (document and webinar)
• Review of the self assessment by two reviewers under the responsibility
of the CTS Board
• URLs to evidence strongly encouraged
• Maturity ratings strongly encouraged
• Responses in English
• Assessments to be publicly available
• Assessment fee €1,000
• Renewal every 3 years
30. 0. Context
• Repository Type
• Comments should explain which roles are being fulfilled.
• Brief Description of the Repository’s Designated Community
• Definition of the Designated Community is important here (see
glossary): narrow, broad?
• Where does your data come from?
• Level of Curation Performed
A. Content distributed as deposited
B. Basic curation – e.g., brief checking, addition of basic metadata or documentation
C. Enhanced curation – e.g., conversion to new formats, enhancement of
documentation
D. Data-level curation – as in C above, but with additional editing of deposited data
for accuracy
• Outsource Partners.
• A diagram could be useful here.
31. I. Mission/Scope
R1. The repository has an explicit mission to provide access
to and preserve data in its domain.
Repositories take responsibility for stewardship of digital objects, and for
ensuring that materials are held in the appropriate environment for
appropriate periods of time. Depositors and users must be clear that
preservation of and continued access to the data is an explicit role of the
repository.
For this Requirement, please describe:
• Your organization’s mission in preserving and providing access to data, and
include links to explicit [public] statements of this mission.
• The level of approval within the organisation that such a mission
statement has received (e.g., approved public statement, roles mandated
by funders, policy statement signed off by governing board).
• If data management is not referred to in the mission statement, then this
requirement, as a rule, cannot have a compliance level of 3 or higher.
32. II. Licenses
R2. The repository maintains all applicable licenses covering
data access and use and monitors compliance.
• Access and use conditions could be set differently: either as standard terms and
conditions, or as differentiated for particular depositors or datasets. These could
cover the level of curation, the liability level, the level of responsibility
taken for the data, limitations on use, limits on usage environment (safe room,
secure remote access), limits on types of users (approved researcher, has received
training, etc.).
• The consequences if noncompliance is detected (e.g., sanctions on current or
future access/use of data) should be made clear. Ideally, repositories should have
a public policy in place for noncompliance.
• The minimum compliance level should be 4, if the applicant is currently providing
access to data.
33. III. Continuity of access
R3. The repository has a continuity plan to ensure ongoing
access to and preservation of its holdings.
The level of responsibility for data should be indicated in
the evidence.
This information helps the reviewer to judge whether the
organisation is sustainable in terms of its finances and
processes; in particular the continuity of its collections and
responsibilities in the case of cessation of funding. The
responsibility for sustainability may not lie in the hands of
the repository itself, but a higher, overarching (or umbrella)
organisation.
34. IV. Confidentiality/Ethics
R4. The repository ensures, to the extent possible, that data are
created, curated, accessed, and used in compliance with
disciplinary and ethical norms.
• All organisations responsible for data have an ethical duty to manage them to the
level expected by the scientific practice of their designated community. For
repositories holding data about individuals, businesses, or other organisations, there
are furthermore obligations that the rights of the data subjects will
be protected. These will be both of a legal and ethical nature.
• Disclosure of these data could also present a risk of personal harm, a breach of
commercial confidentiality, or the release of critical information (e.g., the location of
protected species or an archaeological site).
• Reviewers expect to see evidence that the applicant understands their legal
environment and the relevant ethical practices, and has documented procedures.
• Minimum compliance level should be a 4 if the repository is currently providing
access to personal data.
35. V. Organizational infrastructure
R5. The repository has adequate funding and sufficient numbers
of qualified staff managed through a clear system of
governance to effectively carry out the mission.
• The repository is hosted by a recognized institution (ensuring long-term stability and
sustainability) appropriate to its Designated Community.
• The repository has sufficient funding, including staff resources, IT resources, and a budget for
attending meetings when necessary. Ideally this should be for a three- to five-year period.
• The repository ensures that its staff have access to ongoing training and professional
development.
• The range and depth of expertise of both the organization and its staff, including any relevant
affiliations (e.g., national or international bodies), is appropriate to the mission. The description
of this requirement should contain evidence describing the organisation’s governance/
management decision making processes, and the entities involved. Staff should have appropriate
training in data management to ensure consistent quality standards.
• To what degree is funding structural or project-based? Can this be expressed in FTE numbers?
• How often does periodic renewal occur?
36. VI. Expert guidance
R6. The repository adopts mechanism(s) to secure ongoing expert
guidance and feedback (either in-house, or external, including
scientific guidance, if relevant).
• Does the repository have in-house advisers, or an external advisory
committee that might be populated with technical members, data science
experts, and disciplinary experts?
• How does the repository communicate with the experts for advice?
• How does the repository communicate with its Designated Community for
feedback?
• This Requirement seeks to confirm that the repository has access to
objective expert advice beyond that provided by skilled staff mentioned in
R5 (Organisational infrastructure).
37. VII. Data integrity and authenticity
R7. The repository guarantees the integrity and
authenticity of the data.
• A clear and complete context section is important for all
requirements, but this is particularly the case for this long
requirement 7. The organisation of the curation and the types of
data will help guide the reviewer’s expectations. The reviewer would
benefit from a clear overview of the processes and tools used to
curate the data, including the level of manual and automated
practice, and how the processes, tools and practices are
documented. It would be most useful for the applicant to respond to
each bullet point separately and to address integrity and
authenticity independently, as defined in the requirement.
• Audit trails (written evidence of which actions have been performed
on the data) should be elaborated on in the evidence.
38. VIII. Appraisal
R8. The repository accepts data and metadata based
on defined criteria to ensure relevance and
understandability for data users.
The applicant should be able to demonstrate that
procedures are in place to ensure that only data
appropriate to the collection policy are accepted and that
they have all the necessary information and procedures and
skills to ensure long term preservation and use relevant for
the designated community.
39. IX. Documented storage procedures
R9. The repository applies documented processes and
procedures in managing archival storage of the data.
• The reviewer will be looking to understand each of the storage
locations which support curation processes, how data are
appropriately managed in each environment and that
processes are in place to monitor and manage change to
storage documentation.
• Can the repository recover from short-term disasters?
• Are procedures documented and standardised in such a way
that different data managers, while performing the same tasks
separately, will arrive at substantially the same outcome?
40. X. Preservation plan
R10. The repository assumes responsibility for long-term
preservation and manages this function in a planned and
documented way.
The reviewer will be looking for clear managed documentation to
ensure (1) a managed approach to long term preservation, (2)
continued access for data types despite format changes, and (3)
sufficient documentation to support usability by the
designated community.
The preservation plan should be managed to ensure that changes
to data technology and user requirements are handled in a stable
and timely manner.
41. XI. Data quality
R11. The repository has appropriate expertise to address technical data and
metadata quality and ensures that sufficient information is available for end
users to make quality related evaluations.
• The applicant should make clear in its statements that it
understands the quality levels which can reasonably be
expected from depositors. These statements should describe the quality
assurance and improvement it will undertake during
curation and the quality expectations of users, which may
involve documentation of areas where quality thresholds
have not been met.
42. XII. Workflows
R12. Archiving takes place according to defined
workflows from ingest to dissemination.
The reviewer is looking for evidence that the applicant takes
a consistent, rigorous, documented approach to managing
its activities throughout their processes and that changes to
those processes are appropriately evaluated, documented,
managed and implemented.
43. XIII. Data discovery and identification
R13. The repository enables users to discover the data and
refer to them in a persistent way through proper citation.
This should contain evidence that the curation of data and
metadata is designed to support resource discovery of
clearly defined and identified digital objects. It should be
clear to the users of this data how it must be cited to provide
appropriate academic credit and linkage between related
research.
44. XIV. Data reuse
R14. The repository enables reuse of the data over time,
ensuring that appropriate metadata are available to support
the understanding and use of the data.
The applicant should understand the needs of the designated
community in terms of their research practices, technical
environment, and the standards used. Changes in technology are
important, but appropriate high quality metadata should also play an
essential role and should be mentioned in the evidence provided.
The latter information is critical to designing curation processes which
result in digital objects that meet the needs of the end user as well
as generic or disciplinary standards.
45. XV. Technical infrastructure
R15. The repository functions on well supported operating systems and
other core infrastructural software and is using hardware and software
technologies appropriate to the services it provides to its Designated
Community.
The workflows and human actors providing repository services must be
supported by a technological infrastructure. If possible this should be
demonstrated by using a reference model.
The reviewer is looking for evidence that the applicant understands the wider
ecosystem of standards, tools and technologies available for (research) data
management and curation and has selected options which align with local
requirements.
46. XVI. Security
R16. The technical infrastructure of the repository provides
for protection of the facility and its data, products, services,
and users.
• The applicant should understand the technical risks applicable to its
particular service, data, user and physical environment, and should have
mechanisms in place to respond to incidents.
• Evidence must focus on technical infrastructure rather than on
managerial and procedural aspects of business continuity.
• In what way is the technical infrastructure determined by the
repository or by their host/outsource institution?