The Right Combination:
Using DDI and PREMIS for data
preservation

Parul Sharma & Sally Vermaaten




                                 March 2012
Outline

1. The context: drivers for preservation
2. The problem: challenges faced when trying to re-use data
3. Our solution: metadata for data management & preservation
4. Our recommendations: strategies for making the right metadata choices

1. THE CONTEXT: DRIVERS FOR PRESERVATION

Data is a cross-domain concern

Geospatial data
Scientific data
Statistical data
Financial and commercial data

There are many drivers for data preservation

Legal mandates
Verification
Uniqueness of data
Cost of data collection
Data re-use

An example of data re-use at Statistics New Zealand

2. THE PROBLEM: CHALLENGES FACED WHEN TRYING TO RE-USE DATA

Common challenges
to re-use/preservation of any type of digital object

I can’t find it
I can’t open it (wrong hardware/software)
I’m not sure it is the right thing
Unique challenges
to re-use/preservation of structured data



I’m not sure it is the authoritative data
I don’t understand the meaning of the data - data is not self-descriptive
I can’t use the data because I can’t harmonise it with other data

3. OUR SOLUTION: METADATA FOR DATA MANAGEMENT & PRESERVATION

Our solutions

I can’t find the data (common)
   • Have subject experts record locations
   • Archivists put it in a safe place

I can’t open the data (common)
   • Archivists monitor file formats

I’m not sure it’s the right thing / it’s the authoritative data (particularly hard with data)
   • Have subject experts identify key datasets
   • Subject experts & archivists capture what has happened to the data

I don’t understand the meaning of the data (particularly hard with data)
   • Have subject experts capture important metadata
   • Archivists capture or QA metadata

I can’t reuse the data because it’s not harmonised (unique to data)
   • Archivists and subject experts capture detailed metadata
   • Tools to create more standardised data

To support these processes…
Metadata is key
We could invent our own standard for recording metadata, but there is a better way…

How?

Data Documentation Initiative (DDI): Describe!
+ Dublin Core: Discover!
+ PREservation Metadata: Implementation Strategies (PREMIS): Preserve!

Comparison of standards coverage

Dublin Core
   • Discovery information about a resource (e.g. Title, Creator, Publication date)

DDI
   • Surveys and outputs (Series and Studies)
   • Methodology & quality information
   • Classifications used
   • Dataset descriptions
   • Variables used
   • Links to documentation

PREMIS
   • Objects (significant characteristics, checksums, basic identifying information)
   • Events (preservation actions)
   • Agents
   • Rights

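The PREMIS coverage listed above includes checksums on Objects and a record of preservation Events. The sketch below illustrates that idea with plain Python dictionaries rather than real PREMIS XML. The function names and dictionary keys are hypothetical and are not PREMIS element names, and the sample bytes are invented.

```python
import hashlib
from datetime import datetime, timezone


def fixity_record(data: bytes, identifier: str) -> dict:
    """Describe an object by identifier, size and SHA-256 checksum."""
    return {
        "identifier": identifier,
        "size_bytes": len(data),
        "checksum_algorithm": "SHA-256",
        "checksum": hashlib.sha256(data).hexdigest(),
    }


def fixity_check_event(record: dict, data: bytes) -> dict:
    """Re-compute the checksum and log the outcome as an event."""
    current = hashlib.sha256(data).hexdigest()
    return {
        "event_type": "fixity check",
        "object": record["identifier"],
        "date": datetime.now(timezone.utc).isoformat(),
        "outcome": "pass" if current == record["checksum"] else "fail",
    }


original = b"id,value\n1,42\n"  # invented sample content
record = fixity_record(original, "Data_1500/ArchiveMaster/Data/data.csv")
print(fixity_check_event(record, original)["outcome"])         # unchanged bytes
print(fixity_check_event(record, original + b"x")["outcome"])  # altered bytes
```

Keeping the checksum with the object and the check result as a separate event is one way to answer "what has happened to the data" later on.
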
Metadata to support re-use

Supporting standards: DDI, PREMIS

I can’t find the data
   • Have subject experts record locations
   • Archivists put it in a safe place

I’m not sure it’s the authoritative data
   • Have subject experts identify key datasets
   • Subject experts & archivists capture what has happened to the data

I don’t understand the meaning of the data
   • Have subject experts capture important metadata
   • Archivists capture or QA metadata

I can’t open the data
   • Archivists monitor file formats

I can’t reuse the data because it’s not harmonised
   • Archivists and subject experts capture detailed metadata
   • Tools to create more standardised data

4. OUR RECOMMENDATIONS: STRATEGIES FOR MAKING THE RIGHT METADATA CHOICES

Metadata Top Tips

1. Create structures that will allow you to re-use metadata tools
2. Use standards that are fit for your content so users can re-use
3. Consider overlap between standards so you’re using the right standard for the right job
4. Provide standards-based tools and capture at point of creation to improve quality and efficiency

1. Create structures that will allow you to re-use metadata tools

Set yourself up to be able to use the same tools to harvest and mine your metadata (e.g. handy reports, searching across content types) by:
    – developing a standard structure that can support all your content types
    – and recording generic information in generic metadata standards

Data_1500
   DublinCore.xml                  (non-format specific metadata)
   PREMIS.xml                      (non-format specific metadata)
   Original
      data.sas7bdat
      questionnaire.doc
   ArchiveMaster                   (format specific structure & metadata)
      Data
         data.csv
      Documentation
         questionnaire.pdf
      Metadata
         DDI.xml

Database_0120
   DublinCore.xml                  (non-format specific metadata)
   PREMIS.xml                      (non-format specific metadata)
   Original
      database.mdb
   ArchiveMaster                   (format specific structure & metadata)
      Header
         metadata.xsd
         metadata.xml
      Content
         Schema1
            Table1
               table.xsd
               table.xml

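Because every package in the layout above carries the same DublinCore.xml and PREMIS.xml at its top level, one tool can harvest across content types. The sketch below assumes that layout and uses only the Python standard library. The `<metadata>` wrapper element, the function names and the demo titles are invented for illustration, and the PREMIS file is an empty placeholder.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
GENERIC_FILES = ("DublinCore.xml", "PREMIS.xml")


def harvest_titles(archive_root: Path) -> dict:
    """Map each complete package name to the dc:title values it records."""
    titles = {}
    for package in sorted(p for p in archive_root.iterdir() if p.is_dir()):
        # Skip packages that lack either generic metadata file.
        if not all((package / name).is_file() for name in GENERIC_FILES):
            continue
        tree = ET.parse(str(package / "DublinCore.xml"))
        titles[package.name] = [
            el.text for el in tree.iter(f"{{{DC_NS}}}title")
        ]
    return titles


def make_demo_package(root: Path, name: str, title: str) -> None:
    """Write a toy package; the XML content is invented for this demo."""
    package = root / name
    package.mkdir()
    (package / "DublinCore.xml").write_text(
        f'<metadata xmlns:dc="{DC_NS}"><dc:title>{title}</dc:title></metadata>',
        encoding="utf-8",
    )
    (package / "PREMIS.xml").write_text("<placeholder/>", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_demo_package(root, "Data_1500", "Example survey dataset")
    make_demo_package(root, "Database_0120", "Example database")
    harvested = harvest_titles(root)
    print(harvested)
```

The harvester never looks inside ArchiveMaster, so it works the same way for a statistical dataset and a database.
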
2. Use standards that are fit for your content so users can re-use

Enable future re-use and understanding by recording format- or content-specific metadata in fit-for-purpose standards, e.g.
       DDI for statistical data
       SIARD for databases
       MIX for images

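As a rough sketch of how this choice could be recorded in a tool, the lookup below pairs the three content types named above with their standards and always adds the generic Dublin Core and PREMIS files. The names `STANDARD_FOR_CONTENT`, `GENERIC_STANDARDS` and `standards_for` are hypothetical, and the content-type labels are assumptions.

```python
# Content type -> fit-for-purpose standard, as listed on the slide.
STANDARD_FOR_CONTENT = {
    "statistical data": "DDI",
    "database": "SIARD",
    "image": "MIX",
}

# Generic standards recorded for every package, whatever its content.
GENERIC_STANDARDS = ["Dublin Core", "PREMIS"]


def standards_for(content_type: str) -> list:
    """Return the generic standards plus any content-specific one."""
    specific = STANDARD_FOR_CONTENT.get(content_type.lower())
    return GENERIC_STANDARDS + ([specific] if specific else [])


print(standards_for("Statistical data"))
print(standards_for("audio"))  # no content-specific standard listed
```
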
3. Consider overlap between standards so you’re using the right standard for the right job

Information           DDI                   PREMIS            Dublin Core    Useful to duplicate?
Basic identifying     • Title                                 • Title        Yes
information           • Creator                               • Creator
                      • PublicationDate                       • Date
                      • ID                                    • Identifier
Access information    • Access Conditions   • Rights entity   • Rights       No: PREMIS is the most
                                                                             expressive and generic
                                                                             location

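Where duplication is useful, as for basic identifying information, the overlap can be written down as a simple crosswalk. The sketch below uses the field labels from the table above, which are not guaranteed to match element names in either schema. The function name and sample values are invented.

```python
# DDI label -> Dublin Core label, following the "basic identifying
# information" row of the table. Labels are the slide's, not schema names.
DDI_TO_DUBLIN_CORE = {
    "Title": "Title",
    "Creator": "Creator",
    "PublicationDate": "Date",
    "ID": "Identifier",
}


def to_dublin_core(ddi_fields: dict) -> dict:
    """Copy only the overlapping fields into a Dublin Core style record."""
    return {
        DDI_TO_DUBLIN_CORE[name]: value
        for name, value in ddi_fields.items()
        if name in DDI_TO_DUBLIN_CORE
    }


ddi_record = {
    "Title": "Example survey",          # invented sample values
    "PublicationDate": "2012-03",
    "Access Conditions": "restricted",  # not duplicated; the table points to PREMIS
}
print(to_dublin_core(ddi_record))
```

Access information is deliberately left out of the crosswalk, matching the "No" in the last column.
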
4. Provide standards-based tools and capture at point of creation to improve quality and efficiency

At first, you may need to capture or collate all metadata about data yourself
Think ahead about tools you might be able to provide to data experts to allow them to record the information directly in the standard if possible

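A very small capture tool might look like the sketch below: it takes answers from a data expert, checks that the basic identifying fields are present, and writes them straight into Dublin Core elements. This is an illustration only. The `<metadata>` wrapper element, the choice of required fields and the function name are assumptions, and a real tool would follow whatever profile the organisation has agreed.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
REQUIRED = ("title", "creator", "date", "identifier")


def capture_dublin_core(answers: dict) -> str:
    """Validate a data expert's answers and serialise them as XML."""
    missing = [name for name in REQUIRED if not answers.get(name)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in REQUIRED:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = answers[name]
    return ET.tostring(root, encoding="unicode")


xml_text = capture_dublin_core({
    "title": "Example survey",       # invented sample values
    "creator": "Example unit",
    "date": "2012-03",
    "identifier": "Data_1500",
})
print(xml_text)
```

Rejecting incomplete input at the point of creation is cheaper than having an archivist chase the missing details later.
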
Takeaways

1. Organisations have many reasons to re-use data over time
2. There are unique challenges to preserving data
3. Where possible, save yourself some work and make your metadata more harvestable and data more understandable by using international standards like DDI and PREMIS
4. When you use metadata standards like DDI and PREMIS together:
   • create generic structures
   • use fit-for-purpose standards for specific content
   • consider information overlap
   • ‘delegate’ metadata capture where possible

Thanks!

Sally Vermaaten sally.vermaaten@stats.govt.nz
Parul Sharma parul.sharma@stats.govt.nz

Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Parul Sharma Sally Vermaaten Right Combination

  • 1. The Right Combination: Using DDI and PREMIS for data preservation Parul Sharma & Sally Vermaaten March 2012
  • 2. Outline
      1. The context – drivers for preservation
      2. The problem – challenges faced when trying to re-use data
      3. Our solution – metadata for data management & preservation
      4. Our recommendations – strategies for making the right metadata choices
  • 3. 1. THE CONTEXT: DRIVERS FOR PRESERVATION
  • 4. Data is a cross-domain concern: geospatial data, scientific data, statistical data, financial and commercial data
  • 5. There are many drivers for data preservation: legal mandates, verification, uniqueness of data, cost of data collection, data re-use
  • 6. An example of data re-use at Statistics New Zealand
  • 8. Common challenges to re-use/preservation of any type of digital object
  • 9. Common challenges to re-use/preservation of any type of digital object
      – I can’t find it
      – I can’t open it (wrong hardware/software)
      – I’m not sure it is the right thing
  • 11. Unique challenges to re-use/preservation of structured data
      – I’m not sure it is the authoritative data
      – I don’t understand the meaning of the data - data is not self-descriptive
      – I can’t use the data because I can’t harmonize it with other data
  • 12. 3. OUR SOLUTION: METADATA FOR DATA MANAGEMENT & PRESERVATION
  • 13–17. Our solutions (the same grid, filled in one row at a time across slides 13 to 17)
      – I can’t find the data (common): have subject experts record locations; archivists put it in a safe place
      – I can’t open the data (common): archivists monitor file formats
      – I’m not sure it’s the right thing / it’s the authoritative data (particularly hard with data): have subject experts identify key datasets; subject experts & archivists capture what has happened to the data
      – I don’t understand the meaning of the data (particularly hard with data): have subject experts capture important data; archivists capture or QA metadata
      – I can’t reuse the data because it’s not harmonised (unique to data): archivists and subject experts capture detailed metadata; tools to create more standardised data
  • 18. To support these processes… metadata is key. We could invent our own standard for recording metadata, but there is a better way…
  • 19. How?
      – Dublin Core: Discover!
      – Data Documentation Initiative (DDI): Describe!
      – PREservation Metadata: Implementation Strategies (PREMIS): Preserve!
  • 20. Comparison of standards coverage
      – Dublin Core: discovery information about a resource (e.g. Title, Creator, Publication date)
      – DDI: surveys and outputs (Series and Studies); methodology & quality information; classifications used; dataset descriptions; variables used; links to documentation
      – PREMIS: Objects (significant characteristics, checksums, basic identifying information); Events (preservation actions); Agents; Rights
  • 21. Metadata to support re-use (standards labelled on the slide: DDI, PREMIS)
      – I can’t find the data: have subject experts record locations; archivists put it in a safe place
      – I’m not sure it’s the authoritative data: have subject experts identify key datasets; subject experts & archivists capture what has happened to the data
      – I don’t understand the meaning of the data: have subject experts capture important metadata; archivists capture or QA metadata
      – I can’t open the data: archivists monitor file formats
      – I can’t reuse the data because it’s not harmonised: archivists and subject experts capture detailed metadata; tools to create more standardised data
  • 22. 4. OUR RECOMMENDATIONS: STRATEGIES FOR MAKING THE RIGHT METADATA CHOICES
  • 23. Metadata Top Tips
      1. Create structures that will allow you to re-use metadata tools
      2. Use standards that are fit for your content so users can re-use
      3. Consider overlap between standards so you’re using the right standard for the right job
      4. Provide standards-based tools and capture at point of creation to improve quality and efficiency
  • 24. 1. Create structures that will allow you to re-use metadata tools. Set yourself up to be able to use the same tools to harvest and mine your metadata (e.g. handy reports, searching across content types) by:
      – developing a standard structure that can support all your content types
      – and recording generic information in generic metadata standards
  • 25. Data_1500
          DublinCore.xml, PREMIS.xml (non-format-specific metadata)
          Original: data.sas7bdat, questionnaire.doc
          ArchiveMaster (format-specific structure & metadata)
              Data: data.csv
              Documentation: questionnaire.pdf
              Metadata: DDI.xml
       Database_0120
          DublinCore.xml, PREMIS.xml (non-format-specific metadata)
          Original: database.mdb
          ArchiveMaster (format-specific structure & metadata)
              Header: metadata.xsd, metadata.xml
              Content: Schema1, Table1: table.xsd, table.xml
  • 26. 2. Use standards that are fit for your content so users can re-use. Enable future re-use and understanding by recording format or content-specific metadata in fit-for-purpose standards, e.g.
      – DDI for statistical data
      – SIARD for databases
      – MIX for images
  • 27. 3. Consider overlap between standards so you’re using the right standard for the right job
      – Basic identifying information
          DDI: Title, Creator, PublicationDate, ID
          Dublin Core: Title, Creator, Date, Identifier
          Useful to duplicate? Yes
      – Access information
          DDI: Access Conditions
          PREMIS: Rights entity
          Dublin Core: Rights
          Useful to duplicate? No – PREMIS is the most expressive and generic location
  • 28. 4. Provide standards-based tools and capture at point of creation to improve quality and efficiency. At first, you may need to capture or collate all metadata about data yourself. Think ahead about tools you might be able to provide to data experts to allow them to record the information directly in the standard if possible.
  • 30. Takeaways
      1. Organisations have many reasons to re-use data over time
      2. There are unique challenges to preserving data
      3. Where possible, save yourself some work and make your metadata more harvestable and data more understandable by using international standards like DDI and PREMIS
      4. When you use metadata standards like DDI and PREMIS together:
          • create generic structures
          • use fit-for-purpose standards for specific content
          • consider information overlap
          • ‘delegate’ metadata capture where possible
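
Tip 1 and the package layout on slide 25 suggest why a generic structure pays off: if every package keeps its generic metadata in the same place, one small tool can harvest it across content types. The following is a minimal sketch in Python, not the Statistics NZ implementation. It assumes each package folder holds a DublinCore.xml whose elements use the standard Dublin Core element namespace; the wrapper element name `metadata`, the function names and the sample titles are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"


def harvest_titles(archive_root):
    """Return {package folder name: dc:title} for every package with a DublinCore.xml."""
    titles = {}
    for dc_file in sorted(Path(archive_root).glob("*/DublinCore.xml")):
        root = ET.parse(str(dc_file)).getroot()
        # Assumption: dc:title is a direct child of the (hypothetical) wrapper element
        titles[dc_file.parent.name] = root.findtext(f"{{{DC_NS}}}title")
    return titles


def write_demo_package(archive_root, package_name, title):
    """Create a package folder holding only a tiny DublinCore.xml (illustration only)."""
    package_dir = Path(archive_root) / package_name
    package_dir.mkdir(parents=True)
    record = ET.Element("metadata")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.ElementTree(record).write(
        str(package_dir / "DublinCore.xml"), encoding="utf-8", xml_declaration=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as archive_root:
        # One statistical dataset and one database, named as on slide 25
        write_demo_package(archive_root, "Data_1500", "Example survey dataset")
        write_demo_package(archive_root, "Database_0120", "Example database")
        for package, title in harvest_titles(archive_root).items():
            print(package, "->", title)
```

The harvester never looks inside ArchiveMaster, so it works the same way whether the format-specific metadata there is DDI, SIARD or something else.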

Editor's Notes

  1. Much of the important information about the world we live in today is recorded as structured data rather than unstructured documentation. Structured data is diverse in content and expression, ranging from commercial databases containing client information to geospatial and scientific research datasets. As structured data, such as statistical data, contains important information that scientists, businesses, and researchers may want to reuse in the future, there is an increasingly urgent push for its preservation. Preservation and re-use of data requires that data be described with appropriate metadata that will allow future users and machines to discover and interpret it. Organisations that want to preserve data must make a series of choices about how to describe it using the right combination of standards. In this presentation, we will use the Statistics New Zealand Data Archive as a case study for examining the point of connection between a statistical metadata standard that supports active data management (DDI) and a metadata standard that supports preservation (PREMIS). We will share our experience in using DDI and PREMIS to describe statistical data and will highlight how data-specific metadata can be used to support long-term preservation.
  2. We live in a data-driven society today. We’ve got vast quantities of geospatial data driving systems like Google Maps; data-intensive sciences like astronomy that work with petabytes of data (1 petabyte is 1000 terabytes); national statistical organisations like Stats NZ that regularly collect data from individuals and businesses across the country to enable a better understanding of our society; and swathes of online data collected every day by companies like Amazon and Facebook to help drive marketing decisions. All this data is extremely useful and valuable. Image credits: http://rifm.org/default.htm and http://www.stats.govt.nz/browse_for_stats/snapshots-of-nz/nz-in-profile-2012/~/media/Statistics/browse-categories/snapshots-of-nz/nz-in-profile/2012/nzip-2012-food-prices.PNG
  3. For statistical organisations like Stats NZ, the primary driver for preservation of data is re-use of expensive data collections to answer questions that demand longitudinal data. Image credits: http://www.stats.govt.nz/Publications/MacroEconomic/productivity-stats-sources-methods.aspx
  4. I can’t find it. I can’t identify the object. I can’t open it because I don’t have the right software/hardware or the object or media is damaged. I’m not sure it is the right thing (i.e. is it the authoritative version? has it been changed along the way?). Image credit: http://www.envelop.eu/shop/patterns/details/p/red-green-and-blue-apples
  5. Image Credit: http://www.envelop.eu/shop/patterns/details/p/red-green-and-blue-apples
  6. Researchers have lots of iterations of datasets during processing. Data uses codes I don’t have any documentation on: what are the variables measuring, what events during the collection phase could have affected the quality of the data, who was surveyed? Other problems include cryptic variable names and being unsure of the weighting applied or the sources used.
  7. Some of the solutions are more about statistical information, while others are about those common preservation or re-use problems. We could have a cumbersome organisation-specific standard, but it is better to have a combination of international standards.
  8. Use a combination of international standards. There are a few great benefits of this:
      – This helps us, and could help you, by saving you from re-inventing the wheel
      – International experts and a great community at your disposal
      – Interoperable data, which makes it easier to create shared access points and to search and mine across data repositories
     To describe data, particularly statistical data, we use DDI, which is a fairly complex standard for managing and describing data. DDI includes Dublin Core information like titles and creators or authors that helps users find data. PREMIS contains information that will help the archive preserve the data via checksums, file formats, and provenance information.
  9. Significant characteristics to preserve (e.g. fonts, colors, content only). How do you bring these all together? And what happens if the same information is included in more than one standard? We’ve done some thinking about this and can share our experience and strategies to consider when deciding what to record where!
  10. Looking back at our activities, some are more content-specific, i.e. just about data, and others are more general/common preservation activities.
  11. If you haven’t started managing your data - you can go back to your desk tomorrow and think about what metadata you could start capturing to support long-term re-use – whether you’re the one with the preservation archive or you’re planning/hoping to hand off your data to someone else. If you have already started managing your data – you can check whether your current practices consider the following things
  12. PREMIS – admin-ey metadata? DDI – descriptive? Other overlap includes DDI Archive module lifecycle events, which could contain the same info as PREMIS events, but this overlap is probably not useful.
  13. At Statistics NZ, we’re implementing a tool that will allow our statisticians to capture the statistical information as DDI.
  14. Don’t ignore data – it’s probably a key part of your core business
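
The note on PREMIS above mentions checksums as one way the archive preserves data. A minimal sketch of such a fixity check in Python follows. It assumes SHA-256 as the digest algorithm (the notes do not say which algorithm Statistics NZ uses), and the function names are hypothetical. In PREMIS terms, the algorithm name and the resulting value are the kind of information recorded under an object's fixity (messageDigestAlgorithm and messageDigest).

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """True if the file still matches the digest recorded when it was archived."""
    return sha256_of(path) == recorded_digest.strip().lower()
```

An archive would typically compute the digest once at ingest, store it in the preservation metadata, and re-run the comparison on a schedule; a mismatch means the file has changed or been damaged since it was recorded.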