Northwestern digital repository
initiative:
Platform and persistence
Claire Stewart
Director, Center for Scholarly Communication and Digital Curation
Head, Digital Collections, Library Technology Division
Northwestern University
claire-stewart@northwestern.edu
What is a repository and why
should I care?
Library as institutional memory
Tweeted in 2012 by Gail
Steinhart, Head of Research
Services, Mann Library, Cornell
University
Vines, T. H., Albert, A. Y. K., Andrew, R. L., Débarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2013). The Availability of
Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014
“The major cause of the reduced data availability for
older papers was the rapid increase in the proportion
of data sets reported as either lost or on inaccessible
storage media. For papers where authors reported
the status of their data, the odds of the data being
extant decreased by 17% per year (Figure 1D).”
[emphasis added]
The Availability of Research Data
Declines Rapidly with Article Age
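To make the 17% figure concrete, here is a minimal sketch in Python. It assumes the decline acts as a constant multiplicative factor on the odds (0.83 per year), and it uses a hypothetical starting probability of 0.9 for a newly published paper. Only the 17% rate comes from the quotation above; the function name and the starting value are illustrative.

```python
def availability_after(years: int,
                       initial_probability: float = 0.9,
                       yearly_odds_decline: float = 0.17) -> float:
    """Probability that a data set is still extant after `years`.

    initial_probability is a hypothetical starting point, not a figure
    from the paper; only the 17% yearly decline in odds is quoted above.
    """
    # Convert probability to odds, shrink the odds once per year,
    # then convert back to a probability.
    odds = initial_probability / (1.0 - initial_probability)
    odds *= (1.0 - yearly_odds_decline) ** years
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for years in (0, 5, 10, 20):
        print(years, round(availability_after(years), 3))
```

Note that the decline applies to the odds, not to the probability itself, so the probability does not simply fall by 17 percentage points each year.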
What is a repository and why
should I care?
A concept
The
Repository
All the stuff
A set of technologies
Technologies and architecture
Repository as service
• Description and characterization - descriptive, provenance and technical
metadata
• Selection, conversion, digitization
• Deposit and versioning
• Interoperability, APIs for ingest, discovery
• Access control, copyright support and other legal/regulatory compliance
• Persistence
– Stable, permanent links (URLs, DOIs, etc.)
– Health of digital objects
– Replication and dark archiving
– Migration or emulation, virtualization
What’s already in
our repository
digital.library.northwestern.edu
Maps of Africa
First Fedora project @ NU
2006 project, internally
funded
116 antique maps at high
resolution
Maps in Fedora
METS, PREMIS, JPEG2000
Archival finding aids
findingaids.library.northwestern.edu
Archon for EAD, Fedora + Blacklight for storage and discovery, Primo syndication
Winterton Collection
Northwestern Books and the Book Workflow Interface
2009
Mellon-funded
Now used for all
in-house book
digitization
books.northwestern.edu
Every page of each digitized book has this information:
Datastream | ID | MIMETYPE | Schema/ontology
Dublin Core metadata | DC | text/xml | OAI_DC
MODS metadata | MODS | text/xml | MODS
Relationship metadata | RELS-EXT | text/xml | RELS-EXT
OCR PDF file | PDF | application/pdf | -
OCR XML | OCR XML | text/xml | ABBYY OCR
OCR Text | OCR TEXT | text/plain | -
Source camera image file | ARCHV-IMG | image/jpeg | -
Source technical metadata in MIX | ARCHIV-TECHMD | text/xml | MIX
Source camera technical metadata in EXIF | ARCHV-EXIF | text/xml | Exif as XML
Corrected image file | PROC-IMG | image/jpeg | -
Corrected image technical metadata in MIX | PROC-TECHMD | text/xml | MIX
Delivery image JPEG2000 file | DELIV-IMG | image/jp2 | -
Delivery image technical metadata in MIX | DELIV-TECHMD | text/xml | MIX
SVG for delivery mechanism | DELIV-OPS | text/xml | SVG
Viewer html | HTML | text/html | HTML
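The per-page datastreams listed above can be treated as a small data model. The following Python sketch is illustrative only: the `Datastream` class and the `missing_datastreams` helper are hypothetical names, not part of Fedora or the Book Workflow Interface. The IDs, MIME types and schemas are copied from the slide as written.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Datastream:
    dsid: str                      # datastream ID as shown on the slide
    mime_type: str
    schema: Optional[str] = None   # None where the slide lists no schema


# Values copied from the slide; spellings are kept exactly as given.
PAGE_DATASTREAMS = [
    Datastream("DC", "text/xml", "OAI_DC"),
    Datastream("MODS", "text/xml", "MODS"),
    Datastream("RELS-EXT", "text/xml", "RELS-EXT"),
    Datastream("PDF", "application/pdf"),
    Datastream("OCR XML", "text/xml", "ABBYY OCR"),
    Datastream("OCR TEXT", "text/plain"),
    Datastream("ARCHV-IMG", "image/jpeg"),
    Datastream("ARCHIV-TECHMD", "text/xml", "MIX"),
    Datastream("ARCHV-EXIF", "text/xml", "Exif as XML"),
    Datastream("PROC-IMG", "image/jpeg"),
    Datastream("PROC-TECHMD", "text/xml", "MIX"),
    Datastream("DELIV-IMG", "image/jp2"),
    Datastream("DELIV-TECHMD", "text/xml", "MIX"),
    Datastream("DELIV-OPS", "text/xml", "SVG"),
    Datastream("HTML", "text/html", "HTML"),
]


def missing_datastreams(present_ids: Iterable[str]) -> List[str]:
    """Return the expected datastream IDs absent from a page object."""
    present = set(present_ids)
    return [ds.dsid for ds in PAGE_DATASTREAMS if ds.dsid not in present]
```

A check like `missing_datastreams(["DC", "MODS"])` would list every other expected ID, which is one way an ingest workflow could flag incomplete page objects.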
By the numbers — # of objects
As of November 2013:
• Finding aids: 1,114
• Digitized books: 3,491
• Digitized book pages: 835,806
• Image objects: 216,271
• A few others, including 3D objects, and collection objects
A total of 1,187,414 objects in the repository
Every object has several datastreams (files, descriptive metadata, technical metadata, etc.)
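As a quick arithmetic check on these figures, the Python sketch below sums the four itemized categories and subtracts them from the reported total. The remainder is attributed to the other object types only by subtraction; the slide does not break it down further.

```python
# Counts as reported for November 2013.
counts = {
    "Finding aids": 1_114,
    "Digitized books": 3_491,
    "Digitized book pages": 835_806,
    "Image objects": 216_271,
}
reported_total = 1_187_414

itemized = sum(counts.values())
# Whatever is left over must belong to the categories not itemized
# on the slide (3D objects, collection objects, and so on).
other_objects = reported_total - itemized

print(f"Itemized categories: {itemized:,}")
print(f"Other objects: {other_objects:,}")
```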
By the numbers — storage
As of Feb 5, 2014:
97.1 TB of content on the repository (including digitized collections
queued for ingestion) and the JPEG2000 server.
Library & NUIT purchased 200 TB of storage replicated between
Evanston and Chicago campuses (that is over 400 TB in total).
Digital preservation/persistence
• Persistent URLs
• Mirrored storage (as of fall 2014)
• PREMIS (preservation) metadata
• Routine health checks for data
• Geographically distributed storage
• Dark archives
• Migration/virtualization services
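The slides do not say how the routine health checks are implemented. One common approach is fixity checking: recompute each file's checksum and compare it with a stored value. The Python sketch below assumes a hypothetical manifest that maps relative file paths to SHA-256 digests; the function names and the manifest layout are illustrative, not Northwestern's actual tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return relative paths that are missing or whose checksum changed.

    `manifest` is a hypothetical mapping of relative path -> expected
    SHA-256 hex digest, recorded when the file was deposited.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

Run on a schedule against each storage copy, a check of this kind reports both silent corruption and missing files, and the results could be recorded as preservation events in PREMIS metadata.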
Distributed storage and dark archives
• DuraCloud
• Amazon Glacier
• Digital Preservation Network (DPN)
Current repository projects
• Digital Image Library (DIL)
• Avalon
• Hydramata
Hydra
Northwestern joined 2011
Framework for repository applications
using Ruby on Rails
Community with 22 partners
2007 Provost funded move from
Art History to the Library,
expansion to other disciplines
115,000 images in Hydra + Fedora
Moving all legacy digital
collections into DIL & its Hydra
counterparts in 2014-2015
images.northwestern.edu
Digital Image Library (DIL)
Avalon
IMLS-funded project with
Indiana University
Releases:
• 0 July 2012
• 0.5 October 2012
• 1.0 May 2013
• 2.0 October 2013 (NU pilot)
First NU production with R3,
expected in the next month
media.northwestern.edu (dev/demo)
Scholarly communication and
digital curation
• Options for archiving scholarly
materials
• Authors' rights, copyright help and
education, open access support
• E-science and research data life
cycle
• Digital humanities
• Library-based publishing
• Responding to funder requirements
Hydramata (formerly Shared IR)
Five-institution project to develop a next-generation institutional repository solution in Hydra
Expanding our repository program
• Massive storage, planning for growth, sustainability
• Digital preservation services
o Offsite third copy (DPN, DuraCloud, Glacier)
o Verification services
• Research computing
o Research data lifecycle - how to capture metadata early? What to
keep?
o Automate deposit from Vault?
• Shared infrastructure and services whenever possible
• Deeper collaboration with NUIT, Research, central admin, schools
Discussion and questions
Claire Stewart
Director, Center for Scholarly Communication and Digital Curation
Head, Digital Collections, Library Technology Division
Northwestern University
claire-stewart@northwestern.edu
