Building Collections in IRs from External Data Sources

[Title graphic: Data Selection, Data Curation, Data Ingest, Data Transformation, Data Preservation, Copyright Compliance, Data Reuse ?!]

Sai Deng, University of Central Florida
Susan Matveyeva, Wichita State University
*
* Data Acquisition and Content Recruitment for Institutional Repository
    (IR)
      * The reality: Lack of faculty self-submission; limited resources deposited…
      * Factors affecting faculty’s contribution to digital repositories
          *   Responses of 1700 scientific researchers to an international survey of digital repositories:
              Variable quality of materials; insecurity over IR’s long-term viability; researchers prefer
              subject repository… (Nicholas et al., 2012)

          *   Perceptions of faculty members from 17 Carnegie doctorate-granting universities: long term
              concern, copyright concern (Kim, 2011)


     *   Data Archives development in Open Access Repositories: Among over 2000 OpenDOAR
         repositories, 80 claim to contain datasets: 31 had no datasets found, 13 had few datasets,
         7 were not accessible, only 29 contain datasets. Among the 29 repositories, 15 are subject
         repositories and 7 are IRs. (Luzi et al., 2011)


* How have libraries been dealing with this reality?
     *   Strengthening digital preservation and copyright management (a long-term effort);

     *   Mediated deposit by librarians has become a norm: librarians collect, curate and deposit
         data from Graduate School, faculty, schools and departments…
*

    * Promoting Data Archiving, Curation and Preservation Services to
     be Part of the Research Lifecycle
      * Purdue University’s Institutional Data Repository service: Partner
        with the campus research office, facilitate data curation and
        cyberinfrastructure, data reference, data literacy… (Witt, 2012)


      * Research Lifecycle @ UCF and its Supporting Services Model
        (Under discussion, a cross campus effort initiated by the Information
        Services & Scholarly Communication Unit in the library)


    * Getting Data from External Sources
      * What are those external sources?
      * Which types of data? Datasets?
      * How?
*
* Building Partnerships between IRs and Data Services
     * IASSIST 2012 panel: Institutional Repositories and Data
         * IRs “on the front line of curating a growing variety of data sources;”
         * Partnerships with data producers, other IRs and digital repositories to
           improve local data curation.
            * Data Archive-Institutional Repository Partnership Project: Building
              partnerships between IRs and social science data archives.



* Creating Collections in an IR or Digital Repository from
    External Data Sources
     * External data sources: PubMed, IEEE Xplore, Web of Science,
      EndNote, open web…
*
    * Wrestling between IR and Domain Based Digital Repository
      * Domain-based Repository vs. IR, IR advantages: Institutional
       identity of the library, university data management service (The
       Research Data Access & Preservation Summit, Wickett et al., 2012)


      * Researchers prefer subject repository (Nicholas et al., 2012)

    * Data Curation and Digital Repositories
      * Monash University’s two types of repositories: Collaboration
       Repository, Publication/Preservation Repository.
      * Data curation continuum, boundary between the different
       repositories (Treloar et al., 2007)
*

    * How to build collections in IR or digital repository: Data harvesting? Data
     pre-populating to an IR or digital repository? Query external data provider and
     export results to the repository?

    * Some Observations of Data Services/Activities Related to PubMed and
     DSpace
       * SWORD (Simple Web-service Offering Repository Deposit)
         (http://swordapp.org/)

       * Populating Metadata in Submission with Data from PubMed
         (https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed)

       * PMC-OAI (PubMed Central OAI service) Service
         (http://www.ncbi.nlm.nih.gov/pmc/tools/oai/)

       * Commercial Services
         * BioMed Central Commercial Service (http://www.biomedcentral.com/libraries/aad)
         * Web of Science Web Services
             (http://wokinfo.com/products_tools/products/related/webservices/)
         *   @mire Customization (Dryad case:
             https://atmire.com/website/?q=references/dryad)
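
For the harvesting option, the PMC-OAI service listed above speaks OAI-PMH, so a harvest starts with a ListRecords request. The following is a minimal sketch that only builds the request URLs and does not contact any server. The endpoint address in `PMC_OAI_BASE` is an assumption and should be checked against the PMC-OAI documentation page; the function names are illustrative.

```python
from urllib.parse import urlencode

# Assumption: base URL of the PMC OAI-PMH endpoint. Confirm it against the
# PMC-OAI documentation page cited in the list above before using it.
PMC_OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # selective harvesting by datestamp
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec        # optional set, if the provider defines one
    return base_url + "?" + urlencode(params)


def build_resumption_url(base_url, token):
    """Build the follow-up request for the next page of a long result list.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent with the
    verb only, without metadataPrefix or date arguments.
    """
    return base_url + "?" + urlencode({"verb": "ListRecords",
                                       "resumptionToken": token})


if __name__ == "__main__":
    print(build_list_records_url(PMC_OAI_BASE,
                                 from_date="2012-01-01",
                                 until_date="2012-01-31"))
```

The harvested records would still need the local curation steps described in the rest of the deck before deposit.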
*
* PubMed-DSpace project at WSU Libraries.
* SOAR: Shocker Open Access Repository has been in production
    since 2007.

* Comprehensive Collection Development Strategy for SOAR
     * During the first several years, the comprehensive collection
      development strategy was explored;

     * IR Librarian promoted a new service to faculty and university
      administration and accepted numerous types of materials;

     * A variety of projects were completed: videos, presentations,
      science-museum like projects (e.g. collection of pottery of the Museum
      of Anthropology, virtual herbarium of the Department of Biological
      Sciences);
*
* Comprehensive Collection Development Strategy for
    SOAR (Cont.)
    * SOAR also included serial publications of the university, such as
      e-journals and proceedings;


    * ETDs program was developed as a collaboration of the Graduate
      School and the Library;


    * Two types of faculty collections were created: departmental
      and individual
      * Individual faculty collections included e-books, articles, conference
        papers and presentations;
      * Departmental collections included mainly articles and conference
        papers.
*
* Comprehensive Collection Development Strategy for SOAR
    (Cont.)
     * Some of these collections were uploaded to the repository title by title;

     * The others, such as the Virtual Herbarium, were added to the system in bulk;

     * However, whether entered manually or uploaded in bulk, these collections use
       internal sources of data:
        *   Full text for individual collections was provided by faculty authors;
        *   Full text for departmental collections included materials found at the University’s
            departmental websites.


     * In the last few years, our collection development strategy has changed
       from comprehensive to limited.

     * Serial publications (ETDs, proceedings, e-journals) continued, but no new major
       projects were invited.
*
* Limited Collection Development Strategy for SOAR
    * Bibliographic data is imported from external sources;

    * Data is enhanced to ensure consistency of the repository
     collections and needs of the end users;

    * Full text is accepted if permitted (practically, limited to Open
     Access under Creative Commons license);

    * Access to full text is provided via links in preference order;

    * The emphasis is made on bulk import of faculty articles and
     conference papers.
        * Faculty did not provide these materials;
        * All materials are searched for by librarians on the Web and in different
           databases.
*
* How do we use data from external sources?
    * As a source of information about works written and published by the
     University’s authors (this information is not available to us within the
     University);


    * The record leads us to the full text of the work on the publisher’s website or to a
     hard copy of the journal (if available);


    * We use the work itself as the primary source of information for a metadata
     record;


    * After we export records from external sources, we verify the information and
     modify the records according to our metadata template;


    * We acknowledge sources of information by including the rights.holder field
     and the record’s ID number in its original database.
*
* Method Exploration: Options?
    * Export PubMed XML file (search results) to spreadsheet directly (requires heavy
     manual editing);


    * Export Medline txt file to spreadsheet (requires data transposition; the number of
     columns varies, which is problematic);


    * Transform PubMed XML file to DCXML file using XSLT (source and target XML
     schemas needed);


    * Transform PubMed XML file to DCXML file using VB script and XPath expressions.

* PubMed vs. PubMed Central
    * All PubMed articles vs. Full text articles only
*
[Data flow diagram: PubMed to DSpace workflow]

     Librarian (external entity)
       1.0  Search PubMed by Institution Affiliation
            T1: PubMed XML
       2.0  Transform w/ VB and XPath expression
            T2: DCXML
       3.0  Export to Excel
            T3: Excel file
       4.0  Edit/Enrich/Enhance Data in Excel: Data Accuracy and Consistency Analysis;
            Name Authority Check; Check fields against publication template;
            Additional Fields added; Link enrichment; Peer Review Article Status Check;
            Copyright Check; Divide data to subsets.
            T4: Curated Datasets
       5.0  Transform to SIP Packages w/ Java Program
            T5: SIP Packages for Departmental Collections
       6.0  Export to DSpace
     Users

     Legend: External Entity; Process; Data Flow; Data Store/File
*
*   Source Data: Search PubMed (http://www.ncbi.nlm.nih.gov/pubmed) by institution affiliation, save the result
    as XML file;

*   Data Analysis and Mapping: Refer to MedLine/PubMed Data Element (Field) Descriptions; map PubMed fields to
    DC elements;

*   Transform PubMed XML File to DC XML File
     *   VBScript run in Microsoft Visual Web Developer. Use XPath expressions.
     *   For example: Get value for DC element “identifierIssn” from node
         "./MedlineCitation/Article/Journal/ISSN" in the retrieved PubMed xml file:

           ' Check if there's an ISSN
           If node.SelectNodes("./MedlineCitation/Article/Journal/ISSN").Count > 0 Then
               writer.WriteStartElement("identifierIssn")
               writer.WriteValue(node.Item("MedlineCitation").Item("Article").Item("Journal").Item("ISSN").InnerText)
               writer.WriteEndElement()
           End If
           …

*   Complexity in Data Extraction and Transformation
     *   identifierCitation: need to combine Journal title, Volume, Issue and Year, e.g., Journal title. 2011 Oct; 39(4):320-32.
     *   dateIssued: need to combine Year, Month and Day under “PubDate”, e.g., 2011-10-01. Used “yyyy-mm-dd” format.
         Dates were also normalized for two special situations: only year and month available; only year and season
         available. However, the dates were changed back to their original formats at the final project stage.
     *   SubjectMesh: PubMed has DescriptorName and QualifierName. Need to consider different situations: descriptor with
         one qualifier, descriptor with multiple qualifiers…
     *   contributorAuthor: list all authors' names under AuthorList…
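
The three composite fields above can be sketched in Python with the standard library's ElementTree. This is an illustration only, not the project's VBScript: the function names and the sample record are made up, the season defaults cover only the three seasons the mapping notes list, and joining a MeSH descriptor to each qualifier with "/" is an assumption, since the slides do not say how the pairs were written out.

```python
import xml.etree.ElementTree as ET

MONTHS = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05",
          "Jun": "06", "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10",
          "Nov": "11", "Dec": "12"}
# Only the season defaults named in the mapping notes; others stay unmapped.
SEASONS = {"spring": "03-01", "summer": "06-01", "winter": "12-01"}


def date_issued(pub_date):
    """Normalize a PubDate element to yyyy-mm-dd where its parts allow it."""
    year = pub_date.findtext("Year")
    month = pub_date.findtext("Month")
    day = pub_date.findtext("Day")
    season = pub_date.findtext("Season")
    if not year:
        return None
    if month:
        mm = MONTHS.get(month[:3].title(), month.zfill(2))
        dd = day.zfill(2) if day else "01"   # default day "01"
        return f"{year}-{mm}-{dd}"
    if season and season.lower() in SEASONS:
        return f"{year}-{SEASONS[season.lower()]}"
    return year                              # nothing else to normalize


def identifier_citation(article):
    """Combine journal title, date, volume, issue and pages into one string."""
    issue = article.find("Journal/JournalIssue")
    pub = issue.find("PubDate")
    date_part = " ".join(p for p in (pub.findtext("Year"),
                                     pub.findtext("Month")) if p)
    text = f"{article.findtext('Journal/Title')}. {date_part}; "
    text += issue.findtext("Volume", default="")
    if issue.findtext("Issue"):
        text += f"({issue.findtext('Issue')})"
    pages = article.findtext("Pagination/MedlinePgn")
    if pages:
        text += f":{pages}"
    return text + "."


def subject_mesh(medline_citation):
    """List MeSH subjects: one entry per descriptor/qualifier pair (assumed form)."""
    subjects = []
    for heading in medline_citation.findall("MeshHeadingList/MeshHeading"):
        descriptor = heading.findtext("DescriptorName")
        if not descriptor:
            continue
        qualifiers = [q.text for q in heading.findall("QualifierName") if q.text]
        if qualifiers:
            subjects.extend(f"{descriptor}/{q}" for q in qualifiers)
        else:
            subjects.append(descriptor)
    return subjects


# Made-up sample record, shaped like a PubMed export, for illustration only.
SAMPLE = """<PubmedArticle><MedlineCitation>
<Article><Journal><JournalIssue><Volume>39</Volume><Issue>4</Issue>
<PubDate><Year>2011</Year><Month>Oct</Month></PubDate></JournalIssue>
<Title>Journal title</Title></Journal>
<Pagination><MedlinePgn>320-32</MedlinePgn></Pagination></Article>
<MeshHeadingList>
<MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName>Neoplasms</DescriptorName>
<QualifierName>genetics</QualifierName><QualifierName>therapy</QualifierName>
</MeshHeading></MeshHeadingList></MedlineCitation></PubmedArticle>"""

if __name__ == "__main__":
    citation = ET.fromstring(SAMPLE).find("MedlineCitation")
    article = citation.find("Article")
    print(date_issued(article.find("Journal/JournalIssue/PubDate")))
    print(identifier_citation(article))
    print(subject_mesh(citation))
```

Keeping the original date string alongside the normalized one would make a later decision to revert, as happened in this project, a matter of choosing a column rather than re-exporting.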
*
DC Field                 PubMed Source Field  Node in PubMed export XML file
(Notes are given on the indented line under the row or group of rows they apply to.)

identifier               PMID                 ./MedlineCitation/PMID
identifierIssn           ISSN                 ./MedlineCitation/Article/Journal/ISSN
    Note: Check if there's an ISSN in node “./MedlineCitation/Article/Journal/ISSN” of the PubMed XML file; if yes,
    output its mapped DC element “identifierIssn” with the ISSN value.
identifierCitation       Title                ./MedlineCitation/Article/Journal/Title
                         PubDate              ./MedlineCitation/Article/Journal/JournalIssue/PubDate
                         Volume               ./MedlineCitation/Article/Journal/JournalIssue/Volume
                         Issue                ./MedlineCitation/Article/Journal/JournalIssue/Issue
                         MedlinePgn           ./MedlineCitation/Article/Pagination/MedlinePgn
    Note: To form identifierCitation, need to combine Journal title, Volume, Issue and Year, e.g.,
    Journal title. 2011 Oct; 39(4):320-32.
dateIssued               Year                 ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Year
                         Month                ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Month
    Note: To form dateIssued, combined Year and Month under “PubDate”, e.g., “2011-10.”
                         Day                  ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Day
    Note: Check if there's a day, otherwise use default day “01.” “yyyy-mm-dd” format.
dateIssued               Year                 ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Year
    Note: Check if there's a season…
                         Season               ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Season
    Note: If only season available, replace “spring” with “03-01,” summer with “06-01” and winter with “12-01”
    (However changed back to its original format at the project final stage).
relationIspartofseries1  Title                ./MedlineCitation/Article/Journal/Title
relationIspartofseries2  ISOAbbreviation      ./MedlineCitation/Article/Journal/ISOAbbreviation
*
DC Field                 PubMed Source Field  Node in PubMed export XML file
(Notes are given on the indented line under the row or group of rows they apply to.)

title                    ArticleTitle         ./MedlineCitation/Article/ArticleTitle
    Note: Check if there's an “ArticleTitle” in node “./MedlineCitation/Article/ArticleTitle” of the PubMed XML file;
    if yes, output its mapped DC element “title” with the “ArticleTitle” value.
formatExtent             MedlinePgn           ./MedlineCitation/Article/Pagination/MedlinePgn
RightsHolder             CopyrightInformation ./MedlineCitation/Article/Abstract/CopyrightInformation
descriptionSponsorship   Agency               ./MedlineCitation/Article/GrantList/Grant/Agency
title2                   VernacularTitle      ./MedlineCitation/Article/VernacularTitle
subjectMesh              MeshHeadingList      ./MedlineCitation/MeshHeadingList
                         DescriptorName       ./MedlineCitation/MeshHeadingList/MeshHeading
                         QualifierName        ./MedlineCitation/MeshHeadingList/MeshHeading
    Note: Check if there's a DescriptorName. Need to consider different situations: a descriptor with one qualifier,
    with multiple qualifiers…
contributorAuthor[1,2,3…] AuthorList          ./MedlineCitation/Article/AuthorList
    Note: Check if there's an AuthorList and then list all authors' names.
languageIso              Language             ./MedlineCitation/Article/Language
type                     PublicationType      ./MedlineCitation/Article/PublicationTypeList/PublicationType
coverageSpacial          Country              ./MedlineCitation/MedlineJournalInfo/Country
MedlineTA                MedlineTA            ./MedlineCitation/MedlineJournalInfo/MedlineTA
identifierIssn2          ISSNLinking          ./MedlineCitation/MedlineJournalInfo/ISSNLinking
Identifier2              NlmUniqueID          ./MedlineCitation/MedlineJournalInfo/NlmUniqueID
identifier3              ArticleId (doi)      ./PubmedData/ArticleIdList/ArticleId[@IdType='doi']
identifier4              ArticleId (pii)      ./PubmedData/ArticleIdList/ArticleId[@IdType='pii']
identifier5              GrantID              ./MedlineCitation/Article/GrantList/Grant/GrantID
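
A mapping table like the one above lends itself to a table-driven transform: the crosswalk becomes data, and one loop applies it to every record. The sketch below uses Python's ElementTree rather than the project's VBScript and covers only a handful of the single-valued rows. `FIELD_MAP`, `to_dc_record` and `author_names` are illustrative names, and the `LastName`/`ForeName` child elements come from the PubMed XML format rather than from the slides.

```python
import xml.etree.ElementTree as ET

# (DC field, path relative to a PubmedArticle element), copied from the
# mapping tables. Only single-valued rows are included in this sketch.
FIELD_MAP = [
    ("identifier", "./MedlineCitation/PMID"),
    ("identifierIssn", "./MedlineCitation/Article/Journal/ISSN"),
    ("title", "./MedlineCitation/Article/ArticleTitle"),
    ("formatExtent", "./MedlineCitation/Article/Pagination/MedlinePgn"),
    ("languageIso", "./MedlineCitation/Article/Language"),
    ("identifierIssn2", "./MedlineCitation/MedlineJournalInfo/ISSNLinking"),
    ("identifier3", "./PubmedData/ArticleIdList/ArticleId[@IdType='doi']"),
]


def to_dc_record(article):
    """Return a dict of DC field -> value for the nodes that are present."""
    record = {}
    for dc_field, path in FIELD_MAP:
        node = article.find(path)
        # Same idea as the "check if there's an ISSN" test: skip missing nodes.
        if node is not None and node.text:
            record[dc_field] = node.text.strip()
    return record


def author_names(article):
    """List authors as "LastName, ForeName" in AuthorList order."""
    names = []
    for author in article.findall("./MedlineCitation/Article/AuthorList/Author"):
        last = author.findtext("LastName")
        fore = author.findtext("ForeName")
        if last:
            names.append(f"{last}, {fore}" if fore else last)
    return names


def convert(pubmed_xml_text):
    """Convert every PubmedArticle in an export into a list of DC dicts."""
    root = ET.fromstring(pubmed_xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        record = to_dc_record(article)
        record["contributorAuthor"] = author_names(article)
        records.append(record)
    return records
```

Adding or dropping a field then means editing one line of `FIELD_MAP`, which is easier to review against the crosswalk than a series of hand-written element blocks.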
*
    *   Refer to Faculty and Research Publication Template in DSpace (WSU SOAR);
         *   dc.contributor.author
         *   dc.date.issued
         *   dc.identifier                 (doi)
         *   dc.identifier.citation
         *   dc.identifier.issn
         *   dc.identifier.uri
         *   dc.description
         *   dc.description.abstract
         *   dc.format.extent
         *   dc.language.iso
         *   dc.publisher
         *   dc.relation.ispartofseries
         *   dc.source??
         *   dc.title
         *   dc.type
         *   dc.coverage.spatial
         *   dc.description.version       (peer-reviewed status)
         *   dc.rights.holder


    *   After the DCXML file is exported to Excel, additional fields need to be added to the
        spreadsheet from the WSU research publication template;

    *   Keep extra fields from PubMed export for this Collection?!
         * descriptionSponsorship
         * subjectMesh
         * MedlineTA (NLM journal title abbreviation)
*
    * Edit affiliation field, make department names consistent and turn them into
      departmental collection names;

    * Name authority check: Check OCLC authority file, local Voyager authority file;

    * Peer-review article check: Check journal’s peer review status; Search
      Sherpa/Romeo, or, Ulrichsweb (serials directory);

    * Additional fields added: contributor (dept. name), publisher, identifier.uri,
      description;

    * Enriched data in existing fields: rightsHolder;

    * Copyright compliance: Only provide links to the article; do not re-save and host it;

    * Data accuracy and consistency analysis;

    * Divide records into several departmental collections.
*
* Edit affiliation field and find affiliation if not available;
* Check “contributorAuthor” against OCLC and local Voyager authority file;
* Add peer review status for “descriptionVersion” by searching Sherpa/Romeo or
    Ulrichsweb;
* Delete the value in “identifierIssn2” if it’s the same as the value in
    “identifierIssn;”
* Change “relationIspartofseries” (journal name) to proper case (use PROPER());
* Add “contributor” (corporate contributor, university department);
* Sort by “identifierIssn,” add “publisher” (check Ulrich’s Periodical and
    Sherpa);
* Search “identifierUri” by title, enrich doi links and other links;
* Check “rightsHolder”;
* Add “description” for links;
* Check special characters in the spreadsheet;
* Data accuracy and consistency check…
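
Several items in this checklist are mechanical and could be scripted instead of done by hand in Excel. The sketch below shows three of them on records held as Python dicts: clearing a duplicate linking ISSN, proper-casing the journal name, and flagging special characters for review. It is an illustration under assumptions, not the project's procedure; the field names follow the mapping tables and the function names are made up. Python's `str.title()` is used as a rough stand-in for Excel's PROPER().

```python
def clean_row(row):
    """Apply two mechanical cleanup rules to one record and return a copy."""
    cleaned = dict(row)

    # Delete identifierIssn2 when it repeats identifierIssn.
    issn = cleaned.get("identifierIssn")
    if cleaned.get("identifierIssn2") and cleaned["identifierIssn2"] == issn:
        cleaned["identifierIssn2"] = ""

    # Proper-case the journal name (roughly what PROPER() does in Excel).
    series = cleaned.get("relationIspartofseries")
    if series:
        cleaned["relationIspartofseries"] = series.title()
    return cleaned


def find_special_characters(rows):
    """Return (row index, field, character) for each non-ASCII character.

    These are only flagged for manual review; many of them are legitimate
    (accented author names, for example) and must not be stripped blindly.
    """
    hits = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            for char in str(value):
                if ord(char) > 127:
                    hits.append((index, field, char))
    return hits


def sort_by_issn(rows):
    """Sort records by ISSN so one publisher lookup serves a run of rows."""
    return sorted(rows, key=lambda row: row.get("identifierIssn") or "")
```

The judgment-based steps (authority checks, peer-review status, rights holder) stay manual; a script like this only removes the repetitive part.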
*
* Search for additional links and add link descriptions for publication records;
* Access to full text is provided via links in the following preference order:
    *   Direct link to the title: DOI, e.g., Click on the DOI link below to access the article.
        (Description)


    *   Link to the electronic journal / proceedings record in the library catalog if the library
        subscribes to this journal; link to the journal website for publications not licensed by the
        library.
         *   For example: The full text of this article is not available in SOAR. WSU users can access
             the article via commercial databases licensed by University Libraries: ... The stable URL
             of this article is: …


    *   If the title is not online, provide a note (no digital copy of this title available, or hard copy
        available in the library catalog)
         *   For example: The full text of this article is not available in SOAR. Check the journal
             record … for the paper version of the article in the library. Or,
         *   The full text of this article is not available in SOAR.
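
The preference order above is a simple first-match rule, which can be written down as a small function. This is a hypothetical sketch: the parameter names and the returned category labels are made up, and the wording of the actual link descriptions is left to the templates quoted above.

```python
def choose_access_link(doi_url=None, catalog_url=None, journal_url=None,
                       print_copy=False):
    """Pick the access option for one record, in the stated preference order.

    Returns a (kind, url) pair; url is None when only a note can be given.
    """
    if doi_url:
        return ("doi", doi_url)                 # 1. direct link to the title
    if catalog_url:
        return ("catalog", catalog_url)         # 2a. subscribed journal record
    if journal_url:
        return ("journal-website", journal_url) # 2b. journal not licensed
    if print_copy:
        return ("print-note", None)             # 3a. hard copy in the catalog
    return ("no-copy-note", None)               # 3b. no copy available
```

Returning a category rather than finished text keeps the rule separate from the description wording, so the notes can be revised without touching the logic.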
*
    * An Add-on to Facilitate DSpace Batch Import Procedure
       *   Part of the Google Summer of Code Project 2008;
       *   This program was written by Blooma Mohan John at the Nanyang Technological University;
       *   Written in Java. It transforms data prepared in Microsoft Excel or OpenOffice spreadsheet
           to DSpace batch import format, the XML-based Submission Information Packages (SIPs);
       *   The spreadsheet requires inclusion of metadata as well as the locations of the digital
           resource.
       *   Program wiki:
           https://wiki.duraspace.org/display/GSOC/Google+Summer+of+Code+2008+Batch+Import


    * Customization of the Add-on to Generate SIP Packages for Individual Collections from
      Spreadsheet
       *   For customization of this program, refer to:
            *   Deng, S., Matveyeva, S. & Khan, B. (2010). Enhancing workflow through batch import from
                Excel to DSpace. Kansas Library 2010 Conference. Wichita, KS, April 8, 2010. Available at:
                http://hdl.handle.net/10057/2366
            *   Deng, S. Optimizing Workflow through Metadata Repurposing and Batch Processing. Available
                at: http://www.tandfonline.com/doi/abs/10.1080/19386389.2010.524862#preview
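
To make the target of this transformation concrete, the sketch below writes one item in what DSpace calls the Simple Archive Format: a directory per item holding a `dublin_core.xml` file and a `contents` file that lists bitstream file names. It is not the Google Summer of Code add-on and not the customized Java program. The function name, the metadata dict layout and the sample values are assumptions for illustration, and the exact requirements should be checked against the DSpace documentation for the version in use.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_item(archive_dir, index, metadata, bitstreams=()):
    """Write one item directory with dublin_core.xml and a contents file.

    metadata maps (element, qualifier) to a list of values; a qualifier of
    None is written as "none".
    """
    item_dir = os.path.join(archive_dir, f"item_{index:03d}")
    os.makedirs(item_dir, exist_ok=True)

    root = ET.Element("dublin_core")
    for (element, qualifier), values in metadata.items():
        for value in values:
            dcvalue = ET.SubElement(root, "dcvalue", element=element,
                                    qualifier=qualifier or "none")
            dcvalue.text = value
    ET.ElementTree(root).write(os.path.join(item_dir, "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # One bitstream file name per line. For link-only records, as in this
    # project, the file is written but left empty.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        for name in bitstreams:
            fh.write(name + "\n")
    return item_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as archive:
        path = write_item(archive, 0, {
            ("title", None): ["Example article title"],
            ("identifier", "issn"): ["1234-5678"],
        })
        print(sorted(os.listdir(path)))
```

One directory of such items per departmental collection matches the way the batches are later imported, one collection handle at a time.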
*
* Export individual batch to its corresponding location (DSpace
    handle)
        * Transfer data to DSpace server using psftp, for example:
            open soar.wichita.edu
            cd /data/dspace/upload
            put PubMed_PSY.zip



        * Log into the server and run the DSpace ItemImport command, for
           example:
           cd /usr/local/src/dspace/upload
           unzip PubMed_PSY.zip
           cd /data/dspace/bin
           ./dsrun org.dspace.app.itemimport.ItemImport -a -e firstnamelastname@institution.edu \
               -c 123456789/3377 -s /usr/local/src/dspace/upload/PubMed__PSY -m mapfilePubMed_PSY
*
    * In this example, the dates in yyyy-mm-dd format were changed back to their
      original format (as requested);

    * If the number of changes exceeds the maximum allowed, limit the changes or alter
      bulkedit.gui-item-limit in dspace.cfg.
*
    * PubMed to Excel Python script (by Nitin Arora):
      * http://blog.humaneguitarist.org/projects/pubmed2xl/
      * Export PubMed citations to spreadsheet;
      * XSLT stylesheet can be customized to output desirable data;
      * Does not validate data against PubMed and DC xml schemas.

    * XSLT and XML Processor
      * Transform PubMed XML to DCXML with XSLT in an XML processor;
      * Use PubMed and DCXML schemas for element validation;
      * Question: DCXML in which format? SIP packages? How to add
        files/bitstreams to DSpace?
*
* IEEE Data Source: WSU faculty publications from The
    Institute of Electrical and Electronics Engineers (IEEE)
    Xplore.


* This data was exported to the individual research publication
    collections for Computer Science (CS), Electrical Engineering and
    Computer Science (EECS), Mathematics and Statistics (MAT),
    Mechanical Engineering (ME) and Physics (PHY).
*
[Data flow diagram: IEEE Xplore to DSpace workflow]

    Librarian (external entity)
      1.0  Search IEEE by Author Affiliation and Export Results
           T1: Excel Comma Separated Value File
      2.0  Edit/Enrich/Enhance Data in Excel: Map fields to DC and Check fields against
           publication template; Additional DC Fields added; Distribute titles to
           appropriate collection sets; Data Accuracy and Consistency Analysis;
           Peer Review Article Status Check; Copyright Check.
           T2: Curated Datasets
      3.0  Transform to SIPs w/ Java Program
           T3: SIP Packages for Research Publication Collections
      4.0  Export to DSpace
    Users

    Legend: External Entity; Process; Data Flow; Data Store/File
*

    * The project goal is to export retrospective records derived from the WOS
      database, for all available years, to the repository’s appropriate collection;

    * Because librarians do not have an internal source of
      data on faculty publications, the only way to collect
      data is to turn to external sources;

    * WOS is the largest commercial source of data on
      published articles & conference proceedings in all
      subjects from 1988 onward; it contains over 41 million
      records collected from over 12,000 journals.
*
[Data flow diagram: Web of Science (WOS) to DSpace workflow]

    Librarian (external entity)
      1.0  Search WOS by Organization and Export Results
           T1: EndNote Citation
      2.0  Export EndNote Txt Files to Excel
           T2: Excel Data
      2.0  (also labeled 2.0 in the original diagram) Edit/Enrich/Enhance Data in Excel:
           Sort Excel file by Authors; Map fields to DC and Check fields against
           publication template; Additional Fields added; Data Accuracy and Consistency
           Analysis; Name Authority Check; Link enrichment; Peer Review Article Status
           Check; Copyright Check; Divide data to subsets; Check for duplicates.
           T3: Curated Datasets
      3.0  Transform to SIPs w/ Java Program
           T4: SIP Packages for Faculty Publication Collections
      4.0  Export to DSpace
    Users

    Legend: External Entity; Process; Data Flow; Data Store/File
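
The WOS workflow adds a "check for duplicates" step, which matters once records for the same authors arrive from several databases. One possible way to flag candidates is sketched below; the slides do not describe how the check was done, so the matching rule (DOI when present, otherwise normalized title plus year) and the field names `identifier`, `title` and `dateIssued` are assumptions.

```python
import re


def dedupe_key(record):
    """Build a comparison key: the DOI if present, else title plus year."""
    doi = (record.get("identifier") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lowercase the title and collapse punctuation and spacing differences.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    year = (record.get("dateIssued") or "")[:4]
    return ("title", title, year)


def find_duplicates(records):
    """Return (first index, duplicate index) pairs for records sharing a key."""
    seen = {}
    duplicates = []
    for index, record in enumerate(records):
        key = dedupe_key(record)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates
```

The pairs are candidates for a librarian to review, not records to delete automatically, since a title match can also be a reprint or a conference version of a journal article.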
*
* Issues to Think About
    * Is it appropriate for an IR to serve as a backup for institutional
     research-generated datasets held in subject repositories and other
     digital repositories?


    * Open access sources and U.S. federal government sources are
     okay to use. We need to read commercial companies' terms and policies to
     see what they permit us to use.


    * Can librarians provide value added data services? Or data
     curation services?


    * Do librarians have sufficient training to describe and preserve
     scientific data? (Newton et al., 2011)
*

                  *   DCC: Digital Curation Centre, UK


                  *   Full Lifecycle Actions: Description
                      and Representation Information,
                      Preservation Planning,
                      Community Watch and
                      Participation.


                  *   Sequential Actions:
                      Conceptualize, Create or
                      Receive, Appraise and Select,
                      Ingest, Preservation Action,
                      Store, Access, Use and Reuse,
                      Transform.


                  *   Occasional Actions: Dispose,
                      Reappraise, Migrate


    From http://www.dcc.ac.uk/resources/curation-lifecycle-model
*
* Building collections from external data sources puts emphasis on
  “Access, use and reuse” and “Transform.”
    * Reuse: repurposing datasets from external data sources to an IR or
      digital repository.
    * Transform: preserve subsets.
* It practices “Curate” and “Preserve” from an institution’s perspective.
* It involves activities in “Appraise & Select”, “Ingest”, “Preservation
  Action” and “Store.”
* Value-added data services are part of the curation process.
* This dynamic model can be entered at any point?!
* These projects promote data sharing between the subject repository, data
  providers and the institutional repository?!
*
* Getting data from other sources is an alternative way to develop collections
    for different disciplines, since author self-deposit has not become common
    practice for IRs. It can also be a good way to find information for
    retrospective projects.


* This effort is in line with the current trend in metadata cataloging of moving
    from item-by-item cataloging to batch processing of datasets, repurposing
    metadata between different systems and communities, and providing
    value-added data curation services to students and faculty in an IR.


* For small to medium-sized institutions that do not have the budget for
    commercial data services, it is possible to get data from external databases or
    data sources through certain procedures without infringing copyright.


* If an institution wants to enhance the data from the external source, it may
    work better to export the data to a program that allows local batch editing,
    such as Excel, than to pre-populate it into the IR directly.
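A minimal sketch of the export-for-batch-editing step described in the last point, assuming the external records have already been collected as simple title/author/year dictionaries. The field names, the sample records and the title-based duplicate check are illustrative assumptions, not the workflow the projects actually used.

```python
import csv
import io


def normalize(title):
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def drop_duplicates(records):
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def to_csv(records, fieldnames):
    """Serialize records as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample records, for illustration only.
records = [
    {"title": "Sample Article", "author": "Doe, J.", "year": "2011"},
    {"title": "sample  article", "author": "Doe, J.", "year": "2011"},
    {"title": "Another Article", "author": "Roe, R.", "year": "2012"},
]
unique = drop_duplicates(records)
csv_text = to_csv(unique, ["title", "author", "year"])
```

The CSV text can be saved to a file, edited in bulk in a spreadsheet, and only then loaded into the IR. Matching on a normalized title is a deliberately crude duplicate check; a real project would likely compare identifiers as well.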
*
* The need for external data sources on faculty publications
    arises from the unavailability of such data at the institutional level.


* The situation may change with the improvement of research
    data management at universities.


* As university administrations need better statistical data for
    reporting and assessment purposes, more universities are organizing
    data on faculty research in online systems, and some of these systems
    provide a seamless flow of data to institutional repositories.
*
    * Data acquisition and content recruitment for IRs
    * Institutional Repository vs. Subject Repository
    * Creating research publication collection in IRs and digital
      repositories
    * Data batch transfer from external sources to IRs
    * Data curation for IR and digital repository
    * Data curation services to support research lifecycle
    * Data management plan (Project level, institutional level)
    * Transformation of MEDLINE records in PubMed to Dublin Core (DC) in DSpace
    * Workflow management
    * Value-added metadata services
    * Metadata fields selection and enrichment
    * Copyright compliance
    * Data acquisition from the open web…
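One of the topics above, transforming MEDLINE records from PubMed into Dublin Core for DSpace, can be sketched roughly as follows. This assumes records exported in PubMed's tagged MEDLINE display format (a tag of up to four characters, a hyphen, then the value, with indented continuation lines). The crosswalk table is a hypothetical example rather than the mapping used in the projects, and the sample record is invented.

```python
from collections import defaultdict

# Hypothetical crosswalk from MEDLINE display tags to DSpace-style
# qualified Dublin Core names; adjust to local metadata policy.
MEDLINE_TO_DC = {
    "TI": "dc.title",
    "AU": "dc.contributor.author",
    "DP": "dc.date.issued",
    "AB": "dc.description.abstract",
    "PMID": "dc.identifier.other",
}


def parse_medline(text):
    """Parse one MEDLINE-format record into a {tag: [values]} mapping."""
    fields = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if len(line) > 5 and line[4] == "-" and line[:4].strip():
            # A new field: four-character tag, hyphen, space, then the value.
            tag = line[:4].strip()
            fields[tag].append(line[6:].strip())
        elif tag and line.startswith("      "):
            # Continuation line: append to the most recent value.
            fields[tag][-1] += " " + line.strip()
    return dict(fields)


def to_dublin_core(fields):
    """Keep only the mapped tags and rename them to Dublin Core names."""
    return {MEDLINE_TO_DC[t]: v for t, v in fields.items() if t in MEDLINE_TO_DC}


# Invented sample record, for illustration only.
sample = """PMID- 00000000
TI  - An example article title that wraps
      onto a second line.
AU  - Doe J
AU  - Roe R
DP  - 2011 May
"""
record = to_dublin_core(parse_medline(sample))
```

Repeatable tags such as AU are kept as lists so that each author becomes its own Dublin Core value. Unmapped tags are dropped, which is where the metadata field selection mentioned above would come into play.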
*
*   Constantopoulos, P., Dallas, C., Androutsopoulos, I., Angelis, S., Deligiannakis, A., Gavrilis, D., Kotidis, Y. and
    Papatheodorou, C. (2009). DCC&U: An extended digital curation lifecycle model. The International Journal of Digital
    Curation, Vol. 4, No. 1, pp. 34-45.
*   Digital Curation Centre (DCC). DCC curation lifecycle model. Retrieved from
    http://www.dcc.ac.uk/resources/curation-lifecycle-model.
*   Higgins, S. (2009). Applying the DCC curation lifecycle model. IASSIST 2009: Tampere, Finland 26-29 May.
*   Kim, J. (May 2011). Motivations of faculty self-archiving in institutional repositories. The Journal of Academic
    Librarianship, Vol. 37, No. 3, pp. 246-254, doi:10.1016/j.acalib.2011.02.017.
*   Luzi D., Di Cesare R., Ruggieri R. and Ricci M. (2012). Enhancing diffusion of scientific content: open data in open archives.
    In: GL-13 - Thirteenth International Conference on Grey Literature : the Grey Circuit, from Social Networking to Wealth
    Creation (Washington D.C., USA, 5-6 December 2011). Proceedings, D.J. Farace, J. Frantzen, GreyNet (eds.). TextRelease,
    2012.
*   Newton, Mark P., Miller, Christopher C. and Bracke, Marianne S. (2011). Librarian roles in institutional repository data set
    collecting: outcomes of a research library task force. Collection Management, Volume 36 Issue 1, pages 53-67. DOI:
    10.1080/01462679.2011.530546
*   Nicholas, D., Rowlands, I., Watkinson, A., Brown, D. and Jamali, H. R. (2012). Digital repositories ten-years on: what do
    scientific researchers think of them and how do they use them? Learned Publishing, 25: 195–206 doi:10.1087/20120306
*   Treloar, A., Groenewegen, D. and Harboe-Ree, C. (2007). The data curation continuum: Managing data objects in
    institutional repositories. D-Lib Magazine (September 2007).
*   University of Central Florida Library. Scholarly Communication: 21st Century Digital Scholarship at UCF. Retrieved from
    http://develop.lib.ucf.edu/ScholarlyCommunication/ (Temporary link).
*   Wichita State University Libraries. Shocker Open Access Repository (SOAR). Retrieved from http://soar.wichita.edu
*   Wickett, Karen M.; Hu, X. and Thomer, A. (2012). RDAP12 summit: challenges and opportunities for data management.
    Bulletin of the American Society for Information Science and Technology. June/July 2012, pages 14–19. Volume 38, Issue 5.
    DOI: 10.1002/bult.2012.1720380506
*   Witt, M. (2012). Co-designing, co-developing, and co-implementing an institutional data repository service. Journal of
    Library Administration, 52(2). DOI:10.1080/01930826.2012.655607.
*
    Sai Deng
    Associate Librarian and Cataloger/ Metadata Librarian, University of
    Central Florida Libraries
    Email: sai.deng@ucf.edu
    (Previously worked as a Metadata Cataloger at Wichita State
    University Libraries)

    Susan Matveyeva
    Associate Professor & Catalog & Institutional Repository Librarian,
    Wichita State University Libraries
    Email: susan.matveyeva@wichita.edu


                                             Thank you!

 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 

Building Collections in IRs from External Data Sources

  • 1. Building Collections in IRs from External Data Sources
      [Title-slide diagram labels: Data Selection, Data Curation, Data Ingest, Data Transformation, Data Preservation, Copyright Compliance, Data Reuse]
      Sai Deng, University of Central Florida
      Susan Matveyeva, Wichita State University
  • 2.
      * Data Acquisition and Content Recruitment for Institutional Repository (IR)
          * The reality: lack of faculty self-submission; limited resources deposited…
          * Factors affecting faculty's contribution to digital repositories
              * Responses of 1700 scientific researchers to an international survey of digital repositories: variable quality of materials; insecurity over the IR's long-term viability; researchers prefer subject repositories… (Nicholas et al., 2012)
              * Perceptions of faculty members from 17 Carnegie doctorate-granting universities: long-term concerns, copyright concerns (Kim, 2011)
          * Data archive development in Open Access repositories: among over 2000 OpenDOAR repositories, 80 claim to contain datasets: 31 had no datasets found, 13 had few datasets, 7 were not accessible, and only 29 contain datasets. Among those 29 repositories, 15 are subject repositories and 7 are IRs. (Luzi et al., 2011)
      * How have libraries been dealing with this reality?
          * Strengthening digital preservation and copyright management (a long-term effort);
          * Mediated deposit by librarians has become the norm: librarians collect, curate and deposit data from the Graduate School, faculty, schools and departments…
  • 3.
      * Promoting Data Archiving, Curation and Preservation Services to be Part of the Research Lifecycle
          * Purdue University's Institutional Data Repository service: partner with the campus research office; facilitate data curation and cyberinfrastructure, data reference, data literacy… (Witt, 2012)
          * Research Lifecycle @ UCF and its Supporting Services Model (under discussion; a cross-campus effort initiated by the Information Services & Scholarly Communication Unit in the library)
      * Getting Data from External Sources
          * What are those external sources?
          * Which types of data? Datasets?
          * How?
  • 4.
      * Building Partnerships between IRs and Data Services
          * IASSIST 2012 panel: Institutional Repositories and Data
              * IRs “on the front line of curating a growing variety of data sources;”
              * Partnerships with data producers, other IRs and digital repositories to improve local data curation.
          * Data Archive-Institutional Repository Partnership Project: building partnerships between IRs and social science data archives.
      * Creating Collections in an IR or Digital Repository from External Data Sources
          * External data sources: PubMed, IEEE Xplore, Web of Science, EndNote, open web…
  • 5.
      * Wrestling between IR and Domain-Based Digital Repository
          * Domain-based repository vs. IR. IR advantages: institutional identity of the library, university data management service (The Research Data Access & Preservation Summit, Wickett et al., 2012)
          * Researchers prefer subject repositories (Nicholas et al., 2012)
      * Data Curation and Digital Repositories
          * Monash University's two types of repositories: Collaboration Repository, Publication/Preservation Repository.
          * Data curation continuum; boundary between the different repositories (Treloar et al., 2007)
  • 6.
      * How to build collections in an IR or digital repository: Data harvesting? Pre-populating data into an IR or digital repository? Querying an external data provider and exporting the results to the repository?
      * Some Observations of Data Services/Activities Related to PubMed and DSpace
          * SWORD (Simple Web-service Offering Repository Deposit) (http://swordapp.org/)
          * Populating Metadata in Submission with Data from PubMed (https://wiki.duraspace.org/display/DSPACE/PopulateMetadataFromPubMed)
          * PMC-OAI (PubMed Central OAI service) (http://www.ncbi.nlm.nih.gov/pmc/tools/oai/)
      * Commercial Services
          * BioMed Commercial Service (http://www.biomedcentral.com/libraries/aad)
          * Web of Science Web Services (http://wokinfo.com/products_tools/products/related/webservices/)
          * @mire Customization (Dryad case: https://atmire.com/website/?q=references/dryad)
  • 7.
      * PubMed-DSpace project at WSU Libraries.
      * SOAR: Shocker Open Access Repository has been in production since 2007.
      * Comprehensive Collection Development Strategy for SOAR
          * During the first several years, a comprehensive collection development strategy was explored;
          * The IR Librarian promoted the new service to faculty and university administration and accepted numerous types of materials;
          * A variety of projects were completed: videos, presentations, science-museum-like projects (e.g. the pottery collection of the Museum of Anthropology, the virtual herbarium of the Department of Biological Sciences);
  • 8.
      * Comprehensive Collection Development Strategy for SOAR (Cont.)
          * SOAR also included serial publications of the university, such as e-journals and proceedings;
          * An ETD program was developed as a collaboration between the Graduate School and the Library;
          * Two types of faculty collections were created: departmental and individual
              * Individual faculty collections included e-books, articles, conference papers and presentations;
              * Departmental collections included mainly articles and conference papers.
  • 9.
      * Comprehensive Collection Development Strategy for SOAR (Cont.)
          * Some of these collections were uploaded to the repository title by title;
          * Others, such as the Virtual Herbarium, were added to the system in bulk;
          * However, whether manually entered or bulk uploaded, these collections used internal sources of data:
              * Full text for individual collections was provided by faculty authors;
              * Full text for departmental collections included materials found on the University's departmental websites.
      * In the last few years, our collection development strategy has changed from comprehensive to limited.
          * Serial publications (ETDs, proceedings, e-journals) continued, but no new major projects were invited.
  • 10.
      * Limited Collection Development Strategy for SOAR
          * Bibliographic data is imported from external sources;
          * Data is enhanced to ensure the consistency of the repository collections and to meet the needs of end users;
          * Full text is accepted if permitted (in practice, limited to Open Access under a Creative Commons license);
          * Access to full text is provided via links in preference order;
          * The emphasis is on bulk import of faculty articles and conference papers.
              * Faculty did not provide these materials;
              * All materials are searched for by librarians on the Web and in different databases.
  • 11.
      * How do we use data from external sources?
          * As a source of information about works written and published by the University's authors (this information is not available to us within the University);
          * The record leads us to the full text of the work on the publisher's website or to a hard copy of the journal (if available);
          * We use the work itself as the primary source of information for a metadata record;
          * After we export records from external sources, we verify the information and modify the records according to our metadata template;
          * We acknowledge sources of information by including the rights.holder field and the record's ID number in its original database.
  • 12.
      * Method Exploration: Options?
          * Export the PubMed XML file (search results) to a spreadsheet directly (requires heavy manual editing);
          * Export the Medline txt file to a spreadsheet (requires transposing the data; column numbers vary; problematic);
          * Transform the PubMed XML file to a DCXML file using XSLT (source and target XML schemas needed);
          * Transform the PubMed XML file to a DCXML file using VB script and XPath expressions.
      * PubMed vs. PubMed Central
          * All PubMed articles vs. full-text articles only
  • 13. [Data flow diagram: PubMed to DSpace workflow]
      * External data entities: Librarian; Users
      * 1.0 Search PubMed by Institution Affiliation → T1 PubMed XML
      * 2.0 Transform w/ VB and XPath expression → T2 DCXML
      * 3.0 Export to Excel → T3 Excel file
      * 4.0 Edit/Enrich/Enhance Data in Excel → T4 Curated Datasets
          * Data Accuracy and Consistency Analysis; Name Authority Check; Check fields against publication template; Additional Fields added; Link enrichment; Peer Review Article Status Check; Copyright Check; Divide data to subsets.
      * 5.0 Transform to SIP Packages w/ Java Program → T5 SIP Packages for Departmental Collections
      * 6.0 Export to DSpace
      * Legend: External Data Entity, Process, Data Flow, Store/File
  • 14.
      * Source Data: Search PubMed (http://www.ncbi.nlm.nih.gov/pubmed) by institution affiliation; save the result as an XML file;
      * Data Analysis and Mapping: Refer to the MEDLINE/PubMed Data Element (Field) Descriptions; map PubMed fields to DC elements;
      * Transform PubMed XML File to DC XML File
          * VBScript run in Microsoft Visual Web Developer, using XPath expressions.
          * For example: get the value for DC element “identifierIssn” from node "./MedlineCitation/Article/Journal/ISSN" in the retrieved PubMed XML file:
                ' Check if there's an ISSN
                If node.SelectNodes("./MedlineCitation/Article/Journal/ISSN").Count > 0 Then
                    writer.WriteStartElement("identifierIssn")
                    writer.WriteValue(node.Item("MedlineCitation").Item("Article").Item("Journal").Item("ISSN").InnerText)
                    writer.WriteEndElement()
                End If
                …
      * Complexity in Data Extraction and Transformation
          * identifierCitation: need to combine Journal title, Volume, Issue and Year, e.g., Journal title. 2011 Oct; 39(4):320-32.
          * dateIssued: need to combine Year, Month and Day under “PubDate”, e.g., 2011-10-01. Used the “yyyy-mm-dd” format and normalized the data for two situations: only year and month available; only year and season available. However, the dates were changed back to their original formats at the final project stage.
          * subjectMesh: PubMed has DescriptorName and QualifierName. Need to consider different situations: a descriptor with one qualifier, a descriptor with multiple qualifiers…
          * contributorAuthor: list all authors' names under AuthorList…
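The extraction steps on this slide can be sketched in Python as well. This is only an illustration, not the project's VBScript: the sample record is hand-written in the shape of a PubMed XML export, and the helper names (`text`, `date_issued`, `identifier_citation`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like one record of a PubMed XML export (illustrative values).
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <Journal>
          <ISSN>1234-5678</ISSN>
          <JournalIssue>
            <Volume>39</Volume>
            <Issue>4</Issue>
            <PubDate><Year>2011</Year><Month>Oct</Month></PubDate>
          </JournalIssue>
          <Title>Journal title</Title>
        </Journal>
        <Pagination><MedlinePgn>320-32</MedlinePgn></Pagination>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

MONTHS = {name: f"{number:02d}" for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

JOURNAL = "./MedlineCitation/Article/Journal/"


def text(node, path):
    """Return the stripped text at `path`, or None if the node is missing or empty."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def date_issued(article):
    """Combine Year, Month and Day under PubDate into yyyy-mm-dd (default day: 01)."""
    base = JOURNAL + "JournalIssue/PubDate/"
    year = text(article, base + "Year")
    if year is None:
        return None
    month = text(article, base + "Month")
    if month is None:
        return year
    month = MONTHS.get(month, month)  # month may be "Oct" or already numeric
    day = text(article, base + "Day") or "01"
    return f"{year}-{month}-{day.zfill(2)}"


def identifier_citation(article):
    """Build 'Journal title. 2011 Oct; 39(4):320-32.' (assumes all parts are present)."""
    title = text(article, JOURNAL + "Title")
    year = text(article, JOURNAL + "JournalIssue/PubDate/Year")
    month = text(article, JOURNAL + "JournalIssue/PubDate/Month")
    volume = text(article, JOURNAL + "JournalIssue/Volume")
    issue = text(article, JOURNAL + "JournalIssue/Issue")
    pages = text(article, "./MedlineCitation/Article/Pagination/MedlinePgn")
    date = " ".join(part for part in (year, month) if part)
    return f"{title}. {date}; {volume}({issue}):{pages}."


article = ET.fromstring(SAMPLE).find("PubmedArticle")
issn = text(article, JOURNAL + "ISSN")  # only written out when not None
```

A real export would also need the season handling and the missing-field cases listed above; the sketch covers only the year/month/day path.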
  • 15. [Mapping table, part 1]
      | DC Field | PubMed Source Field | Node in PubMed export XML file | Note |
      | identifier | PMID | ./MedlineCitation/PMID | |
      | identifierIssn | ISSN | ./MedlineCitation/Article/Journal/ISSN | Check if there's an ISSN in node “./MedlineCitation/Article/Journal/ISSN” of the PubMed XML file; if yes, output its mapped DC element “identifierIssn” with the ISSN value. |
      | identifierCitation | Title | ./MedlineCitation/Article/Journal/Title | To form identifierCitation, need to combine Journal title, Volume, Issue and Year, e.g., Journal title. 2011 Oct; 39(4):320-32. |
      | | PubDate | ./MedlineCitation/Article/Journal/JournalIssue/PubDate | |
      | | Volume | ./MedlineCitation/Article/Journal/JournalIssue/Volume | |
      | | Issue | ./MedlineCitation/Article/Journal/JournalIssue/Issue | |
      | | MedlinePgn | ./MedlineCitation/Article/Pagination/MedlinePgn | |
      | dateIssued | Year | ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Year | To form dateIssued, combined Year and Month under “PubDate”, e.g., “2011-10.” |
      | | Month | ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Month | |
      | | Day | ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Day | Check if there's a day; otherwise use the default day “01.” “yyyy-mm-dd” format. |
      | dateIssued | Year | ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Year | Check if there's a season… If only a season is available, replace “spring” with “03-01,” “summer” with “06-01” and “winter” with “12-01” (however, changed back to the original format at the final project stage). |
      | | Season | ./MedlineCitation/Article/Journal/JournalIssue/PubDate/Season | |
      | relationIspartofseries1 | Title | ./MedlineCitation/Article/Journal/Title | |
      | relationIspartofseries2 | ISOAbbreviation | ./MedlineCitation/Article/Journal/ISOAbbreviation | |
  • 16. [Mapping table, part 2]
      | DC Field | PubMed Source Field | Node in PubMed export XML file | Note |
      | title | ArticleTitle | ./MedlineCitation/Article/ArticleTitle | Check if there's an “ArticleTitle” in node “./MedlineCitation/Article/ArticleTitle” of the PubMed XML file; if yes, output its mapped DC element “title” with the “ArticleTitle” value. |
      | formatExtent | MedlinePgn | ./MedlineCitation/Article/Pagination/MedlinePgn | |
      | RightsHolder | CopyrightInformation | ./MedlineCitation/Article/Abstract/CopyrightInformation | |
      | descriptionSponsorship | Agency | ./MedlineCitation/Article/GrantList/Grant/Agency | |
      | title2 | VernacularTitle | ./MedlineCitation/Article/VernacularTitle | |
      | subjectMesh | MeshHeadingList | ./MedlineCitation/MeshHeadingList | Check if there's a DescriptorName. Need to consider different situations: a descriptor with one qualifier, with multiple qualifiers… |
      | | DescriptorName | ./MedlineCitation/MeshHeadingList/MeshHeading | |
      | | QualifierName | ./MedlineCitation/MeshHeadingList/MeshHeading | |
      | contributorAuthor[1,2,3…] | AuthorList | ./MedlineCitation/Article/AuthorList | Check if there's an AuthorList and then list all authors' names. |
      | languageIso | Language | ./MedlineCitation/Article/Language | |
      | type | PublicationType | ./MedlineCitation/Article/PublicationTypeList/PublicationType | |
      | coverageSpacial | Country | ./MedlineCitation/MedlineJournalInfo/Country | |
      | MedlineTA | MedlineTA | ./MedlineCitation/MedlineJournalInfo/MedlineTA | |
      | identifierIssn2 | ISSNLinking | ./MedlineCitation/MedlineJournalInfo/ISSNLinking | |
      | Identifier2 | NlmUniqueID | ./MedlineCitation/MedlineJournalInfo/NlmUniqueID | |
      | identifier3 | ArticleId (doi) | ./PubmedData/ArticleIdList/ArticleId[@IdType='doi'] | |
      | identifier4 | ArticleId (pii) | ./PubmedData/ArticleIdList/ArticleId[@IdType='pii'] | |
      | identifier5 | GrantID | ./MedlineCitation/Article/GrantList/Grant/GrantID | |
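Two rows of the mapping are less direct than the rest: subjectMesh, where a descriptor may carry zero, one or several qualifiers, and the ArticleId rows, which select by attribute. A minimal Python sketch follows, assuming a "Descriptor/qualifier" output convention that the slides do not specify; the sample record and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a PubMed export record (illustrative values).
SAMPLE = """<PubmedArticle>
  <MedlineCitation>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName>Humans</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName>Neoplasms</DescriptorName>
        <QualifierName>genetics</QualifierName>
        <QualifierName>therapy</QualifierName>
      </MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pii">S0000</ArticleId>
      <ArticleId IdType="doi">10.0000/example</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>"""


def mesh_subjects(article):
    """One subjectMesh value per descriptor, or per descriptor/qualifier pair.

    The 'Descriptor/qualifier' form is an assumption made for this sketch.
    """
    subjects = []
    for heading in article.findall("./MedlineCitation/MeshHeadingList/MeshHeading"):
        descriptor = heading.findtext("DescriptorName")
        if not descriptor:
            continue  # skip headings without a DescriptorName
        qualifiers = [q.text for q in heading.findall("QualifierName") if q.text]
        if qualifiers:
            subjects.extend(f"{descriptor}/{qualifier}" for qualifier in qualifiers)
        else:
            subjects.append(descriptor)
    return subjects


def article_id(article, id_type):
    """Return the ArticleId with the given IdType ('doi', 'pii', ...), or None."""
    path = f"./PubmedData/ArticleIdList/ArticleId[@IdType='{id_type}']"
    return article.findtext(path)


article = ET.fromstring(SAMPLE)
subjects = mesh_subjects(article)
doi = article_id(article, "doi")
```

Emitting one value per descriptor/qualifier pair keeps each subjectMesh cell atomic, which may make later editing in a spreadsheet easier; a project could equally join the qualifiers into a single string.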
  • 17.
      * Refer to the Faculty and Research Publication Template in DSpace (WSU SOAR):
          * dc.contributor.author
          * dc.date.issued
          * dc.identifier (doi)
          * dc.identifier.citation
          * dc.identifier.issn
          * dc.identifier.uri
          * dc.description
          * dc.description.abstract
          * dc.format.extent
          * dc.language.iso
          * dc.publisher
          * dc.relation.ispartofseries
          * dc.source??
          * dc.title
          * dc.type
          * dc.coverage.spacial
          * dc.description.version (peer-reviewed status)
          * dc.rights.holder
      * After the DCXML file is exported to Excel, additional fields from the WSU research publication template need to be added to the spreadsheet;
      * Keep extra fields from the PubMed export for this collection?!
          * descriptionSponsorship
          * subjectMesh
          * MedlineTA (NLM journal title abbreviation)
  • 18.
      * Edit the affiliation field: make department names consistent and turn them into departmental collection names;
      * Name authority check: check the OCLC authority file and the local Voyager authority file;
      * Peer-review article check: check the journal's peer review status; search Sherpa/Romeo or Ulrichsweb (serials directory);
      * Additional fields added: contributor (dept. name), publisher, identifier.uri, description;
      * Enriched data in existing fields: rightsHolder;
      * Copyright compliance: only provide links to the article; do not re-save or host it;
      * Data accuracy and consistency analysis;
      * Divide records into several departmental collections.
  • 19.
      * Edit the affiliation field and find the affiliation if not available;
      * Check “contributorAuthor” against the OCLC and local Voyager authority files;
      * Add peer review status for “descriptionVersion” by searching Sherpa/Romeo or Ulrichsweb;
      * Delete the value in “identifierIssn2” if it's the same as the value in “identifierIssn;”
      * Change “relationIspartofseries” (journal name) to proper case (use PROPER());
      * Add “contributor” (corporate contributor, university department);
      * Sort by “identifierIssn,” add “publisher” (check Ulrich's Periodical and Sherpa);
      * Search “identifierUri” by title, enrich doi links and other links;
      * Check “rightsHolder”;
      * Add “description” for links;
      * Check special characters in the spreadsheet;
      * Data accuracy and consistency check…
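The mechanical parts of this checklist were done in Excel. As a rough illustration only, the same three steps (dropping a duplicate ISSN, proper-casing the journal name, sorting by ISSN) might look like this in Python, with spreadsheet rows modeled as dictionaries keyed by the column names above. `clean_row` and `sort_by_issn` are hypothetical names, and `str.title()` is only roughly analogous to Excel's PROPER().

```python
def clean_row(row):
    """Return a cleaned copy of one spreadsheet row (a dict keyed by column name)."""
    cleaned = dict(row)

    # Delete identifierIssn2 when it merely repeats identifierIssn.
    issn2 = cleaned.get("identifierIssn2")
    if issn2 and issn2 == cleaned.get("identifierIssn"):
        cleaned["identifierIssn2"] = ""

    # Proper-case the journal name; str.title() approximates Excel's PROPER().
    series = cleaned.get("relationIspartofseries")
    if series:
        cleaned["relationIspartofseries"] = series.title()

    return cleaned


def sort_by_issn(rows):
    """Sort rows by ISSN so records from the same journal sit together."""
    return sorted(rows, key=lambda row: row.get("identifierIssn", ""))


rows = [
    {"identifierIssn": "2222-3333", "identifierIssn2": "2222-3334",
     "relationIspartofseries": "another journal"},
    {"identifierIssn": "1234-5678", "identifierIssn2": "1234-5678",
     "relationIspartofseries": "JOURNAL OF EXAMPLES"},
]
cleaned_rows = sort_by_issn(clean_row(row) for row in rows)
```

Grouping rows by ISSN makes it possible to look up each journal's publisher once and fill it in for every article from that journal. The authority, peer-review and copyright checks on the list remain manual judgment calls and are not modeled here.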
  • 20.
      * Search for additional links and add link descriptions for publication records;
      * Access to full text is provided via links in the following preference order:
          * Direct link to the title: DOI, e.g., Click on the DOI link below to access the article. (Description)
          * Link to the electronic journal / proceedings record in the library catalog if the library subscribes to this journal; link to the journal website for publications not licensed by the library.
              * For example: The full text of this article is not available in SOAR. WSU users can access the article via commercial databases licensed by University Libraries: ... The stable URL of this article is: …
          * If the title is not online, provide a note (no digital copy of this title available, or hard copy available in the library catalog).
              * For example: The full text of this article is not available in SOAR. Check the journal record … for the paper version of the article in the library. Or,
              * The full text of this article is not available in SOAR.
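The preference order above amounts to a simple fallback chain. A minimal sketch, assuming each record carries an optional DOI, catalog URL and journal URL: the function name, its parameters and the use of the https://doi.org/ resolver prefix are assumptions of this example, and the note texts are shortened from the slide's wording where the slide itself elides details.

```python
NOT_IN_SOAR = "The full text of this article is not available in SOAR."


def choose_link(doi=None, catalog_url=None, journal_url=None):
    """Pick the link and description for a record, following the preference order.

    Returns a (link, description) pair; link is None when nothing is online.
    """
    if doi:
        # 1. Direct link to the title via its DOI.
        return f"https://doi.org/{doi}", "Click on the DOI link below to access the article."
    if catalog_url:
        # 2a. The library subscribes: link to the catalog record of the journal.
        note = (NOT_IN_SOAR + " WSU users can access the article via commercial "
                "databases licensed by University Libraries.")
        return catalog_url, note
    if journal_url:
        # 2b. Not licensed by the library: link to the journal website.
        return journal_url, NOT_IN_SOAR
    # 3. Title not online: provide a note only.
    return None, NOT_IN_SOAR


link, description = choose_link(doi="10.0000/example")
```

For a record with no online copy, `choose_link()` returns `None` for the link together with the note, which corresponds to the last case on the slide.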
  • 21.
      * An Add-on to Facilitate the DSpace Batch Import Procedure
          * Part of the Google Summer of Code Project 2008;
          * The program was written by Blooma Mohan John at Nanyang Technological University;
          * Written in Java. It transforms data prepared in a Microsoft Excel or OpenOffice spreadsheet into the DSpace batch import format, the XML-based Submission Information Packages (SIPs);
          * The spreadsheet must include the metadata as well as the locations of the digital resources.
          * Program wiki: https://wiki.duraspace.org/display/GSOC/Google+Summer+of+Code+2008+Batch+Import
      * Customization of the Add-on to Generate SIP Packages for Individual Collections from a Spreadsheet
          * For customization of this program, refer to:
              * Deng, S., Matveyeva, S. & Khan, B. (2010). Enhancing workflow through batch import from Excel to DSpace. Kansas Library 2010 Conference. Wichita, KS, April 8, 2010. Available at: http://hdl.handle.net/10057/2366
              * Deng, S. Optimizing Workflow through Metadata Repurposing and Batch Processing. Available at: http://www.tandfonline.com/doi/abs/10.1080/19386389.2010.524862#preview
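For readers unfamiliar with the target format, the following Python sketch writes one item in the layout of DSpace's Simple Archive Format: a directory holding a `dublin_core.xml` file of `dcvalue` elements and a `contents` file listing bitstream file names. It is not the Java add-on described above. The function name and the directory naming are hypothetical, and leaving `contents` empty for link-only records is an assumption that should be checked against the DSpace version in use.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(batch_dir, index, fields, bitstreams=()):
    """Write one item directory for a batch import.

    fields: iterable of (element, qualifier, value) triples, e.g.
            ("identifier", "issn", "1234-5678"); use None for no qualifier.
    bitstreams: file names to list in the 'contents' file (may be empty).
    """
    item_dir = Path(batch_dir) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none")
        dcvalue.text = value

    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True)

    # One bitstream file name per line; empty when only links are provided.
    contents = "".join(f"{name}\n" for name in bitstreams)
    (item_dir / "contents").write_text(contents, encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as batch:
        item = write_item(batch, 1, [
            ("title", None, "An example article"),
            ("contributor", "author", "Doe, Jane"),
            ("identifier", "issn", "1234-5678"),
        ])
        print(sorted(p.name for p in item.iterdir()))
```

A batch is then a directory of such item directories, which would be zipped and transferred to the server for import.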
  • 22.
      * Export each individual batch to its corresponding location (DSpace handle)
      * Transfer data to the DSpace server using psftp, for example:
            open soar.wichita.edu
            cd /data/dspace/upload
            put PubMed_PSY.zip
      * Log into the server and run the DSpace ItemImport command, for example:
            cd /usr/local/src/dspace/upload
            unzip PubMed_PSY.zip
            cd /data/dspace/bin
            ./dsrun org.dspace.app.itemimport.ItemImport -a -e firstnamelastname@institution.edu -c 123456789/3377 -s /usr/local/src/dspace/upload/PubMed__PSY -m mapfilePubMed_PSY
  • 25.
      * In this example, the dates in yyyy-mm-dd format were changed back to their original format (as requested);
      * If the number of changes exceeds the maximum allowed, limit the changes or alter bulkedit.gui-item-limit in dspace.cfg.
  • 26.
      * PubMed to Excel Python script (by Nitin Arora):
          * http://blog.humaneguitarist.org/projects/pubmed2xl/
          * Exports PubMed citations to a spreadsheet;
          * The XSLT stylesheet can be customized to output the desired data;
          * Does not validate data against the PubMed and DC XML schemas.
      * XSLT and XML Processor
          * Transform PubMed XML to DCXML with XSLT in an XML processor;
          * Use the PubMed and DCXML schemas for element validation;
          * Question: DCXML in which format? SIP packages? How to add files/bitstreams to DSpace?
  • 27.
      * IEEE Data Source: WSU faculty publications from the Institute of Electrical and Electronics Engineers (IEEE) Xplore.
      * This data was exported to individual research publication collections for Computer Science (CS), Electrical Engineering and Computer Science (EECS), Mathematics and Statistics (MAT), Mechanical Engineering (ME) and Physics (PHY).
  • 28. [Data flow diagram: IEEE Xplore to DSpace workflow]
      * External data entities: Librarian; Users
      * 1.0 Search IEEE by Author Affiliation and Export Results → T1 Excel Comma Separated Value File
      * 2.0 Edit/Enrich/Enhance Data in Excel → T2 Curated Datasets
          * Map fields to DC and check fields against publication template; Additional DC Fields added; Distribute titles to appropriate collection sets; Data Accuracy and Consistency Analysis; Peer Review Article Status Check; Copyright Check.
      * 3.0 Transform to SIPs w/ Java Program → T3 SIP Packages for Research Publication Collections
      * 4.0 Export to DSpace
      * Legend: External Data Entity, Process, Data Flow, Store/File
  • 29.
      * The project goal is to export retrospective records derived from the WOS database, for all available years, to the appropriate collections in the repository;
      * Because librarians do not have an internal source of data on faculty publications, the only way to collect data is to turn to external sources;
      * WOS is the largest commercial source of data on published articles and conference proceedings in all subjects from 1988 onward; it contains over 41 million records collected from over 12,000 journals.
  • 30. [Data flow diagram: Web of Science to DSpace workflow; step numbers as on the slide]
      * External data entities: Librarian; Users
      * 1.0 Search WOS by Organization and Export Results → T1 EndNote Citation Txt Files
      * 2.0 Export EndNote to Excel → T2 Excel Data
      * 2.0 Edit/Enrich/Enhance Data in Excel → T3 Curated Datasets
          * Sort Excel file by Authors; Map fields to DC and check fields against publication template; Additional Fields added; Data Accuracy and Consistency Analysis; Name Authority Check; Link enrichment; Peer Review Article Status Check; Copyright Check; Divide data to subsets; Check for duplicates.
      * 3.0 Transform to SIPs w/ Java Program → T4 SIP Packages for Faculty Publication Collections
      * 4.0 Export to DSpace
      * Legend: External Data Entity, Process, Data Flow, Store/File
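The "Check for duplicates" step matters once records from several sources (PubMed, IEEE Xplore, WOS) land in the same collections. The slides do not say how duplicates were detected; one possible approach, sketched here under the assumption that each record is a dictionary with optional `doi` and `title` keys, is to treat two records as duplicates when they share a DOI or a normalized title. All names below are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase the title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_duplicates(records):
    """Keep the first record for each DOI or normalized title; drop later matches."""
    seen = set()
    unique = []
    for record in records:
        keys = set()
        if record.get("doi"):
            keys.add(("doi", record["doi"].lower()))
        if record.get("title"):
            keys.add(("title", normalize_title(record["title"])))
        if keys & seen:
            continue  # shares a DOI or a title with a record already kept
        seen |= keys
        unique.append(record)
    return unique


records = [
    {"doi": "10.0000/example", "title": "An Example Article."},
    {"title": "an example article"},          # same title, exported without a DOI
    {"doi": "10.0000/other", "title": "A Different Article"},
]
unique_records = drop_duplicates(records)
```

Title matching of this kind can produce false positives (for example, recurring editorial titles), so flagged pairs would still deserve a manual look before deletion.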
  • 31.
      * Issues to Think About
          * Is it appropriate for an IR to be a backup resort for institutional research-generated datasets held in subject repositories and other digital repositories?
          * Open access sources and U.S. federal government sources are okay to use. We need to read commercial companies' papers to see what they permit us to use.
          * Can librarians provide value-added data services? Or data curation services?
          * Do librarians have sufficient training to describe and preserve scientific data? (Newton et al., 2011)
  • 32.
      * DCC: Digital Curation Centre, UK
          * Full Lifecycle Actions: Description and Representation Information, Preservation Planning, Community Watch and Participation.
          * Sequential Actions: Conceptualize, Create or Receive, Appraise and Select, Ingest, Preservation Action, Store, Access, Use and Reuse, Transform.
          * Occasional Actions: Dispose, Reappraise, Migrate
      From http://www.dcc.ac.uk/resources/curation-lifecycle-model
  • 33. [Annotations on the DCC curation lifecycle model diagram]
      * It practices “Curate” and “Preserve” from an institution's aspect.
      * It involves activities in “Appraise & Select”, “Ingest”, “Preservation Action” and “Store.”
      * Building collections from external data sources puts emphasis on “Access, use, reuse” and “Transform.”
          * Reuse: repurposing datasets from external data sources to an IR or digital repository.
          * Transform: preserve subsets.
      * These projects promote data sharing between the subject repository, data providers and institutional repository?!
      * Value-added data services are part of the curation process.
      * This dynamic model can be entered at any point?!
  • 34.
      * Getting data from other sources is an alternative way to develop collections for different disciplines, since author self-deposit has not become a common practice for IRs. It can be a good way to find information for retrospective projects.
      * This effort is in line with the current metadata cataloging trend of moving from item-by-item cataloging to batch processing of datasets, repurposing metadata between different systems and communities, and providing value-added data curation services to students and faculty in an IR.
      * For small to medium-sized institutions that do not have the budget for commercial data services, it is possible to get data from external databases or data sources through certain procedures without infringing copyright policies.
      * If an institution wants to enhance the data from the external source, it may work better to export the data to a program that allows local batch editing, such as Excel, than to pre-populate it into the IR directly.
  • 35.
      * The need for external data sources on faculty publications arises from the unavailability of this data at the institutional level.
      * The situation may change with the improvement of research data management at universities.
      * University administrations need better statistical data for reporting and assessment purposes; more universities organize data on faculty research using online systems, and some systems provide a seamless flow of data to institutional repositories.
  • 36.
      * Data acquisition and content recruitment for IRs
      * Institutional Repository vs. Subject Repository
      * Creating research publication collections in IRs and digital repositories
      * Data batch transfer from external sources to IRs
      * Data curation for IR and digital repository
      * Data curation services to support the research lifecycle
      * Data management plan (project level, institutional level)
      * Transformation of Medline in PubMed to DC in DSpace
      * Workflow management
      * Value-added metadata services
      * Metadata field selection and enrichment
      * Copyright compliance
      * Data acquisition from the open web…
  • 37.
      * Constantopoulos, P., Dallas, C., Androutsopoulos, I., Angelis, S., Deligiannakis, A., Gavrilis, D., Kotidis, Y. and Papatheodorou, C. (2009). DCC&U: An extended digital curation lifecycle model. The International Journal of Digital Curation, Vol. 4, No. 1, pp. 34-45.
      * Digital Curation Centre (DCC). DCC curation lifecycle model. Retrieved from http://www.dcc.ac.uk/resources/curation-lifecycle-model.
      * Higgins, S. (2009). Applying the DCC curation lifecycle model. IASSIST 2009: Tampere, Finland 26-29 May.
      * Kim, J. (May 2011). Motivations of faculty self-archiving in institutional repositories. The Journal of Academic Librarianship, Vol. 37, No. 3, pp. 246-254. doi:10.1016/j.acalib.2011.02.017
      * Luzi, D., Di Cesare, R., Ruggieri, R. and Ricci, M. (2012). Enhancing diffusion of scientific content: open data in open archives. In: GL-13 - Thirteenth International Conference on Grey Literature: the Grey Circuit, from Social Networking to Wealth Creation (Washington D.C., USA, 5-6 December 2011). Proceedings, D.J. Farace, J. Frantzen, GreyNet (eds.). TextRelease, 2012.
      * Newton, M. P., Miller, C. C. and Bracke, M. S. (2011). Librarian roles in institutional repository data set collecting: outcomes of a research library task force. Collection Management, Vol. 36, No. 1, pp. 53-67. doi:10.1080/01462679.2011.530546
      * Nicholas, D., Rowlands, I., Watkinson, A., Brown, D. and Jamali, H. R. (2012). Digital repositories ten years on: what do scientific researchers think of them and how do they use them? Learned Publishing, 25: 195-206. doi:10.1087/20120306
      * Treloar, A., Groenewegen, D. and Harboe-Ree, C. (2007). The data curation continuum: Managing data objects in institutional repositories. D-Lib Magazine (September 2007).
      * University of Central Florida Library. Scholarly Communication: 21st Century Digital Scholarship at UCF. Retrieved from http://develop.lib.ucf.edu/ScholarlyCommunication/ (Temporary link).
      * Wichita State University Libraries. Shocker Open Access Repository (SOAR). Retrieved from http://soar.wichita.edu
      * Wickett, K. M., Hu, X. and Thomer, A. (2012). RDAP12 summit: challenges and opportunities for data management. Bulletin of the American Society for Information Science and Technology, Vol. 38, No. 5 (June/July 2012), pp. 14-19. doi:10.1002/bult.2012.1720380506
      * Witt, M. (2012). Co-designing, co-developing, and co-implementing an institutional data repository service. Journal of Library Administration, 52(2). doi:10.1080/01930826.2012.655607
  • 38.
      Sai Deng
      Associate Librarian and Cataloger/Metadata Librarian, University of Central Florida Libraries
      Email: sai.deng@ucf.edu
      (Previously worked as a Metadata Cataloger at Wichita State University Libraries)

      Susan Matveyeva
      Associate Professor & Catalog & Institutional Repository Librarian, Wichita State University Libraries
      Email: susan.matveyeva@wichita.edu

      Thank you!