If not now, when? 
Denise J. Hills, Sandy Ebersole, and W. Edward Osborne 
Geological Survey of Alabama 
GSA 2014
Data Preservation is Time Sensitive
• Samples or media can deteriorate over time
• Individual researchers have their own methodology of record-keeping
• If the original researcher (or current maintainer) is no longer available, what are your options?
Data at the GSA/OGB
• Legally charged to be a repository for data relating to energy and mineral resources.
Data Preservation Efforts at GSA/OGB
NGGDPP, NGDS
Collection Organization
PARENT: Well (NGDS)
    CHILD: Core (SESAR)
    CHILD: Well Logs (NGDS)
    CHILD: Other Data (NGDS/SESAR, ?)
PARENT: Core (SESAR)
    CHILD: Section or Part (SESAR)
    CHILD: Thin Section (SESAR)
    CHILD: Other Samples (SESAR, ?)
PARENT: Thin Section (SESAR)
    CHILD: Photomicrograph (SESAR)
    CHILD: SEM Analysis (SESAR)
    CHILD: Related Analysis (SESAR, ?)
Data Management at GSA 
Version Control 
[Flowchart, part 1: metadata rescue workflow. Step labels: Start; Determine Canonical File(s); Locate Metadata; Missing Metadata? (yes/no); In other file(s)?; From originator?; Other source?; Enter null value; Revise Metadata; Original Provider Review. Connectors A and B link to the other parts of the chart.]
[Figure: Well Header Metadata and Core Metadata (USGIN Physical Sample Content), labeled "submitted to NGDS".]
[Flowchart, part 2: adds the step labels Map to USGIN; Validated? (yes/no); Approved? (yes/no); Revise Metadata. Accompanying figure: Thin Section Metadata (USGIN Physical Sample Content Model).]
[Flowchart, part 3: adds the step labels Generate SESAR Template; Populate Template; Validated? (yes/no); Revise Entries; Submit to SESAR; Receive IGSN; Complete.]
[Figure: Core Metadata (USGIN Physical Sample Content) and Thin Section Metadata (USGIN Physical Sample Content Model).]
Summary
• Use Version Control (e.g., Git) or similar
• Build on existing standards
• Develop a workflow
• Involve current data holder as well as those not as familiar with the information
• Register your samples and data
DO IT NOW!

Hills "If Not Now, When?" Presentation, GSA 2014

Editor's Notes

  1. Two examples: the DKM thin section database, and recovery after Lewis’ death.
  2. Including:
     • Geophysical well logs
     • Cores, cuttings, and other physical samples, sometimes with descriptions
     • Fluid production and injection information from oil and gas wells
     • Geologic maps
     As with many agencies, GSA/OGB has challenges:
     • Data discoverability is often difficult
     • Much of the available information was analog
     • Even digital data was not always “machine-readable”
     • Lack of standardization and documentation of data and metadata
     • Provenance and quality often poor or unknown
  3. NGGDPP (since 2007):
     • >36,000 metadata records uploaded
     • Capturing info for >100,000 individual fossil specimens
     • Also includes geologic maps, oil and gas well cores, geologic cuttings, thin sections, and other physical samples
     NGDS allowed GSA to generate large quantities of digitally preserved data in a standardized format:
     • NGDS project ran from 2010 to 2013
     • Included geologic map with metadata in OneGeology schema
     • Well headers: >9,000
     • Well log metadata: >10,000
     • BHT metadata: >11,000
     • Faults: 297
     • Lithologic interval metadata: 4,719
     Through both of these projects, GSA personnel became familiar with schemas, schema-mapping, and processes to streamline metadata rescue.
  4. Much of the physical sample collection structure at the GSA and OGB can be broken down into parent-child relationships. This aids in creating linked records. Well header information was recorded and quality-checked through NGDS, and some through NGGDPP (including core information). Moving forward, we can build on these links to reduce duplicate effort. As part of its mandate to regulate the petroleum industry in Alabama, the OGB requires that companies drilling oil and gas wells in the state provide the GSA with a share of any well samples collected, as well as copies of any geophysical well logs or other testing undertaken. The OGB stores and maintains the geophysical well logs and other documentation, while the GSA stores and maintains the cores and cuttings in its core and sample warehouse. The parent object in the collection is typically the permitted oil and gas well, with all other information tied back to the well. A large amount of other data at the GSA that is not directly related to oil and gas development, yet is still associated with a permitted well, has been held or maintained by individual researchers. Access is then limited to those who already know which researcher to ask about the availability of these data. Other challenges arise from researchers’ individual record-keeping methodology, e.g., use of a unique notation system or failure to record information that others may need to understand the work. When the individual is no longer available, the data and objects may therefore become useless: it is often difficult or impossible to regain the information necessary to make the data useful to another researcher.
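The parent-child organization described in note 4 can be sketched as a small tree in Python. This is only an illustration: the class name `CollectionItem`, its fields, and the sample labels ("Core 1", "TS-001", "Photo A") are hypothetical and do not come from the GSA databases. Only the kinds of objects and the registries (NGDS, SESAR) follow the Collection Organization slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectionItem:
    """One object in the collection (well, core, thin section, ...)."""
    name: str
    kind: str        # e.g. "Well", "Core", "Thin Section"
    registry: str    # where the record is registered, e.g. "NGDS" or "SESAR"
    # The parent link is excluded from repr/eq to avoid infinite recursion.
    parent: Optional["CollectionItem"] = field(default=None, repr=False, compare=False)
    children: List["CollectionItem"] = field(default_factory=list)

    def add_child(self, child: "CollectionItem") -> "CollectionItem":
        """Attach a child record and remember its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Walk up the parent links; return the path from the root down to this item."""
        path, node = [], self
        while node is not None:
            path.append(f"{node.kind}: {node.name}")
            node = node.parent
        return list(reversed(path))


# Hypothetical example data: a permitted well is the root of everything else.
well = CollectionItem("Permit 624", "Well", "NGDS")
core = well.add_child(CollectionItem("Core 1", "Core", "SESAR"))
thin = core.add_child(CollectionItem("TS-001", "Thin Section", "SESAR"))
photo = thin.add_child(CollectionItem("Photo A", "Photomicrograph", "SESAR"))

print(" > ".join(photo.lineage()))
```

Because every item keeps a link to its parent, any photomicrograph or thin section can be traced back to the permitted well without asking the researcher who made it.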
  5. The GSA recently confronted this issue with the sudden death of our long-time core warehouse manager. As the warehouse filled, cores had to be shifted and relocated. The new locations were not always recorded, as the locations could be temporary until a more logical permanent location was determined. The manager had kept a “mental map” of some core locations. While he had always been able to locate these items immediately, others did not have his knowledge and thus could not replicate his work. Following his death, GSA personnel spent significant time determining the physical locations of items to make up for that lack of knowledge. The GSA is actively working to prevent this sort of incident with other collections. For example, information was maintained in several different spreadsheets. These documents contained both distinct and overlapping information. Determining the canonical files and records is a time-consuming process.
  6. DKM Thin Section DB: we had been keeping track of thin sections (TS) with a basic spreadsheet, with no way to cross-reference data. Each person had their own way of doing things and of recording information. The old “database” was a single spreadsheet. Measurements were reported inconsistently (e.g., depths sometimes reported in full, sometimes as a range, sometimes as the final digits in a range). Some TS had no depth at all, or it was unknown what the notation meant. It only got worse when we looked at the records of the photos for these TS. Top figure: TS record. Note that depths are recorded inconsistently. Bottom figure: photomicrograph records. Note the incompletely recorded depths, the extensive (and not always consistent) abbreviations, and the dependency on previous information (e.g., when changing magnification).
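The three depth notations mentioned in note 6 (full value, full range, and range with only the final digits of the second value) could be normalized with a small parser. The sketch below is an assumption about how such a cleanup might look, not the routine the GSA used. The function name `parse_depth` and the example strings are hypothetical, and the shorthand rule (borrow the leading digits of the top depth) is a guess that a reviewer would need to confirm.

```python
import re
from typing import Optional, Tuple

# One number, optionally followed by "-" and a second number.
_DEPTH = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*$")


def parse_depth(text: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return (top, bottom) for a depth entry, or None if it cannot be read."""
    match = _DEPTH.match(text or "")
    if match is None:
        return None  # free text, blank cell, or unknown notation: flag for review
    top_text, bottom_text = match.group(1), match.group(2)
    top = float(top_text)
    if bottom_text is None:
        return (top, top)  # single depth reported in full
    bottom = float(bottom_text)
    if bottom < top:
        # Assumed shorthand: "10234-36" means 10234 to 10236, so the missing
        # leading digits of the bottom depth are borrowed from the top depth.
        whole_top = top_text.split(".")[0]
        whole_bottom = bottom_text.split(".")[0]
        if len(whole_bottom) < len(whole_top):
            prefix = whole_top[: len(whole_top) - len(whole_bottom)]
            bottom = float(prefix + bottom_text)
    if bottom < top:
        return None  # still inconsistent (e.g. digit rollover): flag for review
    return (top, bottom)


for entry in ["10234", "10234-10236", "10234-36", "see notes"]:
    print(repr(entry), "->", parse_depth(entry))
```

Returning `None` instead of guessing keeps unreadable entries visible, so they can be sent back to the original researcher for review.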
  7. To address the lack of clarity about which file should be preeminent, we suggest the use of version control. Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. An example of a version control system is Git, developed by Linus Torvalds for Linux kernel development. Git is a distributed version control system (left figure) that organizes data like a set of snapshots of a mini file system: every time a project is committed (that is, the state of the project is saved) in Git, the state of all the files is saved. If a file is unchanged, a new copy is not generated, just a link to the previous identical file (right figure). Left figure: simple diagram of a distributed version control system such as Git (modified from Figure 1-3 of the ProGit manual). Right figure: demonstration of what changes with each version in a distributed version control system such as Git (modified from Figure 1-5 of the ProGit manual).
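The snapshot idea in note 7 (each commit records the state of every file, but an unchanged file is stored only once and merely linked) can be illustrated with a toy content-addressed store. This is a deliberately simplified model and not Git's actual object format. The class `SnapshotStore` and the file names are hypothetical.

```python
import hashlib
from typing import Dict, List


class SnapshotStore:
    """Toy model of snapshot-style version control (not Git's real format)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}       # content hash -> file content
        self.commits: List[Dict[str, str]] = []  # each commit: file name -> content hash

    def commit(self, files: Dict[str, bytes]) -> int:
        """Record the state of all files; return the new version number."""
        snapshot = {}
        for name, content in files.items():
            digest = hashlib.sha1(content).hexdigest()
            # Identical content is stored once; later commits only link to it.
            self.blobs.setdefault(digest, content)
            snapshot[name] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, version: int) -> Dict[str, bytes]:
        """Recall every file exactly as it was at the given version."""
        return {name: self.blobs[digest] for name, digest in self.commits[version].items()}


store = SnapshotStore()
v0 = store.commit({
    "thin_sections.csv": b"id,depth\n1,10234\n",
    "photos.csv": b"id,ts\nA,1\n",
})
v1 = store.commit({
    "thin_sections.csv": b"id,depth\n1,10234\n2,10240\n",  # changed
    "photos.csv": b"id,ts\nA,1\n",                          # unchanged
})

print("stored blobs:", len(store.blobs))
print("photos.csv shared between versions:",
      store.commits[v0]["photos.csv"] == store.commits[v1]["photos.csv"])
```

With a real tool such as Git, the same guarantee means there is always one answer to the question of which spreadsheet is current, and every earlier state can still be recovered.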
  8. The first step in our process looked easy at first: gather the relevant records about the thin sections and photomicrographs. However, when the researcher was approached, he realized he had multiple versions of the spreadsheets he had been using. Even though the files had time and date stamps, information in the newer files was not always the most accurate. Some information captured in an older file was not present in the newer files, although it was not obsolete. It became obvious that multiple files had been developed and modified concurrently. This resulted in many hours spent sifting through the files line by line to compare the information recorded, and then to verify the most accurate information as far as possible. We next examined the information contained within those spreadsheets to pull out the relevant metadata. This led to the discovery of several idiosyncrasies in the recordkeeping, such as inconsistent or unclear use of abbreviations, as well as notations such as “same as previous”, which could prove misleading if records were re-sorted, otherwise moved, or even deleted. Additionally, multiple categories of information (e.g., lithology, sedimentary structures) were recorded within a single cell. Recording of methodology ranged from inconsistent to non-existent, although, as the original researcher was available, much of this information could be reconstructed from his other notes. Semi-automated workflows were developed to aid in this “translation” process. Abbreviations were replaced with full text or standardized. Information grouped together was split into individual categories. Metadata that were gathered from the NGDS project (e.g., related to the well header) could be matched with child items (e.g., Figures 6-7). Figure: well header metadata gathered for the NGDS project (upper) could be matched to child items (lower) based on the OGB permit number (GSAPER:624, in this instance, highlighted). The parent HeaderURI becomes the ParentSpecimenURI (blue boxes), the APINo maps to SamplingFeatureURI (green boxes), and the parent locations map to the child locations (red boxes). URI – unique record identifier
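The parent-to-child matching described in note 8 can be sketched as a join on the permit number. The field names `HeaderURI`, `APINo`, `ParentSpecimenURI`, and `SamplingFeatureURI` come from the note. Everything else is an assumption made for illustration: the function name, the `PermitNo`, `Latitude`, and `Longitude` keys, and all sample values (which are placeholders, not real GSA records).

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def link_children_to_wells(
    wells: List[Record], children: List[Record]
) -> Tuple[List[Record], List[Record]]:
    """Copy parent well metadata onto child records that share a permit number."""
    wells_by_permit = {well["PermitNo"]: well for well in wells}
    linked, unmatched = [], []
    for child in children:
        well = wells_by_permit.get(child.get("PermitNo"))
        if well is None:
            unmatched.append(child)  # no parent found: needs manual review
            continue
        record = dict(child)  # leave the original child record untouched
        record["ParentSpecimenURI"] = well["HeaderURI"]  # parent HeaderURI
        record["SamplingFeatureURI"] = well["APINo"]     # parent APINo
        record["Latitude"] = well["Latitude"]            # parent location
        record["Longitude"] = well["Longitude"]
        linked.append(record)
    return linked, unmatched


# Placeholder values only; the key names other than HeaderURI and APINo are hypothetical.
wells = [{"PermitNo": "624", "HeaderURI": "example:well-header/624",
          "APINo": "example-api-0001", "Latitude": 0.0, "Longitude": 0.0}]
children = [{"PermitNo": "624", "Label": "TS-001"},
            {"PermitNo": "999", "Label": "TS-002"}]

linked, unmatched = link_children_to_wells(wells, children)
print(linked)
print("unmatched:", [c["Label"] for c in unmatched])
```

Keeping the unmatched records in a separate list mirrors the review loop in the workflow: anything that cannot be tied to a permitted well goes back to a person rather than being silently dropped.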
  10. Once the original researcher reviewed the updated metadata to help reduce mistranslation and other errors, we mapped the available metadata to an existing USGIN content model for Physical Samples (v.0.8). This content model is based on consideration of content requested for other schemas and services, such as the System for Earth Sample Registration (SESAR), Geoscience Markup Language (GeoSciML), and others. Although this content model is still under review, USGIN provides a content model validation tool to verify appropriate data formatting and content. Any corrections highlighted by the validator were made prior to the final step, registration of the samples. Figure: USGIN Content Model fields populated through the GSA workflow.
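The "validate, then revise" loop in note 10 can be illustrated with a minimal pre-check run before records go to the real validator. This sketch is not the USGIN validation tool and does not reproduce the Physical Samples content model: the lists of required and optional fields below are hypothetical stand-ins, apart from `ParentSpecimenURI` and `SamplingFeatureURI`, which are named in the notes.

```python
from typing import Dict, List

# Hypothetical field lists; consult the actual content model for the real ones.
REQUIRED_FIELDS = ("SpecimenLabel", "SpecimenType", "ParentSpecimenURI", "SamplingFeatureURI")
OPTIONAL_FIELDS = ("Latitude", "Longitude", "Notes")


def find_problems(record: Dict[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means the record passed."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {name}")
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    for name in sorted(set(record) - known):
        problems.append(f"unknown field: {name}")
    return problems


good = {"SpecimenLabel": "TS-001", "SpecimenType": "Thin Section",
        "ParentSpecimenURI": "example:core/1", "SamplingFeatureURI": "example-api-0001"}
bad = {"SpecimenLabel": "TS-002", "SpecimenType": " ", "Depth": "10234-36"}

print(find_problems(good))
print(find_problems(bad))
```

Records with a non-empty problem list would go back through the "Revise Metadata" step before being mapped and submitted.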
  11. The USGIN Physical Samples content model includes the information necessary for SESAR registration (e.g., Figures 7-8). SESAR operates the registry that distributes the International GeoSample Number (IGSN). The IGSN is a 9-character alphanumeric code that is assigned to specimens and related sampling features such as drill holes or wells to ensure their unique identification and unambiguous referencing of data generated by the study of samples. SESAR catalogs and preserves sample metadata profiles and then provides access to the sample catalog via search. SESAR allows for batch registration of samples, a clear need for the GSA when we are ultimately looking at hundreds of thousands of potential registrations. Through SESAR’s web interface, a batch file template can be generated (Figure 9). Once the necessary information is entered into the template (Figure 10), a simple process of copying from the USGIN content model, the samples can be registered. SESAR will then respond with the IGSNs of the samples once they are registered. By making the legacy data rescue and preservation process as simple as possible through the development of template workflows, such as that presented here, personnel are more likely to adopt and adhere to standards. Template workflows also simplify training of additional personnel to assist in the registration process. Ultimately this increases data and metadata exposure and interoperability.
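The "populate template" step in note 11 amounts to copying content-model fields into the columns of the batch file. The sketch below shows that idea with a CSV file standing in for the SESAR batch template. The column names in `COLUMN_MAP`, the function names, and the sample record are all hypothetical, since the real template is generated through SESAR's web interface. The format check only encodes the description of an IGSN as a nine-character alphanumeric code.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical mapping: content-model field -> template column.
COLUMN_MAP = {
    "SpecimenLabel": "Sample Name",
    "SpecimenType": "Sample Type",
    "ParentSpecimenURI": "Parent",
}


def populate_template(records: List[Dict[str, str]]) -> str:
    """Write the mapped fields of each record as one row of the batch file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(COLUMN_MAP.values()),
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({column: record.get(source_field, "")
                         for source_field, column in COLUMN_MAP.items()})
    return buffer.getvalue()


_IGSN = re.compile(r"[A-Za-z0-9]{9}")


def looks_like_igsn(value: str) -> bool:
    """Check only the shape described in the notes: nine alphanumeric characters."""
    return _IGSN.fullmatch(value) is not None


batch = populate_template([
    {"SpecimenLabel": "TS-001", "SpecimenType": "Thin Section",
     "ParentSpecimenURI": "example:core/1"},
])
print(batch)
print(looks_like_igsn("ABC123456"), looks_like_igsn("ABC-12345"))
```

Keeping the column mapping in one dictionary is what makes the workflow easy to hand to new personnel: a change to the template means editing one table rather than the whole script.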
  13. Summary:
     • Use Version Control (e.g., Git) or similar
     • Build on existing standards: USGIN, OneGeology, etc.
     • Develop a workflow: an iterative process to refine
     • Involve the current data holder as well as those not as familiar with the information: the current archivist checks for accuracy, others check for usability
     • Register your samples and data: people can then FIND the information and use it, so it becomes VALUABLE