Hills "If Not Now, When?" Presentation, GSA 2014

D. Hills' presentation at the 2014 Annual Geological Society of America Meeting on geoscience data preservation efforts at the Geological Survey of Alabama. The presentation includes a description of the workflow developed to capture the necessary metadata to register samples and make the data and samples discoverable by a wider audience.

If not now, when?
Denise J. Hills, Sandy Ebersole, and W. Edward Osborne
Geological Survey of Alabama
GSA 2014

Data Preservation is Time Sensitive
• Samples or media can deteriorate over time
• Individual researchers have their own methodology of record-keeping
• If the original researcher (or current maintainer) is no longer available, what are your options?

Data at the GSA/OGB
• Legally charged to be a repository for data relating to energy and mineral resources.

Data Preservation Efforts at GSA/OGB
• NGGDPP (National Geological and Geophysical Data Preservation Program)
• NGDS (National Geothermal Data System)

Collection Organization
PARENT: Well (NGDS)
    CHILD: Core (SESAR)
    CHILD: Well Logs (NGDS)
    CHILD: Other Data (NGDS/SESAR, ?)
PARENT: Core (SESAR)
    CHILD: Section or Part (SESAR)
    CHILD: Thin Section (SESAR)
    CHILD: Other Samples (SESAR, ?)
PARENT: Thin Section (SESAR)
    CHILD: Photomicrograph (SESAR)
    CHILD: SEM Analysis (SESAR)
    CHILD: Related Analysis (SESAR, ?)

[Flowchart, part 1 of the metadata rescue workflow. Steps: Start; Determine Canonical File(s); Locate Metadata; Enter null value; Revise Metadata; Original Provider Review. Decisions (yes/no): Missing Metadata?; In other file(s)?; From originator?; Other source? Connectors: A, B.]

[Figure: metadata tables. Panel labels: Well Header Metadata (submitted to NGDS); Core Metadata (submitted to NGDS); Core Metadata (USGIN Physical Sample Content).]

[Flowchart, part 2 of the metadata rescue workflow. Repeats part 1 (Start; Determine Canonical File(s); Locate Metadata; Missing Metadata?; Original Provider Review; Revise Metadata) and adds: Approved?; Map to USGIN; Validated?; Revise Metadata. Connectors: A, B. Figure label: Thin Section Metadata, USGIN Physical Sample Content Model.]

[Flowchart, part 3 of the metadata rescue workflow. Repeats parts 1 and 2 and adds: Generate SESAR Template; Populate Template; Validated?; Revise Entries; Submit to SESAR; Receive IGSN; Complete.]

[Figure: metadata tables. Panel labels: Core Metadata (USGIN Physical Sample Content); Thin Section Metadata (USGIN Physical Sample Content Model), shown twice.]

Summary
• Use Version Control (e.g., Git) or similar
• Build on existing standards
• Develop a workflow
• Involve current data holder as well as those not as familiar with the information
• Register your samples and data
DO IT NOW!

Slides 6-7: Data Management at GSA
Slide 8: Version Control

Editor's Notes

Two examples: the DKM thin section database, and recovery after Lewis’ death.
Including:
• Geophysical well logs
• Cores, cuttings, and other physical samples, sometimes with descriptions
• Fluid production and injection information from oil and gas wells
• Geologic maps

As with many agencies, GSA/OGB has challenges:
• Data discoverability often difficult
• Much of the available information was analog
• Even digital data was not always “machine-readable”
• Lack of standardization and documentation of data and metadata
• Provenance and quality often poor or unknown
NGGDPP (since 2007):
• >36,000 metadata records uploaded
• Capturing info for >100,000 individual fossil specimens
• Also includes geologic maps, oil and gas well cores, geologic cuttings, thin sections, and other physical samples

NGDS (project ran from 2010 to 2013):
• Allowed GSA to generate large quantities of digitally preserved data in a standardized format
• Included:
    Geo map with metadata in OneGeology schema
    Well headers: >9,000
    Well log metadata: >10,000
    BHT metadata: >11,000
    Faults: 297
    Lithologic interval metadata: 4,719

Through both of these projects, GSA personnel became familiar with schemas, schema-mapping, and processes to streamline metadata rescue.
Much of the physical sample collection structure at the GSA and OGB can be broken down into parent-child relationships. This aids in creating linked records. Well header information was recorded and quality-checked through NGDS, and some through NGGDPP (including core information). Moving forward, we can build on these links to reduce duplicate effort.

As part of its mandate to regulate the petroleum industry in Alabama, the OGB requires that companies drilling oil and gas wells in the state provide the GSA with a share of any well samples collected, as well as copies of any geophysical well logs or other testing undertaken. The OGB stores and maintains the geophysical well logs and other documentation, while the GSA stores and maintains the cores and cuttings in its core and sample warehouse. The parent object in the collection is typically the permitted oil and gas well, with all other information tied back to the well.

A large amount of other data at the GSA not directly related to oil and gas development, yet still associated with a permitted well, has been held or maintained by individual researchers. Access is then limited to those who already know which researcher to ask about the availability of these data. Other challenges arise from researchers’ individual record-keeping methodologies, e.g., use of a unique notation system or failure to record information that others may need to understand the work. When the individual is no longer available, the data and objects may therefore become useless, and it is often difficult or impossible to regain the information necessary to make the data useful to another researcher.
The GSA recently confronted this issue with the sudden death of our long-time core warehouse manager. As the warehouse filled, cores had to be shifted and relocated. The new locations were not always recorded, as the locations could be temporary until a more logical permanent location was determined. The manager had kept a “mental map” of some core locations. While he had always been able to locate these items immediately, others did not have his knowledge and thus could not replicate his work. Following his death, GSA personnel spent significant time determining the physical locations of items to make up for that lack of knowledge.

The GSA is actively working to prevent this sort of incident with other collections. For example, information was maintained in several different spreadsheets. These documents contained distinct AND overlapping information. Determining the canonical files and records is a time-consuming process.
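The comparison of overlapping spreadsheets can be partly automated. The following is a minimal sketch, not the GSA's actual procedure: it assumes each spreadsheet has been read into a list of row dictionaries and that rows share a record key (the field name `SampleID` and the file names are hypothetical). It separates values on which all files agree from values that conflict and need human review.

```python
def compare_versions(versions, key="SampleID"):
    """Compare rows from several spreadsheet versions.

    versions: dict mapping a file name to a list of row dicts.
    Returns (agreed, conflicts):
      agreed    maps (record key, field) -> the single value found
      conflicts maps (record key, field) -> {value: [files reporting it]}
    Empty cells are ignored, so a value present only in an older file survives.
    """
    seen = {}
    for file_name, rows in versions.items():
        for row in rows:
            for field, value in row.items():
                if field == key or value in ("", None):
                    continue
                by_value = seen.setdefault((row[key], field), {})
                by_value.setdefault(value, []).append(file_name)
    agreed = {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}
    conflicts = {k: v for k, v in seen.items() if len(v) > 1}
    return agreed, conflicts


# Hypothetical example data: the newer file lacks the lithology entry
# and disagrees with the older file on depth.
versions = {
    "old.csv": [{"SampleID": "TS-1", "Depth": "10234", "Lithology": "sandstone"}],
    "new.csv": [{"SampleID": "TS-1", "Depth": "10236", "Lithology": ""}],
}
agreed, conflicts = compare_versions(versions)
```

Conflicting entries still have to be resolved by someone who knows the collection; the sketch only narrows down where to look.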
DKM thin section database: we had been keeping track of thin sections (TS) with a basic spreadsheet, with no way to cross-reference data. Each person had their own way of doing things and of recording information. The old “database” was a single spreadsheet, with inconsistencies in measurement reporting (e.g., depths sometimes reported in full, sometimes as a range, sometimes as only the final digits of a range). Some TS had no depth, or it was unknown what the notation meant. It only got worse in the records for the photos of these TS.

Top: TS record. Note that depths are recorded inconsistently.
Bottom: photomicrograph records. Note that depth is incompletely recorded, abbreviations are extensive (and not always consistent), and entries depend on previous information (e.g., when changing magnification).
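The three depth notations described above can be normalized with a small parser. This is an illustrative sketch only: the exact notations in the GSA spreadsheets are not shown in these notes, so the input formats (`10234`, `10234-10236`, `10234-36`) are assumptions, and anything the function cannot interpret is returned as `None` for manual review rather than guessed.

```python
import re

_DEPTH = re.compile(r"(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?")


def parse_depth(raw):
    """Return (top, bottom) as floats, or None if the entry needs review."""
    text = raw.strip().replace(",", "")
    match = _DEPTH.fullmatch(text)
    if match is None:
        return None
    top_s, bottom_s = match.group(1), match.group(2)
    top = float(top_s)
    if bottom_s is None:
        return (top, top)  # single depth reported in full
    bottom = float(bottom_s)
    shorthand = (
        bottom < top
        and "." not in top_s
        and "." not in bottom_s
        and len(bottom_s) < len(top_s)
    )
    if shorthand:
        # "10234-36": only the final digits of the lower depth were written,
        # so borrow the leading digits from the upper depth.
        bottom = float(top_s[: len(top_s) - len(bottom_s)] + bottom_s)
    if bottom < top:
        return None  # still inconsistent, leave for a person to check
    return (top, bottom)
```

For example, `parse_depth("10234-36")` yields `(10234.0, 10236.0)`, while an entry such as `"same as previous"` yields `None`.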
To address the lack of clarity about which file should be preeminent, we suggest the use of version control. Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. An example of a version control system is Git, developed by Linus Torvalds for Linux kernel development. Git is a distributed version control system (left figure) that organizes data like a set of snapshots of a mini file system: every time a project is committed (i.e., the state of the project is saved) in Git, the state of all the files is saved. If a file is unchanged, a new copy is not generated, just a link to the previous identical file (right figure).

Left: simple diagram of a distributed version control system such as Git (modified from Figure 1-3 of the Pro Git manual).
Right: demonstration of what changes with each version in a distributed version control system such as Git (modified from Figure 1-5 of the Pro Git manual).
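The snapshot idea can be illustrated with a toy model. The sketch below is not how Git is implemented (Git's object format, hashing of headers, trees, and packing are all more involved); it only shows the principle that each commit records the state of every file, while unchanged content is stored once and referenced again. The class and file names are invented for illustration.

```python
import hashlib


class SnapshotStore:
    """Toy snapshot store: every commit lists all files, content is stored once."""

    def __init__(self):
        self.blobs = {}    # content hash -> file content (bytes)
        self.commits = []  # one dict per commit: file name -> content hash

    def commit(self, files):
        """Record the state of all files; return the new version number."""
        snapshot = {}
        for name, content in files.items():
            digest = hashlib.sha1(content).hexdigest()
            # Identical content hashes to the same key, so an unchanged
            # file adds only a reference, not a second copy.
            self.blobs.setdefault(digest, content)
            snapshot[name] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, version):
        """Rebuild the full set of files as of the given version."""
        return {name: self.blobs[d] for name, d in self.commits[version].items()}


store = SnapshotStore()
v0 = store.commit({"thin_sections.csv": b"id,depth\n1,10234\n", "notes.txt": b"draft"})
v1 = store.commit({"thin_sections.csv": b"id,depth\n1,10236\n", "notes.txt": b"draft"})
```

After the two commits, the store holds three pieces of content rather than four, because `notes.txt` did not change, and either version can still be recalled in full.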
The first step in our process was, at first glance, an easy one: gather the relevant records about the thin sections and photomicrographs. However, when the researcher was approached, he realized he had multiple versions of the spreadsheets he had been using. Even though the files had time and date stamps, information in the newer files was not always the most accurate. Some information captured in an older file was not present in the newer files, although it was not obsolete. It became obvious that multiple files had been developed and modified concurrently. This resulted in many hours spent sifting through the files line by line to compare the information recorded, and then to verify as best as possible the most accurate information.

We next examined the information contained within those spreadsheets to pull out the relevant metadata. This led to the discovery of several idiosyncrasies in the recordkeeping, such as inconsistent or unclear use of abbreviations, as well as notations such as “same as previous”, which could prove misleading if records were re-sorted, otherwise moved, or even deleted. Additionally, multiple categories of information (e.g., lithology, sedimentary structures) were recorded within a single cell. Recording of methodology was also inconsistent to non-existent, although, as the original researcher was available, much of this information could be reconstructed from his other notes.

Semi-automated workflows were developed to aid in this “translation” process. Abbreviations were replaced with full text or standardized. Information grouped together was split into individual categories. Metadata that were gathered from the NGDS project (e.g., related to well header) could be matched with child items (e.g., Figures 6-7).

Well header metadata gathered for the NGDS project (upper) could be matched to child items (lower) based on the OGB permit number (GSAPER:624, in this instance, highlighted). The parent HeaderURI becomes the ParentSpecimenURI (blue boxes), the APINo maps to SamplingFeatureURI (green boxes), and the parent locations map to the child locations (red boxes). URI: unique record identifier.
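The parent-to-child mapping just described might be sketched as follows. The field names `HeaderURI`, `ParentSpecimenURI`, `APINo`, and `SamplingFeatureURI` come from the notes above; the permit-number key `PermitNo`, the location field names `Latitude` and `Longitude`, and all sample values except the permit number are assumptions made for illustration.

```python
def link_children_to_wells(well_headers, children, key="PermitNo",
                           location_fields=("Latitude", "Longitude")):
    """Copy parent well identifiers and locations onto matching child records.

    Returns (linked, orphans); orphans are children with no matching well.
    """
    wells = {well[key]: well for well in well_headers}
    linked, orphans = [], []
    for child in children:
        parent = wells.get(child.get(key))
        if parent is None:
            orphans.append(child)
            continue
        record = dict(child)  # leave the input row untouched
        record["ParentSpecimenURI"] = parent["HeaderURI"]
        record["SamplingFeatureURI"] = parent["APINo"]
        for field in location_fields:
            record[field] = parent[field]
        linked.append(record)
    return linked, orphans


# Placeholder values; only the permit number appears in the original notes.
well_headers = [{
    "PermitNo": "GSAPER:624",
    "HeaderURI": "http://example.org/well/624",
    "APINo": "API-PLACEHOLDER",
    "Latitude": 33.0,
    "Longitude": -87.0,
}]
children = [
    {"PermitNo": "GSAPER:624", "Label": "Core A"},
    {"PermitNo": "GSAPER:999", "Label": "Core B"},
]
linked, orphans = link_children_to_wells(well_headers, children)
```

Returning unmatched children separately keeps records with a missing or mistyped permit number visible instead of silently dropping them.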
The first step in our process was at first look an easy one – gather the relevant records about the thin sections and photomicrographs. However, when the researcher was approached, he realized he had multiple versions of the spreadsheets he had been using. Even though the files had time and date stamps, information in the newer files was not always the most accurate. Some information captured in an older file was not present in the newer files, although it was not obsolete. It became obvious that multiple files had been developed and modified concurrently. This resulted in many hours spent sifting through the files line by line to compare the information recorded, and then to verify as best as possible the most accurate information. We next examined the information contained within those spreadsheets to pull out the relevant metadata. This led to the discovery of several idiosyncrasies in the recordkeeping, such as inconsistent or unclear use of abbreviations as well as notations such as “same as previous” which could prove disingenuous if records were re-sorted, otherwise moved, or even deleted. Additionally, multiple categories of information (e.g., lithology, sedimentary structures) were recorded within a single cell. Recording of methodology was also inconsistent to non-existent, although as the original researcher was available, much of this information could be reconstructed from his other notes. Semi-automated workflows were developed to aid in this “translation” process. Abbreviations were replaced with full text or standardized. Information grouped together was split into individual categories. Metadata that were gathered from the NGDS project (e.g., related to well header) could be matched with child items (e.g., Figures 6-7). Well header metadata gathered for the NGDS project (upper) could be matched to child items (lower) based on the OGB permit number (GSAPER:624, in this instance, highlighted). 
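The parent-to-child mapping in the figure caption (HeaderURI to ParentSpecimenURI, APINo to SamplingFeatureURI, and parent locations to child locations, matched on the OGB permit number) might look something like the sketch below. The field names HeaderURI, ParentSpecimenURI, APINo, and SamplingFeatureURI come from the caption; the key name PermitNo, the location field names, and every value except the permit number GSAPER:624 are placeholders assumed for illustration.

```python
# Hypothetical names for the location fields shared by parent and child records.
LOCATION_FIELDS = ("LatDegree", "LongDegree")


def index_by_permit(parents):
    """Build a lookup of parent (well header) records keyed by permit number."""
    return {parent["PermitNo"]: parent for parent in parents}


def inherit_from_parent(child, parents_by_permit):
    """Return a copy of the child record with fields inherited from its parent."""
    enriched = dict(child)
    parent = parents_by_permit.get(child["PermitNo"])
    if parent is None:
        # No matching well header: leave the link empty so it can be reviewed.
        enriched["ParentSpecimenURI"] = None
        return enriched
    enriched["ParentSpecimenURI"] = parent["HeaderURI"]
    enriched["SamplingFeatureURI"] = parent["APINo"]
    for field in LOCATION_FIELDS:
        enriched[field] = parent[field]
    return enriched


# Placeholder records; only the permit number GSAPER:624 appears in the slides.
parents = [
    {
        "PermitNo": "GSAPER:624",
        "HeaderURI": "http://example.org/well/624",
        "APINo": "example-api-no",
        "LatDegree": 0.0,
        "LongDegree": 0.0,
    }
]
children = [
    {"SpecimenLabel": "TS-1", "PermitNo": "GSAPER:624"},
    {"SpecimenLabel": "TS-2", "PermitNo": "GSAPER:999"},  # deliberately unmatched
]

lookup = index_by_permit(parents)
enriched = [inherit_from_parent(child, lookup) for child in children]
```

Unmatched children are returned with an empty parent link rather than dropped, so they can go back through the "locate metadata" step.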
Once the original researcher reviewed the updated metadata to help reduce mistranslation and other errors, we mapped the available metadata to an existing USGIN content model for Physical Samples (v.0.8). This content model is based on consideration of content requested for other schemas and services, such as the System for Earth Sample Registration (SESAR), Geoscience Markup Language (GeoSciML), and others. Although this content model is still under review, USGIN provides a content model validation tool to verify appropriate data formatting and content. Any corrections highlighted by the validator were made prior to the final step, registration of the samples.
USGIN Content Model fields populated through the GSA workflow
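Before a file goes to the USGIN validator, a local pre-check can map spreadsheet columns to content model fields and list what is still missing. The sketch below is not the USGIN validation tool and does not reproduce its rules. ParentSpecimenURI and SamplingFeatureURI are field names from the slides; SpecimenLabel, the local column names, and the null placeholder "missing" are assumptions.

```python
NULL_VALUE = "missing"  # hypothetical placeholder for an explicit null

# Hypothetical subset of content model fields to populate.
MODEL_FIELDS = ["SpecimenLabel", "ParentSpecimenURI", "SamplingFeatureURI"]

# Hypothetical mapping from local spreadsheet columns to content model fields.
COLUMN_MAP = {
    "thin_section_id": "SpecimenLabel",
    "parent_uri": "ParentSpecimenURI",
    "api_number": "SamplingFeatureURI",
}


def map_to_model(record, column_map=COLUMN_MAP, model_fields=MODEL_FIELDS,
                 null_value=NULL_VALUE):
    """Map one local record onto the model fields.

    Returns the mapped record and the list of fields left at the null value,
    so missing metadata is entered explicitly instead of silently left blank.
    """
    mapped = {field: null_value for field in model_fields}
    for local_name, model_name in column_map.items():
        value = str(record.get(local_name, "")).strip()
        if value:
            mapped[model_name] = value
    missing = [field for field in model_fields if mapped[field] == null_value]
    return mapped, missing


# Made-up record with one blank column.
mapped, missing = map_to_model(
    {"thin_section_id": "TS-1", "parent_uri": "http://example.org/well/624",
     "api_number": "  "}
)
```

The list of missing fields can be handed back to the original provider for review before the record is submitted for validation.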
The USGIN Physical Samples content model includes the information necessary for SESAR registration (e.g., Figures 7-8). SESAR operates the registry that distributes the International GeoSample Number (IGSN). The IGSN is a 9-character alphanumeric code that is assigned to specimens and related sampling features, such as drill holes or wells, to ensure their unique identification and unambiguous referencing of data generated by the study of samples. SESAR catalogs and preserves sample metadata profiles and then provides access to the sample catalog via search. SESAR allows for batch registration of samples, a clear need for the GSA when we are ultimately looking at hundreds of thousands of potential registrations. Through SESAR’s web interface, a batch file template can be generated (Figure 9). Once the necessary information is entered into the template (Figure 10), a simple process of copying from the USGIN content model, the samples can be registered. SESAR will then respond with the IGSNs of the samples once they are registered.
By making the legacy data rescue and preservation process as simple as possible through the development of template workflows, such as that presented here, personnel are more likely to adopt and adhere to standards. Template workflows also simplify training of additional personnel to assist in the registration process. Ultimately this increases data and metadata exposure and interoperability.
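Copying from the content model into a batch template is mostly a column-to-column transfer, which is why it suits a template workflow. The sketch below writes CSV text for illustration only: the template column names, the mapping, and the CSV format itself are assumptions, and the actual template generated by SESAR’s web interface should be used as the source of column names and file format.

```python
import csv
import io

# Hypothetical template columns and the content model field that feeds each one.
TEMPLATE_COLUMNS = ["sample_name", "parent_uri"]
COLUMN_SOURCE = {
    "sample_name": "SpecimenLabel",
    "parent_uri": "ParentSpecimenURI",
}


def write_batch(records, template_columns=TEMPLATE_COLUMNS,
                column_source=COLUMN_SOURCE):
    """Copy content model records into template column order; return CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=template_columns,
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(
            {column: record.get(column_source[column], "")
             for column in template_columns}
        )
    return buffer.getvalue()


# Made-up content model records.
batch_text = write_batch(
    [
        {"SpecimenLabel": "TS-1", "ParentSpecimenURI": "http://example.org/well/624"},
        {"SpecimenLabel": "TS-2"},
    ]
)
```

Keeping the column mapping in one table means that a change to the template only requires updating the mapping, not the copying logic.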
Use version control (e.g., Git) or similar
Build on existing standards: USGIN, OneGeology, etc.
Develop a workflow: an iterative process to refine
Involve the current data holder as well as those less familiar with the information: the current archivist checks for accuracy, others check for usability
Register your samples and data, so that people can FIND the information and use it; it then becomes VALUABLE

Editor's Notes