Annotating Research Datasets


This document discusses annotating research datasets. It defines annotation as adding notes or explanations for clarification. Genome annotation attaches biological information to sequences. Research data annotation makes opaque data visible, sensible and valuable. It notes that many researchers have limited funding for data services and are not taught proper data management, so their datasets are difficult to find and curate. The document proposes that research data is well-suited for crowd-sourced annotation to help address these issues.


Annotating Research Datasets

11 April 2013

University of California Curation Center
California Digital Library

Term skew

Annotation: The act of adding a note by way of comment or explanation.

Genome annotation: The process of attaching biological information to sequences. E.g.,

•  Protein Data Bank annotation manual: 247 pgs

Research data annotation: (?!) Adding to opaque data to make it visible, sensible, and valuable.

The Long Tail

[Figure, built up over five slides: long-tail curve. Axis labels: Size of dataset, grant ($); # datasets, # researchers, # grants. Annotations: "With data managers and fancy tools"; "Do-it-yourself tools".]

UGLY TRUTH

Many researchers…

•  have limited funding for data services
•  are not taught data management
•  don’t know what metadata or data centers are
•  don’t share data publicly or store it in an archive
•  aren’t convinced they should share data

(Image from Flickr, by puck90)

The research data problem

•  Journal article  |  Research data
–  Uniquely and persistently identified  |  Nope
–  Concept of “publish”  |  Not really
–  Multiple copies  |  Typically one
–  Easily findable  |  Difficult
–  Impact metrics, etc.  |  Nope
–  Curation funding  |  Barely

Research data is ripe for crowd-sourced annotation

Editor's Notes

10 minutes, Day 2, 9am April 11
Abstract: A huge amount of incredibly diverse research data remains beyond the reach of internet search engines, peer review processes, and systematic cataloging. The ability of consumers to annotate data is an important mitigation, harnessing "the crowd" to make it easier for everyone to discover and re-use data.
One way of looking at Big Data is this graph showing dataset size on the vertical axis against numbers of datasets on the horizontal axis. While there are some very large, celebrated datasets produced by satellites, ocean sensors, etc., there’s a very long tail off to the right of smaller, more obscure datasets that cumulatively account for a large portion of Big Data.
There are many more researchers out in the field collecting heterogeneous data, such as species counts obtained by visual sightings.
And there are many more grants supporting this kind of research...
And those grants are usually much smaller in terms of dollar amounts.
As a result, the large, celebrated datasets tend to come with staff positions for data management, along with well-supported, standardized software tools that support rich description and discovery and enforce certain curation standards. So for a huge number of grants and datasets, especially in Earth, environmental, and ecological sciences, ...
... there’s an ugly truth. Many of these researchers, ... This amounts to a whole lot of inertia that keeps a large part of the scientific record invisible, at-risk, and unavailable for re-use.
