SlideShare uma empresa Scribd logo
1 de 9
Baixar para ler offline
Annota&ng	
  Research	
  
      Datasets	
  
                                1 1 	
   A p r i l 	
   2 0 1 3 	
  

U n i v e r s i t y 	
   o f 	
   C a l i f o r n i a 	
   C u r a & o n 	
   C e n t e r 	
  
              C a l i f o r n i a 	
   D i g i t a l 	
   L i b r a r y 	
  
Term	
  skew	
  
Annota&on:	
  The	
  act	
  of	
  adding	
  a	
  note	
  by	
  way	
  of	
  
   comment	
  or	
  explana&on.	
  
Genome	
  annota&on:	
  The	
  process	
  of	
  aFaching	
  
   biological	
  informa&on	
  to	
  sequences.	
  	
  E.g.,	
  
•  Protein	
  Data	
  Bank	
  annota&on	
  manual:	
  247	
  pgs	
  
Research	
  data	
  annota&on:	
  (?!)	
  Adding	
  to	
  opaque	
  
   data	
  to	
  make	
  it	
  visible,	
  sensible,	
  and	
  valuable.	
  
The	
  Long	
  Tail	
  


Size	
  of	
  
dataset	
  




                       #	
  datasets	
  
The	
  Long	
  Tail	
  


Size	
  of	
  
dataset	
  




                         #	
  datasets	
  
                     #	
  researchers	
  
The	
  Long	
  Tail	
  


Size	
  of	
  
dataset	
  




                         #	
  datasets	
  
                     #	
  researchers	
  
                         #	
  grants	
  
The	
  Long	
  Tail	
  


 Size	
  of	
  
dataset	
  
grant	
  ($)	
  




                           #	
  datasets	
  
                       #	
  researchers	
  
                           #	
  grants	
  
The	
  Long	
  Tail	
  

                     With	
  data	
  managers	
  
                       and	
  fancy	
  tools	
  
 Size	
  of	
  
dataset	
  
grant	
  ($)	
  




                            #	
  datasets	
  
                        #	
  researchers	
   Do-­‐it-­‐yourself	
  tools	
  
                           #	
  grants	
  
UGLY	
                                                            TRUTH	
  

Many	
  researchers…	
  
have	
  limited	
  funding	
  for	
  data	
  services	
  
are	
  not	
  taught	
  data	
  management	
  
don’t	
  know	
  what	
  metadata	
  or	
  data	
  centers	
  are	
  
don’t	
  share	
  data	
  publicly	
  or	
  store	
  it	
  in	
  an	
  archive	
  
aren’t	
  convinced	
  they	
  should	
  share	
  data	
  
                           From	
  Flickr	
  By	
  	
  puck90	
  
The research data problem	
  

•  Journal article               •  Research data
  –  Uniquely and persistently     –  Nope
     identified
  –  Concept of “publish”          –  Not really

  –  Multiple copies               –  Typically one

  –  Easily findable               –  Difficult

  –  Impact metrics, etc.          –  Nope

  –  Curation funding              –  Barely

 Research data is ripe for crowd-sourced annotation

Mais conteúdo relacionado

Mais procurados

The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse CommonsMerce Crosas
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationMichael Bar-Sinai
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSFC. Tobin Magle
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the libraryC. Tobin Magle
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...Todd Vision
 
Data Citation Implementation at Dataverse
Data Citation Implementation at DataverseData Citation Implementation at Dataverse
Data Citation Implementation at DataverseMerce Crosas
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemMichael Bar-Sinai
 
Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Lizzy_Rolando
 
110- Freyman Knowledge flows Linking big dataset
110- Freyman Knowledge flows Linking big dataset110- Freyman Knowledge flows Linking big dataset
110- Freyman Knowledge flows Linking big datasetinnovationoecd
 
Adding valuethroughdatacuration
Adding valuethroughdatacurationAdding valuethroughdatacuration
Adding valuethroughdatacurationAPLICwebmaster
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsBeth Plale
 
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014TERN Australia
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Natsuko Nicholls
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumMerce Crosas
 
DataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management PlanningDataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management PlanningDataONE
 

Mais procurados (18)

METRO RDM Webinar
METRO RDM WebinarMETRO RDM Webinar
METRO RDM Webinar
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the library
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Data Citation Implementation at Dataverse
Data Citation Implementation at DataverseData Citation Implementation at Dataverse
Data Citation Implementation at Dataverse
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags system
 
ITWS Capstone (RPI, Fall 2013)
ITWS Capstone (RPI, Fall 2013)ITWS Capstone (RPI, Fall 2013)
ITWS Capstone (RPI, Fall 2013)
 
Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Data Management Planning - 02/21/13
Data Management Planning - 02/21/13
 
110- Freyman Knowledge flows Linking big dataset
110- Freyman Knowledge flows Linking big dataset110- Freyman Knowledge flows Linking big dataset
110- Freyman Knowledge flows Linking big dataset
 
Adding valuethroughdatacuration
Adding valuethroughdatacurationAdding valuethroughdatacuration
Adding valuethroughdatacuration
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
 
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
DataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management PlanningDataONE Education Module 03: Data Management Planning
DataONE Education Module 03: Data Management Planning
 

Destaque

YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyJohn Kunze
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsJohn Kunze
 
Media literacy for the information professional
Media literacy for the information professionalMedia literacy for the information professional
Media literacy for the information professionalBarbara Devilee
 
Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...Sheila Webber
 
Media and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspectiveMedia and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspectiveThai Netizen Network
 
Information literacy in a media-saturated world
Information literacy in a media-saturated worldInformation literacy in a media-saturated world
Information literacy in a media-saturated worldPam Wilson
 
Media and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversityMedia and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversitySheila Webber
 

Destaque (7)

YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabulary
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History Collections
 
Media literacy for the information professional
Media literacy for the information professionalMedia literacy for the information professional
Media literacy for the information professional
 
Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...
 
Media and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspectiveMedia and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspective
 
Information literacy in a media-saturated world
Information literacy in a media-saturated worldInformation literacy in a media-saturated world
Information literacy in a media-saturated world
 
Media and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversityMedia and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversity
 

Semelhante a Annotating Research Datasets

Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseGigaScience, BGI Hong Kong
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 
Dataset citation and identification
Dataset citation and identificationDataset citation and identification
Dataset citation and identificationAdam Farquhar
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...GarethKnight
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceSusanna-Assunta Sansone
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014Susanna-Assunta Sansone
 
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 FinalLibby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Finala.carusi
 
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...Susanna-Assunta Sansone
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypseENUG
 
DataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefDataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefCrossref
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemASIS&T
 
Tools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenHeinz Pampel
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Merce Crosas
 
D paul ecn2013
D paul ecn2013D paul ecn2013
D paul ecn2013ECNOfficer
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfreypvhead123
 
How and Why to Share Your Data
How and Why to Share Your DataHow and Why to Share Your Data
How and Why to Share Your Datakfear
 

Semelhante a Annotating Research Datasets (20)

Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 
Dataset citation and identification
Dataset citation and identificationDataset citation and identification
Dataset citation and identification
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 FinalLibby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
 
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ...
 
Where's the Data?
Where's the Data?Where's the Data?
Where's the Data?
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypse
 
DataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefDataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRef
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management Ecosystem
 
Tools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von Forschungsdaten
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
 
D paul ecn2013
D paul ecn2013D paul ecn2013
D paul ecn2013
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfrey
 
How and Why to Share Your Data
How and Why to Share Your DataHow and Why to Share Your Data
How and Why to Share Your Data
 

Mais de John Kunze

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ MetadictionaryJohn Kunze
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderJohn Kunze
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...John Kunze
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDLJohn Kunze
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy buildingJohn Kunze
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for PersistenceJohn Kunze
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesJohn Kunze
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsJohn Kunze
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardJohn Kunze
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014John Kunze
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupJohn Kunze
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchJohn Kunze
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long TailJohn Kunze
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayJohn Kunze
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsJohn Kunze
 
The ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldThe ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldJohn Kunze
 
New Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsNew Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsJohn Kunze
 
Pairtrees for object storage
Pairtrees for object storagePairtrees for object storage
Pairtrees for object storageJohn Kunze
 

Mais de John Kunze (20)

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ Metadictionary
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary Builder
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDL
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy building
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for Persistence
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not Schemes
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forward
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout group
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich Research
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long Tail
 
Pamwg 2012ahm
Pamwg 2012ahmPamwg 2012ahm
Pamwg 2012ahm
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do Today
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many Fronts
 
The ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldThe ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years Old
 
New Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsNew Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data Citations
 
Pairtrees for object storage
Pairtrees for object storagePairtrees for object storage
Pairtrees for object storage
 

Último

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Annotating Research Datasets

  • 1. Annota&ng  Research   Datasets   1 1   A p r i l   2 0 1 3   U n i v e r s i t y   o f   C a l i f o r n i a   C u r a & o n   C e n t e r   C a l i f o r n i a   D i g i t a l   L i b r a r y  
  • 2. Term  skew   Annota&on:  The  act  of  adding  a  note  by  way  of   comment  or  explana&on.   Genome  annota&on:  The  process  of  aFaching   biological  informa&on  to  sequences.    E.g.,   •  Protein  Data  Bank  annota&on  manual:  247  pgs   Research  data  annota&on:  (?!)  Adding  to  opaque   data  to  make  it  visible,  sensible,  and  valuable.  
  • 3. The  Long  Tail   Size  of   dataset   #  datasets  
  • 4. The  Long  Tail   Size  of   dataset   #  datasets   #  researchers  
  • 5. The  Long  Tail   Size  of   dataset   #  datasets   #  researchers   #  grants  
  • 6. The  Long  Tail   Size  of   dataset   grant  ($)   #  datasets   #  researchers   #  grants  
  • 7. The  Long  Tail   With  data  managers   and  fancy  tools   Size  of   dataset   grant  ($)   #  datasets   #  researchers   Do-­‐it-­‐yourself  tools   #  grants  
  • 8. UGLY   TRUTH   Many  researchers…   have  limited  funding  for  data  services   are  not  taught  data  management   don’t  know  what  metadata  or  data  centers  are   don’t  share  data  publicly  or  store  it  in  an  archive   aren’t  convinced  they  should  share  data   From  Flickr  By    puck90  
  • 9. The research data problem   •  Journal article •  Research data –  Uniquely and persistently –  Nope identified –  Concept of “publish” –  Not really –  Multiple copies –  Typically one –  Easily findable –  Difficult –  Impact metrics, etc. –  Nope –  Curation funding –  Barely Research data is ripe for crowd-sourced annotation

Notas do Editor

  1. 10 minutes, Day 2, 9am April 11Abstract: A huge amount of incredibly diverse research data remains beyond the reach of internet search engines, peer review processes, and systematic cataloging.  The ability by consumers to annotate data is an important mitigation, harnessing "the crowd" to make it easier for everyone to discover and re-use data.
  2. One way of looking at Big Data is this graph showing dataset size on the vertical axis against numbers of datasets on the horizontal axis.While there are some very large, celebrated datasets produced by satellites, ocean sensors, etc., there’s a very long tail off to the right of smaller, more obscure datasets that cumulatively account for a large portion of Big Data.
  3. There are many more researchers out in the field collecting heterogeneous data, such as species counts obtained by visual sightings.
  4. And there are many more grants supporting this kind of research...
  5. And those grants are usually much smaller in terms of dollar amounts.
  6. As a result, the large, celebrated datasets tend to come with staff positions for data management, as well as well-supported, standardized software tools supporting rich description and discovery, and enforcing certain curation standards.So for a huge number of grants and datasets, especially in Earth, environmental, and ecological sciences, ...
  7. ... there’s an ugly truth. Many of these researchers,This amounts to a whole lot of inertia that keeps a large part of the scientific record invisible, at-risk, and unavailable for re-use.