SlideShare uma empresa Scribd logo
1 de 9
Baixar para ler offline
A data repository system for sharing and
archiving research data
A Solution for Publishing FAIR research data:
Findable,Accessible, Interoperable, Reusable
@dataverseorg @mercecrosas
But it’s self-correcting
Science isn’t broken
“It’s just a hell a lot
harder than we give it
credit for”
Self-correction in science?
“trust, but verify”
Data (and code) should be
shared for other
researchers to inspect, test
and reuse
Findable Accessible
Interoperable and
Reusable (FAIR)
Research Data Lifecycle
(collect, process, analyze, compute)
Publish and
Archive Data
Publish Research
Results
Data
Citation
Metadata
Access
Controls
http://dataverse.org
Harvard Dataverse:
Generic data repository open
to researchers world wide
http://dataverse.harvard.edu
>1500 dataverses
(15% from Harvard)
> 65,000 datasets
> 1.7 Million downloads
Biomedical Dataverse:
Support for biomedical
large-scale data (pilot phase)
Dataverse Now Dataverse Big
Up to 100 files per dataset
Up to ~ Gb per file
> 1000s files per dataset
> Gb per file
Large file directory
structure or
Streaming data source
Social Science Big Data
Dataverse Projects supporting Big Data
Integration with Data
Management Systems
+
+
at HMS
at UNC
Biomedical Big Data
• 1000s image files per data set
• Non-http batch upload
• Data files in directory structure
Cloud Dataverse
+ Swift Storage
(scalable, access to
computing)
A Billion
streaming
Geotweets

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...The Dryad Digital Repository: Published data as part of the greater data ecos...
The Dryad Digital Repository: Published data as part of the greater data ecos...
 
Coordinated identifier infrastructure enabling Geoscience researchers to meet...
Coordinated identifier infrastructure enabling Geoscience researchers to meet...Coordinated identifier infrastructure enabling Geoscience researchers to meet...
Coordinated identifier infrastructure enabling Geoscience researchers to meet...
 
Public Data Archiving in Ecology and Evolution: How well are we doing?
Public Data Archiving in Ecology and Evolution: How well are we doing?Public Data Archiving in Ecology and Evolution: How well are we doing?
Public Data Archiving in Ecology and Evolution: How well are we doing?
 
Roche_open_science_NIOO_KNAW_workshop_NL
Roche_open_science_NIOO_KNAW_workshop_NLRoche_open_science_NIOO_KNAW_workshop_NL
Roche_open_science_NIOO_KNAW_workshop_NL
 
Backbone taxonomies, data aggregation, and early career systematists: somethi...
Backbone taxonomies, data aggregation, and early career systematists: somethi...Backbone taxonomies, data aggregation, and early career systematists: somethi...
Backbone taxonomies, data aggregation, and early career systematists: somethi...
 
Introduction to Research Data Management at UWA
Introduction to Research Data Management at UWAIntroduction to Research Data Management at UWA
Introduction to Research Data Management at UWA
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)
 
Accessing chemical health and safety data online using Royal Society of Chemi...
Accessing chemical health and safety data online using Royal Society of Chemi...Accessing chemical health and safety data online using Royal Society of Chemi...
Accessing chemical health and safety data online using Royal Society of Chemi...
 
Genome sharing projects around the world - Open Access is not enough
Genome sharing projects around the world - Open Access is not enough Genome sharing projects around the world - Open Access is not enough
Genome sharing projects around the world - Open Access is not enough
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
PerkinElmer Informatics Overview
PerkinElmer Informatics OverviewPerkinElmer Informatics Overview
PerkinElmer Informatics Overview
 
CDL Tools for DataCite 2014
CDL Tools for DataCite 2014CDL Tools for DataCite 2014
CDL Tools for DataCite 2014
 
2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis
 
A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...
A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...
A Few Simple Things Authors Can Do to Make Their Data More Discoverable and R...
 
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa... OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 
Funders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeFunders and Publishers: Agents of Change
Funders and Publishers: Agents of Change
 
AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sc...
AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sc...AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sc...
AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sc...
 
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape?
 
SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...SEEKing our way to better presentation of data and models from scientific inv...
SEEKing our way to better presentation of data and models from scientific inv...
 

Semelhante a It summit dataverse-bigdata-mercecrosas

Semelhante a It summit dataverse-bigdata-mercecrosas (20)

Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with Dataverse
 
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
 
Introduction of Linked Data for Science
Introduction of Linked Data for ScienceIntroduction of Linked Data for Science
Introduction of Linked Data for Science
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDB
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ Library
 
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
 
Force11: Enabling transparency and efficiency in the research landscape
Force11: Enabling transparency and efficiency in the research landscapeForce11: Enabling transparency and efficiency in the research landscape
Force11: Enabling transparency and efficiency in the research landscape
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 

Mais de kevin_donovan

Mais de kevin_donovan (20)

It summit data mgmt-2016.06.02-final
It summit data mgmt-2016.06.02-finalIt summit data mgmt-2016.06.02-final
It summit data mgmt-2016.06.02-final
 
2016 it summit_accessibility_2016-05-24_standard
2016 it summit_accessibility_2016-05-24_standard2016 it summit_accessibility_2016-05-24_standard
2016 it summit_accessibility_2016-05-24_standard
 
Fphs informatics for 2016 it summit 160531
Fphs informatics for 2016 it summit   160531Fphs informatics for 2016 it summit   160531
Fphs informatics for 2016 it summit 160531
 
It summit 2016_combined
It summit 2016_combinedIt summit 2016_combined
It summit 2016_combined
 
Hms crash planitsummit2016
Hms crash planitsummit2016Hms crash planitsummit2016
Hms crash planitsummit2016
 
Lightbox ham it_summit_final
Lightbox ham it_summit_finalLightbox ham it_summit_final
Lightbox ham it_summit_final
 
It summit salesforce
It summit salesforceIt summit salesforce
It summit salesforce
 
Harvard it summit 2016 - opencast in the cloud at harvard dce- live and on-d...
Harvard it summit 2016  - opencast in the cloud at harvard dce- live and on-d...Harvard it summit 2016  - opencast in the cloud at harvard dce- live and on-d...
Harvard it summit 2016 - opencast in the cloud at harvard dce- live and on-d...
 
Fa qs 2016-04-21
Fa qs 2016-04-21Fa qs 2016-04-21
Fa qs 2016-04-21
 
Tlt and friends it summit 2016
Tlt and friends it summit 2016Tlt and friends it summit 2016
Tlt and friends it summit 2016
 
2016 it summit_accessibility_2016-05-24_standard
2016 it summit_accessibility_2016-05-24_standard2016 it summit_accessibility_2016-05-24_standard
2016 it summit_accessibility_2016-05-24_standard
 
Harvard phone it summit demo 06.02.16
Harvard phone it summit demo 06.02.16Harvard phone it summit demo 06.02.16
Harvard phone it summit demo 06.02.16
 
Phish, flop, or fine
Phish, flop, or fine Phish, flop, or fine
Phish, flop, or fine
 
Waldo Summit 2016
Waldo Summit 2016Waldo Summit 2016
Waldo Summit 2016
 
IT Academy at IT Summti
IT Academy at IT SummtiIT Academy at IT Summti
IT Academy at IT Summti
 
Mobile firstpresentation huit
Mobile firstpresentation huitMobile firstpresentation huit
Mobile firstpresentation huit
 
Saving our social_media
Saving our social_mediaSaving our social_media
Saving our social_media
 
Urc it summit-2
Urc it summit-2Urc it summit-2
Urc it summit-2
 
Tlt success
Tlt successTlt success
Tlt success
 
Stakeholder update 4 14 data center outage
Stakeholder update 4 14 data center outageStakeholder update 4 14 data center outage
Stakeholder update 4 14 data center outage
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Último (20)

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

It summit dataverse-bigdata-mercecrosas

  • 1. A data repository system for sharing and archiving research data A Solution for Publishing FAIR research data: Findable,Accessible, Interoperable, Reusable @dataverseorg @mercecrosas
  • 2. But it’s self-correcting Science isn’t broken “It’s just a hell a lot harder than we give it credit for”
  • 3. Self-correction in science? “trust, but verify” Data (and code) should be shared for other researchers to inspect, test and reuse
  • 5.
  • 6. Research Data Lifecycle (collect, process, analyze, compute) Publish and Archive Data Publish Research Results Data Citation Metadata Access Controls
  • 7. http://dataverse.org Harvard Dataverse: Generic data repository open to researchers world wide http://dataverse.harvard.edu >1500 dataverses (15% from Harvard) > 65,000 datasets > 1.7 Million downloads Biomedical Dataverse: Support for biomedical large-scale data (pilot phase)
  • 8. Dataverse Now Dataverse Big Up to 100 files per dataset Up to ~ Gb per file > 1000s files per dataset > Gb per file Large file directory structure or Streaming data source
  • 9. Social Science Big Data Dataverse Projects supporting Big Data Integration with Data Management Systems + + at HMS at UNC Biomedical Big Data • 1000s image files per data set • Non-http batch upload • Data files in directory structure Cloud Dataverse + Swift Storage (scalable, access to computing) A Billion streaming Geotweets