Small Data: How Elsevier Might Help
 With Research Data Management

            David Marques
           27 February 2013

        Research Data Symposium
           Columbia University
Assertions

• We share a common goal: an open system of
  ubiquitous sharing of research data in
  repositories that are
   – discipline-specific
   – controlled-vocabulary annotated
   – normalized
• Only a very small portion of research data is
  being shared through discipline-specific repositories
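
To make "controlled-vocabulary annotated" and "normalized" concrete, here is a minimal sketch, assuming a tiny made-up vocabulary. The terms, field names (`material`, `material_term`), and the functions `normalize_term` and `annotate` are all hypothetical illustrations, not part of any real repository or Elsevier taxonomy.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
# These terms are illustrative only and do not come from any real taxonomy.
VOCABULARY = {
    "basalt": {"basalt", "basaltic rock"},
    "granite": {"granite", "granitic rock"},
}


def normalize_term(raw):
    """Return the preferred term for a free-text label, or None if unknown."""
    # Normalize case and collapse runs of whitespace before the lookup.
    cleaned = " ".join(raw.strip().lower().split())
    for preferred, synonyms in VOCABULARY.items():
        if cleaned in synonyms:
            return preferred
    return None


def annotate(record):
    """Copy a dataset record and add a normalized 'material_term' annotation."""
    annotated = dict(record)  # leave the submitted record untouched
    annotated["material_term"] = normalize_term(record.get("material", ""))
    return annotated


sample = {"dataset_id": "example-001", "material": "  Basaltic   Rock "}
result = annotate(sample)
print(result)
```

A label that is not in the vocabulary is annotated with `None` rather than guessed, so unmatched records can be routed to a curator.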




Problem statement

• There are a lot of barriers to sharing of data
• There are problems with sustainable funding for
  repositories




Points of this presentation


• We can help remove the barriers by
   –   applying rigorous yet efficient process
   –   using discipline-specific informatics skills
   –   providing credit assignment and assessment
   –   helping capture metadata early and in digital form



• It is possible to create sustainable funding
  models for open data repositories, and we can
  help

Big Data vs Research Data

[Figure: two copies of the data life cycle taken from DataONE (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze), one labeled 'Big Data' Emphasis and one labeled Research Data Pain.]

Dataset Repositories: MANY solutions
•   Figshare [http://figshare.com/] (Digital Science)
•   GigaDB [http://gigadb.org/] (BioMed Central)
•   DataDryad [http://datadryad.org/]
•   Australian National Data Service [http://www.ands.org.au/]
    – but: their goal is to move from




• Amazon’s Glacier [http://aws.amazon.com/glacier/]
Problem 1: Barriers to Data Disclosure and Sharing

• Non-digital metadata
• Different skill sets
• Takes time and mindset away from research
• Requires common nomenclature
• Cost
• It is a long-tail problem: thousands of narrow
  solutions provide the best value
• Open to misinterpretation
• Lack of credit
• Intellectual property or possible patent issues
• Easier contradiction
• No incentive, little value to the sharer
• Privacy and security concerns


Are supplemental files the answer?
• Scope
   – 15% of 2012 Elsevier articles had supplemental files
   – ~ 1% have spreadsheets
   – ~ 2% have either spreadsheets or zip files
• Extracting value
   – no rules for supplemental files
   – no common nomenclatures
   – analytics, comparisons, trends are hard
• Elsevier recommends (and some journals such as Cell Press
  journals require) that authors share/deposit data in
  discipline repositories
• Linking helps overcome the credit barrier
   – Elsevier links articles to/from datasets in open repositories
   – 35 today (including EarthChem)
   – 10 more in progress
Problem 2: Sustainability

• Many are grant-funded initially, as research projects – and
  funding bodies often do not intend to fund repositories long
  term

• Can we fund from a Gold Open Access model?
• Can we fund from high-end analytics subscriptions?
• Can we fund some of them from health care and corporate
  use?




[Figure: chart of percentage shares by activity. Percentages shown on the chart: 10%, 25%, 19%, 15%, 6%, 25%. Labels shown:]
• PLAN, PROPOSE, PRODUCE/MANAGE, PUBLISH
• SUPPORT SERVICES
• ACQUISITION: submission agreement, data formats, IP rules, user documentation and support
• INGEST: receive, QA and validation, transform, create metadata (taxonomies), updates, reference linking
• STORAGE, DATA MANAGEMENT
• ACCESS: searching and ordering, user guides, delivery of result sets and reports

Summary of data in: Keeping Research Data Safe 2, Beagrie et al., 2010, funded by JISC
Pain Points and Elsevier Strengths and Expertise
• Taxonomies
    – 50+ discipline-specific taxonomies – core to Elsevier
• At-scale, efficient, best-practices process
• At-scale analytics




• Turning freely-available data into high-value solutions for corporate use
  without advertising (advertising models require very large customer groups)




• Impact analysis and reporting
Research Data Services – new group at Elsevier
• Goals
   – Increase archiving and sharing of research data (as
     requested by funding bodies)
   – Increase the value of shared data (with metadata)
   – Foster and assist with the credit and impact assessment of
     research data for the researcher, the institution, and the funding
     bodies
   – Increase the sustainability of data repositories
• Principles
   – Open data – all data remain open and available
   – Collaborative – with institutions, the research community,
     funding bodies
   – Transparent business model – if we make money, some goes
     back to fund the repositories
Research Data Management

[Figure: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) drawn around a central RDM hub and annotated with pilots. Legible hub labels: Linked Data Repositories; Method Tools (VizTrails); Taxonomies, Best Practices; Repositories, Data Mgmt Plans; Taxonomies, Directories, SEO.]

• Pilot: see if we can scale a repository and make it financially sustainable
• Pilot: collecting data with an app, integrating and sharing with a dashboard
• Pilot: use LDR to connect data from different repositories to create insight
• Pilot: collect and standardize method and provenance
• IEDA/EarthCube collaboration with Kerstin Lehnert
• Pilot: annotate data and methods with standard taxonomies
• Pilot: create directories to help discover data in shared repositories
Disclosure Pilot Benefits for the Researcher

• Immediate visibility and overview of the research (PI
  Dashboard)
• Enhanced discoverability of research data attributable to the
  university and the research team
• Credit/impact for the university, the research team, and the
  funding bodies
• Acknowledgement by the funding bodies of the
  disclosure/sharing of the data
• [better, faster science]



Disclosure Pilot Benefits for the Institution


• Increased rigor of data management
   – consistency
   – best practices
   – overview metadata in research management information systems
• Step toward completeness of research data management
• Compliance with funding body requirements, stronger base
  from which to request
• Increased visibility, discoverability, credit




Disclosure Pilot Benefits for the Funding Body


• Increased data disclosure and sharing
• Increased discoverability of data (with funding body credit)
• Increased opportunity for ‘fourth paradigm’ (analytics-
  derived) science – better, faster science
• Credit/impact for sponsored research
• Standardization and best practices in data management plans
  and actual data curation/preservation




Research funding
Today’s funding models   Data mgmt (Gold OA)
                         FREE
                         License or subs.




Research funding
Increasingly common models   Data mgmt (Gold OA)
                             FREE
                             License or subs.




                                  Translational
                                    Medicine
                                    Analytics




Research funding
Working together, we could do this   Data mgmt (Gold OA)
                                     FREE
                                     License or subs.




                                          Task-specific
                                            Analytics




An interesting quote at the IDCC13 cost workshop


       [loosely quoted, I did not catch it verbatim]



We can’t do this by ourselves. We should get someone with
            business savvy to partner with us.




?


Mais conteúdo relacionado

Semelhante a Small Data: How Elsevier Might Help with Research Data Management

Toward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data EcosystemToward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data EcosystemGlobus
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingDenodo
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataSusanna-Assunta Sansone
 
Improve Governance with Autoclassification
Improve Governance with AutoclassificationImprove Governance with Autoclassification
Improve Governance with AutoclassificationAIIM International
 
Researcher KnowHow: Research Data Management
Researcher KnowHow: Research Data ManagementResearcher KnowHow: Research Data Management
Researcher KnowHow: Research Data ManagementLivUniLibrary
 
Scientific Information Management at the U.S. Geological Survey
Scientific Information Management at the U.S. Geological SurveyScientific Information Management at the U.S. Geological Survey
Scientific Information Management at the U.S. Geological SurveyDave Govoni
 
Research Data Management in practice, RIA Data Management Workshop Adelaide 2017
Research Data Management in practice, RIA Data Management Workshop Adelaide 2017Research Data Management in practice, RIA Data Management Workshop Adelaide 2017
Research Data Management in practice, RIA Data Management Workshop Adelaide 2017ARDC
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing dataWorld Agroforestry (ICRAF)
 
The Analytic Trifecta: Abstraction, the Cloud, and Visualization
The Analytic Trifecta: Abstraction, the Cloud, and VisualizationThe Analytic Trifecta: Abstraction, the Cloud, and Visualization
The Analytic Trifecta: Abstraction, the Cloud, and VisualizationBirst
 
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...TERN Australia
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceRobert H. McDonald
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ LibraryARDC
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...OSTHUS
 
Data Science Salon 2018 - Building a true enterprise data governance platform...
Data Science Salon 2018 - Building a true enterprise data governance platform...Data Science Salon 2018 - Building a true enterprise data governance platform...
Data Science Salon 2018 - Building a true enterprise data governance platform...Data Con LA
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...TERN Australia
 
Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012IUPUI
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Whitehead Seminar 5/2
Whitehead Seminar 5/2Whitehead Seminar 5/2
Whitehead Seminar 5/2Physion
 
Research data lifecycle diagram
Research data lifecycle diagramResearch data lifecycle diagram
Research data lifecycle diagramSteven Cracknell
 

Semelhante a Small Data: How Elsevier Might Help with Research Data Management (20)

Toward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data EcosystemToward a FAIR Biomedical Data Ecosystem
Toward a FAIR Biomedical Data Ecosystem
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes Biobanking
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
 
Improve Governance with Autoclassification
Improve Governance with AutoclassificationImprove Governance with Autoclassification
Improve Governance with Autoclassification
 
Researcher KnowHow: Research Data Management
Researcher KnowHow: Research Data ManagementResearcher KnowHow: Research Data Management
Researcher KnowHow: Research Data Management
 
Scientific Information Management at the U.S. Geological Survey
Scientific Information Management at the U.S. Geological SurveyScientific Information Management at the U.S. Geological Survey
Scientific Information Management at the U.S. Geological Survey
 
Research Data Management in practice, RIA Data Management Workshop Adelaide 2017
Research Data Management in practice, RIA Data Management Workshop Adelaide 2017Research Data Management in practice, RIA Data Management Workshop Adelaide 2017
Research Data Management in practice, RIA Data Management Workshop Adelaide 2017
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
The Analytic Trifecta: Abstraction, the Cloud, and Visualization
The Analytic Trifecta: Abstraction, the Cloud, and VisualizationThe Analytic Trifecta: Abstraction, the Cloud, and Visualization
The Analytic Trifecta: Abstraction, the Cloud, and Visualization
 
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability Science
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ Library
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
 
Data Science Salon 2018 - Building a true enterprise data governance platform...
Data Science Salon 2018 - Building a true enterprise data governance platform...Data Science Salon 2018 - Building a true enterprise data governance platform...
Data Science Salon 2018 - Building a true enterprise data governance platform...
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
 
Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Whitehead Seminar 5/2
Whitehead Seminar 5/2Whitehead Seminar 5/2
Whitehead Seminar 5/2
 
Research data lifecycle diagram
Research data lifecycle diagramResearch data lifecycle diagram
Research data lifecycle diagram
 

Mais de Elsevier

Infographic infectious disease outbreaks research trends
Infographic infectious disease outbreaks research trendsInfographic infectious disease outbreaks research trends
Infographic infectious disease outbreaks research trendsElsevier
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesElsevier
 
Zen and the Art of Data Science Maintenance
Zen and the Art of Data Science MaintenanceZen and the Art of Data Science Maintenance
Zen and the Art of Data Science MaintenanceElsevier
 
Machine Learning and AI, by Helena Deus, PhD
Machine Learning and AI, by Helena Deus, PhDMachine Learning and AI, by Helena Deus, PhD
Machine Learning and AI, by Helena Deus, PhDElsevier
 
Gender Report 2017 Infographic – Focus on Engineering
Gender Report 2017 Infographic – Focus on EngineeringGender Report 2017 Infographic – Focus on Engineering
Gender Report 2017 Infographic – Focus on EngineeringElsevier
 
Elsevier Gender Report Infographic – Focus on Computer Science
Elsevier Gender Report Infographic – Focus on Computer ScienceElsevier Gender Report Infographic – Focus on Computer Science
Elsevier Gender Report Infographic – Focus on Computer ScienceElsevier
 
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona Elsevier
 
Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Elsevier
 
Elsevier Cancer Moonshot Infographic
Elsevier Cancer Moonshot InfographicElsevier Cancer Moonshot Infographic
Elsevier Cancer Moonshot InfographicElsevier
 
Elsevier Society Member Survey
Elsevier Society Member SurveyElsevier Society Member Survey
Elsevier Society Member SurveyElsevier
 
Food Security: an information provider’s view
Food Security: an information provider’s viewFood Security: an information provider’s view
Food Security: an information provider’s viewElsevier
 
Response from OFAC to Elsevier, October 2015
Response from OFAC to Elsevier, October 2015Response from OFAC to Elsevier, October 2015
Response from OFAC to Elsevier, October 2015Elsevier
 
Sustainability Science in a Global Landscape
Sustainability Science in a Global LandscapeSustainability Science in a Global Landscape
Sustainability Science in a Global LandscapeElsevier
 
Research Performance in South-East Asia: Executive Summary
Research Performance in South-East Asia: Executive SummaryResearch Performance in South-East Asia: Executive Summary
Research Performance in South-East Asia: Executive SummaryElsevier
 
Mendeley Report: New Horizons: From Research Paper to Pluto
Mendeley Report: New Horizons: From Research Paper to PlutoMendeley Report: New Horizons: From Research Paper to Pluto
Mendeley Report: New Horizons: From Research Paper to PlutoElsevier
 
Infographic: The Noble Nurse
Infographic: The Noble NurseInfographic: The Noble Nurse
Infographic: The Noble NurseElsevier
 
Jennifer Saul's presentation for Cambridge University's gender equality summit
Jennifer Saul's presentation for Cambridge University's gender equality summit Jennifer Saul's presentation for Cambridge University's gender equality summit
Jennifer Saul's presentation for Cambridge University's gender equality summit Elsevier
 
Open access survey
Open access surveyOpen access survey
Open access surveyElsevier
 
Presentation: A Decade of Development in Sub-Saharan African STEM Research
Presentation: A Decade of Development in Sub-Saharan African STEM ResearchPresentation: A Decade of Development in Sub-Saharan African STEM Research
Presentation: A Decade of Development in Sub-Saharan African STEM ResearchElsevier
 
Culinary Nutrition: garlic-enhanced-mashed-potatoes
Culinary Nutrition: garlic-enhanced-mashed-potatoesCulinary Nutrition: garlic-enhanced-mashed-potatoes
Culinary Nutrition: garlic-enhanced-mashed-potatoesElsevier
 

Mais de Elsevier (20)

Infographic infectious disease outbreaks research trends
Infographic infectious disease outbreaks research trendsInfographic infectious disease outbreaks research trends
Infographic infectious disease outbreaks research trends
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
 
Zen and the Art of Data Science Maintenance
Zen and the Art of Data Science MaintenanceZen and the Art of Data Science Maintenance
Zen and the Art of Data Science Maintenance
 
Machine Learning and AI, by Helena Deus, PhD
Machine Learning and AI, by Helena Deus, PhDMachine Learning and AI, by Helena Deus, PhD
Machine Learning and AI, by Helena Deus, PhD
 
Gender Report 2017 Infographic – Focus on Engineering
Gender Report 2017 Infographic – Focus on EngineeringGender Report 2017 Infographic – Focus on Engineering
Gender Report 2017 Infographic – Focus on Engineering
 

Small Data: How Elsevier Might Help with Research Data Management

  • 1. Small Data: How Elsevier Might Help With Research Data Management. David Marques, 27 February 2013. Research Data Symposium, Columbia University
  • 2. Assertions
      • We share a common goal: an open system of ubiquitous sharing of research data in repositories that are
          – discipline-specific
          – controlled-vocabulary annotated
          – normalized
      • A very small portion of research data is being shared to the discipline-specific repositories
  • 3. Problem statement
      • There are a lot of barriers to sharing of data
      • There are problems with sustainable funding for repositories
  • 4. Points of this presentation
      • We can help remove the barriers by
          – applying rigorous yet efficient process
          – using discipline-specific informatics skills
          – providing credit assignment and assessment
          – helping capture metadata early and digitally
      • It is possible, and we can help, to create sustainable funding models for open data repositories
  • 5. Big Data vs Research Data [Figure: two copies of the data life cycle taken from DataONE (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze), one labeled 'Big Data' Emphasis and one labeled Research Data Pain]
  • 6. Dataset Repositories: MANY solutions
      • Figshare [http://figshare.com/] (Digital Science)
      • GigaDB [http://gigadb.org/] (BioMed Central)
      • DataDryad [http://datadryad.org/]
      • Australian National Data Service [http://www.ands.org.au/]
          – but: their goal is to move from
      • Amazon’s Glacier [http://aws.amazon.com/glacier/]
  • 7. Problem 1: Barriers to Data Disclosure and Sharing
      • Non-digital metadata
      • Different skill sets
      • Takes time and mindset away from research
      • Requires common nomenclature
      • Cost
      • It is a long-tail problem: thousands of narrow solutions provide the best value
      • Open to misinterpretation
      • Lack of credit
      • Intellectual property or possible patent issues
      • Easier contradiction
      • No incentive, little value to the sharer
      • Privacy and security concerns
  • 8. Are supplemental files the answer?
      • Scope
          – 15% of 2012 Elsevier articles had supplemental files
          – ~1% have spreadsheets
          – ~2% have either spreadsheets or zip files
      • Extracting value
          – no rules for supplemental files
          – no common nomenclatures
          – analytics, comparisons, trends are hard
      • Elsevier recommends (and some journals, such as Cell Press journals, require) that authors share/deposit data in discipline repositories
      • Linking helps overcome the credit barrier
          – Elsevier links articles to/from datasets in open repositories
          – 35 today (including EarthChem)
          – 10 more in progress
  • 13. Problem 2: Sustainability
      • Many are grant-funded initially, as research projects
          – and funding bodies often do not intend to fund repositories long term
      • Can we fund from a Gold Open Access model?
      • Can we fund from high-end analytics subscriptions?
      • Can we fund some of them from health care and corporate use?
  • 14. [Figure: percentage breakdown across repository activities; values shown: 10%, 25%, 19%, 15%, 6%, 25%. Activity labels: Plan, Propose, Acquisition, Ingest, Storage, Data Management, Produce/Publish, Manage, Access, Support Services. Detail labels: submission agreement, data formats, IP rules, user documentation and support, receive, QA and validation, transform, create metadata (taxonomies), updates, reference linking, searching and ordering, user guides, delivery of result sets and reports.] Summary of data in: Keeping Research Data Safe 2, Beagrie et al., 2010, funded by JISC
  • 15. Pain Points and Elsevier Strengths and Expertise
      • Taxonomies
          – 50+ discipline-specific taxonomies
          – core to Elsevier
      • At-scale, efficient, best-practices process
      • At-scale analytics
      • Turning freely available data into high-value solutions for corporate use without advertising (advertising models require very large customer groups)
      • Impact analysis and reporting
  • 16. Research Data Services: new group at Elsevier
      • Goals
          – Increase archiving and sharing of research data (as requested by funding bodies)
          – Increase the value of shared data (with metadata)
          – Foster and assist with the credit and impact assessment of research data for the researcher, the institution, and the funding bodies
          – Increase the sustainability of data repositories
      • Principles
          – Open data: all data remain open and available
          – Collaborative: with institutions, the research community, funding bodies
          – Transparent business model: if we make money, some goes back to fund the repositories
  • 17. Research Data Management [Figure: data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) annotated with pilots. Other labels in the figure include Data Management Plan, LDR, Method Tools, Linked Data Repositories, RDM (VizTrails), Repositories, Data Mgmt Plans, Taxonomies, Directories]
      • Pilot: see if we can scale a repository and make it financially sustainable
      • Pilot: collecting data with an app, integrating and sharing with a dashboard
      • Pilot: connect method and provenance to create collaboration insight
      • Pilot: collect and standardize data from different repositories (IEDA/EarthC, with Kerstin Lehnert)
      • Pilot: annotate data and methods with standard taxonomies
      • Pilot: create directories to help discover data in shared repositories
  • 18. Disclosure Pilot Benefits for the Researcher
      • Immediate visibility and overview of the research (PI Dashboard)
      • Enhanced discoverability of research data attributable to the university and the research team
      • Credit/impact for the university, the research team, and the funding bodies
      • Acknowledgement by the funding bodies of the disclosure/sharing of the data
      • [better, faster science]
  • 19. Disclosure Pilot Benefits for the Institution
      • Increased rigor of data management
          – consistency
          – best practices
          – overview metadata in research management information systems
      • Step toward completeness of research data management
      • Compliance with funding body requirements, stronger base from which to request
      • Increased visibility, discoverability, credit
  • 20. Disclosure Pilot Benefits for the Funding Body
      • Increased data disclosure and sharing
      • Increased discoverability of data (with funding body credit)
      • Increased opportunity for ‘fourth paradigm’ (analytics-derived) science: better, faster science
      • Credit/impact for sponsored research
      • Standardization and best practices in data management plans and actual data curation/preservation
  • 21. Today’s funding models [Figure labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.]
  • 22. Increasingly common models [Figure labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.; Translational Medicine Analytics]
  • 23. Working together, we could do this [Figure labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.; Task-specific Analytics]
  • 24. An interesting quote at the IDCC13 cost workshop [loosely quoted, I did not catch it verbatim]: "We can’t do this by ourselves. We should get someone with business savvy to partner with us."
  • 25. ?

Editor's Notes

  1. Barriers to data disclosure and sharing:
      • Metadata are not captured digitally
      • Open to misinterpretation
      • Lack of credit
      • Intellectual property or possible patent issues
      • Easier contradiction
      • No incentive, little value to the sharer, even disincentivized by current reward models
      • Different skill sets
      • Takes time and mindset away from research
      • Requires common nomenclature
          – missing in many domains
          – nomenclature convergence only happens in mature science
          – many researchers are invested in nomenclature discussions
      • Privacy and security concerns
      • Cost
      • It is a long-tail problem: thousands of narrow solutions provide the best value
  2. Analytics at scale
      • MEDai: analyze every treatment event in a hospital for protocol variations
      • Risk Solutions: analyze public data for fraud detection and prediction
      • Shepardizing legal cases
     Funding from free
      • Reaxys (chemical reactions database, literature and patents)
      • Chemical resistance of plastics (manufacturer data normalized)
      • Pathway Studio (enzymatic pathways for drug discovery, eventually personalized medicine)
      • Geofacets (geologic information for exploration)
      • LexisNexis