
Making Small Data BIG (UT Austin, March 2016)

Presentation given at the Texas Advanced Computing Center. It describes the potential of re-using small data for new science, achievements and the challenges to make small data re-usable.

Published in: Science

  1. Kerstin Lehnert, Lamont-Doherty Earth Observatory of Columbia University, Palisades, NY 10964. Success and Challenges in the Earth Sciences
  2. Monday’s Musings: Beyond The Three V’s of Big Data – Viscosity and Virality, February 27, 2012, by R "Ray" Wang. http://blog.softwareinsider.org/2012/02/27/mondays-musings-beyond-the-three-vs-of-big-data-viscosity-and-virality/ The sixth ‘V’: Value. 3/22/2016 Making Small Data BIG: Success and Challenges in the Earth Sciences
  3. • heterogeneous • customized & optimized for research questions • lack of data standards • culture of data ‘hoarding’ • lack of data infrastructure (facilities)
  4. “While the data volumes are small when viewed individually, in total they represent a very significant portion of the country’s scientific output.” “The long tail is a breeding ground for new ideas and never before attempted science.” (Heidorn, B. 2008: “Shedding Light on the Dark Data in the Long Tail of Science”)
  6. … that form a picture
  7. The PetDB Synthesis Map shows data from >300 publications. Symbols are locations of rock samples. Color is scaled to the 87Sr/86Sr isotope ratio in the rocks.
  9. Example #1: Did New Zealand Dust Influence the Last Ice Age? “Understanding where the dust that's in the atmosphere and oceans comes from can help scientists estimate its impact on earth's climate system.” Bess Koffman, Michael Kaplan, Steven Goldstein, Gisela Winckler (LDEO), Natalie Mahowald (Cornell) http://blogs.ei.columbia.edu/2014/03/13/did-new-zealand-dust-influence-the-last-ice-age/
  13. Note the number of data points generated in this study (the yellow dots) in light of the effort involved, which ranged from collecting samples in NZ to operating expensive equipment in the lab.
  14. Example #2: Do convergent margin volcanoes really represent continental crust? “As it is crucial to understand the extent and origin of the compositional difference between central Aleutian lavas and plutons through time and space, this project will map and sample plutonic rocks exposed on the central Aleutians and their coeval volcanic host rocks.” “Results and the samples acquired in this study will help to answer fundamental questions of continental crust formation, and shed light on the formation mechanisms of plutons and volcanics in arcs.” http://www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=135851&org=NSF
  15. Anticipated Data: • ~250 samples • ~200 major element analyses • ~150 trace element analyses • 50 U/Pb zircon geochronology • 30 Ar-Ar ages • 80 Sr, Nd, Hf and Pb isotope analyses • 4 scientists (3 institutions) • 5 weeks on remote islands • a boat (with crew) • a helicopter
  19. • They are widely dispersed in the literature (past & present). • They are not openly accessible. • They lack sufficient and standardized metadata. • They are never published (“dark data”).
  20. Diagram (axis: Value) for small data: findable (identification, persistence); accessible (protection, protocols); re-usable (context, provenance); interoperable (harmonized, machine-readable). Labels: Data Curation Standards; Generic Repositories; Domain-specific Data Standards; Community Data Collections.
  21. Diagram (axis: Value) for small data: findable (identification, persistence); accessible (protection, protocols); re-usable (context, provenance); interoperable (harmonized, machine-readable). Labels: Data Curation Standards; Domain-specific Data Standards; Domain Repositories.
  22. Diagram labels: Domain-specific data facility; Science Community; Libraries; Archives; CI, Computer Science; Publishers, editors; Funding Agencies; Data Facilities; Registries. Discipline-specific data services: • Context & provenance metadata • Semantics • Workflows. Data curation services; CI development; Disciplinary Expertise.
  23. Data Services for the Solid Earth Sciences www.iedadata.org
  24. www.iedadata.org • Solid Earth Observational Data • High-T Geochemistry • Low-T Geochemistry • Petrology • Marine Geophysics & Geology • Geochronology • Cross-disciplinary tools & services • Sample registry SESAR • IEDA Data Browser • Portals (GeoPRISMs, USAP-DCC, etc.) • GeoMapApp • Interoperability
  25. IEDA Repositories: • >720,000 files • 59 TB • 4 × 10^6 samples. IEDA Syntheses: • 19 × 10^6 analytical values in EarthChem • 2.79 × 10^6 miles of data from 875 cruises in the Global Multi-Resolution Topography (GMRT)
  27. Diagram labels: Investigators; EarthChem Data Managers; EarthChem Library; PetDB, SedDB; EarthChem Portal; Data Publication & Preservation; Data Mining & Analysis; Metadata Catalog; Data & Metadata; External Systems. FINDABLE & ACCESSIBLE: • DOI registration • Long-term archiving • CC license • Guidelines for data reporting (community endorsed) • QC by data managers. RE-USABLE & INTEROPERABLE: • Data & metadata harmonization • Standards-compliant data model • Service Oriented Architecture (ECP)
  28. • DOI to allow proper citation • Link to publications • Link to funding source
  29. Global compilation of geochemical data for igneous rocks from the ocean floor & mantle xenoliths • >2,200 data sets/publications • >84,000 samples • >3.2 million observed values http://www.earthchem.org/petdb
  30. Data from • >13,000 publications • >850,000 samples. Total: >19.6 million analytical values. Partner Databases: • PetDB • SedDB • GEOROC • USGS • MetPetDB • GANSEKI
  31. Filter by method or concentration
  33. • 500-800 downloads per quarter • >600 citations in the literature • many fundamental new discoveries & insights • Disciplinary • Multi-disciplinary • Unanticipated purposes • new scientific approaches • Statistical rather than hypothetical
  34. • Many samples and collections are not ‘online’. • Repositories lack resources & expertise to develop & maintain digital collection catalogs. • Samples often only described in publications. • Existing online catalogs are not connected or federated. • No easy way to search for samples.
  35. • Linking physical samples to the digital data generated by their study. • Reproducibility! Access to the physical samples is required to verify & reproduce observations. • Re-usability! Access to information about samples is required for proper evaluation & interpretation of sample-based data. • Broad sharing of physical samples for use & re-use. • Samples are often expensive to collect (drilling, remote locations). • Many samples are unique and irreplaceable. • Re-analysis augments utility of existing data. • Samples often serve in ways that the collectors and repositories could not have imagined.
  36. • Discovery & Access for Re-use and Reproducibility • Sample Citation • Data Integration • Sample Management. IGSN = International Geo Sample Number
  38. “… AGU Publications also strongly encourages use of other identifiers in our journal papers. International Geo Sample Numbers (IGSNs) uniquely identify items, such as a rock sample, a piece of coral, or a vial of water taken from the natural environment, and provide important, consistent information about these samples.” Hanson, B. (2016), AGU opens its journals to author identifiers, Eos, 97, doi:10.1029/2016EO043183. Published on 7 January 2016.
  40. Technical; Organizational; Social/cultural
  41. How can we grow small data across the Geosciences? • Limitation of resources versus diversity of data • Need best practices for all small data communities • Need flexibility and performance of database schemas & search applications • Need tools for investigators to improve quality of submitted data • Need tools for data managers to support (semi-automate?) QC workflow • Repository standards/certification • Inclusion of legacy data (data rescue)
  43. Coalition for Publishing Data in the Earth & Space Sciences (www.copdess.org) • Joint initiative of Earth Science publishers and Data Facilities to help translate the aspirations of open, available, and useful data from policy into practice. • Alignment of data policies across different publishers • Advancing integration of publication and data submission workflows • Support for authors and editors to comply with publishers’ data policies • e.g., online community directory of appropriate Earth science community repositories that meet leading standards on curation, quality, and access. Outcomes: • Increases development and enforcement of data best practices • Reduces effort of metadata QC • Increases flow of small data into repositories
  44. • Cross-disciplinary development of community data model ODM2 (Observation Data Model) • Collaboration with commercial software engineering
  45. • Advances coordination, collaboration, and integration • Community governance • Integrative Activities • Fosters new data communities • Research Coordination Networks • Develops and adapts new technologies to structure, transform, integrate, document, harmonize data & metadata • Building Blocks
  46. The Alliance Testbed Project: “Interdisciplinary Earth Data Alliance as a Model for Integrating EarthCube Technology Resources and Engaging the Broad Community” • Design & develop the organizational and technical architecture of a data facility that operates as an alliance of scientifically related data communities • Sharing data services and infrastructure that support trusted data curation and interdisciplinary science.
  48. • Build on and transition existing infrastructure of an established data facility (IEDA) to provide shared data services for all Alliance partners • Data Submission Hub • Trusted repository services (DOI registration, long-term preservation) • Deploy newly developed EC technologies to align and integrate with EC architecture • CINERGI: pipeline for harvesting, improving, unifying, and re-publishing metadata records assembled by Alliance partners • GeoWS: mechanism for Alliance partners to exchange data with data discovery, search, and visualization tools across the Alliance • GeoLink: Vocabulary services to support the Data Submission Hub
  49. • Data Facility: IEDA • Including existing IEDA Partners: MGDS, EarthChem, SESAR, Geochron, ASP@UTIG, LEPR • Community Data Collection: MetPetDB • New data communities: Mineral Physics, Deep Seafloor Processes • New data provider: IcePod • EarthCube Building Blocks: CINERGI, GeoLink, GeoWS • Stakeholder Alignment: WayMark Systems
  50. • Small data grows BIG when properly curated, documented, harmonized, and integrated. • Domain-specific data facilities are essential to ensure quality of data for trusted re-use & community engagement. • Current approaches are not sufficiently scalable. • Partnerships and collaborations help address the challenges. • Integration with publications will augment the flow of data into repositories and data products. • Partnerships among long-tail data communities allow sharing of data publication & preservation infrastructure while supporting domain-specific data curation. • Community-wide initiatives such as EarthCube help solve the entire range of social, technical, and organizational challenges.
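
The slides state that small data grows big once it is harmonized and integrated, but do not show what that step looks like. The Python sketch below is one possible illustration, not the EarthChem or PetDB implementation: it merges two small data sets that use different analyte names and units into one table with a common vocabulary, a common unit (ppm), and a provenance field. The conversion table, the synonym list, the record field names, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to a common unit (ppm by weight).
TO_PPM = {"ppm": 1.0, "wt%": 10_000.0, "ppb": 0.001}

# Hypothetical synonyms mapping author-specific names to one vocabulary.
ANALYTE_SYNONYMS = {
    "sr": "Sr", "strontium": "Sr",
    "nd": "Nd", "neodymium": "Nd",
}


@dataclass(frozen=True)
class Observation:
    sample_id: str    # made-up identifier; a real system might use an IGSN
    analyte: str      # harmonized analyte name
    value_ppm: float  # value converted to the common unit
    source: str       # provenance: the originating data set or publication


def harmonize(records, source):
    """Convert one small data set to Observation rows with common names and units."""
    rows = []
    for rec in records:
        analyte = ANALYTE_SYNONYMS.get(rec["analyte"].strip().lower())
        factor = TO_PPM.get(rec["unit"].strip().lower())
        if analyte is None or factor is None:
            # Unknown terms are skipped here; in practice they would go
            # to a data manager for review rather than being guessed.
            continue
        rows.append(Observation(rec["sample"], analyte,
                                float(rec["value"]) * factor, source))
    return rows


def integrate(datasets):
    """Merge several harmonized data sets into a single synthesis table."""
    merged = []
    for source, records in datasets.items():
        merged.extend(harmonize(records, source))
    return merged


if __name__ == "__main__":
    datasets = {
        "paper A": [{"sample": "S1", "analyte": "Sr", "value": "120", "unit": "ppm"}],
        "paper B": [{"sample": "S2", "analyte": "strontium", "value": 0.05, "unit": "wt%"}],
    }
    for row in integrate(datasets):
        print(row)
```

Keeping the source on every row is a deliberate choice in this sketch: it lets a value in the merged table be traced back to, and cited from, the data set it came from.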
