Do It Yourself (DIY) Earth Science
Collaboratories Using Best Practices
and Breakthrough Technologies
IN13D-01
ERIC STEPHAN
December 11, 2017 1
Pacific Northwest National Laboratory
AGU Fall meeting 2017, New Orleans, LA
IN13D: Approaches for Curation to Data Discovery in the Era of Big Data Variety II
Addressing Data Challenges of Scientists on
Small and Midscale Budgets
Do-it-yourself (DIY) home project videos have taken the media by storm,
helping you reroof a house or replace a water pump.
DIY recommendations can even help you determine whether you can do it yourself!
This talk targets innovative smaller-sized science projects that produce
quality science products, including data that can be shared with future
consumer communities.
Many best practices can be carried out in even the humblest situations.
Big data centers: smaller projects want more effective ways to connect to your
resources beyond ‘point and click’.
Emergence of Scientific Collaborative Tools –
Science inspired the Web and so much more!
Collaboratory - A center without walls, in which the nation’s researchers can perform their research
without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data
and computational resources, [and] accessing information in digital libraries.¹
¹ The national collaboratory. In Towards a national collaboratory. Unpublished report of a National Science Foundation
invitational workshop, Rockefeller University, New York. 1988.
The DOE 2000 Project
Environmental Molecular Sciences
Laboratory (EMSL) User Facility
12 March 1989: Sir Tim Berners-Lee’s
original “vague but exciting”
submission to CERN on a distributed
information system
National Institutes of Health:
The Human Genome Project
(HGP) began in 1990.
Engage with EMSL to advance your research
How can we work together?
• Collaborate with our experts
• Work within multi-disciplinary teams to accelerate science
• Access world-class scientific user facilities and specialized instrumentation
• Provide research and career opportunities for your students
December 8, 2017
www.emsl.pnnl.gov
www.universities.pnnl.gov
Examples of Off the Shelf and Standards
Deluge: What Works for You?
Attaining Data Study Afterlife?
[Figure labels: Signal; Message; Application; Database; File store; Archive; Deep Web; Science publications; Data; Visibility through commercial search engine]
New advancements in science
and engineering require
careful attention to keeping
scientific discovery literature
and data artifacts in
circulation
Example
Data
Lifecycle
“…Placed in storage, the data has as much
productive value as your labor value when
you sit on the sofa at night to watch TV.”
“…If you want to increase the value of your data
you have to increase its active circulation and
utility!”
Steven Adler, DWBP co-Chair
Without some help, science can remain largely
invisible in the Deep Web
Increasing Lifespan, Reuse and Visibility DIY
• Choose from 35 DWBP best practices to match research functional needs
• Scope best practices with reference model sketches
• Assess off-the-shelf product capabilities and limitations with DWBP
• Identify required additional plumbing to accomplish research
https://www.w3.org/TR/dwbp/
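The assessment steps above can be sketched as a simple coverage check. This is only an illustration under stated assumptions: the best practice titles come from the DWBP list, while the product names and their capability sets are hypothetical placeholders, not evaluations of real tools.

```python
"""Sketch: check candidate tools against a chosen subset of DWBP best practices."""

# Best practices a hypothetical project decided it needs (titles from the DWBP list).
REQUIRED_PRACTICES = [
    "Provide metadata",
    "Provide data provenance information",
    "Provide a version indicator",
    "Make data available through an API",
]

# Hypothetical capability notes for two off-the-shelf products (assumed, not measured).
CANDIDATES = {
    "notebook-server": {"Provide metadata", "Make data available through an API"},
    "document-store": {"Provide metadata", "Provide a version indicator"},
}


def assess(required, supported):
    """Return (covered, gaps); the gaps are the 'additional plumbing' to build."""
    covered = [bp for bp in required if bp in supported]
    gaps = [bp for bp in required if bp not in supported]
    return covered, gaps


if __name__ == "__main__":
    for name, supported in CANDIDATES.items():
        covered, gaps = assess(REQUIRED_PRACTICES, supported)
        print(f"{name}: {len(covered)}/{len(REQUIRED_PRACTICES)} covered, gaps: {gaps}")
```

Keeping the required list short and project-specific is the point: the gaps column is a rough estimate of the plumbing a small team would still have to write.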
DWBP Data Challenges and Motivating
Questions
Metadata: How do I provide metadata?
Data License: How do I permit/restrict access?
Provenance: How can I convey transparency?
Data Quality: How can I add trust?
Versioning: How can I track version history?
Identification: How can I create and use persistent identifiers?
Data Formats: What non-proprietary structures should I use?
Vocabularies: How do I make my data more easily understood?
Access: How can I make data retrieval easy, robust, and intuitive?
Preservation: What should I consider when archiving?
Feedback: How can data producers and users be better engaged?
Enrichment: How can I add better value to data?
Republication: How do I use data responsibly?
“The Web is not a glorified USB Stick”,
Phil Archer, W3C Data Activity Lead https://www.w3.org/2017/Talks/0621-phila-oai/
http://w3c.github.io/dwbp/dwbp-implementation-report.html
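Several of the questions above (metadata, license, version, identifier) can be answered together in one machine-readable record. The sketch below builds such a record as JSON-LD using the schema.org Dataset vocabulary; the function name, the dataset values, and the example.org URLs are assumptions made for illustration only.

```python
import json


def describe_dataset(title, description, license_url, version,
                     identifier, download_url, media_type):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "license": license_url,      # states how the data may be used
        "version": version,          # version indicator
        "identifier": identifier,    # ideally a persistent URI
        "distribution": [{
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": media_type,
        }],
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    record = describe_dataset(
        title="Example soil moisture measurements",
        description="Daily soil moisture readings from a small field campaign.",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        version="1.0.0",
        identifier="https://example.org/id/dataset/soil-moisture",
        download_url="https://example.org/data/soil-moisture-v1.csv",
        media_type="text/csv",
    )
    print(json.dumps(record, indent=2))
```

Embedding the printed JSON-LD in a dataset landing page is one common way to make the record visible to search engine crawlers.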
Best Practices Benefit Measures
• Comprehension: humans will have a better understanding about the data
structure and meaning, the metadata and the nature of the dataset.
• Processability: machines can automatically ingest and operate on data.
• Discoverability: finding new associations between and in data resources.
• Reuse: increase intrinsic value to wider data consumer communities.
• Trust: improving the confidence that consumers have in the dataset.
• Linkability: it will be possible to link data resources to one another.
• Access: humans and machines will be able to retrieve relevant data in familiar
common formats.
• Interoperability: cooperation among data publishers and consumers.
Using Technology Agnostic Reference
Models to Assess Best Practice Relevance
ISO Open Archival Information System (OAIS) ISO 14721:2003
The Context, Containers, Components and Classes (C4) model for software architecture
• Provide data provenance information
• Provide data quality information
• Provide a version indicator
• Provide version history
• Preserve identifiers
Example Context Data Producer Reference
Models
• Provide metadata
• Provide structural metadata
• Use machine-readable standardized data formats
• Provide data in multiple formats
• Reuse vocabularies, preferably standardized
ones
• Provide Subsets for Large Datasets
• Provide bulk download
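As a minimal sketch of the "provide subsets for large datasets" practice listed above, the generator below streams a CSV file and yields only the rows inside a requested date range, so a consumer never has to load the full file. The column names and sample values are invented for illustration.

```python
import csv
import io
from datetime import date


def subset_rows(lines, start, end, date_field="date"):
    """Yield only the rows whose ISO date falls within [start, end]."""
    for row in csv.DictReader(lines):
        day = date.fromisoformat(row[date_field])
        if start <= day <= end:
            yield row


# Hypothetical sample data; a real dataset would be read from a file or stream.
SAMPLE = """date,site,soil_moisture
2017-01-01,A,0.21
2017-06-15,A,0.34
2017-12-11,B,0.18
"""

if __name__ == "__main__":
    second_half = subset_rows(io.StringIO(SAMPLE), date(2017, 6, 1), date(2017, 12, 31))
    for row in second_half:
        print(row["date"], row["site"], row["soil_moisture"])
```

Because the function accepts any iterable of lines, the same code works on an open file handle, which keeps memory use flat for large files.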
Use Case: Energy Exascale Earth System
Model (E3SM) and Mass Spectrometry
Achieves this through IETF and W3C formats, W3C Provenance,
and interoperable protocols
Off the shelf: Swagger, Jupyter Notebook, NoSQL databases
Repurposed to support reproducible mass spectrometry experiments
Focus: Recovering enough information to re-execute a given simulation
Thomas M, J Laskin, B Raju, EG Stephan, TO Elsethagen, NYS Van, and SN Nguyen. 2016. "Enabling Re-executable
Workflows with Near-real-time Visualization, Provenance Capture and Advanced Querying for Mass
Spectrometry Data." In NYSDS 2016 - Data-Driven Discovery.
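The focus stated above, recovering enough information to re-execute a run, can be illustrated with a small provenance record. This is a simplified sketch that borrows W3C PROV terms (used, wasGeneratedBy, endedAtTime) as dictionary keys. It is not the PROV-JSON serialization and not the system described in the cited paper; the run id, command, and file names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Content hash used to confirm an input is byte-for-byte the same."""
    return hashlib.sha256(data).hexdigest()


def record_run(run_id, command, parameters, inputs, outputs):
    """Capture what a run used and generated; inputs/outputs map name -> bytes."""
    return {
        "activity": {
            "id": run_id,
            "command": command,
            "parameters": parameters,
            "endedAtTime": datetime.now(timezone.utc).isoformat(),
        },
        "used": [{"entity": name, "sha256": fingerprint(blob)}
                 for name, blob in inputs.items()],
        "wasGeneratedBy": [{"entity": name, "sha256": fingerprint(blob), "activity": run_id}
                           for name, blob in outputs.items()],
    }


def can_reexecute(record, available_inputs):
    """True when every recorded input is available with an identical hash."""
    return all(
        item["entity"] in available_inputs
        and fingerprint(available_inputs[item["entity"]]) == item["sha256"]
        for item in record["used"]
    )


if __name__ == "__main__":
    rec = record_run("run-1", "simulate", {"steps": 10},
                     inputs={"config.yaml": b"steps: 10\n"},
                     outputs={"result.nc": b"binary output"})
    print(json.dumps(rec, indent=2))
    print(can_reexecute(rec, {"config.yaml": b"steps: 10\n"}))
```

Hashing inputs rather than trusting file names is a deliberate choice here: a renamed or silently edited input is a common reason a rerun does not reproduce the original result.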
Example Context Data Publisher Reference
Model
• Provide metadata
• Provide descriptive metadata
• Provide structural metadata
• Provide data provenance information
• Use locale-neutral data representations
• Reuse vocabularies, preferably standardized ones
• Choose the right formalization level
• Gather feedback from data consumers
• Enrich data by generating new data
• Provide Complementary Presentations
• Interoperability
• Use persistent URIs as identifiers of datasets
• Use persistent URIs as identifiers within datasets
• Reuse vocabularies, preferably standardized ones
• Choose the right formalization level
• Make data available through an API
• Use Web Standards as the foundation of APIs
• Avoid Breaking Changes to Your API
• Provide Feedback to the Original Publisher
• Provide data provenance information
• Provide data quality information
• Provide a version indicator
• Provide version history
• Preserve identifiers
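Several of the API practices listed above (make data available through an API, use Web standards, avoid breaking changes, provide data in multiple formats) can be sketched as a single request handler. The example assumes a hypothetical `/v1/measurements` path and invented records. It models the HTTP exchange as a plain function rather than a running server, and it ignores Accept header q-values for brevity.

```python
import csv
import io
import json

# Hypothetical records served by the sketch.
RECORDS = [{"site": "A", "value": 0.21}, {"site": "B", "value": 0.18}]


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# One serializer per media type the API can produce.
SERIALIZERS = {
    "application/json": json.dumps,
    "text/csv": to_csv,
}


def respond(path, accept="application/json"):
    """Return (status, content_type, body) for a GET request."""
    # The version lives in the path, so a future /v2 can change shape without breaking /v1.
    if path != "/v1/measurements":
        return 404, "text/plain", "not found"
    # Serve the first listed media type we can produce (q-values are ignored).
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type == "*/*":
            media_type = "application/json"
        if media_type in SERIALIZERS:
            return 200, media_type, SERIALIZERS[media_type](RECORDS)
    return 406, "text/plain", "not acceptable"


if __name__ == "__main__":
    print(respond("/v1/measurements", "text/csv"))
    print(respond("/v1/measurements", "application/json"))
```

Relying on the standard Accept header and status codes means generic HTTP clients can use the API without custom documentation for format selection.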
Example curating and re-publishing to
support discovery
Based on a single soil moisture use case
1.4 billion triples of curated measurement
metadata (i.e., relationships, graph edges),
including descriptions of 777,230 datasets,
2,767 data catalogs,
1,701 data centers, and
52 data networks.
Chappell AR, JR Weaver, S Purohit, WP Smith, KL Schuchardt, P West, B Lee, and P Fox. 2015. "Enhancing the Impact of Science Data:
Toward Data Discovery and Reuse." In Proceedings of the 14th IEEE/ACIS International Conference on Computer and Information Science
2015.
Ontology alignment
Query Optimization with SPARQL and
Schema.org
Use of services such as geonames.org
DWBP Implementation Report: Field
Guide to Examples of Best Practices
Use the evaluation criteria in the report to
assess your own technology stack and
data resources.
http://w3c.github.io/dwbp/dwbp-implementation-report.html
Indirect Collaborations
Producers
Publishers
Analysts
Researchers
There is real interest in your data from
emerging fields!
Using common methods and
approaches makes for extremely helpful
indirect collaborations
Internationalizing your products can
widen your impact
This approach supports open and closed
(behind firewall) collaborations
Example
Data
Lifecycle
What Type of Data Terrain Are We Providing
for Future Science?
Active technical recommendation communities such as W3C are here to serve
you and are interested in your problems.
Evolving good practices as guidelines is less expensive than switching between
technology solutions without them.
Success criteria described in the DWBP can help you measure the benefit to your
project.
Change is good; for legacy applications, good practices and new technology
adoption may be more impactful at a gradual pace.
Questions? Eric.Stephan@pnnl.gov
Paraphrased from notes on TBL’s remarks at the W3C Technical Plenary and Advisory Committee 2014
“Thank you for giving us level terrain to build upon”
Sir Tim Berners-Lee (inventor of the Web), recalling a conversation he had with Vint Cerf (co-inventor of the Internet)
The International Data on the Web Best
Practices Recommendations Team!
Contributors:
• Annette Greiner (Lawrence Berkeley National Laboratory)
• Antoine Isaac
• Carlos Iglesias
• Carlos Laufer
• Christophe Guéret
• Deirdre Lee (Working Group co-Chair)
• Doug Schepers
• Eric G. Stephan (Pacific Northwest National Laboratory)
• Eric Kauz
• Ghislain A. Atemezing
• Hadley Beeman (Working Group co-Chair)
• Ig Ibert Bittencourt
• João Paulo Almeida
• Makx Dekkers
• Peter Winstanley
• Phil Archer (Data Activity Chair)
• Riccardo Albertoni
• Sumit Purohit (Pacific Northwest National Laboratory)
• Yasodara Córdova
DWBP Editors:
• Bernadette Farias Lóscio
• Caroline Burle
• Newton Calegari
Working Group Chairs
• Hadley Beeman
• Deirdre Lee
• Yasodara Córdova
• Steven Adler, Perspective & Community Outreach
W3C Data Activity Lead, W3C Team Contact: Phil Archer

Generative AI for Social Good at Open Data Science East 2024
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 

Do It Yourself (DIY) Earth Science Collaboratories Using Best Practices and Breakthrough Technologies

  • 1. Do It Yourself (DIY) Earth Science Collaboratories Using Best Practices and Breakthrough Technologies. IN13D-01. Eric Stephan, Pacific Northwest National Laboratory. December 11, 2017. AGU Fall Meeting 2017, New Orleans, LA. Session IN13D: Approaches for Curation to Data Discovery in the Era of Big Data Variety II.
  • 2. Addressing Data Challenges of Scientists on Small and Midscale Budgets. Do-it-yourself (DIY) home project videos have taken the media by storm, helping you reroof a house or replace a water pump. DIY recommendations can even help you determine whether you can do it yourself! This talk targets innovative smaller-sized science projects that produce quality science products, including data that can be shared with future consumer communities. Many best practices can be carried out in even the humblest situations. For big data centers: smaller projects want more effective ways to connect to your resources beyond ’point and click’.
  • 3. Emergence of Scientific Collaborative Tools: Science inspired the Web and so much more! Collaboratory: “A center without walls, in which the nation’s researchers can perform their research without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries” [1]. The DOE 2000 Project. Environmental Molecular Sciences Laboratory (EMSL) User Facility. 12 March 1989: Sir Tim Berners-Lee’s original “vague but exciting” submission to CERN on a distributed information system. National Institutes of Health: the Human Genome Project (HGP) began 1989. Engage with EMSL to advance your research. How can we work together? Collaborate with our experts; work within multi-disciplinary teams to accelerate science; access world-class scientific user facilities and specialized instrumentation; provide research and career opportunities for your students. December 8, 2017. www.emsl.pnnl.gov www.universities.pnnl.gov [1] The national collaboratory. In Towards a national collaboratory. Unpublished report of a National Science Foundation invitational workshop, Rockefeller University, New York. 1988.
  • 4. Examples of Off-the-Shelf and Standards Deluge: What Works for You?
  • 5. Attaining Data Study Afterlife? Example data lifecycle: Signal, Message, Application, Database, File store, Archive, Deep Web, Science publications. Data visibility through commercial search engines. New advancements in science and engineering require careful attention to keeping scientific discovery literature and data artifacts in circulation. “…Placed in storage, the data has as much productive value as your labor value when you sit on the sofa at night to watch TV.” “…If you want to increase the value of your data you have to increase its active circulation and utility!” Steven Adler, Data on the Web Best Practices (DWBP) co-Chair. Without some help, science can remain largely invisible in the Deep Web.
  • 6. Increasing Lifespan, Reuse and Visibility DIY: choose from 35 DWBP best practices to match research functional needs; scope best practices with reference model sketches; assess off-the-shelf product capabilities and limitations with DWBP; identify the additional plumbing required to accomplish the research. https://www.w3.org/TR/dwbp/
  • 7. DWBP Data Challenges and Motivating Questions
      • Metadata: How do I provide metadata?
      • Data License: How do I permit/restrict access?
      • Provenance: How can I convey transparency?
      • Data Quality: How can I add trust?
      • Versioning: How can I track version history?
      • Identification: How can I create and use persistent identifiers?
      • Data Formats: What non-proprietary structures should I use?
      • Vocabularies: How do I make my data more easily understood?
      • Access: How can I make data retrieval easy, robust, and intuitive?
      • Preservation: What should I consider when archiving?
      • Feedback: How can data producers and users be better engaged?
      • Enrichment: How can I add better value to data?
      • Replication: How do I use data responsibly?
      “The Web is not a glorified USB Stick”, Phil Archer, W3C Data Activity Lead. https://www.w3.org/2017/Talks/0621-phila-oai/ http://w3c.github.io/dwbp/dwbp-implementation-report.html
  • 8. Best Practices Benefit Measures
      • Comprehension: humans will have a better understanding about the data structure and meaning, the metadata and the nature of the dataset.
      • Processability: machines can automatically ingest and operate on data.
      • Discoverability: finding new associations between and in data resources.
      • Reuse: increase intrinsic value to wider data consumer communities.
      • Trust: improving the confidence that consumers have in the dataset.
      • Linkability: it will be possible to associate data resources.
      • Access: humans and machines will be able to retrieve relevant data in familiar common formats.
      • Interoperability: cooperation among data publishers and consumers.
  • 9. Using Technology-Agnostic Reference Models to Assess Best Practice Relevance. ISO Open Archival Information System (OAIS), ISO 14721:2003. The Context, Containers, Components and Classes (C4) model for software architecture.
  • 10. Example Context Data Producer Reference Models
      • Provide data provenance information
      • Provide data quality information
      • Provide a version indicator
      • Provide version history
      • Preserve identifiers
      • Provide metadata
      • Provide structural metadata
      • Use machine-readable standardized data formats
      • Provide data in multiple formats
      • Reuse vocabularies, preferably standardized ones
      • Provide subsets for large datasets
      • Provide bulk download
  • 11. Use Case: Energy Exascale Earth System Model (E3SM) and Mass Spectrometry. Focus: recovering enough information to re-execute a given simulation. Achieves this through IETF and W3C formats, W3C Provenance, interoperable protocols, and off-the-shelf tools: Swagger, Jupyter Notebook, NoSQL databases. Repurposed to support reproducible mass spectrometry experiments. Thomas M, J Laskin, B Raju, EG Stephan, TO Elsethagen, NYS Van, and SN Nguyen. 2016. "Enabling Re-executable Workflows with Near-real-time Visualization, Provenance Capture and Advanced Querying for Mass Spectrometry Data." In NYSDS 2016 - Data-Driven Discovery.
  • 12. Example Context Data Publisher Reference Model
      • Provide metadata
      • Provide descriptive metadata
      • Provide structural metadata
      • Provide data provenance information
      • Use locale-neutral data representations
      • Reuse vocabularies, preferably standardized ones
      • Choose the right formalization level
      • Gather feedback from data consumers
      • Enrich data by generating new data
      • Provide complementary presentations
      • Interoperability
      • Use persistent URIs as identifiers of datasets
      • Use persistent URIs as identifiers within datasets
      • Reuse vocabularies, preferably standardized ones
      • Choose the right formalization level
      • Make data available through an API
      • Use Web Standards as the foundation of APIs
      • Avoid breaking changes to your API
      • Provide feedback to the original publisher
      • Provide data provenance information
      • Provide data quality information
      • Provide a version indicator
      • Provide version history
      • Preserve identifiers
  • 13. Example: curating and re-publishing to support discovery. Based on a single soil moisture use case: 1.4 billion triples of curated measurement metadata (i.e., relationships, graph edges), including descriptions of 777,230 datasets, 2,767 data catalogs, 1,701 data centers, and 52 data networks. Ontology alignment. Query optimization with SPARQL and Schema.org. Use of services such as geonames.org. Chappell AR, JR Weaver, S Purohit, WP Smith, KL Schuchardt, P West, B Lee, and P Fox. 2015. "Enhancing the Impact of Science Data: Toward Data Discovery and Reuse." In Proceedings of the 14th IEEE/ACIS International Conference on Computer and Information Science 2015.
  • 14. DWBP Implementation Report: Field Guide to Examples of Best Practices. Use the evaluation criteria in the report to assess your own technology stack and data resources. http://w3c.github.io/dwbp/dwbp-implementation-report.html
  • 15. Indirect Collaborations. Example data lifecycle: Producers, Publishers, Analysts, Researchers. There is real interest in your data from emerging fields! Using common methods and approaches is an extremely helpful form of indirect collaboration. Internationalizing your products can widen your impact. The approach supports open and closed (behind firewall) collaborations.
  • 16. What Type of Data Terrain Are We Providing for Future Science? Active technical recommendation communities such as W3C are here to serve you and are interested in your problems. Evolving good practice as a guideline is less expensive than context switching between technology solutions without good practices. Success criteria described in the DWBP can help you measure the benefit to your project. Change is good; for legacy applications, good practice and new technology adoption may be more impactful at a gradual pace. “Thank you for giving us level terrain to build upon”: Sir Tim Berners-Lee (inventor of the Web), recalling a conversation he had with Vint Cerf (co-inventor of the Internet). Paraphrased from notes on TBL’s remarks at the W3C Technical Plenary and Advisory Committee 2014. Questions? Eric.Stephan@pnnl.gov
  • 17. The International Data on the Web Best Practices Recommendations Team!
      • Contributors: Annette Greiner (Lawrence Berkeley National Laboratory); Antoine Isaac; Carlos Iglesias; Carlos Laufer; Christophe Guéret; Deirdre Lee (Working Group co-Chair); Doug Schepers; Eric G. Stephan (Pacific Northwest National Laboratory); Eric Kauz; Ghislain A. Atemezing; Hadley Beeman (Working Group co-Chair); Ig Ibert Bittencourt; João Paulo Almeida; Makx Dekkers; Peter Winstanley; Phil Archer (Data Activity Chair); Riccardo Albertoni; Sumit Purohit (Pacific Northwest National Laboratory); Yasodara Córdova
      • DWBP Editors: Bernadette Farias Lóscio; Caroline Burle; Newton Calegari
      • Working Group Chairs: Hadley Beeman; Deirdre Lee; Yasodara Córdova; Steven Adler, Perspective & Community Outreach
      • W3C Data Activity Lead, W3C Team Contact: Phil Archer
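
As a rough illustration of several producer-side practices named in the slides (provide metadata, provide a version indicator, provide data provenance information, provide data in multiple formats, use persistent URIs as identifiers), the Python sketch below builds a JSON-LD description of a dataset using terms from the DCAT, Dublin Core Terms, OWL and PROV vocabularies. It is not taken from the talk: the function name `describe_dataset`, the example.org URIs, and the soil moisture title and values are hypothetical placeholders, and a real project would choose the terms that fit its own data.

```python
import json

# Prefixes for published vocabularies used in the description below.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prov": "http://www.w3.org/ns/prov#",
}


def describe_dataset(uri, title, description, license_uri,
                     version, generated_by, distributions):
    """Return a JSON-LD dict describing one dataset (hypothetical helper)."""
    return {
        "@context": CONTEXT,
        "@id": uri,                                    # persistent identifier
        "@type": "dcat:Dataset",
        "dct:title": title,                            # descriptive metadata
        "dct:description": description,
        "dct:license": {"@id": license_uri},           # license information
        "owl:versionInfo": version,                    # version indicator
        "prov:wasGeneratedBy": {"@id": generated_by},  # provenance link
        "dcat:distribution": [                         # one entry per format
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},
                "dcat:mediaType": media_type,
            }
            for url, media_type in distributions
        ],
    }


# All URIs and values below are placeholders for illustration only.
record = describe_dataset(
    uri="https://example.org/dataset/soil-moisture",
    title="Soil moisture measurements (example)",
    description="Hypothetical dataset used to illustrate the metadata fields.",
    license_uri="https://creativecommons.org/licenses/by/4.0/",
    version="1.0",
    generated_by="https://example.org/activity/field-campaign-1",
    distributions=[
        ("https://example.org/dataset/soil-moisture.csv", "text/csv"),
        ("https://example.org/dataset/soil-moisture.json", "application/json"),
    ],
)

print(json.dumps(record, indent=2))
```

Publishing a record like this alongside the data files is one low-cost way for a small project to make its data easier for machines and search engines to find, which is the visibility problem the talk raises for the Deep Web.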