A Community Roadmap for Enabling
Access to Geosciences Data

Tanu Malik
Ian Foster
Computation Institute
University of Chicago and Argonne National Lab.
tanum@ci.uchicago.edu, foster@anl.gov

                                                  www.ci.anl.gov
                                                  www.ci.uchicago.edu
Outline
•   Access Workshop
•   DataSpace
•   Post Charette EarthCube




Access is Vital for EarthCube’s Success
•   The goal of EarthCube is to create a sustainable
    infrastructure that enables the sharing of all
    geosciences data, information, and knowledge in an
    open, transparent and inclusive manner.

       I can't get access to *.

       It is difficult for me to *.

       I want to integrate data from other disciplines, but *.


      Access refers to software and activities that make data and computational
      resources easily, efficiently and reliably available to scientists across
      disciplines.
Access Workshop Goals
•   Encourage discussions on emergent issues:
     –   Use of cloud computing
     –   Exploiting the general principle of moving computation to data
     –   A technological and governance framework for cross-disciplinary
         access, service architecture, brokering principles, real-time data, uniform
         authentication and authorization environment, etc.
     –   Improving access to data in publications.

•   Bring some standardization on research data life cycle issues:
     –   In general, data, once generated, follow a lifecycle: they are
         stored, described, processed, transformed, accessed, discovered,
         analyzed, and curated. In organized networks and campaigns, lifecycle
         stages are often documented and standardized, though they vary
         significantly across networks and campaigns. In individual initiatives,
         the lifecycle stages remain ad hoc and ill-defined. [RDLM-Workshop2011]
•   Obtain community consensus on a few use cases


Workshop Activity Outcomes
•   Use Case 1: Can I access “not large” but “big data”
    to conduct statistical analysis?

•   Use Case 2: I have a hypothesis not tied to a
    physical instrument or geophysical parameter. Can
    I still access all the data, in an “interactive” fashion
    to test my hypothesis?


•   Use Case 3: The storm dust paper is vital to my
    research. Can I access the data in the publication
    and change parameters of experiments to
    understand the nature of storm dust?
Workshop Reflections
•   It's all about data!
    [Diagram: People, with two panels each relating Data to Resources and Services through Import and Export]
Workshop Reflections-2
•   Discussing technology issues in isolation is a
    recipe for disaster.
    – Access is closely aligned with other subgroups
    – It is important to organize in functional units




Workshop Reflections-3
•   Challenges will continue

    Social Challenges
    • Transparency
    • Openness
    • Establishing social ties

    Adoption Culture
    • Adoption is slow
    • Sustainability
    • Establishing practices

    Changing Requirements/Changing Technology
    • Real-time data
    • Cross-disciplinary data
    • High dimensionality
    • Network bandwidth, computational resource, and data management constraints

Principles of Data Sharing in EarthCube
•   [...] and reuse
•   Lowers the barrier to entry for data sharing
•   Uses tenets like “metadata ASAP” to encourage submission of data
•   Enables creation of “Curation Co-ops” among communities, sub-communities
•   Serves the NSF DMP requirement
•   Based on a cloud-based infrastructure to support data discovery, access, and mining
Enabling a Data Sharing Space: The DataSpace
 • Embrace a “semi-structured” notion

 • Ingest data in raw form,
     Structuring and refinement of the data and metadata.

 • Open, extensible architecture that supports
     Software as a Service (SaaS) model,
     Process for vetting contributed services prior to their incorporation.
     Based on on-demand resources

 • Emphasis on usability instead of on developing technology/infrastructure

 [Diagram: the DataSpace, in which Data connects to Resources and Services through Import and Export]

Post-Charette
•    Two EarthCube PI meetings at the University of Colorado, Boulder
      – A Concept group meeting,
           o some representation from Community groups,
           o July 10, 2012

      –   A Concept and Community group meeting,
           o   October 4-5, 2012


•    Primary objective: Convergence
      –   Through Roadmaps
      –   Architecture
      –   On future steps




Highlights: Summary of Roadmaps
•    Workplace to collaborate,
•    Lower barriers for participation,
•    Openness and extensibility,
•    Feedback and reproducibility,
•    Discovery of materials held by long-tail
     scientists,
•    Education and reward system for scientists,
•    Cross-domain teams and broad collaboration
•    A new community paradigm.
Defining DataSpace: Architecture-1



    [Diagram: Data connected to Resources and Services through Import and Export]




Defining DataSpace: Architecture-2




Acknowledgements
• Don Middleton, NCAR
• Robert Gibb, New Zealand Landcare Research
• Jeff Heard, U. of North Carolina
• Doug Lindholm, U. of Colorado
• Joseph Baker, Virginia Tech
• Anne Wilson, U. of Colorado
• Chris Lynnes, NASA/ESIP Federation
• Karsten Steinhauser, U. of Minnesota
• Ruth Duerr, NSIDC
• Dave Fulker, OPeNDAP
• Amarnath Gupta, UCS
• Robert Jacob, ANL
• Chris Jenkins, JPL
• Craig Mattocks, U. Miami
• Beth Plale, Indiana Univ.
• Stephen M. Richard, AZGS
• Sameer Sirugeri, Microsoft
• Zhangfan Xing, JPL
• John Williams, NCAR



Thank You!
•    Tanu Malik, tanum@ci.uchicago.edu,
•    Ian Foster, foster@anl.gov



•    Questions?





EarthCube DDMA AGU


Editor's Notes

  1. Shared, standard, reusable software interfaces: for disparate data types, disparate storage, and varying protocols; deliver data in user-requested formats and translate between standards. Link various kinds of data: integration of high-resolution topography scans and geodetic data; integration of geologic data in deep time; geo-located and non-geo-located datasets; observation and simulation datasets for comparison. Real-time access to data and facilities: capabilities within Cloud and Grid, such as shared storage and data spaces; in low-bandwidth settings; simulation and modeling capabilities within HPC and Science Portals. Access refers to software and activities that make data and computational resources easily, efficiently and reliably available to scientists.
  2. Access Paradigms: the SaaS model and the brokering approach. The SaaS model increases usage and adoption by making access to data and resources easy and convenient. The brokering approach implements mediation and distribution capabilities in a transparent way. Discuss these paradigms in the context of the needs of the publishers of big data and the needs of the long-tail geoscientist. Issues relating to access control, confidentiality, and the role of governance bodies for emerging access paradigms. Structural Data Integration for Access: issues relating to data a, data models, and standards for data integration. Discuss novel data types needed by current science cases and their abstraction to data models and knowledge-based models based on space-time integration. Scalable Resource Access: scalable access to resources, such as HPC systems and cloud-based systems (parallel storage systems, parallel analysis systems such as map-reduce [8], Hadoop, SciDB [19]), especially at marginal cost. To store and manipulate data even when the structure of the data is not fully known to the system; associating the cloud with a set of services for recognizing the structure of a wide variety of file types used in geoscience applications, extracting structure from the data, and traversing files to extract metadata.
  3. However, in cases where researchers are interested in studying a phenomenon, can an EarthCube framework provide adequate semantics to express a search query, a generic model for data access of events, and the ability to interactively discover ‘events’ within data and perform ‘first look’ analytics, while keeping provenance and history of all analyses?
  4. Earlier, resources were at the center, and data was massaged so that the resources and services could access it. But now the data is going to be central and services will feed into it and so the
  5. The Sher Dataspace embodies a “semi-structured” notion compared, on the one hand, with rigidly structured systems such as relational database systems, where a data schema needs to be specified before data can be stored and, on the other hand, with filesystems, which are unstructured and do not support any notion of a schema or content-based metadata. In Sher, data can be ingested as a file (or a heterogeneous package, e.g. a folder) with minimal metadata. Services are provided for capturing this metadata as well as the package structure. Further services are provided for ongoing structuring and refinement of the data and metadata. Examples include user-specified annotations; extraction of information for well-known file types (e.g., netCDF); extraction of metadata for proprietary file types using software libraries (e.g., NMR data); and structuring of data and associated information, e.g. associating a set of flat files with a database along with the set of data cleaning routines and load scripts that were used to create the data. Thus, the Dataspace concept supports the model of data being transformed incrementally from a relatively unstructured state with minimal metadata to a highly structured form with rich metadata, using an array of structuring and refinement services. A key enabling characteristic of Sher is its open, extensible architecture that supports the Software as a Service (SaaS) model, thereby removing the burden of maintaining software and software environments from the client [52].
Using this SaaS model, Sher facilitates creation of third-party services that can be contributed into the system, i.e., a SherStore, similar to the Apple AppStore, including the notion of vetting contributed services prior to their incorporation.
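
The incremental model described in note 5 (ingest a file with minimal metadata, then enrich it through refinement services) might be sketched as follows. This is an illustration only, not Sher's actual design or API: the names `DataItem`, `ingest`, `refine`, `guess_format`, and `annotate`, the example file name, and the small suffix-to-format table are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names come from Sher or EarthCube.


@dataclass
class DataItem:
    """A file held in the dataspace together with whatever metadata is known so far."""
    path: str
    metadata: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)  # names of services applied


# A refinement service inspects an item and returns new metadata to merge in.
RefinementService = Callable[[DataItem], Dict[str, str]]


def ingest(path: str) -> DataItem:
    """Accept a file as-is, recording only the metadata derivable from its name."""
    p = Path(path)
    return DataItem(path=path, metadata={"name": p.name, "suffix": p.suffix.lower()})


def guess_format(item: DataItem) -> Dict[str, str]:
    """Map a few well-known suffixes to a format label (illustrative table only)."""
    known = {".nc": "netCDF", ".csv": "CSV", ".h5": "HDF5"}
    fmt = known.get(item.metadata.get("suffix", ""))
    return {"format": fmt} if fmt else {}


def annotate(note: str) -> RefinementService:
    """Build a service that attaches a user-specified annotation."""
    def service(item: DataItem) -> Dict[str, str]:
        return {"annotation": note}
    service.__name__ = "annotate"
    return service


def refine(item: DataItem, service: RefinementService) -> DataItem:
    """Apply one service, merge its metadata into the item, and log the step."""
    item.metadata.update(service(item))
    item.history.append(service.__name__)
    return item


# The item starts nearly unstructured and gains metadata one service at a time.
item = ingest("storm_dust_2012.nc")
refine(item, guess_format)
refine(item, annotate("dust storm observations"))
print(item.metadata)
print(item.history)
```

Keeping each service as a plain function that returns a metadata dictionary is one way to leave the set of services open to third-party contributions; a vetting step, as mentioned in the note, would sit in front of `refine` and is not modeled here.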