computationinstitute.org
www.globusonline.org	
  	
  
Research data management
as a service
Ian Foster
foster@uchicago.edu
High energy physics
Molecular biology
Cosmology
Genetics
Metagenomics
Linguistics
Economics
Climate change
Visual arts
What would a “dropbox for science” look like?
[Diagram: scientific data workflow across Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror, and Registry, overlaid with failure messages: "Quota exceeded!", "Expired credentials!", "Network failed. Retry.", "Permission denied!"]
It should be trivial to Collect, Move, Sync, Share, Analyze,
Annotate, Publish, Search, Backup, & Archive BIG DATA
… but in reality it’s often very challenging
• Collect
• Move
• Sync
• Share
• Analyze
• Annotate
• Publish
• Search
• Backup
• Archive
…for BIG DATA
• Collect
• Move
• Sync
• Share
• Analyze
• Annotate
• Publish
• Search
• Backup
• Archive
• Collect
• Move
• Sync
• Share
Capabilities delivered using Software-as-a-Service (SaaS) model
[Diagram: Data Source, Data Destination]
1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
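The three-step transfer flow (request, managed move/sync, notification) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `TransferRequest`, `run_transfer`, `flaky_copy_factory`, the dict-based endpoints, and the file paths are all hypothetical and are not the Globus Online API. The retry loop stands in for the service absorbing transient network failures on the user's behalf.

```python
from dataclasses import dataclass, field


@dataclass
class TransferRequest:
    """Step 1: what the user submits (hypothetical structure)."""
    user: str
    source: dict        # path -> bytes, stands in for the data source
    destination: dict   # path -> bytes, stands in for the data destination
    paths: list
    notifications: list = field(default_factory=list)


def run_transfer(request, copy_file, max_attempts=3):
    """Step 2: move/sync files with retries. Step 3: notify the user."""
    failed = []
    for path in request.paths:
        # Sync semantics: skip files already identical at the destination.
        if request.destination.get(path) == request.source[path]:
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                copy_file(request.source, request.destination, path)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    failed.append(path)
    status = "SUCCEEDED" if not failed else "FAILED"
    request.notifications.append(f"{request.user}: transfer {status}")
    return status


def flaky_copy_factory(fail_first_n):
    """Build a copy function whose first n calls fail (simulated network)."""
    state = {"calls": 0}

    def copy_file(src, dst, path):
        state["calls"] += 1
        if state["calls"] <= fail_first_n:
            raise ConnectionError("network failed")
        dst[path] = src[path]

    return copy_file


source = {"/data/run1.h5": b"abc", "/data/run2.h5": b"def"}
destination = {"/data/run2.h5": b"def"}  # already in sync
req = TransferRequest("alice", source, destination, list(source))
status = run_transfer(req, flaky_copy_factory(fail_first_n=2))
print(status, req.notifications)
```

The user only submits the request and reads the notification; retrying and skipping unchanged files happen inside the service.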
[Diagram: Data Source]
1. User A selects file(s) to share; selects user/group, sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
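The sharing flow keeps files where they are and records only who may access them. A minimal sketch of that idea, assuming a hypothetical `ShareRegistry` class with made-up user, group, and path names (none of this is the Globus Online API):

```python
class ShareRegistry:
    """Tracks share permissions; file contents never leave the data source."""

    def __init__(self, data_source):
        self.data_source = data_source  # path -> bytes, owned by user A
        self.grants = {}                # path -> {principal: set of permissions}

    def share(self, path, principal, permissions):
        """Step 1: the owner picks a file, a user or group, and permissions."""
        if path not in self.data_source:
            raise FileNotFoundError(path)
        self.grants.setdefault(path, {})[principal] = set(permissions)

    def read(self, path, user, groups=()):
        """Step 3: a logged-in user reads the file directly from the source."""
        allowed = self.grants.get(path, {})
        principals = [user, *groups]
        if any("read" in allowed.get(p, set()) for p in principals):
            return self.data_source[path]
        raise PermissionError(f"{user} may not read {path}")


source = {"/results/scan.h5": b"tomogram"}
registry = ShareRegistry(source)
registry.share("/results/scan.h5", "group:beamline-2bm", {"read"})

# User B belongs to the group and can read; user C cannot.
data = registry.read("/results/scan.h5", "userB", groups=["group:beamline-2bm"])
try:
    registry.read("/results/scan.h5", "userC")
    denied = False
except PermissionError:
    denied = True
print(data, denied)
```

Step 2 corresponds to the `grants` table: the registry stores permissions only, so nothing is copied to intermediate cloud storage.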
Early adoption is encouraging
8,000 registered users; >100 daily
~16 PB moved; ~1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon
Globus Online already does a lot
Globus Toolkit
Sharing Service
Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
We are also adding capabilities
Globus Toolkit
Sharing Service
Transfer Service
Dataset Services
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
Expanding Globus Online services
•  Ingest and publication
– Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, converts
•  Cataloging
– Virtual views of data based on user-defined and/or automatically extracted metadata
•  Computation
– Associate computational procedures, orchestrate application, catalog results, record provenance
Builds on catalog as a service
Approach
•  Hosted user-defined catalogs
•  Based on tag model <subject, name, value>
•  Optional schema constraints
•  Integrated with other Globus services
Three REST APIs
/query/
•  Retrieve subjects
/tags/
•  Create, delete, retrieve tags
/tagdef/
•  Create, delete, retrieve tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
Tomography!
Example dataset: mydata42
  owner: Francesco
  type: 3dtomo
  format: HDF5
  beamline: 2BM
[Diagram steps: Define dataset; Infer type; Extract metadata; Populate catalog(s); Locate datasets; Access files; analyze; Catalog derived products; transfer/schedule; Record provenance; Annotate, share; browse, search]
[Diagram labels: Orchestration; Organization]
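The first few steps of this pipeline (define dataset, infer type, extract metadata, populate catalog, record provenance) can be sketched as chained functions. Everything here is an assumption made for illustration: the function names, the suffix-to-format table, and the file paths are hypothetical, and inferring the format from the file suffix is a stand-in for whatever inspection the real service performs.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to format name.
FORMAT_BY_SUFFIX = {".h5": "HDF5", ".hdf5": "HDF5", ".tif": "TIFF"}


def define_dataset(name, files, owner):
    """Define dataset: group files under a name and an owner."""
    return {"name": name, "files": list(files), "owner": owner}


def infer_format(dataset):
    """Infer type: here, naively, from the file suffixes."""
    formats = {
        FORMAT_BY_SUFFIX.get(PurePosixPath(f).suffix.lower(), "unknown")
        for f in dataset["files"]
    }
    if len(formats) == 1:
        return formats.pop()
    return "mixed" if formats else "unknown"


def extract_metadata(dataset):
    """Extract metadata: derive catalog tags from the dataset itself."""
    return {
        "owner": dataset["owner"],
        "format": infer_format(dataset),
        "file_count": len(dataset["files"]),
    }


def ingest(dataset, catalog, provenance):
    """Populate the catalog and record what was done, in that order."""
    metadata = extract_metadata(dataset)
    catalog[dataset["name"]] = metadata
    provenance.append(("ingest", dataset["name"], tuple(dataset["files"])))
    return metadata


catalog, provenance = {}, []
dataset = define_dataset(
    "mydata42",
    ["/aps/2bm/scan_0001.h5", "/aps/2bm/scan_0002.h5"],  # made-up paths
    owner="Francesco",
)
metadata = ingest(dataset, catalog, provenance)
print(metadata)
```

Keeping each step a separate function mirrors the diagram: steps can be reordered or replaced without touching the catalog or the provenance log.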
  
Our challenge:
Sustainability
We are a non-profit service
provider to the non-profit
research community
Globus Online Provider Plans
Support ongoing operations
Offer value-added capabilities
Engage more closely with users
Provider Plans offer…
Starting at $20k per year
•  Provider endpoints with sharing
•  Multiple GridFTP servers per endpoint
•  Branded web sites
•  Alternate identity provider
•  Usage reporting
•  MSS optimizations
•  Operations monitoring and management
•  Input into and access to product roadmap
Thanks to great colleagues and collaborators
•  Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago
•  Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI
•  Birali Runesha and others at UChicago Research Computing Center
Thank you to our sponsors!


Editor's Notes

  1. Here are some of the areas where we have active projects. Focus on areas of particular interest to I2/ESnet, namely HEP, climate change, genomics (up and coming).
  2. Many in this room are probably users of Dropbox or similar services for keeping their files synced across multiple machines. Well, the scientific research equivalent is a little different.
  3. So how would such a dropbox for science be used? Let’s look at a very typical scientific data workflow. Data is generated by some instrument (a sequencer at JGI or a light source like APS/ALS). Since these instruments are in high demand, users have to get their data off the instrument to make way for the next user. So the data is typically moved from a staging area to some type of ingest store. Etcetera for analysis, sharing of results with collaborators, annotation with metadata for future search, backup/sync/archival, …
  4. We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data, and not just the 2 GB of PowerPoints or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  5. Started with the seemingly simple/mundane task of transferring files …etc.
  6. http://datasets.globus.org/carl-catalog/query/propertyA=value1
  7. http://www.blyberg.net/card-generator/ http://www.sciencemag.org/content/332/6025/88/F1.large.jpg