GlobusWorld 2019 Opening Keynote
Manage Data with Assurance
Ian Foster
Rachana Ananthakrishnan
Steve Tuecke
Vas Vasiliadis
Mission
Increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software
Data keeps moving!
Globus by the numbers...
7,400
active shared
endpoints
100+
subscribers
600 PB
moved
22,000
active personal
endpoints
90 billion
files processed
1,800
active server
endpoints
3 months
longest running transfer
1 PB
largest single
transfer to date
99.9%
availability
600+
identity providers
2000+
most shared
endpoints
at a single
institution
138,000
registered users
[Chart: Active Endpoints by Month, Jan-14 through Jan-19; y-axis 0 to 4,500; series: Free, Subscribed]
Globus User Story Highlights
File Sharing
Value
Improved
Performance
Ease of Use
Connector
Benefits
“We needed an easy way to share terabytes of data on a regular basis
with dozens of researchers. Thanks to Globus sharing, it’s easy for us to
get our researchers the data they need.”
Platform
Development
“Now Canadian researchers have a single repository where data can
easily and securely be accessed, searched and shared.”
“With Globus, our
researchers have one less
thing to worry about!”
“I routinely have to move hundreds of gigabytes of data – Globus makes it
easy, so I can execute these transfers with very little effort.”
“Users can quickly, effectively, and
securely share data with their research
community or the broader public.”
“WVU uses Globus to
archive research data
out to Google Drive.”
“[BlackPearl with Globus] enables us
to archive and share petabytes of
information in a convenient solution.”
Usage Briefs: www.globus.org/usage-brief-library
User Stories: www.globus.org/user-stories
What makes it all worthwhile
“Whatever you are studying right now, if
you are not getting up to speed on deep
learning, neural networks, etc., you lose.
We are going through the process where
software will automate software,
automation will automate automation.”
-- Mark Cuban
Configure apparatus/write code
Run experiments
Solve
societal
problems
Create knowledge
What scientists
want to do
Most
scientist
time
Analyze and plan
Opportunities for AI in science:
Research today
Run experiments
Create knowledge
Most
scientist
time
AI
assistants
Analyze and plan
Opportunities for AI in science:
Research tomorrow
Solve
societal
problems
Configure apparatus/write code
AI at Argonne: data-driven discovery
Strong and weak lensing
in sky survey data
Prediction of antimicrobial
resistance phenotypes
Prediction of radiation
stopping power
Identification and tracking
of storms
Parameter extraction in
atom probe tomography
Learning for dynamic
sampling in spectroscopy
Structure-property-process
triangle in additive manufact.
Vehicle energy
consumption prediction
Photometric red shift
estimation
New materials for efficient
solar cells
Cosmic Microwave
Background emulation
Enhancement of noisy
tomographic images
Nowcasting with
convolutional LSTMs
Efficient climate model
emulators
Defect-level prediction in
semiconductors
Flying object detector for
edge deployment
Discovery of new energy
storage materials
Reduced order modeling
of laser sintering
Rethinking Data infrastructure for Science AI
AI Workflows
• Data ingest
• Data enhancement
• Data QA/QC
• Feature selection
• Model creation
• Model training
• HPO
• UQ
• Model reduction
• Active/reinforcement learning
• Inference
Scientific instruments
• Major user facilities
• Laboratory equipment
• Automated labs
• …
Sensors
• Environmental
• Laboratories
• Mobile
• …
Simulation codes
• Computational results
• Function memorization
• …
Databases
• Reference data
• Experimental data
• Computed properties
• Scientific literature
• …
Scientists
• Expert input
• Goal setting
• …
AI industry, academia
• New methods
• Open source codes
• AI accelerators
• …
Data, Models, Surrogates
Accelerators, Compute
Agile Infrastructure
Agile services
• Data transfer
• Registries
• Data sharing
• Containers
• Integrity
• Automation
• FaaS
• Identifiers
DLHub, xDF, funcX, Parsl, Transfer, Automate, Petrel, Auth, Sharing, Identifiers, CANDLE
DLHub: Organizing and Serving Models
• Collect, publish, categorize models
• Serve models via API with access
controls to simplify sharing,
consumption, and access
• Leverage ALCF resources and
prepare for Exascale ML
• Deploy and scale automatically
• Provide citable DOI for
reproducible science
Argonne Advanced Computing LDRD Cherukara et al.
Energy Storage Tomography
www.dlhub.org Models and Processing Logic as a Service
X-Ray Science
Ward et al. TomoGAN: Liu et al.
Input
Output
funcX: Think “compute endpoints”
Automation: Ripple Pipelines
Automation: Neuroanatomy
Web form: User input
Search: Ingest
Share: Set policy
Identifier: Mint DOI
Auth: Get credentials
Describe: Get metadata
Transfer: Transfer data
funcX: Run job
Transfer: Transfer data
funcX: Run job
Automate
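The neuroanatomy automation chains service actions (get credentials, transfer data, run job, get metadata, ingest, set policy, mint DOI) into one flow. The sketch below illustrates that chaining with a plain sequential runner. The step order, the function names, and the state dictionary are assumptions made for illustration; this is not the Globus Automate API, and the step bodies are stand-ins that only record what a real service call would do.

```python
from typing import Callable, Dict, List, Tuple

# A step is (service label, action label, function). All functions below are
# hypothetical stand-ins for calls to the corresponding service.
Step = Tuple[str, str, Callable[[Dict], Dict]]


def get_credentials(state: Dict) -> Dict:
    return {**state, "token": "hypothetical-token"}


def transfer_data(state: Dict) -> Dict:
    return {**state, "transfers": state.get("transfers", 0) + 1}


def run_job(state: Dict) -> Dict:
    return {**state, "jobs": state.get("jobs", 0) + 1}


def get_metadata(state: Dict) -> Dict:
    return {**state, "metadata": {"sample": state["sample"]}}


def ingest(state: Dict) -> Dict:
    return {**state, "indexed": True}


def set_policy(state: Dict) -> Dict:
    return {**state, "shared_with": ["collaborators"]}


def mint_doi(state: Dict) -> Dict:
    return {**state, "doi": "placeholder-doi"}


# Assumed ordering of the actions named on the slide.
NEURO_FLOW: List[Step] = [
    ("Auth", "Get credentials", get_credentials),
    ("Transfer", "Transfer data", transfer_data),
    ("funcX", "Run job", run_job),
    ("Transfer", "Transfer data", transfer_data),
    ("funcX", "Run job", run_job),
    ("Describe", "Get metadata", get_metadata),
    ("Search", "Ingest", ingest),
    ("Share", "Set policy", set_policy),
    ("Identifier", "Mint DOI", mint_doi),
]


def run_flow(steps: List[Step], state: Dict) -> Tuple[Dict, List[str]]:
    """Run each step in order, threading the state through, and keep a log."""
    log: List[str] = []
    for service, action, func in steps:
        state = func(state)
        log.append(f"{service}: {action}")
    return state, log
```

Calling `run_flow(NEURO_FLOW, {"sample": "scan-001"})` returns the final state and the ordered log of service actions. A production flow would add error handling and retries per step.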
Manage Protected Data
Higher assurance levels for HIPAA and other regulated data
• Support for managed data
transfer of protected data such
as health related information
• Share data with collaborators
while meeting compliance
requirements
• Administration and
management of access
• Includes BAA option
Globus for high assurance data management
• Restricted data handling
– PHI (Protected Health Information)
– PII (Personally identifiable information)
– Controlled Unclassified Information
• University of Chicago security controls
– NIST 800-53 Low
– Superset of 800-171 Low
• Business Associate Agreements (BAA) between
University of Chicago and our subscribers
Services in scope
• Globus Services: Auth, Transfer & Sharing, Groups
• Globus Connect Server v5.2 and above
• Globus Connect Personal v3.x
• Web app (app.globus.org)
• Globus Command Line Interface (CLI)
• Connectors: POSIX, Google Drive, AWS S3, CEPH
Restricted data disclosure to Globus
• Globus never sees file contents
– File contents can have restricted data
• File paths/name can have restricted data (e.g. PHI)
• No other elements (endpoint definitions, labels,
collection definitions) can contain restricted data
Product enhancements for high assurance
• Additional authentication assurance
– Authenticate with specific identity within specific time within a
session
• Isolation of applications
– Authentication context is per application, per session (~browser
session)
• Enforces encryption of all user data in transit
• Audit logging
– Both at the institution and Globus services
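The authentication assurance and application isolation rules above can be pictured as a single check: access is granted only if the required identity authenticated recently enough, in the same application and the same session. The sketch below is an illustration under that reading. The `Authentication` record, its field names, and `is_access_allowed` are hypothetical and do not reflect how Globus Auth represents sessions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Authentication:
    """One login event (hypothetical record, for illustration only)."""
    identity: str            # identity used to log in
    application: str         # application the login happened in
    session_id: str          # session (roughly a browser session)
    authenticated_at: datetime


def is_access_allowed(
    auths: List[Authentication],
    required_identity: str,
    application: str,
    session_id: str,
    max_age: timedelta,
    now: datetime,
) -> bool:
    """Return True only if the required identity authenticated within
    max_age, in this application and this session."""
    for auth in auths:
        age = now - auth.authenticated_at
        if (
            auth.identity == required_identity
            and auth.application == application
            and auth.session_id == session_id
            and timedelta(0) <= age <= max_age
        ):
            return True
    return False
```

Because the check matches on application and session as well as identity, a recent login in one application does not satisfy the policy for another, which is the isolation property the slide describes.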
Product enhancements for high assurance
• Additional security requirements enforced on
management of all high assurance resources
– Data access, and any interaction that can lead to data access
– Examples: Groups, Management Console
• Enhanced user interfaces for seamless management
of protected data
– Webapp and CLI
Operational enhancements for high assurance
• Intrusion detection and prevention
• Encryption
• Enhanced logging
• Secure remote access, access control, and secure
practices for laptops
• Uniform configuration management and change control
• AWS best practices for secure environment: VPCs,
security groups, IAM best practices
New subscription levels
• High Assurance
– 33% uplift on Standard subscription
and on premium connectors used for
high assurance data
• BAA
– All High Assurance features + BAA
with University of Chicago
– 50% uplift on Standard subscription
and on premium connectors used
under a BAA
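As a worked example of the uplift arithmetic, the sketch below applies the 33% and 50% figures to a Standard fee plus the premium connectors covered by the high assurance or BAA terms. The fee amounts in the usage note are made-up numbers, and the function and parameter names are assumptions for illustration; actual pricing is not stated in these slides.

```python
# Uplift percentages taken from the slide; kept as integers so the
# arithmetic below stays exact for whole-number fees.
UPLIFT_PERCENT = {"standard": 0, "high_assurance": 33, "baa": 50}


def subscription_total(
    standard_fee: float,
    covered_connector_fees: float,
    level: str,
    other_connector_fees: float = 0,
) -> float:
    """Apply the uplift to the Standard fee and to the connectors used for
    high assurance (or BAA) data; other connectors are added unchanged."""
    percent = UPLIFT_PERCENT[level]
    uplifted_base = standard_fee + covered_connector_fees
    return uplifted_base * (100 + percent) / 100 + other_connector_fees
```

With a hypothetical Standard fee of 10,000 and 2,000 in covered connectors, `subscription_total(10000, 2000, "high_assurance")` gives 15,960 and `subscription_total(10000, 2000, "baa")` gives 18,000.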
High Assurance
Demonstration
Web app enhancements
• Accessibility
– Target WCAG 2.0 AA compliance
• Responsiveness and touch
• Works with new connectors
collections.globus.org
Web app enhancements
• Customizable interface
• Full screen view
• Compact file listing
display
• Remember user
configuration
– Single vs. dual panel
– Columns displayed
• Continue incorporating
user feedback
CLI enhancements
• Support for use with high assurance collections
• '--format UNIX' flag - output suitable for line-oriented
processing with typical Unix tools
• 'globus rm' command
• 'globus whoami --linked-identities' flag to show all linked
identities
• '--timeout-exit-code' flag overrides the default exit code
for commands which wait on tasks
• Enhancements to SDK as needed.
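For scripting, the flags listed above can be composed into argument lists and handed to `subprocess.run`. The sketch below only builds the command lines and does not execute anything. The helper names are hypothetical, and the `task wait` subcommand, the `ls` subcommand, and the `ENDPOINT_ID:/path` argument form are assumptions about the CLI beyond what the slide states, so check them against `globus --help` before use.

```python
from typing import List, Optional


def globus_cmd(*args: str, unix_format: bool = False) -> List[str]:
    """Build a `globus` command line, optionally requesting line-oriented
    output via the '--format UNIX' flag named on the slide."""
    cmd = ["globus", *args]
    if unix_format:
        cmd += ["--format", "UNIX"]
    return cmd


def wait_cmd(task_id: str, timeout_exit_code: Optional[int] = None) -> List[str]:
    """Build a command that waits on a task, optionally overriding the exit
    code returned on timeout (assumes a `globus task wait` subcommand)."""
    cmd = ["globus", "task", "wait", task_id]
    if timeout_exit_code is not None:
        cmd += ["--timeout-exit-code", str(timeout_exit_code)]
    return cmd


# Example command lines; placeholders stand in for real IDs.
WHOAMI = globus_cmd("whoami", "--linked-identities")
LISTING = globus_cmd("ls", "ENDPOINT_ID:/data", unix_format=True)
WAIT = wait_cmd("TASK_ID", timeout_exit_code=0)
```

Each list can be passed as-is to `subprocess.run(cmd, capture_output=True, text=True)`, and the UNIX-format output can then be split line by line with ordinary string handling.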
Connector updates
• Enhanced user experience for credential handling for
several connectors (GCSv5)
• AWS S3
– Automated multi-region support
• Google Drive
– Enhancement to retry handling for large transfers
• HPSS
– Support added for HPSS 7.5 (7.3 to 7.5 supported)
– Improved asynchronous staging from tape
– New home for documentation: docs.globus.org/premium-storage-connectors/hpss
S3 compatible systems
• Initial customer
deployments
• Validation, testing and
vendor engagement planned
• Additional systems driven
by customer demand
Announcing our latest
connector…
beta
globus.org/connectors/box
Globus for Box
• Extends the value of your Box deployment
• Unifies access to cloud and on-prem storage
• Transitions protected data (HIPAA-regulated,
CUI) seamlessly between Box and other storage
systems
Box for Globus
Demonstration
Make Box part of your
research storage ecosystem
globus.org/connectors/box
docs.globus.org/premium-storage-connectors/box
Globus Connect Server v5.3
• Subsumes GCS versions 5.0, 5.1, 5.2
• Standard and high assurance guest collections (sharing)
• High assurance mapped collections
• Connectors: POSIX, AWS S3, CEPH, Google Drive, Box
• Data access protocols: GridFTP and HTTPS
• Single deployment supports both high assurance and
standard gateways
• Upgrade all v5.x deployments to v5.3
Recent Transfer enhancements
• Verify transfer using client provided checksums
– User provided checksum used rather than source checksum for
verification
• Improvements for scaling transfer service
– Multiple nodes for transfer service for higher availability and
reliability
– Allows for code updates with no downtime
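Verification against a client-provided checksum means the file at the destination is hashed and compared with a value the user supplied up front, rather than with a checksum computed at the source. The sketch below shows that comparison with Python's standard `hashlib`. The function names and the SHA-256 default are assumptions for illustration; the slide does not say which algorithm the Transfer service uses.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_with_client_checksum(
    destination_path: str, expected_checksum: str, algorithm: str = "sha256"
) -> bool:
    """Compare the destination file's checksum with the value supplied by the
    client, instead of a checksum computed from the source file."""
    return file_checksum(destination_path, algorithm) == expected_checksum.lower()
```

A client that already knows the expected digest, for example from a published manifest, can detect corruption that happened before the transfer started, which a source-side checksum would not catch.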
SSH with OAuth
• Securely access resources using SSH with federated identity
– Facilitates automation, eliminates SSH key management
– Replacement for deprecated GSI OpenSSH
• First version released
– Server side PAM module with Globus Auth support
– Command line client
• Open source, community support
– Not part of the standard subscription
– OAuth SSH Client: https://pypi.org/project/oauth-ssh/
– OAuth SSH Server PAM module: https://github.com/xsede/oauth-ssh
Where are we headed?
Enhancing the core:
Transfer
Building the future:
Platform
Globus Transfer: A complete solution
☑ Bulk transfer and sync
☑ Good end-to-end performance in a myriad of real-world settings
☑ End-to-end reliability
☑ Robust security, with federated identities
☑ Layers onto diverse storage systems
☑ Web-compatible client/server remote access
☑ Easy to use interfaces
☑ Easy installation and administration
☑ Sharing data with guest users
☑ Dedicated professional support
HTTPS and what it enables
• Browser based up/download
• Allow your
(research) storage
to be “on the web”
• Enforce same security
policies
Globus Connect Server v5 Milestones
v5.0: Google Drive
v5.1: POSIX guest collections, HTTPS
v5.2: High assurance
v5.3
v5.4: …
v5.x: v4 feature parity+
• Multi DTN support
• Additional storage systems
• Endpoint specific identity providers
• …
Other features
GCSv5: Key enabling technology for the future
• Challenge: Managing an increasing amount of shared, dynamic state among multiple
DTNs
– Endpoint configuration
– Multiple storage gateway configurations
– Collection configurations
– Credentials (user and system)
• Approach: Stateless DTNs
– No persistent state on DTN
– Multi-DTN endpoints without a shared file system
• GCS state stored in the cloud
– Dynamic sync of state to each DTN
– Enabled by our use of AWS AppSync
• Customer managed encryption keys with optional escrow
– Only you can see and modify your endpoint’s state
• Facilitates creation of new Globus Connect features
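The stateless-DTN approach can be illustrated with a versioned central store that each node copies from, so a new or rebuilt node needs nothing but access to the store. The sketch below is a toy model of that idea. The class and field names are hypothetical, encryption with customer-managed keys is deliberately left out, and nothing here reflects how GCS or AWS AppSync implement synchronization.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CloudStateStore:
    """Central copy of endpoint state (hypothetical stand-in for cloud storage)."""
    version: int = 0
    config: Dict[str, str] = field(default_factory=dict)

    def update(self, key: str, value: str) -> None:
        """Change one setting and bump the version so nodes know to re-sync."""
        self.config[key] = value
        self.version += 1


@dataclass
class StatelessDTN:
    """A data transfer node that holds only a cached copy of the state."""
    name: str
    version: int = -1
    config: Dict[str, str] = field(default_factory=dict)

    def sync(self, store: CloudStateStore) -> bool:
        """Copy state from the store if it changed; return True if updated."""
        if store.version == self.version:
            return False
        self.config = dict(store.config)
        self.version = store.version
        return True
```

In this model, adding a DTN or recovering a failed one is the same operation: create a fresh node and call `sync`, with no configuration files copied between machines.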
GCSv5 has significant admin benefits
• Greatly simplified multi-DTN deployment
– Bootstrap DTN from only client id & secret, and encryption key
– No more copy-pasting GCS config files with every change
– Command line, REST API, and (eventually) web admin of GCS
– Automatic synchronization amongst DTNs
• Rapid recovery from failures
– Restore all nodes from stored state with minimal effort
– No local backups of GCS state required
• Lost client ID/secret? Recover them from Auth.
• Enables us to roll out new features more quickly
What does it mean for you?
• No sudden moves!
• Ready for GCS v4 to v5 migration late this year
• Tools will be available for migration from GCS v4
• Comprehensive documentation
• Long migration period with parallel support of v5 & v4
• Only use GCS v5 today if you need its specific
features, otherwise continue to use GCS v4
Planned Features for Globus Transfer
• S3 compatible HTTPS interface to GCSv5 storage
• Browser based up/downloaders
• Multiple checksum algorithm support
• Manifest support
• Automated recurring replication as a service
• …
Rethinking data publication
• Limited adoption
– Not easily customizable
• Maintenance Challenges
– Costly to maintain
– JRE licensing concerns
• Going forward
– Code will be open source
– Leverage platform
• Invest in higher priorities
Platform challenge
• Transform how research applications, services, and
workflows are created, delivered, used, and sustained
– Scientific instrument data processing
– Repositories: Make data more FAIR
– Science gateways
• Interoperable ecosystem
Globus platform services
• Identity and Access Management (IAM)
– Federated identity login, Groups, Attributes, Access Control
– Auth: OAuth authorization provider
• Connect
• Transfer
– Will become a family of services
• Execution
• Search, Identifiers
• Automation
– Queues, Events, Actions, Triggers
– Flows
Globus Platform: Automation
Platform status
• Generally Available in a few years
• Separate product with separate sustainability model
• Early engagements help shape product direction
– Argonne Leadership Computing Facility, Materials Data Facility,
– NCAR Research Data Archive, NSO, …
– Use in Globus products
• Multiple integrations facilitate more complete solution
– e.g. Django, JupyterHub
– Follow progress: globus-integration-examples.readthedocs.io
• Currently accessible via professional services team
We are committed to doing
all this sustainably
Our focus is you, the
research community
Why not do a for-profit?
Focus: Investor ROI
→ can’t serve you properly!
Sustainability >> $$
No single points of failure
Subscriber Value =
Engineering (DevOps)
+
Customer facing operations
(support, sales, outreach, training,
professional services)
Freemium means
managing tension!
Meeting current
customer needs…
…and furthering
strategic aspirations
Customer community
Delivering on requests
Product planning process
Contractual challenges
Is there a better model?
Internet2-like membership?
Network infrastructure
services provider
Research software
provider
Member fee ≈ sustainability
Governance model ≈
product influence
Do the dynamics change?
- Willingness to join/pay?
- Sufficient revenue growth?
- Greater subscriber satisfaction?
Why now?
Increasing view of Globus
as “enterprise” service
RCC → CIO
Data management needs are
increasingly pervasive
✓ Network
✓ Cycles
✓ Storage
Robust data management for all?
Expand the dialogue
HPC Management
+ IT Leadership
+ Researcher Community
From “Purchase” to “Invest”
Everyone derives more value if
Globus is a strategic partner
Intrigued?
Confused?
Amused?
Share your thoughts with us!
Thank you to our sponsors...
U.S. Department of Energy
THANK YOU, subscribers!
Program Preview
• Today
– Lightning talks
– Guest keynotes: Tom Barton, Bobby Kasthuri
– Reception
• Tomorrow
– Tutorials
– Office Hours
• Friday morning
– Customer forum
globusworld.org/conf/program
#globusworld
@globus

IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

GlobusWorld 2019 Opening Keynote

  • 1. Manage Data with Assurance Ian Foster Rachana Ananthakrishnan Steve Tuecke Vas Vasiliadis
  • 2. Mission Increase the efficiency and effectiveness of researchers engaged in data-driven science and scholarship through sustainable software
  • 4. Globus by the numbers... 7,400 active shared endpoints; 100+ subscribers; 600 PB moved; 22,000 active personal endpoints; 90 billion files processed; 1,800 active server endpoints; 3 months longest running transfer; 1 PB largest single transfer to date; 99.9% availability; 600+ identity providers; 2000+ most shared endpoints at a single institution; 138,000 registered users
  • 5. [Chart: Active Endpoints by Month, Jan-14 to Jan-19; series: Free, Subscribed; vertical axis 0 to 4500]
  • 6.
  • 7. Globus User Story Highlights File Sharing Value Improved Performance Ease of Use Connector Benefits “We needed an easy way to share terabytes of data on a regular basis with dozens of researchers. Thanks to Globus sharing, it’s easy for us to get our researchers the data they need.” Platform Development “Now Canadian researchers have a single repository where data can easily and securely be accessed, searched and shared.” “With Globus, our researchers have one less thing to worry about!” “I routinely have to move hundreds of gigabytes of data – Globus makes it easy, so I can execute these transfers with very little effort.” “Users can quickly, effectively, and securely share data with their research community or the broader public.” “WVU uses Globus to archive research data out to Google Drive.” “[BlackPearl with Globus] enables us to archive and share petabytes of information in a convenient solution.” Usage Briefs: www.globus.org/usage-brief-library User Stories: www.globus.org/user-stories What makes it all worthwhile
  • 8. “Whatever you are studying right now, if you are not getting up to speed on deep learning, neural networks, etc., you lose. We are going through the process where software will automate software, automation will automate automation.” -- Mark Cuban
  • 9.
  • 10. Opportunities for AI in science: Research today. Activities: Configure apparatus/write code; Run experiments; Solve societal problems; Create knowledge; Analyze and plan. Annotations: What scientists want to do; Most scientist time
  • 11. Opportunities for AI in science: Research tomorrow. Activities: Run experiments; Create knowledge; Analyze and plan; Solve societal problems; Configure apparatus/write code. Annotations: Most scientist time; AI assistants
  • 12. AI at Argonne: data-driven discovery. Strong and weak lensing in sky survey data; Prediction of antimicrobial resistance phenotypes; Prediction of radiation stopping power; Identification and tracking of storms; Parameter extraction in atom probe tomography; Learning for dynamic sampling in spectroscopy; Structure-property-process triangle in additive manufacturing; Vehicle energy consumption prediction; Photometric red shift estimation; New materials for efficient solar cells; Cosmic Microwave Background emulation; Enhancement of noisy tomographic images; Nowcasting with convolutional LSTMs; Efficient climate model emulators; Defect-level prediction in semiconductors; Flying object detector for edge deployment; Discovery of new energy storage materials; Reduced order modeling of laser sintering
  • 13. Rethinking Data infrastructure for Science AI. AI workflow steps: Model creation; Data ingest; Inference; HPO; Data enhancement; Data QA/QC; Feature selection; Model training; UQ; Model reduction; Active/reinforcement learning. Scientific instruments: Major user facilities, Laboratory equipment, Automated labs, … Sensors: Environmental, Laboratories, Mobile, … Simulation codes: Computational results, Function memorization, … Databases: Reference data, Experimental data, Computed properties, Scientific literature, … Scientists: Expert input, Goal setting, … AI industry, academia: New methods, Open source codes, AI accelerators, … Other diagram labels: AI Workflows; Data; Models; Accelerators; Compute; Agile Infrastructure; Surrogates
  • 14. Rethinking Data infrastructure for Science AI (same diagram as slide 13). Adds Agile services: Data transfer; Registries; Data sharing; Containers; Integrity; Automation; FaaS; Identifiers
  • 15. Rethinking Data infrastructure for Science AI (same diagram as slide 14). Adds: Transfer; Auth; Sharing
  • 16. Rethinking Data infrastructure for Science AI (same diagram as slide 15). Adds: funcX; Automate; Identifiers
  • 17. Rethinking Data infrastructure for Science AI (same diagram as slide 16). Adds: DLHub; xDF; Parsl; Petrel; CANDLE
  • 18. DLHub: Organizing and Serving Models • Collect, publish, categorize models • Serve models via API with access controls to simplify sharing, consumption, and access • Leverage ALCF resources and prepare for Exascale ML • Deploy and scale automatically • Provide citable DOI for reproducible science Argonne Advanced Computing LDRD Cherukara et al. Energy Storage Tomography www.dlhub.org Models and Processing Logic as a Service X-Ray Science Ward et al. TomoGAN: Liu et al. Input Output
  • 19. funcX: Think “compute endpoints”
  • 20. funcX: Think “compute endpoints”
  • 22. Automation: Neuroanatomy Web form User input Search Ingest Share Set policy Identifier Mint DOI funcX Auth Get credentials Automate Run job Describe Get metadata Transfer Transfer data funcX Run job Transfer Transfer data
  • 23.
  • 24.
  • 25. Manage Protected Data: Higher assurance levels for HIPAA and other regulated data • Support for managed data transfer of protected data such as health related information • Share data with collaborators while meeting compliance requirements • Administration and management of access • Includes BAA option
  • 26. Globus for high assurance data management • Restricted data handling – PHI (Protected Health Information) – PII (Personally identifiable information) – Controlled Unclassified Information • University of Chicago security controls – NIST 800-53 Low – Superset of 800-171 Low • Business Associate Agreements (BAA) between University of Chicago and our subscribers
  • 27. Services in scope • Globus Services: Auth, Transfer & Sharing, Groups • Globus Connect Server v5.2 and above • Globus Connect Personal v3.x • Web app (app.globus.org) • Globus Command Line Interface (CLI) • Connectors: POSIX, Google Drive, AWS S3, CEPH
  • 28. Restricted data disclosure to Globus • Globus never sees file contents – File contents can have restricted data • File paths/name can have restricted data (e.g. PHI) • No other elements (endpoint definitions, labels, collection definitions) can contain restricted data
  • 29. Product enhancements for high assurance • Additional authentication assurance – Authenticate with specific identity within specific time within a session • Isolation of applications – Authentication context is per application, per session (~browser session) • Enforces encryption of all user data in transit • Audit logging – Both at the institution and Globus services
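The session-bound authentication rule on slide 29 (a specific identity must have authenticated within a specific time, per application session) can be sketched as a small policy check. This is only an illustration and not how Globus Auth is implemented; the `Session` record, its field names, the example identity, and the timeout values are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Session:
    """Hypothetical per-application session record (not a Globus type)."""
    app_id: str
    identity: str
    authenticated_at: datetime


def meets_assurance(session, app_id, required_identity, max_age, now):
    """Return True only if the required identity authenticated recently
    enough within this application's own session."""
    if session.app_id != app_id:
        return False  # authentication context is not shared across applications
    if session.identity != required_identity:
        return False  # authenticating as some other identity does not count
    return now - session.authenticated_at <= max_age


# Fixed timestamps keep the example deterministic.
now = datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)
session = Session("webapp", "alice@example.edu", now - timedelta(minutes=10))

fresh = meets_assurance(session, "webapp", "alice@example.edu", timedelta(minutes=30), now)
stale = meets_assurance(session, "webapp", "alice@example.edu", timedelta(minutes=5), now)
other_app = meets_assurance(session, "cli", "alice@example.edu", timedelta(minutes=30), now)
print(fresh, stale, other_app)
```

When the check fails, a real service would presumably send the user back through login with the required identity; that step is outside this sketch.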
  • 30. Product enhancements for high assurance • Additional security requirements enforced on management of all high assurance resources – Data access, and any interaction that can lead to data access – Examples: Groups, Management Console • Enhanced user interfaces for seamless management of protected data – Webapp and CLI
  • 31. Operational enhancements for high assurance • Intrusion detection and prevention • Encryption • Enhanced logging • Secure remote access, access control, and secure practices for laptops • Uniform configuration management and change control • AWS best practices for secure environment: VPCs, security groups, IAM best practices
  • 32. New subscription levels • High Assurance – 33% uplift on Standard subscription and on premium connectors used for high assurance data • BAA – All High Assurance features + BAA with University of Chicago – 50% uplift on Standard subscription and on premium connectors used under a BAA
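The uplift arithmetic on slide 32 can be worked through with a short calculation. The 33% and 50% rates come from the slide; the fee amounts below are hypothetical placeholders, not Globus prices, and the function name is invented for the example.

```python
from decimal import Decimal

# Uplift rates from the slide; the "standard" entry is the no-uplift baseline.
UPLIFT = {
    "standard": Decimal("0"),
    "high_assurance": Decimal("0.33"),
    "baa": Decimal("0.50"),
}


def subscription_cost(standard_fee, connector_fees, level):
    """Apply the level's uplift to the Standard fee plus the premium
    connectors used for high assurance (or BAA-covered) data."""
    base = standard_fee + sum(connector_fees, Decimal("0"))
    return base * (1 + UPLIFT[level])


# Hypothetical fees, chosen only to make the arithmetic easy to follow.
standard_fee = Decimal("10000")
connector_fees = [Decimal("2000")]

high_assurance_total = subscription_cost(standard_fee, connector_fees, "high_assurance")
baa_total = subscription_cost(standard_fee, connector_fees, "baa")
print(high_assurance_total, baa_total)
```

`Decimal` is used rather than floats so the percentage arithmetic stays exact.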
  • 34. Web app enhancements • Accessibility – Target WCAG 2.0 AA compliance • Responsiveness and touch • Works with new connectors collections.globus.org
  • 35. Web app enhancements • Customizable interface • Full screen view • Compact file listing display • Remember user configuration – Single vs. dual panel – Columns displayed • Continue incorporating user feedback
  • 36. CLI enhancements • Support for use with high assurance collections • '--format UNIX' flag - output suitable for line-oriented processing with typical Unix tools • 'globus rm' command • 'globus whoami --linked-identities' flag to show all linked identities • '--timeout-exit-code' flag overrides the default exit code for commands which wait on tasks • Enhancements to SDK as needed
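As a rough illustration of the CLI additions listed on slide 36, the sketch below assembles example command lines as argument lists without running them. The flags are the ones named on the slide. The endpoint ID, task ID, and paths are placeholders, and pairing `--format UNIX` with `globus ls` and `--timeout-exit-code` with `globus task wait --timeout` is an assumption about typical usage.

```python
import shlex

ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real endpoint
TASK_ID = "11111111-1111-1111-1111-111111111111"  # placeholder, not a real task

commands = [
    # Line-oriented output suitable for grep/awk/cut.
    ["globus", "ls", "--format", "UNIX", f"{ENDPOINT_ID}:/data/"],
    # Delete a file on an endpoint (hypothetical path).
    ["globus", "rm", f"{ENDPOINT_ID}:/data/old-results.dat"],
    # Show every identity linked to the logged-in account.
    ["globus", "whoami", "--linked-identities"],
    # Wait up to 60 seconds; exit 0 instead of the default code on timeout.
    ["globus", "task", "wait", TASK_ID, "--timeout", "60", "--timeout-exit-code", "0"],
]

# Render each argument list as a shell-safe command line.
rendered = [shlex.join(cmd) for cmd in commands]
for line in rendered:
    print(line)
```

Each list could be passed to `subprocess.run` on a machine where the Globus CLI is installed and logged in.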
  • 37.
  • 38. Connector updates • Enhanced user experience for credential handling for several connectors (GCSv5) • AWS S3 – Automated multi-region support • Google Drive – Enhancement to retry handling for large transfers • HPSS – Support added for HPSS 7.5 (7.3 to 7.5 supported) – Improved asynchronous staging from tape – New home for documentation: docs.globus.org/premium-storage-connectors/hpss
  • 39. S3 compatible systems • Initial customer deployments • Validation, testing and vendor engagement planned • Additional systems driven by customer demand
  • 41. Globus for Box • Extends the value of your Box deployment • Unifies access to cloud and on-prem storage • Transitions protected data (HIPAA-regulated, CUI) seamlessly between Box and other storage systems
  • 43. Make Box part of your research storage ecosystem globus.org/connectors/box docs.globus.org/premium-storage-connectors/box
  • 44.
  • 45. Globus Connect Server v5.3 • Subsumes GCS versions 5.0, 5.1, 5.2 • Standard and high assurance guest collections (sharing) • High assurance mapped collections • Connectors: POSIX, AWS S3, CEPH, Google Drive, Box • Data access protocols: GridFTP and HTTPS • A single deployment supports both high assurance and standard gateways • Upgrade all v5.x deployments to v5.3
  • 46. Recent Transfer enhancements • Verify transfer using client provided checksums – User provided checksum used rather than source checksum for verification • Improvements for scaling transfer service – Multiple nodes for transfer service for higher availability and reliability – Allows for code updates with no downtime
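The idea of verifying a transfer against a client-provided checksum, rather than one computed at the source, can be sketched as follows. This is an illustration of the general technique and not the Globus Transfer implementation; SHA-256 is an assumed algorithm (the slide does not name one) and the function names are invented for the example.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_client_checksum(dest_path, expected_hex, algorithm="sha256"):
    """Compare the destination file's checksum with the value the client
    supplied when submitting the transfer."""
    return file_checksum(dest_path, algorithm) == expected_hex.strip().lower()


# Stand-in for a transferred file and the checksum the client recorded earlier.
payload = b"research data"
client_checksum = hashlib.sha256(payload).hexdigest()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    dest_path = tmp.name

try:
    ok = verify_against_client_checksum(dest_path, client_checksum)
    bad = verify_against_client_checksum(dest_path, "0" * 64)
finally:
    os.remove(dest_path)

print(ok, bad)
```

Checking against a value the client already holds also catches corruption that occurred at the source before the transfer began, which a source-side checksum would not reveal.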
  • 47. SSH with OAuth • Securely access resources using SSH with federated identity – Facilitates automation, eliminates SSH key management – Replacement for deprecated GSI OpenSSH • First version released – Server side PAM module with Globus Auth support – Command line client • Open source, community support – Not part of the standard subscription – OAuth SSH Client: https://pypi.org/project/oauth-ssh/ – OAuth SSH Server PAM module: https://github.com/xsede/oauth-ssh
  • 48. Where are we headed?
  • 50. Globus Transfer: A complete solution ☑ Bulk transfer and sync ☑ Good end-to-end performance in a myriad of real-world settings ☑ End-to-end reliability ☑ Robust security, with federated identities ☑ Layers onto diverse storage systems ☑ Web-compatible client/server remote access ☑ Easy to use interfaces ☑ Easy installation and administration ☑ Sharing data with guest users ☑ Dedicated professional support
  • 51. HTTPS and what it enables • Browser based up/download • Allow your (research) storage to be “on the web” • Enforce same security policies
  • 52. Globus Connect Server v5 Milestones v5.0: Google Drive v5.1: POSIX guest collections, HTTPS v5.x: v4 feature parity+ v5.3 • Multi DTN support • Additional storage systems • Endpoint specific identity providers • … Other features v5.2: High assurance v5.4: …
  • 53.
  • 54. GCSv5: Key enabling technology for the future • Challenge: Managing increasing amount of shared, dynamic state among multiple DTNs – Endpoint configuration – Multiple storage gateway configurations – Collection configurations – Credentials (user and system) • Approach: Stateless DTNs – No persistent state on DTN – Multi-DTN endpoints without a shared file system • GCS state stored in the cloud – Dynamic sync of state to each DTN – Enabled by our use of AWS AppSync • Customer managed encryption keys with optional escrow – Only you can see and modify your endpoint’s state • Facilitates creation of new Globus Connect features
  • 55. GCSv5 has significant admin benefits • Greatly simplified multi-DTN deployment – Bootstrap DTN from only client id & secret, and encryption key – No more copy-pasting GCS config files with every change – Command line, REST API, and (eventually) web admin of GCS – Automatic synchronization amongst DTNs • Rapid recovery from failures – Restore all nodes from stored state with minimal effort – No local backups of GCS state required • Lost client ID/secret? Recover them from Auth. • Enables us to roll out new features more quickly
  • 56. What does it mean for you? • No sudden moves! • Ready for GCS v4 to v5 migration late this year • Tools will be available for migration from GCS v4 • Comprehensive documentation • Long migration period with parallel support of v5 & v4 • Only use GCS v5 today if you need its specific features, otherwise continue to use GCS v4
  • 57. Planned Features for Globus Transfer • S3 compatible HTTPS interface to GCSv5 storage • Browser based up/downloaders • Multiple checksum algorithm support • Manifest support • Automated recurring replication as a service • …
  • 58. Rethinking data publication • Limited adoption – Not easily customizable • Maintenance Challenges – Costly to maintain – JRE licensing concerns • Going forward – Code will be open source – Leverage platform • Invest in higher priorities
  • 59. Platform challenge • Transform how research applications, services, and workflows are created, delivered, used, and sustained – Scientific instrument data processing – Repositories: Make data more FAIR – Science gateways • Interoperable ecosystem
  • 60. Globus platform services • Identity and Access Management (IAM) – Federated identity login, Groups, Attributes, Access Control – Auth: OAuth authorization provider • Connect • Transfer – Will become a family of services • Execution • Search, Identifiers • Automation – Queues, Events, Actions, Triggers – Flows
  • 62. Platform status • Generally Available in a few years • Separate product with separate sustainability model • Early engagements help shape product direction – Argonne Leadership Computing Facility, Materials Data Facility, NCAR Research Data Archive, NSO, … – Use in Globus products • Multiple integrations facilitate more complete solution – e.g. Django, JupyterHub – Follow progress: globus-integration-examples.readthedocs.io • Currently accessible via professional services team
  • 63. We are committed to doing all this sustainably
  • 64. Our focus is: You, the research community
  • 65. Why not do a for-profit? Focus: Investor ROI → can’t serve you properly!
  • 66. Sustainability >> $$ No single points of failure
  • 67. Subscriber Value = Engineering (DevOps) + Customer facing operations (support, sales, outreach, training, professional services)
  • 68. Freemium means managing tension! Meeting current customer needs…
  • 70. Customer community Delivering on requests Product planning process Contractual challenges
  • 71. Is there a better model? Internet2-like membership?
  • 73. Member fee ≈ sustainability Governance model ≈ product influence
  • 74. Do the dynamics change? - Willingness to join/pay? - Sufficient revenue growth? - Greater subscriber satisfaction?
  • 75. Why now? Increasing view of Globus as “enterprise” service RCC → CIO
  • 76. Data management needs are increasingly pervasive ✓ Network ✓ Cycles ✓ Storage Robust data management for all?
  • 77. Expand the dialogue HPC Management + IT Leadership + Researcher Community
  • 78. From “Purchase” to “Invest” Everyone derives more value if Globus is a strategic partner
  • 80. Thank you to our sponsors... U.S. Department of Energy
  • 82. Program Preview • Today – Lightning talks – Guest keynotes: Tom Barton, Bobby Kasthuri – Reception • Tomorrow – Tutorials – Office Hours • Friday morning – Customer forum globusworld.org/conf/program