2. Our mission is to…
increase the efficiency and effectiveness of
researchers engaged in data-driven
science and scholarship
through sustainable software
6. How are we doing so far?
• 127,000 registered users
• 100+ subscribers
• 21,000 active personal endpoints
• 8,300 active shared endpoints
• 1,800 active server endpoints
• 559 identity providers
• 530 PB moved
• 83 billion files processed
• 99.9% availability
• 3 months: longest-running transfer
• 1 PB: largest single transfer to date
• 1,923: most shared endpoints at a single institution
14. Ad hoc data sharing
• Individual users share data with collaborators
• Using a known email or identity for the user/group
• Make data publicly available
[Diagram: Compute Facility → Share → Personal Computer]
1. Researcher selects files to share, selects a user or group, and sets access permissions
2. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
3. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
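The sharing model above can be sketched as a small in-memory access list. This is a toy illustration with hypothetical names (`SharedFolder`, `grant`), not the actual Globus API:

```python
from dataclasses import dataclass, field

@dataclass
class SharedFolder:
    """Toy model of a shared endpoint's access list (not the Globus API)."""
    acl: dict = field(default_factory=dict)  # identity or group -> "r" / "rw"

    def grant(self, identity, perm="r"):
        # Step 1: researcher grants a user, a group, or "public" access
        self.acl[identity] = perm

    def can_read(self, identity):
        # Step 3: access is checked against the ACL; files are served in place
        return "public" in self.acl or identity in self.acl

folder = SharedFolder()
folder.grant("collaborator@example.edu", "r")
```

The point of the model is that nothing moves at share time; only the access list changes.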
15. Data from instrument facility
• Provide near-real-time access to data
• Automated permissions based on site policy
• Self-managed by the PI
• Federated login to access data
[Diagram: raw data store with cohort folders (--/cohort045, --/cohort096, --/cohort127); a local policy store drives permissions; data flows to personal computers and remote visualization/analysis]
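Policy-driven permissions for the cohort folders might look like the following sketch, where a hypothetical policy store maps each folder prefix to the groups allowed to read it (the folder names are from the diagram; the group names are invented):

```python
# Hypothetical policy store: folder prefix -> groups allowed to read it.
POLICY = {
    "/cohort045": {"team-a"},
    "/cohort096": {"team-a", "team-b"},
    "/cohort127": {"team-b"},
}

def allowed(path, user_groups):
    """Grant read access when the user shares a group with the folder's policy."""
    for prefix, readers in POLICY.items():
        if path.startswith(prefix):
            return bool(readers & set(user_groups))
    return False  # no policy entry means no access
```

A site could regenerate such a table automatically as cohorts are created, which is what "automated permissions based on site policy" implies.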
16. Data from provider/archive
• Portal/science gateway used to distribute data
• Interface for searching, gathering data of interest
• Async or HTTPS transfer to user’s system
• Fine-grained authorization enforced
[Diagram: user searches and requests data of interest; data is transferred to the user’s destination]
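Fine-grained authorization in such a portal can be sketched as a per-record check at search time. The catalog and identities below are invented for illustration:

```python
# Hypothetical catalog: each record lists the identities allowed to fetch it.
CATALOG = [
    {"id": "ds-001", "readers": {"alice", "bob"}},
    {"id": "ds-002", "readers": {"bob"}},
]

def search(user):
    """Return only the records the requesting user is authorized to download."""
    return [rec["id"] for rec in CATALOG if user in rec["readers"]]
```

Filtering at search time means the user never sees, let alone requests, data they cannot access; the same check is enforced again when the transfer is submitted.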
17. Core center data processing
• Allow user to securely upload data for analysis
• Make analysis results available to user
• Automate setup and teardown of folders and permissions
[Diagram: Analysis System with per-job folders, e.g. --/123/input (rw) and --/123/output (r)]
37. Balance: performance vs. reliability
• Network use parameters: concurrency, parallelism
• A transfer considers both source and destination settings:
  effective = min( max(preferred src, preferred dest), max src, max dest )
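The rule above can be written directly as a small helper (names are illustrative): take the higher of the two endpoints' preferred values, then cap it by both endpoints' hard maxima.

```python
def effective_setting(preferred_src, preferred_dest, max_src, max_dest):
    """Higher of the two preferences, capped by both endpoints' hard limits."""
    return min(max(preferred_src, preferred_dest), max_src, max_dest)

# e.g. source prefers 4 streams, destination prefers 8,
# but the source endpoint caps at 6: the transfer runs with 6
```

The same computation applies to each network use parameter (concurrency, parallelism) independently.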
54. Summary: not just file transfer
• Streamlined data collaboration (file sharing)
• Unified system access (connectors)
• Secure, browser-based access to research storage (HTTPS)
• Simplified access to protected data (HA endpoints)
• Sysadmin visibility/control (console, usage reporting)
• Optimized performance (network use parameters)
• App/portal/gateway integration support (REST APIs)
• Seamless user experience (branded web site, alternate IdP)
• World-class support
Sustainable = thriving, not just surviving
ASK THE AUDIENCE!!!
Data management is now considered part of core IT infrastructure at most institutions
=> Expectation of production-grade services among end users -- and hence providers
It’s not just about covering operating costs
People are much more likely to invest in software if they see it will continue to evolve to meet their emerging needs
…we also need to push “big” new ideas, because current approaches have increasingly short lifespans
…as innovative research approaches emerge, we need more innovative tools to manage them
Integrate data management/automation capabilities into scientific web apps, portals, gateways, etc.
Use existing institutional IdPs, secure web apps and services
Explain flow.
They get time from APS every so often, 2-3 days, and efficiency and automation allow them to make the most of their time at APS.