Streamlined data sharing and analysis to accelerate cancer research
Ian Foster
The University of Chicago and Argonne National Laboratory
Thesis: We enhance sharing and analysis by eliminating friction
1919 Motor Transport Corps convoy
Washington, DC, to San Francisco
56 days, average speed of 9 km/h
2016: 41 hours by road, 5.5 hours by air
2 minutes by web
<1 second by API
Cloud: Outsourcing and automation
Software as a service: SaaS (web & mobile apps)
Platform as a service: PaaS
Infrastructure as a service: IaaS
SaaS for science
Data challenges: Olopade Lab
Inherited hematological malignancies
Impact:
• Familial blood cancer syndromes are being included in the 2016 revision of the World Health Organization Classification of Hematological Malignancies, NCCN guidelines, and European LeukemiaNet
• Identification of germline mutations is important for prevention/intervention and early diagnosis, and may change treatment (e.g., stem cell transplant from a related donor without the mutation or from a matched unrelated donor)
Background:
• Unlike familial predisposition to some solid cancers, familial predisposition to blood cancers has not been widely appreciated
• Identifying the genes involved informs understanding of biology and may impact patient care (prevention, diagnosis, and treatment)
Jane Churpek, MD; Lucy Godley, MD, PhD
Research highlights:
• With samples from >500 families, the team has identified novel germline mutations that predispose to familial myelodysplastic syndromes and leukemia
• These mutations are much more common than previously known
• Specific genes with identified mutations include RUNX1, ETV6, DDX41, and ANKRD26
The RUNX1 International Sequencing Consortium (RISC)
Inherited hematological malignancies, Lucy Godley, UChicago
Notable areas of friction
• Moving data rapidly, securely, and reliably from lab to lab
• Accessing data at other labs
• Controlling who can access data
• Tracking what data is where
• Discovering available data within a rapidly growing haystack
• Computing at scale
• Complying with rules on personal health information
• Archive and backup
Figure: Sequencing center, Compute facility, Personal Computer, and Publication repository connected by Globus
1. Transfer: Researcher initiates a transfer request, or the request is made automatically by a script or science gateway
2. Globus transfers files reliably and securely
3. Share: Researcher selects files to share, selects a user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to access shared files; no local account needed; download via Globus
6. Publish: Researcher assembles a data set and attaches metadata (Dublin Core, domain-specific)
7. Curator reviews and approves; data set is published on campus or other system
8. Discover: Peers and collaborators search and discover datasets; transfer and share using Globus
• Only Web browser required
• Use any storage system
• Access using any credential
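Sharing (steps 3 and 4) leaves the files where they are and changes only who may reach them. The sketch below models that idea as a small access-control list over directory prefixes, with users and groups as principals. It is a hypothetical illustration: `SharedEndpoint`, `grant`, and the `user:`/`group:` naming scheme are assumptions made here, not the Globus API, and real Globus sharing also involves authentication and endpoint configuration that this omits.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRule:
    """One hypothetical ACL entry: a principal may access everything under `path`."""
    principal: str    # "user:<name>" or "group:<name>" (naming scheme assumed here)
    path: str         # directory prefix on the existing storage, ending in "/"
    permissions: str  # "r" for read-only, "rw" for read/write


@dataclass
class SharedEndpoint:
    """Files stay on the original storage; only these rules decide who sees them."""
    rules: list = field(default_factory=list)
    groups: dict = field(default_factory=dict)  # group name -> set of member names

    def grant(self, principal: str, path: str, permissions: str = "r") -> None:
        if not path.endswith("/"):
            path += "/"  # so "/run42" does not also match "/run420"
        self.rules.append(AccessRule(principal, path, permissions))

    def principals_for(self, user: str) -> set:
        names = {f"user:{user}"}
        names |= {f"group:{g}" for g, members in self.groups.items() if user in members}
        return names

    def allowed(self, user: str, path: str, op: str) -> bool:
        """Return True if `user` may perform `op` ("r" or "w") on `path`."""
        names = self.principals_for(user)
        return any(
            rule.principal in names and path.startswith(rule.path) and op in rule.permissions
            for rule in self.rules
        )


# Usage: share one sequencing run read-only with a collaborating group.
endpoint = SharedEndpoint(groups={"collaborators": {"bob"}})
endpoint.grant("group:collaborators", "/seq/run42", "r")
print(endpoint.allowed("bob", "/seq/run42/sample1.bam", "r"))    # True
print(endpoint.allowed("bob", "/seq/run42/sample1.bam", "w"))    # False
print(endpoint.allowed("carol", "/seq/run42/sample1.bam", "r"))  # False
print(endpoint.allowed("bob", "/seq/run420/other.bam", "r"))     # False
```

Granting to a group rather than to individuals means adding a collaborator is a membership change, with no new rule and no data movement.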
How Globus adds value…
• Ease of use, consistent user interface across systems
• “Fire-and-forget” reliable file transfer
• Low-overhead external collaboration
• Secure access, multi-tier security model
• Maximized wide area network throughput
• Rapid deployment via standard packages
• Highly automatable: CLI, RESTful API
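“Fire-and-forget” means the user submits a request and the service, not the user, deals with faults. A common way to get that property is to verify each copy against a checksum and retry failed or corrupted copies with exponential backoff. The sketch below shows that generic pattern; it is not Globus’s implementation, and `reliable_copy` and the simulated flaky link are hypothetical.

```python
import hashlib
import time


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def reliable_copy(read_source, write_dest, read_dest, max_attempts=5, base_delay=1.0):
    """Copy until the destination checksum matches the source; return attempts used.

    read_source/read_dest return bytes and write_dest stores bytes; all three are
    caller-supplied callables, so the same loop works for any storage backend.
    """
    expected = sha256(read_source())
    for attempt in range(1, max_attempts + 1):
        try:
            write_dest(read_source())
            if sha256(read_dest()) == expected:
                return attempt          # integrity verified: done
        except OSError:
            pass                        # transient fault (e.g. dropped connection): retry
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"transfer failed after {max_attempts} attempts")


# Usage: a simulated link that corrupts the first copy and drops the second.
source = b"ACGT" * 1000
store = {}
faults = iter(["corrupt", "drop"])


def flaky_write(data: bytes) -> None:
    fault = next(faults, None)
    if fault == "drop":
        raise OSError("connection reset")
    store["dest"] = data[:-1] if fault == "corrupt" else data


attempts = reliable_copy(lambda: source, flaky_write, lambda: store["dest"], base_delay=0.0)
print(attempts)  # 3
```

Verifying after writing, rather than trusting a successful write call, is what catches the silently truncated first copy.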
Storage connectors
Standard (POSIX):
• Linux
• Windows
• macOS
• Lustre, GPFS, OrangeFS, ...
Premium:
• HPSS
• HDFS
• Amazon S3
• Ceph RadosGW (S3 API)
• Spectra Logic BlackPearl
• Google Drive*
* Coming soon
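Supporting many storage systems behind one service suggests an adapter design: transfer logic talks to a single interface, and each storage system gets its own connector. The sketch below illustrates that pattern with a POSIX connector and an in-memory stand-in for an S3-style object store. `StorageConnector`, `list_dir`, `copy_file`, and the other names are hypothetical choices for illustration, not the interface Globus connectors actually implement.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageConnector(ABC):
    """Hypothetical uniform interface; transfer code depends only on these methods."""

    @abstractmethod
    def list_dir(self, path: str) -> list: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class PosixConnector(StorageConnector):
    """Hierarchical filesystem rooted at a local directory."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, path: str) -> str:
        return os.path.join(self.root, path.lstrip("/"))

    def list_dir(self, path: str) -> list:
        return sorted(os.listdir(self._full(path)))

    def read(self, path: str) -> bytes:
        with open(self._full(path), "rb") as f:
            return f.read()

    def write(self, path: str, data: bytes) -> None:
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)


class InMemoryObjectConnector(StorageConnector):
    """Stand-in for an S3-style object store: flat keys, directories are only prefixes."""

    def __init__(self):
        self.objects = {}

    def list_dir(self, path: str) -> list:
        prefix = path.strip("/")
        prefix = prefix + "/" if prefix else ""
        return sorted({k[len(prefix):].split("/")[0] for k in self.objects if k.startswith(prefix)})

    def read(self, path: str) -> bytes:
        return self.objects[path.strip("/")]

    def write(self, path: str, data: bytes) -> None:
        self.objects[path.strip("/")] = data


def copy_file(src: StorageConnector, dst: StorageConnector, path: str) -> None:
    """Backend-agnostic copy: works for any pair of connectors."""
    dst.write(path, src.read(path))


# Usage: move a file from a POSIX filesystem into the object-store stand-in.
with tempfile.TemporaryDirectory() as root:
    posix = PosixConnector(root)
    posix.write("/runs/r1.fastq", b"@read1\nACGT\n+\nIIII\n")
    objects = InMemoryObjectConnector()
    copy_file(posix, objects, "/runs/r1.fastq")
    print(objects.list_dir("/runs"))  # ['r1.fastq']
```

The object-store connector has to emulate directories from key prefixes, which is the kind of per-backend detail the interface hides from transfer code.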
Globus accelerates disk-to-disk throughput
Chart: Disk-to-disk throughput (Mbps) for scp, scp (w/HPN), sftp, GridFTP (1 stream), and GridFTP (4 streams). Source: ESnet (2016)
• Berkeley, CA to Argonne, IL (RTT: 53 ms, capacity: 10 Gbps)
• scp is 24x slower than GridFTP
• >1 Gbps (125 MB/s) disk-to-disk requires a RAID array
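The gap between single-stream tools and parallel GridFTP is easier to see through the bandwidth-delay product (BDP): to keep a path busy, a sender needs roughly capacity × RTT of data in flight. The arithmetic below applies that to the Berkeley to Argonne path quoted above and checks the 1 Gbps = 125 MB/s conversion. The 1 TB dataset is a hypothetical example. Attributing scp's slowness to small TCP windows and SSH buffers (which the HPN-SSH patches enlarge) is a common explanation, not a claim made by the ESnet measurement.

```python
def bandwidth_delay_product_bytes(capacity_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the path full (capacity x RTT / 8)."""
    return capacity_bps * rtt_s / 8


def transfer_hours(size_bytes: float, rate_bps: float) -> float:
    """Hours to move `size_bytes` at a sustained rate of `rate_bps` bits per second."""
    return size_bytes * 8 / rate_bps / 3600


# Berkeley, CA to Argonne, IL: 10 Gbps capacity, 53 ms RTT
bdp = bandwidth_delay_product_bytes(10e9, 0.053)
print(f"BDP: {bdp / 1e6:.2f} MB")                          # BDP: 66.25 MB
print(f"Per stream with 4 streams: {bdp / 4 / 1e6:.2f} MB")  # Per stream with 4 streams: 16.56 MB

# 1 Gbps disk-to-disk expressed in megabytes per second
print(1e9 / 8 / 1e6)  # 125.0

# Hypothetical 1 TB dataset at 1 Gbps vs. a rate 24x slower
print(f"{transfer_hours(1e12, 1e9):.1f} h")       # 2.2 h
print(f"{transfer_hours(1e12, 1e9 / 24):.1f} h")  # 53.3 h
```

Splitting the work across parallel streams divides the window each connection must sustain, which is one reason the 4-stream GridFTP configuration can come closer to the link capacity.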
Globus is widely used
• 4 major services
• 13 national labs
• 190 PB transferred
• 10,000 active endpoints
• 20 billion files processed
• 10,000 active users
• 50,000 registered users
• 99.9% uptime
• 35+ institutional subscribers
• 1 PB: largest single transfer to date
• 3 months: longest continuously managed transfer
• 130 federated campus identities
Chart: Terabytes per Month (x-axis: Year and Month, 2015-03 to 2016-08; y-axis: Terabytes)
Chart: Users per Month (x-axis: Year and Month, 2015-03 to 2016-08; y-axis: Users)
Globus @ NIH
Globus subscriptions for sustainability
• Standard subscription
  • Shared endpoints
  • Data publication
  • HTTPS support*
  • Management console
  • Usage reporting
  • Priority support
  • Application integration
• Branded Web Site
• Premium Storage Connectors
  • Amazon S3, Ceph, HPSS, Spectra, Google Drive*, …
• Alternate Identity Provider (InCommon is standard)
* Available late 2016
Representative subscribers
Cloud: Outsourcing and automation
Software as a service: SaaS (web & mobile apps)
Platform as a service: PaaS
Infrastructure as a service: IaaS
PaaS for science
Figure: Science DMZ with a research data portal (source: https://fasterdata.es.net/). Components: WAN, border router, Science DMZ switch/router, enterprise firewall, perfSONAR hosts, DTNs, API DTNs (data access governed by portal), filesystem (data store), and portal server (applications: web server, search, database, authentication), linked at 10GE. Paths shown: data path, data transfer path, and portal query/browse path.
Globus leverages Science DMZs
Prototypical research data portal
• Move portal storage into Science DMZ, with Globus endpoint
• Leave portal web server behind firewall
• Globus handles security and data heavy lifting
Figure: Portal architecture. Components: desktop browser and user's endpoint (optional); portal web server (client) behind the firewall; portal endpoint in the Science DMZ; other endpoints; and Globus cloud services (Globus Transfer Service, Globus Auth, Globus Web Widgets, other services). Protocols shown: HTTPS, GridFTP, REST.
https://github.com/globus/globus-sample-data-portal
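The portal design separates deciding who may have data from moving it: the web server behind the firewall checks authorization, then asks a transfer service to move files directly between the portal's endpoint in the Science DMZ and the user's endpoint, so file bytes never pass through the web server. The sketch below models only that control flow. `PortalServer`, `TransferRequest`, and the endpoint, dataset, and identity names are hypothetical; in the architecture shown, identities come from Globus Auth and transfers go through the Globus Transfer service over REST, neither of which is called here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """Everything the transfer service needs; no file contents are included."""
    source_endpoint: str
    source_path: str
    destination_endpoint: str
    destination_path: str
    requested_by: str


class PortalServer:
    """Hypothetical portal web server: authorizes requests, delegates data movement."""

    def __init__(self, portal_endpoint: str, dataset_acl: dict, submit):
        self.portal_endpoint = portal_endpoint  # endpoint on a DTN in the Science DMZ
        self.dataset_acl = dataset_acl          # dataset name -> set of allowed identities
        self.submit = submit                    # callable that queues a TransferRequest

    def request_download(self, identity: str, dataset: str,
                         user_endpoint: str, dest_dir: str):
        # `identity` is assumed to be already authenticated by an identity service.
        if identity not in self.dataset_acl.get(dataset, set()):
            raise PermissionError(f"{identity} is not authorized for {dataset}")
        request = TransferRequest(
            source_endpoint=self.portal_endpoint,
            source_path=f"/datasets/{dataset}/",
            destination_endpoint=user_endpoint,
            destination_path=dest_dir,
            requested_by=identity,
        )
        return self.submit(request)  # heavy lifting happens outside the portal


# Usage: a list stands in for the transfer service's queue.
queue = []
portal = PortalServer("portal-dtn", {"example-dataset": {"alice@example.org"}},
                      submit=lambda req: queue.append(req) or len(queue))
task_id = portal.request_download("alice@example.org", "example-dataset",
                                  "alice-laptop", "/home/alice/data/")
print(task_id, queue[0].source_path)  # 1 /datasets/example-dataset/
```

Because the portal only emits small request records, it can stay behind the enterprise firewall while bulk data moves over the Science DMZ path.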
https://www.globusworld.org/tour/
Globus Genomics: Genomics analysis as a service
Ravi Madduri et al., University of Chicago
• Workflows can be easily defined and automated with integrated Galaxy Platform capabilities
• Data movement is streamlined with integrated Globus transfer
• Resources can be provisioned on-demand with Amazon Web Services cloud-based infrastructure
Globus Genomics use cases
A profile of inherited predisposition to breast cancer among Nigerian women
Y. Zheng, T. Walsh, F. Yoshimatsu, M. Lee, S. Gulsuner, S. Casadei, A. Rodriguez, T. Ogundiran, C. Babalola, O.
Ojengbede, D. Sighoko, R. Madduri, M.-C. King, O. Olopade
A case study for high throughput analysis of NGS data for translational research
using Globus Genomics
D. Sulakhe, A. Rodriguez, K. Bhuvaneshwar, Y. Gusev, R. Madduri, L. Lacinski, U. Dave, I. Foster, S. Madhavan
Globus Genomics at a glance
• 30 institutions and groups
• 10s of millions of core hours
• 2 PB raw sequence analyzed
• 1,500 analysis tools
• 10,000 genomes processed
• 50 workflows
• 99% uptime over the past two years
• 1 PB data generated
• 43 steps in longest pipeline
• 100s of species
• 75 in largest user group
• 5 days: longest running workflow
Cost-aware provisioning on cloud resources
1. Filter instance types with profiles
2. Determine price for each instance type across all availability zones
3. Rank potential requests
4. Make requests and monitor
5. Cancel or repurpose excess active requests once one is fulfilled
Can reduce costs by 95% or more!
R. Chard et al. Cost-aware cloud provisioning, 11th IEEE International Conference on e-Science (e-Science), 2015.
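The five steps can be sketched as a small rank-then-race procedure. This is an illustration under stated assumptions, not the algorithm from Chard et al.: `Offer`, `rank_requests`, `provision`, and the instance names, zones, prices, and fulfillment times are all made up, and a real system would query a cloud provider's price data and submit and cancel actual (for example, spot) requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    instance_type: str
    zone: str
    vcpus: int
    memory_gib: float
    price_per_hour: float  # made-up numbers for illustration


def rank_requests(offers, min_vcpus, min_memory_gib):
    # Step 1: keep instance types that satisfy the workload's resource profile.
    eligible = [o for o in offers if o.vcpus >= min_vcpus and o.memory_gib >= min_memory_gib]
    # Steps 2-3: each offer is one (type, zone) price; rank cheapest first.
    return sorted(eligible, key=lambda o: o.price_per_hour)


def provision(offers, min_vcpus, min_memory_gib, fulfil_time, parallel=3):
    """Steps 4-5: place several cheap requests, keep the first fulfilled, release the rest.

    fulfil_time(offer) returns seconds until the request is fulfilled, or None if never.
    """
    active = rank_requests(offers, min_vcpus, min_memory_gib)[:parallel]
    times = {o: fulfil_time(o) for o in active}  # monitor outstanding requests
    fulfilled = [o for o in active if times[o] is not None]
    if not fulfilled:
        return None, active
    winner = min(fulfilled, key=lambda o: times[o])
    excess = [o for o in active if o != winner]  # cancel or repurpose these
    return winner, excess


offers = [
    Offer("small", "zone-1", 2, 4, 0.02),   # filtered out: too few vCPUs
    Offer("medium", "zone-1", 8, 32, 0.12),
    Offer("medium", "zone-2", 8, 32, 0.09),
    Offer("large", "zone-1", 16, 64, 0.20),
    Offer("large", "zone-2", 16, 64, 0.15),
]
# Pretend zone-2 has no spare capacity right now.
winner, excess = provision(offers, 8, 30, lambda o: None if o.zone == "zone-2" else 45.0)
print(winner.instance_type, winner.zone)  # medium zone-1
print(len(excess))                        # 2
```

Placing several requests at once and keeping whichever is fulfilled first trades some request overhead for a lower chance of waiting on a single unfilled request.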
What’s coming soon: Richer endpoints
HTTPS access to endpoints
• Enhanced use of research storage:
  • Asynchronous, bulk transfer: GridFTP
  • Synchronous remote access: HTTPS
• Enhanced Globus web app
  • Browser-based upload/download
  • Inline file viewer
• Integration with clients, web apps
Collections
• Groupings of files that are to be treated as logical units
• Can be named and described
Data search
• Automated metadata harvesting
  • From Globus endpoints
  • Submitted via REST API
• Rich search capabilities
  • Free text, faceted, boosted
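Free-text, faceted, and boosted search can be illustrated with a toy in-memory index over harvested metadata records: free-text terms are matched against string fields, facets filter on exact field values and report counts, and boosts weight matches in some fields (here, the title) more heavily. The records, field names, and functions below are hypothetical and say nothing about how the Globus search service is implemented.

```python
from collections import Counter


def search(records, text="", filters=None, boosts=None):
    """Rank records by boosted term matches, after applying exact-match facet filters."""
    filters = filters or {}
    boosts = boosts or {}
    terms = text.lower().split()
    hits = []
    for rec in records:
        if any(rec.get(name) != value for name, value in filters.items()):
            continue  # facet filter: exact match required
        score = 0.0
        for name, value in rec.items():
            if isinstance(value, str):
                words = value.lower().split()
                score += boosts.get(name, 1.0) * sum(words.count(t) for t in terms)
        if terms and score == 0:
            continue  # free-text query given but nothing matched
        hits.append((score, rec))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # stable: ties keep input order
    return [rec for _, rec in hits]


def facet_counts(records, name):
    """Counts of each value of one field, as shown beside a faceted result list."""
    return Counter(rec[name] for rec in records if name in rec)


records = [  # made-up metadata records
    {"title": "RUNX1 germline variants", "description": "targeted sequencing of families", "species": "human"},
    {"title": "Familial MDS exomes", "description": "RUNX1 and DDX41 carriers", "species": "human"},
    {"title": "Mouse RUNX1 knockout", "description": "RNA-seq", "species": "mouse"},
]
results = search(records, "runx1", boosts={"title": 3.0})
print([r["title"] for r in results])
# ['RUNX1 germline variants', 'Mouse RUNX1 knockout', 'Familial MDS exomes']
print(facet_counts(results, "species"))  # Counter({'human': 2, 'mouse': 1})
```

The title boost lifts records that mention the gene in their title above one that mentions it only in the description.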
Thank you to our sponsors
U.S. Department of Energy
Thanks to: Rachana Ananthakrishnan, Kyle Chard, Ravi Madduri,
Brigitte Raumann, Steve Tuecke, Vas Vasiliadis,
and others in the Globus team at the University of Chicago
Globus provides a new global-scale data fabric that can accelerate
discovery by streamlining scientific data sharing and analysis
• Globus-enabled storage systems enable robust, secure access
• Globus cloud services implement transfer, sharing, publication,
discovery, and other capabilities
This fabric is:
• Being applied in cancer research
• Spreading rapidly by word of mouth (scientists like it!)
• Widely deployed across universities and labs (thanks, NSF & DOE)
• On a path to sustainability based on subscriptions
• Being integrated into research infrastructures and applications
To accelerate impact in biomedicine:
• Integrate biomedical research facilities into the fabric
• Encourage subscriptions to address sustainability
• Provide HIPAA compliance for applications involving PHI
• Cultivate an ecosystem of data portals and applications that leverage the platform
• Continue to add capabilities
www.globus.org foster@uchicago.edu

Thoughts on interoperabilityIan Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasIan Foster
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFIan Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformIan Foster
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Ian Foster
 

More from Ian Foster (20)

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptx
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud Automation
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
 
Thoughts on interoperability
Thoughts on interoperabilityThoughts on interoperability
Thoughts on interoperability
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research Platform
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 

Recently uploaded

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 

Recently uploaded (20)

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 

Streamlined data sharing and analysis to accelerate cancer research

  • 1. Streamlined data sharing and analysis to accelerate cancer research. Ian Foster, The University of Chicago and Argonne National Laboratory
  • 2. Thesis: We enhance sharing and analysis by eliminating friction
  • 3. 1919 Motor Transport Corps convoy, Washington, D.C., to San Francisco: 56 days, average speed of 9 km/h
  • 4. 2016: 41 hours by road, 5.5 hours by air
  • 5. 2 minutes by web; <1 second by API
  • 6. Cloud: Outsourcing and automation. Software as a service: SaaS • Infrastructure as a service: IaaS • Platform as a service: PaaS (web & mobile apps)
  • 7. Cloud: Outsourcing and automation. Software as a service: SaaS • Infrastructure as a service: IaaS • Platform as a service: PaaS (web & mobile apps) • SaaS for science
  • 9. Inherited hematological malignancies (Jane Churpek, MD; Lucy Godley, MD, PhD). Impact: • Familial blood cancer syndromes are being included in the 2016 revision of the World Health Organization Classification of Hematological Malignancies, the NCCN guidelines, and European LeukemiaNet • Identification of germline mutations is important for prevention/intervention and early diagnosis, and may change treatment (e.g., stem cell transplant from a related donor without the mutation, or from a matched unrelated donor). Background: • Familial predisposition to blood cancers has not been as widely appreciated as it has for some solid cancers • Identifying the genes involved informs our understanding of the biology and may affect patient care (prevention, diagnosis, and treatment). Research highlights: • With samples from >500 families, the team has identified novel germline mutations that predispose to familial myelodysplastic syndromes and leukemia • These mutations are much more common than previously known • Specific genes with identified mutations include RUNX1, ETV6, DDX41, and ANKRD26
  • 11. Notable areas of friction • Moving data rapidly, securely, and reliably from lab to lab • Accessing data at other labs • Controlling who can access data • Tracking what data is where • Discovering available data within a rapidly growing haystack • Computing at scale • Complying with rules on personal health information • Archiving and backup
  • 22. Sites: sequencing center, compute facility, personal computer, publication repository.
    1. Transfer: Researcher initiates a transfer request, or one is requested automatically by a script or science gateway.
    2. Globus transfers files reliably and securely.
    3. Share: Researcher selects files to share, selects a user or group, and sets access permissions.
    4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
    5. Collaborator logs in to access shared files; no local account needed; download via Globus.
    6. Publish: Researcher assembles a data set and attaches metadata (Dublin Core, domain-specific).
    7. Curator reviews and approves; the data set is published on a campus or other system.
    8. Discover: Peers and collaborators search for and discover data sets; transfer and share using Globus.
    Only a web browser required • Use any storage system • Access using any credential
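The sharing steps rely on access rules enforced on existing storage rather than on copying data to a new location. The Python sketch below models that idea under stated assumptions: a rule grants one principal (a single identity or a group) read ("r") or read-write ("rw") access to a directory tree, and a request succeeds if any rule matching the requester covers the requested path. `AccessRule`, `is_allowed`, and the example users, group, and paths are hypothetical illustrations, not the Globus API.

```python
from dataclasses import dataclass
from posixpath import normpath
from typing import Iterable, Set


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical sharing rule granting one principal access to a directory tree."""
    principal_type: str  # "identity" (a single user) or "group"
    principal: str       # user or group identifier
    path: str            # shared directory, e.g. "/projects/risc/shared/"
    permissions: str     # "r" (read-only) or "rw" (read-write)


def _matches_principal(rule: AccessRule, user: str, user_groups: Set[str]) -> bool:
    if rule.principal_type == "identity":
        return rule.principal == user
    if rule.principal_type == "group":
        return rule.principal in user_groups
    return False  # unknown principal types never match


def _covers(rule_path: str, requested_path: str) -> bool:
    """True if requested_path is the rule's directory or lies beneath it.
    normpath collapses ".." segments, so "shared/../private" cannot escape."""
    base = normpath(rule_path)
    target = normpath(requested_path)
    return base == "/" or target == base or target.startswith(base + "/")


def is_allowed(rules: Iterable[AccessRule], user: str, user_groups: Set[str],
               requested_path: str, want_write: bool = False) -> bool:
    """Allow access if any rule matches the principal, covers the path,
    and (for writes) includes write permission."""
    for rule in rules:
        if not _matches_principal(rule, user, user_groups):
            continue
        if not _covers(rule.path, requested_path):
            continue
        if want_write and "w" not in rule.permissions:
            continue
        return True
    return False


# Hypothetical example: the data owner has read-write access to the project;
# a consortium group may only read one shared subtree.
RULES = [
    AccessRule("identity", "alice", "/projects/risc/", "rw"),
    AccessRule("group", "risc-consortium", "/projects/risc/shared/", "r"),
]
```

Here a consortium member can read `/projects/risc/shared/fam01.vcf` but cannot write it, and cannot reach `/projects/risc/private/` even through a `..` path. Checking rules on every request, rather than copying files, is what lets data stay on its existing storage.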
  • 23. How Globus adds value… • Ease of use, consistent user interface across systems • “Fire-and-forget” reliable file transfer • Low-overhead external collaboration • Secure access, multi-tier security model • Maximized wide area network throughput • Rapid deployment via standard packages • Highly automatable: CLI, RESTful API
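“Fire-and-forget” means the user submits a request once and the service, not the user, handles failures. A minimal Python sketch of that general technique, assuming a hypothetical `copy_fn` that stands in for the network copy and may fail or corrupt data: each file is retried with exponential backoff, and it only counts as done when its SHA-256 checksum at the destination matches the source. This illustrates retry-plus-verification in general; it is not how Globus itself is implemented.

```python
import hashlib
import time
from typing import Callable, Dict


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def reliable_transfer(
    files: Dict[str, bytes],
    copy_fn: Callable[[str, bytes], bytes],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
) -> Dict[str, str]:
    """Copy every file, retrying on transient errors or checksum mismatches.

    `copy_fn(name, data)` returns the bytes that arrived at the destination.
    Returns "ok" or "failed" for each file.
    """
    status: Dict[str, str] = {}
    for name, data in files.items():
        expected = sha256_of(data)
        status[name] = "failed"
        for attempt in range(max_attempts):
            try:
                received = copy_fn(name, data)
                if sha256_of(received) == expected:  # end-to-end integrity check
                    status[name] = "ok"
                    break
            except OSError:
                pass  # transient fault (e.g. dropped connection): retry
            time.sleep(base_delay_s * (2 ** attempt))  # exponential backoff
    return status


def make_flaky_copier(bad_attempts: int) -> Callable[[str, bytes], bytes]:
    """Simulated unreliable network: the first attempt per file raises, and
    later attempts up to `bad_attempts` deliver a truncated copy."""
    attempts: Dict[str, int] = {}

    def copy_fn(name: str, data: bytes) -> bytes:
        attempts[name] = attempts.get(name, 0) + 1
        if attempts[name] == 1:
            raise OSError("connection reset")
        if attempts[name] <= bad_attempts:
            return data[:-1]  # corrupted (truncated) delivery
        return data

    return copy_fn
```

For example, `reliable_transfer(files, make_flaky_copier(3), base_delay_s=0.0)` succeeds for every non-empty file on its fourth attempt, while the same call with `max_attempts=2` reports each file as "failed" instead of silently accepting a truncated copy.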
  • 31. Storage connectors. Standard (POSIX): Linux, Windows, macOS, Lustre, GPFS, OrangeFS, ... • Premium: HPSS, HDFS, Amazon S3, Ceph RadosGW (S3 API), Spectra Logic BlackPearl, Google Drive* (*coming soon)
  • 32. Globus accelerates disk-to-disk throughput. [Figure: disk-to-disk throughput (Mbps, 0 to 9,000) for scp, scp (w/HPN), sftp, GridFTP (1 stream), and GridFTP (4 streams).] Source: ESnet (2016) • Berkeley, CA to Argonne, IL (RTT: 53 ms, capacity: 10 Gbps) • scp is 24x slower than GridFTP • >1 Gbps (125 MB/s) disk-to-disk requires a RAID array
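A simple model helps explain why parallel streams matter on this path. A single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window / RTT, and filling the link requires a window equal to the bandwidth-delay product (BDP). The Python sketch below works through that arithmetic for the 10 Gbps, 53 ms path on the slide. The 2 MiB per-stream window is an assumed value for illustration, not a figure from the ESnet measurement, and the model ignores packet loss and disk speed.

```python
RTT_S = 0.053                   # Berkeley to Argonne round-trip time (from the slide)
LINK_BPS = 10e9                 # 10 Gbps path capacity (from the slide)
WINDOW_BYTES = 2 * 1024 * 1024  # assumed per-stream TCP window (2 MiB)


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: int, rtt_s: float, streams: int, link_bps: float) -> float:
    """Upper bound on throughput when each stream is limited by its window."""
    return min(streams * window_bytes * 8 / rtt_s, link_bps)


def mbps_to_megabytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / 8


if __name__ == "__main__":
    print(f"BDP: {bdp_bytes(LINK_BPS, RTT_S) / 1e6:.2f} MB")
    for n in (1, 4):
        bps = window_limited_bps(WINDOW_BYTES, RTT_S, n, LINK_BPS)
        print(f"{n} stream(s): {bps / 1e6:.0f} Mbps")
    print(f"1 Gbps = {mbps_to_megabytes_per_s(1000):.0f} MB/s")
```

Under these assumptions the path needs about 66 MB in flight to run at full rate; one 2 MiB-window stream is capped near 317 Mbps, and four streams raise the cap roughly fourfold. The last line reproduces the slide's conversion of 1 Gbps to 125 MB/s.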
  • 37. Globus is widely used: 4 major services • 13 national labs • 190 PB transferred • 10,000 active endpoints • 20 billion files processed • 10,000 active users • 50,000 registered users • 99.9% uptime • 35+ institutional subscribers • 1 PB largest single transfer to date • 3 months longest continuously managed transfer • 130 federated campus identities
  • 38. Globus @ NIH [Figure: two charts, terabytes per month and users per month, March 2015 to August 2016.]
  • 39. Globus subscriptions for sustainability • Standard subscription • Shared endpoints • Data publication • HTTPS support* • Management console • Usage reporting • Priority support • Application integration • Branded website • Premium storage connectors: Amazon S3, Ceph, HPSS, Spectra, Google Drive*, … • Alternate identity provider (InCommon is standard) (*Available late 2016)
  • 42. Cloud: Outsourcing and automation. Software as a service: SaaS • Infrastructure as a service: IaaS • Platform as a service: PaaS (web & mobile apps) • PaaS for science
  • 47. Globus leverages Science DMZs. [Figure: Science DMZ design for a research data portal (https://fasterdata.es.net/). A border router links the WAN over 10GE to a Science DMZ switch/router and, through a firewall, to the enterprise network, with perfSONAR monitoring on each segment. Data transfer nodes (DTNs, with data access governed by the portal) and the filesystem (data store) sit on the data transfer path in the Science DMZ; the portal server (web server, search, database, authentication) serves the browse/query path.]
  • 48. Prototypical research data portal • Move portal storage into the Science DMZ, with a Globus endpoint • Leave the portal web server behind the firewall • Globus handles security and data heavy lifting. [Figure: the user's browser talks to the portal web server over HTTPS; the portal web server is a REST client of the Globus Transfer Service and Globus Auth in the Globus cloud, plus other services and Globus web widgets; data moves over GridFTP between the portal endpoint in the Science DMZ, the user's endpoint (optional), and other endpoints.]
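The key idea in this design is that the portal web server never handles the bulk data. It authenticates the user, decides what they may access, and asks the Globus service to move files directly between endpoints. A minimal Python sketch of that control path follows. The request layout reflects our understanding of a Globus Transfer "transfer" document (`DATA_TYPE`, `source_endpoint`, `destination_endpoint`, and `transfer_item` entries with `source_path` and `destination_path`), but treat the field names as an assumption to check against the Globus Transfer API reference. The sketch omits other fields the real API may require (such as a submission ID). `build_transfer_request`, the endpoint IDs, and the dataset paths are hypothetical. The sketch only builds the request; a real portal would submit it using a token obtained through Globus Auth.

```python
import json
from typing import Dict, List

# Hypothetical placeholders: not real Globus endpoint IDs or paths.
PORTAL_ENDPOINT_ID = "portal-endpoint-id"
DATASETS: Dict[str, List[str]] = {
    "risc-family-01": [
        "/datasets/risc/fam01/reads.bam",
        "/datasets/risc/fam01/calls.vcf",
    ],
}


def build_transfer_request(user_endpoint_id: str, dataset: str, dest_dir: str) -> Dict:
    """Build, but do not submit, a request to copy a dataset from the portal
    endpoint (in the Science DMZ) to the user's endpoint. The web server only
    handles this small JSON document; bulk data flows endpoint to endpoint."""
    if dataset not in DATASETS:
        raise KeyError(f"unknown dataset: {dataset}")
    dest_dir = dest_dir.rstrip("/")
    items = [
        {
            "DATA_TYPE": "transfer_item",
            "source_path": path,
            "destination_path": f"{dest_dir}/{path.rsplit('/', 1)[-1]}",
            "recursive": False,
        }
        for path in DATASETS[dataset]
    ]
    return {
        "DATA_TYPE": "transfer",
        "source_endpoint": PORTAL_ENDPOINT_ID,
        "destination_endpoint": user_endpoint_id,
        "label": f"Portal download: {dataset}",
        "DATA": items,
    }


if __name__ == "__main__":
    request = build_transfer_request("user-endpoint-id", "risc-family-01", "/home/me/")
    print(json.dumps(request, indent=2))
```

Keeping the web server on the enterprise side of the firewall and the data on the Science DMZ side means the server's load is independent of dataset size.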
  • 52. Globus Genomics: Genomics analysis as a service (Ravi Madduri et al., University of Chicago) • Workflows can be easily defined and automated with integrated Galaxy platform capabilities • Data movement is streamlined with integrated Globus transfer • Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure
  • 53. Globus Genomics use cases • "A profile of inherited predisposition to breast cancer among Nigerian women," Y. Zheng, T. Walsh, F. Yoshimatsu, M. Lee, S. Gulsuner, S. Casadei, A. Rodriguez, T. Ogundiran, C. Babalola, O. Ojengbede, D. Sighoko, R. Madduri, M.-C. King, O. Olopade • "A case study for high throughput analysis of NGS data for translational research using Globus Genomics," D. Sulakhe, A. Rodriguez, K. Bhuvaneshwar, Y. Gusev, R. Madduri, L. Lacinski, U. Dave, I. Foster, S. Madhavan
  • 54. Globus Genomics at a glance: 30 institutions and groups • Tens of millions of core hours • 2 PB of raw sequence analyzed • 1,500 analysis tools • 10,000 genomes processed • 50 workflows • 99% uptime over the past two years • 1 PB of data generated • 43 steps in the longest pipeline • Hundreds of species • Largest user group: 75 • Longest-running workflow: 5 days
  • 55. Cost-aware provisioning on cloud resources: 1. Filter instance types with profiles 2. Determine the price for each instance type across all availability zones 3. Rank potential requests 4. Make requests and monitor 5. Cancel or repurpose excess active requests once one is fulfilled. Can reduce costs by 95% or more! R. Chard et al., "Cost-aware cloud provisioning," 11th IEEE International Conference on e-Science (e-Science), 2015.
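The five steps can be sketched in Python. This is a simplified illustration under assumed inputs, not the algorithm from Chard et al.: each instance type has a vCPU/memory profile, prices are given per (instance type, availability zone), candidates are ranked by price per vCPU, and the top requests are opened together and polled until one is fulfilled, after which the rest are returned for cancellation. The instance names, zones, prices, and the `is_fulfilled` callback (a stand-in for querying the provider) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    mem_gib: float


@dataclass(frozen=True)
class Candidate:
    instance: InstanceType
    zone: str
    price_per_hour: float


def filter_by_profile(types: List[InstanceType], min_vcpus: int, min_mem_gib: float) -> List[InstanceType]:
    """Step 1: keep instance types that satisfy the workload's resource profile."""
    return [t for t in types if t.vcpus >= min_vcpus and t.mem_gib >= min_mem_gib]


def price_candidates(types: List[InstanceType], prices: Dict[Tuple[str, str], float]) -> List[Candidate]:
    """Step 2: one candidate per (instance type, zone); prices maps (type, zone) -> $/hour."""
    return [
        Candidate(t, zone, price)
        for t in types
        for (name, zone), price in prices.items()
        if name == t.name
    ]


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Step 3: cheapest price per vCPU first."""
    return sorted(candidates, key=lambda c: c.price_per_hour / c.instance.vcpus)


def provision(
    ranked: List[Candidate],
    max_open: int,
    is_fulfilled: Callable[[Candidate, int], bool],
    max_polls: int = 10,
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Steps 4-5: open the top `max_open` requests and poll until one is
    fulfilled; return it with the excess requests to cancel. If none is
    fulfilled, return (None, all open requests) so the caller can cancel them."""
    open_requests = ranked[:max_open]
    for poll in range(max_polls):
        for c in open_requests:
            if is_fulfilled(c, poll):
                return c, [o for o in open_requests if o is not c]
    return None, open_requests


# Hypothetical catalog and prices.
TYPES = [InstanceType("small", 2, 4), InstanceType("medium", 4, 16), InstanceType("large", 16, 64)]
PRICES = {
    ("small", "zone-a"): 0.01,
    ("medium", "zone-a"): 0.08,
    ("medium", "zone-b"): 0.05,
    ("large", "zone-a"): 0.24,
}
RANKED = rank(price_candidates(filter_by_profile(TYPES, min_vcpus=4, min_mem_gib=8), PRICES))
# Simulated provider: only zone-a has spare capacity, and only from the third poll on.
WINNER, TO_CANCEL = provision(RANKED, max_open=3, is_fulfilled=lambda c, poll: c.zone == "zone-a" and poll >= 2)
```

In this toy run the cheapest candidate per vCPU (medium in zone-b) is never fulfilled, so the large instance in zone-a wins, and the two medium requests are returned for cancellation. Opening several ranked requests at once trades a little request overhead for not waiting on a single cheap but unavailable option.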
  • 58. What’s coming soon: Richer endpoints • HTTPS access to endpoints: enhanced use of research storage (asynchronous, bulk transfer: GridFTP; synchronous remote access: HTTPS); enhanced Globus web app (browser-based upload/download, inline file viewer); integration with clients and web apps • Collections: groupings of files that are to be treated as logical units; can be named and described • Data search: automated metadata harvesting (from Globus endpoints, or submitted via REST API); rich search capabilities (free text, faceted, boosted)
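To make "free text, faceted, boosted" concrete, here is a small in-memory Python sketch over metadata records with a few Dublin Core-style fields. Free-text matching counts query terms in text fields, a field boost weights title matches above description matches, facet filters require exact field values, and facet counts are computed over the matching records. The records, field names, and boost values are hypothetical, and a production search service would use an inverted index rather than this linear scan.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]

# Hypothetical harvested metadata records.
RECORDS: List[Record] = [
    {"id": "ds1", "title": "RUNX1 germline variants in familial leukemia",
     "description": "Whole-exome calls for RISC families", "subject": "genomics", "format": "VCF"},
    {"id": "ds2", "title": "Breast cancer predisposition panel",
     "description": "Germline panel sequencing; leukemia controls excluded", "subject": "genomics", "format": "BAM"},
    {"id": "ds3", "title": "Beamline tomography scans",
     "description": "Raw detector frames", "subject": "imaging", "format": "HDF5"},
]

BOOSTS = {"title": 3.0, "description": 1.0}  # assumed field weights


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record: Record, terms: List[str]) -> float:
    """Boosted term-frequency score: matches in the title count 3x."""
    total = 0.0
    for field, boost in BOOSTS.items():
        counts = Counter(tokenize(record.get(field, "")))
        total += boost * sum(counts[t] for t in terms)
    return total


def search(
    records: List[Record],
    query: str,
    filters: Optional[Dict[str, str]] = None,
    facet_fields: Tuple[str, ...] = ("subject", "format"),
) -> Tuple[List[str], Dict[str, Counter]]:
    """Return (ranked record IDs, facet counts over the hits)."""
    terms = tokenize(query)
    filters = filters or {}
    hits = []
    for r in records:
        if any(r.get(f) != v for f, v in filters.items()):
            continue  # faceted filter: exact match on each selected value
        s = score(r, terms)
        if s > 0:
            hits.append((s, r["id"]))
    hits.sort(key=lambda h: (-h[0], h[1]))  # highest score first, ties by ID
    matched = {rid for _, rid in hits}
    facets = {
        f: Counter(r[f] for r in records if r["id"] in matched and f in r)
        for f in facet_fields
    }
    return [rid for _, rid in hits], facets
```

With these records, the query "leukemia germline" ranks `ds1` (both terms in its title) above `ds2` (both terms only in its description), and adding the filter `{"format": "BAM"}` narrows the hits to `ds2`. The facet counts are what a user interface would show as clickable refinements.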
  • 59. Thank you to our sponsors: U.S. Department of Energy. Thanks to: Rachana Ananthakrishnan, Kyle Chard, Ravi Madduri, Brigitte Raumann, Steve Tuecke, Vas Vasiliadis, and others in the Globus team at the University of Chicago
  • 60. Globus provides a new global-scale data fabric that can accelerate discovery by streamlining scientific data sharing and analysis • Globus-enabled storage systems enable robust, secure access • Globus cloud services implement transfer, sharing, publication, discovery, and other capabilities. This fabric is: • Being applied in cancer research • Spreading rapidly by word of mouth (scientists like it!) • Widely deployed across universities and labs (thanks, NSF & DOE) • On a path to sustainability based on subscriptions • Being integrated into research infrastructures and applications
  • 61. To accelerate impact in biomedicine: • Integrate biomedical research facilities into the fabric • Encourage subscriptions to address sustainability • Provide HIPAA compliance for applications involving PHI • Cultivate an ecosystem of data portals and applications that leverage the platform • Continue to add capabilities. www.globus.org, foster@uchicago.edu

Editor's Notes

  1. Lieutenant Colonel Dwight D. Eisenhower
  2. Amazon VPC Microsoft one Add consumers Consumers, SMBs, large enterprises
  3. Data Publication and Discovery
  4. We built this pipeline to create high-quality variants using multiple genotyping algorithms