Analyzing Big Data in Medicine with
Virtual Research Environments and
Microservices
Ola Spjuth <ola.spjuth@farmbio.uu.se>
Department of Pharmaceutical Biosciences
Science for Life Laboratory
Uppsala University
Today: We have access to high-throughput
technologies to study biological phenomena
New challenges: Data management and
analysis
• Storage
• Analysis methods, pipelines
• Scaling
• Automation
• Data integration, security
• Predictions
• …
European Open Science Cloud (EOSC)
• The vast majority of all data in the world (in fact up to 90%) has been
generated in the last two years.
• Scientific data is in direct need of openness, better handling, careful
management, machine actionability and sheer re-use.
• European Open Science Cloud: A vision of a future infrastructure to
support Open Research Data and Open Science in Europe
– It should enable trusted access to services, systems and the re-use
of shared scientific data across disciplinary, social and geographical
borders
– research data should be findable, accessible, interoperable and re-
usable (FAIR)
– provide the means to analyze datasets of huge sizes
http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
Contemporary Big Data analysis in
bioinformatics
• High-Performance Computing with shared storage
– Linux, Terminal, batch queue
• Problems/challenges
– Access to resources is limited
– Dependency management for tools is cumbersome; installing software
requires help from system administrators
– Privacy-related issues
– Difficult to share/integrate data
– Accessibility issues
• A common approach: Internet-based services
– Retrieve data
– Analysis tools
Workflows
Service-Oriented Architectures (SOA) in
the life sciences
• Standardize
– Agree on e.g. interfaces, data formats,
protocols etc.
• Decompose and compartmentalize
– Experts (scientists) should provide
services – do one thing and do it well
– Achieve interoperability by exposing
data and tools as Web services
• Integrate
– Users should access and integrate
remote services
[Figure: one scientist exposes a service through an API; another scientist consumes it]
Service-Oriented Architectures (SOA) in
the life sciences, ~2005
[Figure: a scientist depending on several remote APIs that suffer from downtime, changed APIs, or lack of maintenance]
Difficult to sustain, unreliable solutions
Cloud Computing
• Cloud computing offers advantages over
contemporary e-infrastructures in the life sciences
– On-demand elastic resources and services
– No up-front costs, pay-per-use
• A lot of businesses (and software development)
moving into the cloud
– Vibrant ecosystem of frameworks and tools, including for
big data
• High potential for science
Virtual Machines and Containers
Virtual machines
• Package entire systems (heavy)
• Completely isolated
• Suitable in cloud environments
Containers:
• Share OS
• Smaller, faster, portable
• Docker!
Microservices
• Similar to Web services: Decompose functionality into smaller, loosely
coupled services communicating via API
– “Do one thing and do it well”
• Preferably smaller, light-weight and fast to instantiate on demand
• Easy to replace, language-agnostic
– Suitable for loosely coupled teams (which we have in science)
– Portable - easy to deploy and scale
– Maximize agility for developers
• Suitable to deploy as containers in cloud environments
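As a minimal sketch of the "do one thing and do it well" idea, the following Python service exposes a single operation over an HTTP/JSON API using only the standard library. The operation (the mean of a list of numbers), the endpoint path `/mean`, the payload shape, and the default port are all assumptions made for illustration; they are not taken from any project mentioned in these slides.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from statistics import mean


def compute_mean(values):
    """The single piece of functionality this (hypothetical) service offers."""
    return mean(float(v) for v in values)


class MeanHandler(BaseHTTPRequestHandler):
    """Handles POST /mean with a JSON body such as {"values": [1, 2, 3]}."""

    def do_POST(self):
        if self.path != "/mean":
            self._reply(404, {"error": "unknown endpoint"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            result = compute_mean(payload["values"])
        except (ValueError, KeyError, TypeError):
            # Covers malformed JSON, a missing "values" key, and bad numbers.
            self._reply(400, {"error": "expected a JSON object with a 'values' list"})
            return
        self._reply(200, {"mean": result})

    def _reply(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(host="127.0.0.1", port=8080):
    """Create the server object without starting it."""
    return HTTPServer((host, port), MeanHandler)
```

Calling `make_server().serve_forever()` would start the service. Because it is small and stateless, such a service could be packaged in a container image and replaced or replicated independently of other services.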
Scaling microservices
http://martinfowler.com/articles/microservices.html
Shipping
containers?
Orchestrating containers
Kubernetes: Orchestrating containers
• Origin: Google
• A declarative language for
launching containers
• Start, stop, update, and manage
a cluster of machines running
containers in a consistent and
maintainable way
• Suitable for microservices
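To illustrate what "declarative" means here, the sketch below builds a Kubernetes Deployment description as a plain Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The field layout follows the current `apps/v1` Deployment API, which may differ from the API version in use when this talk was given. The service name, image reference, replica count, and port are hypothetical.

```python
import json


def deployment_manifest(name, image, replicas=3, port=8080):
    """Describe the desired state: N replicas of one container image."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical service name and image reference, for illustration only.
manifest = deployment_manifest("mean-service", "example.org/mean-service:0.1")
print(json.dumps(manifest, indent=2))
```

The description states what should be running rather than how to start it; changing `replicas` and re-applying the manifest is how scaling would be requested.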
Containers
Scheduled and packed containers on nodes
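The idea of packing containers onto nodes can be sketched with a simple first-fit-decreasing heuristic. This is a deliberately simplified illustration, not the algorithm Kubernetes actually uses; the container names, CPU requests, and node capacity are hypothetical.

```python
def pack_containers(requests, node_capacity):
    """Assign containers to identical nodes, largest request first.

    requests: dict mapping container name -> CPU request (whole cores)
    node_capacity: CPU cores available on each node
    Returns a list of nodes, each a list of container names.
    """
    nodes = []  # each entry: {"free": remaining cores, "containers": [...]}
    ordered = sorted(requests.items(), key=lambda kv: kv[1], reverse=True)
    for name, cpu in ordered:
        if cpu > node_capacity:
            raise ValueError(f"{name} requests more CPU than a node offers")
        for node in nodes:
            if node["free"] >= cpu:
                break  # first node with enough room
        else:
            # No existing node fits, so bring up a new one.
            node = {"free": node_capacity, "containers": []}
            nodes.append(node)
        node["free"] -= cpu
        node["containers"].append(name)
    return [node["containers"] for node in nodes]


demo = {"a": 2, "b": 1, "c": 3, "d": 2}
placement = pack_containers(demo, node_capacity=4)
print(placement)  # [['c', 'b'], ['a', 'd']]
```

Here four containers requesting eight cores in total fit on two four-core nodes.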
Virtual Research Environment (VRE)
• Virtual (online) environments for research
– Easy and user-friendly access to computational resources, tools and
data, commonly for a scientific domain
• Multi-tenant VRE – log into shared system
• Private VRE
– Deploy on your favorite cloud provider
PhenoMeNal
• Horizon 2020 project, €8 M, 2015-2018
– “standardized e-infrastructure for the processing, analysis and information-
mining of the massive amount of medical molecular phenotyping and
genotyping data generated by metabolomics applications.”
• Enable users to provision their own virtual infrastructure (VRE)
– Public cloud, private cloud, local servers
– Easy access to compatible tools exposed as microservices
– Sets up and configures a complete data center in minutes (compute
nodes, storage, networks, DNS, firewall, etc.)
– Can achieve high-availability, scalability and fault tolerance
• Use modern and established tools and frameworks supported by industry
– Reduce risk and improve sustainability
• Offer an agile and scalable environment to use, and a straightforward
platform to extend
http://phenomenal-h2020.eu/
Users should not see this…
Deployment and user access
Launch on reference installation
Launch on public cloud
Private VRE
In-house deployment scenarios
MRC-NIHR Phenome Centre
• Medium-sized IT infrastructure
• Dedicated IT personnel
• Users: ICL staff
Hospital environment
• Dedicated server
• No IT personnel
• User: Clinical researcher
Private VRE
Development: Container lifecycle
[Figure labels: Source code repositories; PhenoMeNal Jenkins (build and test tools, images, infrastructure); PhenoMeNal Container Hub; Docker Hub]
Two proofs of concept so far
Kultima group, Pablo Moreno
Implications
• Improve sustainability
– Not dependent on specific data centers
• Improve reliability and security
– Users can run their own service environments (VREs) within isolated
environments
– High-availability and fault tolerance
• Scalability
– Deploy in elastic environments
• Agile development
– Automate “from develop to deploy”
• Agile science
– Simple access to discoverable, scalable tools on elastic compute
resources with no up-front costs
• NB: Many problems of interoperability remain!
– Data
– APIs
– etc.
Ongoing research on VREs
• Data federation
• Compute federation
• Privacy preservation
• Workflows
• Big Data frameworks
• Data management and modeling
Acknowledgements
Wesley Schaal
Jonathan Alvarsson
Staffan Arvidsson
Arvid Berg
Samuel Lampa
Marco Capuccini
Martin Dahlö
Valentin Georgiev
Anders Larsson
Polina Georgiev
Maris Lapins
AstraZeneca
Lars Carlsson
Ernst Ahlberg
University of Vienna
David Kreil
Maciej Kańduła
SNIC Science Cloud
Andreas Hellander
Salman Toor
Caramba.clinic
Kim Kultima
Stephanie Herman
Payam Emami
ToxHQ team
Barry Hardy
Thomas Exner
Joh Dokler
Daniel Bachler

Mais conteúdo relacionado

Mais procurados

Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Frederic Desprez
 
Accelerating your Research with Microsoft Azure (June 2015)
Accelerating your Research with Microsoft Azure (June 2015)Accelerating your Research with Microsoft Azure (June 2015)
Accelerating your Research with Microsoft Azure (June 2015)Microsoft Azure for Research
 
Keynote IEEE International Workshop on Cloud Analytics. Dennis Gannon
Keynote IEEE International Workshop on Cloud Analytics. Dennis  GannonKeynote IEEE International Workshop on Cloud Analytics. Dennis  Gannon
Keynote IEEE International Workshop on Cloud Analytics. Dennis GannonMicrosoft Azure for Research
 
e-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right jobe-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right jobDavid Wallom
 
Doing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis GannonDoing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis GannonMicrosoft Azure for Research
 
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...ariadnenetwork
 
Genomics Applications in the Cloud with the DNAnexus Platform
Genomics Applications in the Cloud with the DNAnexus PlatformGenomics Applications in the Cloud with the DNAnexus Platform
Genomics Applications in the Cloud with the DNAnexus Platformkislyuk
 
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementD4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementBlue BRIDGE
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Sarah Anna Stewart
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 
Low cost robotic tape library systems Using Open source Technology
Low cost robotic tape library systems Using Open source TechnologyLow cost robotic tape library systems Using Open source Technology
Low cost robotic tape library systems Using Open source TechnologyAfrica Open Science & Hardware
 
Science DMZ
Science DMZScience DMZ
Science DMZJisc
 
IDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on CloudIDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on Cloudstratuslab
 

Mais procurados (20)

Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
 
Accelerating your Research with Microsoft Azure (June 2015)
Accelerating your Research with Microsoft Azure (June 2015)Accelerating your Research with Microsoft Azure (June 2015)
Accelerating your Research with Microsoft Azure (June 2015)
 
Keynote IEEE International Workshop on Cloud Analytics. Dennis Gannon
Keynote IEEE International Workshop on Cloud Analytics. Dennis  GannonKeynote IEEE International Workshop on Cloud Analytics. Dennis  Gannon
Keynote IEEE International Workshop on Cloud Analytics. Dennis Gannon
 
A4 r overview deck_1.7
A4 r overview deck_1.7A4 r overview deck_1.7
A4 r overview deck_1.7
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
ieee cloud 2015 keynote talk
ieee cloud 2015 keynote talkieee cloud 2015 keynote talk
ieee cloud 2015 keynote talk
 
e-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right jobe-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right job
 
Doing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis GannonDoing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis Gannon
 
Reproducible Research and the Cloud
Reproducible Research and the CloudReproducible Research and the Cloud
Reproducible Research and the Cloud
 
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl...
 
Ariadne: Lifecycles
Ariadne: LifecyclesAriadne: Lifecycles
Ariadne: Lifecycles
 
Genomics Applications in the Cloud with the DNAnexus Platform
Genomics Applications in the Cloud with the DNAnexus PlatformGenomics Applications in the Cloud with the DNAnexus Platform
Genomics Applications in the Cloud with the DNAnexus Platform
 
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data ManagementD4Science Data Infrastructure - Facilitator for a FAIR Data Management
D4Science Data Infrastructure - Facilitator for a FAIR Data Management
 
containers2016
containers2016containers2016
containers2016
 
Accelerating your research with Microsoft Azure
Accelerating your research with Microsoft AzureAccelerating your research with Microsoft Azure
Accelerating your research with Microsoft Azure
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an example
 
Low cost robotic tape library systems Using Open source Technology
Low cost robotic tape library systems Using Open source TechnologyLow cost robotic tape library systems Using Open source Technology
Low cost robotic tape library systems Using Open source Technology
 
Science DMZ
Science DMZScience DMZ
Science DMZ
 
IDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on CloudIDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on Cloud
 

Destaque

Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Daniel Nüst
 
Big data mita se on 10 casea
Big data mita se on 10 caseaBig data mita se on 10 casea
Big data mita se on 10 caseaASML
 
Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science MeetupDaniel Nüst
 
satllite image processing
satllite image processingsatllite image processing
satllite image processingavhadlaxmikant
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data WorldCloudera, Inc.
 
Geoscience satellite image processing
Geoscience satellite image processingGeoscience satellite image processing
Geoscience satellite image processinggaurav jain
 
satellite image processing
satellite image processingsatellite image processing
satellite image processingavhadlaxmikant
 
Satellite image Processing Seminar Report
Satellite image Processing Seminar ReportSatellite image Processing Seminar Report
Satellite image Processing Seminar Reportalok ray
 
Satellite image processing
Satellite image processingSatellite image processing
Satellite image processingalok ray
 
GIS presentation
GIS presentationGIS presentation
GIS presentationarniontech
 

Destaque (14)

Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...Containers for sensor web services, applications and research @ Sensor Web Co...
Containers for sensor web services, applications and research @ Sensor Web Co...
 
Big data -strategia
Big data  -strategiaBig data  -strategia
Big data -strategia
 
Big data mita se on 10 casea
Big data mita se on 10 caseaBig data mita se on 10 casea
Big data mita se on 10 casea
 
Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science Meetup
 
satllite image processing
satllite image processingsatllite image processing
satllite image processing
 
New sources of big data for precision medicine: are we ready?
New sources of big data for precision medicine: are we ready?New sources of big data for precision medicine: are we ready?
New sources of big data for precision medicine: are we ready?
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data World
 
Geoscience satellite image processing
Geoscience satellite image processingGeoscience satellite image processing
Geoscience satellite image processing
 
satellite image processing
satellite image processingsatellite image processing
satellite image processing
 
Satellite image Processing Seminar Report
Satellite image Processing Seminar ReportSatellite image Processing Seminar Report
Satellite image Processing Seminar Report
 
Satellite image processing
Satellite image processingSatellite image processing
Satellite image processing
 
Big Data In Medicine
Big Data In Medicine Big Data In Medicine
Big Data In Medicine
 
GIS presentation
GIS presentationGIS presentation
GIS presentation
 
Image processing ppt
Image processing pptImage processing ppt
Image processing ppt
 

Semelhante a Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
e-infrastructural needs to support informatics
e-infrastructural needs to support informaticse-infrastructural needs to support informatics
e-infrastructural needs to support informaticsDavid Wallom
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...David Wallom
 
Cloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talkCloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talkKetan Paranjape
 
eROSA Stakeholder WS1: EOSC Architecture
eROSA Stakeholder WS1: EOSC ArchitectureeROSA Stakeholder WS1: EOSC Architecture
eROSA Stakeholder WS1: EOSC Architecturee-ROSA
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchYehia El-khatib
 
Australian Ecosystems Science Cloud
Australian Ecosystems Science CloudAustralian Ecosystems Science Cloud
Australian Ecosystems Science CloudTERN Australia
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
Desktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'OmicsDesktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'OmicsDavid Wallom
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Blue BRIDGE
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sangerChris Dwan
 
Synergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the CloudSynergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the CloudCitrix
 
Climb stateoftheartintro
Climb stateoftheartintroClimb stateoftheartintro
Climb stateoftheartintrothomasrconnor
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloudmyGrid team
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it worldChris Dwan
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchTom Connor
 
10th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v210th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v2Alex Hardisty
 

Semelhante a Analyzing Big Data in Medicine with Virtual Research Environments and Microservices (20)

Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
e-infrastructural needs to support informatics
e-infrastructural needs to support informaticse-infrastructural needs to support informatics
e-infrastructural needs to support informatics
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 
Cloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talkCloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talk
 
eROSA Stakeholder WS1: EOSC Architecture
eROSA Stakeholder WS1: EOSC ArchitectureeROSA Stakeholder WS1: EOSC Architecture
eROSA Stakeholder WS1: EOSC Architecture
 
EGI Services
EGI Services EGI Services
EGI Services
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific Research
 
Australian Ecosystems Science Cloud
Australian Ecosystems Science CloudAustralian Ecosystems Science Cloud
Australian Ecosystems Science Cloud
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Desktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'OmicsDesktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'Omics
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
Cyverse: Extensible Cyberinfrastructure for Life Science
Cyverse: Extensible Cyberinfrastructure for Life ScienceCyverse: Extensible Cyberinfrastructure for Life Science
Cyverse: Extensible Cyberinfrastructure for Life Science
 
Synergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the CloudSynergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the Cloud
 
Climb stateoftheartintro
Climb stateoftheartintroClimb stateoftheartintro
Climb stateoftheartintro
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloud
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
 
10th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v210th e concertation-brussels-06march2013-v2
10th e concertation-brussels-06march2013-v2
 

Mais de Ola Spjuth

Automating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AIAutomating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AIOla Spjuth
 
Towards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imagingTowards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imagingOla Spjuth
 
Towards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsTowards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsOla Spjuth
 
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression DatasetsCombining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression DatasetsOla Spjuth
 
Building an informatics solution to sustain AI-guided cell profiling with hig...
Building an informatics solution to sustain AI-guided cell profiling with hig...Building an informatics solution to sustain AI-guided cell profiling with hig...
Building an informatics solution to sustain AI-guided cell profiling with hig...Ola Spjuth
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Ola Spjuth
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesOla Spjuth
 
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in SwedenStorage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in SwedenOla Spjuth
 
Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryOla Spjuth
 
Enabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceEnabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceOla Spjuth
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Ola Spjuth
 
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)Ola Spjuth
 
Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...Ola Spjuth
 
Accessing and scripting CDK from Bioclipse
Accessing and scripting CDK from BioclipseAccessing and scripting CDK from Bioclipse
Accessing and scripting CDK from BioclipseOla Spjuth
 

Mais de Ola Spjuth (14)

Automating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AIAutomating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AI
 
Towards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imagingTowards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imaging
 
Towards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsTowards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery Labs
 
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets

Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

  • 1. Analyzing Big Data in Medicine with Virtual Research Environments and Microservices Ola Spjuth <ola.spjuth@farmbio.uu.se> Department of Pharmaceutical Biosciences Science for Life Laboratory Uppsala University
  • 2. Today: We have access to high-throughput technologies to study biological phenomena
  • 3. New challenges: Data management and analysis • Storage • Analysis methods, pipelines • Scaling • Automation • Data integration, security • Predictions • …
  • 4. European Open Science Cloud (EOSC) • The vast majority of all data in the world (in fact up to 90%) has been generated in the last two years. • Scientific data is in direct need of openness, better handling, careful management, machine actionability and sheer re-use. • European Open Science Cloud: A vision of a future infrastructure to support Open Research Data and Open Science in Europe – It should enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders – research data should be findable, accessible, interoperable and re-usable (FAIR) – provide the means to analyze datasets of huge sizes http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
  • 5. Contemporary Big Data analysis in bioinformatics • High-Performance Computing with shared storage – Linux, Terminal, batch queue • Problems/challenges – Access to resources is limited – Dependency management for tools is cumbersome, need help from system administrators to install software – Privacy-related issues – Difficult to share/integrate data – Accessibility issues • A common approach: Internet-based services – Retrieve data – Analysis tools
  • 7. Service-Oriented Architectures (SOA) in the life sciences • Standardize – Agree on e.g. interfaces, data formats, protocols etc. • Decompose and compartmentalize – Experts (scientists) should provide services – do one thing and do it well – Achieve interoperability by exposing data and tools as Web services • Integrate – Users should access and integrate remote services API Scientist service Scientist consume
  • 8. Service-Oriented Architectures (SOA) in the life sciences, ~2005 Scientist downtime API changed Not maintained Difficult to sustain, unreliable solutions API API API
  • 9. Cloud Computing • Cloud computing offers advantages over contemporary e-infrastructures in the life sciences – On-demand elastic resources and services – No up-front costs, pay-per-use • A lot of businesses (and software development) moving into the cloud – Vibrant ecosystem of frameworks and tools, including for big data • High potential for science
  • 10. Virtual Machines and Containers Virtual machines • Package entire systems (heavy) • Completely isolated • Suitable in cloud environments Containers • Share OS • Smaller, faster, portable • Docker!
  • 11. MicroServices • Similar to Web services: Decompose functionality into smaller, loosely coupled services communicating via API – “Do one thing and do it well” • Preferably smaller, light-weight and fast to instantiate on demand • Easy to replace, language-agnostic – Suitable for loosely coupled teams (which we have in science) – Portable - easy to deploy and scale – Maximize agility for developers • Suitable to deploy as containers in cloud environments
  • 15. Kubernetes: Orchestrating containers • Origin: Google • A declarative language for launching containers • Start, stop, update, and manage a cluster of machines running containers in a consistent and maintainable way • Suitable for microservices Containers Scheduled and packed containers on nodes
  • 16. Virtual Research Environment (VRE) • Virtual (online) environments for research – Easy and user-friendly access to computational resources, tools and data, commonly for a scientific domain • Multi-tenant VRE – log into shared system • Private VRE – Deploy on your favorite cloud provider
  • 17. • Horizon 2020-project, €8 M, 2015-2018 – “standardized e-infrastructure for the processing, analysis and information-mining of the massive amount of medical molecular phenotyping and genotyping data generated by metabolomics applications.” • Enable users to provision their own virtual infrastructure (VRE) – Public cloud, private cloud, local servers – Easy access to compatible tools exposed as microservices – Will in minutes set up and configure a complete data-center (compute nodes, storage, networks, DNS, firewall etc) – Can achieve high-availability, scalability and fault tolerance • Use modern and established tools and frameworks supported by industry – Reduce risk and improve sustainability • Offer an agile and scalable environment to use, and a straightforward platform to extend http://phenomenal-h2020.eu/
  • 18. Users should not see this…
  • 19.
  • 20. Deployment and user access Launch on reference installation Launch on public cloud Private VRE
  • 21. In-house deployment scenarios MRC-NIHR Phenome Centre • Medium-sized IT-infrastructure • Dedicated IT-personnel • Users: ICL staff Hospital environment • Dedicated server • No IT-personnel • User: Clinical researcher Private VRE
  • 22. Build and test tools, images, infrastructure Docker Hub PhenoMeNal Jenkins PhenoMeNal Container Hub Development: Container lifecycle Source code repositories
  • 23. Two proofs of concept so far Kultima group Pablo Moreno
  • 24. Implications • Improve sustainability – Not dependent on specific data centers • Improve reliability and security – Users can run their own service environments (VREs) within isolated environments – High-availability and fault tolerance • Scalability – Deploy in elastic environments • Agile development – Automate “from develop to deploy” • Agile science – Simple access to discoverable, scalable tools on elastic compute resources with no up-front costs • NB: Many problems of interoperability remain! – Data – APIs – etc.
  • 25. Ongoing research on VREs Data federation Compute federation Privacy preservation Workflows Big Data frameworks Data management and modeling
  • 26. Acknowledgements Wesley Schaal Jonathan Alvarsson Staffan Arvidsson Arvid Berg Samuel Lampa Marco Capuccini Martin Dahlö Valentin Georgiev Anders Larsson Polina Georgiev Maris Lapins AstraZeneca Lars Carlsson Ernst Ahlberg University Vienna David Kreil Maciej Kańduła SNIC Science Cloud Andreas Hellander Salman Toor Caramba.clinic Kim Kultima Stephanie Herman Payam Emami ToxHQ team Barry Hardy Thomas Exner Joh Dokler Daniel Bachler
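
The microservice idea from slide 11 ("do one thing and do it well", exposed via an API) can be illustrated with a minimal sketch. This is not code from the talk or from PhenoMeNal: the `/mean` endpoint, the JSON payload shape and all function names are assumptions made up for illustration, and only the Python standard library is used.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def compute_mean(values):
    """The single task this hypothetical service performs."""
    return sum(values) / len(values)


class MeanHandler(BaseHTTPRequestHandler):
    """Exposes compute_mean over HTTP at the (assumed) endpoint POST /mean."""

    def do_POST(self):
        if self.path != "/mean":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"mean": compute_mean(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress per-request logging in this sketch


def start_service(port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MeanHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_mean(server, values):
    """Client side: any language with an HTTP library could do the same."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/mean",
            body=json.dumps({"values": values}),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())["mean"]
    finally:
        conn.close()
```

Because the client only depends on the HTTP interface, the service behind it could be replaced by an implementation in another language, which is the "easy to replace, language-agnostic" property the slide refers to.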

Editor's Notes

  1. Idea with SOA (~2005): Achieve interoperability by exposing data and functionality as Web services. Experts (scientists) should set up and host their own Web services. Users should integrate a multitude of distributed services, connect them into workflows (e.g. Taverna), and share (parts of) workflows. What happened? Users could not rely on Web services (downtime, API changes, abandoned services), and the services could not be mirrored. Workflows never gained widespread popularity. Today, stable Web services mainly remain at large data and tool providers (EBI, NCBI etc.).
  2. Drop applications into VMs running Docker in different clouds.
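
Slide 15 describes Kubernetes as a declarative way of launching containers. As a rough sketch of what "declarative" means here, the following builds a Deployment description as plain data. The service name and image reference are hypothetical placeholders, not tools from the talk; the field names follow the Kubernetes `apps/v1` Deployment schema.

```python
import json


def deployment_manifest(name, image, replicas=2, port=8080):
    """Describe the desired state: how many copies of which container image."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration.
    manifest = deployment_manifest("mean-service", "example.org/mean-service:0.1")
    print(json.dumps(manifest, indent=2))
```

The printed JSON states what should be running rather than the steps to start it; saved to a file, a manifest of this shape could be submitted with `kubectl apply -f`, after which the cluster schedules the containers onto nodes.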