SlideShare uma empresa Scribd logo
1 de 14
Baixar para ler offline
Cloud BioLinux: open source, fully-customizable
 bioinformatics computing on the cloud for the
       genomics community and beyond

          BOSC 2011 - Vienna, Austria



                  Ntino Krampis, PhD
                     Asst. Professor
            J. Craig Venter Institute (JCVI)
                 agbiotec@gmail.com
Expensive sequencing and large organizations
                   Commodity sequencing and small labs

●
    large sequencing center, multi-million, broad-impact sequencing projects
●   dedicated bioinformatics department, large Sun Grid Engine cluster


●   small-factor, bench-top sequencer available: GS Junior by 454
●   sequencing as a standard technique in basic biology and genetics research
●   RNAseq and ChiPseq, and each biologist will be tackling a metagenome
Will small labs become the long tail of sequencing ?




   amount of
   sequencing         Credit: WikiMedia Commons




                  number of labs
“Bioinformatics nation is a land of city-states” Lincoln Stein

●   small labs building small-scale bioinformatics infrastructures
●   duplication of effort in compiling and installing software tools
●   some labs have no hardware, expertise, or time to install and run software

●   NEBC BioLinux ( tinyurl.com/BioLinux-NEBC ) 100+ pre-configured tools
●   example: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS



    how about large-scale sequence datasets ?
Cloud BioLinux
      pre-configured and on-demand bioinformatics computing on the cloud



                        ●   JCVI cloud computing research
                        ●   NEBC bioinformatics software repository
        +               ●   community effort – Hackathon / BOSC 2010 - 11
                        ●   pre-configured Virtual Machine (VM, image)

                        ● large-scale computing independently of institutional or
                        geographic boundaries
        =               ●   only need a desktop computer with internet access




cloudbiolinux.org
Cloud BioLinux
                 simple for end-users


                                                    signup at

                                                aws.amazon.com
                                                      then
                                             aws.amazon.com/console
                                                      and




http://tinyurl.com/cloud-biolinux-tutorial
Amazon EC2
→
linux desktop
via remote
desktop client
What if I want to
    share my
alignments with
a collaborator?

save your data as
   a new VM

  0.10$ / GB /
     month

at 15GB, it costs
  1.5$ / month
“whole system snapshot exchange” (Dudley and Butte 2010)
capture the state of the computing system and data
software execution parameters and “massaged” input datasets
Cloud BioLinux developer's framework
        create cloud VM / images with standardized software configurations



●   customize Cloud BioLinux based on community requirements

●   mix and match software from NEBC or other (DebianMed, Scientific Linux etc.)

●   share customized VMs with collaborators, avoiding effort duplication

●   deploy Cloud BioLinux on private and local clouds
Cloud BioLinux developer's framework

     ●   based on python-fabric auto-deployment tool

     ●   software components listed in plain text files

     ●   collaborators use files to share descriptions of cloud VM / images

     ●   start with a bare-bones VM / image

     ●   fabric downloads and installs specified software




tinyurl.com/python-fabric        open.eucalyptus.com
software domains in bioinformatics: nextgen
sequencing, de novo assembly, annotation, phylogeny,
    molecular structures, gene expression analysis


        github.com/chapmanb/cloudbiolinux
Cloud Biolinux
                                  The future


●   expand community, receive feedback, add more software to the VM

●   groups.google.com/cloudbiolinux and cloudbiolinux.org

●   add data analysis pipelines that are used by sequencing centers

●   actively seeking funding to put major effort in development

●   2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/

●
Acknowledgments & Credits

Brad Chapman     - development of the fabric scripts and community organizer
Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari – BioLinux 6.0
J. Craig Venter Inst. - time allowed to work on an open-source project
D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
Members of the Cloud Biolinux community – precious development time


       Thank you !

Mais conteúdo relacionado

Semelhante a F02-Cloud-Cloud BioLinux

CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018Krishna-Kumar
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding KubernetesTu Pham
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectPatrick Chanezon
 
Drilett aws vpc_presentation_shared
Drilett aws vpc_presentation_sharedDrilett aws vpc_presentation_shared
Drilett aws vpc_presentation_sharedDavid Rilett
 
OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...
OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...
OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...OpenNebula Project
 
Federating Infrastructure as a Service cloud computing systems to create a un...
Federating Infrastructure as a Service cloud computing systems to create a un...Federating Infrastructure as a Service cloud computing systems to create a un...
Federating Infrastructure as a Service cloud computing systems to create a un...David Wallom
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systemsSri Prasanna
 
Raspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflowRaspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflow霈萱 蔡
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...PranavPatil822557
 
Cloud computing and bioinformatics
Cloud computing and bioinformaticsCloud computing and bioinformatics
Cloud computing and bioinformaticsEnis Afgan
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.pptVipin Singhal
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.pptgeminass1
 
Towards-cloud-native-HPC.pdf
Towards-cloud-native-HPC.pdfTowards-cloud-native-HPC.pdf
Towards-cloud-native-HPC.pdfWalid Shaari
 
Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024
Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024
Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024Univention GmbH
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28korusamol
 
Google Cloud Platform and Kubernetes
Google Cloud Platform and KubernetesGoogle Cloud Platform and Kubernetes
Google Cloud Platform and KubernetesKasper Nissen
 

Semelhante a F02-Cloud-Cloud BioLinux (20)

Cloud ntino-krampis
Cloud ntino-krampisCloud ntino-krampis
Cloud ntino-krampis
 
CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018CNCF Introduction - Feb 2018
CNCF Introduction - Feb 2018
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby project
 
Drilett aws vpc_presentation_shared
Drilett aws vpc_presentation_sharedDrilett aws vpc_presentation_shared
Drilett aws vpc_presentation_shared
 
OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...
OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...
OpenNebula TechDay Boston 2015 - Bringing Private Cloud Computing to HPC and ...
 
Federating Infrastructure as a Service cloud computing systems to create a un...
Federating Infrastructure as a Service cloud computing systems to create a un...Federating Infrastructure as a Service cloud computing systems to create a un...
Federating Infrastructure as a Service cloud computing systems to create a un...
 
Cloud computing: highlights
Cloud computing: highlightsCloud computing: highlights
Cloud computing: highlights
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
Raspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflowRaspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflow
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
 
Cloud computing and bioinformatics
Cloud computing and bioinformaticsCloud computing and bioinformatics
Cloud computing and bioinformatics
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 
Towards-cloud-native-HPC.pdf
Towards-cloud-native-HPC.pdfTowards-cloud-native-HPC.pdf
Towards-cloud-native-HPC.pdf
 
Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024
Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024
Univention IAM and Portal for Kubernetes - Ingo Steuwer - Univention Summit 2024
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Google Cloud Platform and Kubernetes
Google Cloud Platform and KubernetesGoogle Cloud Platform and Kubernetes
Google Cloud Platform and Kubernetes
 

Mais de Bioinformatics Open Source Conference

Mais de Bioinformatics Open Source Conference (20)

Running workflows through galaxy bosc presentation
Running workflows through galaxy bosc presentationRunning workflows through galaxy bosc presentation
Running workflows through galaxy bosc presentation
 
Talk1 ben sadi for_gmod_bosc_2011
Talk1 ben sadi for_gmod_bosc_2011Talk1 ben sadi for_gmod_bosc_2011
Talk1 ben sadi for_gmod_bosc_2011
 
Bosc mercer
Bosc mercerBosc mercer
Bosc mercer
 
Mobyle 1 0_new_features_new_types_of_service
Mobyle 1 0_new_features_new_types_of_serviceMobyle 1 0_new_features_new_types_of_service
Mobyle 1 0_new_features_new_types_of_service
 
Bosc2011 arakawa
Bosc2011 arakawaBosc2011 arakawa
Bosc2011 arakawa
 
Bosc2011 isobar-fbp
Bosc2011 isobar-fbpBosc2011 isobar-fbp
Bosc2011 isobar-fbp
 
Talk6 biopython bosc2011
Talk6 biopython bosc2011Talk6 biopython bosc2011
Talk6 biopython bosc2011
 
Unipro ugene bosc 2011 update
Unipro ugene bosc 2011 updateUnipro ugene bosc 2011 update
Unipro ugene bosc 2011 update
 
Bosc talk 7-15-2011x
Bosc talk 7-15-2011xBosc talk 7-15-2011x
Bosc talk 7-15-2011x
 
B07-GenomeContent-Biomart
B07-GenomeContent-BiomartB07-GenomeContent-Biomart
B07-GenomeContent-Biomart
 
B03-GenomeContent-Intermine
B03-GenomeContent-IntermineB03-GenomeContent-Intermine
B03-GenomeContent-Intermine
 
G03-SemanticWeb-OntoCAT
G03-SemanticWeb-OntoCATG03-SemanticWeb-OntoCAT
G03-SemanticWeb-OntoCAT
 
F06-Cloud-Enabling NGS
F06-Cloud-Enabling NGSF06-Cloud-Enabling NGS
F06-Cloud-Enabling NGS
 
D03-NextGen-Bio-NGS
D03-NextGen-Bio-NGSD03-NextGen-Bio-NGS
D03-NextGen-Bio-NGS
 
F07-Cloud-Hadoop-BAM
F07-Cloud-Hadoop-BAMF07-Cloud-Hadoop-BAM
F07-Cloud-Hadoop-BAM
 
C03-Visualization-Webapollo
C03-Visualization-WebapolloC03-Visualization-Webapollo
C03-Visualization-Webapollo
 
F01-Cloud-Mygene.info
F01-Cloud-Mygene.infoF01-Cloud-Mygene.info
F01-Cloud-Mygene.info
 
A01-Openness in knowledge-based systems
A01-Openness in knowledge-based systemsA01-Openness in knowledge-based systems
A01-Openness in knowledge-based systems
 
F03-Cloud-Obiwee
F03-Cloud-ObiweeF03-Cloud-Obiwee
F03-Cloud-Obiwee
 
F05-Cloud-Sequencescape
F05-Cloud-SequencescapeF05-Cloud-Sequencescape
F05-Cloud-Sequencescape
 

Último

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Último (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

F02-Cloud-Cloud BioLinux

  • 1. Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond BOSC 2011 - Vienna, Austria Ntino Krampis, PhD Asst. Professor J. Craig Venter Institute (JCVI) agbiotec@gmail.com
  • 2. Expensive sequencing and large organizations Commodity sequencing and small labs ● large sequencing center, multi-million, broad-impact sequencing projects ● dedicated bioinformatics department, large Sun Grid Engine cluster ● small-factor, bench-top sequencer available: GS Junior by 454 ● sequencing as a standard technique in basic biology and genetics research ● RNAseq and ChiPseq, and each biologist will be tackling a metagenome
  • 3. Will small labs become the long tail of sequencing ? amount of sequencing Credit: WikiMedia Commons number of labs
  • 4. “Bioinformatics nation is a land of city-states” Lincoln Stein ● small labs building small-scale bioinformatics infrastructures ● duplication of effort in compiling and installing software tools ● some labs have no hardware, expertise, or time to install and run software ● NEBC BioLinux ( tinyurl.com/BioLinux-NEBC ) 100+ pre-configured tools ● example: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS how about large-scale sequence datasets ?
  • 5. Cloud BioLinux pre-configured and on-demand bioinformatics computing on the cloud ● JCVI cloud computing research ● NEBC bioinformatics software repository + ● community effort – Hackathon / BOSC 2010 - 11 ● pre-configured Virtual Machine (VM, image) ● large-scale computing independently of institutional or geographic boundaries = ● only need a desktop computer with internet access cloudbiolinux.org
  • 6. Cloud BioLinux simple for end-users signup at aws.amazon.com then aws.amazon.com/console and http://tinyurl.com/cloud-biolinux-tutorial
  • 7. Amazon EC2 → linux desktop via remote desktop client
  • 8. What if I want to share my alignments with a collaborator? save your data as a new VM 0.10$ / GB / month at 15GB, it costs 1.5$ / month
  • 9. “whole system snapshot exchange” (Dudley and Butte 2010) capture the state of the computing system and data software execution parameters and “massaged” input datasets
  • 10. Cloud BioLinux developer's framework create cloud VM / images with standardized software configurations ● customize Cloud BioLinux based on community requirements ● mix and match software from NEBC or other (DebianMed, Scientific Linux etc.) ● share customized VMs with collaborators, avoiding effort duplication ● deploy Cloud BioLinux on private and local clouds
  • 11. Cloud BioLinux developer's framework ● based on python-fabric auto-deployment tool ● software components listed in plain text files ● collaborators use files to share descriptions of cloud VM / images ● start with a bare-bones VM / image ● fabric downloads and installs specified software tinyurl.com/python-fabric open.eucalyptus.com
  • 12. software domains in bioinformatics: nextgen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis github.com/chapmanb/cloudbiolinux
  • 13. Cloud Biolinux The future ● expand community, receive feedback, add more software to the VM ● groups.google.com/cloudbiolinux and cloudbiolinux.org ● add data analysis pipelines that are used by sequencing centers ● actively seeking funding to put major effort in development ● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/ ●
  • 14. Acknowledgments & Credits Brad Chapman - development of the fabric scripts and community organizer Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari – BioLinux 6.0 J. Craig Venter Inst. - time allowed to work on an open-source project D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop Members of the Cloud Biolinux community – precious development time Thank you !