SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
Omics in extreme Environments
(Lightweight bioinformatics)!
Joe Parker"
Royal Botanic Gardens, Kew"
Compute time is (much) cheaper than you think"
"… and much cheaper than your time."
Physical portability requires software portability."
Kew"
One of the largest living and tissue collections in the
world: ca ~6000 genera (~1/3 plant genera)"
2020 Strateigic Output: Plant And Fungal Trees Of Life!
Why in the field"
•  Spatial analysis"
•  ID & naming"
•  Image recognition"
Why in the field"
The ‘micro’ computer: Raspi"
•  Low: cost, energy (& power)!
•  Highly portable"
•  Hackable form-factor"
Laptops"
•  Portable"
•  Very costly form-factor"
•  Maté? Beer?"
Clusters"
•  Not portable,
setup costs"
The cloud"
•  Power closely linked to
budget (as limited as)"
•  Almost infinitely
scalable"
•  Have to have a
connection to get data
up there (and down!)"
•  Fiddly setup"
Comparison"
System
Arch
CPU type,
clock GHz
cores
RAM Gb /
MHz /
type
HDD Gb
Pandanus i686
Xeon E5620
@ 2.4
4 33
1000
@ SATA
Raspberry Pi
2 B+
ARM
ARMv7
@ 1.0
1 1
8
@ flash card
Macbook Pro
(2011)
x64
Core i7
@ 2.2
4 8
250
@ SSD
EC2
m4.10xlarge
x64
Xeon E5
@ 2.4
40 160
320
@ SSD
MidPlus x64
Westmere
@ 2.8
2500+ 24 - 512
2x320
@ SSD
Workflow"
Setup
BLAST
2.2.30
CEGMA
genes
Short reads
Concatenate hits to
CEGMA alignments
Muscle
3.8.31
RAxML
7.2.8+
Set up workflow, binaries, and reference /
alignment data.
Deploy to machines.
Protein-protein blast reads (from MG-
RAST repository, Bass Strait oil field)
against 458 core eukaryote genes from
CEGMA. Keep only top hits. Use max.
num_threads available.
Append top hit sequences to CEGMA
alignments.
For each:
Align in MUSCLE using default
parameters
Infer de novo phylogeny in RAxML
under Dayhoff, random starting tree
and max. PTHREADS.
Output and parse times.
Results"
Platform! Hardware capital! Data! Running!
Pi (new) " £26" NA" NA"
MBP" ~£2000" NA" NA"
AWS (M4.10xlarge)" 0"
~£1/Mb (BGAN); £1/day (Virgin
mobile-only tariff)"
£1.78 /hr"
AWS (t2.micro)" 0"
~£1/Mb (BGAN); £1/day (Virgin
mobile-only tariff)"
£0.01/hr"
log(time)(user,s)
log(Number of queries)
Overall performance"
Raspi in practice"
•  Stability"
•  ARM not x86
architecture"
•  2 GB RAM… "
The cloud in practice"
•  Fiddly setup, easy to
replicate"
•  Need a connection to
get data up there (and
down!)"
Conclusions"
•  Pi opportunities but not there yet, also you’ll
still need a connection unless you’re very
lucky.. "
•  Installation in situ?"
•  Consider cloud computing (connections can
only improve)"
•  Portability of the workflow enhances
portability of the system!
–  …which you should be embracing anyway for
reproducibility…"
Thanks"
!Kew!
!Matt Blissett, Abigail Barker, Rob Turner!
Others!
!Daniel Barker (4273π)"
!Tim Booth (BioLinux)"
"Alexandros Stamatakis (RAxML)"

Mais conteúdo relacionado

Destaque

Using field-based DNA sequencing to accelerate phylogenomics
Using field-based DNA sequencing to accelerate phylogenomicsUsing field-based DNA sequencing to accelerate phylogenomics
Using field-based DNA sequencing to accelerate phylogenomicsJoe Parker
 
Reframing Phylogenomics
Reframing PhylogenomicsReframing Phylogenomics
Reframing PhylogenomicsJoe Parker
 
Joe parker-benchmarking-bioinformatics
Joe parker-benchmarking-bioinformaticsJoe parker-benchmarking-bioinformatics
Joe parker-benchmarking-bioinformaticsJoe Parker
 
Real-time Phylogenomics: Joe Parker
Real-time Phylogenomics: Joe ParkerReal-time Phylogenomics: Joe Parker
Real-time Phylogenomics: Joe ParkerJoe Parker
 
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...Joe Parker
 
Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Joe Parker
 

Destaque (6)

Using field-based DNA sequencing to accelerate phylogenomics
Using field-based DNA sequencing to accelerate phylogenomicsUsing field-based DNA sequencing to accelerate phylogenomics
Using field-based DNA sequencing to accelerate phylogenomics
 
Reframing Phylogenomics
Reframing PhylogenomicsReframing Phylogenomics
Reframing Phylogenomics
 
Joe parker-benchmarking-bioinformatics
Joe parker-benchmarking-bioinformaticsJoe parker-benchmarking-bioinformatics
Joe parker-benchmarking-bioinformatics
 
Real-time Phylogenomics: Joe Parker
Real-time Phylogenomics: Joe ParkerReal-time Phylogenomics: Joe Parker
Real-time Phylogenomics: Joe Parker
 
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...
Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology appl...
 
Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...
 

Semelhante a 'Omics in extreme Environments (Lightweight bioinformatics)

Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004xlight
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataZhong Wang
 
Future Architectures for genomics
Future Architectures for genomicsFuture Architectures for genomics
Future Architectures for genomicsGuy Coates
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015Chris Fregly
 
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Spark Summit
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009Ian Foster
 
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC ClustersSR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC ClustersGlenn K. Lockwood
 
OpenNebulaconf2017US: Rapid scaling of research computing to over 70,000 cor...
OpenNebulaconf2017US:  Rapid scaling of research computing to over 70,000 cor...OpenNebulaconf2017US:  Rapid scaling of research computing to over 70,000 cor...
OpenNebulaconf2017US: Rapid scaling of research computing to over 70,000 cor...OpenNebula Project
 
Computational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysisComputational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysiscursoNGS
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsYasin Memari
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successErick Ramirez
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph Community
 
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...inside-BigData.com
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASHGanesan Narayanasamy
 
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Amazon Web Services
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBSHanson Dong
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 

Semelhante a 'Omics in extreme Environments (Lightweight bioinformatics) (20)

Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
BioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing dataBioPig for scalable analysis of big sequencing data
BioPig for scalable analysis of big sequencing data
 
Future Architectures for genomics
Future Architectures for genomicsFuture Architectures for genomics
Future Architectures for genomics
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015
 
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009
 
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC ClustersSR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters
 
OpenNebulaconf2017US: Rapid scaling of research computing to over 70,000 cor...
OpenNebulaconf2017US:  Rapid scaling of research computing to over 70,000 cor...OpenNebulaconf2017US:  Rapid scaling of research computing to over 70,000 cor...
OpenNebulaconf2017US: Rapid scaling of research computing to over 70,000 cor...
 
Computational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysisComputational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysis
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for success
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICR
 
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASH
 
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 

Último

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxabhishekdhamu51
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 

Último (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 

'Omics in extreme Environments (Lightweight bioinformatics)

  • 1. Omics in extreme Environments (Lightweight bioinformatics)! Joe Parker" Royal Botanic Gardens, Kew"
  • 2. Compute time is (much) cheaper than you think" "… and much cheaper than your time." Physical portability requires software portability."
  • 3. Kew" One of the largest living and tissue collections in the world: ca ~6000 genera (~1/3 plant genera)" 2020 Strateigic Output: Plant And Fungal Trees Of Life!
  • 4. Why in the field" •  Spatial analysis" •  ID & naming" •  Image recognition"
  • 5. Why in the field"
  • 6. The ‘micro’ computer: Raspi" •  Low: cost, energy (& power)! •  Highly portable" •  Hackable form-factor"
  • 7. Laptops" •  Portable" •  Very costly form-factor" •  Maté? Beer?"
  • 9. The cloud" •  Power closely linked to budget (as limited as)" •  Almost infinitely scalable" •  Have to have a connection to get data up there (and down!)" •  Fiddly setup"
  • 10. Comparison" System Arch CPU type, clock GHz cores RAM Gb / MHz / type HDD Gb Pandanus i686 Xeon E5620 @ 2.4 4 33 1000 @ SATA Raspberry Pi 2 B+ ARM ARMv7 @ 1.0 1 1 8 @ flash card Macbook Pro (2011) x64 Core i7 @ 2.2 4 8 250 @ SSD EC2 m4.10xlarge x64 Xeon E5 @ 2.4 40 160 320 @ SSD MidPlus x64 Westmere @ 2.8 2500+ 24 - 512 2x320 @ SSD
  • 11. Workflow" Setup BLAST 2.2.30 CEGMA genes Short reads Concatenate hits to CEGMA alignments Muscle 3.8.31 RAxML 7.2.8+ Set up workflow, binaries, and reference / alignment data. Deploy to machines. Protein-protein blast reads (from MG- RAST repository, Bass Strait oil field) against 458 core eukaryote genes from CEGMA. Keep only top hits. Use max. num_threads available. Append top hit sequences to CEGMA alignments. For each: Align in MUSCLE using default parameters Infer de novo phylogeny in RAxML under Dayhoff, random starting tree and max. PTHREADS. Output and parse times.
  • 12. Results" Platform! Hardware capital! Data! Running! Pi (new) " £26" NA" NA" MBP" ~£2000" NA" NA" AWS (M4.10xlarge)" 0" ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff)" £1.78 /hr" AWS (t2.micro)" 0" ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff)" £0.01/hr" log(time)(user,s) log(Number of queries)
  • 14. Raspi in practice" •  Stability" •  ARM not x86 architecture" •  2 GB RAM… "
  • 15. The cloud in practice" •  Fiddly setup, easy to replicate" •  Need a connection to get data up there (and down!)"
  • 16. Conclusions" •  Pi opportunities but not there yet, also you’ll still need a connection unless you’re very lucky.. " •  Installation in situ?" •  Consider cloud computing (connections can only improve)" •  Portability of the workflow enhances portability of the system! –  …which you should be embracing anyway for reproducibility…"
  • 17. Thanks" !Kew! !Matt Blissett, Abigail Barker, Rob Turner! Others! !Daniel Barker (4273π)" !Tim Booth (BioLinux)" "Alexandros Stamatakis (RAxML)"