SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
Why
computing
for genomics
research
sucks.
y.wurm@qmul.ac.uk
BaltiBio 2014-05-27
Example GenomicsTasks
Repetitiveness
“Disk” !
Input/Output
Memory
Duration
per task
Build 10,000 trees 10,000x low low short
Trim FASTQ files 40-400x high low short
One de novo
genome assembly
1 high high long
Many de novo
genome assemblies
20-1000x high high long
Determine which of
10 new tools that
promise X can
actually do X (once). !
“genome hacking”
1 depends depends depends
Traditional High Performance
Computing (HPC)
• Physics? Astronomy? Maths? Chemistry?
• Traditional HPC infrastructures are great at small tasks:
Repetitiveness
“Disk” !
Input/Output
Memory
Duration
per task
Build 10,000 trees 10,000x low low short
• And/or have mechanisms/tools that transform their challenges
into many small tasks.
“We have 9999 cores!” - central IT admin
but they are inadequate
Big Ass Servers
• e.g.: 1.5TB ram; 48 cores -
SSH into it and do whatever
you want.
Repetitiveness
“Disk” !
Input/Output
Memory
Duration
per task
Build 10,000 trees 10,000x low low short
Trim FASTQ files 40-400x high low short
One de novo genome
assembly
1 high high long
Many de novo genome
assemblies
20-1000x high high long
Determine which of 10
new tools that promise
X can actually do X
1 depends depends depends
Jeremy Leipzig
Additional challenges for biologists
• Datasets continue growing fast!
• Generally:
• We lack computational training.
• Bioinformatics tools suck (badly written, badly
tested, hard to install).
So what do we need?
• access to machines of all shapes and sizes
• big and small machines
• direct access via ssh (for hacking & doing things few times)
• indirect access via queue (for doing things many times)
• fast I/O - cheap archival.
• single login: all files “feel” like they’re in one place
Swiss Institute of
Bioinformatics:Vital-IT
So what do we need?
• access to machines of all shapes and sizes
• big and small machines
• direct access via ssh (for hacking & doing things few times)
• indirect access via queue (for doing things many times)
• fast I/O - cheap archival.
• single login; all files “feel” like they’re in one place
• easily changeable software & OS versions
Easily changeable OS & software versions
https://www.docker.io
>docker-switch bio-linux7
# do stuff
>docker-switch pacbio-assembly-vm
# do other stuff
>docker-switch antlab-ubuntu
# do more stuff
@bmpvieira
Easily changeable OS & software versions
https://www.docker.io
>docker-switch bio-linux7
# do stuff
>docker-switch pacbio-assembly-vm
# do other stuff
>docker-switch antlab-ubuntu
# do more stuff
FAKE
@bmpvieira
What if Apple/Google made an
idiot-proof cloud computing
system for genomics?
What if Apple/Google made an
idiot-proof cloud computing
system for genomics?
• Always on - single place to connect to:
ssh mylab.awskiller.co.uk
• Dropbox-like shared directories & file checksumming.
• Easily switchable OS version / “VM”.
• Automagically & transparently migrates:
• from small to huge machines (and back) as CPU and RAM
demands change.
What if Apple/Google made an
idiot-proof cloud computing
system for genomics?
• Always on - single place to connect to:
ssh mylab.awskiller.co.uk
• Dropbox-like shared directories & file checksumming.
• Easily switchable OS version / “VM”.
• Automagically & transparently migrates:
• from small to huge machines (and back) as CPU and RAM
demands change.
• from one physical site (huge dataset) to another
Summary
• Broad range of needs:!
• some similar to traditional HPC.!
• some very different!!
• Users are naive.!
• Tools are experimental.!
• Datasets are experimental.!
• IT people have difficulty understanding this.
• Do not trust them when they say things will just work!
!
• A lot of potential to make things not suck.
Evolutionary Genetics group
& Queen Mary U London
Bruno Vieira - @bmpvieira
Steve Moss - @gawbul
Anurag Priyam - @yeban
Richard Christie & ITS
Research Support team @
Queen Mary U London
Ioannis Xenarios & Vital-IT
team @ Swiss Institute of
Bioinformatics
http://yannick.poulet.orgy.wurm@qmul.ac.uk

Mais conteúdo relacionado

Mais de Yannick Wurm

2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomicsYannick Wurm
 
2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics researchYannick Wurm
 
2017 11-15-reproducible research
2017 11-15-reproducible research2017 11-15-reproducible research
2017 11-15-reproducible researchYannick Wurm
 
2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosomeYannick Wurm
 
2016 05-30-monday-assembly
2016 05-30-monday-assembly2016 05-30-monday-assembly
2016 05-30-monday-assemblyYannick Wurm
 
2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker badYannick Wurm
 
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...Yannick Wurm
 
2015 11-17-programming inr.key
2015 11-17-programming inr.key2015 11-17-programming inr.key
2015 11-17-programming inr.keyYannick Wurm
 
2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitchYannick Wurm
 
Week 5 genetic basis of evolution
Week 5   genetic basis of evolutionWeek 5   genetic basis of evolution
Week 5 genetic basis of evolutionYannick Wurm
 
Biol113 week4 evolution
Biol113 week4 evolutionBiol113 week4 evolution
Biol113 week4 evolutionYannick Wurm
 
2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible researchYannick Wurm
 
2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.keyYannick Wurm
 
2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcommYannick Wurm
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.keyYannick Wurm
 
2015 09-28 bio721 intro
2015 09-28 bio721 intro2015 09-28 bio721 intro
2015 09-28 bio721 introYannick Wurm
 
Sustainable software institute Collaboration workshop
Sustainable software institute Collaboration workshopSustainable software institute Collaboration workshop
Sustainable software institute Collaboration workshopYannick Wurm
 

Mais de Yannick Wurm (20)

2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics2018 09-03-ses open-fair_practices_in_evolutionary_genomics
2018 09-03-ses open-fair_practices_in_evolutionary_genomics
 
2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research2018 08-reduce risks of genomics research
2018 08-reduce risks of genomics research
 
2017 11-15-reproducible research
2017 11-15-reproducible research2017 11-15-reproducible research
2017 11-15-reproducible research
 
2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome2016 05-31-wurm-social-chromosome
2016 05-31-wurm-social-chromosome
 
2016 05-30-monday-assembly
2016 05-30-monday-assembly2016 05-30-monday-assembly
2016 05-30-monday-assembly
 
2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad2016 05-29-intro-sib-springschool-leuker bad
2016 05-29-intro-sib-springschool-leuker bad
 
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
2015 12-18- Avoid having to retract your genomics analysis - Popgroup Reprodu...
 
2015 11-17-programming inr.key
2015 11-17-programming inr.key2015 11-17-programming inr.key
2015 11-17-programming inr.key
 
2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch2015 11-10-bio-in-docker-oswitch
2015 11-10-bio-in-docker-oswitch
 
Week 5 genetic basis of evolution
Week 5   genetic basis of evolutionWeek 5   genetic basis of evolution
Week 5 genetic basis of evolution
 
Biol113 week4 evolution
Biol113 week4 evolutionBiol113 week4 evolution
Biol113 week4 evolution
 
Evolution week3
Evolution week3Evolution week3
Evolution week3
 
2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research
 
2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key2015 10-7-9am regex-functions-loops.key
2015 10-7-9am regex-functions-loops.key
 
Evolution week2
Evolution week2Evolution week2
Evolution week2
 
2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm2015 9-30-sbc361-research methcomm
2015 9-30-sbc361-research methcomm
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key
 
Sbc322 intro.key
Sbc322 intro.keySbc322 intro.key
Sbc322 intro.key
 
2015 09-28 bio721 intro
2015 09-28 bio721 intro2015 09-28 bio721 intro
2015 09-28 bio721 intro
 
Sustainable software institute Collaboration workshop
Sustainable software institute Collaboration workshopSustainable software institute Collaboration workshop
Sustainable software institute Collaboration workshop
 

Último

Point of Care Testing in clinical laboratory
Point of Care Testing in clinical laboratoryPoint of Care Testing in clinical laboratory
Point of Care Testing in clinical laboratoryoyebolasonuga14
 
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Hilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptxHilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptxhiredepot6
 
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...drmarathore
 
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制uodye
 
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证wpkuukw
 
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理uodye
 
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证wpkuukw
 
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一uodye
 
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一ougvy
 
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证ehyxf
 
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信oopacde
 
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证ehyxf
 
怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证
怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证
怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证tufbav
 
Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...samsungultra782445
 
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证ehyxf
 
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证wpkuukw
 

Último (20)

Point of Care Testing in clinical laboratory
Point of Care Testing in clinical laboratoryPoint of Care Testing in clinical laboratory
Point of Care Testing in clinical laboratory
 
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
 
In Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pills
In Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pillsIn Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pills
In Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pills
 
Hilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptxHilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptx
 
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
 
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
 
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
 
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
 
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
 
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
 
Abortion pills in Jeddah |+966572737505 | Get Cytotec
Abortion pills in Jeddah |+966572737505 | Get CytotecAbortion pills in Jeddah |+966572737505 | Get Cytotec
Abortion pills in Jeddah |+966572737505 | Get Cytotec
 
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
 
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
 
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
 
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
 
怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证
怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证
怎样办理维多利亚大学毕业证(UVic毕业证书)成绩单留信认证
 
Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Ratlam [ 7014168258 ] Call Me For Genuine Models We...
 
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
 
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
 
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
 

2014 05-27 - Opinion: Computing for genomics sucks.

  • 2. Example GenomicsTasks Repetitiveness “Disk” ! Input/Output Memory Duration per task Build 10,000 trees 10,000x low low short Trim FASTQ files 40-400x high low short One de novo genome assembly 1 high high long Many de novo genome assemblies 20-1000x high high long Determine which of 10 new tools that promise X can actually do X (once). ! “genome hacking” 1 depends depends depends
  • 3. Traditional High Performance Computing (HPC) • Physics? Astronomy? Maths? Chemistry? • Traditional HPC infrastructures are great at small tasks: Repetitiveness “Disk” ! Input/Output Memory Duration per task Build 10,000 trees 10,000x low low short • And/or have mechanisms/tools that transform their challenges into many small tasks.
  • 4. “We have 9999 cores!” - central IT admin but they are inadequate
  • 5. Big Ass Servers • e.g.: 1.5TB ram; 48 cores - SSH into it and do whatever you want. Repetitiveness “Disk” ! Input/Output Memory Duration per task Build 10,000 trees 10,000x low low short Trim FASTQ files 40-400x high low short One de novo genome assembly 1 high high long Many de novo genome assemblies 20-1000x high high long Determine which of 10 new tools that promise X can actually do X 1 depends depends depends Jeremy Leipzig
  • 6. Additional challenges for biologists • Datasets continue growing fast! • Generally: • We lack computational training. • Bioinformatics tools suck (badly written, badly tested, hard to install).
  • 7. So what do we need? • access to machines of all shapes and sizes • big and small machines • direct access via ssh (for hacking & doing things few times) • indirect access via queue (for doing things many times) • fast I/O - cheap archival. • single login: all files “feel” like they’re in one place
  • 9. So what do we need? • access to machines of all shapes and sizes • big and small machines • direct access via ssh (for hacking & doing things few times) • indirect access via queue (for doing things many times) • fast I/O - cheap archival. • single login; all files “feel” like they’re in one place • easily changeable software & OS versions
  • 10. Easily changeable OS & software versions https://www.docker.io >docker-switch bio-linux7 # do stuff >docker-switch pacbio-assembly-vm # do other stuff >docker-switch antlab-ubuntu # do more stuff @bmpvieira
  • 11. Easily changeable OS & software versions https://www.docker.io >docker-switch bio-linux7 # do stuff >docker-switch pacbio-assembly-vm # do other stuff >docker-switch antlab-ubuntu # do more stuff FAKE @bmpvieira
  • 12.
  • 13. What if Apple/Google made an idiot-proof cloud computing system for genomics?
  • 14. What if Apple/Google made an idiot-proof cloud computing system for genomics? • Always on - single place to connect to: ssh mylab.awskiller.co.uk • Dropbox-like shared directories & file checksumming. • Easily switchable OS version / “VM”. • Automagically & transparently migrates: • from small to huge machines (and back) as CPU and RAM demands change.
  • 15. What if Apple/Google made an idiot-proof cloud computing system for genomics? • Always on - single place to connect to: ssh mylab.awskiller.co.uk • Dropbox-like shared directories & file checksumming. • Easily switchable OS version / “VM”. • Automagically & transparently migrates: • from small to huge machines (and back) as CPU and RAM demands change. • from one physical site (huge dataset) to another
  • 16. Summary • Broad range of needs:! • some similar to traditional HPC.! • some very different!! • Users are naive.! • Tools are experimental.! • Datasets are experimental.! • IT people have difficulty understanding this. • Do not trust them when they say things will just work! ! • A lot of potential to make things not suck.
  • 17. Evolutionary Genetics group & Queen Mary U London Bruno Vieira - @bmpvieira Steve Moss - @gawbul Anurag Priyam - @yeban Richard Christie & ITS Research Support team @ Queen Mary U London Ioannis Xenarios & Vital-IT team @ Swiss Institute of Bioinformatics http://yannick.poulet.orgy.wurm@qmul.ac.uk