Random RNA interactions control protein expression in
prokaryotes
Paul Gardner
University of Canterbury
Christchurch
New Zealand
Feel free to share what you hear
These slides are available at: http://www.slideshare.net/ppgardne/presentations
The hard work of Sinan Umu, Ant Poole & Ren Dobson
mRNA levels are imperfectly correlated with protein levels
Lu et al. (2007) Nature biotechnology.
Determinants of protein concentration
Protein concentration depends on mRNA concentration, translation and
degradation rates
[Figure: DNA [D] → RNA [R] → Protein [P], with rate constants k_transcription, k_translation, k_mRNA degradation and k_protein degradation; nucleotide sequence illustration not reproduced]
de Sousa Abreu, Penalva, Marcotte & Vogel (2009) Global signatures of protein and mRNA expression levels. Molecular
BioSystems.
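As a rough illustration of the diagram above, the sketch below solves for the steady state of a two-step production and decay model. It assumes simple first-order kinetics, which the slide does not state explicitly; the rate names follow the slide, and the numbers are made up for the example.

```python
def steady_state(d, k_transcription, k_translation, k_mrna_deg, k_protein_deg):
    """Return steady-state ([R], [P]) under assumed first-order kinetics."""
    # d[R]/dt = k_transcription*[D] - k_mrna_deg*[R] = 0
    r = k_transcription * d / k_mrna_deg
    # d[P]/dt = k_translation*[R] - k_protein_deg*[P] = 0
    p = k_translation * r / k_protein_deg
    return r, p


# Illustrative (made-up) rate constants, arbitrary units.
r, p = steady_state(d=1.0, k_transcription=2.0, k_translation=10.0,
                    k_mrna_deg=0.5, k_protein_deg=0.25)
print(r, p)
```

Under this assumption [P] is proportional to [R] times k_translation / k_protein_degradation, so variation in translation rate alone can change the [protein]:[mRNA] ratio.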
Two general models describe variation in translation rate
1. Codon usage (Ikemura, 1981)
Figure from: Tuller & Zur (2015) Nucl. Acids Res.
Two general models describe variation in translation rate
2. mRNA structure (Pelletier & Sonenberg, 1987)
Figure from: Tuller & Zur (2015) Nucl. Acids Res.
We think we have a third general model...
http://dx.doi.org/10.7554/eLife.13479
http://dx.doi.org/10.7554/eLife.20686
Non-coding RNAs are abundant
[Figure: distributions of log10(mean read depth), 0 to 5, for core ncRNA genes and core protein-coding genes]
Lindgreen, Umu et al. (2014) PLOS Computational Biology.
Bacterial non-coding RNA function
[Figure: four mechanisms of bacterial sRNA function: sequestration of the ribosome binding site, induction of mRNA decay, the anti-antisense mechanism, and selective mRNA stabilisation. Labels: Hfq, sRNA, ribosome, RNase E recruitment, AUG; SD = Shine-Dalgarno sequence]
Figure by Bethany Jose
Checking for mRNA:ncRNA interactions
Looking for regulatory interactions, which are specific and few in number; off-target interactions are non-specific and numerous
Compare the 5′ ends of CDSs & ncRNAs
Looking for a bump on the left...
[Figure: density of binding energy (kcal/mol), from −15 to 0]
Checking for mRNA:ncRNA interactions
[Figure: density of binding energy (kcal/mol), from −15 to 0, for native and shuffled interactions]
Shuffled: P = 7.69 × 10^−52
Checking negative controls!
[Figure: density of binding energy (kcal/mol), from −15 to 0, for native interactions and negative controls]
Shuffled: P = 7.69 × 10^−52
Different phylum: P = 0
Downstream: P = 2.66 × 10^−124
Rev. complement: P = 6.51 × 10^−57
Intergenic: P = 6.16 × 10^−93
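The native-versus-shuffled control can be sketched roughly as follows. The slides report thermodynamic binding energies in kcal/mol; the score here is only a toy stand-in (the longest gap-free run of complementary bases), and the function names and the mononucleotide shuffle are assumptions made for illustration.

```python
import random

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_duplex(mrna, ncrna):
    """Toy interaction score: longest gap-free antiparallel run of paired bases."""
    target = ncrna[::-1]  # read the ncRNA 3'->5' so pairing is antiparallel
    best = 0
    prev = [0] * (len(target) + 1)  # run lengths ending at the previous mRNA base
    for base in mrna:
        cur = [0] * (len(target) + 1)
        for j, other in enumerate(target, start=1):
            if (base, other) in PAIRS:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shuffled(seq, rng):
    """Negative control: same base composition, scrambled order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


rng = random.Random(0)
mrna, ncrna = "AUGAAAGCGCUU", "AAGCGCUUUCAU"  # made-up sequences
native = longest_duplex(mrna, ncrna)
controls = [longest_duplex(shuffled(mrna, rng), ncrna) for _ in range(100)]
print(native, sum(controls) / len(controls))
```

With a real energy model the comparison is the same in outline: score the native pairs, score the shuffled controls, and compare the two distributions.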
Do ubiquitous and abundant RNAs influence translation?
Given that ncRNAs are among the most abundant RNAs in the cell
([ncRNA] >> [mRNA])
AND that RNAs frequently hybridise
THEN maybe stochastic interactions with mRNAs inhibit translation
Corley & Laederach (2016) Bioinformatics: Selecting against accidental RNA interactions. eLife.
How can this hypothesis be tested?
We predict that:
1. There is selection against mRNA:ncRNA interactions
2. Stochastic mRNA:ncRNA interactions influence [protein]:[mRNA] ratios
For consistency: focus on 6 ncRNA families & 114 mRNAs/proteins that are highly conserved & highly expressed, and on the first 21 nt of each CDS.
Tested 1,582 bacterial & 118 archaeal genomes
Are mRNA:ncRNA interactions selected against?
[Figure: density difference between native and shuffled interactions against binding energy (kcal/mol), from −15 to 0, annotated "More stable interactions"; inset bar chart of −log10 P per group]
Actinobacteria (n = 163): P = 9.8 × 10^−69
Bacteroidetes (n = 60): P = 8.7 × 10^−148
Chlamydiae (n = 38): P = 1.4 × 10^−193
Cyanobacteria (n = 40): P = 3.8 × 10^−11
Firmicutes (n = 378): P = 0
Proteobacteria (n = 756): P = 0
Spirochaetes (n = 38): P = 1.6 × 10^−98
Archaea (n = 118): P = 4.2 × 10^−177
Background (n = 100)
Do mRNA:ncRNA interactions influence protein
expression?
[Figure: log10(fluorescence), 2.0 to 4.0, against avoidance (kcal/mol), −300 to −150; Rs = 0.65]
Expression data from: Kudla et al. (2009) Science.
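Rs in the plot above is Spearman's rank correlation. A minimal pure-Python sketch of how such a value can be computed is given below; the function names and the example numbers are made up for illustration and are not taken from the Kudla et al. data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up example: an avoidance-like score against log10(fluorescence).
score = [-290.0, -250.0, -230.0, -200.0, -170.0]
log_fluorescence = [2.1, 2.9, 2.7, 3.5, 3.8]
print(round(spearman_rho(score, log_fluorescence), 2))
```

Because only ranks are used, the statistic is insensitive to the units and scale of either axis.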
Do mRNA:ncRNA interactions influence protein
expression?
Testing the relationship between protein abundance estimates and
avoidance, mRNA secondary structure, codon usage and mRNA
abundance
GFP datasets:
E. coli (n=52), GFP/qPCR
E. coli (n=154), GFP/Northern
E. coli (n=14,234), mCherry/RNAseq
Mass-Spec datasets:
E. coli (n=389), MS/microarray
E. coli (n=3,301), MS/microarray
P. aeruginosa (n=5,479), MS/microarray
P. aeruginosa (n=1,148), MS/microarray
[Figure: correlation coefficients (−0.2 to 0.6) of protein abundance with avoidance, secondary structure, codon usage and [mRNA] for each dataset; * P < 0.05]
Testing the extremes of expression
[Figure, panel A: frequency histogram of log10([Protein]/[mRNA]), 0.1 to 4.8, for E. coli genes (n = 389), with the low expression (n=10) and high expression (n=10) extremes marked]
[Figure, panel B: Z-scores (−2 to 2) of avoidance, codon usage and secondary structure for the low expression (n=10) and high expression (n=10) groups, with a null; * marks highlighted comparisons]
Designing mRNAs
A 239 aa GFP can be encoded by 7.62 × 10^111 synonymous mRNAs
Extremes of avoidance have a stronger effect than codon usage or
secondary structure
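The size of the synonymous design space follows from codon degeneracy: the number of mRNAs encoding a protein is the product of the number of codons available for each residue. The sketch below assumes the standard genetic code and uses a short made-up peptide rather than the GFP sequence, which is not given in the slides.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_mrna_count(protein, include_stop=False):
    """Count the distinct coding sequences that translate to `protein`."""
    count = prod(CODONS_PER_AA[aa] for aa in protein)
    # The standard code has three stop codons.
    return count * 3 if include_stop else count


print(synonymous_mrna_count("MSKGEELF"))  # a short, made-up peptide
```

Even eight residues give thousands of options, so the count grows roughly exponentially with protein length.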
[Figure: log10(fluorescence), 4.2 to 4.7, against three sequence measures]
CAI (0.60 to 0.85): Rs = 0.29
Folding energy (kcal/mol, −15 to 0): Rs = 0.34
Binding energy (kcal/mol, −350 to −100): Rs = 0.56
[Legend: high and low designs for Avoid, Fold and Codon, plus Optimal]
Avoidance in 3D on the ribosome
Protein binds to regions with low avoidance (green), while exposed regions have high avoidance (blue): P = 9.3 × 10^−15, Fisher's exact test
Further work
Testing adaptation with experimental evolution experiments
Do mRNA:ncRNA interactions influence eukaryotic gene expression?
The number of possible interactions increases quadratically with the number of genes. This may require spatial & temporal separation of genes
Does avoidance drive compartmentalisation and increases in nucleotide-binding proteins?
Do mRNA:ncRNA interactions influence viral infection, hybridisation, HGT & transformation experiments?
Are protein, DNA and protein:nucleotide interactions also avoided?
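The quadratic growth mentioned above is simple counting. The sketch below is illustrative only: the function names are made up, and it counts unordered pairs among n RNAs, or cross pairs between two sets.

```python
def pairwise_interactions(n_rnas):
    """Number of unordered pairs among n RNAs: n(n-1)/2."""
    return n_rnas * (n_rnas - 1) // 2


def cross_interactions(n_mrnas, n_ncrnas):
    """Number of mRNA:ncRNA pairs between two sets."""
    return n_mrnas * n_ncrnas


for n in (1000, 2000, 4000):
    print(n, pairwise_interactions(n))
```

Doubling the number of genes roughly quadruples the number of possible pairs.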
And now for something completely different...
Bioinformaticians are horrible!
Bioinformaticians are bad, impatient & intolerant
Build a phylogenetic tree: which of the 172 methods do you use?
Parsimony:
ARB, Bionumerics, BIRCH, Bosque, BPAnalysis, CAFCA, CRANN, DAMBE, EMBOSS, TNT, FootPrinter, Freqpars, Gambit, GAPars, GelCompar-II, GeneTree, gmaes, Hennig86, IDEA, LVB, MALIGN, MEGA, Mesquite, Murka, Network, NimbleTree, NONA, Notung, Parsimov, PAST, PAUP*, PAUPRat, PaupUp, phangorn, PHYLIP, PhyloNet, Phylo_win, POY, PRAP, PSODA, RA, SeaView, SeqState, Simplot, sog, TCS
Maximum Likelihood:
ALIFRITZ, aLRT, ARB, Bio++, Bionumerics, BIRCH, BootPHYML, Bosque, CodeAxe, CoMET, Concaterpillar, CONSEL, Crux, DAMBE, DART, Darwin, dnarates, DPRML, DT-ModSel, EMBOSS, EREM, fastDNAml, fastDNAmlRev, FASTML, FastTree, GARLI, GZ-Gamma, HY-PHY, IQPNNI, Kakusan4, Leaphy, Mac5, McRate, Mesquite, MetaPIGA, MixtureTree, Modelfit, ModelGenerator, MOLPHY, MrAIC, MrModeltest, MrMTgui, MultiPhyl, NEPAL, NHML, nhPhyML, NimbleTree, p4, PAL, PAML, PARAT, PARBOOT, PASSML, PAUP*, PAUPRat, PaupUp, phangorn, PHYLLAB, PhyloCoCo, Phylo_win, PHYML, PhyML-Multi, PhyNav, PHYSIG, PLATO, Porn*, PRAP, PROCOV, ProtTest, PTP, r8s-bootstrap, Rate4Site, Rate-evolution, RAxML, raxmlGUI, RevDNArates, rRNA-phylogeny, SeaView, Segminator, SEMPHY, SeqPup, SeqState, SIMMAP, Simplot, SLR, Spectronet, Spectrum, SplitsTree, SSA, TipDate, Treefinder, TREE-PUZZLE, Vanilla
Bayesian:
MBIORE, ANC-GENE, BAli-Phy, BAMBE, BayesPhylogenies, BEAST, BEST, Bio++, bms_runner, burntrees, Cadence, Crux, IMa2, Mesquite, MrBayes, MrBayesPlugin, MrBayes-tree-scanners, Multidivtime, p4, SIMMAP, PAL, tracer, PAML, Vanilla, PHASE, PHYLLAB, PhyloBayes
How can we choose software?
Which methods do you use?
Approach software like a scientist
Are any good controls available?
Positive: databases, publications, simulation, ...
Negative: randomized, select relevant negative data, ...
Some common accuracy metrics:
Sensitivity (true positive rate)
Specificity (true negative rate)
Matthews correlation coefficient
Area under an ROC curve
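The metrics listed above can be computed from a confusion matrix, as in the sketch below. The function names are illustrative, the counts are made up, and the ROC area uses a plain trapezoidal rule over (false positive rate, true positive rate) points supplied by the caller.

```python
from math import sqrt


def sensitivity(tp, fn):
    """True positive rate: fraction of real positives that were found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of real negatives that were rejected."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(points):
    """Trapezoidal area under (fpr, tpr) points, which should span 0 to 1."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


# Made-up benchmark counts for one tool.
tp, tn, fp, fn = 8, 7, 3, 2
print(sensitivity(tp, fn), specificity(tn, fp), round(mcc(tp, tn, fp, fn), 3))
print(roc_auc([(0.0, 0.0), (0.3, 0.8), (1.0, 1.0)]))
```

Sensitivity and specificity each describe one class, while the Matthews correlation coefficient and ROC area summarise both, which matters when positives and negatives are unbalanced.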
[Figure: ROC curves (true positive rate against false positive rate, 0.0 to 1.0) for DBS (Pfam), DBS (Treefam), DBS (Custom), PROVEAN, Polyphen-2, SIFT, FATHMM (weighted) and FATHMM (unweighted)]
Wheeler et al. (2016) A profile-based method for
identifying functional divergence of orthologous genes
in bacterial genomes. Bioinformatics.
Benchmarks are useful, and fun...
Is there really a relationship between software speed &
accuracy?
Can we run a meta-analysis of bioinformatic benchmarks?
If speed isn’t related to accuracy, then what is?
Some possibilities:
Software age
Journal “impact” (IF & GoogleScholar H5)
Number of citations
Corresponding author’s H-index & M-index
After some literature mining...
found 43 matching articles.
102 benchmarks
Accuracy & speed ranks for 243 bioinformatic software tools
Manually extracted IF, H, age, ...
65 journals (Bioinformatics, NAR, Genome Research, ...)
151 author GoogleScholar profiles
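Benchmarks differ in how many tools they compare, so ranks need a common scale before they can be pooled. The sketch below shows one way to do that, scaling ranks to the range 0 to 1; this normalisation is an assumption for illustration and is not described in the slides, and the tool names and scores are hypothetical.

```python
def normalised_ranks(scores, higher_is_better=True):
    """Map {tool: score} to {tool: rank scaled to [0, 1]}, where 1.0 is best.

    Ties are broken by sort order; a single-tool benchmark scores 1.0.
    """
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    return {tool: 1 - i / (n - 1) for i, tool in enumerate(ordered)}


# Hypothetical benchmark: accuracy (higher is better), runtime (lower is better).
accuracy = {"tool_a": 0.9, "tool_b": 0.5, "tool_c": 0.7}
runtime_s = {"tool_a": 120.0, "tool_b": 15.0, "tool_c": 60.0}
print(normalised_ranks(accuracy))
print(normalised_ranks(runtime_s, higher_is_better=False))
```

Scaled accuracy and speed ranks from many benchmarks could then be compared with a rank correlation such as Spearman's rho.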
abyss antepiseeker apg barry bellerophontes bfast bismark biss boost bowtie bowtie2 bowtiestar bratbw bsmap
bsmooth bsseeker buckycon buckymrbayes buckymrbayesspa buckypop buckyraxml builder bwa bwasw caml camp carma
ce celera clark clc clustalomega clustalw comus coprarna coral cosine crisp cro cromwell cufflinks cwt dali
de dexseq dialign dialign22 dialignt dialigntx diffsplice diginormvelvet dima djigsaw downhillsimplex dsgseq
ebi echo edenanonstrict edenastrict edit epimode ericscript erpin fa fasta fasttree fisherexacttest
fusioncatcher fusionmap gassst gatk genometa gojobori goldman gossamer gottcha greedyft gsnap heidge hitec
hmmer hshrec idbaud igtpduplossft inchworm infernal intarna jaffa kalign kbsps kraken kthse leidnl limpic
lmat lms lofreq lsqman mafft mafftfftns mafftfftns2 mafftlinsi mapsplice maq mats megan metaphlan metaphyler
methylkit methylsig mgrast minia mira mirdeep mireap mirena mirexpress mlclustalw mlclustalwquicktree mlmafft
mlmafftparttree mlmuscle mlopal mlprankgt modellerv mosaik motu mpest mpjclustalw mpsclustalw mrfast mrpml
mrpmp mrsfast msinspect multalin muscle musclemaxiters mzmine nbc ncbiblast nest newbler nfuse novoalign
oases onecodex openms pairfold paralign pass perm phylonetft phylopythias phymmbl piler poa poy poystar
pragcz probalign probcons probtree process pso pt qiime qsra quake raiphy ravenna raxml raxmllimited
rdiffparam repeatfinder repeatgluer repeatscout reptile rmap rnacofold rnaduplex rnahybrid rnaplex rnaup
rsearch rsmatch sam sate scro scwrl scwrlcons segemehl segmodencad seqgsea seqman seqmap sga sharcgs shrimp
simulatedannealing sl smalt snap snpruler snver soap soap2 soapdenovo soapec soapstar spades sparse
sparseassembler spcomp specarray spt srmapper ssaha ssake ssap ssearch ssm sst st starbeast strcutal
swissmodel taipan targetrna targetrna2 taxatortk tcoffee team tmap tophatfusion transabyss trinity upmes
varscan vcake velvet wmrpmp woodhams wublast xalign xcmswithcorrection xcmswithoutretentiontime zema
Nothing is correlated with accuracy!
[Figure: matrix of pairwise Spearman's rho (−1 to 1) between Rel. age, Year, Accuracy, Speed, JH5, JIF, Cites, Rel. cites, H-index and M-index; some cells are marked X]
[Figure: correlates with accuracy rank: Spearman's rho (−0.2 to 0.2) for each of the features above]
[Figure, panels A and B: speed against accuracy, with Z-scores (−3 to 3) and frequency histograms; some cells are marked X]
Conclusions
Speed is NOT reflective of accuracy
Neither are author/journal reputation, software age or number of citations
The only reasonable way to select software is by benchmarking
Publication bias is influencing software accuracy
It doesn’t matter how famous you are, you can still write great software!
Thanks!
Avoidance: Sinan Umu, Anthony Poole & Renwick Dobson
Meta-benchmark: James Paterson, Fatemeh Ashari Ghomi, Sinan Umu,
Stephanie McGimpsey, Aleksandra Pawlik
Umu, Poole, Dobson & Gardner (2016) Avoidance of stochastic RNA interactions can be harnessed to control protein expression
levels in bacteria and archaea. eLife.
Gardner et al. (2017) A meta-analysis of bioinformatics software benchmarks reveals that publication-bias influences software
accuracy. In preparation.
These slides are available at: http://www.slideshare.net/ppgardne/presentations

More Related Content

What's hot

Role of molecular marker
Role of molecular markerRole of molecular marker
Role of molecular markerShweta Tiwari
 
Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS) Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS) Bharathiar university
 
Parks kmer metagenomics
Parks kmer metagenomicsParks kmer metagenomics
Parks kmer metagenomicsdparks1134
 
Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Sri Ambati
 
Whole Genome Sequencing Analysis
Whole Genome Sequencing AnalysisWhole Genome Sequencing Analysis
Whole Genome Sequencing AnalysisEfi Athieniti
 
Serial analysis of gene expression
Serial analysis of gene expressionSerial analysis of gene expression
Serial analysis of gene expressionAshwini R
 
Whole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaWhole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaBhavya Sree
 
Bio305 genome analysis and annotation 2012
Bio305 genome analysis and annotation 2012Bio305 genome analysis and annotation 2012
Bio305 genome analysis and annotation 2012Mark Pallen
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation SequencingAmritha S R
 
Plant genome sequencing and crop improvement
Plant genome sequencing and crop improvementPlant genome sequencing and crop improvement
Plant genome sequencing and crop improvementRagavendran Abbai
 
High-Throughput Sequencing
High-Throughput SequencingHigh-Throughput Sequencing
High-Throughput SequencingMark Pallen
 
Role of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies andRole of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies andSarla Rao
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencinganita devi
 
The Human Genome Project - Part III
The Human Genome Project - Part IIIThe Human Genome Project - Part III
The Human Genome Project - Part IIIhhalhaddad
 

What's hot (20)

CROP GENOME SEQUENCING
CROP GENOME SEQUENCINGCROP GENOME SEQUENCING
CROP GENOME SEQUENCING
 
Transcriptomics approaches
Transcriptomics approachesTranscriptomics approaches
Transcriptomics approaches
 
VNTR- Minisatellite
VNTR- MinisatelliteVNTR- Minisatellite
VNTR- Minisatellite
 
Role of molecular marker
Role of molecular markerRole of molecular marker
Role of molecular marker
 
Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS) Massively Parallel Signature Sequencing (MPSS)
Massively Parallel Signature Sequencing (MPSS)
 
Parks kmer metagenomics
Parks kmer metagenomicsParks kmer metagenomics
Parks kmer metagenomics
 
Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...
 
Whole genome sequencing
Whole genome sequencingWhole genome sequencing
Whole genome sequencing
 
Whole Genome Sequencing Analysis
Whole Genome Sequencing AnalysisWhole Genome Sequencing Analysis
Whole Genome Sequencing Analysis
 
Serial analysis of gene expression
Serial analysis of gene expressionSerial analysis of gene expression
Serial analysis of gene expression
 
Whole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaWhole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thaliana
 
Bio305 genome analysis and annotation 2012
Bio305 genome analysis and annotation 2012Bio305 genome analysis and annotation 2012
Bio305 genome analysis and annotation 2012
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
Embed Repro Test
Embed Repro TestEmbed Repro Test
Embed Repro Test
 
Plant genome sequencing and crop improvement
Plant genome sequencing and crop improvementPlant genome sequencing and crop improvement
Plant genome sequencing and crop improvement
 
High-Throughput Sequencing
High-Throughput SequencingHigh-Throughput Sequencing
High-Throughput Sequencing
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencing
 
Role of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies andRole of transcriptomics in gene expression studies and
Role of transcriptomics in gene expression studies and
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencing
 
The Human Genome Project - Part III
The Human Genome Project - Part IIIThe Human Genome Project - Part III
The Human Genome Project - Part III
 

Viewers also liked

Introduction to RNA-seq
Introduction to RNA-seqIntroduction to RNA-seq
Introduction to RNA-seqPaul Gardner
 
Transcriptomics sequencing
Transcriptomics sequencingTranscriptomics sequencing
Transcriptomics sequencingcdgenomics525
 
Microarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome studyMicroarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome studyBia Khan
 
Listeria and Omics Approaches for Understanding Its Biology
Listeria and Omics Approaches for Understanding Its BiologyListeria and Omics Approaches for Understanding Its Biology
Listeria and Omics Approaches for Understanding Its Biologydedmark
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisYaoyu Wang
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in Rmikaelhuss
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysismikaelhuss
 

Viewers also liked (14)

Introduction to RNA-seq
Introduction to RNA-seqIntroduction to RNA-seq
Introduction to RNA-seq
 
Transcriptomics sequencing
Transcriptomics sequencingTranscriptomics sequencing
Transcriptomics sequencing
 
Transcriptomics
TranscriptomicsTranscriptomics
Transcriptomics
 
Transcriptomica PDF
Transcriptomica PDFTranscriptomica PDF
Transcriptomica PDF
 
Microarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome studyMicroarray and dna chips for transcriptome study
Microarray and dna chips for transcriptome study
 
"Overview of the European Union reference laboratory (eu rl) for Listeria mon...
"Overview of the European Union reference laboratory (eu rl) for Listeria mon..."Overview of the European Union reference laboratory (eu rl) for Listeria mon...
"Overview of the European Union reference laboratory (eu rl) for Listeria mon...
 
Listeria and Omics Approaches for Understanding Its Biology
Listeria and Omics Approaches for Understanding Its BiologyListeria and Omics Approaches for Understanding Its Biology
Listeria and Omics Approaches for Understanding Its Biology
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression Analysis
 
Microarray
Microarray Microarray
Microarray
 
Structure and function of the ribosome
Structure and function of the ribosomeStructure and function of the ribosome
Structure and function of the ribosome
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
 
DNA Chip
DNA ChipDNA Chip
DNA Chip
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 

Similar to Random RNA interactions control protein expression in prokaryotes

Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?Does RNA avoidance dictate protein expression level?
Does RNA avoidance dictate protein expression level?Paul Gardner
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Paul Gardner
 
(050407)protein chip
(050407)protein chip(050407)protein chip
(050407)protein chipnamvgta
 
2.2 analyzing and manipulating dna
2.2 analyzing and manipulating dna2.2 analyzing and manipulating dna
2.2 analyzing and manipulating dnaEmmanuel Aguon
 
19_21Translation
19_21Translation19_21Translation
19_21TranslationKaren Lewis
 
Genomics lecture 3
Genomics lecture 3Genomics lecture 3
Genomics lecture 3iainj88
 
Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...
Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...
Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...Candy Smellie
 
Gene prediction and expression
Gene prediction and expressionGene prediction and expression
Gene prediction and expressionishi tandon
 
Genome Editing CRISPR-Cas9
Genome Editing CRISPR-Cas9 Genome Editing CRISPR-Cas9
Genome Editing CRISPR-Cas9 Ek Han Tan
 
Genome editing comes of age
Genome editing comes of ageGenome editing comes of age
Genome editing comes of ageJan Hryca
 
Genome Editing Comes Of Age
Genome Editing Comes Of AgeGenome Editing Comes Of Age
Genome Editing Comes Of AgeChris Thorne
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_coursehansjansen9999
 
2013 transcription
2013 transcription2013 transcription
2013 transcriptionkuldip sodhi
 
2013 transcription
2013 transcription2013 transcription
2013 transcriptionkuldip sodhi
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1BITS
 
Arnab kumar de
Arnab kumar deArnab kumar de
Arnab kumar deArnab De
 
Microarray validation
Microarray validationMicroarray validation
Microarray validationElsa von Licy
 
Gene 151_119 (1994) [SDM of dsDNA]
Gene 151_119 (1994) [SDM of dsDNA]Gene 151_119 (1994) [SDM of dsDNA]
Gene 151_119 (1994) [SDM of dsDNA]Michael Weiner
 

Random RNA interactions control protein expression in prokaryotes

  • 1. Random RNA interactions control protein expression in prokaryotes. Paul Gardner, University of Canterbury, Christchurch, New Zealand.
  • 2. Feel free to share what you hear. These slides are available at: http://www.slideshare.net/ppgardne/presentations
  • 3. The hard work of Sinan Umu, Ant Poole & Ren Dobson.
  • 4. mRNA levels are imperfectly correlated with protein levels. Lu et al. (2007) Nature Biotechnology.
  • 5. Determinants of protein concentration. Protein concentration depends on mRNA concentration, translation and degradation rates. [Figure: DNA [D] → RNA [R] → Protein [P], with rate constants k_transcription, k_translation, k_mRNA degradation and k_protein degradation.] de Sousa Abreu, Penalva, Marcotte & Vogel (2009) Global signatures of protein and mRNA expression levels. Molecular BioSystems.
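The rate diagram on slide 5 can be read as two linear rate equations. The sketch below assumes constant rates and first-order degradation, which the slide does not state explicitly; the function name `steady_state` and all numbers are illustrative only.

```python
def steady_state(dna, k_transcription, k_translation, k_mrna_deg, k_protein_deg):
    """Steady state of the assumed two-step model:
         d[R]/dt = k_transcription * [D] - k_mrna_deg * [R]
         d[P]/dt = k_translation  * [R] - k_protein_deg * [P]
    Setting both derivatives to zero gives the values returned here.
    """
    mrna = k_transcription * dna / k_mrna_deg
    protein = k_translation * mrna / k_protein_deg
    return mrna, protein


# Made-up rate constants, purely to show the arithmetic.
mrna, protein = steady_state(
    dna=1.0,
    k_transcription=2.0,
    k_translation=10.0,
    k_mrna_deg=0.5,
    k_protein_deg=0.1,
)
print(mrna, protein)
```

Under these assumptions the [P]:[R] ratio equals k_translation / k_protein_deg, so two genes with the same mRNA level can still differ in protein level if their translation rates differ.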
  • 6. Two general models describe variation in translation rate: 1. Codon usage (Ikemura, 1981). Figure from: Tuller & Zur (2015) Nucl. Acids Res.
  • 7. Two general models describe variation in translation rate: 2. mRNA structure (Pelletier & Sonenberg, 1987). Figure from: Tuller & Zur (2015) Nucl. Acids Res.
  • 8. We think we have a third general model... http://dx.doi.org/10.7554/eLife.13479 http://dx.doi.org/10.7554/eLife.20686
  • 9. Non-coding RNAs are abundant. [Figure: log10(mean read depth) for core ncRNA genes and core protein coding genes.] Lindgreen, Umu et al. (2014) PLOS Computational Biology.
  • 10. Bacterial non-coding RNA function. [Figure: sRNA mechanisms acting on mRNAs, with Hfq, the ribosome and RNase E shown: sequestration of ribosome binding site; induction of mRNA decay (RNase E recruitment); anti-antisense mechanism; selective mRNA stabilisation. SD = Shine-Dalgarno sequence.] Figure by Bethany Jose.
  • 11. Checking for mRNA:ncRNA interactions. Looking for regulatory interactions, which are specific and small in number; off-targets are non-specific and large in number. Compare 5′ ends of CDS & ncRNAs. Looking for a bump on the left... [Figure: density against binding energy (kcal/mol).]
  • 12. Checking for mRNA:ncRNA interactions. [Figure: density against binding energy (kcal/mol). Native; Shuffled (P = 7.69 × 10^−52).]
  • 13. Checking negative controls! [Figure: density against binding energy (kcal/mol). Native; Shuffled (P = 7.69 × 10^−52); Different phylum (P = 0); Downstream (P = 2.66 × 10^−124); Rev. complement (P = 6.51 × 10^−57); Intergenic (P = 6.16 × 10^−93).]
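The native-versus-shuffled comparison on slides 12 and 13 can be sketched as follows. This is a toy illustration only: the slides report binding energies in kcal/mol from a thermodynamic model, whereas `longest_complementary_run` is a hypothetical stand-in that counts Watson-Crick pairs and ignores G-U wobble. Shuffling the mRNA rather than the ncRNA, and using a mononucleotide shuffle, are also assumptions of this example.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def shuffle_sequence(seq, rng):
    """Mononucleotide shuffle: same base composition, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def longest_complementary_run(mrna, ncrna):
    """Toy interaction score: length of the longest contiguous antiparallel
    Watson-Crick duplex between the two RNAs (not a binding energy)."""
    # A duplex with the ncRNA is an exact match to its reverse complement.
    target = "".join(COMPLEMENT[b] for b in reversed(ncrna))
    best = 0
    prev = [0] * (len(target) + 1)
    for a in mrna:  # longest-common-substring dynamic programme
        cur = [0] * (len(target) + 1)
        for j, b in enumerate(target, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def native_vs_shuffled(mrna_5prime, ncrna, n_shuffles=100, seed=0):
    """Score the native pair and a set of shuffled negative controls."""
    rng = random.Random(seed)
    native = longest_complementary_run(mrna_5prime, ncrna)
    shuffled = [
        longest_complementary_run(shuffle_sequence(mrna_5prime, rng), ncrna)
        for _ in range(n_shuffles)
    ]
    return native, shuffled
```

If native interactions are avoided, native scores should tend to be weaker than the shuffled controls. A real analysis would replace the toy score with a predicted binding energy and compare the two distributions with a statistical test.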
  • 14. Do ubiquitous and abundant RNAs influence translation? Given that ncRNAs are among the most abundant RNAs in the cell ([ncRNA] >> [mRNA]) AND that RNAs frequently hybridise, THEN maybe stochastic interactions with mRNAs inhibit translation. Corley & Laederach (2016) Bioinformatics: Selecting against accidental RNA interactions. eLife.
  • 15. How can this hypothesis be tested? We predict that: 1. There is selection against mRNA:ncRNA interactions; 2. Stochastic mRNA:ncRNA interactions influence [protein]:[mRNA] ratios. For consistency: focus on 6 ncRNA families & 114 mRNAs/proteins that are highly conserved & expressed, and on the first 21 nts of each CDS. Tested 1,582 bacterial & 118 archaeal genomes.
  • 16. Are mRNA:ncRNA interactions selected against? [Figure: density difference between native and shuffled interactions against binding energy (kcal/mol), with the "more stable interactions" end of the axis marked. Actinobacteria (n = 163), P = 9.8 × 10^−69; Bacteroidetes (n = 60), P = 8.7 × 10^−148; Chlamydiae (n = 38), P = 1.4 × 10^−193; Cyanobacteria (n = 40), P = 3.8 × 10^−11; Firmicutes (n = 378), P = 0; Proteobacteria (n = 756), P = 0; Spirochaetes (n = 38), P = 1.6 × 10^−98; Archaea (n = 118), P = 4.2 × 10^−177; Background (n = 100). Inset: −log10 P for Act, Bac, Chl, Cya, Fir, Pro, Spi and Arc.]
  • 17. Do mRNA:ncRNA interactions influence protein expression? [Figure: log10(fluorescence) against avoidance (kcal/mol); Rs = 0.65.] Expression data from: Kudla et al. (2009) Science.
  • 18. Do mRNA:ncRNA interactions influence protein expression? Testing the relationship between protein abundance estimates and avoidance, mRNA secondary structure, codon usage and mRNA abundance. GFP datasets and Mass-Spec datasets: E. coli (n = 52) GFP/qPCR; E. coli (n = 154) GFP/Northern; E. coli (n = 14,234) mCherry/RNAseq; E. coli (n = 389) MS/microarray; E. coli (n = 3,301) MS/microarray; P. aeruginosa (n = 5,479) MS/microarray; P. aeruginosa (n = 1,148) MS/microarray. [Figure: correlation coefficients (−0.2 to 0.6) for Avoidance, Secondary Structure, Codon and [mRNA] in each dataset; * P < 0.05.]
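The Rs values on slides 17 and 18 are Spearman rank correlations. A minimal pure-Python sketch of that statistic follows, assuming the usual definition (Pearson correlation of average ranks); the helper names are illustrative and the numbers in the usage lines are made up, not taken from the datasets above.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up avoidance scores (kcal/mol) and log10 fluorescence values.
avoidance = [-280.0, -250.0, -210.0, -190.0, -160.0]
log_fluor = [2.1, 2.6, 2.4, 3.3, 3.9]
print(spearman(avoidance, log_fluor))
```

Because it works on ranks, the statistic is insensitive to the choice of scale, which is convenient when comparing fluorescence, mass-spectrometry and energy measurements.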
  • 19. Testing the extremes of expression. E. coli genes (n = 389). [Figure A: frequency histogram of log10([Protein]/[mRNA]), with low expression (n = 10) and high expression (n = 10) genes marked. Figure B: Z-scores (−2 to 2) for Avoidance, Codon, Sec. Str. and Null in the low expression (n = 10) and high expression (n = 10) groups; two entries are marked with *.]
  • 20. Designing mRNAs. A 239 aa GFP can be encoded by 7.62 × 10^111 synonymous mRNAs. Extremes of avoidance have a stronger effect than codon usage or secondary structure. [Figure: log10(fluorescence) against CAI (Rs = 0.29), folding energy in kcal/mol (Rs = 0.34) and binding energy in kcal/mol (Rs = 0.56); high and low designs for Avoid, Fold and Codon, plus Optimal.]
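The size of the synonymous design space on slide 20 follows from multiplying the number of codons available at each position. The sketch below assumes the standard genetic code; the function name and the `include_stop` option are illustrative, and reproducing the slide's exact figure would need the GFP sequence used there, which is not given here.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(protein, include_stop=False):
    """Number of distinct coding sequences that encode `protein`.

    include_stop multiplies by the 3 stop codons; whether the slide's
    figure counts the stop codon is not known, so it is off by default.
    """
    total = prod(CODON_DEGENERACY[aa] for aa in protein)
    return total * 3 if include_stop else total


# A short made-up peptide: M(1) * L(6) * S(6) * K(2) = 72 coding sequences.
print(count_synonymous_mrnas("MLSK"))
```

The count grows exponentially with protein length, which is why exhaustive testing of synonymous designs is not feasible and extremes have to be sampled instead.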
  • 21. Avoidance in 3D on the ribosome. Protein binds to regions with low avoidance (green) while exposed regions are high avoidance (blue): P = 9.3 × 10^−15, Fisher's exact test.
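Fisher's exact test, named on slide 21, asks whether two binary labels (for example protein-bound versus exposed, and low versus high avoidance) are associated in a 2×2 table. The sketch below is a generic two-sided version; the slide does not say whether a one- or two-sided test was used, and the counts in the usage line are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Made-up counts: rows = bound / exposed, columns = low / high avoidance.
print(fisher_exact_two_sided(3, 1, 1, 3))
```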
  • 22. Further work: Testing adaptation with experimental evolution experiments. Do mRNA:ncRNA interactions influence eukaryotic gene expression? The number of possible interactions increases quadratically with the number of genes, which may require spatial & temporal separation of genes. Does avoidance drive compartmentalisation and increases in nucleotide binding proteins? Do mRNA:ncRNA interactions influence viral infection, hybridisation, HGT & transformation experiments? Are protein, DNA and protein:nucleotide interactions also avoided?
  • 23. And now for something completely different...
  • 24. Bioinformaticians are horrible! Bioinformaticians are bad, impatient & intolerant. Build a phylogenetic tree: which of the 172 methods do you use? [Figure: software names grouped under Parsimony, Maximum Likelihood and Bayesian.] Names shown: MBIORE ANC-GENE BAli-Phy BAMBE BayesPhylogenies BEAST BEST Bio++ bms_runner burntrees Cadence Crux IMa2 Mesquite MrBayes MrBayesPlugin MrBayes-tree-scanners Multidivtime p4 SIMMAP PAL tracer PAML Vanilla PHASE PHYLLAB PhyloBayes ARB Bionumerics BIRCH Bosque BPAnalysis CAFCA CRANN DAMBE EMBOSS TNT FootPrinter Freqpars Gambit GAPars GelCompar-II GeneTree gmaes Hennig86 IDEA LVB MALIGN MEGA Mesquite Murka Network NimbleTree NONA Notung Parsimov PAST PAUP* PAUPRat PaupUp phangorn PHYLIP PhyloNet Phylo_win POY PRAP PSODA RA SeaView SeqState Simplot sog TCS ALIFRITZ aLRT ARB Bio++ Bionumerics BIRCH BootPHYML Bosque CodeAxe CoMET Concaterpillar CONSEL Crux DAMBE DART Darwin dnarates DPRML DT-ModSel EMBOSS EREM fastDNAml fastDNAmlRev FASTML FastTree GARLI GZ-Gamma HY-PHY IQPNNI Kakusan4 Leaphy Mac5 McRate Mesquite MetaPIGA MixtureTree Modelfit ModelGenerator MOLPHY MrAIC MrModeltest MrMTgui MultiPhyl NEPAL NHML nhPhyML NimbleTree p4 PAL PAML PARAT PARBOOT PASSML PAUP* PAUPRat PaupUp phangorn PHYLLAB PhyloCoCo Phylo_win PHYML PhyML-Multi PhyNav PHYSIG PLATO Porn* PRAP PROCOV ProtTest PTP r8s-bootstrap Rate4Site Rate-evolution RAxML raxmlGUI RevDNArates rRNA-phylogeny SeaView Segminator SEMPHY SeqPup SeqState SIMMAP Simplot SLR Spectronet Spectrum SplitsTree SSA TipDate Treefinder TREE-PUZZLE Vanilla
  • 25. How can we choose software? Which methods do you use?
  • 26. Approach software like a scientist. Are any good controls available? Positive: databases, publications, simulation, ... Negative: randomized, select relevant negative data, ... Some common accuracy metrics: sensitivity (true positive rate); specificity (true negative rate); Matthews correlation coefficient; area under an ROC curve. [Figure: ROC curves (true positive rate against false positive rate) for DBS, Pfam; DBS, Treefam; DBS, Custom; PROVEAN; Polyphen-2; SIFT; FATHMM, weighted; FATHMM, unweighted.] Wheeler et al. (2016) A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes. Bioinformatics.
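The accuracy metrics listed on slide 26 can all be computed from a confusion matrix built from positive and negative controls. A minimal sketch using the standard definitions; the function names are illustrative and the counts in the usage lines are made up.

```python
from math import sqrt


def sensitivity(tp, fn):
    """True positive rate: fraction of real positives that were found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of real negatives that were rejected."""
    return tn / (tn + fp)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient, from -1 (always wrong) to +1 (perfect).

    Returns 0.0 when any margin is empty and the coefficient is undefined.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up benchmark: 10 positive controls and 10 negative controls.
tp, fn, tn, fp = 8, 2, 9, 1
print(sensitivity(tp, fn), specificity(tn, fp), matthews_cc(tp, tn, fp, fn))
```

Sensitivity and specificity each describe one class of control, while the Matthews coefficient combines all four counts into a single number, which makes it a convenient summary when the positive and negative sets differ in size.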
  • 28. Is there really a relationship between software speed & accuracy? Can we run a meta-analysis of bioinformatic benchmarks? If speed isn’t related to accuracy, then what is? Some possibilities: software age; journal “impact” (IF & GoogleScholar H5); number of citations; corresponding author’s H-index & M-index.
  • 29. After some literature mining... found 43 matching articles; 102 benchmarks; accuracy & speed ranks for 243 bioinformatic software tools. Manually extracted IF, H, age, ...; 65 journals (Bioinformatics, NAR, Genome Research, ...); 151 author GoogleScholar profiles. Tools: abyss antepiseeker apg barry bellerophontes bfast bismark biss boost bowtie bowtie2 bowtiestar bratbw bsmap bsmooth bsseeker buckycon buckymrbayes buckymrbayesspa buckypop buckyraxml builder bwa bwasw caml camp carma ce celera clark clc clustalomega clustalw comus coprarna coral cosine crisp cro cromwell cufflinks cwt dali de dexseq dialign dialign22 dialignt dialigntx diffsplice diginormvelvet dima djigsaw downhillsimplex dsgseq ebi echo edenanonstrict edenastrict edit epimode ericscript erpin fa fasta fasttree fisherexacttest fusioncatcher fusionmap gassst gatk genometa gojobori goldman gossamer gottcha greedyft gsnap heidge hitec hmmer hshrec idbaud igtpduplossft inchworm infernal intarna jaffa kalign kbsps kraken kthse leidnl limpic lmat lms lofreq lsqman mafft mafftfftns mafftfftns2 mafftlinsi mapsplice maq mats megan metaphlan metaphyler methylkit methylsig mgrast minia mira mirdeep mireap mirena mirexpress mlclustalw mlclustalwquicktree mlmafft mlmafftparttree mlmuscle mlopal mlprankgt modellerv mosaik motu mpest mpjclustalw mpsclustalw mrfast mrpml mrpmp mrsfast msinspect multalin muscle musclemaxiters mzmine nbc ncbiblast nest newbler nfuse novoalign oases onecodex openms pairfold paralign pass perm phylonetft phylopythias phymmbl piler poa poy poystar pragcz probalign probcons probtree process pso pt qiime qsra quake raiphy ravenna raxml raxmllimited rdiffparam repeatfinder repeatgluer repeatscout reptile rmap rnacofold rnaduplex rnahybrid rnaplex rnaup rsearch rsmatch sam sate scro scwrl scwrlcons segemehl segmodencad seqgsea seqman seqmap sga sharcgs shrimp simulatedannealing sl smalt snap snpruler snver soap soap2 soapdenovo soapec soapstar spades sparse sparseassembler spcomp specarray spt srmapper ssaha ssake ssap ssearch ssm sst st starbeast strcutal swissmodel taipan targetrna targetrna2 taxatortk tcoffee team tmap tophatfusion transabyss trinity upmes varscan vcake velvet wmrpmp woodhams wublast xalign xcmswithcorrection xcmswithoutretentiontime zema
  • 30. Nothing is correlated with accuracy! [Figure: Spearman's rho (−0.2 to 0.2) between accuracy rank and each of Rel. age, Year, Speed, JH5, JIF, Cites, Rel. cites, H-index and M-index; alongside a correlation matrix of Rel. age, Year, Accuracy, Speed, JH5, JIF, Cites, Rel. cites, H-index and M-index.]
x x x x x x xx x xx x x x xx xx xxx x x xx x x xx xxx x xxx x x x x x x x x x x xx x x x x x xx x xx x xxxxxx x x x xxx x x x x xx x x x x x x x x x x x x x xx x x x x x x x x x x xx x x xx xx x xx xx x x x x xxx xx x x x x x x x xx xxx x x x xx x x x x x x x xx x x x x x xx x x x x x x x x x x x x x x x x x x x x x x x xx x x x x x x x x x x x x xxx x x x xx xx x x x x xx x x x x x x x x x xxx x x x x x x x xxx xx x xx x xx x x x x x x xx x x x x xx x x x x x xxx x x x xx xxx x x x x x x x x xx x x x x xx x xx xxx xx xxxxxx x x x x xxx x x x x x x x x xxxxx x xx xx x xx xx xxx x x xx x x x x x x x xxx x x x x x x x x x xx x xx x x x x x x xx x xxx x x x xx x xxxx xx x xx x xx x xx x xx x xx x xxx x xx x x x x xx xx xx xx xxx x x x x x x x x x xx xx x x x xx x x x x x x xx xx xx x xx x x x x x x x x x xx x xxxx x x xx x x x xx x x x x x x x x x x xx x x xxx x x x x x x x x x x x x xx x xx x xx x x xx x x x x x x x x x xx x x xx x x x x xx x xx x x x x xx x xx x x xx x x x x xxx xx x x x x x xx x x x x xx x x x x x xx xx x x x xx x x x x xx x x x x x x xxx x x xx x x x x x x x x xx x xxxx x x x x x -1 0 1 Spearman's rho A B
  • 31. [Figure: frequency histograms (Freq.) and Z-scores for Speed and Accuracy]
  • 32. Conclusions: Speed is NOT reflective of accuracy. Neither are author/journal reputation, software age or number of citations. The only reasonable way to select software is by benchmarking. Publication bias is influencing software accuracy. It doesn’t matter how famous you are, you can still write great software!
  • 33. Thanks! Avoidance: Sinan Umu, Anthony Poole & Renwick Dobson. Meta-benchmark: James Paterson, Fatemeh Ashari Ghomi, Sinan Umu, Stephanie McGimpsey, Aleksandra Pawlik. References: Umu, Poole, Dobson & Gardner (2016) Avoidance of stochastic RNA interactions can be harnessed to control protein expression levels in bacteria and archaea. eLife. Gardner et al. (2017) A meta-analysis of bioinformatics software benchmarks reveals that publication-bias influences software accuracy. In preparation. These slides are available at: http://www.slideshare.net/ppgardne/presentations