JMB—MS 440 Cust. Ref. No. PEW 84/94 [SGML]
J. Mol. Biol. (1995) 248, 27–43
Identification of Base-triples in RNA using
Comparative Sequence Analysis
Daniel Gautheret, Simon H. Damberger and Robin R. Gutell*
Department of Molecular, Cellular and Developmental Biology, Campus Box 347, University of Colorado, Boulder, CO 80309-0347, U.S.A.
Comparative sequence analysis has proven to be a very efficient tool for the
determination of RNA secondary structure and certain tertiary interactions.
However, base-triples, an important RNA structural element, cannot be
predicted accurately from sequence data. We show here that the poor base
correlations observed at base-triple positions are the result of two factors. (1)
Base covariation is not as strictly required in triples as it is in Watson–Crick
pairs. (2) Base-triple structures are less conserved among homologous
molecules. A particularity of known triple-helical regions is the presence of
multiple base correlations that do not reflect direct pairing. We suggest that
natural mutations in base-triples create structural changes that require
compensatory mutations in adjacent base-pairs and triples to maintain the
triple-helix conformation. On the basis of these observations, we devised two
new measures of association that significantly enhance the base-triple signal
in correlation studies. We evaluated correlations between base-pairs and
single stranded bases, and correlations between adjacent base-pairs.
Positions that score well in both analyses are the best triple candidates. This
procedure correctly identifies triples, or interactions very close to the
proposed triples, in type I and type II tRNAs and in the group I intron.
Keywords: RNA structure; comparative sequence analysis; base-triples
*Corresponding author
Introduction
Base-triples are among the essential tertiary
interactions in RNA three-dimensional structure.
The best characterized RNA base-triples are those of
tRNA (Quigley & Rich, 1976; Sussman & Kim, 1976),
and there is also good evidence for base or nucleotide
triples in self-splicing group I introns, in which they
are required for enzymatic activity (Michel et al.,
1990). Base-triples involving a base-pair and a distant
single-stranded nucleotide create long-range con-
straints on RNA folding, and constitute powerful
assets for structure determination. The value of
base-triple information in modeling studies has been
clearly demonstrated in the case of group I introns
(Michel & Westhof, 1990; Jaeger et al., 1994), and
more benefits can be expected from the incorpor-
ation of base-triple information in computational
RNA folding procedures (Malhotra et al., 1990; Major
et al., 1993). The prediction of base-triples directly
from sequence information is therefore highly
desirable.
Certain base interactions, those constituting RNA
secondary structure, can be predicted accurately
from sequence data using comparative sequence
analysis, a method based on the principle that
evolution maintains a common structure through
compensatory mutations (reviewed by Gutell, 1993;
Woese & Pace, 1993). Compensatory mutations were
initially identified visually in relatively small
sequence alignments, resulting in the first reliable
secondary structure models (Gutell, 1993; Woese &
Pace, 1993). The simultaneous growth of sequence
databases and refinement of computational methods
have significantly enhanced our ability to derive
base–base interactions from sequence analysis
(Olsen, 1983; Gutell et al., 1985; Haselman et al., 1988;
Winker et al., 1990; Chiu & Kolodziejczak, 1991;
Gutell et al., 1992). Although methods have improved
sufficiently to identify correctly several tertiary
interactions in 16 S and 23 S rRNA (Gutell et al.,
1994), predicting base-triples with confidence
remains problematic. Only a few base-triples have
been suggested on the basis of comparative analysis
to date, in the early study of tRNA by Levitt (1969),
in rRNA (Gutell, et al., 1994) and in the group I intron
(Michel et al., 1990), where triples were experimen-
tally substantiated (Michel et al., 1990; Green &
Szostak, 1994).
Present address: D. Gautheret, Département de
Biologie, Université Aix-Marseille II, Faculté de Luminy,
13 000 Marseille, and J.G.S., C.N.R.S., 31 ch. Joseph
Aiguier, 13 402 Marseille Cedex 20, France.
0022–2836/95/160027–17 $08.00/0 © 1995 Academic Press Limited
Identification of RNA Triples28
In spite of the scarcity of comparatively inferred
base-triples, these interactions are certainly wide-
spread, and therefore many remain to be discovered.
We have thus begun a detailed comparative
analysis of RNA triples to derive principles and
algorithms that can be applied to base-triple
prediction in different RNA molecules. The
availability of large sequence databases and of
several tRNA crystal structures now permits a more
thorough characterization of triple interactions. We
can now ask how base-triple structures vary in
related molecules, and how base sequences at and
around triples reflect these structural changes.
Principles derived from the analysis of tRNA and
group I intron triples can be incorporated into our
correlation analyses, and significantly enhance our
ability to predict base-triples from sets of aligned
sequences.
Characterization of Base-triples
Sequence correlations in the vicinity of
base-triples
Current comparative analysis methods detect
nucleotide interactions by measuring correlations
between pairs of RNA positions. This usually
involves the construction of contingency tables
containing the number of observations for each
base-pair at position i·j. Let $n_o(M_i,N_j)$ be the
number of observations of base-pair M·N
($M,N \in \{A,U,G,C\}$) at position i·j. We compute the
number of bases M and N at positions i and j
($n_o(M_i)$ and $n_o(N_j)$) and the expected number
of observations for each M·N base pair:
$n_e(M_i,N_j) = n_o(M_i) \times n_o(N_j)$. The difference between
expected and observed values reflects the
dependence of the two positions. This difference can
be computed as follows (Olsen, 1983):
$$\chi^2 = \sum_{M,N} \frac{[n_o(M_i,N_j) - n_e(M_i,N_j)]^2}{n_e(M_i,N_j)} \qquad (1)$$
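The χ² measure of equation (1) can be illustrated with a minimal Python sketch for two alignment columns. The function name `chi_square` and the representation of a column as a string of bases are illustrative choices, not part of the original method description. The sketch also assumes the usual contingency-table convention in which the product of the marginal counts is divided by the number of sequences to obtain the expected count.

```python
from collections import Counter
from itertools import product

BASES = "AUGC"


def chi_square(col_i, col_j):
    """Chi-square association between two alignment columns, after equation (1).

    col_i and col_j are equal-length strings holding the base found at
    positions i and j in each aligned sequence (hypothetical input format).
    """
    # Keep only sequences with a standard base at both positions.
    pairs = [(m, n) for m, n in zip(col_i, col_j) if m in BASES and n in BASES]
    total = len(pairs)
    if total == 0:
        return 0.0
    n_pair = Counter(pairs)              # n_o(M_i, N_j)
    n_i = Counter(m for m, _ in pairs)   # n_o(M_i)
    n_j = Counter(n for _, n in pairs)   # n_o(N_j)
    chi2 = 0.0
    for m, n in product(BASES, repeat=2):
        # Assumption: expected count normalized by the number of sequences.
        expected = n_i[m] * n_j[n] / total
        if expected > 0:
            chi2 += (n_pair[(m, n)] - expected) ** 2 / expected
    return chi2
```

With this convention, two columns that covary perfectly over four equally frequent base-pairs give a large value, while two independent columns give a value of zero.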
Mutual information is an alternative measure
of correlation that yields improved results in
the detection of RNA interactions (Chiu &
Kolodziejczak, 1991). It requires base frequencies
($f_o(M_i,N_j)$, $f_o(M_i)$, $f_o(N_j)$) to be used instead of
absolute numbers; it is computed as follows:
$$M(i,j) = \sum_{M,N} f_o(M_i,N_j) \times \ln \frac{f_o(M_i,N_j)}{f_o(M_i) \times f_o(N_j)} \qquad (2)$$
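As an illustration of equation (2), the following Python sketch computes M(i,j) for two alignment columns. The function name `mutual_information` and the string representation of a column are assumptions made for the example. Base-pairs that are never observed are skipped, following the usual convention that a term with zero frequency contributes nothing to the sum.

```python
from collections import Counter
from math import log

BASES = "AUGC"


def mutual_information(col_i, col_j):
    """Mutual information M(i,j) between two alignment columns, after equation (2).

    col_i and col_j are equal-length strings of bases (hypothetical input format).
    """
    pairs = [(m, n) for m, n in zip(col_i, col_j) if m in BASES and n in BASES]
    total = len(pairs)
    if total == 0:
        return 0.0
    n_pair = Counter(pairs)
    n_i = Counter(m for m, _ in pairs)
    n_j = Counter(n for _, n in pairs)
    mi = 0.0
    # Only observed pairs are visited: zero-frequency terms contribute nothing.
    for (m, n), count in n_pair.items():
        f_mn = count / total                         # f_o(M_i, N_j)
        f_m, f_n = n_i[m] / total, n_j[n] / total    # f_o(M_i), f_o(N_j)
        mi += f_mn * log(f_mn / (f_m * f_n))
    return mi
```

Four equally frequent, strictly covarying base-pairs give the maximum value ln 4, whereas independent or invariant columns give zero.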
Mutual information accurately predicts the
secondary structure of tRNA, as well as the tertiary
pairs 15·48 and 26·44 (Chiu & Kolodziejczak, 1991;
Gutell et al., 1992). We present in Tables 1 and 2 the
M(i,j) values obtained in the base-triple regions of
tRNA and group I intron. For each position, the eight
highest correlations are shown (73 positions in tRNA
and 134 in the group I intron were analyzed). The
most significant correlations are at the top of each
column, and those corresponding to possible triples
are indicated by asterisks. The secondary structure
and tertiary interactions of yeast tRNAPhe are shown
in Figure 1a. Base-triples involve positions 45·(10·25),
(12·23)·9 and (13·22)·46. The proposed group I intron
triples (Michel & Westhof, 1990) involve positions
(108·213)·259 and (109·212)·260 in the P4 stem and
(216·257)·105 and (215·258)·106 in the P6 stem. These
are shown on the intron secondary structure in
Figure 2.
The secondary structure correlations (10/25,
11/24, 12/23 and 13/22 in tRNA (see Table 1) and
108/213, 109/212, 215/258 and 216/257 in group I
(see Table 2)) are the highest at each helical position.
The correlations that follow Watson–Crick pairings
in Tables 1 and 2 are intriguing. Certain base-
triple positions correlate (23/9 and 22/46 in tRNA,
212/260 and 213/259 in the group I intron), but do
so more weakly than secondary pairs (compare, e.g.
23/12 and 23/9), and even more weakly than some
non-interacting positions. For example, in tRNA,
the value of correlation 23/9 (a base-triple) is lower
than that of 23/13 (non-interacting positions). The
correlation 25/45 (a base-triple) is not within the
top eight correlates, ranking at number 31 in the
correlations involving position 25 (not shown).
Similar effects are observed in the group I introns
(Table 2).

Table 1
The eight best correlations (M(i,j); Gutell et al., 1992) for tRNA positions 2, 9 to 13, 22 to 25 and 45 to 46 are evaluated
against all tRNA positions

tRNA positions
2              9         10        11        12        13        22        23        24        25        45        46
71(a) 0.90(b)  23 0.26*  25 0.08   24 0.78   23 0.99   22 0.33   13 0.33   12 0.99   11 0.78   10 0.08   46 0.12   13 0.31*
35 0.09        12 0.26*  45 0.06*  13 0.29   13 0.30   46 0.31*  46 0.28*  13 0.28   13 0.28   24 0.06   13 0.11*  22 0.28*
31 0.06        13 0.12   64 0.04   36 0.18   9 0.26*   12 0.30   23 0.17   9 0.26*   36 0.16   11 0.06   22 0.08*  12 0.17
12 0.06        46 0.09   32 0.03   12 0.14   46 0.17   11 0.29   12 0.17   22 0.17   12 0.15   39 0.05   12 0.07   23 0.17
29 0.06        24 0.07   50 0.03   23 0.14   22 0.17   23 0.28   11 0.13   46 0.17   23 0.14   26 0.04   9 0.07    45 0.12
24 0.06        11 0.07   49 0.03   22 0.13   24 0.15   24 0.28   24 0.13   24 0.14   22 0.13   49 0.04   23 0.06   24 0.11
70 0.05        45 0.07   68 0.02   26 0.11   11 0.14   36 0.13   36 0.08   11 0.14   46 0.11   13 0.04   10 0.06*  35 0.10
41 0.05        22 0.06   5 0.02    46 0.10   26 0.09   9 0.12    45 0.08   1 0.08    26 0.11   65 0.04   36 0.06   11 0.10

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D structure. Correlations corresponding
to a secondary structure base-pair are underlined, while base-triples in any of the type I tRNA crystal structures are
indicated by an asterisk.
(a) tRNA position number, based on yeast Phe reference numbering. Base-triples for yeast Phe are: (10·25)·45, (12·23)·9 and (13·22)·46.
Alternative base triples found in other tRNA crystal structures are noted in Figure 1.
(b) M(i,j) correlation value.

Table 2
The eight best correlations (M(i,j)) for group I intron positions 105 to 109, 212 to 213, 215 to 216 and 257 to 260 are evaluated
against the positions of the group I intron core (defined in Materials and Methods)

Group I intron positions
105             106        108        109        212        213        215        216        257        258        259        260
103(a) 0.39(b)  216 0.33   213 0.85   212 0.78   109 0.78   108 0.85   258 0.38   257 0.74   216 0.74   215 0.38   213 0.56*  212 0.47*
101 0.27        257 0.32   259 0.56*  108 0.51   108 0.49   259 0.56*  221 0.22   106 0.33   106 0.32   269 0.32   108 0.56*  109 0.43*
269 0.26        103 0.26   109 0.51   213 0.51   213 0.49   109 0.51   112 0.20   258 0.31   255 0.30   217 0.31   109 0.47   259 0.25
257 0.26*       101 0.22   212 0.49   259 0.47   260 0.47*  212 0.49   222 0.18   269 0.29   258 0.30   216 0.31   212 0.45   108 0.21
216 0.24*       105 0.21   278 0.23   260 0.43*  259 0.45   278 0.21   220 0.17   255 0.29   269 0.28   257 0.30   260 0.25   213 0.21
271 0.22        255 0.21   260 0.21   268 0.37   268 0.36   260 0.20   252 0.17   103 0.25   103 0.27   103 0.25   268 0.24   268 0.17
104 0.22        258 0.20*  96 0.21    307 0.28   307 0.28   268 0.20   208 0.17   105 0.24*  105 0.26*  255 0.25   284 0.20   258 0.14
255 0.21        217 0.18   268 0.20   256 0.21   256 0.21   96 0.19    218 0.16   101 0.21   101 0.22   222 0.24   278 0.18   253 0.13

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D model of Michel & Westhof (1990).
Correlations corresponding to a secondary structure base-pair are underlined, while correlations corresponding to proposed base-triples
in the group I intron 3D model (Michel & Westhof, 1990) are denoted with an asterisk.
(a) Group I intron position number based on T. thermophila reference numbering. Previously proposed base-triples are: (108·213)·259,
(109·212)·260, (215·258)·106 and (216·257)·105.
(b) M(i,j) correlation value.
A second important observation concerns the
network of correlations linking most nucleotides in
the vicinity of the tRNA base-triples. Significant
correlations between unpaired positions were
recorded earlier with smaller tRNA datasets (Olsen,
1983; Haselman et al., 1988), and in a more recent
study (Gutell et al., 1992). These “cross-correlations,”
indicated by boldface numbers in Tables 1 and 2,
involve consecutive or non-interacting positions,
such as 11/12, 22/23, 9/46 and 9/12 in tRNA or
109/108, 109/259, 212/213 and 106/105 in group I
introns, spanning the entire triple-helical regions in
both RNAs. These correlations have values of the
same order of magnitude as the main secondary
structure correlations. This contrasts significantly
with what is usually observed in helical positions.
Typical Watson–Crick positions (see, e.g. tRNA
position 2 in Table 1, or Figure 3 of Gutell et al., 1992)
display a difference of one order of magnitude
between the first and second highest correlations,
and rarely show correlations with neighboring
positions. This analysis thus raises two questions
regarding base-triples. (1) Why do positions involved
in base-triples have correlation values that are lower
than secondary structure positions, and (2) why
would a triple-helical region display networked
correlations?
Why are sequence correlations weaker in
base-triples?
Base-triples do not demonstrate covariation as do
secondary base-pairs
Comparative analysis searches for a common
structure by identifying compensatory changes, or
covariation. This principle applies very well
to the detection of Watson–Crick pairs: in order
to preserve the Watson–Crick conformation,
mutations must occur in a compensatory fashion,
which results in four prominent sequence patterns
(A·U, U·A, G·C or C·G). Each base type in a position
is associated with a distinct base type in a second
position, and vice versa. Even when a significant
incidence of G·U or other non-canonical pairs is
observed, the existence of a secondary base-pair
usually remains unambiguous (Gutell et al., 1994).
Considering triple sequences in tRNA (Figure 3) and
group I introns (Figure 4), we find there is no strict
covariation between the secondary structure base-
pair and the third position. In the tRNA triple
(12·23)·9 (Figure 3b), there is covariation between the
sequences (U·A)·A and (G·C)·G, but this covariation
is obscured by the presence of several non-compen-
satory changes. For example, an A at position 9 is
associated with several different Watson–Crick pairs
at position 12·23. Similarly, a G·C pair at position
10·25 (Figure 3a) is associated with all four bases
at position 45. The other triples in tRNA and the
group I intron also display significant levels of
uncorrelated changes (Figures 3c and 4). These
observations lead us to ask why base-triples lack the
stricter patterns of covariation observed in secondary
structure base-pairs. This question can be answered,
at least in part, by an observation of base-triple
structures.
A perfect triple isomorphism is possible in the
absence of base covariation
The interaction of a Watson–Crick pair with a third
base occurs through different types of non-canonical
interactions, such as the Hoogsteen pairing. In
contrast to Watson–Crick pairs, these tertiary
interactions can retain an identical conformation
after a unilateral mutation. For example, the
Hoogsteen-like A9·A23 base-pair present in the
(12·23)·9 triple of yeast tRNAPhe can be converted to
a G9·A23 base-pair, which occurs in some tRNAs
(Figure 3b), while retaining the same conformation
(Figure 5a) (Klug et al., 1974). Among the multiple
non-canonical pairs that can be constructed with one
or two hydrogen bonds, there are several ways of
forming a unique conformation while modifying
either base in the pair. Numerous base-triple
conformations can thus be maintained through
non-compensatory mutations.
Base-triples vary in structure and position
The available tRNA crystal structures reveal more
structural heterogeneity in base-triples than in
secondary structure base-pairs. Figure 1 shows the
base-triples forming in four tRNA crystal structures.
The yeast tRNAPhe base-triples (described above) are
shown in Figure 1a. Base-triples in E. coli tRNAfMet
(Woo et al., 1980) and yeast tRNAiMet (Basavappa &
Sigler, 1991) do not differ significantly from those
of yeast tRNAPhe (data not shown). However, in
yeast tRNAAsp (Dumas et al., 1985) (Figure 1b), a
base–sugar interaction that formed between positions
14 and 21 in tRNAPhe is converted into a
base–base interaction, creating a (8·14)·21 base-triple.
On the basis of the electron density map, there is no
evidence for the triples 45·(10·25) and (13·22)·46 in
the E. coli tRNAGln complexed with its cognate
synthetase (Rould et al., 1989; V. Rath & T. A.
Steitz, personal communication) (Figure 1c). Within
this same complex, the distance between the pair
12·23 and position 9 suggests that this triple also does
not form. Instead, a base-triple forms at positions
45·(13·22), resulting in a local conformation different
from that of tRNAPhe (Figure 5b) (V. Rath & T. A.
Steitz, personal communication). Alignment errors
are an unlikely cause of this important difference,
since both E. coli tRNAGln and yeast tRNAPhe have a
variable loop of five nucleotides, and the bases
surrounding the variable triple are positioned
similarly in both three-dimensional structures.
Different triples also form in E. coli tRNASer (GGA)
complexed with seryl-tRNA synthetase (Biou et al.,
1994) (Figure 1d). In this type II tRNA, all the
tRNAPhe triples are absent, while other base-triples
form at positions (8·14)·21, 20·(15·48) and 9·(13·22)
(two insertions in the D-loop of this tRNA are given
the numbers 20a and 20b by Biou et al. (1994), while
they are numbered 19a and 20a in our alignment; the
triple noted 20a·(15·48) by these authors is thus
20·(15·48) here).

Figure 1. Tertiary base/base interactions in 4 tRNA crystal structures, mapped onto the yeast tRNAPhe secondary
structure. Continuous lines, base-triples; broken lines, other tertiary base/base interactions. Sugar–phosphate backbone
interactions are not shown. a, Yeast tRNAPhe; b, yeast tRNAAsp; c, E. coli tRNAGln; d, E. coli tRNASer. Insertions (+) and
deletions (Δ) relative to: a, yeast tRNAPhe; and b, yeast tRNAAsp, Δ48; c, E. coli tRNAGln, Δ17; d, E. coli tRNASer, Δ17, +19a,
+20a, +47a, 47b, to 47q. AA, amino acceptor stem; TΨC, TΨC stem and loop; D, D-stem and loop; AC, anticodon stem
and loop; V, variable loop.

Figure 2. Core secondary structure and triple interactions in the T. thermophila group I intron. Triples are
indicated by bold lines. Filled circles denote triples and other positions discussed in the text. The 2
putative triples, (220·253)·255 and (110·211)·305, are shown by a thicker broken line. The representation is
formatted as proposed in Cech et al. (1994).
This comparison of tRNA structures is most
illuminating. Of the six available tRNA crystal
structures, four are different with respect to
their base-triples. Some of the observed variations
involve only small conformational changes (e.g.
the formation of the (8·14)·21 triple), and some
might result from the formation of the tRNA
synthetase complex (in tRNAGln in particular),
but the fact remains that base-triples can form
differently even among tRNAs of the same
morphological family (e.g. type I tRNAs). There-
fore, even if triple sequences demonstrated covaria-
tion, this would not always involve the same pairs
of positions, and would therefore be poorly
detected. This is another important explanation for
the relatively low correlations observed at base-
triples.
Analysis of the network of correlations around
base-triples
We have shown that tRNA and group I introns
present networked sequence correlations in the
vicinity of base-triples. If these correlations, which
we will also refer to as “neighbor effects”, are
specific to base-triples, they will constitute a useful
instrument for triple identification.
The previous observation of triples involving
different positions in different molecules could
explain some cross-correlations, for example 45/10
and 45/13, which could result from alternative
interactions in tRNAPhe and tRNAGln. However,
this does not explain correlations between
different pairs of the same helix (e.g. 11/12, 12/13,
12/22 and 22/23 in tRNA), which constitute
the majority of cross-correlations. From a closer
observation of structure variations in base-triples,
we explain below how these cross-correlations might
result from “compensatory” mutations involving not
only paired bases but also adjacent bases in a triple
region.
All the sequence combinations observed at a given
base-triple position cannot, in general, adopt an
identical triple conformation. For example, when the
third triple position changes from a pyrimidine to a
purine, it is not always possible to build a triple that
would accommodate the bulkier residue without
significantly displacing the sugar backbone of the
third nucleotide. In the (108·213)·259 triple in the
group I intron, most species have a pyrimidine at
position 259 (Figure 4a), and these can all be folded
into a conformation very similar to that shown for the
(C·G)·C triple in Figure 5c (Michel et al., 1990).
However, a change from (C·G)·C to (C·G)·G,
which occurs naturally, requires a relatively large
displacement of the sugar backbone of G259, as
shown in Figure 5c (Michel et al., 1990). We expect to
observe such variations in most RNA triples, since
both purines and pyrimidines generally occur as the
third residue (see Figures 3 and 4). Even when
sequence variations maintain a purine or a
pyrimidine as the third base, hydrogen bonding
constraints can prevent the conservation of an
identical structure. We thus expect to observe
widespread conformational variations of the type
shown in Figure 5c.
The formation of a triple helix such as that in the
tRNA D-stem is a highly cooperative process
involving a complex network of ion binding, stacking
and van der Waals’ interactions (Bina-Stein & Stein,
1976; Holbrook et al., 1977). Therefore, a structural
change such as the one shown in Figure 5c might
adversely affect neighboring base-triples, and thus
“compensatory” mutations may be required to
preserve the conformational or energetic properties
of the triple helix. In this case, a mutation in the third
base of an adjacent triple can be as appropriate
as further mutations in the same triple, since a
single change in the flanking position can directly
compensate for the backbone displacement. We
therefore propose that “compensatory” mutations in
a triple helix involve nucleotides in different stacking
planes as well as within the planes. Such mutations
could propagate through the triple helix and create
the multiple correlations observed. If this hypothesis
is confirmed, the presence of cross-correlations
would be indicative of triple helix formation.
An alternative explanation for the presence of
networked correlations could be the involvement of
the correlated positions in a common RNA identity
element. In other words, nucleotides of a base-triple
region could be selected as a whole in order to
maintain the specificity of the RNA with respect to
a certain biological process, such as interaction with
a specific protein, thus creating correlations between
non-interacting bases. Although a few identity
elements have been localized into the base-triple
region of tRNA (Pütz et al., 1991; Smith & Yarus, 1989;
McClain, 1993a), we do not believe they are an
important source of networked correlations, for the
following reasons. First, cross-correlations are much
higher in the D-stem than in any other part of the
molecule (Gutell et al., 1992), although important
identity sites are present elsewhere (Hou &
Schimmel, 1989; McClain et al., 1991). Second,
cross-correlations in the group I intron are also
higher in the triple region (stems P4 and P6) than in
any other part of the molecule (see analysis below).
Finally, a recent experimental study (Hou, 1994)
demonstrated that mutations in the tRNA triples
(8·14)·21 and (13·22)·46 had major effects on the
structure of the 15·48 pair. This shows that large
physical constraints exist in this triple region that do
not result from tRNA identity. Although we cannot
exclude the possibility that identity elements
contribute to cross-correlations in base-triple regions,
there are better reasons for correlations to be
caused by base-triples or other complex folding
patterns.
We will now concentrate on two types of
cross-correlations. First, the interdependence of all
three bases in a triple produces correlations between
each position of the secondary structure base-pair
and the third base of the triple (see Tables 1 and 2).
Therefore, directly measuring the correlation between
secondary structure base-pairs and single-stranded
bases (base to base-pair correlation) is
expected to produce a stronger signal than the usual
pairwise correlations. A second type of interesting
correlation occurs between adjacent base-pairs
(base-pair to base-pair correlation). Assuming this
type of correlation is characteristic of base-triples, its
identification should also help predict triples. In the
following sections, we propose methods to quantify
these two types of correlations.

Figure 3. Sequences observed at base-triple positions
in type I tRNA. Positions shown are those of yeast
tRNAPhe triples. a, 45·(10·25) triple; b, (12·23)·9 triple; c,
(13·22)·46 triple. Only values greater than 5 are shown.
Numbers in bold type represent more than 10% of the
tRNAs.

Figure 4. Sequences (seqs) observed
at base-triple positions in group I
introns. Positions shown are those of
the suggested T. thermophila triples.
a, (108·213)·259; b, (109·212)·260;
c, (215·258)·106; d, (216·257)·105. Only
values greater than 2 are shown.
Numbers in bold type represent more
than 10% of the introns.
Figure 5. Alternative structures of
homologous base-triples in different
RNAs. a, Base-triple (12·23)·9 in yeast
tRNAPhe, and a possible conformation
for the same triple after an A9 to
G9 mutation. b, Triples forming with
base-pair 13·22 in yeast tRNAPhe
(Quigley & Rich, 1976) and E. coli
tRNAGln (V. Rath & T. A. Steitz, personal
communication). c, Proposed structure
of group I intron triple (108·213)·259
forming with C259 or G259 (Michel
et al., 1990).
Table 3
Base to base-pair correlation (χ²) and neighbor effects (N) in type I tRNAs (895 sequences)

Best                χ²                         Suggested cause of
correlates  Pair    best  nt   χ² (a)  N (b)   correlation (c,d)
+           13·22   ↔     46   100     100     Triple
+           12·23   ↔     9    76      64      Triple
            31·39   ↔     36   69      20      Id (Yarus, 1982)
            3·70    ↔     35   61      16      Id tRNAAla (Hou & Schimmel, 1989)
            51·63   →     36   41      9
+           11·24   →     36   41      57      Id tRNATrp (Hirsh, 1971)
            13·22   ←     45   40      100     Triple (E. coli tRNAGln)
            1·72    →     35   35      16      Id tRNAGln (Rould et al., 1989)
+           10·25   →     45   27      27      Triple (yeast tRNAPhe, tRNAAsp)
            1·72    ←     73   26      16      Id tRNAGln (Rould et al., 1989)
            15·48   →     35   24      NA      Id tRNACys (Hou et al., 1993)
            27·43   →     36   22      14      Id tRNATrp (Schultz & Yarus, 1994)
            30·40   →     36   20      13      Id (Yarus, 1982)

Correlations are ranked by the χ² value. Only correlates with a χ² value of at least 20% of the highest
value are listed. The best correlates are identified with a plus (+) in the first column (see Table 7).
(a) % of highest value.
(b) Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no
neighbor in the secondary structure.
(c) Id, identity element possibly responsible for the correlation. When several tRNAs have identity
elements matching the correlated positions, only one is cited as an example.
(d) References are given for identity elements only.
Inferring Triple Interactions
Base to base-pair correlations: identification of
potential triples
Our goal here was to consider base-pairs as single
variables, and directly compute the correlations
between base-pairs and single-stranded positions.
These base to base-pair correlations can be evaluated
using a χ² test, replacing the usual 4 × 4 contingency
table with a 16 × 4 contingency table that compares
the four possible sequences for the single position
with the 16 possible sequences for the base-pair.
Such a table is similar to those in Figures 3 and 4. A
χ² test can be performed using equation (1).
However, large contingency tables increase the
probability of having empty or almost empty cells
that can strongly bias χ² values. To remedy this
problem, we subdivided the 16 × 4 table into several
sub-tables. This method is an alternative to that
proposed by Olsen (1983) to address the same prob-
lem. For each row M and column N in the original
16 × 4 table (T), we create a 2 × 2 table of the form:
$$\begin{array}{c|c}
T(M,N) & \sum_{i=1}^{4} T(i,N) - T(M,N) \\ \hline
\sum_{j=1}^{16} T(M,j) - T(M,N) & \sum_{i=1}^{4}\sum_{j=1}^{16} T(i,j) - \sum_{j=1}^{16} T(M,j) - \sum_{i=1}^{4} T(i,N) + T(M,N)
\end{array}$$
Values from table T are compressed in the new
2 × 2 table, so that one of the cells contains T(M,N),
two other cells contain the sums of the remaining
values of row M and column N, and the last cell
contains the sum of all remaining values in table T.
Such tables are generated for each value of M and N.
Values of χ² are then computed for all sub-tables,
except those having expected values smaller than 5
in any of their cells. The highest χ² value generated
is kept as the final correlation value.
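The sub-table procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function names, the dictionary keyed by (base-pair, single base) and the choice to return 0.0 when no sub-table passes the expected-value filter are all hypothetical. Expected values in each 2 × 2 table are taken as row total × column total / grand total.

```python
from itertools import product


def chi_square_2x2(a, b, c, d):
    """Return (chi2, smallest expected value) for the 2 x 2 table [[a, b], [c, d]]."""
    total = a + b + c + d
    if total == 0:
        return 0.0, 0.0
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    smallest = float("inf")
    for r, k in product(range(2), repeat=2):
        expected = row_sums[r] * col_sums[k] / total
        smallest = min(smallest, expected)
        if expected > 0:
            chi2 += (observed[r][k] - expected) ** 2 / expected
    return chi2, smallest


def base_to_pair_chi_square(table, min_expected=5.0):
    """Highest 2 x 2 chi-square over all (base-pair, base) cells of the large table.

    table maps (pair, base) -> count, e.g. ("UA", "A") -> 20 (hypothetical format).
    Sub-tables with any expected value below min_expected are ignored.
    """
    pairs = sorted({pair for pair, _ in table})
    bases = sorted({base for _, base in table})
    total = sum(table.values())
    pair_sums = {p: sum(table.get((p, b), 0) for b in bases) for p in pairs}
    base_sums = {b: sum(table.get((p, b), 0) for p in pairs) for b in bases}
    best = 0.0
    for pair, base in product(pairs, bases):
        cell = table.get((pair, base), 0)
        rest_of_base = base_sums[base] - cell    # same base, other pairs
        rest_of_pair = pair_sums[pair] - cell    # same pair, other bases
        remainder = total - pair_sums[pair] - base_sums[base] + cell
        chi2, smallest = chi_square_2x2(cell, rest_of_base, rest_of_pair, remainder)
        if smallest >= min_expected:
            best = max(best, chi2)
    return best
```

Collapsing each cell against everything else keeps the expected counts large even when most of the 64 cells of the full table are empty, which is the situation the sub-tables are meant to handle.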
To simulate an application of the method to
base-triple prediction, we computed only corre-
lations between known secondary structure base-
pairs and unpaired positions. In tRNA, tertiary
interactions predicted by pairwise comparative
analysis (15·48 and 26·44; Gutell et al., 1992) were also
included as base-pairs, so that triples involving these
pairs could be detected. An application of this
procedure to the type I tRNA alignment yields the
results shown in Table 3.
The following criteria were used to establish the
significance of correlations. Since χ² values do not
have an upper limit and vary with the number of
sequences considered, we did not use absolute χ²
values, but the percentage of the highest value
encountered in the whole analysis. The highest
correlation observed in a given molecule thus takes
a value of 100. A cutoff point for the significance of
correlations was then chosen empirically, based on
known base-triples. All known tRNA and group I
intron triples (see below) have a χ² value greater
than 25% of the highest value. We thus tentatively
considered correlations in this range as significant (a
cutoff of 20% is used in Table 3 in order to show
additional correlations, which will be discussed
below).
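The scaling and cutoff just described amount to a few lines of Python. The sketch below is illustrative only: the function name, the dictionary input and the strict "greater than" comparison are assumptions, and any raw χ² values passed to it in an example would be invented for the purpose.

```python
def significant_correlations(chi2_values, cutoff_percent=25.0):
    """Express chi-square values as % of the highest one and keep those above the cutoff.

    chi2_values maps a candidate (for instance a (base-pair, base) tuple) to its
    raw chi-square value (hypothetical input format).
    """
    highest = max(chi2_values.values(), default=0.0)
    if highest <= 0:
        return {}
    percents = {key: 100.0 * value / highest for key, value in chi2_values.items()}
    # Assumption: a correlation is kept when strictly greater than the cutoff.
    return {key: pct for key, pct in percents.items() if pct > cutoff_percent}
```

Because the values are relative to the highest correlation in the same analysis, the cutoff does not depend on the number of sequences in the alignment.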
A second selection criterion was introduced to
treat base-pairs that had significant correlations
(>25%) with several single-stranded positions. To
solve this problem, we performed the correlation
analysis in two directions: we sought the single-
stranded positions that best correlated with each
base-pair, and we sought the base-pairs that best
correlated with each single-stranded position. When
a base-pair and a single-stranded position are
mutually best correlates (hereafter termed “reciprocal
correlates”), they are indicated with a double
arrow in Table 3. When either the base-pair is the best
correlate of the single-stranded position, or the
single-stranded position is the best correlate of the
base-pair, the correlation is indicated with a single
arrow towards the best correlate. When neither the
base-pair nor the single-stranded position is the best
correlate, the correlation is not shown. This method
considerably reduces the number of correlations
shown for each position. These two criteria are used
in all subsequent analyses.
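The two-directional selection can be sketched as follows. The function name, the label strings and the input dictionary are hypothetical; the example values in the usage note are the percentages reported in Table 3 for three of the tRNA correlations, used here only as a small subset of the full analysis.

```python
def classify_correlates(scores):
    """Label each (pair, base) correlation by the direction of best correlation.

    scores maps (pair, base) -> correlation value (hypothetical input format).
    Returns a dict with "reciprocal", "toward_base" (the base is the best
    correlate of the pair) or "toward_pair" (the pair is the best correlate of
    the base). Correlations that are best in neither direction are omitted.
    """
    best_base_for_pair = {}
    best_pair_for_base = {}
    for (pair, base), score in scores.items():
        current = best_base_for_pair.get(pair)
        if current is None or score > scores[(pair, current)]:
            best_base_for_pair[pair] = base
        current = best_pair_for_base.get(base)
        if current is None or score > scores[(current, base)]:
            best_pair_for_base[base] = pair
    labels = {}
    for pair, base in scores:
        pair_prefers_base = best_base_for_pair[pair] == base
        base_prefers_pair = best_pair_for_base[base] == pair
        if pair_prefers_base and base_prefers_pair:
            labels[(pair, base)] = "reciprocal"
        elif pair_prefers_base:
            labels[(pair, base)] = "toward_base"
        elif base_prefers_pair:
            labels[(pair, base)] = "toward_pair"
    return labels
```

For instance, with the values 100 for (13·22)/46, 40 for (13·22)/45 and 27 for (10·25)/45, the first correlation is labeled reciprocal, the second points toward the pair 13·22 and the third points toward base 45.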
In type I tRNAs, the two triples (13·22)·46 and
(12·23)·9 can now be predicted with confidence. They
both display high and reciprocal correlations. The
best correlate of position 10·25 is 45, as expected from
triples forming in yeast tRNAPhe. However, this
relationship is not reciprocal: the best correlate of 45
is 13·22, which rather reflects the E. coli tRNAGln
situation, where the triple involves positions
45·(13·22). Since there is a precedent for base 45 to
form triples with at least two different base-pairs,
correlation results like this are expected.
Other high and reciprocal correlations are
(31·39)/36 and (3·70)/35. Interestingly, these seem-
ingly false positives are not artifacts. The pair 3·70 is
an important identity element for the aminoacylation
of alanine tRNAs of several organisms (McClain &
Foss, 1988; Hou & Schimmel, 1989) and of various
other tRNAs (reviewed by McClain, 1993b), so it is
not surprising that it varies in concert with position
35, at the center of the anticodon, and therefore
necessarily associated with tRNA identity as well.
The (31·39)/36 correlation was identified by Yarus
(1982) in the “extended anticodon” hypothesis,
which states that several positions in the anticodon
stem and loop are selected as a block to confer on the
tRNA an optimal coding accuracy. It is also
noteworthy that 36 is the best correlate of pair 27·43,
which reflects a recent experimental association
between these two sites in the control of translation
by tRNATrp (Schultz & Yarus, 1994). Other “false
positives” can be related to known tRNA identity
elements (see references in Table 3), suggesting that
this method is identifying biologically meaningful
associations.
The recent determination of a tRNASer crystal
structure (Biou et al., 1994) provides us with valuable
base-triple information about type II tRNAs.
We applied the same correlation analysis to this class
of tRNAs to determine whether the two triples
forming in tRNASer at positions 20·(15·48) and
9·(13·22) could be detected (there is not enough
variation in base-pair 8·14, also involved in a triple,
to seek correlations involving this pair). Table 4
presents the highest base to base-pair correlations
observed in an alignment of 262 type II tRNAs
(comprising serine, leucine and certain tyrosine
tRNAs). In excellent agreement with the crystallographic
data, the only “reciprocal correlates” in
Table 4 are observed at positions involved in
base-triples in E. coli tRNASer. The triple 20·(15·48)
has the highest overall χ² value, and the triple
9·(13·22) has a χ² value above the threshold of
significance defined previously, albeit relatively low
(27% of the highest value). No significant correlations
are detected for canonical type I tRNA triples,
in agreement with crystal and solution studies,
which suggest that these triples are absent in type II
tRNAs (Biou et al., 1994; Dock-Bregeon et al., 1989;
Dietrich et al., 1990; Baron et al., 1993).
The sequences observed at positions (15·48)·20 and
9·(13·22) in the type II tRNA dataset are shown in
Figure 6. (The base-triples in yeast tRNASer are
(G15·C48)·U20 and G9·(G13·A22).) The correlation
(15·48)/20 is due primarily to an association of A20
with A15·U48, and a U or C at position 20 with
G15·C48 (Figure 6a). Analysis of the alignment
reveals that these three principal sequences are present in
all type II isoacceptor groups (data not shown),
suggesting that concerted changes have occurred
several separate times through evolution. The
significance of a correlation is considerably increased
when multiple concerted changes are observed
independently, as they are here (Gutell et al., 1985).
The correlation (13·22)/9 is primarily due to an
association between A9 and A13·A22 (Figure 6b).
Although the correlation is relatively weak, concerted
changes yielding sequence A9·(A13·A22)
occur in all type II isoacceptor groups, and even
among isoacceptor tRNAs from the same organism
(data not shown). This again indicates that the
correlation is very significant.

Table 4
Base to base-pair correlation (χ²) and neighbor effects (N) in type II tRNAs
(262 sequences)

Best                χ²           χ²     N
correlates  Pair    best   nt    (a)    (b)   Suggested cause of correlation (c)
+           15·48   ↔      20    100    NA    Triple
            15·48   ←      21     94    NA    Neighbor effect
            15·48   ←      59     91    NA
+           12·23   →      21     90    64
            12·23   ←      15     86    64
            12·23   ←      48     82    64
            12·23   ←      20     79    64
            15·48   ←      35     57    NA    Id? (as in tRNACys)
            3·70    →      35     37    16    Id? (as in tRNAGln)
            12·23   ←      73     36    64
            2·71    ←      20     33    11
            6·67    ↔      15     30    13
            2·71    ←      59     28    11
            27·43   ↔      35     28    14    Id? (as in tRNATrp)
            27·43   ←      36     28    14    Id? (as in tRNATrp)
+           13·22   ↔      9      27   100    Triple
            6·67    ←      37     26    13

Correlations are ranked by the χ² value. Only those correlates within 25% of the
highest χ² value are listed. The best correlates are identified with a plus (+) in the first
column (see Table 7).
(a) % of highest value.
(b) Neighbor effects computed according to eqn (3) (% of highest value). NA,
base-pairs having no neighbor in the secondary structure.
(c) See footnotes to Table 3.

Figure 6. Sequences observed at positions forming
base-triples in E. coli tRNASer. a, Triple (13·22)·9. b, Triple
(15·48)·20. Only values greater than 2 are shown. Numbers
in bold type represent more than 10% of the tRNAs.

Phylogenetic Event Counting
In the previous section, we mentioned a few
concerted mutations occurring in closely related
tRNAs to help support some of our correlation
results. These mutations are particularly interesting,
since they are probably not the result of a fortuitous
ancestral event, but instead they more likely reflect
“neutral” changes between functionally equivalent
sequences. To help identify more base-pair and
base-triple interactions with comparative analysis,
we need to determine the number of times these
concerted mutations have occurred throughout the
evolution of the RNA under study. The larger the
number of such phylogenetic events (e.g. concerted
mutations over evolutionary space), the more
significant that correlation is, and thus the more
confident we are that the positions of interest are
physically interacting. This general concept was
introduced a number of years ago, and was utilized
to reinforce our case for some of the first proposed
base–base tertiary interactions in 16 S rRNA (Gutell
et al., 1985). This type of observation, essential in
correlation analyses, requires knowledge of the
phylogenetic relationships among the sequences
under study. For tRNA, these relationships are
unclear. On one hand, all tRNAs interact with the
ribosome and its factors; thus, they are all under this
common constraint; changes in their sequence in the
evolutionary dimension will be neutral. On the other
hand, tRNA sequences within each acceptor family
are constrained by a specific synthetase recognition
function. Thus, tRNAs have at least two mutational
dimensions, which obscure their phylogenetic history
(for a more detailed assessment of this issue,
see Ninio, 1982; Cedergren et al., 1981). In
contrast, a molecule such as the 16 S rRNA has the
same function in all organisms. Its phylogeny, and the
phylogeny of the cells in which these 16 S rRNAs
exist, is well defined (Woese, 1987). Thus, comparative
studies can determine with more confidence the
number and nature of the concerted mutations that
have occurred throughout a phylogenetic tree,
allowing us to pinpoint mutations that occurred
between closely related RNAs. These changes are
the most likely to be “neutral”.
Group I introns are mobile elements with a fast
evolutionary clock, and therefore their phylogeny
cannot be defined as well as that of rRNAs. However,
there is no known variety of functions in group I
introns that would impede the construction of a tree
as it does for tRNAs. Since a consistent (although
imperfect) classification of group I introns is
available (Michel & Westhof, 1990), we can search for
significant phylogenetic events more rigorously than
we did for tRNAs. In this section, we implement a
simple method to count mutations, and use its results
to strengthen base-triple prediction in the group I
introns.
Our phylogenetic event counting was performed as
follows. Group I intron sequences in the alignment
were classified into phylogenetic groups as described
in Materials and Methods. For each potential
triple position (i·j)·k (i and j being base-paired
and k single-stranded in the secondary structure),
changes are counted as aligned sequences are
examined from the first sequence to the last; c equals
the number of times a change is observed at i, j or k
and e equals the number of times a change is
observed at (i or j) and k (i.e. a concerted change
between the base-pair and the third position). The
ratio e/c is the proportion of mutual changes over the
total number of changes, and is our measure of
phylogenetic events. As noted earlier, the detail we
can decipher with correlation analysis is enhanced
by incorporating phylogenetic event information into
our algorithm. For this paper we have not sought a
complete solution, since that would entail a better
appreciation of the phylogenetic relationships of the
RNAs under study, and better knowledge of how to
value mutual changes that occur between distantly
and closely related organisms. For the purposes of
this article we have developed a simple method
which assumes that the sequences are roughly
ordered by their phylogenetic relationships, and
treats all mutual changes as equivalent. Therefore, a
large number of mutual changes within closely
related RNA sequences will increase the e/c value
more than a few mutual changes between distantly
related RNA species. For our immediate needs, this
approximation works well, as we will see.
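The counting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: it compares each aligned sequence with the one immediately before it, uses 0-based column indices, and treats gap characters like any other symbol. The function name and the toy alignment are hypothetical.

```python
def event_ratio(seqs, i, j, k):
    """Return (e, c, e/c) for base-pair columns i, j and single-stranded column k.

    seqs: aligned sequences of equal length, assumed to be roughly ordered
    by their phylogenetic relationships.
    c counts steps where a change is seen at i, j or k.
    e counts steps where a change is seen at (i or j) and also at k.
    """
    c = e = 0
    for prev, cur in zip(seqs, seqs[1:]):
        pair_changed = prev[i] != cur[i] or prev[j] != cur[j]
        base_changed = prev[k] != cur[k]
        if pair_changed or base_changed:
            c += 1
        if pair_changed and base_changed:
            e += 1
    return e, c, (e / c if c else 0.0)


# Toy alignment: columns 0 and 1 form the base-pair, column 2 is the third base.
toy = ["GCA", "GCA", "AUC", "AUC", "GCC", "GCA"]
print(event_ratio(toy, 0, 1, 2))
```

In this toy alignment three steps show a change and one of them is concerted, so e = 1 and c = 3. Because every mutual change counts equally, many concerted changes among closely related sequences raise e/c more than a few changes between distant ones, as noted in the text.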
Results of the combined χ² and e/c analysis of
group I intron sequences are presented in Table 5.
Here the χ² analysis was performed first, followed by
an e/c analysis for each significant χ² base-pair/base
correlate. An asterisk in Table 5 denotes those triple
correlations that score the highest with the e/c
analysis.
Among the three highest χ² reciprocal correlates
are (109·212)/260 and (108·213)/259, corresponding
to the two proposed base-triples in the P4 stem
(Michel et al., 1990). Both of these triples are also
strongly supported by our phylogenetic counting
method. However, the proposed P6 stem triples
(215·258)·106 and (216·257)·105 are not accurately
predicted. Of these two previously proposed triples,
position 105 correlates best with the pair 216·257
in the χ² analysis, and the e/c analysis associates the
pair 216·257 with position 105. The highest χ²
correlations for the P6 base-pairs are (216·257)/106 (a
reciprocal correlate) and (215·258)/103. The
(215·258)/106 correlation is in the significant range
(38% of the highest value), but it is not shown in
Table 5 because better correlations involving
positions 106 and 215·258 exist (see above). There are
several possible explanations for these apparent
inaccuracies. First, the P3/P4 junction (positions
103 to 106) generally varies in size from three to
five bases, and there are a few examples of an
insertion of several hundred bases. Thus, the
sequences in this region cannot be aligned with
absolute confidence. Until more sequence infor-
mation suggests otherwise, we have justified the two
unpaired 3' P3/P4 nucleotides toward the P4 stem,
while the other P3/P4 nucleotides are justified
toward the P3 helix. Alternative decisions could have
produced significantly different correlations for
nucleotides 103 to 106. Other potential problems in
the identification of P6 triples were raised by a recent
NMR study (Chastain & Tinoco, 1993), which
suggested that P6 triples involved base/sugar
interactions, and varied significantly in structure
upon sequence change. In contrast to base/base
interactions, base/sugar interactions could produce
sequence constraints in which correlations between
adjacent bases become predominant, thus possibly
explaining these unexpected correlations. Finally, it is
also possible that very high neighbor effects could
relegate the actual triple correlations to second
position.

Table 5
Base to base-pair correlation (χ²) and neighbor effects (N) in group I introns (222 seqs)

Best                  χ² + e/c          χ²      N                 Suggested cause
correlates  Pair      best (a)   nt     (b)     (c)   Sequences   of correlation (d)
+           109·212   * ↔ *      260    100.0    67   222         Triple
+           262·312   * ↔ *      263     97.4    NA   222         Triple?
+           108·213   * ↔ *      259     95.8    75   222         Triple (e)
            216·257     ↔        106     61.2   100   221         Neighbor effect
+           110·211   * ↔ *      305     56.9    36   222         Triple? (f)
            280·298     ↔        279     52.9    39   210
            268·307     →        279     49.3     2   215
+           216·257     ← *      105     48.7   100   222         Triple
            215·258     ↔        103     46.0   100   220         Neighbor effect
            268·307     ←        256     40.3     2   182
            97·277      →        279     40.1    23   183
            285·293     →        256     38.2    30   161
            107·214     →        260     36.4    78   222         Neighbor effect
            215·258     ←        269     35.4   100   187
+           220·253   * ↔        255     33.0    15   161         Triple?
            216·257     ←        101     32.3   100   221
            215·258     ←        217     29.6   100   119         Neighbor effect
            111·209     → *      305     29.4    43   222         Neighbor effect?
            109·212     ←        304     29.2    67   222         Neighbor effect?
            102·272     →        263     27.8    NA   221
            286·292     →        300     26.9    18   150
            102·272     ←        270     25.2    NA   189

Correlations are ranked by the χ² value. Only those correlates within 25% of the highest χ² value are
listed. The best correlates are identified with a plus in the first column (see Table 7).
(a) χ² best correlate noted with arrows; best e/c ratio noted with *.
(b) % of highest value.
(c) Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no
neighbor in the secondary structure.
(d) References in text.
(e) The best base e/c correlate for the (108·213) base-pair is shared by positions 259 and 302.
(f) The best base-pair e/c correlate for position 305 is shared by (110·211) and (97·277). The alignment
in the vicinity of the (97·277) base-pair is questionable due to length variation of the P3 helix. Thus we
believe the best correlation is between (110·211) and 305.

Figure 7. Sequences observed at various correlating group I intron
positions. a, (262·312)/263. b, (110·211)/305. c, (280·298)/279. Only
values greater than 2 are shown. Numbers in bold type represent
more than 10% of the introns.

Other reciprocal correlates having high χ² values
are present. One involves positions 262·312 and 263.
This triple correlation is also strongly identified with
our phylogenetic event-based method (see Table 5).
Figure 7a shows the sequences observed at
(262·312)·263. C263 is associated with A262·U312,
while A263 is associated with G262·C312 or
C262·G312. Our examination of the alignment
reveals several concerted mutations occurring among
closely related introns, particularly within subgroups
IB1 and IC1 (data not shown). The base-pair
262·312 is itself well supported, with multiple
independent covariations observed (data not shown).
This strong correlation can be interpreted in various
ways. On the basis of a three-dimensional modeling
study of the intron guanosine binding site (Yarus
et al., 1991), it was proposed that these three
nucleotides form a base-triple. Alternatively, it has
been suggested, from experimental mutagenesis
studies, that this sequence constraint is necessary
to ensure that the nucleotide at position 263 is
bulged out of this helix, and is not base-paired
to position 312 (Couture et al., 1990). Note that
when position 263 is a C, the 262·312 base-pair is
an A·U or U·A. When 263 is an A, 262·312 is a G·C
or C·G. Thus, position 263 is not able to form a
standard Watson–Crick pair with 312. This hypoth-
esis suggests that the triplets (U·A)A and (G·C)C
should also be found, which has not been the case to
date. We favor the suggestion by Yarus that a
base-triple interaction forms between these pos-
itions.
A second correlation, (110·211)/305, is supported
by the e/c study and a high χ² reciprocal correlation.
This correlation is particularly interesting, since it
involves nucleotides spanning two distant
domains of the group I intron, namely the P3/P7 and
P4/P6 coaxial stems. The correlation results
primarily from an exchange between the sequence
patterns (A·U)·C and (G·C)·U. This correlation,
unlike many of the others, occurs in its purest form
in the subgroups IA and ID (Figure 7b), although
covariation between these triplets is found in the
other subgroups, albeit intermixed with non-concerted
variations (data not shown). The correlation
(110·211)/305 was identified previously using a
smaller dataset (Michel & Westhof, 1990) but was
disregarded on the grounds of steric conflicts with the
P4 triples. However, more recent experimental data
(Pyle et al., 1992) have suggested interactions
between the P1 stem and the J7/8 strand that shift
J7/8 towards the P4 stem, and thus reduce the
distance between nucleotides 110·211 and 305.
Adjusting the current three-dimensional model to
take these new data into account could suggest
alternative ways to form a (110·211)/305 interaction,
and perhaps resolve the steric conflicts. In addition,
two other correlations in Table 5, (111·209)/305 and
(109·212)/304, resemble the neighbor effects that
could be expected in the presence of a (110·211)·305
triple.
The other reciprocal correlations in Table 5 are
(280·298)/279 and (220·253)/255. The first is not
supported by the e/c analysis, and thus we do not
consider it a credible triple candidate. The other
reciprocal correlation, (220·253)/255, is supported
by a significant number of coordinated changes,
primarily in subgroups IC1 and IC2. The number of
nucleotides between positions 253 and 257 is
variable. Thus it is difficult to align these unpaired
nucleotides across all the subgroups with much
confidence. However, within the IC1 subgroup this
number is three in almost all cases, while it is always
five in the IC2 subgroup, allowing us to obtain a
reliable local alignment for these two groups. The
sequences observed in these two subgroups are
shown in Figure 7c. Formation of a (220·253)·255
triple is feasible stereochemically, nucleotide 255
being situated in the internal loop flanking the
220·253 base-pair.
The combined e/c and χ² analysis has identified
three additional base-triple candidates in the group
I introns, namely (262·312)·263, (110·211)·305 in the
ID and IA subgroups, and (220·253)·255 in the IC1
and IC2 subgroups.
Base-pair to base-pair correlations:
identification of neighbor effects
The identification of base-triples requires the
ability to distinguish between correlations due to
physical interactions and those due to other factors,
such as RNA identity or accidental evolutionary
events. We have suggested that networked sequence
correlations are characteristic of triple-helix for-
mation. We now propose to use this property to help
distinguish base-triples (at least when present in
triple helices) from other correlated positions.
A simple method to assess neighbor effects is to
directly measure correlations between base-pairs.
For this purpose, we perform a χ² test as done in the
previous analysis, the only difference being a
contingency table having 16 rows and 16 columns
(instead of 16 × 4). The sparseness problem is again
resolved here by creating smaller 2 × 2 tables,
computing χ² in each table, and retaining the highest
value. A simple measure of the neighbor effect, N,
could then involve computing χ² for each set
of adjacent base-pairs (i,j) and (i + 1,j − 1):
N = χ²(i,j,i + 1,j − 1). However, since sequence correlations
also occur between positions separated by
several base-pairs in the same helical stem (Tables 1
and 2), the neighbor effect N at base-pair i,j can be
more accurately measured by averaging correlations
in a window comprising n base-pairs at each side of
i,j, using the following formula:
$$
N(i,j) = \frac{1}{2n} \sum_{k=1}^{n} \left[ \chi^2(i,j,i+k,j-k) + \chi^2(i,j,i-k,j+k) \right] \tag{3}
$$
If i ± n or j ± n is not a paired position,
the corresponding correlation is not computed,
and n is corrected accordingly. We use n = 2, and
thus evaluate a window of five base-pairs
(from i − 2 to i + 2) surrounding i,j. Figure 8
shows results obtained for tRNA and the group I
intron.
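The neighbor-effect measure of equation (3) can be sketched as below. Several choices here are assumptions made for the example rather than details given in the text: each 2 × 2 table contrasts one state against all others at each of the two positions, the χ² statistic is the uncorrected Pearson form, and the "corrected n" is implemented by dividing by the number of terms actually computed. All function names are hypothetical, and the result is a raw average, not the percentage of the highest value reported in the tables.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def max_chi2(col_x, col_y):
    """Highest 2 x 2 chi-square over all (state of x, state of y) combinations."""
    best = 0.0
    for x in set(col_x):
        for y in set(col_y):
            a = sum(1 for p, q in zip(col_x, col_y) if p == x and q == y)
            b = sum(1 for p, q in zip(col_x, col_y) if p == x and q != y)
            c = sum(1 for p, q in zip(col_x, col_y) if p != x and q == y)
            d = len(col_x) - a - b - c
            best = max(best, chi2_2x2(a, b, c, d))
    return best


def neighbor_effect(pair_cols, i, j, n=2):
    """Average chi-square between base-pair (i, j) and its stacked neighbors.

    pair_cols maps a base-pair (i, j) to its column of pair identities,
    e.g. ["GC", "AU", ...], one entry per aligned sequence.
    Returns None (the "NA" of the tables) when no neighbor is paired.
    """
    total, terms = 0.0, 0
    for k in range(1, n + 1):
        for nb in ((i + k, j - k), (i - k, j + k)):
            if nb in pair_cols:  # skip neighbors that are not paired positions
                total += max_chi2(pair_cols[(i, j)], pair_cols[nb])
                terms += 1
    return total / terms if terms else None


# Toy helix of three stacked pairs; the identities are invented.
helix = {
    (10, 25): ["GC"] * 5 + ["AU"] * 5,
    (11, 24): ["CG"] * 5 + ["UA"] * 5,
    (12, 23): ["GC"] * 10,             # invariant pair: contributes no correlation
}
print(neighbor_effect(helix, 11, 24))
```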
Figure 8. Neighbor effects measured in equation (3). The density of the dots is proportional to N(i,j), darker dots
representing the highest values and lighter dots the lowest values. Precise N(i,j) values for base-pairs of interest are given
in Tables 3 to 5. a, Type I tRNA. b, Group I intron.

Neighbor effects (N)
In tRNA and group I introns, helices associated
with base-triples show significantly larger
neighbor effects (N, measured as in eqn (3)) than
those helices with no known base-triples. To
illustrate these strong base-pair to base-pair
correlations, we show in Table 6 the sequences
observed in group I introns at positions 109·212 and
108·213. The base-pair G108·C213 is strongly
associated with a C·G at position 109·212, while
C108·G213 is associated with A·U or G·C at position
109·212.
In group I introns (Figure 8b), neighbor effects
are consistent with triple formation in the P4/
P6 helices, and are also significant at positions
110·211, a base-pair having a potential triple
partner (Table 5). However, no significant neighbor
effect supports the strong triple correlations
(262·312)/263 and (220·253)/255. In spite of
this result, we still support the formation of
base-triples at these positions, since these triples
would not be part of an extended triple-helical
region, which we proposed was necessary for the
base-pairs to have noticeable neighbor effects. Also,
base-pairs near 262·312 in P7 are extremely
conserved, and thus limit any base correlation in this
region.

Table 6
Sequences observed at group I intron positions (109·212)
and (108·213)

           108·213 →
109·212    A·U    U·A    C·G    G·C
A·U        -      -      26     -
C·G        5      7      6      122
G·C        -      -      36     -
U·G        -      -      -      5

Only values greater than 2 are shown.
Numbers in bold face represent more than 10% of the group I
intron sequences.

Combining analyses for base-triple prediction
The various analyses presented here can be
combined into a single protocol for base-triple
prediction. The criteria we propose to apply in this
protocol remain loose at this stage of our work, but
will be refined as the method is applied to other
classes of RNA. These criteria are presented here.
First, we believe good triple candidates should
score well in both base to base-pair correlations (χ²
and e/c) and neighbor effect analysis. A cutoff of
25% of the highest value for χ² and neighbor effect
measurements would retain all experimentally
proven triples in tRNA and group I introns. We
therefore require that values for χ² and neighbor
effects N (given in Tables 3 to 5) stand above this
threshold. A measure of phylogenetic events (e/c)
being available for group I introns, we require that
triple correlations in the group I intron are associated
with a significant level of concerted mutations (at least
one asterisk in Table 5). Finally, to tighten the
prediction criteria, we require χ² correlations to be
reciprocal. The triplets that best satisfy these stringent
criteria are listed in the first row of Table 7.

Table 7
Triples predicted in tRNA and group I introns based on Tables 3 to 5, using two different
criteria

Criteria for                              tRNA          tRNA
triple prediction                         type I        type II          Group I intron (a)
Stringent
  χ² (base to base-pair) > 25% of         (13·22)·46    (13·22)·9        (109·212)·260
    highest value                         (12·23)·9     (15·48)·20 (b)   (108·213)·259
  N > 25% of highest value                                               (110·211)·305
  Best reciprocal correlate                                              -------------
                                                                         (262·312)·263 (c)
                                                                         (220·253)·255 (c)
Relaxed
  χ² (base to base-pair) > 25% of         Same +        Same +           Same +
    highest value                         (11·24)·36    (12·23)·21       (216·257)·105
  N > 25% of highest value                (10·25)·45
  Each position involved in only
    one triple (not necessarily best
    reciprocal correlate)

(a) For group I intron triples, we use the phylogenetic event count as an additional criterion. Only
putative triples associated with an asterisk in Table 5 are included.
(b) N cannot be measured for this position, but there is a large cross-correlation at (15·48)/21.
(c) These 2 putative triples are not supported by neighbor effects, but are best reciprocal correlates and
associated with significant phylogenetic events (see discussion in text).

These stringent criteria yield no false positives in
either tRNA family. In type II tRNA, the triple
(13·22)·9 is predicted, but a question remains for the
triple (15·48)·20. We cannot use equation (3) to
compute the neighbor effect associated with this
triple, since no secondary base-pair flanks the
15·48 pair. However, the strong correlation
observed in Table 4 between 15·48 and 21 could very
well be a neighbor effect. Thus, we tentatively
include this triple in Table 7. In type I tRNA, two of
the three yeast tRNAPhe base-triples are predicted,
although 45·(10·25) is not. In group I introns,
the previously identified P4 triples are predicted,
along with one experimentally unproven interaction,
(110·211)·305. Two triple candidates with
neighbor effects below the 25% threshold,
(262·312)·263 and (220·253)·255, are noteworthy, since
they satisfy all of our other requirements. While the
other group I intron triples would be complexed in
a triple-helix formation, these two putative triples are
both isolated from other known base-triples;
therefore, they would not be part of a triple helix.
Further study is required to determine if this is the
reason for their lack of neighbor effects. Until we have
the results from this study, the biologist’s judgement
is still necessary to resolve these “border-line” cases.
The possible existence of the triple (110·211)·305 has
been discussed above.
The prediction criteria were relaxed by allowing
for non-reciprocal correlations, under the condition
that no base-pair or single-stranded nucleotide
belongs to more than one triple (Table 7, line 2). For
type I tRNAs, the triple 45·(10·25) is now predicted.
The relaxed criteria also identify the correlation
(11·24)/36. We suggest that this single false positive
results from a functional linkage between positions
24 and 36, on the basis of experiments establishing
that mutations at position 24 affect codon/anticodon
recognition by tRNATrp (Hirsh, 1971; Smith & Yarus,
1989). In type II tRNAs, the relaxed criteria identify
the correlation (12·23)/21. Instead of interacting with
the pair 12·23, as this correlation suggests, nucleotide
21 faces the pair 8·14 in the type II tRNASer crystal
structure, and is proposed to interact with or face
pair 8·14 in other type II tRNA solution structures
(Dock-Bregeon et al., 1989; Baron et al., 1993).
However, since bases 12·23 and 21 are close in space,
we cannot rigorously exclude their interaction in
certain type II tRNAs.
In group I introns, the relaxed criterion identifies
the triple (216·257)·105, one of the previously
proposed P6 triples (Michel & Westhof, 1990).
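The stringent and relaxed criteria of this section can be sketched as a small filter. This is an illustration under assumptions, not the authors' protocol: the record fields are hypothetical, the relaxed step is implemented as a greedy pass in decreasing χ² order (the text states only that no position may belong to more than one triple), and candidates without a measurable N are simply rejected rather than judged case by case. The sample values are taken from Table 5, with the reciprocal and e/c flags as read from the arrows and asterisks there.

```python
def predict_triples(candidates, cutoff=25.0, relaxed=False):
    """Select (pair, base) triples from candidate correlations.

    Each candidate is a dict with keys: pair, base, chi2 and n (both as % of
    the highest value, n may be None for "NA"), reciprocal (bool) and,
    optionally, ec_best (bool; defaults to True when no e/c data exist).
    """
    def passes(c):
        n_ok = c["n"] is not None and c["n"] > cutoff
        return c["chi2"] > cutoff and n_ok and c.get("ec_best", True)

    ranked = sorted((c for c in candidates if passes(c)), key=lambda c: -c["chi2"])
    chosen = [c for c in ranked if c["reciprocal"]]  # stringent set
    if relaxed:
        used = {c["pair"] for c in chosen} | {c["base"] for c in chosen}
        for c in ranked:
            # Add non-reciprocal candidates whose positions are still free.
            if c["reciprocal"] or c["pair"] in used or c["base"] in used:
                continue
            chosen.append(c)
            used.update((c["pair"], c["base"]))
    return [(c["pair"], c["base"]) for c in chosen]


# A few group I intron rows from Table 5.
table5 = [
    {"pair": "109-212", "base": "260", "chi2": 100.0, "n": 67, "reciprocal": True, "ec_best": True},
    {"pair": "262-312", "base": "263", "chi2": 97.4, "n": None, "reciprocal": True, "ec_best": True},
    {"pair": "108-213", "base": "259", "chi2": 95.8, "n": 75, "reciprocal": True, "ec_best": True},
    {"pair": "216-257", "base": "106", "chi2": 61.2, "n": 100, "reciprocal": True, "ec_best": False},
    {"pair": "110-211", "base": "305", "chi2": 56.9, "n": 36, "reciprocal": True, "ec_best": True},
    {"pair": "216-257", "base": "105", "chi2": 48.7, "n": 100, "reciprocal": False, "ec_best": True},
]
print(predict_triples(table5))
print(predict_triples(table5, relaxed=True))
```

On these rows the stringent call returns the two P4 triples and (110·211)·305, and the relaxed call adds (216·257)·105. The candidate (262·312)·263 is rejected here because its N is not available, which is exactly the kind of border-line case the text leaves to the biologist's judgement.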
Conclusion and Perspectives
Our previous correlation analyses sought corre-
lations that occur between two positions in an RNA
alignment (Gutell et al., 1992). While these analyses
effectively predicted secondary structure pairing, we
had difficulty identifying base-triples with confi-
dence. We suggest here two reasons for this
weakness. First, structurally similar base-triples can
form between bases that vary in a non-compensatory
fashion, which reduces covariation. Second, base-
triples do not necessarily involve the same positions
in all members of an RNA family.
With these obstacles in mind, we have developed
methods to enhance our ability to predict base-
triples by specifically seeking correlations between
secondary structure base-pairs and nucleotides
unpaired in the secondary structure. This signifi-
cantly enhances correlations for base-triples. During
our earlier studies, we also identified weaker
correlations between many of the bases in the tRNA
D-stem. We suggested that these effects could be
specific to base-triples forming local triple helices.
We developed an algorithm that quantifies these
neighbor effects in RNA secondary helices. The most
pronounced effects in tRNA were in the D-helix,
while in the group I intron they were in the P4 and
P6 helices, the same helices known to be involved in
triple formation. The combination of these two
correlation analyses identifies known base-triples
more effectively than any previous method.
The accuracy of current protocols is limited by
heterogeneity within the sequence datasets. Base-
triple prediction will remain ambiguous as long as
the dataset analyzed contains RNAs that form triples
in different positions. For example, we are currently
unable simultaneously to predict triples (13·22)·46
and 45·(13·22) in type I tRNAs, since they both
occur in the analyzed sequences. It should be
possible to isolate subsets of sequences displaying
specific correlations, and enhance predictions in
each subset. The growth of RNA databases, and
the availability of the algorithms presented herein,
will certainly lead us in that direction. Another
enhancement would be to combine the various
prediction criteria introduced in this study into an
automated protocol. An integration of χ² correlation
values and phylogenetic event counts would be
particularly useful in RNAs with well established
phylogenetic relationships, such as the ribosomal
RNAs.
Materials and Methods
Sequence alignments
The tRNA sequence alignment used was adapted from
Sprinzl et al. (1991). We aligned the variable loop (which
was not aligned in the original database), and removed
mitochondrial sequences, leaving 895 type I and 263 type
II nuclear tRNAs, which were analyzed separately. The
group I intron alignment contains 222 sequences compiled
by S. H. Damberger and R. R. Gutell (unpublished results).
Analyses were performed only on the core region
comprising the stems P1, P3, P4, P6, P6a, P7, P8, a part of
P5 and all intervening single-stranded segments. Intron
sequences were classified into structurally distinct
subgroups (IA, IB, IC and ID) according to the definitions
of Michel & Westhof (1990). Within each subgroup, we
further ordered the sequences using these criteria:
(1) the type of gene in which the intron was found
(e.g. ATP9, SSU rRNA); (2) the specific site in that
gene where the intron was found (e.g. SSU site 531);
(3) the cellular location of the intron (e.g. nucleus,
mitochondrion, chloroplast); and (4) a rough
phylogenetic ordering of the organisms.
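Since the event-counting method depends on the order of the aligned sequences, the ordering criteria above can be sketched as a multi-key sort. The record type, its field names, the integer `phylo_rank` standing in for the rough phylogenetic ordering, and the sample records are all hypothetical; applying the four criteria as successive sort keys is also an assumption.

```python
from typing import NamedTuple


class Intron(NamedTuple):
    name: str
    subgroup: str    # structural subgroup, e.g. "IA", "IB", "IC", "ID"
    gene: str        # type of gene containing the intron
    site: int        # insertion site within that gene
    location: str    # cellular location
    phylo_rank: int  # hypothetical rank for the rough phylogenetic ordering


def order_alignment(introns):
    """Sort by subgroup, then gene, insertion site, location and phylogenetic rank."""
    return sorted(
        introns,
        key=lambda x: (x.subgroup, x.gene, x.site, x.location, x.phylo_rank),
    )


# Invented records, for illustration only.
records = [
    Intron("x", "IC", "SSU rRNA", 531, "nucleus", 2),
    Intron("y", "IA", "ATP9", 1, "mitochondrion", 1),
    Intron("z", "IC", "LSU rRNA", 1925, "nucleus", 1),
]
print([r.name for r in order_alignment(records)])
```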
Structural data
Detailed base-triple information is available for six
tRNA crystal structures: yeast tRNAPhe (Quigley & Rich,
1976; Sussman & Kim, 1976), Escherichia coli tRNAfMet (Woo
et al., 1980), yeast tRNAAsp (Dumas et al., 1985), E. coli
tRNAGln (Rould et al., 1989), yeast tRNAiMet (Basavappa &
Sigler, 1991) and Thermus thermophilus tRNASer2 (GGA)
(Biou et al., 1994). Although no crystal structures are
available for group I introns, it has been suggested that
triples form in the P4 and P6 helices (Michel et al., 1990;
Michel & Westhof, 1990). The existence of both P4 triples
and one of the proposed P6 triples is supported by
mutagenesis experiments (Michel et al., 1990; Green &
Szostak, 1994). There is good evidence for the formation of
base–base interactions in the P4 triples, but the nature of
the interactions in the P6 triples remains unclear. NMR
experiments on a model oligonucleotide that partially
reproduced the P4/P6 domain suggested that triple
interactions exist in the form of base–backbone contacts
(Chastain & Tinoco, 1993). However, the applicability of
these latter results in the group I intron context is uncertain,
given that important parts of the P4/P6 triple domain are
absent from the construct.
Programs
Sequence alignments were visualized and manipulated
using the alignment editor AE2 (T. Macke, The Scripps
Clinic, CA) available from the Ribosomal Database Project
(Larsen et al., 1993), and studied using a comparative
sequence analysis program developed in our laboratory
(S. H. Damberger, D. Gautheret & R. R. Gutell,
unpublished results). This software computes frequencies
of bases, base-pairs and base-triples, performs pairwise
correlation analyses using mutual information (Chiu &
Kolodziejczak, 1991; Gutell et al., 1992), and computes
various types of correlations based on χ² tests and
phylogenetic event counting, as discussed above. Sec-
ondary structure graphics were produced using the
program XRNA (B. Weiser & H. Noller, unpublished
results).
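For reference, the mutual information between two alignment columns mentioned above can be computed as in the following sketch. It uses the standard definition with observed frequencies and a base-2 logarithm; whether the program described here uses the same logarithm base or any small-sample correction is not stated, so those details are assumptions, as is the function name.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two aligned columns of equal length."""
    n = len(col_x)
    fx, fy = Counter(col_x), Counter(col_y)
    fxy = Counter(zip(col_x, col_y))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed combinations.
    return sum(
        (nxy / n) * log2(nxy * n / (fx[x] * fy[y]))
        for (x, y), nxy in fxy.items()
    )


print(mutual_information("GGAA", "CCUU"))  # perfectly covarying columns
print(mutual_information("GGAA", "CCCC"))  # one invariant column
```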
Notation
We adopted the notation (X·Y)·Z to describe a triple
interaction involving the secondary base-pair X·Y and
position Z, where Z interacts with Y; and we use Z·(X·Y)
when Z interacts with X. When interacting nucleotides are
not well established, as in the group I intron, we always use
the notation (X·Y)·Z. We use the term ‘‘base-triple’’ when
only the bases interact, ‘‘nucleotide-triple’’ when base–
backbone contacts are involved, and simply ‘‘triple’’ as the
general term. Correlations between positions X and Y are
noted X/Y. The numbering systems used are those of yeast
tRNAPhe and the T. thermophila group I intron.
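The notation rules above can be expressed as a small formatting helper. This is only a sketch: the class name, field names and the position numbers used in the example are hypothetical and do not refer to any particular triple in tRNA or the group I intron.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A triple formed by the secondary base-pair x·y and position z."""
    x: int
    y: int
    z: int
    z_partner: str = "unknown"  # "x", "y" or "unknown" (assumed encoding)

    def label(self):
        # Z·(X·Y) is used only when Z is known to interact with X;
        # (X·Y)·Z covers both "Z interacts with Y" and the case where
        # the interacting nucleotides are not well established.
        if self.z_partner == "x":
            return f"{self.z}·({self.x}·{self.y})"
        return f"({self.x}·{self.y})·{self.z}"


def correlation_label(a, b):
    """Correlations between two positions are noted a/b."""
    return f"{a}/{b}"


# Arbitrary placeholder positions, for illustration only.
with_y = Triple(10, 20, 30, "y").label()
with_x = Triple(10, 20, 30, "x").label()
undetermined = Triple(10, 20, 30).label()
```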
Acknowledgements
This work was supported by grants from the NIH
(GM48207) and the Colorado RNA Center to R.R.G. We
thank SUN Microsystems for their donation of computer
equipment, and the W. M. Keck Foundation for its support
of RNA Science on the Boulder campus. We also thank Dr
T. Cech for comments on the manuscript, and Drs V. Rath
and T. Steitz for sharing information on the tRNAGln
structure.
References
Baron, C., Westhof, E., Böck, A. & Giegé, R. (1993). Solution
structure of selenocysteine-inserting tRNASec from
Escherichia coli. J. Mol. Biol. 231, 274–292.
Basavappa, R. & Sigler, P. B. (1991). The 3 Å crystal
structure of yeast initiator tRNA: functional implications
in initiator/elongator discrimination. EMBO J.
10, 3105–3111.
Bina-Stein, M. & Stein, A. (1976). Allosteric interpretations
of the Mg2+ binding to the denaturable Escherichia coli
tRNAGlu2. Biochemistry, 15, 3912–3917.
Biou, V., Yaremchuk, A., Tukalo, M. & Cusack, S. (1994).
The 2.9 Å crystal structure of T. thermophilus
seryl-tRNA synthetase complexed with tRNASer.
Science, 263, 1404–1410.
Cech, T. R., Damberger, S. H. & Gutell, R. R. (1994).
Representation of the secondary and tertiary structure
of group I introns. Nature Struc. Biol. 1, 273–280.
Cedergren, R. J., LaRue, B. & Grosjean, H. (1981). The
evolving tRNA molecule. CRC Crit. Rev. Biochem. 11,
35–104.
Chastain, M. & Tinoco, I., Jr (1993). Nucleoside triples
from the group I intron. Biochemistry, 32, 14220–14228.
Chiu, D. K. Y. & Kolodziejczak, T. (1991). Inferring
consensus structure from nucleic acid sequences.
Comp. Appl. Biosci. 7, 347–352.
Couture, S., Ellington, A. D., Gerber, A. S., Cherry, J. M.,
Doudna, J. A., Green, R., Hanna, M., Pace, U.,
Rajagopal, J. & Szostak, J. W. (1990). Mutational
analysis of conserved nucleotides in a self-splicing
group I intron. J. Mol. Biol. 215, 345–358.
Dietrich, A., Romby, P., Maréchal-Drouard, L., Guillemaut,
P. & Giegé, R. (1990). Solution conformation of several
free tRNALeu species from bean, yeast and Escherichia
coli, and interaction of these tRNAs with bean
cytoplasmic leucyl-tRNA synthetase. A phosphate
alkylation study with ethylnitrosourea. Nucl. Acids
Res. 18, 2589–2597.
Dock-Bregeon, A. C., Westhof, E., Giegé, R. & Moras, D.
(1989). Solution structure of a tRNA with a large
variable region: yeast tRNASer. J. Mol. Biol. 206,
707–722.
Dumas, P., Ebel, J. P., Giegé, R., Moras, D., Thierry, J. C. &
Westhof, E. (1985). Crystal structure of yeast tRNAAsp:
atomic coordinates. Biochimie, 67, 597–606.
Green, R. & Szostak, J. W. (1994). In vitro genetic analysis
of the hinge region between helical elements P5-P4-P6
and P7-P3-P8 in the sunY group I self-splicing intron.
J. Mol. Biol. 235, 140–155.
Gutell, R. R. (1993). Comparative studies of RNA: inferring
higher-order structure from patterns of sequence
variation. Curr. Opin. Struct. Biol. 3, 313–322.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985).
Comparative anatomy of 16S-like ribosomal RNA.
Progr. Nucl. Acid. Res. 32, 155–216.
Gutell, R. R., Power, A., Hertz, G. Z., Putz, E. J. & Stormo,
G. D. (1992). Identifying constraints on the higher-
order structure of RNA: continued development
and application of comparative sequence analysis
methods. Nucl. Acids Res. 20, 5785–5795.
Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons
from an evolving rRNA: 16S and 23S rRNA structures
from a comparative perspective. Microbiol. Rev. 58,
10–26.
Haselman, T., Chappelear, J. E. & Fox, G. E. (1988). Fidelity
of secondary and tertiary interactions in tRNA. Nucl.
Acids Res. 16, 5673–5684.
Hirsh, D. (1971). Tryptophan transfer RNA as the UGA
suppressor. J. Mol. Biol. 58, 439–458.
Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S. H.
(1977). RNA–ligand interactions: [I] Magnesium
binding sites in yeast tRNAPhe. Nucl. Acids Res. 4,
2811–2820.
Hou, Y. M. (1994). Structural elements that contribute to an
unusual tertiary interaction in a transfer RNA.
Biochemistry, 33, 4677–4681.
Hou, Y. M. & Schimmel, P. (1989). Evidence that a
major determinant for the identity of a transfer
RNA is conserved in evolution. Biochemistry, 28,
6800–6804.
Hou, Y. M., Westhof, E. & Giegé, R. (1993). An unusual
RNA tertiary interaction has a role for the specific
aminoacylation of a transfer RNA. Proc. Nat. Acad. Sci.,
U.S.A. 90, 6776–6780.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of
a GNRA tetraloop in long-range RNA tertiary
interactions. J. Mol. Biol. 236, 1271–1276.
Klug, A., Ladner, J. & Robertus, J. D. (1974). The structural
geometry of co-ordinated base changes in transfer
RNA. J. Mol. Biol. 89, 511–516.
Larsen, N., Olsen, G. J., Maidak, B. L., McCaughey, M. J.,
Overbeek, R. N., Macke, T. J., Marsh, T. L. & Woese,
C. R. (1993). The ribosomal database project. Nucl.
Acids Res. 21 Suppl., 3021–3023.
Levitt, M. (1969). Detailed model for transfer ribonucleic
acid. Nature (London), 224, 759–763.
Major, F., Gautheret, D. & Cedergren, R. (1993).
Reproducing the three-dimensional structure of a
tRNA molecule from structural constraints. Proc. Nat.
Acad. Sci., U.S.A. 90, 9408–9412.
Malhotra, A., Tan, R. K. & Harvey, S. C. (1990). Prediction
of the three-dimensional structure of Escherichia coli
30S ribosomal subunit: a molecular mechanics
approach. Proc. Nat. Acad. Sci., U.S.A. 87, 1950–1954.
McClain, W. H. (1993a). Identity of Escherichia coli tRNACys
determined by nucleotides in three regions of tRNA
tertiary structure. J. Biol. Chem. 268, 19398–19402.
McClain, W. H. (1993b). Rules that govern tRNA identity
in protein synthesis. J. Mol. Biol. 234, 257–280.
McClain, W. H. & Foss, K. R. (1988). Changing the identity
of a tRNA by introducing a G-U wobble pair near the
3' acceptor end. Science, 240, 793–796.
McClain, W. H., Foss, K. R., Jenkins, R. A. & Schneider, J.
(1991). Rapid determination of nucleotides that define
tRNAGly
acceptor identity. Proc. Nat. Acad. Sci., U.S.A.
88, 6147–6151.
Michel, F. & Westhof, E. (1990). Modelling of the
three-dimensional architecture of group I catalytic
introns based on comparative sequence analysis.
J. Mol. Biol. 216, 585–610.
Michel, F., Ellington, A. D., Couture, S. & Szostak, J. W.
(1990). Phylogenetic and genetic evidence for base
triple formation in the catalytic domain of group I
introns. Nature (London), 347, 578–580.
Ninio, J. (1982). Molecular Approaches to Evolution,
pp. 24–27, Pitman Books Ltd., London, U.K.
Olsen, G. J. (1983). Comparative analysis of nucleotide
sequence data, PhD dissertation, University of
Colorado Health Sciences Center, CO.
Pütz, J., Puglisi, J. D., Florentz, C. & Giegé, R. (1991).
Identity elements for specific aminoacylation of yeast
tRNAAsp by cognate aspartyl-tRNA synthetase.
Science, 252, 1696–1699.
Pyle, A. M., Murphy, F. L. & Cech, T. R. (1992).
RNA substrate binding site in the catalytic core of
the Tetrahymena ribozyme. Nature (London), 358,
123–128.
Quigley, G. J. & Rich, A. (1976). Structural domains of
transfer RNA molecules. Science, 194, 796–806.
Rould, M. A., Perona, J. J., Söll, D. & Steitz, T. A. (1989).
Structure of E. coli glutaminyl-tRNA synthetase
complexed with tRNAGln and ATP at 2.8 Å resolution.
Science, 246, 1135–1142.
Schultz, D. W. & Yarus, M. (1994). tRNA structure and
ribosomal function. I. tRNA nucleotide 27 to 43
mutations enhance first position wobble. J. Mol. Biol.
235, 1381–1394.
Smith, D. & Yarus, M. (1989). Transfer RNA and
coding specificity. II. A D-arm tertiary interaction
that restricts coding range. J. Mol. Biol. 206,
503–511.
Sprinzl, M., Dank, N., Nock, S. & Schön, A.
(1991). Compilation of tRNA sequences and sequences
of tRNA genes. Nucl. Acids Res. 19 (Suppl.)
2127–2171.
Sussman, J. L. & Kim, S.-H. (1976). Three-dimensional
structure of a transfer RNA in two crystal forms.
Science, 192, 853–858.
Winker, S., Overbeek, R., Woese, C. R., Olsen, G. J. &
Pfluger, N. (1990). Structure detection through
automated covariance search. Comp. Appl. Biosci. 6,
365–371.
Woese, C. R. (1987). Bacterial evolution. Microbiol. Rev. 51,
221–271.
Woese, C. R. & Pace, N. R. (1993). Probing RNA structure,
function and history by comparative analysis. In The
RNA World (Gesteland, R. F. & Atkins, J. F., eds),
pp. 91–117, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY.
Woo, N. H., Roe, B. A. & Rich, A. (1980). Three-dimensional
structure of Escherichia coli initiator tRNAfMet. Nature
(London), 286, 346–351.
Yarus, M. (1982). Translational efficiency of transfer
RNAs: uses of extended anticodon. Science, 218,
646–652.
Yarus, M., Illangesekare, M. & Christian, E. (1991). An axial
binding site in the Tetrahymena precursor RNA. J. Mol.
Biol. 222, 995–1012.
Edited by D. E. Draper
(Received 20 July 1994; accepted in revised form 20 January 1995)

Gutell 044.jmb.1995.248.0027

  • 1. JMB—MS 440 Cust. Ref. No. PEW 84/94 [SGML] J. Mol. Biol. (1995) 248, 27–43 Identification of Base-triples in RNA using Comparative Sequence Analysis Daniel Gautheret, Simon H. Damberger and Robin R. Gutell* Comparative sequence analysis has proven to be a very efficient tool for theDepartment of Molecular Cellular and Developmental determination of RNA secondary structure and certain tertiary interactions. However, base-triples, an important RNA structural element, cannot beBiology, Campus Box 347 predicted accurately from sequence data. We show here that the poor baseUniversity of Colorado Boulder, CO 80309-0347 correlations observed at base-triple positions are the result of two factors. (1) Base covariation is not as strictly required in triples as it is in Watson–CrickU.S.A. pairs. (2) Base-triple structures are less conserved among homologous molecules. A particularity of known triple-helical regions is the presence of multiple base correlations that do not reflect direct pairing. We suggest that natural mutations in base-triples create structural changes that require compensatory mutations in adjacent base-pairs and triples to maintain the triple-helix conformation. On the basis of these observations, we devised two new measures of association that significantly enhance the base-triple signal in correlation studies. We evaluated correlations between base-pairs and single stranded bases, and correlations between adjacent base-pairs. Positions that score well in both analyses are the best triple candidates. This procedure correctly identifies triples, or interactions very close to the proposed triples, in type I and type II tRNAs and in the group I intron. Keywords: RNA structure; comparative sequence analysis; base-triples*Corresponding author Introduction Base-triples are among the essential tertiary interactions in RNA three-dimensional structure. 
The best characterized RNA base-triples are those of tRNA (Quigley & Rich, 1976; Sussman & Kim, 1976), and there is also good evidence for base or nucleotide triples in self-splicing group I introns, in which they are required for enzymatic activity (Michel et al., 1990). Base-triples involving a base-pair and a distant single-stranded nucleotide create long-range constraints on RNA folding, and constitute powerful assets for structure determination. The value of base-triple information in modeling studies has been clearly demonstrated in the case of group I introns (Michel & Westhof, 1990; Jaeger et al., 1994), and more benefits can be expected from the incorporation of base-triple information in computational RNA folding procedures (Malhotra et al., 1990; Major et al., 1993). The prediction of base-triples directly from sequence information is therefore highly desirable.

Certain base interactions, those constituting RNA secondary structure, can be predicted accurately from sequence data using comparative sequence analysis, a method based on the principle that evolution maintains a common structure through compensatory mutations (reviewed by Gutell, 1993; Woese & Pace, 1993). Compensatory mutations were initially identified visually in relatively small sequence alignments, resulting in the first reliable secondary structure models (Gutell, 1993; Woese & Pace, 1993). The simultaneous growth of sequence databases and refinement of computational methods have significantly enhanced our ability to derive base–base interactions from sequence analysis (Olsen, 1983; Gutell et al., 1985; Haselman et al., 1988; Winker et al., 1990; Chiu & Kolodziejczak, 1991; Gutell et al., 1992). Although methods have improved sufficiently to identify correctly several tertiary interactions in 16 S and 23 S rRNA (Gutell et al., 1994), predicting base-triples with confidence remains problematic.
Only a few base-triples have been suggested on the basis of comparative analysis to date: in the early study of tRNA by Levitt (1969), in rRNA (Gutell et al., 1994) and in the group I intron (Michel et al., 1990), where triples were experimentally substantiated (Michel et al., 1990; Green & Szostak, 1994).

Present address: D. Gautheret, Departement de Biologie, Universite Aix-Marseille II, Faculte de Luminy, 13 000 Marseille, and J.G.S., C.N.R.S., 31 ch. Joseph Aiguier, 13 402 Marseille Cedex 20, France.

0022–2836/95/160027–17 $08.00/0 © 1995 Academic Press Limited
  • 2. In spite of the scarcity of comparatively inferred base-triples, these interactions are certainly widespread, and therefore many remain to be discovered. We have thus begun a detailed comparative analysis of RNA triples to derive principles and algorithms that can be applied to base-triple prediction in different RNA molecules. The availability of large sequence databases and of several tRNA crystal structures now permits a more thorough characterization of triple interactions. We can now ask how base-triple structures vary in related molecules, and how base sequences at and around triples reflect these structural changes. Principles derived from the analysis of tRNA and group I intron triples can be incorporated into our correlation analyses, and significantly enhance our ability to predict base-triples from sets of aligned sequences.

Characterization of Base-triples

Sequence correlations in the vicinity of base-triples

Current comparative analysis methods detect nucleotide interactions by measuring correlations between pairs of RNA positions. This usually involves the construction of contingency tables containing the number of observations for each base-pair at position i·j. Let $n_o(M_i,N_j)$ be the number of observations of base-pair M·N ($M,N \in \{A,U,G,C\}$) at position i·j. We compute the number of bases M and N at positions i and j ($n_o(M_i)$ and $n_o(N_j)$) and the expected number of observations for each M·N base-pair: $n_e(M_i,N_j) = n_o(M_i) \times n_o(N_j) / n$, where n is the total number of sequences. The difference between expected and observed values reflects the dependence of the two positions. This difference can be computed as follows (Olsen, 1983):

$$\chi^2 = \sum_{M,N} \frac{[n_o(M_i,N_j) - n_e(M_i,N_j)]^2}{n_e(M_i,N_j)} \qquad (1)$$

Mutual information is an alternative measure of correlation that yields improved results in the detection of RNA interactions (Chiu & Kolodziejczak, 1991).
It requires base frequencies ($f_o(M_i,N_j)$, $f_o(M_i)$, $f_o(N_j)$) to be used instead of absolute numbers; it is computed as follows:

$$M(i,j) = \sum_{M,N} \left[ f_o(M_i,N_j) \times \ln \frac{f_o(M_i,N_j)}{f_o(M_i) \times f_o(N_j)} \right] \qquad (2)$$

Mutual information accurately predicts the secondary structure of tRNA, as well as the tertiary pairs 15·48 and 26·44 (Chiu & Kolodziejczak, 1991; Gutell et al., 1992). We present in Tables 1 and 2 the M(i,j) values obtained in the base-triple regions of tRNA and the group I intron. For each position, the eight highest correlations are shown (73 positions in tRNA and 134 in the group I intron were analyzed). The most significant correlations are at the top of each column, and those corresponding to possible triples are indicated by asterisks. The secondary structure and tertiary interactions of yeast tRNAPhe are shown in Figure 1a. Base-triples involve positions 45·(10·25), (12·23)·9 and (13·22)·46. The proposed group I intron triples (Michel & Westhof, 1990) involve positions (108·213)·259 and (109·212)·260 in the P4 stem and (216·257)·105 and (215·258)·106 in the P6 stem. These are shown on the intron secondary structure in Figure 2. The secondary structure correlations (10/25, 11/24, 12/23 and 13/22 in tRNA (see Table 1) and 108/213, 109/212, 215/258 and 216/257 in group I (see Table 2)) are the highest at each helical position.

The correlations that follow Watson–Crick pairings in Tables 1 and 2 are intriguing. Certain base-triple positions correlate (23/9 and 22/46 in tRNA, 212/260 and 213/259 in the group I intron), but do so more weakly than secondary pairs (compare, e.g. 23/12 and 23/9), and even more weakly than some non-interacting positions. For example, in tRNA, the value of correlation 23/9 (a base-triple) is lower than that of 23/13 (non-interacting positions). The correlation 25/45 (a base-triple) is not within the top eight correlates, ranking at number 31 in the correlations involving position 25 (not shown). Similar effects are observed in the group I introns (Table 2).

Table 1. The eight best correlations (M(i,j); Gutell et al., 1992) for tRNA positions 2, 9 to 13, 22 to 25 and 45 to 46, evaluated against all tRNA positions

tRNA positions
2          9          10         11         12         13         22         23         24         25         45         46
71a 0.90b  23 0.26*   25 0.08    24 0.78    23 0.99    22 0.33    13 0.33    12 0.99    11 0.78    10 0.08    46 0.12    13 0.31*
35 0.09    12 0.26*   45 0.06*   13 0.29    13 0.30    46 0.31*   46 0.28*   13 0.28    13 0.28    24 0.06    13 0.11*   22 0.28*
31 0.06    13 0.12    64 0.04    36 0.18    9 0.26*    12 0.30    23 0.17    9 0.26*    36 0.16    11 0.06    22 0.08*   12 0.17
12 0.06    46 0.09    32 0.03    12 0.14    46 0.17    11 0.29    12 0.17    22 0.17    12 0.15    39 0.05    12 0.07    23 0.17
29 0.06    24 0.07    50 0.03    23 0.14    22 0.17    23 0.28    11 0.13    46 0.17    23 0.14    26 0.04    9 0.07     45 0.12
24 0.06    11 0.07    49 0.03    22 0.13    24 0.15    24 0.28    24 0.13    24 0.14    22 0.13    49 0.04    23 0.06    24 0.11
70 0.05    45 0.07    68 0.02    26 0.11    11 0.14    36 0.13    36 0.08    11 0.14    46 0.11    13 0.04    10 0.06*   35 0.10
41 0.05    22 0.06    5 0.02     46 0.10    26 0.09    9 0.12     45 0.08    1 0.08     26 0.11    65 0.04    36 0.06    11 0.10

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D structure. Correlations corresponding to a secondary structure base-pair are underlined, while correlations corresponding to base-triples in any of the type I tRNA crystal structures are indicated by an asterisk.
a tRNA position number, based on yeast Phe reference numbering. Base-triples for yeast Phe are: (10·25)·45, (12·23)·9 and (13·22)·46. Alternative base-triples found in other tRNA crystal structures are noted in Figure 1.
b M(i,j) correlation value.

  • 3. Table 2. The eight best correlations (M(i,j)) for group I intron positions 105 to 109, 212 to 213, 215 to 216 and 257 to 260, evaluated against the positions of the group I intron core (defined in Materials and Methods)

Group I intron positions
105        106        108        109        212        213        215        216        257        258        259        260
103a 0.39b 216 0.33   213 0.85   212 0.78   109 0.78   108 0.85   258 0.38   257 0.74   216 0.74   215 0.38   213 0.56*  212 0.47*
101 0.27   257 0.32   259 0.56*  108 0.51   108 0.49   259 0.56*  221 0.22   106 0.33   106 0.32   269 0.32   108 0.56*  109 0.43*
269 0.26   103 0.26   109 0.51   213 0.51   213 0.49   109 0.51   112 0.20   258 0.31   255 0.30   217 0.31   109 0.47   259 0.25
257 0.26*  101 0.22   212 0.49   259 0.47   260 0.47*  212 0.49   222 0.18   269 0.29   258 0.30   216 0.31   212 0.45   108 0.21
216 0.24*  105 0.21   278 0.23   260 0.43*  259 0.45   278 0.21   220 0.17   255 0.29   269 0.28   257 0.30   260 0.25   213 0.21
271 0.22   255 0.21   260 0.21   268 0.37   268 0.36   260 0.20   252 0.17   103 0.25   103 0.27   103 0.25   268 0.24   268 0.17
104 0.22   258 0.20*  96 0.21    307 0.28   307 0.28   268 0.20   208 0.17   105 0.24*  105 0.26*  255 0.25   284 0.20   258 0.14
255 0.21   217 0.18   268 0.20   256 0.21   256 0.21   96 0.19    218 0.16   101 0.21   101 0.22   222 0.24   278 0.18   253 0.13

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D model of Michel & Westhof (1990). Correlations corresponding to a secondary structure base-pair are underlined, while correlations corresponding to proposed base-triples in the group I intron 3D model (Michel & Westhof, 1990) are denoted with an asterisk.
a Group I intron position number based on T. thermophila reference numbering. Previously proposed base-triples are: (108·213)·259, (109·212)·260, (215·258)·106 and (216·257)·105.
b M(i,j) correlation value.
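The two pairwise measures defined in equations (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' program: the function names and the toy columns are hypothetical, each alignment column is assumed to be a sequence of single-letter bases with no gaps, and the expected count is taken as $n_o(M_i) \times n_o(N_j) / n$ (the usual contingency-table normalization by the number of sequences n, an assumption made here).

```python
from collections import Counter
from math import log


def chi_square(col_i, col_j):
    """Equation (1): sum over M,N of (observed - expected)^2 / expected."""
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))      # n_o(M_i, N_j)
    counts_i, counts_j = Counter(col_i), Counter(col_j)
    total = 0.0
    for m in counts_i:
        for b in counts_j:
            # Assumption: expected count normalized by the number of sequences.
            expected = counts_i[m] * counts_j[b] / n
            observed = pair_counts.get((m, b), 0)
            total += (observed - expected) ** 2 / expected
    return total


def mutual_information(col_i, col_j):
    """Equation (2): sum over M,N of f(M,N) * ln(f(M,N) / (f(M) * f(N)))."""
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    counts_i, counts_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (m, b), count in pair_counts.items():     # unobserved pairs contribute 0
        f_pair = count / n
        mi += f_pair * log(f_pair / ((counts_i[m] / n) * (counts_j[b] / n)))
    return mi


# Toy example: two columns that covary perfectly as Watson-Crick pairs.
col_a = list("AUGC" * 5)
col_b = list("UACG" * 5)
print(f"{chi_square(col_a, col_b):.3f}")          # 60.000
print(f"{mutual_information(col_a, col_b):.3f}")  # 1.386 (= ln 4)
```

With four equally frequent, strictly covarying pairs the mutual information reaches its maximum of ln 4, whereas two independent columns give 0 for both measures.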
A second important observation concerns the network of correlations linking most nucleotides in the vicinity of the tRNA base-triples. Significant correlations between unpaired positions were recorded earlier with smaller tRNA datasets (Olsen, 1983; Haselman et al., 1988), and in a more recent study (Gutell et al., 1992). These “cross-correlations”, indicated by boldface numbers in Tables 1 and 2, involve consecutive or non-interacting positions, such as 11/12, 22/23, 9/46 and 9/12 in tRNA or 109/108, 109/259, 212/213 and 106/105 in group I introns, spanning the entire triple-helical regions in both RNAs. These correlations have values of the same order of magnitude as the main secondary structure correlations. This contrasts significantly with what is usually observed in helical positions. Typical Watson–Crick positions (see, e.g. tRNA position 2 in Table 1, or Figure 3 of Gutell et al., 1992) display a difference of one order of magnitude between the first and second highest correlations, and rarely show correlations with neighboring positions.

This analysis thus raises two questions regarding base-triples. (1) Why do positions involved in base-triples have correlation values that are lower than those of secondary structure positions, and (2) why would a triple-helical region display networked correlations?

Why are sequence correlations weaker in base-triples?

Base-triples do not demonstrate covariation as do secondary base-pairs

Comparative analysis searches for a common structure by identifying compensatory changes, or covariation. This principle lends itself very well to the detection of Watson–Crick pairs: in order to preserve the Watson–Crick conformation, mutations must occur in a compensatory fashion, which results in four prominent sequence patterns (A·U, U·A, G·C or C·G). Each base type in a position is associated with a distinct base type in a second position, and vice versa.
Even when a significant incidence of G·U or other non-canonical pairs is observed, the existence of a secondary base-pair usually remains unambiguous (Gutell et al., 1994).

Considering triple sequences in tRNA (Figure 3) and group I introns (Figure 4), we find there is no strict covariation between the secondary structure base-pair and the third position. In the tRNA triple (12·23)·9 (Figure 3b), there is covariation between the sequences (U·A)·A and (G·C)·G, but this covariation is obscured by the presence of several non-compensatory changes. For example, an A at position 9 is associated with several different Watson–Crick pairs at position 12·23. Similarly, a G·C pair at position 10·25 (Figure 3a) is associated with all four bases at position 45. The other triples in tRNA and the group I intron also display significant levels of uncorrelated changes (Figures 3c and 4). These observations lead us to ask why base-triples lack the stricter patterns of covariation observed in secondary structure base-pairs. This question can be answered, at least in part, by an observation of base-triple structures.

A perfect triple isomorphism is possible in the absence of base covariation

The interaction of a Watson–Crick pair with a third base occurs through different types of non-canonical interactions, such as the Hoogsteen pairing. In contrast to Watson–Crick pairs, these tertiary interactions can retain an identical conformation after a unilateral mutation. For example, the Hoogsteen-like A9·A23 base-pair present in the
  • 4. (12·23)·9 triple of yeast tRNAPhe can be converted to a G9·A23 base-pair, which occurs in some tRNAs (Figure 3b), while retaining the same conformation (Figure 5a) (Klug et al., 1974). Among the multiple non-canonical pairs that can be constructed with one or two hydrogen bonds, there are several ways of forming a unique conformation while modifying either base in the pair. Numerous base-triple conformations can thus be maintained through non-compensatory mutations.

Base-triples vary in structure and position

The available tRNA crystal structures reveal more structural heterogeneity in base-triples than in secondary structure base-pairs. Figure 1 shows the base-triples forming in four tRNA crystal structures. The yeast tRNAPhe base-triples (described above) are shown in Figure 1a. Base-triples in E. coli tRNAfMet (Woo et al., 1980) and yeast tRNAiMet (Basavappa & Sigler, 1991) do not differ significantly from those of yeast tRNAPhe (data not shown). However, in yeast tRNAAsp (Dumas et al., 1985) (Figure 1b), a base–sugar interaction that formed between positions 14 and 21 in tRNAPhe is converted into a base–base interaction, creating a (8·14)·21 base-triple. On the basis of the electron density map, there is no evidence for the triples 45·(10·25) and (13·22)·46 in the E. coli tRNAGln complexed with its cognate synthetase (Rould et al., 1989; V. Rath & T. A. Steitz, personal communication) (Figure 1c). Within this same complex, the distance between the pair 12·23 and position 9 suggests that this triple also does not form. Instead, a base-triple forms at positions 45·(13·22), resulting in a local conformation different from that of tRNAPhe (Figure 5b) (V. Rath & T. A. Steitz, personal communication). Alignment errors are an unlikely cause of this important difference, since both E. coli tRNAGln and yeast tRNAPhe have a variable loop of five nucleotides, and the bases surrounding the variable triple are positioned similarly in both three-dimensional structures. Different triples also form in E. coli tRNASer (GGA) complexed with seryl-tRNA synthetase (Biou et al., 1994) (Figure 1d). In this type II tRNA, all the tRNAPhe triples are absent, while other base-triples form at positions (8·14)·21, 20·(15·48) and 9·(13·22) (two insertions in the D-loop of this tRNA are given the numbers 20a and 20b by Biou et al. (1994), while they are numbered 19a and 20a in our alignment; the triple noted 20a·(15·48) by these authors is thus 20·(15·48) here).

Figure 1. Tertiary base/base interactions in 4 tRNA crystal structures, mapped onto the yeast tRNAPhe secondary structure. Continuous lines, base-triples; broken lines, other tertiary base/base interactions. Sugar–phosphate backbone interactions are not shown. a, Yeast tRNAPhe; b, yeast tRNAAsp; c, E. coli tRNAGln; d, E. coli tRNASer. Insertions (+) and deletions (Δ) relative to a, yeast tRNAPhe: b, yeast tRNAAsp, Δ48; c, E. coli tRNAGln, Δ17; d, E. coli tRNASer, Δ17, +19a, +20a, +47a, 47b, to 47q. AA, amino acceptor stem; TΨC, TΨC stem and loop; D, D-stem and loop; AC, anticodon stem and loop; V, variable loop.

  • 5. Figure 2. Core secondary structure and triple interactions in the T. thermophila group I intron. Triples are indicated by bold lines. Filled circles denote triples and other positions discussed in the text. The 2 putative triples, (220·253)·255 and (110·211)·305, are shown by a thicker broken line. The representation is formatted as proposed in Cech et al. (1994).

This comparison of tRNA structures is most illuminating. Of the six available tRNA crystal structures, four are different with respect to their base-triples. Some of the observed variations involve only small conformational changes (e.g.
the formation of the (8·14)·21 triple), and some might result from the formation of the tRNA synthetase complex (in tRNAGln in particular), but the fact remains that base-triples can form differently even among tRNAs of the same morphological family (e.g. type I tRNAs). Therefore, even if triple sequences demonstrated covariation, this would not always involve the same pairs of positions, and would therefore be poorly detected. This is another important explanation for the relatively low correlations observed at base-triples.

Analysis of the network of correlations around base-triples

We have shown that tRNA and group I introns present networked sequence correlations in the vicinity of base-triples. If these correlations, which we will also refer to as “neighbor effects”, are specific to base-triples, they will constitute a useful instrument for triple identification. The previous observation of triples involving different positions in different molecules could explain some cross-correlations, for example 45/10 and 45/13, which could result from alternative interactions in tRNAPhe and tRNAGln. However, this does not explain correlations between different pairs of the same helix (e.g. 11/12, 12/13, 12/22 and 22/23 in tRNA), which constitute the majority of cross-correlations. From a closer observation of structure variations in base-triples, we explain below how these cross-correlations might result from “compensatory” mutations involving not only paired bases but also adjacent bases in a triple region.

All the sequence combinations observed at a given base-triple position cannot, in general, adopt an identical triple conformation. For example, when the third triple position changes from a pyrimidine to a purine, it is not always possible to build a triple that would accommodate the bulkier residue without significantly displacing the sugar backbone of the third nucleotide. In the (108·213)·259 triple in the group I intron, most species have a pyrimidine at position 259 (Figure 4a), and these can all be folded into a conformation very similar to that shown for the (C·G)·C triple in Figure 5c (Michel et al., 1990). However, a change from (C·G)·C to (C·G)·G, which occurs naturally, requires a relatively large displacement of the sugar backbone of G259, as shown in Figure 5c (Michel et al., 1990). We expect to observe such variations in most RNA triples, since both purines and pyrimidines generally occur as the third residue (see Figures 3 and 4). Even when sequence variations maintain a purine or a pyrimidine as the third base, hydrogen bonding constraints can prevent the conservation of an identical structure. We thus expect to observe widespread conformational variations of the type shown in Figure 5c.

The formation of a triple helix such as that in the tRNA D-stem is a highly cooperative process involving a complex network of ion binding, stacking and van der Waals' interactions (Bina-Stein & Stein, 1976; Holbrook et al., 1977). Therefore, a structural change such as the one shown in Figure 5c might adversely affect neighboring base-triples, and thus “compensatory” mutations may be required to preserve the conformational or energetic properties of the triple helix. In this case, a mutation in the third base of an adjacent triple can be as appropriate as further mutations in the same triple, since a single change in the flanking position can directly compensate for the backbone displacement. We therefore propose that “compensatory” mutations in a triple helix involve nucleotides in different stacking planes as well as within the planes. Such mutations could propagate through the triple helix and create the multiple correlations observed. If this hypothesis is confirmed, the presence of cross-correlations would be indicative of triple helix formation.

An alternative explanation for the presence of networked correlations could be the involvement of the correlated positions in a common RNA identity element. In other words, nucleotides of a base-triple region could be selected as a whole in order to maintain the specificity of the RNA with respect to a certain biological process, such as interaction with a specific protein, thus creating correlations between non-interacting bases. Although a few identity elements have been localized into the base-triple region of tRNA (Pütz et al., 1991; Smith & Yarus, 1989; McClain, 1993a), we do not believe they are an important source of networked correlations, for the following reasons. First, cross-correlations are much higher in the D-stem than in any other part of the molecule (Gutell et al., 1992), although important identity sites are present elsewhere (Hou & Schimmel, 1989; McClain et al., 1991). Second, cross-correlations in the group I intron are also higher in the triple region (stems P4 and P6) than in any other part of the molecule (see analysis below). Finally, a recent experimental study (Hou, 1994) demonstrated that mutations in the tRNA triples (8·14)·21 and (13·22)·46 had major effects on the structure of the 15·48 pair. This shows that large physical constraints exist in this triple region that do not result from tRNA identity. Although we cannot exclude the possibility that identity elements contribute to cross-correlations in base-triple regions, there are better reasons for correlations to be caused by base-triples or other complex folding patterns.

We will now concentrate on two types of cross-correlations. First, the interdependence of all three bases in a triple produces correlations between each position of the secondary structure base-pair and the third base of the triple (see Tables 1 and 2). Therefore, directly measuring the correlation between secondary structure base-pairs and single-stranded bases (base to base-pair correlation) is expected to produce a stronger signal than the usual pairwise correlations. A second type of interesting correlation occurs between adjacent base-pairs (base-pair to base-pair correlation). Assuming this type of correlation is characteristic of base-triples, its identification should also help predict triples. In the following sections, we propose methods to quantify these two types of correlations.

  • 6. Figure 3. Sequences observed at base-triple positions in type I tRNA. Positions shown are those of yeast tRNAPhe triples. a, 45·(10·25) triple; b, (12·23)·9 triple; c, (13·22)·46 triple. Only values greater than 5 are shown. Numbers in bold type represent more than 10% of the tRNAs.

  • 7. Figure 4. Sequences (seqs) observed at base-triple positions in group I introns. Positions shown are those of the suggested T. thermophila triples. a, (108·213)·259; b, (109·212)·260; c, (215·258)·106; d, (216·257)·105. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the introns.

Figure 5. Alternative structures of homologous base-triples in different RNAs. a, Base-triple (12·23)·9 in yeast tRNAPhe, and a possible conformation for the same triple after an A9 to G9 mutation. b, Triples forming with base-pair 13·22 in yeast tRNAPhe (Quigley & Rich, 1976) and E. coli tRNAGln (V. Rath & T. A. Steitz, personal communication). c, Proposed structure of group I intron triple (108·213)·259 forming with C259 or G259 (Michel et al., 1990).
  • 8. Table 3. Base to base-pair correlation (χ²) and neighbor effects (N) in type I tRNAs (895 sequences)

Best                χ²
correlates  Pair    best  nt   χ² a   N b   Suggested cause of correlation c,d
+           13·22   ↔     46   100    100   Triple
+           12·23   ↔     9    76     64    Triple
            31·39   ↔     36   69     20    Id (Yarus, 1982)
            3·70    ↔     35   61     16    Id tRNAAla (Hou & Schimmel, 1989)
            51·63   →     36   41     9
+           11·24   →     36   41     57    Id tRNATrp (Hisch, 1971)
            13·22   ←     45   40     100   Triple (E. coli tRNAGln)
            1·72    →     35   35     16    Id tRNAGln (Rould et al., 1989)
+           10·25   →     45   27     27    Triple (yeast tRNAPhe, tRNAAsp)
            1·72    ←     73   26     16    Id tRNAGln (Rould et al., 1989)
            15·48   →     35   24     NA    Id tRNACys (Hou et al., 1993)
            27·43   →     36   22     14    Id tRNATrp (Schultz & Yarus, 1994)
            30·40   →     36   20     13    Id (Yarus, 1982)

Correlations are ranked by the χ² value. Only those correlates within 20% of the highest χ² value are listed. The best correlates are identified with a plus (+) in the first column (see Table 7).
a % of highest value.
b Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
c Id, identity element possibly responsible for the correlation. When several tRNAs have identity elements matching the correlated positions, only one is cited as an example.
d References are given for identity elements only.

Inferring Triple Interactions

Base to base-pair correlations: identification of potential triples

Our goal here was to consider base-pairs as single variables, and directly compute the correlations between base-pairs and single-stranded positions. These base to base-pair correlations can be evaluated using a χ² test, replacing the usual 4 × 4 contingency table with a 16 × 4 contingency table that compares the four possible sequences for the single position with the 16 possible sequences for the base-pair. Such a table is similar to those in Figures 3 and 4. A χ² test can be performed using equation (1).
However, large contingency tables increase the probability of having empty or almost empty cells that can strongly bias χ² values. To remedy this problem, we subdivided the 16 × 4 table into several sub-tables. This method is an alternative to that proposed by Olsen (1983) to address the same problem. For each row M and column N in the original 16 × 4 table (T), we create a 2 × 2 table of the form:

$$\begin{array}{c|c}
T(M,N) & \sum_{i=1}^{4} T(i,N) - T(M,N) \\ \hline
\sum_{j=1}^{16} T(M,j) - T(M,N) & \sum_{i=1}^{4}\sum_{j=1}^{16} T(i,j) - \sum_{j=1}^{16} T(M,j) - \sum_{i=1}^{4} T(i,N) + T(M,N)
\end{array}$$

Values from table T are compressed in the new 2 × 2 table, so that one of the cells contains T(M,N), two other cells contain the sums of the remaining values of row M and column N, and the last cell contains the sum of all remaining values in table T. Such tables are generated for each value of M and N. Values of χ² are then computed for all sub-tables, except those having expected values smaller than 5 in any of their cells. The highest χ² value generated is kept as the final correlation value.

To simulate an application of the method to base-triple prediction, we computed only correlations between known secondary structure base-pairs and unpaired positions. In tRNA, tertiary interactions predicted by pairwise comparative analysis (15·48 and 26·44; Gutell et al., 1992) were also included as base-pairs, so that triples involving these pairs could be detected. An application of this procedure to the type I tRNA alignment yields the results shown in Table 3.

The following criteria were used to establish the significance of correlations. Since χ² values do not have an upper limit and vary with the number of sequences considered, we did not use absolute χ² values, but the percentage of the highest value encountered in the whole analysis. The highest correlation observed in a given molecule thus takes a value of 100. A cutoff point for the significance of correlations was then chosen empirically, based on known base-triples.
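The sub-table procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, `pairs` and `bases` are assumed to be parallel lists giving, for each aligned sequence, the base-pair (for example "GC") and the single-stranded base, and the expected values of each 2 × 2 sub-table are computed from its own margins.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Chi-square of the 2x2 table [[a, b], [c, d]].

    Returns None when any expected value is smaller than 5, so that the
    caller can skip the sub-table as described in the text.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for r in range(2):
        for k in range(2):
            expected = row_totals[r] * col_totals[k] / n
            if expected < 5:
                return None
            chi2 += (observed[r][k] - expected) ** 2 / expected
    return chi2


def base_to_pair_chi_square(pairs, bases):
    """Highest chi-square over all 2x2 sub-tables of the pair-by-base table."""
    table = Counter(zip(pairs, bases))
    pair_totals, base_totals = Counter(pairs), Counter(bases)
    n = len(pairs)
    best = 0.0
    for p in pair_totals:
        for s in base_totals:
            a = table.get((p, s), 0)      # T(M,N)
            b = pair_totals[p] - a        # rest of the line for this base-pair
            c = base_totals[s] - a        # rest of the line for this base
            d = n - a - b - c             # all remaining values of T
            chi2 = chi_square_2x2(a, b, c, d)
            if chi2 is not None and chi2 > best:
                best = chi2
    return best


def as_percent_of_max(scores):
    """Rescale scores so that the highest one takes the value 100."""
    top = max(scores.values())
    if top == 0:
        return {key: 0.0 for key in scores}
    return {key: 100.0 * value / top for key, value in scores.items()}


# Toy example: the third base follows the base-pair exactly.
pairs = ["GC"] * 20 + ["UA"] * 20
bases = ["G"] * 20 + ["A"] * 20
print(base_to_pair_chi_square(pairs, bases))  # 40.0
```

Keeping only the best sub-table means that a single strongly associated pair/base combination is enough to produce a high score, while sparse combinations are ignored through the expected-value filter.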
All known tRNA and group I intron triples (see below) have a χ² value greater than 25% of the highest value. We thus tentatively considered correlations in this range as significant (a cutoff of 20% is used in Table 3 in order to show additional correlations, which will be discussed below).

A second selection criterion was introduced to treat base-pairs that had significant correlations (>25%) with several single-stranded positions. To solve this problem, we performed the correlation
  • 9. analysis in two directions: we sought the single-stranded positions that best correlated with each base-pair, and we sought the base-pairs that best correlated with each single-stranded position. When a base-pair and a single-stranded position are mutually best correlates (hereafter termed “reciprocal correlates”), they are indicated with a double arrow in Table 3. When either the base-pair is the best correlate of the single-stranded position, or the single-stranded position is the best correlate of the base-pair, the correlation is indicated with a single arrow towards the best correlate. When neither the base-pair nor the single-stranded position is the best correlate, the correlation is not shown. This method considerably reduces the number of correlations shown for each position. These two criteria are used in all subsequent analyses.

In type I tRNAs, the two triples (13·22)·46 and (12·23)·9 can now be predicted with confidence. They both display high and reciprocal correlations. The best correlate of position 10·25 is 45, as expected from triples forming in yeast tRNAPhe. However, this relationship is not reciprocal: the best correlate of 45 is 13·22, which rather reflects the E. coli tRNAGln situation, where the triple involves positions 45·(13·22). Since there is a precedent for base 45 to form triples with at least two different base-pairs, correlation results like this are expected.

Other high and reciprocal correlations are (31·39)/36 and (3·70)/35. Interestingly, these seemingly false positives are not artifacts. The pair 3·70 is an important identity element for the aminoacylation of alanine tRNAs of several organisms (McClain & Foss, 1988; Hou & Schimmel, 1989) and of various other tRNAs (reviewed by McClain, 1993b), so it is not surprising that it varies in concert with position 35, at the center of the anticodon, and therefore necessarily associated with tRNA identity as well.
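The two-direction selection can be sketched as follows. This is a hedged illustration only: `classify_correlates` and its arguments are hypothetical names, scores are assumed to be χ² values already expressed as a percentage of the highest value, and the ASCII strings "<->", "->" and "<-" stand in for the double and single arrows, with a single arrow pointing towards the best correlate.

```python
def classify_correlates(scores, threshold=25.0):
    """Keep significant correlations and mark their direction.

    scores maps (pair, nt) to a chi-square expressed in % of the highest value.
    Returns (pair, nt, score, arrow) tuples sorted by decreasing score.
    """
    best_nt_for_pair = {}
    best_pair_for_nt = {}
    for (pair, nt), score in scores.items():
        current = best_nt_for_pair.get(pair)
        if current is None or score > scores[(pair, current)]:
            best_nt_for_pair[pair] = nt
        current = best_pair_for_nt.get(nt)
        if current is None or score > scores[(current, nt)]:
            best_pair_for_nt[nt] = pair

    selected = []
    for (pair, nt), score in sorted(scores.items(), key=lambda item: -item[1]):
        if score <= threshold:
            continue                      # below the significance cutoff
        nt_is_best_of_pair = best_nt_for_pair[pair] == nt
        pair_is_best_of_nt = best_pair_for_nt[nt] == pair
        if nt_is_best_of_pair and pair_is_best_of_nt:
            arrow = "<->"                 # reciprocal correlates
        elif nt_is_best_of_pair:
            arrow = "->"                  # arrow points to the single base
        elif pair_is_best_of_nt:
            arrow = "<-"                  # arrow points to the base-pair
        else:
            continue                      # neither is the best correlate
        selected.append((pair, nt, score, arrow))
    return selected


# Four type I tRNA values taken from Table 3 (% of highest chi-square).
scores = {
    ("13.22", "46"): 100.0,
    ("12.23", "9"): 76.0,
    ("13.22", "45"): 40.0,
    ("10.25", "45"): 27.0,
}
for row in classify_correlates(scores):
    print(row)
```

On these four values the sketch marks (13·22)/46 and (12·23)/9 as reciprocal, while 45 is the best correlate of 10·25 but has 13·22 as its own best correlate, as discussed in the text.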
The (31·39)/36 correlation was identified by Yarus (1982) in the “extended anticodon” hypothesis, which states that several positions in the anticodon stem and loop are selected as a block to confer on the tRNA an optimal coding accuracy. It is also noteworthy that 36 is the best correlate of pair 27·43, which reflects a recent experimental association between these two sites in the control of translation by tRNATrp (Schultz & Yarus, 1994). Other “false positives” can be related to known tRNA identity elements (see references in Table 3), suggesting that this method is identifying biologically meaningful associations.

The recent determination of a tRNASer crystal structure (Biou et al., 1994) provides us with valuable base-triple information about type II tRNAs. We applied the same correlation analysis to this class of tRNAs to determine whether the two triples forming in tRNASer at positions 20·(15·48) and 9·(13·22) could be detected (there is not enough variation in base-pair 8·14, also involved in a triple, to seek correlations involving this pair). Table 4 presents the highest base to base-pair correlations observed in an alignment of 262 type II tRNAs (comprising serine, leucine and certain tyrosine tRNAs). In excellent agreement with the crystallographic data, the only “reciprocal correlates” in Table 4 are observed at positions involved in base-triples in E. coli tRNASer. The triple 20·(15·48) has the highest overall χ² value, and the triple 9·(13·22) has a χ² value above the threshold of significance defined previously, albeit relatively low

Table 4 Base to base-pair correlation (x2 ) and neighbor effects (N) in type II tRNAs (262 sequences) Best Suggested cause of correlates Pair x2 best nt x2 a Nb correlationc + 15·48 t 20 100 NA Triple 15·48 9 21 94 NA Neighbor effect 15·48 9 59 91 NA + 12·23 : 21 90 64 12·23 9 15 86 64 12·23 9 48 82 64 12·23 9 20 79 64 15·48 9 35 57 NA Id? (as in tRNACys ) 3·70 : 35 37 16 Id? 
(as in tRNAGln ) 12·23 9 73 36 64 2·71 9 20 33 11 6·67 t 15 30 13 2·71 9 59 28 11 27·43 t 35 28 14 Id? (as in tRNATrp ) 27·43 9 36 28 14 Id? (as in tRNATrp ) + 13·22 t 9 27 100 Triple 6·67 9 37 26 13 Correlations are ranked by the x2 value. Only those correlates within 25% of the highest x2 value are listed. The best correlates are identified with a plus (+) in the first column (see Table 7). a % of highest value. b Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure. c See footnotes to Table 3.
  • 10. JMB—MS 440 Identification of RNA Triples36 Figure 6. Sequences observed at positions forming base-triples in E. coli tRNASer . a, Triple (13·22)·9. b, Triple (15·48)·20. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the tRNAs. results. These mutations are particularly interesting, since they are probably not the result of a fortuitous ancestral event, but instead they more likely reflect ‘‘neutral’’ changes between functionally equivalent sequences. To help identify more base-pair and base-triple interactions with comparative analysis, we need to determine the number of times these concerted mutations have occurred throughout the evolution of the RNA under study. The larger the number of such phylogenetic events, (e.g. concerted mutations over evolutionary space), the more significant that correlation is, and thus the more confident we are that the positions of interest are physically interacting. This general concept was introduced a number of years ago, and was utilized to reinforce our case for some of the first proposed base–base tertiary interactions in 16 S rRNA (Gutell et al., 1985). This type of observation, essential in correlation analyses, requires knowledge of the phylogenetic relationships among the sequences under study. For tRNA, these relationships are unclear. On one hand, all tRNAs interact with the ribosome and its factors; thus, they are all under this common constraint; changes in their sequence in the evolutionary dimension will be neutral. On the other hand, tRNA sequences within each acceptor family are constrained by a specific synthetase recognition function. Thus, tRNAs have at least two mutational dimensions, which obscure their phylogenetic his- tory (for a more detailed assessment of this issue, please see: Ninio, 1982; Cedergren et al., 1981). In contrast, a molecule such as the 16 S rRNA has the same function in all organisms. 
Its phylogeny, and the phylogeny of the cells in which these 16 S rRNAs exist, is well defined (Woese, 1987). Thus, compara- tive studies can determine with more confidence the number and nature of the concerted mutations that have occurred throughout a phylogenetic tree, allowing us to pinpoint mutations that occurred between closely related RNAs. These changes are the most likely to be ‘‘neutral’’. Group I introns are mobile elements with a fast evolutionary clock, and therefore their phylogeny cannot be defined as well as that of rRNAs. However, there is no known variety of functions in group I introns that would impede the construction of a tree as it does for tRNAs. Since a consistent (although imperfect) classification of group I introns is available (Michel & Westhof, 1990), we can search for significant phylogenetic events more rigorously than we did for tRNAs. In this section, we implement a simple method to count mutations, and use its results to strengthen base-triple prediction in the group I introns. Our phylogenetic event counting was performed as follows. Group I intron sequences in the alignment were classified into phylogenetic groups as de- scribed in Materials and Methods. For each potential triple position (i·j)·k (i and j being base-paired and k single-stranded in the secondary structure), changes are counted as aligned sequences, are examined from the first sequence to the last; c equals the number of times a change is observed at i,j or k (27% of the highest value). No significant corre- lations are detected for canonical type I tRNA triples, in agreement with crystal and solution studies, which suggest that these triples are absent in type II tRNAs (Biou et al., 1994; Dock-Bregeon et al., 1989; Dietrich et al., 1990; Baron et al., 1993). The sequences observed at position (15·48)·20 and 9·(13·22) in the type II tRNA dataset are shown in Figure 6. (The base-triples in yeast tRNASer are (G15·C48)·U20 and G9·(G13·A22).) 
The correlation (15·48)/20 is due primarily to an association of A20 with A15·U48, and an U or C at position 20 with G15·C48 (Figure 6a). Analysis of the alignment reveals these three principal sequences are present in all type II isoacceptor groups (data not shown), suggesting that concerted changes have occurred several separate times through evolution. The significance of a correlation is considerably increased when multiple concerted changes are observed independently, as they are here (Gutell et al., 1985). The correlation (13·22)/9 is primarily due to an association between A9 and A13·A22 (Figure 6b). Although the correlation is relatively weak, con- certed changes yielding sequence A9·(A13·A22) occur in all type II isoacceptor groups, and even among isoacceptor tRNAs from the same organism (data not shown). This again indicates that the correlation is very significant. Phylogenetic Event Counting In the previous section, we mentioned a few concerted mutations occurring in closely related tRNAs to help support some of our correlation
  • 11. JMB—MS 440 Identification of RNA Triples 37 and e equals the number of times a change is observed at (i or j) and k (i.e. a concerted change between the base-pair and the third position). The ratio e/c is the proportion of mutual changes over the total number of changes, and is our measure of phylogenetic events. As noted earlier, the detail we can decipher with correlation analysis is enhanced by incorporating phylogenetic event information into our algorithm. For this paper we have not sought a complete solution, since that would entail a better appreciation of the phylogenetic relationships of the RNAs under study, and better knowledge of how to value mutual changes that occur between distantly and closely related organisms. For the purposes of this article we have developed a simple method which assumes that the sequences are roughly ordered by their phylogenetic relationships, and treats all mutual changes as equivalent. Therefore, a large number of mutual changes within closely related RNA sequences will increase the e/c value more than a few mutual changes between distantly related RNA species. For our immediate needs, this approximation works well, as we will see. Results of the combined x2 and e/c analysis of group I intron sequences are presented in Table 5. Here the x2 analysis was performed first, followed by an e/c analysis for each significant x2 base-pair/base correlate. An asterisk in Table 5 denotes those triple correlations that score the highest with the e/c analysis. Among the three highest x2 reciprocal correlates are (109·212)/260 and (108·213)/259, corresponding to the two proposed base-triples in the P4 stem (Michel et al., 1990). Both of these triples are also strongly supported by our phylogenetic counting method. However, the proposed P6 stem triples (215·258)·106 and (216·257)·105 are not accurately predicted. 
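The counting rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' program: it assumes that a “change” means a difference between consecutive sequences in the ordered alignment, that i, j and k are zero-based column indices, and that gap characters are compared like any other symbol. The function name `event_ratio` and the toy alignment are hypothetical.

```python
def event_ratio(alignment, i, j, k):
    """Count phylogenetic events for a candidate triple (i.j).k.

    alignment: list of equal-length sequence strings, assumed to be
               roughly ordered by phylogenetic relationship.
    Returns (e, c, e/c): e counts steps where the pair (i or j) AND
    position k both change; c counts steps where i, j OR k changes.
    """
    c = 0
    e = 0
    # Assumption: changes are scored between each sequence and the next one.
    for prev, curr in zip(alignment, alignment[1:]):
        pair_changed = prev[i] != curr[i] or prev[j] != curr[j]
        third_changed = prev[k] != curr[k]
        if pair_changed or third_changed:
            c += 1
        if pair_changed and third_changed:
            e += 1
    ratio = e / c if c else 0.0
    return e, c, ratio


# Toy alignment (made up): columns 0 and 1 are the pair, column 2 the third base.
toy = ["GCA", "GCA", "AUC", "AUC", "GCA", "GCU"]
e, c, ratio = event_ratio(toy, 0, 1, 2)
print(e, c, round(ratio, 3))
```

In the toy alignment, two of the three steps that show any change are concerted, so e/c is 2/3. Because every step is weighted equally, many mutual changes among closely related sequences raise the ratio more than a few changes between distant ones, which is the behavior the text describes.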
Of these two previously proposed triples, position 105 correlates best with the pair 216·257 with χ² analysis, and the e/c analysis associates the pair 216·257 with position 105. The highest χ² correlations for the P6 base-pairs are (216·257)/106 (a reciprocal correlate) and (215·258)/103. The (215·258)/106 correlation is in the significant range (38% of the highest value), but it is not shown in Table 5 because better correlations involving positions 106 and 215·258 exist (see above).

Table 5
Base to base-pair correlation (χ²) and neighbor effects (N) in group I introns (222 sequences)

Best         Pair      χ² + e/c    nt     χ² (b)   N (c)   Sequences   Suggested cause of correlation (d)
correlates             best (a)
+            109·212   * ↔ *       260    100.0    67      222         Triple
+            262·312   * ↔ *       263    97.4     NA      222         Triple?
+            108·213   * ↔ *       259    95.8     75      222         Triple (e)
             216·257     ↔         106    61.2     100     221         Neighbor effect
+            110·211   * ↔ *       305    56.9     36      222         Triple? (f)
             280·298     ↔         279    52.9     39      210
             268·307     →         279    49.3     2       215
+            216·257     ← *       105    48.7     100     222         Triple
             215·258     ↔         103    46.0     100     220         Neighbor effect
             268·307     ←         256    40.3     2       182
             97·277      →         279    40.1     23      183
             285·293     →         256    38.2     30      161
             107·214     →         260    36.4     78      222         Neighbor effect
             215·258     ←         269    35.4     100     187
+            220·253   * ↔         255    33.0     15      161         Triple?
             216·257     ←         101    32.3     100     221
             215·258     ←         217    29.6     100     119         Neighbor effect
             111·209     → *       305    29.4     43      222         Neighbor effect?
             109·212     ←         304    29.2     67      222         Neighbor effect?
             102·272     →         263    27.8     NA      221
             286·292     →         300    26.9     18      150
             102·272     ←         270    25.2     NA      189

Correlations are ranked by the χ² value. Only those correlates within 25% of the highest χ² value are listed. The best correlates are identified with a plus in the first column (see Table 7).
(a) χ² best correlate noted with arrows; best e/c ratio noted with *.
(b) % of highest value.
(c) Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
(d) References in text.
(e) The best base e/c correlate for the (108·213) base-pair is shared by positions 259 and 302.
(f) The best base-pair e/c correlate for position 305 is shared by (110·211) and (97·277). The alignment in the vicinity of the (97·277) base-pair is questionable due to length variation of the P3 helix. Thus we believe the best correlation is between (110·211) and 305.

There are several possible explanations for these apparent inaccuracies. First, the P3/P4 junction (positions 103 to 106) generally varies in size from three to five bases, and there are a few examples of an insertion of several hundred bases. Thus, the sequences in this region cannot be aligned with absolute confidence. Until more sequence information suggests otherwise, we have justified the two unpaired 3' P3/P4 nucleotides toward the P4 stem, while the other P3/P4 nucleotides are justified toward the P3 helix. Alternative decisions could have produced significantly different correlations for nucleotides 103 to 106. Other potential problems in the identification of P6 triples were raised by a recent NMR study (Chastain & Tinoco, 1993), which suggested that P6 triples involved base/sugar interactions, and varied significantly in structure upon sequence change. In contrast to base/base interactions, base/sugar interactions could produce sequence constraints in which correlations between adjacent bases become predominant, thus possibly explaining these unexpected correlations. Finally, it is also possible that very high neighbor effects could relegate the actual triple correlations to second position.

Figure 7. Sequences observed at various correlating group I intron positions. a, (262·312)/263. b, (110·211)/305. c, (280·298)/279. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the introns.

Other reciprocal correlates having high χ² values are present. One involves positions 262·312 and 263. This triple correlation is also strongly identified with our phylogenetic event-based method (see Table 5). Figure 7a shows the sequences observed at (262·312)·263. C263 is associated with A262·U312, while A263 is associated with G262·C312 or C262·G312. Our examination of the alignment reveals several concerted mutations occurring among closely related introns, particularly within subgroups IB1 and IC1 (data not shown). The base-pair 262·312 is itself well supported, with multiple independent covariations observed (data not shown). This strong correlation can be interpreted in various ways. On the basis of a three-dimensional modeling study of the intron guanosine binding site (Yarus et al., 1991), it was proposed that these three nucleotides form a base-triple.
Alternatively, it has been suggested, from experimental mutagenesis studies, that this sequence constraint is necessary to ensure that the nucleotide at position 263 is bulged out of this helix, and is not base-paired to position 312 (Couture et al., 1990). Note that when position 263 is a C, the 262·312 base-pair is an A·U or U·A. When 263 is an A, 262·312 is a G·C or C·G. Thus, position 263 is not able to form a standard Watson–Crick pair with 312. This hypothesis suggests that the triplets (U·A)A and (G·C)C should also be found, which has not been the case to date. We favor the suggestion by Yarus et al. (1991) that a base-triple interaction forms between these positions.

A second correlation, (110·211)/305, is supported by the e/c study and a high χ² reciprocal correlation. This correlation is particularly interesting, since it involves nucleotides spanning two distant domains of the group I intron, namely the P3/P7 and P4/P6 coaxial stems. The correlation results primarily from an exchange between the sequence patterns (A·U)·C and (G·C)·U. This correlation, unlike many of the others, occurs in its purest form in the subgroups IA and ID (Figure 7b), although covariation between these triplets is found in the other subgroups, albeit intermixed with non-converted variations (data not shown). The correlation (110·211)/305 was identified previously using a smaller dataset (Michel & Westhof, 1990) but was disregarded on the ground of steric conflicts with the P4 triples. However, more recent experimental data (Pyle et al., 1992) have suggested interactions between the P1 stem and the J7/8 strand that shift J7/8 towards the P4 stem, and thus reduce the distance between nucleotides 110·211 and 305. Adjusting the current three-dimensional model to take these new data into account could suggest alternative ways to form a (110·211)/305 interaction, and perhaps resolve the steric conflicts. In addition, two other correlations in Table 5, (111·209)/305 and (109·212)/304, resemble the neighbor effects that could be expected in the presence of a (110·211)·305 triple.

The other reciprocal correlations in Table 5 are (280·298)/279 and (220·253)/255. The first is not supported by the e/c analysis, and thus we do not consider it a credible triple candidate. The other reciprocal correlation, (220·253)/255, is supported
by a significant number of coordinated changes, primarily in subgroups IC1 and IC2. The number of nucleotides between positions 253 and 257 is variable. Thus it is difficult to align these unpaired nucleotides across all the subgroups with much confidence. However, within the IC1 subgroup this number is three in almost all cases, while it is always five in the IC2 subgroup, allowing us to obtain a reliable local alignment for these two groups. The sequences observed in these two subgroups are shown in Figure 7c. Formation of a (220·253)·255 triple is feasible stereochemically, nucleotide 255 being situated in the internal loop flanking the 220·253 base-pair.

The combined e/c and χ² analysis has identified three additional base-triple candidates in the group I introns, namely (262·312)·263, (110·211)·305 in the ID and IA subgroups, and (220·253)·255 in the IC1 and IC2 subgroups.

Base-pair to base-pair correlations: identification of neighbor effects

The identification of base-triples requires the ability to distinguish between correlations due to physical interactions and those due to other factors, such as RNA identity or accidental evolutionary events. We have suggested that networked sequence correlations are characteristic of triple-helix formation. We now propose to use this property to help distinguish base-triples (at least when present in triple helices) from other correlated positions.

A simple method to assess neighbor effects is to directly measure correlations between base-pairs. For this purpose, we perform a χ² test as done in the previous analysis, the only difference being a contingency table having 16 rows and 16 columns (instead of 16 × 4). The sparseness problem is again resolved here by creating smaller 2 × 2 tables, computing χ² in each table, and retaining the highest value.
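As an illustration of the 2 × 2 reduction just described, the sketch below scores two aligned columns of base-pair symbols. It is not the authors' implementation: it assumes that each 2 × 2 table is formed one-versus-rest (one base-pair type at the first site against all others, and likewise at the second site) and that the uncorrected Pearson χ² is used. The function names `chi2_2x2` and `max_chi2` are hypothetical.

```python
from itertools import product


def chi2_2x2(n11, n12, n21, n22):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    n = n11 + n12 + n21 + n22
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    if 0 in (r1, r2, c1, c2):
        return 0.0  # a degenerate table carries no association signal
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)


def max_chi2(xs, ys):
    """Highest chi-square over all one-vs-rest 2 x 2 tables.

    xs, ys: equal-length lists giving the symbol observed at two sites
            (e.g. "GC", "AU", ...) in each aligned sequence.
    """
    best = 0.0
    for a, b in product(set(xs), set(ys)):
        n11 = sum(1 for x, y in zip(xs, ys) if x == a and y == b)
        n12 = sum(1 for x, y in zip(xs, ys) if x == a and y != b)
        n21 = sum(1 for x, y in zip(xs, ys) if x != a and y == b)
        n22 = len(xs) - n11 - n12 - n21
        best = max(best, chi2_2x2(n11, n12, n21, n22))
    return best


# Made-up columns: the two sites covary perfectly in the first example
# and are independent in the second.
print(max_chi2(["GC"] * 4 + ["AU"] * 4, ["CG"] * 4 + ["GC"] * 4))
print(max_chi2(["GC", "GC", "AU", "AU"], ["CG", "GC", "CG", "GC"]))
```

Collapsing to 2 × 2 tables keeps every cell reasonably populated even when most of the 256 cells of a 16 × 16 table would be empty, at the cost of reporting only the single strongest pairwise association.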
A simple measure of the neighbor effect, N, could then involve computing χ² for each set of adjacent base-pairs (i,j) and (i + 1, j − 1): N = χ²(i, j, i + 1, j − 1). However, since sequence correlations also occur between positions separated by several base-pairs in the same helical stem (Tables 1 and 2), the neighbor effect N at base-pair i,j can be more accurately measured by averaging correlations in a window comprising n base-pairs at each side of i,j, using the following formula:

$$N(i,j) = \frac{1}{2n} \sum_{k=1}^{n} \left[ \chi^2(i,\, j,\, i+k,\, j-k) + \chi^2(i,\, j,\, i-k,\, j+k) \right] \qquad (3)$$

If i ± n or j ± n is not a paired position, the corresponding correlation is not computed, and n is corrected accordingly. We use n = 2, and thus evaluate a window of five base-pairs (from i − 2 to i + 2) surrounding i,j. Figure 8 shows results obtained for tRNA and the group I intron.

Figure 8. Neighbor effects measured in equation (3). The density of the dots is proportional to N(i,j), darker dots representing the highest values and lighter dots the lowest values. Precise N(i,j) values for base-pairs of interest are given in Tables 3 to 5. a, Type I tRNA. b, Group I intron.
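Equation (3) can be written as a short function. The sketch below is a reading of the formula under stated assumptions rather than the authors' code: “n is corrected accordingly” is taken to mean that the sum is divided by the number of correlations actually computed, the secondary structure is supplied as a position-to-partner dictionary, and the base-pair to base-pair χ² is passed in as a callable. The names `neighbor_effect`, `paired` and `chi2`, and the scores in the example, are hypothetical.

```python
def neighbor_effect(i, j, paired, chi2, n=2):
    """Average chi-square between base-pair (i, j) and its stacked neighbors.

    paired: dict mapping each paired position to its partner.
    chi2:   callable taking two base-pairs (tuples) and returning their
            base-pair to base-pair chi-square value.
    n:      window half-width (the text uses n = 2).
    Returns None when (i, j) has no paired neighbor in the window
    (reported as NA in the tables).
    """
    total = 0.0
    terms = 0
    for k in range(1, n + 1):
        for a, b in ((i + k, j - k), (i - k, j + k)):
            # Skip window slots that are not base-pairs of the same helix.
            if paired.get(a) == b:
                total += chi2((i, j), (a, b))
                terms += 1
    # Assumption: the divisor 2n is replaced by the number of terms computed.
    return total / terms if terms else None


# tRNA D-stem pairing (10.25, 11.24, 12.23, 13.22) with made-up chi-square scores.
stem = {10: 25, 25: 10, 11: 24, 24: 11, 12: 23, 23: 12, 13: 22, 22: 13}
scores = {
    frozenset({(12, 23), (13, 22)}): 6.0,
    frozenset({(12, 23), (11, 24)}): 3.0,
    frozenset({(12, 23), (10, 25)}): 0.0,
}
print(neighbor_effect(12, 23, stem, lambda p, q: scores[frozenset({p, q})]))
print(neighbor_effect(15, 48, {15: 48, 48: 15}, lambda p, q: 0.0))
```

For pair 12·23 only three of the four window slots are base-pairs (14·21 is not), so the average is taken over three values. An isolated pair such as 15·48 has no neighbor and yields None, matching the NA entries in the tables.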
Neighbor effects (N)

In tRNA and group I introns, helices associated with base-triples show significantly larger neighbor effects (N, measured as in eqn (3)) than those helices with no known base-triples. To illustrate these strong base-pair to base-pair correlations, we show in Table 6 the sequences observed in group I introns at positions 109·212 and 108·213. The base-pair G108·C213 is strongly associated with a C·G at position 109·212, while C108·G213 is associated with A·U or G·C at position 109·212.

Table 6
Sequences observed at group I intron positions (109·212) and (108·213)

                 108·213 →
109·212      A·U     U·A     C·G     G·C
A·U          —       —       26      —
C·G          5       7       6       122
G·C          —       —       36      —
U·G          —       —       —       5

Only values greater than 2 are shown. Numbers in bold face represent more than 10% of the group I intron sequences.

In group I introns (Figure 8b), neighbor effects are consistent with triple formation in the P4/P6 helices, and are also significant at positions 110·211, a base-pair having a potential triple partner (Table 5). However, no significant neighbor effect supports the strong triple correlations (262·312)/263 and (220·253)/255. In spite of this result, we still support the formation of base-triples at these positions, since these triples would not be part of an extended triple-helical region, which we proposed was necessary for the base-pairs to have noticeable neighbor effects. Also, base-pairs near 262·312 in P7 are extremely conserved, and thus limit any base correlation in this region.

Combining analyses for base-triple prediction

The various analyses presented here can be combined into a single protocol for base-triple prediction. The criteria we propose to apply in this protocol remain loose at this stage of our work, but will be refined as the method is applied to other classes of RNA. These criteria are presented here. First, we believe good triple candidates should score well in both base to base-pair correlations (χ² and e/c) and neighbor effect analysis. A cutoff of 25% of the highest value for χ² and neighbor effect measurements would retain all experimentally proven triples in tRNA and group I introns. We therefore require that values for χ² and neighbor effects N (given in Tables 3 to 5) stand above this threshold. A measure of phylogenetic events (e/c) being available for group I introns, we require that triple correlations in the group I intron are associated with a significant level of concerted mutations (at least one asterisk in Table 5). Finally, to tighten the prediction criteria, we require χ² correlations to be reciprocal. The triplets that best satisfy these stringent criteria are shown in the first row of Table 7.

Table 7
Triples predicted in tRNA and group I introns based on Tables 3 to 5, using two different criteria

Criteria for triple prediction                    tRNA type I    tRNA type II      Group I intron (a)

Stringent
  χ² (base to base-pair) > 25% of highest value   (13·22)·46     (13·22)·9         (109·212)·260
  N > 25% of highest value                        (12·23)·9      (15·48)·20 (b)    (108·213)·259
  Best reciprocal correlate                                                        (110·211)·305
                                                                                   —
                                                                                   (262·312)·263
                                                                                   (220·253)·255 (c)

Relaxed
  χ² (base to base-pair) > 25% of highest value   Same +         Same +            Same +
  N > 25% of highest value                        (11·24)·36     (12·23)·21        (216·257)·105
  Each position involved in only one triple       (10·25)·45
  (not necessarily best reciprocal correlate)

(a) For group I intron triples, we use the phylogenetic event count as an additional criterion. Only putative triples associated with an asterisk in Table 5 are included.
(b) N cannot be measured for this position, but there is a large cross-correlation at (15·48)/21.
(c) These 2 putative triples are not supported by neighbor effects, but are best reciprocal correlates and associated with significant phylogenetic events (see discussion in text).

These stringent criteria yield no false positives in either tRNA family. In type II tRNA, the triple (13·22)·9 is predicted, but a question remains for the triple (15·48)·20. We cannot use equation (3) to compute the neighbor effect associated with this triple, since no secondary base-pair flanks the 15·48 pair. However, the strong correlation observed in Table 4 between 15·48 and 21 could very well be a neighbor effect. Thus, we tentatively include this triple in Table 7. In type I tRNA, two of the three yeast tRNAPhe base-triples are predicted, although 45·(10·25) is not. In group I introns, the previously identified P4 triples are predicted, along with one experimentally unproven interaction, (110·211)·305. Two triple candidates with
neighbor effects below the 25% threshold, (262·312)·263 and (220·253)·255, are noteworthy, since they satisfy all of our other requirements. While the other group I intron triples would be complexed in a triple-helix formation, these two putative triples are both isolated from other known base-triples; therefore, they would not be part of a triple helix. Further study is required to determine if this is the reason for their lack of neighbor effects. Until we have the results from this study, the biologist’s judgement is still necessary to resolve these “borderline” cases. The possible existence of the triple (110·211)·305 has been discussed.

The prediction criteria were relaxed by allowing for non-reciprocal correlations, under the condition that no base-pair or single-stranded nucleotide belongs to more than one triple (Table 7, line 2). For type I tRNAs, the triple 45·(10·25) is now predicted. The relaxed criteria also identify the correlation (11·24)/36. We suggest that this unique false positive results from a functional linkage between positions 24 and 36, on the basis of experiments establishing that mutations at position 24 affect codon/anticodon recognition by tRNATrp (Hirsh, 1971; Smith & Yarus, 1989). In type II tRNAs, the relaxed criteria identify the correlation (12·23)/21. Instead of interacting with the pair 12·23, as this correlation suggests, nucleotide 21 faces the pair 8·14 in the type II tRNASer crystal structure, and is proposed to interact with or face pair 8·14 in other type II tRNA solution structures (Dock-Bregeon et al., 1989; Baron et al., 1993). However, since bases 12·23 and 21 are close in space, we cannot rigorously exclude their interaction in certain type II tRNAs. In group I introns, the relaxed criteria identify the triple (216·257)·105, one of the previously proposed P6 triples (Michel & Westhof, 1990).
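The stringent and relaxed selections discussed in this section can be expressed as a small filter. The sketch below is one possible reading, not the authors' protocol: it assumes scores are already expressed as a percentage of the highest value, treats a missing neighbor effect (NA) as failing the N criterion, omits the e/c asterisk test used for group I introns, and implements the relaxed rule as a greedy pass in descending χ² order after the stringent picks. The function names are hypothetical; the example values are a subset of the type II tRNA entries in Table 4.

```python
def best_correlates(scores):
    """Best nucleotide for each pair and best pair for each nucleotide."""
    best_nt, best_pair = {}, {}
    for (pair, nt), s in scores.items():
        if pair not in best_nt or s > scores[(pair, best_nt[pair])]:
            best_nt[pair] = nt
        if nt not in best_pair or s > scores[(best_pair[nt], nt)]:
            best_pair[nt] = pair
    return best_nt, best_pair


def predict_triples(scores, neighbor, cutoff=25.0, relaxed=False):
    """Select candidate triples (pair, nt) from chi-square and N values.

    scores:   {(pair, nt): chi-square as % of the highest value}
    neighbor: {pair: N as % of the highest value, or None if unmeasurable}
    """
    best_nt, best_pair = best_correlates(scores)
    ranked = sorted(scores, key=scores.get, reverse=True)

    def passes_thresholds(pair, nt):
        n = neighbor.get(pair)
        return scores[(pair, nt)] > cutoff and n is not None and n > cutoff

    def reciprocal(pair, nt):
        return best_nt[pair] == nt and best_pair[nt] == pair

    # Stringent set: both thresholds met and mutually best correlates.
    chosen = [c for c in ranked if passes_thresholds(*c) and reciprocal(*c)]
    if relaxed:
        # Assumption: remaining candidates are added greedily by chi-square,
        # provided neither the pair nor the nucleotide is already in a triple.
        used_pairs = {p for p, _ in chosen}
        used_nts = {t for _, t in chosen}
        for pair, nt in ranked:
            if (pair, nt) in chosen or not passes_thresholds(pair, nt):
                continue
            if pair in used_pairs or nt in used_nts:
                continue
            chosen.append((pair, nt))
            used_pairs.add(pair)
            used_nts.add(nt)
    return chosen


# Subset of Table 4 (type II tRNAs): chi-square and N as % of the highest value.
scores = {
    ((15, 48), 20): 100, ((15, 48), 21): 94, ((12, 23), 21): 90,
    ((12, 23), 15): 86, ((3, 70), 35): 37, ((13, 22), 9): 27,
}
neighbor = {(15, 48): None, (12, 23): 64, (3, 70): 16, (13, 22): 100}
print(predict_triples(scores, neighbor))
print(predict_triples(scores, neighbor, relaxed=True))
```

On this subset the stringent pass keeps only (13·22)·9 and the relaxed pass adds (12·23)·21, in line with Table 7. The triple (15·48)·20 is not returned because N cannot be computed for 15·48; its tentative inclusion in Table 7 rests on the judgement described in the text, which this sketch does not attempt to model.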
Conclusion and Perspectives

Our previous correlation analyses sought correlations that occur between two positions in an RNA alignment (Gutell et al., 1992). While these analyses effectively predicted secondary structure pairing, we had difficulty identifying base-triples with confidence. We suggest here two reasons for this weakness. First, structurally similar base-triples can form between bases that vary in a non-compensatory fashion, which reduces covariation. Second, base-triples do not necessarily involve the same positions in all members of an RNA family.

With these obstacles in mind, we have developed methods to enhance our ability to predict base-triples by specifically seeking correlations between secondary structure base-pairs and nucleotides unpaired in the secondary structure. This significantly enhances correlations for base-triples. During our earlier studies, we also identified weaker correlations between many of the bases in the tRNA D-stem. We suggested that these effects could be specific to base-triples forming local triple helices. We developed an algorithm that quantifies these neighbor effects in RNA secondary helices. The most pronounced effects in tRNA were in the D-helix, while in the group I intron they were in the P4 and P6 helices, the same helices known to be involved in triple formation. The combination of these two correlation analyses identifies known base-triples more effectively than any previous method.

The accuracy of current protocols is limited by heterogeneity within the sequence datasets. Base-triple prediction will remain ambiguous as long as the dataset analyzed contains RNAs that form triples in different positions. For example, we are currently unable to predict simultaneously triples (13·22)·46 and 45·(13·22) in type I tRNAs, since they both occur in the analyzed sequences. It should be possible to isolate subsets of sequences displaying specific correlations, and enhance predictions in each subset.
The growth of RNA databases, and the availability of the algorithms presented herein, will certainly lead us in that direction. Another enhancement would be to combine the various prediction criteria introduced in this study into an automated protocol. An integration of χ² correlation values and phylogenetic event counts would be particularly useful in RNAs with well established phylogenetic relationships, such as the ribosomal RNAs.

Materials and Methods

Sequence alignments

The tRNA sequence alignment used was adapted from Sprinzl et al. (1991). We aligned the variable loop (which was not aligned in the original database), and removed mitochondrial sequences, leaving 895 type I and 263 type II nuclear tRNAs, which were analyzed separately.

The group I intron alignment contains 222 sequences compiled by S. H. Damberger and R. R. Gutell (unpublished results). Analyses were performed only on the core region comprising the stems P1, P3, P4, P6, P6a, P7, P8, a part of P5 and all intervening single-stranded segments. Intron sequences were classified into structurally distinct subgroups (IA, IB, IC and ID) according to the definitions of Michel & Westhof (1990). We further subdivided each subgroup by ordering its sequences using these criteria: (1) the type of gene in which the intron was found (e.g. ATP9, SSU rRNA, etc.); (2) the specific site in that gene where the intron was found (e.g. SSU site 531); (3) the cellular location (e.g. nucleus, mitochondrion, chloroplast) of the intron; (4) a rough phylogenetic ordering of the organisms.

Structural data

Detailed base-triple information is available for six tRNA crystal structures: yeast tRNAPhe (Quigley & Rich, 1976; Sussman & Kim, 1976), Escherichia coli tRNAfMet (Woo et al., 1980), yeast tRNAAsp (Dumas et al., 1985), E. coli tRNAGln (Rould et al., 1989), yeast tRNAiMet (Basavappa & Sigler, 1991) and Thermus thermophilus tRNASer2 (GGA) (Biou et al., 1994).
Although no crystal structures are available for group I introns, it has been suggested that triples form in the P4 and P6 helices (Michel et al., 1990; Michel & Westhof, 1990). The existence of both P4 triples and one of the proposed P6 triples is supported by mutagenesis experiments (Michel et al., 1990; Green & Szostak, 1994). There is good evidence for the formation of base–base interactions in the P4 triples, but the nature of the interactions in the P6 triples remains unclear. NMR experiments on a model oligonucleotide that partially reproduced the P4/P6 domain suggested that triple interactions exist in the form of base–backbone contacts (Chastain & Tinoco, 1993). However, the applicability of these latter results in the group I intron context is uncertain, given that important parts of the P4/P6 triple domain are absent from the construct.

Programs

Sequence alignments were visualized and manipulated using the alignment editor AE2 (T. Macke, The Scripps Clinic, CA) available from the Ribosomal Database Project (Larsen et al., 1993), and studied using a comparative sequence analysis program developed in our laboratory (S. H. Damberger, D. Gautheret & R. R. Gutell, unpublished results). This software computes frequencies of bases, base-pairs and base-triples, performs pairwise correlation analyses using mutual information (Chiu & Kolodziejczak, 1991; Gutell et al., 1992), and computes various types of correlations based on χ² tests and phylogenetic event counting, as discussed above. Secondary structure graphics were produced using the program XRNA (B. Weiser & H. Noller, unpublished results).

Notation

We adopted the notation (X·Y)·Z to describe a triple interaction involving the secondary base-pair X·Y and position Z, where Z interacts with Y; and we use Z·(X·Y) when Z interacts with X. When interacting nucleotides are not well established, as in the group I intron, we always use the notation (X·Y)·Z. We use the term “base-triple” when only the bases interact, “nucleotide-triple” when base–backbone contacts are involved, and simply “triple” as the general term. Correlations between positions X and Y are noted X/Y.
The numbering systems used are those of yeast tRNAPhe and the T. thermophila group I intron. Acknowledgements This work was supported by grants from the NIH (GM48207) and the Colorado RNA Center to R.R.G. We thank SUN Microsystems for their donation of computer equipment, and the W. M. Keck Foundation for its support of RNA Science on the Boulder campus. We also thank Dr T. Cech for comments on the manuscript, and Drs V. Rath and T. Steitz for sharing information on the tRNAGln structure. References Baron, C., Westhof, E., Bo¨ck, A. & Giege´, R. (1993). Solution structure of selenocysteine-inserting tRNASec from Escherichia coli. J. Mol. Biol. 231, 274–292. Basavappa, R. & Sigler, P. B. (1991). The 3 A˚ crystal structure of yeast initiator tRNA: functional impli- cations in initiator/elongator discrimination. EMBO J. 10, 3105–3111. Bina-Stein, M. & Stein, A. (1976). Allosteric interpretations of the Mg2 + binding to the denaturable Escherichia coli tRNAGlu 2 . Biochemistry, 15, 3912–3917. Biou, V., Yaremchuk, A., Tukalo, M. & Cusack, S. (1994). The 2.9 A˚ crystal structure of T. thermophylus seryl-tRNA synthetase complexed with tRNASer . Science, 263, 1404–1410. Cech, T. R., Damberger, S. D. & Gutell, R. R. (1994). Representation of the secondary and tertiary structure of group I introns. Nature Struc. Biol. 1, 273–280. Cedergren, R. J., LaRue, B. & Grosjean, H. (1981). The evolving tRNA molecule. CRC Crit. Rev. Biochem. 11, 35–104. Chastain, M. & Tinoco, I., Jr (1993). Nucleoside triples from the group I intron. Biochemistry, 32, 14220–14228. Chiu, D. K. Y. & Kolodziejczak, T. (1991). Inferring consensus structure from nucleic acid sequences. Comp. Appl. Biosci. 7, 347–342. Couture, S., Ellington, A. D., Gerber, A. S., Cherry, J. M., Doudna, J. A., Green, R., Hanna, M., Pace, U., Rajagopal, J. & Szostak, J. W. (1990). Mutational analysis of conserved nucleotides in a self-splicing group I intron. J. Mol. Biol. 215, 345–358. 
Dietrich, A., Romby, P., Maréchal-Drouard, L., Guillemaut, P. & Giegé, R. (1990). Solution conformation of several free tRNALeu species from bean, yeast and Escherichia coli, and interaction of these tRNAs with bean cytoplasmic leucyl-tRNA synthetase. A phosphate alkylation study with ethylnitrosourea. Nucl. Acids Res. 18, 2589–2597.
Dock-Bregeon, A. C., Westhof, E., Giegé, R. & Moras, D. (1989). Solution structure of a tRNA with a large variable region: yeast tRNASer. J. Mol. Biol. 206, 707–722.
Dumas, P., Ebel, J. P., Giegé, R., Moras, D., Thierry, J. C. & Westhof, E. (1985). Crystal structure of yeast tRNAAsp: atomic coordinates. Biochimie, 67, 597–606.
Green, R. & Szostak, J. W. (1994). In vitro genetic analysis of the hinge region between helical elements P5-P4-P6 and P7-P3-P8 in the sunY group I self-splicing intron. J. Mol. Biol. 235, 140–155.
Gutell, R. R. (1993). Comparative studies of RNA: inferring higher-order structure from patterns of sequence variation. Curr. Opin. Struct. Biol. 3, 313–322.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16S-like ribosomal RNA. Progr. Nucl. Acid. Res. 32, 155–216.
Gutell, R. R., Power, A., Hertz, G. Z., Putz, E. J. & Stormo, G. D. (1992). Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucl. Acids Res. 20, 5785–5795.
Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev. 58, 10–26.
Haselman, T., Chappelear, J. E. & Fox, G. E. (1988). Fidelity of secondary and tertiary interactions in tRNA. Nucl. Acids Res. 16, 5673–5684.
Hirsh, D. (1971). Tryptophan transfer RNA as the UGA suppressor. J. Mol. Biol. 58, 439–458.
Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S. H. (1977). RNA–ligand interactions: [I] Magnesium binding sites in yeast tRNAPhe. Nucl. Acids Res. 4, 2811–2820.
Hou, Y. M. (1994). Structural elements that contribute to an unusual tertiary interaction in a transfer RNA. Biochemistry, 33, 4677–4681.
Hou, Y. M. & Schimmel, P. (1989). Evidence that a major determinant for the identity of a transfer RNA is conserved in evolution. Biochemistry, 28, 6800–6804.
Hou, Y. M., Westhof, E. & Giegé, R. (1993). An unusual RNA tertiary interaction has a role for the specific aminoacylation of a transfer RNA. Proc. Nat. Acad. Sci., U.S.A. 90, 6776–6780.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271–1276.
Klug, A., Ladner, J. & Robertus, J. D. (1974). The structural geometry of co-ordinated base changes in transfer RNA. J. Mol. Biol. 89, 511–516.
Larsen, N., Olsen, G. J., Maidak, B. L., McCaughey, M. J., Overbeek, R. N., Macke, T. J., Marsh, T. L. & Woese, C. R. (1993). The ribosomal database project. Nucl. Acids Res. 21 (Suppl.), 3021–3023.
Levitt, M. (1969). Detailed model for transfer ribonucleic acid. Nature (London), 224, 759–763.
Major, F., Gautheret, D. & Cedergren, R. (1993). Reproducing the three-dimensional structure of a tRNA molecule from structural constraints. Proc. Nat. Acad. Sci., U.S.A. 90, 9408–9412.
Malhotra, A., Tan, R. K. & Harvey, S. C. (1990). Prediction of the three-dimensional structure of Escherichia coli 30S ribosomal subunit: a molecular mechanics approach. Proc. Nat. Acad. Sci., U.S.A. 87, 1950–1954.
McClain, W. H. (1993a). Identity of Escherichia coli tRNACys determined by nucleotides in three regions of tRNA tertiary structure. J. Biol. Chem. 268, 19398–19402.
McClain, W. H. (1993b). Rules that govern tRNA identity in protein synthesis. J. Mol. Biol. 234, 257–280.
McClain, W. H. & Foss, K. R. (1988). Changing the identity of a tRNA by introducing a G-U wobble pair near the 3' acceptor end. Science, 240, 793–796.
McClain, W. H., Foss, K. R., Jenkins, R. A. & Schneider, J. (1991). Rapid determination of nucleotides that define tRNAGly acceptor identity. Proc. Nat. Acad. Sci., U.S.A. 88, 6147–6151.
Michel, F. & Westhof, E. (1990). Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J. Mol. Biol. 216, 585–610.
Michel, F., Ellington, A. D., Couture, S. & Szostak, J. W. (1990). Phylogenetic and genetic evidence for base triple formation in the catalytic domain of group I introns. Nature (London), 347, 578–580.
Ninio, J. (1982). Molecular Approaches to Evolution, pp. 24–27, Pitman Books Ltd., London, U.K.
Olsen, G. J. (1983). Comparative analysis of nucleotide sequence data, PhD dissertation, University of Colorado Health Sciences Center, CO.
Pütz, J., Puglisi, J. D., Florentz, C. & Giegé, R. (1991). Identity elements for specific aminoacylation of yeast tRNAAsp by cognate aspartyl-tRNA synthetase. Science, 252, 1696–1699.
Pyle, A. M., Murphy, F. L. & Cech, T. R. (1992). RNA substrate binding site in the catalytic core of the Tetrahymena ribozyme. Nature (London), 358, 123–128.
Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796–806.
Rould, M. A., Perona, J. J., Söll, D. & Steitz, T. A. (1989). Structure of E. coli glutamyl-tRNA synthetase complexed with tRNAGln and ATP at 2.8 Å resolution. Science, 246, 1135–1142.
Shultz, D. W. & Yarus, M. (1994). tRNA structure and ribosomal function. I. tRNA nucleotide 27 to 43 mutations enhance first position wobble. J. Mol. Biol. 235, 1381–1394.
Smith, D. & Yarus, M. (1989). Transfer RNA and coding specificity. II. A D-arm tertiary interaction that restricts coding range. J. Mol. Biol. 206, 503–511.
Sprinzl, M., Dank, N., Nock, S. & Schön, A. (1991). Compilation of tRNA sequences and sequences of tRNA genes. Nucl. Acids Res. 19 (Suppl.), 2127–2171.
Sussman, J. L. & Kim, S.-H. (1976). Three-dimensional structure of a transfer RNA in two crystal forms. Science, 176, 853–858.
Winker, S., Overbeek, R., Woese, C. R., Olsen, G. J. & Pfluger, N. (1990). Structure detection through automated covariance search. Comp. Appl. Biosci. 6, 365–371.
Woese, C. R. (1987). Bacterial evolution. Microbiol. Rev. 51, 221–271.
Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91–117, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Woo, N. H., Roe, B. A. & Rich, A. (1980). Three-dimensional structure of Escherichia coli initiator tRNAfMet. Nature (London), 286, 346–351.
Yarus, M. (1982). Translational efficiency of transfer RNAs: uses of extended anticodon. Science, 218, 646–652.
Yarus, M., Illangesekare, M. & Christian, E. (1991). An axial binding site in the Tetrahymena precursor RNA. J. Mol. Biol. 222, 995–1012.
Edited by D. E. Draper
(Received 20 July 1994; accepted in revised form 20 January 1995)