
JMB—MS 440 Cust. Ref. No. PEW 84/94 [SGML]
J. Mol. Biol. (1995) 248, 27–43
Identification of Base-triples in RNA using
Comparative Sequence Analysis
Daniel Gautheret, Simon H. Damberger and Robin R. Gutell*
Department of Molecular, Cellular and Developmental Biology, Campus Box 347, University of Colorado, Boulder, CO 80309-0347, U.S.A.

Comparative sequence analysis has proven to be a very efficient tool for the determination of RNA secondary structure and certain tertiary interactions. However, base-triples, an important RNA structural element, cannot be predicted accurately from sequence data. We show here that the poor base correlations observed at base-triple positions are the result of two factors: (1) base covariation is not as strictly required in triples as it is in Watson–Crick pairs; and (2) base-triple structures are less conserved among homologous molecules. A particularity of known triple-helical regions is the presence of multiple base correlations that do not reflect direct pairing. We suggest that natural mutations in base-triples create structural changes that require compensatory mutations in adjacent base-pairs and triples to maintain the triple-helix conformation. On the basis of these observations, we devised two new measures of association that significantly enhance the base-triple signal in correlation studies. We evaluated correlations between base-pairs and single-stranded bases, and correlations between adjacent base-pairs. Positions that score well in both analyses are the best triple candidates. This procedure correctly identifies triples, or interactions very close to the proposed triples, in type I and type II tRNAs and in the group I intron.
Keywords: RNA structure; comparative sequence analysis; base-triples
*Corresponding author
Introduction
Base-triples are among the essential tertiary
interactions in RNA three-dimensional structure.
The best characterized RNA base-triples are those of
tRNA (Quigley & Rich, 1976; Sussman & Kim, 1976),
and there is also good evidence for base or nucleotide
triples in self-splicing group I introns, in which they
are required for enzymatic activity (Michel et al.,
1990). Base-triples involving a base-pair and a distant
single-stranded nucleotide create long-range con-
straints on RNA folding, and constitute powerful
assets for structure determination. The value of
base-triple information in modeling studies has been
clearly demonstrated in the case of group I introns
(Michel & Westhof, 1990; Jaeger et al., 1994), and
more benefits can be expected from the incorpor-
ation of base-triple information in computational
RNA folding procedures (Malhotra et al., 1990; Major
et al., 1993). The prediction of base-triples directly
from sequence information is therefore highly
desirable.
Certain base interactions, those constituting RNA
secondary structure, can be predicted accurately
from sequence data using comparative sequence
analysis, a method based on the principle that
evolution maintains a common structure through
compensatory mutations (reviewed by Gutell, 1993;
Woese & Pace, 1993). Compensatory mutations were
initially identified visually in relatively small
sequence alignments, resulting in the first reliable
secondary structure models (Gutell, 1993; Woese &
Pace, 1993). The simultaneous growth of sequence
databases and refinement of computational methods
have significantly enhanced our ability to derive
base–base interactions from sequence analysis
(Olsen, 1983; Gutell et al., 1985; Haselman et al., 1988;
Winker et al., 1990; Chiu & Kolodziejczak, 1991;
Gutell et al., 1992). Although methods have improved
sufficiently to identify correctly several tertiary
interactions in 16 S and 23 S rRNA (Gutell et al.,
1994), predicting base-triples with confidence
remains problematic. Only a few base-triples have
been suggested on the basis of comparative analysis
to date, in the early study of tRNA by Levitt (1969),
in rRNA (Gutell et al., 1994) and in the group I intron
(Michel et al., 1990), where triples were experimen-
tally substantiated (Michel et al., 1990; Green &
Szostak, 1994).
Present address: D. Gautheret, Département de Biologie, Université Aix-Marseille II, Faculté de Luminy, 13 000 Marseille, and J.G.S., C.N.R.S., 31 ch. Joseph Aiguier, 13 402 Marseille Cedex 20, France.
0022–2836/95/160027–17 $08.00/0 © 1995 Academic Press Limited

Identification of RNA Triples28
In spite of the scarcity of comparatively inferred
base-triples, these interactions are certainly wide-
spread, and therefore many remain to be discovered.
We have thus begun a detailed comparative
analysis of RNA triples to derive principles and
algorithms that can be applied to base-triple
prediction in different RNA molecules. The
availability of large sequence databases and of
several tRNA crystal structures now permits a more
thorough characterization of triple interactions. We
can now ask how base-triple structures vary in
related molecules, and how base sequences at and
around triples reflect these structural changes.
Principles derived from the analysis of tRNA and
group I intron triples can be incorporated into our
correlation analyses, and significantly enhance our
ability to predict base-triples from sets of aligned
sequences.
Characterization of Base-triples
Sequence correlations in the vicinity of
base-triples
Current comparative analysis methods detect nucleotide interactions by measuring correlations between pairs of RNA positions. This usually involves the construction of contingency tables containing the number of observations of each base-pair at positions i·j. Let $n_o(M_i,N_j)$ be the number of observations of base-pair M·N (M, N ∈ {A, U, G, C}) at positions i·j. We compute the numbers of bases M and N at positions i and j ($n_o(M_i)$ and $n_o(N_j)$) and the expected number of observations for each M·N base-pair, $n_e(M_i,N_j) = n_o(M_i) \times n_o(N_j)/n$, where n is the number of sequences. The difference between expected and observed values reflects the dependence of the two positions. This difference can be computed as follows (Olsen, 1983):
$$\chi^2 = \sum_{M,N} \frac{\left[n_o(M_i,N_j) - n_e(M_i,N_j)\right]^2}{n_e(M_i,N_j)} \qquad (1)$$
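As an illustration of eqn (1), the sketch below computes χ² for two alignment columns. It is a minimal sketch, not the authors' program: it assumes each column is supplied as a string with one character per sequence, it skips sequences that carry a gap or a non-AUGC symbol at either position (the text does not say how gaps were treated), and it uses the standard expected count $n_o(M_i)\,n_o(N_j)/n$. The function name `chi_square` is hypothetical.

```python
from collections import Counter

BASES = "AUGC"


def chi_square(col_i, col_j):
    """Pearson chi-square of eqn (1) between alignment columns i and j."""
    # Keep only sequences with an unambiguous base at both positions (assumption).
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a in BASES and b in BASES]
    n = len(pairs)
    if n == 0:
        return 0.0
    observed = Counter(pairs)                  # n_o(M_i, N_j)
    counts_i = Counter(a for a, _ in pairs)    # n_o(M_i)
    counts_j = Counter(b for _, b in pairs)    # n_o(N_j)
    chi2 = 0.0
    for m in BASES:
        for base_n in BASES:
            expected = counts_i[m] * counts_j[base_n] / n   # n_e(M_i, N_j)
            if expected > 0:  # bases absent from a column contribute nothing
                chi2 += (observed[(m, base_n)] - expected) ** 2 / expected
    return chi2
```

For eight sequences pairing G·C, C·G, A·U and U·A twice each, `chi_square("GGCCAAUU", "CCGGUUAA")` gives 24, the maximum value n(k − 1) for a 4 × 4 table with k = 4 bases.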
Mutual information is an alternative measure of correlation that yields improved results in the detection of RNA interactions (Chiu & Kolodziejczak, 1991). It requires base frequencies ($f_o(M_i,N_j)$, $f_o(M_i)$, $f_o(N_j)$) to be used instead of absolute numbers; it is computed as follows:
$$M(i,j) = \sum_{M,N} f_o(M_i,N_j) \ln \frac{f_o(M_i,N_j)}{f_o(M_i) \times f_o(N_j)} \qquad (2)$$
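A corresponding sketch of eqn (2) is given below, under the same assumptions as for χ² (columns as strings, gaps and ambiguous symbols skipped). Frequencies are counts divided by the number of retained sequences, and the natural logarithm is used as in the equation. The function name is hypothetical; terms for unobserved base combinations are taken as zero.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j, alphabet="AUGC"):
    """M(i,j) of eqn (2) for two aligned columns, using natural logarithms."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a in alphabet and b in alphabet]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    counts_i = Counter(a for a, _ in pairs)
    counts_j = Counter(b for _, b in pairs)
    mi = 0.0
    # Only observed pairs contribute: a term with f_o(M_i,N_j) = 0 is taken as 0.
    for (a, b), count in pair_counts.items():
        f_ab = count / n            # f_o(M_i, N_j)
        f_a = counts_i[a] / n       # f_o(M_i)
        f_b = counts_j[b] / n       # f_o(N_j)
        mi += f_ab * math.log(f_ab / (f_a * f_b))
    return mi
```

Perfect four-way covariation with equal base frequencies gives ln 4 ≈ 1.386, whereas a pair of positions that never varies gives 0: like χ², mutual information can only detect interactions between positions that vary.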
Mutual information accurately predicts the secondary structure of tRNA, as well as the tertiary pairs 15·48 and 26·44 (Chiu & Kolodziejczak, 1991; Gutell et al., 1992). We present in Tables 1 and 2 the M(i,j) values obtained in the base-triple regions of tRNA and the group I intron. For each position, the eight highest correlations are shown (73 positions in tRNA and 134 in the group I intron were analyzed). The most significant correlations are at the top of each column, and those corresponding to possible triples are indicated by asterisks. The secondary structure and tertiary interactions of yeast tRNAPhe are shown
in Figure 1a. Base-triples involve positions 45·(10·25),
(12·23)·9 and (13·22)·46. The proposed group I intron
triples (Michel & Westhof, 1990) involve positions
(108·213)·259 and (109·212)·260 in the P4 stem and
(216·257)·105 and (215·258)·106 in the P6 stem. These
are shown on the intron secondary structure in
Figure 2.
The secondary structure correlations (10/25,
11/24, 12/23 and 13/22 in tRNA (see Table 1) and
108/213, 109/212, 215/258 and 216/257 in group I
(see Table 2)) are the highest at each helical position.
The correlations that follow Watson–Crick pairings
in Tables 1 and 2 are intriguing. Certain base-
triple positions correlate (23/9 and 22/46 in tRNA,
212/260 and 213/259 in the group I intron), but do
so more weakly than secondary pairs (compare, e.g.
23/12 and 23/9), and even more weakly than some
non-interacting positions. For example, in tRNA,
the value of correlation 23/9 (a base-triple) is lower
than that of 23/13 (non-interacting positions). The correlation 25/45 (a base-triple) is not within the top eight correlates, ranking at number 31 among the correlations involving position 25 (not shown). Similar effects are observed in the group I introns (Table 2).

Table 1
The eight best correlations (M(i,j); Gutell et al., 1992) for tRNA positions 2, 9 to 13, 22 to 25 and 45 to 46, evaluated against all tRNA positions
Column headings are the query positions; each entry gives a correlated tRNA position(a) followed by its M(i,j) value(b).

tRNA position(a)
2         9         10        11        12        13        22        23        24        25        45        46
71 0.90   23 0.26*  25 0.08   24 0.78   23 0.99   22 0.33   13 0.33   12 0.99   11 0.78   10 0.08   46 0.12   13 0.31*
35 0.09   12 0.26*  45 0.06*  13 0.29   13 0.30   46 0.31*  46 0.28*  13 0.28   13 0.28   24 0.06   13 0.11*  22 0.28*
31 0.06   13 0.12   64 0.04   36 0.18   9 0.26*   12 0.30   23 0.17   9 0.26*   36 0.16   11 0.06   22 0.08*  12 0.17
12 0.06   46 0.09   32 0.03   12 0.14   46 0.17   11 0.29   12 0.17   22 0.17   12 0.15   39 0.05   12 0.07   23 0.17
29 0.06   24 0.07   50 0.03   23 0.14   22 0.17   23 0.28   11 0.13   46 0.17   23 0.14   26 0.04   9 0.07    45 0.12
24 0.06   11 0.07   49 0.03   22 0.13   24 0.15   24 0.28   24 0.13   24 0.14   22 0.13   49 0.04   23 0.06   24 0.11
70 0.05   45 0.07   68 0.02   26 0.11   11 0.14   36 0.13   36 0.08   11 0.14   46 0.11   13 0.04   10 0.06*  35 0.10
41 0.05   22 0.06   5 0.02    46 0.10   26 0.09   9 0.12    45 0.08   1 0.08    26 0.11   65 0.04   36 0.06   11 0.10

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D structure. Correlations corresponding to base-triples in any of the type I tRNA crystal structures are indicated by an asterisk, while correlations corresponding to a secondary structure base-pair are underlined.
(a) tRNA position number, based on yeast tRNAPhe reference numbering. Base-triples for yeast tRNAPhe are: (10·25)·45, (12·23)·9 and (13·22)·46. Alternative base-triples found in other tRNA crystal structures are noted in Figure 1.
(b) M(i,j) correlation value.

Table 2
The eight best correlations (M(i,j)) for group I intron positions 105 to 109, 212 to 213, 215 to 216 and 257 to 260, evaluated against the positions of the group I intron core (defined in Materials and Methods)
Column headings are the query positions; each entry gives a correlated intron position(a) followed by its M(i,j) value(b).

Group I intron position(a)
105       106       108       109       212       213       215       216       257       258       259       260
103 0.39  216 0.33  213 0.85  212 0.78  109 0.78  108 0.85  258 0.38  257 0.74  216 0.74  215 0.38  213 0.56* 212 0.47*
101 0.27  257 0.32  259 0.56* 108 0.51  108 0.49  259 0.56* 221 0.22  106 0.33  106 0.32  269 0.32  108 0.56* 109 0.43*
269 0.26  103 0.26  109 0.51  213 0.51  213 0.49  109 0.51  112 0.20  258 0.31  255 0.30  217 0.31  109 0.47  259 0.25
257 0.26* 101 0.22  212 0.49  259 0.47  260 0.47* 212 0.49  222 0.18  269 0.29  258 0.30  216 0.31  212 0.45  108 0.21
216 0.24* 105 0.21  278 0.23  260 0.43* 259 0.45  278 0.21  220 0.17  255 0.29  269 0.28  257 0.30  260 0.25  213 0.21
271 0.22  255 0.21  260 0.21  268 0.37  268 0.36  260 0.20  252 0.17  103 0.25  103 0.27  103 0.25  268 0.24  268 0.17
104 0.22  258 0.20* 96 0.21   307 0.28  307 0.28  268 0.20  208 0.17  105 0.24* 105 0.26* 255 0.25  284 0.20  258 0.14
255 0.21  217 0.18  268 0.20  256 0.21  256 0.21  96 0.19   218 0.16  101 0.21  101 0.22  222 0.24  278 0.18  253 0.13

Numbers in bold type denote correlations between nucleotides in close proximity in the 3D model of Michel & Westhof (1990). Correlations corresponding to a secondary structure base-pair are underlined, while correlations corresponding to proposed base-triples in the group I intron 3D model (Michel & Westhof, 1990) are denoted with an asterisk.
(a) Group I intron position number based on T. thermophila reference numbering. Previously proposed base-triples are: (108·213)·259, (109·212)·260, (215·258)·106 and (216·257)·105.
(b) M(i,j) correlation value.
A second important observation concerns the
network of correlations linking most nucleotides in
the vicinity of the tRNA base-triples. Significant
correlations between unpaired positions were
recorded earlier with smaller tRNA datasets (Olsen,
1983; Haselman et al., 1988), and in a more recent
study (Gutell et al., 1992). These “cross-correlations,”
indicated by boldface numbers in Tables 1 and 2,
involve consecutive or non-interacting positions,
such as 11/12, 22/23, 9/46 and 9/12 in tRNA or
109/108, 109/259, 212/213 and 106/105 in group I
introns, spanning the entire triple-helical regions in
both RNAs. These correlations have values of the
same order of magnitude as the main secondary
structure correlations. This contrasts significantly
with what is usually observed in helical positions.
Typical Watson–Crick positions (see, e.g. tRNA
position 2 in Table 1, or Figure 3 of Gutell et al., 1992)
display a difference of one order of magnitude
between the first and second highest correlations,
and rarely show correlations with neighboring
positions. This analysis thus raises two questions
regarding base-triples. (1) Why do positions involved
in base-triples have correlation values that are lower
than secondary structure positions, and (2) why
would a triple-helical region display networked
correlations?
Why are sequence correlations weaker in
base-triples?
Base-triples do not demonstrate covariation as do
secondary base-pairs
Comparative analysis searches for a common
structure by identifying compensatory changes, or
covariation. This principle applies very well
to the detection of Watson–Crick pairs: in order
to preserve the Watson–Crick conformation,
mutations must occur in a compensatory fashion,
which results in four prominent sequence patterns
(A·U, U·A, G·C or C·G). Each base type in a position
is associated with a distinct base type in a second
position, and vice versa. Even when a significant
incidence of G·U or other non-canonical pairs is
observed, the existence of a secondary base-pair
usually remains unambiguous (Gutell et al., 1994).
Considering triple sequences in tRNA (Figure 3) and
group I introns (Figure 4), we find there is no strict
covariation between the secondary structure base-
pair and the third position. In the tRNA triple
(12·23)·9 (Figure 3b), there is covariation between the
sequences (U·A)·A and (G·C)·G, but this covariation
is obscured by the presence of several non-compen-
satory changes. For example, an A at position 9 is
associated with several different Watson–Crick pairs
at position 12·23. Similarly, a G·C pair at position
10·25 (Figure 3a) is associated with all four bases
at position 45. The other triples in tRNA and the
group I intron also display significant levels of
uncorrelated changes (Figures 3c and 4). These
observations lead us to ask why base-triples lack the
stricter patterns of covariation observed in secondary
structure base-pairs. This question can be answered,
at least in part, by an observation of base-triple
structures.
A perfect triple isomorphism is possible in the
absence of base covariation
The interaction of a Watson–Crick pair with a third
base occurs through different types of non-canonical
interactions, such as the Hoogsteen pairing. In
contrast to Watson–Crick pairs, these tertiary
interactions can retain an identical conformation
after a unilateral mutation. For example, the
Hoogsteen-like A9·A23 base-pair present in the (12·23)·9 triple of yeast tRNAPhe can be converted to
a G9·A23 base-pair, which occurs in some tRNAs
(Figure 3b), while retaining the same conformation
(Figure 5a) (Klug et al., 1974). Among the multiple
non-canonical pairs that can be constructed with one
or two hydrogen bonds, there are several ways of
forming a unique conformation while modifying
either base in the pair. Numerous base-triple
conformations can thus be maintained through
non-compensatory mutations.
Base-triples vary in structure and position
The available tRNA crystal structures reveal more structural heterogeneity in base-triples than in secondary structure base-pairs. Figure 1 shows the base-triples forming in four tRNA crystal structures. The yeast tRNAPhe base-triples (described above) are shown in Figure 1a. Base-triples in E. coli tRNAfMet (Woo et al., 1980) and yeast tRNAiMet (Basavappa & Sigler, 1991) do not differ significantly from those of yeast tRNAPhe (data not shown). However, in yeast tRNAAsp (Dumas et al., 1985) (Figure 1b), a base–sugar interaction that formed between positions 14 and 21 in tRNAPhe is converted into a base–base interaction, creating a (8·14)·21 base-triple. On the basis of the electron density map, there is no evidence for the triples 45·(10·25) and (13·22)·46 in the E. coli tRNAGln complexed with its cognate synthetase (Rould et al., 1989; V. Rath & T. A. Steitz, personal communication) (Figure 1c). Within this same complex, the distance between the pair 12·23 and position 9 suggests that this triple also does not form. Instead, a base-triple forms at positions 45·(13·22), resulting in a local conformation different from that of tRNAPhe (Figure 5b) (V. Rath & T. A. Steitz, personal communication). Alignment errors are an unlikely cause of this important difference, since both E. coli tRNAGln and yeast tRNAPhe have a variable loop of five nucleotides, and the bases surrounding the variable triple are positioned similarly in both three-dimensional structures.
Different triples also form in E. coli tRNASer(GGA) complexed with seryl-tRNA synthetase (Biou et al., 1994) (Figure 1d). In this type II tRNA, all the tRNAPhe triples are absent, while other base-triples form at positions (8·14)·21, 20·(15·48) and 9·(13·22) (two insertions in the D-loop of this tRNA are given the numbers 20a and 20b by Biou et al. (1994), while they are numbered 19a and 20a in our alignment; the triple noted 20a·(15·48) by these authors is thus 20·(15·48) here).

Figure 1. Tertiary base/base interactions in four tRNA crystal structures, mapped onto the yeast tRNAPhe secondary structure. Continuous lines, base-triples; broken lines, other tertiary base/base interactions. Sugar–phosphate backbone interactions are not shown. a, Yeast tRNAPhe; b, yeast tRNAAsp; c, E. coli tRNAGln; d, E. coli tRNASer. Insertions (+) and deletions (r) relative to yeast tRNAPhe (a): b, yeast tRNAAsp, r48; c, E. coli tRNAGln, r17; d, E. coli tRNASer, r17, +19a, +20a, +47a, 47b, to 47q. AA, amino acceptor stem; TΨC, TΨC stem and loop; D, D-stem and loop; AC, anticodon stem and loop; V, variable loop.

Figure 2. Core secondary structure and triple interactions in the T. thermophila group I intron. Triples are indicated by bold lines. Filled circles denote triples and other positions discussed in the text. The two putative triples, (220·253)·255 and (110·211)·305, are shown by thicker broken lines. The representation is formatted as proposed by Cech et al. (1994).
This comparison of tRNA structures is most
illuminating. Of the six available tRNA crystal
structures, four are different with respect to
their base-triples. Some of the observed variations
involve only small conformational changes (e.g.
the formation of the (8·14)·21 triple), and some
might result from the formation of the tRNA–synthetase complex (in tRNAGln in particular), but the fact remains that base-triples can form
differently even among tRNAs of the same
morphological family (e.g. type I tRNAs). There-
fore, even if triple sequences demonstrated covaria-
tion, this would not always involve the same pairs
of positions, and would therefore be poorly
detected. This is another important explanation for
the relatively low correlations observed at base-
triples.
Analysis of the network of correlations around
base-triples
We have shown that tRNA and group I introns
present networked sequence correlations in the
vicinity of base-triples. If these correlations, which we will also refer to as “neighbor effects”, are specific to base-triples, they will constitute a useful instrument for triple identification.
The previous observation of triples involving different positions in different molecules could explain some cross-correlations, for example 45/10 and 45/13, which could result from alternative interactions in tRNAPhe and tRNAGln. However, this does not explain correlations between different pairs of the same helix (e.g. 11/12, 12/13, 12/22 and 22/23 in tRNA), which constitute the majority of cross-correlations. From a closer observation of structure variations in base-triples, we explain below how these cross-correlations might result from “compensatory” mutations involving not
only paired bases but also adjacent bases in a triple
region.
All the sequence combinations observed at a given base-triple position cannot, in general, adopt an identical triple conformation. For example, when the third triple position changes from a pyrimidine to a purine, it is not always possible to build a triple that would accommodate the bulkier residue without significantly displacing the sugar backbone of the third nucleotide. In the (108·213)·259 triple in the group I intron, most species have a pyrimidine at position 259 (Figure 4a), and these can all be folded into a conformation very similar to that shown for the (C·G)·C triple in Figure 5c (Michel et al., 1990). However, a change from (C·G)·C to (C·G)·G, which occurs naturally, requires a relatively large displacement of the sugar backbone of G259, as shown in Figure 5c (Michel et al., 1990). We expect to observe such variations in most RNA triples, since both purines and pyrimidines generally occur as the third residue (see Figures 3 and 4). Even when sequence variations maintain a purine or a pyrimidine as the third base, hydrogen bonding constraints can prevent the conservation of an identical structure. We thus expect to observe widespread conformational variations of the type shown in Figure 5c.
The formation of a triple helix such as that in the tRNA D-stem is a highly cooperative process involving a complex network of ion binding, stacking and van der Waals’ interactions (Bina-Stein & Stein, 1976; Holbrook et al., 1977). Therefore, a structural change such as the one shown in Figure 5c might adversely affect neighboring base-triples, and thus “compensatory” mutations may be required to preserve the conformational or energetic properties of the triple helix. In this case, a mutation in the third base of an adjacent triple can be as appropriate as further mutations in the same triple, since a single change in the flanking position can directly compensate for the backbone displacement. We therefore propose that “compensatory” mutations in a triple helix involve nucleotides in different stacking planes as well as within the planes. Such mutations could propagate through the triple helix and create the multiple correlations observed. If this hypothesis is confirmed, the presence of cross-correlations would be indicative of triple helix formation.
An alternative explanation for the presence of networked correlations could be the involvement of the correlated positions in a common RNA identity element. In other words, nucleotides of a base-triple region could be selected as a whole in order to maintain the specificity of the RNA with respect to a certain biological process, such as interaction with a specific protein, thus creating correlations between non-interacting bases. Although a few identity elements have been localized to the base-triple region of tRNA (Pütz et al., 1991; Smith & Yarus, 1989; McClain, 1993a), we do not believe they are an important source of networked correlations, for the following reasons. First, cross-correlations are much higher in the D-stem than in any other part of the molecule (Gutell et al., 1992), although important identity sites are present elsewhere (Hou & Schimmel, 1989; McClain et al., 1991). Second, cross-correlations in the group I intron are also higher in the triple region (stems P4 and P6) than in any other part of the molecule (see analysis below). Finally, a recent experimental study (Hou, 1994) demonstrated that mutations in the tRNA triples (8·14)·21 and (13·22)·46 had major effects on the structure of the 15·48 pair. This shows that large physical constraints exist in this triple region that do not result from tRNA identity. Although we cannot exclude the possibility that identity elements contribute to cross-correlations in base-triple regions, there are better reasons for correlations to be caused by base-triples or other complex folding patterns.
We will now concentrate on two types of cross-correlations. First, the interdependence of all three bases in a triple produces correlations between each position of the secondary structure base-pair and the third base of the triple (see Tables 1 and 2). Therefore, directly measuring the correlation between secondary structure base-pairs and single-stranded bases (base to base-pair correlation) is expected to produce a stronger signal than the usual pairwise correlations. A second type of interesting correlation occurs between adjacent base-pairs (base-pair to base-pair correlation). Assuming this type of correlation is characteristic of base-triples, its identification should also help predict triples. In the following sections, we propose methods to quantify these two types of correlations.

Figure 3. Sequences observed at base-triple positions in type I tRNA. Positions shown are those of yeast tRNAPhe triples. a, 45·(10·25) triple; b, (12·23)·9 triple; c, (13·22)·46 triple. Only values greater than 5 are shown. Numbers in bold type represent more than 10% of the tRNAs.

Figure 4. Sequences (seqs) observed at base-triple positions in group I introns. Positions shown are those of the suggested T. thermophila triples. a, (108·213)·259; b, (109·212)·260; c, (215·258)·106; d, (216·257)·105. Only values greater than 2 are shown. Numbers in bold type represent more than 10% of the introns.
Figure 5. Alternative structures of homologous base-triples in different RNAs. a, Base-triple (12·23)·9 in yeast tRNAPhe, and a possible conformation for the same triple after an A9 to G9 mutation. b, Triples forming with base-pair 13·22 in yeast tRNAPhe (Quigley & Rich, 1976) and E. coli tRNAGln (V. Rath & T. A. Steitz, personal communication). c, Proposed structure of group I intron triple (108·213)·259 forming with C259 or G259 (Michel et al., 1990).

Table 3
Base to base-pair correlation (χ²) and neighbor effects (N) in type I tRNAs (895 sequences)

Best         Pair        Best nt   χ²(a)   N(b)   Suggested cause of correlation(c,d)
correlates
+            13·22   ↔   46        100     100    Triple
+            12·23   ↔   9         76      64     Triple
             31·39   ↔   36        69      20     Id (Yarus, 1982)
             3·70    ↔   35        61      16     Id tRNAAla (Hou & Schimmel, 1989)
             51·63   →   36        41      9
+            11·24   →   36        41      57     Id tRNATrp (Hirsh, 1971)
             13·22   ←   45        40      100    Triple (E. coli tRNAGln)
             1·72    →   35        35      16     Id tRNAGln (Rould et al., 1989)
+            10·25   →   45        27      27     Triple (yeast tRNAPhe, tRNAAsp)
             1·72    ←   73        26      16     Id tRNAGln (Rould et al., 1989)
             15·48   →   35        24      NA     Id tRNACys (Hou et al., 1993)
             27·43   →   36        22      14     Id tRNATrp (Schultz & Yarus, 1994)
             30·40   →   36        20      13     Id (Yarus, 1982)

Correlations are ranked by χ² value. Only correlations whose χ² value is at least 20% of the highest value are listed. ↔, reciprocal correlates; → and ←, single arrow pointing towards the best correlate (see the text). The best correlates are identified with a plus (+) in the first column (see Table 7).
(a) % of highest value.
(b) Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
(c) Id, identity element possibly responsible for the correlation. When several tRNAs have identity elements matching the correlated positions, only one is cited as an example.
(d) References are given for identity elements only.
Inferring Triple Interactions
Base to base-pair correlations: identification of
potential triples
Our goal here was to consider base-pairs as single
variables, and directly compute the correlations
between base-pairs and single-stranded positions.
These base to base-pair correlations can be evaluated using a χ² test, replacing the usual 4 × 4 contingency table with a 16 × 4 contingency table that compares the four possible sequences for the single position with the 16 possible sequences for the base-pair. Such a table is similar to those in Figures 3 and 4. A χ² test can be performed using equation (1). However, large contingency tables increase the probability of having empty or almost empty cells that can strongly bias χ² values. To remedy this problem, we subdivided the 16 × 4 table into several sub-tables. This method is an alternative to that proposed by Olsen (1983) to address the same problem. For each row M and column N in the original
16 × 4 table (T), we create a 2 × 2 table of the form:
$$\begin{pmatrix} T(M,N) & \sum_{i=1}^{4} T(i,N) - T(M,N) \\ \sum_{j=1}^{16} T(M,j) - T(M,N) & \sum_{i=1}^{4}\sum_{j=1}^{16} T(i,j) - \sum_{j=1}^{16} T(M,j) - \sum_{i=1}^{4} T(i,N) + T(M,N) \end{pmatrix}$$
Values from table T are compressed in the new
2 × 2 table, so that one of the cells contains T(M,N),
two other cells contain the sums of the remaining
values of row M and column N, and the last cell
contains the sum of all remaining values in table T.
Such tables are generated for each value of M and N.
Values of χ² are then computed for all sub-tables, except those having expected values smaller than 5 in any of their cells. The highest χ² value generated is kept as the final correlation value.
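The sub-table procedure can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: alignment columns are strings, sequences with a gap at any of the three positions are skipped, rows of the 16 × 4 table are the 16 base-pair sequences and columns the four bases, and a score of 0 is returned when every sub-table fails the expected-value test (the text does not say what happens in that case). All function names are hypothetical.

```python
from itertools import product

BASES = "AUGC"
# The 16 possible sequences of a base-pair i·j (rows of table T).
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def pair_to_base_table(col_i, col_j, col_k):
    """Build the 16 x 4 table T: rows = sequence at base-pair i·j, columns = base at k."""
    table = [[0] * len(BASES) for _ in PAIRS]
    for a, b, c in zip(col_i, col_j, col_k):
        if a in BASES and b in BASES and c in BASES:
            table[PAIRS.index(a + b)][BASES.index(c)] += 1
    return table


def chi2_2x2(a, b, c, d, min_expected=5.0):
    """Pearson chi-square of [[a, b], [c, d]]; None if any expected count < min_expected."""
    n = a + b + c + d
    if n == 0:
        return None
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for r in range(2):
        for s in range(2):
            expected = row_totals[r] * col_totals[s] / n
            if expected < min_expected:
                return None  # sub-table rejected
            chi2 += (observed[r][s] - expected) ** 2 / expected
    return chi2


def max_subtable_chi2(table):
    """Collapse T around each cell (M, N) into a 2 x 2 table; keep the highest chi-square."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    best = 0.0
    for m, row in enumerate(table):
        for k, cell in enumerate(row):
            a = cell                                        # T(M,N)
            b = row_sums[m] - cell                          # rest of row M
            c = col_sums[k] - cell                          # rest of column N
            d = total - row_sums[m] - col_sums[k] + cell    # all remaining values
            value = chi2_2x2(a, b, c, d)
            if value is not None and value > best:
                best = value
    return best
```

For 20 sequences with G·C at i·j and A at k, plus 20 with C·G and U, every informative sub-table is [[20, 0], [0, 20]] or its mirror, giving χ² = 40; with only four sequences every sub-table fails the expected-value test and the sketch returns 0. Collapsing around one cell at a time keeps each test at one degree of freedom, which is what makes the minimum expected count easier to satisfy than in the full 16 × 4 table.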
To simulate an application of the method to
base-triple prediction, we computed only corre-
lations between known secondary structure base-
pairs and unpaired positions. In tRNA, tertiary
interactions predicted by pairwise comparative
analysis (15·48 and 26·44; Gutell et al., 1992) were also
included as base-pairs, so that triples involving these
pairs could be detected. An application of this
procedure to the type I tRNA alignment yields the
results shown in Table 3.
The following criteria were used to establish the significance of correlations. Since χ² values have no upper limit and vary with the number of sequences considered, we used not absolute χ² values but the percentage of the highest value encountered in the whole analysis. The highest correlation observed in a given molecule thus takes a value of 100. A cutoff point for the significance of correlations was then chosen empirically, based on known base-triples. All known tRNA and group I intron triples (see below) have a χ² value greater than 25% of the highest value. We thus tentatively considered correlations in this range as significant (a cutoff of 20% is used in Table 3 in order to show additional correlations, which will be discussed below).
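A minimal sketch of the two steps just described: rescaling raw χ² values to a percentage of the highest value, then applying the empirical 25% cutoff. The dictionary layout (keys identifying a base-pair and a single-stranded position) and the function names are assumptions made for illustration; the comparison is strict, following the "greater than 25%" criterion.

```python
def percent_of_max(scores):
    """Rescale raw chi-square scores so that the highest value becomes 100."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {key: 0.0 for key in scores}
    return {key: 100.0 * value / top for key, value in scores.items()}


def significant_correlations(scores, cutoff=25.0):
    """Keep correlations scoring above `cutoff` percent of the highest value."""
    return {key: pct for key, pct in percent_of_max(scores).items() if pct > cutoff}
```

Because only ratios to the maximum are kept, the same cutoff can be applied to alignments of very different sizes, which is the motivation given in the text.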
A second selection criterion was introduced to treat base-pairs that had significant correlations (>25%) with several single-stranded positions. To solve this problem, we performed the correlation analysis in two directions: we sought the single-stranded positions that best correlated with each
base-pair, and we sought the base-pairs that best
correlated with each single-stranded position. When
a base-pair and a single-stranded position are
mutually best correlates (hereafter termed “reciprocal correlates”), they are indicated with a double
arrow in Table 3. When either the base-pair is the best
correlate of the single-stranded position, or the
single-stranded position is the best correlate of the
base-pair, the correlation is indicated with a single
arrow towards the best correlate. When neither the
base-pair nor the single-stranded position is the best
correlate, the correlation is not shown. This method
considerably reduces the number of correlations
shown for each position. These two criteria are used
in all subsequent analyses.
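The two-directional test can be sketched as below, assuming the scores (for example, χ² values expressed as a percentage of the highest value) are stored in a dictionary keyed by (base-pair, position). Ties are broken by keeping the first correlate encountered, an arbitrary choice that the text does not specify, and the function name is hypothetical. The example uses the Table 3 values for 13·22/46, 13·22/45 and 10·25/45.

```python
def classify_correlations(scores):
    """Label base-pair/position correlations by the two-directional best-correlate test.

    scores: dict mapping (base_pair, position) -> correlation score.
    Returns '<->' for reciprocal correlates, '->' when the position is the best
    correlate of the base-pair, '<-' when the base-pair is the best correlate of
    the position; all other correlations are left out.
    """
    best_for_pair = {}  # base-pair -> its best-correlated position
    best_for_pos = {}   # position -> its best-correlated base-pair
    for (pair, pos), score in scores.items():
        # Strict '>' keeps the first correlate seen in case of ties.
        if pair not in best_for_pair or score > scores[(pair, best_for_pair[pair])]:
            best_for_pair[pair] = pos
        if pos not in best_for_pos or score > scores[(best_for_pos[pos], pos)]:
            best_for_pos[pos] = pair
    labels = {}
    for pair, pos in scores:
        pos_is_best = best_for_pair[pair] == pos
        pair_is_best = best_for_pos[pos] == pair
        if pos_is_best and pair_is_best:
            labels[(pair, pos)] = "<->"
        elif pos_is_best:
            labels[(pair, pos)] = "->"
        elif pair_is_best:
            labels[(pair, pos)] = "<-"
    return labels


# Chi-square values (% of highest) taken from Table 3.
example = {("13.22", "46"): 100, ("13.22", "45"): 40, ("10.25", "45"): 27}
print(classify_correlations(example))
```

With these three values, the sketch marks 13·22/46 as reciprocal, 10·25 → 45 and 13·22 ← 45, which is the non-reciprocal pattern discussed for position 45.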
In type I tRNAs, the two triples (13·22)·46 and
(12·23)·9 can now be predicted with confidence. They
both display high and reciprocal correlations. The
best correlate of position 10·25 is 45, as expected from triples forming in yeast tRNAPhe. However, this relationship is not reciprocal: the best correlate of 45 is 13·22, which rather reflects the E. coli tRNAGln situation, where the triple involves positions 45·(13·22). Since there is a precedent for base 45 to form triples with at least two different base-pairs, correlation results like this are expected.
Other high and reciprocal correlations are
(31·39)/36 and (3·70)/35. Interestingly, these seemingly
false positives are not artifacts. The pair 3·70 is
an important identity element for the aminoacylation
of alanine tRNAs of several organisms (McClain &
Foss, 1988; Hou & Schimmel, 1989) and of various
other tRNAs (reviewed by McClain, 1993b), so it is
not surprising that it varies in concert with position
35, which lies at the center of the anticodon and is
therefore necessarily associated with tRNA identity as well.
The (31·39)/36 correlation was identified by Yarus
(1982) in the ‘‘extended anticodon’’ hypothesis,
which states that several positions in the anticodon
stem and loop are selected as a block to confer on the
tRNA an optimal coding accuracy. It is also
noteworthy that 36 is the best correlate of pair 27·43,
which reflects a recent experimental association
between these two sites in the control of translation
by tRNATrp (Schultz & Yarus, 1994). Other ‘‘false
positives’’ can be related to known tRNA identity
elements (see references in Table 3), suggesting that
this method is identifying biologically meaningful
associations.
The recent determination of a tRNASer crystal
structure (Biou et al., 1994) provides us with valuable
base-triple information about type II tRNAs.
We applied the same correlation analysis to this class
of tRNAs to determine whether the two triples
forming in tRNASer at positions 20·(15·48) and
9·(13·22) could be detected (there is not enough
variation in base-pair 8·14, also involved in a triple,
to seek correlations involving this pair). Table 4
presents the highest base to base-pair correlations
observed in an alignment of 262 type II tRNAs
(comprising serine, leucine and certain tyrosine
tRNAs). In excellent agreement with the crystallographic
data, the only ‘‘reciprocal correlates’’ in
Table 4 are observed at positions involved in
base-triples in E. coli tRNASer. The triple 20·(15·48)
has the highest overall χ² value, and the triple
9·(13·22) has a χ² value above the threshold of
significance defined previously, albeit relatively low
(27% of the highest value). No significant correlations
are detected for canonical type I tRNA triples,
in agreement with crystal and solution studies,
which suggest that these triples are absent in type II
tRNAs (Biou et al., 1994; Dock-Bregeon et al., 1989;
Dietrich et al., 1990; Baron et al., 1993).
The sequences observed at positions (15·48)·20 and
9·(13·22) in the type II tRNA dataset are shown in
Figure 6. (The base-triples in yeast tRNASer are
(G15·C48)·U20 and G9·(G13·A22).) The correlation
(15·48)/20 is due primarily to an association of A20
with A15·U48, and of a U or C at position 20 with
G15·C48 (Figure 6a). Analysis of the alignment
reveals that these three principal sequences are present in
all type II isoacceptor groups (data not shown),
suggesting that concerted changes have occurred
several separate times through evolution. The
significance of a correlation is considerably increased
when multiple concerted changes are observed
independently, as they are here (Gutell et al., 1985).
The correlation (13·22)/9 is primarily due to an
association between A9 and A13·A22 (Figure 6b).
Although the correlation is relatively weak, concerted
changes yielding the sequence A9·(A13·A22)
occur in all type II isoacceptor groups, and even
among isoacceptor tRNAs from the same organism
(data not shown). This again indicates that the
correlation is very significant.

Table 4
Base to base-pair correlations (χ²) and neighbor effects (N) in type II tRNAs (262 sequences)
Best correlate (+) | Pair | χ² best | nt | χ² (a) | N (b) | Suggested cause of correlation (c)
+ | 15·48 | ↔ | 20 | 100 | NA | Triple
  | 15·48 | ← | 21 | 94 | NA | Neighbor effect
  | 15·48 | ← | 59 | 91 | NA |
+ | 12·23 | → | 21 | 90 | 64 |
  | 12·23 | ← | 15 | 86 | 64 |
  | 12·23 | ← | 48 | 82 | 64 |
  | 12·23 | ← | 20 | 79 | 64 |
  | 15·48 | ← | 35 | 57 | NA | Id? (as in tRNACys)
  | 3·70 | → | 35 | 37 | 16 | Id? (as in tRNAGln)
  | 12·23 | ← | 73 | 36 | 64 |
  | 2·71 | ← | 20 | 33 | 11 |
  | 6·67 | ↔ | 15 | 30 | 13 |
  | 2·71 | ← | 59 | 28 | 11 |
  | 27·43 | ↔ | 35 | 28 | 14 | Id? (as in tRNATrp)
  | 27·43 | ← | 36 | 28 | 14 | Id? (as in tRNATrp)
+ | 13·22 | ↔ | 9 | 27 | 100 | Triple
  | 6·67 | ← | 37 | 26 | 13 |
Correlates are listed in order of decreasing χ² value. Only those correlates within 25% of the highest χ² value are listed. The best correlates are identified with a plus (+) in the first column (see Table 7).
a % of highest value.
b Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
c See footnotes to Table 3.

Figure 6. Sequences observed at positions forming
base-triples in E. coli tRNASer. a, Triple (13·22)·9. b, Triple
(15·48)·20. Only values greater than 2 are shown. Numbers
in bold type represent more than 10% of the tRNAs.

Phylogenetic Event Counting
In the previous section, we mentioned a few
concerted mutations occurring in closely related
tRNAs to help support some of our correlation
results. These mutations are particularly interesting,
since they are probably not the result of a fortuitous
ancestral event; instead, they more likely reflect
‘‘neutral’’ changes between functionally equivalent
sequences. To help identify more base-pair and
base-triple interactions with comparative analysis,
we need to determine the number of times these
concerted mutations have occurred throughout the
evolution of the RNA under study. The larger the
number of such phylogenetic events (i.e. concerted
mutations over evolutionary space), the more
significant the correlation, and thus the more
confident we are that the positions of interest are
physically interacting. This general concept was
introduced a number of years ago, and was used
to reinforce our case for some of the first proposed
base–base tertiary interactions in 16 S rRNA (Gutell
et al., 1985). This type of observation, essential in
correlation analyses, requires knowledge of the
phylogenetic relationships among the sequences
under study. For tRNA, these relationships are
unclear. On one hand, all tRNAs interact with the
ribosome and its factors; thus, they are all under this
common constraint, and changes in their sequence in the
evolutionary dimension will be neutral. On the other
hand, tRNA sequences within each acceptor family
are constrained by a specific synthetase recognition
function. Thus, tRNAs have at least two mutational
dimensions, which obscure their phylogenetic history
(for a more detailed assessment of this issue,
please see: Ninio, 1982; Cedergren et al., 1981). In
contrast, a molecule such as the 16 S rRNA has the
same function in all organisms. Its phylogeny, and the
phylogeny of the cells in which these 16 S rRNAs
exist, is well defined (Woese, 1987). Thus, comparative
studies can determine with more confidence the
number and nature of the concerted mutations that
have occurred throughout a phylogenetic tree,
allowing us to pinpoint mutations that occurred
between closely related RNAs. These changes are
the most likely to be ‘‘neutral’’.
Group I introns are mobile elements with a fast
evolutionary clock, and therefore their phylogeny
cannot be defined as well as that of rRNAs. However,
there is no known variety of functions in group I
introns that would impede the construction of a tree
as it does for tRNAs. Since a consistent (although
imperfect) classification of group I introns is
available (Michel & Westhof, 1990), we can search for
significant phylogenetic events more rigorously than
we did for tRNAs. In this section, we implement a
simple method to count mutations, and use its results
to strengthen base-triple prediction in the group I
introns.
Our phylogenetic event counting was performed as
follows. Group I intron sequences in the alignment
were classified into phylogenetic groups as described
in Materials and Methods. For each potential
triple position (i·j)·k (i and j being base-paired
and k single-stranded in the secondary structure),
changes are counted as the aligned sequences are
examined from the first sequence to the last: c equals
the number of times a change is observed at i, j or k,
and e equals the number of times a change is
observed at (i or j) and k (i.e. a concerted change
between the base-pair and the third position). The
ratio e/c is the proportion of mutual changes over the
total number of changes, and is our measure of
phylogenetic events. As noted earlier, the detail we
can decipher with correlation analysis is enhanced
by incorporating phylogenetic event information into
our algorithm. For this paper we have not sought a
complete solution, since that would entail a better
appreciation of the phylogenetic relationships of the
RNAs under study, and better knowledge of how to
value mutual changes that occur between distantly
and closely related organisms. For the purposes of
this article we have developed a simple method
which assumes that the sequences are roughly
ordered by their phylogenetic relationships, and
treats all mutual changes as equivalent. Therefore, a
large number of mutual changes within closely
related RNA sequences will increase the e/c value
more than a few mutual changes between distantly
related RNA species. For our immediate needs, this
approximation works well, as we will see.
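The counting rule above can be sketched as follows. This is an illustrative reading of the text, not the authors' software: it assumes each sequence is compared with the one immediately preceding it in the phylogenetically ordered alignment, treats gap characters like any other symbol, and uses hypothetical names (`event_counts`, `event_ratio`) and a toy three-column alignment.

```python
def event_counts(seqs, i, j, k):
    """Count phylogenetic events for a candidate triple (i.j).k.

    seqs: aligned sequences (equal-length strings), assumed to be roughly
    ordered by phylogenetic relationship; i, j (paired) and k (single-
    stranded) are 0-based alignment columns. Each sequence is compared with
    the one before it; gap characters are treated like bases (assumption).
    Returns (e, c): concerted changes and total changes.
    """
    c = e = 0
    for prev, curr in zip(seqs, seqs[1:]):
        pair_changed = prev[i] != curr[i] or prev[j] != curr[j]
        third_changed = prev[k] != curr[k]
        if pair_changed or third_changed:
            c += 1  # a change at i, j or k
        if pair_changed and third_changed:
            e += 1  # a concerted change: (i or j) together with k
    return e, c


def event_ratio(seqs, i, j, k):
    """e/c, or 0.0 when no change is observed at all."""
    e, c = event_counts(seqs, i, j, k)
    return e / c if c else 0.0


# Toy alignment: columns 0 and 1 form the pair, column 2 is the third base.
toy = ["AUC", "AUC", "GCU", "GCU", "AUC", "AUA"]
print(event_counts(toy, 0, 1, 2), event_ratio(toy, 0, 1, 2))
```

In the toy alignment three transitions involve a change, two of which alter the pair and the third base together, so e = 2, c = 3 and e/c is about 0.67. Because every transition counts equally, a run of concerted changes among adjacent (closely related) sequences raises e/c, which is the behavior of the approximation described above.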
Results of the combined χ² and e/c analysis of
group I intron sequences are presented in Table 5.
Here the χ² analysis was performed first, followed by
an e/c analysis for each significant χ² base-pair/base
correlate. An asterisk in Table 5 denotes those triple
correlations that score the highest with the e/c
analysis.
Among the three highest χ² reciprocal correlates
are (109·212)/260 and (108·213)/259, corresponding
to the two proposed base-triples in the P4 stem
(Michel et al., 1990). Both of these triples are also
strongly supported by our phylogenetic counting
method. However, the proposed P6 stem triples
(215·258)·106 and (216·257)·105 are not accurately
predicted. Of these two previously proposed triples,
only (216·257)·105 receives some support: position 105
correlates best with the pair 216·257 in the χ²
analysis, and the e/c analysis also associates the
pair 216·257 with position 105. The highest χ²
correlations for the P6 base-pairs are (216·257)/106 (a
reciprocal correlate) and (215·258)/103. The
(215·258)/106 correlation is in the significant range
(38% of the highest value), but it is not shown in
Table 5 because better correlations involving
positions 106 and 215·258 exist (see above). There are
several possible explanations for these apparent
inaccuracies. First, the P3/P4 junction (positions
103 to 106) generally varies in size from three to
five bases, and there are a few examples of an
insertion of several hundred bases. Thus, the
sequences in this region cannot be aligned with
absolute confidence. Until more sequence information
suggests otherwise, we have justified the two
unpaired 3′ P3/P4 nucleotides toward the P4 stem,
while the other P3/P4 nucleotides are justified
toward the P3 helix. Alternative decisions could have
produced significantly different correlations for
nucleotides 103 to 106. Other potential problems in
the identification of P6 triples were raised by a recent
NMR study (Chastain & Tinoco, 1993), which
suggested that P6 triples involved base/sugar
interactions, and varied significantly in structure
upon sequence change. In contrast to base/base
interactions, base/sugar interactions could produce
sequence constraints in which correlations between
adjacent bases become predominant, thus possibly
explaining these unexpected correlations. Finally, it is
also possible that very high neighbor effects could
relegate the actual triple correlations to second
position.

Table 5
Base to base-pair correlations (χ²) and neighbor effects (N) in group I introns (222 seqs)
Best correlate (+) | Pair | χ² + e/c best (a) | nt | χ² (b) | N (c) | Sequences | Suggested cause of correlation (d)
+ | 109·212 | *↔* | 260 | 100.0 | 67 | 222 | Triple
+ | 262·312 | *↔* | 263 | 97.4 | NA | 222 | Triple?
+ | 108·213 | *↔* | 259 | 95.8 | 75 | 222 | Triple (e)
  | 216·257 | ↔ | 106 | 61.2 | 100 | 221 | Neighbor effect
+ | 110·211 | *↔* | 305 | 56.9 | 36 | 222 | Triple? (f)
  | 280·298 | ↔ | 279 | 52.9 | 39 | 210 |
  | 268·307 | → | 279 | 49.3 | 2 | 215 |
+ | 216·257 | ←* | 105 | 48.7 | 100 | 222 | Triple
  | 215·258 | ↔ | 103 | 46.0 | 100 | 220 | Neighbor effect
  | 268·307 | ← | 256 | 40.3 | 2 | 182 |
  | 97·277 | → | 279 | 40.1 | 23 | 183 |
  | 285·293 | → | 256 | 38.2 | 30 | 161 |
  | 107·214 | → | 260 | 36.4 | 78 | 222 | Neighbor effect
  | 215·258 | ← | 269 | 35.4 | 100 | 187 |
+ | 220·253 | *↔ | 255 | 33.0 | 15 | 161 | Triple?
  | 216·257 | ← | 101 | 32.3 | 100 | 221 |
  | 215·258 | ← | 217 | 29.6 | 100 | 119 | Neighbor effect
  | 111·209 | →* | 305 | 29.4 | 43 | 222 | Neighbor effect?
  | 109·212 | ← | 304 | 29.2 | 67 | 222 | Neighbor effect?
  | 102·272 | → | 263 | 27.8 | NA | 221 |
  | 286·292 | → | 300 | 26.9 | 18 | 150 |
  | 102·272 | ← | 270 | 25.2 | NA | 189 |
Correlates are listed in order of decreasing χ² value. Only those correlates within 25% of the highest χ² value are listed. The best correlates are identified with a plus in the first column (see Table 7).
a χ² best correlate noted with arrows; best e/c ratio noted with *.
b % of highest value.
c Neighbor effects computed according to eqn (3) (% of highest value). NA, base-pairs having no neighbor in the secondary structure.
d References in text.
e The best base e/c correlate for the (108·213) base-pair is shared by positions 259 and 302.
f The best base-pair e/c correlates for position 305 are shared by (110·211) and (97·277). The alignment in the vicinity of the (97·277) base-pair is questionable due to length variation of the P3 helix. Thus we believe the best correlation is between (110·211) and 305.

Figure 7. Sequences observed at various correlating group I intron
positions. a, (262·312)/263. b, (110·211)/305. c, (280·298)/279. Only
values greater than 2 are shown. Numbers in bold type represent
more than 10% of the introns.
Other reciprocal correlates having high χ² values
are present. One involves positions 262·312 and 263.
This triple correlation is also strongly identified by
our phylogenetic event-based method (see Table 5).
Figure 7a shows the sequences observed at
(262·312)·263. C263 is associated with A262·U312,
while A263 is associated with G262·C312 or
C262·G312. Our observations of the alignment
reveal several concerted mutations occurring among
closely related introns, particularly within subgroups
IB1 and IC1 (data not shown). The base-pair
262·312 is itself well supported, with multiple
independent covariations observed (data not shown).
This strong correlation can be interpreted in various
ways. On the basis of a three-dimensional modeling
study of the intron guanosine binding site (Yarus
et al., 1991), it was proposed that these three
nucleotides form a base-triple. Alternatively, it has
been suggested, from experimental mutagenesis
studies, that this sequence constraint is necessary
to ensure that the nucleotide at position 263 is
bulged out of this helix, and is not base-paired
to position 312 (Couture et al., 1990). Note that
when position 263 is a C, the 262·312 base-pair is
an A·U or U·A. When 263 is an A, 262·312 is a G·C
or C·G. Thus, position 263 is not able to form a
standard Watson–Crick pair with 312. This hypothesis
suggests that the triplets (U·A)A and (G·C)C
should also be found, which has not been the case to
date. We favor the suggestion by Yarus that a
base-triple interaction forms between these positions.
A second correlation, (110·211)/305, is supported
by the e/c study and a high χ² reciprocal correlation.
This correlation is particularly interesting, since it
involves nucleotides spanning two distant
domains of the group I intron, namely the P3/P7 and
P4/P6 coaxial stems. The correlation results
primarily from an exchange between the sequence
patterns (A·U)·C and (G·C)·U. This correlation,
unlike many of the others, occurs in its purest form
in the subgroups IA and ID (Figure 7b), although
covariation between these triplets is found in the
other subgroups, albeit intermixed with non-concerted
variations (data not shown). The correlation
(110·211)/305 was identified previously using a
smaller dataset (Michel & Westhof, 1990) but was
disregarded on the grounds of steric conflicts with the
P4 triples. However, more recent experimental data
(Pyle et al., 1992) have suggested interactions
between the P1 stem and the J7/8 strand that shift
J7/8 towards the P4 stem, and thus reduce the
distance between nucleotides 110·211 and 305.
Adjusting the current three-dimensional model to
take these new data into account could suggest
alternative ways to form a (110·211)/305 interaction,
and perhaps resolve the steric conflicts. In addition,
two other correlations in Table 5, (111·209)/305 and
(109·212)/304, resemble the neighbor effects that
could be expected in the presence of a (110·211)·305
triple.
The other reciprocal correlations in Table 5 are
(280·298)/279 and (220·253)/255. The first is not
supported by the e/c analysis, and thus we do not
consider it a credible triple candidate. The other
reciprocal correlation, (220·253)/255, is supported
by a significant number of coordinated changes,
primarily in subgroups IC1 and IC2. The number of
nucleotides between positions 253 and 257 is
variable. Thus it is difficult to align these unpaired
nucleotides across all the subgroups with much
confidence. However, within the IC1 subgroup this
number is three in almost all cases, while it is always
five in the IC2 subgroup, allowing us to obtain a
reliable local alignment for these two groups. The
sequences observed in these two subgroups are
shown in Figure 7c. Formation of a (220·253)·255
triple is feasible stereochemically, nucleotide 255
being situated in the internal loop flanking the
220·253 base-pair.
The combined e/c and χ² analysis has identified
three additional base-triple candidates in the group
I introns, namely (262·312)·263, (110·211)·305 in the
ID and IA subgroups, and (220·253)·255 in the IC1
and IC2 subgroups.
Base-pair to base-pair correlations: identification of neighbor effects
The identification of base-triples requires the
ability to distinguish between correlations due to
physical interactions and those due to other factors,
such as RNA identity or accidental evolutionary
events. We have suggested that networked sequence
correlations are characteristic of triple-helix for-
mation. We now propose to use this property to help
distinguish base-triples (at least when present in
triple helices) from other correlated positions.
A simple method to assess neighbor effects is to
directly measure correlations between base-pairs.
For this purpose, we perform a χ² test as done in the
previous analysis, the only difference being a
contingency table having 16 rows and 16 columns
(instead of 16 × 4). The sparseness problem is again
resolved here by creating smaller 2 × 2 tables,
computing χ² in each table, and retaining the highest
value. A simple measure of the neighbor effect, N,
could then involve computing χ² for each set
of adjacent base-pairs (i,j) and (i + 1,j − 1):
N = χ²(i,j,i + 1,j − 1). However, since sequence
correlations also occur between positions separated by
several base-pairs in the same helical stem (Tables 1
and 2), the neighbor effect N at base-pair i,j can be
more accurately measured by averaging correlations
in a window comprising n base-pairs at each side of
i,j, using the following formula:

$$N(i,j) = \frac{1}{2n} \sum_{k=1}^{n} \left[ \chi^2(i,j,i+k,j-k) + \chi^2(i,j,i-k,j+k) \right] \qquad (3)$$

If i ± n or j ± n is not a paired position,
the corresponding correlation is not computed,
and n is corrected accordingly. We use n = 2, and
thus evaluate a window of five base-pairs
(from i − 2 to i + 2) surrounding i,j. Figure 8
shows results obtained for tRNA and the group I
intron.
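Equation (3) and the 2 × 2 collapsing step can be sketched in Python. The text does not spell out how the 2 × 2 tables are built, so this sketch assumes one table per observed pair of base-pair types ("type u at (i,j) or not" against "type v at the neighbor or not") and keeps the largest Pearson χ², without continuity correction. Function names are hypothetical, raw χ² values are returned rather than the "% of highest value" used in Tables 3 to 5, and neighbors that are not paired with each other are skipped, which corrects n as described above.

```python
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def max_collapsed_chi2(xs, ys):
    """Collapse the full table of two columns (e.g. 16 x 16 base-pair types)
    into 2 x 2 tables, 'xs == u or not' by 'ys == v or not', for every
    observed u and v, and return the highest chi-square."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    best = 0.0
    for u in count_x:
        for v in count_y:
            a = joint[(u, v)]       # u and v together
            b = count_x[u] - a      # u without v
            c = count_y[v] - a      # v without u
            d = n - a - b - c       # neither
            best = max(best, chi2_2x2(a, b, c, d))
    return best


def neighbor_effect(seqs, paired, i, j, n=2):
    """N(i,j) of eqn (3): mean chi-square between pair (i,j) and the pairs
    (i+k, j-k) and (i-k, j+k), k = 1..n. Neighbors that are not paired with
    each other are skipped, so the divisor shrinks accordingly.
    paired maps each paired column to its partner (0-based columns)."""
    def pair_column(p, q):
        return [s[p] + s[q] for s in seqs]

    ref = pair_column(i, j)
    values = []
    for k in range(1, n + 1):
        for p, q in ((i + k, j - k), (i - k, j + k)):
            if paired.get(p) == q:
                values.append(max_collapsed_chi2(ref, pair_column(p, q)))
    return sum(values) / len(values) if values else None


# Toy hairpin: 0.5 and 1.4 are paired; columns 2 and 3 are unpaired.
toy_seqs = ["GAAAUC", "CUAAAG", "GAAAUC", "CUAAAG"]
toy_paired = {0: 5, 5: 0, 1: 4, 4: 1}
print(neighbor_effect(toy_seqs, toy_paired, 0, 5))
```

In the toy hairpin the only paired neighbor of 0·5 inside the window is 1·4, whose types covary perfectly with those of 0·5 across the four sequences, so N(0,5) = 4.0; scaling such values to the highest one in the molecule would give the percentages reported in the tables.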
In tRNA and group I introns, helices associated
with base-triples show significantly larger
neighbor effects (N, measured as in eqn (3)) than
those helices with no known base-triples. To
illustrate these strong base-pair to base-pair
correlations, we show in Table 6 the sequences
observed in group I introns at positions 109·212 and
108·213. The base-pair G108·C213 is strongly
associated with a C·G at position 109·212, while
C108·G213 is associated with A·U or G·C at position
109·212.
In group I introns (Figure 8b), neighbor effects
are consistent with triple formation in the P4/P6
helices, and are also significant at base-pair
110·211, which has a potential triple
partner (Table 5). However, no significant neighbor
effect supports the strong triple correlations
(262·312)/263 and (220·253)/255. In spite of
this result, we still support the formation of
base-triples at these positions, since these triples
would not be part of an extended triple-helical
region, which we proposed was necessary for the
base-pairs to have noticeable neighbor effects. Also,
base-pairs near 262·312 in P7 are extremely
conserved, and thus limit any base correlation in this
region.

Figure 8. Neighbor effects measured with equation (3). The density of the dots is proportional to N(i,j), darker dots
representing the highest values and lighter dots the lowest values. Precise N(i,j) values for base-pairs of interest are given
in Tables 3 to 5. a, Type I tRNA. b, Group I intron.

Table 6
Sequences observed at group I intron positions (109·212) and (108·213)
109·212 \ 108·213 | A·U | U·A | C·G | G·C
A·U | - | - | 26 | -
C·G | 5 | 7 | 6 | 122
G·C | - | - | 36 | -
U·G | - | - | - | 5
Only values greater than 2 are shown. Numbers in bold face represent more than 10% of the group I
intron sequences.

Combining analyses for base-triple prediction
The various analyses presented here can be
combined into a single protocol for base-triple
prediction. The criteria we propose to apply in this
protocol remain loose at this stage of our work, but
will be refined as the method is applied to other
classes of RNA. These criteria are presented here.
First, we believe good triple candidates should
score well in both base to base-pair correlations (χ²
and e/c) and neighbor effect analysis. A cutoff of
25% of the highest value for χ² and neighbor effect
measurements would retain all experimentally
proven triples in tRNA and group I introns. We
therefore require that values for χ² and neighbor
effects N (given in Tables 3 to 5) stand above this
threshold. Since a measure of phylogenetic events (e/c)
is available for group I introns, we require that
triple correlations in the group I intron be associated
with a significant level of concerted mutations (at least
one asterisk in Table 5). Finally, to tighten the
prediction criteria, we require χ² correlations to be
reciprocal. The triples that satisfy these stringent
criteria are listed in the first row of Table 7.
This stringent criterion yields no false positives in
either tRNA family. In type II tRNA, the triple
(13·22)·9 is predicted, but a question remains for the
triple (15·48)·20. We cannot use equation (3) to
compute the neighbor effect associated with this
triple, since no secondary base-pair flanks the
15·48 pair. However, the strong correlation
observed in Table 4 between 15·48 and 21 could very
well be a neighbor effect. Thus, we tentatively
include this triple in Table 7. In type I tRNA, two of
the three yeast tRNAPhe base-triples are predicted,
although 45·(10·25) is not. In group I introns,
the previously identified P4 triples are predicted,
along with one experimentally unproven interaction,
(110·211)·305. Two triple candidates with
neighbor effects below the 25% threshold,
(262·312)·263 and (220·253)·255, are noteworthy, since
they satisfy all of our other requirements. While the
other group I intron triples would be embedded in
a triple helix, these two putative triples are
both isolated from other known base-triples;
therefore, they would not be part of a triple helix.
Further study is required to determine if this is the
reason for their lack of neighbor effects. Until we have
the results from this study, the biologist’s judgement
is still necessary to resolve these ‘‘border-line’’ cases.
The possible existence of the triple (110·211)·305 has
been discussed.

Table 7
Triples predicted in tRNA and group I introns based on Tables 3 to 5, using two different criteria
Criteria for triple prediction | tRNA type I | tRNA type II | Group I intron (a)
Stringent: χ² (base to base-pair) > 25% of highest value; N > 25% of highest value; best reciprocal correlate | (13·22)·46, (12·23)·9 | (13·22)·9, (15·48)·20 (b) | (109·212)·260, (108·213)·259, (110·211)·305, (262·312)·263 (c), (220·253)·255 (c)
Relaxed: χ² (base to base-pair) > 25% of highest value; N > 25% of highest value; each position involved in only one triple (not necessarily best reciprocal correlate) | Same + (11·24)·36, (10·25)·45 | Same + (12·23)·21 | Same + (216·257)·105
a For group I intron triples, we use the phylogenetic event count as an additional criterion. Only putative triples associated with an asterisk in Table 5 are included.
b N cannot be measured for this position, but there is a large cross-correlation at (15·48)/21.
c These 2 putative triples are not supported by neighbor effects, but are best reciprocal correlates and associated with significant phylogenetic events (see discussion in text).
The prediction criteria were relaxed by allowing
for non-reciprocal correlations, under the condition
that no base-pair or single-stranded nucleotide
belongs to more than one triple (Table 7, second row). For
type I tRNAs, the triple 45·(10·25) is now predicted.
The relaxed criteria also identify the correlation
(11·24)/36. We suggest that this unique false positive
results from a functional linkage between positions
24 and 36, on the basis of experiments establishing
that mutations at position 24 affect codon/anticodon
recognition by tRNATrp (Hirsh, 1971; Smith & Yarus,
1989). In type II tRNAs, the relaxed criteria identify
the correlation (12·23)/21. Instead of interacting with
the pair 12·23, as this correlation suggests, nucleotide
21 faces the pair 8·14 in the type II tRNASer crystal
structure, and is proposed to interact with or face
pair 8·14 in other type II tRNA solution structures
(Dock-Bregeon et al., 1989; Baron et al., 1993).
However, since bases 12·23 and 21 are close in space,
we cannot rigorously exclude their interaction in
certain type II tRNAs.
In group I introns, the relaxed criterion identifies
the triple (216·257)·105, one of the previously
proposed P6 triples (Michel & Westhof, 1990).
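The stringent and relaxed criteria of Table 7 can be expressed as simple filters over candidate triples. This is a hedged sketch rather than the authors' protocol: it assumes that χ² and N have already been converted to percentages of the highest value, that a boolean flag records the e/c asterisk (checked only for group I introns), and that the "only one triple per position" rule of the relaxed criteria is applied greedily in order of decreasing χ². Class and function names are hypothetical, and a candidate whose N cannot be measured, such as (15·48)·20, is simply rejected here rather than judged case by case as in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

CUTOFF = 25.0  # % of the highest value, as in the text


@dataclass
class Candidate:
    pair: str               # secondary-structure base-pair, e.g. "109.212"
    nt: str                 # single-stranded position, e.g. "260"
    chi2_pct: float         # base to base-pair chi-square, % of highest
    n_pct: Optional[float]  # neighbor effect N, % of highest (None if not measurable)
    reciprocal: bool        # mutual best correlates
    event_flag: bool = True  # e/c asterisk (only checked for group I introns)


def passes_scores(c: Candidate, use_events: bool) -> bool:
    return (c.chi2_pct > CUTOFF
            and c.n_pct is not None and c.n_pct > CUTOFF
            and (c.event_flag or not use_events))


def stringent(cands: List[Candidate], use_events: bool = False) -> List[Candidate]:
    """Scores above the cutoff and best reciprocal correlates."""
    return [c for c in cands if passes_scores(c, use_events) and c.reciprocal]


def relaxed(cands: List[Candidate], use_events: bool = False) -> List[Candidate]:
    """Scores above the cutoff; reciprocity not required, but each base-pair
    and each nucleotide may belong to one triple only (greedy by chi-square).
    Pair and nucleotide labels are assumed never to collide as strings."""
    chosen, used = [], set()
    for c in sorted(cands, key=lambda cand: cand.chi2_pct, reverse=True):
        if passes_scores(c, use_events) and c.pair not in used and c.nt not in used:
            chosen.append(c)
            used.update((c.pair, c.nt))
    return chosen


# Hypothetical candidates (illustrative numbers, not the published values).
cands = [
    Candidate("13.22", "46", 100, 80, True),
    Candidate("12.23", "9", 90, 70, True),
    Candidate("13.22", "45", 60, 80, False),
    Candidate("10.25", "45", 50, 60, False),
    Candidate("3.70", "35", 40, 10, True),
]
print([(c.pair, c.nt) for c in stringent(cands)])
print([(c.pair, c.nt) for c in relaxed(cands)])
```

With these toy values the stringent filter keeps 13.22/46 and 12.23/9, while the relaxed filter also admits 10.25/45 and rejects 13.22/45 because 13.22 is already used, loosely echoing how (10·25)·45 appears only under the relaxed criteria in Table 7.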
Conclusion and Perspectives
Our previous correlation analyses sought corre-
lations that occur between two positions in an RNA
alignment (Gutell et al., 1992). While these analyses
effectively predicted secondary structure pairing, we
had difficulty identifying base-triples with confi-
dence. We suggest here two reasons for this
weakness. First, structurally similar base-triples can
form between bases that vary in a non-compensatory
fashion, which reduces covariation. Second, base-
triples do not necessarily involve the same positions
in all members of an RNA family.
With these obstacles in mind, we have developed
methods to enhance our ability to predict base-
triples by specifically seeking correlations between
secondary structure base-pairs and nucleotides
unpaired in the secondary structure. This signifi-
cantly enhances correlations for base-triples. During
our earlier studies, we also identified weaker
correlations between many of the bases in the tRNA
D-stem. We suggested that these effects could be
specific to base-triples forming local triple helices.
We developed an algorithm that quantifies these
neighbor effects in RNA secondary helices. The most
pronounced effects in tRNA were in the D-helix,
while in the group I intron they were in the P4 and
P6 helices, the same helices known to be involved in
triple formation. The combination of these two
correlation analyses identifies known base-triples
more effectively than any previous method.
The accuracy of current protocols is limited by
heterogeneity within the sequence datasets. Base-
triple prediction will remain ambiguous as long as
the dataset analyzed contains RNAs that form triples
in different positions. For example, we are currently
unable simultaneously to predict triples (13·22)·46
and 45·(13·22) in type I tRNAs, since they both
occur in the analyzed sequences. It should be
possible to isolate subsets of sequences displaying
specific correlations, and enhance predictions in
each subset. The growth of RNA databases, and
the availability of the algorithms presented herein,
will certainly lead us in that direction. Another
enhancement would be to combine the various
prediction criteria introduced in this study into an
automated protocol. An integration of χ² correlation
values and phylogenetic event counts would be
particularly useful in RNAs with well-established
phylogenetic relationships, such as the ribosomal
RNAs.
Materials and Methods
Sequence alignments
The tRNA sequence alignment used was adapted from
Sprinzl et al. (1991). We aligned the variable loop (which
was not aligned in the original database), and removed
mitochondrial sequences, leaving 895 type I and 263 type
II nuclear tRNAs, which were analyzed separately. The
group I intron alignment contains 222 sequences compiled
by S. H. Damberger and R. R. Gutell (unpublished results).
Analyses were performed only on the core region
comprising the stems P1, P3, P4, P6, P6a, P7, P8, a part of
P5 and all intervening single-stranded segments. Intron
sequences were classified into structurally distinct
subgroups (IA, IB, IC and ID) according to the definitions
of Michel & Westhof (1990). We further subdivided each
subgroup by ordering the sequences within it according
to: (1) the type of gene in which the intron was found
(e.g. ATP9, SSU rRNA, etc.); (2) the specific site in that
gene where the intron was found (e.g. SSU site 531);
(3) the cellular location (e.g. nucleus, mitochondrion,
chloroplast) of the intron; and (4) a rough
phylogenetic ordering of the organisms.
Structural data
Detailed base-triple information is available for six
tRNA crystal structures: yeast tRNAPhe (Quigley & Rich,
1976; Sussman & Kim, 1976), Escherichia coli tRNAfMet
(Woo et al., 1980), yeast tRNAAsp (Dumas et al., 1985),
E. coli tRNAGln (Rould et al., 1989), yeast tRNAiMet
(Basavappa & Sigler, 1991) and Thermus thermophilus
tRNASer2(GGA) (Biou et al., 1994). Although no crystal structures are
available for group I introns, it has been suggested that
triples form in the P4 and P6 helices (Michel et al., 1990;
Michel & Westhof, 1990). The existence of both P4 triples
and one of the proposed P6 triples is supported by
mutagenesis experiments (Michel et al., 1990; Green &
Szostak, 1994). There is good evidence for the formation of
base–base interactions in the P4 triples, but the nature of
the interactions in the P6 triples remains unclear. NMR
experiments on a model oligonucleotide that partially
reproduced the P4/P6 domain suggested that triple
interactions exist in the form of base–backbone contacts
(Chastain & Tinoco, 1993). However, the applicability of
these latter results in the group I intron context is uncertain,
given that important parts of the P4/P6 triple domain are
absent from the construct.
Programs
Sequence alignments were visualized and manipulated
using the alignment editor AE2 (T. Macke, The Scripps
Clinic, CA) available from the Ribosomal Database Project
(Larsen et al., 1993), and studied using a comparative
sequence analysis program developed in our laboratory
(S. H. Damberger, D. Gautheret & R. R. Gutell,
unpublished results). This software computes frequencies
of bases, base-pairs and base-triples, performs pairwise
correlation analyses using mutual information (Chiu &
Kolodziejczak, 1991; Gutell et al., 1992), and computes
various types of correlations based on χ² tests and
phylogenetic event counting, as discussed above. Secondary
structure graphics were produced using the
program XRNA (B. Weiser & H. Noller, unpublished
results).
Notation
We adopted the notation (X·Y)·Z to describe a triple
interaction involving the secondary base-pair X·Y and
position Z, where Z interacts with Y; and we use Z·(X·Y)
when Z interacts with X. When interacting nucleotides are
not well established, as in the group I intron, we always use
the notation (X·Y)·Z. We use the term “base-triple” when
only the bases interact, “nucleotide-triple” when base–
backbone contacts are involved, and simply “triple” as the
general term. Correlations between positions X and Y are
denoted X/Y. The numbering systems used are those of yeast
tRNAPhe and the T. thermophila group I intron.
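The notation rules above are mechanical enough to express as a small formatter. This is a hypothetical helper written for illustration only. The function names and the `partner` argument are assumptions and do not belong to any published software. `partner` records which member of the secondary pair X·Y the third position Z contacts: "Y" gives (X·Y)·Z, "X" gives Z·(X·Y), and None (the interacting nucleotide is not established, as in the group I intron) falls back to (X·Y)·Z.

```python
from typing import Optional


def format_triple(x: int, y: int, z: int, partner: Optional[str] = None) -> str:
    """Write a triple in the (X·Y)·Z or Z·(X·Y) notation.

    x, y    -- positions of the secondary base-pair X·Y
    z       -- position of the third nucleotide Z
    partner -- "X" or "Y" if Z is known to contact that base,
               None if the interacting nucleotide is not established
    """
    if partner == "X":
        return f"{z}·({x}·{y})"
    if partner in ("Y", None):
        # Z contacts Y, or the contact is uncertain: use the default form.
        return f"({x}·{y})·{z}"
    raise ValueError("partner must be 'X', 'Y' or None")


def format_correlation(x: int, y: int) -> str:
    """Write a correlation between positions x and y as X/Y."""
    return f"{x}/{y}"


if __name__ == "__main__":
    # Position numbers are illustrative only.
    print(format_triple(12, 23, 9, partner="Y"))  # (12·23)·9
    print(format_triple(12, 23, 9, partner="X"))  # 9·(12·23)
    print(format_correlation(12, 23))             # 12/23
```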
Acknowledgements
This work was supported by grants from the NIH
(GM48207) and the Colorado RNA Center to R.R.G. We
thank SUN Microsystems for their donation of computer
equipment, and the W. M. Keck Foundation for its support
of RNA Science on the Boulder campus. We also thank Dr
T. Cech for comments on the manuscript, and Drs V. Rath
and T. Steitz for sharing information on the tRNAGln
structure.
References
Baron, C., Westhof, E., Böck, A. & Giegé, R. (1993). Solution structure of selenocysteine-inserting tRNASec from Escherichia coli. J. Mol. Biol. 231, 274–292.
Basavappa, R. & Sigler, P. B. (1991). The 3 Å crystal structure of yeast initiator tRNA: functional implications in initiator/elongator discrimination. EMBO J. 10, 3105–3111.
Bina-Stein, M. & Stein, A. (1976). Allosteric interpretations of the Mg2+ binding to the denaturable Escherichia coli tRNAGlu2. Biochemistry, 15, 3912–3917.
Biou, V., Yaremchuk, A., Tukalo, M. & Cusack, S. (1994). The 2.9 Å crystal structure of T. thermophilus seryl-tRNA synthetase complexed with tRNASer. Science, 263, 1404–1410.
Cech, T. R., Damberger, S. H. & Gutell, R. R. (1994). Representation of the secondary and tertiary structure of group I introns. Nature Struct. Biol. 1, 273–280.
Cedergren, R. J., LaRue, B. & Grosjean, H. (1981). The evolving tRNA molecule. CRC Crit. Rev. Biochem. 11, 35–104.
Chastain, M. & Tinoco, I., Jr (1993). Nucleoside triples from the group I intron. Biochemistry, 32, 14220–14228.
Chiu, D. K. Y. & Kolodziejczak, T. (1991). Inferring consensus structure from nucleic acid sequences. Comp. Appl. Biosci. 7, 347–352.
Couture, S., Ellington, A. D., Gerber, A. S., Cherry, J. M.,
Doudna, J. A., Green, R., Hanna, M., Pace, U.,
Rajagopal, J. & Szostak, J. W. (1990). Mutational
analysis of conserved nucleotides in a self-splicing
group I intron. J. Mol. Biol. 215, 345–358.
Dietrich, A., Romby, P., Maréchal-Drouard, L., Guillemaut, P. & Giegé, R. (1990). Solution conformation of several free tRNALeu species from bean, yeast and Escherichia coli, and interaction of these tRNAs with bean cytoplasmic leucyl-tRNA synthetase. A phosphate alkylation study with ethylnitrosourea. Nucl. Acids Res. 18, 2589–2597.
Dock-Bregeon, A. C., Westhof, E., Giegé, R. & Moras, D. (1989). Solution structure of a tRNA with a large variable region: yeast tRNASer. J. Mol. Biol. 206, 707–722.
Dumas, P., Ebel, J. P., Giegé, R., Moras, D., Thierry, J. C. & Westhof, E. (1985). Crystal structure of yeast tRNAAsp: atomic coordinates. Biochimie, 67, 597–606.
Green, R. & Szostak, J. W. (1994). In vitro genetic analysis
of the hinge region between helical elements P5-P4-P6
and P7-P3-P8 in the sunY group I self-splicing intron.
J. Mol. Biol. 235, 140–155.
Gutell, R. R. (1993). Comparative studies of RNA: inferring
higher-order structure from patterns of sequence
variation. Curr. Opin. Struct. Biol. 3, 313–322.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985).
Comparative anatomy of 16S-like ribosomal RNA.
Progr. Nucl. Acid. Res. 32, 155–216.
Gutell, R. R., Power, A., Hertz, G. Z., Putz, E. J. & Stormo,
G. D. (1992). Identifying constraints on the higher-
order structure of RNA: continued development
and application of comparative sequence analysis
methods. Nucl. Acids Res. 20, 5785–5795.
Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons
from an evolving rRNA: 16S and 23S rRNA structures
from a comparative perspective. Microbiol. Rev. 58,
10–26.
Haselman, T., Chappelear, J. E. & Fox, G. E. (1988). Fidelity
of secondary and tertiary interactions in tRNA. Nucl.
Acids Res. 16, 5673–5684.
Hirsh, D. (1971). Tryptophan transfer RNA as the UGA
suppressor. J. Mol. Biol. 58, 439–458.
Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S. H. (1977). RNA–ligand interactions: [I] Magnesium binding sites in yeast tRNAPhe. Nucl. Acids Res. 4, 2811–2820.
Hou, Y. M. (1994). Structural elements that contribute to an
unusual tertiary interaction in a transfer RNA.
Biochemistry, 33, 4677–4681.
Hou, Y. M. & Schimmel, P. (1989). Evidence that a
major determinant for the identity of a transfer
RNA is conserved in evolution. Biochemistry, 28,
6800–6804.
Hou, Y. M., Westhof, E. & Giegé, R. (1993). An unusual
RNA tertiary interaction has a role for the specific
aminoacylation of a transfer RNA. Proc. Nat. Acad. Sci.,
U.S.A. 90, 6776–6780.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of
a GNRA tetraloop in long-range RNA tertiary
interactions. J. Mol. Biol. 236, 1271–1276.
Klug, A., Ladner, J. & Robertus, J. D. (1974). The structural
geometry of co-ordinated base changes in transfer
RNA. J. Mol. Biol. 89, 511–516.
Larsen, N., Olsen, G. J., Maidak, B. L., McCaughey, M. J.,
Overbeek, R. N., Macke, T. J., Marsh, T. L. & Woese,
C. R. (1993). The ribosomal database project. Nucl.
Acids Res. 21 Suppl., 3021–3023.
Levitt, M. (1969). Detailed model for transfer ribonucleic
acid. Nature (London), 224, 759–763.
Major, F., Gautheret, D. & Cedergren, R. (1993).
Reproducing the three-dimensional structure of a
tRNA molecule from structural constraints. Proc. Nat.
Acad. Sci., U.S.A. 90, 9408–9412.
Malhotra, A., Tan, R. K. & Harvey, S. C. (1990). Prediction
of the three-dimensional structure of Escherichia coli
30S ribosomal subunit: a molecular mechanics
approach. Proc. Nat. Acad. Sci., U.S.A. 87, 1950–1954.
McClain, W. H. (1993a). Identity of Escherichia coli tRNACys
determined by nucleotides in three regions of tRNA
tertiary structure. J. Biol. Chem. 268, 19398–19402.
McClain, W. H. (1993b). Rules that govern tRNA identity
in protein synthesis. J. Mol. Biol. 234, 257–280.
McClain, W. H. & Foss, K. R. (1988). Changing the identity
of a tRNA by introducing a G-U wobble pair near the
3' acceptor end. Science, 240, 793–796.
McClain, W. H., Foss, K. R., Jenkins, R. A. & Schneider, J.
(1991). Rapid determination of nucleotides that define
tRNAGly
acceptor identity. Proc. Nat. Acad. Sci., U.S.A.
88, 6147–6151.
Michel, F. & Westhof, E. (1990). Modelling of the
three-dimensional architecture of group I catalytic
introns based on comparative sequence analysis.
J. Mol. Biol. 216, 585–610.
Michel, F., Ellington, A. D., Couture, S. & Szostak, J. W.
(1990). Phylogenetic and genetic evidence for base
triple formation in the catalytic domain of group I
introns. Nature (London), 347, 578–580.
Ninio, J. (1982). Molecular Approaches to Evolution,
pp. 24–27, Pitman Books Ltd., London, U.K.
Olsen, G. J. (1983). Comparative analysis of nucleotide
sequence data, PhD dissertation, University of
Colorado Health Sciences Center, CO.
Pütz, J., Puglisi, J. D., Florentz, C. & Giegé, R. (1991). Identity elements for specific aminoacylation of yeast tRNAAsp by cognate aspartyl-tRNA synthetase. Science, 252, 1696–1699.
Pyle, A. M., Murphy, F. L. & Cech, T. R. (1992).
RNA substrate binding site in the catalytic core of
the Tetrahymena ribozyme. Nature (London), 358,
123–128.
Quigley, G. J. & Rich, A. (1976). Structural domains of
transfer RNA molecules. Science, 194, 796–806.
Rould, M. A., Perona, J. J., Söll, D. & Steitz, T. A. (1989). Structure of E. coli glutaminyl-tRNA synthetase complexed with tRNAGln and ATP at 2.8 Å resolution. Science, 246, 1135–1142.
Shultz, D. W. & Yarus, M. (1994). tRNA structure and
ribosomal function. I. tRNA nucleotide 27 to 43
mutations enhance first position wobble. J. Mol. Biol.
235, 1381–1394.
Smith, D. & Yarus, M. (1989). Transfer RNA and
coding specificity. II. A D-arm tertiary interaction
that restricts coding range. J. Mol. Biol. 206,
503–511.
Sprinzl, M., Dank, N., Nock, S. & Schön, A. (1991). Compilation of tRNA sequences and sequences of tRNA genes. Nucl. Acids Res. 19 (Suppl.) 2127–2171.
Sussman, J. L. & Kim, S.-H. (1976). Three-dimensional
structure of a transfer RNA in two crystal forms.
Science, 192, 853–858.
Winker, S., Overbeek, R., Woese, C. R., Olsen, G. J. &
Pfluger, N. (1990). Structure detection through
automated covariance search. Comp. Appl. Biosci. 6,
365–371.
Woese, C. R. (1987). Bacterial evolution. Microbiol. Rev. 51,
221–271.
Woese, C. R. & Pace, N. R. (1993). Probing RNA structure,
function and history by comparative analysis. In The
RNA World (Gesteland, R. F. & Atkins, J. F., eds),
pp. 91–117, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY.
Woo, N. H., Roe, B. A. & Rich, A. (1980). Three-dimensional structure of Escherichia coli initiator tRNAfMet. Nature (London), 286, 346–351.
Yarus, M. (1982). Translational efficiency of transfer
RNAs: uses of extended anticodon. Science, 218,
646–652.
Yarus, M., Illangasekare, M. & Christian, E. (1991). An axial
binding site in the Tetrahymena precursor RNA. J. Mol.
Biol. 222, 995–1012.
Edited by D. E. Draper
(Received 20 July 1994; accepted in revised form 20 January 1995)