A Story: Unpaired Adenosine Bases in Ribosomal RNAs
R. R. Gutell1*, J. J. Cannone1, Z. Shang1, Y. Du1 and M. J. Serra2
1 Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, Austin, TX 78712-1095, USA
2 Department of Chemistry, Allegheny College, 520 N. Main St., Meadville, PA 16335, USA
In 1985 an analysis of the Escherichia coli 16 S rRNA covariation-based
structure model revealed a strong bias for unpaired adenosines. The
same analysis revealed that the majority of the G, C, and U bases were
paired. These biases are (now) consistent with the high percentage of
unpaired adenosine nucleotides in several structure motifs.
An analysis of a larger set of bacterial comparative 16 S and 23 S
rRNA structure models has substantiated this initial finding and revealed
new biases in the distribution of adenosine nucleotides in loop regions.
The majority of the adenosine nucleotides are unpaired, while the
majority of the G, C, and U bases are paired in the covariation-based
structure model. The unpaired adenosine nucleotides predominate in the
middle and at the 3' end of loops, and are the second most frequent
nucleotide type at the 5' end of loops (G is the most common nucleotide).
There are additional biases for unpaired adenosine nucleotides at the
3' end of loops and adjacent to a G at the 5' end of the helix. The most
prevalent consecutive nucleotides are GG, GA, AG, and AA. A total of
70 % of the GG sequences are within helices, while more than 70 % of the
AA sequences are unpaired. Nearly 50 % of the GA sequences are
unpaired, and approximately one-third of the AG sequences are within
helices while another third are at the 3' loop.5' helix junction.
Unpaired positions with an adenosine nucleotide in more than 50 % of
the sequences at the 3' end of 16 S and 23 S rRNA loops were identified
and arranged into the A-motif categories XAZ, AAZ, XAG, AAG, and
AAG:U, where G or Z is paired, G:U is a base-pair, and X is not an A
and Z is not a G in more than 50 % of the sequences. These sequence
motifs were associated with several structural motifs, such as adenosine
platforms, E and E-like loops, A:A and A:G pairings at the end of helices,
G:A tandem base-pairs, GNRA tetraloop hairpins, and U-turns.
© 2000 Academic Press
Keywords: RNA structure; comparative sequence analysis; unpaired
adenosines; structure motifs; computational biology/bioinformatics
*Corresponding author. E-mail address of the corresponding author: robin.gutell@mail.utexas.edu
doi:10.1006/jmbi.2000.4172, available online at http://www.idealibrary.com; J. Mol. Biol. (2000) 304, 335-354
0022-2836/00/030335-20 $35.00/0 © 2000 Academic Press
Introduction
RNA molecules can form similar secondary and tertiary structures for
sequences that are not identical, and in many situations with less than
50 % sequence similarity. Comparative sequence analysis attempts to
identify those structural elements that are in common between different
sequences that are members of the same RNA family (e.g. tRNA).
Comparative sequence analysis has been used successfully to predict
secondary and tertiary interactions in several RNA molecules (reviewed
by Woese & Pace, 1993; Gutell, 1996; Michel et al., 2000). The majority
of these interactions are composed of G:C and A:U base-pairs (here, we
define underlined nucleotides as base-paired), organized into regular
secondary structure helices, and identified with covariation analysis
due to the manner in which both paired positions coordinately change,
or covary, their nucleotide composition (Woese et al., 1983; Gutell
et al., 1985). Beyond the prediction of standard base-pairs in secondary
structure helices, covariation analysis also predicts non-standard
base-pairs (e.g. A:G exchanges with G:A, and U:U exchanges with C:C)
and base-pairs that form tertiary structure (Gutell, 1996; Gutell
et al., unpublished results). We now believe that all of the standard
secondary structure base-pairs in the Escherichia coli 16 S and 23 S
rRNAs have been identified with our covariation analysis. For those
situations where we can compare and contrast a solved crystal structure
with comparative data from an RNA sequence alignment, paired positions
with a strong covariation are nearly always base-paired in the crystal
structure (Gutell, 1999; Gutell et al., unpublished results). Therefore,
covariation analysis, when used judiciously, can accurately predict
base-pairs in an RNA structure.
We now wonder what type of contribution com-
parative analysis will have on the prediction and
understanding of the three-dimensional structures
of the rRNAs (Ban et al., 1999, 2000; Cate et al.,
1999; Clemons et al., 1999; Tocilj et al., 1999;
Schluenzen et al., 2000; Wimberly et al., 2000). We
can begin to address this issue when we appreciate
that comparative analysis, in its most general form,
identifies patterns of variation in its search for a
common structure. Base-pairs are predicted for
those positions that vary at the same time in the
evolution of that RNA, regardless of the type
of base-pairing and/or the arrangement of this
pairing in relationship with the flanking positions.
Since the majority of the base-pairs are G:C, A:U,
or G:U, and these pairs are arranged into standard
secondary structure helices, we conclude that cov-
ariation analysis can identify the basic building
blocks of RNA structure without any structural or
other preconceived biases.
Given this success, we now question if other
RNA building blocks or motifs can be deciphered
from our comparative RNA sequence and structure
data sets. Our traditional comparative secondary
structure model only shows those secondary
and tertiary structure base-pairs with positional
covariation within the underlying sequences plus
invariant Watson-Crick base-pairs which are
directly adjacent to base-pairs with positional
covariation. All of the unpaired positions in these
diagrams imply the lack of pairings with covaria-
tion, not that these positions are not paired or
interacting with other regions of the RNA. Can we
relate specific patterns of variation that occur
within a defined structural context to a three-dimensional
structure motif? Can we now predict
structure for the positions that do not covary with
other positions? Alternatively, we question what
types of structure occur at the unpaired positions
in the covariation structure model and ask if
we can develop principles that relate sequence
variation with these structural elements.
While some structural elements, such as base-
pairs and helices, form similar structures with
sequences whose positions covary, other structural
elements with similar shapes form sets of aligned
sequences that do not have positional covariation
with one another (Gautheret et al., 1995a). Com-
parative analysis of nucleotide distributions in
different structural elements has resulted in the
identification of several sequence and structure
motifs in these unpaired regions. This list includes
tetraloops (Woese et al., 1990), tandem G:A base-
pairs (Gautheret et al., 1994), dominant G:U base-
pairs (Gautheret et al., 1995b), E-loops (Gutell et al.,
unpublished results; Gautheret et al., 1994;
Wimberly, 1994; Leontis & Westhof, 1998), U-turns
(Gutell et al., 2000), and A:A and A:G base-pairs at
the ends of helices (hereafter called AA.AG@helix.ends).
These sequence-based analyses are given
more meaning, biologically and structurally,
from their comparison with experimental studies,
especially the NMR and crystallographic analysis
of several rRNA fragments (Szewczak et al., 1993;
Kalurachchi et al., 1997; Conn et al., 1999;
Wimberly et al., 1999; Agalarov et al., 2000; Nikulin
et al., 2000). Our goals for the future are to identify
more biased distributions of nucleotides and
sequences in different structural arrangements, to
ascribe biological and structural significance to
them, and to deduce sets of sequence-structure
relationship rules, from which we aspire to accu-
rately predict detailed RNA structure from a single
sequence.
In 1985, a simple count of the paired and
unpaired nucleotides in E. coli 16 S rRNA revealed
a strong bias for unpaired adenosine nucleotides
(Gutell et al., 1985). A total of 62 % of the adenosine
nucleotides were unpaired, while approximately
30 % of the G, C, and U bases were unpaired. The
structural significance for this bias was not known
at the time. However, these biases are (now)
consistent with the high percentage of unpaired
adenosine bases in the GNRA tetraloops (Woese
et al., 1990), E-loops (Gautheret et al., 1994;
Wimberly, 1994; Leontis & Westhof, 1998), adeno-
sine platforms (Cate et al., 1996b) and AA side-step
(Conn et al., 1999) RNA sequence and structure
motifs identified after this initial adenosine bias was
found.
Here, we follow up with a larger and more
detailed analysis of paired and unpaired nucleo-
tides in our collection of rRNA and group I intron
comparative structure models, track the frequently
occurring unpaired nucleotides, and associate these
with different structural motifs.
Results
The base compositions for 175 bacterial 16 S and
71 bacterial 23 S rRNA comparative structure
models have been analyzed and presented here.
For our online presentation (see Materials and
Methods for detailed explanations), we have ana-
lyzed a larger set of comparative structures from
5 S, 16 S, and 23 S rRNAs (including bacteria,
archaea, and eucarya nuclear, chloroplast, and
mitochondria sequences) and group I introns. Our
collection of structure diagrams represents all of
the major phylogenetic groups within the bacterial
domain (as well as for the other primary phylo-
genetic domains). The comparative structure
model is based on covariation analysis (Woese
et al., 1983; Gutell et al., 1985, unpublished results).
For the purposes of the current analysis, positions
with substantial covariation or containing invariant
Watson-Crick base-pairs are base-paired and
positions that do not covary with other positions
are unpaired in our covariation structure model.
The current 16 S and 23 S rRNA secondary
structure models are available from
http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
The frequencies for single nucleotide positions
are presented in histogram format (Figure 1). The
total frequencies for the four RNA nucleotides A,
U, C, and G were characterized into helices (base-paired)
and loops (unpaired), and then subdivided
further into the 5' end, center, and 3' end positions
for helices and loops. Overall, G (31.4 %) is the
most prevalent nucleotide, followed by A (25.7 %),
C (22.4 %), and U (20.5 %). G is also the most common
nucleotide in helices (36.6 %), while A (14.5 %)
occurs with the lowest frequency in paired positions.
Guanosine occurs with an even higher
frequency at the 5' end of helices (46.2 %), where U
is the least frequent (13.5 %). Meanwhile, C is the
most abundant nucleotide at the 3' end of helices
(38.1 %), followed by G (30.4 %). Adenosine is the
most prevalent nucleotide at unpaired positions,
occurring at 42.6 %, while C is the least common at
12.5 %. Adenosine is even more dominant at the 3'
end of loops, occurring in 53.5 % of the sequences.
Meanwhile, G is the most common nucleotide at
the 5' end of loops (37.1 %); adenosine is second at
29.3 %. Another measure of the bias in unpaired
adenosine bases is revealed in the ratio of unpaired
to paired nucleotides for single nucleotides (see
also the online query system). The unpaired/
paired ratio for each nucleotide is: A, 1.96; U, 0.71;
G, 0.43; and C, 0.29. Alternatively, 66.2 % of the
adenosine bases are unpaired; the percentages of
unpaired U, G, and C bases are 41.5 %, 30.1 %, and
22.3 %, respectively, for our collection of bacterial
16 S and 23 S rRNA structure models. These
values are similar but not identical with the values
determined for the 1985 version of the E. coli 16 S
rRNA covariation structure model (Gutell et al.,
1985). The same trends and nucleotide biases also
occur for our other RNA structure models (avail-
able online).
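The two measures quoted above are linked by a simple identity: if r is the unpaired/paired ratio for a nucleotide, the fraction of that nucleotide that is unpaired is r/(1 + r). The short Python sketch below is an illustration added for the reader, not part of the original analysis, and the function name is hypothetical. It applies the identity to the four ratios reported in the text. Because the published ratios are rounded to two decimals, the recomputed percentages can differ slightly from the quoted ones (most visibly for C).

```python
# Unpaired/paired ratios quoted in the text for the bacterial
# 16 S and 23 S rRNA comparative structure models.
RATIOS = {"A": 1.96, "U": 0.71, "G": 0.43, "C": 0.29}


def percent_unpaired(ratio: float) -> float:
    """Convert an unpaired/paired ratio r into the percentage unpaired.

    If u positions are unpaired and p are paired, r = u / p and the
    unpaired fraction is u / (u + p) = r / (1 + r).
    """
    return 100.0 * ratio / (1.0 + ratio)


percent = {nt: percent_unpaired(r) for nt, r in RATIOS.items()}

for nt, value in percent.items():
    print(f"{nt}: {value:.1f} % unpaired")
```

The recomputed values for A, U, and G agree with the quoted 66.2 %, 41.5 %, and 30.1 % to within rounding.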
Figure 1. Frequency and distribution of single nucleotides in
bacterial 16 S and 23 S rRNA comparative structure models. The total
number of occurrences for each of the four nucleotides at nine
structural categories: total (all positions), paired, unpaired,
5'-helix.end (5' end of a helix), 3'-helix.end (3' end of a helix),
5'-loop.end (5' end of a loop), 3'-loop.end (3' end of a loop),
helix.center (all positions within a helix that are not at the 5' or
3' ends of a helix), and loop.center (all positions within a loop that
are not at the 5' or 3' ends of a loop).
Figure 2. Frequency and distribution of consecutive nucleotides in bacterial 16 S and 23 S rRNAs comparative
structure models. The total number of occurrences for the 16 dinucleotides at three structural categories: total (all
positions), in helix (paired), and in loop (unpaired).
Next, we investigated the frequency and
distribution of consecutive nucleotides. The most
common dinucleotides are the four purine combi-
nations. Consecutive GG residues are the most
prevalent at 9.86 %, followed by GA (7.92 %), AG
(7.88 %), and AA (7.65 %) (Figure 2). The
dinucleotides were classified into four categories:
paired (helical), unpaired (loop), and the two
paired/unpaired junctions, 3' loop.5' helix and
3' helix.5' loop. The most frequent consecutive
dinucleotides are distinctly different between these four
categories. In helices, GG (14.1 %), GC (10.4 %), CC
(9.0 %), and GU (8.3 %) are the most prevalent con-
secutive dinucleotides; note that these consecutive
dinucleotide arrangements are components of the
most stable nearest-neighbors (Xia et al., 1998). In
contrast, AA (19.2 %), GA (13.4 %), and UA (9.8 %)
are the most common adjacent dinucleotides in
loop motifs (Figure 2). Greater than 70 % of
the consecutive adenosine residues are within
unpaired regions, consistent with the observation
that 5'-AA-3'/3'-UU-5' is the least stable nearest-neighbor
(Xia et al., 1998).
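The four-way classification of consecutive nucleotides described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the dot-bracket input format, the function names, and the toy sequence are all assumptions introduced here. The sketch also treats any two adjacent paired positions as "helix", without checking that both belong to the same helix.

```python
from collections import Counter


def paired_mask(dot_bracket: str) -> list[bool]:
    """Return True for every position drawn as '(' or ')' (base-paired)."""
    return [symbol in "()" for symbol in dot_bracket]


def classify_dinucleotides(sequence: str, dot_bracket: str) -> Counter:
    """Count each dinucleotide in the four structural categories."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    mask = paired_mask(dot_bracket)
    counts: Counter = Counter()
    for i in range(len(sequence) - 1):
        first, second = mask[i], mask[i + 1]
        if first and second:
            category = "helix"
        elif not first and not second:
            category = "loop"
        elif first:
            # Paired nucleotide followed by an unpaired one.
            category = "3' helix.5' loop"
        else:
            # Unpaired nucleotide followed by a paired one.
            category = "3' loop.5' helix"
        counts[(category, sequence[i:i + 2])] += 1
    return counts


# Toy hairpin (hypothetical): a two base-pair helix closing a three-nucleotide loop.
counts = classify_dinucleotides("GGAAACC", "((...))")
for (category, dinucleotide), n in sorted(counts.items()):
    print(category, dinucleotide, n)
```

Dividing the per-category counts by the category totals over a full alignment would give percentages of the kind reported in Figures 2 and 3.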
The adjacent dinucleotides with the highest
unpaired to paired ratio are AA (5.68), UA (2.03),
GA (1.47), AU (1.20), while the three lowest ratios
are GC (0.17), GG (0.15), and CC (0.11). These
ratios again emphasize that adenosine bases tend
to be unpaired, consecutive adenosine bases are
even more likely to be unpaired, and that consecu-
tive G and C bases tend to be paired.
The most abundant dinucleotides at loop-helix
junctions were analyzed (Figure 3). CG (14.6 %),
GA (10.3 %), and CA (10.2 %) are the most abundant
at the 3' helix.5' loop junction; AG (25.0 %) and
AC (13.3 %) are the two most abundant pairs at the
3' loop.5' helix junction. These results are consistent
with the abundance of A and G bases at the 5' end
of loops, A nucleotides at the 3' end of loops, and
G and C nucleotides at the 5' and 3' ends of helices.
The strong preference for AG at loop-helix junctions
might not be a simple consequence of stability,
since all 5' dangling ends have nearly the
same small stabilizing effect on helices (Freier et al.,
1986). The most stable 3' dangling end sequences,
CA, CG, GA, and GG (Freier et al., 1986), occur
frequently in our 16 S and 23 S rRNA structure
data sets (Figure 3).
Next, we investigated the frequencies for three consecutive
nucleotides at the loop.helix and helix.loop interfaces: two unpaired
nucleotides followed by a paired nucleotide (3' loop.5' helix), and a
paired nucleotide followed by two unpaired nucleotides (3' helix.5'
loop). Figure 4(a) and (b) display the 32 most prevalent trinucleotide
combinations at the 3' loop.5' helix (a) and 3' helix.5' loop (b)
interfaces. The observed triplets at these junctions are very biased in
their distributions. At the 3' loop.5' helix interface
(Figure 4(a)), AAG occurs in 14.4 % of the junctions,
followed by AAC (6.7 %) and GAG (5.4 %).
All of the 11 most frequent sequences contain at
least one unpaired A nucleotide; nine of these 11
trinucleotides have an A base at the extreme 3' end
of the loop. The trinucleotides at the 3' helix.5' loop
interface (Figure 4(b)) are significantly different.
The three most abundant trinucleotides are BGA,
where B is not A: CGA (7.6 %), UGA (5.8 %), and
GGA (5.4 %). The six most frequent sequences have
at least one adenosine base in the two unpaired
positions, with purines accounting for 11 of the 12
unpaired positions. In addition to these biased dis-
tributions of triplets at loop/helix junctions,
Figure 4(a) and (b) also reveal that only 32 of the
64 possible triplets account for more than 80 % of
these occurrences.
The most significant findings to this stage in our
analysis are the high percentages of: (1) unpaired
adenosine bases, with adenosine residues accounting
for more than 50 % of the nucleotides at the 3'
loop ends; (2) paired guanosine bases, with guanosine
accounting for nearly 50 % of the nucleotides
at the 5' end of helices; (3) unpaired consecutive
adenosine bases; and (4) AG at 3' loop.5' helix
junctions.
Figure 3. Frequency and distribution of dinucleotides at loop-helix
junctions in bacterial 16 S and 23 S rRNA comparative structure
models. Total number of occurrences of consecutive nucleotides at
the two loop-helix junctions, 3' helix.5' loop and 3' loop.5' helix.
Our next set of goals is to map these frequently
occurring nucleotides onto the 16 S and 23 S rRNA
comparative structure models, to determine those
positions where the unpaired adenosine residue at
the 3' end of the loop occurs in more than 50 % of
the bacterial sequences, and to identify larger
motifs that build onto these dominant adenosine
bases. We rationalize that 3' loop positions with an
adenosine in more than 50 % of the sequences
(hereafter called the "A-motifs") are important for
the formation of conserved structural motifs. A
total of 527 unpaired positions in the 16 S and 23 S
rRNAs are followed by a base-pair predicted with
covariation analysis. We expect, based upon the
observed nucleotide frequencies in the bacterial
16 S and 23 S rRNA sequences (A, 25.7 %; C,
22.4 %; G, 31.4 %; U, 20.5 %), adenosine to occur at
25.7 % (135 occurrences) of these 3' loop ends for
any one set of 16 S and 23 S rRNA structures. We
observe that, collectively, the positions at the 3'
loop ends contain 54.5 % adenosine bases. The two
extreme cases for the distribution of these adenosine
bases among the 527 3' loop ends are (1) the
adenosine nucleotides are distributed evenly, so
that each of the loop ends contains 54.5 % adenosine;
and (2) the adenosine nucleotides are concentrated
such that 287 of the loop ends contain 100 %
adenosine. In fact, 294 of the 527 3' loop ends have
an adenosine base in more than 50 % of the bacterial
16 S and 23 S rRNA sequences (Table 1); the
average conservation value for adenosine at these
positions is 93.7 %. Therefore, there is a very
pronounced bias for adenosines to be very
conserved at the 3' loop ends of the 16 S and 23 S
rRNAs.
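The expected counts used in this comparison follow from multiplying the number of junctions by the product of the relevant nucleotide frequencies, which amounts to assuming that neighbouring positions are independent. The Python sketch below is an illustration added here (the variable names are hypothetical). Under that independence assumption it reproduces the 135 expected adenosines, the 287 loop ends of the fully concentrated extreme case, and the expected counts for the longer motifs.

```python
# Overall nucleotide frequencies in the bacterial 16 S and 23 S rRNA
# sequences, as quoted in the text.
FREQ = {"A": 0.257, "C": 0.224, "G": 0.314, "U": 0.205}

# Unpaired positions that are followed by a covariation-based base-pair.
N_JUNCTIONS = 527

# Probability of each motif if every position were drawn independently
# from the overall frequencies (an assumption of this sketch).
motif_probability = {
    "A": FREQ["A"],
    "AA": FREQ["A"] * FREQ["A"],
    "AG": FREQ["A"] * FREQ["G"],
    "AAG": FREQ["A"] * FREQ["A"] * FREQ["G"],
    "AAG:U": FREQ["A"] * FREQ["A"] * FREQ["G"] * FREQ["U"],
}

expected = {motif: round(N_JUNCTIONS * p) for motif, p in motif_probability.items()}

# Extreme case (2): all observed adenosine (54.5 %) concentrated in
# loop ends that are 100 % adenosine.
concentrated_sites = round(N_JUNCTIONS * 0.545)

print(expected)
print(concentrated_sites)
```

These rounded values match the "Predicted" entries in the Total row of Table 1.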
Of the 294 3' loop ends with an adenosine base
in more than 50 % of bacterial sequences, 136 are
followed by a paired G in more than 50 % of those
sequences (AG motif; Table 1). In contrast, we
expect 43 of these motifs in the 16 S and 23 S
rRNAs, based on the observed nucleotide frequencies
(527 × 0.257 × 0.314). Finally, the number of AA and
AAG motifs observed is again more than the num-
ber expected for a random distribution (Table 1).
The distributions of the expected and observed A,
AA, AG, AAG, and AAG:U motifs in hairpin,
multi-stem, internal, and bulge loops were deter-
mined (Table 1). The number of observed A-motifs
at each of the loop motifs is (again) significantly
larger than expected. (Note for the following
A-motifs (where each motif occurs in a minimum
of 50 % of the sequences): AAG, the G is not paired
to a U in more than 33 % of the sequences; AA, the
nucleotide 3' of the second A is not a G in more
than 50 % of the sequences; AG, the nucleotide 5'
of the A is not an A in more than 50 % of the
sequences; A, the paired nucleotide following the
A is not a G in more than 50 % of the sequences
and the nucleotide preceding the A is not an A in
more than 50 % of the sequences.)
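The category definitions in the note above can be read as a small decision procedure. The Python sketch below is one simplified reading of those rules, not the authors' classification code: the function name, the argument names, and the order in which the tests are applied are assumptions made for illustration. Each argument is the percentage of aligned sequences that show the given feature at one 3' loop end.

```python
from typing import Optional


def classify_a_motif(a: float, aa: float, ag: float, aag: float, gu: float) -> Optional[str]:
    """Assign a 3' loop end to one A-motif category, or None.

    a   : % of sequences with A at the 3' loop end
    aa  : % with A at the 3' loop end and A immediately 5' of it
    ag  : % with A at the 3' loop end followed by a paired G
    aag : % with AA followed by a paired G
    gu  : % in which that paired G is paired to U
    """
    if a <= 50:
        return None  # not an A-motif site at all
    if aag > 50:
        # The 33 % G:U threshold separates AAG:U from AAG.
        return "AAG:U" if gu > 33 else "AAG"
    if ag > 50:
        return "AG"
    if aa > 50:
        return "AA"
    return "A"


# Hypothetical input percentages, chosen only to exercise each branch.
examples = {
    "AAG:U": classify_a_motif(a=100, aa=100, ag=100, aag=100, gu=100),
    "AAG": classify_a_motif(a=100, aa=98, ag=70, aag=70, gu=0),
    "AG": classify_a_motif(a=100, aa=10, ag=100, aag=10, gu=0),
    "AA": classify_a_motif(a=100, aa=90, ag=10, aag=10, gu=0),
    "A": classify_a_motif(a=95, aa=10, ag=10, aag=5, gu=0),
}
print(examples)
```

A site where both AA and AG exceed 50 % but AAG does not is labelled AG here; the text does not say how such a case was handled, so that tie-break is an arbitrary choice of this sketch.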
The A-motifs have been mapped onto the 16 S
and 23 S rRNA secondary structure models
(Figure 5). Each of the five motifs is assigned a different
color: AAG:U motifs are indicated in red, AAG
in green, AG in blue, AA in orange, and A in
yellow. Position numbers for the A-motifs in the
16 S and 23 S rRNA are listed in Tables 2
(AAG:U), 3 (AAG), 4 (AG), 5 (AA), and 6 (A).
The loop-helix junctions listed in Table 2 have
the AAG sequence present in more than 50 % of
the bacterial sequences, and G:U in more than 33 %
of the same sequence set. Thirteen 16 S and 23 S
rRNA junctions satisfy these criteria. The majority of
these occur in internal loops (10), and a few occur
in bulge (2) and multi-stem (1) loops; three occur
in 16 S rRNA, and ten appear in 23 S rRNA (see
Table 2 and Figure 5). The majority of these are
very well conserved, occurring with percentages
significantly higher than the required minimum.
Seven have greater than 90 % AAG and 90 % G:U
base-pair conservation; the average conservation
values are 81 % AAG and 77 % G:U.
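The summary statistics in this paragraph can be recomputed directly from the AAG and G:U columns of Table 2. The Python sketch below is an added illustration; the variable names are hypothetical and the values are transcribed from Table 2 as printed in this article.

```python
# (position, AAG %, G:U %) for the 13 AAG:U sites listed in Table 2.
TABLE_2 = [
    (14, 99, 98), (415, 75, 59), (432, 55, 45), (1289, 55, 55),
    (706, 94, 94), (1214, 97, 97), (1470, 81, 76), (1854, 54, 39),
    (1877, 98, 98), (1890, 100, 100), (2135, 48, 46),
    (2542, 100, 99), (2851, 91, 91),
]

# Average conservation of the AAG sequence and of the G:U base-pair.
mean_aag = sum(aag for _, aag, _ in TABLE_2) / len(TABLE_2)
mean_gu = sum(gu for _, _, gu in TABLE_2) / len(TABLE_2)

# Sites with more than 90 % conservation of both AAG and G:U.
highly_conserved = [pos for pos, aag, gu in TABLE_2 if aag > 90 and gu > 90]

print(f"mean AAG = {mean_aag:.1f} %, mean G:U = {mean_gu:.1f} %")
print(highly_conserved)
```

The means round to the quoted 81 % and 77 %, and seven sites pass the 90 % filter.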
The remaining 43 AAG loop-helix junctions are
listed in Table 3. These junctions are distributed
more evenly than the AAG:U A-motif in hairpin
(9), multi-stem (19), and internal (14) loops, with
one in a bulge loop; 15 occur in 16 S rRNA and 28
occur in 23 S rRNA (see Table 3 and Figure 5).
More than 75 % of the hairpin junctions are part of
a GNRA tetraloop. Over half (23) of these AAG
junctions are conserved in more than 90 % of the
sequences, with an average conservation value of
86 %. The consecutive AA nucleotides are conserved
in approximately 93 % of the sequences.
Figure 4. Frequency and distribution of consecutive
trinucleotides at loop-helix junctions in bacterial 16 S
and 23 S rRNAs. The ranking of the top 32 most frequent
trinucleotides at the two loop-helix junctions,
3' helix.5' loop and 3' loop.5' helix. Two of the three
consecutive nucleotides are unpaired at both junctions. The
paired nucleotides are underlined. (a) 3' loop.5' helix
junction. (b) 3' helix.5' loop junction.
AG loop-helix junctions are listed in Table 4.
There are 80 examples of this motif, with a significant
proportion occurring in internal (26),
multi-stem (28), and hairpin (17) loops, and the
remaining nine in bulge loops; 23 occur in 16 S
rRNA and 57 occur in 23 S rRNA (see Table 4
and Figure 5). Almost 60 % of the AG motifs are
conserved in more than 90 % of the sequences,
and 81 % of these motifs are conserved in more
than 70 % of the sequences. Six of the hairpin
loops are GNRA tetraloops; seven other loops
have unusually stable G:A mismatches between
the first and last nucleotides of the hairpin loop
(Serra et al., 1994).
A total of 56 AA motifs (Table 5) occur predominantly
in multi-stem (24), internal (16), and
hairpin (12) loops; four occur in bulge loops (see
Table 5 and Figure 5). 18 occur in 16 S rRNA
and 39 occur in 23 S rRNA. Over 60 % of these
motifs are conserved in more than 90 % of
the sequences. Table 5 also contains the most
prevalent AAN sequence at each motif
site (where N is base-paired; sites having
AAG > 50 % appear in Tables 2 or 3). Nearly
50 % of the AA motifs in Table 5 are AAC.
Eight of the hairpin loops have unusually stable
sequences, either GNRA tetraloops (4) or G:A
first mismatches (4) (Serra et al., 1994).
Figure 5. A-motifs mapped onto the Escherichia coli 16 S and 23 S rRNA comparative secondary structure models.
Unpaired positions at the 3' end of loops that occur in more than 50 % of the bacterial sequences are highlighted in
different colors: XAZ, yellow; AAZ, orange; XAG, blue; AAG, green; and AAG:U, red; where X is not A in more
than 50 % of the sequences, Z is not G in more than 50 % of the sequences, and paired nucleotides are underlined.
Diagrams were generated using the program XRNA (Weiser, B. & Noller, H., University of California at Santa Cruz).
(a) 16 S rRNA. (b) 23 S rRNA, 5' half. (c) 23 S rRNA, 3' half.
There are 102 A-motifs, with a significant number
of occurrences in multi-stem (38), internal (29),
bulge (20), and hairpin (15) loops; 41 occur in 16 S
and 61 occur in 23 S rRNA (see Table 6 and
Figure 5). A total of 77 % of the A motifs are
conserved in more than 90 % of the bacterial
sequences, and 50 % are 100 % conserved in those
sequences!
Discussion
Analysis of a large set of bacterial 16 S and 23 S
rRNA covariation-based comparative structure
models has revealed a propensity for adenosine
bases to be unpaired. A disproportionate number
of these unpaired adenosine nucleotides are consecutive,
at the 3' end of loops, and adjacent to a
paired G at the 3' loop.5' helix junction. The highly
conserved nature of the loop-helix junctions
described here suggests that they are an important
part of several different motifs. Because they occur
so frequently, we believe that they are a major
building block in the 16 S and 23 S rRNA struc-
tures. Our goal is to transform these sequence
motifs into structural motifs that help coordinate
three-dimensional structure. We have named the
adenosine bases that occur at the 3' end of loops in
more than 50 % of the bacterial 16 S and 23 S
rRNA sequences A-motifs. These are associated
with several known structural motifs and are
classified into five categories: AAG:U, AAG, AG,
AA, and A.
Adenosine platforms
The first set of loop-helix junctions to consider is
those with an AAG:U motif (Table 2 and Figure 5).
Thirteen positions in the 16 S and 23 S rRNA contain
the AAG sequence conserved in more than
50 % of the sequences (see Table 2) and the G:U
base-pair conserved in more than 33 % of the
sequences (16 S positions 415, 432, and 1289; 23 S
positions 14, 706, 1214, 1470, 1854, 1877, 1890,
2135, 2542, 2851). Seven of these sites (23 S positions
14, 706, 1214, 1877, 1890, 2542, and 2851) are
conserved in more than 90 % of the sequences.
This complex sequence motif forms the adeno-
sine platform present in the crystal structure of the
Tetrahymena thermophila group I intron P4-P6
domain (Cate et al., 1996a,b). To ascertain if the
adenosine platform-like sequence motifs in the
16 S and 23 S rRNA are capable of forming the
Table 1. Characterization of nucleotides at loop-helix junctions for loops with unpaired 5H
nucleotides in 16 S and
23 S rRNA
Loop type Total A AA AG AAG AAG:U
Total Measured 527 294 (56 %) 113 (21 %) 136 (26 %) 56 (11 %) 13 (2 %)
Predicted ± 135 (26 %) 35 (7 %) 43 (8 %) 11 (2 %) 2 (1 %)
Hairpin Measured 91 53 (58 %) 21 (23 %) 26 (29 %) 9 (10 %) 0 (±)
Predicted ± 24 (25 %) 6 (6 %) 8 (8 %) 2 (2 %) 0 (±)
Multi stem Measured 202 110 (54 %) 45 (22 %) 48 (24 %) 20 (10 %) 1 (1 %)
Predicted ± 51 (26 %) 13 (7 %) 16 (8 %) 4 (2 %) 1 (1 %)
Internal Measured 163 95 (58 %) 40 (25 %) 50 (31 %) 24 (15 %) 10 (6 %)
Predicted ± 42 (26 %) 11 (7 %) 13 (8 %) 3 (2 %) 1 (1 %)
Bulge Measured 71 36 (51 %) 7 (10 %) 12 (17 %) 13 (4 %) 2 (3 %)
Predicted ± 18 (25 %) 5 (7 %) 6 (8 %) 1 (1 %) 0 (±)
Junctions were counted if an A-motif occurred in greater than 50 % (33 % for AAG:U) of the sequences in the bacterial 16 S and
23 S rRNA alignments (http://www.rna.icmb.utexas.edu/). Predicted values were calculated with nucleotide frequencies: A
(25.7 %), G (31.4 %), and U (20.5 %); values are rounded to the nearest whole number. Percentages are calculated with respect to the
total number of positions for that loop type; values are rounded to the nearest whole number, with ``±'' used to represent zero.
Table 2. A-motif: AAG:U sites in 16 S and 23 S rRNA
Positiona
AA (%)b
AAG (%)b
G:U (%)b
Predicted
structure
motifs c
A. Multi-stem loops
23 S rRNA
14 99 99 98 P
B. Internal loops
16 S rRNA
415 76 75 59 EL, P
432 100 55 45 GA, P
1289 100 55 55 A, P
23 S rRNA
706 100 94 94 A, P
1214 97 97 97 A, P
1470 86 81 76 GA, P
1854 100 54 39 GA, P
1877 98 98 98 P
1890 100 100 100 P
2135d
86 48 46 P
C. Bulge loops
23 S rRNA
2542 100 100 99 P
2851 93 91 91 P
rRNA positions have an AAG:U motif in more than 33 % of
the bacterial sequences and are indicated in red on Figure 5.
a
The position number is the nucleotide at the 3H
loop end,
at the loop-helix junction.
b
More detailed information is available at http://
www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c
A, AA.AG@helix.ends; EL, E-like Loop; GA, tandem G:A
base-pairs; P, adenosine platform (see Discussion).
d
Although this site contains less than 50 % AAG, it was
included because it contains more than 33 % G:U and narrowly
missed the required minimum for AAG.
Unpaired Adenosine Bases in Ribosomal RNAs 343
10. adenosine platform structural motif, we have ana-
lyzed the group I intron adenosine platforms from
a comparative sequence perspective. The crystal
structure of the P4-P6 domain of the group I intron
has three adenosine platforms at positions 172,
219, and 226 (numbers refer to the second A of
the AAG motif for the T. thermophila sequence
(GenBank Accession # J01235)). Each of the three
adenosine platforms occurs in a distinct structural
environment in the comparative secondary struc-
Table 3. A-motif: AAG sites in 16 S and 23 S rRNA
Position a
AA (%)b
AAG (%)b
Predicted structure motifsc
Loop d
A. Hairpin loops
16 S rRNA
383 98 70 A GNRA
901 100 97 A, U GNRA
23 S rRNA
311 91 84 U 6
633 100 77 U GNRA
1226 62 52 A, U GNRA
1810 95 88 A GNRA
1872 70 65 GNRA
1928 100 100 U 3
2361 62 55 6
B. Internal loops
16 S rRNA
1333 100 99 A
1434 98 94
1469 54 54
1493 99 99 A
1503 100 100
23 S rRNA
609 100 68 A
1001 99 99 A, GA
1156 98 85
1354 100 99 A, GA, U
1572 92 83 A, GA, U
1580 88 86 GA
1701 100 99 A, EL
2469 96 96 A, GA
2810 83 83 A
C. Multi-stem loops
16 S rRNA
60 98 98 A, GA
197 99 93 A
499 99 98
574 99 98
768 97 96 EL
873 100 89
915 100 85
938 100 99
23 S rRNA
423 100 93
472 94 94
603 53 53 A, GA
1010 100 53
1029 100 65 A, GA
1308 99 99
1641 86 85 A
2336 100 99
2378 100 96 A, U
2412 93 85
2566 100 100 A
D. Bulge loops
23 S rRNA
1848 100 96
rRNA positions have an AAG motif in more than 50 % of the bacterial sequences and are indicated in green on Figure 5.
a
The position number is the nucleotide at the 3H
loop end, at the loop-helix junction.
b
More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c
A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
d
Hairpin loop size (in nucleotides) and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of
the bacterial rRNA sequences; *, unusually stable G:A or U:U ®rst mismatch.
344 Unpaired Adenosine Bases in Ribosomal RNAs
11. ture model (Michel & Dujon, 1983; Michel &
Westhof, 1990) and the three-dimensional crystal
structure (Cate et al., 1996b): a hairpin loop at
position 172, a symmetric 3 Â 3 internal loop
at position 219 (where 3 Â 3 refers to the number
of nucleotides on each side of the internal loop),
and an asymmetric 3 Â 2 internal loop at position
226. They also differ in regards to the type of
tertiary interactions with which they are associ-
ated. The adenosine platform at position 226 is
part of the tetraloop receptor (Murphy & Cech,
1994; Cate et al., 1996b) that makes an intramolecu-
lar contact with a tetraloop at position 150, one of
the interactions responsible for aligning the two
Table 4. A-motif: AG sites in 16 S and 23 S rRNA
Position a
A (%) b
AG (%) b
Predicted
structure
motifs c
Loop d
A. Hairpin loops
16 S rRNA
300 100 100 A, U GNRA
1080 100 90 A, U GNRA
1269 100 72 A, U GNRA
23 S rRNA
167 100 99 9*
251 100 98 A 5*
322 100 71 3
466 100 79 A, U GNRA
492 99 75 5*
646 87 86 5*
1073 100 100 U 9
1098 99 99 A, U 6*
1618 98 95 A 6*
1755 100 73 3
2147 95 95 4*
2534 54 53 6
2598 100 100 A, U GNRA
2662 # 100 100 A, U GNRA
B. Multi-stem loops
16 S rRNA
8 98 98
26‡ 100 99 A
288 100 92
353 98 98 A
523‡ 100 99
828 80 71
860 96 88 A
1046‡ 100 99
1067 100 100 A, U
23 S rRNA
177‡ 59 58 A
324‡ 73 55
332 100 88
374 100 67 A, GA, E
532‡ 65 61
627 99 98 A, GA
655 98 98 A, GA
699‡ 99 95 A
945 99 76 A
975 99 99 A
1189 100 99 A, GA, E
1342 100 100 U
1791 100 98 A
1932 100 100 A, GA, EL
2119 100 100
2126 100 100 A, GA
2587 100 83 A, U
2629 63 57
C. Internal loops
16 S rRNA
246‡ 100 100 A
520 100 100 A
665 70 67
687 100 97 A
802 100 99 A, EL
1252 72 68
1275 93 92
1418 100 98 A, GA
1456‡ 82 73
23 S rRNA
84 100 98
244 100 99 A, GA, E
294‡ 100 88 A
861 100 96 A, GA, E
878 86 73
1111 100 100
1237 100 82
1268 100 65 A, GA, E
1373‡ 100 91 EL
1434 78 58
1439 90 56
1477 92 88 A, GA, EL
1866 99 90 A, GA
2158 100 99
2298‡ 91 67
2320 60 51
2388‡ 100 100
2639 100 78 A, GA
D. Bulge loops
16 S rRNA
583‡ 100 100
777 100 96
23 S rRNA
213 100 100
764‡ 100 60
941‡ 100 99
1205‡ 76 67
1490‡ 97 96
1586 90 79
2602‡ 100 100
rRNA positions have an AG motif in more than 50 % of the bacterial sequences and are indicated in blue on Figure 5.
a The position number is the nucleotide at the 3' loop end, at the loop-helix junction; ‡, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
d Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
coaxial stacked helices of the P4-P6 domain. The other two adenosine platforms form intermolecular crystal contacts, whose physiological significance is uncertain. We will focus on the two internal loops at positions 219 and 226, since ten of the 13 adenosine platform candidates in 16 S and 23 S rRNA occur in internal loops (two occur in bulge loops, and the last occurs in a multi-stem loop; Table 2 and Figure 5). The adenosine platform at the hairpin loop at position 172 of the P4-P6 domain will not be considered here, in part because it is also involved in an intramolecular crystal interaction that is not physiological.
The P4-P6 domain, as represented by the T. thermophila crystal structure, is only present in the C1 and C2 subgroups of the group I introns (Michel & Westhof, 1990; Damberger & Gutell, 1994). To ensure that we are comparing similar structural elements, we only analyzed those C1 sequences that have the same number of nucleotides as T. thermophila at the positions involved in the two adenosine platforms. Only 110 of the 319 sequences in the group C1 intron alignment have a symmetric 3 × 3 internal loop at position 219 in our sequence alignments and data set. Table 7 reveals the high degree of conservation of the two adenosine residues 5' of the loop-helix junction; 98 % of the sequences have an A residue at positions 218 and 219. Position G220 and its pairing partner U253 are each conserved in approximately 70 % of the sequences, while the G:U base-pair occurs in less than 60 % of the sequences. The second most common base-pair is C:G, followed by A:U and G:C. In
Table 5. A-motif: AA sites in 16 S and 23 S rRNA
Position^a  AA (%)^b  Sequence^c  Predicted structure motifs^d  Loop^e
A. Hairpin loops
16 S rRNA
162 99 AAC A, U GNRA
622 99 AAC U 5
696 100 AAU A, U 6*
1170 97 AAA 5*
1519 97 AAG A GNRA
23 S rRNA
127 100 AAC A GNRA
390 72 AAA 7
752 92 AAA U 8*
1085 100 AAA U 3
1367 66 AAG GNRA
1635 55 AAU A 5*
2311 84 AAU 7
B. Internal loops
16 S rRNA
374 100 AAU A
449 52 AAG E
676 100 AAU A, GA
782 100 AAC A, EL
909 100 AAC A, E
1447 94 AAC
23 S rRNA
257 60 AAG E
346 89 AAA
515 100 AAC U
677 82 AAC
901 60 AAC
911 100 AAC
1143 100 AAA
1322 71 AAG
1655 99 AAC A
2015 90 AAU
2741 100 AAC A, GA, U
C. Multi-stem loops
16 S rRNA
120 99 AAC U
510 99 AAC
959 100 AAU A, GA
1005 51 AAU
23 S rRNA
182 56 AAC A
218 61 AAA
223 94 AAU A, GA, U
300 99 AAC EL
429 98 AAA
483 58 AAC A, U
735 99 AAC
793 61 AAA A, GA
821 100 AAU U
1275 99 AAA
1302 68 AAG
1610 100 AAC
1786 100 AAA
1978 100 AAC A
2199 100 AAC A, GA, U
2287 65 AAA A, GA
2426 98 AAC U
2433 100 AAA U
2734 50 AAG
D. Bulge loops
16 S rRNA
51 87 AAC
72 58 AGC
642 51 AAC
23 S rRNA
1900 89 AAA
rRNA positions have an AA motif in more than 50 % of the bacterial sequences and are indicated in orange in Figure 5.
a The position number is the nucleotide at the 3' loop end, at the loop-helix junction.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c Most prevalent loop-helix sequence.
d A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
e Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
Table 6. A-motif: A sites in 16 S and 23 S rRNA
Position^a  A (%)^b  Predicted structure motifs^c  Loop^d
A. Hairpin Loops
16 S rRNA
845 61 5
1016 94 A, U GNRA
1453 52 UNGG
23 S rRNA
199 100 4
548 59 4
574 76 U 8
616 75 5*
1176 62 4
1918 93 7
2478 100 A 7*
2705 99 4*
2757 100 A 11*
2799 56 3
2826 100 7
2860 96 A, U GNRA
B. Multi-stem loops
16 S rRNA
16 100 A
315‡ 100 A
338‡ 99 A
366 65
495 99
546‡ 51
864 100 U
983 100 A
994 100
1101 100
1157 100 A, GA
1191 100
1339 100
1349 100 A, GA, E
1398 100 A
23 S rRNA
52 99 A, GA
73 100
94 81
149‡ 95 A
233 100 GA
270 92
340 99 A, GA, EL
412 98
432 100
460 99 A, GA, E
670 100
990 100
1103‡ 100 A
1384 100
1603 99
1829 100
2042 84
2062 100
2171‡ 100 U
2173‡ 100 A, GA, U
2346 100 A, GA
2358 98 A
2835 100 A
C. Internal loops
16 S rRNA
151 100
174 94 A, GA
282 100 A
389‡ 100 A
482 98 A, GA
487 99 A, GA, E
535 100
715 100 A, GA
1306 100 A, GA
1408 99 A
1483 99 A, GA
1499 100
23 S rRNA
63‡ 56
91‡ 89
103 99 A
207 99 A, GA, E
1050 100
1419 95 A, GA
1664‡ 100
1689 100 A, GA, EL
1723 62
1745 53
1802 100 A, GA
1885‡ 98
2005‡ 85 A
2327‡ 100 A
2614 100
2657 # 100 A, GA, E
2690 68
D. Bulge loops
16 S rRNA
55‡ 100
65 94
130‡ 100
205 83
397‡ 100
595‡ 79 BT
1042‡ 55
1055 100
1196‡ 99
1227‡ 100
1394‡ 100
23 S rRNA
443‡ 100
739‡ 61 BT
896‡ 99
927‡ 89
1819 100
1981‡ 99
2051‡ 61
2873‡ 100
2879‡ 98
rRNA positions have an A motif in more than 50 % of the bacterial sequences and are indicated in yellow on Figure 5.
a The position number is the nucleotide at the 3' loop end, at the loop-helix junction; ‡, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c A, AA.AG@helix.ends; BT, base triple; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
d Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
addition, positions 219:254 do not form Watson-Crick base-pairs.
A total of 139 of the 319 IC1 sequences had a 3 × 2 internal loop at position 226 (Table 7). As in the previous example, adenosine bases are the most frequent nucleotide at the two positions 5' of the loop-helix junction; however, the frequencies of these two adenosine bases are not as high. One-quarter of the sequences have a C base in place of the adenosine at position 226, which is consistent with previous sequence analysis and in vitro selection experiments (Costa & Michel, 1997). The G at position 227 and the G:U base-pair at positions 227:247 are present in 65 % and 62 % of the sequences, respectively. One of the most conserved features of the 226 adenosine platform is the U224:A248 reverse Hoogsteen base-pair, which occurs in 87 % of the sequences. While all four nucleotides are observed at the bulge at position 249, 88 % of the sequences have a pyrimidine base there; in the P4-P6 crystal structure (Cate et al., 1996a), this position is involved in the tertiary interactions with the tetraloop at position 150 and can potentially form a hydrogen bond to A226 (Costa & Michel, 1997).
The adenosine platform at position 226 in the P4-P6 domain crystal structure widens the minor groove of the RNA helix to allow tertiary contact with the tetraloop at positions 150-153. The tetraloop receptor in the absence of bound tetraloop assumes an alternate structure, with the adenosine forming a cross-strand stack (Butcher et al., 1997). The adenosine bases, rather than forming the side-by-side arrangement observed in the crystal structure, are arranged in a stacked zipper-like arrangement. In addition, the first adenosine nucleotides of the two platforms (218 and 225) become susceptible to methylation by dimethylsulfate when the tetraloop-receptor interaction is disrupted by mutation (Murphy & Cech, 1994). Thus, the adenosine platform motif appears to have both conformational and sequence plasticity. The majority of the IC1 sequences with the same internal loop configuration as the Tetrahymena group I intron (see above) have an adenosine and purine juxtaposed and adjacent to the G:U base-pair (positions 219 and 254, and 226 and 248; see Table 7).
The most conserved features of the two group I intron adenosine platforms that occur at internal loops are the two consecutive adenosines at the 3' end of the loop. The paired G at the 3' loop.5' helix junction and the G:U base-pair are also moderately conserved. Since the majority of the 16 S and 23 S rRNA adenosine platform candidates are more conserved at these four positions than the two known intron adenosine platforms, it is reasonable to expect this motif to occur at the majority (if not all) of the 16 S and 23 S rRNA AAG:U sequence motifs listed in Table 2. Also note that the majority (77 %) of the rRNA platform candidates occur in internal loops (Table 2 and Figure 5). Most of our 16 S and 23 S rRNA adenosine platform candidates also have an adenosine and purine juxtaposed and adjacent to the G:U base-pair facing the loop (Gautheret et al., 1995b), similar to the two intron adenosine platforms; the most notable exception is the junction at position 1890 in the 23 S rRNA, where a highly conserved (97 %) uridine at position 1852 is opposite the first A at position 1890. Two sets of rRNA adenosine platform candidates (16 S rRNA positions 415 and 432, and 23 S rRNA positions 1854 and 1890) occur at the two opposing ends of the same internal loop. The structural and functional significance of this tight clustering of adenosine platforms is currently unknown. We wonder if these two potential adenosine platform
Table 7. Base composition of adenosine platforms in group IC1 introns
Columns: Percentage^a (A, C, G, U; A, C, G, U), Pairing^b, Structure^c
a Percentages were determined as described in the text. Only percentages greater than 1 % are shown.
b Base-pairing occurring in more than 5 % of the sequences examined.
c Partial secondary structure of the Tetrahymena thermophila IC1 intron (GenBank #J01235). The complete structure is available at http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
d Indicates base present in the P4-P6 subdomain of Tetrahymena thermophila.
motifs form simultaneously, or perhaps alternate in formation during protein biosynthesis. Additionally, six of the putative adenosine platforms in Table 2 overlap with other A-motifs, e.g. 16 S rRNA position 1289 is part of the adenosine platform and the AA.AG@helix.ends motif; 16 S rRNA position 415 (Elgavish et al., unpublished results) is part of the adenosine platform and the E-like loop motif (see below). The A-motifs that are associated with adenosine platforms are noted in Table 2.
E and E-like loops
Comparative sequence analysis has identified
potential E loop motifs (Varani et al., 1989;
Wimberly et al., 1993) in both 16 S and 23 S rRNA
(Gautheret et al., 1994; Wimberly, 1994; Leontis &
Westhof, 1998). Thirteen dominant A sites in
Tables 2-6 overlap with eleven E loops; each occur-
rence is indicated in these Tables. Two 16 S and
eight 23 S rRNA loop E motifs were predicted ear-
lier. The 16 S rRNA positions are 909 (Table 5) and
1349 (Table 6); the 23 S rRNA positions are 207
(Table 6), 244 (Table 4), 374 (Table 4), 460 (Table 6),
674, 1189 (Table 4), 1268 (Table 4), and 2657
(Table 6). Our analysis identified all of these except
for position 674 in 23 S rRNA. This E loop motif
overlapped with two positions (674 and 806) that
are now base-paired in our covariation structure
model (comparative support shown in base-pair
frequency tables at the CRW Site; see Materials
and Methods) but were unpaired at the time that
the E loop was proposed (Leontis & Westhof,
1998). Therefore, we don't consider this putative E
loop to be valid.
Our analysis of dominant A positions has also
revealed two new E loop sequence motifs. The first
is at positions 447-449 and 484-487 in 16 S rRNA,
with both positions 449 and 487 containing a domi-
nant A. This potential E loop motif is at the center
of an elongated and irregular compound helix.
This motif is flanked on one side by a helix and on
the other by a lone pair (450:483, E. coli number-
ing). A tandem G:A base-pair is on the other side
of this lone pair. The second new E loop sequence
motif is in the 23 S rRNA at positions 858-861 and
916-918. The nucleotides in this motif were paired
in the older versions of the 23 S rRNA secondary
structure model, thus preventing its detection until
now. The previous base-pairs were removed from
the current structure model since the variations at
the individual positions were not matched by a
similar pattern of variation at the partner positions.
Our analysis of the dominant A bases at the 3' end of loops has also revealed a sequence motif that is similar to but not identical with the E loop motif. The canonical E loop motif has an asymmetric 4 × 3 internal loop, as shown in Figure 6(a). For sequences 5'-NGUAP-3' and 5'-QGAA-3', P and Q (positions 5 and 6) are base-paired, with unusual pairing conformations between positions 1 and 9, 3 and 8, and 4 and 7 (Figure 6(a)). In contrast, our E-like loop motif, as we like to call it, also contains the two sequences 5'-NGUAP-3' and 5'-QGAAZ-3' (Figure 6(b)). Here again, P and Q (positions 5 and 6) and N and Z (positions 1 and 10) form two canonical base-pairs, leaving the 5'-GUA-3' in sequence 1 juxtaposed with the 5'-GAA-3' in sequence 2. Presumably three additional pairings are formed: G:A (2 and 9), U:A (3 and 8), and A:G (4 and 7). The conformations for the second and third pairings, U:A and A:G, are related to the G:A type II tandems as described by Gautheret et al. (1994). Here, the invariant U:A base-pair is thought to adopt the reverse Hoogsteen conformation, adjacent to a sheared A:G base-pair, resulting in the two adenosine bases protruding into the minor groove and overwinding the helix. This arrangement of nucleotides is present in the bacterial version of the 5 S rRNA E loop, and is called the cross-strand A stack (Correll et al., 1997). Possibly the first sheared A:G base-pair (positions 2 and 9 in Figure 6(b)) underwinds the helix and returns it to register. Eight E-like loop motifs are present in the conserved core of the 16 S and 23 S rRNAs and contain eleven dominant A sites. Three of these motifs occur at positions 413-415/428-430, 765-767/812-814, and 780-782/800-802 in the 16 S rRNA; five more occur in the 23 S rRNA at positions 298-300/338-340, 1358-1360/1371-1373, 1475-1477/1514-1516, 1687-1689/1699-1701, and 1930-1932/1968-1970. Five of these E-like loops occur in internal loops; three are present in multi-stem loops. The A-motifs that are associated with E and E-like loops are noted in Tables 2-6.
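The sequence criteria for the E-like loop described above can be sketched as a simple pattern check. This is only an illustrative sketch, not the procedure used in this study: the function name `is_e_like_loop` is hypothetical, and treating only G:C and A:U as canonical (following the Figure 6 legend, so G:U is excluded) is an assumption.

```python
# Canonical pairings as listed in the Figure 6 legend (G:C and A:U);
# excluding G:U here is an assumption made for this sketch.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def is_e_like_loop(strand1: str, strand2: str) -> bool:
    """Check two 5'->3' strands against the E-like loop sequence pattern.

    strand1 must read 5'-NGUAP-3' (positions 1-5) and strand2 must read
    5'-QGAAZ-3' (positions 6-10), with P:Q and N:Z canonical base-pairs.
    """
    s1, s2 = strand1.upper(), strand2.upper()
    if len(s1) != 5 or len(s2) != 5:
        return False
    # Internal 3 x 3 loop: GUA on strand 1 juxtaposed with GAA on strand 2.
    if s1[1:4] != "GUA" or s2[1:4] != "GAA":
        return False
    n, p = s1[0], s1[4]
    q, z = s2[0], s2[4]
    return (p, q) in CANONICAL and (n, z) in CANONICAL
```

For example, `is_e_like_loop("CGUAG", "CGAAG")` satisfies the pattern (P:Q is G:C and N:Z is C:G), whereas replacing the last nucleotide of the second strand with A breaks the N:Z pair.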
AA.AG@helix.ends and tandem G:A base-pairs
Adenosine bases at the 3' end of loops have also been associated with G:A base-pairs at the end of helices (Traub & Sussman, 1982; Woese et al., 1983). Here, the helix is extended by at least one G:A base-pair (for example, the sequences 5'-AGP-3' and 5'-QCG-3' interact to form A:G, G:C, and P:Q base-pairs). G:A juxtapositions have been
Figure 6. Schematic of E and E-like loops. Nucleotides are numbered for reference. Types of base-pairing are indicated by lines: canonical pairings (G:C, A:U) have thick, continuous lines, type II tandem G:A pairings have thin, broken lines, and other non-canonical pairings are shown with thick, broken lines. (a) Canonical E loop, where positions 1-4 and 7-9 comprise the 4 × 3 internal loop. (b) E-like loop. Positions 2-4 and 7-9 comprise the 3 × 3 internal loop.
shown to be energetically stable in one thermodynamic study of bulge loops (Longfellow et al., 1990). More recently, we have analyzed a large number of 16 S and 23 S rRNA comparative structure models and confirmed that many helices do close with a G:A juxtaposition (Elgavish et al., unpublished results). However, we also noted in our comparative study that many of these juxtapositions in E. coli are maintained in at least 90 % of the sequences and found, in addition to the G:A juxtapositions, that many helices are flanked by A:A or A:A/G:A juxtapositions. Our studies revealed a strong bias in the orientation for these G:A base-pairs: A is always 5' to the helix, while G or A is 3' to the helix. These observations are consistent with the bias for unpaired adenosine bases at the 3' end of loops and for the high percentage of unpaired G and A at the 5' end of loops. Note that some of these AA.AG@helix.ends are a component of E and E-like loops and that GNRA tetraloops (Woese et al., 1990) have the AA.AG@helix.ends motif. A total of 116 A-motifs are associated with AA.AG@helix.ends and are noted in Tables 2-6.
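The orientation rule stated above (A 5' to the helix, G or A 3' to the helix) can be illustrated with a short sketch. This is not the program used in the study: the dot-bracket input format, the function names `parse_pairs` and `helix_end_juxtapositions`, and the 0-based indexing are all assumptions made for illustration, and the structure string is assumed to be balanced and free of pseudoknots.

```python
def parse_pairs(dot_bracket):
    """Map each paired index to its partner for a balanced dot-bracket string."""
    stack, partner = [], {}
    for k, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[k] = k, i
    return partner


def helix_end_juxtapositions(seq, dot_bracket):
    """List (a, g) index pairs where an unpaired A lies 5' to a helix strand
    and an unpaired G or A lies 3' to the opposite strand of the same helix end.
    """
    seq = seq.upper()
    partner = parse_pairs(dot_bracket)
    hits = []
    for i, j in partner.items():
        if i > j:
            continue  # visit each base-pair once, with i on the 5' side
        # Two candidate ends: just inside the pair (j-1, i+1) and just
        # outside it (i-1, j+1); in both, `a` is 5' to a helix strand.
        for a, g in ((j - 1, i + 1), (i - 1, j + 1)):
            if a < 0 or g >= len(seq) or a == g:
                continue
            if a in partner or g in partner:
                continue  # both flanking nucleotides must be unpaired
            if seq[a] == "A" and seq[g] in "AG":
                hits.append((a, g))
    return hits
```

With a GNRA tetraloop hairpin such as `GGCGAAAGCC` folded as `(((....)))`, the sketch reports the loop's last A (index 6) juxtaposed with its first G (index 3), in line with the note that GNRA tetraloops carry the AA.AG@helix.ends motif.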
Several of these A:A and G:A juxtapositions at the 5' end of helices are flanked on their 5' side by a second A:A or G:A pair. Tandem G:A and A:A pairs in the 16 S and 23 S rRNA were identified earlier (SantaLucia et al., 1990; Gautheret et al., 1994), and can adopt a single structure conformation that is consistent with their pattern of nucleotide substitutions (Gautheret et al., 1994). We have searched again for these tandem G:A/A:A motifs in our newer 16 S and 23 S rRNA comparative structure models and our larger collection of comparative rRNA structure models. In addition to the tandems identified earlier (Gautheret et al., 1994), we have found 23 new tandems that are conserved in at least 90 % of the bacterial 16 S and 23 S rRNA sequences. Fifty A-motifs are associated with G:A tandems, and they are noted in Tables 2-6.
U-turns
The U-turn, a structure motif characterized by a sharp turn in the RNA, was first identified in the tRNA crystal structure (Quigley & Rich, 1976), and subsequently has been found in several other RNAs (Pley et al., 1994; Jucker & Pardi, 1995; Huang et al., 1996; Fountain et al., 1996; Conn et al., 1999; Culver et al., 1999; Stallings & Moore, 1997; Puglisi & Puglisi, 1998).
Dominant A nucleotides at the 3' end of 16 S and 23 S rRNA loops are also found in some of the tetra- and hexanucleotide hairpin loops that form U-turns (Woese et al., 1990; Jucker & Pardi, 1995; Huang et al., 1996; Fountain et al., 1996). In both of these loop motifs, a base-pair forms between the guanosine at the first position of the hairpin loop (and 3' to the helix), and the adenosine at the last position of the loop (and 5' to the helix). Recently, we have predicted, based on the analysis of many comparative structure models, 57 positions in the 16 S and 23 S rRNA where the U-turn motif might occur (Gutell et al., 2000). The 39 U-turn candidates that are coincident with A-motifs are noted in Tables 2-6. Of these, 22 occur in hairpin loops; 13 (59 %) of these are GNRA tetraloops. The remaining 17 occur in internal loops and multi-stem loops.
Concluding comments
Of the 527 positions at the 3' end of loops in the 16 S and 23 S rRNA, nearly 300 are occupied with a dominant A, an adenosine that occurs in more than 50 % of the bacterial sequences. Larger sequence motifs that occur frequently are built onto these A-motifs. There are 102 A, 56 AA, 80 AG, 43 AAG, and 13 AAG:U A-motifs. A total of 51 % of these sites are part of a known structural motif (Table 8(a)). Of these, 39 % of the A-motifs are associated with the AA.AG@helix.ends motif; 14 % of these are within GNRA tetraloops. Tandem G:A pairs and U-turns are also common, occurring at 17 % and 14 % of the A-motif sites, respectively. There are smaller percentages of adenosine platforms (4 %) and E loop (4 %) and E-like loop (4 %) sequence motifs (Table 8(a)).
Some of these structural motifs are part of a lar-
ger structural element. For example, some of the
AA.AG@helix.ends motifs are within the bound-
aries of E and E-like loops, the tandem G:A motif,
and GNRA tetraloops. Some of these GNRA hair-
pin loops are themselves involved in larger tertiary
folds (Jaeger et al., 1994; Costa & Michel, 1995;
Cate et al., 1996b). Other A-motifs are associated
with more than one structural motif in which one
motif is not entirely contained within the other.
Here, the structural motifs involve positions that
are not utilized by the other, except for the dominant A at the 3' end of the loop. For example, position 415 in 16 S rRNA is part of the E-like loop
and adenosine platform motifs. Two examples
where a single dominant A is part of both an ade-
nosine platform and a G:A tandem are at 16 S
rRNA position 432 and position 1854 in 23 S
rRNA. Although our understanding of RNA struc-
tural motifs is not complete, these overlapping and
possibly competing structural A-motifs suggest
that these junctions of the RNA might be under-
going conformational changes. In total, only one
structural motif occurs at 51 % of the A-motifs that
are associated with a known structural motif
(Table 8). A total of 37 % are associated with two
structural motifs, and 13 % are associated with
three structural motifs.
In contrast, we are unable to predict the structure conformation for 49 % of the A-motifs. Therefore, there is the possibility that new structural motifs occur at these positions. Alternatively, structural motifs that we are already familiar with occur at these A-motifs with a composition and arrangement of nucleotides that were not previously associated with that motif (for example, adenosine platforms occur at positions with sequences other than AAG:U). To help resolve this issue, the conformations of these adenosine bases in the 30 S and 50 S ribosomal subunit crystal structures (Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000) need to be analyzed. Some 8 % of the A-motifs are single bulge adenosine nucleotides; while the structural significance is not known for all of them, covariation analysis and NMR have revealed a base-triple in 16 S rRNA between a bulged A at position 595 and the base-pair at 596:644 (CRW Site; Kalurachchi et al., 1997).
Although the thermodynamic consequences of
the unpaired adenosine bases identified here in the
covariation-based structure models are not known,
an earlier thermodynamic study of internal loops
revealed that unpaired adenosine bases in asym-
metrical loops are more destabilizing than those in
symmetrical loops (Peritz et al., 1991). The three
sets of results, (1) this thermodynamic study; (2)
the preponderance of adenosine bases in unpaired
regions of the covariation-based structure model,
with the majority of these occurring in asymmetri-
cal loops; and (3) the structural studies that reveal
that the majority of these unpaired adenosine
nucleotides are base-paired, albeit in an irregular
manner (Cate et al., 1996a,b; Ban et al., 2000;
Schluenzen et al., 2000; Wimberly et al., 2000), may
all be coordinated and influence RNA folding. We
speculate that these destabilizing, asymmetrically
placed adenosine nucleotides are a significant component in the transition from secondary to tertiary
RNA structure. The destabilizing effects of these
adenosines on secondary structure, coupled with
the need for an RNA molecule to adopt its minimal
energetic state, suggest that these abundant adeno-
sine nucleotides will actively seek out energetically
stabilizing tertiary interactions and, in the process,
form a three-dimensional RNA molecule.
The propensity for conserved and unpaired ade-
nosine bases in the 16 S and 23 S rRNA covariation
structure models must be related to the structure
and function of the ribosome. As stated earlier,
unpaired positions in the covariation structure
model do not imply that those positions are not
paired; it (only) says that they don't pair in the
regular manner that most covariation-based base-
pairs do. And given that other unpaired positions
are paired, albeit irregularly, in other RNA
molecules whose structures have been solved by
crystallography or NMR (e.g. adenosine platforms,
E loops), we anticipate these unpaired positions in
the 16 S and 23 S rRNA covariation structure
models to be paired. We now wonder if these unu-
sual pairings can be predicted with comparative
analysis. Our A story is a beginning towards this
end.
As noted, the A-motifs come in various forms,
i.e. A, AA, AG, AAG, and AAG:U, and these are
associated with several known structural motifs.
These observations suggest that unpaired adeno-
sine bases can form a variety of different structural
conformations. What is special about adenosine
that lends itself to participating in these structural
motifs? And in some situations, it appears as
though at least two different structural elements
can occur at the same A-motif. Does one structural
motif predominate at these positions, or do these
sites provide the ribosome with an opportunity to
alternate conformations during the ribosome cycle?
Is the prevalence of adenosine bases at these pos-
itions related to the ability of adenosine to accom-
modate a variety of binding partners, perhaps its
base stacking potential, or other interesting interactions? The A story is not finished.
Table 8. Summary of dominant A nucleotides and related motifs (based upon Tables 1-6)
A. Occurrences of motifs at dominant A positions
    Category                                                    16 S rRNA   23 S rRNA   Total
1   # of adenosine platforms                                    3 (3 %)     10 (5 %)    13 (4 %)
2   # of E loops                                                4 (4 %)     8 (4 %)     12 (4 %)
3   # of E-like loops                                           4 (4 %)     7 (4 %)     11 (4 %)
4   # of AA.AG@helix.ends                                       44 (44 %)   72 (37 %)   116 (39 %)
4a  # of AA.AG@helix.ends in GNRA tetraloops                    8 (8 %)     8 (4 %)     16 (5 %)
4b  # of other AA.AG@helix.ends                                 36 (36 %)   64 (33 %)   100 (34 %)
5   # of tandem GA's                                            13 (13 %)   37 (19 %)   50 (17 %)
6   # of U-turns                                                11 (11 %)   29 (15 %)   40 (14 %)
7   # of single bulges                                          9 (9 %)     14 (7 %)    23 (8 %)
8   Total # of dominant A bases associated with motifs (1-6)^a  51 (51 %)   98 (51 %)   149 (51 %)
9   # of dominant A bases not associated with motifs (1-6)      49 (49 %)   96 (49 %)   145 (49 %)
10  Total # of dominant A bases at 3' ends of loops (8 + 9)     100         194         294
B. Number of motifs per dominant A nucleotide (not including single bulges)
    Motifs                                                      16 S rRNA   23 S rRNA   Total
    1                                                           25 (49 %)   51 (52 %)   76 (51 %)
    2                                                           24 (47 %)   30 (31 %)   54 (36 %)
    3                                                           2 (4 %)     17 (17 %)   19 (13 %)
    Total # of dominant A bases                                 51          98          149
    Total # of associated motifs                                79          162         241
    Average # of associated motifs per dominant A position
    with an associated motif                                    1.5         1.7         1.6
a A single dominant A may be associated with 1-3 motifs.
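As a quick arithmetic check on the Total column of Table 8, the percentages and the average number of motifs per dominant A can be recomputed from the counts. The sketch below simply copies the counts from the table; the variable and function names are illustrative, and row 2 is read here as E loops, following the concluding comments.

```python
# Counts copied from the Total column of Table 8A.
TOTAL_DOMINANT_A = 294
motif_counts = {
    "adenosine platforms": 13,
    "E loops": 12,
    "E-like loops": 11,
    "AA.AG@helix.ends": 116,
    "tandem G:A": 50,
    "U-turns": 40,
    "single bulges": 23,
}


def percent(count, total=TOTAL_DOMINANT_A):
    """Whole-number percentage of dominant A positions."""
    return round(100 * count / total)


percentages = {name: percent(n) for name, n in motif_counts.items()}

# Table 8B: number of dominant A bases carrying 1, 2, or 3 motifs.
bases_by_motif_number = {1: 76, 2: 54, 3: 19}
total_bases = sum(bases_by_motif_number.values())
total_motifs = sum(k * n for k, n in bases_by_motif_number.items())
average_motifs = round(total_motifs / total_bases, 1)
```

Under these counts the recomputed values agree with the table: 39 % for AA.AG@helix.ends, 149 dominant A bases with at least one motif, 241 associated motifs, and an average of 1.6 motifs per associated position.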
Materials and Methods
Additional supporting data are presented at the CRW Site (http://www.rna.icmb.utexas.edu) and the CRW A Story pages (http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/). The CRW A Story information supplements the data presented in Figures 1-4 and Tables 1-8 and is divided into four categories: general data; position-specific data; structure diagrams; and manuscript materials. The general data (GE) section contains generalized counts for the number and frequency of different A-motifs in the 16 S and 23 S rRNA comparative structure models from (1) the bacteria (summarized in Figures 1-4); (2) the archaea and eucarya (nuclear, chloroplast, and mitochondria); and (3) A-motif analysis of the comparative structure models from 5 S rRNA and group I introns. The position-specific data (PS) section presents frequency tables for all of the 16 S and 23 S rRNA positions which contain an A-motif (with data from the three phylogenetic domains, chloroplasts, and mitochondria); larger motifs (adenosine platforms, E and E-like loops, AA.AG@helix.ends, tandem G:A pairings, and U-turns) that map onto the A-motifs are identified. Frequency tables for E and E-like loops (including only bacterial data) are also provided here. The structure diagrams (SD) section contains Figure 5 and includes secondary structure diagrams for each of the motifs examined here. The manuscript materials (MS) section contains all of the Figures and Tables from this manuscript.
The RNA sequence alignments used for this analysis
are maintained by us at the University of Texas (R.R.G.,
unpublished results; CRW Site). Sequences were manu-
ally aligned with the alignment editor AE2 (T. Macke,
Scripps Research Institute, San Diego, CA). As of June
2000, the bacterial 16 S alignment contains 5859
sequences, and the bacterial 23 S alignment contains 327
sequences; both alignments use E. coli (GenBank Acces-
sion # J01695) as their reference sequence for position
numbers. The group I intron (C1 subclass) alignment
contains 319 sequences and uses T. thermophila (GenBank
Accession # J01235) as its reference sequence for position
numbers. Two subalignments of 110 and 139 sequences
having the appropriate arrangement of nucleotides at the
219 and 226 adenosine platform internal loops (see the
text) were created from this larger alignment. These
sequence alignments will be available from this site in
the future.
Secondary structure models for representatives of the main phylogenetic groupings are inferred by comparative sequence analysis (Gutell, 1996; Gutell et al., unpublished results). As of June 2000, a total of 399 16 S rRNA, 292 23 S rRNA, 73 5 S rRNA, and 174 group I intron secondary structure models are in our collection (CRW Site). At present, only a subset of these diagrams (those diagrams incorporating all of the newest pairings in our refined structure models and in which we have the most confidence) are publicly available; as diagrams are updated to meet these standards, they will be made available. For Figures 1-4, we counted the overall distributions of the four nucleotides for the entire RNA structure, and for paired, unpaired, and loop-helix junction positions, analyzing 278 bacterial structures (209 from 16 S rRNA and 69 from 23 S rRNA); a complete list of these models is available online. We also present online the detailed frequencies used to calculate the histograms in Figures 1-4. For these tables (CRW A Story (GE)), we have analyzed all of our 16 S, 23 S, and 5 S rRNA (bacterial, archaea, eucarya, chloroplast, and mitochondria) and group I intron comparative structure models. The numbers of structure models analyzed for the online tables are included in those tables. Other nucleotide distributions are listed dynamically on our online tables. The programs that generate this information will be presented elsewhere (Z.S. & R.G., unpublished results). These online tables will be routinely updated as more comparative structure models are determined.
Positions at the 3' ends of loops in the E. coli 16 S and
23 S rRNA secondary structure models were manually
identified. Each site was classified into one of four loop
types: hairpin, multi-stem, internal, or bulge. The predicted
A-motif frequencies in Table 1 were calculated
using the nucleotide frequency values determined from
the bacterial 16 S and 23 S structures (above).
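The text does not spell out how the predicted frequencies were derived from the nucleotide frequencies. One plausible reading, sketched below in Python, is that a motif's expected frequency is the product of the independent single-position frequencies, with unpaired frequencies used for loop positions and paired frequencies for the helix position. The independence assumption, the function name, and the numeric values are all hypothetical placeholders rather than values from the paper.

```python
def predicted_motif_frequency(position_probabilities):
    """Multiply per-position probabilities, assuming positions are independent."""
    result = 1.0
    for p in position_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
        result *= p
    return result


# Hypothetical placeholder frequencies (NOT the values measured in the paper).
unpaired_freq = {"A": 0.4, "C": 0.2, "G": 0.2, "U": 0.2}
paired_freq = {"A": 0.1, "C": 0.3, "G": 0.4, "U": 0.2}

# AAG: two unpaired A positions followed by a paired G.
predicted_aag = predicted_motif_frequency(
    [unpaired_freq["A"], unpaired_freq["A"], paired_freq["G"]]
)

# XAG: X is any unpaired nucleotide other than A, then unpaired A, then paired G.
predicted_xag = predicted_motif_frequency(
    [1.0 - unpaired_freq["A"], unpaired_freq["A"], paired_freq["G"]]
)
```

Comparing an observed motif count against such an expectation indicates whether the motif is over-represented relative to base composition alone.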
The program query (Gutell et al., unpublished
program) was used to collect nucleotide frequency data
from (AE2) sequence alignments. Base frequencies
for each site were computed independently from the
bacterial alignments (16 S and 23 S rRNA). For bacterial
data, sites with a given A-motif in more than 50 % of the
sequences (33 % for the AAG:U motif) are summarized
in Table 1 and detailed in Tables 2-6; the data from
Tables 1-6 are summarized with respect to structural
motifs in Table 8. Single nucleotide and base-pair
frequencies in Table 7 were calculated from the intron
alignments using query.
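The per-site frequency calculation and the 50 % cutoff can be sketched as follows. This is not the query program, whose interface is unpublished; it is a minimal Python illustration that assumes the alignment is a list of equal-length strings and that gap characters ("-" and ".") are excluded from the denominator. Both assumptions, along with the function names and the toy alignment, are introduced here for illustration only.

```python
from collections import Counter

GAP_CHARACTERS = "-."  # assumption: gaps are ignored when computing frequencies


def column_frequencies(alignment, column):
    """Fraction of each nucleotide at one alignment column, ignoring gaps."""
    bases = [
        seq[column].upper() for seq in alignment if seq[column] not in GAP_CHARACTERS
    ]
    if not bases:
        return {}
    counts = Counter(bases)
    return {base: n / len(bases) for base, n in counts.items()}


def columns_above_threshold(alignment, base="A", threshold=0.5):
    """Columns where `base` occurs in more than `threshold` of the sequences."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col
        for col in range(width)
        if column_frequencies(alignment, col).get(base, 0.0) > threshold
    ]


# Hypothetical toy alignment, not taken from the rRNA data.
toy_alignment = ["GAAC", "GAUC", "GA-C", "GCAC"]
adenosine_rich_columns = columns_above_threshold(toy_alignment)
```

Passing threshold=0.33 would correspond to the lower cutoff applied to the AAG:U motif.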
The secondary structure figures showing the A-motif
sites (Figure 5), the group I intron secondary structure
diagram portion in Table 7, and the additional secondary
structure diagrams available online were generated with
the program XRNA (Weiser & Noller, University of
California, Santa Cruz).
Acknowledgments
This work was supported by the NIH (awarded to
R.R.G., GM48207), the NSF (awarded to M.S., MCB-9707940),
the Welch Foundation (awarded to R.R.G.), and
startup funds from the Institute for Cellular and
Molecular Biology at the University of Texas at Austin
(awarded to R.R.G.).
References
Agalarov, S. C., Prasad, G. S., Funke, P. M., Stout, C. D.
& Williamson, J. R. (2000). Structure of the
S15,S6,S18-rRNA complex: assembly of the 30 S
ribosome central domain. Science, 288, 107-112.
Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P. B.
& Steitz, T. A. (1999). Placement of protein and
RNA structures into a 5 Å-resolution map of the
50 S ribosomal subunit. Nature, 400, 841-847.
Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz,
T. A. (2000). The complete atomic structure of the
large ribosomal subunit at 2.4 Å resolution. Science,
289, 905-920.
Butcher, S. E., Dieckmann, T. & Feigon, J. (1997).
Solution structure of a GAAA tetraloop receptor
RNA. EMBO J. 16, 7490-7499.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden,
B. L., Kundrot, C. E. et al. (1996a). Crystal structure
of a group I ribozyme domain: principles of RNA
packing. Science, 273, 1678-1686.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden,
B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R.
& Doudna, J. A. (1996b). RNA tertiary structure
mediation by adenosine platforms. Science, 273,
1696-1699.
Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest,
T. N. & Noller, H. F. (1999). X-ray crystal structures
of 70S ribosome functional complexes. Science, 285,
2095-2104.
Clemons, W. M., Jr, May, J. L. C., Wimberly, B. T.,
McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V.
(1999). Structure of a bacterial 30 S ribosomal
subunit at 5.5 Å resolution. Nature, 400, 833-840.
Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G.
(1999). Crystal structure of a conserved ribosomal
protein-RNA complex. Science, 284, 1171-1174.
Correll, C. C., Freeborn, B., Moore, P. B. & Steitz, T. A.
(1997). Metals, motifs, and recognition in the crystal
structure of a 5S rRNA domain. Cell, 91, 705-712.
Costa, M. & Michel, F. (1995). Frequent use of the same
tertiary motif by self-folding RNAs. EMBO J. 14,
1276-1285.
Costa, M. & Michel, F. (1997). Rules for RNA recog-
nition of GNRA tetraloops deduced by in vitro
selection: comparison with in vivo evolution. EMBO
J. 16, 3289-3302.
Culver, G. M., Cate, J. H., Yusupova, G. Z., Yusupov,
M. M. & Noller, H. F. (1999). Identification of an
RNA-protein bridge spanning the ribosomal sub-
unit interface. Science, 285, 2133-2136.
Damberger, S. H. & Gutell, R. R. (1994). A comparative
database of group I intron structures. Nucl. Acids
Res. 22, 3508-3510.
Fountain, M. A., Serra, M. J., Krugh, T. R. & Turner,
D. H. (1996). Structural features of a six-nucleotide
RNA hairpin loop found in ribosomal RNA.
Biochemistry, 35, 6539-6548.
Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N.,
Caruthers, M. H., Neilson, T. & Turner, D. H.
(1986). Improved free-energy parameters for predic-
tions of RNA duplex stability. Proc. Natl Acad. Sci.
USA, 83, 9373-9377.
Gautheret, D., Konings, D. & Gutell, R. R. (1994). A
major family of motifs involving G:A mismatches in
ribosomal RNA. J. Mol. Biol. 242, 1-8.
Gautheret, D., Damberger, S. H. & Gutell, R. R. (1995a).
Identification of base-triples in RNA using comparative
sequence analysis. J. Mol. Biol. 248, 27-43.
Gautheret, D., Konings, D. & Gutell, R. R. (1995b). G:U
base pairing motifs in ribosomal RNAs. RNA, 1,
807-814.
Gutell, R. R. (1996). Comparative sequence analysis and
the structure of 16S and 23S rRNA. In Ribosomal
RNA: Structure, Evolution, Processing and Func-
tion in Protein Biosynthesis (Dahlberg, A. E. &
Zimmermann, R. A., eds), pp. 111-128, CRC Press,
Boca Raton, FL, USA.
Gutell, R. R. (1999). Comparative analysis of RNA
sequences. Nucl. Acids Symp. Ser. 41, 48-53.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F.
(1985). Comparative anatomy of 16S-like ribosomal
RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D.
(2000). Predicting U-turns in ribosomal RNA with
comparative sequence analysis. J. Mol. Biol. 300,
791-803.
Huang, S., Wang, Y.-X. & Draper, D. E. (1996). Structure
of a hexanucleotide RNA hairpin loop conserved in
ribosomal RNAs. J. Mol. Biol. 258, 308-321.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement
of a GNRA tetraloop in long-range RNA tertiary
interactions. J. Mol. Biol. 236, 1271-1276.
Jucker, F. M. & Pardi, A. (1995). GNRA tetraloops make
a U-turn. RNA, 1, 219-222.
Kalurachchi, K., Uma, K., Zimmermann, R. A. &
Nikonowicz, E. P. (1997). Structural features of the
binding site for ribosomal protein S8 in Escherichia
coli 16S rRNA defined using NMR spectroscopy.
Proc. Natl Acad. Sci. USA, 94, 2139-2144.
Leontis, N. B. & Westhof, E. (1998). A common motif
organizes the structure of multi-helix loops in 16 S
and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
Longfellow, C. E., Kierzek, R. & Turner, D. H. (1990).
Thermodynamic and spectroscopic study of bulge
loops in oligoribonucleotides. Biochemistry, 29, 278-
285.
Michel, F. & Dujon, B. (1983). Conservation of RNA sec-
ondary structures in two intron families including
mitochondrial-, chloroplast- and nuclear-encoded
members. EMBO J. 2, 33-38.
Michel, F. & Westhof, E. (1990). Modeling of the three-
dimensional architecture of group I catalytic introns
based upon comparative sequence analysis. J. Mol.
Biol. 216, 585-610.
Michel, F., Costa, M., Massire, C. & Westhof, E. (2000).
Modeling RNA tertiary structure from patterns of
sequence variation. Methods Enzymol. 317, 491-510.
Murphy, F. L. & Cech, T. R. (1994). GAAA tetraloop
and conserved bulge stabilize tertiary structure of a
group I intron domain. J. Mol. Biol. 236, 49-63.
Nikulin, A., Serganov, A., Ennifar, E., Tishchenko, S.,
Nevskaya, N., Shepard, W., Portier, C., Garber, M.,
Ehresmann, B., Ehresmann, C., Nikonov, S. &
Dumas, P. (2000). Crystal structure of the S15-rRNA
complex. Nature Struct. Biol. 7, 273-277.
Peritz, A. E., Kierzek, R., Sugimoto, N. & Turner, D. H.
(1991). Thermodynamic study of internal loops in
oligoribonucleotides: symmetric loops are more
stable than asymmetric loops. Biochemistry, 30,
6428-6436.
Pley, H. W., Flaherty, K. M. & McKay, D. B. (1994).
Three-dimensional structure of a hammerhead ribo-
zyme. Nature, 372, 68-74.
Puglisi, E. V. & Puglisi, J. D. (1998). HIV-1 A-rich RNA
loop mimics the tRNA anticodon structure. Nature
Struct. Biol. 5, 1033-1036.
Quigley, G. J. & Rich, A. (1976). Structural domains of
transfer RNA molecules. Science, 194, 796-806.
SantaLucia, J., Kierzek, R. & Turner, D. H. (1990). Effects
of GA mismatches on the structure and thermodynamics
of RNA internal loops. Biochemistry, 29,
8813-8819.
Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J.,
Gluehmann, M., Janell, D., Bashan, A., Bartels, H.,
Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure
of functionally activated small ribosomal subunit
at 3.3 Å resolution. Cell, 102, 615-623.
Serra, M. J., Axenson, T. J. & Turner, D. H. (1994). A
model for the stabilities of RNA hairpins based on
a study of the sequence dependence of stability for
hairpins with six nucleotides. Biochemistry, 33,
14289-14296.
Stallings, S. C. & Moore, P. B. (1997). The structure of
an essential splicing element: stem loop IIA from
yeast U2 snRNA. Structure, 5, 1173-1185.
Szewczak, A. A., Moore, P., Chan, Y-L. & Wool, I. G.
(1993). The conformation of the sarcin/ricin loop
from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA,
90, 9581-9585.
Tocilj, A., Schluenzen, F., Janell, D., Gluehmann, M.,
Hansen, H. A. S., Harms, J., Bashan, A., Bartels, H.,
Agmon, I., Franceschi, F. & Yonath, A. (1999). The
small ribosomal subunit from Thermus thermophilus
at 4.5 Å resolution: pattern fittings and the identification
of a functional site. Proc. Natl Acad. Sci. USA,
96, 14252-14257.
Traub, W. & Sussman, J. L. (1982). Adenine-guanine
base pairing in ribosomal RNA. Nucl. Acids Res. 10,
2701-2708.
Varani, G., Wimberly, B. & Tinoco, I., Jr (1989). Confor-
mation and dynamics of an RNA internal loop.
Biochemistry, 28, 7760-7772.
Wimberly, B. (1994). A common RNA loop motif as a
docking module and its function in the hammer-
head ribozyme. Nature Struct. Biol. 1, 820-827.
Wimberly, B., Varani, G. & Tinoco, I., Jr (1993). The con-
formation of loop E of eukaryotic 5S ribosomal
RNA. Biochemistry, 32, 1078-1087.
Wimberly, B. R., Guymon, R., McCutcheon, J. P., White,
S. W. & Ramakrishnan, V. (1999). A detailed view
of a ribosomal active site: the structure of the L11-
RNA complex. Cell, 97, 491-502.
Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr,
Morgan-Warren, R. J., Carter, A. P., Vonrhein, C.,
Hartsch, T. & Ramakrishnan, V. (2000). Structure of
the 30 S ribosomal subunit. Nature, 407, 327-339.
Woese, C. R. & Pace, N. R. (1993). Probing RNA struc-
ture, function, and history by comparative analysis.
In The RNA World (Gesteland, R. F. & Atkins, J. F.,
eds), pp. 91-117, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY.
Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F.
(1983). Detailed analysis of the higher-order structure
of 16S-like ribosomal ribonucleic acids. Microbiol.
Rev. 47, 621-669.
Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architec-
ture of ribosomal RNA: constraints on the sequence
of "tetra-loops". Proc. Natl Acad. Sci. USA, 87,
8467-8471.
Xia, T., SantaLucia, J., Jr, Burkard, M. E., Kierzek, R.,
Schroeder, S., Jiao, X., Cox, C. & Turner, D. H.
(1998). Thermodynamic parameters for an
expanded nearest neighbor model for formation
of RNA duplexes with Watson-Crick base-pairs.
Biochemistry, 37, 14719-14735.
Edited by D. E. Draper
(Received 7 July 2000; received in revised form 9 September 2000; accepted 9 September 2000)