A Story: Unpaired Adenosine Bases in
Ribosomal RNAs
R. R. Gutell1*, J. J. Cannone1, Z. Shang1, Y. Du1 and M. J. Serra2
1 Institute for Cellular and Molecular Biology, University of Texas,
2500 Speedway, Austin, TX 78712-1095, USA
2 Department of Chemistry, Allegheny College, 520 N. Main St.,
Meadville, PA 16335, USA
In 1985 an analysis of the Escherichia coli 16 S rRNA covariation-based
structure model revealed a strong bias for unpaired adenosines. The
same analysis revealed that the majority of the G, C, and U bases were
paired. These biases are (now) consistent with the high percentage of
unpaired adenosine nucleotides in several structure motifs.
An analysis of a larger set of bacterial comparative 16 S and 23 S
rRNA structure models has substantiated this initial finding and revealed
new biases in the distribution of adenosine nucleotides in loop regions.
The majority of the adenosine nucleotides are unpaired, while the
majority of the G, C, and U bases are paired in the covariation-based
structure model. The unpaired adenosine nucleotides predominate in the
middle and at the 3' end of loops, and are the second most frequent
nucleotide type at the 5' end of loops (G is the most common nucleotide).
There are additional biases for unpaired adenosine nucleotides at the
3' end of loops and adjacent to a G at the 5' end of the helix. The most
prevalent consecutive nucleotides are GG, GA, AG, and AA. A total of
70 % of the GG sequences are within helices, while more than 70 % of the
AA sequences are unpaired. Nearly 50 % of the GA sequences are
unpaired, and approximately one-third of the AG sequences are within
helices while another third are at the 3' loop.5' helix junction.
Unpaired positions with an adenosine nucleotide in more than 50 % of
the sequences at the 3' end of 16 S and 23 S rRNA loops were identified
and arranged into the A-motif categories XAZ, AAZ, XAG, AAG, and
AAG:U, where G or Z is paired, G:U is a base-pair, and X is not an A
and Z is not a G in more than 50 % of the sequences. These sequence
motifs were associated with several structural motifs, such as adenosine
platforms, E and E-like loops, A:A and A:G pairings at the end of helices,
G:A tandem base-pairs, GNRA tetraloop hairpins, and U-turns.
© 2000 Academic Press
Keywords: RNA structure; comparative sequence analysis; unpaired
adenosines; structure motifs; computational biology/bioinformatics
*Corresponding author
E-mail address of the corresponding author: robin.gutell@mail.utexas.edu
doi:10.1006/jmbi.2000.4172, available online at http://www.idealibrary.com
J. Mol. Biol. (2000) 304, 335–354
0022-2836/00/030335–20 $35.00/0 © 2000 Academic Press
Introduction
RNA molecules can form similar secondary and tertiary structures for
sequences that are not identical, and in many situations with less than
50 % sequence similarity. Comparative sequence analysis attempts to
identify those structural elements that are in common between different
sequences that are members of the same RNA family (e.g. tRNA).
Comparative sequence analysis has been used successfully to predict
secondary and tertiary interactions in several RNA molecules (reviewed
by Woese & Pace, 1993; Gutell, 1996; Michel et al., 2000). The majority
of these interactions are composed of G:C and A:U base-pairs (here, we
define underlined nucleotides as base-paired), organized into regular
secondary structure helices, and identified with covariation analysis due
to the manner in which both paired positions coordinately change, or
covary, their nucleotide composition (Woese et al., 1983; Gutell et al.,
1985). Beyond the prediction of standard base-pairs in secondary
structure helices, covariation analysis is also predicting non-standard
base-pairs (e.g. A:G exchanges with G:A, and U:U exchanges with C:C)
and base-pairs that form tertiary structure (Gutell, 1996; Gutell et al.,
unpublished results). We now believe that all of the standard secondary
structure base-pairs in the Escherichia coli 16 S and 23 S rRNAs have
been identified with our covariation analysis. For those situations where
we can compare and contrast a solved crystal structure with comparative
data from an RNA sequence alignment, paired positions with a strong
covariation are nearly always base-paired in the crystal structure
(Gutell, 1999; Gutell et al., unpublished results). Therefore, covariation
analysis, when used judiciously, can accurately predict base-pairs in an
RNA structure.
We now wonder what contribution comparative analysis will make to the
prediction and understanding of the three-dimensional structures of the
rRNAs (Ban et al., 1999, 2000; Cate et al., 1999; Clemons et al., 1999;
Tocilj et al., 1999; Schluenzen et al., 2000; Wimberly et al., 2000). We
can begin to address this issue when we appreciate that comparative
analysis, in its most general form, identifies patterns of variation in its
search for a common structure. Base-pairs are predicted for those
positions that vary at the same time in the evolution of that RNA,
regardless of the type of base-pairing and/or the arrangement of this
pairing in relationship with the flanking positions. Since the majority of
the base-pairs are G:C, A:U, or G:U, and these pairs are arranged into
standard secondary structure helices, we conclude that covariation
analysis can identify the basic building blocks of RNA structure without
any structural or other preconceived biases.
Given this success, we now question if other RNA building blocks or
motifs can be deciphered from our comparative RNA sequence and
structure data sets. Our traditional comparative secondary structure
model only shows those secondary and tertiary structure base-pairs with
positional covariation within the underlying sequences plus invariant
Watson-Crick base-pairs which are directly adjacent to base-pairs with
positional covariation. All of the unpaired positions in these diagrams
imply the lack of pairings with covariation, not that these positions are
not paired or interacting with other regions of the RNA. Can we relate
specific patterns of variation that occur within a defined structural
context to a three-dimensional structure motif? Can we now predict
structure for the positions that do not covary with other positions?
Alternatively, we question what types of structure occur at the unpaired
positions in the covariation structure model and ask if we can develop
principles that relate sequence variation with these structural elements.
While some structural elements, such as base-pairs and helices, form
similar structures with sequences whose positions covary, other
structural elements with similar shapes form sets of aligned sequences
that do not have positional covariation with one another (Gautheret
et al., 1995a). Comparative analysis of nucleotide distributions in
different structural elements has resulted in the identification of several
sequence and structure motifs in these unpaired regions. This list
includes tetraloops (Woese et al., 1990), tandem G:A base-pairs
(Gautheret et al., 1994), dominant G:U base-pairs (Gautheret et al.,
1995b), E-loops (Gutell et al., unpublished results; Gautheret et al.,
1994; Wimberly, 1994; Leontis & Westhof, 1998), U-turns (Gutell et al.,
2000), and A:A and A:G base-pairs at the ends of helices (hereafter
called AA.AG@helix.ends). These sequence-based analyses are given more
meaning, biologically and structurally, from their comparison with
experimental studies, especially the NMR and crystallographic analysis
of several rRNA fragments (Szewczak et al., 1993; Kalurachchi et al.,
1997; Conn et al., 1999; Wimberly et al., 1999; Agalarov et al., 2000;
Nikulin et al., 2000). Our goals for the future are to identify more
biased distributions of nucleotides and sequences in different structural
arrangements, to ascribe biological and structural significance to them,
and to deduce sets of sequence-structure relationship rules, from which
we aspire to accurately predict detailed RNA structure from a single
sequence.
In 1985, a simple count of the paired and unpaired nucleotides in E. coli
16 S rRNA revealed a strong bias for unpaired adenosine nucleotides
(Gutell et al., 1985). A total of 62 % of the adenosine nucleotides were
unpaired, while approximately 30 % of the G, C, and U bases were
unpaired. The structural significance of this bias was not known at the
time. However, these biases are (now) consistent with the high
percentage of unpaired adenosine bases in the GNRA tetraloops (Woese
et al., 1990), E-loops (Gautheret et al., 1994; Wimberly, 1994; Leontis &
Westhof, 1998), adenosine platforms (Cate et al., 1996b) and AA side-step
(Conn et al., 1999) RNA sequence and structure motifs identified after
this initial adenosine bias was found.
Here, we follow up with a larger and more
detailed analysis of paired and unpaired nucleo-
tides in our collection of rRNA and group I intron
comparative structure models, track the frequently
occurring unpaired nucleotides, and associate these
with different structural motifs.
Results
The base compositions for 175 bacterial 16 S and
71 bacterial 23 S rRNA comparative structure
models have been analyzed and are presented here.
For our online presentation (see Materials and
Methods for detailed explanations), we have ana-
lyzed a larger set of comparative structures from
5 S, 16 S, and 23 S rRNAs (including bacteria,
archaea, and eucarya nuclear, chloroplast, and
mitochondria sequences) and group I introns. Our
collection of structure diagrams represents all of
the major phylogenetic groups within the bacterial
domain (as well as for the other primary phylo-
genetic domains). The comparative structure
model is based on covariation analysis (Woese
et al., 1983; Gutell et al., 1985, unpublished results).
For the purposes of the current analysis, positions
with substantial covariation or containing invariant
Watson-Crick base-pairs are base-paired and
positions that do not covary with other positions
are unpaired in our covariation structure model.
The current 16 S and 23 S rRNA secondary structure models are available
from http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
The frequencies for single nucleotide positions are presented in
histogram format (Figure 1). The total frequencies for the four RNA
nucleotides A, U, C, and G were categorized into helices (base-paired)
and loops (unpaired), and then subdivided further into the 5' end,
center, and 3' end positions for helices and loops. Overall, G (31.4 %) is
the most prevalent nucleotide, followed by A (25.7 %), C (22.4 %), and
U (20.5 %). G is also the most common nucleotide in helices (36.6 %),
while A (14.5 %) occurs with the lowest frequency in paired positions.
Guanosine occurs with an even higher frequency at the 5' end of helices
(46.2 %), where U is the least frequent (13.5 %). Meanwhile, C is the
most abundant nucleotide at the 3' end of helices (38.1 %), followed by
G (30.4 %). Adenosine is the most prevalent nucleotide at unpaired
positions, occurring at 42.6 %, while C is the least common at 12.5 %.
Adenosine is even more dominant at the 3' end of loops, occurring in
53.5 % of the sequences. Meanwhile, G is the most common nucleotide at
the 5' end of loops (37.1 %); adenosine is second at 29.3 %. Another
measure of the bias in unpaired adenosine bases is revealed in the ratio
of unpaired to paired nucleotides for single nucleotides (see also the
online query system). The unpaired/paired ratio for each nucleotide is:
A, 1.96; U, 0.71; G, 0.43; and C, 0.29. Alternatively, 66.2 % of the
adenosine bases are unpaired; the percentages of unpaired U, G, and C
bases are 41.5 %, 30.1 %, and 22.3 %, respectively, for our collection of
bacterial 16 S and 23 S rRNA structure models. These values are similar
but not identical to the values determined for the 1985 version of the
E. coli 16 S rRNA covariation structure model (Gutell et al., 1985). The
same trends and nucleotide biases also occur for our other RNA structure
models (available online).
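The two measures quoted above are linked by simple arithmetic: an unpaired/paired ratio r corresponds to a fraction unpaired of r / (1 + r). The following is a minimal Python sketch of how such a tally could be made. It assumes, purely for illustration, that a structure model is supplied as a dot-bracket string ('.' for unpaired, brackets for paired); this format, the function names, and the toy hairpin are assumptions and are not taken from the analysis described here.

```python
from collections import Counter


def tally_pairing(sequence, structure):
    """Count paired and unpaired occurrences of each nucleotide.

    `structure` is an (assumed) dot-bracket string with one symbol per
    nucleotide: '.' marks an unpaired position, anything else is paired.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    paired, unpaired = Counter(), Counter()
    for base, symbol in zip(sequence.upper(), structure):
        target = unpaired if symbol == "." else paired
        target[base] += 1
    return paired, unpaired


def percent_unpaired_from_ratio(ratio):
    """Convert an unpaired/paired ratio r into percent unpaired: 100*r/(1+r)."""
    return 100.0 * ratio / (1.0 + ratio)


# Hypothetical toy hairpin: a 4 bp helix closed by a GAAA tetraloop.
paired, unpaired = tally_pairing("GGCCGAAAGGCC", "((((....))))")

# Unpaired/paired ratios reported in the text for the bacterial data set.
reported_ratios = {"A": 1.96, "U": 0.71, "G": 0.43, "C": 0.29}
percent_unpaired = {
    base: percent_unpaired_from_ratio(r) for base, r in reported_ratios.items()
}
```

Applied to the reported ratios, the conversion gives values close to the quoted percentages for A, U, and G; the value for C comes out slightly above the quoted 22.3 %, which is what one would expect if the published ratio was rounded to two decimals.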
Figure 1. Frequency and distribution of single nucleotides in bacterial
16 S and 23 S rRNAs comparative structure models. The total number of
occurrences for each of the four nucleotides at nine structural
categories: total (all positions), paired, unpaired, 5'-helix.end (5' end
of a helix), 3'-helix.end (3' end of a helix), 5'-loop.end (5' end of a
loop), 3'-loop.end (3' end of a loop), helix.center (all positions within
a helix that are not at the 5' or 3' ends of a helix), and loop.center
(all positions within a loop that are not at the 5' or 3' ends of a loop).
Figure 2. Frequency and distribution of consecutive nucleotides in bacterial 16 S and 23 S rRNAs comparative
structure models. The total number of occurrences for the 16 dinucleotides at three structural categories: total (all
positions), in helix (paired), and in loop (unpaired).
Next, we investigated the frequency and distribution of consecutive
nucleotides. The most common dinucleotides are the four purine
combinations. Consecutive GG residues are the most prevalent at 9.86 %,
followed by GA (7.92 %), AG (7.88 %), and AA (7.65 %) (Figure 2). The
dinucleotides were classified into four categories: paired (helical),
unpaired (loop), and the two paired/unpaired junctions, 3' loop.5' helix
and 3' helix.5' loop. The most frequent consecutive dinucleotides are
distinctly different between these four categories. In helices, GG
(14.1 %), GC (10.4 %), CC (9.0 %), and GU (8.3 %) are the most prevalent
consecutive dinucleotides; note that these consecutive dinucleotide
arrangements are components of the most stable nearest-neighbors (Xia
et al., 1998). In contrast, AA (19.2 %), GA (13.4 %), and UA (9.8 %) are
the most common adjacent dinucleotides in loop motifs (Figure 2). Greater
than 70 % of the consecutive adenosine residues are within unpaired
regions, consistent with the observation that 5'-AA-3'/3'-UU-5' is the
least stable nearest-neighbor (Xia et al., 1998).
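The four dinucleotide categories described above can be illustrated with a short Python sketch. It assumes a dot-bracket representation of the structure model and a simplified rule in which any two adjacent paired positions count as helical, even if they belong to different helices; both are assumptions made for illustration, and the function name and toy hairpin are hypothetical.

```python
from collections import Counter, defaultdict


def classify_dinucleotides(sequence, structure):
    """Tally adjacent nucleotide pairs by structural category.

    Categories follow the text: 'helix' (both paired), 'loop' (both
    unpaired), "3' loop.5' helix" (unpaired then paired) and
    "3' helix.5' loop" (paired then unpaired).
    """
    counts = defaultdict(Counter)
    for i in range(len(sequence) - 1):
        dinucleotide = sequence[i:i + 2].upper()
        first_paired = structure[i] != "."
        second_paired = structure[i + 1] != "."
        if first_paired and second_paired:
            category = "helix"
        elif not first_paired and not second_paired:
            category = "loop"
        elif second_paired:
            category = "3' loop.5' helix"
        else:
            category = "3' helix.5' loop"
        counts[category][dinucleotide] += 1
    return counts


# Hypothetical toy hairpin: a 4 bp helix closed by a GAAA tetraloop.
counts = classify_dinucleotides("GGCCGAAAGGCC", "((((....))))")
```

In this toy example the loop contributes GA and AA, and the two junctions contribute CG (3' helix.5' loop) and AG (3' loop.5' helix). A tally over real structure models would additionally need to distinguish adjacent helices, which this sketch does not attempt.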
The adjacent dinucleotides with the highest
unpaired to paired ratio are AA (5.68), UA (2.03),
GA (1.47), AU (1.20), while the three lowest ratios
are GC (0.17), GG (0.15), and CC (0.11). These
ratios again emphasize that adenosine bases tend
to be unpaired, consecutive adenosine bases are
even more likely to be unpaired, and that consecu-
tive G and C bases tend to be paired.
The most abundant dinucleotides at loop-helix junctions were analyzed
(Figure 3). CG (14.6 %), GA (10.3 %), and CA (10.2 %) are the most
abundant at the 3' helix.5' loop junction; AG (25.0 %) and AC (13.3 %)
are the two most abundant pairs at the 3' loop.5' helix junction. These
results are consistent with the abundance of A and G bases at the 5' end
of loops, A nucleotides at the 3' end of loops, and G and C nucleotides
at the 5' and 3' ends of helices. The strong preference for AG at
loop-helix junctions might not be a simple consequence of stability since
all 5' dangling ends have nearly the same small stabilizing effect on
helices (Freier et al., 1986). The most stable 3' dangling end sequences,
CA, CG, GA, and GG (Freier et al., 1986), occur frequently in our 16 S
and 23 S rRNA structure data sets (Figure 3).
Next, we investigated the frequencies for three consecutive nucleotides
at loop.helix and helix.loop interfaces, where two of the nucleotides are
unpaired and the third, at the helix end, is paired. Figure 4(a) and (b)
display the 32 most prevalent trinucleotide combinations at the
3' loop.5' helix (a) and 3' helix.5' loop (b) interfaces. The observed
triplets at these junctions are very biased in their distributions. At
the 3' loop.5' helix interface (Figure 4(a)), AAG occurs in 14.4 % of the
junctions, followed by AAC (6.7 %) and GAG (5.4 %). All of the 11 most
frequent sequences contain at least one unpaired A nucleotide; nine of
these 11 trinucleotides have an A base at the extreme 3' end of the loop.
The trinucleotides at the 3' helix.5' loop interface (Figure 4(b)) are
significantly different. The three most abundant trinucleotides are BGA,
where B is not A: CGA (7.6 %), UGA (5.8 %), and GGA (5.4 %). The six most
frequent sequences have at least one adenosine base in the two unpaired
positions, with purines accounting for 11 of the 12 unpaired positions.
In addition to these biased distributions of triplets at loop/helix
junctions, Figure 4(a) and (b) also reveal that only 32 of the 64
possible triplets account for more than 80 % of these occurrences.
The most significant findings to this stage in our analysis are the high
percentages of: (1) unpaired adenosine bases, with adenosine residues
accounting for more than 50 % of the nucleotides at the 3' loop ends;
(2) paired guanosine bases, with guanosine accounting for nearly 50 % of
the nucleotides at the 5' end of helices; (3) unpaired consecutive
adenosine bases; and (4) AG at 3' loop.5' helix junctions.
Our next set of goals is to map these frequently occurring nucleotides
onto the 16 S and 23 S rRNA comparative structure models, to determine
those positions where the unpaired adenosine residue at the 3' end of the
loop occurs in more than 50 % of the bacterial sequences, and to identify
larger motifs that build onto these dominant adenosine bases. We
rationalize that 3' loop positions with an adenosine in more than 50 % of
the sequences (hereafter called the "A-motifs") are important for
Figure 3. Frequency and distribution of dinucleotides at loop-helix
junctions in bacterial 16 S and 23 S rRNAs comparative structure models.
Total number of occurrences of consecutive nucleotides at the two
loop-helix junctions, 3' helix.5' loop and 3' loop.5' helix.
the formation of conserved structural motifs. A total of 527 unpaired
positions in the 16 S and 23 S rRNAs are followed by a base-pair
predicted with covariation analysis. We expect, based upon the observed
nucleotide frequencies in the bacterial 16 S and 23 S rRNA sequences
(A, 25.7 %; C, 22.4 %; G, 31.4 %; U, 20.5 %), adenosine to occur at
25.7 % (135 occurrences) of these 3' loop ends for any one set of 16 S
and 23 S rRNA structures. We observe that, collectively, the positions
at the 3' loop ends contain 54.5 % adenosine bases. The two extreme cases
for the distribution of these adenosine bases among the 527 3' loop ends
are (1) the adenosine nucleotides are distributed evenly, so that each of
the loop ends contains 54.5 % adenosine; and (2) the adenosine
nucleotides are concentrated such that 287 of the loop ends contain 100 %
adenosine. In fact, 294 of the 527 3' loop ends have an adenosine base in
more than 50 % of the bacterial 16 S and 23 S rRNA sequences (Table 1);
the average conservation value for adenosine at these positions is
93.7 %. Therefore, there is a very pronounced bias for adenosines to be
very conserved at the 3' loop ends of the 16 S and 23 S rRNAs.
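The numbers in this argument follow from two multiplications, which can be checked with a few lines of Python. The variable names are illustrative only; the inputs are the values quoted in the text.

```python
# Values quoted in the text.
n_loop_ends = 527            # unpaired positions followed by a predicted base-pair
freq_a_overall = 0.257       # overall adenosine frequency in 16 S and 23 S rRNA
freq_a_at_loop_ends = 0.545  # observed adenosine frequency at the 3' loop ends
observed_majority_a = 294    # loop ends with A in more than 50 % of sequences

# Expected number of adenosine occurrences if A were placed at random.
expected_a = round(n_loop_ends * freq_a_overall)

# Extreme case (2): all adenosines concentrated at fully conserved positions.
concentrated_positions = round(n_loop_ends * freq_a_at_loop_ends)

# Ratio of observed majority-A positions to the random expectation.
enrichment = observed_majority_a / expected_a
```

The observed count of 294 sits very close to the concentrated extreme, and is a little more than twice the random expectation.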
Of the 294 3' loop ends with an adenosine base in more than 50 % of
bacterial sequences, 136 are followed by a paired G in more than 50 % of
those sequences (AG motif; Table 1). In contrast, we expect 43 of these
motifs in the 16 S and 23 S rRNAs, based on the observed nucleotide
frequencies (527*.257*.314). Finally, the number of AA and AAG motifs
observed is again more than the number expected for a random distribution
(Table 1). The distributions of the expected and observed A, AA, AG, AAG,
and AAG:U motifs in hairpin, multi-stem, internal, and bulge loops were
determined (Table 1). The number of observed A-motifs at each of the loop
motifs is (again) significantly larger than expected. (Note for the
following A-motifs (where each motif occurs in a minimum of 50 % of the
sequences): AAG, the G is not paired to a U in more than 33 % of the
sequences; AA, the nucleotide 3' of the second A is not a G in more than
50 % of the sequences; AG, the nucleotide 5' of the A is not an A in more
than 50 % of the sequences; A, the paired nucleotide following the A is
not a G in more than 50 % of the sequences and the nucleotide preceding
the A is not an A in more than 50 % of the sequences.)
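The category definitions in the note above can be read as a small decision procedure. The Python sketch below is one possible reading, not the procedure actually used for the tables: the function name, the choice of per-site frequencies as inputs, and the order in which the categories are tested are all assumptions. Only the 50 % and 33 % thresholds come from the text, and the example sites are hypothetical.

```python
def classify_a_motif(freq_a, freq_aa, freq_ag, freq_aag, freq_gu,
                     majority=0.50, gu_minimum=0.33):
    """Assign a 3' loop end to one A-motif category (illustrative reading).

    All arguments are fractions of aligned sequences (0.0 to 1.0):
      freq_a    A at the 3' loop end
      freq_aa   AA at the last two loop positions
      freq_ag   A at the loop end followed by a paired G
      freq_aag  AA followed by a paired G
      freq_gu   the following paired G is in a G:U base-pair
    Returns None when the site is not an A-motif.
    """
    if freq_a <= majority:
        return None
    if freq_aag > majority:
        return "AAG:U" if freq_gu > gu_minimum else "AAG"
    if freq_ag > majority:
        return "AG"
    if freq_aa > majority:
        return "AA"
    return "A"


# Hypothetical sites, not taken from the tables.
examples = {
    "site_1": classify_a_motif(1.00, 1.00, 0.55, 0.55, 0.45),
    "site_2": classify_a_motif(0.98, 0.98, 0.70, 0.70, 0.10),
    "site_3": classify_a_motif(1.00, 0.20, 0.90, 0.15, 0.00),
    "site_4": classify_a_motif(0.90, 0.80, 0.30, 0.20, 0.00),
    "site_5": classify_a_motif(0.95, 0.10, 0.20, 0.05, 0.00),
    "site_6": classify_a_motif(0.40, 0.10, 0.10, 0.05, 0.00),
}
```

Testing the most specific motif first makes the categories mutually exclusive, which matches the way each position appears in only one of Tables 2 to 6. Sites where both AA and AG exceed 50 % but AAG does not are an edge case that the note does not resolve; this sketch assigns them to AG.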
The A-motifs have been mapped onto the 16 S and 23 S rRNA secondary
structure models (Figure 5). Each of the five motifs is assigned a
different color: AAG:U motifs are indicated in red, AAG in green, AG in
blue, AA in orange, and A in yellow. Position numbers for the A-motifs in
the 16 S and 23 S rRNA are listed in Tables 2 (AAG:U), 3 (AAG), 4 (AG),
5 (AA), and 6 (A).
The loop-helix junctions listed in Table 2 have the AAG sequence present
in more than 50 % of the bacterial sequences, and G:U in more than 33 %
of the same sequence set. Thirteen 16 S and 23 S rRNA junctions satisfy
these criteria. The majority of these occur in internal loops (10), and a
few occur in bulge (2) and multi-stem (1) loops; three occur in 16 S
rRNA, and ten appear in 23 S rRNA (see Table 2 and Figure 5). The
majority of these are very well conserved, occurring with percentages
significantly higher than the required minimum. Seven have greater than
90 % AAG and 90 % G:U base-pair conservation; the average conservation
values are 81 % AAG and 77 % G:U.
The remaining 43 AAG loop-helix junctions are
listed in Table 3. These junctions are distributed
more evenly than the AAG:U A-motif in hairpin
(9), multi-stem (19), and internal (14) loops, with
one in a bulge loop; 15 occur in 16 S rRNA and 28
occur in 23 S rRNA (see Table 3 and Figure 5).
More than 75 % of the hairpin junctions are part of
a GNRA tetraloop. Over half (23) of these AAG
junctions are conserved in more than 90 % of the
sequences, with an average conservation value of
Figure 4. Frequency and distribution of consecutive trinucleotides at
loop-helix junctions in bacterial 16 S and 23 S rRNAs. The ranking of the
top 32 most frequent trinucleotides at the two loop-helix junctions,
3' helix.5' loop and 3' loop.5' helix. Two of the three consecutive
nucleotides are unpaired at both junctions. The paired nucleotides are
underlined. (a) 3' loop.5' helix junction. (b) 3' helix.5' loop junction.
86 %. The consecutive AA nucleotides are con-
served in approximately 93 % of the sequences.
AG loop-helix junctions are listed in Table 4. There are 80 examples of
this motif, with a significant proportion occurring in internal (26),
multi-stem (28), and hairpin (17) loops, and the remaining nine in bulge
loops; 23 occur in 16 S rRNA and 57 occur in 23 S rRNA (see Table 4 and
Figure 5). Almost 60 % of the AG motifs are conserved in more than 90 %
of the sequences, and 81 % of these motifs are conserved in more than
70 % of the sequences. Six of the hairpin loops are GNRA tetraloops;
seven other loops have unusually stable G:A mismatches between the first
and last nucleotides of the hairpin loop (Serra et al., 1994).
A total of 56 AA motifs (Table 5) occur predominantly in multi-stem
(24), internal (16), and hairpin (12) loops; four occur in bulge loops
(see Table 5 and Figure 5). Eighteen occur in 16 S rRNA and 39 occur in
23 S rRNA. Over 60 % of these motifs are conserved in more than 90 % of
the sequences. Table 5 also contains the most prevalent AAN sequence at
each motif site (where N is base-paired; sites having AAG > 50 % appear
in Tables 2 or 3). Nearly 50 % of the AA motifs in Table 5 are AAC.
Eight of the hairpin loops have unusually stable sequences, either GNRA
tetraloops (4) or G:A first mismatches (4) (Serra et al., 1994).
Figure 5. A-motifs mapped onto the Escherichia coli 16 S and 23 S rRNA comparative secondary structure models.
Unpaired positions at the 3' end of loops that occur in more than 50 % of the bacterial sequences are highlighted in
different colors: XAZ, yellow; AAZ, orange; XAG, blue; AAG, green; and AAG:U, red; where X is not A in more
than 50 % of the sequences, Z is not G in more than 50 % of the sequences, and paired nucleotides are underlined.
Diagrams were generated using the program XRNA (Weiser, B. & Noller, H., University of California at Santa Cruz).
(a) 16 S rRNA. (b) 23 S rRNA, 5' half. (c) 23 S rRNA, 3' half.
There are 102 A-motifs, with a significant number of occurrences in
multi-stem (38), internal (29), bulge (20), and hairpin (15) loops; 41
occur in 16 S and 61 occur in 23 S rRNA (see Table 6 and Figure 5). A
total of 77 % of the A motifs are conserved in more than 90 % of the
bacterial sequences, and 50 % are 100 % conserved in those sequences!
Discussion
Analysis of a large set of bacterial 16 S and 23 S rRNA covariation-based
comparative structure models has revealed a propensity for adenosine
bases to be unpaired. A disproportionate number of these unpaired
adenosine nucleotides are consecutive, at the 3' end of loops, and
adjacent to a paired G at the 3' loop.5' helix junction. The highly
conserved nature of the loop-helix junctions described here suggests that
they are an important part of several different motifs. Because they
occur so frequently, we believe that they are a major building block in
the 16 S and 23 S rRNA structures. Our goal is to transform these
sequence motifs into structural motifs that help coordinate
three-dimensional structure. We have named the adenosine bases that occur
at the 3' end of loops in more than 50 % of the bacterial 16 S and 23 S
rRNA sequences A-motifs. These are associated with several known
structural motifs and are classified into five categories: AAG:U, AAG,
AG, AA, and A.
Adenosine platforms
The first set of loop-helix junctions to consider is those with an AAG:U
motif (Table 2 and Figure 5). Thirteen positions in the 16 S and 23 S
rRNA contain the AAG sequence conserved in more than 50 % of the
sequences (see Table 2) and the G:U base-pair conserved in more than
33 % of the sequences (16 S positions 415, 432, and 1289; 23 S positions
14, 706, 1214, 1470, 1854, 1877, 1890, 2135, 2542, 2851). Seven of these
sites (23 S positions 14, 706, 1214, 1877, 1890, 2542, and 2851) are
conserved in more than 90 % of the sequences.
This complex sequence motif forms the adeno-
sine platform present in the crystal structure of the
Tetrahymena thermophila group I intron P4-P6
domain (Cate et al., 1996a,b). To ascertain if the
adenosine platform-like sequence motifs in the
16 S and 23 S rRNA are capable of forming the
Table 1. Characterization of nucleotides at loop-helix junctions for loops with unpaired 5' nucleotides in 16 S and
23 S rRNA
Loop type               Total  A           AA          AG          AAG        AAG:U
Total       Measured    527    294 (56 %)  113 (21 %)  136 (26 %)  56 (11 %)  13 (2 %)
            Predicted   –      135 (26 %)  35 (7 %)    43 (8 %)    11 (2 %)   2 (1 %)
Hairpin     Measured    91     53 (58 %)   21 (23 %)   26 (29 %)   9 (10 %)   0 (–)
            Predicted   –      24 (25 %)   6 (6 %)     8 (8 %)     2 (2 %)    0 (–)
Multi-stem  Measured    202    110 (54 %)  45 (22 %)   48 (24 %)   20 (10 %)  1 (1 %)
            Predicted   –      51 (26 %)   13 (7 %)    16 (8 %)    4 (2 %)    1 (1 %)
Internal    Measured    163    95 (58 %)   40 (25 %)   50 (31 %)   24 (15 %)  10 (6 %)
            Predicted   –      42 (26 %)   11 (7 %)    13 (8 %)    3 (2 %)    1 (1 %)
Bulge       Measured    71     36 (51 %)   7 (10 %)    12 (17 %)   3 (4 %)    2 (3 %)
            Predicted   –      18 (25 %)   5 (7 %)     6 (8 %)     1 (1 %)    0 (–)
Junctions were counted if an A-motif occurred in greater than 50 % (33 % for AAG:U) of the sequences in the bacterial 16 S and
23 S rRNA alignments (http://www.rna.icmb.utexas.edu/). Predicted values were calculated with nucleotide frequencies: A
(25.7 %), G (31.4 %), and U (20.5 %); values are rounded to the nearest whole number. Percentages are calculated with respect to the
total number of positions for that loop type; values are rounded to the nearest whole number, with "–" used to represent zero.
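The "Predicted" entries in the Total row of Table 1 can be approximated by multiplying the number of positions by the relevant single-nucleotide frequencies. The text confirms this form for the AG motif (527*.257*.314); extending it to the other motifs, as in the Python sketch below, is an assumption, and the function name is illustrative.

```python
# Nucleotide frequencies quoted in the Table 1 footnote.
FREQ = {"A": 0.257, "G": 0.314, "U": 0.205}


def predicted_counts(n_positions, freq=FREQ):
    """Expected motif counts under independent, random nucleotide placement."""
    a, g, u = freq["A"], freq["G"], freq["U"]
    probabilities = {
        "A": a,
        "AA": a * a,
        "AG": a * g,
        "AAG": a * a * g,
        "AAG:U": a * a * g * u,
    }
    return {motif: n_positions * p for motif, p in probabilities.items()}


# Total row: 527 unpaired positions followed by a predicted base-pair.
total_row = {motif: round(v) for motif, v in predicted_counts(527).items()}
```

For the Total row this simple product reproduces the published predictions. For the individual loop types it can differ from the table by one in a few cells (for example, 91 hairpin positions give about 23 for A, against 24 in the table), so the published values may have been derived slightly differently, in a way the footnote does not state.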
Table 2. A-motif: AAG:U sites in 16 S and 23 S rRNA
Position a  AA (%) b  AAG (%) b  G:U (%) b  Predicted structure motifs c
A. Multi-stem loops
23 S rRNA
14          99        99         98         P
B. Internal loops
16 S rRNA
415         76        75         59         EL, P
432         100       55         45         GA, P
1289        100       55         55         A, P
23 S rRNA
706         100       94         94         A, P
1214        97        97         97         A, P
1470        86        81         76         GA, P
1854        100       54         39         GA, P
1877        98        98         98         P
1890        100       100        100        P
2135 d      86        48         46         P
C. Bulge loops
23 S rRNA
2542        100       100        99         P
2851        93        91         91         P
rRNA positions have an AAG:U motif in more than 33 % of
the bacterial sequences and are indicated in red on Figure 5.
a The position number is the nucleotide at the 3' loop end,
at the loop-helix junction.
b More detailed information is available at
http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c A, AA.AG@helix.ends; EL, E-like Loop; GA, tandem G:A
base-pairs; P, adenosine platform (see Discussion).
d Although this site contains less than 50 % AAG, it was
included because it contains more than 33 % G:U and narrowly
missed the required minimum for AAG.
adenosine platform structural motif, we have ana-
lyzed the group I intron adenosine platforms from
a comparative sequence perspective. The crystal
structure of the P4-P6 domain of the group I intron
has three adenosine platforms at positions 172,
219, and 226 (numbers refer to the second A of
the AAG motif for the T. thermophila sequence
(GenBank Accession # J01235)). Each of the three
adenosine platforms occurs in a distinct structural
environment in the comparative secondary struc-
Table 3. A-motif: AAG sites in 16 S and 23 S rRNA
Position a  AA (%) b  AAG (%) b  Predicted structure motifs c  Loop d
16 S rRNA
383 98 70 A GNRA
901 100 97 A, U GNRA
23 S rRNA
311 91 84 U 6
633 100 77 U GNRA
1226 62 52 A, U GNRA
1810 95 88 A GNRA
1872 70 65 GNRA
1928 100 100 U 3
2361 62 55 6
B. Internal loops
16 S rRNA
1333 100 99 A
1434 98 94
1469 54 54
1493 99 99 A
1503 100 100
23 S rRNA
609 100 68 A
1001 99 99 A, GA
1156 98 85
1354 100 99 A, GA, U
1572 92 83 A, GA, U
1580 88 86 GA
1701 100 99 A, EL
2469 96 96 A, GA
2810 83 83 A
C. Multi-stem loops
16 S rRNA
60 98 98 A, GA
197 99 93 A
499 99 98
574 99 98
768 97 96 EL
873 100 89
915 100 85
938 100 99
23 S rRNA
423 100 93
472 94 94
603 53 53 A, GA
1010 100 53
1029 100 65 A, GA
1308 99 99
1641 86 85 A
2336 100 99
2378 100 96 A, U
2412 93 85
2566 100 100 A
D. Bulge loops
23 S rRNA
1848 100 96
rRNA positions have an AAG motif in more than 50 % of the bacterial sequences and are indicated in green on Figure 5.
a The position number is the nucleotide at the 3' loop end, at the loop-helix junction.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
d Hairpin loop size (in nucleotides) and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of
the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
ture model (Michel & Dujon, 1983; Michel &
Westhof, 1990) and the three-dimensional crystal structure (Cate et al.,
1996b): a hairpin loop at position 172, a symmetric 3 × 3 internal loop
at position 219 (where 3 × 3 refers to the number of nucleotides on each
side of the internal loop), and an asymmetric 3 × 2 internal loop at
position 226. They also differ with regard to the type of tertiary
interactions with which they are associated. The adenosine platform at
position 226 is part of the tetraloop receptor (Murphy & Cech, 1994;
Cate et al., 1996b) that makes an intramolecular contact with a tetraloop
at position 150, one of the interactions responsible for aligning the two
Table 4. A-motif: AG sites in 16 S and 23 S rRNA
Position a  A (%) b  AG (%) b  Predicted structure motifs c  Loop d
A. Hairpin loops
16 S rRNA
300 100 100 A, U GNRA
1080 100 90 A, U GNRA
1269 100 72 A, U GNRA
23 S rRNA
167 100 99 9*
251 100 98 A 5*
322 100 71 3
466 100 79 A, U GNRA
492 99 75 5*
646 87 86 5*
1073 100 100 U 9
1098 99 99 A, U 6*
1618 98 95 A 6*
1755 100 73 3
2147 95 95 4*
2534 54 53 6
2598 100 100 A, U GNRA
2662 # 100 100 A, U GNRA
B. Multi-stem loops
16 S rRNA
8 98 98
26‡ 100 99 A
288 100 92
353 98 98 A
523‡ 100 99
828 80 71
860 96 88 A
1046‡ 100 99
1067 100 100 A, U
23 S rRNA
177‡ 59 58 A
324‡ 73 55
332 100 88
374 100 67 A, GA, E
532‡ 65 61
627 99 98 A, GA
655 98 98 A, GA
699‡ 99 95 A
945 99 76 A
975 99 99 A
1189 100 99 A, GA, E
1342 100 100 U
1791 100 98 A
1932 100 100 A, GA, EL
2119 100 100
2126 100 100 A, GA
2587 100 83 A, U
2629 63 57
C. Internal loops
16 S rRNA
246‡ 100 100 A
520 100 100 A
665 70 67
687 100 97 A
802 100 99 A, EL
1252 72 68
1275 93 92
1418 100 98 A, GA
1456‡ 82 73
23 S rRNA
84 100 98
244 100 99 A, GA, E
294‡ 100 88 A
861 100 96 A, GA, E
878 86 73
1111 100 100
1237 100 82
1268 100 65 A, GA, E
1373‡ 100 91 EL
1434 78 58
1439 90 56
1477 92 88 A, GA, EL
1866 99 90 A, GA
2158 100 99
2298‡ 91 67
2320 60 51
2388‡ 100 100
2639 100 78 A, GA
D. Bulge loops
16 S rRNA
583‡ 100 100
777 100 96
23 S rRNA
213 100 100
764‡ 100 60
941‡ 100 99
1205‡ 76 67
1490‡ 97 96
1586 90 79
2602‡ 100 100
rRNA positions have an AG motif in more than 50 % of the bacterial sequences and are indicated in blue on Figure 5.
a The position number is the nucleotide at the 3′ loop end, at the loop-helix junction; ‡, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
d Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
coaxially stacked helices of the P4-P6 domain. The
other two adenosine platforms form intermolecular
crystal contacts, whose physiological significance is
uncertain. We will focus on the two internal loops
at positions 219 and 226, since ten of the 13 adenosine
platform candidates in 16 S and 23 S rRNA
occur in internal loops (two occur in bulge loops,
and the last occurs in a multi-stem loop; Table 2
and Figure 5). The adenosine platform in the hairpin
loop at position 172 of the P4-P6 domain will
not be considered here, in part because it is also
involved in an intramolecular crystal interaction that
is not physiological.
The P4-P6 domain, as represented by the
T. thermophila crystal structure, is only present in
the C1 and C2 subgroups of the group I introns
(Michel & Westhof, 1990; Damberger & Gutell,
1994). To ensure that we are comparing similar
structural elements, we analyzed only those C1
sequences that have the same number of nucleotides
as T. thermophila at the positions involved
in the two adenosine platforms. Only 110 of the
319 sequences in the group C1 intron alignment
have a symmetric 3 × 3 internal loop at position
219 in our sequence alignments and data set.
Table 7 reveals the high degree of conservation
of the two adenosine residues 5′ of the loop-helix
junction; 98 % of the sequences have an A
residue at positions 218 and 219. Position G220
and its pairing partner U253 are each conserved
in approximately 70 % of the sequences, while
the G:U base-pair occurs in less than
60 % of the sequences. The second most common
base-pair is C:G, followed by A:U and G:C. In
Table 5. A-motif: AA sites in 16 S and 23 S rRNA
Position a   AA (%) b   Sequence c   Predicted structure motifs d   Loop e
A. Hairpin loops
16 S rRNA
162 99 AAC A, U GNRA
622 99 AAC U 5
696 100 AAU A, U 6*
1170 97 AAA 5*
1519 97 AAG A GNRA
23 S rRNA
127 100 AAC A GNRA
390 72 AAA 7
752 92 AAA U 8*
1085 100 AAA U 3
1367 66 AAG GNRA
1635 55 AAU A 5*
2311 84 AAU 7
B. Internal loops
16 S rRNA
374 100 AAU A
449 52 AAG E
676 100 AAU A, GA
782 100 AAC A, EL
909 100 AAC A, E
1447 94 AAC
23 S rRNA
257 60 AAG E
346 89 AAA
515 100 AAC U
677 82 AAC
901 60 AAC
911 100 AAC
1143 100 AAA
1322 71 AAG
1655 99 AAC A
2015 90 AAU
2741 100 AAC A, GA, U
C. Multi-stem loops
16 S rRNA
120 99 AAC U
510 99 AAC
959 100 AAU A, GA
1005 51 AAU
23 S rRNA
182 56 AAC A
218 61 AAA
223 94 AAU A, GA, U
300 99 AAC EL
429 98 AAA
483 58 AAC A, U
735 99 AAC
793 61 AAA A, GA
821 100 AAU U
1275 99 AAA
1302 68 AAG
1610 100 AAC
1786 100 AAA
1978 100 AAC A
2199 100 AAC A, GA, U
2287 65 AAA A, GA
2426 98 AAC U
2433 100 AAA U
2734 50 AAG
D. Bulge loops
16 S rRNA
51 87 AAC
72 58 AGC
642 51 AAC
23 S rRNA
1900 89 AAA
rRNA positions have an AA motif in more than 50 % of the bacterial sequences and are indicated in orange in Figure 5.
a The position number is the nucleotide at the 3′ loop end, at the loop-helix junction.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c Most prevalent loop-helix sequence.
d A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
e Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
Table 6. A-motif: A sites in 16 S and 23 S rRNA
Position a   A (%) b   Predicted structure motifs c   Loop d
A. Hairpin Loops
16 S rRNA
845 61 5
1016 94 A, U GNRA
1453 52 UNGG
23 S rRNA
199 100 4
548 59 4
574 76 U 8
616 75 5*
1176 62 4
1918 93 7
2478 100 A 7*
2705 99 4*
2757 100 A 11*
2799 56 3
2826 100 7
2860 96 A, U GNRA
B. Multi-stem loops
16 S rRNA
16 100 A
315‡ 100 A
338‡ 99 A
366 65
495 99
546‡ 51
864 100 U
983 100 A
994 100
1101 100
1157 100 A, GA
1191 100
1339 100
1349 100 A, GA, E
1398 100 A
23 S rRNA
52 99 A, GA
73 100
94 81
149‡ 95 A
233 100 GA
270 92
340 99 A, GA, EL
412 98
432 100
460 99 A, GA, E
670 100
990 100
1103‡ 100 A
1384 100
1603 99
1829 100
2042 84
2062 100
2171‡ 100 U
2173‡ 100 A, GA, U
2346 100 A, GA
2358 98 A
2835 100 A
C. Internal loops
16 S rRNA
151 100
174 94 A, GA
282 100 A
389‡ 100 A
482 98 A, GA
487 99 A, GA, E
535 100
715 100 A, GA
1306 100 A, GA
1408 99 A
1483 99 A, GA
1499 100
23 S rRNA
63‡ 56
91‡ 89
103 99 A
207 99 A, GA, E
1050 100
1419 95 A, GA
1664‡ 100
1689 100 A, GA, EL
1723 62
1745 53
1802 100 A, GA
1885‡ 98
2005‡ 85 A
2327‡ 100 A
2614 100
2657 # 100 A, GA, E
2690 68
D. Bulge loops
16 S rRNA
55‡ 100
65 94
130‡ 100
205 83
397‡ 100
595‡ 79 BT
1042‡ 55
1055 100
1196‡ 99
1227‡ 100
1394‡ 100
23 S rRNA
443‡ 100
739‡ 61 BT
896‡ 99
927‡ 89
1819 100
1981‡ 99
2051‡ 61
2873‡ 100
2879‡ 98
rRNA positions have an A motif in more than 50 % of the bacterial sequences and are indicated in yellow on Figure 5.
a The position number is the nucleotide at the 3′ loop end, at the loop-helix junction; ‡, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
b More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
c A, AA.AG@helix.ends; BT, base triple; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
d Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
addition, positions 219:254 do not form Watson-
Crick base-pairs.
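The single-position and base-pair percentages discussed here can be illustrated with a minimal sketch. It assumes an alignment held as a list of equal-length RNA strings with 0-based column indices; the function names and the toy alignment are hypothetical and are not the alignment or software used in this study.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"

def column_frequencies(alignment, col):
    """Percentage of each nucleotide in one alignment column (gaps ignored)."""
    counts = Counter(seq[col] for seq in alignment if seq[col] in NUCLEOTIDES)
    total = sum(counts.values())
    return {nt: 100.0 * n / total for nt, n in counts.items()} if total else {}

def pair_frequencies(alignment, col_i, col_j):
    """Percentage of each X:Y combination at two columns (gaps ignored)."""
    counts = Counter(
        seq[col_i] + ":" + seq[col_j]
        for seq in alignment
        if seq[col_i] in NUCLEOTIDES and seq[col_j] in NUCLEOTIDES
    )
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()} if total else {}

# Hypothetical four-sequence alignment; columns 2 and 3 are treated as partners.
toy = ["AAGU", "AAGU", "AACG", "AAAU"]
print(column_frequencies(toy, 0))
print(pair_frequencies(toy, 2, 3))
```

Reporting the pair percentage separately from the two single-column percentages matters because, as in the G220/U253 example, each position can be more conserved than the pair itself.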
A total of 139 of the 319 IC1 sequences had a
3 × 2 internal loop at position 226 (Table 7). As in
the previous example, adenosine bases are the
most frequent nucleotide at the two positions 5′ of
the loop-helix junction; however, the frequencies of
these two adenosine bases are not as high. One-quarter
of the sequences have a C base in place of
the adenosine at position 226, which is consistent
with previous sequence analysis and in vitro selection
experiments (Costa & Michel, 1997). The G at
position 227 and the G:U base-pair at positions
227:247 are present in 65 % and 62 % of the
sequences, respectively. One of the most conserved
features of the 226 adenosine platform is the
U224:A248 reverse Hoogsteen base-pair, which
occurs in 87 % of the sequences. While all four
nucleotides are observed at the bulge at position
249, 88 % of the sequences have a pyrimidine there; in
the P4-P6 crystal structure (Cate et al., 1996a), this
position is involved in the tertiary interactions
with the tetraloop at position 150 and can potentially
form a hydrogen bond to A226 (Costa &
Michel, 1997).
The adenosine platform at position 226 in the
P4-P6 domain crystal structure widens the minor
groove of the RNA helix to allow tertiary contact
with the tetraloop at positions 150-153. The tetraloop
receptor in the absence of bound tetraloop
assumes an alternate structure, with the adenosine
forming a cross-strand stack (Butcher et al., 1997).
The adenosine bases, rather than forming the side-by-side
arrangement observed in the crystal structure,
are arranged in a stacked zipper-like arrangement.
In addition, the first adenosine nucleotides
of the two platforms (218 and 225) become susceptible
to methylation by dimethylsulfate when the
tetraloop-receptor interaction is disrupted by
mutation (Murphy & Cech, 1994). Thus, the adenosine
platform motif appears to have both conformational
and sequence plasticity. The majority of
the IC1 sequences with the same internal loop configuration
as the Tetrahymena group I intron (see
above) have an adenosine and purine juxtaposed
and adjacent to the G:U base-pair (positions 219
and 254, and 226 and 248; see Table 7).
The most conserved features of the two group I
intron adenosine platforms that occur at internal
loops are the two consecutive adenosines at the 3′
end of the loop. The paired G at the 3′ loop.5′ helix
junction and the G:U base-pair are also moderately
conserved. Since the majority of the 16 S and 23 S
rRNA adenosine platform candidates are more
conserved at these four positions than the two
known intron adenosine platforms, it is reasonable
to expect this motif to occur at the majority (if not
all) of the 16 S and 23 S rRNA AAG:U sequence
motifs listed in Table 2. Also note that the majority
(77 %) of the rRNA platform candidates occur in
internal loops (Table 2 and Figure 5). Most of our
16 S and 23 S rRNA adenosine platform candidates
also have an adenosine and purine juxtaposed and
adjacent to the G:U base-pair facing the loop
(Gautheret et al., 1995b), similar to the two intron
adenosine platforms; the most notable exception is
the junction at position 1890 in the 23 S rRNA,
where a highly conserved (97 %) uridine at position
1852 is opposite the first A at position 1890. Two
sets of rRNA adenosine platform candidates (16 S
rRNA positions 415 and 432, and 23 S rRNA positions
1854 and 1890) occur at the two opposing
ends of the same internal loop. The structural and
functional significance of this tight clustering of
adenosine platforms is currently unknown. We
wonder if these two potential adenosine platform
Table 7. Base composition of adenosine platforms in group IC1 introns
[Table body not recovered in this copy. Column headings: Percentage a (A, C, G, U, listed twice); Pairing b; Structure c.]
a Percentages were determined as described in the text. Only percentages greater than 1 % are shown.
b Base-pairing occurring in more than 5 % of the sequences examined.
c Partial secondary structure of the Tetrahymena thermophila IC1 intron (GenBank #J01235). The complete structure is available at http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
d Indicates base present in the P4-P6 subdomain of Tetrahymena thermophila.
motifs form simultaneously, or perhaps alternate
in formation during protein biosynthesis. Additionally,
six of the putative adenosine platforms in
Table 2 overlap with other A-motifs; e.g. 16 S
rRNA position 1289 is part of the adenosine platform
and the AA.AG@helix.ends motif (Elgavish et al.,
unpublished results), and 16 S rRNA position 415 is
part of the adenosine platform and the E-like loop
motif (see below). The A-motifs that are associated
with adenosine platforms are noted in Table 2.
E and E-like loops
Comparative sequence analysis has identified
potential E loop motifs (Varani et al., 1989;
Wimberly et al., 1993) in both 16 S and 23 S rRNA
(Gautheret et al., 1994; Wimberly, 1994; Leontis &
Westhof, 1998). Thirteen dominant A sites in
Tables 2-6 overlap with eleven E loops; each occurrence
is indicated in these Tables. Two 16 S and
eight 23 S rRNA loop E motifs were predicted earlier.
The 16 S rRNA positions are 909 (Table 5) and
1349 (Table 6); the 23 S rRNA positions are 207
(Table 6), 244 (Table 4), 374 (Table 4), 460 (Table 6),
674, 1189 (Table 4), 1268 (Table 4), and 2657
(Table 6). Our analysis identified all of these except
for position 674 in 23 S rRNA. This E loop motif
overlapped with two positions (674 and 806) that
are now base-paired in our covariation structure
model (comparative support shown in base-pair
frequency tables at the CRW Site; see Materials
and Methods) but were unpaired at the time that
the E loop was proposed (Leontis & Westhof,
1998). Therefore, we do not consider this putative E
loop to be valid.
Our analysis of dominant A positions has also
revealed two new E loop sequence motifs. The first
is at positions 447-449 and 484-487 in 16 S rRNA,
with both positions 449 and 487 containing a dominant
A. This potential E loop motif is at the center
of an elongated and irregular compound helix.
This motif is flanked on one side by a helix and on
the other by a lone pair (450:483, E. coli numbering).
A tandem G:A base-pair is on the other side
of this lone pair. The second new E loop sequence
motif is in the 23 S rRNA at positions 858-861 and
916-918. The nucleotides in this motif were paired
in the older versions of the 23 S rRNA secondary
structure model, thus preventing its detection until
now. The previous base-pairs were removed from
the current structure model since the variations at
the individual positions were not matched by a
similar pattern of variation at the partner positions.
Our analysis of the dominant A bases at the 3′
end of loops has also revealed a sequence motif
that is similar to but not identical with the E loop
motif. The canonical E loop motif has an asymmetric
4 × 3 internal loop, as shown in Figure 6(a).
For sequences 5′-NGUAP-3′ and 5′-QGAA-3′, P
and Q (positions 5 and 6) are base-paired, with
unusual pairing conformations between positions 1
and 9, 3 and 8, and 4 and 7 (Figure 6(a)). In contrast,
our E-like loop motif, as we like to call it,
also contains the two sequences 5′-NGUAP-3′ and
5′-QGAAZ-3′ (Figure 6(b)). Here again, P and Q
(positions 5 and 6) and N and Z (positions 1 and
10) form two canonical base-pairs, leaving the 5′-GUA-3′
in sequence 1 juxtaposed with the 5′-GAA-3′
in sequence 2. Presumably three additional pairings
are formed: G:A (2 and 9), U:A (3 and 8), and
A:G (4 and 7). The conformations for the second
and third pairings, U:A and A:G, are related to the
G:A type II tandems as described by Gautheret
et al. (1994). Here, the invariant U:A base-pair is
thought to adopt the reverse Hoogsteen conformation,
adjacent to a sheared A:G base-pair, resulting
in the two adenosine bases protruding into the
minor groove and overwinding the helix. This
arrangement of nucleotides is present in the bacterial
version of the 5 S rRNA E loop, and is called
the cross-strand A stack (Correll et al., 1997). Possibly
the first sheared A:G base-pair (positions 2
and 9 in Figure 6(b)) underwinds the helix and
returns it to register. Eight E-like loop motifs are
present in the conserved core of the 16 S and 23 S
rRNAs and contain eleven dominant A sites. Three
of these motifs occur at positions 413-415/428-430,
765-767/812-814, and 780-782/800-802 in the 16 S
rRNA; five more occur in the 23 S rRNA at
positions 298-300/338-340, 1358-1360/1371-1373,
1475-1477/1514-1516, 1687-1689/1699-1701, and
1930-1932/1968-1970. Five of these E-like loops
occur in internal loops; three are present in multi-stem
loops. The A-motifs that are associated with
E and E-like loops are noted in Tables 2-6.
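The E-like loop sequence pattern described above can be written as a small test on two strands. This is only a sketch of the sequence criterion (5′-NGUAP-3′ opposite 5′-QGAAZ-3′, with P:Q and N:Z canonical); it assumes that "canonical" means the four Watson-Crick combinations, and the function name is hypothetical. It says nothing about whether the non-canonical pairings actually form.

```python
# Watson-Crick pairs only; whether G:U should also count is left as an assumption.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

def is_e_like_loop(strand1, strand2):
    """Check two 5-nt strands, both written 5'->3', against the E-like pattern.

    strand1 must read N-G-U-A-P and strand2 must read Q-G-A-A-Z, where
    P pairs with Q (positions 5 and 6) and N pairs with Z (positions 1 and 10).
    """
    if len(strand1) != 5 or len(strand2) != 5:
        return False
    n, core1, p = strand1[0], strand1[1:4], strand1[4]
    q, core2, z = strand2[0], strand2[1:4], strand2[4]
    return (
        core1 == "GUA"
        and core2 == "GAA"
        and (p, q) in WATSON_CRICK
        and (n, z) in WATSON_CRICK
    )

print(is_e_like_loop("CGUAG", "CGAAG"))  # closing pairs G:C and C:G
print(is_e_like_loop("CGUAG", "CGAAA"))  # N:Z would be C:A, not canonical
```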
AA.AG@helix.ends and tandem G:A base-pairs
Adenosine bases at the 3′ end of loops have also
been associated with G:A base-pairs at the end of
helices (Traub & Sussman, 1982; Woese et al.,
1983). Here, the helix is extended by at least one
G:A base-pair (for example, the sequences 5′-AGP-3′
and 5′-QCG-3′ interact to form A:G, G:C, and
P:Q base-pairs). G:A juxtapositions have been
Figure 6. Schematic of E and E-like loops. Nucleotides
are numbered for reference. Types of base-pairing are
indicated by lines: canonical pairings (G:C, A:U) have
thick, continuous lines, type II tandem G:A pairings
have thin, broken lines, and other non-canonical pairings
are shown with thick, broken lines. (a) Canonical E
loop, where positions 1-4 and 7-9 comprise the 4 × 3
internal loop. (b) E-like loop. Positions 2-4 and 7-9 comprise
the 3 × 3 internal loop.
shown to be energetically stable in one thermodynamic
study of bulge loops (Longfellow et al.,
1990). More recently, we have analyzed a large
number of 16 S and 23 S rRNA comparative structure
models and confirmed that many helices do
close with a G:A juxtaposition (Elgavish et al.,
unpublished results). However, we also noted in
our comparative study that many of these juxtapositions
in E. coli are maintained in at least 90 % of
the sequences and found, in addition to the G:A
juxtapositions, that many helices are flanked by
A:A or A:A/G:A juxtapositions. Our studies
revealed a strong bias in the orientation for these
G:A base-pairs: A is always 5′ to the helix, while G
or A is 3′ to the helix. These observations are consistent
with the bias for unpaired adenosine bases
at the 3′ end of loops and for the high percentage
of unpaired G and A at the 5′ end of loops. Note
that some of these AA.AG@helix.ends are a
component of E and E-like loops and that GNRA
tetraloops (Woese et al., 1990) have the AA.AG@helix.ends
motif. A total of 116 A-motifs are associated
with AA.AG@helix.ends and are noted in
Tables 2-6.
Several of these A:A and G:A juxtapositions at
the 5′ end of helices are flanked on their 5′ side by
a second A:A or G:A pair. Tandem G:A and A:A
pairs in the 16 S and 23 S rRNA were identified earlier
(SantaLucia et al., 1990; Gautheret et al., 1994),
and can adopt a single structure conformation that
is consistent with their pattern of nucleotide
substitutions (Gautheret et al., 1994). We have
searched again for these tandem G:A/A:A motifs
in our newer 16 S and 23 S rRNA comparative
structure models and our larger collection of
comparative rRNA structure models. In addition to
the tandems identified earlier (Gautheret et al.,
1994), we have found 23 new tandems that
are conserved in at least 90 % of the bacterial 16 S
and 23 S rRNA sequences. Fifty A-motifs are
associated with G:A tandems, and they are noted
in Tables 2-6.
U-turns
The U-turn, a structure motif characterized by a
sharp turn in the RNA, was first identified in the
tRNA crystal structure (Quigley & Rich, 1976), and
subsequently has been found in several other
RNAs (Pley et al., 1994; Jucker & Pardi, 1995;
Huang et al., 1996; Fountain et al., 1996; Conn et al.,
1999; Culver et al. 1999; Stallings & Moore, 1997;
Puglisi & Puglisi, 1998).
Dominant A nucleotides at the 3′ end of 16 S
and 23 S rRNA loops are also found in some of the
tetra- and hexanucleotide hairpin loops that form
U-turns (Woese et al., 1990; Jucker & Pardi, 1995;
Huang et al., 1996; Fountain et al., 1996). In both of
these loop motifs, a base-pair forms between the
guanosine at the first position of the hairpin loop
(and 3′ to the helix), and the adenosine at the last
position of the loop (and 5′ to the helix). Recently,
we have predicted, based on the analysis of many
comparative structure models, 57 positions in the
16 S and 23 S rRNA where the U-turn motif might
occur (Gutell et al., 2000). The 39 U-turn candidates
that are coincident with A-motifs are noted in
Tables 2-6. Of these, 22 occur in hairpin loops; 13
(59 %) of these are GNRA tetraloops. The remaining
17 occur in internal loops and multi-stem
loops.
Concluding comments
Of the 527 positions at the 3′ end of loops in the
16 S and 23 S rRNA, nearly 300 are occupied by
a dominant A, an adenosine that occurs in more
than 50 % of the bacterial sequences. Larger
sequence motifs that occur frequently are built
onto these A-motifs. There are 102 A, 56 AA, 80
AG, 43 AAG, and 13 AAG:U A-motifs. A total of
51 % of these sites are part of a known structural
motif (Table 8(a)). Of these, 39 % of the A-motifs
are associated with the AA.AG@helix.ends motif;
14 % of these are within GNRA tetraloops. Tandem
G:A pairs and U-turns are also common, occurring
at 17 % and 14 % of the A-motif sites, respectively.
There are smaller percentages of adenosine
platforms (4 %) and E loop (4 %) and E-like loop
(4 %) sequence motifs (Table 8(a)).
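As a quick consistency check on the numbers in this paragraph, the short sketch below sums the five A-motif counts and recomputes the quoted percentages from the counts given in Table 8(a). The variable names are illustrative only; all counts are taken from the text and Table 8.

```python
# A-motif counts quoted in the text.
a_motifs = {"A": 102, "AA": 56, "AG": 80, "AAG": 43, "AAG:U": 13}
total_dominant_a = sum(a_motifs.values())

# Structural-motif counts from Table 8(a), "Total" column.
structural = {
    "adenosine platforms": 13,
    "E loops": 12,
    "E-like loops": 11,
    "AA.AG@helix.ends": 116,
    "tandem G:A": 50,
    "U-turns": 40,
}

def percent(part, whole):
    """Whole-number percentage, as reported in Table 8."""
    return round(100 * part / whole)

print(total_dominant_a)
for name, count in structural.items():
    print(name, percent(count, total_dominant_a))

# Share of AA.AG@helix.ends sites that lie within GNRA tetraloops (16 of 116).
print(percent(16, structural["AA.AG@helix.ends"]))
```

The sum of the five categories equals the 294 dominant A positions listed in Table 8, which is the "nearly 300" of the 527 loop 3′ ends mentioned above.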
Some of these structural motifs are part of a lar-
ger structural element. For example, some of the
AA.AG@helix.ends motifs are within the bound-
aries of E and E-like loops, the tandem G:A motif,
and GNRA tetraloops. Some of these GNRA hair-
pin loops are themselves involved in larger tertiary
folds (Jaeger et al., 1994; Costa & Michel, 1995;
Cate et al., 1996b). Other A-motifs are associated
with more than one structural motif in which one
motif is not entirely contained within the other.
Here, the structural motifs involve positions that
are not utilized by the other, except for the dominant
A at the 3′ end of the loop. For example, position
415 in 16 S rRNA is part of the E-like loop
and adenosine platform motifs. Two examples
where a single dominant A is part of both an ade-
nosine platform and a G:A tandem are at 16 S
rRNA position 432 and position 1854 in 23 S
rRNA. Although our understanding of RNA struc-
tural motifs is not complete, these overlapping and
possibly competing structural A-motifs suggest
that these junctions of the RNA might be under-
going conformational changes. In total, only one
structural motif occurs at 51 % of the A-motifs that
are associated with a known structural motif
(Table 8). A total of 37 % are associated with two
structural motifs, and 13 % are associated with
three structural motifs.
In contrast, we are unable to predict the struc-
ture conformation for 49 % of the A-motifs. There-
fore, there is the possibility that new structural
motifs occur at these positions. Alternatively, struc-
tural motifs that we are already familiar with occur
at these A-motifs with a composition and arrange-
ment of nucleotides that were not previously
associated with that motif (for example, adenosine
platforms occur at positions with sequences other
than AAG:U). To help resolve this issue, the con-
formations of these adenosine bases in the 30 S
and 50 S ribosomal subunit crystal structures (Ban
et al., 2000; Schluenzen et al., 2000; Wimberly et al.,
2000) need to be analyzed. Some 8 % of the
A-motifs are single bulge adenosine nucleotides;
while the structural significance is not known for all of
them, covariation analysis and NMR have
revealed a base-triple in 16 S rRNA between a
bulged A at position 595 and the base-pair at
596:644 (CRW Site; Kalurachchi et al., 1997).
Although the thermodynamic consequences of
the unpaired adenosine bases identified here in the
covariation-based structure models are not known,
an earlier thermodynamic study of internal loops
revealed that unpaired adenosine bases in asymmetrical
loops are more destabilizing than those in
symmetrical loops (Peritz et al., 1991). The three
sets of results, (1) this thermodynamic study; (2)
the preponderance of adenosine bases in unpaired
regions of the covariation-based structure model,
with the majority of these occurring in asymmetrical
loops; and (3) the structural studies that reveal
that the majority of these unpaired adenosine
nucleotides are base-paired, albeit in an irregular
manner (Cate et al., 1996a,b; Ban et al., 2000;
Schluenzen et al., 2000; Wimberly et al., 2000), may
all be coordinated and influence RNA folding. We
speculate that these destabilizing, asymmetrically
placed adenosine nucleotides are a significant component
in the transition from secondary to tertiary
RNA structure. The destabilizing effects of these
adenosines on secondary structure, coupled with
the need for an RNA molecule to adopt its minimal
energetic state, suggest that these abundant adenosine
nucleotides will actively seek out energetically
stabilizing tertiary interactions and, in the process,
form a three-dimensional RNA molecule.
The propensity for conserved and unpaired adenosine
bases in the 16 S and 23 S rRNA covariation
structure models must be related to the structure
and function of the ribosome. As stated earlier,
unpaired positions in the covariation structure
model do not imply that those positions are not
paired; they indicate only that these positions do not
pair in the regular manner that most covariation-based
base-pairs do. Given that other unpaired positions
are paired, albeit irregularly, in other RNA
molecules whose structures have been solved by
crystallography or NMR (e.g. adenosine platforms,
E loops), we anticipate these unpaired positions in
the 16 S and 23 S rRNA covariation structure
models to be paired. We now wonder if these unusual
pairings can be predicted with comparative
analysis. Our A story is a beginning towards this
end.
As noted, the A-motifs come in various forms,
i.e. A, AA, AG, AAG, and AAG:U, and these are
associated with several known structural motifs.
These observations suggest that unpaired adeno-
sine bases can form a variety of different structural
conformations. What is special about adenosine
that lends itself to participating in these structural
motifs? And in some situations, it appears as
though at least two different structural elements
can occur at the same A-motif. Does one structural
motif predominate at these positions, or do these
sites provide the ribosome with an opportunity to
alternate conformations during the ribosome cycle?
Is the prevalence of adenosine bases at these pos-
itions related to the ability of adenosine to accom-
modate a variety of binding partners, perhaps its
base stacking potential, or other interesting interactions?
The A story is not finished.
Table 8. Summary of dominant A nucleotides and related motifs (based upon Tables 1-6)
A. Occurrences of motifs at dominant A positions
Category 16 S rRNA 23 S rRNA Total
1 # of adenosine platforms 3 (3 %) 10 (5 %) 13 (4 %)
2 # of E loops 4 (4 %) 8 (4 %) 12 (4 %)
3 # of E-like loops 4 (4 %) 7 (4 %) 11 (4 %)
4 # of AA.AG@helix.ends 44 (44 %) 72 (37 %) 116 (39 %)
4a # of AA.AG@helix.ends in GNRA tetraloops 8 (8 %) 8 (4 %) 16 (5 %)
4b # of other AA.AG@helix.ends 36 (36 %) 64 (33 %) 100 (34 %)
5 # of tandem GA's 13 (13 %) 37 (19 %) 50 (17 %)
6 # of U-turns 11 (11 %) 29 (15 %) 40 (14 %)
7 # of single bulges 9 (9 %) 14 (7 %) 23 (8 %)
8 Total # of dominant A bases associated with motifs (1-6) a 51 (51 %) 98 (51 %) 149 (51 %)
9 # of dominant A bases not associated with motifs (1-6) 49 (49 %) 96 (49 %) 145 (49 %)
10 Total # of dominant A bases at 3′ ends of loops (8 + 9) 100 194 294
B. Number of motifs per dominant A nucleotide (not including single bulges)
Motifs 16 S rRNA 23 S rRNA Total
1 25 (49 %) 51 (52 %) 76 (51 %)
2 24 (47 %) 30 (31 %) 54 (36 %)
3 2 (4 %) 17 (17 %) 19 (13 %)
Total # of dominant A bases 51 98 149
Total # of associated motifs 79 162 241
Average # of associated motifs per dominant A position with an associated motif 1.5 1.7 1.6
a A single dominant A may be associated with 1-3 motifs.
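The totals and averages in part B of Table 8 follow directly from the per-row counts. The sketch below, with illustrative variable names, recomputes them: each dominant A position with k associated motifs contributes k to the motif total, and the average is the motif total divided by the number of positions.

```python
# Table 8(B): number of dominant A positions carrying 1, 2 or 3 motifs.
positions_by_motif_count = {
    "16 S rRNA": {1: 25, 2: 24, 3: 2},
    "23 S rRNA": {1: 51, 2: 30, 3: 17},
}

def summarize(counts):
    """Return (number of positions, number of motifs, average motifs per position)."""
    n_positions = sum(counts.values())
    n_motifs = sum(k * n for k, n in counts.items())
    return n_positions, n_motifs, round(n_motifs / n_positions, 1)

# Combine the two rRNAs to reproduce the "Total" column.
combined = {
    k: sum(per_rna[k] for per_rna in positions_by_motif_count.values())
    for k in (1, 2, 3)
}

for label, counts in positions_by_motif_count.items():
    print(label, summarize(counts))
print("Total", summarize(combined))
```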
Materials and Methods
Additional supporting data are presented at the CRW
Site (http://www.rna.icmb.utexas.edu) and the CRW A
Story pages (http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/).
The CRW A Story information
supplements the data presented in Figures 1-4 and
Tables 1-8 and is divided into four categories: general
data; position-specific data; structure diagrams; and
manuscript materials. The general data (GE) section contains
generalized counts for the number and frequency
of different A-motifs in the 16 S and 23 S rRNA comparative
structure models from (1) the bacteria (summarized
in Figures 1-4) and (2) the archaea and eucarya
(nuclear, chloroplast, and mitochondria), together with
(3) an A-motif analysis of the comparative structure
models from 5 S rRNA and group I introns. The
position-specific data
(PS) section presents frequency tables for all of the 16 S
and 23 S rRNA positions which contain an A-motif (with
data from the three phylogenetic domains, chloroplasts,
and mitochondria); larger motifs (adenosine platforms,
E and E-like loops, AA.AG@helix.ends, tandem G:A
pairings, and U-turns) that map onto the A-motifs are
identified. Frequency tables for E and E-like loops
(including only bacterial data) are also provided here.
The structure diagrams (SD) section contains Figure 5
and includes secondary structure diagrams for each of
the motifs examined here. The manuscript
materials (MS) section contains all of the Figures and
Tables from this manuscript.
The RNA sequence alignments used for this analysis
are maintained by us at the University of Texas (R.R.G.,
unpublished results; CRW Site). Sequences were manu-
ally aligned with the alignment editor AE2 (T. Macke,
Scripps Research Institute, San Diego, CA). As of June
2000, the bacterial 16 S alignment contains 5859
sequences, and the bacterial 23 S alignment contains 327
sequences; both alignments use E. coli (GenBank Acces-
sion # J01695) as their reference sequence for position
numbers. The group I intron (C1 subclass) alignment
contains 319 sequences and uses T. thermophila (GenBank
Accession # J01235) as its reference sequence for position
numbers. Two subalignments of 110 and 139 sequences
having the appropriate arrangement of nucleotides at the
219 and 226 adenosine platform internal loops (see the
text) were created from this larger alignment. These
sequence alignments will be available from this site in
the future.
Secondary structure models for representatives of the
main phylogenetic groupings are inferred by comparative
sequence analysis (Gutell, 1996; Gutell et al., unpublished
results). As of June 2000, a total of 399 16 S
rRNA, 292 23 S rRNA, 73 5 S rRNA, and 174 group I
intron secondary structure models are in our collection
(CRW Site). At present, only a subset of these diagrams
(those diagrams incorporating all of the newest pairings
in our refined structure models and in which we have
the most confidence) is publicly available; as diagrams
are updated to meet these standards, they will be made
available. For Figures 1-4, we counted the overall distri-
butions of the four nucleotides for the entire RNA struc-
ture, and for paired, unpaired, and loop-helix junction
positions, analyzing 278 bacterial structures (209 from
16 S rRNA and 69 from 23 S rRNA); a complete list of
these models is available online. We also present online
the detailed frequencies used to calculate the histograms
in Figures 1-4. For these tables (CRW A Story (GE)), we
have analyzed all of our 16 S, 23 S, and 5 S rRNA
(bacterial, archaea, eucarya, chloroplast, and mitochon-
dria) and group I intron comparative structure models.
The numbers of structure models analyzed for the online
tables are included in those tables. Other nucleotide dis-
tributions are listed dynamically on our online tables.
The programs that generate this information will be pre-
sented elsewhere (Z.S. & R.G., unpublished results).
These online tables will be routinely updated as more
comparative structure models are determined.
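The kind of tally described for Figures 1-4 (nucleotide counts overall, in paired and unpaired positions, and at the 3′ ends of loops) can be sketched as follows. This is not the authors' unpublished program; it assumes the secondary structure is supplied in dot-bracket notation, which is an illustrative choice, and the function name and example hairpin are hypothetical.

```python
from collections import Counter

def nucleotide_distributions(seq, dot_bracket):
    """Tally nucleotides overall, paired, unpaired, and at the 3' ends of loops.

    A position is treated as the 3' end of a loop when it is unpaired and the
    next position (in the 3' direction) is paired.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    tallies = {
        "all": Counter(),
        "paired": Counter(),
        "unpaired": Counter(),
        "loop_3prime_end": Counter(),
    }
    for i, (nt, symbol) in enumerate(zip(seq, dot_bracket)):
        tallies["all"][nt] += 1
        if symbol in "()":
            tallies["paired"][nt] += 1
        else:
            tallies["unpaired"][nt] += 1
            if i + 1 < len(seq) and dot_bracket[i + 1] in "()":
                tallies["loop_3prime_end"][nt] += 1
    return tallies

# Hypothetical GNRA-like hairpin: three base-pairs closing a four-nucleotide loop.
tallies = nucleotide_distributions("GGCGAAAGCC", "(((....)))")
print(tallies["unpaired"])
print(tallies["loop_3prime_end"])
```

Summing such tallies over many structure models, and dividing by the column totals, would give percentages of the kind plotted in the histograms.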
Positions at the 3' ends of loops in the E. coli 16 S and
23 S rRNA secondary structure models were manually
identified. Each site was classified into one of four loop
types: hairpin, multi-stem, internal, or bulge. The pre-
dicted A-motif frequencies in Table 1 were calculated
using the nucleotide frequency values determined from
the bacterial 16 S and 23 S structures (above).
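The text does not give the formula used for the predicted frequencies. One plausible reading, shown here only as a sketch, is that each motif position is treated as independent, so the expected frequency is the product of the per-position nucleotide frequencies. The frequency values, the function name, and the motif encoding below are hypothetical and are not taken from Table 1.

```python
from math import prod

def expected_motif_frequency(motif, freqs):
    """Product of per-position frequencies, assuming independence.

    `motif` is a list of (state, allowed_bases) pairs, where state is
    'paired' or 'unpaired'; `freqs[state][base]` is a frequency.
    """
    return prod(
        sum(freqs[state][b] for b in allowed)
        for state, allowed in motif
    )

# Hypothetical frequencies for illustration only (not values from the paper)
example_freqs = {
    "unpaired": {"A": 0.4, "C": 0.2, "G": 0.2, "U": 0.2},
    "paired":   {"A": 0.1, "C": 0.3, "G": 0.4, "U": 0.2},
}

# AAG: two unpaired A's followed by a paired G
aag = [("unpaired", "A"), ("unpaired", "A"), ("paired", "G")]
# XAG: an unpaired non-A, an unpaired A, then a paired G
xag = [("unpaired", "CGU"), ("unpaired", "A"), ("paired", "G")]

print(expected_motif_frequency(aag, example_freqs))
print(expected_motif_frequency(xag, example_freqs))
```

Comparing an observed motif frequency with such an expectation indicates whether the motif occurs more often than the base composition alone would predict.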
The program query (Gutell et al., unpublished
program) was used to collect nucleotide frequency data
from (AE2) sequence alignments. Base frequencies
for each site were computed independently from the
bacterial alignments (16 S and 23 S rRNA). For bacterial
data, sites with a given A-motif in more than 50 % of the
sequences (33 % for the AAG:U motif) are summarized
in Table 1 and detailed in Tables 2-6; the data from
Tables 1-6 are summarized with respect to structural
motifs in Table 8. Single nucleotide and base-pair
frequencies in Table 7 were calculated from the intron
alignments using query.
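The per-site calculation can be sketched as follows. This is not the query program, whose interface is unpublished; it assumes the alignment is a list of equal-length strings and that gap characters are excluded from the denominator, and all names and the toy alignment are hypothetical.

```python
from collections import Counter

def column_frequencies(alignment, column):
    """Base frequencies at one alignment column, ignoring gaps ('-' or '.')."""
    bases = [seq[column].upper() for seq in alignment if seq[column] not in "-."]
    total = len(bases)
    counts = Counter(bases)
    return {b: counts[b] / total for b in "ACGU"} if total else {}

def columns_above_threshold(alignment, base="A", threshold=0.5):
    """Columns where `base` occurs in more than `threshold` of the sequences."""
    length = len(alignment[0])
    return [
        col for col in range(length)
        if column_frequencies(alignment, col).get(base, 0.0) > threshold
    ]

# Toy alignment for illustration only (not from the rRNA alignments)
toy_alignment = [
    "GAAGC",
    "GCAGC",
    "GUAGU",
    "G-AAC",
]
print(columns_above_threshold(toy_alignment))
```

Lowering `threshold` to 0.33 would correspond to the cutoff quoted above for the AAG:U motif, although a motif spanning several positions would need the columns to be tested jointly rather than one at a time.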
The secondary structure figures showing the A-motif
sites (Figure 5), the group I intron secondary structure
diagram portion in Table 7, and the additional secondary
structure diagrams available online were generated with
the program XRNA (Weiser & Noller, University of
California, Santa Cruz).
Acknowledgments
This work was supported by the NIH (awarded to
R.R.G., GM48207), NSF (awarded to M.S., MCB-9707940),
the Welch Foundation (awarded to R.R.G.), and
startup funds from the Institute for Cellular and
Molecular Biology at the University of Texas at Austin
(awarded to R.R.G.).
References
Agalarov, S. C., Prasad, G. S., Funke, P. M., Stout, C. D.
& Williamson, J. R. (2000). Structure of the
S15,S6,S18-rRNA complex: assembly of the 30 S
ribosome central domain. Science, 288, 107-112.
Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P. B.
& Steitz, T. A. (1999). Placement of protein and
RNA structures into a 5 Å-resolution map of the
50 S ribosomal subunit. Nature, 400, 841-847.
Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz,
T. A. (2000). The complete atomic structure of the
large ribosomal subunit at 2.4 Å resolution. Science,
289, 905-920.
Butcher, S. E., Dieckmann, T. & Feigon, J. (1997).
Solution structure of a GAAA tetraloop receptor
RNA. EMBO J. 16, 7490-7499.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden,
B. L., Kundrot, C. E. et al. (1996a). Crystal structure
of a group I ribozyme domain: principles of RNA
packing. Science, 273, 1678-1686.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden,
B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R.
& Doudna, J. A. (1996b). RNA tertiary structure
mediation by adenosine platforms. Science, 273,
1696-1699.
Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest,
T. N. & Noller, H. F. (1999). X-ray crystal structures
of 70S ribosome functional complexes. Science, 285,
2095-2104.
Clemons, W. M., Jr, May, J. L. C., Wimberly, B. T.,
McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V.
(1999). Structure of a bacterial 30 S ribosomal
subunit at 5.5 Å resolution. Nature, 400, 833-840.
Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G.
(1999). Crystal structure of a conserved ribosomal
protein-RNA complex. Science, 284, 1171-1174.
Correll, C. C., Freeborn, B., Moore, P. B. & Steitz, T. A.
(1997). Metals, motifs, and recognition in the crystal
structure of a 5S rRNA domain. Cell, 91, 705-712.
Costa, M. & Michel, F. (1995). Frequent use of the same
tertiary motif by self-folding RNAs. EMBO J. 14,
1276-1285.
Costa, M. & Michel, F. (1997). Rules for RNA recog-
nition of GNRA tetraloops deduced by in vitro
selection: comparison with in vivo evolution. EMBO
J. 16, 3289-3302.
Culver, G. M., Cate, J. H., Yusupova, G. Z., Yusupov,
M. M. & Noller, H. F. (1999). Identification of an
RNA-protein bridge spanning the ribosomal subunit
interface. Science, 285, 2133-2136.
Damberger, S. H. & Gutell, R. R. (1994). A comparative
database of group I intron structures. Nucl. Acids
Res. 22, 3508-3510.
Fountain, M. A., Serra, M. J., Krugh, T. R. & Turner,
D. H. (1996). Structural features of a six-nucleotide
RNA hairpin loop found in ribosomal RNA.
Biochemistry, 35, 6539-6548.
Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N.,
Caruthers, M. H., Neilson, T. & Turner, D. H.
(1986). Improved free-energy parameters for predic-
tions of RNA duplex stability. Proc. Natl Acad. Sci.
USA, 83, 9373-9377.
Gautheret, D., Konings, D. & Gutell, R. R. (1994). A
major family of motifs involving G:A mismatches in
ribosomal RNA. J. Mol. Biol. 242, 1-8.
Gautheret, D., Damberger, S. H. & Gutell, R. R. (1995a).
Identification of base-triples in RNA using comparative
sequence analysis. J. Mol. Biol. 248, 27-43.
Gautheret, D., Konings, D. & Gutell, R. R. (1995b). G:U
base pairing motifs in ribosomal RNAs. RNA, 1,
807-814.
Gutell, R. R. (1996). Comparative sequence analysis and
the structure of 16S and 23S rRNA. In Ribosomal
RNA: Structure, Evolution, Processing and Func-
tion in Protein Biosynthesis (Dahlberg, A. E. &
Zimmermann, R. A., eds), pp. 111-128, CRC Press,
Boca Raton, FL, USA.
Gutell, R. R. (1999). Comparative analysis of RNA
sequences. Nucl. Acids Symp. Ser. 41, 48-53.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F.
(1985). Comparative anatomy of 16S-like ribosomal
RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D.
(2000). Predicting U-turns in ribosomal RNA with
comparative sequence analysis. J. Mol. Biol. 300,
791-803.
Huang, S., Wang, Y.-X. & Draper, D. E. (1996). Structure
of a hexanucleotide RNA hairpin loop conserved in
ribosomal RNAs. J. Mol. Biol. 258, 308-321.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement
of a GNRA tetraloop in long-range RNA tertiary
interactions. J. Mol. Biol. 236, 1271-1276.
Jucker, F. M. & Pardi, A. (1995). GNRA tetraloops make
a U-turn. RNA, 1, 219-222.
Kalurachchi, K., Uma, K., Zimmermann, R. A. &
Nikonowicz, E. P. (1997). Structural features of the
binding site for ribosomal protein S8 in Escherichia
coli 16S rRNA defined using NMR spectroscopy.
Proc. Natl Acad. Sci. USA, 94, 2139-2144.
Leontis, N. B. & Westhof, E. (1998). A common motif
organizes the structure of multi-helix loops in 16 S
and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
Longfellow, C. E., Kierzek, R. & Turner, D. H. (1990).
Thermodynamic and spectroscopic study of bulge
loops in oligoribonucleotides. Biochemistry, 29, 278-
285.
Michel, F. & Dujon, B. (1983). Conservation of RNA sec-
ondary structures in two intron families including
mitochondrial-, chloroplast- and nuclear-encoded
members. EMBO J. 2, 33-38.
Michel, F. & Westhof, E. (1990). Modeling of the three-
dimensional architecture of group I catalytic introns
based upon comparative sequence analysis. J. Mol.
Biol. 216, 585-610.
Michel, F., Costa, M., Massire, I. & Westhof, E. (2000).
Modeling RNA tertiary structure from patterns of
sequence variation. Methods Enzymol. 317, 491-510.
Murphy, F. L. & Cech, T. R. (1994). GAAA tetraloop
and conserved bulge stabilize tertiary structure of a
group I intron domain. J. Mol. Biol. 236, 49-63.
Nikulin, A., Serganov, A., Ennifar, E., Tishchenko, S.,
Nevskaya, N., Shepard, W., Portier, C., Garber, M.,
Ehresmann, B., Ehresmann, C., Nikonov, S. &
Dumas, P. (2000). Crystal structure of the S15-rRNA
complex. Nature Struct. Biol. 7, 273-277.
Peritz, A. E., Kierzek, R., Sugimoto, N. & Turner, D. H.
(1991). Thermodynamic study of internal loops in
oligoribonucleotides: symmetric loops are more
stable than asymmetric loops. Biochemistry, 30, 6428-6436.
Pley, H. W., Flaherty, K. M. & McKay, D. B. (1994).
Three-dimensional structure of a hammerhead ribo-
zyme. Nature, 372, 68-74.
Puglisi, E. V. & Puglisi, J. D. (1998). HIV-1 A-rich RNA
loop mimics the tRNA anticodon structure. Nature
Struct. Biol. 5, 1033-1036.
Quigley, G. J. & Rich, A. (1976). Structural domains of
transfer RNA molecules. Science, 194, 796-806.
SantaLucia, J., Kierzek, R. & Turner, D. H. (1990). Effects
of GA mismatches on the structure and thermodynamics
of RNA internal loops. Biochemistry, 29,
8813-8819.
Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J.,
Gluehmann, M., Janell, D., Bashan, A., Bartels, H.,
Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure
of functionally activated small ribosomal subunit
at 3.3 Å resolution. Cell, 102, 615-623.
Serra, M. J., Axenson, T. J. & Turner, D. H. (1994). A
model for the stabilities of RNA hairpins based on
a study of the sequence dependence of stability for
hairpins with six nucleotides. Biochemistry, 33,
14289-14296.
Stallings, S. C. & Moore, P. B. (1997). The structure of
an essential splicing element: stem loop IIA from
yeast U2 snRNA. Structure, 5, 1173-1185.
Szewczak, A. A., Moore, P., Chan, Y-L. & Wool, I. G.
(1993). The conformation of the sarcin/ricin loop
from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA,
90, 9581-9585.
Tocilj, A., Schluenzen, F., Janell, D., Gluehmann, M.,
Hansen, H. A. S., Harms, J., Bashan, A., Bartels, H.,
Agmon, I., Franceschi, F. & Yonath, A. (1999). The
small ribosomal subunit from Thermus thermophilus
at 4.5 Å resolution: pattern fittings and the identification
of functional site. Proc. Natl Acad. Sci. USA.
96, 14252-14257.
Traub, W. & Sussman, J. L. (1982). Adenine-guanine
base pairing in ribosomal RNA. Nucl. Acids Res. 10,
2701-2708.
Varani, G., Wimberly, B. & Tinoco, I., Jr (1989). Confor-
mation and dynamics of an RNA internal loop.
Biochemistry, 28, 7760-7772.
Wimberly, B. (1994). A common RNA loop motif as a
docking module and its function in the hammer-
head ribozyme. Nature Struct. Biol. 1, 820-827.
Wimberly, B., Varani, G. & Tinoco, I., Jr (1993). The con-
formation of loop E of eukaryotic 5S ribosomal
RNA. Biochemistry, 32, 1078-1087.
Wimberly, B. R., Guymon, R., McCutcheon, J. P., White,
S. W. & Ramakrishnan, V. (1999). A detailed view
of a ribosomal active site: the structure of the L11-
RNA complex. Cell, 97, 491-502.
Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr,
Morgan-Warren, R. J., Carter, A. P., Vonrhein, C.,
Hartsch, T. & Ramakrishnan, V. (2000). Structure of
the 30 S ribosomal subunit. Nature, 407, 327-339.
Woese, C. R. & Pace, N. R. (1993). Probing RNA struc-
ture, function, and history by comparative analysis.
In The RNA World (Gesteland, R. F. & Atkins, J. F.,
eds), pp. 91-117, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY.
Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F.
(1983). Detailed analysis of the higher-order structure
of 16S-like ribosomal ribonucleic acids. Microbiol.
Rev. 47, 621-669.
Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architec-
ture of ribosomal RNA: constraints on the sequence
of ``tetra-loops''. Proc. Natl Acad. Sci. USA, 87, 8467-
8471.
Xia, T., SantaLucia, J., Jr, Burkard, M. E., Kierzek, R.,
Schroeder, S., Jiao, X., Cox, C. & Turner, D. H.
(1998). Thermodynamic parameters for an
expanded nearest neighbor model for formation
of RNA duplexes with Watson-Crick base-pairs.
Biochemistry, 37, 14719-14735.
Edited by D. E. Draper
(Received 7 July 2000; received in revised form 9 September 2000; accepted 9 September 2000)
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Gutell 074.jmb.2000.304.0335

  • 1. A Story: Unpaired Adenosine Bases in Ribosomal RNAs
R. R. Gutell1*, J. J. Cannone1, Z. Shang1, Y. Du1 and M. J. Serra2
1 Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, Austin, TX 78712-1095, USA
2 Department of Chemistry, Allegheny College, 520 N. Main St., Meadville, PA 16335, USA
In 1985 an analysis of the Escherichia coli 16 S rRNA covariation-based structure model revealed a strong bias for unpaired adenosines. The same analysis revealed that the majority of the G, C, and U bases were paired. These biases are (now) consistent with the high percentage of unpaired adenosine nucleotides in several structure motifs. An analysis of a larger set of bacterial comparative 16 S and 23 S rRNA structure models has substantiated this initial finding and revealed new biases in the distribution of adenosine nucleotides in loop regions. The majority of the adenosine nucleotides are unpaired, while the majority of the G, C, and U bases are paired in the covariation-based structure model. The unpaired adenosine nucleotides predominate in the middle and at the 3' end of loops, and are the second most frequent nucleotide type at the 5' end of loops (G is the most common nucleotide). There are additional biases for unpaired adenosine nucleotides at the 3' end of loops and adjacent to a G at the 5' end of the helix. The most prevalent consecutive nucleotides are GG, GA, AG, and AA. A total of 70 % of the GG sequences are within helices, while more than 70 % of the AA sequences are unpaired. Nearly 50 % of the GA sequences are unpaired, and approximately one-third of the AG sequences are within helices while another third are at the 3' loop.5' helix junction.
Unpaired positions with an adenosine nucleotide in more than 50 % of the sequences at the 3' end of 16 S and 23 S rRNA loops were identified and arranged into the A-motif categories XAZ, AAZ, XAG, AAG, and AAG:U, where G or Z is paired, G:U is a base-pair, and X is not an A and Z is not a G in more than 50 % of the sequences. These sequence motifs were associated with several structural motifs, such as adenosine platforms, E and E-like loops, A:A and A:G pairings at the end of helices, G:A tandem base-pairs, GNRA tetraloop hairpins, and U-turns.
© 2000 Academic Press
Keywords: RNA structure; comparative sequence analysis; unpaired adenosines; structure motifs; computational biology/bioinformatics
*Corresponding author
Introduction
RNA molecules can form similar secondary and tertiary structures for sequences that are not identical, and in many situations with less than 50 % sequence similarity. Comparative sequence analysis attempts to identify those structural elements that are in common between different sequences that are members of the same RNA family (e.g. tRNA). Comparative sequence analysis has been used successfully to predict secondary and tertiary interactions in several RNA molecules (reviewed by Woese & Pace, 1993; Gutell, 1996; Michel et al., 2000). The majority of these interactions are composed of G:C and A:U base-pairs (here, we define underlined nucleotides as base-paired), organized into regular secondary structure helices, and identified with covariation analysis due to the manner in which both paired positions coordinately change, or covary, their nucleotide composition (Woese et al., 1983; Gutell et al., 1985). Beyond the prediction of standard base-pairs in secondary structure helices, covariation analysis is also predicting non-standard base-pairs (e.g. A:G exchanges with G:A, and U:U exchanges with C:C) and base-pairs that form tertiary structure (Gutell, 1996; Gutell et al., unpublished results).
E-mail address of the corresponding author: robin.gutell@mail.utexas.edu
doi:10.1006/jmbi.2000.4172, available online at http://www.idealibrary.com
J. Mol. Biol. (2000) 304, 335-354. 0022-2836/00/030335-20 $35.00/0 © 2000 Academic Press
We now believe that all of the standard secondary structure base-pairs in the Escherichia coli 16 S
  • 2. and 23 S rRNAs have been identified with our covariation analysis. For those situations where we can compare and contrast a solved crystal structure with comparative data from an RNA sequence alignment, paired positions with a strong covariation are nearly always base-paired in the crystal structure (Gutell, 1999; Gutell et al., unpublished results). Therefore, covariation analysis, when used judiciously, can accurately predict base-pairs in an RNA structure.
We now wonder what contribution comparative analysis will make to the prediction and understanding of the three-dimensional structures of the rRNAs (Ban et al., 1999, 2000; Cate et al., 1999; Clemons et al., 1999; Tocilj et al., 1999; Schluenzen et al., 2000; Wimberly et al., 2000). We can begin to address this issue when we appreciate that comparative analysis, in its most general form, identifies patterns of variation in its search for a common structure. Base-pairs are predicted for those positions that vary at the same time in the evolution of that RNA, regardless of the type of base-pairing and/or the arrangement of this pairing in relationship with the flanking positions. Since the majority of the base-pairs are G:C, A:U, or G:U, and these pairs are arranged into standard secondary structure helices, we conclude that covariation analysis can identify the basic building blocks of RNA structure without any structural or other preconceived biases. Given this success, we now question if other RNA building blocks or motifs can be deciphered from our comparative RNA sequence and structure data sets.
Our traditional comparative secondary structure model only shows those secondary and tertiary structure base-pairs with positional covariation within the underlying sequences plus invariant Watson-Crick base-pairs which are directly adjacent to base-pairs with positional covariation.
All of the unpaired positions in these diagrams imply the lack of pairings with covariation, not that these positions are not paired or interacting with other regions of the RNA. Can we relate specific patterns of variation that occur within a defined structural context to a three-dimensional structure motif? Can we now predict structure for the positions that do not covary with other positions? Alternatively, we question what types of structure occur at the unpaired positions in the covariation structure model and ask if we can develop principles that relate sequence variation with these structural elements.
While some structural elements, such as base-pairs and helices, form similar structures with sequences whose positions covary, other structural elements with similar shapes form sets of aligned sequences that do not have positional covariation with one another (Gautheret et al., 1995a). Comparative analysis of nucleotide distributions in different structural elements has resulted in the identification of several sequence and structure motifs in these unpaired regions. This list includes tetraloops (Woese et al., 1990), tandem G:A base-pairs (Gautheret et al., 1994), dominant G:U base-pairs (Gautheret et al., 1995b), E-loops (Gutell et al., unpublished results; Gautheret et al., 1994; Wimberly, 1994; Leontis & Westhof, 1998), U-turns (Gutell et al., 2000), and A:A and A:G base-pairs at the ends of helices (hereafter called AA.AG@helix.ends). These sequence-based analyses are given more meaning, biologically and structurally, from their comparison with experimental studies, especially the NMR and crystallographic analysis of several rRNA fragments (Szewczak et al., 1993; Kalurachchi et al., 1997; Conn et al., 1999; Wimberly et al., 1999; Agalarov et al., 2000; Nikulin et al., 2000).
Our goals for the future are to identify more biased distributions of nucleotides and sequences in different structural arrangements, to ascribe biological and structural significance to them, and to deduce sets of sequence-structure relationship rules, from which we aspire to accurately predict detailed RNA structure from a single sequence.
In 1985, a simple count of the paired and unpaired nucleotides in E. coli 16 S rRNA revealed a strong bias for unpaired adenosine nucleotides (Gutell et al., 1985). A total of 62 % of the adenosine nucleotides were unpaired, while approximately 30 % of the G, C, and U bases were unpaired. The structural significance of this bias was not known at the time. However, these biases are (now) consistent with the high percentage of unpaired adenosine bases in the GNRA tetraloops (Woese et al., 1990), E-loops (Gautheret et al., 1994; Wimberly, 1994; Leontis & Westhof, 1998), adenosine platforms (Cate et al., 1996b) and AA side-step (Conn et al., 1999) RNA sequence and structure motifs, all identified after this initial adenosine bias was reported. Here, we follow up with a larger and more detailed analysis of paired and unpaired nucleotides in our collection of rRNA and group I intron comparative structure models, track the frequently occurring unpaired nucleotides, and associate these with different structural motifs.
Results
The base compositions for 175 bacterial 16 S and 71 bacterial 23 S rRNA comparative structure models have been analyzed and are presented here. For our online presentation (see Materials and Methods for detailed explanations), we have analyzed a larger set of comparative structures from 5 S, 16 S, and 23 S rRNAs (including bacteria, archaea, and eucarya nuclear, chloroplast, and mitochondria sequences) and group I introns. Our collection of structure diagrams represents all of the major phylogenetic groups within the bacterial domain (as well as for the other primary phylogenetic domains).
The comparative structure model is based on covariation analysis (Woese et al., 1983; Gutell et al., 1985, unpublished results). For the purposes of the current analysis, positions
  • 3. with substantial covariation or containing invariant Watson-Crick base-pairs are base-paired and positions that do not covary with other positions are unpaired in our covariation structure model. The current 16 S and 23 S rRNA secondary structure models are available from http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
The frequencies for single nucleotide positions are presented in histogram format (Figure 1). The total frequencies for the four RNA nucleotides A, U, C, and G were divided into helices (base-paired) and loops (unpaired), and then subdivided further into the 5' end, center, and 3' end positions for helices and loops. Overall, G (31.4 %) is the most prevalent nucleotide, followed by A (25.7 %), C (22.4 %), and U (20.5 %). G is also the most common nucleotide in helices (36.6 %), while A (14.5 %) occurs with the lowest frequency in paired positions. Guanosine occurs with an even higher frequency at the 5' end of helices (46.2 %), where U is the least frequent (13.5 %). Meanwhile, C is the most abundant nucleotide at the 3' end of helices (38.1 %), followed by G (30.4 %).
Adenosine is the most prevalent nucleotide at unpaired positions, occurring at 42.6 %, while C is the least common at 12.5 %. Adenosine is even more dominant at the 3' end of loops, occurring in 53.5 % of the sequences. Meanwhile, G is the most common nucleotide at the 5' end of loops (37.1 %); adenosine is second at 29.3 %. Another measure of the bias in unpaired adenosine bases is revealed in the ratio of unpaired to paired nucleotides for single nucleotides (see also the online query system). The unpaired/paired ratio for each nucleotide is: A, 1.96; U, 0.71; G, 0.43; and C, 0.29. Alternatively, 66.2 % of the adenosine bases are unpaired; the percentages of unpaired U, G, and C bases are 41.5 %, 30.1 %, and 22.3 %, respectively, for our collection of bacterial 16 S and 23 S rRNA structure models.
These values are similar but not identical to the values determined for the 1985 version of the E. coli 16 S rRNA covariation structure model (Gutell et al., 1985). The same trends and nucleotide biases also occur for our other RNA structure models (available online).
Figure 1. Frequency and distribution of single nucleotides in bacterial 16 S and 23 S rRNAs comparative structure models. The total number of occurrences for each of the four nucleotides at nine structural categories: total (all positions), paired, unpaired, 5'-helix.end (5' end of a helix), 3'-helix.end (3' end of a helix), 5'-loop.end (5' end of a loop), 3'-loop.end (3' end of a loop), helix.center (all positions within a helix that are not at the 5' or 3' ends of a helix), and loop.center (all positions within a loop that are not at the 5' or 3' ends of a loop).
Figure 2. Frequency and distribution of consecutive nucleotides in bacterial 16 S and 23 S rRNAs comparative structure models. The total number of occurrences for the 16 dinucleotides at three structural categories: total (all positions), in helix (paired), and in loop (unpaired).
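The single-nucleotide tallies described above (paired versus unpaired counts, nucleotides at the 3' end of loops, and the unpaired/paired ratio) can be illustrated with a minimal sketch. It assumes that a structure model is supplied in dot-bracket notation, which is an assumption of this example and not necessarily the format used for the analysis; the toy hairpin and all function names are hypothetical.

```python
from collections import Counter


def count_paired_unpaired(seq, structure):
    """Tally each nucleotide type at paired and unpaired positions.

    `structure` is dot-bracket notation (an assumption of this sketch):
    '.' marks an unpaired position, any other character a paired one.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    paired, unpaired = Counter(), Counter()
    for nt, state in zip(seq, structure):
        if state == ".":
            unpaired[nt] += 1
        else:
            paired[nt] += 1
    return paired, unpaired


def loop_3prime_ends(seq, structure):
    """Tally nucleotides at the 3' end of loops: an unpaired position
    immediately followed (on its 3' side) by a paired position."""
    return Counter(
        seq[i]
        for i in range(len(seq) - 1)
        if structure[i] == "." and structure[i + 1] != "."
    )


def percent_unpaired(ratio):
    """Convert an unpaired/paired ratio r into percent unpaired: 100*r/(1+r)."""
    return 100.0 * ratio / (1.0 + ratio)


# Hypothetical toy hairpin closed by a GAAA tetraloop (not rRNA data).
seq = "GGCGAAAGCC"
structure = "(((....)))"
paired, unpaired = count_paired_unpaired(seq, structure)
ends = loop_3prime_ends(seq, structure)
print(dict(paired), dict(unpaired), dict(ends))

# Unpaired/paired ratios quoted in the text.
for nt, ratio in [("A", 1.96), ("U", 0.71), ("G", 0.43), ("C", 0.29)]:
    print(nt, round(percent_unpaired(ratio), 1))
```

With the ratios quoted in the text, the conversion reproduces the reported percentages for A, U, and G (66.2 %, 41.5 %, and 30.1 %); the value for C comes out slightly higher than the reported 22.3 %, presumably because the ratio itself was rounded to two decimals.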
  • 4. Next, we investigated the frequency and distribution of consecutive nucleotides. The most common dinucleotides are the four purine combinations. Consecutive GG residues are the most prevalent at 9.86 %, followed by GA (7.92 %), AG (7.88 %), and AA (7.65 %) (Figure 2). The dinucleotides were classified into four categories: paired (helical), unpaired (loop), and the two paired/unpaired junctions, 3' loop.5' helix and 3' helix.5' loop. The most frequent consecutive dinucleotides are distinctly different between these four categories. In helices, GG (14.1 %), GC (10.4 %), CC (9.0 %), and GU (8.3 %) are the most prevalent consecutive dinucleotides; note that these consecutive dinucleotide arrangements are components of the most stable nearest-neighbors (Xia et al., 1998). In contrast, AA (19.2 %), GA (13.4 %), and UA (9.8 %) are the most common adjacent dinucleotides in loop motifs (Figure 2). Greater than 70 % of the consecutive adenosine residues are within unpaired regions, consistent with the observation that 5'-AA-3'/3'-UU-5' is the least stable nearest-neighbor (Xia et al., 1998).
The adjacent dinucleotides with the highest unpaired to paired ratio are AA (5.68), UA (2.03), GA (1.47), and AU (1.20), while the three lowest ratios are GC (0.17), GG (0.15), and CC (0.11). These ratios again emphasize that adenosine bases tend to be unpaired, consecutive adenosine bases are even more likely to be unpaired, and that consecutive G and C bases tend to be paired.
The most abundant dinucleotides at loop-helix junctions were analyzed (Figure 3). CG (14.6 %), GA (10.3 %), and CA (10.2 %) are the most abundant at the 3' helix.5' loop junction; AG (25.0 %) and AC (13.3 %) are the two most abundant pairs at the 3' loop.5' helix junction. These results are consistent with the abundance of A and G bases at the 5' end of loops, A nucleotides at the 3' end of loops, and G and C nucleotides at the 5' and 3' ends of helices.
The strong preference for AG at loop-helix junctions might not be a simple consequence of stability, since all 5' dangling ends have nearly the same small stabilizing effect on helices (Freier et al., 1986). The most stable 3' dangling end sequences, CA, CG, GA, and GG (Freier et al., 1986), occur frequently in our 16 S and 23 S rRNA structure data sets (Figure 3).
Next, we investigated the frequencies for three consecutive nucleotides at the loop.helix and helix.loop interfaces: two unpaired nucleotides followed by a paired nucleotide (3' loop.5' helix), and a paired nucleotide followed by two unpaired nucleotides (3' helix.5' loop). Figure 4(a) and (b) display the 32 most prevalent trinucleotide combinations for the 3' loop.5' helix (a) and 3' helix.5' loop (b) junctions. The observed triplets at these junctions are very biased in their distributions. At the 3' loop.5' helix interface (Figure 4(a)), AAG occurs in 14.4 % of the junctions, followed by AAC (6.7 %) and GAG (5.4 %). All of the 11 most frequent sequences contain at least one unpaired A nucleotide; nine of these 11 trinucleotides have an A base at the extreme 3' end of the loop. The trinucleotides at the 3' helix.5' loop interface (Figure 4(b)) are significantly different. The three most abundant trinucleotides are BGA, where B is not A: CGA (7.6 %), UGA (5.8 %), and GGA (5.4 %). The six most frequent sequences have at least one adenosine base in the two unpaired positions, with purines accounting for 11 of the 12 unpaired positions. In addition to these biased distributions of triplets at loop/helix junctions, Figure 4(a) and (b) also reveal that only 32 of the 64 possible triplets account for more than 80 % of these occurrences.
The most significant findings to this stage in our analysis are the high percentages of: (1) unpaired adenosine bases, with adenosine residues accounting for more than 50 % of the nucleotides at the 3' loop ends; (2) paired guanosine bases, with guanosine accounting for nearly 50 % of the nucleotides at the 5' end of helices; (3) unpaired consecutive adenosine bases; and (4) AG at 3' loop.5' helix junctions.
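The four dinucleotide categories used above (in helix, in loop, 3' helix.5' loop, and 3' loop.5' helix) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the analysis: the structure is assumed to be in dot-bracket notation, two adjacent paired positions are always counted as "helix" even if they belong to different helices, and the function name and toy hairpin are hypothetical.

```python
from collections import Counter


def classify_dinucleotides(seq, structure):
    """Assign every consecutive dinucleotide to one of four categories.

    '.' in `structure` marks an unpaired position (dot-bracket notation,
    an assumption of this sketch). Adjacent paired positions are treated
    as helical without checking that they belong to the same helix.
    """
    counts = {
        "helix": Counter(),
        "loop": Counter(),
        "3' helix.5' loop": Counter(),
        "3' loop.5' helix": Counter(),
    }
    for i in range(len(seq) - 1):
        first_paired = structure[i] != "."
        second_paired = structure[i + 1] != "."
        if first_paired and second_paired:
            category = "helix"
        elif not first_paired and not second_paired:
            category = "loop"
        elif first_paired:
            category = "3' helix.5' loop"   # paired nucleotide, then unpaired
        else:
            category = "3' loop.5' helix"   # unpaired nucleotide, then paired
        counts[category][seq[i:i + 2]] += 1
    return counts


# Hypothetical toy hairpin closed by a GAAA tetraloop (not rRNA data).
counts = classify_dinucleotides("GGCGAAAGCC", "(((....)))")
for category, tally in counts.items():
    print(category, dict(tally))
```

In this toy hairpin the single 3' helix.5' loop dinucleotide is CG and the single 3' loop.5' helix dinucleotide is AG, the two arrangements reported above as most abundant at the respective junctions.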
Our next set of goals is to map these frequently occurring nucleotides onto the 16 S and 23 S rRNA comparative structure models, to determine those positions where the unpaired adenosine residue at the 3' end of the loop occurs in more than 50 % of the bacterial sequences, and to identify larger motifs that build onto these dominant adenosine bases.
Figure 3. Frequency and distribution of dinucleotides at loop-helix junctions in bacterial 16 S and 23 S rRNAs comparative structure models. Total number of occurrences of consecutive nucleotides at the two loop-helix junctions, 3' helix.5' loop and 3' loop.5' helix.
We rationalize that 3' loop positions with an adenosine in more than 50 % of the sequences (hereafter called the "A-motifs") are important for
  • 5. the formation of conserved structural motifs. A total of 527 unpaired positions in the 16 S and 23 S rRNAs are followed by a base-pair predicted with covariation analysis. We expect, based upon the observed nucleotide frequencies in the bacterial 16 S and 23 S rRNA sequences (A, 25.7 %; C, 22.4 %; G, 31.4 %; U, 20.5 %), adenosine to occur at 25.7 % (135 occurrences) of these 3' loop ends for any one set of 16 S and 23 S rRNA structures. We observe that, collectively, the positions at the 3' loop ends contain 54.5 % adenosine bases. The two extreme cases for the distribution of these adenosine bases among the 527 3' loop ends are (1) the adenosine nucleotides are distributed evenly, so that each of the loop ends contains 54.5 % adenosine; and (2) the adenosine nucleotides are concentrated such that 287 of the loop ends contain 100 % adenosine. In fact, 294 of the 527 3' loop ends have an adenosine base in more than 50 % of the bacterial 16 S and 23 S rRNA sequences (Table 1); the average conservation value for adenosine at these positions is 93.7 %. Therefore, there is a very pronounced bias for adenosines to be very conserved at the 3' loop ends of the 16 S and 23 S rRNAs.
Of the 294 3' loop ends with an adenosine base in more than 50 % of bacterial sequences, 136 are followed by a paired G in more than 50 % of those sequences (AG motif; Table 1). In contrast, we expect 43 of these motifs in the 16 S and 23 S rRNAs, based on the observed nucleotide frequencies (527 × 0.257 × 0.314). Finally, the number of AA and AAG motifs observed is again more than the number expected for a random distribution (Table 1).
The distributions of the expected and observed A, AA, AG, AAG, and AAG:U motifs in hairpin, multi-stem, internal, and bulge loops were determined (Table 1). The number of observed A-motifs at each of the loop motifs is (again) significantly larger than expected.
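The expected motif counts quoted above follow from multiplying the number of 3' loop ends by the overall frequency of each nucleotide in the motif. The sketch below reproduces that arithmetic under the assumption that nucleotides at neighboring positions are independent; the function name is hypothetical, and treating AAG:U as the product of the A, A, G, and U frequencies is an assumption of this example, suggested by the frequencies listed in the note to Table 1.

```python
# Overall nucleotide frequencies in the bacterial 16 S and 23 S rRNA
# sequences, as quoted in the text.
FREQ = {"A": 0.257, "C": 0.224, "G": 0.314, "U": 0.205}

# Number of unpaired positions followed by a base-pair (3' loop ends).
N_LOOP_ENDS = 527


def expected_count(motif, n_sites=N_LOOP_ENDS, freq=FREQ):
    """Expected number of sites carrying `motif` if each position were
    drawn independently from the overall nucleotide frequencies."""
    probability = 1.0
    for nt in motif:
        probability *= freq[nt]
    return n_sites * probability


# "AAGU" stands for the AAG:U motif (A, A, paired G, and its U partner);
# modeling the partner with the overall U frequency is an assumption.
for motif in ["A", "AA", "AG", "AAG", "AAGU"]:
    print(motif, round(expected_count(motif)))
```

Rounded to whole numbers, these products give 135, 35, 43, 11, and 2, which agree with the predicted totals listed in Table 1.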
(Note for the following A-motifs (where each motif occurs in a minimum of 50 % of the sequences): AAG, the G is not paired to a U in more than 33 % of the sequences; AA, the nucleotide 3' of the second A is not a G in more than 50 % of the sequences; AG, the nucleotide 5' of the A is not an A in more than 50 % of the sequences; A, the paired nucleotide following the A is not a G in more than 50 % of the sequences and the nucleotide preceding the A is not an A in more than 50 % of the sequences.)
The A-motifs have been mapped onto the 16 S and 23 S rRNA secondary structure models (Figure 5). Each of the five motifs is assigned a different color: AAG:U motifs are indicated in red, AAG in green, AG in blue, AA in orange, and A in yellow. Position numbers for the A-motifs in the 16 S and 23 S rRNA are listed in Tables 2 (AAG:U), 3 (AAG), 4 (AG), 5 (AA), and 6 (A).
The loop-helix junctions listed in Table 2 have the AAG sequence present in more than 50 % of the bacterial sequences, and G:U in more than 33 % of the same sequence set. Thirteen 16 S and 23 S rRNA junctions satisfy these criteria. The majority of these occur in internal loops (10), and a few occur in bulge (2) and multi-stem (1) loops; three occur in 16 S rRNA, and ten appear in 23 S rRNA (see Table 2 and Figure 5). The majority of these are very well conserved, occurring with percentages significantly higher than the required minimum. Seven have greater than 90 % AAG and 90 % G:U base-pair conservation; the average conservation values are 81 % AAG and 77 % G:U.
The remaining 43 AAG loop-helix junctions are listed in Table 3. These junctions are distributed more evenly than the AAG:U A-motif in hairpin (9), multi-stem (19), and internal (14) loops, with one in a bulge loop; 15 occur in 16 S rRNA and 28 occur in 23 S rRNA (see Table 3 and Figure 5). More than 75 % of the hairpin junctions are part of a GNRA tetraloop.
Figure 4. Frequency and distribution of consecutive trinucleotides at loop-helix junctions in bacterial 16 S and 23 S rRNAs. The ranking of the top 32 most frequent trinucleotides at the two loop-helix junctions, 3' helix.5' loop and 3' loop.5' helix. Two of the three consecutive nucleotides are unpaired at both junctions. The paired nucleotides are underlined. (a) 3' loop.5' helix junction. (b) 3' helix.5' loop junction.
Over half (23) of these AAG junctions are conserved in more than 90 % of the sequences, with an average conservation value of
  • 6. 86 %. The consecutive AA nucleotides are conserved in approximately 93 % of the sequences.
AG loop-helix junctions are listed in Table 4. There are 80 examples of this motif, with a significant proportion occurring in internal (26), multi-stem (28), and hairpin (17) loops, and the remaining nine in bulge loops; 23 occur in 16 S rRNA and 57 occur in 23 S rRNA (see Table 4 and Figure 5). Almost 60 % of the AG motifs are conserved in more than 90 % of the sequences, and 81 % of these motifs are conserved in more than 70 % of the sequences. Six of the hairpin loops are GNRA tetraloops; seven other loops have unusually stable G:A mismatches between the first and last nucleotides of the hairpin loop (Serra et al., 1994).
Figure 5 (legend shown on page 342)
  • 7. Figure 5 (legend shown on page 342)
  • 8. A total of 56 AA motifs (Table 5) occur predominantly in multi-stem (24), internal (16), and hairpin (12) loops; four occur in bulge loops (see Table 5 and Figure 5). Eighteen occur in 16 S rRNA and 39 occur in 23 S rRNA. Over 60 % of these motifs are conserved in more than 90 % of the sequences. Table 5 also contains the most prevalent AAN sequence at each motif site (where N is base-paired; sites having AAG > 50 % appear in Tables 2 or 3). Nearly 50 % of the AA motifs in Table 5 are AAC. Eight of the hairpin loops have unusually stable sequences, either GNRA tetraloops (4) or G:A first mismatches (4) (Serra et al., 1994).
Figure 5. A-motifs mapped onto the Escherichia coli 16 S and 23 S rRNA comparative secondary structure models. Unpaired positions at the 3' end of loops that occur in more than 50 % of the bacterial sequences are highlighted in different colors: XAZ, yellow; AAZ, orange; XAG, blue; AAG, green; and AAG:U, red; where X is not A in more than 50 % of the sequences, Z is not G in more than 50 % of the sequences, and paired nucleotides are underlined. Diagrams were generated using the program XRNA (Weiser, B. & Noller, H., University of California at Santa Cruz). (a) 16 S rRNA. (b) 23 S rRNA, 5' half. (c) 23 S rRNA, 3' half.
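The five A-motif categories in the Figure 5 legend can be illustrated with a small classifier that works on one 3' loop end across an alignment. This is a simplified sketch, not the procedure used for the analysis: it applies the 50 % and 33 % cutoffs quoted in the text in a fixed priority order (AAG:U, AAG, AG, AA, A), which only approximates the exclusion rules in the Note, and the function name, input layout, and example data are hypothetical.

```python
def classify_a_motif(sites, cutoff=0.5, gu_cutoff=0.33):
    """Classify one 3' loop end from its aligned sequences.

    `sites` holds one (prev, end, paired, partner) tuple per sequence:
      prev    - nucleotide 5' of the loop's 3' end
      end     - nucleotide at the 3' end of the loop
      paired  - first helix nucleotide, directly 3' of `end`
      partner - nucleotide base-paired with `paired`
    Returns "AAG:U", "AAG", "AG", "AA", "A", or None.
    """
    n = len(sites)
    if n == 0:
        return None

    def fraction(predicate):
        # Fraction of sequences whose four nucleotides satisfy the predicate.
        return sum(1 for site in sites if predicate(*site)) / n

    f_a = fraction(lambda p, e, h, q: e == "A")
    f_aa = fraction(lambda p, e, h, q: p == "A" and e == "A")
    f_ag = fraction(lambda p, e, h, q: e == "A" and h == "G")
    f_aag = fraction(lambda p, e, h, q: p == "A" and e == "A" and h == "G")
    f_gu = fraction(lambda p, e, h, q: h == "G" and q == "U")

    # Most specific category first (a simplification of the Note's rules).
    if f_aag > cutoff and f_gu > gu_cutoff:
        return "AAG:U"
    if f_aag > cutoff:
        return "AAG"
    if f_ag > cutoff:
        return "AG"
    if f_aa > cutoff:
        return "AA"
    if f_a > cutoff:
        return "A"
    return None


# Hypothetical alignment column: six sequences with AAG and a G:U pair,
# four with AAC and a C:G pair.
example = [("A", "A", "G", "U")] * 6 + [("A", "A", "C", "G")] * 4
print(classify_a_motif(example))
```

Ordering the checks from the most specific motif to the least specific one keeps each site in a single category, which mirrors the way each position appears in only one of Tables 2 to 6.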
  • 9. There are 102 A motifs, with a significant number of occurrences in multi-stem (38), internal (29), bulge (20), and hairpin (15) loops; 41 occur in 16 S and 61 occur in 23 S rRNA (see Table 6 and Figure 5). A total of 77 % of the A motifs are conserved in more than 90 % of the bacterial sequences, and 50 % are 100 % conserved in those sequences!
Discussion
Analysis of a large set of bacterial 16 S and 23 S rRNA covariation-based comparative structure models has revealed a propensity for adenosine bases to be unpaired. A disproportionate number of these unpaired adenosine nucleotides are consecutive, at the 3' end of loops, and adjacent to a paired G at the 3' loop.5' helix junction. The highly conserved nature of the loop-helix junctions described here suggests that they are an important part of several different motifs. Because they occur so frequently, we believe that they are a major building block in the 16 S and 23 S rRNA structures. Our goal is to transform these sequence motifs into structural motifs that help coordinate three-dimensional structure. We have named the adenosine bases that occur at the 3' end of loops in more than 50 % of the bacterial 16 S and 23 S rRNA sequences A-motifs. These are associated with several known structural motifs and are classified into five categories: AAG:U, AAG, AG, AA, and A.
Adenosine platforms
The first set of loop-helix junctions to consider is those with an AAG:U motif (Table 2 and Figure 5). Thirteen positions in the 16 S and 23 S rRNA contain the AAG sequence conserved in more than 50 % of the sequences (see Table 2) and the G:U base-pair conserved in more than 33 % of the sequences (16 S positions 415, 432, and 1289; 23 S positions 14, 706, 1214, 1470, 1854, 1877, 1890, 2135, 2542, 2851). Seven of these sites (23 S positions 14, 706, 1214, 1877, 1890, 2542, and 2851; see Table 2) are conserved in more than 90 % of the sequences.
Table 1. Characterization of nucleotides at loop-helix junctions for loops with unpaired 5' nucleotides in 16 S and 23 S rRNA

Loop type               Total  A           AA          AG          AAG        AAG:U
Total       Measured    527    294 (56 %)  113 (21 %)  136 (26 %)  56 (11 %)  13 (2 %)
            Predicted   -      135 (26 %)  35 (7 %)    43 (8 %)    11 (2 %)   2 (1 %)
Hairpin     Measured    91     53 (58 %)   21 (23 %)   26 (29 %)   9 (10 %)   0 (-)
            Predicted   -      24 (25 %)   6 (6 %)     8 (8 %)     2 (2 %)    0 (-)
Multi stem  Measured    202    110 (54 %)  45 (22 %)   48 (24 %)   20 (10 %)  1 (1 %)
            Predicted   -      51 (26 %)   13 (7 %)    16 (8 %)    4 (2 %)    1 (1 %)
Internal    Measured    163    95 (58 %)   40 (25 %)   50 (31 %)   24 (15 %)  10 (6 %)
            Predicted   -      42 (26 %)   11 (7 %)    13 (8 %)    3 (2 %)    1 (1 %)
Bulge       Measured    71     36 (51 %)   7 (10 %)    12 (17 %)   3 (4 %)    2 (3 %)
            Predicted   -      18 (25 %)   5 (7 %)     6 (8 %)     1 (1 %)    0 (-)

Junctions were counted if an A-motif occurred in greater than 50 % (33 % for AAG:U) of the sequences in the bacterial 16 S and 23 S rRNA alignments (http://www.rna.icmb.utexas.edu/). Predicted values were calculated with nucleotide frequencies: A (25.7 %), G (31.4 %), and U (20.5 %); values are rounded to the nearest whole number. Percentages are calculated with respect to the total number of positions for that loop type; values are rounded to the nearest whole number, with "-" used to represent zero.

Table 2. A-motif: AAG:U sites in 16 S and 23 S rRNA

Position(a)  AA (%)(b)  AAG (%)(b)  G:U (%)(b)  Predicted structure motifs(c)
A. Multi-stem loops
23 S rRNA
14           99         99          98          P
B. Internal loops
16 S rRNA
415          76         75          59          EL, P
432          100        55          45          GA, P
1289         100        55          55          A, P
23 S rRNA
706          100        94          94          A, P
1214         97         97          97          A, P
1470         86         81          76          GA, P
1854         100        54          39          GA, P
1877         98         98          98          P
1890         100        100         100         P
2135(d)      86         48          46          P
C. Bulge loops
23 S rRNA
2542         100        100         99          P
2851         93         91          91          P

rRNA positions have an AAG:U motif in more than 33 % of the bacterial sequences and are indicated in red on Figure 5.
(a) The position number is the nucleotide at the 3' loop end, at the loop-helix junction.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; EL, E-like loop; GA, tandem G:A base-pairs; P, adenosine platform (see Discussion).
(d) Although this site contains less than 50 % AAG, it was included because it contains more than 33 % G:U and narrowly missed the required minimum for AAG.

This complex sequence motif forms the adenosine platform present in the crystal structure of the Tetrahymena thermophila group I intron P4-P6 domain (Cate et al., 1996a,b). To ascertain if the adenosine platform-like sequence motifs in the 16 S and 23 S rRNA are capable of forming the
adenosine platform structural motif, we have analyzed the group I intron adenosine platforms from a comparative sequence perspective. The crystal structure of the P4-P6 domain of the group I intron has three adenosine platforms at positions 172, 219, and 226 (numbers refer to the second A of the AAG motif for the T. thermophila sequence (GenBank Accession # J01235)). Each of the three adenosine platforms occurs in a distinct structural environment in the comparative secondary structure model (Michel & Dujon, 1983; Michel & Westhof, 1990) and the three-dimensional crystal structure (Cate et al., 1996b): a hairpin loop at position 172, a symmetric 3 × 3 internal loop at position 219 (where 3 × 3 refers to the number of nucleotides on each side of the internal loop), and an asymmetric 3 × 2 internal loop at position 226. They also differ with regard to the type of tertiary interactions with which they are associated. The adenosine platform at position 226 is part of the tetraloop receptor (Murphy & Cech, 1994; Cate et al., 1996b) that makes an intramolecular contact with a tetraloop at position 150, one of the interactions responsible for aligning the two

Table 3. A-motif: AAG sites in 16 S and 23 S rRNA

Position(a)  AA (%)(b)  AAG (%)(b)  Predicted structure motifs(c)  Loop(d)
A. Hairpin loops
 16 S rRNA
  383        98         70          A                              GNRA
  901        100        97          A, U                           GNRA
 23 S rRNA
  311        91         84          U                              6
  633        100        77          U                              GNRA
  1226       62         52          A, U                           GNRA
  1810       95         88          A                              GNRA
  1872       70         65                                         GNRA
  1928       100        100         U                              3
  2361       62         55                                         6
B. Internal loops
 16 S rRNA
  1333       100        99          A
  1434       98         94
  1469       54         54
  1493       99         99          A
  1503       100        100
 23 S rRNA
  609        100        68          A
  1001       99         99          A, GA
  1156       98         85
  1354       100        99          A, GA, U
  1572       92         83          A, GA, U
  1580       88         86          GA
  1701       100        99          A, EL
  2469       96         96          A, GA
  2810       83         83          A
C. Multi-stem loops
 16 S rRNA
  60         98         98          A, GA
  197        99         93          A
  499        99         98
  574        99         98
  768        97         96          EL
  873        100        89
  915        100        85
  938        100        99
 23 S rRNA
  423        100        93
  472        94         94
  603        53         53          A, GA
  1010       100        53
  1029       100        65          A, GA
  1308       99         99
  1641       86         85          A
  2336       100        99
  2378       100        96          A, U
  2412       93         85
  2566       100        100         A
D. Bulge loops
 23 S rRNA
  1848       100        96

rRNA positions have an AAG motif in more than 50 % of the bacterial sequences and are indicated in green on Figure 5.
(a) The position number is the nucleotide at the 3' loop end, at the loop-helix junction.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(d) Hairpin loop size (in nucleotides) and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.

Table 4. A-motif: AG sites in 16 S and 23 S rRNA

Position(a)  A (%)(b)  AG (%)(b)  Predicted structure motifs(c)  Loop(d)
A. Hairpin loops
 16 S rRNA
  300        100       100        A, U                           GNRA
  1080       100       90         A, U                           GNRA
  1269       100       72         A, U                           GNRA
 23 S rRNA
  167        100       99                                        9*
  251        100       98         A                              5*
  322        100       71                                        3
  466        100       79         A, U                           GNRA
  492        99        75                                        5*
  646        87        86                                        5*
  1073       100       100        U                              9
  1098       99        99         A, U                           6*
  1618       98        95         A                              6*
  1755       100       73                                        3
  2147       95        95                                        4*
  2534       54        53                                        6
  2598       100       100        A, U                           GNRA
  2662#      100       100        A, U                           GNRA
B. Multi-stem loops
 16 S rRNA
  8          98        98
  26+        100       99         A
  288        100       92
  353        98        98         A
  523+       100       99
  828        80        71
  860        96        88         A
  1046+      100       99
  1067       100       100        A, U
 23 S rRNA
  177+       59        58         A
  324+       73        55
  332        100       88
  374        100       67         A, GA, E
  532+       65        61
  627        99        98         A, GA
  655        98        98         A, GA
  699+       99        95         A
  945        99        76         A
  975        99        99         A
  1189       100       99         A, GA, E
  1342       100       100        U
  1791       100       98         A
  1932       100       100        A, GA, EL
  2119       100       100
  2126       100       100        A, GA
  2587       100       83         A, U
  2629       63        57
C. Internal loops
 16 S rRNA
  246+       100       100        A
  520        100       100        A
  665        70        67
  687        100       97         A
  802        100       99         A, EL
  1252       72        68
  1275       93        92
  1418       100       98         A, GA
  1456+      82        73
 23 S rRNA
  84         100       98
  244        100       99         A, GA, E
  294+       100       88         A
  861        100       96         A, GA, E
  878        86        73
  1111       100       100
  1237       100       82
  1268       100       65         A, GA, E
  1373+      100       91         EL
  1434       78        58
  1439       90        56
  1477       92        88         A, GA, EL
  1866       99        90         A, GA
  2158       100       99
  2298+      91        67
  2320       60        51
  2388+      100       100
  2639       100       78         A, GA
D. Bulge loops
 16 S rRNA
  583+       100       100
  777        100       96
 23 S rRNA
  213        100       100
  764+       100       60
  941+       100       99
  1205+      76        67
  1490+      97        96
  1586       90        79
  2602+      100       100

rRNA positions have an AG motif in more than 50 % of the bacterial sequences and are indicated in blue on Figure 5.
(a) The position number is the nucleotide at the 3' loop end, at the loop-helix junction; +, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(d) Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
coaxial stacked helices of the P4-P6 domain. The other two adenosine platforms form intermolecular crystal contacts, whose physiological significance is uncertain.

We will focus on the two internal loops at positions 219 and 226, since ten of the 13 adenosine platform candidates in 16 S and 23 S rRNA occur in internal loops (two occur in bulge loops, and the last occurs in a multi-stem loop (Table 2 and Figure 5)). The adenosine platform at the hairpin loop at position 172 of the P4-P6 domain will not be considered here, in part because it is also involved in an intermolecular crystal interaction that is not physiological. The P4-P6 domain, as represented by the T. thermophila crystal structure, is only present in the C1 and C2 subgroups of the group I introns (Michel & Westhof, 1990; Damberger & Gutell, 1994). To ensure that we are comparing similar structural elements, we only analyzed those C1 sequences that have the same number of nucleotides as T. thermophila at the positions involved in the two adenosine platforms.

Only 110 of the 319 sequences in the group C1 intron alignment have a symmetric 3 × 3 internal loop at position 219 in our sequence alignments and data set. Table 7 reveals the high degree of conservation of the two adenosine residues 5' of the loop-helix junction; 98 % of the sequences have an A residue at positions 218 and 219. Position G220 and its pairing partner U253 are each conserved in approximately 70 % of the sequences, while the G:U base-pair occurs in less than 60 % of the sequences. The second most common base-pair is C:G, followed by A:U and G:C. In

Table 5. A-motif: AA sites in 16 S and 23 S rRNA

Position(a)  AA (%)(b)  Sequence(c)  Predicted structure motifs(d)  Loop(e)
A. Hairpin loops
 16 S rRNA
  162        99         AAC          A, U                           GNRA
  622        99         AAC          U                              5
  696        100        AAU          A, U                           6*
  1170       97         AAA                                         5*
  1519       97         AAG          A                              GNRA
 23 S rRNA
  127        100        AAC          A                              GNRA
  390        72         AAA                                         7
  752        92         AAA          U                              8*
  1085       100        AAA          U                              3
  1367       66         AAG                                         GNRA
  1635       55         AAU          A                              5*
  2311       84         AAU                                         7
B. Internal loops
 16 S rRNA
  374        100        AAU          A
  449        52         AAG          E
  676        100        AAU          A, GA
  782        100        AAC          A, EL
  909        100        AAC          A, E
  1447       94         AAC
 23 S rRNA
  257        60         AAG          E
  346        89         AAA
  515        100        AAC          U
  677        82         AAC
  901        60         AAC
  911        100        AAC
  1143       100        AAA
  1322       71         AAG
  1655       99         AAC          A
  2015       90         AAU
  2741       100        AAC          A, GA, U
C. Multi-stem loops
 16 S rRNA
  120        99         AAC          U
  510        99         AAC
  959        100        AAU          A, GA
  1005       51         AAU
 23 S rRNA
  182        56         AAC          A
  218        61         AAA
  223        94         AAU          A, GA, U
  300        99         AAC          EL
  429        98         AAA
  483        58         AAC          A, U
  735        99         AAC
  793        61         AAA          A, GA
  821        100        AAU          U
  1275       99         AAA
  1302       68         AAG
  1610       100        AAC
  1786       100        AAA
  1978       100        AAC          A
  2199       100        AAC          A, GA, U
  2287       65         AAA          A, GA
  2426       98         AAC          U
  2433       100        AAA          U
  2734       50         AAG
D. Bulge loops
 16 S rRNA
  51         87         AAC
  72         58         AGC
  642        51         AAC
 23 S rRNA
  1900       89         AAA

rRNA positions have an AA motif in more than 50 % of the bacterial sequences and are indicated in orange in Figure 5.
(a) The position number is the nucleotide at the 3' loop end, at the loop-helix junction.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) Most prevalent loop-helix sequence.
(d) A, AA.AG@helix.ends; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(e) Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
Table 6. A-motif: A sites in 16 S and 23 S rRNA

Position(a)  A (%)(b)  Predicted structure motifs(c)  Loop(d)
A. Hairpin loops
 16 S rRNA
  845        61                                       5
  1016       94        A, U                           GNRA
  1453       52                                       UNGG
 23 S rRNA
  199        100                                      4
  548        59                                       4
  574        76        U                              8
  616        75                                       5*
  1176       62                                       4
  1918       93                                       7
  2478       100       A                              7*
  2705       99                                       4*
  2757       100       A                              11*
  2799       56                                       3
  2826       100                                      7
  2860       96        A, U                           GNRA
B. Multi-stem loops
 16 S rRNA
  16         100       A
  315+       100       A
  338+       99        A
  366        65
  495        99
  546+       51
  864        100       U
  983        100       A
  994        100
  1101       100
  1157       100       A, GA
  1191       100
  1339       100
  1349       100       A, GA, E
  1398       100       A
 23 S rRNA
  52         99        A, GA
  73         100
  94         81
  149+       95        A
  233        100       GA
  270        92
  340        99        A, GA, EL
  412        98
  432        100
  460        99        A, GA, E
  670        100
  990        100
  1103+      100       A
  1384       100
  1603       99
  1829       100
  2042       84
  2062       100
  2171+      100       U
  2173+      100       A, GA, U
  2346       100       A, GA
  2358       98        A
  2835       100       A
C. Internal loops
 16 S rRNA
  151        100
  174        94        A, GA
  282        100       A
  389+       100       A
  482        98        A, GA
  487        99        A, GA, E
  535        100
  715        100       A, GA
  1306       100       A, GA
  1408       99        A
  1483       99        A, GA
  1499       100
 23 S rRNA
  63+        56
  91+        89
  103        99        A
  207        99        A, GA, E
  1050       100
  1419       95        A, GA
  1664+      100
  1689       100       A, GA, EL
  1723       62
  1745       53
  1802       100       A, GA
  1885+      98
  2005+      85        A
  2327+      100       A
  2614       100
  2657#      100       A, GA, E
  2690       68
D. Bulge loops
 16 S rRNA
  55+        100
  65         94
  130+       100
  205        83
  397+       100
  595+       79        BT
  1042+      55
  1055       100
  1196+      99
  1227+      100
  1394+      100
 23 S rRNA
  443+       100
  739+       61        BT
  896+       99
  927+       89
  1819       100
  1981+      99
  2051+      61
  2873+      100
  2879+      98

rRNA positions have an A motif in more than 50 % of the bacterial sequences and are indicated in yellow on Figure 5.
(a) The position number is the nucleotide at the 3' loop end, at the loop-helix junction; +, the nucleotide prior to this position is base-paired; #, Sarcin/Ricin loop.
(b) More detailed information is available at http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/.
(c) A, AA.AG@helix.ends; BT, base triple; E, E loop; EL, E-like loop; GA, tandem G:A base-pairs; U, U-turn (see Discussion).
(d) Hairpin loop size and special characteristics: GNRA, tetraloops (Woese et al., 1990) occur in more than 70 % of the bacterial rRNA sequences; *, unusually stable G:A or U:U first mismatch.
addition, positions 219:254 do not form Watson-Crick base-pairs.

A total of 139 of the 319 IC1 sequences had a 3 × 2 internal loop at position 226 (Table 7). As in the previous example, adenosine bases are the most frequent nucleotide at the two positions 5' of the loop-helix junction; however, the frequencies of these two adenosine bases are not as high. One-quarter of the sequences have a C base in place of the adenosine at position 226, which is consistent with previous sequence analysis and in vitro selection experiments (Costa & Michel, 1997). The G at position 227 and the G:U base-pair at positions 227:247 are present in 65 % and 62 % of the sequences, respectively. One of the most conserved features of the 226 adenosine platform is the U224:A248 reverse Hoogsteen base-pair, which occurs in 87 % of the sequences. While all four nucleotides are observed at the bulge at position 249, 88 % of the sequences have a pyrimidine at this position; in the P4-P6 crystal structure (Cate et al., 1996a), this position is involved in the tertiary interactions with the tetraloop at position 150 and can potentially form a hydrogen bond to A226 (Costa & Michel, 1997).

The adenosine platform at position 226 in the P4-P6 domain crystal structure widens the minor groove of the RNA helix to allow tertiary contact with the tetraloop at positions 150-153. The tetraloop receptor in the absence of bound tetraloop assumes an alternate structure, with the adenosine forming a cross-strand stack (Butcher et al., 1997). The adenosine bases, rather than forming the side-by-side arrangement observed in the crystal structure, are arranged in a stacked zipper-like arrangement. In addition, the first adenosine nucleotides of the two platforms (218 and 225) become susceptible to methylation by dimethylsulfate when the tetraloop-receptor interaction is disrupted by mutation (Murphy & Cech, 1994). Thus, the adenosine platform motif appears to have both conformational and sequence plasticity. The majority of the IC1 sequences with the same internal loop configuration as the Tetrahymena group I intron (see above) have an adenosine and purine juxtaposed and adjacent to the G:U base-pair (positions 219 and 254, and 226 and 248; see Table 7).

The most conserved features of the two group I intron adenosine platforms that occur at internal loops are the two consecutive adenosines at the 3' end of the loop. The paired G at the 3' loop.5' helix junction and the G:U base-pair are also moderately conserved. Since the majority of the 16 S and 23 S rRNA adenosine platform candidates are more conserved at these four positions than the two known intron adenosine platforms, it is reasonable to expect this motif to occur at the majority (if not all) of the 16 S and 23 S rRNA AAG:U sequence motifs listed in Table 2. Also note that the majority (77 %) of the rRNA platform candidates occur in internal loops (Table 2 and Figure 5). Most of our 16 S and 23 S rRNA adenosine platform candidates also have an adenosine and purine juxtaposed and adjacent to the G:U base-pair facing the loop (Gautheret et al., 1995b), similar to the two intron adenosine platforms; the most notable exception is the junction at position 1890 in the 23 S rRNA, where a highly conserved (97 %) uridine at position 1852 is opposite the first A at position 1890.

Two sets of rRNA adenosine platform candidates (16 S rRNA positions 415 and 432, and 23 S rRNA positions 1854 and 1890) occur at the two opposing ends of the same internal loop. The structural and functional significance of this tight clustering of adenosine platforms is currently unknown. We wonder if these two potential adenosine platform

Table 7. Base composition of adenosine platforms in group IC1 introns

[Table body not recoverable from this transcript; its column headings were Percentage(a) (A, C, G, U for each strand), Pairing(b), and Structure(c).]

(a) Percentages were determined as described in the text. Only percentages greater than 1 % are shown.
(b) Base-pairing occurring in more than 5 % of the sequences examined.
(c) Partial secondary structure of the Tetrahymena thermophila IC1 intron (GenBank #J01235). The complete structure is available at http://www.rna.icmb.utexas.edu/CSI/2STR/ref2str.html
(d) Indicates base present in the P4-P6 subdomain of Tetrahymena thermophila.
motifs form simultaneously, or perhaps alternate in formation during protein biosynthesis. Additionally, six of the putative adenosine platforms in Table 2 overlap with other A-motifs; e.g. 16 S rRNA position 1289 is part of the adenosine platform and the AA.AG@helix.ends motif, and 16 S rRNA position 415 (Elgavish et al., unpublished results) is part of the adenosine platform and the E-like loop motif (see below). The A-motifs that are associated with adenosine platforms are noted in Table 2.

E and E-like loops

Comparative sequence analysis has identified potential E loop motifs (Varani et al., 1989; Wimberly et al., 1993) in both 16 S and 23 S rRNA (Gautheret et al., 1994; Wimberly, 1994; Leontis & Westhof, 1998). Thirteen dominant A sites in Tables 2-6 overlap with eleven E loops; each occurrence is indicated in these Tables. Two 16 S and eight 23 S rRNA loop E motifs were predicted earlier. The 16 S rRNA positions are 909 (Table 5) and 1349 (Table 6); the 23 S rRNA positions are 207 (Table 6), 244 (Table 4), 374 (Table 4), 460 (Table 6), 674, 1189 (Table 4), 1268 (Table 4), and 2657 (Table 6). Our analysis identified all of these except for position 674 in 23 S rRNA. This E loop motif overlapped with two positions (674 and 806) that are now base-paired in our covariation structure model (comparative support shown in base-pair frequency tables at the CRW Site; see Materials and Methods) but were unpaired at the time that the E loop was proposed (Leontis & Westhof, 1998). Therefore, we do not consider this putative E loop to be valid.

Our analysis of dominant A positions has also revealed two new E loop sequence motifs. The first is at positions 447-449 and 484-487 in 16 S rRNA, with both positions 449 and 487 containing a dominant A. This potential E loop motif is at the center of an elongated and irregular compound helix. This motif is flanked on one side by a helix and on the other by a lone pair (450:483, E. coli numbering). A tandem G:A base-pair is on the other side of this lone pair. The second new E loop sequence motif is in the 23 S rRNA at positions 858-861 and 916-918. The nucleotides in this motif were paired in the older versions of the 23 S rRNA secondary structure model, thus preventing its detection until now. The previous base-pairs were removed from the current structure model since the variations at the individual positions were not matched by a similar pattern of variation at the partner positions.

Our analysis of the dominant A bases at the 3' end of loops has also revealed a sequence motif that is similar to but not identical with the E loop motif. The canonical E loop motif has an asymmetric 4 × 3 internal loop, as shown in Figure 6(a). For sequences 5'-NGUAP-3' and 5'-QGAA-3', P and Q (positions 5 and 6) are base-paired, with unusual pairing conformations between positions 1 and 9, 3 and 8, and 4 and 7 (Figure 6(a)). In contrast, our E-like loop motif, as we like to call it, contains the two sequences 5'-NGUAP-3' and 5'-QGAAZ-3' (Figure 6(b)). Here again, P and Q (positions 5 and 6) and N and Z (positions 1 and 10) form two canonical base-pairs, leaving the 5'-GUA-3' in sequence 1 juxtaposed with the 5'-GAA-3' in sequence 2. Presumably three additional pairings are formed: G:A (2 and 9), U:A (3 and 8), and A:G (4 and 7). The conformations for the second and third pairings, U:A and A:G, are related to the G:A type II tandems as described by Gautheret et al. (1994). Here, the invariant U:A base-pair is thought to adopt the reverse Hoogsteen conformation, adjacent to a sheared A:G base-pair, resulting in the two adenosine bases protruding into the minor groove and overwinding the helix. This arrangement of nucleotides is present in the bacterial version of the 5 S rRNA E loop, and is called the cross-strand A stack (Correll et al., 1997).
Possibly the first sheared A:G base-pair (positions 2 and 9 in Figure 6(b)) underwinds the helix and returns it to register. Eight E-like loop motifs are present in the conserved core of the 16 S and 23 S rRNAs and contain eleven dominant A sites. Three of these motifs occur at positions 413-415/428-430, 765-767/812-814, and 780-782/800-802 in the 16 S rRNA; five more occur in the 23 S rRNA at positions 298-300/338-340, 1358-1360/1371-1373, 1475-1477/1514-1516, 1687-1689/1699-1701, and 1930-1932/1968-1970. Five of these E-like loops occur in internal loops; three are present in multi-stem loops. The A-motifs that are associated with E and E-like loops are noted in Tables 2-6.

AA.AG@helix.ends and tandem G:A base-pairs

Adenosine bases at the 3' end of loops have also been associated with G:A base-pairs at the end of helices (Traub & Sussman, 1982; Woese et al., 1983). Here, the helix is extended by at least one G:A base-pair (for example, the sequences 5'-AGP-3' and 5'-QCG-3' interact to form A:G, G:C, and P:Q base-pairs). G:A juxtapositions have been

Figure 6. Schematic of E and E-like loops. Nucleotides are numbered for reference. Types of base-pairing are indicated by lines: canonical pairings (G:C, A:U) have thick, continuous lines, type II tandem G:A pairings have thin, broken lines, and other non-canonical pairings are shown with thick, broken lines. (a) Canonical E loop, where positions 1-4 and 7-9 comprise the 4 × 3 internal loop. (b) E-like loop. Positions 2-4 and 7-9 comprise the 3 × 3 internal loop.
shown to be energetically stable in one thermodynamic study of bulge loops (Longfellow et al., 1990). More recently, we have analyzed a large number of 16 S and 23 S rRNA comparative structure models and confirmed that many helices do close with a G:A juxtaposition (Elgavish et al., unpublished results). We also noted in our comparative study that many of these juxtapositions in E. coli are maintained in at least 90 % of the sequences and found, in addition to the G:A juxtapositions, that many helices are flanked by A:A or A:A/G:A juxtapositions. Our studies revealed a strong bias in the orientation for these G:A base-pairs: A is always 5' to the helix, while G or A is 3' to the helix. These observations are consistent with the bias for unpaired adenosine bases at the 3' end of loops and for the high percentage of unpaired G and A at the 5' end of loops. Note that some of these AA.AG@helix.ends are a component of E and E-like loops and that GNRA tetraloops (Woese et al., 1990) have the AA.AG@helix.ends motif. A total of 116 A-motifs are associated with AA.AG@helix.ends and are noted in Tables 2-6.

Several of these A:A and G:A juxtapositions at the 5' end of helices are flanked on their 5' side by a second A:A or G:A pair. Tandem G:A and A:A pairs in the 16 S and 23 S rRNA were identified earlier (SantaLucia et al., 1990; Gautheret et al., 1994), and can adopt a single structure conformation that is consistent with their pattern of nucleotide substitutions (Gautheret et al., 1994). We have searched again for these tandem G:A/A:A motifs in our newer 16 S and 23 S rRNA comparative structure models and our larger collection of comparative rRNA structure models. In addition to the tandems identified earlier (Gautheret et al., 1994), we have found 23 new tandems that are conserved in at least 90 % of the bacterial 16 S and 23 S rRNA sequences. Fifty A-motifs are associated with G:A tandems, and they are noted in Tables 2-6.
U-turns

The U-turn, a structure motif characterized by a sharp turn in the RNA, was first identified in the tRNA crystal structure (Quigley & Rich, 1976), and subsequently has been found in several other RNAs (Pley et al., 1994; Jucker & Pardi, 1995; Huang et al., 1996; Fountain et al., 1996; Conn et al., 1999; Culver et al., 1999; Stallings & Moore, 1997; Puglisi & Puglisi, 1998). Dominant A nucleotides at the 3' end of 16 S and 23 S rRNA loops are also found in some of the tetra- and hexanucleotide hairpin loops that form U-turns (Woese et al., 1990; Jucker & Pardi, 1995; Huang et al., 1996; Fountain et al., 1996). In both of these loop motifs, a base-pair forms between the guanosine at the first position of the hairpin loop (and 3' to the helix), and the adenosine at the last position of the loop (and 5' to the helix). Recently, we have predicted, based on the analysis of many comparative structure models, 57 positions in the 16 S and 23 S rRNA where the U-turn motif might occur (Gutell et al., 2000). The 39 U-turn candidates that are coincident with A-motifs are noted in Tables 2-6. Of these, 22 occur in hairpin loops; 13 (59 %) of these are GNRA tetraloops. The remaining 17 occur in internal loops and multi-stem loops.

Concluding comments

Of the 527 positions at the 3' end of loops in the 16 S and 23 S rRNA, nearly 300 are occupied with a dominant A, an adenosine that occurs in more than 50 % of the bacterial sequences. Larger sequence motifs that occur frequently are built onto these A-motifs. There are 102 A, 56 AA, 80 AG, 43 AAG, and 13 AAG:U A-motifs. A total of 51 % of these sites are part of a known structural motif (Table 8(a)). Of these, 39 % of the A-motifs are associated with the AA.AG@helix.ends motif; 14 % of these are within GNRA tetraloops. Tandem G:A pairs and U-turns are also common, occurring at 17 % and 14 % of the A-motif sites, respectively.
There are smaller percentages of adenosine platforms (4 %) and E loop (4 %) and E-like loop (4 %) sequence motifs (Table 8(a)).

Some of these structural motifs are part of a larger structural element. For example, some of the AA.AG@helix.ends motifs are within the boundaries of E and E-like loops, the tandem G:A motif, and GNRA tetraloops. Some of these GNRA hairpin loops are themselves involved in larger tertiary folds (Jaeger et al., 1994; Costa & Michel, 1995; Cate et al., 1996b). Other A-motifs are associated with more than one structural motif in which one motif is not entirely contained within the other. Here, the structural motifs involve positions that are not utilized by the other, except for the dominant A at the 3' end of the loop. For example, position 415 in 16 S rRNA is part of the E-like loop and adenosine platform motifs. Two examples where a single dominant A is part of both an adenosine platform and a G:A tandem are at 16 S rRNA position 432 and position 1854 in 23 S rRNA. Although our understanding of RNA structural motifs is not complete, these overlapping and possibly competing structural A-motifs suggest that these junctions of the RNA might be undergoing conformational changes. In total, only one structural motif occurs at 51 % of the A-motifs that are associated with a known structural motif (Table 8). A total of 36 % are associated with two structural motifs, and 13 % are associated with three structural motifs.

In contrast, we are unable to predict the structure conformation for 49 % of the A-motifs. Therefore, there is the possibility that new structural motifs occur at these positions. Alternatively, structural motifs that we are already familiar with occur at these A-motifs with a composition and arrangement of nucleotides that were not previously associated with that motif (for example, adenosine platforms occur at positions with sequences other than AAG:U). To help resolve this issue, the conformations of these adenosine bases in the 30 S and 50 S ribosomal subunit crystal structures (Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000) need to be analyzed. Some 8 % of the A-motifs are single bulge adenosine nucleotides; while the structural significance of these bulges is largely unknown, covariation analysis and NMR have revealed a base-triple in 16 S rRNA between a bulged A at position 595 and the base-pair at 596:644 (CRW Site; Kalurachchi et al., 1997).

Although the thermodynamic consequences of the unpaired adenosine bases identified here in the covariation-based structure models are not known, an earlier thermodynamic study of internal loops revealed that unpaired adenosine bases in asymmetrical loops are more destabilizing than those in symmetrical loops (Peritz et al., 1991). The three sets of results, (1) this thermodynamic study; (2) the preponderance of adenosine bases in unpaired regions of the covariation-based structure model, with the majority of these occurring in asymmetrical loops; and (3) the structural studies that reveal that the majority of these unpaired adenosine nucleotides are base-paired, albeit in an irregular manner (Cate et al., 1996a,b; Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000), may all be coordinated and influence RNA folding. We speculate that these destabilizing, asymmetrically placed adenosine nucleotides are a significant component in the transition from secondary to tertiary RNA structure. The destabilizing effects of these adenosines on secondary structure, coupled with the need for an RNA molecule to adopt its minimal energetic state, suggest that these abundant adenosine nucleotides will actively seek out energetically stabilizing tertiary interactions and, in the process, form a three-dimensional RNA molecule.
The propensity for conserved and unpaired adenosine bases in the 16 S and 23 S rRNA covariation structure models must be related to the structure and function of the ribosome. As stated earlier, unpaired positions in the covariation structure model do not imply that those positions are not paired; it (only) says that they do not pair in the regular manner that most covariation-based base-pairs do. And given that other unpaired positions are paired, albeit irregularly, in other RNA molecules whose structures have been solved by crystallography or NMR (e.g. adenosine platforms, E loops), we anticipate these unpaired positions in the 16 S and 23 S rRNA covariation structure models to be paired. We now wonder if these unusual pairings can be predicted with comparative analysis. Our A story is a beginning towards this end.

As noted, the A-motifs come in various forms, i.e. A, AA, AG, AAG, and AAG:U, and these are associated with several known structural motifs. These observations suggest that unpaired adenosine bases can form a variety of different structural conformations. What is special about adenosine that lends itself to participating in these structural motifs? And in some situations, it appears as though at least two different structural elements can occur at the same A-motif. Does one structural motif predominate at these positions, or do these sites provide the ribosome with an opportunity to alternate conformations during the ribosome cycle? Is the prevalence of adenosine bases at these positions related to the ability of adenosine to accommodate a variety of binding partners, perhaps its base stacking potential, or other interesting interactions? The A story is not finished.

Table 8. Summary of dominant A nucleotides and related motifs (based upon Tables 1-6)

A. Occurrences of motifs at dominant A positions

    Category                                                          16 S rRNA  23 S rRNA  Total
1   # of adenosine platforms                                          3 (3 %)    10 (5 %)   13 (4 %)
2   # of E loops                                                      4 (4 %)    8 (4 %)    12 (4 %)
3   # of E-like loops                                                 4 (4 %)    7 (4 %)    11 (4 %)
4   # of AA.AG@helix.ends                                             44 (44 %)  72 (37 %)  116 (39 %)
4a  # of AA.AG@helix.ends in GNRA tetraloops                          8 (8 %)    8 (4 %)    16 (5 %)
4b  # of other AA.AG@helix.ends                                       36 (36 %)  64 (33 %)  100 (34 %)
5   # of tandem GA's                                                  13 (13 %)  37 (19 %)  50 (17 %)
6   # of U-turns                                                      11 (11 %)  29 (15 %)  40 (14 %)
7   # of single bulges                                                9 (9 %)    14 (7 %)   23 (8 %)
8   Total # of dominant A bases associated with motifs (1-6)(a)       51 (51 %)  98 (51 %)  149 (51 %)
9   # of dominant A bases not associated with motifs (1-6)            49 (49 %)  96 (49 %)  145 (49 %)
10  Total # of dominant A bases at 3' ends of loops (8 + 9)           100        194        294

B. Number of motifs per dominant A nucleotide (not including single bulges)

Motifs                                                                16 S rRNA  23 S rRNA  Total
1                                                                     25 (49 %)  51 (52 %)  76 (51 %)
2                                                                     24 (47 %)  30 (31 %)  54 (36 %)
3                                                                     2 (4 %)    17 (17 %)  19 (13 %)
Total # of dominant A bases                                           51         98         149
Total # of associated motifs                                          79         162        241
Average # of associated motifs per dominant A position
with an associated motif                                              1.5        1.7        1.6

(a) A single dominant A may be associated with 1-3 motifs.
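The totals and averages in Table 8B follow directly from the counts of dominant A positions with one, two, or three associated motifs. The short sketch below recomputes them from the tabulated counts; the variable and function names are illustrative and are not part of the original analysis.

```python
# Counts of dominant A positions by number of associated motifs,
# copied from Table 8B. Names here are illustrative only.
motif_counts = {
    "16 S": {1: 25, 2: 24, 3: 2},
    "23 S": {1: 51, 2: 30, 3: 17},
}


def summarize(counts):
    """Return (positions, associated motifs, motifs per position)."""
    positions = sum(counts.values())
    motifs = sum(k * n for k, n in counts.items())
    return positions, motifs, motifs / positions


# Combine the two rRNAs to obtain the "Total" column.
combined = {
    k: motif_counts["16 S"][k] + motif_counts["23 S"][k] for k in (1, 2, 3)
}

for label, counts in {**motif_counts, "Total": combined}.items():
    positions, motifs, average = summarize(counts)
    print(f"{label}: {positions} positions, {motifs} motifs, "
          f"{average:.1f} motifs per position")
```

For the combined data this gives 149 positions and 241 associated motifs, i.e. about 1.6 motifs per dominant A position with an associated motif, in agreement with the last rows of the table.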
  • 18. Materials and Methods Additional supporting data is presented at the CRW Site (http://www.rna.icmb.utexas.edu) and the CRW A Story pages (http://www.rna.icmb.utexas.edu/ANAL- YSIS/A-STORY/). The CRW A story information supplements the data presented in Figures 1-4 and Tables 1-8 and is divided into four categories: general data; position-speci®c data; structure diagrams; and manuscript materials. The general data (GE) section con- tains generalized counts for the number and frequency of different A-motifs in the 16 S and 23 S rRNA com- parative structure models from the (1) bacteria (summar- ized in Figures 1-4); (2) the archaea and eucarya (nuclear, chloroplast, and mitochondria); and (3) A-motif analysis of the comparative structure models from 5 S rRNA and group I introns. The position-speci®c data (PS) section presents frequency tables for all of the 16 S and 23 S rRNA positions which contain an A-motif (with data from the three phylogenetic domains, chloroplasts, and mitochondria); larger motifs (adenosine platforms, E and E-like loops, AA.AG@helix.ends, tandem G:A pairings, and U-turns) that map onto the A-motifs are identi®ed. Frequency tables for E and E-like Loops (including only bacterial data) are also provided here. The structure diagrams (SD) section contains Figure 5 and includes secondary structure diagrams for each of the motifs examined in these motifs. The manuscript materials (MS) section contains all of the Figures and Tables from this manuscript. The RNA sequence alignments used for this analysis are maintained by us at the University of Texas (R.R.G., unpublished results; CRW Site). Sequences were manu- ally aligned with the alignment editor AE2 (T. Macke, Scripps Research Institute, San Diego, CA). As of June 2000, the bacterial 16 S alignment contains 5859 sequences, and the bacterial 23 S alignment contains 327 sequences; both alignments use E. coli (GenBank Acces- sion # J01695) as their reference sequence for position numbers. 
The group I intron (C1 subclass) alignment contains 319 sequences and uses T. thermophila (GenBank Accession # J01235) as its reference sequence for position numbers. Two subalignments of 110 and 139 sequences having the appropriate arrangement of nucleotides at the 219 and 226 adenosine platform internal loops (see the text) were created from this larger alignment. These sequence alignments will be available from this site in the future.

Secondary structure models for representatives of the main phylogenetic groupings are inferred by comparative sequence analysis (Gutell, 1996; Gutell et al., unpublished results). As of June 2000, a total of 399 16 S rRNA, 292 23 S rRNA, 73 5 S rRNA, and 174 group I intron secondary structure models are in our collection (CRW Site). At present, only a subset of these diagrams (those incorporating all of the newest pairings in our refined structure models and in which we have the most confidence) is publicly available; as diagrams are updated to meet these standards, they will be made available.

For Figures 1-4, we counted the overall distributions of the four nucleotides for the entire RNA structure, and for paired, unpaired, and loop-helix junction positions, analyzing 278 bacterial structures (209 from 16 S rRNA and 69 from 23 S rRNA); a complete list of these models is available online. We also present online the detailed frequencies used to calculate the histograms in Figures 1-4. For these tables (CRW A Story (GE)), we have analyzed all of our 16 S, 23 S, and 5 S rRNA (bacteria, archaea, eucarya, chloroplast, and mitochondria) and group I intron comparative structure models. The numbers of structure models analyzed for the online tables are included in those tables. Other nucleotide distributions are listed dynamically on our online tables. The programs that generate this information will be presented elsewhere (Z.S. & R.G., unpublished results).
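The kind of counting used for Figures 1-4 can be illustrated with a minimal sketch. This is not the authors' program, which is unpublished. It assumes, purely for illustration, that a structure model is supplied as a dot-bracket string aligned to the sequence; the function names and the toy hairpin are hypothetical. Loop-helix junction positions are not tallied in this sketch.

```python
from collections import Counter


def nucleotide_distribution(sequence, structure):
    """Tally each nucleotide as paired or unpaired.

    `structure` is assumed to be dot-bracket notation: '(' and ')' mark
    paired positions and '.' marks unpaired positions.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    counts = {"paired": Counter(), "unpaired": Counter()}
    for base, symbol in zip(sequence.upper(), structure):
        state = "unpaired" if symbol == "." else "paired"
        counts[state][base] += 1
    return counts


def unpaired_fraction(counts, base):
    """Fraction of all occurrences of `base` that are unpaired."""
    total = counts["paired"][base] + counts["unpaired"][base]
    return counts["unpaired"][base] / total if total else 0.0


# Toy hairpin closed by a GAAA loop (hypothetical, not from the rRNA models).
toy_sequence = "GGCGAAAGCC"
toy_structure = "(((....)))"
toy_counts = nucleotide_distribution(toy_sequence, toy_structure)

for base in "GCAU":
    print(base, toy_counts["paired"][base], toy_counts["unpaired"][base],
          f"{unpaired_fraction(toy_counts, base):.2f}")
```

Summing such tallies over a set of structure models gives per-nucleotide paired and unpaired percentages of the kind plotted in the histograms.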
These online tables will be routinely updated as more comparative structure models are determined.

Positions at the 3' ends of loops in the E. coli 16 S and 23 S rRNA secondary structure models were manually identified. Each site was classified into one of four loop types: hairpin, multi-stem, internal, or bulge. The predicted A-motif frequencies in Table 1 were calculated using the nucleotide frequency values determined from the bacterial 16 S and 23 S structures (above).

The program query (Gutell et al., unpublished program) was used to collect nucleotide frequency data from the AE2 sequence alignments. Base frequencies for each site were computed independently from the bacterial alignments (16 S and 23 S rRNA). For bacterial data, sites with a given A-motif in more than 50 % of the sequences (33 % for the AAG:U motif) are summarized in Table 1 and detailed in Tables 2-6; the data from Tables 1-6 are summarized with respect to structural motifs in Table 8. Single nucleotide and base-pair frequencies in Table 7 were calculated from the intron alignments using query.

The secondary structure figures showing the A-motif sites (Figure 5), the group I intron secondary structure diagram portion in Table 7, and the additional secondary structure diagrams available online were generated with the program XRNA (Weiser & Noller, University of California, Santa Cruz).

Acknowledgments

This work was supported by the NIH (awarded to R.R.G., GM48207), NSF (awarded to M.S., MCB-9707940), Welch Foundation (awarded to R.R.G.), and startup funds from the Institute for Cellular and Molecular Biology at the University of Texas at Austin (awarded to R.R.G.).

References

Agalarov, S. C., Prasad, G. S., Funke, P. M., Stout, C. D. & Williamson, J. R. (2000). Structure of the S15,S6,S18-rRNA complex: assembly of the 30 S ribosome central domain. Science, 288, 107-112.
Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P. B. & Steitz, T. A. (1999).
Placement of protein and RNA structures into a 5 Å-resolution map of the 50 S ribosomal subunit. Nature, 400, 841-847.
Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905-920.
Butcher, S. E., Dieckmann, T. & Feigon, J. (1997). Solution structure of a GAAA tetraloop receptor RNA. EMBO J. 16, 7490-7499.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E. et al. (1996a). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678-1686.
Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R.
& Doudna, J. A. (1996b). RNA tertiary structure mediation by adenosine platforms. Science, 273, 1696-1699.
Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). X-ray crystal structures of 70S ribosome functional complexes. Science, 285, 2095-2104.
Clemons, W. M., Jr, May, J. L. C., Wimberly, B. T., McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V. (1999). Structure of a bacterial 30 S ribosomal subunit at 5.5 Å resolution. Nature, 400, 833-840.
Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G. (1999). Crystal structure of a conserved ribosomal protein-RNA complex. Science, 284, 1171-1174.
Correll, C. C., Freeborn, B., Moore, P. B. & Steitz, T. A. (1997). Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain. Cell, 91, 705-712.
Costa, M. & Michel, F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14, 1276-1285.
Costa, M. & Michel, F. (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. EMBO J. 16, 3289-3302.
Culver, G. M., Cate, J. H., Yusupova, G. Z., Yusupov, M. M. & Noller, H. F. (1999). Identification of an RNA-protein bridge spanning the ribosomal subunit interface. Science, 285, 2133-2136.
Damberger, S. H. & Gutell, R. R. (1994). A comparative database of group I intron structures. Nucl. Acids Res. 22, 3508-3510.
Fountain, M. A., Serra, M. J., Krugh, T. R. & Turner, D. H. (1996). Structural features of a six-nucleotide RNA hairpin loop found in ribosomal RNA. Biochemistry, 35, 6539-6548.
Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson, T. & Turner, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl Acad. Sci. USA, 83, 9373-9377.
Gautheret, D., Konings, D. & Gutell, R. R. (1994). A major family of motifs involving G:A mismatches in ribosomal RNA. J. Mol. Biol. 242, 1-8.
Gautheret, D., Damberger, S. H.
& Gutell, R. R. (1995a). Identification of base-triples in RNA using comparative sequence analysis. J. Mol. Biol. 248, 27-43.
Gautheret, D., Konings, D. & Gutell, R. R. (1995b). G:U base pairing motifs in ribosomal RNAs. RNA, 1, 807-814.
Gutell, R. R. (1996). Comparative sequence analysis and the structure of 16S and 23S rRNA. In Ribosomal RNA: Structure, Evolution, Processing and Function in Protein Biosynthesis (Dahlberg, A. E. & Zimmermann, R. A., eds), pp. 111-128, CRC Press, Boca Raton, FL, USA.
Gutell, R. R. (1999). Comparative analysis of RNA sequences. Nucl. Acids Symp. Ser. 41, 48-53.
Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16S-like ribosomal RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D. (2000). Predicting U-turns in ribosomal RNA with comparative sequence analysis. J. Mol. Biol. 300, 791-803.
Huang, S., Wang, Y.-X. & Draper, D. E. (1996). Structure of a hexanucleotide RNA hairpin loop conserved in ribosomal RNAs. J. Mol. Biol. 258, 308-321.
Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271-1276.
Jucker, F. M. & Pardi, A. (1995). GNRA tetraloops make a U-turn. RNA, 1, 219-222.
Kalurachchi, K., Uma, K., Zimmermann, R. A. & Nikonowicz, E. P. (1997). Structural features of the binding site for ribosomal protein S8 in Escherichia coli 16S rRNA defined using NMR spectroscopy. Proc. Natl Acad. Sci. USA, 94, 2139-2144.
Leontis, N. B. & Westhof, E. (1998). A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
Longfellow, C. E., Kierzek, R. & Turner, D. H. (1990). Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides. Biochemistry, 29, 278-285.
Michel, F. & Dujon, B. (1983).
Conservation of RNA secondary structures in two intron families including mitochondrial-, chloroplast- and nuclear-encoded members. EMBO J. 2, 33-38.
Michel, F. & Westhof, E. (1990). Modeling of the three-dimensional architecture of group I catalytic introns based upon comparative sequence analysis. J. Mol. Biol. 216, 585-610.
Michel, F., Costa, M., Massire, C. & Westhof, E. (2000). Modeling RNA tertiary structure from patterns of sequence variation. Methods Enzymol. 317, 491-510.
Murphy, F. L. & Cech, T. R. (1994). GAAA tetraloop and conserved bulge stabilize tertiary structure of a group I intron domain. J. Mol. Biol. 236, 49-63.
Nikulin, A., Serganov, A., Ennifar, E., Tishchenko, S., Nevskaya, N., Shepard, W., Portier, C., Garber, M., Ehresmann, B., Ehresmann, C., Nikonov, S. & Dumas, P. (2000). Crystal structure of the S15-rRNA complex. Nature Struct. Biol. 7, 273-277.
Peritz, A. E., Kierzek, R., Sugimoto, N. & Turner, D. H. (1991). Thermodynamic study of internal loops in oligoribonucleotides: symmetric loops are more stable than asymmetric loops. Biochemistry, 30, 6428-6436.
Pley, H. W., Flaherty, K. M. & McKay, D. B. (1994). Three-dimensional structure of a hammerhead ribozyme. Nature, 372, 68-74.
Puglisi, E. V. & Puglisi, J. D. (1998). HIV-1 A-rich RNA loop mimics the tRNA anticodon structure. Nature Struct. Biol. 5, 1033-1036.
Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796-806.
SantaLucia, J., Kierzek, R. & Turner, D. H. (1990). Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry, 29, 8813-8819.
Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell, 102, 615-623.
Serra, M. J., Axenson, T. J. & Turner, D. H. (1994).
A model for the stabilities of RNA hairpins based on a study of the sequence dependence of stability for hairpins with six nucleotides. Biochemistry, 33, 14289-14296.
Stallings, S. C. & Moore, P. B. (1997). The structure of an essential splicing element: stem loop IIA from yeast U2 snRNA. Structure, 5, 1173-1185.
Szewczak, A. A., Moore, P., Chan, Y-L. & Wool, I. G. (1993). The conformation of the sarcin/ricin loop
from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA, 90, 9581-9585.
Tocilj, A., Schluenzen, F., Janell, D., Gluehmann, M., Hansen, H. A. S., Harms, J., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (1999). The small ribosomal subunit from Thermus thermophilus at 4.5 Å resolution: pattern fittings and the identification of functional site. Proc. Natl Acad. Sci. USA, 96, 14252-14257.
Traub, W. & Sussman, J. L. (1982). Adenine-guanine base pairing in ribosomal RNA. Nucl. Acids Res. 10, 2701-2708.
Varani, G., Wimberly, B. & Tinoco, I., Jr (1989). Conformation and dynamics of an RNA internal loop. Biochemistry, 28, 7760-7772.
Wimberly, B. (1994). A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nature Struct. Biol. 1, 820-827.
Wimberly, B., Varani, G. & Tinoco, I., Jr (1993). The conformation of loop E of eukaryotic 5S ribosomal RNA. Biochemistry, 32, 1078-1087.
Wimberly, B. R., Guymon, R., McCutcheon, J. P., White, S. W. & Ramakrishnan, V. (1999). A detailed view of a ribosomal active site: the structure of the L11-RNA complex. Cell, 97, 491-502.
Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr, Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30 S ribosomal subunit. Nature, 407, 327-339.
Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function, and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91-117, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F. (1983). Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microb. Rev. 47, 621-669.
Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architecture of ribosomal RNA: constraints on the sequence of "tetra-loops". Proc. Natl Acad. Sci. USA, 87, 8467-8471.
Xia, T., SantaLucia, J., Jr, Burkard, M.
E., Kierzek, R., Schroeder, S., Jiao, X., Cox, C. & Turner, D. H. (1998). Thermodynamic parameters for an expanded nearest neighbor model for formation of RNA duplexes with Watson-Crick base-pairs. Biochemistry, 37, 14719-14735.

Edited by D. E. Draper

(Received 7 July 2000; received in revised form 9 September 2000; accepted 9 September 2000)