In mouse brain cDNA libraries generated from small RNA molecules we have identified a total of 201 different expressed RNA sequences potentially encoding novel small non‐messenger RNA species (snmRNAs). Based on sequence and structural motifs, 113 of these RNAs can be assigned to the C/D box or H/ACA box subclass of small nucleolar RNAs (snoRNAs), known as guide RNAs for rRNA. While 30 RNAs represent mouse homologues of previously identified human C/D or H/ACA snoRNAs, 83 correspond to entirely novel snoRNAs. Among these, for the first time, we identified four C/D box snoRNAs and four H/ACA box snoRNAs predicted to direct modifications within U2, U4 or U6 small nuclear RNAs (snRNAs). Furthermore, 25 snoRNAs from either class lacked antisense elements for rRNAs or snRNAs. Therefore, additional snoRNA targets have to be considered. Surprisingly, six C/D box snoRNAs and one H/ACA box snoRNA were expressed exclusively in brain. Of the 88 RNAs not belonging to either snoRNA subclass, at least 26 are probably derived from truncated heterogeneous nuclear RNAs (hnRNAs) or mRNAs. Short interspersed repetitive elements (SINEs) are located on five RNA sequences and may represent rare examples of transcribed SINEs. The remaining RNA species could not as yet be assigned either to any snmRNA class or to a part of a larger hnRNA/mRNA. It is likely that at least some of the latter will represent novel, unclassified snmRNAs.
A major goal of the joint international efforts of the Human Genome Project is to identify all 30 000–40 000 genes and to determine the sequence, structure, regulation and function of these genes and their products. To facilitate functional analysis of the encoded gene products, this endeavour has been extended to model organisms from bacteria to mouse. Furthermore, expressed sequence tags (ESTs) have been employed to catalogue all mRNAs and recent efforts to generate their full‐length sequences provide essential tools to study post‐transcriptional processing of transcripts including alternative splicing, identification of protein coding genes and functional analysis. In contrast, few experimental efforts address the class of small non‐messenger RNAs (snmRNAs; Kiss‐Laszlo et al., 1996; Olivas et al., 1997). These molecules do not encode proteins, but have cellular functions on their own or in complex with proteins that are bound to the RNA and thus form ribonucleoprotein complexes (RNPs). Such RNPs, found in cellular compartments as diverse as the nucleolus or dendritic processes of nerve cells (Tiedge et al., 1993; Pederson, 1998), exhibit a surprisingly diverse range of functions. However, the biological role of some of them remains elusive. Moreover, most systematic genomic searches are biased against their detection, and comprehensive identification by computational analysis of the genomic sequence of any organism remains an unsolved problem (Eddy, 1999). Hundreds of genes and their RNA products may thus remain undetected. Their functions, interactions in cellular circuits and roles in disease would remain unknown and our understanding of the functioning of a cell would be incomplete. Therefore, we set out to identify snmRNAs and their genes directly in the human genome and in those of various model organisms.
Here we describe our experimental approach to the discovery of novel snmRNAs in mouse. This EST‐like approach has been tailored for the detection of small RNAs [starting with material that is usually discarded: small total RNA in the size range ∼50–500 nucleotides (nt)]. The resulting sequences have been termed expressed RNA sequences (ERNS). In this study, we present the first unbiased look at the small RNA population in a mammalian cell. Thus far, we have identified ∼200 candidates for novel snmRNA species via ERNS. More than half of them correspond to new members of the two expanding subclasses of small nucleolar RNAs (snoRNAs) that guide RNA ribose methylation and pseudouridylation. Interestingly, while the vast majority of previously known members of these two snoRNA classes direct the modification of rRNA, several of the novel members are able to guide modification of spliceosomal small nuclear RNAs (snRNAs). Moreover, an unexpectedly large number of them remain without identified RNA targets. Finally, some of them are not ubiquitously expressed, as expected for rRNA or snRNA modification guides, raising the possibility of tissue‐specific targets, presumably mRNAs.
Results and discussion
Library construction and analysis
We constructed two cDNA libraries from mouse brain (see Materials and methods) based on small RNAs sized from ∼50 to ∼110 (Fraction II) and from ∼110 to ∼500 nt (Fraction I). Two separate libraries were generated to avoid potential overrepresentation of highly abundant tRNA species in Fraction II. We randomly sequenced 400 clones from each fraction and identified sequences by a BLASTN database search (Figure 1).
In Fraction I, many of the cDNA sequences could be assigned to genes encoding known snmRNAs (Figure 1). In addition to rRNAs or snRNAs, we identified other known small RNA species such as 7SL RNA, 7SK RNA, Y1 scRNA, RNase P or a brain‐specific snmRNA, designated BC1 RNA (DeChiara and Brosius, 1987). About 1% of the sequences were derived from known mRNA fragments. Only 3% of cDNA sequences could not be assigned to known RNAs and therefore represented potentially novel snmRNAs. The Fraction II library contained, among others, sequences derived from tRNAs, 4.5S RNA and previously identified snRNAs or snoRNAs. As observed in the Fraction I cDNA library, degradation fragments of 28S or 18S rRNA genes were also present. Compared with Fraction I, a larger proportion of novel, unknown cDNA clones (7%) could be identified by a BLASTN database search, thus potentially representing novel snmRNAs (Figure 1).
To enrich the fraction of novel RNA species in our analysis, cDNA clones were spotted on filters in high density arrays and hybridized to radiolabelled oligonucleotides identifying the most abundant, known snmRNAs. By this approach, we could significantly increase the proportion of novel RNA species in our selection procedure from 3 to 20% in Fraction I and from 7 to 22% in Fraction II. From each library, ∼40 000 clones were screened by the hybridization procedure. Signals obtained in the filter hybridization were ranked by computer‐aided analysis. Subsequently, we sequenced the ∼2500 clones from each fraction that exhibited the lowest hybridization scores.
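The selection step described above (rank clones by hybridization signal, then sequence those with the lowest scores) can be illustrated with a minimal sketch. The function name, the clone identifiers and the score values are hypothetical and are not taken from the study; the actual signal quantification followed Maier et al. (1994).

```python
def select_low_signal_clones(scores, n):
    """Return the IDs of the n clones with the weakest hybridization signal.

    scores: dict mapping a clone ID to its hybridization score
            (hypothetical values; higher means stronger signal from
            probes against abundant, known snmRNAs).
    """
    # Sort clone IDs by ascending score, so likely-novel clones come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [clone_id for clone_id, _ in ranked[:n]]


# Toy data for illustration only.
example_scores = {"clone_a": 0.92, "clone_b": 0.05, "clone_c": 0.40, "clone_d": 0.11}
picked = select_low_signal_clones(example_scores, 2)
```

In the screen itself, the analogous cut-off would be roughly the 2500 lowest-scoring clones out of about 40 000 per library.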
Analysis of candidates for snmRNAs
By sequence analysis, 201 novel ERNS from mouse were identified. Expression and sizes of these potential snmRNA species were confirmed by northern blot analysis. In general, they matched the sizes of the corresponding cDNAs (Tables I, II, III, IV, V, VI), which were shorter by at least 5–10 bases, since the extreme 5′ ends of RNAs were not present due to the cloning strategy employed. In several cases, a complete sequence of the mouse snmRNA could be found within a mouse EST entry using a BLASTN database search. We also investigated whether novel RNAs would be expressed specifically in any of the following tissues: brain, liver, heart, kidney or testis (data not shown). The expression of a subset of ERNS could not be confirmed by northern blot analysis. This could be explained by low expression levels of the respective RNA species.
We determined the total number of independent cDNA clones obtained for each snmRNA. While most clones were found only once in our screen, some were present in numerous copies. This correlated well with their abundance as deduced by northern blot analysis. Homologues for more than half of the mouse ERNS could be identified in human genomic or EST sequences (sequence similarity >80%), consistent with a functional role of these RNAs. Based on structural hallmarks, expression and presumed function, the novel 201 snmRNA candidates were assigned to 13 different subgroups (see Tables I, II, III, IV, V, VI).
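To make the >80% similarity criterion concrete, the following sketch computes percent identity over a pair of pre-aligned sequences. It is an assumption that identity over an alignment is a reasonable stand-in here; the text does not state how similarity was calculated beyond the use of BLASTN, and the two short sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical mouse/human fragments differing at one of ten positions.
mouse_fragment = "GAUGACCUGA"
human_fragment = "GAUGAUCUGA"
identity = percent_identity(mouse_fragment, human_fragment)
is_candidate_homologue = identity > 80.0
```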
Novel mouse snoRNAs
Based on sequence and structural features (Maxwell and Fournier, 1995; Balakin et al., 1996; Ganot et al., 1997b), we identified 72 novel snoRNA species from the C/D box and 41 from the H/ACA box type. The known function of snoRNAs is post‐transcriptional processing and modification of rRNAs or snRNAs. C/D box antisense snoRNAs guide 2′‐O‐ribose methylation at specific sites in rRNAs or snRNAs, while H/ACA snoRNAs guide specific pseudouridylation within these RNA species (for reviews see Tollervey, 1996; Smith and Steitz, 1997; Ofengand and Fournier, 1998; Weinstein and Steitz, 1999; Bachellerie et al., 2000). Unexpectedly, a substantial number of the novel specimens of both snoRNA classes do not appear to target rRNA or a snRNA. Moreover, seven of them are not ubiquitously expressed in mouse tissues, but are specific to brain.
Novel ubiquitous C/D box snoRNAs
C/D box snoRNAs contain two short sequence motifs, box C and box D, located only a few nucleotides away from the 5′ and 3′ ends, respectively, generally as part of a typical 5′–3′ terminal stem–box structure (for a review see Bachellerie and Cavaillé, 1998). Immediately upstream from box D or from an additional box (D′) in the 5′ half, C/D snoRNAs feature sequence tracts, 10–21 nt in length, that are complementary to rRNA spanning the sites of 2′‐O‐ribose methylation. In the corresponding RNA duplexes, the ribose‐methylated nucleotide is always at the same location, paired to the fifth snoRNA nucleotide upstream from box D or box D′ (Kiss‐Laszlo et al., 1996; Nicoloso et al., 1996). In rRNA of the yeast Saccharomyces cerevisiae, cognate box C/D snoRNAs have been identified for 51 of the 55 ribose‐methylated sites (Lowe and Eddy, 1999). In mammals, however, a large fraction of the 105–107 expected rRNA 2′‐O‐ribose methylations (Maden, 1990) remained without a known cognate guide until completion of this study. Moreover, it is now apparent that the complexity of C/D box snoRNAs might be greater than anticipated, since methylation guide snoRNAs targeting substrates other than rRNA have been identified. Thus, three 2′‐O‐ribose methylations of spliceosomal U6 snRNA in human are also directed by bona fide C/D box antisense snoRNAs (Tycowski et al., 1998; Ganot et al., 1999), while both a 2′‐O‐ribose methylation and a pseudouridylation in U5 snRNA are guided by a novel C/D–H/ACA ‘hybrid’ snoRNA (Jady and Kiss, 2001).
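The positional rule described above (the methylated nucleotide pairs with the fifth snoRNA nucleotide upstream from box D or D′) can be written as a short sketch. It assumes perfect Watson–Crick pairing only, ignores G·U pairs and mismatches, and uses invented toy sequences; the function names are hypothetical and this is not the search procedure used in the study.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predict_methylation_site(snorna, box_d_start, guide_len, target):
    """Return the 1-based target position predicted to be 2'-O-methylated.

    snorna:      snoRNA sequence, 5' to 3'
    box_d_start: 0-based index of the first nucleotide of box D (or D')
    guide_len:   length of the antisense element directly upstream of the box
    target:      target RNA sequence, 5' to 3'
    Returns None if the antisense element has no perfect match in the target.
    """
    if guide_len < 5 or box_d_start < guide_len:
        return None
    guide = snorna[box_d_start - guide_len:box_d_start]
    site = target.find(reverse_complement(guide))
    if site == -1:
        return None
    # The duplex is antiparallel: the snoRNA nucleotide directly upstream of
    # the box pairs with the first matched target nucleotide, so the fifth
    # upstream nucleotide pairs with the fifth matched target nucleotide.
    return site + 5


# Toy sequences for illustration only (not real snoRNA or rRNA sequences).
toy_snorna = "GGUGAUGAAU" + "UCCGAUAGCC" + "CUGA" + "CC"
toy_target = "CCAAGGCUAUCGGAACUUGG"
toy_box_d = toy_snorna.rfind("CUGA")
toy_site = predict_methylation_site(toy_snorna, toy_box_d, 10, toy_target)
```

A real search would also have to tolerate G·U pairs and occasional mismatches and would scan guides of 10–21 nt, which this sketch does not attempt.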
Of the 72 novel mouse C/D box snoRNAs identified in this study, 66 are ubiquitously expressed, of which 24 correspond to novel C/D box snoRNAs able to guide a 2′‐O‐methylation within rRNA (Table I, group I) and 23 to orthologues of previously identified human snoRNAs able to guide methylation in rRNAs or in U6 snRNA (see supplementary data available at The EMBO Journal Online; Table I, group IV). Particularly interesting is MBII‐239, able to direct methylation at position U14 within ribosomal 5.8S RNA. Um14 is unique among all vertebrate rRNA ribose methylations because it is partial, takes place in the cytoplasm rather than the nucleolus, and is undermethylated in tumour tissues (Nazar et al., 1980; Munholland and Nazar, 1987). The detection of MBII‐239 strongly suggests that the atypical Um14 methylation of 5.8S rRNA is catalysed by the same snoRNA‐guided machinery as the remainder of rRNA ribose methylations, raising the issue of assessing the intracellular site of action of the MBII‐239 snoRNA and its expression level in tumour tissues.
From 15 of this first subset of 24 novel mouse C/D box snoRNAs, the human orthologues can be found as genomic or EST entries in DDBJ/EMBL/GenBank, further supporting the functional relevance of the identified cDNAs, as does their location in introns (in two cases: for MBII‐202 and MBII‐240). Collectively, the novel C/D box snoRNAs in group I are able to direct a total of 24 rRNA methylations, since MBII‐211 represents an apparent isoform of MBII‐180, able to direct the same methylation in 28S rRNA, while MBII‐202 can direct two methylations, corresponding to Um428 and Am2378 in human 18S rRNA.
Within group I, one particular snoRNA, MBI‐43, stands out for its exceptionally large size (240 nt) for a C/D box snoRNA. So far, the only non‐canonical specimen in this regard was the recently reported C/D–H/ACA ‘hybrid’ snoRNA, which exhibits a roughly similar size (Jady and Kiss, 2001). Curiously, the uridine targeted by MBI‐43 is the only site that is both ribose methylated and pseudouridylated in mammalian rRNA (Maden, 1990; Ofengand and Bakin, 1997). The presumptive H/ACA snoRNA able to guide this particular pseudouridylation remains unknown so far (see below). Careful inspection of the sequence and folding potential of both MBI‐43 and its human homologue in DDBJ/EMBL/GenBank did not reveal the presence of H/ACA snoRNA hallmarks in addition to C/D motifs, ruling out the possibility that the atypical snoRNA corresponds to a hybrid C/D–H/ACA snoRNA directing the two types of modification at the same nucleotide position. As a consequence of the work presented here, only 14 rRNA ribose methylations, from a total of 105–107 in mammals (Maden, 1990), remain without identified cognate guide snoRNA.
In our screen we have also discovered four novel C/D box snoRNAs (MBII‐19, MBII‐119, MBII‐166 and MBII‐382) able to direct ribose methylation within U2, U4 or U6 snRNAs [group II, Table I; see Massenet et al. (1998) for a review on snRNA nucleotide modifications]. For MBII‐119 and MBII‐382, this assignment is supported by comparison of the complete human homologous sequences found as ESTs in DDBJ/EMBL/GenBank, which both exhibit box C and a 4 or 5 bp terminal stem in addition to conserved antisense elements. While MBII‐382 could guide two distinct ribose methylations in U2 snRNA, the two cognate antisense elements are unusual, because they are both located in the 5′ half of the snoRNA and found immediately upstream of a potential D′ box carrying two deviations from the consensus.
We also discovered 15 ubiquitously expressed RNAs with structural hallmarks of C/D box snoRNAs, but devoid of complementarity to rRNAs or snRNAs at the expected position relative to the box motifs (Table I, group III). While clone MBII‐426 is severely truncated (30 nt in length), it unambiguously corresponds to a bona fide C/D box snoRNA, since it matches perfectly a mouse EST sequence exhibiting box C, box C′ and a 4 bp terminal stem at the expected locations. For MBII‐115, MBII‐163, MBII‐289 and MBII‐295, human homologues are available in DDBJ/EMBL/GenBank, and in each case one of the two presumptive antisense elements is conserved between mouse and human, supporting the notion that these snoRNAs also represent bona fide methylation guides. Two ubiquitous methylation guide snoRNAs devoid of rRNA or snRNA complementarity have been reported recently (Jady and Kiss, 2000). This expanding subset of box C/D snoRNAs lacking complementarity to rRNA might be involved in rRNA processing (Tycowski et al., 1994) or other, still unknown, aspects of ribosome biogenesis or other functions. Alternatively, these snoRNAs might target cellular RNAs other than rRNAs or snRNAs, such as ubiquitous snmRNAs transiting through the nucleolus, like telomerase RNA, RNase P, SRP RNA or pre‐tRNAs (for review see Pederson, 1998). However, the presence of 2′‐O‐ribose‐methylated nucleotides has not been reported in these RNAs thus far. Furthermore, systematic searches of these snoRNA sequences did not reveal any potential antisense element of at least 8 bp that could direct 2′‐O‐ribose methylation within these potential targets.
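A systematic search for antisense elements of at least 8 bp, as mentioned above, might look like the following sketch. It assumes exact Watson–Crick complementarity over a fixed window and uses invented toy sequences; the names are hypothetical and the actual search parameters used in the study are not given in the text.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in reversed(rna))


def complementary_hits(guide, target, min_len=8):
    """List (guide_start, target_start) pairs for every window of min_len
    nucleotides in guide that is perfectly complementary to target.

    Both indices are 0-based and refer to the 5' end of the window in the
    respective sequence.
    """
    hits = []
    for start in range(len(guide) - min_len + 1):
        probe = revcomp(guide[start:start + min_len])
        pos = target.find(probe)
        while pos != -1:
            hits.append((start, pos))
            pos = target.find(probe, pos + 1)
    return hits


# Toy sequences for illustration only.
toy_guide = "UCCGAUAGCC"
toy_rna = "CCAAGGCUAUCGGAACUUGG"
toy_hits = complementary_hits(toy_guide, toy_rna)
no_hits = complementary_hits(toy_guide, "AAAAAAAAAAAAAAAA")
```

An empty result, as in the second call, corresponds to the situation reported for the candidate targets: no stretch of 8 bp or more that could position a methylation site.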
Novel ubiquitous mouse H/ACA snoRNAs
The formation of pseudouridines in eukaryotic rRNA is directed by a large family of site‐specific H/ACA box snoRNAs carrying an appropriate bipartite guide sequence in the internal loop of one (or both) of their two major hairpin domains (Ganot et al., 1997a; Ni et al., 1997; Ofengand and Fournier, 1998; Bortolin et al., 1999). In contrast to methylation guide snoRNAs, pseudouridylation guide snoRNAs, thus far, have been exclusively discovered by experimental approaches, as identification in genomic sequences is critically hampered by their shorter box motifs and shorter bipartite antisense elements. Of the 91–93 pseudouridines of mammalian rRNAs (Maden, 1990; Ofengand and Bakin, 1997), only 15 can be guided by one of the 13 previously reported human H/ACA snoRNAs (Ganot et al., 1997a). The present study dramatically expands the repertoire of eukaryotic H/ACA snoRNAs, decreasing the number of rRNA pseudouridylation sites without a cognate H/ACA snoRNA by 27 to 49–51.
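The site count quoted above follows from simple subtraction, sketched below. The sketch assumes that the 15 previously assigned sites and the 27 newly assigned sites do not overlap; the function name is hypothetical.

```python
def remaining_sites(total_range, previously_guided, newly_guided):
    """Sites still lacking a cognate guide, as a (low, high) range.

    total_range: (low, high) estimate of all pseudouridines in rRNA.
    Assumes the two guided sets are disjoint.
    """
    low, high = total_range
    covered = previously_guided + newly_guided
    return (low - covered, high - covered)


# 91-93 pseudouridines, 15 guided by known snoRNAs, 27 by the novel ones.
remaining = remaining_sites((91, 93), 15, 27)
```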
The majority of novel H/ACA snoRNA species have no counterpart among the mammalian snoRNAs reported so far, except for seven corresponding to homologues (sequence similarity 73–90%) of human pseudouridylation guides, namely U23, E2, U64, U65, U68, U69 and U70 (see supplementary data; Table II, group VIII). All novel mouse RNAs contain the two H and ACA box motifs at the expected locations and fold into the typical two major hairpin domains connected by a single‐strand hinge region carrying the box H motif (data not shown). The vast majority also exhibit bipartite antisense elements matching known RNA pseudouridylation sites (Figure 2). This large set of novel data overwhelmingly confirms the validity of the model for the base‐paired snoRNA–rRNA interaction guiding site‐specific pseudouridylation (Ganot et al., 1997a).
Nineteen entirely new specimens (Table II, group V) can collectively direct 27 of the 91–93 pseudouridylations identified in mammalian rRNAs. All but three of them display an identical location in human and mouse rRNAs (Figure 2A). Mouse homologues of seven already known human H/ACA snoRNAs show, as expected, the perfect sequence conservation of the antisense elements proposed previously (Ganot et al., 1997a), except for one. Thus, the proposal that the 3′ pseudouridylation pocket of U69 could target Ψ36 in human 18S rRNA (Ganot et al., 1997a) is not phylogenetically supported, as in comparison with its human homologue the mouse MBI‐134 sequence exhibits three nucleotide differences over the presumptive bipartite 3′ antisense element of U69.
A large fraction of the novel rRNA pseudouridylation guides can each target two distinct modifications, through appropriate antisense elements in both pseudouridylation pockets. One of them, MBI‐89, is even able to direct three distinct modifications: its 5′ pseudouridylation pocket contains an antisense element matching two distinct target sites, one in 18S rRNA, one in 28S rRNA. Intriguingly, in addition to the 27 previously reported rRNA pseudouridines (Ofengand and Bakin, 1997) targeted by the novel mouse snoRNAs, three rRNA uridines not known to be pseudouridylated nevertheless appear as bona fide targets for two of the novel H/ACA specimens, MBI‐39 (through its 5′ pseudouridylation pocket) and MBI‐164 (through both pseudouridylation pockets), as shown in Figure 2A. In this regard, it is noteworthy that several rRNA 2′‐O‐ribose methylations had not been identified until the detection of a cognate guide snoRNA prompted further scrutiny (Qu et al., 1999).
The set of 34 entirely novel H/ACA snoRNAs identified in mouse also includes four outstanding specimens able to target pseudouridylation onto snRNA instead of rRNA (Table II, group VI). MBI‐57, MBI‐100, MBI‐114 and MBI‐125 have the potential to target U2 or U6 snRNAs, through the formation of guide duplexes that appear perfectly canonical as compared with those matching rRNA targets (Figure 2B). Interestingly, the sequence of a likely homologue of MBI‐57, which can direct both Ψ34 and Ψ44 in U2 snRNA, is present in a Xenopus laevis EST (BE507485), which could provide the basis for a direct experimental analysis of the elusive function of these two U2 pseudouridylations.
Novel mouse H/ACA RNAs also include 11 species for which we could not identify any reasonable target uridine in rRNAs or snRNAs (Table II, group VII). Two of these RNAs, MBI‐79 and MBI‐87, are encoded in introns of ubiquitously expressed genes, like all vertebrate rRNA modification guide snoRNAs characterized so far (see below). Searches involving antisense elements of the 11 novel RNAs were also negative for potential target uridines in other stable non‐coding RNAs trafficking through the nucleolus, such as telomerase RNA, RNase P or SRP RNA, the pseudouridine content of which remains unknown. We cannot rule out with certainty that novel snoRNAs might still target these RNA species; however, they could also target other cellular RNAs such as mRNA, as suggested recently in the case of a C/D box snoRNA (Cavaillé et al., 2000). Finally, recent findings that telomerase RNA in vertebrates contains a typical H/ACA domain (Mitchell et al., 1999) and that human H/ACA snoRNPs and telomerase share evolutionarily conserved proteins (Pogacic et al., 2000) expand the structural and functional diversity of H/ACA box snoRNAs, suggesting that some of the novel snoRNAs in this group might have unanticipated functions.
We identified six C/D box snoRNAs (MBII‐13, MBII‐48, MBII‐49, MBII‐52, MBII‐78 and MBII‐85; Table III, group IX) and one H/ACA box snoRNA (MBI‐36) that are expressed in mouse brain but not in other tissues tested so far (heart, liver, kidney, testis or muscle). Human homologues of MBI‐36, MBII‐13, MBII‐52 and MBII‐85 have been identified (Cavaillé et al., 2000; Filipowicz, 2000). In human, genes encoding MBII‐52 and MBII‐85 are present in multi‐copy repeats on chromosome 15q11–13 and located in introns of host genes that apparently have no capacity to encode proteins (Cavaillé et al., 2000). The multi‐copy repeat arrangement of these two snoRNAs is in agreement with their high abundance in cells, as deduced by northern blot analysis; in our screen, these clones could be identified 37 times (MBII‐52) and 56 times (MBII‐85), respectively, while most other cDNA clones encoding snoRNA genes are only found once. The human homologue of MBII‐13 maps as a single‐copy gene to chromosome 15q11–13 and MBI‐36 maps to the large intron 2 of the serotonin 5‐HT2C receptor gene, consistent with its brain‐specific expression pattern with highest levels in the choroid plexus (Cavaillé et al., 2000).
None of the brain‐specific snoRNAs exhibits complementarity to rRNAs or snRNAs within their antisense element(s), in agreement with a role different from targeting these RNA species. Remarkably, the antisense element of MBII‐52 snoRNA is complementary to the serotonin receptor 5‐HT2C mRNA (the same gene serving as a host gene for MBI‐36, see above) and is proposed to regulate editing or alternative splicing of the mRNA (Cavaillé et al., 2000). In addition, MBII‐13, MBII‐52 and MBII‐85 C/D box snoRNAs might be involved in the aetiology of Prader–Willi syndrome, a neurogenetic disease, thus constituting the first snoRNAs whose absence potentially causes a human disorder (Cavaillé et al., 2000). So far, no potential targets have been identified for the remaining six brain‐specific snoRNAs. Availability of the complete mouse and human genomes might reveal conserved target sites for these unusual snoRNA species.
Intronic localization of novel modification guide snoRNAs
Sequences encoding a large subset of the novel snoRNAs in mouse (or their unambiguous orthologues in another mammalian species) are located within long fragments of the mammalian genomes in sequence databases. Whenever exons and introns were annotated, the snoRNA coding region was found within an intron of a mostly ubiquitously expressed gene. This is in agreement with the observed pattern for vertebrate modification guide snoRNAs (Pelczar and Filipowicz, 1998; Smith and Steitz, 1998; Weinstein and Steitz, 1999; Bachellerie et al., 2000), which are usually processed from the debranched lariat by exonucleolytic trimming of excess intronic sequences (Kiss and Filipowicz, 1995). Characterization of our novel mouse snoRNAs largely expands the repertoire of known host genes. A majority of the novel host genes identified in this study encode ribosomal proteins, in agreement with previous observations. They include rpL3, rpL13, rpL23a, rpL37 and rpS12, as well as rpL23, rpL27a, rpL32‐3A, rpP2, rpS12 and rpS16 for C/D box and H/ACA snoRNAs, respectively. In addition, detection of the mouse homologues of human C/D box U58 snoRNA and H/ACA U68 snoRNA allowed us to identify their cognate host genes, rpL17 and rpL18a, respectively. Five novel ubiquitous genes for non‐ribosomal proteins have also been identified as hosts for intronic snoRNAs. One is an unidentified 5′TOP gene hosting MBII‐99 C/D box snoRNA, while the other four contain intron‐encoded H/ACA snoRNAs (Tables I and II). Curiously, one of them encodes dyskerin (Heiss et al., 1998), the mammalian homologue of yeast Cbf5, the pseudouridine synthase thought to catalyse the snoRNA‐guided isomerization of uridine (Lafontaine et al., 1998). For a more extensive analysis of novel snoRNAs see supplementary data.
Novel RNAs that do not exhibit snoRNA motifs
Of the 201 ERNS, 88 could not be assigned to known classes of snmRNAs and no function can be attributed to these RNA species at this point. However, hallmarks might exist at the level of secondary structure, as observed for H/ACA box RNAs. In fact, some of the RNA sequences can be folded into highly stable stem–loop structures. Since we are currently analysing cDNA libraries encoding small RNA species from organisms including Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana, interspecies comparisons of the novel sequences might reveal conserved structural or sequence motifs and provide hints as to the function of these RNA species in the cell.
Novel snmRNAs located within mRNA coding regions
Of the 88 novel ERNS of the non‐snoRNA type, 26 can be located within known or predicted mRNA or heterogeneous nuclear RNA (hnRNA) coding regions (Table IV, groups X and XI). Of these, 16 ERNS are part of the open reading frames of mRNAs, whereas 10 are located within 5′ or 3′ untranslated regions (UTRs). At this point, the function of these RNAs remains elusive. It is noteworthy that the expression as snmRNAs of some but not all ERNS from this group can be confirmed by northern blot analysis. While ERNS derived from coding regions might correspond to more or less stable intermediates during degradation of hnRNAs or mRNAs, snmRNAs derived from the 5′ or 3′ UTRs of mRNAs could exhibit regulatory functions. Such mRNA regions have been shown previously to be involved in cis in the control of mRNA stability and intracellular localization (Schuldt et al., 1998; Saunders et al., 1999).
Novel snmRNAs resembling repetitive elements
Clones MBI‐2, MBI‐56, MBI‐160, MBII‐133 and MBII‐373 contain sequences derived from short interspersed repetitive elements (SINEs; Table V, group XII). A common denominator between sequences from this group of clones is sequence similarity to nuclear 4.5SH RNA, 4.5SI RNA or the related B1, B2 or B4 retronuons. These sequences, in turn, are related to an ancestral SRP RNA or tRNA (Jurka, 2000). The small RNAs that served as templates for the aforementioned cDNAs may be related to 4.5SH RNA, 4.5SI RNA or directly transcribed B‐type SINEs. Alternatively they may reflect degradation or processing products from larger hnRNAs or mRNAs that harbour such sequences. B1 and B2 SINEs, for example, can often be found in 3′ UTRs of mature mRNAs in both orientations (Brosius, 1999). Further work is necessary to establish whether the clones from this category reflect novel snmRNAs related to 4.5SH RNA, 4.5SI RNA, B1, B2 or B4 RNAs.
Novel ERNS without known sequence or structural motifs
Of the 88 novel snmRNAs identified without snoRNA motifs, 57 did not exhibit any sequence or structural motifs that would have made it possible to assign a genomic location within the mouse genome or a specific function to these RNAs. A notable exception is clone MBI‐44, which maps to the mitochondrial pro/D loop (Table VI, group XIII). Twenty snmRNAs from the group of 57 were expressed, as assessed by northern blot analysis, while the expression of the remaining 37 snmRNAs could not be confirmed. From three randomly chosen clones of that group, however, we could amplify cDNA fragments of the expected size by RT–PCR, demonstrating their expression. Again, at this point, we cannot exclude the possibility that some of the cDNA sequences of this class represent degradation products of unknown hnRNAs or mRNAs rather than snmRNAs. If this turns out to be the case, these sequences are still useful in providing ESTs for novel mRNAs in mouse. Further analysis of the human and mouse genomes should provide a better insight as to whether these sequences represent novel snmRNAs, as at least some of them will.
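The clone counts reported across the preceding sections can be cross-checked with a short tally. All numbers below are taken from the text; the dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
TOTAL_ERNS = 201

# snoRNA candidates, split by class and, separately, by novelty.
SNORNA_BY_CLASS = {"C/D box": 72, "H/ACA box": 41}
SNORNA_BY_NOVELTY = {"homologues of known human snoRNAs": 30, "entirely novel": 83}

# Non-snoRNA ERNS: mRNA/hnRNA-derived (ORF + UTR), SINE-related, and
# unassigned (expression confirmed + not confirmed by northern blot).
NON_SNORNA = {
    "within mRNA/hnRNA regions": 16 + 10,
    "SINE-related": 5,
    "no known motif": 20 + 37,
}


def tally():
    """Return the summed counts for the snoRNA and non-snoRNA categories."""
    return {
        "snoRNA": sum(SNORNA_BY_CLASS.values()),
        "non_snoRNA": sum(NON_SNORNA.values()),
    }


totals = tally()
consistent = (
    totals["snoRNA"] + totals["non_snoRNA"] == TOTAL_ERNS
    and sum(SNORNA_BY_NOVELTY.values()) == totals["snoRNA"]
)
```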
This study represents a first unbiased look at the population of snmRNA species in a mammalian cell, providing the basis for a comprehensive understanding of genomic, cellular and organismal function. By our experimental approach, we could identify a large set of novel snoRNAs of the C/D or H/ACA box type guiding ribose methylation or pseudouridylation not only in rRNA, as expected, but also in snRNAs. For the first time, we report the detection of guide snoRNAs directing ribose methylations in U2 and U4 snRNAs, as well as snoRNA guides for pseudouridylations in U2 and U6 snRNAs. In addition, we identified a surprisingly large number of snoRNA species from both classes without the potential to target rRNAs or snRNAs, as deduced from their lack of appropriate complementarity. Especially intriguing was the identification of several brain‐specific snmRNAs, all of which belong to the snoRNA type. This might lead to further studies to identify snoRNAs expressed tissue specifically in tissues other than brain. One of the brain‐specific snoRNAs (MBII‐52) has been suggested to target serotonin receptor 5‐HT2C hnRNA or mRNA, which in turn is expressed specifically in brain (Cavaillé et al., 2000). This may be indicative of a novel function of snoRNAs, namely the regulation of gene expression by binding to and/or modifying mRNAs or their hnRNA precursors via their antisense elements. At this stage, it is difficult to speculate about the function of potential snmRNAs of the non‐snoRNA type. As demonstrated, some of these novel species are derived from hnRNAs or mRNAs and might therefore correspond to degradation products of larger transcripts. Alternatively, they could regulate the expression of mRNAs by as yet unknown mechanisms, especially when located within their 5′ or 3′ UTRs. Their detection sets the stage for direct experimental testing of these hypotheses.
Materials and methods
Identification of novel RNA species
We prepared total RNA from mouse brain by the TRIzol method (Gibco‐BRL). Total RNA was subsequently fractionated on a denaturing 8% polyacrylamide gel (7 M urea, 1× TBE buffer). RNAs in the size range ∼50 to ∼110 (Fraction II) or ∼110 to ∼500 nt (Fraction I) were excised from the gel, passively eluted and ethanol precipitated. Subsequently, 5 μg of RNA were tailed with CTP using poly(A) polymerase, as described by DeChiara and Brosius (1987). RNAs were reverse transcribed into cDNAs using primer GIBCO1 (see supplementary data) and cloned into pSPORT 1 vector employing the GIBCO Superscript™ system (Gibco‐BRL). cDNAs were amplified by PCR using primers FSP and RSP (see supplementary data). PCR products were spotted by robots in high density arrays onto filters by the method of Schmitt et al. (1999), performed at the Resource Center of the German Human Genome Project (Berlin, Germany).
Filter hybridization and isolation of clones
For exclusion of the most abundant, known, small RNA species, we end‐labelled oligonucleotides (see supplementary data) derived from these sequences with [33P]ATP and T4 polynucleotide kinase, and hybridized oligonucleotides to DNA arrays spotted on filters (see above). We performed hybridization in 0.5 M sodium phosphate pH 7.2, 7% SDS, 1 mM EDTA at 53°C for 12 h. We washed filters twice at room temperature for 15 min in 40 mM sodium phosphate buffer pH 7.2, 0.1% SDS, exposed filters to a phosphoimaging screen and analysed filters by computer‐aided determination of hybridization signals (Maier et al., 1994).
Accession numbers of sequences
Acknowledgements
We would like to thank Dr Stefan Hennig for his support with computer analysis of data and Christine Mersmann for technical assistance during the initial phase of the project. This work was supported by the German Human Genome Project through the BMBF (#01KW9616 and #01KW9966) to J.B. and A.H., by an IZKF grant (Teilprojekt F3, Münster) to A.H. and a grant from the Association pour la Recherche sur le Cancer and laboratory funds from the Centre National de la Recherche Scientifique and Université Paul‐Sabatier, Toulouse, to J.‐P.B.
Copyright © 2001 European Molecular Biology Organization