Retrotransposition of LINEs and other retroelements increases repetition in mammalian genomes and can cause deleterious mutations. Recent insertions of two full‐length L1s, L1spa and L1Orl, caused the disease phenotypes of the spastic and Orleans reeler mice respectively. Here we show that these two recently retrotransposed L1s are nearly identical in sequence, have two open reading frames and belong to a novel subfamily related to the ancient F subfamily. We have named this new subfamily TF (for transposable) and show that many full‐length members of this family are present in the mouse genome. The TF 5′ untranslated region has promoter activity, and TF‐type RNA is abundant in cytoplasmic ribonucleoprotein particles, which are likely intermediates in retrotransposition. Both L1spa and L1Orl have reverse transcriptase activity in a yeast‐based assay and retrotranspose at high frequency in cultured cells. Together, our data indicate that the TF subfamily of L1s contains a major class of mobile elements that is expanding in the mouse genome.
LINEs, or L1s, are repeated sequences that pervade mammalian genomes (Burton et al., 1986). Although most L1s are inactive because they are truncated, rearranged or mutated, some are active and disperse by retrotransposition (i.e. transposition requiring reverse transcription and reintegration of the resulting cDNA). Full‐length mouse and human LINEs contain shared features that include two protein‐coding regions [open reading frames (ORFs) 1 and 2], a 3′ untranslated region (UTR) and a 3′ poly(A) tail. ORF1 encodes an RNA‐binding protein (Martin, 1991; Hohjoh and Singer, 1996, 1997; Kolosha and Martin, 1997) required for retrotransposition (Moran et al., 1996). ORF2 encodes an endonuclease (Feng et al., 1996), a reverse transcriptase (RT; Mathias et al., 1991) and a highly conserved cysteine‐rich motif (Fanning and Singer, 1987), all of which are also required for retrotransposition (Moran et al., 1996).
Mouse L1s, unlike human L1s, have 5′ UTRs with tandemly repeated units called monomers at their 5′‐most end. Earlier studies revealed two types of monomers, A and F, which are about 200 bp long and unrelated in sequence (Loeb et al., 1986; Padgett et al., 1988). Although A‐ and F‐type L1s are equally abundant in the genome, A‐subfamily members display little divergence, often have intact ORFs and are transcribed (Schichman et al., 1992, 1993; Severynse et al., 1992). In contrast, F‐subfamily members are divergent, lack intact ORFs and are not transcribed (Schichman et al., 1992). Thus, while some A elements may be active, F elements appear ‘dead’ in that they dispersed millions of years ago and have accumulated deleterious mutations (Adey et al., 1994).
New retrotranspositions of active human elements have been identified through systematic screening of disease genes (Kazazian et al., 1988; Morse et al., 1988; Miki et al., 1992; Narita et al., 1993; Holmes et al., 1994). Although those insertions were truncated progeny of active L1s, isolation of their precursors (Dombroski et al., 1991; Holmes et al., 1994) led to demonstration of autonomous retrotransposition in cultured cells (Moran et al., 1996).
Recently, four L1 insertions causing disease have been identified in mice (Kingsmore et al., 1994; Mulhardt et al., 1994; Kohrmann et al., 1996; Takahara et al., 1996; Perou et al., 1997). Surprisingly, two of these insertions (L1spa and L1Orl) were full‐length, and thus may retain the ability to retrotranspose (Kingsmore et al., 1994; Mulhardt et al., 1994; Takahara et al., 1996). A partial sequence of L1spa (Kingsmore et al., 1994) and complete sequence of L1Orl (Takahara et al., 1996) suggested that both were members of the inactive F subfamily. Here we show that L1spa and L1Orl are active autonomous retrotransposons that belong to a novel L1 subfamily related to the F subfamily. We name this new subfamily the TF subfamily.
L1spa and L1Orl are members of a novel L1 subfamily with many full‐length members
We sequenced L1spa and showed that it is 7.5 kb long, has seven and a half monomers in its 5′ UTR and retains both ORFs with intact endonuclease and RT domains (Figure 1). Comparison of the L1spa sequence (DDBJ/EMBL/GenBank accession No. AF016099) with the reported sequence of L1Orl showed that they are closely related with an overall identity of 99.4% (Figure 1). Both elements have three copies of a 42 bp repeat in the length polymorphism region (LPR) of ORF1. Previously reported expressed A‐type L1s contain two copies of this sequence, whereas genomic F‐type elements have a single 42 bp block and a related 66 bp block; three 42 bp repeats have been observed in only a few genomic A‐type elements (Schichman et al., 1992).
The L1spa and L1Orl 5′ UTRs are 97.7% identical. Consensus sequences made by aligning the seven and a half L1spa monomers or the five and two‐thirds L1Orl monomers are 100% identical to each other, but only 73% identical to an F‐monomer consensus (Figure 2A). Furthermore, L1spa and L1Orl are identical over the 3′‐most 125 bp of monomer 1, a region that is divergent even among L1s of the same subfamily. Therefore, the two elements are distinct but closely related members of the previously uncharacterized TF subfamily.
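The identity values above come from two steps: a consensus built from aligned monomers, then a percent-identity comparison between aligned sequences. The minimal Python sketch below shows both steps under stated assumptions. The monomers are assumed to be already aligned to equal length, the consensus is a simple majority rule, and gapped columns are excluded from the identity count. The authors' actual analysis used the GCG package, and its scoring may differ. The sequences are toy strings, not real TF or F monomers.

```python
from collections import Counter

def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences ('-' = gap).
    Ties go to the base seen first in the column (Counter ordering)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))

def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)

# Hypothetical toy monomers, already aligned
monomers = ["ACGTAC", "ACGTAC", "ACCTAC"]
print(consensus(monomers))                   # ACGTAC
print(percent_identity("ACGTAC", "ACGAAC"))  # 5 of 6 columns match, about 83.3
```

Excluding gapped columns is only one convention; counting gaps as mismatches would give lower values for sequences with indels.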
The human and mouse genomes each contain approximately 100 000 L1s, but the vast majority are 5′ truncated (Voliva et al., 1983; Fanning and Singer, 1987; Hutchison et al., 1989). As a result, the copy number of full‐length L1s is much lower than that of 3′ UTR sequences. To estimate the number of full‐length TF L1s in the mouse genome, we screened two mouse genomic libraries with a probe containing TF monomer sequence. We determined that the diploid mouse genome contains 2000–3000 copies of the TF 5′ UTR.
To determine the percentage of TF 5′ UTRs associated with downstream L1 sequence, we isolated 30 phage cores that hybridized with the TF probe. Twenty‐nine of these cores (97%) contained a phage that also hybridized to an ORF2 probe. We purified these 29 phage and PCR‐amplified a 3.6 kb fragment encompassing most of ORFs 1 and 2 from 18 of them (62%). Thus, our data suggest that almost all of the 2000–3000 TF sequences are associated with downstream L1 sequence, and that at least 60% are contained in full‐length L1s. Considering that many of our PCR‐negative phage probably contained TF L1s truncated at internal restriction sites during construction of the library, the true number of full‐length TF L1s may approach the total number of TF 5′ UTR sequences.
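The percentages and the "at least 60%" figure follow directly from the phage counts given above: 29 of 30 cores were ORF2‐positive, and 18 of those 30 yielded the PCR product. The short Python sketch below reproduces that arithmetic. It treats every PCR‐negative phage as not full‐length, so the genome‐wide figure is a lower bound. This matches the reasoning in the text, but the function names are illustrative only.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Counts reported for the phage screen in the text
tf_positive_cores = 30  # cores hybridizing with the TF monomer probe
orf2_positive = 29      # cores with a phage that also hybridizes to ORF2
pcr_positive = 18       # purified phage yielding the 3.6 kb ORF1-ORF2 product

print(round(percent(orf2_positive, tf_positive_cores)))  # 97
print(round(percent(pcr_positive, orf2_positive)))       # 62

def full_length_lower_bound(tf_5utr_copies, pcr_hits, tf_cores):
    """Scale the genome-wide TF 5' UTR count by the PCR-positive fraction of
    all TF-positive cores (18/30 = 60%). PCR-negative phage count as not
    full-length, so this is a lower bound."""
    return tf_5utr_copies * pcr_hits / tf_cores

for total in (2000, 3000):
    print(total, full_length_lower_bound(total, pcr_positive, tf_positive_cores))
# 2000 1200.0
# 3000 1800.0
```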
TF 5′ UTR has promoter activity and TF RNA predominates in ribonucleoprotein particles
To test the novel TF 5′ UTR for promoter activity, we created an expression construct (pRJD801) in which the L1spa 5′ UTR directs transcription of the β‐galactosidase (β‐gal) gene (Figure 2B). We transiently transfected mouse LTK– and F9 cells with pRJD801 and assayed for β‐gal activity after two days. In both cell lines, we observed substantial activity in both qualitative staining and quantitative enzyme assays (Figure 2B). In the quantitative enzyme assay, we compared the promoter activity of the L1spa 5′ UTR with the activity of a construct (pJCB8) containing the promoter for the large subunit of mouse RNA polymerase II (pPol II). We used pPol II as a reference promoter because its activity is similar in LTK– and F9 cells (Figure 2B). As another control, we transfected a construct (pRJD802) that lacked promoter sequences; cells transfected with this construct had very little β‐gal activity (Figure 2B). These data show that the TF‐type 5′ UTR of L1spa is an active promoter in mouse cell lines.
Since the L1spa 5′ UTR is an active promoter, we asked whether TF‐type L1s are expressed in vivo. Previous studies showed that L1 RNA and ORF1 protein co‐localize in cytoplasmic ribonucleoprotein (RNP) particles isolated from mouse F9 cells (Martin, 1991). Northern blot analysis of RNA from RNP particles using F‐, A‐ and TF‐monomer probes revealed that TF‐type RNA is much more abundant than A‐type RNA (Figure 2C). We also used a probe that did not discriminate among L1 subfamilies (ORF) to ensure that equal amounts of L1 RNA were present in each lane (Figure 2C). Since L1spa and L1Orl both contain the longest known form of the ORF1 LPR (three 42 bp repeats), the abundance of TF RNA in RNP particles is consistent with a previous demonstration that the larger form of ORF1 protein is most plentiful in particles (Kolosha and Martin, 1995).
L1spa and L1Orl encode RT and retrotranspose in cultured cells
We next determined whether the ORF2s of L1spa and L1Orl encode RT activity in a yeast‐based assay. Both ORF2s demonstrated RT activity at levels comparable to the activity encoded by ORF2 of L1.2, the human L1 used as a positive control (Figure 3). Two L1spa ORF2 mutants carrying substitutions in the critical RT motif F/Y‐X‐DD (D709Y and D709N) showed no RT activity in this assay.
Retrotransposition in mammalian cells requires several functions in addition to RT activity and a functional promoter, including ORF1 protein and the endonuclease activity of ORF2 protein (Feng et al., 1996; Moran et al., 1996). To test whether L1spa and L1Orl are active retrotransposons, we used a cell culture assay that accurately detects authentic retrotransposition events (Moran et al., 1996). In this assay, L1 retrotransposition does not require the 5′ UTR when an exogenous promoter is provided. We cloned L1spa and L1Orl lacking their 5′ UTRs downstream of the cytomegalovirus immediate early promoter (pCMV) to create constructs pTN202 and pTN207, respectively (Figure 4). Both constructs generated G418‐resistant (G418R) foci in LTK– cells (40–300 events per 10⁶ hygromycin‐resistant cells, Figure 4). In contrast, an RT‐defective allele of L1spa (pTN203) failed to generate G418R foci.
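Frequencies in this assay are expressed per 10⁶ hygromycin‐resistant cells. We assume here that hygromycin selection marks cells that took up and kept the pCEP4‐based construct, so the count is normalized to transfected cells rather than to all cells plated. A minimal sketch of that normalization follows. The plate counts are hypothetical and are not data from Figure 4.

```python
def events_per_million(g418r_foci, hygr_cells):
    """Retrotransposition frequency: G418R foci per 10^6 hygromycin-resistant cells."""
    if hygr_cells <= 0:
        raise ValueError("number of hygromycin-resistant cells must be positive")
    return g418r_foci * 1_000_000 / hygr_cells

# Hypothetical plate: 60 G418R foci from 4 x 10^5 hygromycin-resistant cells
print(events_per_million(60, 400_000))  # 150.0, within the 40-300 range reported
```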
PCR analysis on genomic DNA of individual G418R foci generated by pTN202 revealed that the neo insertions lacked the γ‐globin intron, consistent with retrotransposition. Furthermore, Southern blots showed that neo sequences were located at different genomic positions (data not shown). These data verify that L1spa and L1Orl are capable of high‐frequency retrotransposition in mouse cells.
Since L1spa and L1Orl are nearly identical, we concentrated further analyses on L1spa. We cloned the 5′ UTR of L1spa into pTN202, creating pTN201, and found that the presence of both pCMV and the 5′ UTR greatly increased retrotransposition frequency compared with that of pTN202 (1400 events per 10⁶ hygromycin‐resistant cells, Figure 4). To test L1spa in a more biologically relevant manner, we then deleted pCMV from pTN201. This construct, pTN205, which contains the full L1spa sequence, generated retrotransposition events at frequencies similar to pTN202 (100–120 events per 10⁶ hygromycin‐resistant cells), while pTN206, which lacks both promoters, did not retrotranspose. Thus, retrotransposition of a TF‐subfamily L1 occurs at readily observable frequencies and does not require an exogenous promoter.
We have used a cell culture assay to show that L1s from humans (Moran et al., 1996; Sassaman et al., 1997) and mice (this work) can retrotranspose. Previously, we showed that a human full‐length L1 containing a 5′ UTR retrotransposed more frequently than a human L1 lacking a 5′ UTR, even when both constructs contained pCMV (Moran et al., 1996). Here we obtained a similar result with a mouse L1 (compare pTN201 with pTN202 in Figure 4). This could result from increased L1 transcription from constructs containing both the CMV and 5′ UTR promoters. Alternatively, the presence of 5′ UTR sequence in an L1 RNA might make it a more efficient intermediate for retrotransposition than a transcript lacking 5′ UTR sequence. Perhaps the 5′ UTR of an active L1 has an additional function in retrotransposition independent of promoter activity, such as providing a protein docking site or improving RNA stability.
The sequences of L1spa and L1Orl suggest that they are members of a novel subfamily that contains active L1s. Two mouse mutations besides spastic and Orleans reeler have been caused by L1 insertions. One insertion into the sodium channel gene, Scn8a, is too short to be assigned to a subfamily (Kohrmann et al., 1996). The other insertion, into the beige gene, is 1.1 kb long (Perou et al., 1997); it matches the TF subfamily at 27 of 37 polymorphic nucleotides, and the A subfamily at the other 10 sites. Thus, it is likely that the active progenitor of this L1 is also a member of the TF subfamily.
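The beige assignment rests on tallying, at each diagnostic position, which subfamily's base the insertion carries. In this case that gave 27 of 37 sites for TF, about 73%, and 10 for A. The Python sketch below shows that tally. Positions beyond the end of a truncated insertion are skipped, because the 1.1 kb beige insertion covers only part of the element. The site table and query are hypothetical toy data, not real TF or A diagnostic positions.

```python
from collections import Counter

def tally_matches(query, diagnostic_sites):
    """Count, for each subfamily, the diagnostic positions at which the aligned
    query carries that subfamily's base.

    query: aligned query sequence (0-based positions)
    diagnostic_sites: {position: {subfamily: expected_base}}
    """
    tally = Counter()
    for pos, alleles in diagnostic_sites.items():
        if pos >= len(query):
            continue  # position not covered by a truncated insertion
        for subfamily, base in alleles.items():
            if query[pos] == base:
                tally[subfamily] += 1
    return tally

# Hypothetical toy alignment, not real subfamily-diagnostic sites
sites = {
    1: {"TF": "C", "A": "T"},
    4: {"TF": "G", "A": "A"},
    7: {"TF": "T", "A": "C"},
    12: {"TF": "A", "A": "G"},  # beyond the end of the toy query, so skipped
}
result = tally_matches("ACGTACGTAC", sites)
print(result.most_common())  # [('TF', 2), ('A', 1)]
```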
All recent L1 retrotranspositions in humans and mice lack stop mutations within the inserted L1 DNA, while all randomly cloned human L1 cDNAs contain stop mutations (Skowronski et al., 1988). It is therefore likely that the proteins encoded by active L1 progenitors function preferentially to retrotranspose the transcript that encodes them, i.e. they function in cis (Dombroski et al., 1991; Moran et al., 1996; Boeke, 1997). Consistent with this, mutations in conserved domains of an active L1 progenitor markedly reduce its ability to retrotranspose in cultured cells (Moran et al., 1996). Therefore, it appears that retrotransposition of transcripts from inactive L1s through trans complementation is a rare event. This cis preference would prevent efficient expansion of truncated, rearranged or otherwise mutated L1 transcripts, and curtail promiscuous retrotransposition of other cellular mRNAs. The two TF L1s described here support cis preference because both have retrotransposed recently, contain intact ORFs and retrotranspose autonomously in cultured cells.
Why might many active mouse L1s belong to a particular subfamily? In the human genome, most active L1s are concentrated in a specific family called the Ta subset. There are only about 160 full‐length members of this subset, about a quarter of which are active (Sassaman et al., 1997). The mouse TF subfamily probably presents a strikingly different picture. The first two characterized TF‐subfamily L1s, L1spa and L1Orl, are the first examples of recent L1 insertions that have retained retrotransposition capacity. If TF‐subfamily L1s frequently retrotranspose as full‐length insertions, they would expand among active L1s, leading to a preponderance of active L1s in the TF subfamily. Determining the number of TF elements capable of retrotransposition will provide important information concerning the evolution of the subfamily and its current role in shaping the mouse genome. Considering the large number of full‐length TF L1s, it is probable that this subfamily contains more active members than the ∼40 in the human Ta subset, implying a greater potential for L1 mutagenesis in mice than in humans.
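The final comparison can be made concrete with the numbers reported in this paper. What follows is only back‐of‐the‐envelope arithmetic. It combines the human Ta figures with the lower bound of at least 60% of 2000 TF 5′ UTRs in full‐length elements. It is not a measured fraction of active TF elements.

```latex
\begin{align*}
N_{\text{Ta, active}} &\approx 160 \times \tfrac{1}{4} = 40 \\
N_{\text{TF, full-length}} &\gtrsim 0.6 \times 2000 = 1200 \\
f_{\text{break-even}} &= \frac{N_{\text{Ta, active}}}{N_{\text{TF, full-length}}} \lesssim \frac{40}{1200} \approx 3.3\%
\end{align*}
```

On these figures, if more than roughly 3% of full‐length TF L1s were retrotransposition‐competent, the TF subfamily alone would exceed the ∼40 active elements of the human Ta subset.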
Many questions crucial to understanding the impact of L1 in shaping mammalian genomes have been impossible to address because no active L1s have been identified from an experimental organism. Characterization of an active mouse L1 subfamily now provides an opportunity to design whole‐animal experiments using tagged active mouse L1s in transgenic mice.
Materials and methods
Bacterial and yeast strains
Bacterial strain DH5α (Stratagene) was used in cloning. Growth media, antibiotic selection and bacterial transformation were according to standard protocols (Sambrook et al., 1989). HIS3 pseudogene experiments were carried out in strain YH50 (MATα his3Δ200 ura3‐167 trp1Δ1 leu2Δ1 spt3‐202) (Dombroski et al., 1994). Yeast transformation and media were according to standard protocols (Rose et al., 1990).
Cloning techniques, DNA preparation and sequencing
Standard molecular biology techniques were performed as previously described (Sambrook et al., 1989). Site directed mutagenesis was performed as described (Kunkel et al., 1991). Plasmid DNAs were purified on Qiagen maxi columns (Qiagen). DNAs for transfection experiments were tested for superhelicity by electrophoresis on 0.8% agarose–ethidium bromide gels and only highly supercoiled preparations were used for transfections. DNA sequencing was performed on an Applied Biosystems DNA sequencer (ABI377). Oligonucleotide sequences used in this study are available upon request. All cloning‐related PCR products, ORF2 mutations and relevant restriction fragments were sequenced in their entirety. Nucleotide sequences and deduced protein sequences were analyzed using the GCG software package (Biotechnology Center, University of Wisconsin–Madison, Madison, WI).
Filter hybridization of mouse genomic libraries and phage PCR
Partial Sau3AI λ genomic libraries derived from mouse strain 129 were obtained commercially (Stratagene) or from K.Kaestner. The libraries were screened with a probe containing L1spa monomers 5–7½ liberated by NotI and XbaI digestion of pTNC7. Secondary and tertiary screens also employed a 1.5 kb ORF2 probe liberated from pTNC7 by BstXI digestion. Probes were labeled by random priming and hybridized under standard conditions (Sambrook et al., 1989). The number of genomic TF sequences was estimated from the number of positive plaques per filter, the number of phage per plate and the average insert size in the library. Phage purification was performed using standard protocols (Sambrook et al., 1989). For PCR analysis on purified phage, we used L1spa‐derived oligomers to amplify a 3.6 kb fragment which included most of ORF1 and ORF2. This product contained nucleotides 2183–5814 of the L1spa sequence.
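The copy‐number estimate described above combines three quantities: positive plaques, phage screened and average insert size. The Python sketch below shows one way to combine them, under explicit assumptions. We assume a haploid mouse genome of about 3.0 × 10⁹ bp, one hybridizing element per positive plaque and uniform representation of the genome in the library. The screen numbers are hypothetical and are not the authors' counts. The last line checks that the phage PCR product spanning L1spa nucleotides 2183–5814 is indeed about 3.6 kb.

```python
# Assumed haploid mouse genome size; an approximation, not a value from the text
HAPLOID_GENOME_BP = 3.0e9

def genome_equivalents(phage_screened, mean_insert_bp, genome_bp=HAPLOID_GENOME_BP):
    """Number of haploid genomes represented by the plaques screened."""
    return phage_screened * mean_insert_bp / genome_bp

def diploid_copy_number(positive_plaques, phage_screened, mean_insert_bp,
                        genome_bp=HAPLOID_GENOME_BP):
    """Copies per diploid genome, assuming one hybridizing element per positive
    plaque and uniform library representation."""
    haploid = positive_plaques / genome_equivalents(phage_screened, mean_insert_bp, genome_bp)
    return 2 * haploid

def inclusive_length(first_nt, last_nt):
    """Length of a fragment spanning first_nt..last_nt, both ends included."""
    return last_nt - first_nt + 1

# Hypothetical screen (illustrative numbers only)
print(genome_equivalents(500_000, 15_000))          # 2.5
print(diploid_copy_number(3_000, 500_000, 15_000))  # 2400.0

# Phage PCR product described in the text: L1spa nucleotides 2183-5814
print(inclusive_length(2183, 5814))                 # 3632, i.e. ~3.6 kb
```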
Clones used in yeast experiments
pBSM is a derivative of pSM42 (Dombroski et al., 1994) in which the primer‐binding site of Ty1 was mutated (J.V.Moran, Q.Feng, B.A.Dombroski, T.P.Naas, R.J.DeBerardinis, J.D.Boeke and H.H.Kazazian, manuscript in preparation). pSM43 is an active site mutant (ORF2 D702Y) of pSM42 (Dombroski et al., 1994). pTNC1 was constructed by cloning a 3.9 kb Pfu polymerase PCR product containing L1spa ORF2 as a SalI–SacI fragment into pBluescript II (pBS, Stratagene). Site‐directed mutagenesis of ORF2 was performed using this vector. pTNC2 is an ORF2 D709Y mutant of pTNC1, and pTNC3 is an ORF2 D709N mutant of pTNC1. SalI–SacI fragments were cloned from pTNC1, pTNC2 and pTNC3 into pBSM giving rise to pTNY1, pTNY2 and pTNY3, respectively. pTNC5 was constructed by cloning a 3.9 kb Pfu polymerase PCR product containing L1Orl ORF2 as a SalI–SacI fragment into pBS. The relevant SalI–SacI fragment was cloned into pBSM to make pTNY5.
Clones used in tissue culture experiments
Plasmid pNI contains a 15 kb NotI genomic DNA fragment with the entire L1spa insertion cloned into pBS (Kingsmore et al., 1994). pA contains an 8 kb PvuII genomic DNA fragment containing the entire L1Orl insertion cloned into pBS (Takahara et al., 1996). pJM102 and pJM105 were previously described (Moran et al., 1996). L1spa was engineered to contain a unique NotI site either upstream of its 5′ UTR or immediately upstream of ORF1. A unique PacI site was engineered at nucleotide 7430 in the 3′ UTR. NotI–PacI fragments were cloned into pBS resulting in L1spa constructs either containing the 5′ UTR (pTNC201) or lacking it (pTNC202). The NotI–blunted PacI fragments from these plasmids were cloned along with a BamHI–blunted AccI fragment containing the neo cassette from pJCC9 (Moran et al., 1996) into NotI‐ and BamHI‐digested pCEP4 (Invitrogen). The resulting plasmids contained a tagged L1spa either containing the 5′ UTR (pTN201) or lacking it (pTN202). The D709Y mutant in ORF2 was subcloned from pTNC2 into pTNC202 producing pTNC203. A NotI–XhoI (XhoI cuts at one site in the γ‐globin intron in the neo gene) fragment from pTNC203 was cloned into pTN202 digested with the same enzymes producing pTN203. L1Orl was engineered to contain a unique NotI restriction site immediately upstream of ORF1. A NotI–SfiI fragment was cloned into NotI‐ and SfiI‐digested pTNC202 resulting in pTNC207. The NotI–blunted PacI fragment from this plasmid was cloned along with a BamHI–blunted AccI fragment containing the neo cassette from pJCC9 (Moran et al., 1996) into NotI‐ and BamHI‐digested pCEP4 (Invitrogen) as described above. The resulting plasmid, pTN207, contains a tagged L1Orl lacking the 5′ UTR. Plasmids pTN205 and pTN206 were constructed by cloning a NotI–XhoI fragment from pTN201 and pTN202, respectively, into pCEP4ΔCMV (Moran et al., 1996).
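Several steps above depend on engineering a restriction site that is unique in the construct. The Python sketch below shows a simple uniqueness check. It searches both strands and, optionally, treats the sequence as circular so that sites spanning the origin of a plasmid are found. The recognition sequences are the standard ones for NotI (GCGGCCGC), PacI (TTAATTAA), XhoI (CTCGAG) and BamHI (GGATCC). The test sequence is a hypothetical toy construct, not L1spa.

```python
RECOGNITION_SITES = {
    "NotI": "GCGGCCGC",
    "PacI": "TTAATTAA",
    "XhoI": "CTCGAG",
    "BamHI": "GGATCC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def site_positions(seq, site, circular=False):
    """0-based top-strand start positions of `site` on either strand of `seq`.
    All four sites above are palindromic, so both strands give the same hits."""
    seq = seq.upper()
    # For a circular plasmid, append the first len(site)-1 bases to catch
    # sites that span the junction between the end and the start.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = set()
    for motif in {site, reverse_complement(site)}:
        start = search.find(motif)
        while start != -1:
            if start < len(seq):
                hits.add(start)
            start = search.find(motif, start + 1)
    return sorted(hits)

def is_unique(seq, enzyme, circular=True):
    return len(site_positions(seq, RECOGNITION_SITES[enzyme], circular)) == 1

# Hypothetical toy construct with one NotI and one PacI site
construct = "AAA" + "GCGGCCGC" + "ATGC" * 5 + "TTAATTAA" + "CCC"
print(site_positions(construct, RECOGNITION_SITES["NotI"]))  # [3]
print(is_unique(construct, "PacI"))                          # True
```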
L1spa 5′ UTR–β‐gal constructs
pCMVβ (Clontech) was digested with EcoRI, blunted with T4 DNA polymerase and partially digested with NotI to yield a 6.4 kb fragment lacking pCMV and the SV40 intron. The 5′ UTR of L1spa was obtained from pTNC7, a pBS derivative containing full‐length L1spa as a NotI–XhoI fragment. pTNC7 was digested with BstXI, which cuts in the pBS polylinker 5′ to the L1spa sequence, blunted with T4 DNA polymerase and digested with StyI, which cuts in monomer 1, resulting in a 1.5 kb fragment containing most of the 5′ UTR. Then a 300 bp StyI–NotI PCR product of the remainder of the 5′ UTR was prepared with Pfu polymerase. Ligation of the 1.5 kb blunted BstXI–StyI, 0.3 kb StyI–NotI and the 6.4 kb NotI–blunted EcoRI fragments resulted in pRJD800, which contained the entire L1spa 5′ UTR upstream of the β‐gal gene. To add back the SV40 intron and create pRJD801, the 1.8 kb NotI fragment was liberated from pRJD800, blunted with T4 DNA polymerase and ligated with the blunted EcoRI–blunted XhoI fragment of pCMVβ. pRJD802 is a deletion derivative of pCMVβ created by removing pCMV by EcoRI and XhoI digestion, then blunting and recircularizing the backbone. pJCB8 contains the promoter for the large subunit of mouse RNA polymerase II (pPol II) in place of pCMV while pJCB14 contains the mouse phosphoglycerate kinase‐1 promoter (pPGK) driving the luciferase gene (Bradford, 1997).
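The three‐fragment ligation that produced pRJD800 can be checked on paper. Each fragment's right end must be compatible with the next fragment's left end, and the last fragment must close the circle (blunt to blunt). The Python sketch below does that bookkeeping and sums the approximate sizes given above. Identical end labels are treated as compatible, which is a simplification. StyI recognizes the degenerate site CCWWGG, so a real check would compare the actual overhang sequences. The two L1spa pieces together come to about 1.8 kb, which matches the 1.8 kb NotI fragment mentioned above.

```python
def circular_ligation_size(fragments):
    """Check that consecutive fragment ends are compatible and that the last
    fragment can close the circle; return the product size in bp.

    fragments: (left_end, right_end, size_bp) tuples in ligation order.
    Identical end labels count as compatible (a simplification)."""
    for i, (_, right_end, _) in enumerate(fragments):
        next_left = fragments[(i + 1) % len(fragments)][0]
        if right_end != next_left:
            raise ValueError(f"incompatible junction: {right_end} -> {next_left}")
    return sum(size for _, _, size in fragments)

# Approximate fragment sizes taken from the text
pRJD800_parts = [
    ("blunt", "StyI", 1500),  # blunted BstXI-StyI fragment of pTNC7
    ("StyI", "NotI", 300),    # StyI-NotI PCR product
    ("NotI", "blunt", 6400),  # NotI-blunted EcoRI fragment of pCMVbeta
]
print(circular_ligation_size(pRJD800_parts))  # 8200
```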
Promoter expression experiments
F9 or LTK– cells were grown as described by ATCC. Cells (4×10⁵) were incubated in 7% CO₂ at 37°C for 18 h. Cells were transiently co‐transfected with 1 μg of pJCB14 and 1 μg of either pJCB8, pRJD801 or pRJD802 using lipofectamine (BRL). Cell extracts were prepared after 48 h. β‐gal and luciferase assays were performed using commercially available kits (Promega).
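Each β‐gal construct was co‐transfected with the luciferase plasmid pJCB14, and promoter activity was compared with the pPol II reference construct pJCB8. The Python sketch below shows one plausible way to combine the two readings. It divides β‐gal by luciferase to correct for transfection efficiency, then expresses each construct relative to pJCB8. This normalization scheme is our assumption, since the text does not state the exact calculation. The readings are hypothetical arbitrary units, not the values in Figure 2B.

```python
def normalized(beta_gal, luciferase):
    """beta-gal signal corrected by the co-transfected luciferase signal."""
    if luciferase <= 0:
        raise ValueError("luciferase signal must be positive")
    return beta_gal / luciferase

def relative_activity(readings, reference="pJCB8"):
    """Normalized activity of each construct relative to the reference promoter."""
    ref = normalized(*readings[reference])
    return {name: normalized(bg, luc) / ref for name, (bg, luc) in readings.items()}

# Hypothetical (beta-gal, luciferase) readings in arbitrary units
readings = {
    "pJCB8": (500.0, 500.0),    # pPol II reference promoter
    "pRJD801": (800.0, 400.0),  # L1spa 5' UTR
    "pRJD802": (20.0, 400.0),   # promoterless control
}
print(relative_activity(readings))
# {'pJCB8': 1.0, 'pRJD801': 2.0, 'pRJD802': 0.05}
```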
Cytoplasmic RNP particles and Northern analysis
L1 RNP particles of mouse F9 cells were prepared by sedimentation through sucrose step gradients as described (Martin, 1991). RNA was extracted from 20 μl of the 30%/60% interface using TRIZOL (Life Technologies), resolved on formaldehyde–agarose gels and blotted on to nitrocellulose. Probes were made from gel‐purified fragments by random‐primed labeling with [α‐32P]dCTP. The A‐monomer probe is from L1Md9 (Shehee et al., 1988), the F‐monomer probe is from Padgett et al. (1988) and the TF probe was generated from L1Orl. The ORF1–ORF2 probe spans positions 1490–7001 of L1Md‐A2 (Loeb et al., 1986). The specificity of the monomer probes was proven by filter hybridization to cloned fragments (data not shown).
HIS3 pseudogene experiments
The HIS3 pseudogene assay was performed essentially as described (Derr et al., 1991; Dombroski et al., 1994). Yeast strain YH50 was co‐transformed with Ty1/L1 ORF2 expression constructs and the indicator cassette plasmid (pSM50). Colonies were selected on SC medium (–Ura, –Trp). Transformants were purified and four colonies were grown as patches on SC medium (–Ura, –Trp) for 3 days at 30°C. In order to induce the expression of the Ty1/L1 construct, the patches were subsequently replica plated on to two different SC (–Ura, –Trp) plates containing 2% galactose and incubated for 5 days at 22°C. After induction, one plate was replica plated to SC medium (–His) to obtain a qualitative assessment of RT activity. Patches from the other plate were diluted in water, plated on SC (–His) and YPD medium, and grown for 4 days at 30°C. The relative RT activity was reported as the number of His+ colonies/total number of colonies plated.
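The relative RT activity defined above is a ratio of His⁺ colonies to total colonies plated. When the suspension is plated at different dilutions on SC (–His) and YPD, each count has to be scaled back to undiluted cells before dividing. A minimal Python sketch follows. It assumes equal volumes were spread on both plates and that dilutions are given as fold dilutions. The colony counts are hypothetical and are not the data in Figure 3.

```python
def his_frequency(his_colonies, his_dilution, total_colonies, total_dilution):
    """His+ colonies per colony-forming cell.

    Counts are scaled by the fold dilution of the suspension plated, assuming
    equal volumes were spread on SC (-His) and YPD plates."""
    his_cells = his_colonies * his_dilution
    total_cells = total_colonies * total_dilution
    if total_cells <= 0:
        raise ValueError("no colonies on the YPD plate")
    return his_cells / total_cells

# Hypothetical patch: 50 His+ colonies from undiluted cells,
# 200 YPD colonies from a 10^4-fold dilution
print(his_frequency(50, 1, 200, 10_000))  # 2.5e-05
```

Activity relative to the L1.2 positive control would then be the ratio of two such frequencies measured in parallel.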
The retrotransposition assay was performed as described (Moran et al., 1996). In this assay, an antisense neomycin resistance gene (neo) under the control of an SV40 promoter (pSV40) is interrupted by a sense γ‐globin intron, and cloned into the 3′ UTR of L1 elements. G418R cells result only when an L1 message containing antisense neo is transcribed, the γ‐globin intron is removed by splicing, the transcript is reverse transcribed and integrated into the genome, and the neo gene is expressed from pSV40. LTK– cells were grown as described (Moran et al., 1996).
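The logic of the indicator cassette can be made explicit with strings. The neo gene is antisense to the L1 transcript, and the intron is sense to it. As a result, only the L1 transcript can splice the intron out, and only after reverse transcription and integration does an uninterrupted neo gene appear. The Python sketch below models this with hypothetical toy sequences, which are not the real neo gene or γ‐globin intron. It uses a deliberately simplified splicing rule that removes an intron only if it reads GT…AG on the transcript's strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def splice(transcript, intron):
    """Toy splicing rule: remove the intron only if it appears on this strand
    and reads GT...AG (real splice-site recognition is far more complex)."""
    if intron in transcript and intron.startswith("GT") and intron.endswith("AG"):
        return transcript.replace(intron, "", 1)
    return transcript

# Hypothetical toy sequences
NEO = "ATGGCCAAGCTTTGA"
INTRON = "GTAAGTTTTTTCAG"

def l1_sense_cassette(neo, intron, split=6):
    """Cassette as read on the L1 sense strand: neo antisense, intron sense."""
    exon1, exon2 = neo[:split], neo[split:]
    return revcomp(exon2) + intron + revcomp(exon1)

cassette = l1_sense_cassette(NEO, INTRON)

# Path 1: the L1 transcript is spliced, reverse transcribed and integrated;
# neo is then read from its own promoter on the opposite strand.
after_retrotransposition = revcomp(splice(cassette, INTRON))

# Path 2: neo transcribed directly from the plasmid; on this strand the
# intron reads CT...AC, so the toy rule does not remove it.
direct_neo_transcript = splice(revcomp(cassette), INTRON)

print(after_retrotransposition == NEO)  # True
print(direct_neo_transcript == NEO)     # False
```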
We thank Julie Saxton and Jeremy Bradford for technical assistance, Klaus Kaestner for providing a mouse genomic library, and Beth Dombroski for critical reading of the manuscript. We thank the DNA sequencing core facility at the University of Pennsylvania School of Medicine. This work was supported by NIH grants to H.H.K. and S.L.M.; T.P.N. was supported by an EMBO Long‐Term Fellowship and J.V.M. by a Damon Runyon–Walter Winchell Cancer Research Fund Fellowship (DRG 1332).
- Copyright © 1998 European Molecular Biology Organization