3′ end mRNA processing: molecular mechanisms and implications for health and disease

Sven Danckwardt, Matthias W Hentze, Andreas E Kulozik

Author Affiliations

  1. Sven Danckwardt1,2,
  2. Matthias W Hentze2,3 and
  3. Andreas E Kulozik*,1,2
  1. 1 Department of Pediatric Oncology, Hematology and Immunology, University of Heidelberg, Heidelberg, Germany
  2. 2 EMBL—University of Heidelberg Molecular Medicine Partnership Unit, Heidelberg, Germany
  3. 3 European Molecular Biology Laboratory, Heidelberg, Germany
  1. *Corresponding author. Department of Pediatric Oncology, Hematology and Immunology, University of Heidelberg, Heidelberg, Germany; and EMBL—University of Heidelberg Molecular Medicine Partnership Unit, Im Neuenheimer Feld 150, Heidelberg 69120, Germany. Tel.: +49 06221 564555; Fax: +49 06221 564559; E-mail: andreas.kulozik{at}


Recent advances in the understanding of the molecular mechanism of mRNA 3′ end processing have uncovered a previously unanticipated integrated network of transcriptional and RNA‐processing mechanisms. A variety of human diseases reflect the importance of the precision of the complex 3′ end‐processing machinery. Gene‐specific deregulation of 3′ end processing can result from mutations of RNA sequence elements that bind key processing factors. More general deregulation of 3′ end processing can be caused either by mutations of these processing factors or by disturbance of the well‐coordinated equilibrium between them. From a medical perspective, both loss of function and gain of function can be clinically relevant, and an increasing number of different disease entities exemplify that inappropriate 3′ end formation of human mRNAs can have a tremendous impact on health and disease. Here, we review the mechanistic hallmarks of mRNA 3′ end processing, highlight the medical relevance of deregulation of this important step of mRNA maturation and illustrate the implications for diagnostic and therapeutic strategies.


Messenger RNA 3′ end processing is a well‐orchestrated process that involves components of the transcription, the splicing and the translation machinery. The medical importance of 3′ end processing is illustrated by an increasing number of different disease entities, which are caused by inappropriate 3′ end processing. In this review, we discuss key mechanistic features of 3′ end formation; for the basic aspects of 3′ end formation and its regulation, the reader is referred to several excellent reviews that cover them in more detail (Zhao and Manley, 1996; Colgan and Manley, 1997; Edwalds‐Gilbert et al, 1997; Keller and Minvielle‐Sebastia, 1997; Wahle and Kuhn, 1997; Barabino and Keller, 1999; Zhao et al, 1999; Edmonds, 2002; Proudfoot et al, 2002; Gilmartin, 2005; Weiner, 2005; Rosonina et al, 2006). We then focus on the medical perspective of 3′ end processing and illustrate how disease can be caused by mutations of important RNA sequence elements and/or by the pathological expression of the proteins that interact with such sequence elements and with each other.

The eukaryotic mRNA 3′ end‐processing machinery

With the exception of some histone mRNAs, all eukaryotic mRNAs possess poly(A) tails at their 3′ end, which are produced by a two‐step reaction involving endonucleolytic cleavage and subsequent poly(A) tail addition. The specificity and efficiency of 3′ end processing are determined by the binding of multiprotein complexes to specific elements at the 3′ end of the pre‐mRNA. Most cellular pre‐mRNAs contain two core elements (Figure 1A). The canonical polyadenylation signal AAUAAA (or less frequently AUUAAA) upstream of the cleavage site is recognized by the multimeric cleavage and polyadenylation specificity factor (CPSF, light blue in Figure 1A) consisting of at least five subunits (CPSF 160, CPSF 100, CPSF 73, CPSF 30 and hFip1). This RNA–protein interaction determines the site of cleavage 10–30 nt downstream, preferentially immediately 3′ of a CA dinucleotide. The second canonical sequence element is characterized by a high density of G/U or U residues and is located up to 30 nt downstream of the cleavage site. This downstream sequence element (DSE) is bound by the 64‐kDa subunit of the heterotrimeric cleavage‐stimulating factor (CstF, dark blue in Figure 1A) that promotes the efficiency of 3′ end processing. Furthermore, accessory sequences that have initially been identified in (retro‐) viral polyadenylation signals (Gilmartin et al, 1995; Graveley et al, 1996) can function as upstream sequence elements (USEs) (Moreira et al, 1995, 1998; Brackenridge and Proudfoot, 2000; Natalizio et al, 2002; Legendre and Gautheret, 2003; Danckwardt et al, 2004, 2007; Hall‐Pogar et al, 2005, 2007; Hu et al, 2005; Xie et al, 2005) to facilitate 3′ end processing by serving as an additional anchor for the (canonical) 3′ end‐processing machinery (Danckwardt et al, 2007; Hall‐Pogar et al, 2007), or by recruiting canonical 3′ end factors directly (Moreira et al, 1995, 1998).
Moreover, in vitro studies have identified a UGUA motif to be present in one or more copies at variable distances upstream of the cleavage site. These sequence motifs are thought to recruit the heterodimeric cleavage factor CFIm (Brown and Gilmartin, 2003; Venkataraman et al, 2005). These elements may also represent a primary determinant of poly(A) site recognition in the absence of the highly conserved A(A/U)UAAA motif (Venkataraman et al, 2005). Unlike the replication‐dependent histone pre‐mRNA processing elements (see below), conventional poly(A) sites thus exhibit a wide range of both sequence and spatial variability.
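The arrangement of core elements described above can be sketched as a simple sequence scan. The following Python sketch is illustrative only: the function name, the way the 10–30 nt distance is measured (from the 3′ end of the hexamer to the cleavage position) and the use of a G/U fraction in a 30‐nt window as a crude proxy for the DSE are assumptions made for this example. It is not a validated poly(A) site predictor, and it ignores USEs and UGUA motifs.

```python
import re

# Hexamers named in the text: canonical AAUAAA and the less frequent AUUAAA.
# The lookahead allows overlapping matches to be reported.
HEXAMER = re.compile(r"(?=(AAUAAA|AUUAAA))")


def find_candidate_poly_a_sites(rna, min_gap=10, max_gap=30, dse_window=30):
    """List CA dinucleotides lying 10-30 nt downstream of a hexamer.

    Assumption: the gap is counted from the 3' end of the hexamer to the
    cleavage position, and cleavage occurs immediately 3' of the CA.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    candidates = []
    for match in HEXAMER.finditer(rna):
        hexamer_end = match.start(1) + 6  # first index after the hexamer
        for gap in range(min_gap, max_gap + 1):
            cut = hexamer_end + gap  # cleavage between rna[cut-1] and rna[cut]
            if cut > len(rna) or rna[cut - 2:cut] != "CA":
                continue
            downstream = rna[cut:cut + dse_window]
            # Crude DSE proxy (assumption): fraction of G or U residues.
            gu = sum(nt in "GU" for nt in downstream)
            candidates.append({
                "signal": match.group(1),
                "signal_start": match.start(1),
                "cleavage_index": cut,
                "gu_fraction": gu / len(downstream) if downstream else 0.0,
            })
    return candidates


# Invented example sequence, used only to exercise the function.
example = "GG" + "AAUAAA" + "G" * 13 + "CA" + "UGUGUUUGUU" + "CCCC"
for site in find_candidate_poly_a_sites(example):
    print(site)
```

Because conventional poly(A) sites show considerable sequence and spatial variability, a scan of this kind can only nominate candidates; it cannot rank their efficiency.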

Figure 1.

Cis‐acting sequence elements and trans‐acting factors involved in mammalian 3′ end processing. (A) 3′ end processing of polyadenylated mRNAs. After assembly of multiprotein complexes at the respective RNA recognition motifs (upper panel), the primary transcript is endonucleolytically cleaved at the cleavage site by CPSF 73. This is followed by the addition of adenosine residues to the 3′ end to form a poly(A) tail that is bound by PABPN1. The interaction of PAP with this protein and with CPSF is critical to establish the processive action of the polymerase for the synthesis of approximately 250 A residues. Following polyadenylation, the interaction of PABPN1 with the poly(A) tail is characterized by a rapid on–off rate, and PABPN1 is exchanged for cytoplasmic PABP (PABPC) at the time of nuclear export. PABPC interacts with the translation initiation factor eIF4G as part of the initiation complex, thus generating a translation‐competent pseudocircular ribonucleoprotein particle. (B) In contrast to mRNAs with a poly(A) site, 3′ end processing of replication‐dependent histone mRNAs requires conserved sequence elements and structural elements. The signal for 3′ end processing of this small class of pre‐mRNAs consists of a conserved stem–loop element that is positioned upstream of the cleavage site and a purine‐rich HDE. The HDE is recognized by base pairing with the 5′ end of the U7 small nuclear RNA, which is incorporated into a U7 snRNP. The stem–loop is bound by the SLBP, which functions in 3′ end processing, translation and in the coupling of message stability to DNA replication and the cell cycle. After recruitment of the 3′ end‐processing apparatus, endonucleolytic cleavage occurs 3′ of a CA dinucleotide 9–12 nucleotides upstream of the intermolecular U7/HDE RNA duplex. The SLBP also serves to establish a pseudo‐circularization together with initiation factor eIF4G. Stimulatory interactions are highlighted (+).
Cis‐acting RNA elements and trans‐acting factors are summarized in Tables I and II. USE=upstream sequence element, AAUAAA=poly(A) signal, DSE=downstream sequence element, HDE=histone downstream element, CstF=cleavage stimulating factor (blue complex), CPSF=cleavage/polyadenylation specificity factor (light blue complex), Pol II=RNA polymerase II (light green complex) with CTD (C‐terminal domain), CF I=cleavage factor I (green complex), CF II=cleavage factor II (grey complex), PAP=poly(A)‐polymerase (yellow complex), PABPN1=nuclear poly(A)‐binding protein, PABPC=cytoplasmic poly(A)‐binding protein (dark blue complex), 4A, 4E, 4G=translation initiation factors (grey complex), SLBP=stem loop binding protein (yellow complex).

After assembly of the basal 3′ end‐processing machinery, the endonucleolytic cleavage reaction is thought to be catalyzed by CPSF 73 (Ryan et al, 2004; Dominski et al, 2005; Mandel et al, 2006). Subsequently, a nuclear poly(A) polymerase (PAP) adds ∼250 A‐nucleotides to the 3′ end in a template‐independent manner. The length of the poly(A) tail is similar in different mRNAs, and is thought to be determined by an interaction between the nuclear poly(A)‐binding protein (PABPN1, PABP2), CPSF and PAP (Kuhn and Wahle, 2004). The binding of PABPN1 to the RNA is unstable and upon nuclear export PABPN1 is replaced by the cytosolic poly(A)‐binding protein (PABPC; for review see Mangus et al, 2003; Kuhn and Wahle, 2004), which interacts with the translation initiation factor eIF4G, stimulating translation and regulating mRNA stability (Sachs et al, 1997; Kahvejian et al, 2001; Kuhn and Wahle, 2004). Furthermore, PABPC can interact with the translation termination factor eRF3 (Cosson et al, 2002), implicating a role of the poly(A) tail in translation termination and possibly ribosome recycling.

In contrast to bulk mRNAs, 3′ processing of the replication‐dependent histone mRNAs requires both conserved sequence and structural elements (Figure 1B; for review see Marzluff and Duronio, 2002; Gilmartin, 2005; Marzluff, 2005). The signal for 3′ end processing of this class of pre‐mRNAs consists of a conserved stem–loop that is positioned upstream of the cleavage site, and a purine‐rich histone downstream element (HDE). The HDE is recognized by base pairing with the 5′ end of the U7 small nuclear RNA, which is incorporated into a ribonucleoprotein (RNP) of the Sm class (U7 snRNP). The stem–loop is bound by stem–loop‐binding protein (SLBP) that functions in 3′ end processing, translation and in coupling of histone mRNA stability to DNA replication and the cell cycle. The site of endonucleolytic cleavage occurs after a CA dinucleotide of the histone pre‐mRNA sequence, and is located 9–12 nucleotides upstream of the intermolecular U7/HDE RNA duplex. Thus, the replication‐dependent histone mRNAs display a far more rigid composition of cis‐acting sequence elements than mRNAs with poly(A) sites. Despite the distinct mechanisms that specify the site of 3′ end processing, it is now becoming increasingly clear that both classes of transcripts share a common catalytic core consisting of the CPSF and CstF subcomplexes (Kolev and Steitz, 2005) likely including CPSF 73 as the endonuclease (Ryan et al, 2004; Dominski et al, 2005; Mandel et al, 2006).

The shared components point to a common evolutionary origin of the different 3′ end formation machineries. However, whereas poly(A) site processing is a two‐step process coupled to other steps of the gene expression pathway (transcription and splicing; Minvielle‐Sebastia and Keller, 1999; Adamson et al, 2005; see next section), the synthesis of mature (inherently intronless) histone mRNAs requires only one RNA‐processing reaction. In addition, histone mRNA 3′ end processing seems to be incompatible with splicing (Pandey et al, 1990) and does not appear to depend on transcription (Adamson and Price, 2003). Thus, by sharing the same catalytic core machinery, but using specific 3′ end‐processing sites, the histone pre‐mRNAs have developed an efficient way to meet their needs of replication‐dependent gene expression.

An integrated network of co‐transcriptional mRNA‐processing events controls gene expression

Producing a mature RNA from a mammalian gene requires a complex and tightly coupled series of molecular mechanisms (Hirose and Manley, 2000; Maniatis and Reed, 2002; Proudfoot et al, 2002; Proudfoot, 2004); as a nascent pre‐mRNA transcript emerges from the elongating RNA polymerase II (Pol II) both extensive constitutive and alternative splicing events occur co‐transcriptionally (Lopez, 1998; Hirose and Manley, 2000; Proudfoot et al, 2002), giving rise to a perplexingly high diversity within the transcriptome (Black, 2000, 2003; Graveley, 2001; Shin and Manley, 2004). Eventually, transcription termination stops the elongating polymerase triggered by recognition of poly(A) signals in the nascent transcript (Proudfoot et al, 2002; Rosonina et al, 2006), and both CPSF and CstF are transferred by the C‐terminal domain (CTD) of Pol II to their specific pre‐mRNA‐binding sites (see above) to produce the mRNA 3′ end. The phosphorylation of serine residues within the CTD critically regulates gene expression (Ahn et al, 2004): it coordinates the recruitment of RNA‐processing factors (including the 5′ capping complex, the spliceosome and 3′ processing machinery), and regulates chromatin organization by histone methylation (Hampsey and Reinberg, 2003).

As proposed for the coupling between transcription and pre‐mRNA 3′ end processing at poly(A) sites, the extensive crosstalk between the different pre‐mRNA‐processing activities (capping, splicing and polyadenylation) has recently emerged to be critical for both gene expression and, interestingly, genome integrity (Figure 2); proteins that bind to the cap structure of pre‐mRNAs interact with splicing factors and promote recognition of the cap‐proximal splice site (Colot et al, 1996; Lewis et al, 1996). Conversely, splicing factors that associate with the 3′ terminal intron also interact with downstream polyadenylation factors to mutually promote both 3′ end cleavage/polyadenylation and terminal intron splicing (Niwa et al, 1990; Wassarman and Steitz, 1993; Lutz et al, 1996; Gunderson et al, 1997; Vagner et al, 2000; Li et al, 2001; McCracken et al, 2002, 2003; Millevoi et al, 2002, 2006; Awasthi and Alwine, 2003; Kyburz et al, 2006; Danckwardt et al, 2007). It should be noted, however, that the effect of these interactions might also be inhibitory when factors are bound at different positions such as within the 3′ terminal exon or within the 3′ UTR. These interactions ensure that the correct splice sites are recognized, and that 3′ end processing is timely, accurate and efficient. Moreover, the extensive integration between the different co‐transcriptional mechanisms is believed to protect chromosomes from potentially deleterious effects that could arise from interaction between the nascent RNA and template DNA during transcription (Li and Manley, 2006; Figure 2B). Finally, functional polyadenylation signals and polyadenylation factors are required for efficient transcription termination (Proudfoot et al, 2002; Buratowski, 2005; Rosonina et al, 2006; Kaneko et al, 2007) and release of the polyadenylated mRNAs for export from the nucleus (Reed and Hurt, 2002). 
Therefore, the efficiency of polyadenylation can have significant quantitative effects on gene expression in general, and defects of mRNA 3′ end formation can profoundly affect cell viability, growth and development.

Figure 2.

Integrated networks of co‐transcriptional mRNA processing to regulate gene expression and to maintain genomic stability. (A) Transcription initiation, elongation and termination (circular arrow) are tightly coupled to mRNA processing steps such as capping, splicing and 3′ end processing (inner circle). Appropriate 3′ end processing is functionally interconnected with transcription and mRNA capping and splicing, and impacts on post‐transcriptional mechanisms (mRNA release, export, abundance and translation). Loss or gain of function of 3′ end processing thus critically interferes with other gene expression steps. (B) Co‐transcriptional mRNA processing is believed to promote packaging of the nascent RNA transcript (formation of an ‘inert’ RNP particle, upper panel) and thus to prevent the accumulation of co‐transcriptional R‐loops (lower panel), which can lead to DNA double strand breaks and chromosomal rearrangements. Disruption of co‐transcriptional RNA processing is therefore thought to result in genomic instability (for review see Li and Manley, 2006).

The medical relevance of errors of 3′ end processing is exemplified by different inherited and acquired human disorders. In the following section, we focus on a group of disorders that highlight the most characteristic features of 3′ end processing and show how alterations of sequence elements and of protein components of the 3′ end processing machinery result in human pathology.

Mutations of sequence elements

Loss of function—no flexibility in cis?

Loss‐of‐function mutations of globin mRNA 3′ end processing are a well‐recognized cause of thalassemias. The thalassemias are a heterogeneous group of very common human genetic disorders that result from defects in hemoglobin production. The globin genes were the first human genes to be cloned (Maniatis et al, 1976), and thus represent the earliest medically important genes that illustrate how different steps of the gene expression pathway can be inactivated by naturally occurring mutations. Thalassemias are characterized by highly variable phenotypes that are determined by the extent of hemolysis and ineffective erythropoiesis, ranging from a complete lack of symptoms to severe, transfusion‐dependent anemia.

This remarkable phenotypic diversity reflects the heterogeneity of mutations of the globin loci and modulation by a variety of modifiers (for review see Weatherall, 2001). In addition, general surveillance mechanisms of gene expression, such as nonsense‐mediated mRNA decay (NMD), represent potent phenotypic modifiers as has been exemplified by the identification of β‐thalassemia alleles with an unusual dominant mode of inheritance (for review see Hentze and Kulozik, 1999; Holbrook et al, 2004; Maquat, 2004; Neu‐Yilik et al, 2004).

The understanding of 3′ end mRNA processing has been markedly advanced by studies of thalassemia patients with different mutations that result in an alteration of the AAUAAA hexanucleotide. Such mutations have been identified in both the α‐globin (Higgs et al, 1983; Harteveld et al, 1994) and β‐globin genes (Orkin et al, 1985; Jankovic et al, 1990; Rund et al, 1992; van Solinge et al, 1996), and invariably inactivate or severely inhibit gene expression. Similar mutations have been observed in, for example, the Foxp3 gene causing IPEX syndrome (Bennett et al, 2001), a rare fatal disorder characterized by immune dysregulation, polyendocrinopathy, enteropathy and X‐linked inheritance.

However, it is important to note that poly(A) signal mutations do not always cause disease. This is exemplified by individuals who carry an AAUAAC to AGUAAC mutation of the arylsulfatase A gene poly(A) signal. Null mutations of this gene cause metachromatic leucodystrophy, a most serious neurodegenerative disorder. The AAUAAC to AGUAAC mutation reduces, but does not completely inactivate, mRNA and enzyme expression. Carriers of this hypomorphic mutation do not develop disease symptoms but a state referred to as ‘pseudodeficiency’, exemplifying that biochemically severe mutations can remain clinically silent (Barth et al, 1993; Harvey et al, 1998).

The complexity of effects of poly(A) site mutations is further reflected by mutations of the lysosomal alpha‐galactosidase A (alpha‐GalA) gene. Human alpha‐GalA is one of the rare mammalian genes that bears its polyadenylation signal within the coding sequence and lacks a 3′ untranslated region (Bishop et al, 1988). An AA dinucleotide deletion within this poly(A) site results in aberrant 3′ end formation, with generation of multiple non‐functional transcripts, and in complete inactivation of the gene (functional null allele). Affected patients develop Fabry disease, a severe X‐linked recessive inborn error of glycosphingolipid catabolism.

These examples underscore the functional importance of the highly conserved poly(A) signal and show that there is little sequence flexibility with regard to the hexanucleotide signal.

Gain of function—too much of a good thing

Clinically relevant gain‐of‐function mutations stimulating 3′ end processing were first identified in the prothrombin (coagulation factor II; F2) gene (Gehring et al, 2001; Danckwardt et al, 2004). Such mutations cause raised prothrombin plasma concentrations, which disturb the finely tuned balance between pro‐ and anticoagulatory activities, and result in an increased risk of developing thrombosis (referred to as thrombophilia). The F2 20210*A mutation affects approximately 1–2% of the general Caucasian population and hence represents a common cause of thrombophilia (Poort et al, 1996). This mutation affects the most 3′ nucleotide of the F2 mRNA, where it is endonucleolytically cleaved and polyadenylated (Figure 3; Gehring et al, 2001). Physiological 3′ cleavage of the F2 mRNA occurs 3′ of a CG dinucleotide, whereas most mRNAs are cleaved 3′ of a CA dinucleotide. The CG dinucleotide has been shown to be less efficient in promoting the cleavage reaction in vitro (Chen et al, 1995), and the CG → CA mutation increases F2 3′ end processing efficiency in cell lines and in transgenic mice (Gehring et al, 2001; Danckwardt et al, 2004; Kuwahara et al, 2004). Hence, the G → A mutation at position 20210 converts the physiologically inefficient F2 cleavage site into the mechanistically most efficient CA dinucleotide, which increases cleavage site recognition and results in an approximately twofold enhancement of prothrombin mRNA and protein expression. Increased mRNA 3′ end formation efficiency thus emerged as a novel molecular principle causing pathological gene expression and explains the role of F2 20210*A in the pathogenesis of thrombophilia. Furthermore, this mutation represents a paradigm of a quantitatively subtle change of gene expression that can have functionally and clinically highly significant consequences.

Figure 3.

The human prothrombin 3′ end‐processing signal shows an unusual architecture of non‐canonical sequence elements, which are susceptible to clinically relevant gain‐of‐function mutations. A sequence comparison of the 3′ end‐processing signals of efficiently processed mRNAs such as SV40 late or β‐globin (HBB) with F2 (lower lane) revealed an inefficient F2 cleavage dinucleotide context and a uridine‐poor DSE. In contrast, the F2 3′ untranslated region contains a uridine‐rich USE that promotes 3′ end processing and hence balances F2 mRNA expression. Sequences encompassing the cleavage site and the 3′‐flanking region represent a vulnerable region for clinically relevant gain‐of‐function mutations.

Subsequently, two further, albeit rare, thrombosis‐related mutations of F2 3′ end processing were identified (Balim et al, 2003; Schrijver et al, 2003; Danckwardt et al, 2004, 2006b; Soo et al, 2005) and shown to increase 3′ end formation efficiency (Figure 3; Danckwardt et al, 2004, 2006b). One of these mutations is a C → T exchange that occurs at the penultimate position 20209 of the F2 3′ UTR (F2 20209*T). The other mutation introduces an additional U‐residue 11 nucleotides 3′ of the cleavage site into the putative CstF‐binding site in the F2 3′‐flanking region (F2 20221*T). The detailed analysis of F2 3′ end formation determinants in patients with thrombophilia has thus identified an unusual architecture of non‐canonical sequence elements, which explains the susceptibility of the F2 3′ end processing region to gain‐of‐function mutations (Danckwardt et al, 2004, 2006a): (1) The F2 3′ end formation signal contains the least efficient dinucleotide at the physiological cleavage site. Mutations such as F2 20210*A and F2 20209*T can convert this physiological inefficiency to pathologically increased efficiency. (2) The F2 putative CstF‐binding site displays an unusually low density of uridine residues when compared with efficiently 3′ end‐processed mRNAs such as β‐globin and SV40. Consequently, the introduction of (an) additional uridine residue(s) into the F2 3′‐flanking sequence by either the natural F2 20221*T mutation or the experimental insertion at adjacent sites enhances 3′ end processing, presumably by facilitating interaction of CstF with the pre‐mRNA (Danckwardt et al, 2004).
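The two vulnerabilities summarized above, the cleavage‐site dinucleotide and the uridine density of the 3′‐flanking sequence, can be illustrated with a short Python sketch. Everything in it is hypothetical: the toy sequence is invented and is not the F2 sequence, the helper names are chosen for this example, and the 3′‐flank variant is modelled as a single C → U substitution at the 11th nucleotide downstream of the cut. The sketch only reports how each variant changes the two features; it makes no quantitative prediction of processing efficiency.

```python
# Invented toy 3' end region (NOT the real F2 sequence). The first 17 nt are
# retained in the mRNA; cleavage is modelled 3' of the final CG.
UPSTREAM = "AAUAAAGGCCAGGCACG"
DOWNSTREAM = "AGCCAGCAAGCGGACCAGCC"  # deliberately contains no U
TOY_SITE = UPSTREAM + DOWNSTREAM
CLEAVAGE_INDEX = len(UPSTREAM)


def apply_substitution(seq, index, ref, alt):
    """Replace ref with alt at a 0-based index, checking the reference base."""
    if seq[index:index + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    return seq[:index] + alt + seq[index + len(ref):]


def describe_site(seq, cleavage_index, window=30):
    """Report the cleavage dinucleotide and the U count in the 3' flank."""
    flank = seq[cleavage_index:cleavage_index + window]
    return {
        "cleavage_dinucleotide": seq[cleavage_index - 2:cleavage_index],
        "downstream_U_count": flank.count("U"),
    }


variants = {
    "wild type": TOY_SITE,
    # Last nucleotide before the cut, G -> A (analogous to F2 20210*A).
    "CG -> CA": apply_substitution(TOY_SITE, CLEAVAGE_INDEX - 1, "G", "A"),
    # Penultimate nucleotide, C -> U (analogous to F2 20209*T).
    "penultimate C -> U": apply_substitution(TOY_SITE, CLEAVAGE_INDEX - 2, "C", "U"),
    # 11th nucleotide 3' of the cut, C -> U (analogous to F2 20221*T).
    "flank C -> U": apply_substitution(TOY_SITE, CLEAVAGE_INDEX + 10, "C", "U"),
}
for name, seq in variants.items():
    print(name, describe_site(seq, CLEAVAGE_INDEX))
```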

Based on these data, the question arose how efficient gene expression in general and 3′ end‐processing efficiency in particular can be maintained for a gene with inefficient canonical 3′ end‐processing elements, but which encodes a highly abundant protein that makes up approximately 5% (≈3 g/l) of the total blood plasma protein content. This question could be resolved by the identification of a stimulatory USE within the F2 3′ UTR, which compensates for the weak functional activities of the cleavage site and the downstream U‐rich element (DSE) in the F2 3′‐flanking sequence (Figure 3). The F2 3′ end‐processing signal thus appears to be balanced by an unusual architecture of weak and strong stimulatory sequence elements (Danckwardt et al, 2004, 2006a).

This architecture in general and the USE in particular have elicited considerable interest, because USEs play key roles in important physiological functions such as blood coagulation (prothrombin; Danckwardt et al, 2004), innate immunity (complement C2; Moreira et al, 1998), inflammation (cyclooxygenase‐2 (Cox‐2); Hall‐Pogar et al, 2005) and in the maintenance of cell structure (lamin B2; Brackenridge and Proudfoot, 2000) and the intercellular matrix (collagen; Natalizio et al, 2002). Biocomputational analyses predict that USE‐dependent 3′ end processing may be common amongst cellular mRNAs (Legendre and Gautheret, 2003; Hu et al, 2005; Danckwardt et al, 2007), and characterize a novel class of transcripts that are regulated by alternative 3′ end processing (Hall‐Pogar et al, 2005). In some RNAs, USEs functionally compensate for the lack of a DSE by recruiting CstF (Moreira et al, 1995, 1998). Furthermore, novel USE‐interacting trans‐acting factors have been identified (hnRNPI/PTB, U2AF35, U2AF65, PSF, U2A1, p54nrb), which are thought to stabilize protein–mRNA interactions with the canonical 3′ end‐processing machinery (CPSF and CstF) and thus to promote 3′ end processing (Moreira et al, 1995, 1998; Danckwardt et al, 2007; Hall‐Pogar et al, 2007). Moreover, the USE‐dependent recruitment of splicing factors involved in 3′ end processing indicates that sequence elements in the 3′ UTR contribute to the integrated network of different mRNA‐processing steps (Hirose and Manley, 2000; Maniatis and Reed, 2002; Proudfoot et al, 2002).

The physiological importance of USE‐dependent 3′ end‐processing signals has not yet been fully elucidated. However, because of different cofactor requirements (Danckwardt et al, 2007; Hall‐Pogar et al, 2007), it is tempting to speculate that 3′ end formation of some of these mRNAs might be differentially regulated. USE‐dependent 3′ end formation may thus play an important role in the regulated physiology of processes such as blood coagulation or inflammatory processes (see below).

Role of trans‐acting factors

Regulated alternative 3′ end processing in immunity and inflammation

Alternative poly(A) site selection has been identified as an important and evolutionarily conserved regulatory mechanism for spatial (tissue specificity, for example, Cox‐2 or calcitonin/calcitonin gene‐related peptide; Feng et al, 1993; Lukiw and Bazan, 1997; Lou and Gagel, 1998; Hall‐Pogar et al, 2005) and temporal control of gene expression (for example, immunoglobulin class switch; Alt et al, 1980; Early et al, 1980; Rogers et al, 1980; Cushley et al, 1982; Takagaki et al, 1996; Takagaki and Manley, 1998; for review see Zhao and Manley, 1996; Edwalds‐Gilbert et al, 1997; Barabino and Keller, 1999; Zhao et al, 1999). This is underlined by the observation that about half of the human mRNAs contain more than one polyadenylation site (Tian et al, 2005; Yan and Marr, 2005). Alternative poly(A) site choices are usually regulated by various cis‐ and trans‐acting determinants, and primarily influenced by (1) the intrinsic strength of sequence elements, (2) the concentration or activity of polyadenylation factors and/or (3) tissue‐ or stage‐specific regulatory factors (Barabino and Keller, 1999). In this respect, the regulation of IgM heavy‐chain expression during B‐cell differentiation represents an intriguing example (Figure 4A; Alt et al, 1980; Early et al, 1980; Rogers et al, 1980): in this mRNA, alternative poly(A) site selection has been proposed to be regulated by the concentration of the CstF 64‐kDa subunit. According to this model, increased CstF‐64 switches the IgM heavy‐chain expression from a membrane‐bound form (μm) to the secreted form (μs) by activation of an alternative upstream μs‐specific poly(A) site in plasma cells (Takagaki et al, 1996). The low‐affinity upstream CstF‐64‐binding site would thus be favored under conditions of high CstF‐64 concentration, whereas the high‐affinity site of the membrane‐bound form (μm) is used when the CstF‐64 concentration is low.
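The logic of this concentration‐dependent model can be made explicit with a toy equilibrium‐binding calculation. The Python sketch below is a deliberately simplified illustration and not a description of the published experiments: concentrations and dissociation constants are in arbitrary units and are invented, binding is treated as simple 1:1 equilibrium, and the decision rule (the upstream μs site is used once its occupancy reaches a threshold, because it is encountered first) is an assumption made for this example.

```python
def occupancy(concentration, kd):
    """Equilibrium fraction of a site bound by a factor (simple 1:1 binding)."""
    return concentration / (concentration + kd)


def favoured_igm_isoform(cstf64, kd_mu_s=10.0, kd_mu_m=1.0, threshold=0.5):
    """Toy decision rule for the CstF-64 concentration model.

    Assumptions: concentrations and Kd values are in arbitrary units and are
    invented for illustration; the upstream (mu_s) site is used whenever its
    occupancy reaches the threshold, otherwise the downstream (mu_m) site wins.
    """
    theta_mu_s = occupancy(cstf64, kd_mu_s)  # low-affinity upstream site
    theta_mu_m = occupancy(cstf64, kd_mu_m)  # high-affinity downstream site
    if theta_mu_s >= threshold:
        isoform = "mu_s (secreted)"
    else:
        isoform = "mu_m (membrane-bound)"
    return isoform, theta_mu_s, theta_mu_m


# Low versus high (hypothetical) CstF-64 levels.
for level in (1.0, 50.0):
    print(level, favoured_igm_isoform(level))
```

With these invented parameters, the low‐affinity upstream site is largely unoccupied at the low CstF‐64 level and largely occupied at the high level, which is the qualitative behavior the model proposes.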

Figure 4.

Regulated and alternative 3′ end processing in development and disease. (A) During B‐cell differentiation, alternative poly(A) site selection effects a switch of the IgM heavy‐chain expression from a membrane‐bound form (μm) to the secreted form (μs). In this example, CstF‐64 binding to the site of the RNA giving rise to secreted IgM (μs) is favored either by high CstF‐64 concentrations (Takagaki et al, 1996) or under conditions of low hnRNP F and/or low U1A concentrations (Veraldi et al, 2001; Phillips et al, 2004) in plasma cells (lower lane). In contrast, the high‐affinity site of the membrane bound form (μm) is used in B cells (upper lane), where the CstF‐64 concentration is low or when high concentrations of U1A and/or hnRNP F inhibit CstF‐64 binding to the secretory μs‐specific poly(A) site (modified after Barabino and Keller, 1999; boxes indicate exons). (B) The BRCA1‐associated protein BARD1 physically interacts with CstF‐50, thereby repressing the polyadenylation machinery (Kleiman and Manley, 1999). Both BARD1 and CstF‐50 also interact with Pol II (not shown), and BARD1 has been proposed to sense sites of DNA damage and repair. The BARD1‐mediated inhibition of polyadenylation thus prevents inappropriate RNA processing during transcription at such compromised sites. Consequently, challenging cells with DNA‐damaging agents results in a transient inhibition of 3′ end formation by enhanced formation of a CstF/BARD1/BRCA1 complex. Furthermore, a tumor‐associated germline mutation in BARD1 (Gln564His) decreases its affinity to CstF‐50 and renders the protein inactive in polyadenylation inhibition. These findings link 3′ end RNA processing with DNA repair, and loss of wild‐type BARD1 could therefore lead to defective control of gene expression as a result of inappropriate polyadenylation (Kleiman and Manley, 2001). 
(C) In influenza A virus‐infected cells, the highly abundant NS1 protein interacts with the cellular 30‐kDa subunit of CPSF (Nemeroff et al, 1998) and PABPN1 (not shown) (Chen et al, 1999). This prevents binding of the CPSF complex to its RNA substrates and selectively inhibits 3′ end processing and nuclear export of host pre‐mRNAs (adapted from Nemeroff et al, 1998). In contrast, the 3′ terminal poly(A) sequence on viral mRNAs is produced by the viral transcriptase, which reiteratively copies a stretch of 4–7 uridines in the virion RNA templates. In addition, an endonuclease intrinsic to the viral polymerase cleaves cellular capped RNAs to generate capped fragments that serve as primers for the viral mRNA synthesis (so‐called ‘cap‐snatching mechanism’; Rao et al, 2003). Thus, by interfering with the activity of an essential 3′ end‐processing factor, influenza has devised an efficient way to specifically shut off cellular gene expression and to facilitate viral gene expression (for further detail see Nemeroff et al, 1998; Chen et al, 1999; Rao et al, 2003).

Subsequent publications, however, showed that CstF‐64 levels do not change in primary human splenic B cells during differentiation (Martincic et al, 1998), and additional factors have been suggested to play an essential role in activating the secretory μs‐specific poly(A) site (Figure 4A; Edmonds, 2002); in one case, hnRNP F has been proposed to compete with CstF‐64 for binding sequences downstream of the μs‐specific poly(A) site (Veraldi et al, 2001). Furthermore, the U1A protein was reported to inhibit the secretory poly(A) site by a dual mechanism: U1A binding to cognate binding sites upstream of the 3′ end processing signal was reported to inhibit the polyadenylation step at the μs‐specific poly(A) site (Phillips et al, 2001), whereas binding downstream inhibited the cleavage reaction by sterically blocking CstF‐64 RNA binding at the μs‐specific poly(A) site (Phillips et al, 2004). In any case, these findings illustrate that alternative 3′ end processing may be regulated by a number of different mechanisms to tightly control appropriate poly(A) site selection.

A similar mechanism seems to underlie the regulated expression of the transcription factor NF‐ATc during T‐cell differentiation (Chuvpilo et al, 1999). Two longer isoforms of NF‐ATc mRNA are synthesized in naïve T cells, whereas a shorter isoform is found in effector cells. This switch is mediated by activation of a proximal poly(A) site by upregulation of CstF‐64, which occurs upon T‐cell stimulation. Interestingly, LPS stimulation has recently been shown to increase CstF‐64 expression in macrophages, which promotes alternative polyadenylation of several mRNAs (Shell et al, 2005), establishing a possible link between the induction of an acute‐phase response and alternative polyadenylation. Such a link is also suggested by differences in poly(A) site use in the pancreatitis‐associated protein II/regenerating gene III (PAP II/Reg III), which is expressed during the acute‐phase response following acute pancreatitis. An elongated isoform of the PAP II/Reg III mRNA, which is processed at a downstream poly(A) site, appears to be specifically expressed during the early hours of regeneration, whereas the shorter isoform is expressed for a much longer period (Honda et al, 2002).

Further links between acute‐phase reactions and alternative polyadenylation have since been demonstrated for the inducible and tissue‐specific expression of Cox‐2 mRNA in liver and colon cells (Ristimaki et al, 1996; Hall‐Pogar et al, 2005) and in human neocortex in Alzheimer disease (Lukiw and Bazan, 1997), and for tropoelastin expression in sun‐damaged skin (Schwartz et al, 1998).

Cancer and 3′ end formation

Mutations that drive uncontrolled cell‐cycle progression are critical events in tumorigenesis (Kastan and Bartek, 2004). Interestingly, PAP activity is regulated in a cyclin‐dependent manner (Colgan et al, 1996, 1998) and is critical for early development (Ballantyne et al, 1995). Furthermore, depletion of CstF‐64 can lead to cell‐cycle arrest and result in apoptosis (Takagaki and Manley, 1998). Similarly, defective PAP causes cell‐cycle arrest in the G0–G1 phases (Zhao and Manley, 1996). PAP mRNA is overexpressed in human carcinomas of the breast, colon, ovary and pancreas (Pendurthi et al, 1997), and polyadenylation activity is significantly enhanced in aggressive acute leukemias and Burkitt lymphoma compared with less aggressive chronic leukemias and normal lymphocytes (Trangas et al, 1984); a similar situation applies to aggressive forms compared to less progressive forms of breast cancer (Scorilas et al, 2000). The activity of PAP thus likely reflects the proliferative activity of cells and serves as an indicator for dedifferentiation. Therefore, it may serve as a prognostic biomarker in different forms of malignancy (Pangalis et al, 1985; Sasaki et al, 1990; for review see Jacob et al, 1989; Scorilas, 2002).

It is not known whether enhanced PAP activity per se contributes to tumorigenesis or whether it represents a coincidental (epi‐)phenomenon to meet the requirements of rapidly proliferating cells. The latter mechanism has been suggested to account for the overexpression of a neoplasm‐associated ‘Neo’‐PAP (isotype), which is thought to be regulated by a mechanism distinct from that utilized by PAP in normal cells (Topalian et al, 2001). Notably, PAP isotype antibodies in the serum of cancer patients and rat models seem to have a predictive potential for cancer progression and outcome (Stetler et al, 1981). However, iso‐PAP antibodies are not cancer‐specific and have also been observed in patients with rheumatic disorders (Stetler et al, 1987).

A more specific causal link between mRNA 3′ end processing and DNA repair and tumor suppression is suggested by the observation that the BRCA1‐associated protein BARD1 physically interacts with CstF‐50 to inhibit the polyadenylation machinery (Kleiman and Manley, 1999; Figure 4B). Both BARD1 and CstF‐50 also interact with Pol II, and BARD1 has been proposed to sense sites of DNA damage and repair. The BARD1‐mediated inhibition of polyadenylation may thus prevent inappropriate RNA processing during transcription of damaged DNA loci. Challenging cells with DNA‐damaging agents has been shown to transiently inhibit 3′ end formation by enhanced formation of CstF/BARD1/BRCA1 complexes. Furthermore, a tumor‐associated germline mutation in BARD1 (Gln564His) decreases its affinity for CstF‐50 and renders the protein inactive in polyadenylation inhibition. These findings highlight an intriguing link between 3′ end RNA processing and DNA repair (Kleiman and Manley, 2001).

Notably, mutations of cis‐acting determinants of 3′ end processing can also elicit unregulated cell‐cycle progression and can be observed during carcinogenesis; polymorphisms in the N‐acetyltransferase 1 gene (NAT1) that affect the poly(A) signal (or are located in its immediate vicinity) are believed to modulate N‐ or O‐acetylation of carcinogens. These polymorphisms have thus been implicated as an important genetic determinant of colorectal cancer risk (Bell et al, 1995).

Idiopathic hypereosinophilic syndrome and hFip1

The hypereosinophilic syndrome represents a severe hematologic disorder, with sustained overproduction of eosinophils in the bone marrow, eosinophilia, tissue infiltration, and organ damage. A DNA rearrangement involving a chromosomal deletion of 800 kb and fusion of the hFip1 (FIP1L1) and PDGFRα genes might be the cause of this syndrome (Cools et al, 2003). The corresponding chimeric protein, hFip1‐PDGFRα, contains the N‐terminus of hFip1, an integral subunit of the CPSF complex, and the C‐terminal kinase domain of PDGFRα. The expression of the hFip1‐PDGFRα fusion protein in hematopoietic cells constitutively activates the PDGFRα kinase and transforms cells. As constitutive activation of tyrosine kinases is a key feature of the pathogenesis of myeloproliferative disorders, it is tempting to speculate that the N‐terminal domain of hFip1 could mediate growth factor‐independent dimerization and thus result in the activation of the fusion protein. Alternatively, the N‐terminal domain of hFip1 could directly stimulate the kinase domain that is the therapeutic target for the highly specific kinase inhibitor imatinib.

Furthermore, the deletion that creates the hFip1‐PDGFRα fusion gene also causes a functional depletion of hFip1, which, under normal conditions, stimulates 3′ end processing (Kaufmann et al, 2004). Therefore, loss of hFip1 activity and interference with 3′ end processing may contribute to or even alternatively explain the observed phenotype.

Oculopharyngeal muscular dystrophy and the nuclear poly(A)‐binding protein

The medical relevance of mutations affecting 3′ end processing factors is further highlighted by oculopharyngeal muscular dystrophy (OPMD). This disorder is an adult‐onset disease with slowly progressive muscle weakness chiefly affecting the eyelids, resulting in ptosis, and the pharyngeal muscles, resulting in dysphagia. OPMD is usually inherited as an autosomal dominant trait, but a rarer allelic autosomal recessive form also exists.

Both forms are typically caused by short trinucleotide repeat [(GCG)8–13] expansions in the coding region of the nuclear poly(A)‐binding protein 1 (PABPN1, PABP2; Brais et al, 1998). Normally, the polyalanine stretch encoded by this trinucleotide comprises 10 alanines, which is expanded to 12–17 alanines in autosomal‐dominant OPMD. This expansion results in increased self‐association, misfolding and filamentous nuclear aggregation of the PABPN1 protein in skeletal muscle. It is unknown why only muscle is affected phenotypically despite the ubiquitous expression of PABPN1. In vitro, the mutant protein is fully active and OPMD cells do not display a severe polyadenylation defect (Calado et al, 2000; Kuhn and Wahle, 2004). It is likely therefore that the subtle clinical phenotype is caused by a quantitatively minor disturbance of the protein's function in polyadenylation, which may be difficult to detect in vitro or in transfected cells. Alternatively, the polyalanine stretch may increase the binding affinity to other proteins, which could consequently be inactivated by co‐sequestration. Finally, PABPN1 has also been suggested to play a role in the transcription of muscle‐specific genes, which could help to explain why other tissues are unaffected (Kim et al, 2001).
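To make the repeat arithmetic above concrete, the following minimal Python sketch counts the longest in‐frame run of alanine codons (GCN) in a coding sequence and compares it with the tract lengths quoted in the text (10 alanines normally, 12–17 in autosomal‐dominant OPMD). The function names and the toy sequences are hypothetical illustrations, not the real PABPN1 sequence, and the classification is only a restatement of the numbers given above, not a diagnostic rule.

```python
def longest_alanine_run(cds: str) -> int:
    """Return the longest in-frame run of alanine codons (GCA/GCC/GCG/GCT).

    Assumes `cds` starts in frame (e.g. at the ATG); RNA input (U) is accepted.
    """
    cds = cds.upper().replace("U", "T")
    best = run = 0
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] == "GC" and codon[2] in "ACGT":
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def classify_tract(n_ala: int) -> str:
    """Label a polyalanine length using only the figures quoted in the text."""
    if n_ala == 10:
        return "normal length (10 alanines)"
    if 12 <= n_ala <= 17:
        return "range reported for autosomal-dominant OPMD (12-17 alanines)"
    return "outside the lengths quoted in the text"


# Hypothetical toy sequences: a start codon, an alanine tract, one glycine codon.
toy_normal = "ATG" + "GCG" * 6 + "GCAGCAGCAGCG" + "GGG"    # 6 + 4 alanine codons
toy_expanded = "ATG" + "GCG" * 9 + "GCAGCAGCAGCG" + "GGG"  # 9 + 4 alanine codons

if __name__ == "__main__":
    for name, seq in [("toy_normal", toy_normal), ("toy_expanded", toy_expanded)]:
        n = longest_alanine_run(seq)
        print(name, n, classify_tract(n))
```

Counting codons rather than matching the literal string GCG reflects the fact that the polyalanine stretch is defined at the protein level, so synonymous alanine codons flanking the GCG repeat contribute to its length.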

First come, first served: influenza A and its connection to the cellular 3′ end apparatus

Influenza A virus infection provides another interesting link between the disturbance of protein interactions within the polyadenylation machinery and human disease. The influenza A NS1 protein interacts with the cellular 30‐kDa subunit of CPSF, which prevents binding of the entire CPSF complex to the RNA substrate (Nemeroff et al, 1998; Figure 4C). Efficient polyadenylation depends on the interaction of CPSF 30 with PABPN1, which together stimulate PAP from a distributive to a processive mode of action. Importantly, the influenza A virus protein NS1 is one of the most abundant proteins synthesized in infected cells (Lazarowitz et al, 1971) and regulates several post‐transcriptional processing steps (Fortes et al, 1994; Lu et al, 1994). In influenza virus‐infected cells, the highly abundant NS1 protein sequesters CPSF 30 and inhibits 3′ end cleavage and polyadenylation of the host pre‐mRNAs. Interestingly, the NS1 protein also targets PABPN1, which inhibits the processive synthesis of long poly(A) tails catalyzed by PAP (Chen et al, 1999). As mRNA processing represents a prerequisite for cytoplasmic export (Stutz and Rosbash, 1998), the uncleaved host pre‐mRNAs are retained in the nucleus. In contrast, viral RNAs are still exported. Thus, by interfering with the activity of an essential 3′ end processing factor, influenza has devised an efficient way to specifically shut off cellular gene expression and to facilitate viral gene expression that does not depend on the cellular 3′ end processing apparatus (Nemeroff et al, 1998; Chen et al, 1999).

Similar mechanisms have been identified to contribute to a preferential viral gene expression in HSV‐1‐infected cells through ICP27 (a multifunctional regulator of HSV gene expression, also known as IE63) that functions post‐transcriptionally to activate weaker viral poly(A) sites and to inhibit splicing of host cell mRNAs (McLauchlan et al, 1989, 1992; Sandri‐Goldin and Mendoza, 1992).


The 3′ UTR has clearly emerged as a hot spot for post‐transcriptional gene regulation, controlling important cellular functions such as morphogenesis, cell differentiation, metabolism, cell proliferation and apoptosis, by controlling mRNA translation, stability, localization as well as 3′ end processing (Ambros, 2004; Calin et al, 2005; Croce and Calin, 2005; He et al, 2005; Lu et al, 2005; O'Donnell et al, 2005). More recently, sequence determinants and 3′ end‐processing factors have emerged as common targets of mutations resulting in human pathology (see Tables 1 and 2) (Chen et al, 2006a, 2006b).

Table 1. Cis‐acting RNA elements involved in 3′ end processing and their medical relevance
Table 2. Essential and auxiliary trans‐acting factors involved in poly(A) mRNA 3′ end processing, and their impact on health and disease

With the ongoing identification of novel sequence elements that impact on 3′ end processing (Moreira et al, 1995, 1998; Brackenridge and Proudfoot, 2000; Natalizio et al, 2002; Danckwardt et al, 2004, 2007; Hall‐Pogar et al, 2005, 2007), additional mutations that alter 3′ end processing and are located in these regions or in close proximity to the poly(A) signal are likely to be found. Possibly, some of these may account for a number of polymorphisms linked to disease, which have not yet been functionally characterized. Such polymorphisms represent risk factors for cancer (Zheng et al, 2004; Zhang et al, 2004a, 2004b) and can strikingly influence treatment modalities (Cox et al, 2004; Stoehlmacher et al, 2004). Other 3′ UTR polymorphisms have been associated with the risk of developing type II diabetes (Shin et al, 2003), insulin resistance (Pizzuti et al, 2002), coronary artery disease (Chen et al, 2003), Alzheimer disease (Lambert et al, 2003), end‐stage renal disease (Bensen et al, 2003) and Fukuyama‐type congenital muscular dystrophy (Kobayashi et al, 1998); in addition, a variety of clinically relevant polymorphisms are known to affect immunoregulation (Hennig et al, 2002; Kuroki et al, 2002; Seegers et al, 2002), although their molecular mode of action is still unknown.

While some of the established mutations affect the efficiency of constitutive 3′ end mRNA processing, others could conceivably affect the regulation of this critical step. Regulated 3′ end mRNA processing efficiency and regulated alternative 3′ end mRNA processing are becoming increasingly topical both in biology and in medicine, because both aspects of regulation add to the functional complexity of the transcriptome and are already known to be altered in human disease.

Such mechanisms of regulated 3′ end mRNA processing may include the recruitment of critical proteins to the 3′ end‐processing machinery in response to specific stimuli, as has previously been proposed for the control of splicing (Shin and Manley, 2004). Furthermore, specific sequence determinants of 3′ end processing may serve as target sites for competitive binding of mutually exclusive protein complexes or miRNAs, thus enabling a complex layer of post‐transcriptional regulation (Figure 5). Similar mechanisms have recently been shown to account for SXL‐dependent poly(A) site choice in the Drosophila female germline (Gawande et al, 2006), or to regulate 3′ end‐processing efficiencies in different tissues (Zhu et al, 2006, 2007). Finally, the activation of alternative, distal polyadenylation sites may result in the inclusion of additional mRNA sequence motifs in a transcript. Such sequence motifs may be recognized by proteins that regulate the stability of the mRNA (Wilson and Treisman, 1988; Shyu and Wilkinson, 2000; Barreau et al, 2005; Gilat and Shweiki, 2007). Alternatively, downstream activation of alternative polyadenylation sites may modulate the accessibility for miRNAs (Stark et al, 2005), thus resulting in an additional layer of regulation of mRNA degradation and translation (Cohen and Brennecke, 2006; Giraldez et al, 2006; Legendre et al, 2006).

Figure 5.

Regulated and alternative 3′ end processing modulates the temporal and spatial diversity of gene expression. About half of the human pre‐mRNAs contain (multiple) alternative poly(A) signals. A large number of these pre‐mRNAs have alternative, mostly tandem, arrays of poly(A) sites within the 3′ UTR. A smaller set of pre‐mRNAs bears alternative poly(A) signals within intronic or exonic regions. In both cases, endogenous and exogenous factors can modulate pre‐mRNA poly(A) site selection by interfering with constitutive and/or auxiliary 3′ end‐processing factors/subunits (upper panel; depicted are various scenarios on a single pre‐mRNA). This results in various polyadenylated mRNAs that code either for identical (tandem terminal poly(A) sites) or for C‐terminally modified (internal poly(A) sites) proteins (see also Figure 4A). Furthermore, alternatively 3′ end‐processed mRNAs can display different 3′ UTR properties (middle and lower panel; mRNA variants 1 and 2, respectively). This diversity can affect mRNA abundance, mRNA localization, mRNA transport and translation of the respective mRNA variant (lower lane). Importantly, interaction of AU‐rich elements (AREs) with the respective binding proteins (ABP) has a dual function and can result in both mRNA stabilization and destabilization. Mutually exclusive binding of different trans‐acting factors allows a complex modulation of different processing activities. Auxiliary 3′ end‐processing sequences (USE) may represent target sites for trans‐acting factors that modulate 3′ end formation efficiencies.

From a medical perspective, it will be interesting to explore polymorphisms of components of the 3′ end‐processing machinery as biomarkers that influence therapeutic decisions. This is already exemplified by the introduction of the common prothrombin 20210 mutation into clinical algorithms for secondary thrombosis prophylaxis (Hirsh and Lee, 2002). Even more interestingly, a better understanding of the molecular pathology of specific disorders may lead to approaches to correct deregulated RNA processing events. This is exemplified in OPMD (see above). In animal models of this form of muscular dystrophy, the detrimental impact of aggregates of the nuclear poly(A)‐binding protein in skeletal muscle could be reduced by chaperone‐guided approaches or by doxycycline treatment (Davies et al, 2006). Finally, the emerging knowledge about USEs stimulating mRNA 3′ end formation (Moreira et al, 1995, 1998; Brackenridge and Proudfoot, 2000; Natalizio et al, 2002; Danckwardt et al, 2004, 2007; Hall‐Pogar et al, 2005, 2007) may have useful implications for the development of safe retroviral gene therapy strategies (Valsamakis et al, 1991; Zaiss et al, 2002). USEs improve the efficiency of the poly(A) signal (Gilmartin et al, 1995), increase transgene expression efficiency and decrease the risk of downstream activation of potentially harmful cellular genes by reducing readthrough transcription (Schambach et al, 2007).

In sum, the mechanisms of 3′ end mRNA processing and of its regulation are highly relevant both in biology and in medicine, and deserve to be met with an increasing level of awareness.


We thank the members of the Molecular Medicine Partnership Unit for discussions. We apologize for the limited citation of the primary literature outside of the main focus of this review. This work was supported by the Fritz Thyssen Stiftung, the Deutsche Forschungsgemeinschaft and by the ‘Young Investigator Award’ fellowship from the University of Heidelberg (to SD).