Structure of tandem RNA recognition motifs from polypyrimidine tract binding protein reveals novel features of the RRM fold

Maria R. Conte, Tim Grüne, Jamie Ghuman, Geoff Kelly, Anastasia Ladas, Stephen Matthews, Stephen Curry

Author Affiliations

  1. Maria R. Conte1,2,
  2. Tim Grüne3,4,
  3. Jamie Ghuman1,3,
  4. Geoff Kelly1,
  5. Anastasia Ladas3,
  6. Stephen Matthews*,1 and
  7. Stephen Curry*,3
  1. 1 Department of Biochemistry, Imperial College of Science, Technology and Medicine, Exhibition Road, London, SW7 2AY, UK
  2. 2 Present address: Biophysics Laboratories, School of Biological Sciences, St Michael's Building, University of Portsmouth, White Swan Road, Portsmouth, PO1 2DT, UK
  3. 3 Biophysics Section, Blackett Laboratory, Imperial College of Science, Technology and Medicine, Prince Consort Road, London, SW7 2BW, UK
  4. 4 Present address: EMBL Grenoble Outstation, 6 rue Jules Horowitz, BP156, F‐38042, Grenoble, Cedex 9, France
  * E‐mail: s.j.matthews{at}
  * E‐mail: s.curry{at}


Polypyrimidine tract binding protein (PTB), an RNA binding protein containing four RNA recognition motifs (RRMs), is involved in both pre‐mRNA splicing and translation initiation directed by picornaviral internal ribosome entry sites. Sequence comparisons previously indicated that PTB is a non‐canonical RRM protein. The solution structure of a PTB fragment containing RRMs 3 and 4 shows that the protein consists of two domains connected by a long, flexible linker. The two domains tumble independently in solution, having no fixed relative orientation. In addition to the βαββαβ topology, which is characteristic of RRM domains, the C‐terminal extension of PTB RRM‐3 incorporates an unanticipated fifth β‐strand, which extends the RNA binding surface. The long, disordered polypeptide connecting β4 and β5 in RRM‐3 is poised above the RNA binding surface and is likely to contribute to RNA recognition. Mutational analyses show that both RRM‐3 and RRM‐4 contribute to RNA binding specificity and that, despite its unusual sequence, PTB binds RNA in a manner akin to that of other RRM proteins.


The polypyrimidine tract binding protein (PTB), also known as hnRNP‐I, was originally identified as a nuclear‐localized splicing factor that binds specifically to the polypyrimidine tract within pre‐mRNA introns (Gil et al., 1991; Patton et al., 1991; Ghetti et al., 1992; for review, see Valcárcel and Gebauer, 1997). PTB is a homodimer (Pérez et al., 1997a) and appears to function mainly as a negative regulator of pre‐mRNA splicing by binding to high‐affinity sequence motifs in the introns upstream and downstream of regulated exons and repressing their incorporation into mature RNA (reviewed in Valcárcel and Gebauer, 1997). PTB has been shown to be involved in the regulation of splicing in a number of different genes, including α‐tropomyosin (Pérez et al., 1997a; Gooding et al., 1998), α‐actinin (Southby et al., 1999), c‐src (Chan and Black, 1997) and the γ2 subunit of the GABAA receptor (Ashiya and Grabowski, 1997). Recent studies have shown that PTB may have additional roles in the regulation of RNA polyadenylation (Moreira et al., 1998; Lou et al., 1999) and RNA localization (Cote et al., 1999). The specific intron RNA sequence motifs that bind PTB are only partially characterized; in general, short RNA stretches such as UCUUC or UUCU/C within a longer pyrimidine segment represent the consensus binding site for the protein (Ashiya and Grabowski, 1997; Pérez et al., 1997b).

Intriguingly, the protein also interacts with picornaviral internal ribosomal entry site (IRES) elements. These are large RNA segments (∼450 bases) with extensive secondary structure found within the lengthy 5′‐untranslated region of the picornavirus positive‐sense single‐stranded RNA genome (reviewed in Jackson, 1996). Picornaviral RNA lacks a 5′‐cap structure, which is normally used by eukaryotic messages to recruit the initiation factors that enable the small ribosomal subunit to locate the initiation codon by scanning. Instead, the IRES structure permits the productive association of the viral message with the small ribosomal subunit by a cap‐independent mechanism that largely avoids scanning (Jackson, 1996). IRES‐dependent translation initiation requires a subset of canonical eukaryotic translation initiation factors (eIFs), but additional proteins are also needed (Belsham and Sonenberg, 1996). PTB was identified as one of these additional factors since it can stimulate the IRES‐dependent translation initiation of most picornaviruses (Jang and Wimmer, 1990; Kaminski et al., 1995; Kaminski and Jackson, 1998; Hunt and Jackson, 1999), and multiple PTB binding sites have now been mapped within the IRES elements of a number of different picornaviruses (Jang and Wimmer, 1990; Luz and Beck, 1991; Hellen et al., 1994; Kolupaeva et al., 1996).

The mechanisms of action for PTB in both splicing regulation and stimulation of translation initiation remain obscure, although both functions clearly require binding of PTB to RNA. PTB belongs to a family of RNA binding proteins characterized by possession of at least one ribonucleoprotein consensus sequence (RNP‐CS) RNA binding domain. This RNA recognition motif (RRM) is ∼90 amino acids long and contains conserved six‐ and eight‐residue motifs called RNP‐2 and RNP‐1, respectively. The RRM domain folds into a βαββαβ structure that presents a four‐stranded β‐sheet for RNA binding (Nagai et al., 1990). Each monomer of PTB is predicted to have four RRM domains but these are invariably non‐canonical and contain unusual features in the sequences of the ‘conserved’ motifs (Kenan et al., 1991). In particular, there is a general absence of aromatic residues at key positions in RNP‐1 and RNP‐2, which, in other RRM domains, have been shown to be responsible for non‐specific contacts with the RNA (Oubridge et al., 1994; Allain et al., 1997; Price et al., 1998; Deo et al., 1999; Handa et al., 1999) (Figure 1). Moreover, at position 2 in RNP‐1 a conserved glycine residue is substituted by much larger side chains in all four RRM domains of PTB (Figure 1).

Figure 1.

Sequence comparison of RNP‐2 and RNP‐1 motifs for PTB with the RRM family consensus (Kenan et al., 1991). The PTB motifs contain quite unusual amino acids relative to the consensus (indicated by shading). Conserved hydrophobic positions are indicated by an asterisk.

The strongest determinants for RNA binding have been identified by deletion mutagenesis to reside in the two C‐terminal RRM domains of PTB (Kaminski et al., 1995; Pérez et al., 1997a; Oh et al., 1998). To gain a clearer understanding of the mode of action of the protein in splicing and translation initiation and of the specificity determinants of its unusual RRM domains, we have undertaken NMR‐based structural studies of a C‐terminal fragment that contains these two domains, designated PTB‐34. The three‐dimensional solution structure of PTB‐34 reveals an unanticipated C‐terminal extension of the β‐sheet RNA binding surface of RRM‐3 by an additional strand and shows that the two linked RRM‐3 and RRM‐4 domains tumble independently in solution, having no fixed relative orientation. In addition, site‐directed mutagenesis and RNA binding experiments have been used to map the interaction of PTB with IRES RNA.

Results and discussion

Structure determination

The expression and sequence‐specific 1HN, 15N, 13Cα, 13Cβ and 13C′ assignments of PTB‐34 (residues 335‐531) were performed as described previously (Conte et al., 1999). The side chain resonance assignments were mainly achieved using HCCH‐TOCSY experiments (Bax et al., 1990). The distance restraints used in the structure calculation were obtained from 3D 1H/15N and 1H/13C‐edited NOESY‐HSQC experiments (Fesik and Zuiderweg, 1988; Norwood et al., 1990), as well as from 15N/13C and 13C/13C‐edited HMQC‐NOESY‐HSQC experiments (Vuister et al., 1993). The dihedral angle restraints were determined using coupling constants and the program TALOS (Cornilescu et al., 1999). The structures were calculated with the program X‐PLOR using a dynamic simulated annealing protocol (Nilges et al., 1988; Brünger, 1993) on the basis of 702 total nuclear Overhauser effect (NOE) conformationally sensitive distance restraints, 120 H‐bond distance restraints and 223 dihedral angle restraints. The relatively low number of long‐range NOEs is caused by severe spectral overlap from the unstructured and extensive linker region. Despite this, the backbone atoms of the secondary structure elements are well defined.

Figure 2 shows a final family of 20 structures for RRM‐3 and RRM‐4. The two domains were superimposed separately, since in solution they tumble independently with respect to each other (see below). The overall root mean square deviation (r.m.s.d.) between the family and the mean co‐ordinate position is 0.62 and 0.60 Å for the backbone atoms in regions of secondary structure for RRM‐3 and RRM‐4, respectively. None of the 20 structures has any distance violation >0.2 Å or dihedral angle violation >4.0°. The structural calculation statistics are shown in Table I.
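As an illustration of how an r.m.s.d. between a family of structures and the mean co‐ordinate position can be computed, the following Python sketch averages a set of already superimposed models and reports the deviation from that mean. The function names and the toy co‐ordinates are hypothetical, and the exact convention applied by X‐PLOR may differ from this simple definition.

```python
import math

def mean_structure(family):
    """Average co-ordinates over all models; family[m][a] is an (x, y, z) tuple."""
    n_models = len(family)
    n_atoms = len(family[0])
    return [
        tuple(sum(model[a][k] for model in family) / n_models for k in range(3))
        for a in range(n_atoms)
    ]

def rmsd_to_mean(family):
    """R.m.s. deviation of all models from the mean co-ordinate position.

    Assumes the models have already been superimposed on a common frame.
    """
    mean = mean_structure(family)
    total = 0.0
    count = 0
    for model in family:
        for atom, ref in zip(model, mean):
            total += sum((atom[k] - ref[k]) ** 2 for k in range(3))
            count += 1
    return math.sqrt(total / count)

# Toy family (hypothetical co-ordinates in Å): two models of two atoms each.
toy_family = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
toy_rmsd = rmsd_to_mean(toy_family)
```

In practice the calculation would be restricted to the backbone atoms in regions of secondary structure, and carried out separately for RRM‐3 and RRM‐4 because the two domains are superimposed independently.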

Figure 2.

Superposition of 20 refined structures for PTB RRM‐3 and PTB RRM‐4 domains. The backbone traces for residues 334‐433 in RRM‐3 and 449‐531 in RRM‐4 are shown. The N‐ and C‐termini are indicated for each domain and β‐strands are numbered.

Table 1. Summary of structural statistics for PTB‐34

Tertiary structure of PTB‐34

PTB‐34 has a tandem domain structure in which the RRMs are tethered by a linker peptide of 25 residues in length. While the conformations of the secondary structural elements within RRMs 3 and 4 are well defined, the inter‐domain linker region of PTB‐34 was mostly unassigned (Conte et al., 1999). Attenuation of resonances for most of the 25 residues forming the linker region could be caused by internal mobility on a slow or intermediate timescale. In addition, we found no evidence of contact between the two RRM domains, which appear to tumble independently in solution.

Although RRM‐4 contains the βαββαβ topology expected for an RRM domain, the C‐terminal extension of RRM‐3 incorporates an additional fifth β‐strand (Figures 2, 3, 4), a feature that is unprecedented for this family of proteins. The inter‐residue NOE network reveals that this strand—designated β5 (residues 426‐430)—lies in an antiparallel fashion to β2 on one side of the β‐sheet and is connected to β4 on the opposite side of the domain by an extended loop (Figure 3). This loop (residues 409‐425) appears to be very mobile and is unstructured apart from a single helical turn close to the start of β5. In our construct the linker makes no observable contact with the rest of the protein; however, it is positioned above the RNA binding surface and may play a role in RNA recognition (see below).

Figure 3.

Ribbon diagrams for PTB‐34 and Sex‐lethal. (A) Comparison of tandem domain structures of PTB and Sex‐lethal. The relative orientation of the two domains shown for PTB is arbitrary, as is the structure of the inter‐domain linker. The structure of Sex‐lethal was solved crystallographically in the presence of bound RNA (Handa et al., 1999), which has been omitted from the figure. (B) Comparison of RRM‐3 and RRM‐4 domains from PTB with RRM‐1 of Sex‐lethal. PTB RRM‐3 contains an additional strand (β5) on one side of the RNA binding surface. Note that the conformation shown for the β4‐β5 loop is only one of many conformations that are consistent with the data (see Figure 2).

Figure 4.

Structure‐based sequence alignment of RRM‐3 and RRM‐4 from PTB. Secondary structure elements are depicted above the sequences. Identical residues are shaded and the RNP‐1 and RNP‐2 motifs are boxed.

Helices α1 and α2 of RRM‐3 are very well defined and pack against the β‐sheet on the opposite side to the RNA binding surface, creating an extensive hydrophobic core (Figures 2 and 3). Several loop residues participate in this hydrophobic interaction, in particular loops α1‐β2 and β3‐α2, which appear to be very well defined. To characterize the dynamic aspects of PTB‐34, the {1H}‐15N NOE values were determined (data not shown). The observation of significantly smaller {1H}‐15N NOE values (0.1‐0.6) in the extended β4‐β5 linker indicates that this region undergoes fast internal motion in the pico‐ to nanosecond timescale, so the poor convergence of the β4‐β5 loop and the lack of observable NOEs result from intrinsic flexibility. Smaller than average NOE values are also observed for some residues in loops β1‐α1 and α2‐β4, which are less well defined than the core of RRM‐3. The β2‐β3 loop also appears to be disordered in solution. Several of the resonances in this loop were very weak (L370, N372, K373) or unassigned (F371), preventing measurement of their {1H}‐15N NOE values; however, attenuation of these signals could be caused by internal mobility of these regions on a slow or intermediate timescale (μs‐ms).

The structure of RRM‐4 is very similar to that of canonical RRM motifs (Figures 2, 3, 4) and contains just the standard four‐strand β‐sheet within a βαββαβ topology—there is no additional fifth β‐strand. The inner surface of the four‐stranded β‐sheet forms an extensive hydrophobic core with the inner surface of the two α‐helices. As observed for RRM‐3 the secondary structure elements are very well defined, as is loop β3‐α2. In contrast, loops β1‐α1, α1‐β2 and α2‐β4 of RRM‐4 have rather small {1H}‐15N NOE values and appear to be more disordered in solution than the corresponding regions in RRM‐3. As was found for RRM‐3, signals in the β2‐β3 loop of RRM‐4 were weak (K489, D490) or missing (Q488), suggesting an intrinsic flexibility for this loop in the micro‐ to millisecond timescale.

Analysis of RNA binding

The structure of PTB‐34 reveals that the RNA binding domains, despite their unusual RNP‐1 and RNP‐2 motifs, have much in common with other RRM proteins. To examine the role of the unusual motif sequences in RNA binding and to probe for additional determinants of binding specificity, mutagenesis of RRMs 3 and 4 was performed. Filter binding assays were used to measure the affinity of the various PTB fragments and mutants (Figure 5A) for radiolabelled RNA transcripts containing either the full‐length FMDV or EMCV IRES, or domain 1 from the EMCV IRES (Figure 5B). Domain 1 contains nucleotides 260‐444 of the EMCV IRES and incorporates stem‐loops D‐H (Kaminski et al., 1995); stem‐loop H is considered to have the highest affinity PTB binding site (Jang and Wimmer, 1990; Witherall et al., 1993).

Figure 5.

Overview of protein and RNA constructs used in this study. (A) Schematic depiction of PTB constructs; RRM domains are indicated by shading. (B) Schematic diagram of the EMCV IRES; domain 1 is boxed.

The binding affinities of hisPTB for domain 1 and the full‐length EMCV IRES are comparable (Table II), confirming that domain 1 contains the most important PTB binding site. The full‐length hisPTB protein binds IRES RNA with dissociation constants in the range 0.03‐0.06 μM, which is consistent with previous data (Witherall et al., 1993). Truncation of hisPTB to hisPTB‐34a, which includes just the third and fourth RRM domains, reduced binding to EMCV IRES targets ∼7‐fold (Figure 6), although the loss of affinity was somewhat less for FMDV IRES (Table II). This may be due in part to loss of the dimerization function of the N‐terminal domain of PTB (Pérez et al., 1997a), but it may also suggest that the N‐terminal RRMs contribute more to RNA binding than was previously thought. The single‐domain constructs hisPTB‐3a, hisPTB‐3b and hisPTB‐4a had even lower affinities for EMCV IRES domain 1, although the third RRM binds significantly more tightly than the fourth. This is somewhat at odds with UV cross‐linking experiments, which found no RNA binding activity associated with single‐domain constructs of PTB (Kaminski et al., 1995; Pérez et al., 1997a), although the discrepancy is probably attributable to the fact that constructs used in UV experiments contained neither the fourth nor the fifth β‐strands of the domain.

Figure 6.

Binding curves for the interaction of full‐length and truncated PTB constructs with EMCV IRES domain 1 RNA. Experimental details are given in Materials and methods. The data for each protein have been normalized. Analysis of such curves was used to determine dissociation constants for all the PTB constructs and mutants used in this study (Tables II and III).
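A minimal sketch of how a dissociation constant might be extracted from a normalized binding curve is given below. It assumes a simple single‐site model, fraction bound = [P]/(Kd + [P]), fitted by a least‐squares search on a logarithmic grid; neither the model nor the function names are taken from the analysis used in this study, and the data points are synthetic.

```python
import math

def fraction_bound(protein_conc, kd):
    """Single-site binding isotherm (assumed model); concentrations in μM."""
    return protein_conc / (kd + protein_conc)

def sum_squared_error(kd, concs, fractions):
    """Sum of squared residuals between the model and the observed fractions."""
    return sum((fraction_bound(c, kd) - f) ** 2 for c, f in zip(concs, fractions))

def fit_kd(concs, fractions, lo=1e-4, hi=1e2, points=61, rounds=6):
    """Estimate Kd by repeatedly refining a log-spaced grid around the best value."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    best = lo
    for _ in range(rounds):
        step = (log_hi - log_lo) / (points - 1)
        grid = [10 ** (log_lo + i * step) for i in range(points)]
        best = min(grid, key=lambda kd: sum_squared_error(kd, concs, fractions))
        # Narrow the search window to one grid step either side of the best value.
        log_lo, log_hi = math.log10(best) - step, math.log10(best) + step
    return best

# Synthetic, noise-free example generated with a Kd of 0.05 μM (hypothetical data).
example_concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
example_fractions = [fraction_bound(c, 0.05) for c in example_concs]
fitted_kd = fit_kd(example_concs, example_fractions)
```

A grid search is used here only to keep the example free of external dependencies; a non‐linear least‐squares routine would normally be preferred for real data.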

Table 2. Dissociation constants (μM) for binding of deletion mutants of PTB to IRES RNA

Constructs hisPTB‐3a and hisPTB‐3b (Figure 5A) differ in that only the latter fragment contains the β5 strand but the proteins have the same affinity for the EMCV IRES, suggesting that β5 and the β4‐β5 linker do not contribute to RNA binding in the context of these single‐domain fragments. Nonetheless, these features may well play a role in RNA recognition in the presence of the fourth domain or in the context of the intact protein.

To probe the contribution to RNA recognition of individual amino acids on the binding surface, single and double amino acid substitutions were introduced into hisPTB‐34a. Binding experiments with these mutants gave a number of surprising results, summarized in Table III. Several of the mutations within the core RNP‐1 and RNP‐2 motifs, including N343A, Q380A, N460A and Q497A, have almost no effect on binding. These residues lie towards the top and bottom edges of the RNA binding surface (Figure 7) and do not always contact the nucleic acid ligand (Ding et al., 1999; Handa et al., 1999). The mutation R491S at the start of the RNP‐1 motif in RRM‐4 has little, if any, effect on RNA binding, while the more drastic double mutation K373E/K374E at the equivalent position in RRM‐3 did reduce the RNA affinity 4‐fold. These findings stand in marked contrast to results for other RRM proteins such as U1A or Sex‐lethal, which lose all RNA binding affinity if the basic residue conserved at the start of RNP‐1 is mutated (Nagai et al., 1990; Lee et al., 1997).

Figure 7.

Mapping of sites of mutation onto the structure of PTB‐34. The RNP‐1 and RNP‐2 motifs are indicated on the protein structure by dark shading. Large spheres (with underlined labels), positions of mutations that caused >1.5‐fold decrease in binding; small spheres, positions of mutations that caused <1.5‐fold decrease in binding (see Table III).

Table 3. Dissociation constants (μM) for binding of mutants of hisPTB‐34a to EMCV IRES RNA

The unusual RNP‐2 and RNP‐1 sequences in PTB mean that, unlike most other RRM proteins, the RNA binding surface is almost completely devoid of aromatic residues that might stack with the RNA bases. Nevertheless, the unusual residues that occur within the RNP‐1 and RNP‐2 motifs of PTB (Figure 1) do contribute to RNA binding. For example, mutations L378S and H457A, which are in positions normally occupied by conserved aromatic amino acids in the vast majority of RRM proteins, both significantly reduce IRES binding. These data confirm that, despite possessing unusual RNP‐1 and RNP‐2 motifs, the β‐sheet surface on RRM‐3 and RRM‐4 forms an RNA binding site. Mutations K368E and K485E, which lie in equivalent positions in β2 of RRM‐3 and RRM‐4, respectively, both substantially decrease RNA binding. This may be due to the loss of direct contacts with the RNA ligand. However, the structure shows that the K368 side chain can interact with E424 within the helical turn at the C‐terminal end of the β4‐β5 loop, so the impact of the substitution of this residue on RNA binding may be due to induced conformational changes in the protein. On balance, substitutions of amino acids in RRM‐4 are as deleterious as substitutions in RRM‐3, confirming that RRM‐4 is important for RNA binding (Table III; Figure 7). The linker connecting the two RRM domains also contributes to RNA binding, since the double mutation K439E/K440E significantly reduces the affinity. This is consistent with crystallographic structural analyses of Sex‐lethal and poly(A)‐binding protein (PABP) complexed with their respective RNA targets (Deo et al., 1999; Handa et al., 1999), which revealed specific contacts between the inter‐domain linker and the bound RNA, although the topology of the linker in PTB is likely to be very different because of the presence of the β5 strand, which does not occur in Sex‐lethal or PABP.

The observation that the effects of amino acid substitutions on the binding of hisPTB‐34a to IRES RNA are usually rather modest contrasts sharply with studies on other RRM proteins, such as U1A and Sex‐lethal (Nagai et al., 1990; Lee et al., 1997), which found more severe reductions in binding associated with single amino acid substitutions. It may be that the PTB‐IRES interaction is less specific than those of other RRM proteins or that PTB relies on dimeric, multi‐domain contacts for specific recognition.

We also assessed the affinity of PTB for an intron target, an RNA oligomer containing 15 bases from the sequence of the polypyrimidine tract of the downstream regulatory element of exon 3 from α‐tropomyosin (Pérez et al., 1997b). This oligomer (DPy) has the sequence UUUCUCCUCUUCUUU and is bound by hisPTB and hisPTB‐34a with dissociation constants of 0.17 and 1.3 μM, respectively. Using a different assay, the affinity of PTB for a slightly longer DPy oligomer was previously reported to be 0.003 μM (Pérez et al., 1997b) but at present we have no satisfactory explanation for the discrepancy. Our finding that PTB binds much more tightly to IRES targets may be due to the presence of more than one binding site on the significantly larger viral RNA.
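To put the dissociation constants quoted above on a common scale, the following sketch converts a pair of Kd values into a fold change and an apparent difference in binding free energy, ΔΔG = RT ln(Kd/Kd,ref). The temperature of 298.15 K is an assumption made for illustration, and the function names are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal mol^-1 K^-1

def fold_change(kd, kd_reference):
    """Ratio of two dissociation constants expressed in the same units."""
    return kd / kd_reference

def delta_delta_g(kd, kd_reference, temperature=298.15):
    """Apparent change in binding free energy (kcal/mol); temperature is assumed."""
    return GAS_CONSTANT * temperature * math.log(kd / kd_reference)

# hisPTB-34a (1.3 μM) relative to full-length hisPTB (0.17 μM) on the intron oligomer.
fold = fold_change(1.3, 0.17)
ddg = delta_delta_g(1.3, 0.17)
print(f"{fold:.1f}-fold weaker binding, ddG = {ddg:.2f} kcal/mol")
```

Under these assumptions the truncation corresponds to a loss of roughly 7.6‐fold in affinity, or a little over 1 kcal/mol.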

Comparison with other modular RRM proteins

Structural similarities of PTB‐34 to other RRM proteins were assessed using the program DALI (Holm and Sander, 1993). Apart from the novel fifth strand revealed in PTB RRM‐3 and despite the presence of unusual residues in the RNP‐1 and RNP‐2 motifs, the remainder of domain 3 and the whole of RRM‐4 are very similar to the classical RRM fold (Figure 3). The closest structural homologues of PTB RRM‐3 were found to be Sex‐lethal RRM‐1 (Handa et al., 1999; r.m.s.d. of 3.7 Å over 82 equivalent Cα atoms, with a Z score of 7.0) and U2AF65 RRM‐2 (Ito et al., 1999; 2.6 Å over 75 sites, Z = 6.8). For PTB RRM‐4 the closest homologues were again Sex‐lethal RRM‐1 (2.8 Å over 67 sites, Z = 6.1) and U2AF65 RRM‐2 (3.3 Å over 72 sites, Z = 6.0). Curiously, both Sex‐lethal and U2AF65, like PTB, recognize polypyrimidine sequences.

PTB RRM‐3 differs from other members of the RRM family in that it possesses an additional, fifth β‐strand and a long flexible loop between β4 and β5. A survey of PTB homologues in the sequence database indicates that the β5 strand in RRM‐3 is conserved in mammals, Caenorhabditis elegans and plants (Figure 8). Homologous proteins include NPTB (a recently identified neurone‐specific variant of PTB; 74% identity; Doug Black, personal communication), ROD1 (a mammalian protein implicated in differentiation control; 74% identity; Yamamoto et al., 1999) and hnRNP‐L (23% identity), which has already been noted as similar to PTB (Ghetti et al., 1992). In all cases the homologues have four RRM domains and, as in PTB, the predicted fifth strand occurs only in the third domain. The presence of a fifth β‐strand in more distantly related RRM proteins has not yet been detected; several are known that have longer inter‐domain linkers than PTB‐34 (Shamoo et al., 1995) but the sequence determinants for this feature are not yet sufficiently well defined to permit an effective database search.

Figure 8.

Alignment of PTB and homologous sequences in the region of β4 and β5 of RRM‐3, indicating the conservation of β5 in eukaryotes. DDBJ/EMBL/GenBank accession Nos for PTB and homologue sequences are: human (X62006), pig (X93009), rat (Q00438), C.elegans (Z36948), Arabidopsis thaliana (AF076924), NPTB (AF176085), ROD1 (AB023967) and hnRNP‐L (X16135).

N‐ and C‐terminal extensions to the core βαββαβ structure of canonical RRM domains are important for RNA recognition (Query et al., 1989; Scherly et al., 1989; Lutz‐Freyermuth et al., 1990; Görlach et al., 1994; Deo et al., 1999; Handa et al., 1999). The β4‐β5 loop of PTB RRM‐3 occupies a similar position to the C‐terminal helix of U1A protein, which is implicated in RNA binding (Scherly et al., 1989; Lutz‐Freyermuth et al., 1990), although there are differences between the two structures. In the absence of bound RNA, the C‐terminal helix in U1A packs in a well defined manner across the RNA binding β‐sheet (Avis et al., 1996) and prevents solvent exposure of hydrophobic residues. However, binding of RNA displaces the helix to allow the ligand to form a close network of interactions involving residues of the β‐sheet and the C‐terminal helix (Oubridge et al., 1994; Allain et al., 1997). In contrast, the loop β4‐β5 of PTB RRM‐3 is highly disordered in solution, and no interaction was found with the rest of the molecule. Nevertheless, its position with respect to the central β‐sheet suggests that it may function as a determinant for RNA specificity, perhaps making specific hydrogen bonds with RNA bases that help to clamp the bound RNA in position. Consistent with this notion, the β4‐β5 region is rich in glutamine residues, which, along with asparagines, can make specific hydrogen bonds with RNA bases (Handa et al., 1999). It is interesting to note that the C‐terminus of PTB RRM‐4 extends for five amino acids beyond the end of β4 and lies close to the RNA binding surface, and thus may also contribute to RNA recognition by that domain.

The loops joining the conserved secondary structure elements of RRM domains are the most variable stretches between different RRM proteins, differing in length, amino acid sequence, conformation and flexibility (Kenan et al., 1991). Both crystallographic and NMR data indicate that the β2‐β3 loop, which can play a central role in RNA recognition, is rather flexible (Nagai et al., 1990; Allain et al., 1997; Shamoo et al., 1997; Handa et al., 1999; Nagata et al., 1999); this flexibility appears to allow induced fit of the RNA ligand (Mittermaier et al., 1999). The β2‐β3 loops in both RRM‐3 and RRM‐4 of PTB also appear flexible, and may well be important for RNA binding, although our mutagenesis results suggest that they make less of a contribution to RNA binding than has been observed in other cases.

PTB is predicted to be a modular RRM protein containing four independently folded domains (Ghetti et al., 1992), two of which occur in PTB‐34. To our knowledge, this is the first report of the solution structure of an RRM protein containing more than one domain, although the structures of double‐domain fragments from three modular RRM proteins have been determined by X‐ray crystallography. Crystallographic analyses of bound and free forms of such fragments have already suggested that in the absence of bound nucleic acid, the pair of RRM domains has some relative freedom of movement (Shamoo et al., 1997; Xu et al., 1997; Crowder et al., 1999; Ding et al., 1999; Handa et al., 1999). RNA binding is coupled with stabilization of RRM inter‐domain contacts and rigidification of the inter‐domain linkers. In our present analysis of the NMR structure of PTB‐34, the extent of flexibility of the inter‐domain linker is particularly striking (Figure 3): there are no inter‐domain contacts whatsoever and the two RRM domains appear to tumble independently, the long linker region making no fixed contacts with the rest of the protein, presumably due to disorder. As with other modular RRM proteins, a fixed conformation of PTB‐34 may only occur upon binding, although it is worth noting that PTB has a 25‐residue linker, which is significantly longer than those found in Sex‐lethal (10), PABP (12) or hnRNP‐A1 (16) (Shamoo et al., 1995).

Superposition of the RRM‐RNA or RRM‐DNA crystal structures that have been published to date (U1A, U2B″, Sex‐lethal, PABP, hnRNP‐A1) shows that, despite the diversity of nucleic acid ligands, there are common features to the mode of binding. The centre of the RRM binding surface generally interacts with single‐stranded RNA or DNA lying in a 5′ to 3′ direction across β1‐β3‐β2 at an angle of ∼45° to the direction of the β‐strands. Moreover, the central pair of bound nucleotides, which most directly overlie the RNP‐2 and RNP‐1 motifs (on β1 and β3, respectively), occur in more or less the same position in all the structures, the planes of the bases being approximately parallel to the surface of the β‐sheet. In marked contrast, beyond this central region there are variations in the conformations of the 3′ ends of the bound nucleic acid and these can be attributed primarily to the disposition of the N‐ or C‐terminal extensions to the RRM domain (Figure 9). When these structures are superposed on PTB RRM‐3 it is clear that the PTB domain may well bind the central pair of nucleotides of its cognate RNA in a manner very similar to that found for other RRM proteins, a proposal supported by our mutagenesis data (Figure 7). However, the superposition also shows that the 3′ end of the ligands observed for the other RRM proteins would clash with the C‐terminal end of the β4‐β5 loop (Figure 9). Thus, in PTB RRM‐3 either the bound RNA adopts a different path to that observed previously or binding of RNA would affect the conformation adopted by the β4‐β5 loop and possibly displace β5 itself.

Figure 9.

The role of sequences flanking the core RRM domain in RNA recognition. (A) The structure of U2B″ complexed with its cognate RNA (Price et al., 1998). For ease of comparison with the other structures only a portion of the stem‐loop RNA (nucleotides 8‐14) is shown and U2A′ (which does not interact with the portion of RNA included in the figure) has been omitted. The C‐terminal helix is crucial for high‐affinity RNA binding. (B) Structure of RRM‐2 of PABP bound to poly(A) (nucleotides 1‐7; Deo et al., 1999). The figure shows one RRM from the two‐domain fragment that was present in the crystal. The N‐terminal helical turn preceding β1 interacts specifically with the bound RNA. (C) Structure of RRM‐2 of Sex‐lethal bound to its cognate RNA (nucleotides 1‐11; Handa et al., 1999). RRM‐1, which lies on the right and also interacts with the RNA, has been omitted from the figure. (D) Superposition of the RNAs from U2B″, PABP and Sex‐lethal onto PTB RRM‐3. The matrix required to superpose each protein domain on PTB RRM‐3 was determined and applied to the RNA. In each case the conformation of the bound RNA clashes with the C‐terminal end of the β4‐β5 loop in PTB RRM‐3 (indicated by arrows).
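The superposition step described in (D) can be sketched as follows: once a rotation matrix and translation vector have been determined for a protein domain, the same transformation is applied to the co‐ordinates of its bound RNA, and the transformed atoms are checked for close approaches to the target structure. The function names, the 2.5 Å cutoff and the co‐ordinates are hypothetical and serve only to illustrate the procedure.

```python
import math

def apply_transform(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) point."""
    return [
        tuple(
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for point in coords
    ]

def clashing_pairs(rna_coords, protein_coords, cutoff=2.5):
    """Return (rna_index, protein_index) pairs closer than the cutoff (Å, assumed)."""
    return [
        (i, j)
        for i, a in enumerate(rna_coords)
        for j, b in enumerate(protein_coords)
        if math.dist(a, b) < cutoff
    ]

# Hypothetical transformation: 90° rotation about z followed by a 1 Å shift along z.
rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
translation = (0.0, 0.0, 1.0)

rna_atoms = [(1.0, 0.0, 0.0)]                      # toy RNA atom
loop_atoms = [(0.0, 1.0, 2.0), (5.0, 5.0, 5.0)]    # toy loop atoms

moved_rna = apply_transform(rna_atoms, rotation, translation)
clashes = clashing_pairs(moved_rna, loop_atoms)
```

Determining the rotation and translation themselves (for example by least‐squares fitting of equivalent Cα atoms) is a separate step that is not shown here.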

A more detailed understanding of PTB‐RNA interactions will require structural analysis of protein‐RNA complexes. Work is now in progress to optimize RNA targets, which will permit a high resolution structural analysis of PTB‐34 complexed with RNA.

Materials and methods

Plasmid construction and protein expression

A 198‐residue C‐terminal fragment of human PTB‐1 (Gil et al., 1991), which contains the third and fourth RRM domains, was subcloned by PCR and ligated into pET‐15b (Novagen) using engineered NcoI and NdeI restriction sites (Conte et al., 1999). The resulting fragment (residues 335‐531; designated PTB‐34) has the authentic C‐terminus and an additional methionine at the N‐terminus. The protein was overexpressed in BL21 (DE3) Escherichia coli. Cell pellets were lysed in 25 mM HEPES pH 7.25, 1.5 mM MgCl2, 0.2 mM EDTA (buffer A) containing 0.2 M KCl. The protein was applied to a 20 ml cation exchanger column (POROS HS50; Perkin‐Elmer Biosystems) and eluted with a linear 0.2‐1.0 M KCl gradient in buffer A; PTB‐34 eluted at 0.45 M KCl. The peak fractions were pooled, diluted to 0.2 M KCl with buffer A and loaded onto a 5 ml HiTrap Blue column (Cibacron Blue 3G; Amersham‐Pharmacia Biotech). PTB‐34 eluted as an almost homogeneous protein at 1.3 M KCl. Peak fractions were pooled, dialysed into 20 mM sodium acetate pH 5.2 and concentrated to 1 mM.

A variety of histidine‐tagged PTB constructs were prepared for use in binding assays. These constructs were generated by PCR to introduce 5′ BamHI and 3′ HindIII restriction sites, which were used to incorporate the insert into expression vector pQE‐9 (Qiagen), which adds an N‐terminal histidine tag to the expressed protein. The details of the various constructs are summarized in Figure 5A. HisPTB‐34 is simply a tagged version of the protein PTB‐34 that was used for all the NMR studies. HisPTB‐34a has a slightly longer N‐terminus than hisPTB‐34; this construct starts at residue 324 and incorporates an Arg to Ser mutation at residue 325 due to the introduction of a BamHI site. The single‐domain constructs for RRM‐3 (hisPTB‐3a and hisPTB‐3b) have the same N‐terminus as hisPTB‐34a, since attempts to generate these constructs with the same N‐terminus as hisPTB‐34 resulted in very poor expression. HisPTB‐4a was engineered with a Phe to Tyr substitution at residue 446 (just upstream of the RRM domain) to give the protein a non‐zero absorbance at 280 nm and thereby facilitate accurate quantitation of the concentration of the purified protein. Site‐directed mutagenesis was performed in the context of the hisPTB‐34a construct using overlap PCR. All mutations were confirmed by cDNA sequencing.

His‐tagged PTB fragments and mutants were expressed in SG13009 E.coli (Qiagen). Cell pellets were lysed in 20 mM Tris pH 7.7, 250 mM NaCl (buffer B) and the proteins were purified on TALON® metal affinity resin (Clontech) using the manufacturer's protocol. The purified protein was eluted with buffer B containing 100 mM imidazole. In all cases except one the protein was at least 95% pure as judged by Coomassie staining of SDS polyacrylamide gels. The mutant hisPTB‐34a(L378S) gave a relatively low yield of soluble protein that was ∼90% pure.

NMR spectroscopy

NMR sample preparation and sequence‐specific 1HN, 15N, 13Cα, 13Cβ and 13C′ assignments have been reported previously (Conte et al., 1999). Side chain resonance assignments were obtained using HCCH‐TOCSY experiments (Bax et al., 1990). NMR spectra were acquired at 302 K on a four channel Bruker DRX500 equipped with a z‐shielded gradient and triple resonance probe. Distance restraints used in structure calculations were obtained from 1H/15N and 1H/13C‐edited NOESY‐HSQC experiments (Fesik and Zuiderweg, 1988; Norwood et al., 1990), as well as from 15N/13C and 13C/13C‐edited HMQC‐NOESY‐HSQC experiments (Vuister et al., 1993). Dihedral φ angles were obtained by estimating 3JHNα coupling constants from HNHA experiments (Kuboniwa et al., 1994). Hydrogen‐bonded NH groups were identified by the presence of amide proton resonances in HSQC spectra recorded 12 h after dissolving in D2O. {1H}‐15N NOE experiments with minimal water saturation were acquired using the pulse sequence of Farrow et al. (1994). All experiments incorporated gradient sensitivity enhancement (Kay et al., 1992). NMR data were processed using XWINNMR and analysed using AURELIA (Neidig et al., 1995).
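The link between a measured 3JHNα coupling constant and the backbone φ angle is usually expressed through a Karplus relation, sketched below. The text does not state which parametrization was used; the coefficients here are those commonly attributed to Vuister and Bax (1993) and should be read as an assumption, as should the function names and the tolerance.

```python
import math

# Assumed Karplus coefficients in Hz (Vuister and Bax parametrization);
# the parametrization actually used is not given in the text.
A, B, C = 6.51, -1.76, 1.60


def karplus_3j_hn_ha(phi_deg):
    """Predicted 3J(HN,Halpha) in Hz for a backbone phi angle in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def phi_candidates(j_hz, tol_hz=0.05):
    """Integer phi values (degrees) whose predicted coupling lies within tol_hz of j_hz."""
    return [phi for phi in range(-180, 181)
            if abs(karplus_3j_hn_ha(phi) - j_hz) <= tol_hz]


for phi in (-60, -120):
    print(phi, round(karplus_3j_hn_ha(phi), 2))
```

With these coefficients, helical φ values near −60° give small couplings and extended φ values near −120° give large ones. Intermediate couplings map to several φ values, which is one reason such restraints are usually cross-checked against an independent source.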

Structure calculation

NOE cross peak intensities were measured at 100 and 120 ms mixing times. Fifty structures were calculated from random starting co‐ordinates on the basis of 702 NOE distance restraints, composed of six intraresidue, 458 short range (residue i to residue i + j, where 1 < j ≤ 4) and 238 long range (residue i to residue i + j, where j > 4) connectivities, 223 dihedral angle restraints, composed of 112 φ and 111 ψ angles, and 120 H‐bond distance restraints. We used only conformationally sensitive NOEs, i.e. conformationally insensitive intraresidue distance restraints were not employed. A dynamical simulated annealing protocol executed within the program X‐PLOR (Nilges et al., 1988; Brünger, 1993) was employed. The restraints were distributed as follows: five intraresidue, 221 short range and 123 long range distance restraints, 115 dihedral angle and 62 H‐bond restraints in RRM‐3, and one intraresidue, 237 short range and 115 long range distance restraints, 108 dihedral angle and 58 H‐bond restraints in RRM‐4. The distance restraints were calibrated internally using known sequential distances. NOEs observed at 120 ms mixing time were placed in three categories on the basis of estimated cross peak intensities: strong (< 2.8 Å), medium (< 3.5 Å) and weak (< 5.0 Å). The dihedral angle restraints were calculated from 3JHNα coupling constants and by using the backbone torsion angle prediction package TALOS (Cornilescu et al., 1999). Excellent agreement was found between φ backbone angle values from 3JHNα measurements and those from TALOS. Of the 97 good φ/ψ predictions only one φ angle was found to disagree with the 3JHNα coupling constant. Interactive analysis of the TALOS predictions, together with 3JHNα coupling constant data, allowed additional dihedral restraints (14 φ/ψ predictions and one φ angle) to be included in the structure calculation. The final family comprised the 20 lowest total energy structures, which contain no distance violation >0.2 Å and no dihedral violation >4.0°.
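The restraint bookkeeping above can be cross-checked by summing the per-domain counts, as in the short sketch below. The per-domain numbers and the distance upper bounds are taken from the text. The dictionary layout and function names are illustrative only and are not part of the X‐PLOR protocol.

```python
# Per-domain restraint counts as quoted in the text.
RESTRAINTS = {
    "RRM-3": {"intraresidue": 5, "short_range": 221, "long_range": 123,
              "dihedral": 115, "hbond": 62},
    "RRM-4": {"intraresidue": 1, "short_range": 237, "long_range": 115,
              "dihedral": 108, "hbond": 58},
}

# Upper distance bounds (in Angstrom) for the three NOE intensity categories.
NOE_UPPER_BOUND = {"strong": 2.8, "medium": 3.5, "weak": 5.0}

NOE_CLASSES = ("intraresidue", "short_range", "long_range")


def category_totals(table):
    """Sum each restraint category over all domains."""
    totals = {}
    for counts in table.values():
        for category, n in counts.items():
            totals[category] = totals.get(category, 0) + n
    return totals


def noe_total(totals):
    """Total number of NOE distance restraints (all three NOE classes)."""
    return sum(totals[c] for c in NOE_CLASSES)


totals = category_totals(RESTRAINTS)
print(totals)
print("NOE distance restraints:", noe_total(totals))
```

Summed in this way, the per-domain figures give six intraresidue, 458 short range and 238 long range restraints, matching the stated total of 702 NOE distance restraints, together with 223 dihedral and 120 H‐bond restraints.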

Preparation of RNA

EMCV IRES plasmids were the kind gift of Richard Jackson and Ann Kaminski. The construction of plasmid pSG6A, which contains nucleotides 259‐848 of the EMCV IRES fused to the open reading frame of poliovirus 2A protein, has been described elsewhere (Kaminski et al., 1994; Kaminski and Jackson, 1998). pSG6A is derived from plasmid pSG1 and contains six adenines in the ‘A bulge’ within the JK stem‐loop (Kaminski and Jackson, 1998). Plasmid pGEM1‐Dom1 contains nucleotides 260‐444 corresponding to domain 1 of the EMCV IRES (Kaminski et al., 1995). A plasmid containing the full‐length IRES of FMDV strain O1K was kindly provided by Graham Belsham [pSP64‐FMD(polyA)]. Transcription reactions were performed using methods similar to those described previously (Kaminski et al., 1990). Typically, 20 μl reactions, programmed with 1 μg of linearized DNA, were performed in 40 mM Tris pH 8, 15 mM MgCl2, 5 mM dithiothreitol (DTT), 6 U of RNAGuard (Amersham‐Pharmacia Biotech), 1 mM each of ATP, CTP and GTP, 0.5 mM UTP, 50 μCi of [α‐32P]UTP (at >3000 Ci/mmol) and 1 μl of purified T7 RNA polymerase (provided by A.Kaminski). The reactions were incubated for 30 min at 37°C; unincorporated nucleotides were removed using NucTrap gel filtration columns (Stratagene).

Synthetic RNA oligonucleotides were prepared and purified by HPLC by Oswell Research Products Ltd. For use in binding assays the RNA oligonucleotides were 5′‐end‐labelled with [γ‐32P]ATP in reactions catalysed by T4 polynucleotide kinase (USB). Unincorporated nucleotides were removed by the use of Microspin G‐25 spin columns (Amersham‐Pharmacia Biotech).

Binding assays

The affinity of the various PTB fragments and mutants for radiolabelled RNA targets was assessed using filter binding assays. PTB‐RNA binding reactions (75 μl) were prepared and incubated for at least 15 min at room temperature (24°C) in 10 mM HEPES pH 7.25, 100 mM KCl, 3 mM MgCl2, 5% glycerol, 1 mM DTT, 50 μg/ml yeast tRNA (Boehringer Mannheim), 50 μg/ml human serum albumin (Delta Biotechnology). The RNA concentration was typically fixed at 4 nM. Assays were performed using the protein‐binding Protran BA‐85 nitrocellulose membrane (Schleicher and Schuell). The membrane was washed extensively in 10 mM HEPES pH 7.25, 3 mM MgCl2, 5% glycerol, 1 mM DTT and mounted on a 96‐well dot‐blotter (Bio‐Rad). Before and after application of 65 μl of the binding reaction, the membrane was washed with 180 μl of wash buffer. Following the experiment, the membrane was dried and the quantity of bound PTB‐RNA complex determined by scintillation counting of Cerenkov radiation.

The results of binding assays were analysed using simple equilibrium binding models for the protein‐RNA interaction. For the truncated proteins, a single binding site was assumed. HisPTB, which is a dimer in solution (Pérez et al., 1997a), was assumed to bind in either of two orientations to a single site on the RNA target. Such a scheme may be an over‐simplification of the true interaction but provides a reasonable fit to the data. The values reported for dissociation constants in Tables II and III are the average of at least two independent experiments; variation between individual experiments was typically <20%.
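For the truncated proteins, the single‐site model can be sketched as below. This is a minimal illustration rather than the fitting procedure actually used: it assumes 1:1 binding with the RNA fixed at 4 nM, ignores the retention efficiency of the filter, and estimates the dissociation constant by a simple log‐spaced grid search. All function names and the synthetic titration are hypothetical.

```python
import math


def fraction_rna_bound(protein_nm, kd_nm, rna_nm=4.0):
    """Fraction of RNA in complex for 1:1 binding (exact quadratic solution).

    Solves [PR]^2 - (P_t + R_t + Kd)[PR] + P_t * R_t = 0 for the complex [PR].
    """
    total = protein_nm + rna_nm + kd_nm
    complex_nm = (total - math.sqrt(total * total - 4.0 * protein_nm * rna_nm)) / 2.0
    return complex_nm / rna_nm


def fit_kd(protein_concs, fractions, rna_nm=4.0, lo=0.1, hi=1e4, points=2001):
    """Least-squares Kd estimate (nM) from a log-spaced grid search between lo and hi."""
    best_kd, best_sse = None, float("inf")
    for i in range(points):
        kd = lo * (hi / lo) ** (i / (points - 1))
        sse = sum((fraction_rna_bound(p, kd, rna_nm) - f) ** 2
                  for p, f in zip(protein_concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free titration generated with Kd = 50 nM (illustrative only).
concs = [1, 3, 10, 30, 100, 300, 1000]
data = [fraction_rna_bound(p, 50.0) for p in concs]
print(f"estimated Kd: {fit_kd(concs, data):.1f} nM")
```

The quadratic form is used instead of the simpler hyperbola [P]/(Kd + [P]) because, with the RNA at 4 nM, depletion of free protein is not negligible for tight binders. The two‐orientation scheme assumed for dimeric HisPTB would need an additional statistical factor and is not covered here.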


Structure co‐ordinates have been submitted to the Protein Data Bank, accession code 1qm9.


We are grateful to Professor P.A.Sharp (MIT, USA) for the gift of human PTB cDNA, to Richard Jackson and Ann Kaminski (Cambridge, UK) for the gift of EMCV IRES plasmids, to Graham Belsham (Pirbright, UK) for the gift of a plasmid containing the full‐length IRES of FMDV strain O1K, to Doug Black (UCLA, USA) for providing us with information prior to publication, to Chris Smith (Cambridge, UK) for valuable discussions and to Peter Brick for critical reading of the manuscript. The authors are indebted to The Wellcome Trust and the BBSRC for financial support. S.M. and S.C. are members of the Imperial College Centre for Structural Biology. J.G. acknowledges the award of an MRC studentship. A list of NMR assignments and restraints is available from M.R.C. (sasi.conte{at}