
Structural basis of the RNA‐binding specificity of human U1A protein

Frédéric H.‐T. Allain, Peter W.A. Howe, David Neuhaus, Gabriele Varani

Author Affiliations

  1. Frédéric H.‐T. Allain (2),
  2. Peter W.A. Howe (3),
  3. David Neuhaus (1) and
  4. Gabriele Varani (*, 1)

  (1) MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, UK
  (2) Kemisk Institut V, Copenhagen University, Universitetsparken 5, DK‐2100, Copenhagen Ø, Denmark
  (3) Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, 90095‐1569, USA
  (*) E-mail: gv1{at}mrc-lmb.cam.ac.uk

Abstract

The RNP domain is a very common eukaryotic protein domain involved in recognition of a wide range of RNA structures and sequences. Two structures of human U1A in complex with distinct RNA substrates have revealed important aspects of RNP‐RNA recognition, but have also raised intriguing questions concerning the origin of binding specificity. The β‐sheet of the domain provides an extensive RNA‐binding platform for packing aromatic RNA bases and hydrophobic protein side chains. However, many interactions between functional groups on the single‐stranded nucleotides and residues on the β‐sheet surface are potentially common to RNP proteins with diverse specificity and therefore make only limited contribution to molecular discrimination. The refined structure of the U1A complex with the RNA polyadenylation inhibition element reported here clarifies the role of the RNP domain principal specificity determinants (the variable loops) in molecular recognition. The most variable region of RNP proteins, loop 3, plays a crucial role in defining the global geometry of the intermolecular interface. Electrostatic interactions with the RNA phosphodiester backbone involve protein side chains that are unique to U1A and are likely to be important for discrimination. This analysis provides a novel picture of RNA‐protein recognition, much closer to our current understanding of protein‐protein recognition than that of DNA‐protein recognition.

Introduction

The RNP domain is one of the most common eukaryotic protein sequence motifs (Hodgkin et al., 1995), found in hundreds of RNA‐binding proteins (Kenan et al., 1991; Birney et al., 1993; Burd and Dreyfuss, 1994a; Nagai et al., 1995). Structural analysis of RNP domains from several different proteins (Nagai et al., 1990; Hoffman et al., 1991; Wittekind et al., 1992; Garrett et al., 1994; Lu and Hall, 1995; Avis et al., 1996) has demonstrated that the domain folds into a compact αβ structure with a four‐stranded β‐sheet packed against two α‐helices through an extensive hydrophobic core. RNP proteins bind RNA with exquisite specificity utilizing this common structure. Site‐directed mutagenesis (Nagai et al., 1990; Jessen et al., 1991) and NMR studies (Görlach et al., 1992; Howe et al., 1994; Kanaar et al., 1995) identified the surface of the β‐sheet as the primary site of RNA recognition. The crystal structure of the complex between the N‐terminal domain of human U1A protein and stem‐loop II of U1 snRNA (Oubridge et al., 1994) revealed the structural basis for RNP‐RNA recognition. The subsequent NMR structure of the same protein domain in complex with the polyadenylation inhibition element (Allain et al., 1996), together with NMR structures of the free components (Avis et al., 1996; Gubser and Varani, 1996), revealed that intermolecular recognition requires extensive conformational changes in both protein and RNA components.

The two structures of complexes of human U1A protein with distinct RNA substrates have clarified many important aspects of RNP‐RNA recognition, but have also raised intriguing questions concerning the molecular basis of specificity. The structure of the U1A‐hairpin complex (Oubridge et al., 1994) showed that seven single‐stranded nucleotides that represent the primary recognition site are involved in a very extensive network of hydrogen bonds with residues located on the surface of the β‐sheet and in a loop immediately C‐terminal to the domain. This observation would seem to imply that recognition of unpaired bases determines the specificity of RNP proteins. This is somewhat puzzling, since the sequence diversity on the surface of the β‐sheet is limited: RNP proteins are identified by highly conserved amino acids within the two central strands of the β‐sheet itself (Birney et al., 1993). How can this region form the basis for so many different and highly specific interactions of RNP proteins, when it is itself so highly conserved? Another puzzle is that two proteins (U1A and U2B″) that share essentially all residues involved in recognition of the single‐stranded nucleotides have different substrate specificities (Scherly et al., 1990; Bentley and Keene, 1991).

The conservation of amino acids on the RNP domain β‐sheet surface implies that much of its role is to provide a generic RNA‐binding surface, while the specificity of the interaction is defined largely by the variable loops connecting the secondary structural elements of the protein (Kenan et al., 1991). The refinement of the structure of the U1A complex with the polyadenylation inhibition element (PIE) reported here clarifies the role of the RNP domain specificity determinants (the variable loops of the domain) in recognizing a unique RNA three‐dimensional structure. These results provide new insight into the molecular basis of the ability of RNP proteins to discriminate between RNA substrates.

Results

Structure refinement

Human U1A protein comprises two RNP domains separated by a linker containing a diffuse nuclear localization signal. The determinants of the affinity and specificity of this protein reside entirely within residues 1‐102, i.e. the first RNP domain and the residues immediately C‐terminal to it (Scherly et al., 1990, 1991; Hall and Stump, 1992). The structure of the complex between a 30 nucleotide internal loop RNA derived from the U1A polyadenylation inhibition element (Figure 1A) and the U1A protein RNA‐binding domain (amino acids 2‐102) (Figure 1B) was determined using a very extensive set (≈2600) of NMR‐derived experimental constraints (Allain et al., 1996). These data defined most of the structure and the intermolecular interface to high precision. However, critical parts of the interface (loop 1 and loop 3) were defined less precisely. The structure has been refined to improve the definition of these key parts of the intermolecular interface, resulting in the present precision of ≈1 Å for the entire intermolecular interface (Table I).

Figure 1.

(A) Secondary structures of one of two repeated U1A‐binding motifs from the polyadenylation inhibition element (left) and of stem‐loop II of U1 snRNA (right); nucleotides involved in protein‐RNA interactions are highlighted in bold. (B) Sequence and secondary structure of the N‐terminal domain of human U1A protein.

Table I. Statistics of NMR‐derived experimental constraints and structural statistics for 31 converged structures

The refinement was accomplished by using the previously reported set of experimentally determined structures (Allain et al., 1996) to resolve ambiguous nuclear Overhauser effect (NOE) interactions. Novel intermolecular NOE contacts involve primarily resonances from the RNA sugars; these signals are poorly dispersed and overlap with protein Hα resonances. An illustration of the difficulties encountered during this process is presented in Figure 2; a detailed technical description of this process is presented elsewhere (Howe et al., 1997). The new intermolecular NOE contacts involving H2′ and H3′ of A24 and H2′ of G25 were particularly important to define interactions between A24 and G25 and loop 3 of U1A (residues 45‐53). These intermolecular contacts, together with intramolecular interactions between A24 and G23, define precisely the relative position of the two helical stems beginning with G25.C38 and G23.C46. As a consequence of this improved definition, it has been possible to identify many more intermolecular interactions involving loop 3 and to establish that basic residues from loops 1 and 3 of U1A interact with the backbone phosphates of stem 2 (Figure 1A).

Figure 2.

Sections of a three‐dimensional 13C‐edited NOESY spectrum of the U1A‐RNA complex containing intermolecular NOE interactions between Ser48 resonances and A24 sugar resonances.

General features of the structure

A view of the refined structure is presented in Figure 3A. As previously reported (Allain et al., 1996), the RNA is severely kinked in the complex. The RNA is recognized by amino acids located on the surface of the β‐sheet and in the loops connecting the first strand of the β‐sheet with helix A (loop 1), the second and third strands of the β‐sheet (loop 3) and the loop connecting the end of β4 to helix C. The intimate nature of the protein‐RNA interface is particularly striking, the overall architecture of the complex being reminiscent of an enlarged protein structure. The RNA bases at the interface pack against hydrophobic protein side chains on the β‐sheet surface to form what is in essence an enlarged hydrophobic core, while the negatively charged backbone phosphates of the RNA are on the surface of the structure, directed away from the protein and towards the solvent (Figure 3B). This arrangement is strikingly different from that of DNA‐protein complexes. It also differs markedly from an early model for RNP‐RNA interaction for the related complex of U1A with stem‐loop II of U1 snRNA (Jessen et al., 1991; Howe et al., 1994). In that model, the RNA bases were exposed to solvent while the phosphates interacted with the protein; the model explained the observation that mutations of nucleotides in the single‐stranded loop cause only small decreases in affinity, while the phosphates are protected against chemical modifications (Jessen et al., 1991).

Figure 3.

(A) Global view of a converged structure of the U1A‐PIE complex. The RNA bases are splayed out across the surface of the β‐sheet of U1A to form extensive intermolecular interactions, while the phosphates (dark blue) remain exposed to solvent. (B) The superposition of 31 converged structures highlights the excellent definition of the intermolecular interface. The backbone at the C‐terminus of U1A N‐terminal RNP domain is highlighted in yellow.

Intermolecular interactions

A list of all intermolecular interactions observed in the refined structure is presented in Tables II, III and IV. Intermolecular contacts have been divided into hydrophobic interactions (van der Waals and stacking; Table II), hydrogen bonds (Table III) and electrostatic interactions or salt bridges (Table IV).

Table II. Hydrophobic and stacking interactions observed in the complex between human U1A protein and PIE RNA

Table III. Hydrogen bonding interactions observed in the complex between human U1A protein and PIE RNA

Table IV. Electrostatic interactions observed in the complex between human U1A protein and PIE RNA

Intermolecular interactions involving the surface of the β‐sheet and the seven common single‐stranded nucleotides are very similar between the present structure and the crystal structure of the related hairpin complex (Figure 4). The root mean square deviation (r.m.s.d.) between the average of the present NMR ensemble and the crystal structure is 1.13 Å for the seven unpaired nucleotides and 1.29 Å for the portion of the protein‐RNA interface involving those nucleotides. In this common region, differences between solution and crystal structures are most likely due to intrinsically greater flexibility in solution and to difficulties in detecting, by NMR, bound water molecules, which mediate several interactions in the crystal structure.
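
The r.m.s.d. values quoted above compare matched atoms after the two coordinate sets have been superimposed. As a minimal sketch of that final step only (it assumes the atoms are already paired and superimposed, and the coordinates below are hypothetical values chosen for illustration, not taken from either structure), the calculation can be written as:

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two lists of (x, y, z) points.

    Assumes both lists describe the same atoms in the same order and that
    the two structures have already been superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Å, for illustration only.
nmr_average = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
crystal = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]

# Every atom is displaced by exactly 1 Å, so the r.m.s.d. is 1.0 Å.
print(round(rmsd(nmr_average, crystal), 2))
```

A full comparison would first find the optimal rigid-body superposition; that step is omitted here to keep the sketch short.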

Figure 4.

Intermolecular RNA‐protein interactions in a low energy member of the family of converged structures. (A) Interactions between single‐stranded nucleotide residues both in and immediately preceding helix C, and on the surface of the β‐sheet; this view is identical to Figure 3B but a single structure is shown. (B) Electrostatic interactions involving stem 2 of the RNA and basic residues from loop 1 and loop 3. (C) Interactions of loop 3 in the present NMR structure and (D) in the crystal structure of the stem‐loop II complex (Oubridge et al., 1994).

Novel interactions identified in this refined structure include electrostatic contacts from Arg47, Lys23 and Lys96 side chains to the RNA phosphates of stem 2 (Table IV and Figure 4B). These interactions, and others involving loop 3, are very different or absent in the hairpin complex (Figure 4C‐D). In the present structure, the aromatic ring of A24 approaches the protein backbone and interacts via two hydrogen bonds and hydrophobic interactions with Ser46. By contrast, U13, C14 and C15 in stem‐loop II (which correspond roughly to A24 in the present structure) are poorly ordered in the hairpin complex and involved in crystal packing contacts (Oubridge et al., 1994; Nagai et al., 1995). The extensive contacts to G25 sugar and phosphates in the present complex have a single counterpart in the stem‐loop II complex (an interaction involving G16 phosphate). Finally, Lys23, Ser46, Arg47 and Ser48 interact with the PIE RNA, but these residues are involved only in intramolecular interactions with other protein residues in the stem‐loop II complex.

The differences between the two structures are directly supported by the primary data. Backbone amide resonances from loop 3 (Ser46‐Arg52) are readily visible in 1H‐15N correlated spectra for the present complex, but were not detected in the corresponding spectrum recorded for the stem‐loop II complex (data not shown). The absence of these signals from the spectrum of the stem‐loop II complex probably reflects dynamic disorder of loop 3 in the solution conformation of the hairpin complex, due to less extensive intermolecular interactions. Therefore, U1A‐RNA interactions are not identical between the two complexes, nor are they limited to the seven conserved nucleotides (5′AUUGCAC3′) and the G.C base pair at the base of the loop.

In total, the NMR structure shows that residues from loop 3 (residues 46‐52) form 18 hydrophobic contacts (Table II), eight hydrogen bonds (Table III) and two salt bridges with the RNA (Table IV) (Figure 4C), emphasizing the importance of this region of RNP proteins in RNA recognition. Val45 carbonyl and Arg47 and Leu49 main‐chain amides make three hydrogen bonds with the RNA, while seven hydrophobic contacts involve the Ser46 and Ser48 side chains and one electrostatic interaction is observed from the Arg47 side chain. Additional interactions between loop 3 and the RNA involve the Ser48 hydroxyl and the side chains of Leu49, Lys50, Met51 and Arg52; these contacts were reported previously (Allain et al., 1996) and have been confirmed in this refined structure.

Discussion

Human U1A protein binds two distinct RNA targets with very high affinity and specificity: stem‐loop II from U1 snRNA and the polyadenylation inhibition element (PIE) from the 3′‐untranslated region of the U1A pre‐mRNA (van Gelder et al., 1993; Teunissen et al., 1997). Binding to both RNAs occurs with a sub‐nanomolar dissociation constant (Hall and Stump, 1992; van Gelder et al., 1993; Hall, 1994; Gubser and Varani, 1996), while non‐specific binding to poly(A), tRNA or double‐stranded RNA is at least 10⁶‐fold weaker (Hall and Stump, 1992). The numerous intermolecular interactions between U1A and either hairpin or internal loop targets could explain the very low dissociation constant and the large enthalpic contribution to binding (Hall and Kranz, 1995). However, the molecular basis for the ability of U1A and other RNP proteins to discriminate different RNAs remains an outstanding question.

Binding of U1A to hairpin and internal loop RNAs requires seven conserved single‐stranded nucleotides and the presentation of these nucleotides within the correct secondary structure (van Gelder et al., 1993; Hall, 1994). The single‐stranded nucleotides are recognized primarily by residues from the protein β‐sheet surface and from the loop connecting the end of the domain with helix C. Intermolecular contacts involving these nucleotides are very similar between the two complexes of U1A (Oubridge et al., 1994; Allain et al., 1996), but interactions involving the protein variable loops differ significantly. In the hairpin complex (Oubridge et al., 1994), U1A interacts only with the G.C base pair that closes the helical stem in addition to the AUUGCAC sequence. An intact double‐helical stem (which is essential for binding) was seen to restrict the conformational freedom of the single‐stranded loop (thereby reducing entropic losses upon complex formation), and to allow recognition of the terminal G.C pair (Nagai et al., 1995). The present structure contains more extensive interactions between the RNA helical regions and amino acids from loop 1 and loop 3. Since these loops represent the principal specificity determinants of RNP proteins (Kenan et al., 1991), the analysis of these interactions provides insight into how the specificity of U1A (and presumably other RNP proteins) is determined.

Interactions with the single‐stranded nucleotides provide only limited ability to discriminate different RNA substrates

Nucleotides from the single‐stranded A39UUGCAC45 sequence are splayed out against the protein β‐sheet surface and the bases are involved in numerous intermolecular interactions. It is tempting to assume that recognition of the exposed functional groups on these bases suffices to define the specificity of U1A. As demonstrated by elegant in vitro selection experiments (Tsai et al., 1991), these interactions are undoubtedly important determinants of U1A specificity. However, interactions with other regions of the RNA are also important.

The main consideration against a dominant role for residues from the β‐sheet surface in determining RNP specificity is the extremely high sequence conservation in this region among different RNP proteins (Birney et al., 1993). For example, protein residues involved in intermolecular stacking interactions (at positions 13, 54 and 56; Table II) correspond to either Phe or Tyr in ≈70% of all RNP domains (Birney et al., 1993). Although U1A has glutamine at position 54, this side chain also stacks (on G42), consistent with these intermolecular stacking interactions being general characteristics of RNP‐RNA complexes. Similarly, the Ser46‐A24 and Lys88‐C43 hydrophobic interactions (Table II) can also be fulfilled by most of the equivalent residues in RNP proteins with diverse specificity. Thus (with the probable exception of Gln54; vide infra) these contacts are unlikely to be specific to U1A.

Nine intermolecular contacts involving direct side‐chain interactions with base functional groups from the AUUGCAC sequence could be thought of as primary specificity determinants (Figure 4A). Seven of these contacts involve six residues on the surface of the β‐sheet, while two involve residues of loop 3. Four of these residues, Asn15, Glu19, Leu44 and Gln85, are not well conserved in RNP proteins. These four residues contribute a total of six hydrogen bonds and are likely to be true specificity determinants on the surface of the β‐sheet of U1A.

The present discussion provides an opportunity to re‐examine the extensive mutagenesis data on the stem‐loop II complex. Since interactions involving the single‐stranded AUUGCAC sequence are common between the two structures, the effects of mutations here are likely to be very similar in the two cases. In fact, the C43→U mutation within the PIE RNA (van Gelder et al., 1993) and the corresponding change within U1 snRNA stem‐loop II (Hall, 1994) lead to an ≈20‐fold increase in the dissociation constant in both cases.

Mutations of the conserved aromatic side chains on the β‐sheet surface cause large reductions in affinity. Planar intermolecular stacking interactions are rare in DNA‐protein complexes but have commonly been observed in RNA‐protein complexes (Rould et al., 1989, 1991; Cavarelli et al., 1993; Biou et al., 1994; Valegård et al., 1994; Cusack et al., 1996) and contribute significantly to binding energy (Stump and Hall, 1995; LeCuyer et al., 1996). Mutation of Gln54 to Phe reduces the affinity >100‐fold (Jessen et al., 1991); this may reflect the inability of the U1A‐RNA interface to accommodate a larger side chain at this particular position. On the other hand, mutation of residues forming hydrogen bonding contacts from the amino acid side chains generally has a small effect (≈10‐fold or less) on binding, with the exception of the Asn15→Val substitution, which abolishes binding altogether (Jessen et al., 1991; Scherly et al., 1991). Many mutations of single‐stranded nucleotides also cause only small decreases (≈10‐ to 20‐fold) in the binding constant (Hall, 1994), even when residues involved in extensive intermolecular contacts are changed [for example C43→U or C45→G (Hall and Stump, 1992; Hall, 1994)]. The result of the C45→G mutation is particularly surprising, since the structure does not allow any space for a purine in place of C45 and this nucleotide is important for discrimination between U1A and U2B″ (Scherly et al., 1990). Only in two cases is the base identity truly critical: G42→A and A44→G transitions cause large decreases (≈10⁴‐fold) in affinity (Hall, 1994). G42 is hydrogen bonded to two of the amino acid side chains that have been identified as primary determinants for U1A recognition of the single‐stranded RNA loop, Asn16 and Glu19; however, Asn16→Val and Glu19→Asp mutations cause only small losses in binding constant (10‐fold or less) (Scherly et al., 1991). A44 forms a specific hydrophobic interaction with another unique amino acid side chain (Leu44).
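
To put the fold changes in affinity on an energetic scale, a fold increase in the dissociation constant can be converted into a free energy penalty with the standard relation ΔΔG = RT ln(fold). The sketch below is illustrative only: the function name is hypothetical, and the temperature of 298.15 K is an assumption rather than a condition reported for the binding measurements.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def ddg_from_fold(fold, temperature_k=298.15):
    """Free energy penalty (kcal/mol) for a fold increase in Kd.

    Uses ddG = R * T * ln(fold). The default temperature is an assumption.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)


# Fold changes of the sizes quoted for U1A and RNA mutants.
for fold in (10, 20, 100, 1000):
    print(f"{fold:>5}-fold weaker binding: {ddg_from_fold(fold):.2f} kcal/mol")
```

On this scale a 10‐fold loss in affinity corresponds to roughly 1.4 kcal/mol at room temperature, comparable to the strength of a single hydrogen bond.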

An attractive explanation for the relative insensitivity of the binding constant to mutations of interfacial nucleotides is provided by considerations of flexibility. The small effect of the C45→G substitution is inconsistent with a rigid interface, since there is insufficient space to fit a guanosine base in place of C45. Selective broadening of NMR resonances from residues close to C45 suggests that there are local motions in this region of the structure. Thus, the interface appears to be sufficiently flexible to accommodate either base through local conformational adjustments, as observed in crystallographic and thermodynamic studies of proteins (Alber et al., 1987). Studies of residual dynamics at intermolecular protein‐protein interfaces have led to the suggestion that a fine balance between rigidity and flexibility may provide a compromise between complete specificity (at large entropic cost) and complete lack of selectivity (Kay et al., 1996). The residual conformational flexibility may reduce the ability to discriminate between different nucleotides.

The analysis described here suggests that intermolecular interactions with the seven single‐stranded nucleotides provide significant binding energy but only limited capacity to discriminate between different RNAs. Although the AUUGCAC sequence is optimal for U1A recognition (Tsai et al., 1991), several contacts involving these nucleotides can be disrupted by mutations in the protein or RNA with only small effects on binding. Furthermore, many interactions involve either protein main‐chain amide and carbonyl functionalities (Tables II, III and IV) or amino acids that are highly conserved among RNP proteins: these cannot be principal determinants of U1A specificity. To this extent, the surface of the β‐sheet is indeed a generic RNA‐binding surface (Kenan et al., 1991), and the key molecular determinants of specificity must be sought in interactions involving loop 1, loop 3 and the region immediately C‐terminal to the RNP domain.

The position of helix C is an important determinant of specificity

Sequences flanking the RNP domain are required for RNA binding in many RNP‐containing proteins; often the isolated domain does not contain sufficient information to function as a sequence‐specific RNA‐binding entity (Kenan et al., 1991). In the case of U1A, the region immediately following the end of the RNP domain is essential for RNA binding. A protein construct truncated at residue 91 does not bind RNA at all (Scherly et al., 1989), truncation at residue 95 reduces binding 30‐fold (Jessen et al., 1991; Scherly et al., 1991; Hall, 1994) and substitutions of Lys96 and Lys98 with Gln reduce affinity (Jessen et al., 1991). Since sequence alignment suggests this region of the protein is unique to U1A (and U2B″), interactions involving these residues are likely to contribute to the ability of U1A to discriminate different RNAs.

Residues 92‐98 form a well‐defined α‐helix in free U1A (Howe et al., 1994; Avis et al., 1996), and this helical structure is preserved in the complex (Oubridge et al., 1994; Allain et al., 1996). Residues 88‐92 are involved in extensive interactions with nucleotides C43, A44 and C45 (Figures 3B and 4A). Remarkably, almost all intermolecular contacts from this region of the protein involve main‐chain functionalities. The reason why these interactions are nonetheless specific may lie in the conformational properties of helix C. The position of this helix in the complex is defined by hydrophobic interactions between Ile93, Ile94 and Met97 within helix C and His10, Leu41, Leu58 and Ile62. Mutations of some of the amino acids involved in positioning helix C (for example Thr11 next to His10) reduce RNA binding significantly (Jessen et al., 1991). These residues and the residues within helix C are not preserved in other RNP proteins. Thus, these main‐chain interactions become specific to U1A because the formation of helix C and its positioning in the complex through hydrophobic interactions with the rest of the domain are unlikely to occur in other RNP proteins.

A second mechanism by which helix C may contribute to molecular discrimination is related to its position in the free protein. In the absence of RNA, helix C lies across the surface of the β‐sheet and covers a large part of the RNA‐binding surface (Figure 5B) (Avis et al., 1996). RNA binding therefore requires helix C to move away from the β‐sheet. Interactions between U1A and non‐cognate RNAs may provide insufficient energy to drive this conformational change in the protein, thereby reducing the affinity for non‐cognate RNAs.

Figure 5.

(A) The surface representation of the complex shows loop 3 of U1A (white) protruding through the hole in the RNA internal loop (light blue); interactions involving Leu49, Ser46 and Ser48 (dark blue) and Arg52 (hidden in this orientation) are critical to dock the protein against the RNA. (B) Surface representation of free and bound protein structures. The red surface identifies residues involved in intermolecular stacking interactions that become exposed upon the rearrangement of helix C; dark blue identifies loop 3 residues that interact with the RNA by rigid fit; light blue identifies the location of the remaining sites of intermolecular contact.

Electrostatic interactions between the variable loops of the domain and the RNA phosphodiester backbone contribute to substrate discrimination

The variable loop 1 and loop 3 connecting the secondary structural elements of U1A (Figure 1B) contain an unusual cluster of basic residues (Lys20, Lys22, Lys23, Arg47, Lys50 and Arg52). The salt dependence of the binding constant for the hairpin complex suggests that electrostatic interactions contribute ≈30% of the total binding energy (Hall and Stump, 1992; Hall, 1994). The role of these basic residues in U1A‐PIE recognition is shown in Figure 4B, where electrostatic interactions between Lys23, Arg47 and Lys96 and the phosphodiester backbone of stem 2 are highlighted. Inspection of the electrostatic potential reveals that basic residues within U1A closely follow the path of the RNA phosphodiester backbone in both complexes (Nagai et al., 1995; Allain et al., 1996). One could argue that interactions with the phosphodiester backbone are of a non‐specific nature. However, these interactions originate from residues in loop 1 and loop 3 that are unique to U1A (Birney et al., 1993) and depend on RNA conformation, so they could well be specific for a particular RNA structure. The structure indicates that these interactions contribute to substrate discrimination by allowing the recognition of the charge distribution of the RNA substrate through electrostatic contacts. The importance of electrostatic interactions in RNP‐RNA discrimination has been demonstrated for hnRNP A1, where an increase of the monovalent ion concentration reduces the ability to discriminate specific from non‐specific substrates (Abdul‐Manan et al., 1996).

Loop 3‐RNA interactions lock the conformation of the complex

Loop 3 (connecting strands β2 and β3) is the site of greatest diversity in sequence and length among RNP proteins. Residues from loop 3 of U1A form many intermolecular interactions with A24, G25.C38, A39 and U40 in the RNA at the junction between the single‐stranded region and the two stems (Figure 4C). A particularly important role had been proposed for Arg52 (Oubridge et al., 1994; Nagai et al., 1995); in the stem‐loop II complex, this residue recognizes the G25.C38 base pair and forms five hydrogen bonds. The hydrogen bonds involving Arg52 are defined less precisely in the present structure. Although Arg52 cannot be mutated to Gln, substitution of Arg52 by Lys causes only a 3‐fold loss in binding constant (Nagai et al., 1990; Jessen et al., 1991). Taken together, these facts may be further pointers to conformational flexibility at the interface, as has also been observed in NMR structures of protein‐DNA complexes (Berglund et al., 1995; Slijper et al., 1997).

The sequence requirements within loop 3 for binding to stem‐loop II RNA have been investigated exhaustively in a genetic study based on the phage display method (Laird‐Offringa and Belasco, 1995). Leu49 was conserved in all but one tight binding clone and was found to have a disproportionate effect on the kinetic parameters of binding (Leu49→Met substitution increased koff 100‐fold). In the present structure, the side chains of Leu49 and Arg52 interact with A39 and the G25.C38 base pair at the stem‐loop junction. An identical interaction was observed for Leu136 with A72 stacking on the C71.G2 base pair in the acceptor stem of the tRNAGln‐synthetase complex (Rould et al., 1989). These hydrophobic interactions cannot simultaneously be fulfilled by any amino acid except leucine.
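
The kinetic consequence of a 100‐fold increase in koff can be made concrete with the standard relations Kd = koff/kon and t½ = ln 2/koff for first‐order dissociation. The sketch below uses hypothetical rate constants chosen only for illustration (the measured values are not given here) and assumes that kon is unchanged by the substitution.

```python
import math


def dissociation_constant(k_off, k_on):
    """Equilibrium dissociation constant (M) from rate constants."""
    return k_off / k_on


def half_life(k_off):
    """Half-life (s) of a complex with first-order off-rate k_off (s^-1)."""
    return math.log(2) / k_off


# Hypothetical rate constants, for illustration only.
k_on = 1.0e7             # M^-1 s^-1, assumed identical for both proteins
k_off_wt = 1.0e-3        # s^-1, assumed wild-type off-rate
k_off_mut = 100 * k_off_wt  # 100-fold faster dissociation

kd_ratio = dissociation_constant(k_off_mut, k_on) / dissociation_constant(k_off_wt, k_on)
life_ratio = half_life(k_off_wt) / half_life(k_off_mut)
print(f"Kd increases {kd_ratio:.0f}-fold; complex half-life shortens {life_ratio:.0f}-fold")
```

Under the stated assumption of an unchanged kon, the loss in affinity is entirely due to a shorter lifetime of the complex.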

As shown in the surface representation of Figure 5A, contacts involving Ser46, Ser48 and Leu49 (and Arg52, not visible in this orientation) from loop 3 lock the protein into the hole defined by the RNA structure. Together with the stacking of A24 on stem 2 and A39 on stem 3, these interactions define the position of the helical stems with respect to the single‐stranded nucleotides. In addition to intermolecular contacts involving the side chains of Ser46, Ser48, Leu49 and Arg52, every loop 3 residue except Ser46, Ser48 and Met51 forms intermolecular interactions involving main‐chain functionalities. As in the case of the C‐terminal region of the domain, these interactions may be specific because of the unique length (Birney et al., 1993) and conformation of loop 3 in U1A, which forms a short helical structure in both free and bound U1A proteins.

Loop 3 residues are critical for discrimination between stem‐loop II of U1 snRNA and stem‐loop IV of U2 snRNA (Scherly et al., 1990; Bentley and Keene, 1991; Laird‐Offringa and Belasco, 1995); the main difference between these RNAs is at the stem‐loop junction. Disruption of the structure of the RNA stem‐loop junction, or an A39→C mutation (which would disrupt the interactions involving Leu49 and Arg52), reduces the affinity of U1A for its RNA by >1000‐fold (Hall, 1994). Docking of loop 3 at the RNA stem‐loop junction may be essential to define precisely the relative position of the double‐helical stems (recognized through electrostatic interactions from basic residues in loops 1 and 3 to the phosphodiester backbone of stem 2) and single‐stranded nucleotides (recognized by extensive interactions involving the protein β‐sheet surface and helix C).

Recognition mechanism

An additional level of complexity in rationalizing RNP‐RNA interaction concerns the extensive conformational changes observed in both protein and RNA components (Allain et al., 1996; Avis et al., 1996; Gubser and Varani, 1996). The major difference between the free and bound protein structures is in the position of helix C. In the complex, helix C is directed away from the β‐sheet surface and the RNA (Figure 5B), while in the free structure it points towards loop 3 and covers part of the β‐sheet surface (Avis et al., 1996). Five of the seven single‐stranded nucleotides (U41‐C45) are highly flexible in the free RNA structure, but are much more highly ordered in the complex (Gubser and Varani, 1996). Distinct changes in base stacking interactions between the free and bound RNA structures are shown by the clear differences in the pattern of NOE interactions. Protein binding opens up the conformation of the RNA loop: in the free RNA, the A39, U40, U41, A44, C45 and A24 bases are oriented towards the inside of the loop, filling the cavity created by the sugar‐phosphate backbone of G23‐G25 and C38‐C46; only G42 and C43 are solvent exposed (Gubser and Varani, 1996). The space in the cavity occupied by U41, A44 and C45 in the free RNA is filled instead in the complex by protein residues from loop 3, while the bases of A24, A39, U40, U41, A44 and C45 are now directed towards the protein.

Implications for the binding specificity of other RNP proteins

The interconnection between RNA structure and specificity illustrated by the present structure provides a possible rationale to explain the high selectivity of RNP proteins that recognize highly structured RNAs, for example U1 70K. The numerous RNA‐protein interactions mediated by highly conserved residues on the surface of the β‐sheet are likely to be common to all RNP proteins, and may provide the free energy required for non‐specific binding to any RNA target (Burd and Dreyfuss, 1994b). Substrate discrimination could be achieved by recognition of the unique three‐dimensional shape and charge distribution of different RNAs, and fine‐tuned by base‐specific interactions with exposed single‐stranded nucleotides. However, proteins that recognize genuinely single‐stranded RNA must utilize a different recognition mechanism. Remarkably, single RNP domains fail to bind single‐stranded RNA with high specificity: multiple domains are required for highly specific molecular recognition (Shamoo et al., 1994; Kanaar et al., 1995; Tacke and Manley, 1995; Tacke et al., 1997). How this is achieved in structural terms remains unclear and poses the next logical question towards understanding RNP‐RNA recognition.

Conclusions

The many structural and thermodynamic studies of DNA‐protein complexes have provided a powerful paradigm in protein‐nucleic acid recognition. Sequence‐specific DNA binding requires recognition by protein side chains of functionalities on the DNA bases exposed in the major groove of B‐form DNA (Steitz, 1990). This paradigm is so effective that a code, albeit a highly degenerate one, has been proposed for DNA recognition by zinc finger proteins (Choo and Klug, 1997). The implications of the structures of U1A‐RNA complexes provide a very different picture of RNA‐protein recognition, much closer to our understanding of protein‐protein recognition than to DNA‐protein recognition.

The recognition by U1A of the identity of RNA base functionalities exposed in single‐stranded loops is an important element of molecular discrimination. However, many interactions with functional groups on the single‐stranded nucleotides originate from the β‐sheet surface and are potentially common to RNP proteins with diverse specificity. A large part of the role of the β‐sheet is to form a generic RNA‐binding surface by providing the amino acid side chains that participate in what is essentially an ‘intermolecular core’. The formation of intermolecular stacking interactions (Oubridge et al., 1994; Allain et al., 1996) through reorganization of the RNA and protein structures is an essential step in this process. Formation of a specific complex produces a buried interface that resembles the interior of a protein (Figure 3B). As observed in studies of protein stability, the intermolecular interface may readily adjust to mitigate the effects of potentially deleterious substitutions (Alber et al., 1987). Thus, although the sequence of the single‐stranded nucleotides provides optimal intermolecular contacts and is therefore well conserved, mutations are surprisingly well tolerated. In contrast, inclusion of bulky amino acid side chains, disruption of Watson‐Crick pairing in the helical stems or changes in the size of the single‐stranded loop (Tsai et al., 1991; Hall, 1994) reduce binding to nearly non‐specific levels. The most variable region of RNP proteins, loop 3, plays a crucial role in defining the geometry of the intermolecular interface. Long‐range electrostatic interactions with the phosphodiester backbone involve side chains that are unique to U1A and therefore likely to be important for discrimination.

Both protein and RNA structures acquire the correct conformation only through extensive conformational changes. This phenomenon has been observed for all RNA complexes with proteins and peptides for which structural data exist for both free and bound components (Puglisi et al., 1992; Aboul‐ela et al., 1995; Peterson and Feigon, 1996). RNA‐protein recognition is clearly a dynamic event: RNA structure defines distinct protein‐binding surfaces, which are then reorganized upon interaction to optimize surface complementarity and functional group recognition. This flexibility of RNA structure has been exploited by proteins that regulate the activity of RNA enzymes (Weeks and Cech, 1995, 1996; Caprara et al., 1996). An intriguing question yet to be addressed is to what extent these conformational changes are sequence‐sensitive, in which case the ability to undergo such structural shifts could be as finely tuned and highly evolved as the RNA and protein structures themselves.

Materials and methods

All NMR experiments were conducted at 27°C in buffer containing either 10 mM sodium phosphate (pH ∼6.5) or 5 mM sodium acetate (pH ∼6). A detailed technical description of spectroscopic methods and the procedures adopted to obtain complete spectral assignments, to construct the constraint list and to calculate the structure of the complex is presented elsewhere (Howe et al., 1997). Statistics for the constraints used in the calculations and all relevant structural statistics are reported in Table I.

The refinement of the structure presented here utilized the previously published structure (Allain et al., 1996) to identify novel and ambiguous intermolecular NOE interactions. Fifty new intermolecular and 49 new intramolecular NOE constraints were identified during the refinement process. The majority of the novel intermolecular distance constraints (32 out of 49) involved RNA sugar resonances; given the extensive overlap in this region of the spectrum, these NOE interactions could only be identified conclusively using the first set of structures to resolve ambiguities. In addition, some constraints were removed from the original list because the previous assignments could not be validated conclusively in this more complete spectral analysis. The refinement of the constraint list resulted in a 20% improvement in the precision of the structure. A comparison between the two ensembles of structures revealed that the refined structures are within the envelope of structures defined by the first, less complete set of constraints. The 20% improvement in precision may seem disproportionate, since <100 novel constraints were added to a total of 2600 constraints. However, the majority of the novel constraints occur at the RNA‐protein interface and often belong to the (previously) less well‐defined regions of the structure.

Converged and non‐converged structures were classified using clear differences in energy‐ordered profiles and energy‐ordered r.m.s.d. profiles (Avis et al., 1996; Fletcher et al., 1996; Varani et al., 1996). In addition to improving the overall precision, the refined constraint set improved significantly the proportion of structure calculations that converged and allowed a much clearer separation between converged and non‐converged structures (Howe et al., 1997). Thirty‐one out of 50 structures have comparable low numbers of violations and low values for the pseudoenergy corresponding to NOE distance constraints; violations increase significantly for structures 32‐50, and the energy‐ordered r.m.s.d. profile diverges from the clear plateau established for the first 31 structures (Howe et al., 1997). All statistics reported in Table I and the analysis of intermolecular interactions were based on the ensemble of 31 converged structures.
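
The idea of separating converged from non‐converged structures on an energy‐ordered profile can be illustrated with a minimal sketch. It is not the procedure used in the cited work: here the cut is simply placed at the largest jump between consecutive sorted energies, which is an assumption of this example, and the function name and toy energies are hypothetical.

```python
def split_by_energy_gap(energies):
    """Split structures into (converged, non_converged) index lists.

    Structures are ordered by energy and the cut is placed at the largest
    jump between neighbours in that ordering. This single-gap criterion is
    an assumption of the sketch, not the published classification method.
    """
    if len(energies) < 2:
        raise ValueError("need at least two structures")
    # Indices of structures, ordered from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    sorted_e = [energies[i] for i in order]
    # Energy difference between each structure and the next one in the ordering.
    gaps = [sorted_e[k + 1] - sorted_e[k] for k in range(len(sorted_e) - 1)]
    # Position of the first structure after the largest gap.
    cut = max(range(len(gaps)), key=lambda k: gaps[k]) + 1
    return order[:cut], order[cut:]


# Toy pseudoenergies (arbitrary units), invented for illustration only.
toy_energies = [10.2, 55.0, 11.0, 9.8, 60.3, 10.5]
converged, non_converged = split_by_energy_gap(toy_energies)
```

In practice the energy‐ordered r.m.s.d. profile would be inspected alongside the energies, since a plateau in both is what distinguishes a well‐defined converged ensemble.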

Intermolecular interactions (Tables II, III and IV) have been defined by a statistical analysis based on a systematic search through all converged structures for hydrogen bonding, van der Waals and electrostatic contacts. Interactions were considered to be established only when observed in the majority of all converged structures. Since NMR structures are constructed from numerous, short interproton distances, stacking, hydrophobic and hydrogen bonding contacts are defined much more precisely than electrostatic interactions or salt bridges. Furthermore, we could not observe any direct NOE interaction between basic protein side chains and the RNA phosphodiester backbone. Therefore, the proposed electrostatic contacts could only be inferred indirectly by a statistical analysis of converged structures. We emphasize, nevertheless, that the electrostatic component of the force field was never introduced during any stage of the structure calculation process, thereby avoiding any bias that would favour interactions between basic amino acids and the RNA phosphates.
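
The majority criterion described above can be sketched as a simple tally over the ensemble. This is an illustration under stated assumptions, not the analysis code used for the tables: contact detection itself (distance and angle cut‐offs) is taken as already done, and the function name, the strict "more than half" threshold and the placeholder contact labels are all hypothetical.

```python
from collections import Counter


def consensus_contacts(per_structure_contacts, min_fraction=0.5):
    """Return contacts present in more than min_fraction of the structures.

    per_structure_contacts holds one collection of hashable contact labels
    per converged structure. The default threshold (strictly more than half)
    is an assumed reading of "the majority of all converged structures".
    """
    n_structures = len(per_structure_contacts)
    if n_structures == 0:
        raise ValueError("ensemble is empty")
    counts = Counter()
    for contacts in per_structure_contacts:
        # Count each contact at most once per structure.
        counts.update(set(contacts))
    return {c for c, k in counts.items() if k / n_structures > min_fraction}


# Placeholder ensemble of four structures; labels are invented for illustration.
toy_ensemble = [
    {("res_a", "nt_1"), ("res_b", "nt_2")},
    {("res_a", "nt_1")},
    {("res_a", "nt_1"), ("res_c", "nt_3")},
    {("res_b", "nt_2"), ("res_a", "nt_1")},
]
accepted = consensus_contacts(toy_ensemble)
```

With a strict majority threshold, a contact seen in exactly half of the structures is rejected, which errs on the side of reporting fewer interactions.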

Acknowledgements

We would like to thank Drs Johanna Avis, Charles Gubser, Kiyoshi Nagai and Chris Oubridge for help and suggestions at various stages of this project.

References