Molecular basis of RNA recognition by the human alternative splicing factor Fox‐1

Sigrid D Auweter, Rudi Fasan, Luc Reymond, Jason G Underwood, Douglas L Black, Stefan Pitsch, Frédéric H‐T Allain

Author Affiliations

Sigrid D Auweter¹, Rudi Fasan², Luc Reymond³, Jason G Underwood⁴, Douglas L Black⁴, Stefan Pitsch³ and Frédéric H‐T Allain¹,*

¹ Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology (ETH) Zurich, Zurich, Switzerland
² Institute of Organic Chemistry, University of Zurich, Zurich, Switzerland
³ Laboratory of Nucleic Acid Chemistry LCAN‐EPFL, Lausanne, Switzerland
⁴ Department of Microbiology, Immunology and Molecular Genetics and Howard Hughes Medical Institute, University of California, Los Angeles, CA, USA
* Corresponding author. Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology (ETH) Zurich, 8093 Zurich, Switzerland. Tel.: +41 1 633 39 40; Fax: +41 1 633 12 94; E‐mail: allain{at}


The Fox‐1 protein regulates alternative splicing of tissue‐specific exons by binding to GCAUG elements. Here, we report the solution structure of the Fox‐1 RNA binding domain (RBD) in complex with UGCAUGU. The last three nucleotides, UGU, are recognized in a canonical way by the four‐stranded β‐sheet of the RBD. In contrast, the first four nucleotides, UGCA, are bound by two loops of the protein in an unprecedented manner. Nucleotides U1, G2 and C3 are wrapped around a single phenylalanine, while G2 and A4 form a base pair. This novel RNA binding site is independent of the β‐sheet binding interface. Surface plasmon resonance analyses were used to quantify the energetic contributions of electrostatic and hydrogen bond interactions to complex formation, and these measurements support our structural findings. These results demonstrate the unusual molecular mechanism of sequence‐specific RNA recognition by Fox‐1, which is exceptional in its high affinity for a defined but short sequence element.


The RNA element UGCAUG has long been known to strongly influence splicing of a variety of alternative exons in mammalian genes, including the c‐src N1 exon (Black, 1992; Modafferi and Black, 1997), the calcitonin/CGRP exon 4 (Hedjran et al, 1997), the fibronectin exon IIIB (Huh and Hynes, 1994), the fibroblast growth factor receptor 2 exon IIIb (Baraniak et al, 2003), and the nonmuscle myosin II heavy chain B exon N30 (Kawamoto, 1996). The UGCAUG element is a key feature within the intronic enhancers of these genes, often occurring in multiple copies. A computational study also suggests that UGCAUG is highly over‐represented in the downstream introns of neuron‐ and muscle‐specific alternative cassette exons (Brudno et al, 2001).

The Fox‐1 (feminizing locus on X) gene was originally identified in Caenorhabditis elegans, where it acts as a numerator element in counting the number of X chromosomes relative to ploidy and in determining male or hermaphrodite development (Hodgkin et al, 1994; Skipper et al, 1999). The worm Fox‐1 gene product is thought to post‐transcriptionally repress the expression of Xol‐1, the main switch controlling sex determination (Nicoll et al, 1997; Meyer, 2000). Since several alternatively spliced isoforms of Xol‐1 exist, while only one of these splice variants is necessary and sufficient as a sex determinant (Rhind et al, 1995), it was speculated that Fox‐1 might lead to unproductive splicing of the Xol‐1 gene.

In vitro selection experiments identified the sequence GCAUG as the optimal recognition site for the Fox‐1 homolog from zebra fish (zFox‐1; Jin et al, 2003). zFox‐1 mRNA was found to be specifically expressed in muscle, whereas the mouse Fox‐1 protein (mFox‐1) was found in muscle, heart and brain tissue. zFox‐1 and mFox‐1 were shown to repress muscle‐specific exons in nonmuscle tissue and to enhance splicing of the fibronectin exon EIIIB by binding to GCAUG elements. Others showed that tissue‐specific isoforms of mouse Fox‐1 proteins differ in their subcellular localization and in their activity as splicing regulators (Nakahata and Kawamoto, 2005). Finally, neuronal isoforms of the Fox‐1 protein have been shown to activate splicing via UGCAUG elements and to control inclusion of certain neuron‐specific exons (Nakahata and Kawamoto, 2005; Underwood et al, 2005).

In humans, three genes encode Fox‐1‐like proteins. In the Swissprot database, these proteins are referred to as RNA binding motif protein 9 (RBM9), Ataxin 2‐binding protein 1 (A2BP1) and Hexaribonucleotide Binding Protein 1 (HRNBP1). In addition, several alternatively spliced variants exist for each of these Fox‐1‐like proteins. The RNA binding activity of the Fox‐1 proteins is believed to reside in a single ∼100 amino‐acid region with homology to the RNA binding domain (RBD, also called RNA recognition motif (RRM) or ribonucleoprotein (RNP) domain). This domain is conserved among the different human Fox‐1 homologs and present in nearly all splice variants. In contrast, the flanking N‐ and C‐terminal domains are less highly conserved and do not show significant similarity to any protein motifs in current databases. A typical RBD folds into an αβ‐sandwich with a β1α1β2β3α2β4 topology, in which a four‐stranded antiparallel β‐sheet is packed against two α‐helices. A single RBD generally recognizes three to four nucleotides of single‐stranded RNA sequence‐specifically, using the β‐sheet as the primary RNA binding surface (Maris et al, 2005).

To understand how the Fox‐1 proteins recognize their target RNA sequence, we have determined the solution structure of the RBD of human Fox‐1 in complex with the RNA heptamer UGCAUGU using NMR spectroscopy. Surface plasmon resonance (SPR) analyses give further insight into the mechanism underlying RNA recognition by Fox‐1 and support our structural findings.


Structure determination

The RBD of Fox‐1 adopts a folded structure both in the presence and absence of RNA and gives rise to highly dispersed NMR spectra (Figure 1A). An NMR titration of the Fox‐1 RBD with the RNA 5′‐UGCAUGU‐3′ shows that saturation is reached at a 1:1 stoichiometric ratio and that the complex is in slow exchange on the NMR time scale. This RNA was chosen because it contains the Fox‐1 binding sequence identified by in vitro selection experiments (Jin et al, 2003) flanked by two uracils, as in the downstream control sequence (DCS) of the c‐src alternative N1 exon (Modafferi and Black, 1997). Addition of this RNA causes large and numerous chemical shift changes in the 1H–15N HSQC spectrum of the 15N‐labeled protein, indicating that many protein residues are perturbed by RNA binding (Figure 1A). A comparison of the chemical shifts of the free and bound forms of the protein shows that the perturbed residues are located in the β‐strands and in loops β1α1 and β2β3 (Figure 1C). Furthermore, all nucleotides of 5′‐UGCAUGU‐3′, from U1 to U7, are affected by binding to the protein, as indicated by the overlay of the TOCSY spectra of the free and bound RNA (Figure 1B).
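
Figure 1C reports the chemical shift changes in Hz on a 600 MHz spectrometer. Converting such values to ppm only requires dividing by the Larmor frequency of the observed nucleus in MHz. The sketch below assumes that "600 MHz" refers to the 1H frequency and uses the approximate gyromagnetic ratio |γ(15N)/γ(1H)| ≈ 0.10136, so that 15N resonates near 60.8 MHz; the helper name `hz_to_ppm` is hypothetical.

```python
# Hypothetical helper: convert chemical-shift differences given in Hz
# (as plotted in Figure 1C) into ppm.
# Assumption: the spectrometer frequency is the 1H frequency, and the
# 15N Larmor frequency is obtained by scaling with |gamma(15N)/gamma(1H)|.

GAMMA_RATIO_N15_H1 = 0.10136  # approximate |gamma(15N)| / gamma(1H)


def hz_to_ppm(delta_hz: float, nucleus: str, spectrometer_mhz: float = 600.0) -> float:
    """Return a chemical-shift difference in ppm.

    delta_hz: shift difference in Hz
    nucleus: "1H" or "15N"
    spectrometer_mhz: 1H operating frequency in MHz
    """
    if nucleus == "1H":
        larmor_mhz = spectrometer_mhz
    elif nucleus == "15N":
        larmor_mhz = spectrometer_mhz * GAMMA_RATIO_N15_H1
    else:
        raise ValueError(f"unsupported nucleus: {nucleus}")
    # ppm = Hz / MHz, because 1 ppm of a frequency in MHz is that many Hz
    return delta_hz / larmor_mhz


print(hz_to_ppm(300.0, "1H"))             # 0.5
print(round(hz_to_ppm(300.0, "15N"), 2))  # 4.93
```

Because the 15N Larmor frequency is about ten times lower, a given change in Hz corresponds to a roughly tenfold larger change in ppm for 15N than for 1H.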

Figure 1.

UGCAUGU binds to the RBD of Fox‐1 and affects residues in the β‐sheet and in loops. (A) 15N‐labeled HSQC spectra of ∼1 mM solutions of the free RBD of Fox‐1 (blue) and of the RBD of Fox‐1 in the presence of one equivalent of 5′‐UGCAUGU‐3′ (red) at 313 K. (B) Sections of 2D TOCSY spectra showing the H5–H6 correlations of uracil and cytosine of ∼1 mM solutions of free 5′‐UGCAUGU‐3′ (blue) and of 5′‐UGCAUGU‐3′ in the presence of one equivalent of protein (red). (C) The changes in chemical shift of the backbone amide nitrogen (black) and proton (grey) between free and bound Fox‐1 (in Hz, on a 600 MHz spectrometer) are plotted versus the amino‐acid residue number. Large chemical shift changes occur in the β‐strands as well as in loops β1α1 and β2β3 (assignments for residues 125, 126, 131, 152 and after 191 could not be obtained for the free protein).

Complete resonance assignments of the protein in the complex were obtained using published methods. Resonance assignment of the RNA was more difficult and required the synthesis of two isotopically labeled RNA oligonucleotides to resolve ambiguities: in one, the sugar moieties of U1, C3 and U5 were 13C‐labeled; in the other, the sugar moieties of G2, A4, G6 and U7 were 13C‐labeled. These two partially labeled RNAs were essential for unambiguously assigning numerous unusual sugar–sugar and intermolecular NOE cross‐peaks. In total, 30 conformers of the Fox‐1–UGCAUGU complex were calculated from 1460 NOE‐derived distance constraints (including 149 intermolecular and 119 intra‐RNA distance constraints), six torsion angle constraints and 29 hydrogen bond constraints (see Table I and Materials and methods). The polypeptide backbone of these structures is ordered from P116 to R194, and the conformation of the RNA is precisely defined (Figure 2A). The heavy atoms of the structured part of the entire complex have an RMS deviation of 0.90 Å (Table I).
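
The 0.90 Å value summarizes how closely the 30 conformers agree after superposition. The sketch below illustrates one common way such a precision figure is computed, assuming it is the average RMSD of each conformer to the ensemble mean after least‐squares (Kabsch) superposition; this section does not say whether Table I reports this quantity or a pairwise RMSD, and the function names and test coordinates are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return `mobile` (n_atoms x 3) after an optimal rigid-body fit onto `target`."""
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    p = mobile - mob_center
    q = target - tgt_center
    h = p.T @ q                                # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + tgt_center


def ensemble_rmsd_to_mean(ensemble: np.ndarray, n_iter: int = 5) -> float:
    """Mean RMSD (input units) of each conformer to the ensemble average structure.

    ensemble: array of shape (n_conformers, n_atoms, 3), e.g. heavy atoms
    of the structured region only.
    """
    ref = ensemble[0]
    for _ in range(n_iter):
        # superpose every conformer on the current reference, then average
        fitted = np.array([kabsch_superpose(x, ref) for x in ensemble])
        ref = fitted.mean(axis=0)
    fitted = np.array([kabsch_superpose(x, ref) for x in ensemble])
    per_conformer = np.sqrt(((fitted - ref) ** 2).sum(axis=2).mean(axis=1))
    return float(per_conformer.mean())


# Usage with hypothetical coordinates: 30 noisy copies of one model
rng = np.random.default_rng(1)
model = rng.normal(scale=10.0, size=(40, 3))
ensemble = np.array([model + rng.normal(scale=0.5, size=model.shape) for _ in range(30)])
print(f"{ensemble_rmsd_to_mean(ensemble):.2f} Å")
```

Iterating the fit against a refined mean removes the bias toward the first conformer that a single superposition on model 1 would introduce.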

Figure 2.

Overview of the solution structure of the RBD of Fox‐1 in complex with UGCAUGU. (A) Overlay of the final 30 structures superposed on the heavy atoms of the structured parts of the protein and of the RNA. The protein backbone is grey, the RNA backbone is orange, the phosphate groups are red and the RNA bases are yellow. Only the ordered region of the protein (residues 116–194) is shown. (B) Surface (heavy atoms of residues 116–194) and stick (heavy atoms of the RNA) representation of the lowest energy structure. The protein surface is colored according to surface potential, with red indicating negative and blue indicating positive charge. The RNA is colored as in panel (A). (C) The lowest energy structure in ribbon (protein backbone) and stick (RNA) representation. The color scheme is the same as in (A); protein side chains involved in important hydrophobic interactions with the RNA are shown as green sticks. (D) Same as (C) but rotated by 90° around the indicated axis. Figures were generated with MOLMOL (Koradi et al, 1996).

Table 1. NMR structure determination statistics

Overview of the Fox‐1–UGCAUGU complex structure

The protein in the complex adopts the typical β1α1β2β3α2β4 fold of an RBD with the two α‐helices packed against a four‐stranded antiparallel β‐sheet (Figure 2). Furthermore, the structure of the protein is characterized by an additional small two‐stranded β‐sheet located between α2 and β4 (Figure 2D).

The RNA, which is unstructured in the free state (data not shown), adopts a bent conformation upon binding to the protein (Figure 2). The RNA bases, rather than the sugar‐phosphate backbone, make most of the contacts with the protein. Three of the seven nucleotides, U5–U7, lie across the β‐sheet, the canonical binding interface of the RBD. The remaining four nucleotides (U1–A4) contact loops β1α1, β2β3 and α2β4. In particular, U1, G2 and C3 are wrapped around a single phenylalanine of the β1α1 loop, F126. Moreover, G2 and A4 form an unusual mismatch base pair. All sugars except that of U1 adopt a C2′‐endo conformation, and the base of G6 adopts a syn conformation (Figure 2C and D).

Complex formation between the Fox‐1 RBD and 5′‐UGCAUGU‐3′ is driven by numerous electrostatic and hydrophobic interactions. Four positively charged side chains, R194, K156, R127 and R184, contact the RNA phosphate backbone (Figure 2B). Two phenylalanines and one histidine contact the RNA via base stacking: U1 and G2 stack on either side of F126, which is part of loop β1α1, and U5 and G6 stack on H120 and F160, respectively, two residues located on the β‐sheet. Additional hydrophobic contacts are seen for the base of C3, which points its hydrophobic edge towards F126, for the sugars of U5 and G6, which pack against F158 from both sides, and for the sugar of U7, which packs against I149 (Figures 2C, D and 3). Hydrophobic interactions equivalent to those of H120, F158 and F160 have been observed in many RBD–RNA complexes (Supplementary Figure S1) (Maris et al, 2005). However, the extensive hydrophobic interactions mediated by F126, which contacts U1, G2 and C3 simultaneously, constitute a novel structural feature that is unique to RNA recognition by Fox‐1.

Figure 3.

Molecular recognition of UGCAUGU by the RBD of Fox‐1. Close‐up views of the RNA binding interface of the overlay of the final 30 structures superposed on the heavy atoms of the structured parts of the protein and of the RNA (left), single structures showing the intermolecular and intra‐RNA interactions that are most commonly observed in the 30 structures (middle; see Supplementary Table SI) and schematic representations of the hydrogen bonding interactions that are most commonly observed in the 30 structures (right). The ribbon representation of the protein backbone is shown in grey, side chains of the protein are in green and the RNA is in yellow. Recognition of U1 and C3 (A), of G2 and A4 (B), of U5 (C), and of G6 and U7 (D). Figures were generated with MOLMOL (Koradi et al, 1996).

Fox‐1 is a sequence‐specific RNA binding protein

In addition to the numerous hydrophobic and electrostatic interactions that provide affinity, a dense network of hydrogen bonds confers sequence specificity for the first six nucleotides, 5′‐UGCAUG‐3′. The most important interactions at the protein–RNA interface and within the RNA are described in Supplementary Table SI. The interactions most frequently observed for a given atom are shown in the middle and right panels of Figure 3. The last nucleotide, U7, also contacts the protein and is precisely defined in the structure of the complex (Figure 2A), but its base is not recognized by any specific hydrogen bond.

Recognition of the U1–C3 pair is mediated by an intra‐RNA hydrogen bond between U1 O2 and C3 H42 and by two intermolecular hydrogen bonds to the side chains of R127 (U1) and N151 (C3) (Figure 3A). However, this region of the complex does not adopt a single conformation (Figure 3A, left panel); in particular, U1 can be oriented either parallel or perpendicular to F126. This heterogeneity may reflect genuine conformational averaging of U1 in solution, since the U1 H6–H2′, H6–H1′ and H6–H3′ NOEs are of similar intensity even at very short mixing times.

G2 and A4 form a trans Watson–Crick/Shallow Groove A·G base pair. The guanine also contacts the backbone carbonyl of I124 through a bifurcated hydrogen bond involving its H21 and H1 atoms. Furthermore, the O6 of G2 is hydrogen bonded to the side chain of R184 (Figure 3B). However, this arginine might instead stack on G2, as seen in one third of the structures (Supplementary Table SI).

U5 is specifically recognized by hydrogen bonds to the backbone amide of T192 and to the backbone carbonyl of N190. In several conformers, the O4 of U5 also forms a hydrogen bond with the side chain of N189; in most structures, however, the side chain of N189 is hydrogen bonded to H120. 15N HMQC spectra indicate that this histidine is present as the Nε2‐H tautomer and can therefore act as a hydrogen bond acceptor at Nδ1 (data not shown; Pelton et al, 1993; Drohat et al, 1999). The orientation of the N189 side chain is further stabilized by H177, which is also present as the Nε2‐H tautomer (Figure 3C).

The base of G6 is hydrogen bonded to the side chain of R118 and to the backbone carbonyl of T192. R194 also contacts the 5′‐phosphates of both G6 and U7. Two further intra‐RNA hydrogen bonds stabilize the RNA structure in the complex: the H22 of G6 contacts the O3′ of U5, and the 2′OH of G6 is hydrogen bonded to the O3′ of U7 (Figure 3D).

Characterization of the Fox‐1–UGCAUGU interactions by surface plasmon resonance

The interaction between Fox‐1 and UGCAUGU was further investigated using SPR. In these experiments, an RNA oligonucleotide of sequence biotin‐5′‐CUCUGCAUGU‐3′ was immobilized on a streptavidin‐coated chip, and binding of Fox‐1 to this oligonucleotide was monitored. Fox‐1 binds the immobilized RNA with very high affinity (dissociation constant KD=0.49 nM at 150 mM NaCl).
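
SPR binding curves such as those in Figure 4A are fitted with a 1:1 Langmuir interaction model, in which KD = koff/kon. The sketch below writes out that model's analytical association and dissociation phases. Unlike the fits described for Figure 4A, it omits the mass‐transport correction term, and the rate constants in the usage lines are hypothetical values chosen only so that koff/kon equals the 0.49 nM KD quoted above; the function name is also hypothetical.

```python
import math


def langmuir_sensorgram(t: float, conc: float, kon: float, koff: float,
                        rmax: float, t_inject: float) -> float:
    """SPR response at time t (s) for a 1:1 Langmuir interaction.

    Analyte at molar concentration `conc` is injected from t = 0 to
    t = t_inject, followed by buffer only. kon in M^-1 s^-1, koff in s^-1,
    rmax in response units. Mass-transport limitation is NOT modelled.
    """
    kd = koff / kon                        # equilibrium dissociation constant (M)
    r_eq = rmax * conc / (conc + kd)       # steady-state response at this concentration
    k_obs = kon * conc + koff              # observed association rate (s^-1)
    if t <= t_inject:
        # association phase: exponential approach to r_eq
        return r_eq * (1.0 - math.exp(-k_obs * t))
    # dissociation phase: first-order decay from the level reached at t_inject
    r_end = r_eq * (1.0 - math.exp(-k_obs * t_inject))
    return r_end * math.exp(-koff * (t - t_inject))


# Hypothetical rate constants chosen so that koff / kon = 0.49 nM
kon, koff = 1.0e6, 4.9e-4
for t in (0.0, 60.0, 300.0, 600.0):
    print(t, round(langmuir_sensorgram(t, 10e-9, kon, koff, rmax=100.0, t_inject=300.0), 2))
```

In a global fit, kon and koff (and usually rmax) are adjusted simultaneously so that curves like this match the binding curves recorded at all analyte concentrations.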

To examine in greater detail the electrostatic contribution to Fox‐1–RNA complex formation and stability, we performed SPR kinetic analyses at varying salt concentrations. The affinity depends strongly on the salt concentration, with both the association rate constant kon and the dissociation rate constant koff being affected (Figure 4A, Supplementary Table SII). Between 0.075 and 0.6 M NaCl, log KD, log koff and log kon each depend linearly on log f± (Figure 4B) (Debye and Hückel, 1923). This indicates that the activation energy of the rate‐limiting step for both association and dissociation is affected by electrostatic interactions, since log f± is proportional to the electrostatic potential of protein and RNA, and log kon and log koff decrease linearly with the activation free energies of association and dissociation, respectively.
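
The exact expression used for f± is given in Materials and methods, which is not part of this section. As an assumption, the sketch below uses the common extended (Güntelberg) form of the Debye–Hückel law, log10 f± = −A|z₊z₋|√I/(1 + √I), with A ≈ 0.509 M^−1/2 for water near 25 °C, and treats NaCl as the only contributor to the ionic strength I (buffer ions are ignored). The function names are hypothetical.

```python
import math

A_DH = 0.509  # Debye-Hückel constant for water near 25 °C, M^(-1/2) (assumed)


def ionic_strength(ions):
    """I = 1/2 * sum(c_i * z_i**2) for (molar concentration, charge) pairs."""
    return 0.5 * sum(c * z**2 for c, z in ions)


def log_f_pm(i_strength: float, z_plus: int = 1, z_minus: int = -1) -> float:
    """log10 of the mean activity coefficient, Güntelberg form of Debye-Hückel."""
    s = math.sqrt(i_strength)
    return -A_DH * abs(z_plus * z_minus) * s / (1.0 + s)


for nacl in (0.075, 0.15, 0.3, 0.6):
    i = ionic_strength([(nacl, +1), (nacl, -1)])  # Na+ and Cl-
    print(f"{nacl:.3f} M NaCl: I = {i:.3f} M, log f± = {log_f_pm(i):.3f}")
```

Over the 0.075–0.6 M range this assumed form gives log f± from about −0.11 to −0.22; plotting log k against such values gives the kind of linear relationship shown in Figure 4B.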

Figure 4.

Salt dependence of RNA binding examined by surface plasmon resonance measurements. (A) Representative curves for binding of the RBD of Fox‐1 to an immobilized oligonucleotide biotin‐5′‐CUCUGCAUGU‐3′ at different salt concentrations. At 75, 150, 300 and 500 mM NaCl, binding curves for 20, 10, 5, 2.5, 1.25, 0.625, 0.312 and 0.156 nM protein, 20, 10, 5, 2.5, 1.25, 0.625 and 0.312 nM protein, 80, 40, 20, 10, 5, 2.5 and 1.25 nM protein, and 320, 160, 80, 40, 20, 10, 5 and 2.5 nM protein are shown, respectively. Curves are fit according to a 1:1 Langmuir interaction model including a correction term for mass transport limitations and are shown as grey lines. (B) Plot of log KD (•), log koff (○) and log kon (▴) versus log f±. f± is the electrostatic contribution to the mean rational activity coefficient, which is linked to the ionic strength, see Materials and methods section and Supplementary Table SII. Each data point represents the average of at least three independent measurements.

Even though the structure presented here is very precise (Table I), the hydrogen bond patterns vary between the individual structures of the ensemble (Supplementary Table SI). It is therefore difficult to tell from the structure alone which hydrogen bond patterns reflect the physical situation. To characterize the important intermolecular and intra‐RNA interactions observed in the structure more precisely, we performed competition experiments with various mutant oligonucleotides. Half‐maximal inhibitory concentrations (IC50s) were derived for each mutant oligonucleotide and compared with the IC50 of the immobilized RNA to estimate the energetic contribution of individual interactions to binding (Table II). The oligonucleotide used for structure determination, 5′‐UGCAUGU‐3′, and the immobilized RNA, 5′‐CUCUGCAUGU‐3′, bind with similar affinity, as shown by their nearly identical IC50 values. This indicates that the three additional nucleotides of the immobilized RNA do not affect Fox‐1 binding. We then tested individual mutations in the competitor RNAs. Replacing U1 by A or C reduces the binding free energy (ΔΔG) by 4.0 and 4.5 kJ/mol, respectively; according to the structure, each of these mutations removes one hydrogen bond. Mutating C3 to U causes a more dramatic loss (ΔΔG=14 kJ/mol), consistent with the loss of two hydrogen bonds predicted from the structure. The mutations G2 to A, A4 to purine and A4 to inosine give ΔΔG values of 15, 5.2 and 13 kJ/mol, respectively; based on the structure, these replacements should disrupt four, one and two hydrogen bonds, respectively. Replacing U5 by C should disrupt one hydrogen bond and gives a ΔΔG of 3.9 kJ/mol, and replacing G6 by A should disrupt four hydrogen bonds and gives a ΔΔG of 19 kJ/mol.

The differences in binding energy between the mutant oligonucleotides and the wild‐type binding sequence correlate well with the predicted number of lost hydrogen bonds. Mutations predicted to remove one hydrogen bond gave ΔΔG values between 3.9 and 5.2 kJ/mol (U1A, U1C, A4P and U5C). Two predicted lost hydrogen bonds gave a ΔΔG of 13 or 14 kJ/mol (A4I and C3U). The G6A mutation, with four predicted hydrogen bonds lost, gave a ΔΔG of 19 kJ/mol. The one inconsistency in this correlation is the G2A mutation, which reduces the binding free energy by 15 kJ/mol; this is rather low for the expected loss of four hydrogen bonds. R184 may stack on G2 rather than contact it by a hydrogen bond, as seen in about one third of the NMR structures.
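
The energetic analysis above rests on two small calculations: converting an IC50 ratio into ΔΔG, and relating ΔΔG to the predicted number of lost hydrogen bonds. The sketch below illustrates both under stated assumptions. It takes ΔΔG = RT ln(IC50,mut/IC50,ref), which holds only if IC50 scales with the competitor's KD; it assumes 298.15 K because the measurement temperature is not given in this section; and it uses the ΔΔG values and bond counts listed in the text. The Pearson coefficient is an illustrative summary, not a statistic reported by the authors, and the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
T_K = 298.15           # assumed temperature; not stated in this section


def ddg_from_ic50(ic50_mutant: float, ic50_reference: float, temp_k: float = T_K) -> float:
    """Binding free-energy loss (kJ/mol) from an IC50 ratio.

    Assumes each IC50 is proportional to the competitor's KD, so that
    ddG = RT * ln(IC50_mutant / IC50_reference). Both IC50s must share units.
    """
    return R_KJ * temp_k * math.log(ic50_mutant / ic50_reference)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (ddG in kJ/mol, predicted number of lost hydrogen bonds), as given in the text
MUTANTS = {
    "U1A": (4.0, 1), "U1C": (4.5, 1), "A4P": (5.2, 1), "U5C": (3.9, 1),
    "C3U": (14.0, 2), "A4I": (13.0, 2), "G2A": (15.0, 4), "G6A": (19.0, 4),
}

print(f"10-fold higher IC50 -> ddG = {ddg_from_ic50(10.0, 1.0):.1f} kJ/mol")
for name, (ddg, n_bonds) in MUTANTS.items():
    print(f"{name}: {ddg / n_bonds:.2f} kJ/mol per predicted hydrogen bond")

bonds = [n for _, n in MUTANTS.values()]
ddgs = [d for d, _ in MUTANTS.values()]
print(f"Pearson r (lost bonds vs ddG) = {pearson_r(bonds, ddgs):.2f}")
```

With these inputs, the per‐bond values range from 3.75 kJ/mol (G2A) to 7.0 kJ/mol (C3U) and r is about 0.91; G2A giving the lowest per‐bond value mirrors the inconsistency noted in the text.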

Table 2. Surface plasmon resonance studies with mutant oligonucleotides

F126 is crucial for the unusual mode of RNA binding by Fox‐1

Our structure suggests a critical role for F126 in the unusual binding of the four 5′‐nucleotides UGCA. To test the importance of F126, we prepared several mutant proteins in which F126 was replaced by alanine, histidine, isoleucine, leucine, arginine, tryptophan or tyrosine (Figure 5A). The affinity of UGCAUGU for Fox‐1 F126A, F126I and F126R is reduced about 1500‐fold (KD=1.62, 1.62 and 1.58 μM, respectively, at 150 mM NaCl). This effect is comparable to that of replacing by alanine the RNP consensus residues H120, F158 and F160, which are known to contribute significantly to RNA binding (Figure 5A). RNA binding affinity is almost entirely restored when F126 is substituted by tyrosine, and is only about one order of magnitude lower when F126 is substituted by histidine or tryptophan, showing that an aromatic residue is critical at this position (Figure 5A). Finally, replacing F126 by leucine gives an intermediate affinity of 3.74 × 10⁻⁷ M (∼300‐fold lower), which suggests that hydrophobic packing with a sterically compatible residue can partially substitute for an aromatic side chain.
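
The fold changes above can be placed on the same energy scale as the ΔΔG values in Table II using ΔΔG = RT ln(KD,mutant/KD,wild type). This is a minimal sketch assuming 298.15 K (not stated in this section) and using the approximate fold changes given in the text; it is not an analysis reported by the authors, and `ddg_from_fold` is a hypothetical name.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def ddg_from_fold(fold_weaker: float, temp_k: float = 298.15) -> float:
    """Binding free-energy penalty (kJ/mol) for a `fold_weaker`-fold increase in KD."""
    return R_KJ * temp_k * math.log(fold_weaker)


# Approximate fold reductions in affinity quoted in the text for F126 substitutions
for label, fold in [("F126A/I/R", 1500), ("F126L", 300), ("F126H/W", 10)]:
    print(f"{label}: ~{ddg_from_fold(fold):.1f} kJ/mol")
```

Under these assumptions the substitutions cost roughly 18 kJ/mol (F126A, F126I, F126R), 14 kJ/mol (F126L) and 6 kJ/mol (a tenfold loss, as for F126H and F126W).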

Figure 5.

F126 plays a crucial role in RNA binding. (A) Affinities of single amino‐acid mutants of Fox‐1. Values for KDs are derived from steady state binding levels at different protein concentrations using surface plasmon resonance. Each measurement was repeated three times at 150 mM NaCl and pH 7.4. (B) Overlay of sections of 2D TOCSY spectra showing the H5–H6 correlations of uracil and cytosine of ∼1 mM solutions of 5′‐UGCAUGU‐3′ in the presence of one equivalent of Fox‐1 (red), Fox‐1 F126A (black), and Fox‐1 F160A (cyan). (C) Sections of 2D NOESY spectra of a 1:1 complex of Fox‐1 (red) or Fox‐1 F126A (black) with UGCAUGU showing NOE crosspeaks to the imino protons of G6, G2 and U5.

To further investigate the role of F126 in RNA binding, we recorded a TOCSY spectrum of a 1:1 complex of Fox‐1 F126A with UGCAUGU. Comparison with the TOCSY spectrum of a 1:1 complex of wild‐type Fox‐1 with UGCAUGU shows that the H5–H6 correlations of U5 and U7 are at almost the same positions in the spectra of the two complexes, whereas those of U1 and C3 have changed considerably (Figure 5B). Thus, U5 and U7 are bound in an analogous way in both complexes, while U1 and C3 are not. Conversely, when F160, an aromatic side chain on the β‐sheet surface, is replaced by alanine, binding of U1 and C3 is retained, while the H5–H6 cross‐peaks of U5 and U7 display very different chemical shifts (Figure 5B).

Three imino protons are spectroscopically observable in the wild‐type Fox‐1–UGCAUGU complex: those of G2, U5 and G6 (Figure 5C). Because imino protons exchange rapidly with solvent unless they are protected, this indicates that these imino groups are engaged in hydrogen bonds. They give rise to a large number of NOE cross‐peaks. In the Fox‐1 F126A–UGCAUGU complex, the imino protons of U5 and G6 remain observable at almost identical chemical shifts and give rise to the same NOE cross‐peaks as in the wild‐type complex (Figure 5C). In contrast, the imino proton of G2 is no longer observable.

Together, these results show that F126 is crucial for the unusual mode of recognition of the four 5′‐nucleotides UGCA and that RNA binding by Fox‐1 can be divided into two independent parts: a canonical part, mediating the recognition of the 3′‐terminal nucleotides via the RNP consensus residues and a novel part that mediates recognition of the 5′‐terminal nucleotides and critically depends on F126.


The structure shows that the RBD of Fox‐1 binds all seven nucleotides of the RNA heptamer UGCAUGU and explains how the first six nucleotides, UGCAUG, are recognized specifically. The structure is in agreement with in vitro selection experiments which first identified Fox‐1 as a sequence‐specific RNA binding protein with specificity to GCAUG (Jin et al, 2003). It also confirms the preference for U in the first position of the binding elements seen in studies of its role as an enhancer of alternative splicing (Huh and Hynes, 1994).

The structure of Fox‐1 in complex with UGCAUGU demonstrates a novel mode of RNA recognition by the RBD

The structure of the Fox‐1–UGCAUGU complex contains several typical attributes. Like all other RBD–RNA or RBD–DNA complexes whose structures have been solved to date (Maris et al, 2005), Fox‐1 uses the β‐sheet to bind several nucleotides, in this case U5, G6 and U7 of the UGCAUGU heptamer (Figure 2). Affinity for U5 and G6 is provided by hydrophobic interactions with three residues within the RNP consensus sequences, F158, F160 and H120. Specificity for U5 and G6 is achieved mainly by hydrogen bonds between the C‐terminus of the domain and functional groups of the bases. These structural features are very similar, for example, to oligonucleotide recognition by the first RBD of hnRNP A1 (Supplementary Figure S1) (Ding et al, 1999).

However, some features are unique to RNA recognition by Fox‐1. These features mediate the binding of U1, G2, C3 and A4. Particularly important for RNA binding is the β1α1 loop, which contains a phenylalanine, F126 (Figure 2). Three nucleotides, U1, G2 and C3, wrap around this phenylalanine, forming a hydrophobic ‘cage’ around it. The data presented in this study show that this extension of the RNA binding platform of the Fox‐1 RBD is independent of the interactions with the canonical binding site.

A phenylalanine at the position equivalent to F126 of Fox‐1 is found in 59 (11%) of the 531 human RBDs in the Pfam database. In a further 52 (9.8%) and 23 (4.3%) human RBDs, it is replaced by the related aromatic amino acids tyrosine and tryptophan, respectively. Given the observed amino‐acid frequencies in vertebrates, which are 4.0, 3.3 and 1.3% for phenylalanine, tyrosine and tryptophan, respectively, these amino acids are significantly enriched at this position. Another example of a protein with a phenylalanine at this position of the RBD is the murine mRNA export factor REF2‐I. NMR chemical shift mapping experiments have shown that the RBD of this protein contributes to RNA interactions using loops β1α1 and α2β4, but not via the canonical β‐sheet binding interface, with the main RNA binding site located in the flexible N‐ and C‐terminal domains (A Golovanov, G Hautbergue, L‐Y Lian, SA Wilson, personal communication, 2005). This implies that this novel feature of RNA recognition is very likely to be shared by many other RBDs. However, a histidine at the equivalent position is found in only 6 (1.1%) human RBDs and is hence under‐represented (average frequency=2.9%), even though in the case of Fox‐1 the F126H mutant has an affinity similar to that of the F126W mutant.
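
This section does not name the statistical test behind "significantly enriched". As one illustration, an exact one‐sided binomial test compares each observed count among the 531 RBDs with the background frequency quoted above. This assumes the 531 domains are independent draws, which homologous sequences are not, so the resulting p‐values are only indicative; the helper names are hypothetical.

```python
import math

N_RBDS = 531  # human RBDs counted in the text

# residue: (count at the F126-equivalent position, background frequency), from the text
OBSERVED = {
    "Phe": (59, 0.040),
    "Tyr": (52, 0.033),
    "Trp": (23, 0.013),
    "His": (6, 0.029),
}


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): evidence for enrichment."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): evidence for depletion."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


results = {}
for residue, (count, bg) in OBSERVED.items():
    ratio = (count / N_RBDS) / bg  # observed / expected frequency
    tail = binom_upper_tail if ratio > 1 else binom_lower_tail
    results[residue] = (ratio, tail(count, N_RBDS, bg))
    print(f"{residue}: {100 * count / N_RBDS:.1f}% observed, "
          f"{ratio:.2f}x background, one-sided p = {results[residue][1]:.1e}")
```

Under these assumptions the three aromatic residues are about 2.8‐ to 3.3‐fold above background and histidine is at about 0.39‐fold, each with a small one‐sided p‐value.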

Another unique feature of RNA recognition by Fox‐1 is the unusually high number of intramolecular hydrogen bonds within the bound RNA that are important for sequence specificity and binding affinity. For example, U1 and C3 contact one another through one hydrogen bond (Figure 3A), G6 makes hydrogen bond contacts to both U7 and U5 (Figure 3D) and, most prominently, G2 and A4 form a mismatch base pair (Figure 3B), such that A4 is recognized solely by intra‐RNA interactions. Because NMR analysis showed that the RNA is unstructured in its free form, these interactions are established upon binding to the protein (induced fit) and therefore contribute to complex stability. This is further confirmed by our SPR analysis, which shows that directed disruption of intra‐RNA hydrogen bonds leads to a loss of binding free energy. Intra‐RNA interactions at the RBD–RNA interface have been observed in other structures of RBD–RNA complexes. However, these were mostly stacking interactions that influence binding affinity but have little impact on sequence specificity (Price et al, 1998; Deo et al, 1999; Handa et al, 1999; Allain et al, 2000; Varani et al, 2000; Wang and Hall, 2001).

SPR reveals extraordinary affinity, shows the importance of electrostatic interactions for association and confirms the NMR structure

Surface plasmon resonance experiments provided additional insight into the molecular mechanism underlying RNA recognition by Fox‐1 and validated the NMR structure (Figure 4, Table II and Supplementary Table SII). Debye–Hückel behavior of kon was shown previously for protein–protein association (Schreiber and Fersht, 1996; Baerga‐Ortiz et al, 2004) and for an ATPase–ADP/ATP complex (Fedosova et al, 2002). Salt dependence of koff was demonstrated for the N‐terminal domain of U1A in complex with U1 hairpin II (U1hpII) RNA (Katsamba et al, 2001). Here, we show that the salt dependence of koff for a protein–ligand complex follows Debye–Hückel theory. However, the salt effect on koff is much weaker than on kon: while kon changes by two orders of magnitude, koff shifts only by a factor of about 4 over the concentration range tested. Extrapolation to zero ionic strength, that is, to log f±=0, gives a kon,0 of 8.1 × 10¹⁰ M⁻¹ s⁻¹. Since rate constants for diffusion‐limited association of protein–ligand complexes are of the order of 10⁵–10⁶ M⁻¹ s⁻¹, the rate enhancement due to electrostatic attraction and steering in the Fox‐1–UGCAUGU complex at zero ionic strength is about 10⁴‐ to 10⁵‐fold (Berg and von Hippel, 1985). These findings emphasize the role of electrostatic potentials in the initial interaction of Fox‐1 with the RNA oligonucleotide. In contrast, the limited effect of salt concentration on koff suggests that factors besides short‐range electrostatic interactions contribute to the stability of the protein–RNA complex. These findings are consistent with the net charges of the protein (positive) and the RNA (negative) and with the structure, in which the RNA is engaged in several salt bridges but also in many hydrogen bond and van der Waals contacts with the protein (Supplementary Table SI).
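
The extrapolation to log f± = 0 amounts to fitting a straight line to log kon versus log f± and reading off the intercept. The sketch below shows this with ordinary least squares on hypothetical, noise‐free data generated from an assumed line (slope 18, intercept 10.9); the measured values are in Supplementary Table SII and are not reproduced here, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical illustration data: log f± values and log10 kon (M^-1 s^-1)
# generated from an assumed line with slope 18.0 and intercept 10.9.
log_f = [-0.109, -0.142, -0.180, -0.222]
log_kon = [10.9 + 18.0 * x for x in log_f]

slope, intercept = fit_line(log_f, log_kon)
kon_0 = 10 ** intercept  # extrapolated kon at zero ionic strength (log f± = 0)
print(f"slope = {slope:.1f}, kon,0 = {kon_0:.2e} M^-1 s^-1")
```

With noise‐free inputs the fit recovers the assumed slope and intercept exactly; with real data, the scatter of the points and the length of the extrapolation determine how uncertain kon,0 is.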

To investigate the energetic contributions of individual intermolecular and intra‐RNA hydrogen bonds, we introduced mutations into the RNA oligonucleotide and performed binding studies (Table II). These measurements are consistent with the intermolecular and intra‐RNA interactions observed in the complex. Our data suggest that each hydrogen bond lowers the free energy of the complex by about 4–7 kJ/mol, in accordance with the predicted value (Fersht, 1987).

The binding affinity of the Fox‐1–UGCAUGU complex, with a KD of 0.49 nM at 150 mM salt, is extraordinarily high for a single RBD binding to single‐stranded RNA. The N‐terminal RBD of U1A was shown to bind with similar affinity to nucleotides exposed in RNA stem‐loops; if the U1A binding sequence is presented in a single‐stranded RNA, the affinity decreases about 10⁴‐fold (Hall, 1994). To achieve nanomolar affinity for single‐stranded RNA, most RBD proteins use multiple domains, and the high affinity is lost when individual domains are deleted (Zamore et al, 1992; Serin et al, 1997; Park et al, 2000; Sladic et al, 2004). In the case of Fox‐1, the contacts to three nucleotides provided by a single phenylalanine (F126) of the β1α1 loop, in addition to the canonical contacts mediated by the β‐sheet surface, explain how such a high affinity is reached with a single RBD. We therefore asked whether an aromatic residue at the position equivalent to F126 occurs more often in RNA binding proteins containing only one RBD. We analyzed 159 single‐RBD proteins in the Pfam database and found frequencies for phenylalanine (11%), tyrosine (11%), tryptophan (3.1%) and histidine (1.9%) similar to those in the full set of 531 human RBDs. Hence, this novel kind of interaction does not seem to be generally employed as a substitute for additional RBDs. However, evolutionary pressure does not necessarily favor high affinity, and multi‐RBD proteins could be employed to recognize the distribution of specific binding sites in addition to the sequences themselves.

Implications for alternative splicing regulation

The RBDs of human and C. elegans Fox‐1 are 75% identical. In zebrafish and mouse, the conservation is even higher (…). Moreover, all the residues that are in contact with the RNA are conserved, including not only the side chains involved in direct stacking, electrostatic and hydrogen bond contacts with the RNA, but the complete RNA‐facing surface of the protein. This suggests that the mode of RNA recognition, and in particular the binding specificity of Fox‐1, is conserved from C. elegans to human.

The Fox‐1 binding sequence, UGCAUGU, is a key element for the regulation of alternative splicing (Huh and Hynes, 1994; Hedjran et al, 1997; Modafferi and Black, 1997; Lim and Sharp, 1998; Brudno et al, 2001; Baraniak et al, 2003; Jin et al, 2003). The structure of Fox‐1 in complex with UGCAUGU is particularly interesting for understanding this function of the protein. As shown in Figure 2, Fox‐1 induces a curvature in the RNA upon binding. Therefore, the binding of Fox‐1 to its RNA targets might lead to conformational changes in the RNA that in turn influence splicing regulation. Another possible role for Fox‐1 in splicing regulation could be to compete with other splicing factors for the same or overlapping binding sites. The high affinity of Fox‐1 determined by SPR indicates that Fox‐1 could be an efficient competitor for binding sites on pre‐mRNAs. However, tests of Fox‐1 activity on model substrates indicate that the protein can activate splicing from a multimerized UGCAUG element. This activation is independent of other binding elements (Underwood et al, 2005). Thus, Fox‐1 can apparently activate splicing and not just release an exon from repression by other proteins.

Materials and methods

Protein and RNA preparation

DNA encoding the RBD of Fox‐1 (residues 109–208, Swiss‐Prot Q9NWB1) was amplified by PCR from a full‐length Fox‐1 cDNA clone and cloned into pET28a (N‐terminal His‐tag). The protein was expressed in transformed Escherichia coli BL21(DE3) at 37°C in M9 minimal medium containing 50 mg l−1 kanamycin and either 1 g l−1 15N‐NH4Cl and 4 g l−1 glucose (for 15N‐labeled protein) or 1 g l−1 15N‐NH4Cl and 2 g l−1 13C‐glucose (for 15N,13C‐labeled protein). Cells were grown to OD600≈0.6, induced with 1 mM IPTG and harvested by centrifugation after 4 h. Cells were resuspended in lysis buffer (300 mM NaCl, 50 mM Na2HPO4, pH 8.0, 0.002% (v/v) SUPERase RNase inhibitor (Ambion Inc.)) containing 10 mM imidazole, using 20 ml per litre of culture medium, and were lysed by two passages through a cell cracker (Avestin Inc.). The cell lysate was centrifuged at 20 000 g and the supernatant was incubated with Ni‐NTA beads for >1 h. After washing with lysis buffer, the protein was eluted with a step gradient of imidazole (20–500 mM). The purest fractions, as judged by 18% SDS–PAGE, were subjected to a second, identical Ni‐NTA affinity chromatography step. Pure fractions were dialyzed against 5 l NMR buffer (20 mM NaCl, 10 mM NaH2PO4, pH 6.5). The protein was concentrated to ∼1 mM by centrifugation at 4°C using a membrane with a 5 kDa molecular mass cutoff. The identity of the protein was confirmed by MALDI MS and N‐terminal Edman sequencing. The yield of purified Fox‐1 was ∼10 mg l−1 of culture medium. Protein mutants were generated with the QuikChange Site‐Directed Mutagenesis Kit (Stratagene) according to the manufacturer's instructions. All unlabeled RNA oligonucleotides were purchased from Dharmacon Research, deprotected according to the manufacturer's instructions, desalted on a G‐15 size‐exclusion column (Amersham), lyophilized and resuspended in NMR buffer (20 mM NaCl, 10 mM NaH2PO4, pH 6.5) or water.
Oligonucleotides of the sequence 5′‐UGCAUGU‐3′ with 13C‐labeled sugars on either U1, C3 and U5 or G2, A4, G6 and U7 were chemically synthesized by LR and SP (manuscript in preparation).

NMR measurements and resonance assignments

NMR spectra were recorded at 313 K on Bruker DRX‐500, DRX‐600 and Avance 900 spectrometers. Data were processed with XWINNMR (Bruker) and analyzed with Sparky (…). Protein backbone 1H and 15N resonance assignments for the free protein were obtained from HNCA, HN(CO)CA (Grzesiek and Bax, 1992) and CBCA(CO)NH (Grzesiek and Bax, 1993) spectra acquired on 15N,13C‐labeled protein in 90% H2O/10% 2H2O. Complete protein backbone 1H, 15N and 13C resonance assignments of the complex were obtained for residues 116–196 from HNCA and CBCA(CO)NH spectra acquired on 15N,13C‐labeled protein in complex with unlabeled 5′‐UGCAUGU‐3′ in 90% H2O/10% 2H2O. Aliphatic side‐chain assignments were obtained from H(C)CH‐TOCSY (Bax et al, 1990), 3D 15N‐ and 13C‐edited NOESY‐HSQC (τm=150 ms) (Talluri and Wagner, 1996; Baur et al, 1998) and 15N and 13C HSQC experiments (Mori et al, 1995). Aromatic side chains were assigned using 2D TOCSY (τm=50 ms) (Bax and Davis, 1985) and 2D NOESY (τm=150 ms) (Wider et al, 1984) spectra. Resonance assignments of the RNA were obtained from 2D NOESY, 2D TOCSY and natural‐abundance 13C HSQC spectra of 15N‐labeled protein in complex with unlabeled RNA, and were confirmed by 13C HSQC spectra of 15N‐labeled protein in complex with 5′‐UGCAUGU‐3′ carrying 13C‐labeled sugars on either U1, C3 and U5 or G2, A4, G6 and U7.
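The sequential‐walk logic behind combining HNCA and HN(CO)CA can be illustrated with a small Python sketch. Each amide shows Cα(i) and Cα(i−1) in the HNCA but only Cα(i−1) in the HN(CO)CA, so one spin system's Cα(i−1) is matched against other spin systems' Cα(i). The spin‐system names, shift values and 0.2 ppm tolerance are hypothetical, and real assignments (here made in Sparky) also rely on Cβ shifts from CBCA(CO)NH and residue‐type information.

```python
from dataclasses import dataclass


@dataclass
class SpinSystem:
    """One backbone amide (H, N) with its two Calpha shifts in ppm."""
    name: str
    ca_self: float  # Calpha(i): present in HNCA, absent from HN(CO)CA
    ca_prev: float  # Calpha(i-1): present in both HNCA and HN(CO)CA


def successors(system, systems, tol_ppm=0.2):
    """Spin systems whose Calpha(i-1) matches this system's Calpha(i)."""
    return [s for s in systems
            if s is not system and abs(s.ca_prev - system.ca_self) <= tol_ppm]


def sequential_walk(start, systems, tol_ppm=0.2):
    """Extend a fragment forward as long as the next link is unambiguous."""
    chain = [start]
    current = start
    while True:
        nxt = successors(current, systems, tol_ppm)
        # Stop at a dead end, an ambiguous link, or a loop back into the chain
        if len(nxt) != 1 or any(nxt[0] is s for s in chain):
            return [s.name for s in chain]
        current = nxt[0]
        chain.append(current)


# Hypothetical shifts for three consecutive residues
a = SpinSystem("A", 58.10, 45.00)
b = SpinSystem("B", 62.30, 58.15)
c = SpinSystem("C", 54.70, 62.20)
print(sequential_walk(a, [c, a, b]))
```

Stopping on ambiguous links keeps the sketch conservative: fragments are only extended where a single match exists, which mirrors how such walks are usually checked by hand.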

Experimental restraints

Seven cycles of CANDID and DYANA (Guntert et al, 1997; Herrmann et al, 2002) were run to yield a list of automatically assigned intramolecular protein NOE distance constraints. This calculation included peak lists from 3D 15N‐ and 13C‐edited NOESY‐HSQC (τm=150 ms) and 2D NOESY (τm=150 ms) experiments. The automatically generated list was reviewed and combined with a list of manually assigned intra‐RNA, intermolecular and additional intra‐protein NOE distance restraints. These constraints were derived from 3D 15N‐ and 13C‐edited NOESY‐HSQC, 2D NOESY, 2D F1‐edited, F2‐edited NOESY, 2D F1‐filtered, F2‐edited NOESY (Peterson et al, 2004) and 3D 13C F1‐filtered, F3‐edited NOESY‐HSQC (Zwahlen et al, 1997) spectra of a complex of 13C,15N‐labeled protein with unlabeled RNA, as well as from 2D F1‐filtered, F2‐edited NOESY spectra of complexes of 15N‐labeled protein with RNA carrying 13C‐labeled sugars on either U1, C3 and U5 or G2, A4, G6 and U7. To rule out that critical NOE cross peaks arose from spin diffusion, a 2D NOESY with a short mixing time (30 ms) was recorded and the critical NOE restraints were reviewed. NOE cross peaks to the imino protons of G2, U5 and G6 were observed at 293 K in a 2D NOESY spectrum. Dihedral angle constraints for the sugars of G2–U7 (130°⩽δ⩽190°, i.e., C2′‐endo) were added based on high H1′–H2′ cross‐peak intensities in the 2D TOCSY experiment. In total, 26 intra‐protein hydrogen bond constraints were based on slowly exchanging amides (15N HSQC after ∼3 h in 2H2O at 40°C), characteristic Cα shifts and NOE cross‐peak patterns typical of secondary structure elements; 3 intermolecular hydrogen bond constraints were based on the observable imino protons of G2, U5 and G6 and careful analysis of local NOE cross peaks. The tautomeric states of His130 and His187 were determined from 15N HMQC spectra (Pelton et al, 1993; Drohat et al, 1999).
Distance restraints were calibrated using cross‐peak intensities corresponding to fixed inter‐atomic distances and were assigned upper distance limits of 3.0 (strong), 4.5 (medium) and 6.0 Å (weak) and lower distance limits of 1.8 Å.
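The calibration step can be sketched in Python under the isolated spin‐pair approximation, in which NOE intensity scales as r⁻⁶ (spin diffusion neglected). The reference pair used below, a pyrimidine H5–H6 pair at about 2.45 Å, is an assumed example rather than the calibration actually used; the 3.0/4.5/6.0 Å upper limits and 1.8 Å lower limit are those given in the text, and the function names are hypothetical.

```python
# Isolated spin-pair approximation: NOE intensity is proportional to r^-6.
LOWER_LIMIT_A = 1.8
UPPER_LIMITS_A = ((3.0, "strong"), (4.5, "medium"), (6.0, "weak"))


def estimate_distance(intensity: float, ref_intensity: float, ref_distance_a: float) -> float:
    """Distance (angstrom) from a cross-peak intensity, calibrated on a reference peak."""
    return ref_distance_a * (ref_intensity / intensity) ** (1.0 / 6.0)


def classify_restraint(intensity: float, ref_intensity: float, ref_distance_a: float):
    """Return (class, lower limit, upper limit) in angstrom for one NOE cross peak."""
    r = estimate_distance(intensity, ref_intensity, ref_distance_a)
    for upper, label in UPPER_LIMITS_A:
        if r <= upper:
            return label, LOWER_LIMIT_A, upper
    # Observed peaks whose estimate exceeds 6.0 A still receive the weak bound.
    return "weak", LOWER_LIMIT_A, 6.0


# Assumed reference: a peak of intensity 1000 for an H5-H6 pair at ~2.45 A
for peak in (1000.0, 125.0, 15.625):
    print(peak, classify_restraint(peak, 1000.0, 2.45))
```

An eightfold weaker peak than the reference maps to √2 × 2.45 ≈ 3.5 Å (medium), and a 64‐fold weaker peak to 4.9 Å (weak), which shows how coarse intensity binning absorbs the uncertainty of the r⁻⁶ estimate.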

Structure calculation

With the final set of constraints, 100 structures of the complex were generated in DYANA (Guntert et al, 1997) starting from random structures. The 30 structures with the lowest target function were refined by restrained simulated annealing in implicit solvent, using the generalized Born solvation model (Tsui and Case, 2000) in the SANDER module of AMBER 7.0 (Pearlman et al, 1995) with the Cornell et al (1995) force field. The simulated annealing protocol described by Padrta et al (2002) was used, except that the system was heated to 1500 K and the time constant for heat bath coupling (TAUTP) was gradually decreased from 0.1 to 0.05 ps during the last picosecond of the simulation. The final structures were analyzed with PROCHECK (Laskowski et al, 1996).
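Two procedural details above, selecting the 30 lowest‐target‐function structures and ramping TAUTP over the final picosecond, can be expressed as a short Python sketch. The linear shape of the ramp is an assumption (the text only says "gradually"), the total run length in the example is hypothetical, and the functions do not call AMBER or DYANA.

```python
def select_lowest(structures, n=30):
    """Keep the n structures with the lowest DYANA target function.

    `structures` is a list of (structure_id, target_function) pairs.
    """
    return sorted(structures, key=lambda item: item[1])[:n]


def tautp_at(t_ps, total_ps, tau_start=0.1, tau_end=0.05, ramp_ps=1.0):
    """Heat-bath coupling constant TAUTP (ps) at time t_ps of the annealing run.

    Held at tau_start, then decreased linearly (an assumption) to tau_end
    over the final ramp_ps picoseconds of the run.
    """
    ramp_start = total_ps - ramp_ps
    if t_ps <= ramp_start:
        return tau_start
    frac = min((t_ps - ramp_start) / ramp_ps, 1.0)
    return tau_start + frac * (tau_end - tau_start)


# Hypothetical 20 ps run: TAUTP stays at 0.1 ps until 19 ps, then ramps to 0.05 ps
for t in (0.0, 19.0, 19.5, 20.0):
    print(t, round(tautp_at(t, 20.0), 4))
```

Tightening the heat‐bath coupling at the end of the run damps temperature fluctuations as the structures are cooled, which is the usual motivation for such a schedule.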

Surface plasmon resonance

Analyses were carried out using a BIAcore 3000 instrument. All experiments were performed at 25°C using HBS (10 mM HEPES, 150 mM NaCl, 3.4 mM EDTA, 0.005% P20, pH 7.4) as running buffer. When required, the NaCl concentration was adjusted to 75, 125, 225, 300, 400, 500 or 600 mM and the pH to 6.0 or 8.5. For kinetic studies, 5–8 RU of 5′‐biotinylated CUCUGCAUGU were captured on an SA chip (BIAcore). Background noise and nonspecific binding were corrected for using an untreated surface as the control surface. Binding studies were carried out by injecting serial dilutions of Fox‐1 at a flow rate of 70 μl min−1 for 90 s over the specific and reference surfaces. Each protein sample was injected three times in random order. At the end of each cycle, surfaces were washed with three consecutive 1‐min injections of 1 M NaCl. The reported mean values were derived from at least three independent experiments. Data were globally fit to a simple 1:1 Langmuir interaction model with a correction for mass transport using BIA evaluation software 3.1. Most mutant Fox‐1 proteins displayed kinetics unsuitable for SPR kinetic analysis, such that kon and koff could not be reliably determined from curve fits. Therefore, affinity constants of mutant proteins were derived from steady‐state binding levels at different protein concentrations, using a chip surface coated with ∼10 RU of biotinylated RNA and longer association times. For the inhibition assays, 20 RU of biotinylated oligonucleotide were captured on the SA chip. Fox‐1 at 2 nM was incubated with different concentrations of mutant oligonucleotides. Solutions were injected for 2 min at a flow rate of 20 μl min−1 over the specific and reference surfaces. Surfaces were regenerated with three 1‐min injections of 1 M NaCl. Inhibition curves were obtained by monitoring the decrease in binding response with increasing oligonucleotide concentration, and half‐maximal inhibitory concentrations (IC50) were obtained from the fitted curves. Each inhibition assay was carried out in triplicate.
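The 1:1 Langmuir model and the steady‐state affinity analysis can be sketched in plain Python. This sketch omits the mass‐transport term that was included in the BIAevaluation fits, and it fits steady‐state levels by a least‐squares search over log10 KD (Rmax has a closed form for each trial KD) rather than reproducing the vendor software; the example concentrations and the 0.5 nM KD are hypothetical.

```python
import math


def association(t_s, conc_m, kon, koff, rmax):
    """1:1 Langmuir response (RU) t_s seconds into an injection, starting from zero.

    Plain 1:1 model without a mass-transport term.
    """
    kobs = kon * conc_m + koff
    r_eq = rmax * kon * conc_m / kobs
    return r_eq * (1.0 - math.exp(-kobs * t_s))


def dissociation(t_s, r0, koff):
    """Response (RU) t_s seconds after the end of injection, starting from r0."""
    return r0 * math.exp(-koff * t_s)


def steady_state(conc_m, kd_m, rmax):
    """Equilibrium response for 1:1 binding: Req = Rmax * C / (KD + C)."""
    return rmax * conc_m / (kd_m + conc_m)


def _sse_and_rmax(log_kd, concs, responses):
    # For a fixed KD the model is linear in Rmax, so Rmax has a closed form.
    kd = 10.0 ** log_kd
    f = [c / (kd + c) for c in concs]
    rmax = sum(fi * ri for fi, ri in zip(f, responses)) / sum(fi * fi for fi in f)
    sse = sum((ri - rmax * fi) ** 2 for fi, ri in zip(f, responses))
    return sse, rmax


def fit_steady_state(concs, responses, iterations=200):
    """Least-squares KD and Rmax from steady-state levels.

    Golden-section search on log10(KD), assuming a single minimum within
    three decades either side of the concentration range.
    """
    lo = math.log10(min(concs)) - 3.0
    hi = math.log10(max(concs)) + 3.0
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if _sse_and_rmax(x1, concs, responses)[0] < _sse_and_rmax(x2, concs, responses)[0]:
            hi = x2
        else:
            lo = x1
    log_kd = 0.5 * (lo + hi)
    return 10.0 ** log_kd, _sse_and_rmax(log_kd, concs, responses)[1]


# Hypothetical noise-free steady-state data for KD = 0.5 nM, Rmax = 12 RU
concs = [0.25e-9, 0.5e-9, 1e-9, 2e-9, 4e-9, 8e-9]
resp = [steady_state(c, 0.5e-9, 12.0) for c in concs]
print(fit_steady_state(concs, resp))
```

Separating the linear parameter (Rmax) from the nonlinear one (KD) keeps the search one‐dimensional, which is why no external fitting library is needed for this sketch.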


According to Debye–Hückel theory (Debye and Hückel, 1923), the logarithm of the mean activity coefficient, log f±, is related to the ionic strength as

$$\log f_{\pm} = -\frac{A\,\lvert z_1 z_2\rvert\,\sqrt{I}}{1 + B\,a\,\sqrt{I}}$$

where I is the ionic strength of the solution, ∣z1z2∣ is the charge product of protein and ligand, A = 0.512 M−1/2 and B = 0.329 × 10⁸ M−1/2 cm−1 (Robinson and Stokes, 2002), and a is an adjustable parameter; the best fits were obtained with a = 5.6 Å.
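For orientation, log f± can be evaluated across the NaCl series used in the SPR experiments with the constants quoted above, assuming the extended Debye–Hückel form log f± = −A∣z1z2∣√I/(1 + Ba√I). This Python sketch treats I ≈ [NaCl] (ignoring the HEPES and EDTA contribution) and uses ∣z1z2∣ = 1 purely as a placeholder; neither assumption comes from the original analysis.

```python
import math

A = 0.512            # M^-1/2, value quoted in the text
B = 0.329e8          # M^-1/2 cm^-1, value quoted in the text
A_PARAM_CM = 5.6e-8  # a = 5.6 angstrom expressed in cm, so that B*a is in M^-1/2


def log_f_pm(ionic_strength_m: float, z1z2: float, a_cm: float = A_PARAM_CM) -> float:
    """Extended Debye-Hueckel estimate of log10 f+- at ionic strength I (in M)."""
    sqrt_i = math.sqrt(ionic_strength_m)
    return -A * abs(z1z2) * sqrt_i / (1.0 + B * a_cm * sqrt_i)


# NaCl concentrations of the salt series (M); I ~ [NaCl] is an approximation,
# and z1z2 = 1 is a placeholder charge product.
for nacl in (0.075, 0.125, 0.150, 0.225, 0.300, 0.400, 0.500, 0.600):
    print(f"I = {nacl:.3f} M  log f = {log_f_pm(nacl, z1z2=1):.3f}")
```

Because ∣z1z2∣ enters only as a prefactor, the salt dependence for any other charge product is the same curve scaled by that product.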

Structural data. All restraints used in structure determination and the derived atomic coordinates for the 30 final structures have been deposited at the Protein Data Bank with accession code 2ERR.

Supplementary data

Supplementary data are available at The EMBO Journal Online.

Supplementary Information

Supplementary Figure S1 [emboj7600918-sup-0001.pdf]

Supplementary Table S1 [emboj7600918-sup-0002.pdf]

Supplementary Table S2 [emboj7600918-sup-0003.pdf]


We would like to thank Dr Richard Stefl for help with computational issues, Goran Malojčić for invaluable discussions and Abdelhamid Benattallah for backbone assignment of the free protein. We would like to thank Professor John A Robinson for helpful support and A Golovanov for communicating unpublished data. We acknowledge the Functional Genomics Center Zurich, especially Mike Scott, for providing explanations on and access to the Biacore instrument. This investigation was supported by grants from the Swiss National Foundation, National Center for Competence in Research Structural Biology to FHTA and SP and by the Roche Research Fund for Biology at the ETHZ to FHTA. FHTA is an EMBO Young Investigator.


  • PhD Program for Molecular Life Sciences Zurich, Switzerland

