The NusB protein of Escherichia coli is involved in the regulation of rRNA biosynthesis by transcriptional antitermination. In cooperation with several other proteins, it binds to a dodecamer motif designated rrn boxA on the nascent rRNA. The antitermination proteins of E.coli are recruited in the replication cycle of bacteriophage λ, where they play an important role in switching from the lysogenic to the lytic cycle. Multidimensional heteronuclear NMR experiments were performed with recombinant NusB protein labelled with 13C, 15N and 2H. The three‐dimensional structure of the protein was solved from 1926 NMR‐derived distances and 80 torsion angle restraints. The protein folds into an α/α‐helical topology consisting of six helices; the arginine‐rich N‐terminus appears to be disordered. Complexation of the protein with an RNA dodecamer equivalent to the rrn boxA site results in chemical shift changes of numerous amide signals. The overall packing of the protein appears to be conserved, but the flexible N‐terminus adopts a more rigid structure upon RNA binding, indicating that the N‐terminus functions as an arginine‐rich RNA‐binding motif (ARM).
Modification of the efficiency of transcription termination at specific sites is used as a regulatory mechanism in prokaryotic and eukaryotic organisms. It has been studied in considerable detail in the Escherichia coli phage λ, where it plays a crucial role in the switch between the lysogenic and the lytic cycle (for review, see Richardson and Greenblatt, 1996).
The genes of the E.coli phage λ are organized in several operons. Only promoter‐proximal genes are transcribed during the lysogenic state. The transition to the lytic cycle is characterized by extension of the transcripts into the more distal parts of the operons, which becomes possible through readthrough of specific terminator sequences. This phenomenon is mediated by the phage protein N in conjunction with several host proteins (DeVito and Das, 1994; Mogridge et al., 1995).
The host genes encoding the proteins involved in this process are collectively designated nus for ‘N utilization substance’. More specifically, the products of the genes nusA, nusB, nusE and nusG and at least one additional, so far unidentified, gene product are required. The NusE protein has been shown to be identical to the S10 protein of the small ribosomal subunit (Friedman et al., 1981; Richardson and Greenblatt, 1996).
The various components of the antitermination system exert their effect on phage λ transcription by interacting with phage mRNA sites designated nut (for N utilization). The two sequence motifs that form the λ phage nut sites are designated boxA and boxB (Figure 1) (Nodwell and Greenblatt, 1991). A large body of genetic and biochemical evidence indicates that λ phage N protein interacts with boxB, whereas Nus proteins of the bacterial host interact with boxA of the bacteriophage mRNA. A complex involving the various protein components of the antitermination systems forms via weak protein–protein interactions. NusB protein and NusE/S10 protein have also been shown to form a heterodimer in vitro (Mason et al., 1992; Nodwell and Greenblatt, 1993).
Whereas the recruitment of the E.coli Nus proteins for the regulatory purposes of lambdoid phages has been analysed in considerable detail, the physiological role of the Nus proteins in the bacterial cell is less well understood. The available evidence indicates that antitermination mediated by these proteins is one of numerous mechanisms involved in the control of rRNA biosynthesis (for review, see Keener and Nomura, 1996). The E.coli genome includes seven rrn operons coding for rRNA. Each of these operons contains the sequence motif UGCUCUUUAACA (designated rrn boxA) at a location slightly downstream from the transcription start site. Binding of NusB protein in cooperation with other Nus proteins at rrn boxA is thought to modulate the efficiency of ribosomal biosynthesis via control of rRNA synthesis, thus influencing the growth rate of the microorganism (Gaal et al., 1997). Consistent with this, nonsense mutations of the nusB gene lead to a reduced growth rate (Taura et al., 1992).
An antitermination mechanism has been found to play an important role in the replication cycle of human immunodeficiency virus (HIV) (Cheng et al., 1991; Karn and Graeble, 1992; Harada et al., 1996). The viral Tat protein interacts with the tar sequence in the long terminal repeat of the nascent viral RNA. The λ phage/E.coli antitermination system may be able to serve as a model for the antitermination mechanism of HIV.
Within the last few years, the three‐dimensional structures of several RNA‐binding proteins have been reported. The three most common folds of such proteins, ribonucleoprotein (RNP), K‐homology (KH) and double‐stranded RNA‐binding domains (dsRBD), all show a mixed αβ topology. It has therefore been suggested that the αβ topology might represent a particularly favourable framework for RNA recognition (for review, see Varani, 1997). However, more recently, a number of all‐helical RNA‐binding proteins have also been described (Berglund et al., 1997; Markus et al., 1997; Xing et al., 1997).
The secondary structure of NusB has been characterized recently as completely α‐helical (cf. Altieri et al., 1997; Berglechner et al., 1997). Here we report the solution structure of the NusB protein of E.coli, based on multidimensional nuclear magnetic resonance (NMR) data. It contains a putative arginine‐rich RNA‐binding motif (ARM), but otherwise does not resemble any known RNA‐binding fold. Our studies of the interaction of NusB protein with its specific binding site, rrn boxA RNA, further suggest that the binding of RNA induces a stable conformation of the flexible N‐terminal segment.
Most of the 1H, 13C and 15N resonances of the NusB protein have been assigned by multidimensional heteronuclear NMR experiments, as reported earlier (Altieri et al., 1997; Berglechner et al., 1997). A total of 89% of the backbone resonances of NusB were assigned from NMR spectra of U‐13C,15N‐labelled NusB samples that were also 75% randomly deuterated for improved spectral resolution (Berglechner et al., 1997). The remainder of the backbone resonances could not be observed either in the triple resonance or in 15N‐resolved nuclear Overhauser enhancement (NOE) spectra. Specifically, residues located in the hypothetical RNA‐binding region (see below), including the putative ARM region (residues 1–10) and the central part of the loop linking helices 2 and 3 (residues 40–42), failed to yield any detectable signals. This is most likely due to the absence of a stable conformation of these regions when not bound to RNA (for details, see Discussion).
The secondary structure of NusB has been published previously, based on an analysis of the NOE data involving the backbone protons, 3JHNHα coupling constants, amide exchange rates and the chemical shift information (Berglechner et al., 1997). For the determination of the tertiary structure, overlapping resonances were resolved in a three‐dimensional 15N,15N‐HMQC‐NOESY‐HSQC spectrum (Kay et al., 1990) at 600 MHz and a three‐dimensional 13C,13C‐HSQC‐NOESY‐HSQC spectrum (Clore et al., 1991; Bax and Grzesiek, 1993) at 750 MHz. All NH–NH NOEs were measured on the 75% fractionally deuterated sample to achieve increased amide relaxation times and reduced spin diffusion. This results in a significant increase in the number of observable NOEs compared with NMR studies on the fully protonated protein (Torchia et al., 1988; Grzesiek et al., 1995; Venters et al., 1995). Representative strip plots of the NH–NH NOEs for part of helix 6 are shown in Figure 3 to document the quality of the acquired spectra.
For the determination of the three‐dimensional structure of the NusB protein, 1926 NMR‐derived distances and 80 torsion angle restraints were used (Table I, Figure 4). Due to missing assignments and lack of NOEs, the N‐terminal 10 residues as well as residues 39–45 are largely unstructured in the uncomplexed protein (Figure 4). For the well‐defined regions (residues 11–38 and 46–139), the atomic r.m.s. distribution of the best 18 structures about the mean coordinate position is 0.39 ± 0.1 Å for the backbone atoms and 0.81 ± 0.16 Å for all heavy atoms.
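The precision figures quoted above are r.m.s. deviations of each structure from the mean coordinates of the ensemble. As a minimal sketch of how such a value can be computed, the following Python code assumes that all models have already been superimposed and that each model is a list of (x, y, z) tuples in the same atom order. The function names and the toy coordinates are illustrative assumptions, not the procedure or data used for NusB.

```python
import math


def mean_structure(ensemble):
    """Average the coordinates atom by atom over all models."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    squared = sum(
        (p[k] - q[k]) ** 2 for p, q in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(ensemble):
    """Mean and (population) standard deviation of the per-model RMSD
    about the mean coordinates. Assumes pre-superimposed models."""
    mean = mean_structure(ensemble)
    values = [rmsd(model, mean) for model in ensemble]
    average = sum(values) / len(values)
    spread = math.sqrt(sum((v - average) ** 2 for v in values) / len(values))
    return average, spread


# Toy ensemble (invented coordinates): two models of two atoms each.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
average, spread = ensemble_rmsd(toy)
```

Restricting the atom lists to the backbone atoms of the well‐defined residues, or to all heavy atoms, would give the two kinds of value reported in the text.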
Structure of NusB
NusB is composed of six helices, formed by residues Cys12–Leu22 (helix 1), Ile27–Leu35 (helix 2), Leu46–Leu65 (helix 3), Gln79–Ser93 (helix 4), Tyr100–Ser113 (helix 5) and His120–Ala131 (helix 6). As indicated by the chemical shift deviations and the large 3JHNHα coupling constants of Thr57 and Asn58, helix 3 displays a distinctive kink at Thr57–Thr59.
In the calculated ensemble of NusB structures, all individual secondary structure elements are very well defined. This is also true for the relative orientation of the helices. While the N‐terminus (residues 1–10) is largely disordered in the structural ensemble (due to missing NMR data in this region), the C‐terminus shows only little divergence between the structures. Of the five loops connecting the helices, four are reasonably well ordered. However, the loop linking helices 2 and 3 (residues 36–45) seems to be largely unstructured, as indicated by NMR data (i.e. very weak or completely missing signals). Putative NusB homologues from different bacteria show considerable variability in the length of this loop (Figure 2). The hypothetical NusB protein of the cyanobacterium Synechocystis sp. (not shown in Figure 2) shows an insertion of 117 amino acid residues in comparison with the E.coli sequence. The absence of sequence and length conservation in this loop region agrees well with the high degree of flexibility observed here by NMR.
The overall fold of NusB can be characterized as an α/α sandwich, with helices 1 and 3 forming one layer, and helices 4–6 forming a second layer co‐planar to the first. Helix 2 is packed onto the interface between helices 1 and 3 on the outside of the first layer (Figure 5).
The fold of the subdomain formed by helices 1, 2 and 3 is stabilized by interactions between hydrophobic side chains of helix 1 (Ala16, Leu17, Trp20, Leu22), helix 2 (Ile27, Ala28, Val30, Tyr32, Leu35, Ala36) and helix 3 (Leu46, Phe48, Leu51, Leu52, Val55, Ala56). Helix 1 and the N‐terminal part of helix 3 (residues 46–56) are oriented nearly parallel.
In the second α‐helical layer (helices 4, 5 and 6), the helices are also positioned in an almost planar arrangement. Interactions between hydrophobic side chains of helix 4 (Val80, Ala83, Val84, Ile87, Ala88, Leu89, Tyr90, Leu92) and helix 5 (Val102, Ala103, Ile108, Leu110, Ala111), as well as between helix 5 (Ala103, Ile104, Ala107, Ile108, Leu110, Ala111) and helix 6 (His120, Val123, Val126, Leu127, Ala130, Ala131), stabilize the fold of this subdomain.
The only covalent connection between both layers is the long loop connecting helices 3 and 4 (residues 66–78). In addition, there is a wealth of hydrophobic interactions between the two subdomains stabilizing the tertiary structure, as indicated by numerous NOE contacts. Residues from helix 1 (Ala13, Val14, Ala16, Tyr18, Leu22) interact with residues of helices 5 (Val102, Ile104, Ile108) and 6 (His120, Leu127, Ala131), while residues from helix 3 (Leu46, Leu52, Ala53, Ala60, Tyr61, Leu62) interact with residues of helices 4 (Ala83, Leu89, Tyr90), 5 (Ile108) and 6 (Glu117, His120).
Binding of RNA to NusB
To investigate the interactions between NusB protein and an RNA dodecamer equivalent to the rrn boxA motif (Figure 1), a series of 1H,15N‐HSQC spectra were recorded. The chemical shifts of both nuclei are sensitive to their local electronic environment and therefore can be used as a probe for interactions between the labelled protein and unlabelled RNA. Presumably, the strongest perturbation of the electronic environment will be observed for the residues that either come into direct contact with RNA or that are involved in major conformational changes upon binding to RNA.
NusB protein (0.7 mM) complexed to an excess of the RNA dodecamer exhibited only one set of peaks in the 1H,15N‐HSQC spectra (Figure 6A). This indicates the presence of only a single conformation of the protein–RNA complex on the NMR time scale. Titration of NusB with substoichiometric amounts of the RNA dodecamer yielded two sets of peaks, one identical to the set observed for the free protein, the other identical to the peaks of the fully complexed protein.
Due to the large number of signals and the absence of any intermediate signals, the complete assignment of the spectra of NusB bound to RNA has not been accomplished yet. However, there are a number of interesting insights which can be gained even from a preliminary analysis of the spectra of the complex. While a large number of signals migrate upon binding, some are clearly not affected, as for example the NH signals of Trp20, Val45 and Asp97 (Figure 6A), suggesting that the local environments of these amino acid residues involved in interhelical contacts are not affected by RNA binding.
Furthermore, a set of four additional signals appears in the Arg‐NHε region (Figure 6B), which were not visible in the spectra of the uncomplexed protein and therefore belong to the four previously unassigned arginines 6, 7, 8 and 10. The Arg side chain NHs of arginines 49, 72, 86, 95 and 135 (assigned from NOE data in the free NusB spectra) are not affected by RNA addition, with the exception of a small shift for Arg86.
In contrast, in the spectra of NusB complexed with the RNA mutant C3G, reported as non‐binding under the low RNA concentration conditions of band shift assays (Nodwell and Greenblatt, 1993), some of the N‐terminal arginine signals of the proposed ARM motif are not visible and others are shifted differently (data not shown). This suggests that the mutant RNA does actually bind under the high RNA concentration conditions of the NMR experiment. However, the differences in the protein HSQC patterns of the complexes with wild‐type and mutant boxA RNA indicate that at least some of the observed NusB–RNA interactions are indeed sequence specific.
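The perturbation analysis described in this section compares 1H and 15N chemical shifts of each amide in the free and RNA‐bound states. Once assignments for the complex are available, one common way to rank residues is a combined, weighted shift change. The sketch below illustrates this under stated assumptions: the 15N scaling factor of 0.15 and the threshold of 0.05 p.p.m. are conventional choices rather than values from this study, and the residue labels and shifts are invented placeholders.

```python
import math

# Assumed weighting of 15N relative to 1H shift changes (a common convention,
# not a value taken from this study).
N_SCALE = 0.15


def combined_shift_change(free, bound, n_scale=N_SCALE):
    """Combined 1H/15N shift change in p.p.m.

    `free` and `bound` are (1H shift, 15N shift) tuples for one amide.
    """
    delta_h = bound[0] - free[0]
    delta_n = bound[1] - free[1]
    return math.sqrt(delta_h ** 2 + (n_scale * delta_n) ** 2)


def perturbed_residues(free_peaks, bound_peaks, threshold=0.05):
    """Return labels whose combined shift change exceeds `threshold`.

    Only residues present in both peak lists are compared.
    """
    shared = free_peaks.keys() & bound_peaks.keys()
    return sorted(
        label
        for label in shared
        if combined_shift_change(free_peaks[label], bound_peaks[label]) > threshold
    )


# Invented placeholder peak lists: label -> (1H p.p.m., 15N p.p.m.).
free_peaks = {"residue_a": (8.00, 120.0), "residue_b": (7.50, 115.0)}
bound_peaks = {"residue_a": (8.03, 120.4), "residue_b": (7.50, 115.0)}
shifted = perturbed_residues(free_peaks, bound_peaks)
```

Residues that come out below the threshold correspond to the unaffected signals discussed above, whereas signals that appear only in the complex, such as the additional Arg‐NHε peaks, have no free‐state partner and fall outside this comparison.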
Structure of NusB
Most sequence‐specific RNA‐binding proteins contain domains of 60–90 amino acids responsible for RNA recognition and additional domains involved in regulation and other activities (Biamonti and Riva, 1994; Burd and Dreyfuss, 1994; Nagai, 1996; for a review, see Varani, 1997). The three most common folds of RNA‐binding proteins—RNP, KH and dsRBD—form compact globular structures (Nagai, 1996). They all are αβ proteins with an antiparallel β‐sheet on one face of the protein packed via a hydrophobic core against a layer of α‐helices. This αβ structural theme is conserved in many RNA‐binding proteins, even if they do not share sequence homology with other members of these three structure families. Examples include ribosomal proteins (Liljas and Garber, 1995) and other protein factors involved in translation (Biou et al., 1995; Garcia et al., 1995; Kang et al., 1995; Liljas and Garber, 1995). The three families of RNA‐binding αβ proteins have distinct topologies: the RNP domain contains tandem repeats of a βαβ motif (Nagai et al., 1990; Hoffmann et al., 1991), dsRBDs have an αβββα topology (Bycroft et al., 1995; Kharrat et al., 1995) and KH proteins display a βααββα fold (Musco et al., 1996).
The best characterized example of an all‐helical RNA‐binding protein, the Rop protein, displays a four‐helix bundle formed by the association of two identical helix–turn–helix monomers (Banner et al., 1987; Eberle et al., 1991). Recently, the structures of two entirely α‐helical rRNA‐binding proteins (S15 and L11) have been determined by NMR (Berglund et al., 1997; Markus et al., 1997; Xing et al., 1997). However, the three‐dimensional structure of NusB is different from that of either of these proteins, as well as from that of any other protein or domain structure published so far, according to a search of the Brookhaven protein data bank (Bernstein et al., 1977) with the program DALI (Holm and Sander, 1993).
Complex formation between NusB protein and rrn boxA RNA leads to a shift of many of the amide backbone signals of the protein. For a reliable determination of the interface region, it will therefore be necessary to repeat the assignment process for most of the backbone signals. However, the HSQC data already indicate that a major change occurs in the N‐terminal region representing the putative ARM motif. It has been shown earlier that the nusB5 mutation (replacement of Tyr18 by Asp in NusB; shown in yellow in Figure 5C) prevents switching to the lytic cycle in phage λ infected E.coli cells but does not confer a cold‐sensitive phenotype which has been described for an insertion mutation (ssyB63) and an early amber mutation (ssaD5) (for summary, see Court et al., 1995). This suggests tentatively that Tyr18 may be essential for interaction with phage λ boxA and/or N protein, but not for recognition of bacterial boxA.
Genetic studies indicate that a D118N mutant (nusB101) (shown in green in Figure 5D) of NusB protein fails to interact with NusE/S10 (see Court et al., 1995). The nusB101 mutation can be suppressed by the nusE71 mutation (A86D). Asp118 is located in the loop between helices 5 and 6 of the NusB protein, in close proximity to the N‐terminus containing the proposed ARM region. This suggests that this surface region is involved in the interaction with the NusE/S10 protein as well as with the RNA site. Because the ARM region needs a structural scaffold to fold against when it binds the boxA RNA, which is presumably also unstructured, the structured part of NusB is expected to participate in RNA binding as well, and possibly in interactions with other Nus factors.
Comparison of free and RNA‐bound protein
In the so‐called ‘basic‐domain class’ of RNA‐binding proteins, RNA recognition is mediated by a sequence of 10–15 amino acids rich in arginine and lysine residues. The sequences and conformations of the ARMs in different proteins vary widely (Lazinski et al., 1989). For example, the ARM region of bovine immunodeficiency virus (BIV) Tat binds as a β hairpin to RNA (Puglisi et al., 1995; Ye et al., 1995) while the arginine‐rich region adopts an α‐helical conformation for either the HIV‐1 Rev peptide (Battiste et al., 1996; Ye et al., 1996) or the λ N (Su et al., 1997a,b; Zwahlen et al., 1997) and P22 N peptides (Cai et al., 1998) upon binding to RNA. Many of the ARM proteins are partially or completely unfolded in the absence of RNA and only adopt a stable conformation upon binding to RNA (Calnan et al., 1991; Wolberger, 1996; Van Gilst et al., 1997).
This is in good agreement with the observations made for NusB and its protein–RNA complex. In the three‐dimensional structure, the two regions ill defined according to the NMR data, namely the N‐terminus containing the ARM motif (residues 1–10) and the loop linking helices 2 and 3 (residues 39–45), are located in close proximity to each other at the edge of the subdomain consisting of helices 1–3. In the spectra of the complex, the Arg‐NHε signals of the ARM motif are detectable, suggesting that binding to the recognition site induces a well‐defined structure at least for the arginine side chains in the N‐terminus. Such a hypothesis is in good agreement with the induced fit observed for other proteins with ARM regions upon binding to RNA. At the same time, not all backbone amides are affected, indicating that the overall structure of the protein remains conserved in the complex.
The fact that only two distinct conformations for the free and complexed protein can be detected in the titration experiments suggests that the lifetime of the complex is above the millisecond range. A more detailed analysis of the complex by NMR is currently in progress.
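The lifetime argument above rests on the slow‐exchange condition: separate peaks for the free and bound states are seen only if the exchange rate is well below the frequency difference between them. The following sketch gives an order‐of‐magnitude estimate using the rule of thumb k_ex << 2πΔν. The shift difference of 0.1 p.p.m. in the usage example is an assumed illustrative value, not a measurement from the NusB spectra.

```python
import math


def shift_difference_hz(delta_ppm, spectrometer_mhz):
    """Convert a 1H shift difference in p.p.m. to Hz at a given 1H frequency."""
    return delta_ppm * spectrometer_mhz


def lifetime_limit_ms(delta_nu_hz):
    """Lifetime (ms) corresponding to k_ex = 2*pi*delta_nu.

    Slow exchange requires lifetimes well above this value
    (rule-of-thumb criterion, not an exact coalescence condition).
    """
    return 1000.0 / (2.0 * math.pi * delta_nu_hz)


# Assumed example: a 0.1 p.p.m. 1H shift difference observed at 600 MHz.
delta_nu = shift_difference_hz(0.1, 600.0)
limit = lifetime_limit_ms(delta_nu)
```

For a shift difference of this size the limit falls in the low millisecond range, which is the scale referred to in the text; larger shift differences would lower the limit accordingly.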
Materials and methods
15NH4Cl was purchased from Isotec (Miamisburg, OH). RNA was synthesized by Xeragon AG, Zürich (Switzerland). Isotope‐labelled NusB protein was prepared as described earlier (Berglechner et al., 1997).
NMR spectroscopy and structure determination
All NMR experiments were performed at 22°C on four‐channel Bruker DMX600 and Bruker DMX750 spectrometers. Assignment of the 1H, 13C and 15N resonances of the backbone and the side chains has been described previously (Berglechner et al., 1997). Distance information was obtained from a series of 15N‐ or 13C‐resolved three‐dimensional NOESY experiments: 15N‐NOESY‐HSQC (Bax et al., 1990) (uniformly 15N‐labelled sample), τm = 80 ms; 15N‐NOESY‐HSQC (uniformly 15N‐labelled and fractionally deuterated sample), τm = 100 ms; 13C‐NOESY‐HSQC (Ikura et al., 1990; Zuiderweg et al., 1990) (uniformly 13C/15N‐labelled sample), τm = 50 ms; 15N,15N‐HMQC‐NOESY‐HSQC (Kay et al., 1990) (uniformly 15N‐labelled and fractionally deuterated sample), τm = 150 ms; 13C,13C‐HSQC‐NOESY‐HSQC (Clore et al., 1991; Bax and Grzesiek, 1993) (uniformly 13C/15N‐labelled sample), τm = 50 ms. Amide protons involved in hydrogen bonds were obtained from an analysis of the amide exchange rates measured in MEXICO experiments (Gemmecker et al., 1993) on the deuterated sample with mixing times τm of 50, 100, 150 and 200 ms. Stereospecific assignments of the Leu and Val methyl groups were obtained by non‐random fractional 13C‐labelling (Neri et al., 1989). Side chain amide protons of Asn and Gln were stereospecifically assigned based on an H2NCO‐E.COSY spectrum (Löhr and Rüterjans, 1997).
The structure ensemble for NusB was generated with the X‐PLOR program (Brünger, 1992) by a simulated annealing protocol (Nilges et al., 1988, 1991), and subsequently refined by incorporating van der Waals and electrostatic potentials. The structure calculations employed a total of 1926 proton–proton distance restraints. The NOE‐derived distance restraints were given upper bounds of 2.2, 2.9, 3.8, 4.9 and 6.0 Å based on the measured NOE intensities. Fifty‐one hydrogen bonds were implemented as ambiguous distance restraints between NHi and COi−3 and COi−4, respectively, to let the calculation choose the actual donor–acceptor pairing (M.Nilges, personal communication). In addition, 67 Φ angle restraints and 13 3JHNHα coupling constants obtained from the HNHA data (Vuister and Bax, 1993) were also included in the structure calculations. Diastereotopic assignments were obtained by the AQUA‐module ASSIGNCHECK (Laskowski et al., 1996).
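The restraint handling described above can be illustrated with a short sketch. It covers two steps: assigning a NOE to one of the five upper‐bound classes, and evaluating an ambiguous hydrogen‐bond restraint that may be satisfied by either of two acceptors. The r^-6 calibration against a reference cross peak and the r^-6 sum used for the ambiguous restraint are common conventions that are assumed here; the text does not state how intensities were mapped to bounds or how the ambiguous restraints were averaged, and all function names are hypothetical.

```python
import bisect

# Upper-bound classes in Angstrom, as listed in the text.
UPPER_BOUNDS = [2.2, 2.9, 3.8, 4.9, 6.0]


def calibrated_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance from a NOE intensity, assuming intensity ~ r**-6
    and a reference cross peak of known distance (assumed calibration)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bound_for(distance):
    """Smallest upper-bound class that is >= distance, or None if the
    distance exceeds the largest class."""
    index = bisect.bisect_left(UPPER_BOUNDS, distance)
    return UPPER_BOUNDS[index] if index < len(UPPER_BOUNDS) else None


def effective_distance(distances):
    """r**-6 summed distance over all candidate pairs of an ambiguous
    restraint (assumed averaging scheme). The result is dominated by the
    shortest candidate distance."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def ambiguous_restraint_satisfied(distances, upper_bound):
    """True if the effective distance is within the upper bound."""
    return effective_distance(distances) <= upper_bound


# Example: an amide proton 2.0 A from the i-4 carbonyl and 6.0 A from the
# i-3 carbonyl (invented distances) is treated as close to the i-4 acceptor.
d_eff = effective_distance([2.0, 6.0])
```

Because the shorter candidate distance dominates the sum, the calculation is free to settle on whichever donor–acceptor pairing fits the remaining restraints, which is the purpose of the ambiguous formulation.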
The coordinates of the ensemble of 18 structures for NusB have been deposited in the Brookhaven Protein Data Bank under the ident code 1baq.
We thank H.Oschkinat and D.Oesterhelt for isotope‐labelled algal hydrolysate, and Xeragon AG, Zürich, for a generous gift of synthetic RNA. This work was supported by the Deutsche Forschungsgemeinschaft, the Sonderforschungsbereich 369, the Dr.‐Ing. Leonhard Lorenz‐Stiftung and the Fonds der Chemischen Industrie.
Copyright © 1998 European Molecular Biology Organization