SacY belongs to a family of, at present, seven bacterial transcriptional antiterminators. The RNA‐binding and antitermination capacity of SacY resides in the 55 amino acids at the N‐terminus [SacY(1‐55)]. The crystal structure at 2 Å resolution shows that SacY(1‐55) forms a dimer in the crystal, in accordance with the NMR solution structure. The structure of the monomer is a four‐stranded β‐sheet with a simple β1β2β3β4 topology. One side of the sheet is covered by a long surface loop and the other side forms the dimer interface. The dimer is stabilized by the orthogonal stacking of the two β‐sheets. The crystal structure is in excellent agreement with the NMR solution structure (r.m.s. distance for Cα coordinates is 1.3 Å). The structure of SacY(1‐55) reveals a new RNA‐binding motif.
The Bacillus subtilis sacB gene codes for the sugar‐metabolizing enzyme levansucrase and is inducible by its substrate, sucrose. This enzyme functions in the extracellular medium and synthesizes levan polysaccharide from sucrose. In the absence of inducer, a terminator, situated in the control region of the sacB gene (represented in Figure 1A), causes premature termination of transcription (Shimotsu and Henner, 1986). Alterations to this terminator result in constitutive expression. The presence of inducer leads to activation of the antiterminator protein SacY, which binds to the RNA transcript at an antiterminator sequence that partially overlaps with the terminator. It was shown that the folding of this RAT (Ribonucleic acid AntiTerminator) into a stem‐loop structure (Figure 1B) is essential for the binding of SacY. This binding could inhibit the formation of the terminator stem‐loop and thereby prevent the early termination of transcription (Aymerich and Steinmetz, 1992).
SacY belongs to a bacterial antiterminator protein family, which to date has seven members. All these proteins contain ∼280 amino acids and show considerable homology. None of their sequences, however, contains a recognizable RNA‐binding motif (Draper, 1995). It was shown in vitro that the RNA‐binding capacity of SacY is situated in the N‐terminal 55 residues [SacY(1‐55)] (Manival et al., 1997). Specific RNA recognition was also proven for the corresponding regions in three other bacterial transcriptional antiterminators: BglG, LicT and SacT. The N‐terminal fragments are as efficient as the intact proteins in preventing early transcriptional termination, but are no longer regulated by the inducer. It was therefore proposed that all the members of this antiterminator family consist of two functional regions: (i) an RNA‐binding domain; and (ii) a domain responsible for regulation. The remaining part of SacY (residues 56 to 280) is indeed responsible for the regulation of the antiterminating activity: this domain indirectly senses the external level of inducer through the general energy coupling proteins of the phosphoenolpyruvate:sugar phosphotransferase system (PTS) and a sucrose‐specific PTS‐permease (Tortosa et al., 1997).
Many RNA‐binding proteins have modular structures consisting of one or more RNA‐binding motifs and auxiliary domains with additional functions. Structural information on RNA‐binding motifs is progressing, but detailed knowledge on protein‐RNA interactions is still very limited. The structures of some RNA‐binding modules such as the K‐homologous (KH) domains, the ribonucleoprotein (RNP), or the double‐stranded RNA‐binding domains, possess a central β‐sheet flanked on one or both sides by helices (Bycroft et al., 1995; Kharrat et al., 1995; Musco et al., 1996; Nagai, 1996). Structural information and biochemical experiments have shown the importance of the β‐sheet in the interactions of these modules with their respective target RNAs. SacY(1‐55) does not possess sequence similarity with any known RNA‐binding motif. In the accompanying paper we showed by NMR that SacY(1‐55) forms a well‐defined structural motif and that it is present in solution as a symmetrical dimer (Manival et al., 1997). We describe here the crystal structure of this motif at 2.0 Å resolution, which is in excellent agreement with the solution structure. We show that SacY(1‐55) is present as a non‐crystallographic dimer in two different crystal forms and that the oligomeric structure introduces a new structural protein motif for interaction with RNA.
Solving the structure and quality of the model
Classical heavy atom soaking did not yield a useful derivative, as could be expected for a small protein with no cysteines and a single histidine. We therefore introduced single cysteines by site‐directed mutagenesis. As explained in Materials and methods, only double mutants carrying an extra cysteine at the N‐terminus crystallized readily, even one that refused to do so as a single mutant. Two of these double mutants (NTC/A26C; NTC/D54C) yielded Hg derivatives of sufficient quality to solve the structure. The results of data collection and the refinement statistics are gathered in Table I. The structure was solved in crystal form II (P21; unit cell dimensions a = 32.6 Å, b = 40.25 Å, c = 41.0 Å and β = 94.8°) since this was the only one of the six crystal forms we obtained that did not need Cd ions in the crystallization medium. Refinement was carried out with the much better diffracting crystal form III (P21; a = 29.7 Å, b = 51.2 Å, c = 38.9 Å and β = 108.2°). The model refined well, but the Rfree factor remained rather high. Inspection of the 2Fo−Fc difference maps after refinement of the protein model revealed three strong peaks. We interpreted these peaks as bound Cd ions and introduction of these peaks in the refinement (full occupancy) improved the Rfree value considerably. The Cd ions have comparable B‐factors to the protein, indicating that they are strongly bound in the crystal. One Cd ion mediates an important crystal contact between His9 and Glu19 of a symmetry‐related molecule. The two remaining Cd ions are bound to Glu20 of both monomers. Details of the binding modes of the Cd ions will be discussed elsewhere. The structure of SacY(1‐55), containing three additional Cd ions, four Cl ions and 25 water molecules, was refined to an R‐factor of 21.9% using data between 8.0 and 2.0 Å (Rfree is 28%, 7% of the data never included in the refinement). The quality of the final 2Fo−Fc electron density map can be judged from Figure 2.
The structure is well defined for residues 1‐50, but no electron density is observed beyond. Therefore, the region between residues 51 and 55 seems disordered, which agrees with the absence of long‐distance NOEs in the NMR structure for this region (Manival et al., 1997).
Structure of the protein
The protein is present identically in both crystal forms as a symmetric dimer. Figure 3 shows a representation of the global fold and packing of the SacY(1‐55) dimer. The monomer consists of a four‐stranded antiparallel β‐sheet, in the order β1‐β2‐β3‐β4, with a pronounced right‐handed twist. The N‐terminus (residues 1‐3) is not part of the β‐sheet, but is stabilized by main chain hydrogen bonds with residues 35‐37 in the long loop. The end of the first strand (residues 4‐7) contains a bulge, stabilized by hydrogen bonds between the main chain nitrogens of Leu7 and Asn8 on strand one and the main chain carbonyl of Ala11 on strand two. The first three strands are connected by two sharp turns. The type I turn between strands 2 and 3 is ordered (residues 16‐19) in one monomer (where it is stabilized by symmetry contacts), but its electron density is almost absent in the other monomer. Despite the influence of crystal contacts, the conformation of this loop is in agreement with the NMR solution structure. The apparent disorder is probably due to inherent flexibility, since weak or no electron density for this loop was observed for crystal structures determined in four different space groups (H.van Tilbeurgh, unpublished results). These observations are also in agreement with a structurally less well‐defined part of the molecule in the NMR solution structure and are explained by the fact that this turn sticks out of the core of the protein, not making contacts with any other region of the molecule. The connection between strands β3 and β4 is made by a long surface loop (residues 25‐46), covering one side of the β‐sheet. This loop starts with three successive β‐turns (residues 26‐35) running towards β1 and a more elongated stretch of residues (36‐43) which brings the chain back into the fourth strand. The direction of this loop is perpendicular to the orientation of the β‐sheet. Its conformation is stabilized by main chain interactions with the N‐terminus, side chain interactions, and hydrophobic packing. The conformation of the first part of the long loop (residues 30‐32) is the only region where the crystal and NMR structures are slightly divergent (see Discussion). The direction of the fourth strand is rather twisted. The first part of the strand (residues 44‐45) makes main chain hydrogen bonds only with β‐strand 3, while at the end (residues 46‐49) it is involved in β‐sheet interactions with strand 4 of the other monomer. No density is visible beyond residue 50 in either monomer, and again this agrees well with the absence of a well‐defined conformation for this region as observed by NMR. Truncation mutants have shown that the three C‐terminal residues of SacY(1‐55) can be removed without important loss of antitermination activity (Manival et al., 1997). Residues 51‐54 probably belong to a linker region between the RNA‐binding and the regulatory domain, the latter constituting the remaining part of SacY (Tortosa et al., 1997).
The β strands contain a high proportion of hydrophobic residues. One side of the sheet (Ile3, Ile12, Val14, Leu23) is in contact with a hydrophobic cluster of residues from the long loop between the third and the fourth β‐strand (Ile28, Val38, Ile43), forming the hydrophobic core of the monomer. The other face of the β‐sheet is forming the dimer interface.
Structure of the dimer
Gel filtration experiments and NMR spectroscopy have shown that SacY(1‐55) is present in solution as a dimer. The presence of a non‐crystallographic dimer in the crystal structure was suspected on the basis of self‐rotation function calculations and was confirmed by the two crystal structures. Dimer formation was expected for the intact protein, but came somewhat as a surprise for the RNA‐binding domain alone. The integrity of the dimer in solution is sensitive to salt concentration. It was shown by NMR and gel filtration experiments that SacY(1‐55) dissociates into monomers and unfolds at low ionic strength and that refolding is hard to accomplish (Manival et al., 1997).
The dimer has an ellipsoid shape of overall dimensions 40 Å×25 Å×25 Å. The dimer interface is formed by orthogonal stacking of the β‐sheets of the two monomers. The 2‐fold symmetry axis of the dimer runs through the core of the dimer interface, making an angle of ∼45° with the direction of the β‐sheets. The two sheets form a β‐barrel which is closed at one side by four main chain hydrogen bonds between the two β4‐strands of the two monomers (Figure 3B). At the other side the barrel closes by two symmetric hydrogen bonds between the Leu7 carbonyl and the Nδ2 of Asn8. The remaining contacts are made by the hydrophobic surfaces of the two sheets. The residues involved in the stacking interaction are: Leu7, Val13, Ile22, Leu24 and Phe47. The accessible surface area covered by dimer formation is 770 Å2 per monomer, which is ∼25% of the total accessible surface of the monomer. More than half of this surface is conferred by the hydrophobic residues of the fourth strand, and especially by the well‐conserved Phe47. Stacking of β‐sheets as dimer motif has been observed in a number of other proteins [e.g. Streptomyces subtilisin inhibitor (Takeuchi et al., 1991) and Bence Jones protein (Wang et al., 1979)] but in these structures the sheets are not forming a closed β‐barrel (Jones and Thornton, 1995).
Comparison between the crystal structure and the NMR structure
The crystal structure and the NMR solution structure, as described in Manival et al. (1997), were determined independently and are in excellent agreement. In Figure 4A we show the superposition of the Cα atoms for residues 1‐50 of the SacY(1‐55) monomer as determined by both techniques. The structure of the SacY(1‐55) monomer and the association of the monomers in the dimer are very similar in solution and in the crystal structure. The root‐mean‐square (r.m.s.) deviation for the α‐carbon atoms (residues 1‐50) between the crystal structure and the best NMR structure is 1.3 Å for the monomer and 1.7 Å for the dimer. In Figure 4B we represent the r.m.s. distance of Cα positions between a crystallographic monomer and the NMR monomer structure along the sequence. The structure superimposes best for the regions included in the β‐sheet. Two regions deviate considerably between the two structures: the loop between the second and third strand (around Gln18) and part of the long loop (around Asn31). These differences are clearly due to crystal contacts between these two regions and a detailed comparison will be given elsewhere.
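To illustrate how an r.m.s. value and a per‐residue distance profile of the kind quoted above are obtained, the following Python sketch computes both for two sets of Cα coordinates that are assumed to be already superimposed. The coordinates are invented placeholders, not SacY(1‐55) data, the function names are hypothetical, and the least‐squares superposition step itself is not shown.

```python
import math


def per_residue_distance(coords_a, coords_b):
    """Distance between equivalent C-alpha atoms, one value per residue."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square distance over all equivalent atom pairs."""
    distances = per_residue_distance(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Placeholder coordinates in Angstrom (NOT taken from the SacY structures).
toy_xray = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_nmr = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 0.0, 0.0)]

profile = per_residue_distance(toy_xray, toy_nmr)  # per-residue profile
overall = rmsd(toy_xray, toy_nmr)                  # single summary value
```

The per‐residue list corresponds to a profile along the sequence, whereas the single r.m.s. value summarizes the whole superposition; regions with large individual distances dominate the summary because the distances are squared.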
In Figure 4C and D we compare for all main chain atoms the crystallographic B‐factor profile with the r.m.s. deviation of the 10 best NMR structures. The profile of the B‐factor variation correlates well with the r.m.s. distance variation of the NMR structures. The turn between residues 16 and 19 is badly defined in one of the monomers in the crystal structure. In the other monomer it is stabilized by a crystal contact mediated by a bound Cd ion. This turn seems also less well‐defined in the NMR structure as can be deduced from the higher r.m.s. values of the main chain coordinates for this region. No electron density is observed for residues beyond residue 50 and this correlates with the absence of long‐distance NOEs for this region.
Structure of SacY(1‐55) and comparison with the corresponding sequences of the other members in the antiterminator family
SacY(1‐55) has no sequence homology to any other known RNA‐binding motifs. However, a number of other bacterial transcriptional antiterminator proteins, regulating carbohydrate metabolism, contain a homologous domain at their N‐terminus. Antiterminator capacity of the N‐terminal domain has been proven for three of these proteins: SacT, BglG and LicT (Manival et al., 1997). We have aligned all the known sequences of these domains in Figure 5. Sequence identity with the SacY(1‐55) domain is between 40% and 50% and sequence similarity between 65% and 75% depending on the members considered. The hydrophobic character of residues involved either in packing of the monomer core (Ile3, Ile12, Val14, Leu23, Ile28, Val38, Ile43) or in dimer stacking (Leu7, Val13, Ile22, Leu24, Phe47) is well conserved. In all the known sequences, the domain is about the same length, no major insertions or deletions are present. In general, the strand regions are better conserved than the long loop covering the β‐sheet. We therefore can assume that all these sequences have the same fold and that the motif represents a well‐defined protein domain. It is therefore very likely that RNA‐binding activity is situated in the N‐terminal part of all these proteins.
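Percentages of sequence identity such as those given above depend on the alignment and on the convention chosen for the denominator. The Python sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap. The two aligned strings are arbitrary placeholders, not sequences from Figure 5, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Arbitrary placeholder alignment (NOT antiterminator sequences).
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFG-IRL"
identity = percent_identity(toy_a, toy_b)
```

A similarity percentage would be computed in the same way, with the equality test replaced by membership of both residues in the same conservative substitution group; the grouping used is itself a choice that has to be stated.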
The few protein‐RNA complexes described at the structural level have shown that aromatic and positively charged residues play an important role in RNA binding (Oubridge et al., 1994; Valegard et al., 1994). The aligned RNA‐binding domain sequences of the antiterminator proteins reveal four well‐conserved positive charges: Arg/Lys5, Lys32, Lys/Arg33 and Lys45. Arg5 together with Lys4, Lys15, Lys34, Lys45, Lys49 and Arg50 are part of two symmetry‐related charged clusters, situated on the rim of the interface, at opposite poles of the dimer. The electrostatic charge potential mapped on the surface of the molecule (Figure 6), shows the corresponding dense positively charged region. This region overlaps partially with the RNA contact surface as determined by NMR titration experiments of 15N‐labelled SacY(1‐55) with its 29‐nucleotide stem‐loop RNA target, suggesting that RNA contact regions are situated from Lys5 to His9 and from Gly25 to Phe30 (Manival et al., 1997). SacY(1‐55) contains only two conserved aromatic residues: Phe30 and Phe47. Phe47 does not show any spectral shifts upon RNA binding, which agrees with its totally buried position at the dimer interface. Phe30 on the other hand is situated at the edge of a turn in the long loop and is totally solvent‐accessible. In one of the monomers of the crystal structure the side chain of Phe30 is partially disordered, indicating high mobility. Phe30 and His9 are part of two turns that are close together in space and that are situated at the border of the positive charge cluster of the facing monomer (Figure 6). This could mean that a single RNA‐binding site is formed by different regions from both monomers. Homo‐dimer formation is commonly observed for proteins interacting with DNA, but has only rarely been described for protein‐RNA interactions. SacY(1‐55) forms a symmetrical dimer in all the crystal forms, which is the same as the dimer formed in solution. 
Since RNA titration experiments have shown that the dimer structure is not disrupted upon RNA binding, the asymmetrical RAT target sequence is recognized by a symmetrical protein. Asymmetrical interaction was also observed for the interaction between MS2 phage capsid protein with an operator RNA nucleotide (Valegard et al., 1994).
The fold of SacY(1‐55) is not original and a structural similarity search with the DALI server revealed that the β1β2β3β4 motif is present for instance in multiple copies of the β‐sheet propeller motif of enzymes such as neuraminidase and methylamine dehydrogenase (Holm and Sander, 1994).
Comparison with other RNA‐binding motifs
The structure of SacY(1‐55) has no homology with and no strong resemblance to any other RNA‐binding domains whose structures are known. These RNA‐binding domains all possess a central β‐sheet decorated with supplementary secondary structure elements (Nagai et al., 1990; Bycroft et al., 1995; Kharrat et al., 1995; Musco et al., 1996). The β‐sheet of SacY(1‐55) superposes relatively well onto the β‐sheet from the U1A spliceosomal protein. The r.m.s. distance for Cα positions of 27 residues included in β‐sheet formation is 1.1 Å. The topologies of the sheets are different: SacY(1‐55) has a β1β2β3β4 fold, whereas U1A has it in the order β4β1β3β2. However, there is no evidence in the SacY(1‐55) structure for a motif equivalent to the well‐conserved sequence motifs RNP1 or RNP2 in U1A. Similar superficial similarity is observed between SacY(1‐55) and two other RNA‐binding motifs, double‐stranded RNA‐binding protein and the KH homology domain (Bycroft et al., 1995; Kharrat et al., 1995; Musco et al., 1996). The β‐sheets of these latter proteins contain only three strands, flanked by α‐helices. However, one of the most marked differences is that all these motifs, although some are present in multiple copies in their protein, exist as monomers whereas SacY(1‐55) is functional as a dimer.
Interaction with target RNA
The structures of two RNA‐binding domains in complex with their target RNA were recently determined (Oubridge et al., 1994; Valegard et al., 1994; Allain et al., 1996). In the case of U1A snRNP protein, which belongs to the major class of RNA‐binding domains, called RNP, the RNA is bound to the accessible surface of the β‐sheet, containing the RNP sequences, but variable loops also play a decisive role in the formation of the complex (Allain et al., 1996). Similarly, the MS2 bacteriophage coat protein, which does not belong to the RNP family, binds its target RNA by virtue of the large accessible surface of the β‐sheet (Valegard et al., 1994). However, the β‐sheet being almost completely involved in dimer formation, SacY(1‐55) must interact with its target RNA in a different fashion. The participation of the long surface loop and a sharp turn of SacY(1‐55) in RNA binding was demonstrated by NMR measurements. We could not determine from these experiments whether the RNA target is interacting at once with both SacY(1‐55) monomers or whether the binding surface is conferred by a single monomer. The detailed knowledge of the interactions must await the crystal structure of the complex, but it is certain that the SacY(1‐55) structure introduces a novel motif and a new mechanism for RNA recognition.
Materials and methods
Expression, purification and crystallization
SacY(1‐55) was overexpressed as a fusion protein with glutathione S‐transferase using the pGEX‐2T plasmid (Pharmacia). After thrombin cleavage of the fusion protein, SacY(1‐55) was purified to homogeneity. Details on the purification and crystallization will be described elsewhere. Screening of crystallization conditions yielded several crystal forms (Jancarik and Kim, 1991). For the heavy atom search we used the P21 crystal form II, obtained from 30% Jeffamine M‐600 at pH 6.5 and 50 mM CsCl (unit cell dimensions a = 32.6 Å, b = 40.25 Å, c = 41.0 Å and β = 94.8°). This crystal form, the only one that did not require high concentrations of Cd or Zn ions, diffracted to ∼2.8 Å but suffered from severe diffuse scattering. Cd or Zn heavily precipitated all the cysteine mutants that were tried for Hg fixation. Classical heavy atom screening yielded no valuable derivatives. Crystallization trials with eight different single cysteine mutants were unsuccessful. We therefore prepared a series of double cysteine mutants, all having an additional cysteine at the N‐terminus (NTC). Two of these mutants (NTC/A26C and NTC/D54C) crystallized isomorphically to the form II crystals of the wild‐type protein and were used for Hg soaking (100 mM HgCl2, 12 h).
Data collection and processing
All data were collected using CuKα (λ = 1.5418 Å) radiation from a RIGAKU‐RU200 rotating anode operating at 40 kV and 80 mA. Intensities were measured on a MAR Research 30 cm Image Plate scanner. Data integration during heavy atom search was done with the MAR version of the XDS program (Kabsch, 1988). Data that were used for the final refinement were processed with the DENZO (Otwinowski, 1986) integration program and reduced with the CCP4 suite of programs (Collaborative Computing Project Number 4, 1994). Self‐rotation function calculations on form II data using POLARFN showed a strong peak, indicative of a non‐crystallographic two‐fold axis, perpendicular to the crystallographic screw axis. Assuming a dimer in the asymmetric unit yielded a realistic solvent content of ∼35%. NMR measurements also indicated that in solution SacY(1‐55) was present as a dimer (Manival et al., 1997).
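The solvent content estimate mentioned above follows from the unit cell volume and the protein mass in the cell. The Python sketch below shows the arithmetic for the form II cell, using the standard monoclinic volume formula V = a·b·c·sin β and the usual Matthews approximation, solvent fraction ≈ 1 − 1.23/Vm. The molecular mass of the construct is not given in the text, so the chain mass used here is a labelled placeholder, and the function names are hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit cell volume (A^3) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, n_asym_units, mass_per_asym_unit_da):
    """Vm in A^3 per Da: cell volume divided by total protein mass in the cell."""
    return cell_volume / (n_asym_units * mass_per_asym_unit_da)


def solvent_fraction(vm):
    """Approximate solvent fraction, 1 - 1.23 / Vm (Matthews approximation)."""
    return 1.0 - 1.23 / vm


# Form II cell dimensions as quoted in the text (space group P21).
form_ii_volume = monoclinic_cell_volume(32.6, 40.25, 41.0, 94.8)

# P21 has two asymmetric units per unit cell.
N_ASYM_UNITS_P21 = 2

# ASSUMPTION: placeholder chain mass; the true mass of the construct
# is not stated in the text and should be substituted here.
ASSUMED_CHAIN_MASS_DA = 7000.0
CHAINS_PER_ASYM_UNIT = 2  # a dimer in the asymmetric unit, as proposed above

vm = matthews_coefficient(form_ii_volume, N_ASYM_UNITS_P21,
                          CHAINS_PER_ASYM_UNIT * ASSUMED_CHAIN_MASS_DA)
estimated_solvent = solvent_fraction(vm)
```

Setting CHAINS_PER_ASYM_UNIT to 1 or 3 shows why the choice matters: a physically reasonable solvent fraction is obtained only for a limited range of copies per asymmetric unit, which is the reasoning behind the dimer assignment.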
Phasing, model building and refinement
The crystals obtained from the NTC/D54C mutant were of better quality than the wild‐type crystals and these data were used for the solution of the structure. Analysis of the Harker section of both the isomorphous and anomalous Hg‐Patterson map revealed a single Hg site. The same was done for the NTC/A26C mutant soaked in HgCl2. The resulting isomorphous difference Patterson maps (using the NTC/D54C mutant data as native data set) indicated the presence of two Hg atoms. Double difference and cross‐phase Fourier maps were calculated to check the mercury positions in both mutants. Heavy atom positions were refined by maximum likelihood phased refinement using MLPHARE (Otwinowski, 1991). The correct hand of the heavy atom constellation was verified by anomalous occupancy values. The average FOM for all reflections was 0.52. The initial electron density map was good enough to dissect the boundaries of the two monomers in the asymmetric unit. Model construction was done with the TURBO graphics program (Biographics, Marseilles, France) (Roussel and Cambillau, 1991). A partial model allowed us to determine the superposition matrix of the two monomers. This matrix confirmed the orientation of the two‐fold axis as determined by the self‐rotation function. After refinement using the RAVE program (G.Kleywegt, Uppsala, Sweden), this matrix was applied in the non‐crystallographic symmetry density modification procedure together with solvent flattening and histogram matching as implemented in the CCP4 program DM (Cowtan and Main, 1993). The efficiency of the procedure was checked by the Rfree value (Brünger, 1992a). The resulting electron density map allowed unambiguous construction of residues 1‐50. Refinement was done with the simulated annealing procedure using X‐PLOR (Brünger, 1992b). This brought the R‐factor down to 24%, but the Rfree value remained rather high (40%).
Further refinement of the structure was done with crystals obtained from 30% PEG400, 100 mM CdCl2, 0.1 M NaAc buffer at pH 4.5 (form IV). Their space group was also P21 but the cell dimensions were different: a = 29.7 Å, b = 51.2 Å, c = 38.9 Å and β = 108.2°. The quality of these crystals was much better and they diffracted to 2.0 Å resolution on a rotating anode X‐ray source. The partially refined model from the form II crystals was used as search model in molecular replacement with the program AMoRe (Navaza, 1994). This yielded a solution with an R‐factor of 32% and a map correlation coefficient of 73% for data between 10 and 3.5 Å. Further refinement and calculation of 2Fo−Fc maps revealed the presence of three strongly bound Cd ions, coordinated to protein side chains and water and/or chloride ions. Inclusion of these ions improved the refinement considerably. No electron density was observed for residues beyond position 50 in either of the monomers. Electron density is also poor for the turn between positions 17 and 19. The current R‐factor is 21.9% (Rfree = 28%, 7% of the data never included in refinement) for reflections between 8.0 and 2.0 Å with intensity above 2σ. The model includes 50 residues for both monomers, three Cd ions, four Cl ions and 25 water molecules and has good stereochemistry, as verified by the program PROCHECK (Laskowski et al., 1993).
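For readers less familiar with the refinement statistics quoted above, the Python sketch below shows the conventional definition of the crystallographic R‐factor, R = Σ| |Fo| − |Fc| | / Σ|Fo|, evaluated separately over the working reflections and over a test set that is never used in refinement (Rfree). The amplitudes and flags are invented placeholders, the function names are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones; this is not the X‐PLOR implementation.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (R_work, R_free) given a per-reflection test-set flag."""
    work = [(o, c) for o, c, is_free in zip(f_obs, f_calc, free_flags)
            if not is_free]
    test = [(o, c) for o, c, is_free in zip(f_obs, f_calc, free_flags)
            if is_free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


# Placeholder amplitudes (NOT measured data); one reflection is in the test set.
toy_f_obs = [100.0, 50.0, 80.0, 20.0]
toy_f_calc = [90.0, 55.0, 80.0, 25.0]
toy_free_flags = [False, False, False, True]

toy_r_work, toy_r_free = r_work_and_free(toy_f_obs, toy_f_calc, toy_free_flags)
```

Because the test reflections do not drive the refinement, a large gap between the two values signals that the model is being fitted to noise or is missing real features, as was the case here before the bound Cd ions were included.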
We thank Dr Nathalie Declerck (INRA, Thiverval Grignon) for the genetic construction of a few cysteine mutants. We acknowledge the financial support from the ‘Action concertées coordonnées des sciences du vivant’ (ACC‐SV5) of the Ministère de l'Education nationale, de l'Enseignement supérieur, de la recherche et de l'insertion professionnelle.
Copyright © 1997 European Molecular Biology Organization