The 3D solution structure of the GCC‐box binding domain of a protein from Arabidopsis thaliana in complex with its target DNA fragment has been determined by heteronuclear multidimensional NMR in combination with simulated annealing and restrained molecular dynamic calculation. The domain consists of a three‐stranded anti‐parallel β‐sheet and an α‐helix packed approximately parallel to the β‐sheet. Arginine and tryptophan residues in the β‐sheet are identified to contact eight of the nine consecutive base pairs in the major groove, and at the same time bind to the sugar phosphate backbones. The target DNA bends slightly at the central CG step, thereby allowing the DNA to follow the curvature of the β‐sheet.
Expression of pathogenesis‐related proteins (Bowles, 1990) is induced in plants by stimuli including UV, salicylic acid and ethylene (Green and Fluhr, 1995), and leads to a defence response by inhibition of bacterial and fungal growth (Hammond‐Kosack et al., 1996; Ryals et al., 1996). A consensus nucleotide sequence, AGCCGCC, known as the GCC‐box, has been identified in the promoter region of the pathogenesis‐related genes (Ohme‐Takagi and Shinshi, 1995). Four cDNAs encoding ethylene‐responsive element binding proteins (EREBPs), which specifically bind to the GCC‐box, have been isolated from tobacco (Ohme‐Takagi and Shinshi, 1995). Equivalents of EREBPs, AtERF1‐4 (M.Ohta, H.Shinshi and M.Ohme‐Takagi, unpublished results) and AtEBP (Büttner and Singh, 1997), have been identified in Arabidopsis thaliana.
A region of ∼60 amino acid residues (Figure 1B) is highly conserved among EREBPs (Jofuku et al., 1994; Ohme‐Takagi and Shinshi, 1995). The region is referred to here as the GCC‐box binding domain (GBD). A large number of divergent genes in a wide range of plants contain the GBD (Weigel, 1995; Elliott et al., 1996; Klucher et al., 1996; Wilson et al., 1996; Okamuro et al., 1997). No animal or fungal protein has been reported to possess a GBD.
In this paper we describe the solution structure of the AtERF1 GBD in complex with the target DNA, and discuss the mode of the specific interaction of the two molecules. The GBD binds to the DNA via its β‐sheet. This is somewhat unexpected, since this study reveals that GBD has an arrangement of secondary structural elements which resembles that of zinc fingers to some extent, although it is the α‐helices of zinc fingers that are known to bind to DNA. The mode of DNA binding of the GBD β‐sheet is different from that of the β‐sheets of other known DNA‐binding domains in the number and arrangement of β‐strands, in the contacting pattern of amino acid residues and DNA bases, and in the number of base pairs contacted. The solution structure of GBD determined in the absence of DNA is also reported to enable better understanding of the structural requirements imposed on the protein structure for DNA recognition.
Three dimensional (3D) structure of GBD
The solution structure of the AtERF1 GBD, Lys144‐Val206, in the absence of DNA was determined using 1187 unambiguous conformational constraints (Table I) obtained from multidimensional nuclear magnetic resonance (NMR) experiments. Forty‐six structures, which showed the smallest violations of the constraints, were selected after simulated annealing (Figure 2A). The average root mean square (r.m.s.) deviation among the selected structures was 0.54 Å for the heavy atoms in the backbone, His146‐Pro203 (Table I). The final structure (Figure 2B) was determined by energy minimization of the mean co‐ordinates of the ensemble. All the residues were identified in the favored or additionally allowed regions in the (φ, ψ) space (Morris et al., 1992).
The determined GBD structure consists of a three‐stranded anti‐parallel β‐sheet comprising strand 1 (Val149‐Arg152), strand 2 (Lys156‐Asp163) and strand 3 (Ala169‐Phe176), packed along an α‐helix (Thr178‐Arg194) (Figure 1A). The structure is stabilized by extensive hydrophobic contacts of the side‐chains of Tyr146, Val149, Phe157, Ala159, Ile161, Val171, Leu173, Phe176, Ala179, Ala182, Ala183, Ala185, Tyr186, Ala189, Ala190, Ala198, Leu200 and Phe202 with each other. The geometry of the α‐helix relative to the β‐sheet appears to be determined by the interaction of the many Ala residues in the α‐helix and the larger hydrophobic residues in the β‐sheet, in particular, Phe157, Phe176, Val171 and Ile161, which clamp the α‐helix at the four corners. The N→C direction of the α‐helix is approximately parallel to that of β‐strand 2. This arrangement is slightly unusual, since an α‐helix is typically tilted slightly with respect to β‐strands (Janin and Chothia, 1980).
Turn 1, Arg152‐Lys156, between β‐strands 1 and 2 is classified as type 3:5, and turn 2, Asp163‐Ala169, between strands 2 and 3 is classified as type 5:5 (Sibanda et al., 1989). The N‐terminus of the α‐helix is capped by hydrogen bonds to and from Thr178. The N‐terminal loop, His146‐Gly148, is better defined than the C‐terminal loop, Gly195‐Pro203 (Figure 1A).
DNA recognition by GBD
The solution structure of the GBD–DNA complex (Figure 3B) was determined by simulated annealing by using NOE constraints obtained from NMR spectra (Figure 3A) followed by restrained molecular dynamic (rMD) calculation (see Figure 4D for the nucleotide sequence and the scheme for the numbering of the bases).
The 25 structures selected after the simulated annealing (Figure 3A) had an average r.m.s. deviation of 0.61 Å for the heavy atoms in the protein backbone, His146‐Pro203. The average r.m.s. deviation for all the heavy atoms in the polypeptide, His146‐Pro203, nucleotides 3–11 in the coding strand, and nucleotides 16–24 in the complementary strand, was 1.20 Å.
During the rMD calculation the van der Waals energy term in the AMBER force field decreased by 684 kcal/mol. The average inter‐base pair rise parameter in the GBD–DNA complex before the rMD calculation had a value of 4.02 ± 0.26 Å. In contrast the structure after the rMD calculation possessed a value of 3.21 ± 0.25 Å, which was much closer to the values in the known crystal DNA structures. The improvement was due to consideration of the effects of water molecules, which resulted in the hydrophobic effects of base‐stacking being included in the rMD calculation.
The orientation of the protein surface with respect to the target DNA was determined unambiguously on the basis of 37 intermolecular nuclear Overhauser effect (NOE) connectivities identified in homonuclear two dimensional (2D) NOE spectroscopy (NOESY) spectra: Arg162 to T16 and G17, Arg150 to C19, Trp154 to T3, A4 and C22, Trp172 to A4, G5 and C6, and Thr175 to A4 (Figures 5C and 6). The intermolecular cross‐peaks were observed isolated from the other peaks in two sets of NOESY spectra measured at different temperatures, 298 and 303 K, and were assigned to the same protein–DNA connectivities.
By using 13C, 15N‐labeled GBD in the complex with the unlabeled DNA, the 37 intermolecular NOE connectivities were studied further (Figure 7). Splitting of cross‐peaks was expected in one of the two frequency dimensions only for intermolecular NOE connectivities (see Materials and methods). The splitting was observed for 16 out of 37 connectivities. For the rest (21 connectivities) the intensities of the cross‐peaks were too weak for examination. The splitting of the 16 peaks was removed by decoupling of the heteronuclear spin connectivities (Figure 7C), which was further evidence that the peaks were those of intermolecular connectivities.
The GBD binds to the major groove of the GCC‐box via the three‐stranded β‐sheet. The N→C direction of strand 2 is approximately parallel to the 5′→3′ direction of the coding strand (Figure 3). Only one residue from the α‐helix, Tyr186, is identified as binding to a phosphate, while no contact is identified to the bases.
The guanidyl groups of four arginine residues are identified to contact five guanine bases through hydrogen bonds (Figure 5B); Arg150 to O6 of G20, Arg152 to O6 and N7 of G5 and O6 of G21, Arg162 to N7 of G17, and Arg170 to N7 of G8. The stems of the arginine side‐chains make hydrophobic interactions with pyrimidine bases; Arg150 to C19, Arg170 to C7, and Arg162 to T16. The side‐chain Oϵ of Glu160 is positioned close to HN4 of C19. Although the distance, 3.11 Å, is slightly larger than would be expected for a standard hydrogen bond, no space remains for accepting a water molecule. The aromatic rings of Trp154 and Trp172 make hydrophobic interactions with T3 and A4, and G5 and C6, respectively. These interactions cover six base pairs in the conserved AGCCGCC sequence and thus serve as the specific recognition of the GCC‐box by GBD.
All the Arg and Trp residues which contact the bases also bind to the sugar phosphate backbones either by ionic interactions to the phosphates or by hydrophobic interactions to the sugars, with the exception of Arg152 which does not bond to the backbone: Arg150 to C19 phosphate; Arg162 to G17 phosphate and T16 sugar; Arg170 to C7 phosphate; Trp154 to T3 sugar and A4 phosphate; and Trp172 to G5 phosphate and G5 sugar. Certain other residues bind to the sugar phosphate backbones: Lys156 to A4 phosphate; Thr175 to G5 phosphate and A4 sugar; Arg147 and Tyr186 to G17 phosphate; and the backbone amide of Gly148 to G18 phosphate. These interactions determine the geometry of GBD relative to the DNA and thereby comprise a framework for the specific base recognition.
Although some of the intermolecular contacts identified after the rMD calculation were not formed correctly before the rMD calculation, i.e. one of the four ionic contacts and eight of the 13 hydrogen bonds, the deviation of the candidates in question from the criteria used for identifying contacts was small. In one of the eight hydrogen bonds the distance criterion was satisfied, while in the other seven hydrogen bonds and in the single ionic contact in question the deviation from the distance criteria was 0.74 Å on average. Only two of the eight hydrogen bonds did not fully satisfy the angle criterion; the excess being ∼44.7–45.6°.
For the final complex structure, 33 intermolecular NOE connectivities with inter‐proton distances of <4 Å are expected around the eight hydrogen bonds that were not correctly formed before the rMD calculation. Connectivities that involve protons whose resonances are not expected to be observed at neutral pH are excluded from this count. In the NOESY spectra 16 of the 33 were identified, i.e. the identification rate was 48%. Compared with the corresponding rate of identification of intramolecular NOE connectivities calculated for the well‐determined core of GBD, 58%, the value appears to be high enough to support the formation of the hydrogen bonds.
Changes in the two molecules upon binding
The overall structure of the DNA in the complex is close to the standard B‐DNA structure, with the r.m.s. difference of 0.62 Å, but it is kinked by ∼20° around the major groove at the CG step (Figure 4A). The roll value of the CG step is the largest among the steps, while the helical twist value is the smallest (Figure 4C). The width of the major groove around the CG step is slightly narrower than that expected for the standard B‐DNA, while the minor groove is distinctively wider (Figure 4B). These observations are consistent with the previous analysis of the DNA bending at pyrimidine–purine steps in crystal structures (Suzuki and Yagi, 1995a; Suzuki et al., 1997).
The plot of the major groove width reveals a W‐shaped feature around the CG step (Figure 4B). This seems to be a consequence of two contradictory requirements; namely, a primary requirement that the major groove needs to become narrower globally to allow a better fit around the β‐sheet, which is carried out by the rolling of the CG step in the center, and a secondary requirement that the β‐sheet is widest around Ala159, which is close to the CG step (Figure 5D), and thus the groove becomes wider locally and approaches its standard size around the step.
To confirm the DNA bending the rMD calculation was repeated twice more by replacing the initial DNA structure by the straight standard B‐conformation and the standard B‐conformation but modeled to bend to a large degree by inserting an untwisted‐rolled CG step found in a crystal DNA structure (Hegde et al., 1992), respectively. The two rMD calculations resulted in structures very similar to those described above (Figure 4B and C).
The overall GBD structure does not change much upon binding to the target DNA (Figure 1A). The average r.m.s. difference between the mean structure determined in the complex with the target DNA and that determined in the absence of the target DNA is 0.78 Å for the backbone heavy atoms, His146‐Pro203, and 1.38 Å when the side‐chain heavy atoms were included. These values are comparable with the average r.m.s. deviation in each of the two ensembles (Table I). Locally, turn 1 bends slightly towards the DNA, thereby enhancing the concave curvature of the DNA‐binding surface (Figure 1A, inset).
Higher disorder is observed at turn 1, turn 2 and the C‐terminal loop in the GBD structure (blue and red lines in Figure 1A) compared with that observed in the GBD–DNA complex. At turn 2 the difference between the two mean structures is subtle (black line in Figure 1A) with only a slight change in the disorder (compare blue and red lines in Figure 1A). However, at turn 1 disorder decreases upon DNA binding, and the r.m.s. difference between the two mean structures is more substantial. It is likely that the changes observed are induced either for or by the binding of Arg152, Trp154 and Lys156 to the DNA.
Comparison of GBD with some other DNA‐binding domains
The DNA‐binding domains of MetJ and Arc repressors interact with the target DNA via β‐sheets (Breg et al., 1990; Somers and Phillips, 1992; Raumann et al., 1994). The β‐sheets are composed of two identical strands, one from each subunit, and form the dimerization interface. Their target nucleotide sequences are essentially palindromic. In contrast, GBD is monomeric, its β‐sheet is three‐stranded, and it interacts with a non‐palindromic nucleotide sequence. In the β‐sheet of GBD many residues contact both the bases and the backbones, while in the Arc and MetJ repressors different residues carry out the two types of function. Bases in six consecutive base pairs can be recognized by a β‐sheet of the MetJ–Arc type (Suzuki, 1995), while bases in nine consecutive base pairs are recognized by the GBD β‐sheet. Thus, the interaction of GBD with the target DNA reported here represents a novel mode of protein–DNA interaction.
The first zinc finger of SWI5 (Dutnall et al., 1996) has the same arrangement of the secondary structural elements as that in GBD. However, zinc fingers bind to the DNA via the α‐helix and not the β‐sheet. Calculations of electrostatic potentials suggest that the combination of positive charges on the β‐sheet and negative charges on the α‐helix in GBD (Figure 2B), and that of negative charges on the β‐sheet and positive charges at the N‐terminus of the α‐helix in the zinc finger (Figure 2C) are important for creating the different DNA‐binding mechanisms. All zinc fingers of the C2H2 type are substantially smaller than GBD (compare Figure 2B and C). Zinc co‐ordination to the four amino acid residues can stabilize the core of zinc fingers as effectively as the extensive hydrophobic interactions in the core of GBD.
Comparison of the DNA‐binding modes of the β‐sheets
In GBD, β‐strand 2 is twisted slightly at Ala159 to follow strand 3 (Figure 5D), whose biphasic regularity is disturbed by the insertion of Gly174; Ala159 makes hydrogen bonds with Leu173 and Gly174. It has been reported that mutation of Gly174 in another GBD caused malfunction of the protein (Jofuku et al., 1994). In one half of the protein–DNA interaction site, downstream of Ala159, strand 3 contacts the bases in the coding strand, while strand 2 contacts bases in the complementary strand (Figure 5D). In the other half, upstream of Ala159, strand 1 contacts the bases in both strands, but strand 2 contacts no base. At no point do the three strands fit side by side in the center of the major groove, as strands 2 and 3 in the upstream half are separated slightly from the groove (Figure 3D). Consequently, the DNA groove accepting the three‐stranded β‐sheet is not much wider or narrower than that accepting the two‐stranded β‐sheets of MetJ–Arc (Suzuki and Yagi, 1995b).
A two‐stranded β‐sheet is not totally flat but has a curvature. Thus, there are essentially two ways of placing a two‐stranded antiparallel β‐sheet in the DNA groove; in one mode the convex side of the β‐sheet faces the DNA, while in the other way it is the concave side (Suzuki, 1995). If the N→C direction of each strand follows the 5′→3′ direction of the nearest nucleotide strand the β‐sheet forms a concave mode, but if the N→C direction of each strand follows the 3′→5′ direction of the nearest nucleotide strand, the β‐sheet forms a convex mode. A convex mode, which is used by the MetJ–Arc family, is suitable for contacting bases since the major groove is concave (Figure 3C), but it is unable to follow the groove for more than six consecutive base pairs as it does not curve around the DNA helix axis (Figure 5E). In contrast, a concave mode is appropriate for following the DNA backbone, which is convex if viewed along the double helix axis (Figure 3D). However, the concave mode prevents a β‐sheet completely entering the groove, and hence residues are unable to contact the DNA bases effectively (Suzuki, 1995). The concave mode has been found in β‐sheets which bind to the minor groove, which is shallower than the major groove, by following the DNA backbones (Vis et al., 1995; Rice et al., 1996).
A convex mode is observed in the downstream half of the GBD β‐sheet (Figure 5D), while the β‐sheet is extended upstream by adopting a concave mode, where strand 2 and irregular strand 3 bind to the DNA via the sugar phosphate backbone, creating a curvature appropriate for following the DNA (Figure 5D). Hence, the GBD β‐sheet represents a better design by combining the different modes in two halves, resulting in contact with a larger number of base pairs. The curvature of the β‐sheet closely follows the DNA helix axis (Figure 3D) and also fits closely into the major groove (Figure 3C).
Implication for GBD–protein interaction
Many amino acid residues are identical among the GBD sequences and most of the changes are conservative (Figure 1B). To confirm the formation of essentially the same 3D structure by other GBDs, GBD of C‐repeat/DRE binding factor 1 (CBF1) protein (Stockinger et al., 1997) was modeled. The side chains in the AtERF1 GBD were replaced with those present in CBF1 followed by molecular dynamic simulation in water (M.Tateno, K.Yamasaki and M.Suzuki, unpublished results). The same arrangement of the secondary structural elements was observed in the modeled CBF1 structure; the average r.m.s. difference between the modeled and original structures was 2.59 Å.
In a previous discussion (Okamuro et al., 1997) the GBD sequence was divided into two halves, the YRG element in the N‐terminal half and the RAYD element in the C‐terminal half, and it was predicted that the RAYD element would fold into an α‐helix. The two elements do not, however, exactly correspond to the β‐sheet and the α‐helix, respectively, as reported in this paper. The YRG element corresponds to the N‐terminal loop and strands 1 and 2, while the RAYD element corresponds to strand 3, the α‐helix and the C‐terminal loop. Two alternative possibilities have been discussed for the function of the predicted α‐helix of the RAYD element, DNA‐binding or interaction with another protein (Okamuro et al., 1997).
Amino acids of the AtERF1 GBD identified to contact DNA bases are well conserved among the GBD sequences. Thus, it is likely that the DNA‐binding specificity of the known GBDs is essentially the same. In the GBD–DNA complex many amino acid residues bind to both bases and the sugar phosphate backbones. Alteration of these residues could distort the binding geometry. Thus, it seems more difficult to design a DNA‐binding domain of different binding specificity on the basis of the GBD structure than that of some other DNA‐binding domains.
The majority of the conserved residues in the GBD sequences can be classified into two categories: those that stabilize the protein structure, and those that are responsible for DNA recognition. However, a few residues appear to have no attributable function. Three such residues, Asp187, Arg194 and Phe202, are accessible to the solvent. It is conceivable that the α‐helix or the C‐terminal loop might be used for interaction through these residues with another domain, either in the same protein or another protein. In vitro studies have demonstrated the ability of AtEBP to interact with some other transcription factors (Büttner and Singh, 1997).
Materials and methods
Preparation of the protein and the DNA
The GBD region of the AtERF1 gene, corresponding to Gly143‐Glu210, was cloned into vector pAF104 (Fukuoh et al., 1997) and expressed in the Escherichia coli strain BL21(DE3). For isotope‐labeling, K‐MOPS minimal medium (Neidhardt et al., 1974) containing 10 mM 15N‐labeled NH4Cl and/or 0.2% (w/v) [13C]glucose was used for the culture. The protein was purified by cation exchange chromatography (Pharmacia Resource™ S) and gel filtration (Superdex™ 75). Electrospray mass spectrometry indicated that the N‐terminal methionine had been removed, possibly after translation. The GBD and the GBD–DNA complex were dissolved in a 50 mM potassium phosphate buffer (pH 5.0 or 6.0) containing 40 mM KCl at different D2O/H2O ratios, depending on the type of NMR measurement, to yield 2.5 and 1.5 mM solutions, respectively.
The NMR spectra were recorded on Bruker DMX‐750 (750.13 MHz for 1H, 188.64 MHz for 13C and 76.02 MHz for 15N) and DMX‐500 (500.13 MHz for 1H, 125.75 MHz for 13C and 50.68 MHz for 15N) spectrometers at 283–303 K; 2D NOESY (Jeener et al., 1979), 2D total correlation spectroscopy (TOCSY) (Braunschweiler and Ernst, 1983), 2D double‐quantum‐filtered (DQF) correlation spectroscopy (COSY) (Braunschweiler et al., 1983), 2D 1H–15N HSQC (Bodenhausen and Ruben, 1980), 3D 1H–15N NOESY‐heteronuclear multiple‐quantum coherence (HMQC) (Marion et al., 1989a), 3D 1H–15N TOCSY–HMQC (Marion et al., 1989b), 3D HNCA (Ikura et al., 1990) and 3D HN(CO)CA (Bax and Ikura, 1991). Quadrature detection in indirect dimensions was carried out by the time‐proportional phase incrementation method (Marion and Wüthrich, 1983). During the detection of amide protons, GARP1 15N‐decoupling (Shaka et al., 1985) was used. The mixing time chosen for TOCSY was 50 ms, and for NOESY, 25–150 ms. Spectra were referenced relative to external sodium 2,2‐dimethyl‐2‐silapentane‐5‐sulfonate for proton and carbon signals, or liquid ammonia for that of nitrogen.
The resonances were assigned using sequential inter‐unit NOE connectivities by the standard procedures (Wüthrich, 1986). The 13Cα–15N‐coupling connectivities observed in the HNCA and HN(CO)CA spectra were also used for the sequential assignment. In the NOESY spectra of the complex the resonances due to the protein moiety and those due to the DNA moiety were identified and separated from each other using the 13C‐ and 15N‐filtration method (Otting and Wüthrich, 1990; Ikura and Bax, 1992). Approximately half of the Hβ resonances were assigned stereospecifically by analyzing the NOESY and TOCSY spectra (Wüthrich, 1986).
Intermolecular NOE connectivities were identified by homonuclear NOESY spectroscopy (Figures 6 and 7) and were assigned to the same protein–DNA connectivities at two different temperatures, 298 and 303 K. The intermolecular NOE connectivities were further studied by using 13C, 15N‐labeled GBD in the complex with the unlabeled DNA. Intermolecular NOE cross‐peaks were expected to be split in one of the two frequency dimensions in homonuclear 2D NOESY spectra, since the resonances of protons which were covalently bonded to 13C or 15N in GBD would be split by the coupling effects of 13C or 15N spins, while such splitting was not expected with the resonances of protons in the unlabeled DNA.
Amide hydrogen–deuterium exchange experiments were carried out at 283 K (pD = 5.0) in the absence of the DNA, and at 298 K (pD = 6.0) in the complex with the DNA, in order to identify hydrogen bond donors. Thirty hydrogen bond donors were identified for GBD in the absence of the DNA. For GBD in complex with DNA, 30 hydrogen bond donors were identified, among which 28 were the same as those identified in GBD in the absence of DNA.
Determination of the 3D structure of GBD in the absence of the DNA
All the spectroscopic characteristics suggested that Gly143 and Asn207‐Glu210 in GBD were unfolded. Thus, residues Lys144‐Val206 were included in all calculations.
The distance constraints derived from the NOESY spectra were classified into four categories corresponding to inter‐proton distance constraints of 1.8–2.8, 1.8–3.5, 1.8–5.0, and 1.8–6.0 Å, respectively. To the protons whose resonances were not assigned stereospecifically, i.e. all the protons in the methyl groups, and some protons in the methylene groups and the aromatic rings of Phe and Tyr, the constraints were imposed by using the ⟨r⁻⁶⟩ averaged distances from all the identical protons in the groups or rings (Brünger et al., 1986). To maintain a hydrogen bond, a constraint of 1.5–2.5 Å was imposed on the distance between the hydrogen and the acceptor oxygen, while another constraint of 2.5–3.5 Å was imposed on the distance between the donor nitrogen and the acceptor oxygen. A force constant of 50 kcal/mol/Å² was used in order to impose the distance constraints outside the allowed regions using the method of Brünger (1992).
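As an illustration only, the distance classes and the ⟨r⁻⁶⟩ averaging described above may be sketched in Python as follows. The function names and the simple flat‐bottomed harmonic penalty are assumptions made for this sketch; the potential actually implemented in X‐PLOR may differ in form, for example in its behavior at large violations.

```python
# Illustrative sketch only. The names and the flat-bottomed harmonic form are
# assumptions for this example and are not taken from the X-PLOR package.

NOE_LOWER = 1.8                          # lower bound (Å) shared by all classes
NOE_UPPER_BOUNDS = (2.8, 3.5, 5.0, 6.0)  # upper bounds (Å) of the four classes
FORCE_CONSTANT = 50.0                    # kcal/mol/Å^2, as quoted in the text


def r6_average(distances):
    """Return the <r^-6>^(-1/6) effective distance over equivalent protons."""
    mean_inv6 = sum(d ** -6 for d in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


def restraint_energy(distance, lower, upper, k=FORCE_CONSTANT):
    """Flat-bottomed harmonic penalty: zero inside [lower, upper]."""
    if distance < lower:
        return k * (lower - distance) ** 2
    if distance > upper:
        return k * (distance - upper) ** 2
    return 0.0
```

Because of the inverse sixth power, the effective distance is dominated by the closest proton of the group, so a methyl group with one proton close to its partner satisfies the constraint even when the other two protons are further away.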
Five torsion angle constraints were obtained by HMQC‐J (Kuboniwa et al., 1994) and were imposed on φ angles in GBD in the absence of the DNA. The angle constraints were classified into three categories, −120 ± 60°, −120 ± 50°, and −120 ± 40°, corresponding to the 3JαN coupling values, <7.5, 7.5–8.5 and >8.5 Hz, respectively. A force constant of 200 kcal/mol/rad² was used following the method of Brünger (1992).
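The classification of the coupling constants into φ constraints can be written down compactly. The following Python sketch is illustrative: the function names are hypothetical, the assignment of the boundary values 7.5 and 8.5 Hz to the middle class is an assumption, and the flat‐bottomed harmonic penalty is a simplified stand‐in for the potential used in the calculation.

```python
import math


def phi_constraint(j_alpha_n_hz):
    """Map a 3J(alpha,N) coupling (Hz) to (center, half_width) in degrees.

    Assumption: values of exactly 7.5 or 8.5 Hz fall in the middle class.
    """
    if j_alpha_n_hz < 7.5:
        return (-120.0, 60.0)
    if j_alpha_n_hz <= 8.5:
        return (-120.0, 50.0)
    return (-120.0, 40.0)


def torsion_energy(phi_deg, center, half_width, k=200.0):
    """Flat-bottomed harmonic penalty in kcal/mol, with k in kcal/mol/rad^2."""
    # Smallest signed angular difference, mapped into [-180, 180) degrees.
    delta = (phi_deg - center + 180.0) % 360.0 - 180.0
    excess = max(0.0, abs(delta) - half_width)
    return k * math.radians(excess) ** 2
```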
Dynamic simulated annealing (Nilges et al., 1988a,b) was carried out using the algorithm implemented in the X‐PLOR package (Brünger, 1992). Eighty initial structures were created in the form of random arrays of atoms (Nilges et al., 1988a). Forty‐six structures were selected, where no distance violation was >0.2 Å and no torsion angle violation was >3°; the rate of acceptance was 57.5%. The mean co‐ordinates of the ensemble were submitted to 400 cycles of energy minimization (Powell, 1977). Differences between the 46 structures were characterized by calculating the average r.m.s. deviations of the co‐ordinates from the unminimized mean structure of the ensemble (Table I).
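The selection step can be expressed as a simple filter. In the sketch below the data layout (a dictionary per structure holding lists of violations) and the function names are hypothetical and serve only to make the acceptance criteria and the quoted acceptance rate explicit.

```python
def select_structures(structures, max_dist_viol=0.2, max_angle_viol=3.0):
    """Keep structures with no distance violation above max_dist_viol (Å)
    and no torsion angle violation above max_angle_viol (degrees)."""
    accepted = []
    for s in structures:
        worst_dist = max(s["distance_violations"], default=0.0)
        worst_angle = max(s["angle_violations"], default=0.0)
        if worst_dist <= max_dist_viol and worst_angle <= max_angle_viol:
            accepted.append(s)
    return accepted


def acceptance_rate(n_accepted, n_initial):
    """Acceptance rate in percent."""
    return 100.0 * n_accepted / n_initial
```

With the numbers given above, `acceptance_rate(46, 80)` evaluates to 57.5.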
The simulated annealing was repeated twice. In the first round no hydrogen bond constraints were imposed. Of the 54 structures accepted after the first round, candidates for the acceptors were identified using part of a program NAOMI (Brocklehurst and Perham, 1993) for the hydrogen bond donors that were identified by the H–D exchange experiments. When two or more candidates of acceptors were found for the same donor in different structures, the most frequently occurring candidate was selected. In all such cases the pairs selected possessed the most favorable pseudo‐energy values. A hydrogen bond donor, the δ nitrogen of Asn201, was identified to form two hydrogen bonds by using both of the two hydrogens to the same acceptor atom, producing three constraints. For the 30 hydrogen bond donors identified by the H–D exchange experiments, 61 constraints were made. The hydrogen bond constraints were imposed in the second round of the simulated annealing in order to maintain the selected donor–acceptor pairs. The method used was essentially the same as that of Kalia et al. (1993).
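The choice of acceptors and the count of hydrogen bond constraints can be illustrated as follows. This Python sketch is not the NAOMI procedure: the function names and input layout are hypothetical, and ties between equally frequent candidates are resolved simply by order of first appearance rather than by pseudo‐energy. The count assumes two distance constraints per ordinary donor (H–O and N–O) and three for a donor that uses both of its hydrogens to the same acceptor, as described for Asn201.

```python
from collections import Counter


def select_acceptors(candidates_per_structure):
    """Pick the most frequently occurring acceptor for each donor.

    candidates_per_structure: list of {donor: acceptor} dicts, one per structure.
    """
    tallies = {}
    for assignment in candidates_per_structure:
        for donor, acceptor in assignment.items():
            tallies.setdefault(donor, Counter())[acceptor] += 1
    return {donor: counts.most_common(1)[0][0] for donor, counts in tallies.items()}


def count_hbond_constraints(n_donors, n_two_hydrogen_donors=0):
    """Two constraints per ordinary donor, three per two-hydrogen donor."""
    ordinary = n_donors - n_two_hydrogen_donors
    return 2 * ordinary + 3 * n_two_hydrogen_donors
```

For 30 donors, one of which uses two hydrogens, this gives 2 × 29 + 3 = 61 constraints, in agreement with the number quoted above.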
Determination of the 3D structure of the GBD–DNA complex
Nucleotides 1–13 in the coding strand and 14–26 in the complementary strand were included in the simulated annealing of the complex. In the rMD calculation, nucleotides 3–13 in the coding strand and 14–24 in the complementary strand were included, since the upstream end of the DNA was not contacted by the protein and a meaningful experimental constraint was not observed with the end section. Residues Lys144‐Val206 of GBD were included in all the calculations.
The simulated annealing of the GBD–DNA complex was carried out essentially in the same way as in that of GBD in the absence of the DNA. The initial structures of DNA were random arrays of atoms, as were the initial structures of GBD. In addition to the 1632 experimental constraints the following theoretical constraints were imposed: 72 constraints on the intra‐base pair hydrogen bonds in order to maintain the base‐pairing (Werner et al., 1995) with a force constant of 200 kcal/mol/Å²; 65 constraints on the rotation of bases around the intra‐base pair hydrogen bonds in order to ensure good stereochemistry in terms of propeller‐twisting and base‐rolling (Omichinski et al., 1997) with a force constant of 200 kcal/mol/rad²; and 122 constraints on the rotation of the atoms around the covalent bonds in the DNA backbones in order to prevent problems associated with the mirror image (Omichinski et al., 1993) with a force constant of 200 kcal/mol/rad².
Starting with 200 random structures, 25 structures were refined, where no distance violation in the protein structure was >0.2 Å, no distance violation in the DNA structure was >0.3 Å, no GBD–DNA distance violation was >0.3 Å, and no torsion angle violation was >3°. Although the rate of accepting structures after the simulated annealing of the complex of 12.5% was different from that of GBD in the absence of the DNA (57.5%), the calculation algorithm and the criteria for the selection of structures were kept essentially the same. In fact, the r.m.s. deviation values of GBD structures in the two ensembles are found to be similar, i.e. 1.21 ± 0.09 Å for all the heavy atoms of GBD in the complex with the DNA, and 1.32 ± 0.21 Å for those of GBD in the absence of DNA (Table I). The difference between the two acceptance rate values is likely to be due to the difference in molecular weight.
After energy minimization in the AMBER force field the total energy value of the minimized mean structure became significantly lower than the smallest among the values calculated with the ensemble structures. The minimized mean structure was submitted to the rMD calculation.
The rMD calculation was carried out by using the force field (Cornell et al., 1995) implemented in the AMBER package (Pearlman et al., 1994), imposing only the NOE distance constraints and no torsion angle or hydrogen bond constraints. A force constant of 30 kcal/mol/Å² was used in order to impose the NOE constraints. The net charge of the protein and the DNA was neutralized by counterions. Approximately 5000 water molecules were modeled using the TIP3P model (Jorgensen et al., 1983). Approximation of the electrostatic force by truncated Coulombic potentials, which ignores interactions beyond a chosen cut‐off length, can distort the molecular system significantly. In this study, long‐range electrostatic forces were taken into consideration by the particle mesh Ewald method (Darden et al., 1993). The lengths of covalent bonds to hydrogens were kept constant using the SHAKE method (van Gunsteren and Berendsen, 1977).
Trajectories were generated every 0.002 ps under the pressure of 1 atm. For the first 20 ps the co‐ordinates of the atoms in the protein and the DNA, and of the counterions were kept the same, while only the co‐ordinates of the atoms of the water molecules were changed. Temperature was kept constant at 300 K. For the following 6 ps only the co‐ordinates of counterions and water molecule atoms were changed for every 2 ps at 100, 200 and 300 K. For the following 8 ps the calculation was continued by changing the co‐ordinates of all the atoms every 2 ps at 100, 150, 200 and 250 K. The calculation was further continued for 302 ps at 300 K. The co‐ordinates of 200 complex structures were selected, that were produced at each 1 ps during the last 200 ps, and the mean co‐ordinates were subjected to energy minimization.
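The protocol can be summarized as a schedule of stages. The Python sketch below encodes one reading of the description, in which each listed temperature is held for 2 ps; the names and the labels for the mobile atoms are hypothetical and are not AMBER input keywords.

```python
TIME_STEP_PS = 0.002  # integration step, as quoted in the text

# (duration in ps, temperature in K, atoms allowed to move)
# Assumption: each listed temperature is held for 2 ps.
SCHEDULE = [
    (20.0, 300, "water"),
    (2.0, 100, "water+ions"),
    (2.0, 200, "water+ions"),
    (2.0, 300, "water+ions"),
    (2.0, 100, "all"),
    (2.0, 150, "all"),
    (2.0, 200, "all"),
    (2.0, 250, "all"),
    (302.0, 300, "all"),
]


def total_time_ps(schedule):
    """Total simulated time in ps."""
    return sum(duration for duration, _, _ in schedule)


def n_steps(duration_ps, dt_ps=TIME_STEP_PS):
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps / dt_ps)
```

Under this reading the stages add up to 20 + 6 + 8 + 302 = 336 ps, of which the last 200 ps, sampled at 1 ps intervals, supply the 200 structures that were averaged.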
Analysis of the structures
The secondary structural elements in GBD were identified on the basis of NOE profiles characteristic of the elements and the observed protection of the amide protons from the exchange to deuterons (Wüthrich, 1986). The φ and ψ dihedral angles were analyzed using the Procheck–NMR program (Laskowski et al., 1996). Diagrammatic representations were made using MOLSCRIPT (Kraulis, 1991). Electrostatic potentials were essentially calculated by the method of Guenot et al. (1994). Parameters for describing the geometry of dinucleotide steps in the DNA were calculated using a program from Babcock et al. (1994). The widths of the major and minor grooves were calculated according to the method of Suzuki and Yagi (1996).
The criterion used in this study for identifying an ionic interaction between Arg or Lys and a DNA phosphate was a distance of <5.0 Å from the side‐chain HN to the phosphate oxygen (Billeter et al., 1993). The criteria for an intermolecular hydrogen bond were a donor–acceptor distance of <3.4 Å and a donor–proton–acceptor angle of >110°, while the criterion for a hydrophobic interaction was a C–C distance of <4 Å (Chuprina et al., 1993). The criteria used for identifying hydrogen bonds in the protein were the same as those described by Brocklehurst and Perham (1993).
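These geometric criteria translate directly into threshold tests. The Python sketch below is illustrative; the function names are hypothetical, and the distances and angles are assumed to have been measured beforehand from the co‐ordinates.

```python
def is_ionic_contact(hn_to_phosphate_o):
    """Arg/Lys side-chain HN to phosphate oxygen distance (Å) below 5.0 Å."""
    return hn_to_phosphate_o < 5.0


def is_hydrogen_bond(donor_acceptor_dist, donor_proton_acceptor_angle):
    """Donor-acceptor distance below 3.4 Å and donor-proton-acceptor angle above 110 degrees."""
    return donor_acceptor_dist < 3.4 and donor_proton_acceptor_angle > 110.0


def is_hydrophobic_contact(carbon_carbon_dist):
    """C-C distance (Å) below 4 Å."""
    return carbon_carbon_dist < 4.0
```

For example, a donor–acceptor distance of 3.11 Å satisfies the distance criterion, so whether such a pair is counted as a hydrogen bond depends on the angle alone.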
The co‐ordinates of the structures determined and a list of the experimental constraints have been deposited in the Protein Data Bank (Bernstein et al., 1977). The accession code for the final structure of the GBD–DNA complex is 1GCC, while those for the GBD in the absence of DNA are 2GCC for the minimized mean structure, and 3GCC for the ensemble after the simulated annealing.
We thank Drs M.Iwakura and T.Takenawa for mass spectrometry measurements. This work was supported by the Center of Excellence, COE, program of the Science and Technology Agency, and the Core Research for Evolutional Science and Technology, CREST, program of the Japan Science and Technology Corporation.
- Copyright © 1998 European Molecular Biology Organization