The BRCT domain (BRCA1 C‐terminus), first identified in the breast cancer suppressor protein BRCA1, is an evolutionarily conserved protein–protein interaction region of ∼95 amino acids found in a large number of proteins involved in DNA repair, recombination and cell cycle control. Here we describe the first three‐dimensional structure and fold of a BRCT domain determined by X‐ray crystallography at 3.2 Å resolution. The structure has been obtained from the C‐terminal region of the human DNA repair protein XRCC1, and comprises a four‐stranded parallel β‐sheet surrounded by three α‐helices, which form an autonomously folded domain. The compact XRCC1 structure explains the observed sequence homology between different BRCT motifs and provides a framework for modelling other BRCT domains. Furthermore, the established structure of an XRCC1 BRCT homodimer suggests potential protein–protein interaction sites for the complementary BRCT domain in DNA ligase III, since these two domains form a stable heterodimeric complex. Based on the XRCC1 BRCT structure, we have constructed a model for the C‐terminal BRCT domain of BRCA1, which is frequently mutated in familial breast and ovarian cancer. The model allows insights into the effects of such mutations on the fold of the BRCT domain.
The BRCT domain is defined by distinct hydrophobic clusters of amino acids and is believed to occur as an autonomous folding unit of ∼95 amino acids. This domain is found in proteins involved in DNA repair, recombination and cell cycle control (Koonin et al., 1996; Bork et al., 1997; Callebaut and Mornon, 1997). It was first identified in the C‐terminal region of the breast cancer suppressor protein BRCA1 and was thus named BRCT domain (Koonin et al., 1996). Two mammalian proteins which contain BRCT domains and have functions in DNA repair, DNA ligase III and XRCC1 protein, bind each other strongly (Ljungquist et al., 1994; Caldecott et al., 1995) to form a heterodimer through specific interactions between their C‐terminal BRCT domains (Nash et al., 1997). Thus, the C‐terminal stretch of 96 amino acids of XRCC1 is necessary and sufficient to bind DNA ligase III efficiently. A second, more N‐terminal BRCT domain in XRCC1 interacts with poly(ADP‐ribose) polymerase (Masson et al., 1998). Similar results have been obtained for the C‐terminal BRCT region of DNA ligase IV, which forms a heterodimer with the XRCC4 protein and is involved in DNA double‐strand break repair (Critchlow et al., 1997). These data identify BRCT domains as protein–protein interaction entities, which can either bind different BRCT domains specifically or interact with other unknown protein folds.
XRCC1 (633 amino acids) has no known enzymatic activity (Thompson et al., 1990) but apparently functions as a scaffolding protein in the mammalian base excision‐repair pathway. Thus, XRCC1 promotes the efficiency of the repair process and serves to bring together DNA polymerase β and DNA ligase III, since these two enzymes do not interact directly (Kubota et al., 1996; Cappelli et al., 1997). As observed for several proteins involved in the correction of abasic sites in DNA, deletion of the XRCC1 gene in mice results in an embryonic lethal phenotype (Tebbs et al., 1996). This is consistent with an essential role in removal of endogenous DNA damage. However, two Chinese hamster ovary cell lines with mutations in XRCC1 have been isolated; these cells show reduced ability to join single‐strand breaks in DNA, with concomitant cellular hypersensitivity to ionizing radiation and alkylating agents (Thompson et al., 1990; Cappelli et al., 1997; Shen et al., 1998).
The BRCT domain was first identified by a database search, comparing regions of the BRCA1 protein with other available protein sequences (Koonin et al., 1996). The BRCA1 gene encodes a 220 kDa nuclear phosphoprotein in which structural changes confer susceptibility to familial breast and ovarian cancer (Miki et al., 1994). Thus, inherited mutations associated with loss of activity of BRCA1 result in a greatly increased risk of women developing breast cancer (Easton, 1997). The detailed molecular functions of BRCA1 are unknown, but recently BRCA1 has been implicated in transcriptional regulation and DNA repair (reviewed in Bertwistle and Ashworth, 1998). Specifically, the C‐terminal region of BRCA1 containing two BRCT domains acts as a transcriptional activation region in reporter assay systems (Chapman and Verma, 1996). BRCA1 co‐purifies with the RNA polymerase II holoenzyme complex (Scully et al., 1997a), and a direct interaction between the RNA helicase A component of the holoenzyme and BRCA1 has been detected (Anderson et al., 1998). Moreover, BRCA1 is phosphorylated in response to DNA damage and associates with hRAD51 (Scully et al., 1997b), the human homologue of the Escherichia coli RecA protein, which plays a key role in homologous recombination and post‐replication repair. Recent data indicate that BRCA1 co‐immunoprecipitates with p53, and can regulate p53‐mediated gene expression with an absolute requirement for the most C‐terminal BRCT domain, suggesting that BRCA1 functions as a coactivator of p53 (Ouchi et al., 1998). A unique involvement of the second of two C‐terminal BRCT domains in heterodimer formation and tight binding of its protein partner has also been observed for DNA ligase IV (Herrmann et al., 1998). 
Many different mutations in BRCA1 have been described [see the Breast Cancer Information Core (BIC) databases on the World Wide Web: http://www.nhgri.nih.gov/intramural_research/lab_transfer/bic], the positions of which have been correlated with a predisposition to breast and/or ovarian cancer (Gayther et al., 1995).
Here we report the first three‐dimensional structure and fold of a BRCT domain. This provides a structural basis to explain the observed sequence conservation of the BRCT family as well as giving insights into the function of the XRCC1 BRCT domain in terms of specific protein–protein interactions leading to intracellular stabilization of its DNA ligase III partner. Using the XRCC1 fold, we have constructed a model for the homologous BRCA1 C‐terminal BRCT domain to predict the structural consequences of cancer‐predisposing mutations found within the domain.
Results and discussion
The structure of the XRCC1 C‐terminal BRCT domain
The tertiary structure of the C‐terminal BRCT domain from human XRCC1 (residues 538–633; 96 residues) was solved by X‐ray crystallography to 3.2 Å resolution (see Table I and Materials and methods for details), using phases obtained from single isomorphous replacement, further improved through density averaging, and refined to a free R‐factor of 26.5% (Table I). The relatively large unit cell and highly hydrated nature of the crystals obtained were limiting factors in the resolution of the data. However, the final electron density map improved by 2‐fold non‐crystallographic‐symmetry averaging and solvent flattening showed clear density for main chain and most side chains. This allowed an unambiguous trace of a model containing residues 538–633. The structure forms a compact globular α/β domain (∼36 Å×26 Å×23 Å) consisting of a four‐stranded parallel β‐sheet (with strand order β2β1β3β4) surrounded by three α‐helices (α1‐α3; Figure 1A). The overall topology is β1α1β2β3α2β4α3, with two α‐helices (α1 and α3) on one side of the β‐sheet and the third helix (α2) on the other (Figure 1A). The β‐sheet forms the core of the structure, with helix α1 forming hydrophobic interactions with residues from β1 and β2. Helix α2 interacts with β4, also through hydrophobic interactions, and is stabilized further by a salt bridge (Glu52 from loop c3 to Arg71 from β4, residues numbered within the BRCT domain; Figure 2). Helix α3 contains the highly conserved core residue Trp74, which interacts with other conserved residues from β1, β3 and β4 as well as the C‐terminal segment (Figure 1B). Helices α1 and α3 form a two‐stranded helical bundle through interactions between residues Leu25, Tyr28, Val29 (α1) and Ile75, Tyr76 (α3). 
A search of the Dali database (Holm and Sander, 1993) identified several proteins, including bacterial chemotaxis factor CheY (PDB accession code 1Chy) and the energy‐coupling IIB enzyme (PDB accession code 1iib‐A), which have some similarity to the XRCC1 C‐terminal BRCT domain. However, there is no evidence for sequence homology between the XRCC1 BRCT domain and these structurally similar folds.
Sequence conservation among different BRCT domains
A multiple sequence alignment of a representative subset of BRCT sequences is shown in Figure 2, along with the secondary structure of the XRCC1 C‐terminal BRCT domain. Using the secondary structure as a guide, we have defined five conserved regions, N‐terminus/β1, α1/β2, β3, β4/α3 and the C‐terminus, with boundaries indicated by the coloured lines below the sequences (Figure 2). This slightly modifies the boundary definitions by Koonin et al. (1996) which were based on sequence homology within the BRCT family, although the overall secondary structure elements were predicted correctly for the core of the motif. The conserved hydrophobic amino acid clusters which define the family (Bork et al., 1997; Callebaut and Mornon, 1997) are located within the central β‐sheet, on α1 and α3 and at the N‐ and C‐termini of the domain (Figures 2 and 3). Residues which form helix α3 are highly conserved both on the surface of the structure and with regard to those residues which provide the interface with α1. Likewise, residues on α1 which interact with α3 (Leu25, Tyr28 and Val29) are conserved (Figure 2), suggesting that the two‐stranded helical bundle is an essential element of the BRCT domain. Trp74 is one of the three residues (Trp74, Asn33 and Gly34) which are the most invariant in the BRCT family (Figure 2 and Bork et al., 1997; Callebaut and Mornon, 1997). Trp74 is located on α3 at the centre of a highly conserved hydrophobic pocket (comprising Trp74, Phe6, Phe11, Phe46, Val70 and Leu84; Figure 1B) and forms interactions with Phe11 (β1), Phe46 (β3), Val70 (β4) and Leu84 (Figure 1B). Cys78 is another relevant residue (Figure 2; Bork et al., 1997), which specifically interacts with Trp74 (S‐Nϵ 3.3 Å) and the main chain amide of Leu85 (S‐N 3.4 Å), positioning the C‐terminal segment onto the core of the structure (Figure 3). 
It is interesting to note that a mutational change of this conserved Cys residue to Tyr in the other BRCT domain of the XRCC1 protein causes functional inactivation in vivo (Shen et al., 1998).
Sequences corresponding to the α2 region are the least conserved within the BRCT family, varying both in length (none to >20 residues) and amino acid composition (Figure 2; Bork et al., 1997; Callebaut and Mornon, 1997). In the domains which have deletions in α2, such as the DNA ligase III and RAP1 BRCT domains, Phe69 on β4, which is buried by Leu61 on α2 in XRCC1, is replaced by a hydrophilic residue (Gln in DNA ligase III or Thr in RAP1; Figure 2). Apart from α2, several other sequence insertions or deletions occur within the superfamily, which map to the surface loops connecting β1 and α1 (c1) and β2 to β3 (c2) respectively (Figures 1 and 2). None of the sequence variability within the BRCT family is likely to alter the overall core fold of the domain, comprising the central β‐sheet, α1 and α3, which is highly conserved.
A recent study by Caldecott and co‐workers (Taylor et al., 1998), using site‐specific mutagenesis and deletion analysis, aimed to identify regions within the XRCC1 BRCT domain required for interaction with DNA ligase III. Most of their results can be explained easily, since deleting parts of the BRCT domain would profoundly affect the overall fold and structure of the domain. Furthermore, substitution of Val47 Ile48 in β3 by Asp (Taylor et al., 1998; Figure 2) would probably prevent the correct folding of the domain, which correlates with the observation that this mutant cannot interact with DNA ligase III. However, Taylor et al. (1998) also reported the puzzling result that substitution of the centrally located and highly conserved Trp residue (Trp74, Figures 1B and 3) by Asp did not seem to affect the interaction with DNA ligase III. Given that Trp74 is at the centre of a highly conserved hydrophobic pocket at the core of the BRCT domain, the substitution with a charged residue is likely to have severe consequences on the folding of the domain, and further experimental evidence will be required to confirm and clarify the structural and functional effects of mutations at this site.
Dimer interface and potential protein–protein interaction sites
In the XRCC1 crystal, there are two BRCT domains in the asymmetric unit forming a dimer primarily through interactions between α1 and the N‐terminal region (Figure 4). Solvent‐accessible surface areas (SAAs) for the non‐crystallographic dimer were calculated, using the program ASA (A.Lesk, Cambridge; probe size of 1.4 Å), in order to assess the significance of the dimer interface. Five residues in the N‐terminal segment provide ∼30% of the total SAA at the dimer interface, whereas six residues in helix α1 (residues 23–31) followed by a turn of four residues (residues 32–35) provide ∼59% of the dimer SAA. The remaining area (∼11%) is provided by three residues after α3 (residues 79–81). The α1–α1 contact involves extensive interactions between residues from the monomers (designated A and B to distinguish between them), especially the salt bridge between Arg23A and Glu35B across the dimer interface (Figure 4). Another significant interaction in the dimer interface is the salt bridge between Asp4A and Arg27B (Figure 4).
The XRCC1 C‐terminal BRCT domain forms a specific heterodimer in vitro with the BRCT domain of DNA ligase III (Nash et al., 1997). Of the residues which make significant contributions to the XRCC1 non‐crystallographic dimer interface, Asp4 and Arg23 are conserved in the DNA ligase III BRCT domain (see Figure 2). Furthermore, the DNA ligase III counterparts of Arg27 and Glu35 (the Glu being replaced by an Asp) are shifted one residue towards the N‐terminus relative to XRCC1. In view of these similarities, it is reasonable to propose that some of the interactions in the XRCC1 dimer interface could be retained in a heterodimeric BRCT complex between XRCC1 and DNA ligase III. It is also notable that the subunit interface in the XRCC1 non‐crystallographic dimer covers 1306 Å² (653 Å² per subunit, representing ∼11% of the total SAA), a value which is reported to be significant for protein–protein interfaces (Janin, 1997). Thus, the XRCC1 non‐crystallographic dimer interface which we observe in the crystal structure may represent some aspects of the XRCC1–DNA ligase III BRCT interface, although further mutational and structural studies are necessary to confirm this notion.
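The interface figures quoted above follow from simple bookkeeping on solvent‐accessible areas. The sketch below is illustrative only and is not the ASA program used in this work: the function name is hypothetical, and the isolated‐subunit area is an assumption back‐calculated from the reported ∼11% figure rather than a measured value.

```python
def buried_area(sasa_a, sasa_b, sasa_complex):
    """Total surface (in square angstroms) buried when two subunits associate."""
    return sasa_a + sasa_b - sasa_complex

# Values reported in the text.
total_interface = 1306.0              # square angstroms, both subunits together
per_subunit = total_interface / 2.0   # 653.0 square angstroms

# ASSUMPTION: the accessible area of one isolated subunit is back-calculated
# from the statement that 653 square angstroms is ~11% of its total SAA.
monomer_sasa = per_subunit / 0.11
complex_sasa = 2.0 * monomer_sasa - total_interface

# Approximate shares of the interface by region, as quoted in the text.
contributions = {
    "N-terminal segment": 0.30,
    "helix alpha1 and following turn": 0.59,
    "residues after alpha3": 0.11,
}
region_areas = {name: share * total_interface
                for name, share in contributions.items()}

if __name__ == "__main__":
    print(buried_area(monomer_sasa, monomer_sasa, complex_sasa))
    for name, area in region_areas.items():
        print(f"{name}: ~{area:.0f} square angstroms")
```

Because the percentages in the text are rounded, the per‐region areas produced this way are approximate.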
The other highly conserved residues within the BRCT family comprise a double Gly–Gly motif (Asn33–Gly34 in XRCC1) which is located in a short loop/turn connecting α1 and β2, thereby allowing the main chain to reverse direction. There is also sequence preference at positions Gly8 (preceding β1) and Ser66 (turn between α2 and β4), suggesting that the geometry of these turns is retained. These residues form part of a relatively flat surface of the domain (Figures 3 and 5, bottom surface) which could be of functional importance for interaction with other proteins. Some of the conserved hydrophobic residues also form part of the BRCT domain surface (Figure 5). Noteworthy is the surface area comprising residues Lys9 (hydrophobic residue in most other BRCT domains; Figure 2), Phe46, Leu84 and Leu85. Since BRCT domains are found in proteins with diverse functions, it is possible that in addition to the observed XRCC1 BRCT dimer interface, BRCT domains may contain other protein–protein interaction sites specific to individual proteins.
BRCA1 BRCT model and the potential structural consequences of BRCT mutations
In the present project, we have studied the BRCT domain from XRCC1, since attempts at producing the corresponding BRCT domain from BRCA1 in quantities suitable for structural studies have so far been unsuccessful (data not shown). However, since a large number of BRCA1 single‐site mutations have been mapped to the BRCT sequences and the most C‐terminal BRCT domain appears to be of functional importance, a three‐dimensional model for this BRCT domain of BRCA1 was constructed based on its partial sequence identity to the C‐terminus of XRCC1 (see Materials and methods for details). As expected for this low level of sequence identity (Figure 2; 12% identity, 35% similarity), there are regions in BRCA1 that were difficult to model, in particular the alignment of residues within loop c2 (Lys1793–Pro1806) and α2 (Gln1814–Pro1831). However, the nature of the sequence conservation for the entire family implies that the overall predicted fold and the core area described here (Figure 6) are correct.
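Identity and similarity percentages such as those quoted above are obtained by counting matching columns in a pairwise alignment. The following is a minimal sketch under stated assumptions: the two aligned strings are invented toy sequences (not the XRCC1 or BRCA1 sequences), the similarity groups are one simple illustrative grouping, and gapped columns are excluded from the denominator, which is only one of several conventions in use.

```python
# ASSUMPTION: a simple illustrative grouping of similar amino acids.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "G", "P", "C"]

def _aligned_pairs(aln_a, aln_b, gap="-"):
    """Return residue pairs from columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    return [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]

def percent_identity(aln_a, aln_b):
    """Percentage of ungapped columns holding identical residues."""
    pairs = _aligned_pairs(aln_a, aln_b)
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def percent_similarity(aln_a, aln_b):
    """Percentage of ungapped columns holding identical or same-group residues."""
    pairs = _aligned_pairs(aln_a, aln_b)
    if not pairs:
        return 0.0
    def similar(a, b):
        return a == b or any(a in g and b in g for g in SIMILARITY_GROUPS)
    return 100.0 * sum(similar(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    seq_a, seq_b = "MKV-LAW", "MRVGLSW"
    print(round(percent_identity(seq_a, seq_b), 1))
    print(round(percent_similarity(seq_a, seq_b), 1))
```

The values reported in the text depend on the alignment in Figure 2 and on the similarity scheme used there, neither of which this sketch reproduces.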
The locations of mutations in this BRCA1 BRCT domain (obtained from BIC) are mapped onto the predicted structure (Figure 6) and shown in Table II. In addition, there are several other predisposing mutations that cause premature termination of the molecule due to frameshift and nonsense mutations which would clearly prevent formation of a fully folded BRCT domain (Figure 6). Two missense point mutations are known to be predisposing to cancer, and from their location in the model we can interpret the structural consequences of these substitutions (Table II). A Trp1837 (corresponding to residue 74 in the XRCC1 structure) to Arg mutation leads to this predominantly buried tryptophan in the protein core (conserved in nearly all BRCT family members; Figure 2) being replaced by a charged residue, and this substitution would probably prevent formation of a properly folded BRCT domain. The second point mutation, a substitution of an exposed Met1775(20) by Arg, is unlikely to prevent correct folding of the domain. Instead, this mutation, which is close to the dimer interface in the XRCC1 BRCT structure, may be involved in recognition of another protein molecule. A recent report on the association of BRCA1 with RNA helicase A supports this interpretation, since a Met1775(20) to Glu mutation reduces the binding stability of BRCA1 to RNA helicase A (Anderson et al., 1998). The predisposing in‐frame double residue deletion of conserved Val1809(48) and Val1810(49) within β3 (Figure 6) would also be likely to prevent correct BRCT domain folding. A number of unclassified mutant variants (in terms of cancer predisposition) of BRCA1 have also been identified (BIC database; Table II). Among these (Table II), at least four mutations would be expected to affect the integrity of the folding pattern of the BRCT domain and, consequently, would be likely candidates for cancer‐predisposing mutants. 
The Met1783(28) to Thr substitution changes a conserved hydrophobic residue mediating intramolecular helix–helix packing (α1–α3), and this alteration should prove detrimental to the protein fold. Furthermore, a Gly1788(33) to Val mutation would affect residues at a conserved turn region, and could result in incorrect folding. Two other mutations (Table II) may also have structurally deleterious consequences.
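The residue equivalences quoted in this section, given as the BRCA1 number followed by the XRCC1 BRCT position in parentheses, can be tabulated to show how the numbering offset changes along the domain. This is a bookkeeping sketch only; the variable names are illustrative and the pairs are copied from the text.

```python
# (BRCA1 residue number, XRCC1 BRCT-domain position), as quoted in the text.
equivalences = {
    "Met1775": (1775, 20),
    "Met1783": (1783, 28),
    "Gly1788": (1788, 33),
    "Val1809": (1809, 48),
    "Val1810": (1810, 49),
    "Trp1837": (1837, 74),
}

# Offset between the two numbering schemes at each aligned position.
offsets = {name: brca1 - xrcc1
           for name, (brca1, xrcc1) in equivalences.items()}

if __name__ == "__main__":
    for name, offset in offsets.items():
        print(f"{name}: offset {offset}")
```

The offset is not constant: it rises between positions 33 and 48 and again between positions 49 and 74. This would be consistent with BRCA1 carrying extra residues relative to XRCC1 in the intervening regions, which include loop c2 and α2, the regions noted as difficult to align.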
The structure of the C‐terminal XRCC1 BRCT domain reported here defines for the first time the overall fold of a BRCT protein module. The structure also provides a framework both to explain the sequence conservation within the BRCT family and to model other BRCT domains. Such a model for the C‐terminal BRCT domain of BRCA1 has allowed us to interpret a number of BRCA1 predisposing mutations in terms of their effects on the structure and folding of the BRCT domain. Furthermore, there are several unclassified BRCA1 BRCT mutations whose effects on cancer risk are unknown (BIC databases). We can now at least provide some assessment of the structural consequences of these mutations based on our BRCA1 BRCT model. An obvious next step in these investigations is to define the region of interaction between the two complementary BRCT domains of XRCC1 and DNA ligase III. To address this question, we are pursuing two parallel approaches, namely extensive site‐specific mutagenesis of XRCC1 BRCT surface residues, to identify the changes that disrupt the interaction with DNA ligase III, and co‐crystallization of the BRCT heterodimer complex. The present availability of the three‐dimensional structure of a member of the conserved BRCT domain family is an essential step in the unravelling of functional roles for these domains.
Materials and methods
The BRCT domain comprising the C‐terminal 96 amino acids of XRCC1 (residues 538–633) was overexpressed in E.coli fused to a C‐terminal FLAG octapeptide (Nash et al., 1997), purified by binding to an anti‐FLAG affinity column and eluted with FLAG peptide (Sigma Chemical Co.). Selenomethionine (Se‐Met)‐labelled protein was expressed in E.coli strain B834 grown in minimal medium supplemented with selenomethionine. The incorporation of selenium was confirmed by mass spectrometry. Se‐Met and native protein crystals were grown by vapour diffusion from solutions containing 4 M sodium formate, at a protein concentration of 3.5 mg/ml. The crystals belong to space group P3121 (a = b = 100.8 Å, c = 72.5 Å, α = β = 90°, γ = 120°), with two molecules in the asymmetric unit.
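The unit cell volume follows directly from the cell parameters above, and it is the starting point for estimating crystal packing density. The sketch below uses the general triclinic volume formula together with the conventional Matthews relations; the function names are illustrative, the constant 1.23 is the value conventionally used for proteins, and the molecular mass is deliberately left as a caller‐supplied parameter because it is not stated in the text.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg
                                 + 2 * ca * cb * cg)

def matthews_coefficient(volume, n_asymmetric_units, mass_per_asu_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return volume / (n_asymmetric_units * mass_per_asu_da)

def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / vm

def vm_from_solvent_fraction(fraction):
    """Inverse of solvent_fraction()."""
    return 1.23 / (1.0 - fraction)

# Cell parameters from the text; P3121 has six asymmetric units per cell.
volume = cell_volume(100.8, 100.8, 72.5, 90.0, 90.0, 120.0)
N_ASU = 6

if __name__ == "__main__":
    print(f"cell volume: {volume:.0f} cubic angstroms")
    # Matthews coefficient implied by a 68% solvent content.
    print(f"Vm at 68% solvent: {vm_from_solvent_fraction(0.68):.2f}")
```

To estimate the solvent content independently, matthews_coefficient() would be called with N_ASU and the combined mass of the two molecules in the asymmetric unit; the result depends on the mass assumed and need not reproduce a quoted solvent content exactly.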
Data collection and processing
All diffraction data were measured on single crystals. A Se‐Met ‘native’ data set was collected to 3.2 Å resolution (λ = 0.97 Å) at room temperature on a Mar image plate using beam line 9.5 at the Daresbury Synchrotron Radiation Laboratory. A platinum heavy‐atom derivative was prepared by soaking Se‐Met crystals (8 h) in 1 mM K2Pt(NO2)4. The platinum data set was measured to 3.5 Å resolution (λ = 0.97 Å) at 100 K on a Mar image plate using beam line W21 at LURE. Cryoprotectant conditions were achieved by adding glycerol and increasing its concentration from 5% (v/v) to 20% (v/v). Data were processed with DENZO and SCALEPACK (Otwinowski and Minor, 1997), and most subsequent calculations were carried out with the CCP4 program suite (CCP4, 1994). Data statistics are summarized in Table I.
Structural determination and refinement
Two Pt heavy‐atom sites were first located by manual inspection of isomorphous difference Patterson maps, and their parameters were refined using the maximum likelihood method as implemented in SHARP (De La Fortelle and Bricogne, 1997). Residual difference Fourier analyses revealed no further significant Pt sites. Single isomorphous replacement phases were calculated and improved by solvent flattening with SOLOMON (CCP4, 1994) using a solvent content of 68%. Subsequent electron density maps allowed an initial backbone trace to be fitted with the program O (Jones et al., 1991). Extra electron density was identified and a copy of the backbone model was fitted manually into this density, forming a dimer with the initial backbone model. The non‐crystallographic symmetry (NCS) 2‐fold axis and its transformation matrix were then obtained using the backbone dimer model and the function Lsq_explicit in program O (Jones et al., 1991). The electron density map was improved further by a combination of real space 2‐fold NCS averaging and histogram matching with DM (CCP4, 1994). The phases were then extended from 3.5 to 3.2 Å resolution using real space 2‐fold NCS averaging and solvent flattening. A model containing residues 538–633 of XRCC1 was then built into the resulting electron density map using program O (Jones et al., 1991). The model was refined using least‐squares minimization as implemented in X‐PLOR (version 3.84) (Brünger, 1996). This includes 150 steps of positional refinement followed by simulated annealing refinement with slow cooling, during which the temperature was decreased from 3000 to 300 K (Brünger, 1996). The temperature factors were refined with the restriction that adjacent atoms do not vary by more than 2σ. Strict NCS restraints were enforced throughout the refinement, which gives a data to parameter ratio of 1.9:1. Subsequent (2Fo – Fc) maps showed clear electron density for all residues, except the FLAG octapeptide which was not visible. 
The final model has a free R‐factor of 26.5% with good geometry, as detailed in Table I; 76% of the residues in the final model are in the most favoured region in the Ramachandran plot, with no residues found in disallowed regions. Coordinates are being deposited in the Protein Data Bank.
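For readers unfamiliar with the statistic, the crystallographic R‐factor compares observed and calculated structure factor amplitudes, and the free R‐factor applies the same formula to a test set of reflections withheld from refinement. The sketch below is a generic illustration rather than the X‐PLOR implementation: the function names and the toy amplitudes are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the supplied reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def split_r_factors(f_obs, f_calc, is_test):
    """Return (working R, free R) given a per-reflection test-set flag."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    free = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*free))
    return r_work, r_free

if __name__ == "__main__":
    # Hypothetical toy amplitudes, for illustration only.
    f_obs = [100.0, 50.0, 25.0, 25.0]
    f_calc = [90.0, 60.0, 20.0, 30.0]
    flags = [False, False, False, True]
    print(r_factor(f_obs, f_calc))
    print(split_r_factors(f_obs, f_calc, flags))
```

Because the test‐set reflections play no part in refinement, the free R‐factor is less susceptible to overfitting than the working R‐factor, which is relevant at a data to parameter ratio as low as the one reported here.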
Comparative modelling of the BRCA1 C‐terminal BRCT domain
The model of the C‐terminal BRCA1 BRCT domain was constructed from a single template, namely the XRCC1 BRCT structure. Loops and regions with incompatible φ/ψ angles to the template were replaced by database searches as described in Bates et al. (1997). Manual intervention was needed if candidate fragments could not be found to cover a region. A number of fragment conformations were selected for each gap, and the best candidate was chosen from the ensemble by a modification of the self‐consistent mean field approach to gap closure (Koehl and Delarue, 1995). Side chain rotamers were assigned initially by tracing the path of the template side chain as far as possible (Bates et al., 1997). After the replacement of all side chains, extra conformers from a side chain rotamer library were built at each residue position. The best conformer was then selected via a second mean field refinement, where each conformer feels the average environment due to conformers of other residues weighted by their respective probabilities (Koehl and Delarue, 1994). Energy parameters were taken from Lee and Subbiah (1991). To remove the small number of steric clashes remaining in the model, 100 steps of steepest descents were run using program CHARMM (v 3.3; Molecular Simulations Inc. 200 Fifth Avenue, Waltham, MA). The Protein health checks option in the program QUANTA (v 3.3; Molecular Simulations Inc.) was used to check the general packing quality of the protein core. The model passed all filters such as excess volume within the core and close contacts. The program PROCHECK (Laskowski et al., 1993) was also used to check stereochemical quality of the model (see Martin et al., 1997, for a detailed assessment of comparative modelling).
We are grateful to the staff of LURE (Orsay) for making station W21 of LURE‐DCI available to us. The Se‐Met data sets were collected at Daresbury as part of Daresbury's protein crystallography data collection trial service. We thank Graeme Card, Pawel Dokurno, Mike Gorman, Suhail Islam, Richard Bowater and John Sgouros for helping with this project. We also thank Dinah Raman and Darryl Pappin for mass spectrometry and sequencing analyses.
Copyright © 1998 European Molecular Biology Organization