The crystal structure of domain II of duck carboxypeptidase D, a prohormone/propeptide processing enzyme integrated in a three repeat tandem in the natural system, has been solved, constituting a prototype for members of the regulatory metallocarboxypeptidase subfamily. It displays a 300 residue N‐terminal α/β‐hydrolase subdomain with overall topological similarity to and general coincidence of the key catalytic residues with the archetypal pancreatic carboxypeptidase A. However, numerous significant insertions/deletions in segments forming the funnel‐like access to the active site explain differences in specificity towards larger protein substrates or inhibitors. This α/β‐hydrolase subdomain is followed by a C‐terminal 80 residue β‐sandwich subdomain, unique for these regulatory metalloenzymes and topologically related to transthyretin and sugar‐binding proteins. The structure described here establishes the fundamentals for a better understanding of the mechanism ruling events such as prohormone processing and will enable modelling of regulatory carboxypeptidases as well as a more rational design of inhibitors of carboxypeptidase D.
The metal‐dependent carboxypeptidase (CP) family currently contains ∼25 members, which can be subdivided into two subfamilies, the digestive enzymes and the regulatory enzymes (Fricker, 1988; Skidgel, 1988, 1996; Rawlings and Barrett, 1995). The latter are involved in more selective processing reactions than the mere digestion of intake proteins. Within each group, the members have 25–63% amino acid sequence identity, but it decreases to only 15–25% when comparison is performed between subfamilies. Among the digestive CPs, pancreatic CPA1, CPA2, CPB, mast cell CPA and plasma CPB (also known as CPU) have been described (Avilés et al., 1993). These proteins have a CP domain of ∼300 amino acids and are secreted as inactive zymogens containing a 90–95 amino acid N‐terminal prosegment (Avilés et al., 1993). The regulatory CPs include CPD, CPE, CPM, CPN, CPZ, and proteins designated CPX‐1, CPX‐2 and AEBP‐1 (He et al., 1995; Song and Fricker, 1995, 1997; Skidgel, 1996; Xin et al., 1998; Lei et al., 1999). These enzymes perform a variety of important cellular functions, including prohormone processing, regulation of peptide hormone activity and alteration of protein–protein or protein–cell interactions (Skidgel, 1988, 1996; Fricker, 1991). Knowledge about the sequence and function of these CPs has increased dramatically over the past two decades as a consequence of the research on biologically active peptides and the discovery that some of these hydrolases are involved in their processing (Naggert et al., 1995; Fricker et al., 1996).
All members of the regulatory subgroup have a conserved region of ∼400 residues, suggesting the presence of a second smaller subdomain of ∼100 amino acids following the 300 amino acid catalytic subdomain. The active site, metal binding and substrate binding residues of pancreatic CPA/CPB are generally conserved in most members of the regulatory group, suggesting a common fold for the catalytic subdomains. However, CPX‐1, CPX‐2, AEBP‐1 and the third repeat of CPD lack several of these residues critical for proteolytic activity. They have been proposed to function as binding proteins rather than active enzymes.
CPD is a 180 kDa single‐chain protein containing three tandem repeats of ∼390 amino acids linked by short bridge regions, which are followed by a transmembrane domain and a short 60 residue sequence that forms the cytosolic tail (Kuroki et al., 1995; Tan et al., 1997; Xin et al., 1997). The first two repeats are catalytically active and have ∼40–50% amino acid identity with other members of the regulatory subfamily (Tan et al., 1997; Xin et al., 1997). CPD was initially described and characterized from cattle (Song and Fricker, 1995). The duck homolog, gp180, was identified by its ability to bind the preS envelope protein of duck hepatitis B virus particles (Kuroki et al., 1995). Recently, the rat and human homologs were cloned and sequenced (Tan et al., 1997; Xin et al., 1997). A comparison of the human and rat enzymes with duck gp180 reveals 66, 83 and 82% sequence identity among the three repeats. The second repeat is the most conserved of the two catalytically active repeats and was therefore chosen as the subject of this study. The third repeat is also highly conserved, implying that, albeit being probably catalytically unfunctional, this region may play an important role, either in the secretory pathway or on the cell surface, such as binding an endogenous preS‐like factor (Eng et al., 1998). Furthermore, a two‐repeat Drosophila homolog has been found as the silver gene product, associated with embryogenesis (Settle et al., 1995), and a four‐repeat form has been described for Aplysia (Fan et al., 1999).
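The identity percentages quoted above depend on how aligned positions are counted. The following is a minimal sketch of one common convention, in which identical residues are counted over the alignment columns where neither sequence has a gap. The function name and the toy sequences are hypothetical illustrations; they are not taken from the CPD alignment, and the convention actually used for the published values is not stated in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical sequences, for illustration only)
toy_a = "HSGEWAT-LK"
toy_b = "HSGDWASQLR"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the denominator should always be reported alongside the percentage.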
CPD removes C‐terminal basic residues and prefers an alanine in the penultimate position. The majority of proteolytic activity has been reported to be membrane‐bound in bovine and is only extracted in appreciable amounts using detergent. The optimal pH is 5.5–6.5 and the activity is stimulated by cobalt ions and inhibited by zinc chelating agents (Song and Fricker, 1995). The wide distribution of CPD in virtually all tissues examined (Song and Fricker, 1995, 1996; Tan et al., 1997), and the enrichment in the trans‐Golgi network and immature secretory vesicles, suggest a role for this enzyme in the processing of many prohormones/proproteins that transit either the regulated or constitutive secretory pathways by removing basic C‐terminal residues following the action of furin and related endopeptidases. In addition, CPD could potentially function in endosomes, cleaving peptide hormones such as bradykinin and epidermal growth factor after receptor‐mediated endocytosis (Tan et al., 1997). The re‐uptake of CPD from the cell surface may furthermore contribute to the pathogenesis of hepatitis B in the duck (Eng et al., 1998). Given that CPD seems to be the primary cellular receptor for viral infection with hepatitis B viruses (Breiner et al., 1998; Urban et al., 1999), its structural knowledge could facilitate the design of antivirals.
Although crystal structures of metallocarboxypeptidases are available for both pancreatic CPAs and CPBs and their proforms (Rees et al., 1983; Coll et al., 1991; Guasch et al., 1992; Gomis‐Rüth et al., 1995; García‐Sáez et al., 1997), for CPT from Thermoactinomyces vulgaris (Teplyakov et al., 1992), CPG2 from Pseudomonas (Rowsell et al., 1997) and Streptomyces albus muramoyl‐pentapeptide CP (Dideberg et al., 1982), no structure has been published for a member of the regulatory group. The low degree of amino acid sequence identity and the large number of insertions/deletions (see Figure 1) observed when aligning the sequences of the latter with those of proteins of known three‐dimensional structure do not permit accurate conclusions to be drawn about the structural determinants of the cleavage mechanism and inhibition of members of the regulatory group, nor do they permit accurate modelling. Furthermore, the folding of the C‐terminal subdomain within each repeat is unknown. In order to address these issues we have undertaken the three‐dimensional structure analysis of gp180 carboxypeptidase D domain II (CPD‐2) and describe here the main structural features.
Results and discussion
For the structural studies, CPD‐2 was expressed in a Pichia pastoris system, which had previously been used to express large amounts of digestive CPs (Reverter et al., 1998; Ventura et al., 1999), using a similar construct to that previously employed for expression in the baculovirus system (Eng et al., 1998). The P.pastoris system provided an efficient system for the production of tens of milligrams of CPD‐2 protein per litre of culture, and rendered an enzyme with catalytic properties close to those of the full‐length protein.
Overall structure of CPD‐2
The CPD‐2 polypeptide chain is defined from residue Gln4 to Thr383 (for numbering, see Figure 1 and Materials and methods) and is folded into two distinct subdomains, the catalytic CP subdomain (Gln4–His303) and the C‐terminal subdomain (CTSD; Arg304–Thr383). It is worth mentioning that the sequence of the C‐terminal subdomain is highly conserved among all known regulatory carboxypeptidases and is thus present in the three duck CPD repeats.
The former, of approximate overall dimensions 50 × 50 × 40 Å, is formed by a doubly wound eight‐stranded β‐sheet of mixed topology (Figure 2A and B), and strand connectivity +1,+2,−1x,+2x,+2,+1x,−2 (Richardson, 1981) twisted for ∼125°. The central sheet (strands I–VIII) is flanked on both sides by three and six helices, respectively (Figure 2A and B). This folding topology is in accordance with the α/β‐hydrolase fold (Ollis et al., 1992) or PLEES‐protein family fold (Puente and López‐Otín, 1997) already observed in other metallocarboxypeptidases (see below).
The CTSD consists of 80 amino acids and displays a rod‐like shape with approximate overall dimensions 25 × 25 × 40 Å, with its N‐ and C‐terminus located on opposite sides of the rod (Figures 2A and 3A). It forms a β‐barrel or β‐sandwich of prealbumin‐like folding topology made up by seven strands (strands IX–XV) connected by short loops, with the first four strands displaying a Greek‐key topology. The β‐barrel can be interpreted as composed of two parallel sheets, a three‐stranded one comprising strands IX, XII and XV (see Figure 2B), and a four‐stranded one encompassing strands X, XI, XIII and XIV. The structure of this subdomain is maintained by a buried hydrophobic cluster running across the whole CTSD made up by the side chains of Ile306, Val310, Ile319, Ala322, Ile324, Val326, Ala327, Ile329, Val333, Thr335, Tyr341, Leu345, Tyr350, Val352, Ala354, Tyr359, Val362, Val366, Val368, Val374, Val376, Phe370 and Leu380 (Figure 3A). CTSD shares topological similarity and connectivity with transthyretin (prealbumin; PDB access code 1etb; Blake and Oatley, 1977) over 78 backbone Cα atoms (r.m.s.d. 2.4 Å) despite low sequence similarity (18% identity) and despite lacking an eighth strand present in the latter structure. Transthyretin, one of the most strongly conserved plasma proteins, is associated with several forms of amyloidosis (Blake and Oatley, 1977). It binds to and distributes thyroid hormones and also forms a complex with retinol‐binding proteins. Further structural relatedness of CTSD due to superimposable β‐strands and connectivity is found with a 70 residue subdomain (r.m.s.d. 2.5 Å) of cyclodextrin glucanotransferase (Harata et al., 1996; PDB access code 1pam), a 71 amino acid (r.m.s.d. 2.6 Å) glucoamylase fragment (Sorimachi et al., 1997; PDB access code 1kum), and a subdomain of protocatechuate 3,4‐dioxygenase (Orville et al., 1997; PDB access code 3pch), despite some larger loop insertions in the latter (76 topologically equivalent Cα atoms, r.m.s.d. 2.4 Å).
Three N‐linked glycosylation sites have been found in CPD‐2 at Asn136, Asn321 and Asn377 (see Figures 2A, B and 3A), in accordance with the sequence‐based prediction for this domain in the duck (Kuroki et al., 1995). For the human homologue the same sites have been proposed, as well as a further unique one (Tan et al., 1997). All sites are surface located, freely exposed to solvent and not involved in inter‐ or intra‐subdomain stabilization. One site has been localized in the connecting segment between helices D and E of the CP subdomain, not far away from the active site cleft. Two more sites are observed in the CTSD, on opposite sides of the longitudinal surface of the rod.
Three sulfate anions have been assigned in the CP subdomain based on the electron density. Their presence is chemically reasonable due to the excess of this anion in the crystallization solution. SO4998 is located in the active site cleft (Figure 2A) as in carboxypeptidase T (CPT; Teplyakov et al., 1992). SO4997 is placed on the surface in the vicinity of the N‐terminus and is coordinated to both Arg292 and Arg293 side chains. Finally, SO4996 is charge compensated by Lys223 Nζ and two solvent molecules (Wat642 and Wat644). The only disulfide bridge (out of a total of four cysteine residues present in the structure) is observed in the catalytic moiety in the neighbourhood of this latter sulfate anion, SO4996, and is established between Cys230 and Cys275. It connects loop αGβVII with loop βVIIIαI contributing to anchor the chain segment Tyr225–His241, a surface‐located loop unique to regulatory CPs (see below), to the main subdomain body. Cys230 and Cys275 are conserved in all members of the regulatory CP subfamily.
CTSD is connected to the CP subdomain via helix I, which belongs to the latter and is bent by ∼40° along its length, and is attached to the N‐edge of the central β‐sheet of the CP subdomain with the β‐barrel forming strands almost perpendicular to the former (Figure 2A). The interactions between both subdomains are mainly of a hydrophobic nature, burying a surface of 827 Å² (calculated employing a probe radius of 1.6 Å). A total of 48 van der Waals interactions (<4 Å), nine hydrogen bonds and one (double) salt bridge (Asp206–Arg343) have been observed. The main segments implicated are the loops or connecting segments βIIβIII, αEβV, βVIαF (and the subsequent beginning of helix αF), αHβVIII (and the end of helix αH) of the CP subdomain, and strands βIX, βXI, βXII and βXV and loops βXβXI and βXIβXII of CTSD.
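A count of close inter‐subdomain contacts such as the one given above can be obtained by enumerating atom pairs, one atom from each subdomain, that lie within a distance cutoff. The sketch below illustrates this with a 4 Å cutoff, as quoted in the text. The function name and the toy coordinates are hypothetical, and the sketch makes no attempt to separate van der Waals contacts from hydrogen bonds or salt bridges, which requires atom‐type information not modelled here.

```python
import math


def count_contacts(coords_a, coords_b, cutoff=4.0):
    """Count atom pairs (one atom from each set) closer than `cutoff` angstroms."""
    n_contacts = 0
    for xa, ya, za in coords_a:
        for xb, yb, zb in coords_b:
            distance = math.sqrt((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2)
            if distance < cutoff:
                n_contacts += 1
    return n_contacts


# Toy coordinates in angstroms (hypothetical, not from the CPD-2 model)
subdomain_1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
subdomain_2 = [(3.0, 0.0, 0.0), (0.0, 3.9, 0.0), (20.0, 0.0, 0.0)]
print(count_contacts(subdomain_1, subdomain_2))
```

The all‐pairs loop is quadratic in the number of atoms, which is acceptable for two subdomains of this size; larger systems would normally use a spatial grid or neighbour list.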
The active site
The active site cleft is placed at the C‐edge of the central part of the β‐sheet of the catalytic moiety. The catalytic zinc ion is pentahedrally coordinated by His74 Nδ1, His181 Nδ1, both Glu77 Oϵ1 and Oϵ2, and solvent molecule Wat601 (see Figure 3B), which in turn is coordinated to the general base Glu272 (Wat601–Glu272 Oϵ1, 2.94 Å). In the absence of a substrate, a sulfate anion (SO4998) occupies the position of the C‐terminal carboxylate group of a substrate. Based on previous studies performed with CPs and their zymogens (Christianson and Lipscomb, 1989; Coll et al., 1991; Kim and Lipscomb, 1991; Avilés et al., 1993; García‐Sáez et al., 1997), the residues essential for catalysis have been identified.
The S1′ subsite (nomenclature according to Schechter and Berger, 1967), which forms the enzyme specificity pocket and the C‐terminal anchoring of the substrate, is designed to accommodate positively charged bulky residues at the substrate C‐terminus in a fashion similar to CPB (Coll et al., 1991). This pocket is lined by residues Gly246–Gln257, Asn188–Asp192 and Phe267–Thr270. The electronegative character of this pocket is provided by Asp192, whose side chain points towards the interior of the subsite. This key residue is vicinal to the only cis‐peptide bond found in the structure (Pro190–Phe191), which may adopt this unusual conformation to keep the acidic side chain in its appropriate position. This feature is favoured by the establishment of an inter‐main chain hydrogen bond between Phe191 N and Ser202 O and a further one between Pro190 O and Ser204 Oγ, both incompatible with a trans‐peptide bond between Pro190 and Phe191. The specificity pocket is further closed by the aromatic ring of Tyr250 (equivalent to Tyr248 in CPA), fixed in the ‘down’ conformation due to the presence of the sulfate anion as observed in other CPs with an ‘occupied’ pocket (Teplyakov et al., 1992; García‐Sáez et al., 1997), and by Asn144, whose Nδ2 atom would additionally coordinate a C‐terminal carboxylate oxygen from a substrate. This latter group would be further anchored to the protein moiety through the side chains of Arg145 and Arg135, which in our structure coordinate the sulfate anion. The latter arginine (equivalent to Arg127 in CPA) would also be in charge of polarization of the scissile carbonyl and subsequent stabilization of the transition state following a general base mechanism (Christianson and Lipscomb, 1989).
The S1 subsite is shaped by Gly182–Ser184, Glu272 (Glu270 in CPA), Tyr250 (which is probably involved in hydrogen‐bonding of the P1 amide nitrogen through its OH group), and notably by Trp249, which strongly restricts the size of this site and explains the enzyme preference for an alanine in P1. The S2 and S3 subsites involved in substrate binding and torsion are constituted by Arg135, Arg145, Asp142, Lys277 (putatively involved in P2 carbonyl oxygen binding) and Gly130, Gly131, Tyr234, respectively.
The formation of the ‘oxyanion hole’ for the scissile carbonyl oxygen might be promoted by Asp142, hydrogen‐bonded to the zinc ligand His74 side chain and salt‐bridged to Arg135, as pointed out previously for CPT and digestive CPs (Phillips et al., 1990; Teplyakov et al., 1992).
Detailed comparison of CPD‐2 catalytic subdomain with other proteases
The CPD‐2 catalytic moiety shares the central twisted eight‐stranded β‐sheet flanked by α‐helices with other proteases displaying the α/β‐hydrolase fold (Ollis et al., 1992; Puente and López‐Otín, 1997) that belong to the cysteine‐, serine‐ and metalloprotease families. In many of them, the topology and strand connectivity is the same for the first six strands, all of them parallel except strand II. The active sites, despite differences in the nature of their constituting residues and hydrolytic mechanisms, reside in equivalent loci on the C‐terminal site of the β‐sheet and close to the end of strand V (see Figure 2B). Interestingly, most of them are exopeptidases chopping off either N‐ or C‐terminal residues from substrates. Key residues for catalysis are provided by loops connecting the regular secondary structure elements. The orientation of the last two β‐strands enables a further subdivision of these α/β‐hydrolases: (i) parallel to the previous four in prolyl iminopeptidase (Medrano et al., 1998), prolyl oligopeptidase (Fülöp et al., 1998), wheat and yeast serine carboxypeptidase (Liao et al., 1992; Endrizzi et al., 1994; these two display two additional strands at the outer end of the sheet, inserted between strands VII and VIII; see Figure 2B); and (ii) antiparallel in CPD, CPA/CPB (Rees et al., 1983; Coll et al., 1991; Guasch et al., 1992; García‐Sáez et al., 1997), CPT (Teplyakov et al., 1992), pyrrolidone carboxypeptidase (Singleton et al., 1999; this structure lacking the first two strands), Aeromonas proteolytica aminopeptidase (Chevrier et al., 1994), Streptomyces griseus aminopeptidase (Greenblatt et al., 1997), leucine aminopeptidase (Burley et al., 1990) and carboxypeptidase G2 (Rowsell et al., 1997).
CPD‐2 catalytic subdomain displays closest topological similarity to both CPT and CPA/CPB. The main chain atoms of all three exopeptidases can be superimposed fairly well over most parts of the structures as denoted by r.m.s.ds of 1.6 Å over 264 topologically equivalent Cα atoms of CPD‐2 and CPA deviating less than 3.5 Å, of 1.4 Å superimposing CPD‐2 and CPT (261 common Cα atoms), and of 1.5 Å comparing CPD‐2 and CPB (267 common Cα atoms). All regular secondary structure elements of CPD‐2, except helix G, lie in regions topologically equivalent to CPA and CPT (Figure 1). This similarity is nonetheless lower than that observed between CPA and CPT (1.2 Å; 290 common Cα atoms). The common fold and equivalence in the position of the residues described to be crucial for catalysis (Figures 1 and 4A, C) is further underlined by the presence of a cis‐peptide bond at an equivalent position, Pro190–Phe191 in CPD‐2, Pro213–Tyr214 in CPT and Pro205–Tyr206 in CPA/CPB. However, CPT displays three additional cis‐peptide bonds, while CPA/CPB have two more (Rees et al., 1983; Coll et al., 1991; Teplyakov et al., 1992). The most important differences between the catalytic subdomains can be observed in the chain segments that form the funnel‐like access to the active site cleft. Segment Ser124–Val133 is folded inwards over the molecular body in CPD, partially covering the access to the active site (top centre of Figure 4A and C). On another side of the rim of the active site, an important deletion is observed between Asp149 and Pro158 in the regulatory enzyme (Figure 1), equivalent to Ala149 and Ser172 in CPA (top, facing side in Figure 4A and C). Furthermore, the adjacent loop (Ser131–Val139) in CPA is cross‐connected to the region Ala149–Ser172, whereas no equivalent crossconnection is present in CPD. 
Importantly, these two regional differences discussed previously clearly distinguish CPD from other known CPs since the affected loops are almost perfectly superimposable when comparing pancreatic and bacterial CPs. The most significant structural difference is found, however, in the unique insertion of segment Tyr225–His241 at the beginning of strand VII, a structural feature further shaping the funnel border.
All these differences suggest that CPD‐2 may have a distinct selectivity towards larger substrates. For instance, the potato carboxypeptidase inhibitor, which forms a tight complex with CPA, could not bind CPD‐2 due to severe steric hindrance (see Figure 4C). Another interesting point is that CPD‐2 does not display a surface complementarity to a prosegment, as observed in the zymogens of CPA and CPB, consistent with the absence of such a propart in CPD. In CPA and CPB (Rees et al., 1983; Coll et al., 1991; Guasch et al., 1992), the prosegment has been proposed to perform a cotranscriptional chaperone function required for proper folding of the whole zymogen (Ventura et al., 1999). In contrast to the digestive CPs, which must be stored before secretion in an environment where their activity is suppressed, regulatory CPs are produced directly in the appropriate tissues.
Although the residues responsible for catalysis are mainly conserved (Figure 1), a closer inspection of the active sites reveals that the P1 carbonyl oxygen stabilizing Arg71 (in CPA and CPT) is functionally replaced in CPD by Lys277, in approximately the same place as Phe279 in CPA/T (the corresponding Cα atoms are just 2 Å away), and whose Nζ atom is directed in our unliganded structure to meet a hypothetical P1 carbonyl oxygen. Further differences arise when this detailed inspection is extended to the specificity pocket. In CPB, which displays a preference for basic C‐termini similar to CPD, the electronegative character is provided by Asp255 (CPB numbering; Coll et al., 1991; PDB access code 1nsa), at a position equivalent to Gln257 in CPD. In contrast, this role is carried out by Asp192 in the latter (Ser207 in CPB), despite the almost perfectly superimposable peptide chain folding in this region. The size of the pocket is also distinct: Thr194, Leu203, Gly243, Tyr248, Ala250, Gly253, Ser254, Asp255 and Thr268 render in CPB a pocket somewhat larger than the topologically equivalent Asn179, Asn188, Gly246, Tyr250, Val252, Gly255, Met256, Gln257 and Thr270 in CPD‐2.
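The r.m.s.d. values reported in this section are computed over topologically equivalent Cα atoms, with the CPD‐2/CPA comparison restricted to pairs deviating less than 3.5 Å. A minimal sketch of that last step is shown below. It assumes that the two structures have already been superimposed and that the Cα atoms have already been paired, neither of which the code does; the function name and the toy coordinates are hypothetical. The actual superpositions were performed with Turbo‐Frodo, whose procedure is not reproduced here.

```python
import math


def rmsd_equivalent(coords_a, coords_b, cutoff=3.5):
    """Return (rmsd, n_pairs) over paired atoms deviating less than `cutoff` angstroms.

    Both coordinate lists must already be superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be paired one-to-one")
    squared_deviations = []
    for point_a, point_b in zip(coords_a, coords_b):
        d2 = sum((u - v) ** 2 for u, v in zip(point_a, point_b))
        if d2 < cutoff ** 2:
            squared_deviations.append(d2)  # keep only equivalent pairs
    if not squared_deviations:
        raise ValueError("no atom pairs within the cutoff")
    rmsd = math.sqrt(sum(squared_deviations) / len(squared_deviations))
    return rmsd, len(squared_deviations)


# Toy paired coordinates in angstroms (hypothetical, for illustration only)
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 5.0)]
rmsd, n_pairs = rmsd_equivalent(model_a, model_b)
print(f"r.m.s.d. {rmsd:.2f} A over {n_pairs} pairs")
```

Because the cutoff removes the most divergent pairs, an r.m.s.d. of this kind is only meaningful when quoted together with the number of atoms included, as is done throughout the text.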
The structure of CPD‐2, a prototype for the regulatory metallocarboxypeptidase subfamily, reveals that its CP subdomain, despite overall structural similarity to pancreatic CPs, possesses distinct features that affect the access to the active site cleft and might be responsible for a sharp selectivity on natural peptidic substrates. The P1 carbonyl oxygen stabilizing residue Arg71 from the pancreatic forms is replaced by an asparagine in CPD‐2, and the function of the former is probably performed by Lys277. The structure described here provides an important tool for the computational‐based modelling of other regulatory carboxypeptidases (unfeasible until now) and for the development of CPD inhibitors for a variety of applications including antivirals and antibiotics. The 80 residue C‐terminal subdomain, whose function has still to be fully confirmed, has been unveiled to display a seven‐stranded β‐sandwich topology.
Materials and methods
Cloning, expression and purification of CPD
To obtain large amounts of duck CPD domain II, the high level eukaryotic P.pastoris expression system (Invitrogen) was chosen as previously described for human proCPA2 (García‐Sáez et al., 1997; Reverter et al., 1998). Briefly, a 1199 bp SnaBI–StuI cDNA fragment encoding most of duck CPD domain II was inserted into the SnaBI site of pPIC9 P.pastoris expression system. Then a 144 bp PCR XhoI–SnaBI fragment, containing the 5′ end of CPD domain II and vector sequences to encode an α‐factor KEX2 cleavage site, was subcloned into the XhoI–SnaBI site. The CPD domain II sequence was confirmed by the DNA sequence facility of the Albert Einstein College of Medicine. The protein was expressed in the His+Muts P.pastoris strain KM71. The protein produced was detected by an antiserum against full‐length duck CPD using Western blot analysis. Purification was achieved by adjusting the expression media to pH 5.5 with 0.1 M acetic acid and applying onto a p‐aminobenzoyl‐Arg Sepharose 6B substrate affinity column as previously described (Song and Fricker, 1995). CPD‐2 was eluted with 50 mM Tris–HCl pH 8.0, 100 mM NaCl and 0.01% CHAPS containing 25 mM arginine. The examination of carboxypeptidase enzymatic activity of the purified protein was performed as previously reported (Song and Fricker, 1995).
Crystallization and data collection
Cubic crystals (P213; a = 135.54 Å) were obtained at 20°C using the sitting‐drop vapour diffusion method with Cryschem dishes (Charles Supper, MA, USA) from drops consisting of equal volumes of protein solution (10 mg/ml in 5 mM Tris–HCl pH 8.0, 50 mM NaCl, 0.5 mM benzamidine) and a 2.1 M ammonium sulfate, 0.1 M sodium acetate pH 5.2 precipitating solution. Drops were allowed to equilibrate against 300 μl of the latter. Crystals contain one protein monomer per asymmetric unit (VM = 4.6 Å³/Da; Matthews, 1968; 73% solvent content) and were harvested with 2.5 M ammonium sulfate, 0.1 M sodium acetate pH 5.2 to permit manipulation. An isomorphous mercury derivative (a = 136.14 Å) was prepared by soaking native crystals for 23 h in harvesting buffer further containing 10 mM o‐phenanthroline in order to extract the catalytic zinc ion and then for an additional 32 h in harvesting buffer containing 0.5 mM mercury acetate. A cryoprotecting protocol was worked out consisting of successive equilibration of the crystals for 10–15 min against harvesting solutions with increasing glycerol concentrations (5–20%). Diffraction data were collected from crystals cryocooled to 100 K at EMBL beamlines X11 and BW7B, DESY (Hamburg), on MAR Research image plate area detectors. Data were processed with MOSFLM v. 6.0 (Leslie, 1991) and SCALA from the CCP4 suite (CCP4, 1994). Table I provides a summary of data collection and processing.
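The Matthews coefficient and solvent content quoted above follow from the cell volume, the number of molecules in the cell and the molecular mass. The sketch below reproduces the arithmetic under two assumptions that are not stated in the text: a molecular mass of about 45 kDa for glycosylated CPD‐2 (a hypothetical round figure), and the usual approximation that the solvent fraction is 1 − 1.23/VM, which presumes a typical protein partial specific volume. The function name is likewise illustrative.

```python
def matthews(cell_volume_a3, molecules_per_cell, mass_da):
    """Return (VM in cubic angstroms per dalton, estimated solvent fraction)."""
    vm = cell_volume_a3 / (molecules_per_cell * mass_da)
    solvent_fraction = 1.0 - 1.23 / vm  # standard approximation for proteins
    return vm, solvent_fraction


CELL_EDGE_A = 135.54         # cubic cell edge from the text, in angstroms
MOLECULES_PER_CELL = 12      # P2(1)3 has 12 asymmetric units; one monomer in each
ASSUMED_MASS_DA = 45_000.0   # ASSUMPTION: approximate mass of glycosylated CPD-2

vm, solvent = matthews(CELL_EDGE_A ** 3, MOLECULES_PER_CELL, ASSUMED_MASS_DA)
print(f"VM = {vm:.2f} A^3/Da, solvent = {100 * solvent:.0f}%")
```

With these inputs the result is consistent with the reported values of 4.6 Å³/Da and 73% solvent; a different assumed mass would shift both numbers accordingly.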
Structure solution and refinement
The structure was solved by the SIRAS method using a single mercury derivative and anomalous diffraction. Density modification procedures were particularly effective due to the high solvent content of the crystals. Difference and anomalous Patterson maps and, posteriorly, difference‐Fourier synthesis permitted the localization of three mercury sites (see Table I). These positions were refined with MLPHARE (CCP4, 1994) and phases were computed, rendering a mean figure of merit of 0.46 (20.0–2.8 Å). Posterior density modification (Cowtan and Main, 1996) considerably increased this value to 0.79 (20.0–3.0 Å). In parallel, molecular replacement calculations with AMoRe (Navaza, 1994) using the (appropriately truncated) coordinates of carboxypeptidase T (Teplyakov et al., 1992, PDB access code 1obr) rendered a clear solution using data in the 15–4.5 Å range that additionally confirmed P213 as the correct space group (α = 68.6°, β = 28.8°, γ = 333.7°, x = 0.0461, y = 0.3214, z = 0.1170, correlation coefficient = 29.4, Rfactor = 50.7; 2nd highest solution: correlation coefficient = 21.0, Rfactor = 53.5; α, β, γ are in Eulerian angles, x, y, z are in fractional cell coordinates). However, this correctly oriented and translated model was just used for assistance in chain tracing and not for phasing in order to completely omit any possible model bias. The sigma A‐weighted map computed after density modification calculations was of excellent quality and permitted straightforward chain tracing of the whole molecule. Successive cycles of positional and temperature factor refinement performed with CNS v. 0.4 (Brünger et al., 1998) applying bulk‐solvent correction and anisotropic B‐factor correction, and manual model building on a Silicon Graphics workstation using Turbo‐Frodo (Roussel and Cambilleau, 1989), permitted gradual completion of the model and localization of solvent molecules and ions. Table I summarizes the refinement. 
The final model comprises residues Gln4 to Thr383 of the chemical sequence, 130 solvent molecules (labelled Wat601–Wat730), one zinc cation (residue Zn999) and three sulfate anions with (refined) partial occupancy (SO4996–SO4998). Three asparagine residues are glycosylated (Asn136, Asn321 and Asn377). For the first and last sites, two N‐acetylglucosamine residues and one mannose residue have been traced (labelled Nag901–Nag902–Man903 and Nag921–Nag922–Man923, respectively); for the second, just two N‐acetylglucosamine residues (Nag911–Nag912). Standard β‐1,4 links have been assumed and (grouped) occupancies have also been refined for each carbohydrate moiety. All protein residues are clearly defined by electron density and are placed in the most favourable and additionally allowed regions of the Ramachandran plot, except Ala327, placed in a generously allowed region. One peptide bond (Pro190–Phe191) has been found in cis conformation. Two of the four cysteine residues form a disulfide bond (Cys230–Cys275). Table I provides a summary of the final refined model parameters. As can be observed, the extremely high solvent content of the cell (73%), taken together with the bulk‐solvent correction performed, renders above‐average temperature factors for the whole structure, in accordance with the high temperature factor obtained from Wilson scaling (see Table I). Nonetheless, all structural features and parameters are correct and the final electron density is of excellent quality.
Numbering of CPD‐2 residues is sequential with Gln4 corresponding to Gln503 of the published sequence of duck CPD (Kuroki et al., 1995). Residue numbers of other proteins correspond to their respective PDB entries. Figures have been calculated with Turbo‐Frodo (Roussel and Cambilleau, 1989), GRASP (Nicholls et al., 1993), ALSCRIPT (Barton, 1993) and Bobscript (Esnouf, 1997). Superpositions were performed using Turbo‐Frodo. Ascription of an amino acid to a regular secondary structure element (minimum four residues) was based on participation of at least one of its N or O main chain atoms in the hydrogen bond network characterizing the secondary structure element. Searches for protein structures with similar folding topology were made with the DALI (Holm and Sander, 1993) server at EBI in Hinxton (http://www.embl‐heidelberg.de/dali/dali.html), the SCOP server (http://scop.mrc‐lmb.cam.ac.uk/scop/), and the CATH server (http://www.biochem.ucl.ac.uk/bsm/cath/server/index.html). The final coordinates have been deposited with the Protein Data Bank (access code qmu).
This work was supported by grants PB95‐0224 and BIO98‐0362 from the Ministerio de Educación y Cultura (Spain), by grant 1997SGR‐275 and the Centre de Referència en Biotecnologia, both from the Generalitat de Catalunya, by grant DK‐51271 from the National Institutes of Health and by the US–Spain Science & Technology Program, 1999. The support provided by the TMR/LSF programme to the EMBL Hamburg Outstation (ref. ERBFMGECT980134) is gratefully acknowledged.
- Copyright © 1999 European Molecular Biology Organization