A number of extracellular proteins contain cryptic inhibitors of angiogenesis. Endostatin is a 20 kDa C‐terminal proteolytic fragment of collagen XVIII that potently inhibits endothelial cell proliferation and angiogenesis. Therapy of experimental cancer with endostatin leads to tumour dormancy and does not induce resistance. We have expressed recombinant mouse endostatin and determined its crystal structure at 1.5 Å resolution. The structure reveals a compact fold distantly related to the C‐type lectin carbohydrate recognition domain and the hyaluronan‐binding Link module. The high affinity of endostatin for heparin is explained by the presence of an extensive basic patch formed by 11 arginine residues. Endostatin may inhibit angiogenesis by binding to the heparan sulphate proteoglycans involved in growth factor signalling.
In the adult mammal, new blood vessels are formed by angiogenesis, the sprouting of new capillaries from existing vasculature (Bussolino et al., 1997; Risau, 1997). The endothelium lining a mature vessel is normally quiescent and angiogenesis in the adult generally is associated with pathological situations, such as wound healing and tumour growth. Angiogenesis is a complex multi‐stage process that involves, in approximate temporal order, the proteolytic degradation of the basement membrane, loss of endothelial cell adhesion, proliferation and migration of endothelial cells into the surrounding stroma, and finally re‐adhesion of endothelial cells to form the lumen of the new capillary tube. Ongoing angiogenesis is essential for the rapid growth of solid tumours and it appears that successful tumours actively influence the ‘angiogenic switch’ to sustain continuous cell proliferation. The strict dependence on recruiting new blood vessels from the host makes tumours vulnerable to anti‐angiogenic therapy (Hanahan and Folkman, 1996).
A number of inducers of angiogenesis have been identified (Hanahan and Folkman, 1996; Beck and D'Amore, 1997; Bussolino et al., 1997). Of these, acidic and basic fibroblast growth factor (aFGF and bFGF, respectively) and vascular endothelial growth factor (VEGF) are the most widely expressed in normal adult organs. These growth factors are recognized on endothelial cells by transmembrane tyrosine kinase receptors coupled to intracellular signal transduction pathways. Cell surface heparan sulphate proteoglycans play a key role in bFGF signalling, either by assisting the association of bFGF with its receptor or by promoting bFGF oligomerization (Yayon et al., 1991; Friesel and Maciag, 1995; Herr et al., 1997; Moy et al., 1997).
Given the requirement for tight control of angiogenesis, it is not surprising that there exist a number of endogenous angiogenesis inhibitors, and that imbalance in their levels is associated with tumorigenesis (Hanahan and Folkman, 1996). One well‐established example is the secreted glycoprotein thrombospondin‐1 (TSP‐1), which is regulated by the p53 tumour suppressor protein (Good et al., 1990; Dameron et al., 1994). Both full‐length TSP‐1 and specific fragments of it inhibit angiogenesis (Good et al., 1990; Tolsma et al., 1993). In contrast, some of the most potent angiogenic inhibitors are fragments derived from abundant extracellular proteins that themselves do not regulate angiogenesis. A 29 kDa fragment of fibronectin (Homandberg et al., 1985), a 16 kDa fragment of prolactin (Ferrara et al., 1991; Clapp et al., 1993) and a 38 kDa fragment of plasminogen termed angiostatin (O'Reilly et al., 1994) are all inhibitors of endothelial cell proliferation, whereas their parent proteins are not. The storage of a class of angiogenesis inhibitors as cryptic fragments of abundant proteins associated with the vascular system appears to be a recurring theme in the regulation of angiogenesis (Hanahan and Folkman, 1996; Sage, 1997). Many angiogenesis inhibitors bind with high affinity to heparin, suggesting antagonism of bFGF signalling as a possible mechanism of action (Brown and Parish, 1994).
Recently, a new angiogenesis inhibitor was identified as a 20 kDa C‐terminal fragment of collagen XVIII. This heparin‐binding fragment, termed endostatin, specifically inhibits endothelial cell proliferation and potently inhibits angiogenesis and tumour growth (O'Reilly et al., 1997). Cycled therapy with recombinant endostatin reduced several experimental tumours, including Lewis lung carcinoma, to a dormant state and did not induce resistance. Even more remarkably, after a few cycles of treatment, dormancy persisted even when therapy was discontinued (Boehm et al., 1997).
The α1(XVIII) collagen is an unusual collagen characterized by 10 domains of triple‐helical collagenous repeats separated by non‐triple‐helical repeats (Oh et al., 1994a; Rehn and Pihlajaniemi, 1994). It is expressed in a tissue‐specific manner as three alternative splice variants and is localized mainly in perivascular basement membrane zones (Muragaki et al., 1995; Rehn and Pihlajaniemi, 1995). The last six of the triple‐helical repeats of α1(XVIII) collagen are almost identical in size to those of α1(XV) collagen (Myers et al., 1992), and the name multiplexins (for multiple triple helix domains and interruptions) has been given to this new collagen family (Oh et al., 1994a). The α1(XVIII) collagen contains a non‐collagenous C‐terminal domain (NC1) of ∼300 residues; the angiogenic inhibitor endostatin corresponds to the last 184 amino acid residues of the NC1 domain (Figure 1). Interestingly, this exactly matches the conserved region between the α1(XVIII) and α1(XV) C‐terminal domains (Oh et al., 1994a).
Given the considerable potential of endostatin as an anti‐cancer therapeutic, we would like to understand the molecular mechanisms by which this protein inhibits endothelial cell proliferation and angiogenesis. To help design experiments addressing these questions, we have solved the crystal structure of endostatin at 1.5 Å resolution. A large basic surface area is proposed to be the heparin‐binding site of endostatin. Thus, endostatin may exert its antiproliferative effect by competing with bFGF for binding to cell surface heparan sulphate proteoglycans, which could disrupt the mitogenic growth factor signal.
Results and discussion
Protein expression and structure determination
Mouse endostatin was expressed at high levels as soluble protein in human embryonic kidney cells and was found to potently inhibit the bFGF‐induced proliferation of human umbilical vein endothelial cells, with an IC50 of ∼100 ng/ml (unpublished data). The secreted protein spans the 184 C‐terminal amino acid residues of mouse α1(XVIII) collagen and, additionally, contains the N‐terminal sequence APLA (Figure 1). To avoid ambiguities caused by the N‐terminal splicing of α1(XVIII) collagen (Muragaki et al., 1995; Rehn and Pihlajaniemi, 1995), we decided to number the endostatin sequence relative to its position in the NC1 domain, starting at His132.
The structure of endostatin was solved using crystals grown from ammonium phosphate at pH 5. Phases to 2.2 Å resolution were obtained by multiple isomorphous replacement with anomalous scattering (MIRAS) from three heavy‐atom derivatives (Table I). The resulting electron density map was of high quality, allowing the polypeptide chain to be traced without difficulty. The final structure, refined at 1.5 Å resolution (Figure 2; Table II), consists of residues Gln138 to Phe309 and a total of 83 water molecules. The first ten and last six residues of the expressed protein are not visible and are presumed to be disordered.
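The numbering convention and the count of disordered residues can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (the four vector-derived residues APLA, 184 collagen-derived residues numbered from His132, and the ordered range Gln138 to Phe309); the variable names are illustrative and not part of the original work.

```python
# Consistency check of the residue numbering described in the text.
# All values come from the text; names are illustrative only.

FIRST_NC1_RESIDUE = 132     # His132, first collagen-derived residue
N_COLLAGEN_RESIDUES = 184   # C-terminal residues of mouse alpha1(XVIII) collagen
N_VECTOR_RESIDUES = 4       # N-terminal APLA preceding His132
FIRST_ORDERED = 138         # Gln138, first residue seen in the electron density
LAST_ORDERED = 309          # Phe309, last residue seen in the electron density

# Last residue of the construct in NC1 numbering.
last_nc1_residue = FIRST_NC1_RESIDUE + N_COLLAGEN_RESIDUES - 1

# Disordered residues at each terminus and the size of the ordered model.
disordered_n = N_VECTOR_RESIDUES + (FIRST_ORDERED - FIRST_NC1_RESIDUE)
disordered_c = last_nc1_residue - LAST_ORDERED
ordered = LAST_ORDERED - FIRST_ORDERED + 1

print("last residue (NC1 numbering):", last_nc1_residue)
print("disordered N-terminal residues:", disordered_n)
print("disordered C-terminal residues:", disordered_c)
print("ordered residues in the model:", ordered)
```

Under this convention the construct ends at residue 315, so the ten missing N‐terminal residues are APLA plus residues 132–137, and the six missing C‐terminal residues are 310–315.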
Description of the overall structure
Endostatin folds into a single globular domain of approximate dimensions 35×30×30 Å (Figure 3). The structure is composed predominantly of a β‐sheet and loops, but also contains two α‐helices, one of them short. A total of 40% of the amino acid residues adopt an extended main chain conformation, but due to many irregularities such as kinks and bulges, only 25% are actually contained in uninterrupted β‐strands spanning more than two residues. The fold of endostatin is intricate and is best described with reference to a schematic representation (Figure 3C). The most prominent feature of the structure is a highly twisted mixed β‐sheet composed of seven strands (E, F, A, P, J, M and O). An α‐helix of 14 residues, α1, packs against one face of this sheet, whereas the other face is covered by elaborate loop structures involving a short stretch of antiparallel β‐sheet (strands G and N) and a second shorter α‐helix, α2. At the C‐terminus of strand A, the polypeptide chain bulges around a water molecule before re‐establishing hydrogen bonding with the N‐terminus of strand P for another two residues. The segment following α1 forms the short strand B, then kinks at Phe180 and leads into the β‐hairpin C–D. Strands B, J and P form a triangular structure, at the centre of which a water molecule hydrogen bonds with the peptide carbonyls of Phe180 and Trp251 and the amide nitrogen of Leu303. This water molecule is deeply buried in the hydrophobic core and represents an integral part of the endostatin structure. Apart from the C–D β‐hairpin, there are two additional classical β‐hairpins, one extending the central β‐sheet at strand A (strands H and I), the other following strand J after a kink at Gly253 (strands K and L). The overall arrangement of β‐strands in the endostatin structure can be described as a rather irregular β‐barrel propped open on one side by α2. Endostatin contains two disulphide bridges in a nested pattern, linking Cys164 with Cys304 and Cys266 with Cys296. The former disulphide bridge connects α1 to the central β‐sheet, and the latter circularizes a twisted loop containing strands M, N and O.
Despite the disjointed fold and the large fraction of irregular loop structures, endostatin is a compact molecule. All surface loops pack tightly against the body of the structure and many of them contribute to a large hydrophobic core. This core is divided into two regions of different size by strand J. The smaller core is centred around Trp251 and is built up mainly from amino acid side chains contributed by strands J, M, O and P. A much more extensive hydrophobic core fills the large concave face of the central β‐sheet and this is also the region where most of the longer surface loops are found.
Strands A and P of endostatin are situated next to each other, engaged in antiparallel hydrogen bonding. As a result, the first and last residues defined by the electron density (Gln138 and Phe309, respectively) are close in space. In the intact NC1 domain of collagen XVIII, endostatin is preceded by 131 residues. This N‐terminal portion of unknown structure is likely to interact with endostatin near Phe309, and this interaction may well lead to an ordering of the C‐terminus of endostatin, which, significantly, contains two large hydrophobic residues, Met310 and Phe313. We note that the side chains of Phe162 and Phe165, located next to each other on α1, are fully exposed to solvent (Figure 5A). In the crystal lattice they are covered by a hydrophobic packing contact, and we speculate that these two residues may be involved in interdomain interactions in the full‐length NC1 domain.
Collagen XVIII is a member of the so‐called multiplexin family of collagens that also includes collagen XV. These two collagens are distinguished by C‐terminal globular domains that share 58% sequence identity in their last 184 residues corresponding to endostatin in collagen XVIII (Oh et al., 1994a). We suggest that this common region represents a new extracellular module (Bork et al., 1996), as it appears unlikely that the compact endostatin structure is a soluble fragment excised from a larger domain. Interestingly, the ordered portion of the endostatin structure starts at an intron–exon boundary of α1(XVIII) collagen (Rehn et al., 1996). The autonomous folding of endostatin is also indicated by the high production rate of recombinant mouse endostatin (10–20 mg/l per day).
Comparison with other structures
An automated search of the DALI database (Holm and Sander, 1993, 1994) unexpectedly revealed that endostatin resembles the carbohydrate recognition domain (CRD) of mammalian C‐type lectins (Weis et al., 1991; Drickamer, 1993). The highest scores (Z = 3.1) were obtained with E‐selectin (Graves et al., 1994) and lithostathine, a homologue of the CRD that does not bind calcium or carbohydrate ligands (Bertrand et al., 1996). For the endostatin/E‐selectin CRD pair, DALI identified 77 equivalent residues whose Cα‐atoms could be superimposed with an r.m.s.d. of 3.1 Å. The corresponding sequence identity of 9% is well below the threshold of statistical significance, but the high degree of structural similarity strongly argues for an evolutionary relationship. Figure 4 shows that the entire β‐sheet structure of the E‐selectin CRD is contained within the endostatin structure; only strands E and F of the central β‐sheet of endostatin do not have counterparts in the E‐selectin CRD. The α‐helices are in topologically equivalent positions in the two proteins, but their disposition relative to the central β‐sheet varies. Significantly, both disulphide bridges in endostatin and E‐selectin align almost perfectly when the three‐dimensional structures are superimposed. Apart from Cys296 and Cys304, there are five additional amino acid residues in endostatin that have identical counterparts in the E‐selectin structure. With the exception of the surface residue Gln163, whose conservation may be adventitious, the location of these identities is telling: Pro246 and Gly253 flank the crucial strand J in the large β‐sheet, and Trp251 forms the nucleus of the smaller of the two hydrophobic cores in endostatin. The equivalent region in the C‐type lectin CRD is the ‘WIGL’ sequence (aromatic‐aliphatic‐glycine‐aliphatic), which is an important component of the C‐type lectin consensus (Weis et al., 1991; Drickamer, 1993).
The differences between endostatin and the E‐selectin CRD are concentrated in two regions, situated on opposite faces of the central β‐sheet. In E‐selectin, the connection between α1 and strand β2 (the equivalent of endostatin βJ) is afforded by a short strand, helix α2 and the loops preceding and following α2. The equivalent, much more elaborate region in endostatin contains eight, predominantly short strands (B–I) as well as α2 and accounts for most of the extra residues of endostatin compared with E‐selectin. In this region there is little similarity between the two structures, apart from the general location of α2. In E‐selectin, β5 (the equivalent of endostatin βO) and the loop preceding β4 form a high‐affinity calcium‐binding site involved in oligosaccharide binding (Graves et al., 1994). There is no indication of calcium binding to endostatin and, indeed, we find that the long loop providing three of the five calcium ligands in E‐selectin is shorter in endostatin and is arranged very differently, folded away from βO.
In summary, while endostatin is doubtless related to the C‐type lectin CRD, it has lost one of the defining features of this protein family, namely calcium‐dependent oligosaccharide binding. This is not an unprecedented observation. We have already mentioned lithostathine, a CRD homologue that acts as an inhibitor of stone formation in the pancreas (Bertrand et al., 1996). More relevantly, the recently elucidated structure of the Link module showed that this hyaluronan‐binding domain resembles the C‐type lectin CRD but does not use calcium for glycosaminoglycan (GAG) binding (Kohda et al., 1996). This is further discussed below in conjunction with the ligand‐binding properties of endostatin.
A putative heparin‐binding site
The mechanism(s) by which endostatin inhibits endothelial cell proliferation and angiogenesis have yet to be defined. However, given the high affinity of endostatin for heparin, interference with the heparan sulphate requirement of bFGF signalling is a distinct possibility. Endostatin contains a large number of basic residues, in particular arginines, and their distribution on the protein surface provides a clue to the location of the heparin‐binding site. We note that of the 15 arginine residues present in mouse endostatin, all but one are fully conserved in the human protein, the exception being a conservative replacement by lysine (the overall identity between mouse and human endostatin is 87%; Oh et al., 1994b). Given the general location of arginines in surface loops, this high degree of conservation is noteworthy and hints at an important function.
Apart from two crystal structures of bFGF with bound heparin‐derived tetra‐ and hexasaccharides (Faham et al., 1996), nothing is known about the detailed structural requirements of heparin binding to proteins. However, it is clear that basic residues are crucial; these are often arranged as discrete clusters with a spacing that matches the distribution of GAG sulphate groups (Fromm et al., 1997). In addition, neutral polar residues are required to provide hydrogen bonding partners for the sugar moieties (Thompson et al., 1994). The importance of aromatic residues, which is well established in the recognition of uncharged oligosaccharides, is less evident in the case of GAGs.
A representation of the surface electrostatic properties of endostatin is shown in Figure 5. Eleven out of the total of 15 arginine residues cluster on one face of the molecule (Figure 5). This extensive basic patch (diameter ≈20 Å) involves α1 and α2, strand B, the long loop connecting the C–D β‐hairpin to strand E, as well as strand L and the following loop around Cys266. The solvent‐exposed side chains of Phe162 and Phe165 (see above) are found at the periphery of the patch. A detailed inspection highlights one particular area as a candidate heparin‐binding site. The two arginine pairs Arg193/Arg194 and Arg259/Arg260 form the borders of a shallow depression, at the centre of which the side chain of Tyr215 emerges, surrounded by several additional polar residues. The residues forming this putative heparin‐binding site are contributed by the long irregular loop preceding strand E and strand L, both elements unique to endostatin when compared with the C‐type lectin CRD and the Link module. Binding of a larger GAG chain may involve additional arginines, possibly those centred around Arg158.
Endostatin contains a second, less extensive basic patch approximately opposite the large area defined by the 11 arginines discussed above (Figure 5). This patch, composed mainly of residues contributed by the H–I β‐hairpin and the N‐terminus of strand O, is of interest because it is close to the ligand‐binding site in the related C‐type lectin CRDs (Weis et al., 1992; Graves et al., 1994). We cannot fully dismiss the possibility that heparin may also bind to this region, although we note that some of the basic residues clearly serve structural purposes and would not be available for ligand binding (Arg230, Arg237 and Lys248). Furthermore, our assignment of the heparin‐binding site to the face bearing exclusively arginine residues is consistent with results from chemical modification, which demonstrate the involvement in heparin binding of several arginine but no lysine residues (unpublished data).
The Link module, a domain of ∼100 residues found in several extracellular matrix proteins and in the cell surface receptor CD44, is a distant relative of the C‐type lectin CRD that binds the GAG hyaluronan. In the TSG‐6 Link module structure (Kohda et al., 1996), a critical basic residue in the loop preceding α1 and a patch of solvent‐exposed aromatic side chains define a putative hyaluronan‐binding site, which partly overlaps with the surface area used by the CRDs for oligosaccharide binding. In endostatin the equivalent region involves mainly strands M and O and the connecting loop to strand P, which do not coincide with either of the two basic patches described above. We therefore believe that endostatin and the Link module employ spatially distinct regions for GAG binding.
The function of the domain corresponding to endostatin in tissue‐deposited collagen XVIII and XV is not known. Collagen XVIII is mainly found in vascular basement membrane regions (Muragaki et al., 1995), and the C‐terminal NC1 domain may mediate interactions with basement membrane GAGs or proteoglycans. Full‐length collagen XVIII is likely to be immobilized in some kind of network. Proteolytic cleavage in the NC1 domain, perhaps by a proteinase secreted by a tumour, could produce soluble endostatin which would be free to diffuse to its targets and elicit its effects on endothelial cell proliferation and angiogenesis. The intact NC1 domain of collagen XVIII is not an inhibitor of angiogenesis (unpublished data), corroborating earlier indications that the antiproliferative activity of endostatin may be cryptic (O'Reilly et al., 1997). It is not obvious from our structure how this regulation of activity may take place. Steric blocking of an important epitope by the N‐terminal portion of the NC1 domain is a possibility. However, other scenarios can be envisaged. For instance, the exposed and mobile polypeptide chain termini of endostatin may be important for the antiproliferative effect.
Guided by our structure, we are now planning to delineate the heparin‐binding site of endostatin by site‐directed mutagenesis, and this will provide a powerful tool to define and dissect the mechanisms of action of endostatin. If endostatin indeed acts by interfering with the heparan sulphate requirement of bFGF signalling, abolition of heparin binding would be expected to be accompanied by a loss of inhibition of endothelial cell proliferation and/or angiogenesis. Alternatively, endostatin may interact with an as yet undefined protein or receptor and cause inhibition independent of GAG binding. Heparin binding may also turn out to be only one of several critical components of the mechanism, as is the case with bFGF signalling. Finally, the question of how endostatin activity is regulated by the proteolytic unmasking of cryptic epitopes will be addressed by the functional characterization and structure determination of the full‐length NC1 domain of collagen XVIII. The insights gained from these studies should prove useful for the development of new approaches to cancer therapy.
Materials and methods
Construction of expression vector
Mouse α1(XVIII) cDNA clone mc3b (Oh et al., 1994a) was used as a template to amplify the sequence encoding endostatin by polymerase chain reaction (PCR) with Vent polymerase (New England Biolabs) following the manufacturer's instructions. The primer for the 5′ end was GTCAGCTAGCTCATACTCATCAGGAC and that for the 3′ end was GTCACTCGAGCTATTTGGAGAAAGAGGTC. In addition to the annealing sequences, the primers contained an NheI site at the 5′ end or a stop codon followed by an XhoI site at the 3′ end, to allow in‐frame fusion of the construct to the BM‐40 signal peptide (Mayer et al., 1993). The PCR fragment was cloned into the modified episomal expression vector pCEP‐Pu (Kohfeldt et al., 1997). The sequence of the construct was confirmed by cycle sequencing using the Dye Terminator Cycle Sequencing Ready Reaction Kit (ABI).
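The primer design described above can be checked directly from the two sequences. The following is a minimal sketch in plain Python, not the procedure used in the original work: it locates the NheI (GCTAGC) and XhoI (CTCGAG) recognition sequences, reverse‐complements the 3′ primer to expose the stop codon on the sense strand, and translates the flanking codons with the standard genetic code. The reading‐frame offset chosen for the 5′ primer is an assumption, selected because it reproduces the LAHTHQ portion of the N‐terminal sequence APLAHTHQ reported for the purified protein; all function and variable names are illustrative.

```python
# Check restriction sites, stop codon and reading frame in the two PCR primers.
# Primer sequences are from the text; frame offsets are assumptions (see lead-in).

NHEI = "GCTAGC"  # NheI recognition sequence
XHOI = "CTCGAG"  # XhoI recognition sequence (palindromic)
FWD_PRIMER = "GTCAGCTAGCTCATACTCATCAGGAC"
REV_PRIMER = "GTCACTCGAGCTATTTGGAGAAAGAGGTC"

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons of seq; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# 5' primer: NheI site, then codons read in the assumed frame.
nhei_pos = FWD_PRIMER.find(NHEI)
fwd_peptide = translate(FWD_PRIMER[nhei_pos + 1:])

# 3' primer: view it on the sense strand, where the stop codon precedes XhoI.
rev_sense = reverse_complement(REV_PRIMER)
xhoi_pos = rev_sense.find(XHOI)
stop_codon = rev_sense[xhoi_pos - 3:xhoi_pos]
# The frame is fixed by the stop codon that ends directly before the XhoI site.
c_term_peptide = translate(rev_sense[xhoi_pos % 3:xhoi_pos])

print("NheI site at index", nhei_pos, "-> encoded peptide:", fwd_peptide)
print("Sense strand of 3' primer:", rev_sense)
print("Stop codon before XhoI:", stop_codon, "-> encoded peptide:", c_term_peptide)
```

In the assumed frame, the 5′ primer encodes Leu‐Ala from the linker followed by His‐Thr‐His‐Gln‐Asp, and the sense strand of the 3′ primer carries a TAG stop codon immediately upstream of the XhoI site.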
Expression and purification of recombinant mouse endostatin
Human embryonic kidney cells that express the EBNA‐1 protein from Epstein–Barr virus (293‐EBNA cells, Invitrogen) were used for transfection with the expression vector (Kohfeldt et al., 1997). Resistant cells were selected with puromycin (0.5 μg/ml) and used for collection of serum‐free conditioned medium. The medium (≈1 l) was dialysed against 0.1 M NaCl, 0.05 M Tris–HCl pH 7.4, and then applied onto a heparin–Sepharose CL‐6B column (2.5×20 cm, Pharmacia) equilibrated in the same buffer. A linear 0.1–1.0 M NaCl gradient (500 ml) was used for elution. Endostatin eluted at 0.4–0.5 M NaCl and was further purified on a Superose 12 column (HR16/50, Pharmacia) equilibrated in 0.2 M ammonium acetate, pH 6.8. The purified product was soluble in neutral buffer and showed a single 22 kDa band in SDS gel electrophoresis under reducing conditions. The protein has a single N‐terminal sequence APLAHTHQ and contains less than one residue of hexosamine per molecule.
Crystallization and data collection
Crystals were obtained at room temperature by the hanging drop vapour diffusion method. Equal volumes (typically 2 μl) of a 10 mg/ml solution of endostatin in 5 mM MOPS, pH 6.8, and 1.5–1.7 M ammonium phosphate, pH 4.7–5.3, were mixed and equilibrated against 1 ml of the latter solution. The crystals belong to space group P2₁2₁2₁ with unit cell constants a = 45.6 Å, b = 54.0 Å, c = 65.9 Å. There is one molecule of endostatin in the asymmetric unit, resulting in a solvent content of 37%. For heavy‐atom soaks, crystals were stabilized in 1.8 M Li2SO4, 0.1 M Na‐acetate, pH 5.3. All diffraction data except native II were collected at room temperature using an MAR image plate detector mounted on a rotating anode generator operated at 4 kW (Cu Kα radiation, λ = 1.54 Å). For derivative data collection, crystals were rotated around their carefully aligned a axis to minimize systematic errors in the measurement of Bijvoet pairs. Native II data were collected at room temperature on beamline 9.6 of the Daresbury Synchrotron Radiation Source using an MAR image plate detector (λ = 0.87 Å). Data were integrated with MOSFLM (Leslie, 1994) and reduced with programs of the CCP4 suite (Collaborative Computing Project No. 4, 1994). Data collection statistics are summarized in Table I.
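The relationship between the unit cell, the number of molecules and the solvent content can be illustrated with the standard Matthews estimate. This is a sketch under stated assumptions rather than the calculation performed by the authors: the molecular mass is taken as the nominal 20 kDa quoted for endostatin, and the solvent fraction uses the conventional approximation 1 − 1.23/V_M, which assumes a typical protein partial specific volume. Function and variable names are illustrative.

```python
# Matthews coefficient and approximate solvent content for the endostatin crystals.
# Cell constants and one molecule per asymmetric unit are from the text;
# the 20 kDa mass and the 1.23 constant are assumptions (see lead-in).

CELL_A, CELL_B, CELL_C = 45.6, 54.0, 65.9   # orthorhombic cell edges in Angstrom
ASYMMETRIC_UNITS_PER_CELL = 4               # space group P2(1)2(1)2(1)
MOLECULES_PER_ASU = 1
ASSUMED_MASS_DA = 20_000.0                  # nominal mass; an assumption


def matthews_coefficient(volume: float, n_molecules: int, mass_da: float) -> float:
    """Crystal volume per unit of protein mass, in cubic Angstrom per dalton."""
    return volume / (n_molecules * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


# All cell angles are 90 degrees, so the volume is the product of the edges.
cell_volume = CELL_A * CELL_B * CELL_C
n_molecules = ASYMMETRIC_UNITS_PER_CELL * MOLECULES_PER_ASU
vm = matthews_coefficient(cell_volume, n_molecules, ASSUMED_MASS_DA)
solvent = solvent_fraction(vm)

print(f"cell volume: {cell_volume:.0f} cubic Angstrom")
print(f"Matthews coefficient: {vm:.2f} cubic Angstrom per Da")
print(f"estimated solvent content: {100 * solvent:.0f}%")
```

With the nominal 20 kDa mass the estimate comes out slightly above the reported 37%; the exact figure depends on the molecular mass and partial specific volume assumed, and a somewhat larger mass for the expressed construct would bring the two values into agreement.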
Structure solution and refinement
Three heavy‐atom derivatives and the native I data were used for phasing by the MIRAS method (Table I). Soak conditions were 3 mM UO2SO4 for 3 days, 20 mM K2Pt(CN)4 for 1 day and 10 mM NaAu(CN)2 for 2 days. Heavy‐atom sites were deduced from difference Patterson maps, brought to a common origin and hand by cross‐phased difference Fourier maps, and refined with MLPHARE (Z. Otwinowski; Collaborative Computing Project No. 4, 1994). Due to the high isomorphism of the U and Pt derivatives, useful MIRAS phases could be obtained to a resolution of 2.2 Å (mean figure‐of‐merit 0.584). The MIRAS map was subjected to density modification with DM (Cowtan and Main, 1996) in ‘combine omit’ mode employing solvent flattening, histogram matching and Sayre's equation. Approximately 75% of the structure could be built with confidence into the resulting map using O (Jones et al., 1991). The remaining loop structures were added after combination of partial model phases with the experimental phases using SIGMAA (Read, 1986). The structure was first refined with X‐PLOR (Brünger, 1992) against the native I data at 2.0 Å resolution to Rcryst = 0.192 (Rfree = 0.237). Refinement against the synchrotron native II data was then initiated by a round of simulated annealing refinement starting from 3000 K to remove model bias, followed by conventional positional and B‐factor refinement. The final model comprises residues 138–309 and 83 water molecules (Table II); 86.3% of the amino acid residues are in the most favourable regions of the Ramachandran plot, with the remaining 13.7% in additionally allowed regions, as defined by PROCHECK (Laskowski et al., 1993).
Coordinates and structure factors have been deposited in the Brookhaven Protein Data Bank (accession code 1KOE) and will be held for one year after publication.
We thank Dr Steve Wood (University of Southampton, UK) for access to X‐ray data collection facilities, Dr James Nicholson for help with synchrotron data collection and Dr Noriko Yamaguchi for carrying out the cell proliferation assay. E.H. is supported by a long‐term fellowship from the Human Frontier Science Program. Recombinant protein production was supported by EC contract No. BIO4‐CT96‐0537. B.R.O. acknowledges support by NIH grant AR36820.
Copyright © 1998 European Molecular Biology Organization