Crystal structure of the CENP‐B protein–DNA complex: the DNA‐binding domains of CENP‐B induce kinks in the CENP‐B box DNA

Yoshinori Tanaka, Osamu Nureki, Hitoshi Kurumizaka, Shuya Fukai, Shinichi Kawaguchi, Mari Ikuta, Junji Iwahara, Tsuneko Okazaki, Shigeyuki Yokoyama

Author Affiliations

  1. Yoshinori Tanaka1,2,4,
  2. Osamu Nureki2,4,
  3. Hitoshi Kurumizaka1,3,4,
  4. Shuya Fukai2,
  5. Shinichi Kawaguchi3,
  6. Mari Ikuta2,3,
  7. Junji Iwahara2,3,
  8. Tsuneko Okazaki4 and
  9. Shigeyuki Yokoyama*,1,2,3
  1 RIKEN Genomic Sciences Center, 1‐7‐22 Suehiro‐cho, Tsurumi, Yokohama, 230‐0045, Japan
  2 Department of Biophysics and Biochemistry, Graduate School of Science, University of Tokyo, 7‐3‐1 Hongo, Bunkyo‐ku, Tokyo, 113‐0033, Japan
  3 Cellular Signaling Laboratory, RIKEN Harima Institute at SPring8, 1‐1‐1 Kohto, Mikazuki‐cho, Sayo, Hyogo, 679‐5143, Japan
  4 Institute for Comprehensive Medical Science, Fujita Health University and CREST of JST, Toyoake‐shi, Aichi, 470‐1192, Japan
  *Corresponding author. E-mail: yokoyama{at}
  Y.Tanaka, O.Nureki and H.Kurumizaka contributed equally to this work


The human centromere protein B (CENP‐B), one of the centromere components, specifically binds a 17 bp sequence (the CENP‐B box), which appears in every other α‐satellite repeat. In the present study, the crystal structure of the complex of the DNA‐binding region (129 residues) of CENP‐B and the CENP‐B box DNA has been determined at 2.5 Å resolution. The DNA‐binding region forms two helix–turn–helix domains, which are bound to adjacent major grooves of the DNA. The DNA is kinked at the two recognition helix contact sites, and the DNA region between the kinks is straight. Among the major groove protein‐bound DNAs, this ‘kink–straight–kink’ bend contrasts with ordinary ‘round bends’ (gradual bending between two protein contact sites). The larger kink (43°) is induced by a novel mechanism, ‘phosphate bridging by an arginine‐rich helix’: the recognition helix with an arginine cluster is inserted perpendicularly into the major groove and bridges the groove through direct interactions with the phosphate groups. The overall bending angle is 59°, which may be important for the centromere‐specific chromatin structure.


The centromere is a region of the chromosome essential for its segregation during cell division, and has a special chromatin structure involving α‐satellite DNA repeats and their associated proteins. In the human, the proteins associated with the centromere have been identified with specific autoantibodies for the centromere (Earnshaw and Rothfield, 1985). The human centromere proteins A, B and C (CENP‐A, CENP‐B and CENP‐C, respectively) are such proteins, and they exhibit DNA‐binding activities (Masumoto et al., 1989; Palmer et al., 1991; Saitoh et al., 1992; Sugimoto et al., 1994; Sullivan et al., 1994). These centromere‐specific DNA‐binding proteins are considered to be fundamental components in the formation of the special chromatin structure on the α‐satellite DNA repeats of the centromere.

Among the centromere‐specific DNA‐binding proteins, CENP‐B specifically binds a 17 bp sequence (the CENP‐B box) (Earnshaw et al., 1989; Masumoto et al., 1989; Kipling et al., 1995), which appears in every other α‐satellite repeat (Ikeno et al., 1994, 1998). The CENP‐B box sequence positions nucleosomes adjacently (Yoda et al., 1998), and may therefore function as a cis‐element for centromere‐specific nucleosome assembly. In vivo analyses with cultured human cells revealed that the existence of the CENP‐B box is essential for the formation of minichromosomes (Ikeno et al., 1998). On the other hand, CENP‐B null mice appeared to be normal, except for lower body and testis weights, and uterine dysfunction (Hudson et al., 1998; Perez‐Castro et al., 1998; Fowler et al., 2000), indicating the existence of functional homolog(s) of CENP‐B. Actually, CENP‐B‐like proteins of unknown function have been identified in the human, such as the jerky‐like protein (Toth et al., 1995) and the transposases encoded by the human Tigger1 and Tigger2 transposable elements (Smit and Riggs, 1996; Kipling and Warburton, 1997).

CENP‐B is an 80 kDa protein that contains a DNA‐binding region at the N‐terminus (Yoda et al., 1992). In the present study, we determined the crystal structure of the DNA‐binding region of CENP‐B (CENP‐B1–129) complexed with a 21 bp CENP‐B box DNA at 2.5 Å resolution. This is the first structure of a centromere‐specific protein–DNA complex.

Results and discussion

Structure determination

Recombinant CENP‐B1–129 forms an inclusion body and, therefore, the protein was purified in the presence of 6 M urea. The complex of CENP‐B1–129 and the CENP‐B box DNA (21 bp; Figure 1A) was formed during the refolding process by dialysis against buffer without urea. The purified complex was concentrated up to 3–6 mg protein/ml, and the co‐crystals were obtained by the hanging drop method. The crystals belong to the trigonal space group P3112, with unit cell constants of a = b = 83.4 Å, c = 139.0 Å, and contain one complex per asymmetric unit. A native diffraction data set was collected at 2.5 Å resolution at beamline BL44B2 in SPring‐8, Harima. The structure was solved by multiple isomorphous replacement augmented with anomalous dispersion (MIRAS) (Table I), by the use of three isomorphous derivatives with duplex DNAs in which a thymidine was replaced by a 5‐iodouridine at positions 12, 14 and 19, respectively (Figure 1A). A portion of the electron density map is shown in Figure 1B.
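As a rough check on the reported cell constants, the unit cell volume can be estimated from a and c alone. The sketch below assumes the standard trigonal/hexagonal setting (α = β = 90°, γ = 120°) and six asymmetric units per cell for this space group; the function and variable names are illustrative only and are not taken from any crystallographic package.

```python
import math

def trigonal_cell_volume(a: float, c: float) -> float:
    """Volume (cubic angstroms) of a cell with a = b, alpha = beta = 90 deg, gamma = 120 deg."""
    return a * a * c * math.sin(math.radians(120.0))

# Cell constants quoted in the text (angstroms).
a, c = 83.4, 139.0
volume = trigonal_cell_volume(a, c)

# Assumption: six general positions per cell, one complex per asymmetric unit.
ASYMMETRIC_UNITS_PER_CELL = 6
asu_volume = volume / ASYMMETRIC_UNITS_PER_CELL

print(f"unit cell volume: {volume:.0f} A^3")
print(f"volume per asymmetric unit: {asu_volume:.0f} A^3")
```

Dividing the volume per asymmetric unit by the molecular mass of the complex would give a Matthews coefficient; that mass is not stated in the text, so the calculation is left at this step.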

Figure 1.

The crystal structure of the CENP‐B1–129 protein complexed with the 21mer DNA. (A) The 21mer DNA sequence with the CENP‐B box. The three boxes, marked as sites 1, 2 and 3, indicate the essential bases for CENP‐B binding to the CENP‐B box DNA. Closed triangles indicate the thymine residues, which were each replaced by 5‐iodouracil for phasing. (B) Stereo view of the (2|Fo| − |Fc|) electron density map for amino acid residues 119–129 and base pairs 15–20. (C) Overall structure of the CENP‐B1–129·DNA complex. The N‐terminal arm, domain 1, the linker loop and domain 2 are shown in blue, orange, magenta and cyan, respectively. The local DNA axis was calculated by the program CURVES (Lavery et al., 1988) and is shown in red. The structure was drawn by the programs MOLSCRIPT (Kraulis, 1991) and Raster3D (Merritt and Bacon, 1997). (D) The amino acid sequence and the secondary structure of CENP‐B1–129.

Table 1. Crystallographic statistics

Protein structure

The crystal structure of CENP‐B1–129 is divided into four well‐defined regions: the N‐terminal arm (amino acid residues 1–9), domain 1 (residues 10–64), the linker loop (residues 65–74) and domain 2 (residues 75–129) (Figure 1C and D). Domains 1 and 2 of CENP‐B1–129 have a helix–turn–helix motif and bind to adjacent major grooves of DNA. DNA‐complexed structures of a tandem repeat of two helix–turn–helix domains have been determined for the DNA‐binding domains of the human Myb protein (Ogata et al., 1994), the human Pax6 protein (Xu et al., 1999) and the yeast telomere binding protein RAP1 (König et al., 1996). Domain 1 of CENP‐B1–129 has five α‐helices (helices 1–5, Figure 1D). We previously reported the solution structure of a shorter CENP‐B fragment consisting of residues 1–56 (Iwahara et al., 1998). Although the fragment was truncated in the middle of the helix 4 region, the solution structure of CENP‐B1–56 can be superimposed onto the corresponding part of the present crystal structure of domain 1 in the DNA‐bound CENP‐B1–129. The presence of helices 4 and 5 in addition to the three α‐helices of the canonical helix–turn–helix domain is characteristic of domain 1 of CENP‐B; helix 1 is surrounded by helices 2–5 through many hydrophobic interactions. Domain 1 of CENP‐B is structurally close to the DNA‐binding domain of the Tc3A transposase (van Pouderoyen et al., 1997) (r.m.s.d. 1.44 Å), the N‐subdomain of Pax6 (Xu et al., 1999) (r.m.s.d. 1.22 Å) and domain 1 of RAP1 (König et al., 1996) (r.m.s.d. 1.49 Å). Domain 2 of CENP‐B has three α‐helices (helices 6–8, Figure 1C and D), and the helix–turn–helix is formed by helices 7 and 8 and a turn region of seven amino acid residues. The C‐subdomain of Pax6 and domain 2 of RAP1 show high similarity to domain 2 of CENP‐B, with r.m.s.d. values of 1.97 and 2.18 Å, respectively (König et al., 1996; Xu et al., 1999). The CENP‐B domains 1 and 2 themselves are similar to each other (r.m.s.d. 2.26 Å for the Cα atoms, except for the extra helices 4 and 5 of domain 1).
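For readers unfamiliar with the r.m.s.d. values quoted above, the quantity is the root mean square of the distances between equivalent atoms after the two structures have been superimposed. The following minimal sketch computes it for two coordinate sets that are assumed to be already superimposed and listed in matching order; it does not perform the superposition itself, and the toy coordinates are hypothetical, not taken from any of the structures discussed.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy data: every atom in set_b is displaced by 1 angstrom along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
toy_rmsd = rmsd(set_a, set_b)
print(f"toy r.m.s.d.: {toy_rmsd:.2f} A")
```

In practice only the Cα atoms of structurally equivalent residues would be passed in, which is why the extra helices 4 and 5 of domain 1 are excluded from the domain 1 versus domain 2 comparison.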

Base recognition and DNA kinks

Nine out of the 17 bp of the CENP‐B box are essential for the recognition by CENP‐B (Masumoto et al., 1993). These essential base pairs are located in three distinct regions (sites 1, 2 and 3), which have the sequences T4–T5–C6–G7, A12, and C15–G16–G17–G18, respectively (Figure 1A). In the present complex structure, CENP‐B1–129 makes direct contacts with these essential sites. Helix 3 of domain 1 lies along the major groove and recognizes the essential sequences in site 1 (Figure 2A and B). In site 1 (from T4:A4′ to G7:C7′), T4 and T5 of one strand are recognized by Pro39 and Ser43, respectively, through van der Waals interactions, and the N7 atom of G6′ of the other strand forms a hydrogen bond with the Oγ of Ser40 (Figure 2A and B). These interactions bend the DNA grooves at bp 6 of site 1, and induce a local DNA kink of 16° (Figures 2E and 3A). The G7:C7′ base pair, which is indispensable for the CENP‐B binding (Masumoto et al., 1993), is not recognized directly by CENP‐B in the crystal structure (Figure 2A). This base pair may be important for CENP‐B to induce the kinked DNA structure, because replacement by another base pair, such as A:T, causes steric hindrance in the DNA structure (data not shown). In site 2 (A12:T12′), the O4 of T12′ is recognized by the Nζ atom of Lys70 of the linker loop through hydrogen bond formation in the minor groove (Figure 2A and C). The DNA around site 2 is only slightly distorted as compared with the B‐form DNA. Arg5 of the N‐terminal arm stacks with the base moiety of G10, and the side chain amino group of Arg5 forms a hydrogen bond with the O2 group of T9 in the same minor groove (Figure 2A).

Figure 2.

Protein–DNA interactions. (A) Schematic diagram summarizing the DNA contacts by CENP‐B1–129. The essential base pairs in sites 1, 2 and 3 are colored blue. Water molecules are denoted as open circles labeled with W. Open circles represent phosphates. Hydrogen bonds and salt bridges with the backbone phosphate groups are indicated with thick black lines. Specific recognitions of bases in sites 1, 2 and 3 are shown in red (hydrogen bonds) and yellow (van der Waals interactions) lines. (B–D) Specific interactions at sites 1, 2 and 3, respectively (stereo view). DNA strands are shown in green and yellow. Dotted lines in gray represent hydrogen bonds, and those in orange represent van der Waals interactions. (E) Graphic representation of the widths of the major and minor grooves, calculated by CURVES (Lavery et al., 1988). The vertical axis indicates the groove width (Å) and the horizontal axis indicates the base number. The solid and dashed lines indicate the widths of the major and minor grooves, respectively.

Figure 3.

Structural comparisons of the CENP‐B·DNA complex (A) with the complexes of the E.coli CAP protein (B; Schultz et al., 1991), the yeast MATa1/MATα2 protein (C; Li et al., 1995), the human serum response factor (SRF) (D; Pellegrini et al., 1995), the human Pax6 protein (E; Xu et al., 1999) and the yeast RAP1 protein (F; König et al., 1996), respectively. The local DNA axis was calculated by the program CURVES (Lavery et al., 1988) and is shown as a red line. Arrows in (A) indicate the angles of local DNA kinks found in the CENP‐B complex. All of the structures are represented in the orientation where the DNA molecules show the greatest bending.

Site 3, which consists of base pairs from C15:G15′ to G18:C18′ (Figure 1A), is kinked locally by CENP‐B1–129 through a novel mechanism. In contrast to helix 3 of domain 1, the orientation of the recognition helix of domain 2 (helix 8) relative to the major groove is very different from the cases of canonical helix–turn–helix proteins. Indeed, in site 3, helix 8 penetrates perpendicularly into the major groove around bp 16 (Figures 1D and 2D). Thus, N7 of G16, N7 of G17 and O6 of G18 are recognized by the side chain NH2 group of Arg125, the main chain NH group of Gly121 and the side chain NH2 group of Asn120, respectively, through hydrogen bonding (Figure 2A and D). In addition to the involvement of Arg125 in the base‐specific recognition, helix 8 contains three more arginine residues (Arg127, Arg128 and Arg129), all of which are involved in the DNA interaction (Figure 2A and D). The arginine cluster in the recognition helix is unique to the CENP‐B domain 2 among helix–turn–helix proteins. Arg128 interacts with the 5′ phosphate group of C15 through a water molecule (Figure 2A). Remarkably, Arg127 and Arg129 bridge the major groove through direct interactions with the phosphate groups of A20′ and A14, respectively (Figure 2A and D). These bridges between the arginine residues of helix 8 and the backbone phosphates of the major groove drastically narrow the width of the major groove, expanding in turn that of the minor groove (Figure 2D and E), and induce the 43° local kink of the DNA at bp 16 in site 3 (Figure 3A). This kinking mechanism is unique, and is now named the ‘phosphate bridging by an arginine‐rich helix’ (PBAH). The C15:G15′ base pair, which is essential to CENP‐B binding (Masumoto et al., 1993), is not recognized directly by CENP‐B. This is probably because C15:G15′ is sterically required, as is G7:C7′ described above; the present kinked structure causes steric hindrance for the substitution of C15:G15′ by any other base pair (not shown).

A DNA kink by major groove protein binding has been found only for the CAP·DNA complex (Figure 3B, Schultz et al., 1991). The CAP protein forms a symmetric homodimer, and is bound with a DNA that is nicked at the protein contact sites. A sharp DNA kink (∼40°) was induced by the extensive interactions of the three α‐helices with the bases and the phosphates on the major groove (Figure 4B).

Figure 4.

Schematic drawings of DNA structures in the complexes with CENP‐B (A), CAP (B), MATa1/MATα2 (C) and SRF (D). Cylinders indicate recognition helices for DNA binding. The 3′ and 5′ ends of DNA molecules are indicated by open and closed circles, respectively. The complex with CAP contains two molecules of double‐stranded DNA, which has a four base overhang at the 3′ end, and the complexes with CENP‐B, MATa1/MATα2 and SRF contain one molecule of double‐stranded DNA.

Overall DNA bending

In the CENP‐B·DNA complex, the kinks only occur at the two protein contact sites, and the DNA region between the two kinks is straight, resulting in a ‘kink–straight–kink’ bend (Figure 4A). In the CAP·DNA complex (Figure 4B), the DNA region between the two kinks is also straight, which might be due to the nicks in the DNA at the protein contact sites. In the CENP‐B·DNA complex, the kinks at sites 1 and 3 are in the same plane, and the overall bending angle of 59° is therefore nearly equal to the sum of the two kink angles (Figure 3A). These kinks are not due to pseudocontinuous superhelix formation in the crystal (Li et al., 1995). One 21 bp DNA molecule contacts another symmetry‐related DNA molecule in a manner twisted by 90° to form a straight pseudocontinuous DNA helix of 42 bp, but the two ends as well as the flank of the pseudocontinuous DNA helix have no crystal packing interactions with any other symmetry‐related DNA molecules.
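The statement that coplanar kinks add up to the overall bend can be illustrated with a small geometric sketch. It treats the DNA axis as a direction vector that is rotated once at each kink, using the 16° and 43° values reported above; the rotation axes, the helper names and the comparison case with perpendicular kink planes are illustrative assumptions, not an analysis performed in this study.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def rotate(v, axis, degrees):
    """Rotate v about a unit-length axis (Rodrigues' rotation formula)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    k_dot_v = sum(a * b for a, b in zip(axis, v))
    k_x_v = cross(axis, v)
    return tuple(v[i] * c + k_x_v[i] * s + axis[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

def angle_between(u, v):
    """Angle in degrees between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

KINK_SITE1, KINK_SITE3 = 16.0, 43.0   # local kink angles from the text
start = (0.0, 0.0, 1.0)               # axis direction before site 1 (arbitrary choice)
x_axis = (1.0, 0.0, 0.0)              # bending axis for the first kink (arbitrary choice)

after_site1 = rotate(start, x_axis, KINK_SITE1)

# Case 1: second kink about the same axis, so both kinks lie in one plane.
in_plane = angle_between(start, rotate(after_site1, x_axis, KINK_SITE3))

# Case 2 (hypothetical): second kink in a plane perpendicular to the first.
perp_axis = cross(after_site1, x_axis)
out_of_plane = angle_between(start, rotate(after_site1, perp_axis, KINK_SITE3))

print(f"coplanar kinks: {in_plane:.1f} deg")
print(f"perpendicular kink planes: {out_of_plane:.1f} deg")
```

With coplanar kinks the angles simply add to 59°, whereas the hypothetical perpendicular arrangement gives a noticeably smaller overall bend of roughly 45°, which is why the coplanarity of the two kinks matters for the overall bending angle.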

In contrast to the ‘kink–straight–kink’ bend of the CENP‐B box DNA, other large DNA bends due to major groove protein binding occur gradually over the region between the two protein contact sites. These ‘round bends’ have thus far been observed mostly in systems that have extensive protein–protein interactions, such as dimer formation and domain–domain interactions. In the cases of helix–turn–helix proteins, the yeast MATa1·MATα2 heterodimer causes a round bend with a bending angle of 60° (Figures 3C and 4C) (Li et al., 1995). Similar round bends of DNA have been found for two non‐helix–turn–helix proteins, the human serum response factor (SRF) and MCM1 (Pellegrini et al., 1995; Tan et al., 1998), which both belong to the MADS family (Figures 3D and 4D). These MADS family proteins form homodimers; coiled‐coil α‐helices formed by the well‐conserved peptide segments (the MADS box) lie horizontally along the major groove of the specific DNA (Figure 3D). Therefore, the DNA bending mechanism of the CENP‐B·DNA complex is totally different from that of the other DNA‐bending proteins. On the other hand, the protein structures of the tandemly arranged helix–turn–helix domains of CENP‐B are similar to those of the Pax6 and RAP1 DNA‐binding regions (König et al., 1996; Xu et al., 1999). As in CENP‐B, the two helix–turn–helix domains do not interact directly with each other in the complexes of the Pax6 and RAP1 DNA‐binding regions with their cognate DNA sites. However, the DNA structures in the complexes are quite different; in the cases of Pax6 and RAP1, the DNAs are only slightly bent (Figure 3E and F).

Centromeric chromatin

It has been reported that CENP‐B induced nucleosome positioning adjacent to the CENP‐B box sequence (Yoda et al., 1998). The CENP‐B box sequence exists in every other α‐satellite repeat (171 bp), and CENP‐B accommodates a pair of centromeric nucleosomes between two CENP‐B boxes (Yoda et al., 1998). Furthermore, CENP‐B bundles two distant CENP‐B boxes through dimer formation (Yoda et al., 1998). On the other hand, it is proposed that the centromere nucleosome contains histones H2A, H2B, H4 and a centromere‐specific histone H3 variant, CENP‐A (Yoda et al., 2000). Recently, it was reported that depletion of CENP‐A caused significant dispersions of CENP‐B and CENP‐C in mice (Howman et al., 2000), suggesting that CENP‐A, CENP‐B and CENP‐C form the centromere cooperatively. In the present study, we found that binding of CENP‐B to the CENP‐B box sequence bends it by ∼60°. The kinked DNA structure induced by CENP‐B should be taken into account when the mechanism of centromeric nucleosome condensation is investigated.

Materials and methods

Purification of CENP‐B1–129

CENP‐B1–129 was overexpressed in Escherichia coli JM109 (DE3) cells under the control of the T7 promoter. The E.coli strains carrying the expression vector of CENP‐B1–129 were grown at 37°C, and isopropyl‐β‐d‐thiogalactopyranoside (IPTG; 1.5 mM) was added at an OD600 = 0.8 to induce the protein expression. After overnight cultivation at 37°C, the cells were harvested and were disrupted by sonication in buffer S [50 mM Tris–HCl pH 8.0, 5 mM EDTA, 100 mM NaCl, 1 mM phenylmethylsulfonyl fluoride (PMSF)]. Then, the sample was centrifuged at 27 000 g for 20 min and CENP‐B1–129 was recovered in the insoluble fraction. The precipitate was sonicated in buffer S and was centrifuged again. Proteins in the insoluble fraction were extracted under denaturing conditions with buffer A [6 M urea, 20 mM Tris–HCl pH 8.0, 10 mM dithiothreitol (DTT), 2 mM EDTA, 1 mM PMSF, 2.8 M NaCl] and, after centrifugation, the resulting pellet was discarded. The proteins including CENP‐B1–129 were loaded onto a 30 ml butyl‐Toyopearl M (TOSOH, Japan) column, and the resin was washed with 150 ml of buffer A. CENP‐B1–129 was eluted with buffer A containing 0.1 M NaCl. The fractions were dialyzed against buffer C (6 M urea, 50 mM potassium phosphate pH 6.0, 5 mM DTT, 1 mM PMSF, 150 mM NaCl, 10% glycerol) and were loaded onto a 15 ml phosphocellulose column (Whatman P11). The resin was washed with 50 ml of buffer C, and CENP‐B1–129 was eluted with a linear gradient of potassium phosphate pH 6.0, from 50 to 700 mM. The purified fractions were concentrated to 1–5 mg/ml by Centricon 10 ultrafiltration (Amicon).


The 21mer oligonucleotides containing the CENP‐B box sequence were purchased from Oligo‐Espec custom synthesis. These oligonucleotides were purified by Resource Q chromatography (Amersham Pharmacia Biotech) with a linear gradient of NaCl from 0.2 to 0.5 M. Three iodinated derivatives, in which the thymine is replaced by a 5‐iodouracil at positions 12, 14 and 19 (IdU12, IdU14 and IdU19, respectively), were also prepared.

Purification of the CENP‐B1–129·DNA complex

The purified CENP‐B1–129 and both DNA strands were mixed in a 1:1 stoichiometry (molar ratio) under denaturing conditions. After an incubation for 120 min, the samples were dialyzed against buffer G (50 mM HEPES–KOH pH 7.5, 5 mM DTT, 150 mM NaCl, 10% glycerol). The CENP‐B1–129·DNA complex was concentrated to 1–2 ml and was purified on a Hiload Superdex 75 (Pharmacia) gel filtration column in buffer G without glycerol.


The protein–DNA complex was concentrated by Centriprep‐3 ultrafiltration (Amicon) to 0.2–0.4 mM (∼3–6 mg/ml). Co‐crystals were grown at 20°C by the hanging drop vapor diffusion method. The drops contained 1 μl of the complex solution and 1 μl of the reservoir solution (100 mM sodium acetate, 50 mM sodium cacodylate pH 6.5 and 15% 2‐methylpentane‐2,4‐diol). The drops were equilibrated against 0.5 ml of reservoir solution. Crystals grew in 1 day to a size of ∼0.3 × 0.3 × 0.5 mm3, and belong to the trigonal space group P3112 (a = b = 83.3 Å, c = 139.0 Å). Prior to data collection, the crystals were equilibrated in stabilization buffer (100 mM sodium acetate, 50 mM sodium cacodylate pH 6.5, 20% MPD) for 1 h. Then, the crystals were flash frozen in a stream of liquid nitrogen at 100 K. All data sets were collected from the beamline BL44B2 in SPring‐8, Harima. The data were processed and scaled using the DENZO and SCALEPACK programs (Otwinowski and Minor, 1997).
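The paired molar and mass concentrations quoted above can be cross-checked with the identity that a 1 mM solution of a 1 kDa solute contains 1 mg/ml. The sketch below back-calculates the molecular mass implied by the two quoted pairs; the helper names are illustrative, and the interpretation that the mass concentration refers to the protein component alone is an inference from the phrase "mg protein/ml" used in the Results, not a value stated explicitly in the text.

```python
def mg_per_ml(millimolar: float, kilodaltons: float) -> float:
    """Mass concentration: mM x kDa = mg/ml (1 mM of a 1 kDa solute is 1 g/l)."""
    return millimolar * kilodaltons

def implied_mass_kda(mass_mg_per_ml: float, millimolar: float) -> float:
    """Molecular mass (kDa) implied by a paired mass and molar concentration."""
    return mass_mg_per_ml / millimolar

# Paired values quoted in the text: 0.2-0.4 mM corresponds to about 3-6 mg/ml.
mass_from_low = implied_mass_kda(3.0, 0.2)
mass_from_high = implied_mass_kda(6.0, 0.4)

print(f"implied mass from lower bound: {mass_from_low:.1f} kDa")
print(f"implied mass from upper bound: {mass_from_high:.1f} kDa")
```

Both ends of the range imply the same mass, of the order expected for a 129‐residue polypeptide, which is consistent with the mass concentration being expressed per protein rather than per protein–DNA complex.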

Phasing and refinement

Phase calculations were carried out with the CCP4 program suite (Collaborative Computational Project Number 4, 1994). The iodine sites of IdU12, IdU14 and IdU19 were determined by the RSPS program (Collaborative Computational Project Number 4, 1994) from the isomorphous difference Patterson map. The initial phases from the IdU12 derivative were used to locate the positions of the iodine in the IdU14 and IdU19 derivatives by difference Fourier analysis, which were consistent with the Patterson search solutions. The heavy atom parameters were refined with the program MLPHARE (Collaborative Computational Project Number 4, 1994), and the resulting MIRAS map was of excellent quality. The map was improved further by density modification procedures, such as solvent flattening and phase extension, using the program DM (Collaborative Computational Project Number 4, 1994). An atomic model was fitted into the electron density map using the graphics program O (Jones et al., 1991). The positions of the iodines guided the assignment of the DNA residues in the electron density map. Crystallographic positional and simulated annealing refinements were carried out against the 2.5 Å data set using CNS (Brünger et al., 1998). The R‐factor of the final model is 22.5% (Rfree = 26.5%) using data from 30 to 2.5 Å. Coordinates were deposited with the Protein Data Bank (RCSB id code: rcsb012438 and PDB ID code: 1HLV).
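For reference, the R‐factor quoted above is conventionally defined as the normalized disagreement between observed and calculated structure factor amplitudes. The expression below gives this standard definition; the symbol T for the set of test reflections is notation introduced here, and the size of the test set used in this work is not stated in the text.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_o(hkl)| - |F_c(hkl)| \,\bigr|}{\sum_{hkl} |F_o(hkl)|},
\qquad
R_{\mathrm{free}} = \frac{\sum_{hkl \in T} \bigl|\, |F_o(hkl)| - |F_c(hkl)| \,\bigr|}{\sum_{hkl \in T} |F_o(hkl)|}
```

Here R is summed over the reflections used in refinement, and Rfree is the same quantity evaluated over reflections set aside from refinement, so that it serves as a cross‐validation measure.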


We thank Drs N.Kamiya and S.Adachi for the data collection at SPring‐8, and also thank Drs M.Shirouzu and T.Terada for sharing beam time at SPring‐8. This work was supported in part by the Bioarchitect Research Program (RIKEN) and also by a Grant‐in‐Aid from the Ministry of Education, Science, Sports and Culture, Japan.