The methylation of lysine residues of histones plays a pivotal role in the regulation of chromatin structure and gene expression. Here, we report two crystal structures of SET7/9, a histone methyltransferase (HMTase) that transfers methyl groups to Lys4 of histone H3, in complex with S‐adenosyl‐L‐methionine (AdoMet), determined at 1.7 and 2.3 Å resolution. The structures reveal an active site consisting of: (i) a binding pocket between the SET domain and a c‐SET helix where an AdoMet molecule in an unusual conformation binds; (ii) a narrow substrate‐specific channel that only unmethylated lysine residues can access; and (iii) a catalytic tyrosine residue. The methyl group of AdoMet is directed to the narrow channel where a substrate lysine enters from the opposite side. We demonstrate that SET7/9 can transfer two but not three methyl groups to unmodified Lys4 of H3 without substrate dissociation. The unusual features of the SET domain‐containing HMTase discriminate between unmethylated and methylated lysine substrates, and between the methylation sites of the histone H3 tail.
In eukaryotes, dynamic transition between an extended, transcriptionally‐active euchromatin structure and a compact, transcriptionally‐silent heterochromatin structure is critical for the regulation of gene expression (van Holde, 1988; Wolffe, 2001). Although it is unclear how the chromatin structure is regulated, a large body of evidence suggests that a variety of post‐translational modifications within the basic N‐terminal tails of the histones play an important role in alteration of the chromatin structure, and thus the regulation of gene expression (Spencer and Davie, 1999; Strahl and Allis, 2000; Turner, 2000; Zhang and Reinberg, 2001).
Several post‐translational modifications of the N‐terminal tails of the histones including acetylation, phosphorylation, ubiquitylation and methylation have been reported (van Holde, 1988). The distinct covalent modifications of histone tails are expected to generate a label that can regulate the interaction with chromatin‐associated proteins or protein complexes, which in turn control chromatin function (Strahl and Allis, 2000; Jenuwein and Allis, 2001).
Histone methylation occurs on both lysine and arginine residues of histones H3 (Rea et al., 2000; Ma et al., 2001) and H4 (Strahl et al., 2001; Wang et al., 2001b). In histone H3, the lysine methylation is observed on residues 4, 9, 27 and 36, whereas in histone H4, only Lys20 is methylated (van Holde, 1988). Compared with other modifications, the methylation of histone lysines shows some unique features. First, it is an irreversible process that results in a long‐term epigenetic label (Byvoet et al., 1972; Duerre and Lee, 1974). This label is required to maintain specific gene expression programmes, and to inherit cell‐type identities (Jenuwein, 2001). Furthermore, it allows the distinct chromatin structure of the centromeres to be propagated faithfully during cell division (Karpen and Allshire, 1997). Secondly, the gene regulatory activity mediated by the methylation of histone lysines is determined not only by the location of methylation site(s) but also by the precise methylation status (e.g. mono‐, di‐ or trimethylation; Santos‐Rosa et al., 2002).
SUV39H1 protein (Tschiersch et al., 1994) and its homologues were the first identified histone methyltransferases (HMTases) that can specifically methylate Lys9 of histone H3 (H3‐K9; Rea et al., 2000). This process is important for the recruitment of the heterochromatin protein 1 (HP1; Bannister et al., 2001; Jacobs et al., 2001; Lachner et al., 2001) and the establishment of heterochromatin (Nakayama et al., 2001; Schotta et al., 2002). In addition to SUV39H1, G9a (Tachibana et al., 2001) and ESET/SETDB1 (Yang et al., 2002; Schultz et al., 2002) have been reported to methylate H3‐K9.
The methylation at Lys4 of histone H3 (H3‐K4) is another well‐studied example of lysine methylation of histones (Wang et al., 2001a; Nishioka et al., 2002). In contrast to H3‐K9 methylation that is correlated with transcription repression, H3‐K4 methylation is enriched in transcriptionally‐active euchromatin (Litt et al., 2001; Boggs et al., 2002; Zegerman et al., 2002). SET7/9 is a recently identified HMTase that methylates H3‐K4 (Wang et al., 2001a; Nishioka et al., 2002). The H3‐K4 methylation by SET7/9 inhibits the association of the NuRD deacetylase complex with the H3 tail. Furthermore, the methylation on H3‐K4 by SET7/9 impairs the SUV39H1‐mediated methylation on H3‐K9, preventing the placement of a silencing label that is recognized by HP1 (Wang et al., 2001a; Nishioka et al., 2002).
Almost all lysine HMTases contain a highly‐conserved SET domain (Jenuwein et al., 1998; Kouzarides et al., 2002). Several lines of evidence suggest that methylation of histone lysine residues is mediated through the SET domain. The pre‐ and post‐SET regions are also required for the activity of the enzyme (Rea et al., 2000). The importance of the SET domain for HMTase activity suggests that the mechanism of methyl transfer to the lysine residue is likely to be conserved among the SET domain‐containing family of enzymes (Wang et al., 2001a; Nishioka et al., 2002). The biological importance of the SET domain is underscored by the fact that several SET‐containing HMTases are involved in the development of cancer in a positive or negative manner (Peters et al., 2001; Schneider et al., 2002).
Recently, three structures of SET domain‐containing methyltransferases (MTases), namely SET7/9 (Wilson et al., 2002), DIM‐5 (Zhang et al., 2002) and Rubisco large subunit lysine MTase (Rubisco LSMT; Trievel et al., 2002), have been reported. Each structure has the well‐described unique feature of a SET domain. Also, based on biochemical and structural analyses, an active site for each MTase has been proposed. However, in two cases, SET7/9 (Wilson et al., 2002) and Rubisco LSMT (Trievel et al., 2002), significant differences are observed in the cofactor orientation. In addition, the post‐SET region that is important for the activity of MTase is well defined only in the Rubisco LSMT structure (Trievel et al., 2002). Thus, further structural characterizations are required to resolve these issues.
To understand the roles of the SET domain and c‐SET region in the lysine methyl transfer reaction, we have determined the crystal structure of SET7/9 bound to an S‐adenosyl‐L‐methionine (AdoMet) molecule (Figure 1A). (We refer to the regions immediately before and after the SET domain in SET7/9 as n‐SET and c‐SET, respectively, throughout the text, since these regions lack sequence similarity to the cysteine‐rich pre‐ and post‐SET domains of SUV39H1 and its homologues.) We show that the AdoMet molecule adopts an unusual conformation in the pocket formed between the SET and c‐SET domains of SET7/9. We also demonstrate that SET7/9 can transfer two but not three methyl groups to unmethylated H3‐K4 without dissociation of the substrate, supporting the importance of a channel formed at the active site for the selection of the lysine substrate.
Structure determination of SET7/9
In initial crystallization experiments, a thin plate crystal form was obtained from the full‐length SET7/9 protein. However, this crystal form did not diffract. To overcome this problem, we carried out a limited proteolytic digestion on full‐length protein, and identified two stable polypeptide fragments spanning residues 70—366 (SET7/9L) and residues 109—366 (SET7/9S). Biochemical analyses revealed that both of these fragments retain full HMT activity, suggesting that the missing N‐terminal region is not required for substrate binding and catalytic activity (data not shown).
The SET7/9L and SET7/9S proteins were crystallized in well diffracting crystal forms, one in orthorhombic form (1.7 Å resolution) and the other in a tetragonal space group (2.3 Å resolution). Phasing and refinement statistics for these crystals are shown in Table I. Both crystal forms contain one protein—AdoMet complex in the asymmetric unit. The two structures are virtually identical, showing root mean square (r.m.s.) deviations of 0.5 Å for Cα atoms.
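The r.m.s. deviation quoted above is the square root of the mean squared distance between paired Cα atoms after superposition. A minimal Python sketch of that calculation is given below; it assumes the two coordinate sets have already been superimposed and paired atom by atom, and the coordinates shown are hypothetical values invented for illustration, not taken from either crystal form.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points.

    Assumes the two sets are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha positions in angstroms (illustration only).
form_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
form_b = [(0.0, 0.5, 0.0), (3.8, 0.0, 0.5), (7.6, -0.5, 0.0)]

print(round(rmsd(form_a, form_b), 2))
```

In practice the superposition step (for example a least-squares fit) would precede this calculation; it is omitted here to keep the sketch short.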
The crystal structure revealed that SET7/9L consists of three α‐helices and 20 β‐strands organized in two domains (Figure 1B). The two connected domains are well defined and, upon packing into a whole structure, 2035 Å² of accessible surface area becomes buried. The N‐terminal domain (residues 79–193) is made up entirely of β‐strands (Figure 2A and B). The C‐terminal domain (residues 194–364) folds into an α/β structure, and most of the highly‐conserved residues are concentrated in this region.
The N‐terminal domain folds into a β‐domain structure with a twisted antiparallel β‐sheet and an extra β‐strand (Figure 2A and B). The β‐sheet is formed with nine β‐strands. The first three strands are completely exposed to solvent, whereas strands β4—β7 are packed in a perpendicular manner to the extended part of strand β8, which is bent significantly, and to strand β9. The opposite face of these strands interacts with the C‐terminal domain. The N‐terminal domain is far from the active site and does not appear to be involved in the enzyme activity or substrate binding. A search for homologous structures using the DALI server revealed two structures, bacterial outer‐surface protein A (1OSP; z = 4.9) and porin (3PRN; z = 3.8). The structural similarity of the N‐terminal domain to these proteins suggests that this domain may interact with small polymeric molecules or other proteins to mediate the signal to the C‐terminal domain.
The C‐terminal domain containing a SET domain folds into a novel α/β fold with 10 β‐strands and three α‐helices (Figure 2A and B). The central feature of this domain includes four motifs, which form the SET domain, and the c‐SET helix. The C‐terminal domain folds into a tightly‐packed structure that contains n‐SET, SET and c‐SET regions. A search for homologous structures using DALI reveals no proteins with structural similarity. The first motif in this domain is a β‐sheet formed by three twisted β‐strands (β13, β14 and β18), and helix α1, which is located next to the β13 strand (Figure 2B). One face of the sheet and helix α1 are packed against six β‐strands (β4–β8 and β10) of the N‐terminal sheet. The second motif, located at the top of the β‐sheet, is formed by helix α2 and three antiparallel strands (β15, β16 and β19). The ‘front face’ of this sheet is packed in a perpendicular manner to the β‐sheet in the first motif, whereas the ‘back face’ makes extensive interactions with residues from helix α2. The third motif consists of two antiparallel strands (β11 and β12) that are located at the bottom of the first sheet (Figure 2A and B). These strands are perpendicular to the β‐strands from both the first and second motifs, and do not make direct contact with them. Strand β20 threads through and underneath strand β17 and loop L4, forming a knot structure, and this fourth motif occupies the space that was formed upon packing of the other three motifs. The c‐SET helix α3 is positioned at the opposite end of the N‐terminal domain, making extensive interactions with strand β17 and loop L4. This unusual fold of the C‐terminal part of the SET domain and c‐SET region gives rise to two prominent architectural features: a short and wide pocket, and a narrow channel formed in the pocket.
The role of the c‐SET domain
The c‐SET region comprising residues 345—366 is critical for the function of SET7/9, and it has been shown that removal of this region completely abolishes the HMTase activity of SET7/9 (Wilson et al., 2002). Our structure shows that a part of this region (residues 344—349) forms a flexible loop, whereas residues 350—360 form an amphipathic helix. In this helix α3, the interacting residues, including Pro350, Trp352, Tyr353, Leu357 and Phe360, form an extensive network of van der Waals contacts with His297 in strand β17 and Phe299 in loop L4 (Figure 2C). This network is also connected to Tyr335 and Tyr337 in loop L5. The c‐SET helix and a part of the L5 loop are not present in the recently reported crystal structure of SET7/9 (Wilson et al., 2002). As we describe later, in the presence of this helix, a significant portion of the cofactor‐binding site is protected compared with that in the absence of the helix, and the AdoMet molecule can gain access from limited directions.
The adenine base of AdoMet is tightly packed between Trp352 of the c‐SET helix and loop L1/strand β17, and mediates the interaction between them (Figure 2C; see the section entitled ‘The SET7/9—AdoMet interaction’). The L5 loop that connects the c‐SET helix and SET domain is flexible, and the c‐SET region may regulate access of the AdoMet molecule to the binding site by altering the relative position of this helix.
A pocket with a width of 9.5 Å × 13 Å and a depth of 8.5 Å is formed by the interaction of loops and strands from the SET domain and the c‐SET helix in the C‐terminal domain (Figure 3A and B). The bottom of the pocket is lined with strand β17 that contains the R/HxxNHS motif, the most conserved sequence found in members of the SET domain‐containing protein family, and with residues 335—337 of loop L5 in the c‐SET region. One side of the wall is formed by loop L1 between strands β11 and β12. The opposite side of the wall is covered with residues 263—266 in loop L2. Two other ends of the wall are lined with residues 293—295 in loop L3 and the c‐SET helix α3.
An AdoMet with a compact conformation binds to this pocket (Figure 3A—C). The methyl group of the methionine moiety and N6 of the adenine base moiety are directed towards the binding pocket, whereas the hydroxyl group of adenosine ribose points outwards to the solvent.
The conformation of AdoMet is identical in both crystal forms, indicating that this unusual conformation is not the result of different crystal packing. In the SET7/9—AdoMet structure, the adenine base is in an anti‐conformation with its ring parallel to the β‐strand plane at the bottom of the pocket. The adenine ribose has 2′‐exo sugar puckering (Figure 3A—C). We have compared the conformation of AdoMet in SET7/9 with those of 20 other AdoMet molecules that bound to different proteins. Although there are slight differences in each AdoMet conformation, they can be grouped into two classes. Together with the compact form of AdoMet in SET7/9, we can divide these conformations into extended, intermediate and compact forms (Figure 3C). The intermediate form of AdoMet is found in methionine repressor protein (Metj; Somers and Phillips, 1992) and adenine‐N6‐DNA‐methyltransferase (Zubieta et al., 2001), whereas the extended form is observed in the rest of the MTase structures. When these molecules are superimposed by placing their centres on the sugar rings, the compact conformation of an AdoMet molecule in SET7/9 shows some unique features (Figure 3C): (i) the adenine base of the compact form is perpendicular to that of the extended or intermediate form; and (ii) the methionine moiety of AdoMet in SET7/9 is rotated significantly relative to the adenine ribose moiety around both the C4—C′5 and the C′5—Sδ bonds. This rotation results in a highly compact form of AdoMet, and the side chain axis of the methionine moiety is directed perpendicular to the sugar plane, whereas it is parallel to the sugar ring plane in an extended or intermediate form.
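Rotations about individual bonds, such as those distinguishing the compact form from the extended and intermediate forms, are conventionally quantified as torsion (dihedral) angles defined by four consecutive atoms. The following Python sketch shows one common way to compute such an angle from Cartesian coordinates. The function name and the test points are hypothetical and are not taken from the structures described here, and the sign of the angle depends on the convention chosen.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond for four ordered points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Strip the components along the central bond; the angle between the
    # remaining projections is the torsion angle.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical points: an eclipsed (cis) and an anti (trans) arrangement.
cis = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(abs(cis), abs(trans))
```

Applied to the ribose and methionine atoms of each AdoMet molecule, angles of this kind would give a numerical basis for grouping conformations, although the grouping reported here was made by superposition.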
The distinctive compact conformation of AdoMet plays crucial roles in the structure and function of SET domain‐containing HMTases. First, it allows the binding of AdoMet into the pocket, which is not possible in the extended or intermediate conformation. Secondly, in this conformation, a sulfonium methyl group (Cϵ) in the methionine moiety can be located close to a number of residues that form a channel where a substrate lysine can enter (see below). The methyl group would be positioned ‘incorrectly’ in an extended or intermediate conformation, and the substrate could not be reached.
The SET7/9—AdoMet interaction
The recently reported structure of SET7/9—AdoMet (Wilson et al., 2002) differs significantly from our structure with regard to the orientation of AdoMet in the active site. These differences may be explained by the lack of a c‐SET helix in the published structure or by the limited resolution at which the orientation of AdoMet cannot be determined accurately. The orientation of cofactor in Rubisco LSMT (Trievel et al., 2002) is similar to that in our structure. The interactions observed between SET7/9 and the AdoMet molecule in our structure can be grouped according to adenine base, adenosine ribose and methionine moieties of AdoMet (Figure 3B and D).
The adenine base is buried between the SET and c‐SET domains through van der Waals contacts and hydrogen bonds (Figure 3A and B). The indole ring of Trp352 from the c‐SET helix packs against the face of the adenine base, and Ala226, Ile223 and the aliphatic part of Asn296 make van der Waals contacts with the other face of the base. Nitrogen N6 forms a hydrogen bond with the Oϵ1 and Oϵ2 atoms of Glu356, and N6 and N7 interact with the backbone carbonyl group of His297.
No significant interactions are observed between the ribose ring and SET7/9 because the sugar moiety of AdoMet is oriented in a manner such that its hydroxyl group points towards the solvent (Figure 3B). The only detectable intermolecular interaction is a hydrogen bond between O4′ of the ribose ring and Nδ2 of Asn296. However, O4′ of the ribose ring makes an intramolecular hydrogen bond with the amide group of the methionine moiety, and this interaction may contribute to the compact conformation of the AdoMet molecule.
The carboxylate group of the methionine moiety interacts with the side chain of Lys294 and the backbone amide of Glu228 through a salt bridge and a hydrogen bond, respectively (Figure 3B and D). The amino group forms hydrogen bonds with the backbone carbonyl groups of Ala226 and Glu228 and the side chain of Asn296. The sulfur atom of AdoMet makes contact with Nδ1 of Asn265. Interestingly, the methyl group is located in a polar environment formed by hydroxyl groups of Tyr245 (4.4 Å), Tyr305 (5.5 Å) and Tyr335 (3.5 Å), and carbonyl oxygen atoms of Gly264 (3.8 Å), His293 (3.0 Å) and Ala295 (4.2 Å). This polar environment presumably helps the access of the Nϵ of a substrate lysine. Previous mutational analysis suggested that two conserved residues, His293 and His297, are important for the HMTase activity of this enzyme family (Wang et al., 2001a; Nishioka et al., 2002). Although these histidine residues are located in the AdoMet‐binding pocket, they are far from this methyl group (6.5 Å for His293 and 9.5 Å for His297). The side chain of His297 is located near the adenine base moiety (4.1 Å from N6 of the base), whereas the side chain of His293 is at a distance of 6.5 Å from the methionine moiety. The sulfur group is in between these.
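The distances from the AdoMet methyl group listed above can be tabulated and filtered by a contact cutoff. The Python sketch below does this with the values quoted in the text; the 4.0 Å cutoff and the function name are assumptions chosen for illustration and are not criteria used in this study.

```python
# Distances (angstroms) from the AdoMet methyl group, as quoted in the text.
METHYL_NEIGHBOURS = {
    "Tyr245 OH": 4.4,
    "Tyr305 OH": 5.5,
    "Tyr335 OH": 3.5,
    "Gly264 O": 3.8,
    "His293 O": 3.0,
    "Ala295 O": 4.2,
}

def neighbours_within(distances, cutoff):
    """Return atom labels at or below the cutoff distance, nearest first."""
    close = [(d, name) for name, d in distances.items() if d <= cutoff]
    return [name for d, name in sorted(close)]

# The 4.0 angstrom cutoff is an illustrative assumption.
print(neighbours_within(METHYL_NEIGHBOURS, 4.0))
```

With this cutoff, the carbonyl oxygen of His293, the hydroxyl group of Tyr335 and the carbonyl oxygen of Gly264 are selected, in that order.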
The substrate‐specific channel
The methyl group of AdoMet points to a narrow channel that is seen within the binding pocket (Figures 3A and 4A). This channel, with an opening diameter of 6 Å and a depth of 6 Å, is formed by the side chains of Tyr245, Leu267, Tyr305, Tyr335 and Tyr337, and the backbone of residues 264—267 (Figure 4A and B). Most of the residues lining the channel are highly conserved among SET domain‐containing HMTases, indicating that the channel is a key feature of this protein family (Figure 4A and B). The inner surface inside the channel where the methyl group is located is highly polar, which is due to the presence of several hydroxyl and carbonyl groups.
The Nϵ atom of histone lysines can be mono‐, di‐ or trimethylated, and substrates with differently methylated states may bind to HMTase. However, the narrow diameter and the electrostatic properties of a channel formed at the active site may prevent the access of a methylated lysine substrate. To address whether the functional role of the channel is to discriminate between unmethylated and methylated lysine substrates, we tested whether the enzyme can methylate a mono‐ or a dimethylated lysine substrate. Figure 4C shows that SET7/9 can transfer a methyl group(s) to unmethylated K4 of H3 but not to either mono‐ or dimethylated K4 of H3. Wilson et al. (2002) have also observed that SET7/9 cannot add a methyl group to a monomethylated peptide.
However, since multiple methyl groups can be transferred to the lysine side chain without dissociation of a substrate peptide, we have explored this issue further by employing tested antibodies that specifically recognize mono‐, di‐ and trimethylated K4 of H3. Although these antibodies are highly specific for each methylated state of the peptide as judged by enzyme‐linked immunosorbent assay (ELISA; data not shown) and peptide competition experiments (Santos‐Rosa et al., 2002), in order to remove completely any possible non‐specific interaction between each antibody and the product(s), we have carried out a peptide competition assay. In this assay, mono‐ and trimethylated peptides were added to the dimethylated K4 antibody prior to detecting the reaction product(s). Similar peptide competition methods were also employed for the mono‐ or trimethylated K4 antibody assay. As shown in Figure 4D, the dimethylated K4 antibody can recognize the recombinant H3 after it has been methylated by SET7/9, whereas the trimethylated antibody fails to do so. Figure 4D also shows that the dimethylated peptide specifically competed with the reaction product against the dimethylated K4 antibody, further supporting the specificity of the dimethylated K4‐H3 antibody and the dimethylated reaction product formed by SET7/9. The binding of dimethylated K4 antibody to the reaction product of H3 by SET7/9 has been observed consistently by another group (B.Hamilton and J.Bone, Upstate Biotechnology, unpublished data). These results suggest that SET7/9 can transfer two methyl groups but not three methyl groups to the K4 of H3, and multiple rounds of methylation occur on the Nϵ atom of a lysine without being released from the enzyme when unmodified substrate is bound to the active site.
The histone‐binding site
The surface representation in Figure 4Aa shows that a conserved shallow groove is formed around the channel that is extended continuously to one side of the structure at site 1. This shallow groove is formed with highly‐conserved residues from strand β18 and helix α3 near the channel, and strand β20 at the side of the structure (Figure 4A). Since a lysine substrate enters the channel, another part of the N‐terminal tail must be in the vicinity of this region. We propose that this shallow groove provides the binding site for a part of the histone tail. Alternatively, another highly‐conserved region is located near the channel (site 2 in Figure 4A) comprising strands β16 and β19, and extends to site 3 where helix α2 is located. Mutation on an exposed residue, Asp270 (strand β16) or Glu254 (helix α2), to alanine reduced the binding of a substrate peptide to SET7/9 (Wilson et al., 2002), suggesting the possibility that a substrate binds to this region.
The structure—function correlations
Guided by the structural information, we carried out mutagenesis on eight residues, His293, His297, Lys294, Asp276, Trp352, Tyr305, Tyr245 and Tyr335, at or near the active site of SET7/9 (Figure 5A). These residues were changed individually to alanines, and the HMTase activities of the mutants were measured. Interestingly, most of the mutants exhibited a significantly decreased HMTase activity (Figure 5B).
Lys294 and Trp352 interact directly with the carboxylate group and adenine base of AdoMet, respectively. The decreased HMTase activity observed for the mutants reflects the importance of these interactions. Mutation of Lys294 or Trp352 is also likely to disrupt the local structure of the pocket. The Nϵ atom of Lys294 forms a salt bridge with Oϵ1 of Glu228 in loop L1, which forms a wall of the binding pocket. The indole ring of Trp352 makes van der Waals contacts with the side chain rings of His297, Tyr335, Tyr337 and Tyr353 (Figure 5A).
The side chain of Asp276 in strand β16 forms a salt bridge with the conserved Arg258 residue in loop L2 that is located on top of the channel. The decrease in enzyme activity seen for the Asp276 mutant may be explained by the disruption of this salt bridge, which could perturb the local channel structure.
The side chain rings of Tyr245, Tyr305 and Tyr335 are important components that form the wall of the channel in the binding pocket. All the hydroxyl groups of these residues are directed to the methyl group of the AdoMet molecule, and a water molecule (3.4 Å from the AdoMet methyl group) is hydrogen bonded to Tyr305 in the centre of a channel. It is likely that the role of these residues is to position the lysine substrate correctly, which is crucial for catalysis. Furthermore, they could deprotonate the Nϵ of the lysine substrate to generate a nucleophile intermediate for the methylation reaction.
His293 and His297 are far from the catalytic site and are unlikely to participate directly in the methylation reaction. These residues may therefore play an important role in the maintenance of the local structure around the active site. His293 is surrounded by Tyr245, Val277 and Tyr287. His297 is surrounded by Phe299, Glu356, Trp352, Tyr353 and Tyr335. However, rotations of the imidazole rings of His293 and His297 around the χ1 axis of the side chain would position their Nϵ atoms within 3 Å of the hydroxyl groups of Tyr245 and Tyr335, respectively, indicating that the histidine residues may assist the tyrosine residues in their catalytic function.
The evolutionarily conserved SET domain is found in a large and rapidly increasing number of proteins. At present, >350 proteins are known to contain SET domains (Schultz et al., 1998). It is unknown, however, whether all these proteins possess HMTase activity. One important requirement for a SET‐containing protein to possess HMTase activity is the presence of specific pre (n)‐ and post (c)‐SET regions (Rea et al., 2000). In contrast to the conserved SET domain, the n‐ and c‐SET regions vary significantly among the members of this enzyme family, and a particular arrangement of the n‐SET and c‐SET regions appears to be important in HMTase function.
The role of a SET domain and a c‐SET helix
Our structure shows that the n‐SET region and the SET domain are tightly packed together in the C‐terminal domain. The n‐SET region is therefore likely to be involved in the folding of the SET domain and to contribute to its stability. The SET domain and c‐SET region have more direct roles in the HMTase activity of SET7/9 because the active site is located between them. First, the c‐SET helix protects the binding site from one side. Thus, in the presence of the c‐SET helix, the access of an AdoMet molecule is more restricted than in its absence. The region immediately adjacent to this helix is a flexible loop, and movement of the c‐SET helix relative to the SET domain may regulate the access of the AdoMet molecule to the active site. Secondly, and more importantly, the c‐SET region may stabilize the structure of the substrate‐specific channel and the positioning of the catalytic Tyr335 (see below). Thus, disruption of the local structure of the interface between the SET domain and the c‐SET helix may alter the conformation of the substrate‐specific channel that the lysine substrate enters, and/or may perturb the position of Tyr335, which would affect the catalytic activity. Consistent with this explanation, a deletion mutant in which the c‐SET helix is removed is inactivated completely while retaining cofactor‐ and substrate‐binding affinity similar to that of wild‐type SET7/9 (Wilson et al., 2002). In addition, two point mutants, Trp352Ala and Tyr353Ala, which would disturb the local structure of the interface between the SET domain and the c‐SET helix, did not exhibit HMTase activity (Wilson et al., 2002). It should be noted that the c‐SET region is not defined in the structure of apoSET7/9 (Wilson et al., 2002). Thus, it is possible that the c‐SET helix is induced to fold from an unfolded state upon binding of AdoMet to SET7/9.
Two key features in the active site of SET7/9
Our structure shows two important features of the active site of SET7/9. First, instead of an extended conformation that is observed in most other MTases, the AdoMet molecule in SET7/9 adopts a compact conformation that results in it fitting tightly into the binding pocket. This conformation is important to position the methyl group of AdoMet towards the substrate‐specific channel where a lysine binds on the opposite side. Accordingly, a major role of the SET domain and the c‐SET helix in SET7/9 may be to force the AdoMet molecule into a compact conformation.
The second feature is a narrow substrate‐specific channel lined with several highly‐conserved residues that is formed within the active site. It is well established that lysine residues of histones exist in a mono‐, di‐ and trimethylated form. Recent results demonstrated that the precise methylation state of lysine residues defines the gene expression status (Santos‐Rosa et al., 2002).
Our structural and biochemical data show that SET7/9 can transfer methyl groups only to the unmethylated K4 of H3, suggesting that the narrow substrate‐specific channel controls the access of the already methylated substrates to the active site. Our data also clearly demonstrate that the bound substrate lysine is subjected to two rounds of methylation without having to be released from SET7/9. Since little is known about the processivity of the methyl transfer reaction, it would be interesting to see if these multiple rounds of the methyl transfer mechanism are a general feature in other MTases.
Several residues forming the substrate‐specific channel are not conserved. This variability of amino acid residues may alter the diameter of a channel in other SET domain‐containing HMTases. Since the monomethylated peptide substrate that may barely fit into the channel cannot gain access to SET7/9, factors other than the size of the channel may be involved in discriminating between substrates. It is possible that the surface electrostatic property of the channel is an additional factor. Taken together, although it is unknown where the substrate binds to the C‐terminal domain, the selectivity of the methylation site provided by the SET and c‐SET domain, and the specificity of the substrate methylation state (unmethylated versus methylated) provided by the channel represent two key features for the regulation of gene expression by lysine methylation of histones.
The recently reported crystal structure of the large subunit of the Rubisco methyltransferase where a c‐SET region is well defined contains a similar channel in the binding pocket (Trievel et al., 2002), and an HEPES ion binds to this channel mimicking the lysine binding. Our results in conjunction with this observation suggest that the substrate‐specific channel is likely to be an important feature in members of the SET domain‐containing HMTase enzyme family.
Methyl transfer mechanism of SET7/9
Based on our structure in combination with mutational analysis, we propose the following mechanism for the AdoMet‐mediated methyl transfer reaction catalysed by SET domain‐containing HMTases. It has been proposed previously that the lysine methyl transfer process is an SN2 reaction in which the substrate lysine, the methyl group and the leaving thioether group form a linear arrangement (Coward, 1977). In this arrangement, the unprotonated lysine residue makes a nucleophilic attack on the methyl group of AdoMet.
When lysine is positioned within the channel in such a way that the methyl group of AdoMet is near the Nϵ atom, the two OH groups of Tyr245 and Tyr335 are located within a distance of 3.5—4 Å. Tyr335 is absolutely conserved among this family of enzymes, and could help to neutralize the Nϵ atom of the lysine substrate. However, the pKa of the hydroxyl group of tyrosine is ∼10, which is rather high to deprotonate the Nϵ atom of the lysine substrate. The closely located His297 may serve as a proton acceptor of the hydroxyl group of Tyr335. A simple rotation around the χ1 angle of the conserved His297 would locate the Nϵ of His297 within 2.5 Å of the hydroxyl group of Tyr335. This assumption is supported by the significantly decreased HMTase activity of a His297Ala mutant (Figure 5B). Alternatively, we cannot exclude the possibility that the hydroxyl group of Tyr245 deprotonates the substrate lysine. As in the Tyr335—His297 proton relay system, a rotation of the side chain of His293 would locate the Nϵ group of His293 within a distance of 2.5 Å from the hydroxyl group of Tyr245. In contrast to His297, His293 is not well conserved, however, and is substituted by arginine in several enzymes of the family.
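The pKa argument above can be made quantitative with the Henderson–Hasselbalch relation, which gives the fraction of an ionizable group present in its deprotonated form at a given pH. The Python sketch below is illustrative only: the tyrosine pKa of 10 is the value quoted in the text, whereas the lysine side chain pKa of 10.5 is a textbook value for the free amino acid and the pH of 7.5 is an assumed value; neither was measured for the SET7/9 active site, where the local environment may shift both pKa values.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of a group in its deprotonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

# Assumed values for illustration; not measured for SET7/9.
LYSINE_PKA = 10.5    # textbook value for the free lysine side chain
TYROSINE_PKA = 10.0  # value quoted in the text
PH = 7.5             # assumed solution pH

print(f"neutral lysine amine: {fraction_deprotonated(LYSINE_PKA, PH):.4%}")
print(f"tyrosinate:           {fraction_deprotonated(TYROSINE_PKA, PH):.4%}")
```

Under these assumptions only a small fraction of either group is deprotonated in bulk solution, which is consistent with the suggestion that a nearby proton acceptor such as His297 or His293 assists the tyrosine residue.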
In summary, our structure combined with mutational analysis provides new insights into the catalytic mechanism of SET domain‐containing HMTases. The key findings of our study are that: (i) the AdoMet molecule adopts a compact conformation within the active site pocket; (ii) SET7/9 can transfer methyl groups only to the unmethylated substrate lysine; (iii) two but not three methyl groups are added to the lysine without dissociation of a substrate; and (iv) the neutralization of lysine substrate is likely to be achieved by tyrosine residue(s) with or without the assistance of a histidine residue. These findings together with the site specificity for methylation provided by the SET domain and the c‐SET helix provide an explanation for the basis of the HMTase activity of SET7/9.
The high‐resolution structure reported here may also serve as a basis for the design of small molecules that mimic the distinctive conformation of the AdoMet molecule to block the channel of the binding pocket. These molecules could be used as drugs to modulate the gene expression mediated by SET domain‐containing HMTases.
Materials and methods
Protein expression and purification
SET7/9L (residues 70–366) or SET7/9S (residues 109–366) was inserted into the pET15b vector and expressed in Escherichia coli BL21 as a His‐tagged protein. The protein was isolated initially using a nickel column. After thrombin digestion, the protein was purified further by anion exchange (Mono‐Q) and gel filtration (Superdex 200) chromatography. Each protein was concentrated by ultrafiltration to 20 mg/ml in 50 mM Tris–HCl pH 7.5, 200 mM NaCl, 5 mM DTT, and stored at −70°C.
Concentrated SET7/9 protein was mixed with AdoMet to a final concentration of 5 mM AdoMet, and the mixture was incubated on ice for 30 min to allow complex formation. SET7/9–AdoMet complex crystals were grown by the hanging drop vapour diffusion method. The orthorhombic crystals were grown at 4°C in a condition containing 100 mM Tris pH 8.5 and 30% PEG 6K. Se‐Met‐substituted orthorhombic crystals were grown under the same conditions. The crystals belong to the space group P2₁2₁2 with cell dimensions a = 102.2 Å, b = 38.7 Å, c = 66.6 Å. X‐ray diffraction data were collected at −170°C from a crystal flash‐frozen in crystallization buffer containing 30% glycerol using the 19ID2 beamline at the Advanced Photon Source (APS). The tetragonal crystals were grown at 18°C in a condition containing 100 mM Tris pH 8.5, 1.2 M ammonium sulfate, 0.5 M lithium sulfate and 5 mM DTT. The tetragonal crystals belong to the space group P4₃2₁2 with cell dimensions a = 93.45 Å, b = 93.45 Å and c = 110.47 Å. The diffraction data were collected at −170°C from a crystal flash‐frozen in crystallization buffer containing 30% glycerol using the B6 beamline at the Pohang Accelerator Laboratory (PAL). Data were processed with the programs DENZO and SCALEPACK (Otwinowski and Minor, 1997).
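The cell dimensions reported above determine the unit‐cell volume and, together with the number of general positions of each space group (4 for P2₁2₁2 and 8 for P4₃2₁2), the volume available per asymmetric unit. The sketch below computes these quantities; the function names are introduced here for illustration. The Matthews coefficient helper is included for completeness, but the molecular mass and the number of copies per asymmetric unit are not stated in this section and must be supplied by the user.

```python
# Number of general positions (asymmetric units per unit cell).
GENERAL_POSITIONS = {"P2(1)2(1)2": 4, "P4(3)2(1)2": 8}

def cell_volume(a, b, c):
    """Volume (Å^3) of a unit cell whose angles are all 90 degrees."""
    return a * b * c

def matthews_coefficient(volume, z, mass_da, copies_per_asu=1):
    """Crystal volume per unit of protein mass (Å^3/Da).

    mass_da and copies_per_asu are user-supplied; they are not given here.
    """
    return volume / (z * copies_per_asu * mass_da)

# Space groups and cell dimensions (Å) as reported in the text.
crystals = {
    "orthorhombic": ("P2(1)2(1)2", 102.2, 38.7, 66.6),
    "tetragonal": ("P4(3)2(1)2", 93.45, 93.45, 110.47),
}

for name, (space_group, a, b, c) in crystals.items():
    volume = cell_volume(a, b, c)
    z = GENERAL_POSITIONS[space_group]
    print(f"{name}: V = {volume:.0f} Å^3, V per asymmetric unit = {volume / z:.0f} Å^3")
```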
The orthorhombic structure was solved first using the multiwavelength anomalous dispersion (MAD) method. A Se‐Met derivative was used for MAD phasing. The SOLVE program (Terwilliger and Berendzen, 1999) identified six Se positions and provided the initial phases from three‐wavelength (λ1 = 0.9792 Å, λ2 = 0.9791 Å and λ3 = 0.9464 Å) Se MAD data at 1.7 Å resolution. Solvent flattening using RESOLVE (Terwilliger and Berendzen, 1999) was used to improve the accuracy of the phases. The graphics programs CHAIN (Sack, 1988) and O (Jones et al., 1991) were used for model building, and CNS (Brünger et al., 1998) was used for refinement. The model was refined against a 1.7 Å Se‐Met derivative data set collected at the 0.9792 Å (λ1) wavelength. During the refinement, the Rfree value was monitored by using 5% of the data. The final model has 87% of the protein main chain (φ/ψ) angles in the most favoured region and none in the disallowed region of the Ramachandran plot, calculated using PROCHECK (Laskowski et al., 1993).
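The three MAD wavelengths can be converted to photon energies to see how they bracket the selenium K absorption edge, which lies near 12.66 keV. The sketch below uses the standard relationship E = hc/λ with hc ≈ 12.3984 keV·Å; the function name is introduced here for illustration, and no assignment of the individual wavelengths to peak, inflection or remote data sets is implied, since the text does not state it.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å

def wavelength_to_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in Å to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

# Wavelengths (Å) as reported in the text.
wavelengths = {"λ1": 0.9792, "λ2": 0.9791, "λ3": 0.9464}
energies = {label: wavelength_to_kev(w) for label, w in wavelengths.items()}

for label, energy in energies.items():
    print(f"{label}: {wavelengths[label]:.4f} Å = {energy:.4f} keV")
```

Two of the wavelengths sit within a few electronvolts of the edge and the third lies several hundred electronvolts above it.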
The tetragonal structure was solved by molecular replacement using the AMoRe program (CCP4, 1994) and diffraction data in the 15.0–4.0 Å resolution range. The initial R‐factor for the molecular replacement solution was 42% for all the data ≤3.0 Å resolution. The model was refined to 2.3 Å resolution (5% of the data were set aside and used to calculate Rfree); 85% of the main chain φ/ψ angles are in the most favoured region and none are in the disallowed region of the Ramachandran plot.
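The R‐factor and Rfree statistics referred to above follow the usual definition R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, with Rfree evaluated on reflections excluded from refinement. The following is a minimal sketch of that bookkeeping, not the procedure implemented in CNS: the amplitudes are synthetic, the function names are hypothetical, and the observed and calculated amplitudes are assumed to be on a common scale already.

```python
import random

def r_factor(f_obs, f_calc):
    """Crystallographic R-factor for amplitudes that share a common scale."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

# Synthetic amplitudes (assumed data, for illustration only).
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [f * (1.0 + rng.gauss(0.0, 0.2)) for f in f_obs]

work, free = split_free_set(len(f_obs), fraction=0.05)
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R = {r_work:.3f} ({len(work)} reflections), Rfree = {r_free:.3f} ({len(free)} reflections)")
```

In a real refinement the free set is chosen once, before refinement starts, and kept fixed so that Rfree remains an unbiased indicator.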
In both structures, most of the regions are clearly defined. However, the N‐terminal part (residues 109–115 in SET7/9S and residues 70–78 in SET7/9L), the C‐terminal part (residues 364–366) and the loop region connecting strand β20 and helix α3 (residues 338–348 in SET7/9S and residues 338–339 in SET7/9L) are disordered and not modelled.
HMTase activity assay and western blot
Site‐directed mutagenesis was performed with either a PCR‐based method or the QuikChange site‐directed mutagenesis kit (Stratagene). Each mutation in the SET7/9 sequence was confirmed by DNA sequence analysis.
SET7/9 protein (1 μg) was incubated in 20 mM Tris–HCl pH 8.0, 50 mM NaCl, 10% glycerol, 1 mM phenylmethylsulfonyl fluoride (PMSF) with 1 μl of 3H‐labelled AdoMet (Amersham) and 1 μg of H3 from calf thymus (Roche) or 1 μg of histone peptide. Following incubation at 30°C for 30 min, the reaction product was spotted on Whatman P81 paper, washed three times for 10 min each with 50 mM NaHCO3 pH 9.2 and briefly with acetone, dried completely and measured by scintillation counting.
For western blotting, the reaction was carried out for 180 min. The reaction product was resolved by 15% SDS–PAGE, blotted onto a nitrocellulose membrane and probed with the histone H3 K4 monomethyl (Abcam; ab8895), K4 dimethyl (Upstate Biotech; 07‐030), K4, K9 trimethyl (Abcam; ab1322) or K4 trimethyl (Abcam; ab8580) antibody. In the antibody assay, peptide competitors were used to remove any non‐specific interaction between each antibody and the product(s).
We thank B.Hamilton and J.Bone (Upstate Biotechnology) for sharing unpublished data, members of PAL for their help with data collection, and R.Kammere for critical reading of the manuscript. This work was supported by funds from the Korean Ministry of Science and Technology (National Creative Research Initiative programme, Frontier 2000 programme), the Ministry of Education (programme BK21) and the Korean Academy of Science and Technology. The coordinates have been deposited in the RCSB database (PDB IDs 1N6A and 1N6C).
- Copyright © 2003 European Molecular Biology Organization