The Saccharomyces cerevisiae Set1 complex includes an Ash2 homologue and methylates histone 3 lysine 4

Assen Roguev, Daniel Schaft, Anna Shevchenko, W.W.M.Pim Pijnappel, Matthias Wilm, Rein Aasland, A.Francis Stewart

Author Affiliations

  1. Assen Roguev2,
  2. Daniel Schaft2,
  3. Anna Shevchenko3,
  4. W.W.M.Pim Pijnappel2,
  5. Matthias Wilm1,
  6. Rein Aasland4 and
  7. A.Francis Stewart*,2
  1. 1 EMBL, Meyerhofstrasse 1, D‐69117, Heidelberg, Germany
  2. 2 Present address: Technische Universitaet Dresden, c/o MPI‐CBG, Pfotenhauerstrasse 108, D‐01307, Dresden, Germany
  3. 3 Present address: Max‐Planck‐Institute for Cell Biology and Genetics, Pfotenhauerstrasse 108, D‐01307, Dresden, Germany
  4. 4 Department of Molecular Biology, University of Bergen, Thormoehlensgt. 55, N‐5020, Bergen, Norway
  1. *Corresponding author. E-mail: stewart{at}


The SET domain proteins SUV39 and G9a have recently been shown to be histone methyltransferases specific for lysines 9 and 27 (G9a only) of histone 3 (H3). The SET domains of the Saccharomyces cerevisiae Set1 and Drosophila trithorax proteins are closely related to each other but distinct from SUV39 and G9a. We characterized the complex associated with Set1 (Set1C) and found that it is comprised of eight members, one of which, Bre2, is homologous to the trithorax‐group (trxG) protein, Ash2. Set1C requires Set1 for complex integrity and mutation of Set1 and Set1C components shortens telomeres. One Set1C member, Swd2/Cpf10, is also present in cleavage polyadenylation factor (CPF). Set1C methylates lysine 4 of H3, thus adding a new specificity and a new subclass of SET domain proteins known to be methyltransferases. Since methylation of H3 lysine 4 is widespread in eukaryotes, we screened the databases and found other Set1 homologues. We propose that eukaryotic Set1Cs are H3 lysine 4 methyltransferases and are related to trxG action through association with Ash2 homologues.


Members of the trithorax group (trxG) have been identified by genetic screens of Drosophila for mutations that suppress phenotypes caused by dysregulation of polycomb‐group (PcG) action or mimic loss‐of‐function homeotic mutant phenotypes. As expected from these complex genetic screens, the trxG appears to encompass several subclasses of gene regulatory factors (Kennison, 1995). One subclass involves chromatin remodelling activity. The realization that the trxG member Brahma (BRM) is a homologue of Saccharomyces cerevisiae Swi2/Snf2 (Peterson and Herskowitz, 1992; Carlson and Laurent, 1994; Elfring et al., 1994) led to the definition of the SWI/SNF complex as a chromatin remodelling machine (Cote et al., 1994; Logie and Peterson, 1997) and the identification of another trxG member, moira, as a further component of the Drosophila SWI/SNF complex (Papoulas et al., 1998). Another trxG subclass encompasses the DNA binding proteins, zeste and GAGA factor. Although these proteins act independently, both appear to play similar roles in the stabilization of higher‐order chromatin looping (Chen and Pirrotta, 1993; Katsani et al., 1999). A third subclass within the trxG (called here trxG3) that remains poorly understood includes trithorax itself, ash1 and ash2 (Shearn, 1989).

Insight into the potential molecular actions of trxG3 members came from the identification of several domains within their protein sequences (Mazo et al., 1990; Stassen et al., 1995; Adamson and Shearn, 1996; Tripoulas et al., 1996). Both trithorax (Trx) and Ash1 include a SET domain, which was identified through its occurrence in the chromatin factors, Su(var)3‐9, enhancer of zeste [E(Z)] and Trx (Jones and Gelbart, 1993; Tschiersch et al., 1994). All three trxG3 members also include one or more PHD fingers (Aasland et al., 1995) and Ash2 includes a SPRY domain (Ponting et al., 1997). Of these domains, the SET domain in Su(var)3‐9 homologues, in particular mammalian SUV39H1 and Schizosaccharomyces pombe Clr4, was recently identified as the first histone lysine methyltransferase (Rea et al., 2000), thus suggesting that other proteins containing SET domains could also be histone lysine methyltransferases. Subsequently, two other SET domain proteins, human G9a (Tachibana et al., 2001) and Drosophila Ash1 (C.Beisel, A.Imhof and F.Sauer, in preparation), have been shown to possess histone lysine methyltransferase activities. Notably, each of these proteins has an additional cysteine‐rich motif immediately N‐terminal to its SET domain, previously referred to as a Cys‐rich cluster or a preSET region. The presence of the preSET region in SUV39H1 is required for methyltransferase activity (Rea et al., 2000). Trx homologues contain a different type of preSET region, termed ATA2 (Prasad et al., 1997). Whether the absence of a SUV39‐type preSET domain is the reason why Trx and E(Z) have so far tested negative in methyltransferase assays (Rea et al., 2000; C.Beisel, A.Imhof and F.Sauer, in preparation) is not yet clear.

By sequence alignments, the genome of S.cerevisiae encodes six genes with significant matches to the SET domain (now termed Set1–6; Pijnappel et al., 2001). Of these, the SET domain of Set1 is more similar to Trx SET domains than any other. Set1 does not appear to include a preSET Cys‐rich region. In fact, S.cerevisiae does not appear to have a clear Su(var)3‐9 homologue. set1 is not essential to yeast; however, thorough analyses of its mutant phenotype revealed a variety of defects, including roles in silencing at mating‐type loci and telomeres, metabolism, maintenance of telomere length (Nislow et al., 1997) and DNA repair (Corda et al., 1999; Schramke et al., 2001). Notably, expression of mammalian E(Z) homologues, either human EZH2 or murine Ezh1, in set1 strains restored the loss of gene repression at telomeres (Laible et al., 1997). To explore Set1 function in S.cerevisiae further, we characterized the Set1 complex (Set1C) and its proteomic environment using a sequential tagging and mass spectrometry approach (Rigaut et al., 1999; Shevchenko et al., 1999). We define Set1C as a complex of eight members, which includes two proteins that display a SPRY domain and a PHD finger, respectively. Thus it appears that Set1C incorporates an Ash2 analogue by physically associating two proteins that each carry a part of Ash2. Here we show that Set1C has histone methyltransferase activity specific for lysine 4 of histone 3 (H3) and is thus both the first histone lysine methyltransferase described in S.cerevisiae and the first from the Set1/Trx branch of SET domains. Together, these results imply that the Cys‐rich preSET region is not essential for histone lysine methyltransferase activity in certain contexts and that unexpected aspects of trxG3 action also exist in S.cerevisiae.


Biochemical characterization of Set1C

The protein complex associated with Set1 was identified by C‐terminal tagging of Set1 with the TAP tag (Rigaut et al., 1999) and identification of co‐purified proteins by MALDI mass spectrometry (Wilm et al., 1996; Shevchenko et al., 1999). All Coomassie‐stained bands present in Figure 1 were identified; however, only the eight that were subsequently confirmed as specific (see below) are depicted. All other bands in Figure 1 were identified as highly abundant ribosomal or heat shock proteins and have also been found in tagging experiments of unrelated proteins (results listed in Materials and methods).

Figure 1.

The composition of the Set1 complex. The affinity‐purified Set1‐TAP complex was separated on 7–25% SDS–PAGE and visualized by staining with Coomassie Blue. Molecular weight markers indicated on the left in kilodaltons. All bands present in this gel were identified and those subsequently determined to be specific to Set1C by repetition and further affinity purification exercises are depicted to the right. Each protein is depicted with identifiable protein domains and motifs as indicated in the key below. The thickened grey lines in Set1 and Spp1 indicate regions of further conservation to S.pombe proteins, Set1 and SPBC13G1.08c, respectively and the thickened grey line in Bre2 indicates the extent of further homology to Ash2 either side of the SPRY domain (see Figure 3C). The length of each polypeptide (aa) is noted on the right. The domains indicated are: PHD finger (Pfam:PF00628); n‐SET (N‐terminal SET associated domain in SET1 family; see Figure 3A); SET domain (Pfam:PF00856); postSET, C‐terminal SET‐associated peptide (SMART:00508); WD domain (Pfam:PF00400); RIIa, protein kinase A regulatory subunit dimerization domain (Pfam:PF02197); RRM, RNA recognition motif (Pfam:PF00076) and SPRY, domain in SPlA kinase and the ryanodine receptor (Pfam:PF00622). The SPRY domain in Bre2 is interrupted by three insertions (see Figure 3C). WD40 domains showing significant alignment scores are shown in light green, inferred alignments in white (data not shown).

Each of the seven proteins specifically co‐purifying with Set1‐TAP in Figure 1 was tagged, and the resulting complexes were purified and identified (Figure 2). In five of the seven cases, all eight members of Set1C were specifically retrieved without any new co‐purifying proteins. In the sixth case, tagging of Sdc1 yielded all members of Set1C except for Swd2, which was not identified in this experiment, possibly for technical reasons. In the last case, Swd2‐TAP pulled down only a part of Set1C (Set1, Bre2, Spp1) and also, unexpectedly, co‐purified the yeast cleavage polyadenylation factor (CPF) complex. The presence of Swd2 in CPF has been confirmed by independent work on CPF by B.Dichtl and W.Keller. Although not all members of CPF were identified in the experiment of Figure 2, our Swd2‐TAP extracts are active for cleavage and polyadenylation (results of B.Dichtl and W.Keller, personal communication). Swd2 is called Cpf10 by B.Dichtl and W.Keller. Notably, Swd2/Cpf10 is the only member of Set1C that is essential to yeast (SGD database; B.Dichtl and W.Keller, personal communication; our unpublished observations), probably because it provides an essential function for CPF, not Set1C. Since Swd2 was co‐purified with Set1C when six of the seven other members were tagged, we conclude that it is a bona fide member and reason that either it is less stably associated with Set1C or that the TAP tag on Swd2 disturbs Set1C or both.

Figure 2.

Sequential affinity purification of Set1C. Each of the other seven members of Set1C was TAP‐tagged and the purified complexes visualized by Coomassie Blue staining as indicated above each panel. The identity of each protein was established by mass spectrometry and is indicated by numbers (see key to the right). Many unlabelled bands were also identified and were found to be highly abundant proteins (see Materials and methods). The presence of the tag, after TEV cleavage, increases the size of the tagged protein by ∼10 kDa.

The stoichiometry of Set1C appeared to be uniform regardless of which member, except Swd2, was tagged. Careful inspection of Coomassie staining intensities from each of the affinity‐purified preparations led to a similar estimate of Set1C stoichiometry. To arrive at this consensus, we discounted stoichiometries of the tagged proteins, since they may be over‐represented if the complex partially disassembled during purification. We estimate Set1C as Set1 (2); Bre2 (2); Swd1 (1); Spp1 (2); Swd2 (1); Swd3 (1); Sdc1 (<1); Shg1 (<1; relative stoichiometry in parentheses). In contrast to our experience with the Set3 complex (Set3C; Pijnappel et al., 2001), no Set1C member appeared to be clearly present as free protein, that is, in obvious (>4‐fold) stoichiometric excess when purified in tagged form.

Bioinformatic analysis of Set1C members

The protein sequences of the eight members of Set1C were analyzed for matches in the databases (Figure 3). Database searches identified candidate orthologues of Set1 in S.pombe, Caenorhabditis elegans, Drosophila and humans. These five predicted proteins not only share very similar SET domains, but also two further regions of similarity, one of which includes a region with similarity to the RNA recognition motif (RRM, also known as RNP), the canonical RNA‐binding domain (Figure 3A). Whereas all five protein sequences present the RNP motif, similarities between Set1 and S.pombe Set1 extend further either side of the RNP motif (denoted as a thick grey line in Figure 1). Immediately N‐terminal to the SET domain, in the preSET position, these five proteins display a novel conserved region of ∼160 residues called here n‐SET. Hence we conclude that Set1 orthologues are present in four diverse eukaryotes. Additionally, a predicted protein from the Arabidopsis genome shows Set1 homology in the n‐SET and SET domains (Figure 3A); however, no associated RNP motif was identifiable from available sequence data.

Figure 3.

Sequence analyses of Set1C. (A) Multiple sequence alignment of Set1 family members showing the RRM region (upper) and n‐SET/SET/postSET region (lower). The alignment is colour coded in order to highlight the conserved features according to Gibson et al. (1994). The amino acid co‐ordinates in each sequence are indicated after each of the two sequence blocks. The RRM domain (also called RNP) is highlighted with an orange bar. Four RRMs from human U2AF and RNPA are included for comparison. Assignment of secondary structure elements (H, helix; E, strand) is based on the known RRM structure (Burd and Dreyfuss, 1994). Below, the n‐SET region is denoted by a light green bar, the SET domain by a dark green bar and the postSET motif by a yellow bar. The conserved residues characteristic for the methyltransferase catalytic core of the SET domain (Rea et al., 2000) are indicated with red dots. The position of stop codons is indicated by ‘<’. (B) Multiple alignments of selected preSET regions are shown. At the top, the preSET region found in the SUV39 and G9a families, termed preSET‐s, is shown. At the bottom, the preSET region found in the E(Z) and ASH1 families, preSET‐e, is shown. (C) Multiple alignment of the SPRY region of Bre2 with the Ash2 family. The region of homology shown extends further N‐ and C‐terminally than the defined SPRY domain, which, in this alignment, starts at residue 61 and ends at 283. Three insertions in Bre2 of 32, 46 and 42 residues are indicated. (D) Multiple alignment of a region in Sdc1 with Dpy30 and other sequences, including four human protein kinase A factors for reference. This region contains a motif that is related to the dimerization domain (here called RIIa) of protein kinase A regulatory subunits. The position of two α‐helices in the RIIa structure (pdb:1r2a) is shown. The database sources of all proteins used in this figure and elsewhere in this paper are given in Table II.

The n‐SET region is not cysteine‐rich. However, the nature of the Cys‐rich regions previously observed in SUV39, Ash1 and E(Z) families has not been clearly defined. Hence we examined these Cys‐rich regions for significant alignments and found that they fall into two groups encompassing either SUV39 and G9a, called here preSET‐s, or E(Z) and Ash1, called here preSET‐e (Figure 3B). Neither shows any significant similarity to n‐SET. Together with the preSET region in the Trx family, ATA2 (Prasad et al., 1997), we conclude that preSET regions fall into at least four distinct classes: ATA2, n‐SET, preSET‐s and preSET‐e.

Bre2 was previously identified in a screen for mutations resistant to brefeldin, the toxin that disrupts the Golgi apparatus (Dinter and Berger, 1998). The relationship between brefeldin resistance of Δbre2 and our finding that Bre2 is entirely restricted to Set1C is unclear; however, loss of Set1 produces a plethora of cellular defects (Nislow et al., 1997; Corda et al., 1999), which may include perturbations of cytoplasmic protein translocation. Sequence analysis of Bre2 reveals a SPRY domain (Figure 3C), which is a domain originally found in splA kinase, ryanodine receptors and the trxG protein, Ash2 (Ponting et al., 1997). Amongst SPRY domains, Bre2 is most similar to that in the Ash2 family of proteins. In fact, the region of homology between Bre2 and Ash2 extends beyond the SPRY domain. The SPRY domain in Bre2 is interrupted by three large insertions relative to most other SPRY domains. These interruptions probably indicate insertions in flexible loops within the protein fold (Figure 3C). Notably, all members of the Ash2‐family so far, except Bre2, carry a PHD finger.

Spp1 is one of the 14 genes in S.cerevisiae that present a PHD finger (unpublished observations). The PHD finger of Spp1 is most closely related to the PHD finger of S.pombe SPCC594.05c and then to human CGBP (data not shown), a protein that binds preferentially to unmethylated CpGs (Shin Voo et al., 2000).

Set1C includes three proteins that each putatively contain seven WD40 repeats. Whereas Swd3 includes seven statistically significant WD40 repeats, Swd1 and Swd2 show only five or four, respectively, and the other repeats are inferred (not shown, white in Figure 1).

Sdc1 shows a significant match to Dpy‐30 (Figure 3D), a protein required for sex‐specific association of chromosome condensation factors during dosage compensation in C.elegans (Hsu et al., 1995). Sdc1, Dpy‐30 and close relatives include a short motif related to the dimerization motif in the regulatory subunit of protein kinase A. This motif consists of two α‐helices that form a special type of four‐helix bundle during dimerization (Newlon et al., 1999).

Shg1 shows no significant similarity to any protein sequence in the databases.

Dissection of Set1C

Interactions within Set1C were evaluated by deletion of set1 in strains carrying the TAP tag fused to other Set1C members (Figure 4). In the absence of Set1, we observed that (i) Bre2‐TAP and Sdc1 remain associated and no other Set1C members were found; (ii) Sdc1‐TAP and Bre2 remain associated and no other Set1C members were found; (iii) Swd1‐TAP and Swd3 remain associated and no other Set1C members were found and (iv) Spp1‐TAP and Shg1‐TAP were retrieved as free proteins with neither associating with any other Set1C member. Hence, Set1 is central to Set1C, which almost entirely disassembles in its absence. Of the variety of protein–protein interactions required for assembly of Set1C, only the interactions between Swd1 and Swd3, and Bre2 and Sdc1 have been previously mapped by two‐hybrid approaches (Uetz et al., 2000; Ito et al., 2001). Notably, the stoichiometric relationship between Bre2‐TAP and Sdc1 is 2:1; however, between Sdc1‐TAP and Bre2 it is 1:1 (Figure 4). This implies that all Sdc1 is associated with Bre2, whereas only half of Bre2 is associated with Sdc1. This observation also recapitulates the relative stoichiometries of Bre2 and Sdc1 in Set1C as 2:1 and the observed lack of free protein of either. Hence we conclude that all cellular Bre2 and Sdc1 are incorporated into Set1C at a 2:1 ratio and Sdc1 stably binds Bre2.
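
The stoichiometric argument for Bre2 and Sdc1 can be restated as simple arithmetic. The following Python sketch is illustrative only: the function name `fraction_of_bait_bound` and the assumption that one Sdc1 molecule pairs with one Bre2 molecule are introduced here for the example, and the inputs are the approximate band ratios quoted in the text for Figure 4.

```python
def fraction_of_bait_bound(bait_units: float, prey_units: float) -> float:
    """Estimate the fraction of a tagged bait that carries its partner.

    Assumes (hypothetically) one prey molecule per bait molecule, so the
    prey:bait ratio is capped at 1.0 when all bait is occupied.
    """
    if bait_units <= 0:
        raise ValueError("bait_units must be positive")
    return min(1.0, prey_units / bait_units)


# Bre2-TAP purification from the set1 deletion: Bre2:Sdc1 is roughly 2:1.
bre2_bound = fraction_of_bait_bound(bait_units=2, prey_units=1)

# Sdc1-TAP purification from the set1 deletion: Sdc1:Bre2 is roughly 1:1.
sdc1_bound = fraction_of_bait_bound(bait_units=1, prey_units=1)

print(f"Fraction of Bre2 bound to Sdc1: {bre2_bound:.0%}")
print(f"Fraction of Sdc1 bound to Bre2: {sdc1_bound:.0%}")
```

Under these assumptions, about half of Bre2 and essentially all of Sdc1 are in the Bre2/Sdc1 subcomplex, which matches the 2:1 ratio estimated for intact Set1C.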

Figure 4.

Dissection of interactions within Set1C. The structure of Set1C was examined in set1 strains carrying TAP‐tagged Set1C members, as indicated above each panel, by affinity purification. All protein identities, including many unlabelled unspecific bands were established by mass spectroscopy. Numbers are the same as in Figure 2 and are 2, Bre2; 3, Swd1; 4, Spp1; 6, Swd3; 7, Sdc1; 8, Shg1.

Set1C specifically methylates lysine 4 of H3

Set1C extracts purified from either Bre2‐TAP or Shg1‐TAP strains showed histone methyltransferase activity when incubated with an H3 tail peptide (Figure 5A). Unexpectedly, Set1C extracts purified from the Set1‐TAP strain showed no activity. Since the composition, stoichiometry and method of preparation of Set1C were the same in all three preparations, we attribute this result to inhibition of Set1C methyltransferase activity by the positioning of the protein tag at the C‐terminus of Set1. As for Set1, the SET domain is very often found at the very C‐terminus of SET domain proteins and the position of the stop codon can be regarded as a conserved feature (Ash1 is a notable exception). The presence of the protein tag at the C‐terminus of Set1 may interfere with appropriate folding of the domain, with a dynamic aspect of enzyme activity or with access of the substrate. These results also imply that full methyltransferase activity is not necessary for Set1C formation. As controls for the methyltransferase activity of Set1C, a TAP‐Clr4 extract, purified by the same protocol, showed the expected enzyme activity and extracts made from Δset1 strains showed no activity (Figure 5A). To determine the specificity of Set1C, free histones were incubated with Set1C extracts and methylated products visualized by gel electrophoresis and fluorography (Figure 5B). Again, TAP‐Clr4 displayed the expected specificity towards H3 and only the Bre2‐TAP and Shg1‐TAP extracts showed activity. In each case, only H3 was methylated with no sign of activity for the other histones. To determine which site on H3 was methylated, we incubated Set1C extracts with H3 peptides specifically mutated at either lysine 4 or lysine 9. Whereas the mutant lysine 9 peptide was methylated, the mutant lysine 4 peptide was not (Figure 5C). As expected, incubation of these peptides with TAP‐Clr4 gave the opposite result with no detectable methylation of the mutant lysine 9 peptide and strong methylation of the mutant lysine 4 peptide (not shown). Therefore, we conclude that Set1C is a histone lysine methyltransferase specific for lysine 4 of histone H3. Notably, we have so far been unable to obtain any methyltransferase activity from recombinantly expressed parts of Set1 (not shown). Our failure may merely reflect a technical problem, but could indicate, along with the Set1‐TAP result above, a sensitivity of the enzyme activity to its appropriate environment within the complex.
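
The logic of the mutant peptide assay can be sketched in a few lines of Python: a lysine is inferred to be the target when mutating it, and only it, abolishes methylation. The function name `infer_target_lysine` and the dictionary encoding of the results are hypothetical conveniences for this illustration; the boolean values simply restate the outcomes reported in the text.

```python
from typing import Dict, Optional


def infer_target_lysine(methylated_by_mutant: Dict[str, bool]) -> Optional[str]:
    """Return the lysine whose mutation alone abolishes methylation.

    Keys are mutant peptide names such as "K4L" (lysine 4 to leucine);
    values state whether that mutant peptide was still methylated.
    Returns None when the result is ambiguous (zero or several sites lost).
    """
    lost_sites = [
        mutant[:-1]  # "K4L" -> "K4": drop the substituted residue
        for mutant, still_methylated in methylated_by_mutant.items()
        if not still_methylated
    ]
    return lost_sites[0] if len(lost_sites) == 1 else None


# Outcomes as described in the text for Set1C and for the Clr4 control.
set1c_site = infer_target_lysine({"K4L": False, "K9L": True})
clr4_site = infer_target_lysine({"K4L": True, "K9L": False})

print("Set1C target:", set1c_site)
print("Clr4 target:", clr4_site)
```

The reciprocal result with the Clr4 control is what makes the inference informative: it shows that both mutant peptides are competent substrates for an enzyme of the appropriate specificity.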

Figure 5.

Set1C specifically methylates lysine 4 of H3. Histone methyltransferase activity was assayed by incubation of an H3 tail peptide (A) or free histones (B), with affinity purified extracts prepared from yeast strains carrying the TAP tag fused to Clr4 or Set1C components as indicated above each lane. The extracts were prepared from either wild‐type or set1 strains as indicated. The TAP‐Clr4 extract was prepared from the wild‐type strain carrying a TAP‐Clr4 CEN plasmid. (A) Extracts were incubated with an H3 N‐terminal peptide in the presence of S‐adenosyl‐l‐[methyl‐3H]methionine and incorporated radioactivity determined by filter binding. Buffer, incubation of all reagents without any added extract. (B) Extracts were incubated with free histones in the presence of S‐adenosyl‐l‐[methyl‐3H]methionine, followed by gel electrophoresis and Coomassie Blue staining (left) and fluorography (right). (C) Set1C, isolated from a wild‐type strain including Bre2‐TAP, was incubated with H3 N‐terminal peptides carrying either a lysine 4 to leucine (K4L) or lysine 9 to leucine (K9L) mutation.

Set1C activity in telomere length maintenance

Amongst a variety of phenotypic effects caused by the absence of set1, Nislow et al. (1997) noted that telomere lengths were shortened. We therefore examined telomere lengths in strains carrying Set1C mutations. In agreement with Nislow et al. (1997), removal of set1 significantly shortened telomere lengths by ∼100 bp (Figure 6, lanes 1 and 2). Telomeres are also shortened, although not as dramatically, in the Set1‐TAP strain (lane 3). This is consistent with our demonstration that Set1C isolated using Set1‐TAP lacks methyltransferase activity in vitro (Figure 5) and indicates that the histone methyltransferase activity of Set1C contributes to maintenance of telomere lengths. As a control, a strain containing N‐terminal TAP‐tagged Set1 (TAP‐Set1; lane 4) did not show any reduction of telomere lengths, thus further strengthening our conclusion that C‐terminal positioning of the TAP tag on Set1 selectively interferes with histone methyltransferase activity. Strains containing deletions of the Set1C components, bre2, swd1 and swd3 (lanes 5–7) also showed reduced telomere lengths. Whereas loss of either swd1 or swd3 resulted in a similar degree of shortening as loss of set1, loss of bre2 produced a milder effect. Taken together with the biochemical characterization of Set1C, these results indicate that Set1 action in maintenance of telomere lengths is mediated by Set1C and involves its methyltransferase activity. The intermediate impact on telomere lengths observed in Set1‐TAP and bre2 strains also suggests that telomere maintenance by Set1C is not solely reliant on its methyltransferase activity and that another aspect of Set1C makes a contribution.

Figure 6.

Set1C activity is required for maintenance of telomere lengths. Telomeres were visualized by a Southern blotting strategy using a telomere specific probe. Genomic DNAs were isolated from the following strains: set1 (lane 1); wild type (lane 2); Set1‐TAP (lane 3); TAP‐Set1 (lane 4); bre2 (lane 5); swd1 (lane 6); swd3 (lane 7).


The stable maintenance of gene expression patterns through mitotic divisions, usually termed epigenetic regulation, appears to be important during development in higher organisms. Epigenetic mechanisms provide a way for multicellular organisms to utilize genomic information in multiple overlapping patterns according to different cell lineages (Francis and Kingston, 2001). The search for a relationship between epigenetic mechanisms and development in higher eukaryotes has been led by the genetic screens in Drosophila that identified two opposing groups of candidates, PcG and trxG. PcG members are required to maintain patterns of gene repression, whereas trxG members appear to be required to maintain patterns of gene activity (Kennison, 1995; Mahmoudi and Verrijzer, 2001). Closer analyses of these phenomena have not yet delivered simple explanations of PcG and trxG epigenetic mechanism in development and recent evidence implies complex linkages to signalling, the cell cycle and the basal transcription apparatus (Gould, 1997; Jacobs et al., 1999; Voncken et al., 1999; Breiling et al., 2001; Saurin et al., 2001).

Clearer insights into epigenetic mechanisms have emerged from studies of the maintenance of chromatin states, such as those associated in yeast with centromeres (Karpen and Allshire, 1997), telomeres (Gottschling et al., 1990), mating‐type silencing (Grunstein, 1998) or synthetic gene expression states in Drosophila (Cavalli and Paro, 1998, 1999; Wakimoto, 1998). These studies lend support to the proposition that histone tails are the template for an epigenetic code, written in post‐translational modifications involving phosphorylation, acetylation and methylation (Turner, 2000; Jenuwein and Allis, 2001). Of these modifications, the high turnover of phosphoryl and acetyl adducts on histone tails complicates simple models of inheritable chromatin states. In contrast, methyl groups on histone tails appear to be more stable (for discussion see van Holde, 1988). Consequently, histone tail methylation patterns may be primary inheritable inscriptions of epigenetic states which subsequently limit and/or direct other modifications (Rea et al., 2000; Bannister et al., 2001; Lachner et al., 2001; Nakayama et al., 2001).

To evaluate the merits of this suggestion, the specificities of histone methyltransferases and subsequent impact on patterns and hierarchies of histone tail modifications need to be unravelled. Towards this end, our identification of Set1C as the first H3 lysine 4 methyltransferase activity adds several aspects.

To the specificities of SUV39/Clr4 for H3 lysine 9 (Rea et al., 2000) and Ash1/G9a for H3 lysines 9 and 27 (Tachibana et al., 2001; C.Beisel, A.Imhof and F.Sauer, in preparation), we add the specificity for H3 lysine 4. Methylation of H3 lysine 4 appears to be widespread in eukaryotes (Strahl et al., 1999). Consequently, we anticipated that the corresponding methyltransferase may be widely conserved, so we screened the databases for Set1 homology. We found candidate orthologues in S.pombe, C.elegans and humans and also, after detailed sifting, in the Drosophila genome (Figure 3A). These predicted Set1 proteins have a similar architecture. They encompass three regions of clear homology (Figure 3A); (i) a centrally located region with homology to an RNA binding region known as the RRM motif (this may indicate a role for RNA binding, however RRM motifs also mediate protein–protein interactions); (ii) a region immediately N‐terminal to the SET domain that is distinct and called the n‐SET region and (iii) a very similar SET domain, which is followed by the most common SET domain peptide adduct, postSET and stop codon. Therefore we speculate that Set1 orthologues are widely distributed in eukaryotes and will be, in their respective complexes, H3 lysine 4 methyltransferases.

Set1C associates Set1 with Bre2. As for most Set1C proteins, it appears that all cellular Set1 and Bre2 are bound in Set1C in vegetatively growing haploid yeast. Bre2 shows sequence homology to Ash2 that stretches beyond their shared SPRY domains (Figure 3C). Bre2 does not include, however, a PHD finger, which is a conserved feature of Ash2 orthologues. Intriguingly, Set1C includes a PHD finger protein, Spp1. On the basis of these observations, we predict that the Ash2 complex in higher eukaryotes will include a Set1 orthologue and mediate H3 lysine 4 methyltransferase activity.

Thus, emerging and circumstantial evidence suggests that histone lysine methyltransferase activity is a common mechanistic feature of trxG3 (Trx, Ash1, Ash2) action. For Trx, lack of methyltransferase activity in vitro using recombinantly expressed protein and the absence of a cysteine‐rich preSET region, have raised doubts about its ability to act as a methyltransferase (Rea et al., 2000; C.Beisel, A.Imhof and F.Sauer, in preparation). However, we show that Set1, which lacks a Cys‐rich preSET region, functions as a histone lysine methyltransferase in the context of its native complex. Since the SET domain of Set1 is closely similar to that of Trx and the Trx SET domain includes all residues currently known to be important for enzyme activity, including the arginine beside the essential glutamine/histidine core (Rea et al., 2000; Figure 3A), we consider it probable that Trx acts as a methyltransferase in the context of its native complexes. Recombinantly expressed Ash1 has been recently shown to be a methyltransferase specific for H3 lysines 9 and 27 (C.Beisel, A.Imhof and F.Sauer, in preparation) and here we associate Ash2 with the Set1C H3 lysine 4 methyltransferase activity. Thus the previous genetic linkage between trxG3 members (Shearn, 1989) may reflect a common mechanistic linkage based on histone lysine methyltransferase activities of their protein complexes.

In addition to demonstrating by point mutation that the conserved core of the SUV39 SET domain was key to H3 lysine 9 methyltransferase activity, Rea et al. (2000) showed that the Cys‐rich preSET region was also required. Here we refine the classification of preSET regions by separating the Cys‐rich regions of SUV39 and G9a families from those of E(Z) and Ash1 families to define the preSET‐s (SUV39 class) and preSET‐e [E(Z) class] regions (Figure 3B) and observe a novel preSET region in Set1 homologues, termed n‐SET (Figure 3A). Along with the previously observed ATA2 preSET region in the Trx family (Prasad et al., 1997), it is now apparent that SET domains are N‐terminally flanked by at least four types of preSET region.

Many SET domains are also associated at their C‐termini with one of two short postSET peptide extensions. Most commonly, this C‐terminal extension includes three cysteines and is immediately followed by a stop codon, except for the Ash1 subclass. SET domains in E(Z) homologues do not include these three cysteines, but do include other conserved residues plus a stop codon (postSET‐z). Consequently, SET domain proteins can be classified by a combinatorial rule according to the (i) type of preSET region (one of four, so far: preSET‐s, preSET‐e, n‐SET, ATA2), (ii) presence of a postSET or postSET‐z C‐terminal extension and (iii) presence or absence of a stop codon immediately following the postSET extension.
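
The combinatorial rule can be expressed as grouping proteins by a three-part key. The Python sketch below is a minimal illustration, not the procedure used to build Figure 7: the class `SetFlanks`, the function `group_by_flanks` and the `EXAMPLES` table are hypothetical names introduced here. The entries for Set1, Ash1 and E(Z) follow the descriptions in the text; the SUV39H1 entry (postSET followed by a stop codon) is an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

PRE_SET_TYPES = {"preSET-s", "preSET-e", "n-SET", "ATA2"}
POST_SET_TYPES = {"postSET", "postSET-z"}


@dataclass(frozen=True)
class SetFlanks:
    """Flanking features of a SET domain used as a classification key."""
    pre_set: str               # (i) type of preSET region
    post_set: str              # (ii) type of C-terminal extension
    stop_after_post_set: bool  # (iii) stop codon directly after extension

    def __post_init__(self) -> None:
        if self.pre_set not in PRE_SET_TYPES:
            raise ValueError(f"unknown preSET type: {self.pre_set}")
        if self.post_set not in POST_SET_TYPES:
            raise ValueError(f"unknown postSET type: {self.post_set}")


EXAMPLES: Dict[str, SetFlanks] = {
    "Set1": SetFlanks("n-SET", "postSET", True),
    "Ash1": SetFlanks("preSET-e", "postSET", False),
    "E(Z)": SetFlanks("preSET-e", "postSET-z", True),
    "SUV39H1": SetFlanks("preSET-s", "postSET", True),  # assumed, see lead-in
}


def group_by_flanks(proteins: Dict[str, SetFlanks]) -> Dict[SetFlanks, List[str]]:
    """Group protein names that share an identical flanking-feature key."""
    groups: Dict[SetFlanks, List[str]] = defaultdict(list)
    for name, flanks in proteins.items():
        groups[flanks].append(name)
    return dict(groups)


for flanks, names in group_by_flanks(EXAMPLES).items():
    print(names, "->", flanks)
```

With these example entries each protein falls into its own group, which illustrates how Ash1 and E(Z) are separated by their C-terminal features despite sharing the preSET‐e region.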

Interestingly, classification of SET domains by this combinatorial rule yields the same groupings as classification by homology within the SET domain itself. Figure 7 shows a tree of SET domains based solely on sequence alignments of SET domains excluding flanking regions. This analysis revealed that SET domains fall into four major branches; SUV39, Ash1, Set1/Trx and E(Z) groups. On the right of Figure 7, the distribution of N‐ and C‐terminal associated regions and stop codons is depicted for each of the branches. A clear correspondence between the two ways to classify SET domains is evident. Boxed SET proteins in Figure 7 depict those currently shown to be associated with histone lysine methyltransferase activity. It can be seen that enzymatic activity has been found in three different branches. Furthermore, although no E(Z) member has yet been shown to associate with histone lysine methyltransferase activity, E(Z)H1 has been shown to rescue set1 defects in telomeric silencing in yeast (Laible et al., 1998). These analyses indicate that enzymatically active histone lysine methyltransferases are likely to be widespread amongst SET domain proteins and possibly all those listed in Figure 7 will have activity when tested in suitable ways. For some, this may require testing for activity using native complexes.

Figure 7.

Classification of SET domains by two different criteria yields the same groupings. At the left, a non‐rooted tree of the SET domains, based on multiple sequence alignment of the SET domain, without inclusion of preSET and postSET regions, is shown. Four major groups are evident [SUV39, ASH1, SET1/TRX, E(Z)]. To the right, the distribution of flanking sequence elements is depicted. Known methyltransferases, present in three of the four major branches, are boxed.

Materials and methods


Strains used in this study are listed in Table I. Yeast transformations were performed as described (Soni et al., 1993). All haploid strains were derived from MGD353‐13D (Puig et al., 1998). Gene disruptions and TAP‐tag introduction were performed as described (Puig et al., 1998; Rigaut et al., 1999). Correct cassette integrations were confirmed by PCR and western blot analysis (for tagging) or PCR and genomic Southern blot (for disruptions).

Table I. Saccharomyces cerevisiae strains used in this study

TAP purification and mass spectrometry (MS)

The extraction of yeast cells was performed as described for the yeast SWI/SNF complex (Logie and Peterson, 1999). The TAP tag consists of a calmodulin‐binding peptide (CBP), a TEV protease cleavage site and two IgG‐binding units of protein A as described (Rigaut et al., 1999). TAP purification was performed according to Rigaut et al. (1999) with the following modifications: 10 ml supernatant of the 43 000 g centrifugation (Logie and Peterson, 1999) was allowed to bind to 200 μl IgG–Sepharose (Pharmacia), equilibrated in buffer E (Logie and Peterson, 1999) for 2 h at 4°C using a disposable chromatography column (Bio‐Rad). Two to three columns (the equivalent of 4–6 l yeast culture at OD600) were used per purification shown. The IgG–Sepharose column was washed with 35 ml buffer E lacking proteinase inhibitors, followed by 10 ml TEV cleavage buffer (Rigaut et al., 1999). TEV cleavage was performed using 10 μl (100 U) rTEV (Gibco) in 1 ml TEV cleavage buffer for 2 h at 16°C. Calmodulin–Sepharose (Stratagene) purification was as described (Rigaut et al., 1999). Purified proteins were concentrated as described by Wessel and Flügge (1984). After separation on 7–25% SDS–PAGE gradient gels, proteins were stained with Coomassie Blue, in‐gel digested with trypsin and identified by MS as described (Shevchenko et al., 2000).

Histone methyltransferase assay (HMT)

HMT assays were done essentially as described previously (Strahl et al., 1999; Rea et al., 2000). Partially purified extracts were incubated in 1× methyltransferase buffer (MTB; 50 mM Tris pH 8.5, 20 mM KCl, 10 mM MgCl2 and 250 mM sucrose) in the presence of S‐adenosyl‐l‐[methyl‐3H]methionine (74 Ci/mmol, Amersham) as a methyl group donor at 1 μM final concentration and either 10 μg of free histones (Roche) or 5 μg N‐terminal histone H3 peptides [wt 1–28 aa (Sigma); K4L and K9L 1–20 aa] for 1.5 h at 30°C in a total volume of 50 μl. Reactions were either spotted in duplicate on Whatman P81 paper, washed 4 × 15 min with 50 mM NaHCO3 pH 9.0, completely dried and counted in a liquid scintillation counter, or electrophoresed on 15% AA‐SDS gels and subjected to fluorography.
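As a worked check of the quantities stated above, the amount of labelled methyl donor per reaction follows from the final concentration, the reaction volume and the specific activity. The short Python sketch below only restates that arithmetic; the function name `sam_per_reaction` is hypothetical.

```python
def sam_per_reaction(conc_uM, volume_ul, specific_activity_ci_per_mmol):
    """Return (pmol, uCi) of labelled donor in one reaction.

    Units: uM x ul = pmol, and Ci/mmol is numerically equal to uCi/nmol.
    """
    pmol = conc_uM * volume_ul
    nmol = pmol / 1000.0
    micro_ci = nmol * specific_activity_ci_per_mmol
    return pmol, micro_ci


if __name__ == "__main__":
    # Values taken from the assay description: 1 uM, 50 ul, 74 Ci/mmol.
    pmol, micro_ci = sam_per_reaction(1.0, 50.0, 74.0)
    print(f"{pmol:.0f} pmol donor, {micro_ci:.1f} uCi per reaction")
```

With the stated values this gives 50 pmol of S‐adenosyl‐methionine, or about 3.7 μCi of tritium label, per 50 μl reaction.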

Genomic Southern analysis

Yeast genomic DNA was extracted from exponentially growing cells. Two micrograms of each sample were digested with XhoI (NEB) and resolved on a 0.75% agarose gel in TBE buffer. The gels were capillary blotted onto nylon (Biodyne B, Pall) and hybridized in Church/Gilbert solution at 65°C with a probe for the Y′ repeats, a PCR fragment amplified from genomic DNA using the primers AGTTTAGCAGGCATCATC and CCTACTCTTTCCCACTTG. The PCR fragment was labelled by random priming (Amersham) and recognises chromosomes 2, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15 and 16.

Sequence analysis

Database searches were performed with Blastp, Tblastn and psi‐Blast (Altschul et al., 1997). Psi‐Blast was only used to identify close homologues and searches were ended after three iterations. Previously characterized protein domains were identified with Pfam (Bateman et al., 2000) and SMART (Schultz et al., 2000) and with the interactive use of PairWise (version 1.6.2b; Birney et al., 1996). Multiple alignments were generated with Clustal_X (Thompson et al., 1997) and subsequently manually edited. The alignments to the RIIa and RRM motifs were performed with Clustal_X in profile mode, using corresponding seed alignments from Pfam as a reference.

Prediction of a second exon in the complete genome sequence of Drosophila melanogaster (DDBJ/EMBL/GenBank accession No. AE02989; GI:10729576) was done by visual inspection and with the aid of Blast2 sequences and PairWise. The details of this prediction will be made available on the webpage

A non‐rooted tree of the SET domains (positions 1–138 in the alignment shown in Figure 3A) was generated with the NJ‐method as implemented in Clustal_X. Positions with gaps were excluded.
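The gap‐exclusion step that precedes tree building can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Clustal_X implementation: the function names are hypothetical, the toy sequences are invented, and the uncorrected proportion of differing positions (p‐distance) is used as an assumed stand‐in for whatever distance measure the program applies before neighbour joining.

```python
from itertools import combinations

GAP_CHARS = {"-", "."}


def drop_gapped_columns(alignment):
    """Remove every alignment column in which any sequence has a gap.

    alignment: dict mapping sequence name -> aligned sequence string.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in GAP_CHARS for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


def p_distances(alignment):
    """Pairwise fraction of differing positions (uncorrected p-distance)."""
    dist = {}
    for a, b in combinations(alignment, 2):
        sa, sb = alignment[a], alignment[b]
        mismatches = sum(x != y for x, y in zip(sa, sb))
        dist[(a, b)] = mismatches / len(sa) if sa else 0.0
    return dist


if __name__ == "__main__":
    # Invented toy alignment, for illustration only.
    toy = {"A": "MK-LV", "B": "MKTLV", "C": "MR-LI"}
    ungapped = drop_gapped_columns(toy)
    print(ungapped)
    print(p_distances(ungapped))
```

The resulting distance matrix is the kind of input a neighbour‐joining procedure operates on; excluding gapped columns first ensures that every pairwise distance is computed over the same set of positions.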

Contaminating proteins found in Figures 1, 2 and 4, listed as database entry numbers:

Figure 1

P11484, P26782, P10592, P02994, P23248, Q12109, P14126, P05753, P26783, P22203, P41805, P05737, P26786, P05740, P26781, Q12672, P04649.

Figure 2

Shg1 lane: P02994, Q12109, P00925, P00560, P14126, P49626, P00359, P05736, P05753, P29453, P22203, P26783, P05737, P53030, P26768, P07280, P26782, P02407, P05749, P04649.

Bre2 lane: P04050, P02994, Q12109, P14126, P10664, P05750, P32905, P05750, P05736, P22203, P26783, P05737, P26786, O13516, P05740, P02407, P04649.

Swd1 lane: P10592, P06634, P40150, P02994, Q01852, P23248, P26783, P22203, P26786, P07281, P02407, P53221, P04649, P06367.

Swd3 lane: P32589, P32503, P10591, P06634, P40150, P02994, Q12109, Q01852, P02365, P05750, P23248, P05753, P26783, P41805, P05737, P26786, P05740, P05756, P02406, P07280, P20407.

Swd2 lane: P10592, P11484, P02994, Q12109, P49626, P00359, P46654, P28495, P02365, P05750, P23248, P26783, P22203, P26786, P48164, P05740, P05735, P07280, P53221, P04649.

Spp1 lane: P10592, P11484, P02994, P14126, P10664, P02365, P26248, P29453, P05753, P26783, P05737, P05754, P41805, P26786, P26785, O13516, P05740, P04449, P32827, P02406, P26782, P02407, P06367, P04649, P07282.

Sdc1 lane: P16521, U43281, P46655, P10591, P40150, P02994, P14126, P10664, P39015, P00359, P05750, P46654, P23248, P05736, P05753, P25443, P22203, X89368, P05737, P41805, P05754, P40212, P05735, P38828, P05755, P26785, P05740, P47913, P24000, P41056, P07282, P05745.

Figure 4

Bre2/Set1 ko: P32589, P10591, P11484, P06634, P02994, P14126, Q01852, P10664, P00359, P26783, P41805, P26786, P05740, P07280, P02407, P26782, P04649, P06367.

Swd1/Set1 ko: P32589, P10591, P06634, P40150, P02994, Q12109, P14126, P25491, Q01852, P49626, P40531, P00359, P00358, P33442, P26783, P22203, P41805, P26786, P05740, P24000, P05735, P26781, P07281, P02407, P26782, P04456, P40213, P54780, P06367, P04649.

Spp1/Set1 ko: Q9URQ7, P02994, P00925, P14126, P10664, P00359, M26506, P05750, P32905, P23248, P05753, P26783, P22203, P41805, P26786, P05740, P47913, P26781, P04449, P02406, P07280, P02407, P26782, P04649, P39516, P05749, P07282.

Sdc1/Set1 ko: P32324, P32565, P32589, P10591, P40150, P06634, P02994, P00925, P00560, P14126, Q01852, P10664, P00359, P23248, P26783, P41805, P26786, P26782, P07280, P53221, P02407, P04649, P06367.


We thank Axel Imhof, Thomas Jenuwein and Frank Sauer for discussions, Robin Allshire and Thomas Jenuwein for valuable materials and Bernhard Dichtl, Walter Keller and Frank Sauer for communication of results prior to publication.