Chromatin is the ensemble of genomic DNA and a large number of proteins. Various genome‐wide mapping techniques have begun to reveal that, despite the tremendous complexity, chromatin organization is governed by simple principles. This review discusses the principles that drive the spatial architecture of chromatin, as well as genome‐wide‐binding patterns of chromatin proteins.
Chromatin is probably the most complex molecular ensemble in the cell. It consists of genomic DNA together with all directly or indirectly associated protein and RNA molecules. This includes histones, DNA‐binding factors (DBFs), the basal transcription machinery and its nascent transcripts, replication and repair machineries that copy and maintain DNA, and many other molecules that interact with any of these components. All of these components work in concert, and cannot be fully understood unless they are studied in their complete context.
The past decade has seen the development of new powerful genomics‐based techniques that have substantially scaled up our ability to perform measurements on chromatin. These and other new methods have enabled us to tackle the complexity of chromatin more efficiently, and we are now beginning to see the first glimpses of the ‘big picture’ of chromatin. Basic global principles, some of which may be different than previously thought, are emerging. Here, these principles (summarized in Box 1) will be discussed. We will focus on two general aspects of chromatin: structure and composition. First, we will consider the spatial folding of the chromatin fibre inside the nucleus, at scales from ∼1 kb to several megabases. Among other things, we will reconsider the widely assumed dichotomy of ‘compacted’ heterochromatin versus ‘open’ euchromatin, an oversimplified model that may prove to be largely incorrect. Second, we will look into the proteome of chromatin, and compare the major types of chromatin as defined by local protein composition. Characteristics and biological functions of these chromatin types will be discussed. Finally, a model will be presented of how chromatin may target transcription factors to specific genes without a major role for chromatin (in)accessibility.
Box 1 Principles of chromatin organization
I Three‐dimensional architecture
Architecture is driven by a combination of polymer biophysics and biochemical interactions.
Most DNA is in a beads‐on‐a‐string configuration with varying degrees of poorly understood local compaction.
There are many long‐range contacts between genomic loci, but most of these contacts are transient.
Tethering to landmark structures contributes to the overall folding of chromosomes.
II Chromatin composition
Chromatin harbours thousands of proteins and dozens of histone marks.
Distinct combinations of proteins and histone marks form a limited number of chromatin types.
Some chromatin types mediate gene repression, others are conducive to transcription.
Each chromatin type assists the binding of specific DBFs to their cognate DNA motifs.
Chromatin structure: polymer physics and specific interactions
At its heart, chromatin consists of nucleosomes, each of which comprises ∼147 bp of DNA wrapped around a histone octamer. About 20–50 bp of linker DNA connects the nucleosomes. This ‘beads‐on‐a‐string’ configuration can be regarded as a highly flexible polymer (Figure 1A), and to a large extent the three‐dimensional folding of chromatin is driven by the principles of polymer physics (Langowski and Heermann, 2007). All available evidence indicates that a chromosome does not adopt one single reproducible configuration; instead, it can have many different ones. Nevertheless, polymer physics imposes some restrictions on the possible architecture. Computer simulations have shown that some features of chromosomes, such as aggregation of heterochromatic chromocenters or the segregation of chromosomes into distinct territories, may be explained by basic polymer behaviour with only a few additional assumptions (Cook and Marenduzzo, 2009; de Nooijer et al, 2009; Bohn and Heermann, 2010). But other features of chromosome folding cannot be accounted for by polymer physics alone; for example, over megabase distances, interphase chromosomes are more compacted than is predicted by computer simulations of simple polymers (Mateos‐Langerak et al, 2009). Indeed, it is now clear that specific biochemical mechanisms exist that contribute to the folding of chromatin. In essence, these mechanisms come in three types: local compaction, long‐range interactions and anchoring to nuclear scaffolds.
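To illustrate what ‘basic polymer behaviour’ means in its simplest form, the sketch below simulates a freely jointed chain, a textbook idealization in which every segment points in an independent random direction. It is not one of the simulations cited above; the function names, the unit segment length and the number of chains are illustrative assumptions. For such an ideal chain, the mean squared end‐to‐end distance grows linearly with the number of segments, which gives a baseline against which additional compaction can be judged.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def end_to_end_sq(n_segments, rng, segment_length=1.0):
    """Squared end-to-end distance of one freely jointed chain."""
    x = y = z = 0.0
    for _ in range(n_segments):
        dx, dy, dz = random_unit_vector(rng)
        x += segment_length * dx
        y += segment_length * dy
        z += segment_length * dz
    return x * x + y * y + z * z


def mean_end_to_end_sq(n_segments, n_chains=2000, seed=0):
    """Average squared end-to-end distance over many independent chains."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    total = sum(end_to_end_sq(n_segments, rng) for _ in range(n_chains))
    return total / n_chains


if __name__ == "__main__":
    # For an ideal chain the ratio <R^2> / N should stay close to 1
    # when the segment length is 1.
    for n in (50, 100, 200):
        print(n, round(mean_end_to_end_sq(n) / n, 2))
```

A real chromatin fibre differs from this idealization in several ways (excluded volume, confinement in the nucleus, specific contacts), which is why the text above stresses that polymer physics alone does not account for all observations.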
Local chromatin compaction
A popular notion is that the nucleosomal fibre is locally coiled to form a solenoid or other repetitive structure with a diameter of ∼30 nm. Theoretically, there are several ways in which nucleosomes could be stacked to form such a fibre, and evidence for some configurations has been obtained in vitro using pure polynucleosomes (reviewed in Fussner et al, 2011). However, advanced electron microscopy methods have failed to detect significant amounts of 30 nm fibres in unfixed nuclei or mitotic chromosomes (Maeshima et al, 2010; Fussner et al, 2011). Such fibres may thus be rare, or be restricted to very specific cell types.
There may, however, be other kinds of local compaction that are more common: electron micrographs of most nuclei show clear patches of densely stained chromatin, and biophysical fractionation of chromatin fragments also points to differential compaction (Gilbert et al, 2004). This local compaction may not be in the form of a regular higher‐order fibre, but may instead consist of rather irregular patches of aggregated nucleosomes (Figure 1B). A variety of specialized proteins may promote such compaction, such as the linker histone H1 and Polycomb Group (PcG) protein complexes (Francis et al, 2004; Hizume et al, 2005).
Long‐range interactions
A second biochemical mechanism that is important for chromosome architecture involves physical contacts between two sequence elements that are distant on the linear chromosome (Figure 1C). Such contacts can be detected using a technique named chromatin conformation capture, of which there are now many derivatives (reviewed in Simonis et al, 2007; Fullwood and Ruan, 2009; van Steensel and Dekker, 2010). Application of this technology has revealed that there are many physical interactions between enhancers and promoters (Hakim et al, 2010). There are also frequent interactions within chromosomes over longer distances (up to ∼10 Mb) and even between loci on different chromosomes, although the latter tend to be less abundant. Surveys in mammalian cells suggest that there may be many thousands of long‐range interactions (Simonis et al, 2006; Fullwood et al, 2009; Lieberman‐Aiden et al, 2009). Interestingly, these interactions show a global domain pattern along the chromosomes, where regions of inactive and active chromatin each preferentially interact among themselves. This suggests that chromatin is segregated into two distinct compartments (Simonis et al, 2006; Lieberman‐Aiden et al, 2009). A variety of proteins is thought to act as bridging factors that mediate long‐range interactions, including the insulator protein CTCF, several transcription regulators, cohesin, polycomb protein complexes and others (Sexton et al, 2009; Hakim et al, 2010).
It is important to note that most of these long‐range contacts are in part stochastic: even though they are specific, most of the reported interactions between two loci occur only in a small fraction of cells at any point in time (Simonis et al, 2006; Lieberman‐Aiden et al, 2009; Bantignies et al, 2011). It is, therefore, difficult to imagine how such individual pairwise contacts contribute substantially to the regulation of gene activity, unless short contacts are somehow memorized as a long‐lasting epigenetic imprint (Bantignies et al, 2003).
Attachments to nuclear landmarks
A third mechanism that contributes to the folding of chromosomes involves interactions of the genome with relatively fixed nuclear ‘landmarks’ that may act as anchoring sites (Figure 1D) (reviewed in Deniaud and Bickmore, 2009; Towbin et al, 2009; Kind and van Steensel, 2010; van Steensel and Dekker, 2010). One such landmark is the nuclear lamina (NL), which consists of lamin polymers. The NL coats most of the inner nuclear membrane, providing a large surface area for potential contacts with the genome. Indeed, molecular mapping studies have shown that the genomes of Drosophila, mouse and humans interact extensively with the NL, via so‐called lamina‐associated domains (LADs) (Pickersgill et al, 2006; Guelen et al, 2008; Peric‐Hupkes et al, 2010; van Bemmel et al, 2010). In mammals, there are ∼1300 LADs, which have a median size of about 0.5 Mb. Combined, mammalian LADs cover about 35–40% of the genome. LADs generally have sharp borders that are marked by specific sequence elements that include binding sites for CTCF, CpG islands and promoters. Most genes in LADs are transcriptionally silent. This indicates that the NL contributes to gene repression, which is supported by de‐repression of NL‐associated genes in flies lacking one of the lamins (Shevelyov et al, 2009) and by downregulation of certain genes that are artificially tethered to the NL (Finlan et al, 2008; Kumaran and Spector, 2008; Reddy et al, 2008; Dialynas et al, 2010). In accordance with a role for the NL in gene regulation, hundreds of genes were observed to move towards or from the NL during differentiation (Peric‐Hupkes et al, 2010). It is still unclear how these dynamic interactions of the genome with the NL are mediated and regulated.
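The LAD numbers quoted above can be cross‐checked with a small calculation. The sketch below assumes a haploid mammalian genome of roughly 3 × 10⁹ bp (an assumption, not a figure given in this section) and asks what mean LAD size is implied by ∼1300 LADs covering 35–40% of the genome. All names are illustrative.

```python
HAPLOID_GENOME_MB = 3000.0      # assumption: ~3 x 10^9 bp per haploid genome
N_LADS = 1300                   # from the text
MEDIAN_LAD_MB = 0.5             # from the text
COVERAGE_RANGE = (0.35, 0.40)   # from the text


def implied_mean_lad_size_mb(coverage, n_lads=N_LADS,
                             genome_mb=HAPLOID_GENOME_MB):
    """Mean LAD size (Mb) needed for n_lads domains to cover this fraction."""
    return coverage * genome_mb / n_lads


if __name__ == "__main__":
    for coverage in COVERAGE_RANGE:
        mean_mb = implied_mean_lad_size_mb(coverage)
        print(f"coverage {coverage:.0%}: implied mean size {mean_mb:.2f} Mb")
```

Under these assumptions the implied mean (roughly 0.8–0.9 Mb) is larger than the reported median of about 0.5 Mb, which would be expected if the size distribution is skewed by a minority of very large LADs.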
Nuclear pore complexes (NPCs) also interact with specific genomic loci (Liang and Hetzer, 2010). These loci are different from LADs, which is consistent with the observation that NPCs are primarily located in gaps in the NL and thus represent a distinct micro‐environment (Schermelleh et al, 2008). Genome‐wide mapping efforts in yeast, flies and mammals have identified many genes that are bound by NPC proteins (Casolari et al, 2004; Capelson et al, 2010; Kalverda et al, 2010; Vaquerizas et al, 2010), but it has become clear that most of these proteins freely diffuse through the nucleoplasm and often interact with their target genes in the nuclear interior rather than at the NPCs (Capelson et al, 2010; Kalverda et al, 2010). Genomic maps of NPC protein interactions should, therefore, be interpreted with caution, as they only partially reflect the spatial positioning of the genome relative to the nuclear envelope. Future studies need to address the functions and mechanisms of genome–NPC interactions.
Another nuclear structure that may provide a relatively fixed anchoring platform for the genome is the nucleolus. Recent mapping studies have identified many nucleolus‐associated domains (NADs), which tend to be large domains and harbour specific gene sets (Nemeth et al, 2010; van Koningsbruggen et al, 2010). Some of the NADs may overlap with LADs, suggesting that there are genomic regions that have alternating contacts with both the NL and nucleoli.
Often, spatial configuration of DNA is confused with accessibility of DNA, but these are distinct—although sometimes related—aspects of chromatin structure. Chromatin accessibility is best defined as the availability of DNA sequences for molecular interactions, typically by DBFs. Nucleosomes are major determinants of local DNA accessibility: a DNA sequence tightly wrapped around a nucleosome is less easily bound by a DBF than the same sequence in a nucleosome‐free stretch of DNA; indeed this model is supported by ample experimental data (Hayes and Wolffe, 1992; Lieb and Clarke, 2005; Rando and Ahmad, 2007). Nucleosome remodelling factors are protein complexes that can alter the positioning of nucleosomes along the DNA (Maier et al, 2008; Clapier and Cairns, 2009).
It is widely believed that there are mechanisms that control accessibility beyond the single nucleosome, over extended chromosomal regions. Theoretically, access could be blocked by specialized proteins that form a dense coating around a series of nucleosomes. Alternatively, a stretch of nucleosomes may be tightly compacted, thereby leaving little space for other proteins to access the DNA. These mechanisms are easy to imagine, but is the existence of such ‘inaccessible heterochromatin’ really supported by experimental data?
Accessibility of chromatin is often measured by exposing detergent‐permeabilized nuclei to endonucleases such as DNase I, after which the cutting sites are quantitatively mapped (Weintraub and Groudine, 1976). The idea is that the cutting frequency is proportional to the accessibility of the DNA. However, permeabilization causes the dissociation of many proteins (due to non‐physiological buffer conditions and a simple dilution effect) and will thus create a substantially distorted picture. This does not mean that DNase I hypersensitivity mapping is useless; it has proven to be a powerful method to identify regulatory elements in the genome (Hesselberth et al, 2009; Song and Crawford, 2010), but one should keep these caveats in mind when interpreting the results in terms of in vivo chromatin accessibility.
Mapping of in vivo methylation by DNA adenine methyltransferase (Dam) (Gottschling, 1992; Singh and Klar, 1992; Kladde and Simpson, 1994; Wines et al, 1996) does not have the drawbacks of endonucleases. Dam can be expressed in vivo (i.e. without permeabilization) and it marks any accessible GATC motif by adding a small adenine‐methylation tag that does not occur endogenously in most eukaryotes. The adenine‐methylation pattern can be mapped afterwards by a variety of methods. A recent genome‐wide survey of Dam methylation in three Caenorhabditis elegans tissues (Sha et al, 2010) established that most of the genome is accessible, with only about two‐fold variation in Dam methylation signals. Much of this variation is explained by the local positioning of nucleosomes (consistent with earlier Dam assays in yeast; Kladde and Simpson, 1994), and not by effects over extended genomic regions. A carefully controlled study in budding yeast also found less than two‐fold differences in accessibility to Dam between euchromatin and repressive heterochromatin marked by SIR proteins (Chen and Widom, 2005).
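Because Dam acts only on GATC motifs, the resolution of such a map is set by where these motifs occur. As a minimal sketch (the function names and the toy sequence are hypothetical and are not taken from the cited studies), the following code lists the GATC sites in a sequence and confirms that the motif is its own reverse complement, so it reads identically on both strands.

```python
def find_gatc_sites(sequence):
    """Return 0-based start positions of every GATC motif in the sequence."""
    seq = sequence.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        # Step by one base so that closely spaced motifs are not skipped.
        pos = seq.find("GATC", pos + 1)
    return sites


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence.upper()))


if __name__ == "__main__":
    toy = "ttGATCaaGATCGATCcc"  # invented toy sequence
    print(find_gatc_sites(toy))
    print(reverse_complement("GATC") == "GATC")
```

In a real analysis the methylation signal at each site would be compared between regions; the code above only identifies the candidate sites.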
Other DNA‐interacting proteins were also found to have normal access to repressive chromatin. Exogenously expressed DBFs bind to their recognition motifs with almost the same affinity in active and repressive chromatin contexts (Chen and Widom, 2005; Filion et al, 2010), and also several endogenous factors were found to bind to promoters in repressive chromatin (Chen and Widom, 2005; Gao and Gross, 2008).
These results are in overall agreement with studies of chromatin accessibility at the microscopy level, done by comparing rates of diffusion of fluorescently labelled macromolecules inside morphologically defined heterochromatin and euchromatin. Such studies failed to detect substantial differences in accessibility for molecules with radial sizes up to ∼10 nm, which corresponds to protein complexes of roughly 1 MDa (Verschure et al, 2003; Gorisch et al, 2005; Pack et al, 2006). Only molecules of larger sizes appear to be specifically excluded from heterochromatic regions (Gorisch et al, 2005). Even mitotic chromosomes, which are highly compacted, were found to be readily accessible to transcription factor binding (Chen et al, 2005).
In summary, while steric hindrance of DBFs may occur locally at individual positioned nucleosomes, it seems unlikely that regulated accessibility through changes in chromatin compaction constitutes a major mechanism to control the binding of proteins to DNA. Before we consider an alternative model that explains how the binding of DBFs may be regulated, we need to take a close look at the protein composition of chromatin.
A human nucleus has about 3 × 10⁷ nucleosomes, but there are many other chromatin proteins. For the ‘big picture’, it is worth considering how many protein molecules are part of the chromatin in a typical human nucleus. As an upper limit, we will take the total number of protein molecules estimated to be present in a nucleus. A human cell is thought to contain nearly 10¹⁰ protein molecules (Lodish et al, 2000), so if we assume that 10% of these make up the cell nucleus, there are roughly 10⁹ protein molecules per nucleus, or about 1 per 6 bp of DNA and about 30 per nucleosome (Figure 2). Of course, not all nuclear proteins are part of chromatin, but the majority probably is (even the RNA processing machinery is associated with chromatin; Perales and Bentley, 2009; Luco et al, 2011), so this estimate cannot be too far off. The abundance of individual proteins varies over several orders of magnitude. Most transcription factors are present as 10³–10⁵ molecules per nucleus, while some linker histones and high‐mobility group proteins (which are both thought to modulate chromatin structure) are almost as abundant as nucleosomes (Kuehl et al, 1984; Paull et al, 1996).
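The back‐of‐the‐envelope estimate above can be reproduced in a few lines. In this sketch the nucleosome count, the protein count per cell and the 10% nuclear fraction come from the text; the diploid genome size of 6 × 10⁹ bp is an assumption that is implied by, but not stated in, the ‘1 per 6 bp’ figure. The variable and function names are illustrative.

```python
NUCLEOSOMES_PER_NUCLEUS = 3e7   # from the text
PROTEINS_PER_CELL = 1e10        # from the text (upper estimate)
NUCLEAR_FRACTION = 0.10         # assumption made in the text
DIPLOID_GENOME_BP = 6e9         # assumption: implied by "1 per 6 bp"


def chromatin_protein_estimates():
    """Return the upper-limit estimates of protein density on chromatin."""
    proteins_per_nucleus = PROTEINS_PER_CELL * NUCLEAR_FRACTION
    return {
        "proteins_per_nucleus": proteins_per_nucleus,
        "bp_per_protein": DIPLOID_GENOME_BP / proteins_per_nucleus,
        "proteins_per_nucleosome": proteins_per_nucleus / NUCLEOSOMES_PER_NUCLEUS,
    }


if __name__ == "__main__":
    for name, value in chromatin_protein_estimates().items():
        print(f"{name}: {value:.3g}")
```

With these inputs the ratio of proteins to nucleosomes comes out at about 33, which the text rounds to ‘about 30’.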
It is also useful to consider the diversity of proteins that are part of chromatin. A recent proteomics study conservatively identified >500 proteins associated with purified human metaphase chromosomes (Ohta et al, 2010). Interphase chromatin is likely to be much more complex, because many proteins are known to dissociate from chromatin during mitosis. Systematic surveys in yeast indicate that about 1/3 of all cellular proteins are located in the nucleus (Huh et al, 2003). If one extrapolates this to human cells, then roughly 8000 different proteins are nuclear and thus candidates to be part of human chromatin. Of these, ∼1400 are catalogued as sequence‐specific DBFs (Vaquerizas et al, 2009). Taken together, it is reasonable to assume that chromatin in a typical human cell consists of several thousand different proteins.
All of these proteins exert their functions by forming complexes with DNA, RNA and other proteins. Biochemically purified chromatin protein complexes typically consist of 5–50 proteins. However, these only include stably interacting proteins; current estimates suggest that the entire human interactome may contain 130 000 interactions (Venkatesan et al, 2009). So in chromatin, there may be several tens of thousands of distinct pairwise protein–protein interactions, of which we currently know only a tiny fraction. If we also consider the many non‐coding RNA molecules that are being discovered as part of chromatin (Hung and Chang, 2010; Koziol and Rinn, 2010), it becomes clear that chromatin is an incredibly complex macromolecule.
Principal chromatin types
The myriad of chromatin proteins and possible interactions among them raises the theoretical possibility that every bit of the genome is bound by a distinct, unique combination of proteins. This idea of a ‘combinatorial code’ was articulated about a decade ago for histone modifications, which by themselves could potentially form many more combinations than the total number of nucleosomes in a nucleus (Strahl and Allis, 2000).
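The scale of this combinatorial argument is easy to check. The sketch below makes the simplifying assumption that each histone mark is an independent on/off switch (real marks are neither strictly binary nor independent) and asks how many such marks are needed before the number of possible combinations exceeds the roughly 3 × 10⁷ nucleosomes in a human nucleus. The function name is hypothetical.

```python
def marks_needed(n_nucleosomes):
    """Smallest number of independent on/off marks whose combinations
    outnumber the given count of nucleosomes."""
    n_marks = 0
    while 2 ** n_marks <= n_nucleosomes:
        n_marks += 1
    return n_marks


if __name__ == "__main__":
    nucleosomes = 3 * 10 ** 7  # approximate number per human nucleus
    n = marks_needed(nucleosomes)
    print(n, 2 ** n)
```

Under this assumption 25 marks already suffice, since 2²⁵ = 33 554 432, so ‘dozens of histone marks’ could in principle label every nucleosome uniquely.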
A systematic study of the combinatorial complexity of chromatin proteins was recently conducted in cultured Drosophila cells (Filion et al, 2010). Using the DamID technology (van Steensel et al, 2001), a broad set of 53 proteins, representing a cross‐section of the known functional classes of chromatin components, was mapped genome wide. Several histone marks were also mapped. Integrative analysis of the maps of the 53 proteins revealed that much of the binding profiles is explained by five principal chromatin types, which are each made up of unique combinations of proteins. While some proteins mark only one of these chromatin types, many proteins are shared by two to four types. Because the Greek word chroma means ‘colour’, the five principal chromatin types are indicated by the colours YELLOW, RED, BLUE, BLACK and GREEN (Figure 3). As outlined below, these five types have distinct and sometimes remarkable characteristics.
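The idea that binding profiles collapse into a few types can be illustrated with a toy clustering. The sketch below is not the integrative analysis used in the cited study, whose algorithm is not described in this text; it swaps in a plain k‐means with a deterministic start. The binding matrix is invented, and the marker columns (PcG, HP1, SU(VAR)3‐9, Lamin, D1) are chosen only to mimic three of the types described later in this section.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=20):
    """Plain k-means; the first k distinct profiles seed the centroids."""
    centroids = []
    for p in profiles:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assign every genomic bin to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        # Recompute each centroid as the mean profile of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented toy data: rows are genomic bins, columns are binding calls (0/1)
# for PcG, HP1, SU(VAR)3-9, Lamin and D1, in that order.
TOY_PROFILES = [
    (1, 0, 0, 0, 0),  # PcG only
    (0, 1, 1, 0, 0),  # HP1 + SU(VAR)3-9
    (0, 0, 0, 1, 1),  # Lamin + D1
    (1, 0, 0, 0, 0),
    (0, 1, 0, 0, 0),  # incomplete HP1-type profile
    (0, 0, 0, 1, 0),  # incomplete Lamin-type profile
    (0, 1, 1, 0, 0),
    (0, 0, 0, 1, 1),
]

if __name__ == "__main__":
    labels, _ = kmeans(TOY_PROFILES, k=3)
    print(labels)
```

Even the two incomplete profiles are grouped with the type they most resemble, which is the sense in which a small number of types can explain a much larger number of individual binding maps.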
YELLOW and RED (high‐occupancy target) chromatin
Most transcriptionally active genes are marked by YELLOW or RED chromatin. YELLOW chromatin fits our general concept of active genes: it harbours components of the basal transcription machinery, some DBFs, and enzymes that control histone acetylation. Perhaps surprisingly, this includes the histone deacetylases (HDACs) RPD3 and SIR2. These were originally thought to be involved in repression of gene activity, but HDACs have now also been found to be abundantly present on active genes in human cells (Wang et al, 2009).
RED chromatin is also primarily associated with active genes. Yet, RED chromatin has several unique features that distinguish it from YELLOW chromatin. First, it marks primarily genes that are tissue specific, in contrast to YELLOW chromatin, which has a strong preference for ubiquitously expressed housekeeping genes. Second, while RED and YELLOW genes cover a similar range of expression levels, genes in RED chromatin mostly lack the histone mark H3K36me3, as well as MRG15, a chromodomain protein that binds H3K36me3 in vitro. This absence of H3K36me3 is surprising, because it was previously thought to be a general mark of transcription elongation and have a role in the repression of antisense cryptic promoters in the transcription units (Li et al, 2007). Why this mark is lacking from active genes in RED chromatin is unclear.
The third and most puzzling property of RED chromatin is its enormous diversity of proteins. On average, regions marked by RED chromatin are bound by about 60% of all proteins that were tested. Recent mapping of ∼50 additional chromatin proteins again detected a similar fraction in RED chromatin (BvS lab, unpublished results). Extrapolation of these results implies that there must be hundreds of different proteins that associate with each RED chromatin region. Many of these proteins seem to be functionally unrelated: for example, the DNA‐binding factors GAGA factor, ecdysone receptor, MNT and Jun‐related antigen all co‐localize in RED chromatin, yet there is no evidence for functional relationships between these factors. Hotspots where seemingly unrelated regulatory factors co‐localize were observed first in fly cells (Moorman et al, 2006), and more recently in Drosophila embryos, mouse ES cells and C. elegans (Chen et al, 2008; Li et al, 2008; MacArthur et al, 2009; Gerstein et al, 2010; modENCODE Consortium, 2010). Sometimes they are referred to as ‘high‐occupancy target’ (HOT) regions. They have been observed by two independent mapping techniques (ChIP and DamID), ruling out a technical artefact. Interestingly, re‐analysis of previously published ChIP‐chip data (Harbison et al, 2004) indicates that the budding yeast genome also harbours hotspots that are massively bound by a variety of factors; this feature was originally missed due to the use of a normalization algorithm (Harbison et al, 2004) that masks sites of high occupancy (H Bussemaker, personal communication). Thus, equivalents of RED chromatin appear to exist in many species.
RED/HOT regions have reduced nucleosome density, higher nucleosome turnover and enrichment of histone variant H3.3, which is linked to nucleosome displacement (Moorman et al, 2006; Filion et al, 2010; modENCODE Consortium, 2010). One might, therefore, postulate that RED regions are highly accessible and thus available for binding by many proteins. However, as discussed above, differences in accessibility along the genome appear of minor influence to DBFs. Moreover, this logic is self‐contradictory: it is difficult to imagine a DNA sequence being highly accessible (i.e. not bound by anything) and at the same time occupied by dozens of proteins. Indeed, it was empirically determined that the DNA‐binding domain of yeast Gal4 does not have preferential access to its binding motif in RED chromatin of fly cells (Filion et al, 2010). We, therefore, need to invoke a different, more specific mechanism. For example, the formation of RED chromatin may be started by a few ‘pioneer’ DBFs that bind to specific sequence motifs, which in turn recruit other proteins, which in turn recruit yet other proteins. A multitude of protein–protein interactions may thus drive the formation of a large, complex aggregate of proteins. It was found that the DNA‐binding domain of BICOID is largely dispensable for the targeting to hotspots (Moorman et al, 2006), which illustrates the prominent role of protein–protein interactions in the formation of this baffling type of chromatin.
What does RED/HOT chromatin do? Most likely, it is involved in the control of its target genes; perhaps RED regions represent a specialized type of enhancers. An interesting possibility is that RED regions may spatially cluster in so‐called transcription factories, which are nuclear foci where RNA polymerase II, various regulatory proteins and multiple transcribed genes come together (Sutherland and Bickmore, 2009; Cook, 2010). In addition, RED chromatin is the first to replicate during S‐phase, and is enriched for origin recognition complex proteins (Filion et al, 2010; modENCODE Consortium, 2010), raising the possibility that RED chromatin regions serve as early‐firing replication origins. Given the complexity of RED chromatin, it will be a substantial challenge to unravel its functions and molecular architecture.
BLUE: polycomb chromatin
BLUE chromatin is characterized by the presence of PcG proteins and the histone mark H3K27me3 (Filion et al, 2010). PcG proteins primarily repress transcription and exhibit a strong preference to bind to genes involved in the regulation of developmental processes. Detailed reviews of the molecular biology of PcG proteins can be found elsewhere (Muller and Verrijzer, 2009; Morey and Helin, 2010; Sawarkar and Paro, 2010).
BLACK chromatin
A fascinating new type of chromatin is BLACK chromatin (Filion et al, 2010). In Drosophila Kc167 cells, this type of chromatin covers nearly half of the non‐repetitive genome. Essentially, all genes embedded in BLACK chromatin are transcriptionally silent; in total, BLACK chromatin contains nearly two‐thirds of all silent genes, and thus is the most prevalent repressive chromatin type. A survey of gene expression data in a variety of tissues has indicated that genes that are located in BLACK chromatin (in Kc167 cells) are generally expressed in only a few tissues. Thus, BLACK chromatin appears to be dedicated to the repression of a large set of tissue‐specific genes. Importantly, a transposable element carrying a reporter gene is more likely to be silenced when integrated in BLACK regions than in any of the four other major chromatin types (Filion et al, 2010). This indicates that BLACK chromatin is not just a passive marker of transcriptionally inactive chromatin, but rather makes an active contribution to repression of transcription.
How this repression is achieved is still unclear, but hints come from the handful of proteins that are now known to mark BLACK chromatin. One of these is the linker histone H1, which has previously been linked to repression of transcription (Woodcock et al, 2006). Another is Lamin, the main component of the NL, which implies that BLACK chromatin is preferentially located at the nuclear periphery (Figure 3). Other BLACK marker proteins include the AT‐hook protein D1, which has been implicated in higher‐order packaging of chromatin (Smith and Weiler, 2010) and SuUR, a regulator of replication of heterochromatic regions in polytene chromosomes (Zhimulev et al, 2003).
So far, no histone marks have been identified in BLACK chromatin, which may explain why this chromatin type has remained unnoticed until recently. It is, however, quite possible that BLACK‐specific histone modifications remain to be discovered. For example, EFFETE, a highly conserved homologue of yeast Ubc4/5 that exhibits histone ubiquitinylation activity in vitro, binds along BLACK chromatin, suggesting that one of the histones may carry a specific ubiquitinylation mark. It is probable that additional components of BLACK chromatin remain to be discovered. Clearly, there is still much to be learned about BLACK chromatin.
GREEN chromatin: HP1 and partners are conducive to transcription
GREEN chromatin is specifically marked by HP1 and SU(VAR)3‐9, together with several HP1‐associated proteins (HP2 through 6) and the histone modifications H3K9me2 and H3K9me3 (Greil et al, 2007; Filion et al, 2010; Riddle et al, 2010). It is traditionally referred to as ‘heterochromatin’, but with current knowledge it may be better to abandon this term (see below). GREEN chromatin occupies large domains in pericentric regions and on the ‘dot’ chromosome 4, both of which are rich in repetitive sequences. In addition, it is found on hundreds of genes scattered along the chromosome arms.
HP1 and SU(VAR)3‐9 are well known for their ability to repress reporter genes that are integrated into HP1‐rich regions (Girton and Johansen, 2008). Paradoxically, several genome‐wide studies in Drosophila have now established that most genes that are naturally bound by HP1 are transcriptionally active (Greil et al, 2003; de Wit et al, 2007; Johansson et al, 2007; Piacentini et al, 2009; Filion et al, 2010; Riddle et al, 2010). This is in agreement with earlier observations of individual active genes in pericentric heterochromatin (Wakimoto and Hearn, 1990; Clegg et al, 1998). In cultured Kc167 cells, we have found that HP1 knockdown slightly reduces the expression of HP1‐bound genes (G Hogan and BvS, unpublished data), and in flies, pericentric genes show lowered activity in HP1 mutants (Lu et al, 2000; Schulze et al, 2005). Thus, at least in Drosophila, HP1 is primarily a modest activator of its natural target genes. With this new insight, the term ‘heterochromatin’ has become inappropriate, because it is traditionally used for repressive chromatin; the term GREEN chromatin provides a neutral alternative.
How can it be that HP1 (together with its partner proteins) represses certain reporter genes, while it is conducive to transcription of the vast majority of its natural target genes? The first important part of an explanation is the realization that all examples of heterochromatin‐repressed genes in Drosophila were historically selected for a repressive phenotype. For example, out of hundreds of flies that were treated with ionizing radiation to induce chromosome rearrangements, the rare ones with a variegating expression pattern were hand picked (for good reasons: the variegating phenotype is fascinating!). In another strategy, reporter genes were randomly inserted into the genome using transposable elements as vehicles, and again flies with a variegating phenotype were selected. By this strategy, about 65 different genes were discovered that variegate when inserted near centromeres, telomeres or on the ‘dot’ chromosome 4 (Tweedie et al, 2009). This may be a substantial number, but we should not forget that thousands of other genes have not been reported to variegate. Thus, could it be that only certain genes are repressed when placed in a GREEN chromatin environment? Detailed investigation of one classic position effect variegation (PEV) model suggests that this may indeed be the case (Vogel et al, 2009). The wm4 mutants, originally discovered due to variegation of the white eye colour gene, are caused by a translocation that joins white and flanking genes to pericentric heterochromatin. Molecular mapping revealed that HP1 spreads across the junction into the formerly euchromatic region and covers about 20 genes including white. Remarkably, only the expression of white is reduced, while the other genes remain essentially unaffected.
Thus, at least in Drosophila, the dogma that HP1 and partner proteins inhibit transcription was based on eye‐catching, yet non‐representative reporter gene data. At present, we can only speculate how these reporter genes are repressed by HP1, while natural target genes are not. Most of the genes that exhibit PEV when exposed to HP1 are tissue‐specific genes, which tend to be located in RED chromatin. It is striking that GREEN chromatin has a very different protein composition than RED chromatin, even though both are transcriptionally active. Moreover, GREEN and RED chromatin rarely are direct neighbours (U Braunschweig, GJ Filion, JG van Bemmel, BvS, unpublished data). These observations suggest that RED and GREEN chromatin may be incompatible. It is thus conceivable that insertion of a gene that normally carries RED chromatin into a GREEN chromatin environment may lead to destabilization of the RED chromatin, and as a consequence mis‐regulation of the gene. Future experiments should reveal whether this ‘incompatible chromatin types’ model provides a good explanation of PEV.
At present, it is still poorly understood why a specific set of several hundred active genes scattered throughout the fly genome is bound by HP1 and partners. HP1 target genes tend to be long and exon rich, raising the possibility that HP1 is needed to stimulate transcription elongation or to regulate RNA processing. Indeed, in Drosophila, HP1 interacts with the FACT elongation complex (Kwon et al, 2010). Furthermore, HP1 and partner proteins may have other functions, such as regulation of replication timing (Schwaiger et al, 2010).
Other chromatin classifications: Drosophila and other species
The five principal chromatin types that were identified in Drosophila cells represent the major combinations of proteins found among the set of 53 tested proteins. It may be possible to further refine this classification by a more fine‐grained algorithm. However, any sub‐classification of these five states will require objective statistical or biological criteria in order to avoid arbitrariness. Importantly, we have recently found that extending the set of binding maps with 50 additional (again broadly selected) proteins does not uncover additional chromatin types (JG van Bemmel, GJ Filion, A Rosado, W Talhout, BvS, unpublished data), indicating that the five types indeed represent the principal types.
Several other teams recently reported the classification of chromatin states based on large genome‐wide data sets. In these instances, the data sets were mostly restricted to histone modifications. A set of 18 histone marks in Drosophila cells yielded 9 or 30 combinatorial states depending on the algorithm used (Kharchenko et al, 2010). Because these data were obtained in a different cell line (S2 cells), comparison to the five colours in Kc167 cells should be interpreted with caution. Nevertheless, four of the nine states in S2 cells resemble BLACK, GREEN (2 ×) and BLUE chromatin in Kc cells. The remaining five states may represent sub‐states of RED and YELLOW chromatin. Because the large majority of histone marks are associated with active genes, there may have been a ‘magnifying glass’ effect on the classification of active chromatin, explaining why this study (Kharchenko et al, 2010) identified a larger number of active chromatin states than the study based on a broad selection of 53 proteins (Filion et al, 2010).
Interestingly, a recent study reported in this issue of EMBO Journal suggests that the plant Arabidopsis thaliana may also have a limited number of principal chromatin types. A survey of 11 histone marks together with DNA methylation identified four major combinations that together cover most of the genome (Roudier et al, 2011). Two of these states roughly resemble the GREEN and BLUE chromatin types found in Drosophila. A third state, which generally lacks any of the tested histone marks, may be the equivalent of BLACK chromatin. Finally, only one major type of transcriptionally active chromatin was found in Arabidopsis, unlike the two distinct states in Drosophila.
As many as 51 combinatorial chromatin states were identified in human lymphocytes, based on maps of 38 histone marks (Ernst and Kellis, 2010). This may be an over‐classification, because several of the combinations appeared highly similar. The authors grouped the 51 combinations into 5 major classes, which roughly correspond to active promoters, active transcription units, intergenic regions near active genes, transcriptionally inactive regions and repetitive sequences. In C. elegans, a more conservative analysis of 33 genome‐wide maps (mostly of histone marks) identified three major combinations (Gerstein et al, 2010).
It is not always straightforward to compare the identified chromatin types across species, because some proteins and histone marks have adopted new functions and interactions during evolution. For example, in Drosophila, lamin B is primarily associated with BLACK chromatin, whereas H3K9me2 is located in GREEN chromatin. In contrast, in mouse and human cells H3K9me2 and lamin B1 binding patterns overlap substantially (Guelen et al, 2008; Wen et al, 2009; Peric‐Hupkes et al, 2010). An intriguing possibility is that BLACK and GREEN chromatin have merged in mammals; it is also conceivable that only a few proteins/marks have swapped between chromatin types. Along the same lines, the HP1 orthologue in A. thaliana has evolved to bind to H3K27me3 (Zhang et al, 2007), pointing to task switching of components between GREEN and BLUE chromatin as defined in Drosophila. These examples illustrate that it will be necessary to map large and unbiased sets of chromatin components systematically in every species (and in every cell type) in order to define and compare the major chromatin types.
DBF guidance by chromatin types
DBFs are key players in gene regulation: they are the proteins that read the instructions encoded in regulatory DNA. It is, therefore, important to understand how DBFs interact with DNA. DBFs generally recognize a 4–8 bp sequence motif, often with substantial tolerance for variations in the motif. A typical 6 bp motif occurs by chance every ∼4 kb, so the human genome harbours nearly 1 million motif occurrences that are potentially bound by a DBF. Yet in vivo the vast majority of these motifs are not occupied by the DBF. Thus, additional specificity mechanisms must exist that direct each DBF to only a subset of its motif occurrences. One such mechanism is the occlusion of individual binding motifs by positioned nucleosomes (Hayes and Wolffe, 1992; Lieb and Clarke, 2005; Rando and Ahmad, 2007). As argued above, compaction of larger chromatin segments does not appear to be an effective mechanism to prevent access of DBFs. Instead, the different chromatin types may act as DBF‐specific guides through positive targeting mechanisms (Figure 4).
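The back-of-the-envelope estimate of chance motif occurrences can be reproduced with a short calculation. The sketch below is only an illustration under simplifying assumptions that are not taken from the text: a random sequence with equal base frequencies, an exact (non-degenerate) motif, and a haploid human genome of roughly 3.1 × 10^9 bp. The function names and the genome-size constant are hypothetical.

```python
def expected_spacing_bp(motif_length: int) -> int:
    """Average distance between chance matches of one exact motif on one
    strand, assuming all four bases are equally frequent."""
    return 4 ** motif_length


def expected_occurrences(genome_size_bp: int, motif_length: int,
                         both_strands: bool = False) -> float:
    """Expected number of chance matches of one exact motif in a random
    genome of the given size."""
    # Number of start positions at which the motif could fit.
    positions = genome_size_bp - motif_length + 1
    per_strand = positions / expected_spacing_bp(motif_length)
    # A non-palindromic motif can also match on the reverse strand.
    return 2 * per_strand if both_strands else per_strand


# Assumed approximate haploid human genome size (not a value from the text).
HUMAN_GENOME_BP = 3_100_000_000

if __name__ == "__main__":
    for k in (4, 6, 8):
        n = expected_occurrences(HUMAN_GENOME_BP, k)
        print(f"{k} bp motif: one match per {expected_spacing_bp(k)} bp, "
              f"about {n:,.0f} matches on one strand")
```

Under these assumptions a 6 bp motif is expected once per 4096 bp, which gives several hundred thousand matches on one strand and roughly twice that when both strands are counted. Tolerance for variation in the motif would raise the number further, so the calculation supports the order of magnitude rather than an exact count.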
A survey of DBF–motif interactions has revealed that each DBF has a preference to bind to its motif in certain chromatin types only. For example, MNT binds to its motif preferentially in YELLOW and RED chromatin, while SU(HW) binds to its motif primarily in RED, BLUE and BLACK chromatin, even though the cognate motifs for both DBFs are present in all five chromatin types (Filion et al, 2010). Thus, each DBF has a preferred chromatin context in which it binds to its motif. These preferences are most likely due to the presence of chromatin‐associated ‘helper’ proteins that assist the DBFs. A helper protein may physically interact with the DBF and stabilize the binding of the DBF to its motifs. Because each DBF may have different helper proteins, each chromatin type augments the interactions of a specific subset of DBFs with their motifs. Together, the chromatin types thus constitute a selection system that guides each DBF to its binding motifs in only some regions of the genome.
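One simple way to express such a chromatin-type preference is the fraction of motif occurrences that are bound within each chromatin type. The sketch below illustrates that bookkeeping only; it is not the analysis used in the cited study. The function name and input layout are hypothetical, and the toy records are invented for illustration and are not measurements.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def occupancy_by_chromatin_type(
        motifs: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per chromatin type, the fraction of motif occurrences
    that are bound by the DBF.

    Each record is (chromatin_type, is_bound) for one motif occurrence.
    """
    total = defaultdict(int)
    bound = defaultdict(int)
    for chromatin_type, is_bound in motifs:
        total[chromatin_type] += 1
        if is_bound:
            bound[chromatin_type] += 1
    return {ctype: bound[ctype] / total[ctype] for ctype in total}


# Toy input, invented purely for illustration (not real data).
toy_motifs = [
    ("YELLOW", True), ("YELLOW", True), ("YELLOW", False),
    ("RED", True), ("RED", False),
    ("BLACK", False), ("BLACK", False),
    ("BLUE", False),
]

fractions = occupancy_by_chromatin_type(toy_motifs)
for ctype, frac in sorted(fractions.items()):
    print(f"{ctype}: {frac:.2f} of motif occurrences bound")
```

A DBF with no chromatin-type preference would show similar fractions in all types, whereas a DBF that depends on type-specific helper proteins would show high fractions in only some of them.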
Big picture of chromatin: what's next?
Chromatin is highly complex, yet the simple global principles that have emerged provide a strong foundation for future research. Among these are several principles of spatial folding and chromatin composition (Box 1). Yet many questions wait to be answered. For example: what are the molecular mechanisms that drive the assembly of the principal chromatin types? How do these types differ between cell types and species? How do they affect gene expression and other DNA‐associated functions such as replication and repair? Do they contribute to epigenetic memory (Kaufman and Rando, 2010; Margueron and Reinberg, 2010), and if so, by what mechanism? How important are long‐range chromatin contacts for gene regulation? How are interactions of the genome with nuclear landmarks controlled? The rapid pace of technology development suggests that we will soon be able to get answers to these important questions, and perhaps we will uncover additional basic principles.
Supplementary data are available at The EMBO Journal Online (http://www.embojournal.org).
Conflict of Interest
The author declares that he has no conflict of interest.
I thank members of my laboratory for helpful comments. This study was supported by an NWO‐ALW VICI grant.
Copyright © 2011 European Molecular Biology Organization