Identification of DNA replication origins (ORIs) at a genome‐wide level in eukaryotes has proved to be difficult due to the high degree of degeneracy of their sequences. Recent structural and functional approaches, however, have circumvented this limitation and have provided reliable predictions of their genomic distribution in the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, and they have also significantly increased the number of characterized ORIs in animals. This article reviews recent evidence on how ORIs are specified and maintained in these systems and on their regulation and sensitivity to epigenetic signals. It also discusses the possible additional involvement of ORIs in processes other than DNA replication.
The correct unfolding of DNA instructions requires regulation in both time and space to ensure that genes will be expressed only in the appropriate cell types and developmental stages of the organism. In the case of DNA replication, temporal regulation is essential during the cell cycle to guarantee that chromosomes will duplicate before cell division. However, given that the entire genome must be replicated in each cell cycle, it is not immediately obvious why DNA synthesis could not start anywhere along the chromosomes and proceed until completion. Nevertheless, it has long been known that different regions of the eukaryotic chromosomes replicate at specific times during S phase, implying the existence of preferred sites for initiation.
Unlike the structural information encoding proteins, regulatory information is to some extent relieved from strict obedience to the primary sequence of DNA. Examples are widespread in the case of transcription, where gene promoters are made up of several degenerate elements spread over variable lengths of DNA, which makes it difficult to predict their localization in the genome on the basis of their sequence. A similar limitation applies to eukaryotic DNA replication origins (ORIs), even in the case of Saccharomyces cerevisiae where they span 120–150 bp and include some conserved elements (Newlon and Theis, 1993). Sequence‐based prediction of ORIs is even more difficult in Schizosaccharomyces pombe and mammals, where ORI regions are much longer and do not show any identifiable consensus elements (Dubey et al, 1996; Todorovic et al, 1999).
Our current view of the initiation of DNA replication has progressed enormously in recent years owing to the biochemical and genetic characterization of the protein complexes that bind to ORIs and couple their activity to cell cycle regulators (for recent reviews, see DePamphilis, 2003; Lucas and Raghuraman, 2003; McNairn and Gilbert, 2003; Mendez and Stillman, 2003; Weinreich et al, 2004). The present review focuses on the studies of replication initiation at the DNA sequence level and discusses recent work on the specification and epigenetic regulation of ORIs in several eukaryotic systems.
Genomic distribution of eukaryotic DNA replication origins
The difficulty of predicting ORIs on the basis of their sequence has been circumvented in S. cerevisiae by two functional genome‐wide approaches using DNA microarrays. One of them was based on density labelling to isolate newly replicated DNA (Raghuraman et al, 2001) and the other used chromatin immunoprecipitation (ChIP) with antibodies against several subunits of the origin recognition complex (ORC) and the mini‐chromosome maintenance (MCM) complexes (Wyrick et al, 2001). Microarray hybridization using density‐labelled or immunoprecipitated DNA as a probe predicted 332 and 429 ORIs, respectively. Another genome‐wide approach in S. cerevisiae relied on the two‐fold enrichment of sequences replicated at different times during S phase and identified 260 potential ORIs (Yabuki et al, 2002). While the high degree of concordance of the three approaches emphasizes their potential, discrepancies between them could be more interesting than coincidences since they might reflect specific properties of individual ORIs.
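The comparison between data sets mentioned above can be sketched as a simple positional overlap. The following is a minimal illustration only, assuming that each predicted ORI is reduced to a chromosome name and a midpoint coordinate; the function name, the 5 kb tolerance and the toy coordinates are hypothetical and are not taken from the cited studies.

```python
def split_by_concordance(set_a, set_b, tolerance=5000):
    """Split predictions in set_a into those that have a partner in set_b
    (same chromosome, within `tolerance` bp) and those that do not.

    Each prediction is a (chromosome, position) tuple. The tolerance is an
    illustrative assumption, not a value from the original analyses.
    """
    shared, unique = [], []
    for chrom, pos in set_a:
        has_partner = any(
            other_chrom == chrom and abs(other_pos - pos) <= tolerance
            for other_chrom, other_pos in set_b
        )
        if has_partner:
            shared.append((chrom, pos))
        else:
            unique.append((chrom, pos))
    return shared, unique


# Toy coordinates (hypothetical) standing in for two genome-wide data sets.
density_labelling = [("chrI", 31000), ("chrI", 70500), ("chrII", 12000)]
chip_on_chip = [("chrI", 33000), ("chrII", 90000)]

shared, unique = split_by_concordance(density_labelling, chip_on_chip)
print("shared:", shared)
print("unique to first set:", unique)
```

The predictions reported as unique to one data set are the discrepancies that, as argued above, might reflect specific properties of individual ORIs.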
A recent attempt to identify DNA sequences capable of predicting ORIs in S. cerevisiae has led to the development of an algorithm based on a 268 bp consensus sequence derived from a training set of 26 previously known ORIs. The top 100 predictions showed 94% accuracy in predicting ORIs previously identified, but reliability decreased to 70% in the top 350 predictions (Breier et al, 2004). These results simultaneously illustrate the increasing power of bioinformatics for identifying regulatory elements and the difficulty, inherent in their degeneracy, of fitting them to a common pattern even in S. cerevisiae.
An alternative approach, which relied on base composition rather than on specific sequences, has been used to predict the localization of ORIs in S. pombe. Despite the lack of consensus elements, all previously identified ORIs in S. pombe colocalized with regions up to 1 kb long with an A+T content significantly higher than the genome average. Base composition analysis allowed the definition of a criterion that localized 384 A+T‐rich islands 0.5–1 kb long across the entire genome, and functional analyses by two‐dimensional gel electrophoresis confirmed that approximately 90% of them colocalized with active ORIs (Segurado et al, 2003) (Figure 1A). The elevated A+T content of these regions makes them excellent targets for the Orc4 subunit of the ORC complex, whose N‐terminus contains nine AT‐hook domains that bind to A+T‐rich DNA with no strict sequence requirement (Chuang and Kelly, 1999). This mechanism seems to be specific to S. pombe, as this domain has not been found in the Orc4 subunit of other species studied so far.
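A base composition scan of this kind can be illustrated with a sliding window. The sketch below is not the criterion used by Segurado et al (2003); the window size, step and A+T threshold are hypothetical values chosen only to show the principle of flagging windows with elevated A+T content and merging overlapping ones into islands.

```python
def at_rich_islands(seq, window=500, step=50, min_at=0.72):
    """Return (start, end) intervals whose sliding windows reach `min_at` A+T.

    Coordinates are 0-based, end-exclusive. All parameter defaults are
    illustrative assumptions, not published thresholds.
    """
    seq = seq.upper()
    islands = []
    # Slide a fixed-size window along the sequence.
    for start in range(0, max(len(seq) - window + 1, 0), step):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= min_at:
            end = start + window
            # Merge with the previous island if the windows touch or overlap.
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands


# Synthetic example: a 600 bp A+T block embedded in G+C sequence.
toy = "GC" * 500 + "AT" * 300 + "GC" * 500
print(at_rich_islands(toy))
```

Because windows are merged, the reported interval can extend somewhat beyond the A+T‐rich block itself; the resolution is set by the window and step sizes.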
In human cells, ChIP analyses with antibodies against Orc1 and Orc2 proteins have been used to identify DNA bound to ORC (Keller et al, 2002; Ladenburger et al, 2002). Over 50% of the immunoprecipitated DNA fragments had properties typical of CpG islands, consistent with the previous finding that DNA replication initiates at these regions in mammals (Delgado et al, 1998). Detailed characterization of two of the isolated fragments identified active ORIs at the CpG island promoters of the TOP1 gene and between the divergently transcribed PRKDC and MCM4 genes (Figure 1B). CpG islands are G+C‐rich, nonmethylated regions about 1 kb long that are associated with more than half of the promoters of all human and mouse genes (Antequera, 2003). They are bound by many transcription factors and their contribution to ORI activity has been studied at the human Lamin B2 ORI. Removal of the CpG island immediately adjacent to the replication initiation region at this locus drastically reduces the activity of this replication origin (Paixao et al, 2004). So far, A+T‐rich and CpG islands are the most reliable sequence‐based predictors for ORIs in S. pombe and mammals, respectively (Figure 1).
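For comparison with the A+T‐based criterion in S. pombe, the sequence properties typical of CpG islands can be sketched as follows. This is a minimal illustration, assuming the widely used convention of scoring G+C content and the ratio of observed to expected CpG dinucleotides; the thresholds (at least 200 bp, G+C of at least 50%, observed/expected CpG of at least 0.6) are common conventions and are not taken from the studies cited here, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence.

    Expected CpG is computed as (number of C * number of G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply illustrative (assumed) thresholds to a candidate fragment."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


# Synthetic fragments: one CpG-dense, one A+T-rich.
print(looks_like_cpg_island("CG" * 150))
print(looks_like_cpg_island("AT" * 150))
```

Sequence composition alone says nothing about methylation status, which is a defining property of CpG islands and has to be determined experimentally.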
How are ORIs distributed relative to genes in eukaryotes? In S. cerevisiae there is no bias towards their localization to intergenic regions containing promoters (Raghuraman et al, 2001; Wyrick et al, 2001) while in S. pombe there is a clear preference for ORIs to map to such regions. Interestingly, ORI activity in S. pombe is not dependent on active transcription (Gómez and Antequera, 1999; Segurado et al, 2003). Several ORIs have also been mapped to promoter regions of the slime mould Physarum polycephalum where, in contrast with S. pombe, the activity of the ORIs associated with the promoters of the profilin A and profilin P genes shows a strict correlation with the developmentally regulated expression of both genes in the amoebae and in the plasmodium, respectively (Maric et al, 2003). Localization of ORIs close to promoters could benefit from enhanced accessibility to DNA mediated by chromatin remodelling complexes or by interaction between transcription factors and ORC. The contribution of transcription factors to ORI specification is emphasized by the observation that transcription factor binding to specific sites in plasmids replicating in Xenopus eggs determines the sites of replication initiation (Danis et al, 2004). In mammals, many ORIs have been found in close proximity to promoters although this bias could be partially due to the search for ORIs near well‐characterized genes (for a review, see Todorovic et al, 1999). Genome‐wide localization of ORIs in metazoa will assess the concordance between the transcription and replication profiles at a higher resolution than possible at present (Schübeler et al, 2002). This will be particularly interesting in the human genome given the unanticipated transcription of a large fraction of the genome and the localization of binding sites for transcription factors in vivo at many sites distant from previously identified promoters (Cawley et al, 2004).
Origin specification and maintenance: the art of being redundant without losing the job
A common feature of eukaryotic ORIs is that not all of them fire in every S phase. This implies that the genome is replicated by a subset of available ORIs and raises the question of why this apparent excess exists and how it is maintained (Bielinsky, 2003). This issue has recently been addressed by studying the effect of forcing S. cerevisiae cells to enter the S phase using a reduced number of ORIs. S. cerevisiae cells devoid of the Cdk inhibitor Sic1 enter the S phase without activating approximately 25% of early ORIs (Lengronne and Schwob, 2002). As a consequence, the length of the S phase was doubled and cells entered mitosis without a fully replicated genome, bypassing the MEC1/RAD53 checkpoint. This premature mitosis resulted in a defective separation of chromatids, double‐strand breaks and gross chromosomal rearrangements (GCRs). On the other hand, orc2‐1 mutants that have reduced levels of Orc2 and a 30% reduction in the number of replication forks are hypersensitive to DNA‐damaging agents, which suggests the existence of a threshold in the number of active replication forks below which the activation of the intra‐S checkpoint is compromised (Shimada et al, 2002). Using a different approach, a high rate of plasmid loss and GCRs was detected in cells with reduced levels of prereplicative complex. This phenotype could be alleviated by increasing the number of ORIs in the plasmid or in the genomic region near the markers used to detect GCRs (Tanaka and Diffley, 2002). Similarly, S. cerevisiae orc5‐1 mutants show genetic instability that can be reduced by increasing the number of ORIs in the YAC used for the assay. This same study showed that, surprisingly, an increase in the number of ORIs reduced the stability of the YAC in orc5‐70 and orc3‐70 mutants (Huang and Koshland, 2003). Together, these studies suggest that there are lower and upper limits of ORI density beyond which genome stability becomes seriously compromised.
It is possible that below a critical level, the distance between ORIs could be too large for stalled forks or lesions occurring during the S phase to be rescued by forks coming from nearby ORIs. By contrast, an excessive number of ORIs could increase the generation of labile structures at ORIs, as suggested by the high level of mitotic recombination associated with their activity (Benard et al, 2001; Segurado et al, 2002; Lopes et al, 2003), or might generate a number of forks above the threshold allowed by the intra‐S checkpoint (Shimada et al, 2002). In this regard, it is interesting that the fusion of adjacent replicons is counterbalanced by the activation of new ORIs during replication in Xenopus egg extracts, such that the number of forks is kept constant along most of the S phase (Hyrien et al, 2003).
A key point in this context is how ORIs are maintained in the genome during evolution, taking into account that the deletion of individual ORIs does not detectably affect either replication or chromosome stability. Addressing this question probably requires an understanding of how ORIs are specified in the genome. Understanding ORI specification is difficult because of the high degree of ORI degeneracy and the lack of a significant preference of ORC to bind ORI relative to non‐ORI sequences as shown in Drosophila (Remus et al, 2004), Xenopus (Danis et al, 2004) and mammals (Vashee et al, 2003; Schaarschmidt et al, 2004). ORI specification has been addressed in Xenopus, where replication initiates without sequence specificity in egg extracts and at very early developmental stages. In the case of the rDNA locus, initiation becomes restricted to the intergenic spacers after the mid‐blastula transition, coinciding with the onset of zygotic transcription of the rRNA genes (Hyrien et al, 1995). Another example of restriction in the potential to initiate replication is provided by analysis of hamster nuclei undergoing replication of their DNA in Xenopus egg extracts. These experiments have uncovered two stages in G1 called the timing decision point (TDP), where early and late replication domains are established, and the origin decision point (ODP), which selects only a fraction of the sites previously licensed in late telophase to be used in the next S phase (Okuno et al, 2001; Li et al, 2003). These observations suggest that ORI specification could be achieved by a progressive restriction of the potential to initiate replication from too many or undesired sites during development or during the cell cycle. 
The molecular basis of the reduction in the number of ORIs is unknown, but it is likely that chromatin remodelling associated with the transcriptional activation of zygotic genes in Xenopus or with the G1/S transition in mammalian cells would restrict initiation to only one subset of all the initially licensed sites. According to this scenario, ORIs could take advantage of (or parasitize) regions that are maintained in an accessible conformation for structural reasons or to facilitate transcription, as suggested by the preference of ORIs to map near promoters in many cases. This opportunistic specification would remove the selective pressure to maintain each single ORI in the genome for its individual contribution to replication, implying that their apparent excess would be an inevitable consequence of the availability of more potential initiation sites than the minimum required to replicate the genome. Chromatin accessibility, however, is unlikely to be the only requirement for ORI specification, as several specific sequences ranging in size from 1 to 6 kb have been described that are capable of maintaining their activity at ectopic positions in the genome. These replicators encompass the replication initiation sites and include several essential modules that are not conserved between different ORIs (Liu et al, 2003; Aladjem and Fanning, 2004; Altman and Fanning, 2004; Paixao et al, 2004). As discussed in the following section, if ORIs are established at favourable chromatin regions, perhaps in combination with a preference for some degenerate sequences, their localization and activity might be expected to be influenced by many parameters and, therefore, to vary in different cell types and physiological conditions.
Epigenetic regulation of replication origins
Chromatin organization depends on epigenetic information encoded in postsynthetic modifications of histones and of DNA itself rather than on particular nucleotide sequences. In mammals, DNA methylation takes place at position 5 of the pyrimidine ring of approximately 4% of all cytosines and is mainly located in CpG dinucleotides. Methylated CpGs bind a family of methylated DNA binding proteins (MDBs) that, in general, contribute to transcriptional silencing through interaction with histone deacetylases and transcriptional corepressors (Hendrich and Tweedie, 2003). The effect of DNA methylation on ORI activity has recently been addressed in mouse and human inactive X chromosomes, where most CpG islands are methylated and transcriptionally silent as opposed to their nonmethylated and expressed status in the active homologues. The results indicated that replication initiation at CpG islands was comparable in both alleles (Cohen et al, 2003) but ORIs at active nonmethylated CpG islands replicated earlier than their inactive methylated counterparts during the S phase (Figure 2A) (Gómez and Brockdorff, 2004). These two studies indicate that CpG island methylation does not prevent ORI activity and, given that transcription is completely abolished upon CpG island methylation, raise the question of how ORC manages to assemble on methylated CpG islands. It has been shown by plasmid replication assays in Xenopus egg extracts that ORC—but not MCM—binding is affected by DNA methylation and that initiation coincides with the sites of MCM binding and is not restricted to regions where ORC is located (Harvey and Newport, 2003). It will be interesting to determine whether ChIP analysis with anti‐ORC antibodies is able to detect methylated CpG islands and whether the correlation between late activation and DNA methylation also applies to other ORIs at aberrantly methylated CpG islands in the autosomes that are often found in tumour cells.
In contrast to DNA, histones enjoy a much richer repertoire of modifications that include methylation, acetylation, phosphorylation and ubiquitination. Histone modification is widespread in yeast and its role in replication has been tackled by deleting the RPD3 histone deacetylase gene in S. cerevisiae. Monitoring the replication time across eight selected genomic regions containing ORIs revealed that in all cases, hyperacetylation advanced their activation time during the S phase and, remarkably, the relative advance correlated with the specific increase in the acetylation level of each ORI (Vogelauer et al, 2002). Another study, however, indicates that RPD3 deletion advances the activation of late ORIs but does not affect the timing of the early ORIs (Aparicio et al, 2004) (Figure 2B). The effect of histone acetylation on the activity of specific ORIs has been studied in S. cerevisiae by targeting the Gcn5 histone acetyltransferase close to the late ORI ARS1412. This resulted in a higher level of local histone acetylation and in a shift towards early activation (Vogelauer et al, 2002). In agreement with these observations, deletion of the histone deacetylase gene SIR2 in S. cerevisiae leads to a higher frequency of ORI firing in the rDNA locus (Pasero et al, 2002) (Figure 2C). Similarly, the activity of the ORI that controls developmental amplification of the chorion genes was increased after targeting the Drosophila histone acetyltransferase encoded by the chameau gene to its vicinity (Aggarwal and Calvi, 2004).
Epigenetic modifications have enormous combinatorial possibilities as shown by a recent genome‐wide analysis of the acetylation profile of 11 lysines in the four core histones of S. cerevisiae. This study has uncovered the existence of over 50 different groups of intergenic regions and genes that are coexpressed and participate in related physiological processes (Kurdistani et al, 2004). A similar scenario might allow ORIs to respond to a wide range of signals to accommodate the replication patterns to different situations.
The future: towards ORI diversity
DNA replication was a global issue long before the current globalization furore and is therefore particularly well suited for genome‐wide analyses. The modular organization and degeneracy of ORIs both in yeasts and mammals, together with a role for epigenetic modifications in their specification and regulation, suggest that ORIs could turn out to be as diverse as promoters. Thus, it would not be surprising to find housekeeping ORIs and cell type‐specific or developmental stage‐specific ORIs perhaps associated with the transcriptional profile or the specific physiology of the cells where they are active. For example, a recent study has shown that six ORIs across a 130 kb long region in the hamster AMPD2 locus have different patterns of activation in different cell lines. These patterns can be modified by addition of nucleotide precursors to the growth medium or by depleting them with hydroxyurea, indicating that nucleotide pools determine origin choice and their efficiency of activation (Anglana et al, 2003) (Figure 2D).
Another fascinating issue to be explored in the immediate future is the possibility that ORC or the passage of replication forks could regulate processes not directly related to the duplication of DNA. For example, human Orc2 and Orc6 and Drosophila Orc6 proteins localize to different subcellular regions including ORIs, centrosomes, centromeres and heterochromatin at different stages of the cell cycle (Prasanth et al, 2002; 2004; Chesnokov et al, 2003). The sequential distribution of these proteins suggests a role in coordinating replication and chromosome segregation with cytokinesis. Further connections between ORC and heterochromatin have also been reported in S. cerevisiae and Drosophila (for a review, see Leatherwood and Vas, 2003). A possible regulatory role for replication is illustrated by the requirement of a round of DNA replication to activate the expression of the HoxB locus in mouse P19 cells (Fisher and Mechali, 2003). DNA replication also regulates the switching of the mating type in S. pombe, which depends on a strand‐specific imprint established by the passage of the replication fork across the mat1 locus (Dalgaard and Klar, 2001). It is conceivable that the intrinsic differences between the replication of the leading and lagging strands of DNA could have been exploited also by other mechanisms to establish differences between mother and daughter cells after mitosis. Another consequence of this asymmetry is the strand‐specific rate of mutations due to the lower repair efficiency associated with leading strand synthesis in S. cerevisiae (Pavlov et al, 2003). This bias is maintained across several kilobases and will inevitably affect the sequence of genes flanking ORIs. 
The integration of the transcription and replication profiles and their comparison across related species will reveal in the near future to what extent ORIs could be strategically positioned in the chromosomes and how transcription and replication have contributed during evolution to the shaping and organization of the eukaryotic genome.
I am very grateful to Anja K Bielinsky, David Gilbert, Maria Gomez, Joel Huberman, MK Raghuraman, Etienne Schwob and three anonymous reviewers for advice and excellent criticism of the manuscript.
Copyright © 2004 European Molecular Biology Organization