Abstract
Using data from primates, we show that molecular clocks in sites that have been part of a CpG dinucleotide in the recent past (CpG sites) and non-CpG sites are of markedly different nature, reflecting differences in their molecular origins. Notably, single nucleotide substitutions at non-CpG sites show clear generation-time dependency, indicating that most of these substitutions occur by errors during DNA replication. On the other hand, substitutions at CpG sites occur relatively constantly over time, as expected from their primary origin due to methylation. Therefore, molecular clocks are heterogeneous even within a genome. Furthermore, we propose that varying frequencies of CpG dinucleotides in different genomic regions may have contributed significantly to conflicting earlier results on rate constancy of the mammalian molecular clock. Our conclusion that different regions of genomes follow different molecular clocks should be considered when inferring divergence times using molecular data and in phylogenetic analysis.
Synopsis
The rate at which mutations accumulate in a genome, referred to as a “molecular clock,” is an instrumental tool in molecular evolution and phylogenetics. Different types of mutations occur via distinctive molecular pathways. In particular, while most mutations occur from errors in DNA replication, spontaneous deamination of methylated CpG dinucleotides is another important source of mutation in mammalian genomes. Molecular clock studies have typically combined all types of mutations together. In this paper, the authors analyze molecular clocks of replication-origin and methylation-origin mutations separately. By utilizing high-quality sequence data from several primate species and fossil calibration, the authors demonstrate that the two types of mutations follow statistically different molecular clocks. Methylation-origin mutations accumulate relatively constantly over time, while replication-origin mutations scale with generation times. Therefore, the genomic molecular clock, as a whole, is shaped by the molecular origins of mutations that have accumulated over time. The authors' results have direct implications for phylogenetic analyses, estimation of species divergence dates, and studies of the mechanisms and processes of evolution, where molecular clocks are imperative.
Introduction
Organisms with longer generation times tend to exhibit slower molecular clocks than those with shorter generation times, an effect known as the “generation-time effect” [1–5]. However, the extent (or even the existence) of the generation-time effect is a matter of significant debate [3,6,7]. An opposing theory posits that molecular evolution occurs relatively constantly over time: in other words, molecular clocks are time dependent [6,8]. Here we show that molecular evolution follows both generation-time–dependent and time-dependent molecular clocks, depending upon the molecular origins of the mutations considered.
A generation-time–dependent molecular clock implies that the majority of single nucleotide substitutions in germlines arise from errors during DNA replication [3,9]. However, some mutations may occur independently from DNA replication. This is especially pertinent for transitions at CpG dinucleotides (henceforth, CpG substitutions). CpG substitutions are the most frequent single nucleotide substitutions in vertebrate genomes, accounting for more than a quarter of all substitutions between the genomes of human and chimpanzee [10,11]. Naturally, they play critical roles in several key genetic mechanisms and disease [12–16].
CpG dinucleotides are hypermutable because the cytosines in CpG dinucleotides are targets of DNA methylation in vertebrate genomes [17]. Methylated cytosine rapidly mutates to thymine via spontaneous deamination, causing a C to T (G to A in the complementary strand) transition [17,18]. While DNA replication occurs in a specialized stage of the cell cycle, methylation is not confined to replicating DNA: germline cells are methylated early in their development and stay methylated until global demethylation occurs after fertilization [19,20]. Therefore, methylation-origin mutations will accumulate at a rate proportional to the total amount of time germ cells are methylated between generations. In other words, the molecular clock at CpG dinucleotides should be relatively constant over time.
Indeed, statistical inferences using approximately 2 Mbp of sequence data have suggested that CpG substitutions follow a relatively constant molecular clock in mammals [21]. In addition, a recent analysis of male mutation bias in humans and chimpanzees has shown that CpG dinucleotides exhibit much lower male mutation bias than other sites [22]. Since male mutation bias is caused by the more frequent DNA replications in male germlines compared to female germlines [14], the finding that there is lower male mutation bias in CpG dinucleotides is consistent with the idea that CpG substitutions follow a relatively time-dependent molecular clock.
In this paper, we sought to directly compare genomic molecular clocks of CpG dinucleotides and other sites. To achieve this goal, we focused on catarrhines, specifically two hominoid species (human and chimpanzee) and two Old World monkeys (rhesus macaque and baboon). These four species were chosen because they satisfy two criteria. First, because these species are closely related, we can identify sites that have been part of a CpG dinucleotide in the recent past (CpG sites) and other sites with high confidence [23]. Second, hominoids and Old World monkeys have markedly different generation times. According to Gage [24], the average generation time in Old World monkeys is 11.4 years, while in chimpanzees and humans it is 22 and 28 years, respectively. As a consequence of the difference in generation times, evolutionary rates of replication-dependent substitutions are slower in hominoids than in Old World monkeys [2,4,25].
Utilizing genomic data from these species, we demonstrate that indeed CpG substitutions exhibit a relatively time-dependent molecular clock, in contrast to generation-time–dependent genomic molecular clock. Furthermore, we propose that heterogeneous molecular clocks among different genomic regions may have contributed to conflicting earlier results on the degree of generation-time effect in mammals.
Results/Discussion
Slower Molecular Evolution of Hominoid Genomes than Old World Monkey Genomes
We first reevaluated the difference in evolutionary rates between hominoids and Old World monkeys. We analyzed approximately 28 Mbp of genomic sequence alignments to compare rates in human (a hominoid) and baboon (an Old World monkey) using a relative rate test [4,26]. Sequence data from marmoset (a New World monkey) were used as an outgroup. We found that rates in humans are on average 28.4% slower than those in baboons in introns and intergenic regions (Table 1, p < 0.001), confirming earlier results [2,4,27]. Because the data used in this analysis account for approximately 1% of the human genome and come from several different chromosomes, we can conclude that the canonical genomic molecular clocks in primates exhibit a significant generation-time effect.
Table 1.
We also constructed a five-species phylogeny of human, chimpanzee, baboon, macaque, and marmoset using data for 1.9 Mbp of sequences orthologous to human Chromosome 7 (hg17.chr7:115404472–117281897; ENCODE region ENm001). High-quality sequence data are available for all five species analyzed in this study. Figure 1 shows a Neighbor-Joining tree [28] of the five species. Focusing on the ancestral hominoid and ancestral Old World monkey branches, the ratio of the number of substitutions in the Old World monkey branch to the hominoid branch is approximately 1.36, similar to the values estimated from the comparison between the human and baboon genomes. These results confirm the “hominoid rate slowdown” theory proposed more than 40 years ago [9,25].
Our next goal was to compare the molecular clocks at CpG and non-CpG sites separately. However, because of the difficulty in correcting for multiple hits, we cannot easily analyze substitutions at CpG sites in this phylogenetic setting. Therefore, we proceeded to use data only in catarrhines, where we can accurately infer rates in CpG and non-CpG sites [12,22,23].
Different Molecular Clocks of CpG Sites and Non-CpG Sites
We constructed four-species alignments of two hominoids (human and chimpanzee) and two Old World monkeys (rhesus macaque and baboon) (Figure 2). These species pairs provide a unique opportunity to study time-dependent and generation-time–dependent clocks. Critical to our work, the divergence time between the hominoid pair is similar to that of the Old World monkey pair [27,29,30]. The split between human and chimpanzee is estimated to be 6 to 8 million years ago (Mya), based upon fossil records. In particular, the earliest fossil hominin, Sahelanthropus tchadensis, has been dated to late Miocene, at least 7 Mya [30,31]. The split between rhesus macaque and baboon is calibrated by using an estimate for the split between macaques and papionins. The earliest fossil evidence of papionins is dated to be 6 to 8 Mya [27,29]. Therefore, divergence times of the two species within each pair are similar. In other words, TO/TH ≈ 1 (Figure 2). In contrast to this similarity of within-pair divergence times, evolutionary rates are known to differ between these two groups: as explained in the introduction and demonstrated above, genomic evolutionary rates in hominoids are slower than rates in Old World monkeys.
We have two contrasting predictions for a time-dependent versus a generation-time–dependent molecular clock. For replication-origin (hence, generation-time–dependent) mutations, the pairwise sequence divergence in the Old World monkey pair (KO = KMY + KBY in Figure 2) should be greater than the pairwise sequence divergence in the hominoid pair (KH = KHX + KCX in Figure 2). On the other hand, a time-dependent molecular clock predicts that KO is similar to KH.
We examined the molecular clocks in CpG and non-CpG sites separately (see Materials and Methods). To directly compare mutations caused by deamination of methylated cytosines to other transitions occurring during replication, we first analyzed only C-to-T (and G-to-A) transitions. A distinctive pattern emerged: KO/KH is 1.03 in CpG sites (95% confidence interval [CI], 0.92 to 1.15), while it is 1.31 in non-CpG sites (95% CI, 1.25 to 1.37). These two types of sites clearly harbor different molecular clocks. Similar trends were discovered when introns and intergenic regions are considered separately, or when repetitive and nonrepetitive sequences are compared separately (Figure 3).
We then considered all single nucleotide substitutions that occurred in CpG and non-CpG sites and found the same pattern. The ratio KO/KH in non-CpG sites is 1.18 (95% CI, 1.15 to 1.22). In comparison, in CpG sites, KO/KH is 1.00 (95% CI, 0.89 to 1.11). Again, the results are similar when introns and intergenic regions are considered separately, or when repetitive and nonrepetitive sequences are compared separately.
Because human-chimpanzee (hominoid pair) and rhesus macaque-baboon (Old World monkey pair) are extremely closely related, estimates of pairwise sequence divergence are affected by common ancestral polymorphism [32–34]. The common ancestor of the human and chimpanzee is thought to have much larger effective population size than the current human population [35,36]. Rhesus macaque and baboon also harbor comparable levels of genetic diversity to hominoids. For example, Rogers and Kidd [37] reported the nucleotide diversity of Papio hamadryas to be approximately 0.3%. Wall et al. [38] estimated a nucleotide diversity of 0.13% in a noncoding region of rhesus macaques.
Such substantial ancestral polymorphism will effectively reduce the observed rate difference between hominoid and Old World monkey pair: the observed pairwise divergence between rhesus macaque and baboon (KO) is the sum of ancestral diversity (πY, see Figure 2) and the fixed difference between rhesus macaque and baboon (denoted as PO). Likewise, the pairwise divergence between human and chimpanzee, KH = πX + PH. We are interested in the ratio PO/PH while we only have access to KO/KH. When comparing distantly related species, the level of ancestral diversity is negligible relative to the fixed difference. However, between closely related species such as human-chimpanzee and macaque-baboon, ancestral diversity is substantial compared to the fixed difference. For example, πX can be as much as ½ PH [35]. Therefore, KO/KH will underestimate PO/PH.
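The direction of this bias can be illustrated with a short sketch. It assumes only the decomposition stated above (K = π + P for each pair), so the fixed differences are recovered by subtraction. The function name and the numerical values are hypothetical and chosen purely for illustration; they are not estimates from this study.

```python
def corrected_ratio(k_o, k_h, pi_y, pi_x):
    """Return P_O / P_H from observed divergences and ancestral diversities.

    Assumes K_O = pi_Y + P_O and K_H = pi_X + P_H, as stated in the text.
    All four inputs must use the same units (for example, percent divergence).
    """
    p_o = k_o - pi_y  # fixed differences in the Old World monkey pair
    p_h = k_h - pi_x  # fixed differences in the hominoid pair
    if p_o <= 0 or p_h <= 0:
        raise ValueError("ancestral diversity must be smaller than observed divergence")
    return p_o / p_h


# Hypothetical values in percent (illustration only, not data from the paper).
raw = 1.44 / 1.20                                   # observed K_O / K_H
corrected = corrected_ratio(1.44, 1.20, 0.40, 0.40)  # P_O / P_H
print(f"raw K_O/K_H = {raw:.2f}, corrected P_O/P_H = {corrected:.2f}")
```

When the same ancestral diversity is subtracted from both pairs, any observed ratio above 1 becomes larger, which is why the uncorrected KO/KH understates PO/PH.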
To address this concern, we used the estimates obtained for CpG and non-CpG sites in hominoids [22] to correct for the effect of ancestral polymorphism. After such corrections, KO/KH for non-CpG sites is 1.18 to 1.26 (Table 2). In contrast, in CpG sites, KO/KH is close to 1.00 even after correcting for the effect of ancestral polymorphism using estimates for CpG sites (Table 2). However, these values should be taken with caution, given the uncertainties associated with ancestral diversity as well as with divergence time estimated from fossil records.
Table 2.
For completeness, we also analyzed the rate difference for CpG and non-CpG sites using the above three-species alignment (human, baboon, and marmoset). Even though this comparison is less reliable due to the difficulty in correcting for multiple hits (see above), we obtained similar results. We observe that the non-CpG sites (the majority of sites) show substantial rate difference between the human and the baboon genomes. In contrast, CpG sites show little difference in evolutionary rates between hominoid and Old World monkeys (Table 1).
In summary, CpG and non-CpG sites show statistically different molecular clocks in various phylogenetic comparisons, indicating that the difference between the two types of molecular clocks is a salient feature of molecular evolution in primate genomes.
Factors that May Affect KO/KH for CpG and Non-CpG Sites
Here we review some of the potential factors that can affect our conclusions. An important assumption in our work is that the divergence time between the hominoid pair is similar to that of the Old World monkey pair. This was mainly based upon fossil records [27,29,30]. However, because fossil records are inherently associated with large variance in dates, let us consider the inference from molecular data.
If we measure the divergence between the Old World monkey pair to that between the hominoid pair in the five species phylogeny shown in Figure 1 (equivalent to KO/KH in Figure 2), it is 1.2. This is different from the ratio obtained from the comparison of the ancestral Old World monkey branch to the hominoid branch, which was 1.36. The discrepancy between these two estimates can be explained by at least two mechanisms, which are not mutually exclusive of each other.
First, as mentioned earlier, estimating evolutionary rates between closely related species, such as human-chimpanzee and macaque-baboon, is significantly affected by ancestral polymorphism [32–34]. If we use estimates of the ancestral polymorphism in hominoids [35,36] to correct for the effect of ancestral polymorphism, the ratio of KO/KH increases, close to the value estimated from the ancestral branch. For example, if we assume that the average nucleotide diversities of the ancestral Old World monkey and hominoid populations were 0.4%, the corrected ratio of KO/KH increases to 1.32.
The second possibility is that the actual time in the Old World monkey pair (TO) is slightly shorter than the time in the hominoid pair (TH). Because fossil records provide only the “minimum” divergence time between lineages, the actual divergence time can differ significantly, and the divergence of human and chimpanzee may have occurred before the divergence of macaque and baboon. Therefore, KO/KH will underestimate the true rate difference. According to this possibility, the CpG clock in our data also underestimates the actual rate difference, indicating that some fraction of CpG substitutions follows a generation-time–dependent molecular clock. We believe that this scenario at least partially explains the observed discrepancy, because some substitutions at CpG sites occur during replication. This interpretation is also in accord with the weak but still significant male mutation bias in hominoids [22].
Our study uncovered significant heterogeneity in the degree of generation-time effect among different types of single nucleotide substitutions. In particular, when substitutions are divided into transitions and transversions, the latter exhibited less generation-time effect than transitions. In fact, in CpG sites, there were more transversions in the human-chimpanzee pair than in the baboon-macaque pair (58 versus 39). However, the numbers are rather small (since most substitutions at CpG sites are transitions due to methylation), so it is not clear whether this reflects a true underlying pattern. In non-CpG sites, the ratio KO/KH estimated from transitions was 1.31, while the ratio from transversions was 1.14 (the overall ratio was 1.18). Whether this discrepancy reflects differences in molecular mechanisms between transitions and transversions is an interesting question and should be pursued further.
Effect of CpG Dinucleotides on Hominoid Rate Slowdown and Mammalian Molecular Clock
Our findings shed important light on the controversy over mammalian molecular clock. Generation-time effect was clearly demonstrated when closely related species were compared or when noncoding sequences were used [21,27]. However, among relatively distant mammalian species, weak generation-time effect was observed [6,26]. Note that due to sequence availability and alignability, synonymous sites were often used when comparing distantly related species.
We propose that varying proportions of CpG dinucleotides in different data sources can contribute to conflicting conclusions on the nature of genomic molecular clocks. Three observations led to this hypothesis. First, the CpG molecular clock runs much faster than clocks at other sites, at least in primates. Assuming that human and chimpanzee diverged 7 Mya [30], we estimate that CpG sites and non-CpG sites undergo single nucleotide substitutions at a rate of 1.03 × 10^−8 per site per year and 0.68 × 10^−9 per site per year, respectively, from our data. Second, molecular clocks at CpG sites are relatively constant over time. Third, the proportion of CpG dinucleotides is heterogeneous among different genomic regions [39]. In particular, 4-fold degenerate sites are enriched with CpG sites, over 10% [39], while noncoding regions have less than 3% CpG dinucleotides [22,39]. Hence, molecular clocks in regions with relatively abundant CpG sites (such as 4-fold degenerate sites) may be dominated by the rapid and time-dependent CpG clock, while regions relatively devoid of CpG sites (such as noncoding regions) follow a generation-time–dependent molecular clock.
To investigate this prediction, we compared results from different studies in Table 3, focusing on two comparisons: between hominoids and Old World monkeys (hominoid rate slowdown), and between primates and rodents. Note that earlier studies of the molecular clock did not consider CpG content as a determinant of the clock and therefore did not investigate its effect. Because some studies used noncoding regions while others used 4-fold degenerate sites, different studies analyzed data that differ in CpG content (Table 3). We did not include the results from [6] in this table, because they removed a substantial amount of data that did not pass the “homogeneity test,” and the relationship between this test and CpG dinucleotide content is not clear. For example, they discarded 46% of the data in their human-mouse comparison [6].
Table 3.
We can now compare how the data in Table 3 fit our hypothesis. First, when we compare results from all sites, the rate difference between lineages is greater in noncoding regions than in 4-fold degenerate sites. Moreover, in noncoding regions, the rate difference for CpG sites is lower than for all sites or non-CpG sites. Similarly, in 4-fold degenerate sites, the rate difference in non-CpG sites is higher than in all sites. These trends support our hypothesis.
Since we have reasonable estimates of CpG and non-CpG rates in primates (see above), we can investigate how well our hypothesis fits the data in detail. The number of substitutions in hominoids since the split from Old World monkeys can be approximated as
$$\left[\,p\,k_{\mathrm{CpG}} + (1 - p)\,k_{\mathrm{non\text{-}CpG}}\,\right] T,$$
where p is the proportion of CpG sites, kCpG and knon-CpG represent substitution rates per site per year in CpG sites and non-CpG sites, respectively, and T is the time since the split. The observed ratio of Old World monkey branch to hominoid branch can then be expressed as
$$\frac{p\,k_{\mathrm{CpG}} + r\,(1 - p)\,k_{\mathrm{non\text{-}CpG}}}{p\,k_{\mathrm{CpG}} + (1 - p)\,k_{\mathrm{non\text{-}CpG}}},$$
where r represents the ratio of the branch lengths determined by the generation-time–dependent molecular clock. Figure 4 shows this ratio as a function of p, using the rates inferred from our data. When r = 1.4, the observed ratios from regions with 12% and 2.5% CpG dinucleotides (analogous to 4-fold degenerate sites and intergenic regions) are 1.12 and 1.29, respectively.
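The dependence of this ratio on p can be explored numerically with a minimal sketch. It assumes a two-class mixture in which CpG substitutions accumulate equally in both lineages while non-CpG substitutions are scaled by r in Old World monkeys; this functional form is our reading of the definitions of p, kCpG, knon-CpG, and r above, and the function name is hypothetical. The default rates are the rounded per-year values quoted earlier in the text.

```python
def observed_ratio(p, r, k_cpg=1.03e-8, k_non_cpg=0.68e-9):
    """Old World monkey / hominoid branch-length ratio for a genomic region.

    p         : proportion of CpG sites in the region
    r         : branch-length ratio under a purely generation-time-dependent clock
    k_cpg     : substitution rate at CpG sites (per site per year)
    k_non_cpg : substitution rate at non-CpG sites (per site per year)

    Assumed form: the CpG class is time dependent (same in both lineages),
    the non-CpG class is scaled by r. The elapsed time T cancels in the ratio.
    """
    hominoid = p * k_cpg + (1 - p) * k_non_cpg
    old_world_monkey = p * k_cpg + (1 - p) * r * k_non_cpg
    return old_world_monkey / hominoid


for p in (0.0, 0.025, 0.12, 1.0):
    print(f"p = {p:5.3f}  ratio = {observed_ratio(p, r=1.4):.3f}")
```

With these rounded rates and r = 1.4, the sketch gives about 1.29 for p = 2.5% and about 1.13 for p = 12%, close to the values quoted in the text; small differences are expected if unrounded rates were used for Figure 4. The ratio falls from r at p = 0 toward 1 as p approaches 1, because the fast CpG class contributes disproportionately to the total.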
We compared these theoretical expectations to observed values by analyzing rates between hominoids and Old World monkeys in 4-fold degenerate sites, from 41 autosomal genes (Table S2). The proportion of 4-fold degenerate sites that belong to CpG dinucleotides in any of the three species compared in this dataset is 11.0%. This is likely an underestimate of the true proportion of sites that have been part of a CpG dinucleotide, since the divergence time between the three species is rather long. The ratio of the Old World monkey branch to the hominoid branch was 1.09 when all sites were used (Table 3). When we removed CpG-prone sites (sites preceded by C or followed by G, as used in [12,23,40]) from the 4-fold degenerate sites, this ratio increased to 1.27 (Table 3). Recall that when only noncoding sites were used, this ratio was 1.28 (Table 1), which increased to 1.31 when we removed CpG sites. The proportion of sites that belong to CpG dinucleotides in noncoding sites in our data is 2.5%. Therefore, these values are in excellent accord with the above-mentioned model.
It should be noted, however, that the above model ignores other factors that affect regional mutation rate variation, such as GC content and recombination [4,41]. Also, as discussed above, different mutations (such as transitions and transversions) may have different substitution rates between lineages. Hence, partitioning rates into only two categories is likely to be a simplification. Furthermore, identifying sites that have been part of a CpG dinucleotide in the past is a challenging problem [42,43]. Lineage-specific rates are also affected by ancestral generation times and effective population sizes. Further studies are necessary to determine the roles of generation-time–dependent and time-dependent molecular clocks on genome evolution.
Nevertheless, it is clear that the heterogeneity of molecular clocks due to different mutational origins can significantly alter rate differences between taxa. This effect should be taken into account when molecular clocks are used to infer divergence times and to reconstruct phylogenetic history.
Materials and Methods
Noncoding data mining and assembly.
Because accurate identification of CpG sites is critical in our analyses, we took two precautions. First, we analyzed sequences between closely related primates only. Earlier studies have shown that within catarrhines (hominoids and Old World monkeys), we can directly derive rates of CpG substitutions using comparative methods. Specifically, we can confidently determine “CpG sites” (sites for which the ancestral state was part of a CpG) and extract rates of CpG substitutions using parsimony [12,22,23]. Moreover, we can also identify sites that have not been a part of CpG dinucleotides (non-CpG sites), to be used as a control for replication-origin substitutions [12,22,23]. Second, we only used high-quality sequence data, because data obtained from whole genome assemblies include errors in sequencing and assembly that can cause erroneous conclusions regarding rate difference between lineages [34,44].
For the human-baboon-marmoset dataset, we obtained approximately 28 Mbp of high-quality data (BAC-based) from the ENCODE project [45].
For the human-chimpanzee-baboon-macaque (HCBM) dataset, we mined high-quality BAC-based sequences from GenBank. The HCBM dataset consists of BAC-based sequence data orthologous to human Chromosome 7 (hg17.chr7:115404472–117281897; ENCODE region ENm001). This alignment was obtained by aligning NT_086357.2 (human) [46], NT_165329.1 (chimpanzee), NT_086378.3 (baboon), and NT_165339.1 (macaque) sequences.
We assembled additional orthologous alignments among the four species using the following procedure. First, we searched the GenBank database for sequences from baboon (Papio anubis or P. hamadryas), macaque (Macaca mulatta), and chimpanzee (Pan troglodytes) BAC clones. We obtained sequence data for 377, 276, and 1,641 BACs from baboon, macaque, and chimpanzee, respectively. Next, we identified orthologous BAC clones among these species, using BLAST [47] and other methods as in [48]. We found 35 baboon BAC clones that had both macaque and chimpanzee orthologs. We then localized the orthologous human region for each of these 35 orthologous clones using BLAT [49]. We reconfirmed the orthology between baboon, chimpanzee, and macaque BAC clones by ensuring that the regions where these BAC clones independently map to the human genome overlap with each other. We then removed the BAC clones overlapping with ENm001. Finally, we removed sequences from sex chromosomes. As a result, we obtained 16 genomic regions, shown in Table S1.
Analysis of 4-fold degenerate sites.
For the primate comparison, all sequence data for 4-fold degenerate sites were downloaded from GenBank [50]. Accession numbers for all genes used in the primate comparison are available in Table S2. A portion of the homologous genes in this dataset was also identified via the HOVERGEN database [51]. Sequence data for the human-mouse-dog comparison were downloaded from the Ensembl database [52]. Any genes that underwent recent gene duplications or did not meet the stringent minimum length of 445 nucleotides were removed from the dataset. Sequences were aligned using CLUSTALW [53] via a BioPerl package [54]. After alignment of homologous genes, any genes containing lineages with a negative K4 value were removed from the dataset.
For the primate-rodent comparison, known genes from human, mouse, and dog were downloaded from Ensembl [52]. To find orthologous sequences, we used the OrthoMCL algorithm [55], which uses all-to-all BLASTP results to generate a graph of orthologs and paralogs. We used default parameters except for E-value < 10^−10 to ensure orthology. As a result, we constructed 3,494 orthologous gene trios among the three species. The next steps were performed as in the primate comparison described above.
Sequence curation, data annotation, and statistical analyses.
CpG islands were identified using the algorithm by Takai and Jones [56] with the following conditions: GC content greater than 55%, observed/expected CpG contents greater than 0.65, length 200 or greater. Since the majority of CpG islands are hypomethylated and do not reflect substitutions of methylation origin, we removed them from further analysis.
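The three conditions listed above can be sketched as a check on a single sequence window. This is only an illustration of the thresholds, not the Takai and Jones algorithm itself, which also involves window scanning and merging that are not reproduced here. The function name is hypothetical, and the observed/expected ratio is assumed to follow the commonly used definition (number of CpG × window length) / (number of C × number of G).

```python
def passes_cpg_island_criteria(window, min_gc=0.55, min_obs_exp=0.65, min_length=200):
    """Check one sequence window against the three thresholds quoted in the text.

    Assumption: obs/exp CpG = (count of 'CG' * length) / (count of C * count of G).
    Scanning a longer sequence and merging windows are deliberately left out.
    """
    seq = window.upper()
    n = len(seq)
    if n < min_length:
        return False
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return False  # obs/exp is undefined without both C and G
    gc_content = (c_count + g_count) / n
    cpg_count = seq.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    obs_exp = cpg_count * n / (c_count * g_count)
    return gc_content > min_gc and obs_exp > min_obs_exp


# Synthetic windows (not real genomic sequence):
print(passes_cpg_island_criteria("CG" * 100))        # CpG-rich window
print(passes_cpg_island_criteria("CCCCGGGG" * 25))   # GC-rich but CpG-depleted window
```

Windows that pass such a check would be masked before rates at CpG and non-CpG sites are estimated, so that hypomethylated regions do not dilute the methylation-origin signal.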
Repetitive elements were annotated using the RepeatMasker program [57]. Noncoding regions were identified as in Elango et al. [34].
The two-parameter model [58] was used to correct for multiple hits. We used a relative rate test [4,26] to test for rate difference between hominoid and Old World monkeys using New World monkey species as outgroup (Table S2). To compare rate difference between human and mouse, we used dog as an outgroup.
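As an illustration of these two steps, the sketch below computes a distance under Kimura's two-parameter correction and then the simple difference of ingroup-to-outgroup distances that underlies a relative rate comparison. It assumes pre-aligned sequences of equal length; the function names are hypothetical, and the variance estimate and significance test used in [4,26] are not reproduced.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def relative_rate_difference(seq_a, seq_b, outgroup):
    """K(A, outgroup) - K(B, outgroup); positive values suggest A evolved faster."""
    return k2p_distance(seq_a, outgroup) - k2p_distance(seq_b, outgroup)


# Toy sequences (not real data): one transition separates the first two.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Because both ingroup lineages share the branch leading to the outgroup, the shared portion cancels in the difference, which leaves the rate contrast accumulated since the two ingroup lineages split.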
For classification and rate estimation of CpG sites and non-CpG sites, we used the method in Meunier et al. [12] to identify CpG and non-CpG sites. Specifically, CpG sites are defined as the middle base of the following patterns: XNG/XCG/XCG/XCG, with X denoting any nucleotide except C to avoid overlapping CpGs. N can occur in any of the four sequences. Sites fitting the complementary pattern (CGY/CGY/CGY/CNY, Y not G) are also considered as CpG sites. As a control, sites expected to have never been part of a CpG dinucleotides since the last common ancestor of the four species (“non-CpG sites”) are defined as sites not preceded by C nor followed by G [12,22]. Sites that do not satisfy either classification are defined as “ambiguous sites” and excluded from the analysis. A simulation study has shown that this classifying scheme can accurately identify CpG sites and non-CpG sites in catarrhines [23]. Substitutions are then inferred using unweighted parsimony using only such sites. Confidence intervals for estimated rates are derived from bootstrapping 10,000 times.
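The classification rules can be sketched as follows for one column of a four-species alignment. This is our reading of the patterns described above, not the original implementation: it assumes uppercase, gap-free columns, and it interprets N as allowing the middle base to differ from C (or from G on the complementary strand) in at most one of the four sequences. The function name is hypothetical.

```python
def classify_site(alignment, i):
    """Classify column i of a four-sequence alignment.

    Returns 'CpG', 'non-CpG', or 'ambiguous'.
    """
    if i <= 0 or i >= len(alignment[0]) - 1:
        return "ambiguous"  # flanking bases are needed on both sides
    prev = [s[i - 1] for s in alignment]
    mid = [s[i] for s in alignment]
    nxt = [s[i + 1] for s in alignment]
    if any(b not in "ACGT" for b in prev + mid + nxt):
        return "ambiguous"  # gaps or unknown bases

    # Pattern XNG/XCG/XCG/XCG: never preceded by C, always followed by G,
    # middle base C in at least three of the four sequences.
    if all(b != "C" for b in prev) and all(b == "G" for b in nxt) and mid.count("C") >= 3:
        return "CpG"
    # Complementary pattern CGY/CGY/CGY/CNY: always preceded by C,
    # never followed by G, middle base G in at least three sequences.
    if all(b == "C" for b in prev) and all(b != "G" for b in nxt) and mid.count("G") >= 3:
        return "CpG"
    # Not preceded by C nor followed by G in any sequence.
    if all(b != "C" for b in prev) and all(b != "G" for b in nxt):
        return "non-CpG"
    return "ambiguous"


# Toy alignment (not real data): the fourth sequence carries a C-to-T change.
toy = ["ACGT", "ACGT", "ACGT", "ATGT"]
print([classify_site(toy, i) for i in range(4)])
```

Substitutions would then be counted by parsimony only at columns labeled 'CpG' or 'non-CpG', with 'ambiguous' columns excluded as described above.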
Supporting Information
Accession Numbers
The National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) accession numbers for human, chimpanzee, baboon, macaque, and marmoset are NT_086357.2, NT_165329.1, NT_086378.3, NT_165339.1, and NT_086504.2, respectively.
Acknowledgments
We thank Ryan Raaum for information on primate generation times, Adam Eyre-Walker for discussions, and several anonymous reviewers for comments on the manuscript.
Abbreviations
- CI: confidence interval
- Mya: million years ago
Footnotes
Competing interests. The authors have declared that no competing interests exist.
A previous version of this article appeared as an Early Online Release on August 11, 2006 (DOI: 10.1371/journal.pgen.0020163.eor).
Author contributions. EV and SVY conceived and designed the experiments. SHK, NE, CW, EV, and SVY performed the experiments. SHK, NE, CW, EV, and SVY analyzed the data. SVY contributed reagents/materials/analysis tools. EV and SVY wrote the paper.
Funding. SY is supported by funds from the Georgia Institute of Technology. CW is supported by the NSF summer undergraduate research program in Quantitative Systems Biology and Mathematical Biology. EV is supported by a National Science Foundation Career grant.