Significance
Nonallelic gene conversion (NAGC) is a driver of more than 20 diseases. It is also thought to drive the “concerted evolution” of gene duplicates because it acts to eliminate any differences that accumulate between them. Despite its importance, the parameters that govern NAGC are not well characterized. We developed statistical tools to study NAGC and its consequences for human gene duplicates. We find that the baseline rate of NAGC in humans is 20 times faster than the point mutation rate. Despite this high rate, NAGC has a surprisingly small effect on the average sequence divergence of human duplicates—and concerted evolution is not as pervasive as previously thought.
Keywords: gene conversion, gene duplicates, sequence evolution, GC bias, mutation rate
Abstract
Gene conversion is the copying of a genetic sequence from a “donor” region to an “acceptor.” In nonallelic gene conversion (NAGC), the donor and the acceptor are at distinct genetic loci. Despite the role NAGC plays in various genetic diseases and the concerted evolution of gene families, the parameters that govern NAGC are not well characterized. Here, we survey duplicate gene families and identify converted tracts in 46% of them. These conversions reflect a large GC bias of NAGC. We develop a sequence evolution model that leverages substantially more information in duplicate sequences than used by previous methods and use it to estimate the parameters that govern NAGC in humans: a mean converted tract length of 250 bp and a probability of per generation for a nucleotide to be converted (an order of magnitude higher than the point mutation rate). Despite this high baseline rate, we show that NAGC slows down as duplicate sequences diverge—until an eventual “escape” of the sequences from its influence. As a result, NAGC has a small average effect on the sequence divergence of duplicates. This work improves our understanding of the NAGC mechanism and the role that it plays in the evolution of gene duplicates.
As a result of recombination, distinct alleles that originate from the two homologous chromosomes may end up on the two strands of the same chromosome. This mismatch (“heteroduplex”) is then repaired by synthesizing a DNA segment to overwrite the sequence on one strand, using the other strand as a template. This process is called gene conversion.
Although gene conversion is not an error but rather a natural part of recombination, it can result in the nonreciprocal transfer of alleles from one sequence to another, and can therefore be thought of as a “copy and paste” mutation. Gene conversion typically occurs between allelic regions (allelic gene conversion, AGC) (1). However, nonallelic gene conversion (NAGC) between distinct genetic loci can also occur when paralogous sequences are accidentally aligned during recombination because they are highly similar (2)—as is often the case with young tandem gene duplicates (3).
NAGC is implicated as a driver of over 20 diseases (2, 4, 5). The transfer of alleles between tandemly duplicated genes—or pseudogenes—can cause nonsynonymous mutations (6, 7), frameshifting (8), or aberrant splicing (9)—resulting in functional impairment of the acceptor gene. A recent study showed that alleles introduced by NAGC are found in 1% of genes associated with inherited diseases (5).
NAGC is also considered to be a dominant force restricting the evolution of gene duplicates (10–12). It was noticed half a century ago that duplicated genes can be highly similar within one species, even when they differ greatly from their orthologs in other species (13–16). This phenomenon has been termed “concerted evolution” (17). NAGC is an obvious suspect for driving concerted evolution, because it homogenizes paralogous sequences by reversing differences that accumulate through other mutational mechanisms (10, 13, 14, 18). Another possible driver of concerted evolution is natural selection: both purifying and positive selection may constrain paralogs to evolve similarly (3, 11, 19–24). Importantly, if NAGC does slow down sequence divergence, it calls into question the reliability of molecular clocks for gene duplicates (3, 25). To develop expectations for sequence and function evolution in duplicates, we must characterize NAGC and its interplay with other mutations.
In attempting to link NAGC mutations to sequence evolution, we need to know two key parameters: (i) the rate of NAGC and (ii) the converted tract length. These parameters have been mostly probed in nonhuman organisms with mutation accumulation experiments limited to single genes—typically, artificially inserted DNA sequences (26, 27). The mean tract length has been estimated fairly consistently across organisms and experiments to be a few hundred base pairs (28). However, estimates of the rate of NAGC vary by as much as eight orders of magnitude (26, 29–32)—presumably due to key determinants of the rate that vary across experiments, such as genomic location, sequence similarity of the duplicate sequences and the distance between them, and experimental variability (27, 33). Alternatively, evolutionary-based approaches (19, 34) tend to be less variable: NAGC has been estimated to be 10 to 100 times faster than point mutation in Saccharomyces cerevisiae (35), Drosophila melanogaster (36, 37), and human (19, 38–40). These estimates are typically based on single loci (but see refs. 41, 42). Recent family studies (43–45) have estimated the rate of AGC to be per base pair per generation. This is likely an upper bound on the rate of NAGC, since NAGC requires a misalignment of homologous chromosomes during recombination, while AGC does not.
Here, we estimate the parameters governing NAGC with a sequence evolution model. Our method is not based on direct empirical observations, but it leverages substantially more information than previous experimental and computational methods: We use data from a large set of segmental duplicates in multiple species, and exploit information from a long evolutionary history. We estimate that the rate of NAGC in newborn duplicates is an order of magnitude higher than the point mutation rate in humans. Surprisingly, we show that this high rate does not necessarily imply that NAGC distorts the molecular clock.
Results
To investigate NAGC in duplicate sequences across primates, we used a set of gene duplicate pairs in humans that we had assembled previously (46). We focused on young pairs where we estimate that the duplication occurred after the human–mouse split, and identified their orthologs in the reference genomes of chimpanzee, gorilla, orangutan, macaque, and mouse. We required that each gene pair have both orthologs in at least one nonhuman primate and exactly one ortholog in mouse. Since our inference methods implicitly assume neutral sequence evolution, we focused our analysis on intronic sequence at least bp away from intron–exon junctions. After applying these filters, our data consisted of bp of sequence in intronic regions from gene families (SI Appendix).
We examined divergence patterns (the partition of alleles in gene copies across primates) in these gene families. We noticed that some divergence patterns are rare and clustered in specific regions. We hypothesized that NAGC might be driving this clustering. To illustrate this, consider a family of two duplicates in human and macaque which resulted from a duplication followed by a speciation event—as illustrated in Fig. 1B (“Null tree”). Under this genealogy, we expect certain divergence patterns across the four genes to occur more frequently than others. For example, the gray sites in Fig. 1C can be parsimoniously explained by one substitution under the null genealogy. They should therefore be much more common than purple sites, as purple sites require at least two mutations. However, if we consider sites in which an NAGC event occurred after speciation (Fig. 1A and “NAGC tree” in Fig. 1B), our expectation for divergence patterns changes: Now, purple sites are much more likely than gray sites.
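To make the parsimony argument concrete, the Python sketch below scores a single site under the two genealogies of Fig. 1B with Fitch's small-parsimony algorithm. The leaf labels and the concrete allele patterns standing in for “gray” and “purple” sites are illustrative assumptions; the figure's exact color coding is not reproduced here.

```python
# Leaves: hA, hB = human paralogs A and B; mA, mB = macaque paralogs A and B.
# Trees are rooted binary trees written as nested 2-tuples of leaf labels.
NULL_TREE = (("hA", "mA"), ("hB", "mB"))    # duplication, then speciation
NAGC_TREE = ((("hA", "hB"), "mA"), "mB")     # hB overwritten by hA after speciation


def fitch_score(tree, alleles):
    """Minimum number of substitutions for one site (Fitch parsimony).

    `alleles` maps each leaf label to its nucleotide at the site.
    """
    def visit(node):
        if isinstance(node, str):            # leaf: its observed allele, no cost
            return {alleles[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:                           # children can agree: no extra change
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Assumed "gray" pattern: paralogs differ, each paralog is shared across species.
gray = {"hA": "A", "mA": "A", "hB": "C", "mB": "C"}
# Assumed "purple" pattern: the two human paralogs match each other but not macaque.
purple = {"hA": "A", "hB": "A", "mA": "C", "mB": "C"}
```

Under the null tree the gray pattern needs one change and the purple pattern two. Under the NAGC tree the costs are reversed, which is the signal the HMM below exploits.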
Mapping Recent NAGC Events.
We developed a Hidden Markov Model (HMM) which exploits the fact that observed local changes in divergence patterns may point to hidden local changes in the genealogy of a gene family (Fig. 1 B and C). In our model, genealogy switches occur along the sequence at some rate; the likelihood of a given divergence pattern at a site then depends only on its own genealogy and nucleotide substitution rates. Our method is similar to others that are based on incongruency of inferred genealogies along a sequence (47–49), but it is model-based and robust to substitution rate variation across genes (SI Appendix).
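A minimal sketch of the decoding step, assuming a two-state HMM (null vs. NAGC genealogy) in which each site is reduced to a coarse pattern class. The class names, emission probabilities, and switch probability are hypothetical placeholders, not the model's fitted values; the actual method derives emissions from nucleotide substitution rates on each genealogy (SI Appendix). The code runs the standard Viterbi algorithm in log space.

```python
import math

STATES = (0, 1)  # 0 = null genealogy, 1 = NAGC genealogy
# Hypothetical illustrative parameters (not estimates from the study).
SWITCH = 1e-3                     # per-site probability of a genealogy switch
START = {0: 0.99, 1: 0.01}
EMIT = {
    0: {"uninformative": 0.97, "gray": 0.025, "purple": 0.005},
    1: {"uninformative": 0.97, "gray": 0.005, "purple": 0.025},
}


def viterbi(observations):
    """Most likely per-site genealogy path, computed in log space."""
    log_stay, log_switch = math.log(1 - SWITCH), math.log(SWITCH)
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for j in STATES:
            # Score of arriving in state j from each possible previous state i.
            cand = {i: score[i] + (log_stay if i == j else log_switch) for i in STATES}
            best = max(cand, key=cand.get)
            pointers[j] = best
            new_score[j] = cand[best] + math.log(EMIT[j][obs])
        backpointers.append(pointers)
        score = new_score
    # Trace the best path back from the highest-scoring final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In this toy, uninformative sites have equal emission probabilities in both states, so the exact position of a switch between an informative site and the tract edge is only weakly determined.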
We applied the HMM to a subset of the gene families that we described above: families of four genes consisting of two duplicates in human and a nonhuman primate. Since the HMM assumes that the duplication preceded the speciation, we required that the overall intronic divergence patterns support this genealogy, using the software MrBayes (50). This requirement decreased the number of gene families considered to 39.
Applying our HMM, we identified putatively converted tracts in 46% of the 39 gene families, affecting 25.8% of the intronic sequence (Fig. 2A; see the complete list of identified tracts in Datasets S1–S4). Previous studies estimated that only several percent of the sequence is affected by NAGC, but the “affected sequence” statistic is arguably method-dependent, so these figures are not directly comparable (41, 51, 52). Fig. 1D shows the maximum likelihood genealogy maps for two example gene families. The average length of the detected converted tracts is 880 bp (Fig. 2B). As previously discussed for other methods (27), this is likely an overestimate of the mean NAGC tract length, because some identified tracts result from multiple NAGC events occurring in close proximity (SI Appendix, Fig. S2).
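The tract statistics reported above (tract coordinates, mean tract length, fraction of sequence affected) can be read off a decoded path. The sketch below assumes a hypothetical 0/1 genealogy label per base pair; its example also shows why nearby events inflate the mean length: two abutting conversions are reported as a single tract.

```python
def summarize_tracts(labels):
    """Summarize a per-base-pair decoded path (0 = null, 1 = NAGC genealogy).

    Returns (tracts, mean_length, fraction_affected), where tracts are
    half-open (start, end) coordinates of maximal runs of label 1.
    """
    tracts, start = [], None
    for pos, label in enumerate(labels):
        if label == 1 and start is None:
            start = pos                       # a run of NAGC labels begins
        elif label != 1 and start is not None:
            tracts.append((start, pos))       # the run ends before this position
            start = None
    if start is not None:                     # run extends to the end of the sequence
        tracts.append((start, len(labels)))
    lengths = [end - begin for begin, end in tracts]
    mean_length = sum(lengths) / len(lengths) if lengths else 0.0
    fraction = sum(lengths) / len(labels) if labels else 0.0
    return tracts, mean_length, fraction


# Two adjacent events of 200 bp and 150 bp are indistinguishable from one 350-bp tract.
labels = [0] * 100 + [1] * 200 + [1] * 150 + [0] * 550
tracts, mean_len, frac = summarize_tracts(labels)
```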
When AT/GC heteroduplex DNA arises during AGC, it is preferentially repaired toward GC alleles (53, 54). We sought to examine whether the same bias can be observed for NAGC (53, 55–57). We found that converted regions have a high GC content (percentage of bases that are either guanine or cytosine): , compared with in matched unconverted regions (, two-sided t test; Fig. 2C). This base composition difference has been previously observed for histone paralogs (55). However, the difference could be a driver and/or a result of NAGC. To test whether NAGC preferentially repairs AT/GC heteroduplexes toward GC, we focused on the sites that carry the strongest evidence of nucleotide substitution by NAGC, namely those with the “purple” divergence pattern described above (Fig. 1C). Using parsimony, we inferred the direction of such substitutions between weak (A/T) and strong (G/C) nucleotides. We found that of these changes were weak-to-strong changes, compared with an expectation of through point mutation differences and GC-biased AGC alone (exact binomial test , and see SI Appendix and Fig. 2D). We estimate that this observed difference corresponds to a probability of in favor of strong alleles when correcting strong/weak heteroduplexes. Our estimate agrees with the GC bias estimated for AGC (43, 44). Of the several possible repair mechanisms underlying biased gene conversion that we consider in a simulation study (58, 59), the one most likely to produce such a large bias is base excision repair, in which the choice of strand to repair is made independently for each heteroduplex (SI Appendix and Fig. 2E). By contrast, the dominant driver of GC bias in yeast has been shown to act over long tracts, as the mismatch repair mechanism does (58). This suggests that different mechanisms may drive GC bias in different species (as also suggested by ref. 59).
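The two statistical ingredients of this paragraph, GC content and an exact two-sided binomial test, can be sketched in plain Python. The two-sided p-value sums the probabilities of all outcomes no more likely than the observed count (with a small relative tolerance for floating-point ties), the same convention R's `binom.test` uses. The function names are ours, and the inputs would be the weak-to-strong counts and expected proportion given in the text.

```python
from math import comb


def gc_content(seq):
    """Percentage of G or C among unambiguous A/C/G/T bases (others ignored)."""
    acgt = [base for base in seq.upper() if base in "ACGT"]
    if not acgt:
        raise ValueError("no A/C/G/T bases in sequence")
    gc = sum(base in "GC" for base in acgt)
    return 100.0 * gc / len(acgt)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test of k successes in n trials with success prob p."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # outcomes at most as likely as the observed one
    return min(1.0, sum(x for x in pmf if x <= cutoff))
```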
The power of our HMM is likely limited to recent conversions, where local divergence patterns show clear disagreement with the global intron-wide patterns; it is therefore applicable only in cases where NAGC is not so pervasive that it would have a global effect on divergence patterns (28, 60). Next, we describe a method that allowed us to estimate NAGC parameters without making this implicit assumption.
NAGC Is an Order of Magnitude Faster than Point Mutation.
To estimate the rate and the tract length distribution of NAGC, we developed a two-site model of sequence evolution with point mutation and NAGC (Methods). This model is inspired by the rationale that guided Hudson (61) and McVean et al. (62) in estimating recombination rates: While computing the full likelihood of a sequence evolving through both point mutation and NAGC is intractable, we were able to model the likelihood of the observed divergence between paralogs at a pair of nucleotides at a time. In short, mutation acts to increase—while NAGC acts to decrease—sequence divergence between paralogs. When the two sites under consideration are close by (with respect to the NAGC mean tract length), NAGC events affecting one site are likely to incorporate the other (Fig. 3A). Our model makes no prior assumptions on the frequency of NAGC: Unlike the tract detection method, multiple hits are accounted for in the likelihood of the two-site model.
For each pair of sites in each intron in our data, we computed the likelihood of the observed alleles in all available species, over a grid of NAGC rate and mean tract length values (Fig. 3B and SI Appendix). We then obtained maximum composite likelihood estimates (MLE) over all pairs of sites (ignoring the dependence between pairs).
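Once each pair of sites has a log-likelihood surface over the shared grid, the composite estimate is the grid point that maximizes the summed surfaces. A minimal sketch, assuming each surface is stored as a dictionary keyed by hypothetical `(nagc_rate, mean_tract_length)` tuples; computing the per-pair likelihoods themselves is the job of the two-site model described in Methods.

```python
from collections import defaultdict


def composite_mle(pair_logliks):
    """Maximum composite likelihood over a shared parameter grid.

    pair_logliks: iterable of dicts, one per pair of sites, each mapping a grid
    point (nagc_rate, mean_tract_length) -> log-likelihood of that pair's alleles.
    Pairs are treated as independent, which is the composite-likelihood approximation.
    """
    total = defaultdict(float)
    for surface in pair_logliks:
        for point, loglik in surface.items():
            total[point] += loglik
    best = max(total, key=total.get)
    return best, dict(total)
```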
We first estimated MLEs for each intron separately, and matched these estimates with (16) in exons of the respective gene. We found that NAGC rate estimates decrease as increases (Spearman , Fig. 3C). This trend is likely due to a slowdown in NAGC rate, or its complete stop, as the duplicates diverge in sequence. Since our model assumes a constant NAGC rate, we concluded that the model would be most applicable to lowly diverged genes and therefore limited our parameter estimation to introns with .
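The rank correlation used here is Spearman's rho, which equals Pearson's correlation computed on ranks (with tied values given their average rank). A self-contained sketch; `x` would hold the per-gene divergence values and `y` the per-intron NAGC rate estimates.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                 # extend the block of ties
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```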
We define NAGC rate as the probability that a random nucleotide is converted per base pair per generation. We estimate this rate to be ( 95% nonparametric bootstrap CI, Fig. 3D). This estimate accords with previous estimates based on smaller sample sizes using polymorphism data (19, 27) and is an order of magnitude slower than the AGC rate (43, 44). We simultaneously estimated a mean NAGC tract length of bp ( nonparametric bootstrap CI)—consistent with estimates for AGC (43, 63) and with a metaanalysis of many NAGC mutation accumulation experiments and NAGC-driven diseases (27).
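A percentile bootstrap for the grid MLE can be sketched as follows. Here we assume that the resampling unit is the intron and that each intron's pairwise log-likelihoods have already been summed into one surface; the text specifies only a nonparametric bootstrap, so both choices are assumptions of this illustration.

```python
import random


def bootstrap_ci(intron_surfaces, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the grid MLE, resampling introns with replacement.

    intron_surfaces: list of dicts mapping a grid point tuple -> log-likelihood.
    Returns one (lower, upper) pair per coordinate of the grid point.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(intron_surfaces) for _ in intron_surfaces]
        total = {}
        for surface in sample:
            for point, loglik in surface.items():
                total[point] = total.get(point, 0.0) + loglik
        estimates.append(max(total, key=total.get))   # MLE of this replicate
    ci = []
    for coord in range(len(estimates[0])):
        values = sorted(e[coord] for e in estimates)
        lower = values[int(alpha / 2 * (n_boot - 1))]
        upper = values[int((1 - alpha / 2) * (n_boot - 1))]
        ci.append((lower, upper))
    return ci
```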
Live Fast, Stay Young? The Effect of NAGC on Neutral Sequence Divergence.
We next consider the implications of our results for the divergence dynamics of paralogs after duplication. In light of the high rate we infer, the question arises: if point mutations create divergence between paralogous sequences much more slowly than NAGC eliminates it (64, 65), should we expect gene duplicates never to diverge in sequence?
We considered several models of sequence divergence (SI Appendix). First, we considered a model where NAGC acts at the constant rate that we estimated throughout the duplicates’ evolution (“continuous NAGC”). In this case, divergence is expected to plateau around , and concerted evolution continues for a long time [red line in Fig. 4; in practice, there will eventually be an “escape” through a chance rapid accumulation of multiple mutations (11, 66)]. However, NAGC is hypothesized to be contingent on high sequence similarity between paralogs.
We therefore considered two alternative models of NAGC dynamics: first, a model in which NAGC acts only while the sequence divergence between the paralogs is below some threshold (“global threshold”); second, a model in which the initiation of NAGC at a site is contingent on perfect sequence homology at a short 400-bp flanking region upstream from the site [“local threshold”, (2, 27, 67)]. The local threshold model yielded a similar average trajectory to that in the absence of NAGC. A global threshold of as low as may lead to an extended period of concerted evolution as in the continuous NAGC model. A global threshold of results in a different trajectory. For example, with a global threshold of 3%, duplicates born at the time of the primates’ most recent common ancestor would diverge at 3.9% of their sequence, compared with 5.7% in the absence of NAGC (Fig. 4 and SI Appendix, Figs. S10–S12 show trajectories for other rates and threshold values).
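To see qualitatively why a global threshold changes the trajectory, consider a deterministic toy version (our simplification, not the models in SI Appendix). Under infinite sites, point mutation adds differences at rate 2u per site, and, while active, NAGC removes them at rate 2cD, so divergence D approaches u/c. With a threshold below u/c, NAGC switches off once D reaches it, and divergence then grows linearly. The symbols u, c, and the threshold are hypothetical parameters of this sketch.

```python
import math


def divergence(t, u, c, threshold=math.inf):
    """Toy paralog divergence D(t), in differences per site, after t generations.

    u: point mutation rate per site per copy per generation.
    c: NAGC rate per site per copy per generation; NAGC acts only while D < threshold.
    threshold=math.inf gives "continuous NAGC"; c=0 gives point mutation alone.
    Infinite-sites approximation; multiple hits and stochastic escape are ignored.
    """
    if c == 0:
        return 2 * u * t
    plateau = u / c                     # fixed point of dD/dt = 2u - 2cD
    if threshold >= plateau:            # threshold never reached: NAGC acts throughout
        return plateau * (1 - math.exp(-2 * c * t))
    # Generation at which D first reaches the threshold; NAGC stops afterwards.
    t_escape = -math.log(1 - threshold / plateau) / (2 * c)
    if t <= t_escape:
        return plateau * (1 - math.exp(-2 * c * t))
    return threshold + 2 * u * (t - t_escape)
```

In this toy, a threshold above u/c is indistinguishable from continuous NAGC, while a threshold below it only delays divergence, mirroring the two regimes described above.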
Lastly, we asked what these results mean for the validity of molecular clocks for gene duplicates. We examined the explanatory power of these different theoretical models for synonymous divergence in human duplicates. We wished to obtain an estimate of the age of duplication that is independent of between the human duplicates; we therefore used the extent of sharing of both paralogs in different species as a measure of the duplication time. For example, if a duplicate pair was found in human, gorilla, and orangutan—but only one ortholog was found in macaque—we estimated that the duplication occurred at the time interval between the human–macaque split and the human–orangutan split. Except for the continuous NAGC model (or global threshold 4.5%), all models displayed similar broad agreement with the data (Fig. 4).
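The dating rule in this paragraph can be written as a short function. It is a sketch under our assumptions: species are ordered by their divergence from human, the most distant species carrying both paralogs fixes the latest possible duplication time, and the next more distant species with a single ortholog fixes the earliest. Split times from ref. 70 are not reproduced; the function returns split labels only.

```python
# Nonhuman species ordered by increasing divergence time from human.
SPECIES_BY_DISTANCE = ["chimpanzee", "gorilla", "orangutan", "macaque", "mouse"]


def duplication_interval(ortholog_counts):
    """Bracket a duplication between two human-centred speciation events.

    ortholog_counts maps species -> number of orthologs found (1 or 2); species
    with missing data may be omitted. Returns (earliest, latest): the duplication
    postdates the split labelled `earliest` and predates the one labelled `latest`.
    """
    deepest_with_both = None
    for species in SPECIES_BY_DISTANCE:
        if ortholog_counts.get(species) == 2:
            deepest_with_both = species          # keep the most distant such species
    if deepest_with_both is None:
        raise ValueError("no nonhuman species carries both paralogs")
    start = SPECIES_BY_DISTANCE.index(deepest_with_both) + 1
    for species in SPECIES_BY_DISTANCE[start:]:
        if ortholog_counts.get(species) == 1:
            return f"human-{species} split", f"human-{deepest_with_both} split"
    raise ValueError("no more distant species with a single ortholog")
```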
The small effect of NAGC on divergence levels is intuitive in retrospect: For identical sequences, NAGC has no effect. Once differences start to accumulate, there is only a small window of opportunity for NAGC to act before the paralogous sequences escape from its hold. This suggests that neutral sequence divergence (e.g., ds) may be an appropriate molecular clock even in the presence of NAGC (as also suggested by refs. 41, 46, and 68).
Discussion
In this work, we identify recently converted regions in humans and other primates, and estimate the parameters that govern NAGC. Previously, it has been somewhat ambiguous whether concerted evolution observations were due to natural selection, abundant NAGC, or a combination of the two (3, 22, 23). Today, equipped with genomic data, we can revisit the pervasiveness of concerted evolution; the data in Fig. 4 suggest that, in humans, duplicates’ divergence levels are roughly as expected from the accumulation of point mutations alone. When we plugged in our estimates for NAGC rate, most mechanistic models of NAGC also predicted a small effect on neutral sequence divergence. This result suggests that neutral sequence divergence may be an appropriate molecular clock even in the presence of NAGC.
One important topic left for future investigation is the variation of NAGC parameters. Our model assumes constant action of NAGC through time and across the genome in order to obtain robust estimates of the mean parameters. However, substantial variation likely exists across gene pairs due to factors such as recombination rate, sequence context, physical distance between paralogs (SI Appendix, Fig. S9), and sequence similarity. These factors can also have very different distributions in pervasive, highly homologous sequences other than segmental gene duplicates. For example, long terminal repeats comprise several percent of the genome and experience pervasive NAGC (69).
Our estimates for the parameters that govern the mutational mechanism alone could guide future studies of other forces shaping the evolution of gene duplicates, such as natural selection. Together with contemporary efforts to measure the effects of genomic factors on gene conversion, our results may clarify the potential of NAGC to drive disease, improve the dating of molecular events, and further our understanding of the evolution of gene duplicates.
Methods
Gene Families Data.
To investigate NAGC in duplicate sequences, we used a set of 1,444 reciprocal best-matched protein-coding gene pairs in the human reference genome (build 37) that we had assembled previously (46). We focused on young pairs consistent with a duplication after the human–mouse split, and identified their orthologs in the reference genomes of chimpanzee, gorilla, orangutan, macaque, and mouse (Table S1). We focused our analysis on intronic sequences at least bp away from intron–exon junctions. For each of the two inference tasks, we applied additional method-specific filters (SI Appendix), leaving us with 75 gene families for parameter estimation and 39 gene families for inference of converted tracts.
Two-Site Model Transition Matrix.
We consider the evolution of two biallelic sites in two duplicate genes as a discrete homogeneous Markov Process. We describe these four sites with a four-bit vector (“state vector”). The state corresponds to allele at the “left” site in copy A, allele at the left site in copy B, allele at the “right” site in copy A, and allele at the right site in copy B. The labels 0 and 1 are defined with respect to each site separately—the state does not mean that the left and right sites necessarily have the same allele. We first derive the (per generation) transition probability matrix. There are two possible events that may result in a transition: point mutations which occur at a rate of per site per generation (64) and NAGC. The probability of a site being converted per generation is . We consider these mutational events to be rare and ignore terms of the order , , and . For example, consider the per-generation transition probability from 0110 to 0100, for two sites that are bp apart. This transition can happen either through point mutation at the right site of copy A or by NAGC from copy B to copy A involving the right site but not the left. The transition probability is therefore
where is the probability of a conversion event including one of the sites given that it includes the other. Similarly, we can derive the full transition probability matrix (SI Appendix). We note that our parameterization ignores possible mutations to (third and fourth) unobserved alleles.
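The first-order construction above can be sketched in Python for all 16 states. Two assumptions are ours: `c` is taken to be the per-generation probability that a given site in a given copy is overwritten by its paralog, and a conversion covers the left site only, the right site only, or both, with probabilities c(1 - p), c(1 - p), and cp, so that each site is covered with total probability c. Under these assumptions the 0110 to 0100 example has probability mu + c(1 - p).

```python
from itertools import product

# State = (left_A, left_B, right_A, right_B); alleles coded 0/1 separately per site.
STATES = list(product((0, 1), repeat=4))
INDEX = {s: i for i, s in enumerate(STATES)}


def per_generation_matrix(mu, c, p):
    """First-order per-generation transition matrix of the two-site model.

    mu: point mutation probability per site per generation (each of the 4 sites).
    c:  probability that a given site in a given copy is converted (assumed meaning).
    p:  probability that a tract covering one site also covers the other.
    Second-order terms (mu**2, c**2, mu*c) are ignored.
    """
    n = len(STATES)
    P = [[0.0] * n for _ in range(n)]
    for s in STATES:
        i = INDEX[s]
        for bit in range(4):                      # point mutation flips one allele
            t = list(s)
            t[bit] ^= 1
            P[i][INDEX[tuple(t)]] += mu
        for acc in (0, 1):                        # acceptor copy: 0 = A, 1 = B
            donor = 1 - acc
            # Site offsets: 0 = left site, 2 = right site.
            for sites, prob in (((0,), c * (1 - p)), ((2,), c * (1 - p)), ((0, 2), c * p)):
                t = list(s)
                for site in sites:
                    t[site + acc] = s[site + donor]   # copy the donor allele
                j = INDEX[tuple(t)]
                if j != i:                        # converting identical alleles is a no-op
                    P[i][j] += prob
        P[i][i] = 1.0 - sum(P[i])                 # diagonal is still 0 at this point
    return P
```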
We next derive . Following previous work (28), we model the tract length as geometrically distributed with mean . It follows that the probability of a conversion including one site, given that it includes the other, is
by the memorylessness of the geometric distribution. In SI Appendix, we show that recombination (with a breakpoint between the two sites) has a negligible effect on .
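Under the memoryless argument, if a tract extends past a covered site one base at a time and stops with probability 1/λ at each step (λ being the mean tract length, a symbol introduced here), it reaches a site d bp away with probability (1 - 1/λ)^d. The sketch below computes this and checks it by simulating geometric tracts with uniformly placed starts; the simulation setup is our illustration.

```python
import math
import random


def co_conversion_prob(d, mean_len):
    """P(a geometric tract covering one site also covers a site d bp away)."""
    return (1 - 1 / mean_len) ** d


def simulate_co_conversion(d, mean_len, n_tracts=200_000, window=5_000, seed=1):
    """Monte Carlo estimate: among tracts covering site x, the share also covering x + d."""
    rng = random.Random(seed)
    log_continue = math.log(1.0 - 1 / mean_len)
    x = window // 2                      # focal site, far from the window edges
    cover_x = cover_both = 0
    for _ in range(n_tracts):
        start = rng.randrange(window)
        # Inverse-CDF draw of a geometric length on {1, 2, ...} with mean mean_len.
        length = 1 + math.floor(math.log(1.0 - rng.random()) / log_continue)
        end = start + length             # tract occupies [start, end)
        if start <= x < end:
            cover_x += 1
            if x + d < end:
                cover_both += 1
    return cover_both / cover_x
```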
Lastly, we compute transition probabilities along evolutionary timescales. Each datum consists of state vectors (corresponding to two biallelic sites in two paralogs) encoding the alleles in the human reference genome and one to four other primate reference genomes. The mouse two-bit state (two sites in one gene) is used only to set a prior on the root of the tree (SI Appendix). We assume a constant tree, namely a fixed topology and constant edge lengths as defined in SI Appendix, Fig. S5. We used estimates for primate split times from ref. 70 and assumed a constant generation time of 25 y. Each node corresponds to a state. We assume that, for both mutation types, substitution occurs at a rate equal to the corresponding mutation rate. Therefore, the transition probability matrix for the edge between node i and node j is the per-generation transition probability matrix raised to the power of the number of generations along that edge.
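Because the per-generation matrix is constant along an edge, the edge matrix is a matrix power, which repeated squaring computes in O(log t) matrix products. A dependency-free sketch follows, assuming edge lengths in years and the 25-y generation time stated above; with NumPy available, `numpy.linalg.matrix_power` does the same job.

```python
def mat_mult(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(M, t):
    """M**t for a non-negative integer t by repeated squaring."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    base = M
    while t > 0:
        if t & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        t >>= 1
    return result


def edge_matrix(per_generation, years, generation_time=25):
    """Transition matrix along an edge of the given length in years."""
    return mat_power(per_generation, round(years / generation_time))
```

For a single site with symmetric flip probability mu, the off-diagonal entry after t generations should equal (1 - (1 - 2mu)^t) / 2, which gives a simple correctness check.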
Supplementary Material
Acknowledgments
We thank Eilon Sharon, Doc Edge, Kelley Harris, David Knowles, Noah Rosenberg, Adam Siepel, and two anonymous reviewers for helpful discussions/comments on the manuscript. This work was funded by National Institutes of Health Grants HG008140 and HG009431 and by the Howard Hughes Medical Institute. A.H. and Z.G. were supported, in part, by fellowships from the Stanford Center for Computational, Evolutionary and Human Genomics.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. A.S. is a guest editor invited by the Editorial Board.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1708151114/-/DCSupplemental.
References
- 1. Mitchell MB. Aberrant recombination of pyridoxine mutants of Neurospora. Proc Natl Acad Sci USA. 1955;41:215–220. doi: 10.1073/pnas.41.4.215.
- 2. Chen JM, Cooper DN, Chuzhanova N, Férec C, Patrinos GP. Gene conversion: Mechanisms, evolution and human disease. Nat Rev Genet. 2007;8:762–775. doi: 10.1038/nrg2193.
- 3. Innan H, Kondrashov F. The evolution of gene duplications: Classifying and distinguishing between models. Nat Rev Genet. 2010;11:97–108. doi: 10.1038/nrg2689.
- 4. Bischoff J, et al. Genome-wide identification of pseudogenes capable of disease-causing gene conversion. Hum Mutat. 2006;27:545–552. doi: 10.1002/humu.20335.
- 5. Casola C, Zekonyte U, Phillips AD, Cooper DN, Hahn MW. Interlocus gene conversion events introduce deleterious mutations into at least 1% of human genes associated with inherited disease. Genome Res. 2012;22:429–435. doi: 10.1101/gr.127738.111.
- 6. Heinen S, et al. De novo gene conversion in the RCA gene cluster (1q32) causes mutations in complement factor H associated with atypical hemolytic uremic syndrome. Hum Mutat. 2006;27:292–293. doi: 10.1002/humu.9408.
- 7. Watnick TJ, Gandolph MA, Weber H, Neumann HP, Germino GG. Gene conversion is a likely cause of mutation in pkd1. Hum Mol Genet. 1998;7:1239–1243. doi: 10.1093/hmg/7.8.1239.
- 8. Roesler J, et al. Recombination events between the p47-phox gene and its highly homologous pseudogenes are the main cause of autosomal recessive chronic granulomatous disease. Blood. 2000;95:2150–2156.
- 9. Lorson CL, Hahnen E, Androphy EJ, Wirth B. A single nucleotide in the SMN gene regulates splicing and is responsible for spinal muscular atrophy. Proc Natl Acad Sci USA. 1999;96:6307–6311. doi: 10.1073/pnas.96.11.6307.
- 10. Nei M. Molecular Evolutionary Genetics. Columbia Univ Press; New York: 1987.
- 11. Fawcett JA, Innan H. Neutral and non-neutral evolution of duplicated genes with gene conversion. Genes. 2011;2:191–209. doi: 10.3390/genes2010191.
- 12. Hartasánchez DA, Vallès-Codina O, Brasó-Vives M, Navarro A. Interplay of interlocus gene conversion and crossover in segmental duplications under a neutral scenario. G3 (Bethesda). 2014;4:1479–1489. doi: 10.1534/g3.114.012435.
- 13. Smith GP, Hood L, Fitch WM. Antibody diversity. Annu Rev Biochem. 1971;40:969–1012. doi: 10.1146/annurev.bi.40.070171.004541.
- 14. Smith GP. Cold Spring Harbor Symposia on Quantitative Biology. Vol 38. Cold Spring Harbor Lab Press; Cold Spring Harbor, NY: 1974. Unequal crossover and the evolution of multigene families; pp. 507–513.
- 15. Brown DD, Sugimoto K. 5S DNAs of Xenopus laevis and Xenopus mulleri: Evolution of a gene family. J Mol Biol. 1973;78:397–415. doi: 10.1016/0022-2836(73)90464-6.
- 16. Li WH, Graur D. Fundamentals of Molecular Evolution. Sinauer Associates; Sunderland, MA: 1991.
- 17. Zimmer E, Martin S, Beverley S, Kan Y, Wilson AC. Rapid duplication and loss of genes coding for the alpha chains of hemoglobin. Proc Natl Acad Sci USA. 1980;77:2158–2162. doi: 10.1073/pnas.77.4.2158.
- 18. Ohta T. How gene families evolve. Theor Popul Biol. 1990;37:213–219. doi: 10.1016/0040-5809(90)90036-u.
- 19. Innan H. A two-locus gene conversion model with selection and its application to the human RHCE and RHD genes. Proc Natl Acad Sci USA. 2003;100:8793–8798. doi: 10.1073/pnas.1031592100.
- 20. Teshima KM, Innan H. Neofunctionalization of duplicated genes under the pressure of gene conversion. Genetics. 2007;1398:1385–1398. doi: 10.1534/genetics.107.082933.
- 21. Storz JF, et al. Complex signatures of selection and gene conversion in the duplicated globin genes of house mice. Genetics. 2007;177:481–500. doi: 10.1534/genetics.107.078550.
- 22. Sugino RP, Innan H. Selection for more of the same product as a force to enhance concerted evolution of duplicated genes. Trends Genet. 2006;22:642–644. doi: 10.1016/j.tig.2006.09.014.
- 23. Mano S, Innan H. The evolutionary rate of duplicated genes under concerted evolution. Genetics. 2008;180:493–505. doi: 10.1534/genetics.108.087676.
- 24. Hanikenne M, et al. Hard selective sweep and ectopic gene conversion in a gene cluster affording environmental adaptation. PLoS Genet. 2013;9:e1003707. doi: 10.1371/journal.pgen.1003707.
- 25. Casola C, Conant GC, Hahn MW. Very low rate of gene conversion in the yeast genome. Mol Biol Evol. 2012;29:3817–3826. doi: 10.1093/molbev/mss192.
- 26. Jinks-Robertson S, Petes TD. High-frequency meiotic gene conversion between repeated genes on nonhomologous chromosomes in yeast. Proc Natl Acad Sci USA. 1985;82:3350–3354. doi: 10.1073/pnas.82.10.3350.
- 27. Mansai SP, Kado T, Innan H. The rate and tract length of gene conversion between duplicated genes. Genes. 2011;2:313–331. doi: 10.3390/genes2020313.
- 28. Mansai SP, Innan H. The power of the methods for detecting interlocus gene conversion. Genetics. 2010;184:517–527. doi: 10.1534/genetics.109.111161.
- 29. Yang D, Waldman AS. Fine-resolution analysis of products of intrachromosomal homeologous recombination in mammalian cells. Mol Cell Biol. 1997;17:3614–3628. doi: 10.1128/mcb.17.7.3614.
- 30. Whelden Cho J, Khalsa GJ, Nickoloff JA. Gene-conversion tract directionality is influenced by the chromosome environment. Curr Genet. 1998;34:269–279. doi: 10.1007/s002940050396.
- 31. Taghian DG, Nickoloff JA. Chromosomal double-strand breaks induce gene conversion at high frequency in mammalian cells. Mol Cell Biol. 1997;17:6386–6393. doi: 10.1128/mcb.17.11.6386.
- 32. Lichten M, Haber J. Position effects in ectopic and allelic mitotic recombination in Saccharomyces cerevisiae. Genetics. 1989;123:261–268. doi: 10.1093/genetics/123.2.261.
- 33. Schildkraut E, Miller CA, Nickoloff JA. Gene conversion and deletion frequencies during double-strand break repair in human cells are controlled by the distance between direct repeats. Nucleic Acids Res. 2005;33:1574–1580. doi: 10.1093/nar/gki295.
- 34. Sawyer S. Statistical tests for detecting gene conversion. Mol Biol Evol. 1989;6:526–538. doi: 10.1093/oxfordjournals.molbev.a040567.
- 35. Takuno S, Innan H. Selection to maintain paralogous amino acid differences under the pressure of gene conversion in the heat-shock protein genes in yeast. Mol Biol Evol. 2009;26:2655–2659. doi: 10.1093/molbev/msp211.
- 36. Thornton K, Long M. Excess of amino acid substitutions relative to polymorphism between X-linked duplications in Drosophila melanogaster. Mol Biol Evol. 2005;22:273–284. doi: 10.1093/molbev/msi015.
- 37. Arguello JR, Chen Y, Yang S, Wang W, Long M. Origination of an X-linked testes chimeric gene by illegitimate recombination in Drosophila. PLoS Genet. 2006;2:e77. doi: 10.1371/journal.pgen.0020077.
- 38. Rozen S, et al. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature. 2003;423:873–876. doi: 10.1038/nature01723.
- 39. Bosch E, Hurles ME, Navarro A, Jobling MA. Dynamics of a human interparalog gene conversion hotspot. Genome Res. 2004;14:835–844. doi: 10.1101/gr.2177404.
- 40. Hurles ME. Gene conversion homogenizes the CMT1A paralogous repeats. BMC Genomics. 2001;2:11. doi: 10.1186/1471-2164-2-11.
- 41. Dumont BL, Eichler EE. Signals of historical interlocus gene conversion in human segmental duplications. PLoS One. 2013;8:e75949. doi: 10.1371/journal.pone.0075949.
- 42. Ji X, Griffing A, Thorne JL. A phylogenetic approach finds abundant interlocus gene conversion in yeast. Mol Biol Evol. 2016;33:2469–2476. doi: 10.1093/molbev/msw114.
- 43. Williams AL, et al. Non-crossover gene conversions show strong GC bias and unexpected clustering in humans. Elife. 2015;4:e04637. doi: 10.7554/eLife.04637.
- 44. Halldorsson BV, et al. The rate of meiotic gene conversion varies by sex and age. Nat Genet. 2016;48:1377–1384. doi: 10.1038/ng.3669.
- 45. Narasimhan VM, et al. Estimating the human mutation rate from autozygous segments reveals population differences in human mutational processes. Nat Commun. 2017;8:303. doi: 10.1038/s41467-017-00323-y.
- 46. Lan X, Pritchard JK. Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals. Science. 2016;352:1009–1013. doi: 10.1126/science.aad8411.
- 47.Balding DJ, Nichols RA, Hunt DM. Detecting gene conversion: Primate visual pigment genes. Proc Biol Sci. 1992;249:275–280. doi: 10.1098/rspb.1992.0114. [DOI] [PubMed] [Google Scholar]
- 48.Jakobsen IB, Wilson SR, Easteal S. The partition matrix: Exploring variable phylogenetic signals along nucleotide sequence alignments. Mol Biol Evol. 1997;14:474–484. doi: 10.1093/oxfordjournals.molbev.a025784. [DOI] [PubMed] [Google Scholar]
- 49.Weiller GF. Phylogenetic profiles: A graphical method for detecting genetic recombinations in homologous sequences. Mol Biol Evol. 1998;15:326–335. doi: 10.1093/oxfordjournals.molbev.a025929. [DOI] [PubMed] [Google Scholar]
- 50.Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17:754–755. doi: 10.1093/bioinformatics/17.8.754. [DOI] [PubMed] [Google Scholar]
- 51.Jackson MS, et al. Evidence for widespread reticulate evolution within human duplicons. Am J Hum Genet. 2005;77:824–840. doi: 10.1086/497704. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Dennis MY, et al. The evolution and population diversity of human-specific segmental duplications. Nat Ecol Evol. 2016;1:69. doi: 10.1038/s41559-016-0069. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009;10:285–311. doi: 10.1146/annurev-genom-082908-150001. [DOI] [PubMed] [Google Scholar]
- 54.Odenthal-Hesse L, Berg IL, Veselis A, Jeffreys AJ, May CA. Transmission distortion affecting human noncrossover but not crossover recombination: A hidden source of meiotic drive. PLoS Genet. 2014;10:e1004106. doi: 10.1371/journal.pgen.1004106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Galtier N. Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 2003;19:65–68. doi: 10.1016/s0168-9525(02)00002-1. [DOI] [PubMed] [Google Scholar]
- 56.Assis R, Kondrashov AS. Nonallelic gene conversion is not GC-biased in Drosophila or primates. Mol Biol Evol. 2011;29:1291–1295. doi: 10.1093/molbev/msr304. [DOI] [PubMed] [Google Scholar]
- 57.McGrath CL, Casola C, Hahn MW. Minimal effect of ectopic gene conversion among recent duplicates in four mammalian genomes. Genetics. 2009;182:615–622. doi: 10.1534/genetics.109.101428. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Lesecque Y, Mouchiroud D, Duret L. GC-biased gene conversion in yeast is specifically associated with crossovers: Molecular mechanisms and evolutionary significance. Mol Biol Evol. 2013;30:1409–1419. doi: 10.1093/molbev/mst056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Arbeithuber B, Betancourt AJ, Ebner T, Tiemann-Boege I. Crossovers are associated with mutation and biased gene conversion at recombination hotspots. Proc Natl Acad Sci USA. 2015;112:2109–2114. doi: 10.1073/pnas.1416622112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Betrán E, Rozas J, Navarro A, Barbadilla A. The estimation of the number and the length distribution of gene conversion tracts from population DNA sequence data. Genetics. 1997;146:89–99. doi: 10.1093/genetics/146.1.89. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Hudson RR. Two-locus sampling distributions and their application. Genetics. 2001;159:1805–1817. doi: 10.1093/genetics/159.4.1805. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.McVean G, Awadalla P, Fearnhead P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002;160:1231–1241. doi: 10.1093/genetics/160.3.1231. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Jeffreys AJ, May CA. Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat Genet. 2004;36:151–156. doi: 10.1038/ng1287. [DOI] [PubMed] [Google Scholar]
- 64.Kong A, et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature. 2012;488:471–475. doi: 10.1038/nature11396. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Ségurel L, Wyman MJ, Przeworski M. Determinants of mutation rate variation in the human germline. Annu Rev Genomics Hum Genet. 2014;15:47–70. doi: 10.1146/annurev-genom-031714-125740. [DOI] [PubMed] [Google Scholar]
- 66.Teshima KM, Innan H. The effect of gene conversion on the divergence between duplicated genes. Genetics. 2004;166:1553–1560. doi: 10.1534/genetics.166.3.1553. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Jinks-Robertson S, Michelitch M, Ramcharan S. Substrate length requirements for efficient mitotic recombination in Saccharomyces cerevisiae. Mol Cell Biol. 1993;13:3937–3950. doi: 10.1128/mcb.13.7.3937. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Dumont BL. Interlocus gene conversion explains at least 2.7 % of single nucleotide variants in human segmental duplications. BMC Genomics. 2015;16:456. doi: 10.1186/s12864-015-1681-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Trombetta B, Fantini G, D’Atanasio E, Sellitto D, Cruciani F. Evidence of extensive non-allelic gene conversion among LTR elements in the human genome. Sci Rep. 2016;6:28710. doi: 10.1038/srep28710. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Scally A, et al. Insights into hominid evolution from the gorilla genome sequence. Nature. 2012;483:169–175. doi: 10.1038/nature10842. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data