Abstract
The strong reduction in the frequency of recombination in heterozygotes for an inversion and a standard gene arrangement causes the arrangements to become partially isolated genetically, resulting in sequence divergence between them and changes in the levels of neutral variability at nucleotide sites within each arrangement class. Previous theoretical studies on the effects of inversions on neutral variability have assumed either that the population is panmictic or that it is divided into 2 populations subject to divergent selection. Here, the theory is extended to a model of an arbitrary number of demes connected by migration, using a finite island model with the inversion present at the same frequency in all demes. Recursion relations for mean pairwise coalescent times are used to obtain simple approximate expressions for diversity and divergence statistics for an inversion polymorphism at equilibrium under recombination and drift, and for the approach to equilibrium following the sweep of an inversion to a stable intermediate frequency. The effects of an inversion polymorphism on patterns of linkage disequilibrium are also examined. The reduction in effective recombination rate caused by population subdivision can have significant effects on these statistics. The theoretical results are discussed in relation to population genomic data on inversion polymorphisms, with an emphasis on Drosophila melanogaster. Methods are proposed for testing whether or not inversions are close to recombination–drift equilibrium, and for estimating the rate of recombinational exchange in heterozygotes for inversions; difficulties involved in estimating the ages of inversions are also discussed.
Keywords: inversion polymorphisms, genetic diversity, sequence divergence, recombination rate, population subdivision
Introduction
Naturally occurring genetic factors that massively reduce the rate of crossing over in Drosophila melanogaster when heterozygous were discovered by A.H. Sturtevant over 100 years ago (Sturtevant 1917), who later showed them to be inversions of segments of chromosomes (Sturtevant 1926). Inversion polymorphisms were long regarded as a curiosity of species of Drosophila and other “higher” Diptera, where they can readily be detected cytologically using the polytene chromosomes of the larval salivary glands. The largest body of information concerning the properties of inversions has been accumulated in studies of numerous Drosophila species (Krimbas and Powell 1992a; Kapun and Flatt 2019). The recent application of genome sequencing technology to the natural populations of many organisms, including humans, has revealed that inversions are much more abundant than was previously thought. They are sometimes associated with striking phenotypic polymorphisms, such as the social chromosomes of ants and the behavioral and color polymorphisms of the ruff and white-throated sparrow (Wellenreuther and Bernatchez 2018; Villoutreix et al. 2021). This has led to a surge of interest in the evolutionary significance of inversion polymorphisms; however, the Drosophila studies suggest that the vast majority of polymorphic inversions have little or no effect on visible phenotypes, although effects on quantitative traits such as body size and some fitness components have been detected (Krimbas and Powell 1992a; Kapun and Flatt 2019).
It is evident that the genetic isolation of different gene arrangements due to the suppression of crossing over in heterokaryotypes must play a major role in the processes that lead to the evolution of inversion polymorphisms, although the nature of these processes is still an open research question, which may well have multiple answers (Krimbas and Powell 1992b; Wellenreuther and Bernatchez 2018; Kapun and Flatt 2019; Villoutreix et al. 2021). The present paper is concerned with the consequences of the suppression of crossing over in heterozygotes for an inversion and a standard arrangement (heterokaryotypes) for the levels of diversity within and between the 2 arrangements at neutral nucleotide sites contained inside or close to the genomic region covered by the inversion. A critical parameter in determining these genomic features is the rate of recombinational exchange between arrangements in heterokaryotypes, the “gene flux” of Navarro et al. (1997). Evidence about this rate has been provided by genetic studies of recombination in heterokaryotypes. An important mechanism by which crossing over is suppressed or reduced in frequency during Drosophila female meiosis in heterozygotes for a paracentric inversion and the standard arrangement was proposed by Sturtevant and Beadle (1936), whose genetic data suggested that single crossovers produce dicentric and acentric chromosomes that fail to be included in the egg nucleus.
Recent studies of heterozygous Drosophila inversions suggest a near-total suppression of crossing over within the regions covered by a heterozygous inversion and for a substantial distance outside it (Koury 2023; Li et al. 2023). Recombination suppression is, however, often incomplete, with gene conversion and/or double crossing over causing exchanges of alleles between arrangements in heterokaryotypes (Sturtevant and Beadle 1936; Chovnick 1973; Krimbas and Powell 1992b; Navarro et al. 1997; Crown et al. 2018; Korunes and Noor 2019; Li et al. 2023). The conversion of double-strand breaks into noncrossover-associated gene conversion events apparently plays a major role in the suppression of crossing over, as well as allowing exchange to occur via gene conversion (Gong et al. 2005; Crown et al. 2018; Li et al. 2023). But the low rate of occurrence of such gene conversion events and double crossovers (a consensus value for Drosophila is about 10⁻⁵ per base pair in female meiosis: Korunes and Noor 2019) means that different arrangements are likely to become substantially genetically differentiated from each other, as a result of the interplay between mutation, genetic drift, and recombination (Ishii and Charlesworth 1977; Navarro et al. 1997, 2000; Andolfatto et al. 2001; Guerrero et al. 2012).
Several lines of evidence, including studies of changes in inversion frequencies in experimental populations, temporal fluctuations in inversion frequencies, and clinal patterns of inversion frequencies, as well as direct measurements of fitness components, show that many inversions are maintained at intermediate frequencies by natural selection (Krimbas and Powell 1992b; Kapun and Flatt 2019; Mérot et al. 2020). An initial complete loss of variability within haplotypes carrying a newly arisen, selectively favored inversion (the hitchhiking effect of Maynard Smith and Haigh 1974), will be eroded by the occurrence of new mutations that spread as a result of genetic drift and recombinational exchange with the standard arrangement. Once an inversion subject to balancing selection has established itself at an intermediate frequency within a population, there will be a gradual approach to mutation–drift–recombination equilibrium with respect to neutral or nearly neutral variants (Navarro et al. 2000). It is therefore important to consider the effects on variability of both the approach to equilibrium and the equilibrium situation.
The equilibrium properties of an inversion polymorphism with respect to neutral variability are similar to those for neutral loci linked to a single locus subject to balancing selection, first modeled by Ohta and Kimura (1970) using diffusion equations and by Strobeck (1983) using identity probabilities. Over the following 2 decades, subsequent modeling work using the structured coalescent process shed further light on patterns of neutral variability at sites linked to loci under balancing selection or divergent local selection, e.g. Kaplan et al. (1988), Hudson and Kaplan (1988), Hudson (1990), Takahata (1990), Charlesworth et al. (1997), Nordborg (1997), and Navarro et al. (2000), reviewed by Charlesworth et al. (2003). These studies showed that, when a balanced polymorphism has been maintained for a time that is much longer than the mean coalescent time for a neutral locus, we expect to see linkage disequilibrium (LD) between the target of selection and neutral sites closely linked to the target of selection, LD among these neutral sites, sequence divergence between haplotypes carrying the different alleles at the target of selection, and enhanced variability in the population as a whole around the target of selection.
These patterns are all reflections of the same process of divergence by drift and mutation among the partially isolated populations represented by the alleles at the target of selection, with a strong analogy with the outcome of mutation and drift in a geographically structured population (Hudson 1990; Charlesworth et al. 2003). Inversions are only special because of the strong suppression of recombination in inversion heterozygotes, which extends to regions outside the inversion that are close to the inversion breakpoints (Navarro et al. 1997; Crown et al. 2018; Korunes and Noor 2019; Koury 2023; Li et al. 2023). The patterns just described are thus much more likely to be detected with inversions than with single locus polymorphisms, due to the much larger region of the genome involved.
A basic finding of the early theoretical work was that significant associations between a balanced polymorphism and variants at a neutral locus at statistical equilibrium under drift and recombination are only likely to be observed if the rate of recombinational exchange in double heterozygotes for the loci involved is of the order of the reciprocal of the effective population size, Ne, as is also the case for a pair of neutral loci (Hill and Robertson 1968; Ohta and Kimura 1969, 1971). Andolfatto et al. (2001) applied the theoretical results of Navarro et al. (2000) to the available data on D. melanogaster inversions and concluded that large effects of inversions on patterns of variability are likely to be seen only for loci close to inversion breakpoints, where exchange is probably most strongly inhibited, and that the observed patterns suggested that the inversions in question were of relatively recent origin compared with the age of the species.
Interest in the question of the effect of inversion polymorphisms on patterns of variability at sites that are either inside inversions or closely linked to inversion breakpoints has increased with the advent of whole genome sequencing, which has both greatly increased our ability to detect and characterize inversions and allows much more fine-scaled analyses of patterns of variability in genomic regions associated with inversions, e.g. Corbett-Detig and Hartl (2012), Cheng et al. (2012), Mérot et al. (2021), and Kapun et al. (2023). Recent population genomic analyses of several classic inversion polymorphisms in D. melanogaster suggest the following patterns (Corbett-Detig and Hartl 2012; Kapun et al. 2023):
There is a modest increase in overall nucleotide site diversity in the regions covered by inversions and adjacent to them, reflecting a low level of sequence divergence between arrangements relative to the genome-wide average diversity.
If genetic differentiation between inverted and standard arrangements relative to within-arrangement diversity is measured by FST-like statistics, a much stronger effect is seen, with mean between-arrangement FST values of the order of 0.1 to 0.2 in the interior of the inversion and in the regions adjacent to the inversions, with a sharp increase in regions close to the breakpoints.
For a low-frequency inversion such as In(3R)P in the Zambian population (Kapun et al. 2023), there is a lower than genome-wide average diversity at sites within the inversion and a higher than average diversity within the standard arrangement.
Much more information of this kind is likely to become available, and its interpretation requires a solid basis in population genetics theory. A number of theoretical investigations on the effect of balanced polymorphisms on variability at linked sites have extended the older work described above, without, however, greatly modifying the basic conclusions, e.g. Innan and Nordborg (2003), Guerrero et al. (2012), Rousset et al. (2014), and Zeng et al. (2021). A limitation of most of the theoretical work on the effects of selectively maintained polymorphisms on neutral diversity is that it assumes a single, randomly mating population, with the exception of Charlesworth et al. (1997), Nordborg (1997), Guerrero et al. (2012), and Rousset et al. (2014), who considered the case of divergent and/or balancing selection in a pair of populations. Nordborg and Innan (2003) examined a more general model of population structure but relied on coalescent process simulations to generate predictions.
While it is often considered that population subdivision in organisms like Drosophila is likely to have only minor effects on genetic diversity, given the generally low levels of FST among populations (Singh and Rhomberg 1987; Schaeffer 2002; Lack et al. 2015, 2016), there is evidence from studies of allelism of recessive lethals that local demes in Drosophila are somewhat restricted in size, with limited migration among them (Wright et al. 1942; Mukai and Yamaguchi 1974; Ives and Band 1986); this conclusion has recently been confirmed by a resequencing study of a single US population over time (Lange et al. 2022). It is known that population subdivision with a large number of demes increases the amount of LD among neutral loci when genomes are sampled from the same deme (Wakeley and Lessard 2003), because population subdivision increases local homozygosity, thereby reducing the effectiveness of recombination. This effect should also apply to associations between a locus under balancing selection and linked neutral sites. It is therefore important to examine the consequences of such subdivision for the effect of a diallelic balanced polymorphism on variability at a linked neutral site, and this is a major focus of the present paper.
For brevity, the locus under selection is referred to here as exhibiting an inversion polymorphism, but the results apply to any Mendelian locus with 2 alleles maintained by balancing selection. An island model of a metapopulation of large size, divided into a finite number of demes of equal size, is assumed, with the same migration rate between all pairs of demes. As shown by Wakeley and Aliacar (2001), the properties of such a model are likely to provide a good approximation to more realistic scenarios, such as a 2-dimensional stepping stone model, provided that the number of demes is large. In order to obtain simple results for equilibrium populations, it is assumed that the inversion is the derived state and that selection on the inversion is sufficiently strong that it has risen quickly to an equilibrium frequency that is constant across demes. The properties of variability at the neutral locus, LD between the neutral locus and the inversion, and the extent of divergence between karyotypes at drift–mutation–recombination equilibrium are studied first, followed by an examination of the approach to equilibrium. Here, “karyotype” is used to denote the state of a haplotype with respect to the arrangement which it carries.
Recursion relations for mean pairwise coalescent times are used to obtain simple approximate expressions for the expected diversity and divergence statistics relevant to an inversion polymorphism at equilibrium under recombination and drift, and for the approach to their equilibrium values following the sweep of an inversion to a stable intermediate frequency. The effects of an inversion polymorphism on patterns of LD are also examined. The reduction in effective recombination rate caused by population subdivision can have significant effects on these statistics, and hence on estimates of the ages of inversions. Methods are proposed for testing whether or not inversions are close to recombination–drift equilibrium, and for estimating the rate of recombinational exchange in heterozygotes for inversions; a new method for determining the variances of pairwise coalescence times is also described. It is concluded that many of the observed patterns of diversity at putatively neutral sites associated with inversion polymorphisms in D. melanogaster are consistent with their being close to mutation–recombination–drift equilibrium.
The model and its analysis
Assume that an autosomal inversion (In) is maintained at a frequency x and the standard arrangement (St) has frequency y = 1 − x. Without loss of generality, we can assume x ≤ 1/2 when considering equilibrium results; if this is not the case, then In and St can simply be interchanged. Parameters for In and St are denoted by subscripts 1 and 2, respectively. The population is assumed to be divided into local populations (demes) that are at equilibrium under mutation, genetic drift and migration at loci independent of the inversion. A Wright–Fisher model of reproduction is assumed, so that the effective population size of a deme is equal to its adult population size. An island model with a large number of demes, d, each with population size N is assumed, so that the migration effective population size (Nagylaki 1998) is NT = Nd. Migration between populations occurs at rate m per generation.
The level of equilibrium neutral differentiation between demes at autosomal loci that are independent of the inversion is measured by FST (Wright 1951), which is defined here as 1 minus the ratio of the mean coalescent time for pairs of alleles sampled from within a population to the mean coalescent time for pairs of alleles sampled randomly from the population as a whole. With a large number of demes, FST ≈ 1/(1 + 4Nm) (Charlesworth 1998).
Random mating within local populations and with respect to arrangement status is assumed. The migration and recombination parameters, m and r, are assumed to be so small that second-order terms can be neglected. At a given neutral site within the region covered by the arrangement (or just outside it), recombinational exchange between arrangements caused by gene conversion and/or double crossing over occurs at rate r per generation. In the gametes produced by a heterokaryotype, In/St, there is a probability r that a given neutral site associated with the In haplotype came from the St haplotype and that the homologous site associated with the St haplotype came from the In haplotype. It is likely that the value of r will depend on the location of the site within the inversion, with the largest values for sites within the inversion that are remote from the breakpoints, due to the effects of the breakpoints in disturbing synapsis (Navarro et al. 1997), although the extent to which gene conversion events are influenced by proximity to the breakpoints is uncertain (Li et al. 2023). Sites outside the inversion will experience increasingly high rates of recombination with distance from the breakpoints as the effect of crossover suppression dies out (Koury 2023).
Inversion-carrying and standard haplotypes are denoted by indices 1 and 2, respectively. We need to distinguish between a pair of haplotypes that are sampled from the same deme (denoted by subscript w), and a pair of haplotypes sampled from 2 different demes (denoted by subscript b). The expected coalescent time for a sample of class i/j for a within-deme sample is denoted by tijw, and the equivalent for a between-deme sample is tijb. For a between-deme sample, the probability that a migrant haplotype came from the same deme as the nonmigrant haplotype is 1/(d − 1) and the probability that it came from a different deme is (d − 2)/(d − 1). Simple recursions for the t's can be obtained, which are given by Equation (A1) of the Appendix. Further simplification is provided by neglecting the products of 1/Nx and 1/Ny with m and r (Equation A2).
It is convenient to scale the migration rate and recombination rate by 4 times the deme size, writing M = 4Nm and R = 4Nr. In addition, coalescent times can be expressed relative to the expected coalescent time for a pair of alleles sampled from the same deme at a locus that is independent of the inversion, TSn = 2NT = 2dN. Upper case T's are used to denote t's divided by TSn. As shown in the Appendix, manipulation of the resulting equations leads to simple explicit approximate expressions for the equilibrium values of the Tijb and Tijw, assuming a large number of demes and M >> R (Equations A5 and A6).
It is useful to consider the mean scaled coalescent time for pairs of alleles sampled randomly across karyotypes. For alleles from different demes, this is given by
$$T_{Tb} = x^2\,T_{11b} + 2xy\,T_{12b} + y^2\,T_{22b} \tag{1a}$$
Following Charlesworth et al. (1997), the mean scaled coalescent time for a pair of alleles sampled within karyotypes, but from different demes, is
$$T_{Sb} = x\,T_{11b} + y\,T_{22b} \tag{1b}$$
For measuring differentiation between sequences associated with alleles maintained by selection, a between-karyotype analogue of FST was defined by Charlesworth et al. (1997), which is analogous to the KST measure of Hudson, Boos et al. (1992):
$$F_{ATb} = 1 - \frac{T_{Sb}}{T_{Tb}} \tag{2a}$$
A related quantity, analogous to the < FST > statistic of Hudson, Slatkin et al. (1992), is
$$\langle F_{ATb} \rangle = 1 - \frac{T_{Sb}}{T_{12b}} \tag{2b}$$
For equilibrium under recombination and drift, for which T12 > 1, < FATb > > FATb, since T12b > TTb.
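The differentiation statistics can be computed directly from a set of scaled coalescent times. The following is a minimal sketch, assuming that the mean across karyotypes is x²T11 + 2xyT12 + y²T22, that the within-karyotype mean is xT11 + yT22, and that the two F-statistics are 1 minus the ratio of the within-karyotype mean to the across-karyotype mean or to T12, respectively. The function name and argument names are illustrative only.

```python
def karyotype_f_stats(x, t11, t12, t22):
    """Return (T_T, T_S, F_AT, <F_AT>) from scaled pairwise coalescent times.

    x   : frequency of the inversion (In); the standard arrangement has 1 - x
    t11 : scaled coalescent time for two In haplotypes
    t12 : scaled coalescent time for one In and one St haplotype
    t22 : scaled coalescent time for two St haplotypes
    All times must come from the same sampling scheme (within or between demes).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    y = 1.0 - x
    t_total = x**2 * t11 + 2.0 * x * y * t12 + y**2 * t22  # sampled without regard to karyotype
    t_within = x * t11 + y * t22                           # sampled within karyotypes
    f_at = 1.0 - t_within / t_total        # K_ST-like measure
    f_at_angle = 1.0 - t_within / t12      # <F_ST>-like measure
    return t_total, t_within, f_at, f_at_angle


# Illustrative values only: an inversion at frequency 0.5 with T11 = T22 = 1, T12 = 1.5
print(karyotype_f_stats(0.5, 1.0, 1.5, 1.0))
```

Whenever T12 exceeds the across-karyotype mean, the second statistic is the larger of the two, as stated in the text.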
It should be borne in mind that different authors use different estimators for FST or FAT; the widely used methods of Weir and Cockerham (1984) and Hudson, Slatkin et al. (1992), which are mathematically equivalent, have much larger expected values than KST when the number of populations being compared is small (Charlesworth 1998; Gammerdinger et al. 2020), so that caution needs to be used when comparing FST or FAT estimates from different studies.
Corresponding expressions apply to alleles sampled within demes, replacing subscript b by w. The Maruyama invariance principle for structured coalescent processes with conservative migration (Maruyama 1974; Nagylaki 1982) implies that at equilibrium, we have
$$T_{Sw} = x\,T_{11w} + y\,T_{22w} = 1 \tag{3}$$
This result breaks down when NTr is close to zero; with r = 0, the 2 karyotypes behave as separate populations, with migration effective population sizes of NTx and NTy, respectively, so that TSw = x² + y².
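Equation (3) implies that the equilibrium expressions for FATw and < FATw > simplify to
$$F_{ATw} = 1 - \frac{1}{T_{Tw}} \tag{4a}$$
$$\langle F_{ATw} \rangle = 1 - \frac{1}{T_{12w}} \tag{4b}$$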
For equilibrium populations, therefore, the 2 F-statistics for within-population/between-karyotype differentiation contain no more information than do TTw and T12w, respectively; we have TTw = 1/(1 − FATw) and T12w = 1/(1 − < FATw >). For applications of these formulae to population genomic data, the divergence statistics corresponding to the T's need to be scaled by an estimate of mean neutral nucleotide site diversity at sites independent of the inversion, or by the weighted mean of the 2 within-karyotype diversities, as described in the Discussion, Interpreting population genomic data on inversion polymorphisms. The latter has the advantage that potential differences in mutation rates among different genomic regions are eliminated.
When the scaled migration rate M tends to infinity and FST tends to zero, the case of a panmictic population with population size NT is approached and the subscripts w and b can be dropped. The recursion relations for the within-deme statistics can be used for this case (Equation A2), setting m to zero and the deme size to NT. Application of the above method to this case, writing ρ = 4NTr for the scaled recombination rate, and assuming that ρ > 0, gives the following expressions for equilibrium:
(5a) |
(5b) |
(5c) |
These expressions are equivalent to Equations (A3) and (A4) of Nordborg (1997), Equation (6) of Navarro et al. (2000), and Equation (8) of Zeng et al. (2021), which were derived by more complex methods. Consistent with Equation (3), TS = xT11 + yT22 = 1 at equilibrium, so that
(5d) |
(5e) |
(5f) |
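The panmictic case can also be explored numerically. The following is a minimal sketch, not the derivation used in the Appendix. It assumes that, with time measured in units of 2NT generations, a pair of In haplotypes coalesces at rate 1/x and a pair of St haplotypes at rate 1/y, and that, looking backward in time, an In lineage moves onto St at rate ρy/2 and an St lineage moves onto In at rate ρx/2 (second-order terms neglected). The function names are hypothetical. The resulting linear system for T11, T22, and T12 is solved by elimination.

```python
def solve_linear(a, b):
    """Solve the small dense system a * t = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def panmictic_equilibrium(x, rho):
    """Return (T11, T22, T12), scaled by 2*N_T, under the rates assumed above.

    x   : frequency of the inversion
    rho : scaled recombination rate, 4 * N_T * r (must be > 0)
    """
    if not 0.0 < x < 1.0 or rho <= 0.0:
        raise ValueError("need 0 < x < 1 and rho > 0")
    y = 1.0 - x
    # Each row is (total rate of leaving the state) * T - sum(rate * T_next) = 1
    a = [
        [1.0 / x + rho * y, 0.0, -rho * y],            # state In/In
        [0.0, 1.0 / y + rho * x, -rho * x],            # state St/St
        [-rho * x / 2.0, -rho * y / 2.0, rho / 2.0],   # state In/St
    ]
    t11, t22, t12 = solve_linear(a, [1.0, 1.0, 1.0])
    return t11, t22, t12


t11, t22, t12 = panmictic_equilibrium(x=0.1, rho=4.0)
print(t11, t22, t12, 0.1 * t11 + 0.9 * t22)
```

Under these assumed rates, the solution satisfies xT11 + yT22 = 1, in line with the invariance result of Equation (3), and T12 exceeds 1 by an amount proportional to 1/ρ.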
Theoretical results: equilibrium populations
General considerations
Navarro et al. (2000) have described results for the case of an equilibrium panmictic population, which is the limiting case when FST = 0. The focus here is therefore on subdivided populations, using the approximations for the equilibrium T's given by Equations (A5) and (A6), which assume large d and m >> r. When comparing the theoretical results with observations, it is useful to note that the T's under the infinite sites model of Kimura (1971) are proportional to the corresponding mean pairwise diversities or divergences per nucleotide site, taken over large numbers of neutral sites with the same mutation and recombination rates (e.g. Hudson 1990). The mean levels of nucleotide site diversity and divergence between arrangements within D. melanogaster populations are such that the infinite sites model fits the data reasonably well (Corbett-Detig and Hartl 2012; Langley et al. 2012; Kapun et al. 2023), so that the T values and their ratios presented in the figures below indicate the corresponding expected diversity and divergence values when interpreting data on inversion polymorphisms. Most multicellular organisms have similar or lower diversity values (Buffalo 2021), and so should present even less of a problem of interpretation.
Numerical results for subdivided populations
The results presented here are intended to represent a set of populations in a single geographical region that is isolated from other regions, with a within-deme neutral nucleotide site diversity value (π) similar to that found in population genomic surveys. The results from the Drosophila Genome Nexus Project, which has assembled genome sequence data for a large number of D. melanogaster genomes from natural populations (Lack et al. 2015, 2016), show that FST between pairs of populations within a region is generally low, around 0.05 or even less, and is only about 0.2 between continents (Fig. 3 of Lack et al. 2016). FST values in animals (Roux et al. 2016) and outcrossing flowering plants (Charlesworth 2003) rarely exceed 0.25, so that FST in the figures was restricted to the range 0–0.25 for purposes of illustration, consistent with the approximations used to generate the results.
Fig. 3.
The trajectories of change in the population statistics for the case of a panmictic population of size N = 10⁶, assuming that the time taken to approach the equilibrium inversion frequency is negligible compared with the coalescent time of 2N generations. The X axes display times in units of coalescent time following the sweep to equilibrium. Three different recombination rates in heterokaryotypes are shown, as well as 2 different frequencies of the inversion (0.1 in the upper panels and 0.5 in the lower panels). The dashed curves are FAT, whose values are displayed on the left-hand Y axes. The solid curves are mean coalescent times, measured relative to 2N (right-hand Y axes); red is T12, brown is TT, black is T11, and blue is T22. For the highest rate of recombination (r = 10⁻⁵), only the first N generations are shown, in order to capture the rapid changes at the start of the process. The colored bars inside the Y axes indicate the equilibrium values of the corresponding statistics, for cases when these are substantially different from the final values of the statistics.
In what follows, special attention is paid to the results with FST = 0.05, although the mean pairwise FST for inversion-free genomes between the Zambian D. melanogaster population (ZI) and 2 populations from South Africa (SP and SI) is only 0.007 (Lack et al. 2016, Fig. 3). This suggests that panmixia is a good approximation for populations in this region, which is thought to be close to the center of origin of the species (Sprengelmeyer et al. 2020). The ZI population was a focus of the intensive study of the In(3R)P inversion by Kapun et al. (2023), discussed below in relation to the theoretical results. It is important to note that, under the assumptions made here, the equations for the T's involve only the parameters M = 4Nm and ρ = 4NTr = 4Ndr; in turn, with large d, we have M ≈ (1 − FST)/FST, where FST corresponds to the KST measure of differentiation among local populations for neutral loci independent of the inversion. Under these conditions, if FST, r, and NT are specified, the results are not affected by the number of demes, the deme size (N), or the migration rate (m).
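The parameter reduction described above can be illustrated with a short sketch. It uses only the relations quoted in the text, M ≈ (1 − FST)/FST for large d and ρ = 4NTr = 4Ndr; the function names and the second (d, N) combination are hypothetical choices for illustration.

```python
import math


def scaled_migration_from_fst(fst):
    """Large-d approximation: M = 4Nm is roughly (1 - FST) / FST."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / fst


def scaled_recombination(d, n_deme, r):
    """rho = 4 * N_T * r, with N_T = N * d."""
    return 4.0 * d * n_deme * r


fst, r = 0.05, 1e-6
m_scaled = scaled_migration_from_fst(fst)  # the same for any deme size

# Two ways of dividing N_T = 10^6 into demes (the second is hypothetical)
for d, n_deme in [(200, 5000), (1000, 1000)]:
    rho = scaled_recombination(d, n_deme, r)
    m = m_scaled / (4.0 * n_deme)  # migration rate implied by FST for this deme size
    print(d, n_deme, m_scaled, rho, m)
```

Both subdivisions give the same M and ρ, and differ only in the per-generation migration rate m needed to produce the specified FST.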
Results are displayed for 4 different values of the rate of recombination (r) between a neutral site and the arrangement in In/St heterokaryotypes, which fall within the range reported for single inversions in Drosophila (Navarro et al. 1997; Korunes and Noor 2019). Given that r is expected to be highest in the central regions of inversions, and lowest near their breakpoints (Navarro et al. 1997, 2000), increased r in the figures can be interpreted as reflecting an increased distance from a breakpoint. Figure 1 plots several expected pairwise coalescent times for within- and between-deme samples against FST for the case of an inversion frequency of 0.1. Figure 2 shows the corresponding ratios of T11/T22 for within- and between-deme samples, as well as the FAT statistics. Supplementary Figs. 1 and 2 in the file Supplemental Figures give the results for an inversion frequency of 0.5.
Fig. 1.
Equilibrium expected coalescence times (relative to 2NT) for an inversion polymorphism where the inversion is maintained at a constant frequency of 0.1. Four different recombination rates in heterokaryotypes (r) are modeled, as indicated at the top left of each panel. An island model of population structure is assumed, with 200 demes and a total population size of NT = 10⁶, so that individual demes have a population size of N = 5,000. The X axis is the equilibrium FST for neutral sites unlinked to the inversion. Subscripts 1 and 2 denote alleles sampled from the inversion and standard arrangement, respectively; subscripts w and b denote alleles sampled from the same and from separate demes, respectively; subscript T denotes pairs of alleles sampled without regard to karyotype. The dashed curves represent within-deme coalescent times, and the solid curves are between-deme coalescent times; blue is T12 and T11, black is TT, blue is T11, and red is T22 (for T12, there is no significant difference between within- and between-deme values). The mean within-karyotype value for between-deme samples (TSb) is the solid beige curve; the within-deme equivalent (TSw) is equal to 1 for all r and FST values and is not displayed.
Fig. 2.
Equilibrium values of FATw (blue dashed curves), FATb (blue solid curves), T11w/T22w (black dashed curves), and T11b/T22b (black solid curves), for an inversion polymorphism where the inversion is maintained at a constant frequency of 0.1. The population and recombination parameters in Fig. 1 are used.
The mean coalescent time between In and St relative to 2NT, T12, is almost the same for within-deme pairs of alleles as for alleles from separate demes (see Equations A6d and A6e), so that only the between-deme values are shown. This property reflects the fact that, with large d, the probability that 2 alleles sampled from 2 distinct demes were derived from the same deme in the recent past (where they have an opportunity to recombine) is very small compared with the probability that alleles sampled from the same deme were derived from separate demes (and cannot recombine); given that m is assumed to be >>r, migration is the major factor affecting alleles sampled from the same deme. This causes the properties of pairs of alleles sampled from the same or different demes (other than their probabilities of coalescing) to converge; there is, of course, no immediate chance of coalescence for 2 alleles sampled from different arrangements.
T12 provides a measure of the expected net sequence divergence between a pair of alleles from the 2 different karyotypes, relative to the expected within-deme neutral diversity for loci independent of the inversion, or to the expected neutral diversity averaged over In and St haplotypes sampled randomly from the same deme. This is because these diversity measures are both equal to θ = 4NTu, where u is the mutation rate per base pair (Nagylaki 1998); see Equation (3). For u = 5 × 10⁻⁹ (Assaf et al. 2017) and NT = 10⁶, a value of T12 = 1 corresponds to a sequence divergence of T12θ/2 = 0.01 per base pair, which is approximately equal to the within-population synonymous site diversity in D. melanogaster ancestral range populations (Langley et al. 2012). As can be seen in Fig. 1, T12 is quite close to 1 for r = 10⁻⁵ or 10⁻⁴ with FST ≤ 0.05 but takes much larger values for the 2 lower recombination rates. T12 increases with FST, but only slowly at the 2 lower recombination rates, and is at most about 1.5 for the 2 higher r values.
Unless r is very low, as may be the case near inversion breakpoints (Navarro et al. 1997), the equilibrium level of sequence divergence between In and St is thus likely to be less than 50% larger than the genome-wide mean neutral diversity, corresponding to a value of 0.33 for the between-karyotype analogue of the < FST > statistic of Hudson, Slatkin et al. (1992). Equation (A6d) shows that the effect of FST on T12 is caused by the increase in divergence between alleles sampled from different karyotypes and different populations as the between-deme coalescent times increase. The effect of recombination on T12 is controlled by the reciprocal of ρ, the recombination rate scaled by 4NT, which is invariant with respect to the level of population subdivision under the assumptions used here.
A related statistic is the net mean coalescent time for a pair of alleles sampled randomly within demes without regard to karyotype (TTw), which determines the overall mean within-deme π for nucleotide sites located within the region covered by the inversion. Figure 1 and Supplementary Fig. 1 show that TTw is much less sensitive to FST than T12 but increases slightly with increasing FST, especially at high recombination rates, reflecting the fact that it is heavily influenced by the within-karyotype coalescent times T11w and T22w, which either change in opposite directions as FST increases (for x = 0.1) or remain constant (for x = 0.5). This also means that TTw < T12, even for an intermediate frequency inversion, so that the corresponding diversity statistic provides a less useful index of between-karyotype differentiation than the between-karyotype divergence. The corresponding statistic for alleles sampled from different demes (TTb) has similar properties but is always somewhat larger in magnitude, especially at high FST values, due to the inflation of between-deme coalescent times with restricted migration. Unless r is very low, an inversion polymorphism at recombination–drift equilibrium is not expected to have a large effect on the overall level of sequence diversity in the population unless there is a high degree of population subdivision, consistent with the results for chromosome arm 3R of D. melanogaster shown in Figure 4 of Corbett-Detig and Hartl (2012).
When the inversion is present at a low frequency, as in Figs. 1 and 2, the mean coalescent times for pairs of alleles sampled within the inversion (T11) are smaller than those for alleles sampled within St (T22), due to the lower effective population size of carriers of In; this also applies to the corresponding π values (Nordborg 1997; Navarro et al. 2000). The converse is true if In is more frequent than St. This difference disappears when In and St have equal frequencies, since their effective sizes are necessarily equal, and is smaller, the smaller the difference in frequencies. Figure 2 shows that the ratio T11w/T22w decreases considerably as FST increases when r is relatively large, whereas T11b/T22b increases slightly. Conversely, T11b/T22b increases considerably with FST when r is small, but T11w/T22w hardly changes. T11w/T22w is, however, always greater than the value of 0.11 expected from the relative frequencies of In and St, reflecting the effect of recombination in reducing allele frequency differences; consistent with this property, both T11w/T22w and T11b/T22b increase with r.
All other things being equal, therefore, the departure of estimates of T11w/T22w from the ratio of the frequencies of In and St provides an inverse measure of the extent of differentiation between karyotypes when the frequency ratio departs considerably from unity. The behavior of T11w/T22w as a function of r reflects the fact that, for a low-frequency inversion, T11w increases considerably with increasing r for a given FST whereas T22w decreases slightly, as seen in Fig. 1. This complementary behavior arises from the fact that xT11w + yT22w = 1 at equilibrium, by Maruyama's invariance principle (Equation 3). An increase in FST results in a reduced effective rate of recombination for within-deme samples, due to increased homozygosity (Wakeley and Lessard 2003), so that T11w decreases with FST, whereas T22w increases. T11b and T22b both increase with FST, especially with high r values, reflecting the effects of reduced migration rates on coalescence times; population subdivision of the kind modeled here does not affect the effective recombination rate for between-deme samples (Wakeley and Lessard 2003).
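The invariance relation xT11w + yT22w = 1 implies that, at equilibrium, the ratio T11w/T22w and the inversion frequency x together determine both scaled coalescent times. The following Python sketch illustrates this arithmetic; the function name and the example ratio of 0.8 are illustrative assumptions rather than values taken from the analysis.

```python
def within_karyotype_times(ratio, x):
    """Solve x*T11w + y*T22w = 1 for (T11w, T22w), given ratio = T11w/T22w.

    ratio : assumed value of T11w/T22w (illustrative input)
    x     : frequency of the inversion (y = 1 - x is the frequency of St)
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    y = 1.0 - x
    # Substitute T11w = ratio * T22w into x*T11w + y*T22w = 1.
    t22w = 1.0 / (x * ratio + y)
    t11w = ratio * t22w
    return t11w, t22w


# Illustrative example: a low-frequency inversion (x = 0.1) with ratio 0.8.
t11w, t22w = within_karyotype_times(0.8, 0.1)
print(f"T11w = {t11w:.3f}, T22w = {t22w:.3f}")
```

With a rare inversion, T22w stays close to 1 whatever the ratio, so most of the variation in the ratio is carried by T11w.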
The between-karyotype analogues of FST, namely FAT and the corresponding analogue of the < FST > statistic of Hudson, Slatkin et al. (1992) (see Equations 4), are often used as measures of the extent of genetic differentiation among karyotypes, rather than the absolute divergence measures just discussed. As noted by Charlesworth et al. (1997) in the context of a single-locus balanced polymorphism (see also Zeng et al. 2021), FAT is equivalent to the measure of LD of Ohta and Kimura (1971), treating the neutral site and karyotype as a pair of loci, but it is considerably less tedious to estimate from population genomic data. Its properties thus differ somewhat from those of T12 or TTw, especially as it is heavily influenced by the within-karyotype coalescent time in the numerator of Equation (4a) (Charlesworth 1998). Figure 2 and Supplementary Fig. 2 show that FATw and FATb are only weakly affected by the extent of population subdivision at the lower 2 recombination rates, with FATw increasing slightly with FST whereas FATb decreases; FATw is always larger than FATb, especially when FST and r are large, and is thus more useful as a measure of between-karyotype differentiation. This difference reflects the effect of population subdivision on recombination in within-deme samples and lack of such an effect for between-deme samples, described above in connection with the ratio T11w/T22w. Both statistics are highly sensitive to r, with small values at the 2 higher r values, especially for FATb. For equilibrium populations, the analogue of < FST > is given by 1 − 1/T12 (Equations 4b and 5f), so that it behaves in a similar fashion to T12 as a function of r and FST and is therefore not shown here.
A moderately high level of population subdivision in an equilibrium system thus makes it much easier to detect between-karyotype differentiation at low recombination rates, as measured by FATw, the between-karyotype analogue of the < FST > statistic of Hudson, Slatkin et al. (1992), or T12, but makes FATb a less useful measure. Comparisons of the mean between-karyotype divergence (either within or between demes) with the mean within-deme and within-karyotype diversity for samples at loci independent of the inversion (corresponding to a TSn of 1), or with the mean within-deme and within-karyotype diversity (corresponding to a TSw of 1), are probably the most useful measures of the extent of between-karyotype divergence, if equilibrium can be assumed.
It is of some interest to compare the values of FAT for a polymorphism maintained at constant frequencies with the value of σd² (the LD measure of Ohta and Kimura 1971) for a pair of neutral loci at statistical equilibrium under recombination and drift in a subdivided population. Table 1 shows results for the within-deme measure of FAT compared with σd² for within-deme samples calculated from the equations in the Appendix to Wakeley and Lessard (2003). For the intermediate and highest recombination rates (ρ = 4 and ρ = 40), FAT for x = 0.5 is very close to σd², whereas FAT for x = 0.1 is substantially smaller, except for the panmictic case (FST = 0); all 3 variables increase considerably with increasing FST, reflecting the reduced effective recombination rate when there is extensive population subdivision (Wakeley and Lessard 2003). In contrast, for the lowest recombination rate (ρ = 0.4), FAT for x = 0.1 is close to σd² and there is only a small increase in the 3 measures with FST. For a pair of neutral loci, σd² for a sample of haplotypes taken from separate demes (which is equivalent to random sampling from the whole population with large d) is independent of the extent of subdivision with the model used here (Wakeley and Lessard 2003), in contrast to the behavior of FATb shown in Figs. 1 and 2. The results for neutral loci thus shed only limited light on what is to be expected for an inversion polymorphism maintained at a constant frequency. The discrepancy between the statistics for the neutral case and the case with a constant inversion frequency could in principle be used as a test for selection, although this would require accurate knowledge of r.
Table 1.
Comparisons of within-deme FAT for a balanced polymorphism vs σd² for a pair of neutral loci.
FST | FAT (x = 0.1), ρ = 0.4 | FAT (x = 0.5), ρ = 0.4 | σd², ρ = 0.4 | FAT (x = 0.1), ρ = 4 | FAT (x = 0.5), ρ = 4 | σd², ρ = 4 | FAT (x = 0.1), ρ = 40 | FAT (x = 0.5), ρ = 40 | σd², ρ = 40 |
---|---|---|---|---|---|---|---|---|---|
0 | 0.489 | 0.714 | 0.380 | 0.117 | 0.200 | 0.156 | 0.021 | 0.024 | 0.023 |
0.05 | 0.492 | 0.718 | 0.419 | 0.130 | 0.232 | 0.200 | 0.051 | 0.072 | 0.070 |
0.10 | 0.496 | 0.723 | 0.454 | 0.143 | 0.265 | 0.241 | 0.073 | 0.120 | 0.114 |
0.15 | 0.500 | 0.728 | 0.488 | 0.157 | 0.299 | 0.281 | 0.092 | 0.167 | 0.156 |
0.20 | 0.504 | 0.733 | 0.521 | 0.171 | 0.333 | 0.320 | 0.110 | 0.216 | 0.197 |
0.25 | 0.509 | 0.739 | 0.552 | 0.186 | 0.368 | 0.358 | 0.128 | 0.263 | 0.238 |
The scaled recombination rates of ρ = 0.4, 4, and 40 correspond to r = 10–7, 10–6, and 10–5, respectively, when NT = 10^6, as assumed in the figures.
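The correspondence between r and ρ = 4NTr can be checked with a few lines of Python. As a further sketch, the block also evaluates the standard panmictic equilibrium expression of Ohta and Kimura (1971) for their LD measure at a pair of neutral loci, (10 + ρ)/(22 + 13ρ + ρ²); this formula is not stated in the text above and is included here on the assumption that it is the appropriate comparison for the FST = 0 row of Table 1. The function names are illustrative.

```python
def scaled_recombination(n_total, r):
    """Return rho = 4 * NT * r."""
    return 4.0 * n_total * r


def neutral_ld_panmictic(rho):
    """Ohta and Kimura (1971) equilibrium LD measure for 2 neutral loci
    in a single randomly mating population (no subdivision)."""
    return (10.0 + rho) / (22.0 + 13.0 * rho + rho * rho)


N_TOTAL = 1e6  # total population size assumed in the figures
for r in (1e-7, 1e-6, 1e-5):
    rho = scaled_recombination(N_TOTAL, r)
    ld = neutral_ld_panmictic(rho)
    print(f"r = {r:.0e}: rho = {rho:g}, neutral LD = {ld:.3f}")
```

To 3 decimal places, these values agree with the neutral-locus entries in the FST = 0 row of Table 1 (0.380, 0.156, and 0.023); the rows with FST > 0 require the subdivided-population equations of Wakeley and Lessard (2003) and are not reproduced by this sketch.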
As expected from Equation (5a), a comparison of Figs. 1 and 2 with Supplementary Figs. 1 and 2 shows that T12 is not greatly affected by the inversion frequency; in contrast, FATw and FATb are larger with equal frequencies of the 2 karyotypes than when the inversion is either rare or very common. This reflects the smaller TT values with extreme inversion frequencies, as can be seen from Equation (5d) for the panmictic case, where TT is approximately 1 + 2ρ−1 for x = 0.5 but approaches 1 + x(4ρ−1 + 1) as x tends to 0. This is another reason for using between-karyotype divergence relative to mean within-deme diversity as a measure of differentiation between karyotypes.
Theoretical results: approach to equilibrium
General considerations
The above results assume both that the inversion has reached its equilibrium frequency and that there has been sufficient time for the effects of coalescence and recombination to equilibrate. In reality, as was assumed in early hitchhiking models of associations between electrophoretic variants and inversions (Ishii and Charlesworth 1977), a polymorphic inversion is likely to have arisen on a single unique haplotype drawn at random from the initial population. Having survived early stochastic loss, it will have approached its equilibrium frequency over a time that is inversely related to the strength of selection acting on it. Molecular characterizations of inversion breakpoints in D. melanogaster and Drosophila subobscura strongly support the unique origin hypothesis (Corbett-Detig and Hartl 2012; Orengo et al. 2019; Kapun et al. 2023). After this selective equilibrium has been approached, there will be another extended period during which mutation–recombination–drift equilibrium is approached. In the case of a panmictic population, both of these episodes can be included in the same model, using phase-type theory (Zeng et al. 2021).
However, the population genetics of the spread of a new mutation in a subdivided population, and of its hitchhiking effects on linked neutral sites, is much more complex (Barton et al. 2013), and the relevant theory has not been applied to the case of balancing selection. In the present treatment, the initial approach to the equilibrium inversion frequency is assumed to occur effectively instantaneously, so that only the second phase of the approach to equilibrium is studied. Clearly, this is likely to cause the time taken to approach equilibrium and the effects of recombination during this period to be underestimated, due to the additional time needed for a mutation to spread through a subdivided population compared with panmixia, even if the habitat is 2-dimensional rather than 1-dimensional (Barton et al. 2013).
By using this simplifying assumption, we can set T11w = T11b = 0 (since the inversion initially lacks variability), T12w = T22w = 1, and T12b = T22b, with T22b ≈ 1/(1 − FST) when d is large. Using these as initial values, the change per generation in the deviations of the T's from their equilibrium values can be calculated by means of the matrix iteration xn = Axn−1, where xn is the column vector of deviations of the T's in generation n from their equilibrium values and the matrix A is defined at the end of section 1 of Supplementary File 1. This approach breaks down in the absence of recombination, since in this case, the divergence between In and St increases without limit as time increases; a separate treatment of this case is given in sections 6 and 7 of Supplementary File 1.
In order to speed up calculations, a relatively small total population size (10^5 for the case of a subdivided population and 10^4 for the equivalent panmictic population) was assumed, with rescaling of parameters to keep their products with NT identical with those used for the equilibrium results. Insight into the asymptotic rate of approach to equilibrium when d is large can be obtained from the eigenvalues of the A matrix. As shown in section 3 of Supplementary File 1, if second-order terms in 1/N, r, and m are neglected, the structure of A is such that its 6 eigenvalues (3 in the panmictic case) are each approximately equal to one of its diagonal elements, i.e. to 1 − 2m − 2ry − 1/(2Nx), 1 − 2m/d − 2ry, 1 − 2m − 2rx − 1/(2Ny), 1 − 2m/d − 2rx, 1 − 2m − r, and 1 − 2m/d − r. The asymptotic rate of approach to equilibrium is controlled by the largest of these quantities; which of the 6 is the largest is determined by the relative values of Nx, Ny, rx, ry, and m. Since m >> r and d >> 1 with the approximations used here, either 1 − 2m/d − 2ry or 1 − 2m/d − 2rx will be the largest eigenvalue, showing that the asymptotic rate of approach to equilibrium is largely controlled by the product of 2r and the smaller of the 2 frequencies x and y when d is very large. In the case of a panmictic population, the system reduces to a 3-dimensional one with eigenvalues approximately equal to 1 − 2ry − 1/(2Nx), 1 − 2rx − 1/(2Ny), and 1 − r, so that a similar conclusion applies.
The timescale of the approach of the coalescence times to equilibrium after the inversion has approached its equilibrium frequency is thus of the order of the inverse of rx (if x ≤ 0.5) or ry (if x > 0.5), unless r is very close to zero (see sections 6 and 7 of Supplementary File 1 for this case). It is nearly independent of population structure with large d and m >> r. The full solution for xn in generation n in terms of the representation of A^n by the eigenvalues and eigenvectors of A is given in section 3 of Supplementary File 1. In practice, however, it is simpler to iterate the basic matrix recursion; it takes only a few seconds to iterate several million generations on a laptop computer.
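The statement about the largest eigenvalue can be illustrated numerically. The Python sketch below evaluates the 6 approximate eigenvalues listed above and reports which is the largest; it does not construct the full A matrix of Supplementary File 1. The parameter values (200 demes of size 500, 4Nm = 19, and r rescaled so that 4NTr = 4) are assumptions chosen to resemble the rescaled settings described in the text, and the function names are illustrative.

```python
def approx_eigenvalues(N, d, m, r, x):
    """Approximate eigenvalues of A, keyed by the expressions in the text."""
    y = 1.0 - x
    return {
        "1 - 2m - 2ry - 1/(2Nx)": 1 - 2 * m - 2 * r * y - 1 / (2 * N * x),
        "1 - 2m/d - 2ry": 1 - 2 * m / d - 2 * r * y,
        "1 - 2m - 2rx - 1/(2Ny)": 1 - 2 * m - 2 * r * x - 1 / (2 * N * y),
        "1 - 2m/d - 2rx": 1 - 2 * m / d - 2 * r * x,
        "1 - 2m - r": 1 - 2 * m - r,
        "1 - 2m/d - r": 1 - 2 * m / d - r,
    }


def leading_eigenvalue(N, d, m, r, x):
    """Return (expression, value) for the largest approximate eigenvalue."""
    eig = approx_eigenvalues(N, d, m, r, x)
    name = max(eig, key=eig.get)
    return name, eig[name]


# Assumed illustrative parameters (not the authors' script).
N, d = 500, 200          # deme size and number of demes, so NT = N*d = 10^5
m = 19 / (4 * N)         # migration rate giving 4Nm = 19
r = 4 / (4 * N * d)      # recombination rate giving 4*NT*r = 4
for x in (0.1, 0.9):
    name, lam = leading_eigenvalue(N, d, m, r, x)
    print(f"x = {x}: largest eigenvalue is {name}; 1 - lambda = {1 - lam:.3e}")
```

For x = 0.1 the leading term involves 2rx, and for x = 0.9 it involves 2ry, in line with the rule that the smaller of the 2 frequencies matters.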
It is also of interest to examine the initial rates of change of the T's using the starting point of an instantaneous sweep to equilibrium described above. The corresponding initial values of FATw and FATb are then both equal to 1 − y/(2xy + y^2) = x/(1 + x); the initial within- and between-deme values of the analogue of the < FST > statistic of Hudson, Slatkin et al. (1992) are both equal to x. The initial FAT statistics can thus be substantially different from zero despite the absence of any divergence between In and St, due to the assumed lack of variability within inverted chromosomes. This is seen in other systems such as X–Y comparisons when a newly evolved Y chromosome lacks variability (Bergero et al. 2019; Toups et al. 2019; Gammerdinger et al. 2020).
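These initial values can be verified directly. The Python sketch below assumes that FAT has the form 1 − TS/TT, where TS = xT11 + yT22 is the frequency-weighted within-karyotype mean and TT = x²T11 + 2xyT12 + y²T22 is the mean for a random pair of alleles, and that the analogue of < FST > is 1 − TS/T12; these definitions are a reading of Equations (4), which are not reproduced in this section, and the function names are illustrative. The starting point is no variability within In (T11 = 0), with T22 = T12 = 1.

```python
def f_at(t11, t22, t12, x):
    """FAT = 1 - TS/TT for inversion frequency x (assumed form of Equation 4a)."""
    y = 1.0 - x
    t_s = x * t11 + y * t22                            # weighted within-karyotype mean
    t_t = x * x * t11 + 2 * x * y * t12 + y * y * t22  # random pair of alleles
    return 1.0 - t_s / t_t


def hudson_analogue(t11, t22, t12, x):
    """1 - TS/T12 (assumed form of Equation 4b)."""
    y = 1.0 - x
    return 1.0 - (x * t11 + y * t22) / t12


# Immediately after an instantaneous sweep: T11 = 0, T22 = T12 = 1.
for x in (0.1, 0.5):
    print(f"x = {x}: FAT = {f_at(0.0, 1.0, 1.0, x):.4f} "
          f"(x/(1 + x) = {x / (1 + x):.4f}), "
          f"analogue of <FST> = {hudson_analogue(0.0, 1.0, 1.0, x):.4f}")
```

Both statistics are positive at this point even though T12 equals the within-St value, which is the effect described in the text.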
Normalizing Equation (A2) by dividing each term by 2NT, neglecting second-order terms in the products of the deviations of the T's from their equilibrium values with m, r, and 1/N, and exploiting the fact that 4Nm ≈ (1 − FST)/FST in order to simplify Equations (A6c–A6e), the following simple relations hold for the initial changes per generation in the T's:
(6a) |
(6b) |
(6c) |
(6d) |
(6e ) |
(6f) |
Using the results in section 4 of Supplementary File 1, we also have the following:
(7a) |
(7b) |
These results require rx and ry to both exceed 1/(2NT) and hence are invalid for very low rates of recombination, as in the case of ρ = 0.4 shown in Figs. 3–5 below.
Fig. 5.
This is the same as Fig. 4, except that FST = 0.15 (4Nm = 5.67).
Equations (6a), (6c), (6e), and (7a) can be applied to the panmictic case, by setting the right-hand terms involving m and 1/2N (but not 1/2Ny) to zero, and equating N and NT.
Results for a single randomly mating population
The results for a single, randomly mating population of size N = NT are considered first, assuming an instantaneous sweep of a new inversion to its equilibrium frequency. This case was previously studied by Navarro et al. (2000) using coalescent simulations. Some representative results are shown in Fig. 3 for 3 different recombination rates adjusted to the values for N = 10^6, i.e. scaled recombination rates of ρ = 4Nr = 0.4, 4, and 40, and 2 different equilibrium frequencies, x = 0.1 and x = 0.5. Time is measured relative to 2N = 2NT, and the results are thus invariant with respect to N for constant ρ (unless N is sufficiently small that the assumptions of the coalescent process are violated). The accuracy of the approximations used to generate these results was checked by coalescent simulations and found to be excellent (see Supplementary Table 1). Further details of the results based on the recursion relations are given in Supplementary Table 3.
As shown above, the initial value of FAT after an instantaneous sweep of the inversion to its equilibrium frequency is simply x/(1 + x). FAT does not necessarily change monotonically over time, as can be seen for the case of x = 0.1 with r = 10–6, where it first decreases and then increases. With the highest recombination rate, the final direction of change of FAT can be opposite to that for T12 because its initial value is greater than its equilibrium value, so that FAT declines over time, reflecting the negative term in r in Equation (7a). In contrast, T12 increases on the approach to equilibrium, after an initial decline when r is large, as predicted by Equation (6e) (see the results for x = 0.5 and r = 10–5).
As expected intuitively, and as is consistent with Equation (6a), the mean coalescent time for the inversion (T11) always increases with time towards its equilibrium value, reflecting the effect of recombination in causing it to share ancestry with the standard sequence and the fact that the inversion has increased in number from a completely bottlenecked haploid population size of 1. Even with the lowest frequency of recombination (ρ = 0.4), there is a relatively fast approach of T11 towards its equilibrium value compared with FAT and T12, over a timescale that is much less than the neutral coalescent time. Except for the lowest recombination rate, relatively large values of FAT and T12 (when compared with the values for the higher 2 recombination rates) are approached over this timescale, reflecting their large equilibrium values. With the lowest recombination rate, however, there is a long period of time when FAT and T12 are both much lower than their equilibrium values. Consistent with Equation (6c), T22 decreases initially for all 3 recombination rates, although very slowly if Ny is close to N. In some cases (e.g. with x = 0.5 and r = 10–5), FAT changes nonmonotonically, with an initial decrease followed by an increase. This reflects the fact that, for these parameter values, Equation (7a) predicts an initial decrease in FAT, whereas its final value is greater than its initial value.
A simple analytical solution to the trajectories of the mean coalescent times for the case of no recombination can also be derived (see section 6 of Supplementary File 1). In this case, T12 increases linearly with time on the coalescent timescale (Equation S5a), whereas T11 and T22 experience an exponential decay of their deviations from their respective equilibrium values of x and y, with rate constants x and y. In the case of an instantaneous sweep of In to its equilibrium frequency, this implies that T11 always increases over time and T22 always decreases. For sufficiently large T, these expressions imply that FAT increases monotonically towards 1, due to the fact that T12 increases without bound in the absence of recombination. Approximations to these expressions for small T show that FAT always increases initially (see Equations (S5e)–(S5i) in Supplementary File 1).
Results for a subdivided population
The numerical values of the population statistics were generated by using scaled recombination and migration parameters that matched those used for Figs. 1 and 2. Figs. 4 and 5 show the results for 2 different values of FST at neutral sites independent of the inversion, corresponding to scaled migration rates of 4Nm = 19 and 5.67, respectively. Further details are given in Supplementary Tables 4 and 5.
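The scaled migration rates quoted here follow from the approximation 4Nm ≈ (1 − FST)/FST used earlier for the island model with many demes. A short Python sketch of the conversion in both directions is given below; the function names are illustrative, and the deme size of 500 used to recover m is an assumption taken from the figure settings.

```python
def scaled_migration_from_fst(fst):
    """Return 4Nm from FST using 4Nm = (1 - FST)/FST (large number of demes)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / fst


def fst_from_scaled_migration(four_nm):
    """Inverse relation: FST = 1/(1 + 4Nm)."""
    return 1.0 / (1.0 + four_nm)


N_DEME = 500  # assumed deme size
for fst in (0.05, 0.15):
    four_nm = scaled_migration_from_fst(fst)
    m = four_nm / (4 * N_DEME)  # per-generation migration rate
    print(f"FST = {fst}: 4Nm = {four_nm:.2f}, m = {m:.5f}")
```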
Fig. 4.
The trajectories of change in the population statistics for the case of a subdivided (island model) population of total size NT = 10^6, with 200 demes of size N = 500 and an FST = 0.05 (4Nm = 19) for neutral sites independent of the inversion. It is assumed that the time taken to approach the equilibrium inversion frequency is negligible compared with the coalescent time of 2NT generations. The X axes display times in units of coalescent time following the sweep to equilibrium. Three different recombination rates in heterokaryotypes are shown, as well as 2 different frequencies of the inversion (0.1 in the upper panels and 0.5 in the lower panels). The solid curves represent within-population statistics, and the dashed curves are between-population statistics. The values of FATw and FATb (brown curves) are given by the left-hand Y axes. The other curves are mean coalescent times, measured relative to 2NT; red is T12w (T12b behaves almost identically, except for its higher initial value and slower rate of increase when the time since the sweep is <0.005); black is T11 and blue is T22 (right-hand Y axes). For the highest rate of recombination (r = 10–5), only the first NT generations are shown. The colored bars inside the Y axes indicate the equilibrium values of the corresponding statistics, for cases when these are substantially different from the final values of the statistics.
The results are broadly similar to those for the panmictic case described above. The most notable difference is that there is a short initial period with rapid increases in T12w and FATw. For FATw, this period is followed by a monotonic decrease if the equilibrium value of FATw is lower than the maximum values it achieves, or (in the case of x = 0.1 and r = 10–6) a decrease followed by an increase, when its equilibrium value exceeds its minimum value. In contrast, T12b and FATb behave initially much like T12 and FAT in the panmictic case. For the case of r = 0 and a large number of demes (d), Equations (S9c) and (S9d) with very small dMT show that T12w ≈ 1 + dT, whereas T12b ≈ (1 + M)/M + T.
The numerical results used in Figs. 4 and 5 show, however, that T12w and T12b quickly converge and have nearly identical trajectories after scaled time T = 0.005 (about 10,000 generations with the parameters used here). It is easily shown from Equations (A2e) and (A2f) for the case of no recombination and an instantaneous sweep of the inversion that
(8) |
The 2 measures of T12 thus converge rapidly when r = 0; since it is assumed here that migration is a more powerful force than recombination, this is true more generally.
Similarly, Equation (S20e) shows that the coefficient of T in the numerator of the expression for FATw with no recombination involves dx, so that FATw also increases very fast initially when the number of demes is large, especially if x is close to 1. In contrast, there is no contribution from d to the expression for FATb (Equation S20f); FATb starts, however, from a higher initial value than FATw because of population subdivision.
When the recombination rate is sufficiently high, Equations (6) and (7) for the initial rates of change in the population statistics can be applied. ΔT12w in Equation (6e) involves the sum of 1/(2NT) and 1/(2N), whereas the term in 1/(2N) is absent from the corresponding expressions for ΔT12b as well as ΔT12 in the panmictic case. Since 1/(2N) is larger than 1/(2NT) by a factor of d, this term has a large effect on the initial rate of change of T12w. Similarly, the expression for ΔFATw (Equation 7a) involves the positive term (1 – 1/2d)/Ny, whereas the corresponding expression for ΔFATb involves (1 – FST)/(2Nd), which is negative. These results show how it is possible for the 2 measures to change initially in opposite directions.
Discussion
General considerations
The results described here are broadly consistent with previous theory on the patterns of diversity associated with balanced polymorphisms (Hudson and Kaplan 1988; Hudson 1990; Takahata 1990; Charlesworth et al. 1997; Nordborg 1997; Navarro et al. 2000; Charlesworth et al. 2003; Innan and Nordborg 2003; Nordborg and Innan 2003; Guerrero et al. 2012; Kirkpatrick and Guerrero 2014; Zeng et al. 2021) but extend it in several ways. From the purely technical point of view, the recursion relations for mean pairwise coalescence times in a structured population (Nagylaki 1998) provide a simple and computationally efficient method for calculating the expected values of population statistics and their trajectories, on the assumption that the alleles at the target of selection are maintained at constant frequencies. The only previous application of this method to balanced polymorphisms appears to have been that by Zeng et al. (2021) for the case of a single population.
Here, this method has been extended to an autosomal balanced polymorphism in a subdivided population, for the simple case of a finite island model of d demes of equal size N, under a Wright–Fisher model of drift for which the effective population size of a single deme equals N. The “migration effective population size” (Nagylaki 1998), which determines the mean coalescent time for a pair of alleles drawn from the same deme at a locus independent of the balanced polymorphism, is then given by NT = Nd, provided that Nm is not very close to 0. This result applies more widely to all conservative migration models, where the mean allele frequency across demes is not changed by migration (Nagylaki 1982, 1998). For more general drift models, N can be replaced by Ne, the effective population size of a deme, given certain restrictions (Charlesworth and Charlesworth 2010, p.327). At recombination–drift equilibrium, the mean coalescent time for this sampling scheme is also equal to the mean within-karyotype coalescent time, if coalescent times for the 2 arrangements are weighted by the arrangement frequencies (Equation 3) and NTr is not very close to 0. For this reason, the various coalescent times used here have been expressed relative to 2NT.
For simplicity, the rest of the discussion will refer only to 2 arrangements with respect to an inversion polymorphism, but identical results apply to other types of diallelic polymorphisms. In principle, it is possible to extend this approach to more general situations, such as polymorphisms for multiple different arrangements, sex chromosomes, and nonrandom mating populations, as well as changes in population size. Its main drawback is that the results are limited to expected pairwise coalescent times and do not provide information on features such as the site frequency spectra of neutral sites linked to the target of balancing selection. For these properties, either coalescent simulations or more complex analytical approaches, such as the phase-type theory of Zeng et al. (2021), are needed. An extension of Nagylaki’s (1998) recursion equation approach can be used for obtaining the variances and higher moments of pairwise coalescent times, as described in section 8 of Supplementary File 1. The statistical properties of mean pairwise diversity and divergence measures over large numbers of nucleotide sites are discussed in the light of the variances obtained in this way in section 9 of Supplementary File 1.
Interpreting population genomic data on inversion polymorphisms
The results on equilibrium-expected coalescent times for pairs of alleles (Figs. 1 and 2 and Supplementary Figs. 1 and 2) suggest that the scaled expected coalescent time T12 for a pair of alleles sampled from the 2 different arrangements (In and St) provides the most meaningful basis for interpreting their level of sequence divergence. When the number of demes is large, Equation (A6e) shows that there is essentially no difference in equilibrium T12 between alleles sampled from the same deme versus alleles from 2 different demes, at least when FST at neutral sites independent of the inversion is moderate, for the reasons described in the section Numerical results for subdivided populations.
The numerical results on the time courses of the Tij's with population subdivision (Supplementary Tables 4 and 5) show that the lack of dependence of T12 on the nature of the sample (within or between demes) holds very soon after the equilibrium frequency of the inversion has been reached. In addition, except for the lowest scaled recombination rate considered here (ρ = 0.4), the ratio of T12 to the weighted mean scaled coalescent time within karyotypes and within demes (TSw) approaches its equilibrium value quite fast, even when both variables deviate considerably from their equilibrium values. For example, with panmixia, x = 0.1, and ρ = 4, the equilibrium value of T12/TS is 1.5 (TS = 1); at time T = 0.050, T12 = 1.039, TS = 0.916, and T12/TS = 1.14; at T = 0.5, T12 = 1.250, TS = 0.926, and T12/TS = 1.37. With ρ = 40, the equilibrium T12/TS is 1.05; at T = 0.050, T12/TS = 1.039; by T = 0.1, T12/TS = 1.04. It is, however, not always the case that T12/TS is initially lower than its equilibrium value; for x = 0.5 and ρ = 4, its initial value is 2 compared with 1.73 at T = 0.1 and a final value of 1.55.
These findings are important for the interpretation of population genomic data for several reasons. First, if there is evidence of significant population subdivision at neutral loci independent of the inversion, the measure of sequence divergence between In and St, estimated by Nei's dxy (Nei 1987), is best obtained from the mean of within-population samples, since its expectation is equal to 2NTT12u and 4NTu is equal to the expected within-population neutral π in the absence of an inversion polymorphism. Second, T12/TSw (which is equal to T12 for an equilibrium population) can be estimated by dividing an estimate of dxy either by an estimate of the mean within-karyotype, within-population diversity (πSw), or by an estimate of the corresponding mean π at independent neutral sites (πnw). This is because the expectations of both these statistics are equal to 4NTu under the infinite sites model of mutation (for the same mutation rate), unless there is a very low frequency of recombination or the inversion has arisen very recently. To control for possible differences in mutation rate, it is preferable to use the ratio of an estimate of dxy to an estimate of πSw in order to obtain T12. As discussed above in the section The model and its analysis, the theoretical value of the commonly used statistic obtained by applying the < FST > statistic of Hudson, Slatkin et al. (1992) to In versus St haplotypes is equivalent to 1 − TSw/T12. However, reciprocals have undesirable statistical properties if means are to be taken over sets of sequences; it is thus preferable to estimate T12 directly.
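The estimation procedure outlined above can be sketched in Python as follows. The input values are hypothetical, and taking the ratio of the mean dxy to the mean πSw across genomic windows is only one simple way of combining windows; it is an assumption made for illustration, not a procedure specified in the text. The function names are likewise illustrative.

```python
from statistics import fmean


def estimate_t12(dxy_values, pi_sw_values):
    """Estimate T12 as mean dxy divided by mean within-karyotype,
    within-population diversity (ratio of means across windows)."""
    return fmean(dxy_values) / fmean(pi_sw_values)


def hudson_analogue_from_t12(t12_hat):
    """Statistic equivalent to 1 - TSw/T12 when TSw = 1."""
    return 1.0 - 1.0 / t12_hat


# Hypothetical per-window estimates (per base pair), for illustration only.
dxy = [0.012, 0.015, 0.018]      # In versus St divergence
pi_sw = [0.010, 0.010, 0.010]    # mean within-karyotype, within-population diversity

t12_hat = estimate_t12(dxy, pi_sw)
print(f"estimated T12 = {t12_hat:.2f}")
print(f"1 - 1/T12 = {hudson_analogue_from_t12(t12_hat):.2f}")
```

Working with the ratio dxy/πSw keeps the estimate on the T12 scale and avoids averaging reciprocals over windows.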
This raises the question of whether recombinational exchange in heterokaryotypes occurs at a sufficient rate that the condition TSw = 1 is likely to hold. At least for simple inversions (with only a single pair of breakpoints), there is ample experimental evidence for recombination between In and St at sites within and adjacent to the inversion in species of Drosophila, much of which appears to be caused by gene conversion rather than double crossovers (Chovnick 1973; Korunes and Noor 2019; Koury 2023; Li et al. 2023), with r = 10–5 per base pair per generation in female meiosis being a commonly accepted typical rate for central regions of inversions (Chovnick 1973; Korunes and Noor 2019); for autosomal loci, the lack of recombination in males means that the effective r is half of this value. Complex inversions, with 3 or more breakpoints, might be expected to have much lower rates of exchange than simple inversions, but heterozygotes for multiply inverted chromosomes in D. melanogaster have been found to experience noncrossover-associated gene conversion events at rates that are even higher than those in the absence of inversions (Crown et al. 2018), so that this expectation may not be well founded.
The occurrence of such “gene flux” (Navarro et al. 1997) is consistent with numerous observations of substantial nucleotide site diversity within In karyotypes, but at a reduced level compared with genome-wide diversity. In many Drosophila examples, FAT and/or the related < FST >-based statistic increase, and within-inversion diversity levels decrease, towards the inversion breakpoints (Andolfatto et al. 2001; Nobréga et al. 2008; Corbett-Detig and Hartl 2012; Kapun et al. 2016, 2023), suggesting that flux is lowest near the breakpoints and highest towards the center of the inversion; an exception is the polymorphism in D. subobscura for the overlapping pair of inversions O3+4 versus the standard arrangement Ost (Munté et al. 2005; Papaceit et al. 2013).
Patterns of this kind do not, however, necessarily imply a complete absence of exchange at the breakpoints. Rozas et al. (1999) obtained population genetic evidence for gene conversion events near the breakpoints of arrangements in the O chromosome system of inversions in D. subobscura, and Li et al. (2023) found no evidence for an effect of proximity to breakpoints for the dl-49 inversion of D. melanogaster on the rate of gene conversion. There are, however, mechanistic reasons for believing that exchange rates will be lowest at the breakpoints and highest in the center of a simple inversion (Navarro et al. 1997). Population genomic data, which reveal the effects of recombination over very long time spans compared with laboratory crosses, offer an excellent opportunity for exploring the effects of breakpoints on recombination, as described in the next section.
Estimating the exchange parameter
If equilibrium can be assumed, estimates of between-arrangement divergence and within-arrangement diversities can be compared with the theoretical predictions for the corresponding variables, in order to obtain estimates of r (mutation rate estimates are also needed to estimate r rather than ρ). Figures 1 and 2 show that, for a low-frequency inversion, T11w/T22w, as well as T12, are strongly affected by the level of population subdivision, unless recombination is rare or absent. Even a modest change from FST = 0 to FST = 0.05 causes this ratio to shift from close to 0.8 to approximately 0.6 when r = 10⁻⁵ and the inversion frequency is 0.1 (Figure 2), reflecting the effect of population subdivision in multiplying the effective rate of recombination within populations by a factor of 1 − FST (Wakeley and Lessard 2003). Estimates of FST therefore ideally need to be included in any attempts to estimate r. Another difficulty here is that the current model assumes a long-term constancy of the inversion frequency over time, as well as an absence of variation in inversion frequencies among populations. Many studies of Drosophila and other groups have revealed major differences among populations in the frequencies of polymorphic inversions, often clinal in nature, which are likely to reflect locally varying selection pressures on genes contained in the inversion, e.g. Krimbas and Powell (1992b), Aulard et al. (2002), Cheng et al. (2012), Kapun and Flatt (2019), and Mérot et al. (2021). Further work is needed to evaluate the consequences of relaxing these assumptions.
This raises the question of whether equilibrium can safely be assumed. Studies of inversion polymorphisms in many Drosophila species suggest that they rarely persist beyond species boundaries (Krimbas and Powell 1992a). There are examples of very close relatives with totally different sets of inversion polymorphisms and many fixed differences with respect to gene arrangements, e.g. Drosophila miranda versus Drosophila pseudoobscura (Bartolomé and Charlesworth 2006). It is thus quite likely a priori that the times of origin of many polymorphic inversions are only a small fraction of the mean neutral coalescent time, 2NT, as was proposed by Andolfatto et al. (2001) in their review of early data on the molecular population genetics of Drosophila inversions.
Figures 3–5 show that for the 2 higher recombination rates 10⁻⁶ and 10⁻⁵ corresponding to scaled recombination rates (ρ) of 4 and 40, the times for statistics such as T11, T12, and FAT to approach their equilibrium values after the inversion has approached its equilibrium frequency are either commensurate with 2NT (r = 10⁻⁶) or much smaller (r = 10⁻⁵), consistent with the theoretical prediction that the timescale of approach to equilibrium in terms of generations is of the order of 1/rx when x < 0.5 both for a panmictic population and a subdivided population with a large number of demes (see section 2 of Supplementary File 1). In the absence of recombination, however, diversity within the inversion recovers over a timescale of 2xNT generations for a panmictic population and 2(1 + Mx)NT/M generations for a subdivided population (see Equations S5b and S18a), so that the signature of the selective sweep of the inversion on within-karyotype diversity can persist for a long time, as has been noted previously (Navarro et al. 2000; Zeng et al. 2021). In addition, the fact that the ratio T12/TSw converges on its equilibrium value much faster than the absolute Tij's (see the above section Interpreting population genomic data on inversion polymorphisms) means that it is not necessary to assume that the population is very close to equilibrium when using the corresponding divergence to diversity ratio as a statistic.
If recombination in heterokaryotypes is as frequent as is suggested by the data on Drosophila (Korunes and Noor 2019), the assumption of recombination–drift equilibrium may thus often be reasonably accurate as a predictor of observed patterns of population genomic statistics for genomic regions covered by an inversion polymorphism, at least for sites that are located well away from inversion breakpoints. The equilibrium assumption can be tested by comparing the within-In neutral diversity level with the diversity at neutral sites independent of the inversion (or against the within-St diversity). A very recent sweep of the inversion, leaving the system far from equilibrium, would cause diversity across the whole length of the inversions to be much lower than a fraction x of the diversity outside the inversion or a fraction x/(1 − x) of diversity within the standard arrangement—see the curves in Figs. 3–5. It would also be expected to leave a signature of an excess of low-frequency variants at neutral sites compared with what is seen at comparable sites independent of the inversions (Navarro et al. 2000; Andolfatto et al. 2001; Zeng et al. 2021), although the effects of background selection and selective sweeps within the low recombination environment created by a low-frequency inversion could also cause such a pattern, which is also affected by population size changes (Charlesworth and Jensen 2021). The inversion In(1)Be in African populations of D. melanogaster is an example of a very recent spread of an inversion (Corbett-Detig and Hartl 2012).
The study by Kapun et al. (2023) of In(3R)P in the Zambian population (ZI) of D. melanogaster provides an example of how to apply the theoretical results. As mentioned in the section Numerical results for subdivided populations, panmixia can probably be assumed for this case, which implies that Equations (5) can be used for the expected coalescence times. Let X = T11/T22. Using Equations (5b) and (5c), simple algebra yields the following formula for ρ, the scaled recombination rate:
(9) |
Figure 2 of Kapun et al. (2023) shows that the mean nucleotide site diversity (across all classes of nucleotide sites) is significantly lower (0.00792) for the In(3R)P haplotypes than for the standard haplotypes (0.00979), giving an estimate of T11/T22 = 0.00792/0.00979 = 0.81 for the region covered by the inversion; x ≈ 0.11 for this population (Kapun and Flatt 2019, Supplementary Table 1). Substituting these numbers into Equation (9) yields an estimate of 34 for ρ, corresponding to r = 5.3 × 10⁻⁶ with Ne = 1.6 × 10⁶, the estimate for this population obtained by Johri et al. (2020), which is close to the estimate of 5 × 10⁻⁶ from crossing experiments after adjustment for the lack of recombination in males (Korunes and Noor 2019). The estimate of diversity for regions outside the inversion on chromosome 3 is 0.00854, giving an estimate of TS = (0.11 × 0.00792 + 0.89 × 0.00979)/0.00854 = 1.12, slightly higher than the theoretical value of 1 for equilibrium; this discrepancy probably reflects different levels of selective constraints among the genome regions being compared and/or the fact that inversions suppress exchange well outside their breakpoints (Koury 2023; Li et al. 2023).
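The arithmetic in this paragraph can be reproduced with a few lines of Python. This is only a sketch: the function and constant names are illustrative, the input values are those quoted above, and the conversion from ρ to r assumes ρ = 4Ner with the Ne value of Johri et al. (2020).

```python
# Values quoted in the text for In(3R)P in the Zambian (ZI) population.
PI_IN = 0.00792       # mean diversity within In(3R)P haplotypes
PI_ST = 0.00979       # mean diversity within standard haplotypes
PI_OUTSIDE = 0.00854  # diversity outside the inversion on chromosome 3
X_IN = 0.11           # inversion frequency
NE = 1.6e6            # effective population size (Johri et al. 2020)
RHO = 34.0            # scaled recombination rate quoted in the text


def diversity_ratio(pi_in, pi_st):
    """Estimate X = T11/T22 as the ratio of within-arrangement diversities."""
    return pi_in / pi_st


def weighted_within_karyotype(pi_in, pi_st, x):
    """Frequency-weighted mean of the two within-karyotype diversities."""
    return x * pi_in + (1.0 - x) * pi_st


def r_from_rho(rho, ne):
    """Per-site exchange rate implied by rho = 4 * Ne * r (an assumed scaling)."""
    return rho / (4.0 * ne)


ratio_x = diversity_ratio(PI_IN, PI_ST)
relative_within = weighted_within_karyotype(PI_IN, PI_ST, X_IN) / PI_OUTSIDE
r_estimate = r_from_rho(RHO, NE)

print(f"T11/T22 = {ratio_x:.2f}")
print(f"within-karyotype / outside diversity = {relative_within:.2f}")
print(f"r = {r_estimate:.2e}")
```

The second quantity is the ratio of the frequency-weighted within-karyotype diversity to the diversity outside the inversion, which is expected to be 1 at recombination–drift equilibrium.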
Figure 4 of Kapun et al. (2023) shows that for the central region of In(3R)P in the Zambian population. Equation (3a) implies that
(10) |
This expression yields an estimate of ρ = 18, considerably less than the above value. This may be due to the fact that the estimate of the < FST >-based statistic involves taking the mean of a reciprocal over 100-kb windows, which biases it toward low values compared with using the mean of estimates of T12.
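The value of 18 can be checked with a short calculation. This is a sketch under assumptions that are not spelled out in this section: that the panmictic equilibrium expectations are TS = 1 and T12 = 1 + 2/ρ, so that a statistic with theoretical value 1 − TS/T12 equals 2/(ρ + 2), and that the mean value of the statistic is 0.1, as quoted later in the Discussion. The function names are illustrative.

```python
def rho_from_differentiation(stat):
    """Invert stat = 1 - TS/T12 for rho.

    Assumes TS = 1 and T12 = 1 + 2/rho (panmictic equilibrium),
    which gives stat = 2 / (rho + 2) and rho = 2 * (1 - stat) / stat.
    """
    if not 0.0 < stat < 1.0:
        raise ValueError("stat must lie strictly between 0 and 1")
    return 2.0 * (1.0 - stat) / stat


def differentiation_from_rho(rho):
    """Forward calculation under the same assumptions."""
    if rho <= 0.0:
        raise ValueError("rho must be positive")
    return 2.0 / (rho + 2.0)


print(rho_from_differentiation(0.1))    # close to 18
print(differentiation_from_rho(34.0))   # value implied by the diversity-based rho
```

Under these assumptions, the diversity-based estimate of ρ = 34 would correspond to a statistic of about 0.056 rather than 0.1, which illustrates how sensitive the inferred ρ is to modest differences in the statistic.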
The empirical results on In(3R)P are thus consistent with this inversion being close to recombination–drift equilibrium, with a mean r for central regions of the inversion of approximately 5 × 10⁻⁶. The noticeable increase in within-inversion diversity towards the middle of the inversion (Supplementary Fig. 1 of Kapun et al. 2023) and the increase in the < FST >-based differentiation between arrangements near the breakpoints (their Fig. 4) strongly indicate that flux rates are reduced close to the breakpoints. The analysis of the diversity data near the breakpoints described below suggests, however, that this reduction is not total.
Estimating the ages of inversions: divergence between In and St
If exchange was indeed completely suppressed around inversion breakpoints, divergence between In and St for sequences close to the breakpoints could be used to estimate the date of origin of an inversion (Hasson and Eanes 1996; Andolfatto et al. 2001; Cáceres et al. 2001; Corbett-Detig and Hartl 2012). In this case, it can no longer be assumed that TSw = 1 for the sequences concerned. Given the reservations about whether such suppression of exchange is absolute, however, caution should be used in making such inferences.
If, as is usually assumed, the new arrangement had a unique origin, T12 in the absence of exchange is equal to the value for a pair of randomly chosen alleles in the ancestral population. In the case of a panmictic Wright–Fisher population of constant size, T12 at time T since the origin of the inversion is given by 1 + T (Equation S5a). With population subdivision, this expression is no longer accurate; if dMT >> 1 and d is large, T12 ≈ (1 − FST)⁻¹ + T for both within- and between-deme measures of T12 (see the approximation to Equation (S9b)). The first term here corresponds to the mean coalescent time for pairs of alleles sampled from different demes. For species like D. melanogaster, with NT ≈ 10⁶ for populations in the ancestral species range (Lack et al. 2015), a T value of 0.1 corresponds to 200,000 generations; with FST = 0.05, the excess of (1 − FST)⁻¹ over the panmictic value of 1 is approximately 50% of this T value, so that ignoring it could create a substantial error in the estimated time of origin. The situation would clearly be worse for a species with a higher level of population subdivision.
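The size of this error can be illustrated numerically. The sketch below uses the large-d approximation T12 ≈ (1 − FST)⁻¹ + T given above, and assumes that T is measured in units of 2NT generations; the function names are illustrative, and the "naive" estimate is simply what would be obtained by applying the panmictic formula T12 = 1 + T to a subdivided population.

```python
def generations_since_origin(t_scaled, n_total):
    """Convert a scaled time T (assumed units of 2 * N_T generations) to generations."""
    return t_scaled * 2.0 * n_total


def t12_without_exchange(t_scaled, fst=0.0):
    """T12 = (1 - FST)^-1 + T; reduces to 1 + T when FST = 0."""
    return 1.0 / (1.0 - fst) + t_scaled


def naive_age(t12_observed):
    """Age estimate that wrongly applies the panmictic formula T12 = 1 + T."""
    return t12_observed - 1.0


T_TRUE = 0.1
N_TOTAL = 1e6
FST = 0.05

t12 = t12_without_exchange(T_TRUE, FST)
relative_error = (naive_age(t12) - T_TRUE) / T_TRUE

print(generations_since_origin(T_TRUE, N_TOTAL))  # 200000 generations
print(f"relative overestimate of T: {relative_error:.1%}")
```

With these values the naive estimate exceeds the true T by slightly more than 50%, and the relative error grows as T becomes smaller or FST becomes larger.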
Under these conditions and assuming population size stability, if there is population subdivision, the neutral diversity estimated from pairs of alleles that are independent of the inversion and are sampled randomly across populations (πnT) can be treated as a proxy for the corresponding mean coalescence time and subtracted from dxy. The ratio of this corrected value to the neutral diversity statistic then provides an estimate of T as defined here. For a very recent origin of the inversion, full Equation (S9b) would usually have to be applied if there is significant population subdivision. The simple approximations of Equations (S9c) and (S9d) could be used when dMT << 1, which corresponds to an extremely recent origin when M ≥ 1, as is likely to be the case for between-population differences within species of most outcrossing species of animals and plants (Charlesworth 2003; Roux et al. 2016).
In practice, different authors have used different methods for estimating T from divergence between arrangements. Taking some of the pioneering studies as examples, Andolfatto et al. (1999) used the number of fixed differences between In and St, which is strongly affected by sample size. Cáceres et al. (2001), Hasson and Eanes (1996), and Corbett-Detig and Hartl (2012) used dxy corrected by subtraction of the within-St diversity. Somewhat different values will be generated by each of these methods. Difficulties clearly arise if there is evidence for recent strong population expansions or contractions, so that the current population statistics cannot be equated to their values at the time of origin of In. There are also likely to be considerable statistical errors, especially as the dxy values are often very small for D. melanogaster inversions (see Supplementary Table 2 in Corbett-Detig and Hartl 2012).
Estimating the ages of inversions: diversity within In
The nucleotide site diversities within arrangements also provide information on the age of an inversion, assuming an absence of exchange. If the inversion frequency is low, however, the within-inversion within-deme diversity (π11w = 4NTT11wu) is only a small fraction of the diversity within the standard sequence or at sites independent of the inversion (see Fig. 1), so that a reduced level of within-inversion diversity does not necessarily imply a recent origin of the inversion. In the case of a subdivided population, provided that dT >> x(1 + Mx), Equation (S18a), which corresponds to the exponential growth model used for simulation-based estimates of inversion age by Corbett-Detig and Hartl (2012), implies that
(11) |
This expression is independent of M and is only meaningful if T11w < x. Given the values of x, mean diversity within the inversion (π11w), and πnw as defined above, it is possible to estimate T from this formula, equating π11w/πnw to T11w. For example, Table 1 of Andolfatto et al. (1999) provides estimates of diversity values of 0.0009 and 0.0125 for a breakpoint of In(2L)t and for independent sites, respectively, in a D. melanogaster population with an inversion frequency of 0.23. T11w is thus estimated as 0.0009/0.0125 = 0.072, giving T = 0.42. Andolfatto et al. (1999) estimated the time to the most recent common ancestor of the sequences around the breakpoint from the standard neutral coalescent formula as 0.15 in the present notation; since this method ignores the effect of the recovery from the sweep of the inversion on the gene tree, it overestimates the opportunity for diversity to recover from its complete loss and hence underestimates the time since the origin of the inversion.
Similar calculations can be applied to the data in Fig. 2 and Supplementary Fig. S1b of Kapun et al. (2023) on the Zambian population of D. melanogaster for sites within 100 kb of the distal and proximal breakpoints of In(3R)P. These yield estimates of T11w of 0.59 and 0.11. These are highly discordant, reflecting the much lower diversity at the proximal breakpoint, which is also seen in the European and North American samples. Given the estimated frequency of 0.11 for In(3R)P in this population (Kapun and Flatt 2019, Supplementary Table 1), these estimates of T11w are inconsistent with a total absence of exchange near the breakpoints. The corresponding values of T22w are 1.28 and 1.13, yielding estimates of TSw of 1.20 and 1.02, respectively, so that the data are reasonably consistent with recombination–drift equilibrium for sites close to the breakpoints, although the flux rates must be substantially lower than the mean of 5 × 10⁻⁶ estimated for the central region of the inversion (see the above section of the Discussion, Estimating the exchange parameter).
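The TSw values quoted here follow from weighting the within-arrangement estimates by the arrangement frequencies. The sketch below assumes that TSw = xT11w + (1 − x)T22w, which matches the way the corresponding quantity was computed for the whole inversion earlier in the Discussion; the function name is illustrative.

```python
def mean_within_karyotype_time(t11w, t22w, x):
    """Frequency-weighted mean within-karyotype coalescent time.

    Assumes TSw = x * T11w + (1 - x) * T22w, with x the inversion frequency.
    """
    return x * t11w + (1.0 - x) * t22w


X_IN = 0.11  # frequency of In(3R)P quoted in the text

tsw_distal = mean_within_karyotype_time(0.59, 1.28, X_IN)
tsw_proximal = mean_within_karyotype_time(0.11, 1.13, X_IN)

print(f"distal breakpoint:   TSw = {tsw_distal:.2f}")
print(f"proximal breakpoint: TSw = {tsw_proximal:.2f}")
```

Because the inversion is rare, TSw is dominated by T22w, so that even a 5-fold difference in T11w between the two breakpoints changes TSw by less than 20%.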
The numerical results in Supplementary Table 1 for the panmictic case suggest r values around 10⁻⁶ and 10⁻⁷ for the distal and proximal breakpoints, respectively, in line with the lower diversity at the proximal breakpoint. Caution should, however, be used in interpreting these conclusions, as they are highly sensitive to the frequency of the inversion in question and there is considerable continent-wide variation in the frequency of inversions such as In(3R)P within Africa (Kapun and Flatt 2019, Supplementary Table 1); Sprengelmeyer et al. (2020) estimated a frequency of 0.23 for a small Zambian sample. However, the data for Zambian, Bechuanaland, and Swaziland samples in Table 1 of Aulard et al. (2002) yield an overall frequency of 0.11 with s.e. of 0.024, so that a frequency around 0.10 for this region of Africa seems reasonable.
An alternative has been to assume that the recent sweep of an inversion results in a star phylogeny, so that the expected pairwise diversity within the inversion (in the absence of exchange) is equal to 2Tu (Rogers 1995); Rozas et al. (1999) and Nobréga et al. (2008) used this method to estimate the ages of several inversions of D. subobscura. Since a star phylogeny must have the largest mean divergence between a pair of alleles compared with a gene tree with 1 or more coalescent events after the origin of the inversion, this method underestimates the age of the inversion.
The relevance of FAT and the < FST >-based statistic
Another point concerns the interpretation of FAT, the analogue of FST for differentiation between In and St, and the related statistic obtained by applying the < FST > statistic of Hudson, Slatkin et al. (1992) to In versus St haplotypes (see Equation 4). These statistics are sometimes loosely referred to as measures of the extent of divergence between In and St, e.g. Guerrero et al. (2012), Kennington and Hoffmann (2013), and Kapun and Flatt (2019). As noted in the section Theoretical results: Approach to equilibrium, General considerations, the fact that FAT and the < FST >-based statistic do not necessarily measure the extent of divergence between karyotypes is brought out by their high initial values immediately after an instantaneous sweep to an equilibrium frequency of x, for which FAT = x/(1 + x) and the < FST >-based statistic is equal to x. This reflects the fact that In and St are initially no more divergent on average than a random pair of sequences, but there is no diversity within the inversion. Furthermore, in the presence of recombination, FAT can reach an equilibrium level that is lower than its initial value whereas T12 increases above 1 (e.g. Figure 4, lower middle panel), so that the 2 statistics can change in opposite directions over time (see also Zeng et al. 2021, Fig. 8).
As described earlier, FAT is equivalent to the measure of LD of Ohta and Kimura (1971), which is approximately equal to R², the squared correlation coefficient between 2 loci of Hill and Robertson (1968), so that the use of FAT is effectively equivalent to estimating LD between SNPs and karyotype. This equivalence does not hold for the < FST >-based statistic, however, which yields much larger values than FAT for equilibrium situations (Charlesworth 1998; Gammerdinger et al. 2020), and has frequently been used to characterize differentiation between arrangements (e.g. Corbett-Detig and Hartl 2012; Kapun et al. 2023). This difference arises because the theoretical value of the < FST >-based statistic is equal to 1 − TS/T12 whereas FAT = 1 − TS/TT with TT < T12 at recombination–drift equilibrium (see Equations 4). The difference between them can be considerable; Table 2 of Laayouni et al. (2003) shows both statistics for inversion polymorphisms of Drosophila buzzatii, with approximately 4-fold larger values of the < FST >-based statistic compared with FAT. If data on diversity within inverted and standard haplotypes are available, as well as an estimate of x, it is possible to determine FAT from an estimate of the < FST >-based statistic and vice versa. In the case of In(3R)P in the Zambian population of D. melanogaster, the mean value of 0.1 for the < FST >-based statistic, combined with the diversity values in Fig. 2 of Kapun et al. (2023), gives mean FAT ≈ 0.033, close to a direct estimate (Thomas Flatt and Martin Kapun, personal communication). The magnitude of LD between single nucleotide variants (SNPs) and an inversion polymorphism can therefore be rather small, even when there is noticeable sequence differentiation between arrangements.
The larger expected equilibrium value of the < FST >-based statistic compared with FAT, as well as its closer relationship with the divergence between arrangements, makes it a more powerful measure of the extent of differentiation between arrangements relative to within-arrangement diversity. It is also more convenient for use with population genomic data based on haploid genome sequences, as is the case for much of the Drosophila Genome Nexus data (Lack et al. 2015, 2016) and for the data in Kapun et al. (2023), because it does not require the reconstruction of the frequencies of diploid genotypes used for calculating the mean diversity over a random set of individuals. As pointed out above, however, direct estimates of T12 have better statistical properties.
LD among neutral variants associated with inversions
As just discussed, estimates of FAT can be used to estimate LD between neutral variants such as silent site SNPs and an inversion polymorphism. It is also of interest to examine LD between the SNPs themselves, as this has been used to infer the existence of inversion polymorphisms in nonmodel organisms from evidence for large blocks of LD in specific regions of the genome (e.g. Faria et al. 2019; Mérot et al. 2021). It is difficult to obtain exact analytical results on LD in structured populations (Wakeley and Lessard 2003), but some approximations are easily obtained. From Equation (A9b) of Charlesworth et al. (1997), if LD within arrangements is small compared with LD between neutral sites and the inversion polymorphism, the LD statistic for a pair of SNPs is approximately equal to the product of the corresponding statistics for each SNP and the inversion polymorphism. The standard formula for a partial correlation coefficient implies that this is also true for the R² statistic of Hill and Robertson (1968), if there is little or no correlation between SNPs within arrangements. If there is LD within arrangements, these products somewhat underestimate the corresponding statistics for the pairs of SNPs. If there is a major effect of divergence between arrangements on LD between SNPs and karyotype, there should be little dependence of this LD statistic or R² on the physical distance between SNPs across the region where crossing over is suppressed by the inversion, in contrast to what is expected for within-karyotype patterns of LD.
Supplementary Fig. 3 of Kapun et al. (2023) shows a pattern of elevated R² among SNPs across the entire region of In(3R)P in the Zambian population, contrasted with the rapid decay of LD with physical distance between SNPs within inverted and standard haplotypes. As expected from the product formula, the magnitude of individual R² values is modest in this case: the estimate of 0.033 for FAT given above gives an expected R² of approximately 0.001 for a pair of SNPs in genomic regions affected by the inversion polymorphism. Much stronger patterns are, however, seen in the non-African samples studied by Kapun et al. (2023), which probably reflect the effects of population bottlenecks or recent selection. This example illustrates the point that substantial associations between SNPs and an inversion polymorphism may exist but could be hard to detect simply from LD patterns in samples without prior knowledge of the existence of the inversion polymorphism. Conversely, false signals of an inversion polymorphism could be generated from localized clusters of LD in bottlenecked populations (e.g. Haddrill et al. 2005). Of course, if there are virtually complete associations between SNPs and arrangements, as in the inversion polymorphisms of Coelopa frigida (Mérot et al. 2021), there will be a strong pattern of localized LD blocks (see their Fig. 2).
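The product approximation used for this calculation is simple enough to state as code. The sketch below assumes, as in the text, that LD within arrangements is negligible, so that the expected squared correlation between 2 SNPs is the product of their individual associations with karyotype; the function name is illustrative.

```python
def expected_pairwise_r2(fat_snp1, fat_snp2):
    """Expected R^2 between two SNPs from their associations with karyotype.

    Assumes negligible LD within arrangements, so that the pairwise value is
    the product of the two SNP-karyotype values (each approximated by FAT).
    """
    for value in (fat_snp1, fat_snp2):
        if not 0.0 <= value <= 1.0:
            raise ValueError("FAT values must lie between 0 and 1")
    return fat_snp1 * fat_snp2


FAT_MEAN = 0.033  # mean FAT for In(3R)P in the Zambian sample, from the text
print(f"expected R^2 = {expected_pairwise_r2(FAT_MEAN, FAT_MEAN):.4f}")
```

Even a SNP-karyotype association as large as 0.3 would give an expected pairwise value below 0.1 under this approximation, which is why inversion-associated LD among SNPs can be easy to miss.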
Effects of inversions on neutral divergence between populations
For the case of populations with a large number of demes, the extent of population differentiation at a neutral locus associated with an inversion can be measured by FST as defined here (which in this case is nearly the same as the < FST > statistic of Hudson, Slatkin et al. 1992) for standard and inverted haplotypes, denoted here by FST,St and FST,In, respectively—see Equation (S23). Provided that m >> r, which is likely to be true for sites within the inversion or close to the breakpoints, the equilibrium values of FST,In and FST,St are equal to 1/(1 + Mx) and 1/(1 + My), respectively, which are somewhat surprisingly independent of r (Equations (A6h) and (A6i)). These expressions are similar in form to that for a subdivided population in the absence of a polymorphism, with M replaced by Mx and My. For r = 0, it can be shown that these values are reached almost instantaneously once the inversion has reached its equilibrium frequency (Equation S23). Numerical examples show that this is also true for r > 0. The rapid equilibration of FST in subdivided populations is well known (Crow and Aoki 1984; Pannell and Charlesworth 2000).
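These expressions are easy to evaluate for given values of M and the inversion frequency. The sketch below assumes the large-d island model result FST = 1/(1 + M) for neutral sites independent of the inversion, which is the form that the text compares with the within-arrangement expressions; the function names are illustrative.

```python
def m_from_neutral_fst(fst):
    """Scaled migration rate M implied by FST = 1 / (1 + M) (assumed large-d form)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 - fst) / fst


def fst_within_arrangement(m_scaled, arrangement_freq):
    """Equilibrium within-arrangement FST = 1 / (1 + M * frequency), valid for m >> r."""
    return 1.0 / (1.0 + m_scaled * arrangement_freq)


NEUTRAL_FST = 0.05
X_IN = 0.1

m_scaled = m_from_neutral_fst(NEUTRAL_FST)                 # M is about 19
fst_in = fst_within_arrangement(m_scaled, X_IN)            # inverted haplotypes
fst_st = fst_within_arrangement(m_scaled, 1.0 - X_IN)      # standard haplotypes

print(f"M = {m_scaled:.1f}, FST,In = {fst_in:.3f}, FST,St = {fst_st:.3f}")
```

For these illustrative values, FST among inverted haplotypes is about 0.34, compared with about 0.055 among standard haplotypes and 0.05 at independent neutral sites.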
These results imply that within-arrangement FST or < FST > statistics should be larger for loci within the inversion than for loci that are independent of the inversion, independently of location with respect to the inversion breakpoints. A low-frequency inversion should thus show a considerable inflation in between-population F-statistics, even in the absence of differences in karyotype frequencies between populations, whereas only a small effect should be seen for the standard arrangement; such a pattern was reported by Kennington and Hoffmann (2013) for the D. melanogaster inversion In(2L)t. Caution should therefore be used in interpreting such a pattern as evidence for spatial differences in selection pressures.
In contrast, if inversion frequencies vary considerably between populations because of divergent selection, the LD between neutral sites and karyotype will cause among-population differentiation at neutral sites (Charlesworth et al. 1997; Guerrero et al. 2012), as has been found in some studies of Drosophila (Pegueroles et al. 2013) and other taxa (Faria et al. 2019; Mérot et al. 2021). If exchange rates increase with distance from breakpoints, then higher values of FST, In and FST, St (or the corresponding < FST > statistics) are expected towards the centers of inversions, as has been observed in some cases, such as Inv2La in Anopheles gambiae (Cheng et al. 2012).
Conclusions
The theoretical results described here provide pointers on how to interpret population genomic data on inversion polymorphisms, with caveats about methods for estimating the ages of inversions. They are subject to several important limitations. In particular, they are based on expectations for pairwise coalescence times. A study of the variances of pairwise diversity and divergence statistics suggests that averages taken over the many thousands of sites in the central regions of inversions that are several megabases in size have high statistical precision (see sections 8 and 9 of Supplementary File 1), but a rigorous statistical framework such as maximum likelihood inference based on the structured coalescent process (e.g. Lohse et al. 2011) remains to be developed. In addition, many simplifying assumptions have been made, including ignoring the consequences of selection on loci responsible for maintaining the inversion polymorphisms, as well as the effects of Hill–Robertson interference in the low recombination environment characteristic of low-frequency inversions. There is, therefore, plenty of scope for further work.
Supplementary Material
Acknowledgements
I thank Deborah Charlesworth and Thomas Flatt for the helpful discussions and Thomas Flatt for his comments on a draft of this paper. Aneil Agrawal and 2 anonymous reviewers made helpful suggestions for improving the paper. Thomas Flatt and Martin Kapun generously shared their results on In(3R)P.
Appendix
Recursion relations for expected coalescence times
The dynamics of coalescent events are determined jointly by migration, gene conversion, and genetic drift. To determine pairwise coalescent times, we need to consider cases where 2 In haplotypes are sampled (a sample of class 1/1), 2 St haplotypes are sampled (a class 2/2 sample), and 1 of each type is sampled (a class 1/2 sample). The method of Nagylaki (1998) is used for this purpose, which provides a simple way of determining expected coalescent times (see also Charlesworth and Charlesworth 2010, p.316). For 1/1 samples, the probability that the focal neutral site in 1 of the 2 haplotypes was associated with St in the previous generation is 2ry (ignoring second-order terms in ry), since the chance that a given In haplotype was combined with an St haplotype is y and either of the 2 In haplotypes could have experienced a recombination event. There is a probability 2m that 1 of the 2 haplotypes was present in a different deme from the current one, and a probability of 1/(2Nx) that the 2 haplotypes coalesced in the previous generation if they were both in the same deme, given that haplotypes from different demes cannot coalesce. Similar considerations apply to 2/2 samples, except that x and y are interchanged. For 1/2 samples, there is a probability y that the In haplotype was associated with an St haplotype in the previous generation, so the probability that the focal neutral site in In derives from St is ry. Similarly, there is a probability rx that the focal neutral site in the St haplotype derives from In; the net probability of the focal site coming from a different karyotype from its current one is r. No coalescence of the 2 haplotypes is possible in this case, but there is a probability 2m that one of them has moved from a different deme.
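The logic of this first-step argument can be illustrated for the simplest special case. The sketch below is not the six-equation system of this Appendix: it assumes a single panmictic population (so that the migration terms disappear), keeps only the transition probabilities 1/(2Nx), 1/(2Ny), 2ry, 2rx, ry, and rx described above, and solves the resulting 3 linear equations for the equilibrium expected coalescence times. The function names are illustrative.

```python
def solve_linear_3x3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    n = 3
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def panmictic_equilibrium_times(n_pop, x, r):
    """Equilibrium (t11, t22, t12) in generations for one panmictic population.

    Each equation has the form t = 1 + sum(P(next state) * t(next state)),
    ignoring second-order terms, rearranged as a linear system:
      (1/(2Nx) + 2ry) t11 - 2ry t12          = 1
      (1/(2Ny) + 2rx) t22 - 2rx t12          = 1
      r t12 - rx t11 - ry t22                = 1
    """
    y = 1.0 - x
    a = [
        [1.0 / (2.0 * n_pop * x) + 2.0 * r * y, 0.0, -2.0 * r * y],
        [0.0, 1.0 / (2.0 * n_pop * y) + 2.0 * r * x, -2.0 * r * x],
        [-r * x, -r * y, r],
    ]
    return solve_linear_3x3(a, [1.0, 1.0, 1.0])


N_POP, X_IN, R = 1e6, 0.1, 1e-5
t11, t22, t12 = panmictic_equilibrium_times(N_POP, X_IN, R)
# Rescale by 2N so that a neutral pair has an expected time of 1.
T11, T22, T12 = (t / (2.0 * N_POP) for t in (t11, t22, t12))
print(f"T11 = {T11:.3f}, T22 = {T22:.3f}, T12 = {T12:.3f}")
print(f"x*T11 + y*T22 = {X_IN * T11 + (1.0 - X_IN) * T22:.3f}")
```

For these parameter values (scaled recombination rate 4Nr = 40), the frequency-weighted mean within-arrangement time is 1 and T11/T22 is close to 0.8, in line with the behavior described in the main text for the panmictic case.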
By ignoring second-order terms in m and r, the following recursion relations for the mean coalescence times defined in the main text can be obtained, with primes indicating their values in the next generation:
(A1a) |
(A1b) |
(A1c) |
(A1d) |
(A1e) |
(A1f) |
These equations can be simplified by neglecting products of 1/Nx and 1/Ny with m and r, and rearranging. This yields the simplified recursion relations:
(A2a) |
(A2b) |
(A2c) |
(A2d) |
(A2e) |
(A2f) |
Expressions for equilibrium mean coalescence times
Writing M = 4Nm and R = 4Nr, we obtain the following expressions for equilibrium, equating the primed and unprimed t's:
(A3a) |
(A3b) |
(A3c) |
(A3d) |
(A3e) |
(A3f) |
This is a set of 6 linear equations that can be solved exactly (see File S1, section 1). However, it is possible to obtain informative approximate expressions by assuming that d is very large, with 2m >> r but dr >> 2m, which are plausible conditions for many real-life cases. This means that the terms in R or r on the left-hand sides of Equations (A3a), (A3c), and (A3e) can be neglected, as well as the terms in R or r on the right-hand sides of Equations (A3a), (A3c), and (A3e); in addition, d − 1 can be replaced by d with negligible loss of accuracy.
Using these approximations, we have:
(A4a) |
The intuitive interpretation of this expression is that, if recombination is infrequent compared with migration, a class 1/2 pair of alleles will wait an average of 1/(2m) generations before 1 of them migrates to a different deme, after which their expected coalescent time is t12b. The various t values are of the order of 2NT (see below), so that for large d and moderate FST values, 1/m is small compared with the t's, and the term in m in this equation can be neglected.
Equations (A3a) and (A3c) can then be approximated as follows:
(A4b) |
(A4c) |
From Equations (A3c) and (A4a), we obtain
so that
Similarly,
Using Equations (A4b) and (A4c), after some algebra, we obtain
(A4d) |
(A4e) |
For further work, it is convenient to scale the t's by division by the expected coalescent time for a pair of neutral alleles sampled from the same deme, TSn = 2NT = 2dN. Times on this scale are denoted by uppercase T's. The corresponding scaled recombination rate is denoted by ρ = 4NTr, which replaces 2r in Equations (A4d) and (A4e) when using the rescaled t's; 2m is also replaced by dM, and d can be cancelled from both sides of these equations, which then give a pair of linear equations in T11b and T22b. After some further algebra, their solution can be written in the standard form for a pair of linear equations as follows:
(A5a) |
(A5b) |
(A5c) |
(A5d) |
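The rescaling described before Equations (A5a)–(A5d), and the solution of a pair of linear equations in "standard form", can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the coefficients passed to `solve_pair` are placeholders, not the a_ij of the paper. The scaling relations themselves (N_T = dN, T_Sn = 2N_T, M = 4Nm, ρ = 4N_T r) are those given in the text.

```python
import math


def scaled_parameters(N, d, m, r):
    """Return the scaled quantities used when times are divided by T_Sn = 2*N_T."""
    N_T = d * N            # total population size, N_T = dN
    T_Sn = 2 * N_T         # expected within-deme coalescent time for neutral alleles
    M = 4 * N * m          # scaled migration rate
    rho = 4 * N_T * r      # scaled recombination rate
    return {"N_T": N_T, "T_Sn": T_Sn, "M": M, "rho": rho}


def solve_pair(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Illustrative values only.
N, d, m, r = 1000, 100, 1e-3, 1e-6
s = scaled_parameters(N, d, m, r)
# On the rescaled time axis, 2m becomes d*M and 2r becomes rho.
assert math.isclose(2 * m * s["T_Sn"], d * s["M"])
assert math.isclose(2 * r * s["T_Sn"], s["rho"])

# Placeholder coefficients (not the paper's a_ij): 2x + y = 5, x + 3y = 10.
print(solve_pair(2, 1, 1, 3, 5, 10))
```

The two assertions confirm why 2m is replaced by dM and 2r by ρ once times are measured in units of 2N_T generations.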
Use of the expressions for the aij yields the following formulae:
(A6a) |
(A6b) |
(A6c) |
These expressions can be substituted into the scaled versions of Equations (A4d)–(A4e) to obtain the complete approximate solution to the set of equations for the T's:
(A6d) |
(A6e) |
(A6f) |
(A6g) |
If d >> 1, the terms in d in Equations (A6f) and (A6g) can be neglected, so that
(A6h) |
(A6i) |
For calculating the approach to the equilibrium described by Equations (A6), it is convenient to eliminate the constant terms (the terms equal to 1) on the right-hand sides of Equation (A2) by working with the vector of deviations of the T's from their equilibrium values, and applying a matrix A that contains the multipliers of the T's in Equation (A2). Details are given in File S1, section 7.
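The idea of working with deviations can be illustrated with a generic linear recursion of the form x′ = Ax + c, for which the deviations from the equilibrium x* obey δ′ = Aδ. The sketch below uses a hypothetical 2 × 2 matrix and constant vector chosen purely for illustration; the actual matrix A and the full set of T's are those described in File S1, and nothing here reproduces them.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def iterate_full(A, c, x0, generations):
    """Iterate x' = A x + c directly, keeping the constant terms."""
    x = list(x0)
    for _ in range(generations):
        x = [ax + ci for ax, ci in zip(mat_vec(A, x), c)]
    return x


def iterate_deviations(A, x_eq, x0, generations):
    """Iterate the deviations from equilibrium, delta' = A delta, then add x_eq back."""
    delta = [x - e for x, e in zip(x0, x_eq)]
    for _ in range(generations):
        delta = mat_vec(A, delta)
    return [e + dv for e, dv in zip(x_eq, delta)]


# Hypothetical example: A and c are illustrative, not the paper's values.
A = [[0.90, 0.05],
     [0.02, 0.95]]
c = [1.0, 1.0]
x_eq = [25.0, 30.0]   # satisfies x_eq = A x_eq + c for this A and c
x0 = [0.0, 0.0]       # starting values, e.g. immediately after a sweep

print(iterate_full(A, c, x0, 50))
print(iterate_deviations(A, x_eq, x0, 50))
```

Both routes give the same trajectory; the deviation form makes it clear that the rate of approach to equilibrium is governed by the matrix A alone.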
Data availability
No new data or reagents were generated by this research. The code for the computer programs used to produce the results described above is available in Supplementary File 2.
Supplemental material available at GENETICS online.
Funding
This work was not funded.
Literature cited
- Andolfatto P, Depaulis F, Navarro A. Inversion polymorphisms and nucleotide variability in Drosophila. Genet Res. 2001;77(1):1–8. doi: 10.1017/S0016672301004955. [DOI] [PubMed] [Google Scholar]
- Andolfatto P, Wall JD, Kreitman M. Unusual haplotype structure at the proximal breakpoint of In(2L)t in a natural population of Drosophila melanogaster. Genetics. 1999;153(3):1297–1311. doi: 10.1093/genetics/153.3.1297. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Assaf ZJ, Tilk S, Park J, Siegal ML, Petrov DA. Deep sequencing of natural and experimental populations of Drosophila melanogaster reveals biases in the spectrum of new mutations. Genome Res. 2017;27(12):1988–2000. doi: 10.1101/gr.219956.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aulard S, David JR, Lemeunier F JR. Chromosomal inversion polymorphism in Afrotropical populations of Drosophila melanogaster. Genet Res. 2002;79(1):49–63. doi: 10.1017/S0016672301005407. [DOI] [PubMed] [Google Scholar]
- Bartolomé C, Charlesworth B. Rates and patterns of chromosomal evolution in Drosophila pseudoobscura and D. miranda. Genetics. 2006;173(2):779–791. doi: 10.1534/genetics.105.054585. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barton NH, Etheridge AM, Kelleher J, Véber A. Genetic hitchhiking in spatially extended populations. Theor Pop Biol. 2013;87:75–89. doi: 10.1016/j.tpb.2012.12.001. [DOI] [PubMed] [Google Scholar]
- Bergero R, Gardner J, Bader B, Yong L, Charlesworth D. Exaggerated heterochiasmy in a fish with sex-linked male coloration polymorphisms. Proc Natl Acad Sci USA. 2019;116(14):6924–6931. doi: 10.1073/pnas.1818486116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Buffalo V. Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin's paradox. Elife. 2021;10:e67509. doi: 10.7554/eLife.67509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cáceres M, Puig M, Ruiz A. Molecular characterization of two natural hotspots in the Drosophila buzzatii genome induced by transposon insertions. Genome Res. 2001;11(8):1353–1364. doi: 10.1101/gr.174001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth B. Measures of divergence between populations and the effect of forces that reduce variability. Mol Biol Evol. 1998;15(5):538–543. doi: 10.1093/oxfordjournals.molbev.a025953. [DOI] [PubMed] [Google Scholar]
- Charlesworth B, Charlesworth D. Elements of Evolutionary Genetics. Greenwood Village: (CO: ): Roberts and Company; 2010. [Google Scholar]
- Charlesworth B, Charlesworth D, Barton NH. The effects of genetic and geographic structure on neutral variation. Ann Rev Ecol Evol Syst. 2003;34(1):99–125. doi: 10.1146/annurev.ecolsys.34.011802.132359. [DOI] [Google Scholar]
- Charlesworth B, Jensen JD. The effects of selection at linked sites on patterns of genetic variability. Ann Rev Ecol Evol Syst. 2021;52(1):177–197. doi: 10.1146/annurev-ecolsys-010621-044528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth B, Nordborg M, Charlesworth D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet Res. 1997;70(2):155–174. doi: 10.1017/S0016672397002954. [DOI] [PubMed] [Google Scholar]
- Charlesworth D. Effects of inbreeding on the genetic diversity of plant populations. Phil Trans R Soc B. 2003;358(1434):1051–1070. doi: 10.1098/rstb.2003.1296. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng C, White BJ, Kamdem C, Mockaitis K, Costantini C, Hahn MW, Besansky NJ. Ecological genomics of Anopheles gambiae along a latitudinal cline: a population-resequencing approach. Genetics. 2012;190:1417–1432. doi: 10.1534/genetics.111.137794. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chovnick A. Gene conversion and transfer of genetic information within the inverted region of inversion heterozygotes. Genetics. 1973;75(1):123–131. doi: 10.1093/genetics/75.1.123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Corbett-Detig RB, Hartl DL. Population genomics of inversion polymorphisms in Drosophila melanogaster. PLoS Genet. 2012;8(12):e1003056. doi: 10.1371/journal.pgen.1003056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crow JF, Aoki K. Group selection for a polygenic behavioral trait—estimating the degree of population subdivision. Proc Natl Acad Sci USA. 1984;81(19):6073–6077. doi: 10.1073/pnas.81.19.6073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crown KN, Miller DE, Sekelsky J, Hawley RS. Local inversion heterozygosity alters recombination throughout the genome. Curr Biol. 2018;28(18):2984–2990.e3. doi: 10.1016/j.cub.2018.07.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Faria R, Chaube P, Morales HE, Larsson T, Lemmon AR, Lemmon EM, Rafajlovic M, Panova M, Ravinet M, Johannesson K, et al. Multiple chromosomal rearrangements in a hybrid zone between Littorina saxatilis ecotypes. Mol Ecol. 2019;28(6):1375–1393. doi: 10.1111/mec.14972. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gammerdinger W, Toups M, Vicoso B. Disagreement in FST estimators: a case study from sex chromosomes. Mol Ecol Res. 2020;20(6):1517–1525. doi: 10.1111/1755-0998.13210. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gong WJ, McKim KS, Hawley RS. All paired up with no place to go: pairing, synapsis, and DSB formation in a balancer heterozygote. PLoS Genet. 2005;1(5):e67. doi: 10.1371/journal.pgen.0010067. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guerrero RF, Rousset F, Kirkpatrick M. Coalescent patterns for chromosomal inversions in divergent populations. Phil Trans R Soc B. 2012;367(1587):430–438. doi: 10.1098/rstb.2011.0246. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haddrill PR, Thornton KR, Charlesworth B, Andolfatto P. Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 2005;15(6):790–799. doi: 10.1101/gr.3541005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hasson E, Eanes WF. Contrasting histories of three gene regions associated with In (3L)Payne of Drosophila melanogaster. Genetics. 1996;144(4):1565–1575. doi: 10.1093/genetics/144.4.1565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theor Appl Genet. 1968;38(6):226–231. doi: 10.1007/BF01245622. [DOI] [PubMed] [Google Scholar]
- Hudson RR. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 1990;7:1–45. [Google Scholar]
- Hudson RR, Boos DD, Kaplan NL. A statistical test for detecting geographic subdivision. Mol Biol Evol. 1992a;9(1):138–151. doi: 10.1093/oxfordjournals.molbev.a040703. [DOI] [PubMed] [Google Scholar]
- Hudson RR, Kaplan NL. The coalescent process in models with selection and recombination. Genetics. 1988;120(3):831–840. doi: 10.1093/genetics/120.3.831. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hudson RR, Slatkin M, Maddison WP. Estimation of levels of gene flow from population data. Genetics. 1992b;132(2):583–589. doi: 10.1093/genetics/132.2.583. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Innan H, Nordborg M. The extent of linkage disequilibrium and haplotype sharing around a polymorphic site. Genetics. 2003;165(1):437–444. doi: 10.1093/genetics/165.1.437. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ishii K, Charlesworth B. Associations between allozyme loci and gene arrangements due to hitch-hiking effects of new inversions. Genet Res. 1977;30(2):93–106. doi: 10.1017/S0016672300017511. [DOI] [Google Scholar]
- Ives PT, Band HT. Continuing studies on the South Amherst Drosophila melanogaster natural population during the 1970s And 1980s. Evolution. 1986;40(6):1289–1302. doi: 10.2307/2408954. [DOI] [PubMed] [Google Scholar]
- Johri P, Charlesworth B, Jensen JD. Toward an evolutionarily appropriate null model: jointly inferrring demography and purifying selection. Genetics. 2020;215(1):173–192. doi: 10.1534/genetics.119.303002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kapun M, Durmaz Mitchell E, Kawecki T, Schmidt P, Flatt T. An ancestral balanced inversion polymorphism confers global adaptation. Mol Biol Evol. 2023;40(6):msad118. doi: 10.1093/molbev/msad118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kapun M, Fabian DK, Goudet J, Flatt T. Genomic evidence for adaptive inversion clines in Drosophila melanogaster. Mol Biol Evol. 2016;33(5):1317–1336. doi: 10.1093/molbev/msw016. [DOI] [PubMed] [Google Scholar]
- Kapun M, Flatt T. The adaptive significance of chromosomal inversion polymorphisms in Drosophila melanogaster. Mol Ecol. 2019;28(6):1263–1282. doi: 10.1111/mec.14871. [DOI] [PubMed] [Google Scholar]
- Kennington WJ, Hoffmann AA. Patterns of genetic variation across inversions, geographic variation in the In(2L)t inversion in populations of Drosophila melanogaster from eastern Australia. BMC Evol Biol. 2013;13(1):100. doi: 10.1186/1471-2148-13-100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kimura M. Theoretical foundations of population genetics at the molecular level. Theor Pop Biol. 1971;2(2):174–208. doi: 10.1016/0040-5809(71)90014-1. [DOI] [PubMed] [Google Scholar]
- Kirkpatrick M, Guerrero RF. Signatures of sex-antagonistic selection on recombining sex chromosomes. Genetics. 2014;197(2):531–541. doi: 10.1534/genetics.113.156026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Korunes K, Noor MAF. Pervasive gene conversion in chromosomal inversion heterozygotes. Mol Ecol 2019;28(6):1302–1315. doi: 10.1111/mec.14921. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Koury SA. Predicting recombination suppression outside chromosomal inversions in Drosophila melanogaster using crossover interference theory. Heredity (Edinb). 2023;130(4):196–208. doi: 10.1038/s41437-023-00593-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Krimbas CB, Powell JR. Drosophila Inversion Polymorphism. Boca Raton: (FL: ): CRC Press; 1992a. [Google Scholar]
- Krimbas CB, Powell JR. Introduction. In: Krimbas CB, Powell JR, editors. Drosophila Inversion Polymorphism. Boca Raton: (FL: ): CRC Press; 1992b. p. 1–52. [Google Scholar]
- Laayouni H, Hasson E, Santos M, Fontdevila A. The evolutionary history of Drosophila buzzatii. XXXV. Inversion polymorphism and nucleotide variability in different regions of the second chromosome. Mol Biol Evol. 2003;20(6):931–944. doi: 10.1093/molbev/msg099. [DOI] [PubMed] [Google Scholar]
- Lack JB, Cardeno CM, Crepeau MW, Taylor W, Corbett-Detig RB, Stevens KA, Langley CH, Pool JE. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics. 2015;199(4):1229–1241. doi: 10.1534/genetics.115.174664. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lack JB, Lange JD, Tang AD, Corbett-Detig RB, Pool JE. A thousand fly genomes: an expanded Drosophila genome nexus. Mol Biol Evol. 2016;33(12):3308–3313. doi: 10.1093/molbev/msw195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lange JD, Bastide H, Lack JB, Pool JE. A population genomic assessment of three decades of evolution in a natural Drosophila population. Mol Biol Evol 2022;39(2):msab368. doi: 10.1093/molbev/msab368. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langley CH, Stevens K, Cardeno C, Lee YCG, Schrider DR, Cardeno C, Lee YCG, Schrider DR, Pool JE, Langley SA, et al. Genomic variation in natural populations of Drosophila melanogaster. Genetics. 2012;192(2):533–598. doi: 10.1534/genetics.112.142018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li L, Berent E, Hadjipanteli S, Galey M, Muhammad-Lahbabi N, Miller DE, Crown KN. Heterozygous inversion breakpoints suppress meiotic crossovers by altering recombination repair outcomes. PLoS Genet. 2023;19(4):e1010702. doi: 10.1371/journal.pgen.1010702. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lohse K, Harrison RJ, Barton NH. A general method for calculating likelihoods under the coalescent process. Genetics. 2011;189(3):977–987. doi: 10.1534/genetics.111.129569. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maruyama T. A simple proof that certain quantitities are independent of the geographic structure of populations. Theor Pop Biol. 1974;5(2):148–154. doi: 10.1016/0040-5809(74)90037-9. [DOI] [PubMed] [Google Scholar]
- Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(1):23–35. doi: 10.1017/S0016672300014634. [DOI] [PubMed] [Google Scholar]
- Mérot C, Berdan EL, Cayuela H, Djambazian H, Ferchaud A, Laporte M, Normandeau E, Ragoussis J, Wellenreuther M, Bernatchez L. Locally adaptive inversions modulate genetic variation at different geographic scales in a seaweed fly. Mol Biol Evol 2021;38(9):3953–3971. doi: 10.1093/molbev/msab143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mérot C, Llaurens V, Normandeau E, Bernatchez L, Wellenreuther M. Balancing selection via life-history trade-offs maintains an inversion polymorphism in a seaweed fly. Nat Commun. 2020;11(1):670. doi: 10.1038/s41467-020-14479-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mukai T, Yamaguchi O. The genetic structure of natural populations of Drosophila melanogaster. XI. Genetic variability in a local population. Genetics. 1974;76(2):339–366. doi: 10.1093/genetics/76.2.339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Munté A, Rozas J, Aguadé M, Segarra C. Chromosomal inversion polymorphism leads to extensive genetic structure: a multilocus survey in Drosophila subobscura. Genetics. 2005;169(3):1573–1581. doi: 10.1534/genetics.104.032748. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nagylaki T. Geographical invariance in population genetics. J Theor Biol. 1982;99(1):159–172. doi: 10.1016/0022-5193(82)90396-4. [DOI] [PubMed] [Google Scholar]
- Nagylaki T. The expected number of heterozygous sites in a subdivided population. Genetics. 1998;149(3):1599–1604. doi: 10.1093/genetics/149.3.1599. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Navarro A, Barbadilla A, Ruiz A. Effect of inversion polymorphism on the neutral nucleotide variability of linked chromosomal regions in Drosophila. Genetics. 2000;155(2):685–698. doi: 10.1093/genetics/155.2.685. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Navarro A, Betrán E, Barbadilla A, Ruiz A. Recombination and gene flux caused by gene conversion and crossing over in inversion heterokaryotypes. Genetics. 1997;146(2):695–709. doi: 10.1093/genetics/146.2.695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nei M. Molecular Evolutionary Genetics. New York: Columbia University Press; 1987. [Google Scholar]
- Nobréga C, Khadem M, Aguadé M, Segarra C. Genetic exchange versus genetic differentiation in a medium-sized inversionof Drosophila: the A2/Ast arrangements of Drosophila subobscura. Mol Biol Evol. 2008;25(8):1534–1543. doi: 10.1093/molbev/msn100. [DOI] [PubMed] [Google Scholar]
- Nordborg M. Structured coalescent processes on different time scales. Genetics. 1997;146(4):1501–1514. doi: 10.1093/genetics/146.4.1501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nordborg M, Innan H. The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population. Genetics. 2003;163(3):1201–1213. doi: 10.1093/genetics/163.3.1201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ohta T, Kimura M. Linkage disequilibrium due to random genetic drift. Genet Res. 1969;13(1):47–55. doi: 10.1017/S001667230000272X. [DOI] [Google Scholar]
- Ohta T, Kimura M. Development of associative overdominance through linkage disequilibrium in finite populations. Genet Res. 1970;18(3):277–286. doi: 10.1017/S0016672300012684. [DOI] [PubMed] [Google Scholar]
- Ohta T, Kimura M. Linkage disequilibrium between two segregating nucleotide sites under steady flux of mutations in a finite population. Genetics. 1971;68(4):571–580. doi: 10.1093/genetics/68.4.571. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Orengo DJ, Puerma E, Cereijo U, Aguadé M. The molecular genealogy of sequential overlapping inversions implies both homologous chromosomes of a heterokaryotype in an inversion origin. Sci Rep. 2019;9(1):17009. doi: 10.1038/s41598-019-53582-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pannell JR, Charlesworth B. Effects of metapopulation processes on measures of genetic diversity. Phil Trans R Soc B. 2000;355(1404):1851–1864. doi: 10.1098/rstb.2000.0740. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Papaceit M, Segarra C, Aguadé M. Structure and population genetics of the breakpoints of a polymorphic inversion in Drosophila subobscura. Evolution. 2013;67:67–79. doi: 10.1111/j.1558-5646.2012.01731.x. [DOI] [PubMed] [Google Scholar]
- Pegueroles C, Aquadro CF, Mestres F, Pascual M. Gene flow and gene flux shape evolutionary patterns of variation in Drosophila subobscura. Heredity. 2013;110(6):520–529. doi: 10.1038/hdy.2012.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rogers AR. Genetic evidence for a Pleistocene explosion. Evolution. 1995;49(4):608–615. doi: 10.2307/2410314. [DOI] [PubMed] [Google Scholar]
- Rousset F, Kirkpatrick M, Guerrero RF. Matrix inversions for chromosomal inversions: a method to construct summary statistics in complex coalescent models. Theor Pop Biol. 2014;97:1–10. doi: 10.1016/j.tpb.2014.07.005. [DOI] [PubMed] [Google Scholar]
- Roux C, Fraisse C JR, Anciaux Y, Galtier N, Bierne N. Shedding light on the grey zone of speciation along a continuum of genomic divergence. PLoS Biol. 2016;14(12):e2000234. doi: 10.1371/journal.pbio.2000234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rozas J, Segarra C, Ribó G, Aguadé M. Molecular population genetics of the rp49 gene region in different chromosomal inversions of Drosophila subobscura. Genetics. 1999;151(1):189–202. doi: 10.1093/genetics/151.1.189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schaeffer SW. Molecular population genetics of sequence length diversity in the Adh region of Drosophila pseudoobscura. Genet Res. 2002;80(3):163–175. doi: 10.1017/S0016672302005955. [DOI] [PubMed] [Google Scholar]
- Singh RS, Rhomberg LR. A comprehensive study of genic variation in natural populations of Drosophila melanogaster. II. Estimates of heterozygosity and patterns of geographic differentiation. Genetics. 1987;117(2):255–271. doi: 10.1093/genetics/117.2.255. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sprengelmeyer QD, Mansourian S, Lange JD, Matute DR, Cooper BS, Jirle EV, Stensmyr MC, Pool JE. Recurrent collection of Drosophila melanogaster from wild African environments and genomic insights into species history. Mol Biol Evol. 2020;37(3):627–638. doi: 10.1093/molbev/msz271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Strobeck C. Expected linkage disequilibrium for a neutral locus linked to a chromosomal rearrangement. Genetics. 1983;103(3):545–555. doi: 10.1093/genetics/103.3.545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sturtevant AH. Genetic factors affecting the strength of linkage in Drosophila. Proc Natl Acad Sci USA. 1917;3(9):555–558. doi: 10.1073/pnas.3.9.555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sturtevant AH. A crossover reducer in Drosophila melanogaster due to inversion of a section of chromosomes. Biol Zentralbl. 1926;46:697–702. [Google Scholar]
- Sturtevant AH, Beadle GW. The relations of inversions in the X chromosome of Drosophila melanogaster to crossing over and disjunction. Genetics. 1936;21(5):554–604. doi: 10.1093/genetics/21.5.554. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Takahata N. A simple genealogical structure of strongly balanced allelic lines and trans-species evolution of polymorphism. Proc Natl Acad Sci USA. 1990;87(7):2419–2423. doi: 10.1073/pnas.87.7.2419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Toups M, Rodrigues N, Perrin N, Kirkpatrick M. A reciprocal translocation radically reshapes sex-linked inheritance in the common frog. Mol Ecol. 2019;28(8):1877–1889. doi: 10.1111/mec.14990. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Villoutreix R, Ayala D, Joron M, Gompert Z, Feder JL, Nosil P. Inversion breakpoints and the evolution of supergenes. Mol Ecol. 2021;30(12):2738–2755. doi: 10.1111/mec.15907. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wakeley J, Aliacar N. Gene genealogies in a metapopulation. Genetics. 2001;159(2):893–905. doi: 10.1093/genetics/159.2.893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wakeley J, Lessard S. Theory of the effects of population structure and sampling on patterns of linkage disequilibrium applied to genomic data from humans. Genetics. 2003;164(3):1043–1053. doi: 10.1093/genetics/164.3.1043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38(6):1358–1370. doi: 10.1111/j.1558-5646.1984.tb05657.x. [DOI] [PubMed] [Google Scholar]
- Wellenreuther M, Bernatchez L. Eco-evolutionary genomics of chromosomal inversions. Trnds. Ecol Evol. 2018;33(6):427–440. doi: 10.1016/j.tree.2018.04.002. [DOI] [PubMed] [Google Scholar]
- Wright S. The genetical structure of populations. Ann Eugen. 1951;15(4):323–354. doi: 10.1111/j.1469-1809.1949.tb02451.x. [DOI] [PubMed] [Google Scholar]
- Wright S, Dobzhansky T, Hovanitz W. Genetics of natural populations. VII. The allelism of lethals in the third chromosome of Drosophila pseudoobscura. Genetics. 1942;27(4):363–394. doi: 10.1093/genetics/27.4.363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zeng K, Charlesworth B, Hobolth A. Studying models of balancing selection using phase-type theory. Genetics. 2021;218(2):iyab055. doi: 10.1093/genetics/iyab055 [DOI] [PMC free article] [PubMed] [Google Scholar]