Abstract
Characterizing spatial patterns in allele frequencies is fundamental to evolutionary biology because these patterns contain evidence of underlying processes. However, the spatial scales at which gene flow, changing selection, and drift act are often unknown. Many of these processes can operate inconsistently across space, causing nonstationary patterns. We present a wavelet approach to characterize spatial pattern in allele frequency that helps solve these problems. We show how our approach can characterize spatial patterns in relatedness at multiple spatial scales, i.e. a multilocus wavelet genetic dissimilarity. We also develop wavelet tests of spatial differentiation in allele frequency and quantitative trait loci (QTL). With simulation, we illustrate these methods under different scenarios. We also apply our approach to natural populations of Arabidopsis thaliana to characterize population structure and identify locally adapted loci across scales. We find, for example, that Arabidopsis flowering time QTL show significantly elevated genetic differentiation at 300–1,300 km scales. Wavelet transforms of allele frequencies offer a flexible way to reveal geographic patterns and underlying evolutionary processes.
Keywords: landscape genetics, $F_{ST}$, local adaptation, isolation by distance
Introduction
Geographic clines in allele frequency are a classic pattern in evolutionary biology: they are frequently observed in nature and are the subject of extensive theory on the underlying processes. For example, theory describes how limited gene flow and drift (Wright 1931) or changing selection (Haldane 1948) can generate allele frequency differences between populations. Accordingly, researchers often estimate and model spatial allele frequency patterns to make inferences about underlying evolutionary and ecological mechanisms. To do so, researchers often divide sampled individuals into discrete groups (populations) among which differences in allele frequencies are calculated. A common such approach involves estimating $F_{ST}$, the proportion of total allele frequency variation that differs between discrete populations (Wright 1949; Lewontin and Krakauer 1973).
However, many species exist as more or less continuously distributed populations. Theoretical study of allele frequency change across continuous populations began as early as Wright (1943) and Malécot (1948), who found expectations for genetic differentiation or kinship as functions of gene flow and geographic distance. Later progress included diffusion models (Nagylaki 1978) and stepping stone/lattice models (Kimura and Weiss 1964) giving expectations for correlation in allele frequencies across distance, and models accounting for population regulation by negative density dependence (Barton et al. 2002).
Despite these theoretical advances, the statistical tools for inference on continuously distributed populations have lagged (Bradburd and Ralph 2019; Hancock et al. 2023). Nevertheless, statistical approaches to studying spatial pattern in continuous populations include models relating landscape features to gene flow (McRae et al. 2008), calculating correlations between spatial functions and genotype (Yang et al. 2012; Wagner et al. 2017), and applying discrete landscape grids to identify geographic regions where genetic turnover is particularly high or low (Petkova et al. 2016). Approaches have been developed to estimate the average distance of gene flow from the slope of genetic divergence versus geographic distance (Rousset 2000; Vekemans and Hardy 2004), to estimate localized genetic “neighborhoods” (Wright 1946; Shirk and Cushman 2014), and to model both discrete and continuous relatedness patterns simultaneously (Bradburd et al. 2018).
In recent years, researchers have collected many large, broadly distributed DNA sequence datasets from diverse species (Alonso-Blanco et al. 2016; Yeaman et al. 2016; Wang et al. 2020; Machado et al. 2021). Statistical inference can be applied to these data to understand gene flow, demographic histories, and spatially varying selection. Despite the progress made by previous approaches, there remain challenges.
The form and scale of relevant spatial patterns is unknown
Humans can infer seemingly meaningful patterns in even randomly generated images (Blakemore et al. 2003; Ayton and Fischer 2004; Fyfe et al. 2008). So what are the spatial patterns we are looking for? The functional forms (i.e. shapes) of both spatially varying selection and neutral processes (e.g. dispersal kernels) are often unknown, as are the forms of resulting spatial patterns. For example, the specific environmental gradients driving changing selection are often not known, nor is the spatial scale at which they act, and whether they change at the same rate consistently across a landscape.
In the case of neutral processes, a homogeneous landscape approximately at equilibrium is rarely of interest to empiricists. Instead, the influence of heterogeneous landscapes (Manel et al. 2003) and historical contingency is usually a major force behind spatial patterns in allele frequency and traits (Excoffier and Ray 2008). As a result, researchers often attempt to characterize spatial patterns of relatedness and genetic similarity to make inferences about variation in gene flow (McRae et al. 2008; Wang et al. 2009; Peterman 2018) and recent population expansion (Slatkin 1993). The influence of gene flow, drift, and range expansion can occur at a variety of spatial scales, and in different ways across a heterogeneous landscape. For example, the rate at which relatedness decays over geographic distance can change abruptly at major barriers (Rosenberg et al. 2005). However, the scale-specificity and nonstationarity of such patterns can be challenging to characterize.
The spatially varying selective gradients causing local adaptation are unknown
One important force behind allele frequency clines is changing selection due to environmental gradients, resulting in local adaptation. However, it is often not clear what environmental gradients drive local adaptation (Kawecki and Ebert 2004). This is especially true of nonmodel systems and those with little existing natural history knowledge. Even for well-studied species, it is not trivial to identify the specific environmental conditions that change in space and drive local adaptation. Ecology is complex, and abiotic and biotic conditions are high-dimensional. Rather than a priori selection of a putative selective gradient, an alternative approach is to search for spatial patterns in allele frequencies that cannot be explained by neutral processes. This approach is embodied by several statistics and approaches, such as $F_{ST}$ (Weir and Cockerham 1984), XtX (Gautier 2015), spatial ancestry analysis (SPA) (Yang et al. 2012), Moran’s eigenvector maps (MEMs) (Wagner et al. 2017), and others.
Many approaches rely on discretization of population boundaries
Some of the aforementioned approaches rely on dividing sampled individuals into discrete spatial groups. $F_{ST}$ is one such approach. It was introduced by Wright (1949) and defined as the “correlation between random gametes, drawn from the same subpopulation, relative to the total,” where the definition of “total” has been interpreted differently by different authors (Bhatia et al. 2013). The classic approach of calculating $F_{ST}$ to test for selection was usually applied to a small number of locations, a situation in which discretization (i.e. deciding which genotyped individuals belong in which population) was a simpler problem. Current studies often sample and sequence individuals from hundreds of locations, and so the best approach for discretizing these genotyped individuals into defined “populations” is less clear. In addition to the spatial scale of subpopulations, at issue is precisely where to place the boundaries between populations. The problem is exacerbated for broadly distributed species, connected by gene flow, that lack clear spatially distinct populations (Josephs et al. 2019). Even if clustering algorithms appear to show clustering of genotypes, these methods can be sensitive to sampling bias (e.g. geographic clustering) and can mislead as to the existence of discrete subpopulations (Serre and Pääbo 2004; Frantz et al. 2009).
Some approaches are not limited by discretization, and might be generally termed “population-agnostic” because discrete populations are not defined. These instead use ordination of genetic loci or geographic location. Approaches that use ordination (such as PCA) of genetic loci look for particular loci with strong loadings on PCs (Duforet-Frebourg et al. 2016) or traits with an unexpectedly high correlation with individual PCs (Josephs et al. 2019). Alternatively, ordination of distance or spatial neighborhood matrices can create spatial functions that can be used in correlation tests with genetic loci (Wagner et al. 2017). However, ordinations to create individual rotated axes are not done with respect to biology and so might not be ideal for characterizing biological patterns. For example, ordinations of genetic loci are heavily influenced by global outliers of genetic divergence (Peter et al. 2020) and uneven sampling (McVean 2009). Ordinations like PCA also often lack parametric null distributions for hypothesis testing.
Wavelet characterization of spatial pattern
Instead of discretizing sampled locations into populations, one could model allele frequencies with flexible but smooth functions. Wavelet transforms allow one to characterize the location and the scale or frequency of a signal (Daubechies 1992). Daubechies (1992) gives a nice analogy of wavelet transforms: they are akin to written music, which indicates a signal of a particular frequency (musical notes of different pitch) at a particular location (the time at which the note is played, in the case of music). Applying this analogy to genetics, the frequency is the rate at which allele frequencies change in space, and the location is the part of a landscape where allele frequencies change at this rate. Applying wavelet basis functions to spatial genetic data could allow us to characterize localized patterns in allele frequency, and dilating the scale of these functions could allow us to characterize scale-specific patterns in allele frequency (see Supplementary Fig. 1 for an example). Note that wavelets are distinct from Fourier analysis. Wavelets capture localized signals because the basis functions’ variance goes to zero moving away from the focal location, while Fourier can only capture global average patterns as it uses stationary (unchanging) basis functions. Wavelet transforms have had some recent applications in modeling ancestry along the genome (Pugach et al. 2011; Groh and Coop 2024) but have not been implemented to model geographic genetic patterns.
Keitt (2007) created a wavelet approach for characterizing spatial patterns in ecological communities. He used this approach to identify locations and scales with particularly high community turnover, and developed null-hypothesis tests of these patterns. These spatial patterns in the abundance of multiple species are closely analogous to spatial patterns in allele frequency of many genetic markers across the genome, and previous spatial genetic studies have also profited by borrowing tools from spatial community ecology (Lasky et al. 2012; Fitzpatrick and Keller 2015). Here we modify and build on this approach to characterize spatial pattern in allele frequency across the genome and at individual loci.
Methods
Wavelet characterization of spatial pattern in allele frequency
Our implementation here begins by following the work of Keitt (2007) in characterizing spatial community turnover, except that we characterize genomic patterns using allele frequencies of multiple loci in place of abundances of multiple species in ecological communities. In later sections of this paper, we build off this approach and develop new tests for selection on specific loci. Our implementation of wavelets allows estimation of scale-specific signals (here, allele frequency clines) centered on a given point, $(a, b)$, in two-dimensional space. We use a version of the Difference-of-Gaussians (DoG) wavelet function (Supplementary Fig. 1) (Muraki 1995). We start with a Gaussian smoothing function centered at $(a, b)$ for a set of sampling points $\Omega = \{(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n)\}$, which takes the form
$$\eta^{s}_{a,b}(x, y) = \frac{G\left(\frac{x-a}{s}, \frac{y-b}{s}\right)}{\sum_{(u,v) \in \Omega} G\left(\frac{u-a}{s}, \frac{v-b}{s}\right)} \tag{1}$$
where s controls the scale of analysis and $G(x, y)$ is the Gaussian kernel $e^{-(x^2+y^2)/2}$.
The DoG wavelet function then takes the form
$$\psi^{s}_{a,b}(x, y) = \eta^{s}_{a,b}(x, y) - \eta^{\beta s}_{a,b}(x, y) \tag{2}$$
where $\beta > 1$, and so the larger scale smooth function is subtracted from the smaller scale smooth to characterize the scale-specific pattern. For a suitable choice of $\beta$, the dominant scale of analysis resulting from the DoG is s distance units (Keitt 2007). This formulation of the wavelet kernel is similar in shape to the derivative-of-Gaussian kernel and has the advantage of maintaining admissibility (Daubechies 1992) even near boundaries, as each of the smoothing kernels is normalized over the samples such that their difference sums to zero over the samples.
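The construction of the DoG weights described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the Gaussian kernel is assumed to be the unit-variance form $e^{-(x^2+y^2)/2}$, and `beta = 1.87` is an assumed value chosen only for illustration.

```python
import math

def gaussian_kernel(x, y):
    """Unnormalized isotropic Gaussian kernel exp(-(x^2 + y^2) / 2)."""
    return math.exp(-(x * x + y * y) / 2.0)

def smooth_weights(center, points, s):
    """Gaussian smoothing weights at scale s, normalized to sum to 1 over the samples.

    Assumes at least one sample is close enough to the center that the
    weights do not all underflow to zero.
    """
    a, b = center
    raw = [gaussian_kernel((u - a) / s, (v - b) / s) for u, v in points]
    total = sum(raw)
    return [r / total for r in raw]

def dog_weights(center, points, s, beta=1.87):
    """Difference-of-Gaussians weights: small-scale smooth minus large-scale smooth.

    beta > 1 sets the ratio of the two scales; 1.87 is an assumption here.
    """
    fine = smooth_weights(center, points, s)
    coarse = smooth_weights(center, points, beta * s)
    return [f - c for f, c in zip(fine, coarse)]

# Hypothetical sampling locations, deliberately irregular.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
psi = dog_weights((1.0, 1.0), points, s=1.0)
print(sum(psi))  # differs from zero only by floating-point rounding
```

Because each smooth is normalized over the same set of samples, the weights sum to zero for any center, including centers near the edge of the sampled area.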
Let $f_i(u, v)$ be the major allele frequency of the ith locus from a set of I biallelic markers at a location with spatial coordinates $(u, v)$. The adaptive wavelet transform of allele frequency data at locus i, centered at $(a, b)$ and at scale s is then
$$(T^{wav} f_i)(a, b, s) = \frac{1}{h_{a,b}(s)} \sum_{(u,v) \in \Omega} \psi^{s}_{a,b}(u, v)\, f_i(u, v) \tag{3}$$
where the right summation is of the product of the smooth function and the allele frequencies across locations. The magnitude of this summation will be greatest when the DoG wavelet filter matches the allele frequency cline. That is, when the shape of the wavelet filter matches the allele frequency cline in space, the product of $\psi^{s}_{a,b}(u, v)$ and $f_i(u, v)$ will resonate (increase in amplitude), yielding greater variation among locations in $(T^{wav} f_i)(a, b, s)$, the wavelet-transformed allele frequencies. When the spatial patterns in the wavelet filter and allele frequencies are discordant, the variation in their product, and hence in the wavelet-transformed allele frequency, is reduced. For consistency, here we choose major allele frequency for $f_i(u, v)$, though in practice the signing of alleles has little impact on our results.
The $h_{a,b}(s)$ term in equation 3 is used to normalize the variation in the wavelet function so that the wavelet transforms $T^{wav} f_i$ are comparable for different scales s and locations $(a, b)$:
$$h_{a,b}(s) = \sqrt{\sum_{(u,v) \in \Omega} \left[\psi^{s}_{a,b}(u, v)\right]^2} \tag{4}$$
When $(a, b)$ is far from locations in Ω relative to the scale s, the Gaussian functions [$G(x, y)$] that make up the wavelet function ψ are only evaluated over a range where they remain close to zero. Thus unsampled geographic regions will have very small $h_{a,b}(s)$, the term used to normalize for local variation in the wavelet basis functions. In turn, very small $h_{a,b}(s)$ dramatically and undesirably inflates the wavelet transformed allele frequencies (equation 3) in these geographic regions where there is little sampling relative to s. For this reason, we do not calculate the wavelet transform for locations $(a, b)$ where no sampled location lies within a threshold distance set relative to s.
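The normalized transform and the rule for skipping poorly sampled centers can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, `beta = 1.87` is an assumed value, and the cutoff `max_gap` (defaulting to twice the scale) is an assumption standing in for the threshold distance described above.

```python
import math

def dog_weights(center, points, s, beta=1.87):
    """Difference of two sample-normalized Gaussian smooths (scales s and beta * s)."""
    a, b = center

    def smooth(scale):
        raw = [math.exp(-(((u - a) / scale) ** 2 + ((v - b) / scale) ** 2) / 2.0)
               for u, v in points]
        total = sum(raw)
        return [r / total for r in raw]

    return [f - c for f, c in zip(smooth(s), smooth(beta * s))]

def wavelet_transform(center, points, freqs, s, beta=1.87, max_gap=None):
    """Normalized wavelet transform of allele frequencies at one center and scale.

    Returns None when no sample lies within max_gap of the center, because the
    normalizing term would then be tiny and would inflate the transform.
    Needs at least two sampled points so the normalizing term is nonzero.
    """
    a, b = center
    if max_gap is None:
        max_gap = 2.0 * s  # assumed cutoff, not taken from the text
    nearest = min(math.hypot(u - a, v - b) for u, v in points)
    if nearest > max_gap:
        return None
    psi = dog_weights(center, points, s, beta)
    h = math.sqrt(sum(w * w for w in psi))  # normalizing term
    return sum(w * f for w, f in zip(psi, freqs)) / h

# Hypothetical 11 x 11 sampling grid with two allele frequency surfaces.
points = [(float(u), float(v)) for u in range(11) for v in range(11)]
flat = [0.5 for _ in points]                            # no spatial pattern
step = [0.0 if u < 5.0 else 1.0 for u, _ in points]     # abrupt change at u = 5

t_flat = wavelet_transform((4.0, 5.0), points, flat, s=2.0)
t_step = wavelet_transform((4.0, 5.0), points, step, s=2.0)
t_far = wavelet_transform((100.0, 100.0), points, step, s=2.0)
```

A spatially constant allele frequency gives a transform of essentially zero because the weights sum to zero, while a center next to the step gives a clearly nonzero value.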
Below we illustrate how to apply this wavelet transform (equation 3) of spatial allele frequency patterns to characterize genome-wide patterns, as well as to test for local adaptation at individual loci.
Wavelet characterization of spatial pattern in multiple loci
Researchers are often interested in characterizing spatial patterns aggregated across multiple loci across the genome to understand patterns of relatedness, population structure, and demographic history. Here, we specifically want to characterize heterogeneity in spatial patterns, because this heterogeneity in pattern may reflect heterogeneity in underlying processes: where there is heterogeneity in migration rates, such as where there are migration barriers (Petkova et al. 2016), or where there are recent range expansions such that spatial patterns are farther from equilibrium (Slatkin 1993).
We use
$$D^{wav}_{a,b}(s) = \sqrt{\sum_{i=1}^{I} \left[(T^{wav} f_i)(a, b, s)\right]^2} \tag{5}$$
to calculate a “wavelet genetic distance” or “wavelet genetic dissimilarity.” This wavelet genetic dissimilarity is computed as the euclidean distance (in the space of allele frequencies across the genome) between the genetic composition centered at $(a, b)$ and other locations across s distance units. This wavelet genetic dissimilarity is localized in space and scale-specific. This quantity captures the level of genetic turnover at scale s centered at $(a, b)$, and captures similar information to the increase in average genetic distance between a genotype at $(a, b)$ and other genotypes s distance units away. To obtain the average dissimilarity across the landscape, one can also calculate the mean of $D^{wav}_{a,b}(s)$ across locations $(a, b)$ at each sampled site, to get a mean wavelet genetic dissimilarity for s. A benefit of using the wavelet transformation over sliding window approaches (e.g. Bishop et al. 2023) is that wavelets smoothly incorporate patterns from samples that are not precisely s distance units away and can be centered at any location of the analyst’s choosing.
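Aggregating across loci amounts to taking a Euclidean norm of the per-locus transformed values at one center and scale. The sketch below assumes those values have already been computed; the function names and the input numbers are hypothetical.

```python
import math

def wavelet_dissimilarity(transforms_by_locus):
    """Euclidean norm across loci of wavelet-transformed allele frequencies
    at a single center and scale."""
    return math.sqrt(sum(t * t for t in transforms_by_locus))

def mean_dissimilarity(transforms_by_site):
    """Average dissimilarity over sites; each site is a list of per-locus values."""
    values = [wavelet_dissimilarity(site) for site in transforms_by_site]
    return sum(values) / len(values)

# Hypothetical transformed values for three loci at two sampled sites.
sites = [[0.3, -0.4, 0.0], [0.0, 0.0, 1.2]]
per_site = [wavelet_dissimilarity(site) for site in sites]
average = mean_dissimilarity(sites)
```

Squaring before summing means the sign convention for alleles at each locus does not affect the result.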
Testing the null hypothesis of no spatial pattern in allele frequency
A null hypothesis of no spatial pattern in allele frequencies can be generated by permuting the location of sampled populations among each other. Most empirical systems are not panmictic, and so this null model is trivial in a sense. However, comparison with this null across scales and locations can reveal when systems shift from small-scale homogeneity (from local gene flow) to larger scale heterogeneity (from limited gene flow) (Keitt 2007).
Simulated neutral patterns across a continuous landscape
To demonstrate the wavelet transformation of allele frequencies, and wavelet genetic dissimilarity function, we applied these tools to several simulated scenarios. First, we conducted forward landscape genetic simulations under neutrality using the SLiM software (Haller and Messer 2019), building off published approaches (Battey et al. 2020a, 2020b). We simulated outcrossing, iteroparous, hermaphroditic organisms, with modest lifespans (average of time steps). Individual fecundity was Poisson distributed, mating probability (determining paternity) was determined based on a Gaussian kernel (truncated at 3 SD), and dispersal distance from mother was also Gaussian (Battey et al. 2020a). Individuals became mature in the time step following their dispersal. These parameters roughly approximate a short lived perennial plant with gene flow via pollen movement and seed dispersal. Competition reduced survival and decayed with distance following a Gaussian (truncated at 3 SD, Battey et al. 2020b). Near landscape boundaries, survival was reduced to compensate for lower competition from beyond the landscape margin (Battey et al. 2020b). Code is available at GitHub (https://github.com/jesserlasky/WaveletSpatialGenetic).
We began by characterizing a simple scenario across a continuous landscape. We simulated a square two-dimensional landscape measuring 25 units on each side. The standard deviations of mating and dispersal distance, σ, were both 0.2, yielding a combined standard deviation of gene flow distances of 0.24. In this first simulation there was no selection. The population was allowed to evolve for 100,000 time steps before we randomly sampled 200 individuals and 1,000 SNPs with a minor allele frequency of at least 0.05. The first two principal components (PCs) of these SNPs show smooth population structure across the landscape and predict the spatial location of each sample (Supplementary Fig. S2 in File S1).
To facilitate interpretation of wavelet transformed allele frequencies we provide two example loci i with distinct spatial patterns (Fig. 1). The first locus (Fig. 1a–c) has the greatest variance in wavelet transformed allele frequencies among sampled loci at the scale shown in Fig. 1b and e, while the second locus (Fig. 1d–f) has the greatest variance at the scale shown in Fig. 1c and f.
Fig. 1.
Two example SNPs (rows) with distinct spatial patterns. Shading shows either allelic variation (untransformed, a, d) or variation in wavelet transformed allele frequencies (b, c, e, f). The first locus a–c) has the greatest variance in wavelet transformed allele frequency among sampled loci at the scale shown in b) and e). The second locus d–f) has the greatest variance in wavelet transformed allele frequency at the scale shown in c) and f). For the SNP in the top row, the variance among locations in wavelet transformed allele frequency at the first scale is 0.56 (visualized as shading in b), while it is only 0.17 for the SNP in the bottom row e). For the SNP in the bottom row, the variance among locations in wavelet transformed allele frequency at the second scale is 44.46 (visualized as shading in f), while it is only 1.24 for the SNP in the top row c).
We then calculated wavelet dissimilarity (equation 5), aggregating the wavelet transformed allele frequencies (equation 3) across loci i, for each sampled location at a range of spatial scales s. Here and below we use a set of scales increasing by a constant log distance interval, as genetic distances are often linearly correlated with log geographic distances in two dimensions (Rousset 1997). The mean across sampled locations for each scale was calculated and compared to the null distribution for that scale (Supplementary Fig. S2 in File S1). The null was generated by permuting locations of sampled individuals as described above, and the observed mean dissimilarity was considered significant if it was below the 2.5th percentile or above the 97.5th percentile of dissimilarity from null permutations.
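The permutation test described above can be sketched generically: shuffle which genotype sits at which location, recompute the statistic, and compare the observed value with the tails of the resulting null distribution. In this sketch the function names are hypothetical, and the statistic is a simple nearest-neighbor stand-in used only to keep the example short; in practice it would be the mean wavelet dissimilarity at a given scale.

```python
import random

def nearest_neighbor_turnover(points, values):
    """Stand-in statistic: mean squared allele frequency difference between
    each sample and its nearest neighbor (not the wavelet dissimilarity)."""
    total = 0.0
    for i, (x, y) in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: (points[k][0] - x) ** 2 + (points[k][1] - y) ** 2)
        total += (values[i] - values[j]) ** 2
    return total / len(points)

def permutation_test(points, values, statistic, n_perm=500, seed=1):
    """Two-sided permutation test of no spatial pattern.

    Locations are held fixed and the values are shuffled among them. The
    observed statistic is flagged low (or high) when fewer than 2.5% of the
    permuted statistics fall below (or above) it.
    """
    rng = random.Random(seed)
    observed = statistic(points, values)
    null = []
    for _ in range(n_perm):
        shuffled = list(values)
        rng.shuffle(shuffled)
        null.append(statistic(points, shuffled))
    frac_below = sum(x < observed for x in null) / n_perm
    frac_above = sum(x > observed for x in null) / n_perm
    return {"observed": observed,
            "low": frac_below < 0.025,
            "high": frac_above < 0.025}

# Hypothetical transect with a smooth allele frequency cline.
points = [(float(i), 0.0) for i in range(20)]
values = [i / 19.0 for i in range(20)]
result = permutation_test(points, values, nearest_neighbor_turnover)
```

For a smooth cline, neighboring samples are more similar than under shuffled locations, so the small-scale statistic falls in the lower tail of the null.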
When comparing our simulated data to the null, we found that mean wavelet genetic dissimilarity was significantly less than expected under the null model at the smallest scales, due to local homogenization by gene flow (SD = 0.24). At larger scales, wavelet dissimilarity was significantly greater than expected, due to isolation by distance, with monotonically increasing wavelet genetic dissimilarity at greater scales (Supplementary Fig. 2).
To demonstrate how the scale of gene flow influences wavelet dissimilarity, we also repeated the simulations described above with standard deviations of mating and dispersal distances, σ, of 0.5, 1, 2, or 5, yielding combined standard deviations of gene flow distances of 0.61, 1.22, 2.45, and 6.12.
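The combined gene flow distances quoted here are numerically consistent with adding the dispersal variance to half of the mating variance, as would be expected if the paternal parent contributes half of each offspring's genes. This relation is an inference from the reported numbers, not a formula given in the text:

$$\sigma_{\text{combined}} = \sqrt{\sigma_{\text{dispersal}}^2 + \tfrac{1}{2}\,\sigma_{\text{mating}}^2} = \sigma\sqrt{3/2} \approx 1.225\,\sigma \quad \text{when } \sigma_{\text{dispersal}} = \sigma_{\text{mating}} = \sigma$$

With σ = 0.2, 0.5, 1, 2, and 5 this gives approximately 0.24, 0.61, 1.22, 2.45, and 6.12, matching the values reported for the simulations.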
To verify that simulations were generating results consistent with theoretical expectations of continuous populations at equilibrium, we compared the simulated gene flow parameters with estimates derived from the simulated data based on theory. The slope of genetic differentiation versus geographic distance in two dimensions is expected to be proportional to the inverse of Wright’s neighborhood size, $4\pi D\sigma^2$, where D is the effective population density and σ is the standard deviation of gene flow (Wright 1943, 1946; Rousset 2000; Vekemans and Hardy 2004).
We estimated D from the effective population size $N_e$, calculated following Kimura and Crow (1963) from the census population size N and the variance in lifetime reproductive output V. We calculated V using the lifetime reproductive output of the individuals dying in the last 50 time steps. We then divided the estimated $N_e$ by landscape area (assuming an even distribution across the landscape) to get effective density D (Vekemans and Hardy 2004). We used three different genetic differentiation or kinship metrics (Loiselle et al. 1995; Ritland 1996; Rousset 2000) combined with estimated D to estimate gene flow across a range of true gene flow parameters (using SPAGeDi v1.5 software, Hardy and Vekemans 2002). We also compared individual pairwise estimates of genetic differentiation across distance with the theoretically expected slope. Simulations were run for 100,000 time steps with parameters as described above.
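The step from a fitted slope to an estimate of gene flow can be sketched as below. The sketch assumes the slope comes from regressing a pairwise differentiation statistic on the natural log of distance, so that the expected slope is the inverse of the neighborhood size $4\pi D\sigma^2$; the function names and the numbers are hypothetical.

```python
import math

def neighborhood_size(density, sigma):
    """Wright's neighborhood size, 4 * pi * D * sigma^2."""
    return 4.0 * math.pi * density * sigma ** 2

def sigma_from_slope(slope, density):
    """Invert slope = 1 / (4 * pi * D * sigma^2) to recover sigma.

    Assumes a positive slope and an independent estimate of effective density.
    """
    return math.sqrt(1.0 / (4.0 * math.pi * density * slope))

# Hypothetical values: effective density 2 per unit area, true sigma 0.5.
density = 2.0
true_sigma = 0.5
expected_slope = 1.0 / neighborhood_size(density, true_sigma)
recovered_sigma = sigma_from_slope(expected_slope, density)
```

Because σ enters as a square, an error in the estimated density D propagates to the σ estimate as its inverse square root.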
We found that the gene flow estimated using the slope of genetic versus geographic distance and D was closely matched by the simulation parameter value, especially for the Rousset (2000) genetic differentiation estimator (Supplementary Figs. 3 and 4). This matching suggests these simulations corresponded well with theory for continuous populations at equilibrium, despite ignoring the effects of negative density dependence, uneven distribution of individuals, and boundary effects (Barton et al. 2002).
With increasing scale of gene flow we see a flatter change in wavelet dissimilarity across spatial scales (Supplementary Fig. 5). When gene flow is local, wavelet dissimilarity is low at small scales and high at large scales. At the large gene flow scale, the observed wavelet dissimilarity is indistinguishable from the panmictic null. We also ran the same analyses but using biased sampling along the landscape’s y-axis, so that 3/4 of samples were in the upper half of the landscape. Even with this bias, the wavelet dissimilarities across scales and gene flow parameters were essentially unchanged (Supplementary Fig. S6 in File S1). To investigate sensitivity to landscape size, we also ran these same simulations with landscapes four times as large (50×50) and found similar patterns of wavelet dissimilarity across scales and simulated gene flows (Supplementary Fig. 7).
Results
Simulated long-term neutral patterns in a heterogeneous landscape
To assess if our approach could identify localized and scale-specific patterns of isolation by distance, we next simulated multiple scenarios where we expected spatial heterogeneity. First, we simulated neutral evolution across a simulated patchy landscape (generated from earlier work) (Lasky and Keitt 2013). This landscape contained a substantial portion of unsuitable habitat where arriving propagules perished. We used the same population parameters as previously and simulated 100,000 time steps to reach approximately stable relatedness patterns. We then calculated wavelet dissimilarity using 1,000 random SNPs of 200 sampled individuals.
Additionally, we sought to compare wavelet dissimilarities to more familiar metrics. To do so, we calculated euclidean genetic distance (in the space of allele frequencies across the genome) and geographic distance between pairs of samples, and did this for different subsets of samples and regions, so as to compare localized patterns in wavelet dissimilarity to localized patterns in pairwise distances.
In our landscape, wavelet dissimilarity showed localized and scale-specific patterns of low and high dissimilarity (Fig. 2). Notably, the same two islands (top left and bottom right of landscape in Fig. 2) have lower dissimilarity than expected at small scales and are more dissimilar than expected at larger scales. Stated another way, these islands have low diversity locally (e.g. within populations), as can be seen by the slow increase in genetic distance with geographic distance locally (Fig. 2d, compare to 2f). However, at larger scales (e.g. comparing island to mainland) islands are more dissimilar, as seen by the greater genetic distances at larger geographic distances (Fig. 2e, compare to 2g; also see the first two principal components of SNPs, Supplementary Fig. S8 in File S1). These results highlight the capacity of the method to contrast patterns across scales requiring only dilation of the analyzing kernel.
Fig. 2.
Wavelet genetic dissimilarity identifies scale-specific, localized patterns in a heterogeneous landscape, with pairwise distance plots for comparison. a–c) Maps of simulated landscape where habitat is gray (in background) and unsuitable areas are white. Sampled individuals are circles. Colors represent sampling locations where wavelet genetic dissimilarity was significantly high (red) or low (blue), with s, the wavelet scale, shown at top of each panel as a horizontal line. At the smallest scales a), samples have less dissimilarity than expected, especially in the island in the upper left of the landscape. This pattern can also be seen d,f) when comparing pairwise geographic versus euclidean genetic distances for samples in the different regions of the landscape (dashed gray lines in a). At larger spatial scales b–c), all locations have significantly greater dissimilarity than expected due to limited gene flow. However, the same islands show the greatest dissimilarity at large scales (lower panels), due to their high genetic difference from mainland samples at center. This pattern can also be seen in the pairwise genetic distances across larger geographic distances e,g). d–g) Loess smoothing curves are shown.
Simulated neutral patterns in a colonizing and range-expanding species
For a second scenario where we expected localized, scale-specific heterogeneity, we simulated an invasion/range expansion. Beyond the importance of invasions in applied biology, the changes in spatial genetic patterns over time are of general interest (Slatkin 1991, 1993; Le Corre et al. 1997; Castric and Bernatchez 2003), considering that all species ranges are dynamic and many “native” species still bear clear evidence of expansion, e.g. following the last glacial maximum.
We simulated invasion across a square landscape of the same size as before, but beginning with identical individuals only in the middle at the bottom edge of the landscape (Fig. 3). We sampled 200 individuals at time steps 100, 250, 500, 1,000, 1,500, and 2,000, and then onward through the full populating of the landscape (around time step 2,500) until the 3,000th time step.
Fig. 3.
Wavelet genetic dissimilarity reveals dynamic spatial patterns during an invasion across a homogeneous landscape. Left column of panels a–c) shows a map of the landscape through time, with 200 sampled individuals at each time step and the wavelet dissimilarity at s = 6.9 at their location. Darker red indicates greater wavelet dissimilarity. In the second time step, 1,000, two regions are highlighted in dashed boxes b), one with higher dissimilarity at s = 6.9 d) and one with lower dissimilarity at this scale e). d–e) show pairwise geographic distance versus distance in the first PC of SNPs for samples from these regions. f) shows the loadings of each sample on the first PC of SNPs. d–e) highlight the greater increase in PC1 distance with geographic distance at this scale (vertical dashed lines) in d), compared to the smaller increase in PC1 distance at this scale in e). In particular, the region highlighted in d) is homogeneous at short distances but very distinct at distances at the highlighted scale s = 6.9, indicating the major genetic turnover at this scale and location. g) Mean wavelet dissimilarity across the landscape changes over time, highlighting the dynamic spatial population genetic patterns across invasions. Loess smoothing curves are shown in e–f).
We characterized wavelet genetic dissimilarity and found substantial heterogeneity across different regions and across time (e.g. for s = 6.9, dark versus light red in Fig. 3a–c). This heterogeneity in genetic turnover can be seen by contrasting genotypes from different regions. Near the expansion front, there is relative homogeneity and low diversity locally in new populations, but with rapid turnover in genotypes separated by space, resulting in high wavelet dissimilarity at intermediate spatial scales (Fig. 3d). In the range interior, there is greater local diversity and less turnover in genotype across space, i.e. a weaker isolation by distance (Fig. 3e; see the all-SNP genetic distance plot in Supplementary Fig. 9). Supporting the role of founder effects and low diversity at expanding range margins in driving these patterns, we observed a decline in medium- and large-scale wavelet dissimilarity in later years (Fig. 3g) after the landscape had been populated.
These patterns highlight how wavelet dissimilarity captures scale-specific turnover in genetic composition, rather than merely genetic distance at a given geographic distance. Comparing the two regions highlighted in Fig. 3b, the genetic distances at a geographic distance of 6.9 are not strikingly different (Supplementary Fig. 9). Rather, what distinguishes these regions is their rate of change in genetic composition at this scale, as highlighted in Fig. 3. The region of high wavelet dissimilarity at s = 6.9 (Fig. 3b) transitions from homogeneity among nearby samples to high genetic distance at larger scales (Fig. 3d, S9). By contrast, the region of low wavelet dissimilarity at s = 6.9 (Fig. 3b) starts out with greater genetic distance among nearby samples, with only a modest increase in genetic distance at larger scales (Fig. 3e, S9).
Overall, these simulations show the capacity of wavelet genetic dissimilarity to capture localized, scale-specific trends in genetic composition. Given the spatial heterogeneity in nature and the dynamics of populations and species ranges through time, there are likely many such patterns waiting to be described to shed light on patterns of gene flow and population history.
Finding the loci of local adaptation
Using wavelet transforms to identify outliers of spatial pattern in allele frequency
We can also use our approach to transforming allele frequencies to identify particular genetic loci involved in local adaptation, and the regions and spatial scales of turnover in their allele frequency. Our strategy is (as before) to first calculate , the wavelet transform, for each locus i at each sampling point for a set of chosen spatial scales .
Because of different ages and histories of drift, mutations will vary in their global allele frequency and thus global variance. To facilitate comparisons among loci for relative evidence of selection, we can normalize spatial patterns in allele frequency by total variation across locations, as is done when calculating $F_{ST}$.
Here we divide the wavelet transforms of allele frequency by the standard deviation of global allele frequency variation for each locus i, . This normalization is greatest when minor allele frequency is 0.5 for a biallelic locus, and yields a scaled wavelet transformed allele frequency: , for a given location and scale.
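The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the wavelet transforms are already available as an array (`wavelet_tf`, a hypothetical name) and that the standard deviation of global allele frequency for a biallelic locus is taken as the binomial form sqrt(p(1 - p)), which is largest at a frequency of 0.5.

```python
import numpy as np

def scale_wavelet_transform(wavelet_tf, global_freq):
    """Divide each locus's wavelet transform by its global allele frequency SD.

    wavelet_tf  : array (n_loci, n_locations), transforms at a single scale
    global_freq : array (n_loci,), global allele frequency of each locus
    """
    p = np.asarray(global_freq, dtype=float)
    # Assumed form of the SD for a biallelic locus: sqrt(p * (1 - p)).
    sd = np.sqrt(p * (1.0 - p))
    return np.asarray(wavelet_tf, dtype=float) / sd[:, None]

# Two hypothetical loci with identical raw transforms but different frequencies.
wavelet_tf = np.array([[0.1, -0.1, 0.2, -0.2],
                       [0.1, -0.1, 0.2, -0.2]])
global_freq = np.array([0.5, 0.1])
scaled = scale_wavelet_transform(wavelet_tf, global_freq)
```

In this toy example the rarer allele (frequency 0.1) has a smaller global SD, so the same raw spatial pattern yields a larger scaled transform.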
We then calculate the variance across sampling locations of and refer to this quantity as the “scale-specific genetic variance.” This scale-specific variance is akin to $F_{ST}$ in being a measure of spatial variation in allele frequency normalized to total variation (which is determined by mean allele frequency). High scale-specific variance for a given locus indicates high variation at that scale relative to the total variation and mean allele frequency. We then used a $\chi^2$ null distribution across all genomic loci to calculate parametric P-values (Cavalli-Sforza 1966; Lewontin and Krakauer 1973) and used the approach of Whitlock and Lotterhos (2015) to fit the degrees of freedom of the distribution of scale-specific genetic variances (see Supplemental Methods). Applying this approach to a range of simulated scenarios as well as an empirical dataset (described below), we see that the $\chi^2$ distribution with a maximum-likelihood fit to determine degrees of freedom provides a reasonably close fit to the distribution of scale-specific genetic variance among SNPs (Supplementary Figs. 10–13).
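As a rough sketch of this kind of test, the code below fits the degrees of freedom of a chi-square null by maximum likelihood and returns upper-tail P-values. It is a simplified stand-in rather than the procedure in the Supplemental Methods: it assumes that the statistic rescaled as d.f. × variance / mean variance follows a chi-square distribution (in the spirit of Lewontin and Krakauer 1973), it omits the trimming of extreme loci used by Whitlock and Lotterhos (2015), and the function names and simulated values are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def fit_chi2_df(variances):
    """ML estimate of d.f., assuming df * v / mean(v) ~ chi-square(df)."""
    v = np.asarray(variances, dtype=float)
    vbar = v.mean()

    def neg_log_lik(df):
        x = v * df / vbar
        # Density of v itself: chi-square density of x times the Jacobian df / vbar.
        return -(stats.chi2.logpdf(x, df) + np.log(df / vbar)).sum()

    res = optimize.minimize_scalar(neg_log_lik, bounds=(0.5, 500.0), method="bounded")
    return res.x

def upper_tail_pvalues(variances):
    """Upper-tail P-values for each locus under the fitted chi-square null."""
    v = np.asarray(variances, dtype=float)
    df = fit_chi2_df(v)
    return stats.chi2.sf(v * df / v.mean(), df), df

# Simulated example: 5,000 neutral loci plus one hypothetical outlier locus.
rng = np.random.default_rng(1)
true_df = 8
null_var = 0.3 * rng.chisquare(true_df, size=5000) / true_df
variances = np.append(null_var, 2.0)
pvals, df_hat = upper_tail_pvalues(variances)
```

Here `variances` would hold the scale-specific genetic variance of every locus at one scale, and the test would be repeated separately for each scale.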
Simulated local adaptation
First, we present some specific individual simulations for illustration, and then a larger set with more variation in underlying parameters. We simulated a species with the same life history parameters as in simulations above, with the addition of spatially varying viability selection on a quantitative trait. We imposed two geometries of spatially varying selection, one a linear gradient and the other a square patch of different habitat selecting for a different trait value. As with the neutral simulations, simulations with selection began with organisms distributed across the landscape, with an ancestral trait value of zero. In these simulations, 1% of mutations influenced the quantitative trait with additive effects and with effect size normally distributed with a standard deviation of 5. For the linear gradient, the optimal trait value was 0.5 at one extreme and 0.5 at the other extreme, on a 25x25 square landscape. Selection was imposed using a Gaussian fitness function to proportionally reduce survival probability, with standard deviation . In this first simulation, . Carrying capacity was roughly five individuals per square unit area, and simulated populations usually stabilized close to this density. Full details of simulation, including complete code, can be found in supplemental materials and on GitHub (https://github.com/jesserlasky/WaveletSpatialGenetic).
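The Gaussian viability selection described here can be illustrated with a short sketch. This is not the simulation code from the repository; the function names, the parameter name `sigma_k`, the endpoint optima, and the convention that fitness equals 1 at the optimum are all assumptions made for illustration.

```python
import numpy as np

def linear_optimum(x, width=25.0, low=-0.5, high=0.5):
    """Optimal trait value changing linearly across a landscape of given width.

    The endpoint optima (low, high) are hypothetical illustration values.
    """
    return low + (high - low) * np.asarray(x, dtype=float) / width

def gaussian_fitness(trait, optimum, sigma_k):
    """Relative survival under Gaussian stabilizing selection (1 at the optimum)."""
    z = np.asarray(trait, dtype=float)
    return np.exp(-((z - optimum) ** 2) / (2.0 * sigma_k ** 2))

# Three individuals with the ancestral trait value of zero, located at the
# left edge, the center, and the right edge of the landscape.
x = np.array([0.0, 12.5, 25.0])
trait = np.zeros(3)
w = gaussian_fitness(trait, linear_optimum(x), sigma_k=0.5)
```

Multiplying a baseline survival probability by `w` reduces survival in proportion to the mismatch between an individual's trait and the local optimum, with smaller `sigma_k` giving stronger selection.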
In the first simulation along a linear gradient after 2,000 time steps, there were 2 selected loci with minor allele frequency (MAF) at least 0.1, with a genetic variance in the trait of 3.7. (the scale of mating and propagule dispersal were each ) The two loci under stronger selection were clearly identified by the scale-specific genetic variance at the larger spatial scales (Fig. 4). When there is a linear selective gradient across the entire landscape, the largest spatial scale is the one most strongly differentiating environments and the strongest scale-specific genetic variance was at the largest scale (Fig. 4). However, power may not be greatest at these largest scales, because population structure also is greatest at these largest scales. Instead, power was greatest at intermediate scales, as seen by the lowest P-values being detected at these intermediate scales (Fig. 4). At these scales, there is greater gene flow but still some degree of changing selection that may maximize power to detect selection.
Fig. 4.
Scale-specific genetic variance test applied to simulations with a linear selective gradient. (top panels) Genome-wide variation in scale-specific genetic variance, , for five different scales s and upper-tail P-values for $\chi^2$ test using fitted values of d.f. Each point represents an SNP at a specific scale. Loci under selection are indicated with vertical lines along with the absolute value of the derived allele’s effect on the trait and MAF. At bottom are shown maps of the two selected loci as well as their spectra of scale-specific genetic variance. At upper right the mean scale-specific genetic variance across all genomic loci is shown for each scale s. The scale of mating and propagule dispersal were each . Gaussian viability selection was imposed with . Carrying capacity was approximately five individuals per square unit area.
We next simulated change in selection in a discrete habitat patch, which may more closely correspond to the setting where researchers would find useful a flexible approach to finding spatial patterns in allele frequency, especially if the patches of distinct environment are not known by researchers. In our simulation there was a large central patch, 10x10, that selected for distinct trait values () compared to the outer parts of the landscape (). Selection was initially weakly stabilizing ( around the optimum of zero for the first 500 years to accumulate diversity, and then the patch selective differences were imposed with stronger selection, . The scales of mating and propagule dispersal were each . Carrying capacity was roughly 50 individuals per square unit area.
In this simulation, we present results after 3,000 time steps, where there was a single common quantitative trait locus (QTL) under selection, giving a genetic variance in trait of 0.42 (Fig. 5). We found several spurious large-scale peaks in scale-specific genetic variance (Fig. 5a), but when using the $\chi^2$ test on these statistics we clearly identified the single QTL under selection, with lowest P-values for intermediate scales (Fig. 5b).
Fig. 5.
Simulations of local adaptation to a single discrete patch of different habitat. a) Genome-wide variation in scale-specific genetic variance and b) P-values for six different scales s, for a discrete habitat difference after 3,000 simulated years. Each point in the left panels a–b) represents an SNP at a specific scale. The selected SNP is indicated with a vertical line along with the absolute value of a derived allele’s effect on the trait and MAF. c) A map of the landscape with individuals’ genotypes at the causal SNP indicated with color, in addition to the spectrum of scale-specific genetic variance at this SNP, showing a peak at approximately half the patch width (vertical line at 5). d–e) Implementation of $F_{ST}$ using arbitrary boundaries for populations. This approach can easily miss causal loci c,e) if the delineated population boundaries do not match habitat boundaries. a) At upper right the mean scale-specific genetic variance across all loci is shown for each scale s. The scale of mating and propagule dispersal were each . Gaussian viability selection was imposed with .
We calculated the scale-specific genetic variance across a denser spectrum of scales s for the causal SNP, to determine at what scale variance was greatest. We found the maximum scale-specific genetic variance for the causal SNP was at 5.02, approximately half the length of a patch edge (Fig. 5c). To illustrate how results are sensitive to discretization, we also calculated $F_{ST}$ (Weir and Cockerham 1984; Goudet 2005) for several naively discretized subpopulation scenarios (Fig. 5d–f). We also implemented our test on these two simulated landscapes but with biased sampling and found our ability to detect causal loci was robust (Supplementary Fig. 14).
Evaluating the scale-specific genetic variance test
As an initial assessment of the general appropriateness of the scale-specific genetic variance test we proposed above, we conducted additional simulations on two types of landscapes with varying life history parameters. These simulations were not meant to be an exhaustive evaluation of the performance of this new test; we leave a more extensive evaluation for future studies.
Here, we again used the discrete habitat patch landscape and the linear gradient landscape but with a wider range of parameter variation. We tested a range of mating and dispersal (σ) scales including 0.25, 0.5, 1, and 2, and a range of stabilizing selection () values including 0.125, 0.25, 0.5, and 1. Three simulations were conducted for each combination of parameter settings and each ran for 10,000 years.
Because PCAdapt is one of the few methods for identification of spatial pattern in allele frequency that does not require subpopulation discretization and in theory could detect patterns at multiple scales, we also implemented this method. We used the PCA of the scaled genotype matrix, thinned for LD but including causal SNPs, to extract the z-scores and P-values of each SNP with a cutoff of p = 0.05. We used a scree plot showing the percentage of variance explained in decreasing order to identify the optimal number of principal components following Cattell’s rule (Duforet-Frebourg et al. 2016).
Calculating false and true positive rates for PCAdapt was straightforward, but for the scale-specific genetic variance test there are several tests (one at each scale) for each SNP. To conservatively represent inference across these multiple tests, we considered an SNP a significant result if at least one of the tested scales was significant. Because the individual scale tests are slightly conservative, and continuous wavelet transforms are correlated across scales (and hence not completely independent tests), we expected the resulting false positive rates would not be unduly high.
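The rule for combining tests across scales can be expressed compactly. The sketch below assumes a matrix of P-values with one row per SNP and one column per scale, plus a boolean vector marking the simulated causal SNPs; the function names and the toy values are hypothetical.

```python
import numpy as np

def any_scale_significant(pvals, alpha=0.05):
    """Flag an SNP if its P-value falls below alpha at any tested scale."""
    return (np.asarray(pvals, dtype=float) < alpha).any(axis=1)

def positive_rates(pvals, is_selected, alpha=0.05):
    """Return (true positive rate, false positive rate) across SNPs."""
    sig = any_scale_significant(pvals, alpha)
    sel = np.asarray(is_selected, dtype=bool)
    return sig[sel].mean(), sig[~sel].mean()

# Toy example: four SNPs tested at three scales; only the first is causal.
pvals = np.array([[0.001, 0.20, 0.60],
                  [0.30, 0.04, 0.90],
                  [0.50, 0.70, 0.80],
                  [0.40, 0.60, 0.10]])
is_selected = np.array([True, False, False, False])
tpr, fpr = positive_rates(pvals, is_selected)
```

In the toy example the causal SNP is detected, and one of the three neutral SNPs is flagged because it passes the threshold at a single scale.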
Overall the scale-specific genetic variance test showed good false positive rates. Across simulations, the proportion of SNPs with upper-tail at one scale was usually close to but sometimes slightly more than 0.05 (Fig. 6). By contrast, under scenarios of low gene flow and strong stabilizing selection, nominal false positive rates were high for PCAdapt, often .
Fig. 6.
Comparing the scale-specific genetic variance test with PCAdapt in simulations of adaptation to single discrete patch of different habitat. a) True positive rates (nominal ) for each combination of simulation parameters, the scales of mating and dispersal σ and the standard deviation of the Gaussian stabilizing selection function . b) An alternate view of statistical power based on the median rank of the top selected SNP among all SNPs. c) False positive rates (nominal ). d) Comparing power between the two statistical approaches for the different simulation runs. Density of points is shown in the blue scale so as to indicate where many simulations had the same result. The line indicates a 1:1 relationship. e–f) Individual selected SNPs in simulations, showing their nominal P values and ranks among all SNPs, colored based on σ in the simulation. The x-axis represents the proportion of total phenotypic variation among sampled individuals that was explained by the given SNP ( from a linear model).
Power to detect SNPs (proportion of selected SNPs with ) under selection was generally high (true positive rate near 1) but sometimes low, depending on the strength of selection () and mating and dispersal scales (σ) (Fig. 6). When gene flow was high and selection was weak, power was low for both the scale-specific genetic variance test and PCAdapt. This also corresponds to the scenario when local adaptation is weakest (Kirkpatrick and Barton 1997). In addition to considering power simply based on P for each SNP, we also considered power using the top P-value rank among selected SNPs under each simulation, based on the reasoning that researchers may want to follow up on top ranked outlier SNPs first before any lower ranked SNPs. This approach showed similar results, with high power for both the scale-specific genetic variance test and PCAdapt except when gene flow was high and selection weak. In general, the two methods showed comparable power across different scenarios (Fig. 6), with some indication that the scale-specific genetic variance test had higher power under high gene flow and PCAdapt slightly higher power under lower gene flow. By plotting individual SNPs we can see that for the upper end of gene flow scenarios ( or 2), the scale-specific genetic variance test more consistently identified selected SNPs at the top compared to PCAdapt. For the low gene flow scenarios, PCAdapt more consistently identified large effect variants, while the scale-specific genetic variance test more consistently identified the smaller effect variants (see results for linear gradient in Supplementary Fig. 15). Overall, the similarities in true and false positive rates between methods suggest that our wavelet approach is effective compared to other related tools, while our test also offers the ability to explicitly consider variation in spatial scale.
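The rank-based view of power can be sketched as follows. As an assumption for illustration, each SNP is summarized by its minimum P-value across scales before ranking; the source does not state how scales were combined for ranking, and the function name and toy values are hypothetical.

```python
import numpy as np

def top_selected_rank(pvals, is_selected):
    """Rank (1 = smallest P-value) of the best-ranked selected SNP among all SNPs."""
    # Assumed summary: each SNP is represented by its minimum P-value across scales.
    min_p = np.asarray(pvals, dtype=float).min(axis=1)
    order = np.argsort(min_p, kind="stable")
    ranks = np.empty(len(min_p), dtype=int)
    ranks[order] = np.arange(1, len(min_p) + 1)
    return int(ranks[np.asarray(is_selected, dtype=bool)].min())

# Toy example: four SNPs at two scales; the first and last are causal.
pvals = np.array([[0.04, 0.30],
                  [0.20, 0.001],
                  [0.50, 0.90],
                  [0.10, 0.60]])
is_selected = np.array([True, False, False, True])
best_rank = top_selected_rank(pvals, is_selected)
```

Taking the median of this rank across replicate simulations gives a summary comparable to the median-rank view of power described for Fig. 6b.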
Testing for spatial pattern in QTL
When testing for spatially varying selection on a quantitative trait, one approach is to ask whether QTL identified from association or linkage mapping studies show greater allele frequency differences among populations than expected (Berg and Coop 2014; Price et al. 2018). Here we implement such an approach to compare wavelet transformed allele frequencies for QTL L to a set of randomly selected loci of the same number and distribution.
For this test, we calculate the mean of scale-specific genetic variance for all QTL with MAF at least 0.05 among sampled individuals. We then permute the identity of causal QTL across the genome and recalculate the mean scale-specific genetic variance, and repeat this process 1,000 times to generate a null distribution of mean scale-specific genetic variance of QTL for each scale s.
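A minimal sketch of this permutation test for a single scale is given below. It assumes the scale-specific genetic variances have already been computed and filtered for MAF, and it draws null sets uniformly at random, which ignores any matching of the genomic distribution of QTL; the function name, the add-one P-value convention, and the simulated data are assumptions.

```python
import numpy as np

def qtl_permutation_test(ssv, qtl_idx, n_perm=1000, seed=0):
    """Compare mean scale-specific variance of QTL with random SNP sets.

    ssv     : array (n_loci,), scale-specific genetic variance at one scale
    qtl_idx : indices of the QTL within ssv
    """
    rng = np.random.default_rng(seed)
    ssv = np.asarray(ssv, dtype=float)
    observed = ssv[qtl_idx].mean()
    # Null: mean variance of an equal number of randomly chosen SNPs.
    null = np.array([
        ssv[rng.choice(ssv.size, size=len(qtl_idx), replace=False)].mean()
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    z_score = (observed - null.mean()) / null.std(ddof=1)
    return observed, p_value, z_score

# Simulated example: 2,000 SNPs, the first 13 given elevated variance.
rng = np.random.default_rng(42)
ssv = rng.chisquare(8, size=2000) / 8
qtl_idx = np.arange(13)
ssv[qtl_idx] += 1.0
observed, p_value, z_score = qtl_permutation_test(ssv, qtl_idx)
```

The z-score returned here corresponds to the number of null standard deviations separating the observed QTL mean from the null mean.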
We illustrate this test here briefly using a simulation of adaptation to a square patch of habitat in the middle of a landscape, with the two gene flow parameters , the strength of selection , carrying capacity individuals per square unit area. After 1,000 generations we sampled 300 individuals, from which there were 13 QTL for the trait under selection with MAF at least 0.05. We then calculated the mean scale-specific genetic variance for these QTL across scales s and compared to the null permutations of randomly selected 13 SNPs from the genome.
We found significantly higher mean scale-specific genetic variance for the QTL than the null expectation at all six scales tested. Although the scale-specific genetic variance was greatest at the largest scales for the QTL, these scales did not show as great a distinction when comparing to the null. The greatest mean wavelet variance of QTL relative to null came at the intermediate scales of 3–5, which was approximately 1/3–1/2 the width of the habitat patch (Supplementary Fig. 16).
Application to an empirical system
Genome-wide wavelet dissimilarity
We applied our approach to an empirical dataset of diverse, broadly distributed genotypes with whole genome resequencing data: 908 genotypes from 680 natural populations of the model plant, Arabidopsis thaliana (Brassicaceae). We used a published Arabidopsis dataset (Alonso-Blanco et al. 2016), only including Eurasian populations and excluding highly distinct “relicts” and also likely contaminant accessions (Pisupati et al. 2017). For locations with more than one accession genotyped, we calculated allele frequency. We used a total of 129,536 SNPs filtered for minor allele frequency (MAF) and LD (Zheng et al. 2012).
We first calculated the genome-wide wavelet dissimilarity, , across a series of increasing scales s at even intervals in log distance units from m to approximately half the distance separating the farthest samples, km.
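Scales evenly spaced in log distance, up to half the maximum separation among samples, can be generated as in the sketch below. The minimum scale, maximum pairwise distance, and number of scales used here are placeholder values for illustration, not the values used in the analysis.

```python
import numpy as np

def log_spaced_scales(min_scale_km, max_pairwise_km, n_scales):
    """Scales at even intervals in log distance, from a minimum scale up to
    half the largest distance separating any two samples."""
    return np.geomspace(min_scale_km, max_pairwise_km / 2.0, n_scales)

# Placeholder inputs: 1 km minimum scale, 2,000 km between the farthest samples.
scales = log_spaced_scales(1.0, 2000.0, 4)
```

With these placeholder inputs the four scales are 1, 10, 100, and 1,000 km, so each scale is a constant multiple of the previous one.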
We observed increasing mean genome-wide wavelet dissimilarity at larger scales (Fig. 7), a pattern indicative of isolation by distance, on average, across the landscape. Arabidopsis showed significantly low dissimilarity at scales less than km, likely due to the homogenizing effect of gene flow. However, we found significantly high dissimilarity at scales greater than km. This scale of significantly high dissimilarity may be a relatively short distance, considering that Arabidopsis is largely self-pollinating and lacks clear seed dispersal mechanisms (though seeds of some genotypes form mucus in water that increases buoyancy) (Saez-Aguayo et al. 2014). At scales greater than km, we found an increase in the slope relating scale s and dissimilarity, perhaps signifying a scale at which local adaptation begins to emerge.
Fig. 7.
Genome-wide wavelet dissimilarity, , for Arabidopsis genotypes. a) The global mean dissimilarity across scales compared to the null expectation (gray ribbon) and b) the dissimilarity across scales centered on each sampled genotype, with several regions highlighted (vertical lines indicate scales shown in panels c–f). c–f) Selected scales highlight the changes in dissimilarity across locations, with each circle indicating a genotyped sample/population. Red indicates significantly greater wavelet dissimilarity than expected, blue significantly less than expected. For the map panels, the intensity of color shading indicates the relative variation (for a given scale) in among significant locations.
The locations of scale-specific dissimilarity among Arabidopsis populations revealed several interesting patterns. Even by the km scale, there were three notable regions of significantly high dissimilarity: northern Spain and extreme southern and northern Sweden (Fig. 7). The high dissimilarity at this scale in northern Spain corresponds to the most mountainous regions of Iberia, suggesting that limitations to gene flow across this rugged landscape have led to especially strong isolation among populations at short distances. In northern Sweden, Long et al. (2013) previously found a particularly steep increase in isolation-by-distance. Alonso-Blanco et al. (2016) found that genetic distance was greatest among accessions from Southern Sweden at scales from to 250 compared to regions farther south. At larger, among-region scales, dissimilarity was significantly high across the range, with Iberia and northern Sweden again being most dissimilar at km and surpassed by central Asia at km as being most dissimilar. Iberia and northern Sweden contain many accessions distantly related to other accessions, likely due to isolation during glaciation and subsequent demographic histories (Alonso-Blanco et al. 2016). This scale in Asia separates populations in Siberia from those further south in the Tian Shan and Himalayas, indicating substantial divergence potentially due to limited gene flow across the heterogeneous landscape. By contrast, populations in the UK and the Balkan peninsula had low dissimilarity across a range of scales, possibly due to reduced diversity and a more recent history of spread in these regions.
Identifying putative locally adapted loci
For this analysis, we used the same genotypes as in the prior section but not filtered for LD, leaving 1,642,040 SNPs with MAF (Alonso-Blanco et al. 2016).
The scale-specific genetic variance test identified putative locally adapted loci (Supplementary Fig. 17). The distribution of scale-specific genetic variance among SNPs was reasonably matched to the theoretical $\chi^2$ distribution (Supplementary Fig. 13). Among notable loci, at the km scale, the #2 QTL and #3 SNP is in the coding region of METACASPASE 4 (MC4), a gene that controls biotic and abiotic stress-induced programmed cell death (Hander et al. 2019; Shen et al. 2019). To speculate, if MC4 were involved in coevolution with microbial pathogens, we might expect rapid allele frequency dynamics and thus a pattern of high variation among even nearby populations.
The #1 SNP for the km scale was in the coding sequence of the DOG1 gene (Fig. 8, Supplementary Fig. 17). This SNP, Chr. 5, 18,590,741, was also strongly associated with flowering time (see next section) and germination, and tags known functional polymorphisms at this gene that are likely locally adaptive (Martínez-Berdeja et al. 2020). The spatial pattern of variation at this locus (Fig. 8) is complicated, highlighting the benefit of the flexible wavelet approach. By contrast, imposing a grid on this landscape, or using national political boundaries to calculate $F_{ST}$, could easily miss the signal, as did Horton et al. (2012). The climate-allele frequency associations for DOG1 are also complicated and nonmonotonic (Martínez-Berdeja et al. 2020; Gamba et al. 2023), making it challenging for genotype-environment association approaches (Lasky et al. 2023).
Fig. 8.
Allelic variation (colors) for SNPs that were top outliers for scale-specific genetic variance test at different scales. On maps at left, the scale for which an SNP was an outlier is indicated by a bar above each map. The right panels show the spatial spectra for each SNP, i.e. the scale-specific genetic variance across a range of scales. Dashed lines indicate the scale for which an SNP was an outlier.
At the km scale, the #1 SNP (and also the lowest P-value SNP among all scales, Fig. 8, Supplementary Fig. 17) was on chromosome 5 at 26,247,515 bp, 555 bp upstream from AT5G65660, a hydroxyproline-rich glycoprotein family protein. These are cell wall glycoproteins with important roles in development and growth (Johnson et al. 2017) some of which have a role in abiotic stress response (Tseng et al. 2013).
Testing for local adaptation in QTL
We tested for nonrandom scale-specific genetic variance of QTL for Arabidopsis flowering time, a trait that is likely involved in local adaptation (Ågren et al. 2017). We used previously published data on flowering time: days to flower at 10°C measured on 1,003 genotypes and days to flower at 16°C measured on 970 resequenced genotypes (Alonso-Blanco et al. 2016). We then performed mixed-model genome-wide association studies (GWAS) in GEMMA (v 0.98.3) (Zhou and Stephens 2012) with 2,048,993 SNPs filtered for minor allele frequency (MAF), while controlling for genome-wide similarity among ecotypes.
We found that top flowering time GWAS SNPs showed significantly elevated scale-specific genetic variance at several intermediate spatial scales tested. For flowering time at both 10°C and 16°C, scale-specific genetic variance was significantly elevated for the top 1,000 SNPs at the 282, 619, and km scales, but not always at the largest or smallest scales (Fig. 9). In particular the scale-specific genetic variances were greatest for the km scale where the mean scale-specific genetic variance for 16°C QTL was 15.2 SD above the null mean, and the km scale, where the mean scale-specific genetic variance for 10°C QTL was 13.5 SD above the null mean. For QTL from both temperature experiments, results were nearly equivalent if we instead used the top 100 SNPs.
Fig. 9.
Testing for selection on Arabidopsis flowering time QTL. We compared scale-specific genetic variance, , of QTL with random SNPs, for five different scales s, for flowering time measured at 10°C and 16°C. The first two columns show the observed mean of the top 1,000 flowering time SNPs with a vertical line and a z-score. The histograms show null distributions of scale-specific genetic variance based on permutations of an equal number of markers with an equal distribution as the flowering time QTL. At right the scale-specific genetic variance is shown for random SNPs and for the flowering time QTL (gray lines), across scales, with the mean indicated by a black line.
Discussion
Geneticists have long developed theory for spatial patterns in allele frequency (Wright 1943; Haldane 1948; Malécot 1948). Empiricists have sought to use these patterns to make inferences about underlying processes of demography, gene flow, and selection (Lewontin and Krakauer 1973; Rousset 2000; McRae et al. 2008). While statistical approaches have been developed to characterize geographic patterns, few are flexible enough to incorporate patterns at a range of scales that are also localized in space. Because wavelet transforms have these properties, we think they may be useful tools for geneticists. Here we demonstrated several applications of wavelet transforms to capture patterns in whole genome variation and at particular loci, under a range of neutral and nonneutral scenarios.
Some important existing approaches are based on discretization of spatially distributed samples into spatial bins, i.e. putative populations (Weir and Cockerham 1984; Petkova et al. 2016; Bishop et al. 2023). However, without prior knowledge of selective gradients, patterns of gene flow, or relevant barriers, it is often unclear how to delineate these populations. For example, we can see how the specific discretization can hinder our ability to find locally adapted loci in our simulations (Fig. 5) and in empirical studies of Arabidopsis in the case of the phenology gene DOG1 that was missed in previous scans (Horton et al. 2012; Alonso-Blanco et al. 2016).
Our goal in this paper was to provide a new perspective on spatial population genetics using the population-agnostic, and spatially smooth approach of wavelet transforms. We showed how these transforms characterize scale-specific and localized population structure across landscapes (Figs. 2, 3, 7). We also showed how wavelet transforms can capture scale-specific evidence of selection on individual genetic loci (Figs. 4, 5, 6, 8) and on groups of QTL (Fig. 9). Our simulations and empirical examples showed substantial heterogeneity in the scale and stationarity of spatial patterns. For example, the wavelet genetic dissimilarity allowed us to identify regions near a front of range expansion with steeper isolation by distance at particular scales due to drift (Fig. 3). Additionally, we identified loci underlying local adaptation and showed an example where the evidence for this adaptation was specific to intermediate spatial scales (Fig. 5). While existing approaches to characterizing population structure or local adaptation have some ability to characterize scale specific patterns, e.g. those based on ordinations of geography (Wagner et al. 2017) or SNPs (Josephs et al. 2019), and some can capture localized patterns (e.g. Petkova et al. 2016), there are few examples of approaches that merge both abilities (Wagner et al. 2017).
Like many methods in population genetics that rely on inference from observational data, we view our approaches as exploratory and hypothesis generating. Heterogeneous patterns of genome-wide wavelet dissimilarity suggest demographic hypotheses, some of which can be tested with detailed ecological and genetic study (e.g. Keeley et al. 2017). For genome-scans for loci involved in local adaptation, the P-values resulting from multiple tested scales are comparable and so we recommend starting with the loci having the lowest P-value, and using these to develop hypotheses for functional follow up experiments (Lasky et al. 2023).
The test for spatial pattern in individual loci we developed owes greatly to previous work from Lewontin and Krakauer (1973), who initially developed $\chi^2$ tests applied to the distribution of $F_{ST}$ values, and from Whitlock and Lotterhos (2015)’s approach of inferring the degrees of freedom of the $\chi^2$ distribution using maximum likelihood and $F_{ST}$ across loci. The $\chi^2$ distribution underlies a number of related tests applied across loci (François et al. 2016). However, we note that this test may be slightly conservative in some situations (Fig. 6). Nevertheless, we believe there were important signs in our work that this $\chi^2$-based scale-specific genetic variance test was valuable. In particular, we found in our simulation of adaptation to a habitat patch that the scale-specific genetic variance was greatest at large spatial scales but at neutral sites, which obscured spatial pattern at the causal locus (Fig. 5). When applying the $\chi^2$ test, we were able to clearly map the causal locus while spurious loci with high scale-specific genetic variance fell away because spatial patterns at those loci still fit within the null distribution.
Relatedly, we found in other simulations and our empirical examples that the strongest evidence for local adaptation was often not at the largest spatial scales (Fig. 9), even when the selective gradient was linear across the landscape (i.e. the largest scale, Fig. 4). This enhanced power at scales sometimes smaller than the true selective gradients may be due to the limited power to resolve true adaptive clines at large scales from the genome-wide signal of isolation by distance at these scales. At intermediate scales, there may be a better balance of sufficient environmental variation to generate spatial pattern versus higher relatedness between locations due to gene flow.
We note that there remain several limitations to our approach proposed here. First, the ability of wavelet transforms to capture patterns depends on the correspondence between the wavelet form (shape) and the form of the empirical patterns we seek to enhance, and there may be better functional forms to filter spatial patterns in allele frequency. Generally speaking, a more compact smoothing kernel with minimum weight in the tails will be better at revealing abrupt spatial transitions, but at the necessary cost of less precise determination of scale (Heisenberg 1927). Smoothing kernels such as the tricube ($k(x) \propto (1 - |x|^3)^3$ for $|x| \le 1$) have been shown to optimize certain tradeoffs in this space and could be used to construct a difference-of-kernels wavelet. However, the overall influence of kernel shape tends to be much less than the influence of kernel bandwidth in our experience. Second, we have not yet implemented localized tests for selection (i.e. specific to certain locations) as we did with genome-wide dissimilarity. A challenge applying this test at individual loci is that there is a very large number of resulting tests from combinations of loci, locations, and scales. Therefore, we have not fully exploited the localized information we derive from the wavelet transforms.
There are a number of interesting future directions for research on wavelet characterization of spatial pattern in evolutionary biology. First, we could apply the wavelet transforms to genetic variation in quantitative traits measured in common gardens, to develop tests for selection on traits akin to the $Q_{ST}$-$F_{ST}$ test (Whitlock and Guillaume 2009; Josephs et al. 2019). Second, we could follow the example of Al-Asadi et al. (2019) and apply our measures of genetic dissimilarity to haplotypes of different size to estimate relative variation in the age of population structure. Third, we should test the performance of our tools under a wider range of demographic and selective scenarios to get a more nuanced picture of their strengths and weaknesses. Fourth, null models for wavelet dissimilarity could be constructed using knowledge of gene flow processes (instead of random permutation) to identify locations and scales with specific deviations from null patterns of gene flow.
In conclusion, population genetics (like most fields) has a long history of arbitrary discretization for the purposes of mathematical, computational, and conceptual convenience. However, the real world often exists without clear boundaries between populations and where processes act simultaneously at multiple scales. We believe that wavelet transforms are one of a range of tools that can move population genetics into a richer but still useful characterization of the natural world.
Supplementary Material
Acknowledgments
We thank Emily Josephs, Benjamin Peter, and two anonymous reviewers for helpful comments. We thank Joanna Rifkin and JGI for help finding bugs in code.
Contributor Information
Jesse R Lasky, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.
Margarita Takou, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.
Diana Gamba, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.
Timothy H Keitt, Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA.
Data availability
Code used to generate the simulations and analyses shown here is freely available at https://github.com/jesserlasky/WaveletSpatialGenetic/.
Supplementary material available at GENETICS online.
Funding
This work was supported by National Institutes of Health award R35GM138300 to J.R.L.
Literature cited
- Ågren J, Oakley CG, Lundemo S, Schemske DW. 2017. Adaptive divergence in flowering time among natural populations of Arabidopsis thaliana: estimates of selection and QTL mapping. Evolution. 71(3):550–564. doi: 10.1111/evo.13126
- Al-Asadi H, Petkova D, Stephens M, Novembre J. 2019. Estimating recent migration and population-size surfaces. PLoS Genet. 15(1):e1007908. doi: 10.1371/journal.pgen.1007908
- Alonso-Blanco C, Andrade J, Becker C, Bemm F, Bergelson J, Borgwardt K, Cao J, Chae E, Dezwaan T, Ding W, et al. 2016. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell. 166(2):481–491. doi: 10.1016/j.cell.2016.05.063
- Ayton P, Fischer I. 2004. The hot hand fallacy and the gambler’s fallacy: two faces of subjective randomness? Mem Cogn. 32(8):1369–1378. doi: 10.3758/BF03206327
- Barton NH, Depaulis F, Etheridge AM. 2002. Neutral evolution in spatially continuous populations. Theor Popul Biol. 61(1):31–48. doi: 10.1006/tpbi.2001.1557
- Battey CJ, Ralph PL, Kern AD. 2020a. Predicting geographic location from genetic variation with deep neural networks. eLife. 9:e54507. doi: 10.7554/eLife.54507
- Battey CJ, Ralph PL, Kern AD. 2020b. Space is the place: effects of continuous spatial structure on analysis of population genetic data. Genetics. 215(1):193–214. doi: 10.1534/genetics.120.303143
- Berg JJ, Coop G. 2014. A population genetic signal of polygenic adaptation. PLoS Genet. 10(8):e1004412. doi: 10.1371/journal.pgen.1004412
- Bhatia G, Patterson N, Sankararaman S, Price AL. 2013. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23(9):1514–1521. doi: 10.1101/gr.154831.113
- Bishop AP, Chambers EA, Wang IJ. 2023. Generating continuous maps of genetic diversity using moving windows. Methods Ecol Evol. 14(5):1175–1181. doi: 10.1111/2041-210X.14090
- Blakemore S-J, Sarfati Y, Bazin N, Decety J. 2003. The detection of intentional contingencies in simple animations in patients with delusions of persecution. Psychol Med. 33(8):1433–1441. doi: 10.1017/S0033291703008341
- Bradburd GS, Coop GM, Ralph PL. 2018. Inferring continuous and discrete population genetic structure across space. Genetics. 210(1):33–52. doi: 10.1534/genetics.118.301333
- Bradburd GS, Ralph PL. 2019. Spatial population genetics: it’s about time. Annu Rev Ecol Evol Syst. 50(1):427–449. doi: 10.1146/annurev-ecolsys-110316-022659
- Castric V, Bernatchez L. 2003. The rise and fall of isolation by distance in the anadromous brook charr (Salvelinus fontinalis Mitchill). Genetics. 163(3):983–996. doi: 10.1093/genetics/163.3.983
- Cavalli-Sforza LL. 1966. Population structure and human evolution. Proc R Soc Lond B Biol Sci. 164(995):362–379. doi: 10.1098/rspb.1966.0038
- Daubechies I. 1992. Ten Lectures on Wavelets. Philadelphia: SIAM.
- Duforet-Frebourg N, Luu K, Laval G, Bazin E, Blum MG. 2016. Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 genomes data. Mol Biol Evol. 33(4):1082–1093. doi: 10.1093/molbev/msv334
- Excoffier L, Ray N. 2008. Surfing during population expansions promotes genetic revolutions and structuration. Trends Ecol Evol. 23(7):347–351. doi: 10.1016/j.tree.2008.04.004
- Fitzpatrick MC, Keller SR. 2015. Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation. Ecol Lett. 18(1):1–16. doi: 10.1111/ele.2014.18.issue-1
- François O, Martins H, Caye K, Schoville SD. 2016. Controlling false discoveries in genome scans for selection. Mol Ecol. 25(2):454–469. doi: 10.1111/mec.13513
- Frantz AC, Cellina S, Krier A, Schley L, Burke T. 2009. Using spatial Bayesian methods to determine the genetic structure of a continuously distributed population: clusters or isolation by distance? J Appl Ecol. 46(2):493–505. doi: 10.1111/j.1365-2664.2008.01606.x
- Fyfe S, Williams C, Mason OJ, Pickup GJ. 2008. Apophenia, theory of mind and schizotypy: perceiving meaning and intentionality in randomness. Cortex. 44(10):1316–1325. doi: 10.1016/j.cortex.2007.07.009
- Gamba D, Lorts C, Haile A, Sahay S, Lopez L, Xia T, Kulesza E, Elango D, Kerby J, Yifru M, et al. 2023. The genomics and physiology of abiotic stressors associated with global elevation gradients in Arabidopsis thaliana. bioRxiv. doi: 10.1101/2022.03.22.485410, preprint: not peer reviewed.
- Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics. 201(4):1555–1579. doi: 10.1534/genetics.115.181453
- Goudet J. 2005. HIERFSTAT, a package for r to compute and test hierarchical F-statistics. Mol Ecol Notes. 5(1):184–186. doi: 10.1111/j.1471-8286.2004.00828.x
- Groh JS, Coop G. 2024. The temporal and genomic scale of selection following hybridization. Proc Natl Acad Sci USA. 121(12):e2309168121. doi: 10.1073/pnas.2309168121
- Haldane JBS. 1948. The theory of a cline. J Genet. 48(3):277–284. doi: 10.1007/BF02986626
- Haller BC, Messer PW. 2019. SLiM 3: forward genetic simulations beyond the Wright-Fisher model. Mol Biol Evol. 36(3):632–637. doi: 10.1093/molbev/msy228
- Hancock ZB, Toczydlowski RH, Bradburd GS. 2023. A spatial approach to jointly estimate Wright’s neighborhood size and long-term effective population size. bioRxiv. doi: 10.1101/2023.03.10.532094, preprint: not peer reviewed.
- Hander T, Willems P, Schatowitz H, Rombaut D, Staes A, Nolf J, Pottie R, Yao P, Gonçalves A, Pavie B, et al. 2019. Damage on plants activates Ca2+-dependent metacaspases for release of immunomodulatory peptides. Science. 363(6433):eaar7486. doi: 10.1126/science.aar7486
- Hardy OJ, Vekemans X. 2002. SPAGEDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes. 2(4):618–620. doi: 10.1046/j.1471-8286.2002.00305.x
- Heisenberg W. 1927. Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. Zeitschrift Physik. 43(3-4):172–198. doi: 10.1007/BF01397280
- Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjalmsson BJ, et al. 2012. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet. 44(2):212–216. doi: 10.1038/ng.1042
- Johnson KL, Cassin AM, Lonsdale A, Wong GK-S, Soltis DE, Miles NW, Melkonian M, Melkonian B, Deyholos MK, Leebens-Mack J, et al. 2017. Insights into the evolution of hydroxyproline-rich glycoproteins from 1000 plant transcriptomes. Plant Physiol. 174(2):904–921. doi: 10.1104/pp.17.00295
- Josephs EB, Berg JJ, Ross-Ibarra J, Coop G. 2019. Detecting adaptive differentiation in structured populations with genomic data and common gardens. Genetics. 211(3):989–1004. doi: 10.1534/genetics.118.301786
- Kawecki TJ, Ebert D. 2004. Conceptual issues in local adaptation. Ecol Lett. 7(12):1225–1241. doi: 10.1111/ele.2004.7.issue-12
- Keeley ATH, Beier P, Keeley BW, Fagan ME. 2017. Habitat suitability is a poor proxy for landscape connectivity during dispersal and mating movements. Landsc Urban Plan. 161:90–102. doi: 10.1016/j.landurbplan.2017.01.007
- Keitt TH. 2007. On the quantification of local variation in biodiversity scaling using wavelets. In: Scaling Biodiversity. Cambridge: Cambridge University Press. p. 168–180.
- Kimura M, Crow JF. 1963. The measurement of effective population number. Evolution. 17(3):279–288. doi: 10.1111/j.1558-5646.1963.tb03281.x
- Kimura M, Weiss GH. 1964. The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics. 49(4):561–576. doi: 10.1093/genetics/49.4.561
- Kirkpatrick M, Barton NH. 1997. Evolution of a species’ range. Am Nat. 150(1):1–23. doi: 10.1086/286054
- Lasky JR, Des Marais DL, McKay JK, Richards JH, Juenger TE, Keitt TH. 2012. Characterizing genomic variation of Arabidopsis thaliana: the roles of geography and climate. Mol Ecol. 21(22):5512–5529. doi: 10.1111/mec.2012.21.issue-22
- Lasky JR, Josephs EB, Morris GP. 2023. Genotype-environment associations to reveal the molecular basis of environmental adaptation. Plant Cell. 35(1):125–138. doi: 10.1093/plcell/koac267
- Lasky JR, Keitt TH. 2013. Reserve size and fragmentation alter community assembly, diversity, and dynamics. Am Nat. 182(5):E142–E160. doi: 10.1086/673205
- Le Corre V, Machon N, Petit RJ, Kremer A. 1997. Colonization with long-distance seed dispersal and genetic structure of maternally inherited genes in forest trees: a simulation study. Genet Res (Camb). 69(2):117–125. doi: 10.1017/S0016672397002668
- Lewontin RC, Krakauer J. 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 74(1):175–195. doi: 10.1093/genetics/74.1.175
- Loiselle BA, Sork VL, Nason J, Graham C. 1995. Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). Am J Bot. 82(11):1420–1425. doi: 10.1002/ajb2.1995.82.issue-11
- Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A, Zhang Q, Vilhjálmsson BJ, Korte A, Nizhynska V, et al. 2013. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nat Genet. 45(8):884–890. doi: 10.1038/ng.2678
- Machado HE, Bergland AO, Taylor R, Tilk S, Behrman E, Dyer K, Fabian DK, Flatt T, González J, Karasov TL, et al. 2021. Broad geographic sampling reveals the shared basis and environmental correlates of seasonal adaptation in Drosophila. eLife. 10:e67577. doi: 10.7554/eLife.67577
- Malécot G. 1948. Les Mathématiques de l’hérédité. Paris: Masson & Cie.
- Manel S, Schwartz MK, Luikart G, Taberlet P. 2003. Landscape genetics: combining landscape ecology and population genetics. Trends Ecol Evol. 18(4):189–197. doi: 10.1016/S0169-5347(03)00008-9
- Martínez-Berdeja A, Stitzer MC, Taylor MA, Okada M, Ezcurra E, Runcie DE, Schmitt J. 2020. Functional variants of DOG1 control seed chilling responses and variation in seasonal life-history strategies in Arabidopsis thaliana. Proc Natl Acad Sci USA. 117(5):2526–2534. doi: 10.1073/pnas.1912451117
- McRae BH, Dickson BG, Keitt TH, Shah VB. 2008. Using circuit theory to model connectivity in ecology, evolution, and conservation. Ecology. 89(10):2712–2724. doi: 10.1890/07-1861.1
- McVean G. 2009. A genealogical interpretation of principal components analysis. PLoS Genet. 5(10):e1000686. doi: 10.1371/journal.pgen.1000686
- Muraki S. 1995. Multiscale volume representation by a DoG wavelet. IEEE Trans Vis Comput Graph. 1(2):109–116. doi: 10.1109/2945.468408
- Nagylaki T. 1978. A diffusion model for geographically structured populations. J Math Biol. 6(4):375–382. doi: 10.1007/BF02463002
- Peter BM, Petkova D, Novembre J. 2020. Genetic landscapes reveal how human genetic diversity aligns with geography. Mol Biol Evol. 37(4):943–951. doi: 10.1093/molbev/msz280
- Peterman WE. 2018. ResistanceGA: an R package for the optimization of resistance surfaces using genetic algorithms. Methods Ecol Evol. 9(6):1638–1647. doi: 10.1111/2041-210X.12984
- Petkova D, Novembre J, Stephens M. 2016. Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet. 48(1):94–100. doi: 10.1038/ng.3464
- Pisupati R, Reichardt I, Seren Ü, Korte P, Nizhynska V, Kerdaffrec E, Uzunova K, Rabanal FA, Filiault DL, Nordborg M. 2017. Verification of Arabidopsis stock collections using SNPmatch, a tool for genotyping high-plexed samples. Sci Data. 4(1):170184. doi: 10.1038/sdata.2017.184
- Price N, Moyers BT, Lopez L, Lasky JR, Monroe JG, Mullen JL, Oakley CG, Lin J, Ågren J, Schrider DR, et al. 2018. Combining population genomics and fitness QTLs to identify the genetics of local adaptation in Arabidopsis thaliana. Proc Natl Acad Sci USA. 115(19):5028–5033. doi: 10.1073/pnas.1719998115
- Pugach I, Matveyev R, Wollstein A, Kayser M, Stoneking M. 2011. Dating the age of admixture via wavelet transform analysis of genome-wide data. Genome Biol. 12(2):R19. doi: 10.1186/gb-2011-12-2-r19
- Ritland K. 1996. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet Res (Camb). 67(2):175–185. doi: 10.1017/S0016672300033620
- Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. 2005. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 1(6):e70. doi: 10.1371/journal.pgen.0010070
- Rousset F. 1997. Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics. 145(4):1219–1228. doi: 10.1093/genetics/145.4.1219
- Rousset F. 2000. Genetic differentiation between individuals. J Evol Biol. 13(1):58–62. doi: 10.1046/j.1420-9101.2000.00137.x
- Saez-Aguayo S, Rondeau-Mouro C, Macquet A, Kronholm I, Ralet M-C, Berger A, Sallé C, Poulain D, Granier F, Botran L, et al. 2014. Local evolution of seed flotation in Arabidopsis. PLoS Genet. 10(3):e1004221. doi: 10.1371/journal.pgen.1004221
- Serre D, Pääbo S. 2004. Evidence for gradients of human genetic diversity within and among continents. Genome Res. 14(9):1679–1685. doi: 10.1101/gr.2529604
- Shen W, Liu J, Li J-F. 2019. Type-II metacaspases mediate the processing of plant elicitor peptides in Arabidopsis. Mol Plant. 12(11):1524–1533. doi: 10.1016/j.molp.2019.08.003
- Shirk AJ, Cushman SA. 2014. Spatially-explicit estimation of Wright’s neighborhood size in continuous populations. Front Ecol Evol. 2:62. doi: 10.3389/fevo.2014.00062
- Slatkin M. 1991. Inbreeding coefficients and coalescence times. Genet Res. 58(2):167–175. doi: 10.1017/S0016672300029827
- Slatkin M. 1993. Isolation by distance in equilibrium and non-equilibrium populations. Evolution. 47(1):264–279. doi: 10.2307/2410134
- Tseng I-C, Hong C-Y, Yu S-M, Ho T-HD. 2013. Abscisic acid- and stress-induced highly proline-rich glycoproteins regulate root growth in rice. Plant Physiol. 163(1):118–134. doi: 10.1104/pp.113.217547
- Vekemans X, Hardy OJ. 2004. New insights from fine-scale spatial genetic structure analyses in plant populations. Mol Ecol. 13(4):921–935. doi: 10.1046/j.1365-294X.2004.02076.x
- Wagner HH, Chávez-Pesqueira M, Forester BR. 2017. Spatial detection of outlier loci with Moran eigenvector maps. Mol Ecol Resour. 17(6):1122–1135. doi: 10.1111/1755-0998.12653
- Wang IJ, Savage WK, Bradley Shaffer H. 2009. Landscape genetics and least-cost path analysis reveal unexpected dispersal routes in the California tiger salamander (Ambystoma californiense). Mol Ecol. 18(7):1365–1374. doi: 10.1111/j.1365-294X.2009.04122.x
- Wang J, Hu Z, Upadhyaya HD, Morris GP. 2020. Genomic signatures of seed mass adaptation to global precipitation gradients in sorghum. Heredity. 124(1):108–121. doi: 10.1038/s41437-019-0249-4
- Weir BS, Cockerham CC. 1984. Estimating F-statistics for the analysis of population structure. Evolution. 38(6):1358–1370. doi: 10.1111/j.1558-5646.1984.tb05657.x
- Whitlock MC, Guillaume F. 2009. Testing for spatially divergent selection: comparing QST to FST. Genetics. 183(3):1055–1063. doi: 10.1534/genetics.108.099812
- Whitlock MC, Lotterhos KE. 2015. Reliable detection of loci responsible for local adaptation: inference of a null model through trimming the distribution of FST. Am Nat. 186(S1):S24–S36. doi: 10.1086/682949
- Wright S. 1931. Evolution in mendelian populations. Genetics. 16(2):97–159. doi: 10.1093/genetics/16.2.97
- Wright S. 1943. Isolation by distance. Genetics. 28(2):114–138. doi: 10.1093/genetics/28.2.114
- Wright S. 1946. Isolation by distance under diverse systems of mating. Genetics. 31(1):39–59. doi: 10.1093/genetics/31.1.39
- Wright S. 1949. The genetical structure of populations. Ann Eugen. 15(1):323–354. doi: 10.1111/ahg.1949.15.issue-1
- Yang W-Y, Novembre J, Eskin E, Halperin E. 2012. A model-based approach for analysis of spatial structure in genetic data. Nat Genet. 44(6):725–731. doi: 10.1038/ng.2285
- Yeaman S, Hodgins KA, Lotterhos KE, Suren H, Nadeau S, Degner JC, Nurkowski KA, Smets P, Wang T, Gray LK. 2016. Convergent local adaptation to climate in distantly related conifers. Science. 353(6306):1431–1433. doi: 10.1126/science.aaf7812
- Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. 2012. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 28(24):3326–3328. doi: 10.1093/bioinformatics/bts606
- Zhou X, Stephens M. 2012. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7):821–824. doi: 10.1038/ng.2310