Abstract
The repeated evolution of similar phenotypes in independent populations (i.e. parallel or convergent evolution) provides an opportunity to identify genetic and ecological factors that influence the process of adaptation. Threespine stickleback fish (Gasterosteus aculeatus) are an excellent model for such studies, as they have repeatedly adapted to divergent habitats across the Northern hemisphere. Here, we use genomic, ecological and morphological data from 16 independent pairs of stickleback populations adapted to divergent lake and stream habitats. We combine a population genomic approach to identify regions of the genome that are likely under selection in these divergent habitats with an association mapping approach to identify regions of the genome that underlie variation in ecological factors and morphological traits. Over 37% of highly differentiated genomic windows are repeatedly differentiated across lake–stream pairs. Similarly, many genomic windows are associated with variation in abiotic factors, diet items and morphological phenotypes. Both the highly differentiated windows and candidate trait windows are non-randomly distributed across the genome and show some overlap. However, the overlap is not significant on a genome-wide scale. Together, our data suggest that adaptation to divergent food resources and predation regimes are drivers of differentiation in lake–stream stickleback, but that additional ecological factors are also important.
This article is part of the theme issue ‘Convergent evolution in the genomics era: new insights and directions’.
Keywords: adaptation, parallel evolution, convergent evolution, population genomics, association mapping
1. Introduction
When organisms recurrently adapt to new environments, what are the genetic and ecological factors that influence the repeatability of their evolutionary trajectories? The answers to this question should reveal the form and constancy of natural selection, as well as the constraining roles of genetic variation and gene flow. Independent, replicate pairs of populations adapting to similar ecological conditions provide a useful opportunity to address this question. Recent genetic studies of replicated phenotypic evolution have provided tantalizing clues that evolutionary trajectories may be more repeatable than previously thought [1–3]. In particular, many studies have identified regions of the genome that are repeatedly differentiated between independent population pairs adapting to divergent habitats (e.g. [4–7]). Although repeatable phenotypic divergence is generally taken as strong evidence of the role of natural selection [8–9], it is not always clear that patterns of repeated genomic divergence solely result from natural selection [10]. Furthermore, population genomic studies are usually agnostic to specific phenotypes, and most studies have not associated specific ecological factors or morphological traits with the regions of the genome that evolve consistently across replicates.
The threespine stickleback (Gasterosteus aculeatus) is a good system in which to quantify the repeatability of genomic differentiation and to identify the ecological conditions and phenotypic traits that are associated with regions of repeated genomic differentiation. These small fish have frequently colonized diverse freshwater habitats in the Northern Hemisphere since the retreat of the glaciers 12 000 years ago [11]. Strikingly, sticklebacks living in similar habitats often evolve similar phenotypes, suggesting that phenotypic shifts are adaptive. The availability of a number of genetic resources including an assembled and annotated genome [4] facilitates the identification of the genetic basis of putatively adaptive phenotypes [12]. Thus, this system provides an opportunity to ask whether the same genomic regions underlie evolutionary change in similar habitats and to further ask whether these seemingly adaptive regions are associated with particular biotic and abiotic environmental factors, or with specific organismal phenotypes.
One widespread example of repeated phenotypic divergence is found among pairs of stickleback populations inhabiting lakes versus streams. Lake ecotypes are adapted to feeding on zooplankton, while stream ecotypes are adapted to feeding on macro-invertebrates [13–14]. Lake–stream pairs can show considerable genetic and morphological divergence [15–18], and at least some phenotypic differences between the ecotypes are heritable [19–23]. These ‘lake–stream’ ecotype pairs have been extensively studied in Canada and Europe and show repeated phenotypic evolution [24]; i.e. lake fish from Europe and Canada resemble each other more in multiple key traits than their more closely related stream fish [25–26]. Previous work on this system has shown that parallel evolution (here termed ‘repeatability’) is imperfect, but that deviations from parallelism can be partially explained. Specifically, the degree of phenotypic parallelism is positively correlated with the degree of environmental parallelism [24]. This correlation suggests that evolutionary repeatability is indeed adaptive to some extent in this system, but that deviations from repeatable lake–stream divergence can also be attributed to adaptation to differences in ecology among lakes or among streams. This incomplete parallelism means that habitat categories (lake versus stream), environmental variables and fish morphological traits are decoupled enough to allow meaningful genome-wide association studies.
In this study, we combine genomic, ecological and morphological data from 16 population pairs of lake–stream stickleback sampled from independent watersheds on Vancouver Island, Canada, to ask three sets of questions: (1) For a given genomic region, what proportion of the 16 lake–stream pairs show similarly high genetic differentiation, and what proportion of the genome overall exhibits shared genetic differentiation?; (2) Which genomic locations are associated with variation in environmental factors (biotic and abiotic) and variation in morphological traits across lake and stream populations?; (3) Do the genomic regions underlying these traits co-localize to genomic locations that are repeatedly differentiated? If so, is there enrichment of particular categories of traits within the repeatedly differentiated regions?
2. Material and methods
(a). Quantification of repeated genomic differentiation
Illumina sequence data for pairs of lake and stream stickleback from 16 independent watersheds on Vancouver Island, Canada (32 total populations) was previously generated using the double-digest restriction-site-associated DNA-sequencing (double-digest RAD) method [27], with 24 individuals sequenced from each population. Single nucleotide polymorphisms (SNPs) were identified using a standard, reference-based bioinformatics pipeline (see [24] for full details of these data); alignment of reads was done to the Jones et al. [4] genome assembly. For each individual, a site was only included if the read coverage was between 8 and 100. SNPs mapping to the mitochondrial DNA or unassembled regions of the genome were excluded from further analysis. Weir–Cockerham Fixation index (FST) [28] was used to estimate genetic differentiation between each pair of lake–stream populations, then averaged over 50 kilobase pair (kbp) windows (electronic supplementary material, figure S1). These windows were constrained to have the same size and genomic locations for all lake–stream comparisons. Window-averaged FST values were calculated by dividing the sum of the numerators of all SNP-wise FST estimates within a given window by the sum of their denominators. For downstream analysis, we required that each window contained at least three variable sites.
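The ratio-of-sums window average described above can be illustrated with a minimal sketch. The input layout (tuples of chromosome, 1-based position, per-SNP FST numerator and denominator) and the function name `window_fst` are assumptions made for illustration; they are not taken from the original pipeline.

```python
from collections import defaultdict

WINDOW_SIZE = 50_000  # 50 kbp windows, as in the text
MIN_SITES = 3         # minimum number of variable sites per window, as in the text


def window_fst(snps, window_size=WINDOW_SIZE, min_sites=MIN_SITES):
    """Return {(chrom, window_index): FST} computed as a ratio of sums.

    `snps` is an iterable of (chrom, position, numerator, denominator)
    tuples holding the per-SNP components of the FST estimator
    (hypothetical input layout, 1-based positions assumed).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # numerator, denominator, SNP count
    for chrom, pos, num, den in snps:
        key = (chrom, (pos - 1) // window_size)
        sums[key][0] += num
        sums[key][1] += den
        sums[key][2] += 1
    # Keep only windows with enough variable sites and a usable denominator.
    return {
        key: num / den
        for key, (num, den, count) in sums.items()
        if count >= min_sites and den > 0
    }


example = [
    ("chrI", 1_000, 0.02, 0.10),
    ("chrI", 20_000, 0.00, 0.30),
    ("chrI", 49_999, 0.04, 0.20),
    ("chrI", 60_000, 0.05, 0.10),  # second window has a single SNP and is dropped
]
result = window_fst(example)
```

Summing numerators and denominators before dividing weights each SNP by its denominator, which is not the same as taking the mean of the SNP-wise FST values.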
Genomic windows were classified as ‘outliers’ or ‘non-outliers’ based on their mean FST. We classified outlier windows as those with mean FST values falling within the top 5% of the genome-wide FST distribution within a given lake–stream comparison. Outlier classification was performed using custom R scripts. Read coverage did not differ significantly between windows classified as outliers and those classified as non-outliers for any of the 16 population pairs (data not shown).
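As an illustration of the top-5% rule, the following sketch ranks windows by mean FST within one lake–stream comparison and flags the highest-ranked fraction. It is not the custom R code used in the study; the function name and the handling of ties (arbitrary, by sort order) are assumptions.

```python
def classify_outliers(window_fst, top_fraction=0.05):
    """Return the set of window keys in the top `top_fraction` of FST values.

    `window_fst` maps a window key to its mean FST for one comparison.
    Ties at the cut-off are broken arbitrarily by sort order (assumption).
    """
    ranked = sorted(window_fst, key=window_fst.get, reverse=True)
    n_outliers = int(len(ranked) * top_fraction)
    return set(ranked[:n_outliers])


# Toy comparison with 100 windows whose FST increases with the window index.
toy_fst = {i: i / 100 for i in range(100)}
outliers = classify_outliers(toy_fst)
```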
To identify the genomic regions (windows) that had repeatedly differentiated between independently derived lake–stream pairs, the outlier windows in each single lake–stream comparison were compared across all 16 population pairs. Repeatability was estimated window-by-window as the proportion of population pairs that had a given outlier, using the following equation:
$$\text{repeatability} = \frac{k}{n},$$
where k is the number of population pairs with an outlier for a given window and n is the number of population pairs with data for that window.
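A minimal sketch of this per-window calculation is given below, with missing data encoded as `None` so that n counts only the population pairs with data. The encoding and the function name are assumptions for illustration.

```python
def repeatability(outlier_calls):
    """Proportion k / n of population pairs in which a window is an outlier.

    `outlier_calls` holds one entry per population pair: True (outlier),
    False (non-outlier) or None (no data for this window).
    Returns None when no pair has data.
    """
    observed = [call for call in outlier_calls if call is not None]
    if not observed:
        return None
    k = sum(observed)   # pairs with an outlier
    n = len(observed)   # pairs with data
    return k / n


# A window that is an outlier in 10 of the 15 pairs with data (one pair missing).
calls = [True] * 10 + [False] * 5 + [None]
value = repeatability(calls)
```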
To test whether the level of repeatability was greater than that expected by chance, we ran a permutation test with 10 000 iterations. For each iteration, the outlier status of a given window was randomly shuffled among the 16 population pairs and the magnitude of repeatability was re-estimated. Missing data were held in place during resampling so that the total number of windows with data for a given population pair remained the same for all iterations. These 10 000 iterations yielded a null distribution of repeatability for each window, and empirical estimates were compared against these nulls to determine statistical significance. The resulting p-values were corrected for multiple testing using the p.adjust function with the BH (alias fdr) method in R [29].
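The sketch below illustrates one possible reading of this procedure; it is not the authors' R code. It assumes that outlier labels are shuffled among the windows with data within each population pair (so missing entries stay in place), that the p-value is the proportion of null values at least as large as the observed one, and that the Benjamini–Hochberg step mirrors the intent of `p.adjust(method = "BH")`. All function names are hypothetical.

```python
import random


def permutation_pvalues(calls, n_iter=10_000, seed=1):
    """calls[pair][window] is True, False or None (missing).

    Returns one empirical p-value per window for its observed repeatability.
    """
    rng = random.Random(seed)
    n_windows = len(calls[0])

    def repeat(matrix):
        values = []
        for w in range(n_windows):
            col = [row[w] for row in matrix if row[w] is not None]
            values.append(sum(col) / len(col) if col else 0.0)
        return values

    observed = repeat(calls)
    exceed = [0] * n_windows
    for _ in range(n_iter):
        shuffled = []
        for row in calls:
            # Shuffle labels only among positions with data; None stays put.
            present = [i for i, call in enumerate(row) if call is not None]
            labels = [row[i] for i in present]
            rng.shuffle(labels)
            new_row = list(row)
            for i, label in zip(present, labels):
                new_row[i] = label
            shuffled.append(new_row)
        null = repeat(shuffled)
        for w in range(n_windows):
            if null[w] >= observed[w]:
                exceed[w] += 1
    return [(e + 1) / (n_iter + 1) for e in exceed]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: 4 population pairs, 6 windows; window 0 is an outlier in every pair.
calls = [
    [True, False, False, False, False, None],
    [True, False, False, False, False, False],
    [True, False, False, False, None, False],
    [True, False, False, False, False, False],
]
pvals = permutation_pvalues(calls, n_iter=1000)
adjusted = benjamini_hochberg(pvals)
```

Adding one to both the numerator and the denominator of the empirical p-value is a common convention that avoids p-values of exactly zero; whether the original analysis used it is not stated.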
(b). Association mapping with Bayenv
The same SNP data outlined above were used to identify genomic loci associated with variation in abiotic factors (n = 5), diet items (n = 45) or morphological phenotypes (n = 34) across the 32 freshwater populations from 16 watersheds (see electronic supplementary material, table S1 for a full list of traits). Data for individual traits in each of these three categories were all previously reported, with full details on the method of measurement provided in Stuart et al. [24].
Bayenv v2.0 [30] was used to detect statistical associations between individual SNPs and each trait. SNPs with high linkage disequilibrium (r² above 0.2) were identified using SNPRelate [31] and removed from the dataset, which left 11 440 unlinked SNPs. Only these unlinked SNPs were used to estimate the covariance matrix in Bayenv v2.0 with 10 000 iterations [30]. This covariance matrix was then used in the association mapping models to account for population structure/relatedness. Population-level allele frequencies were estimated for all SNPs (68 677) and formatted as a POPfile using a custom Perl script. Average values for each morphological, diet and abiotic trait were estimated for each population and normalized by subtracting the among-population mean from each estimate and dividing by the among-population standard deviation, as suggested by Coop et al. [30]; this was the ENVfile. These average allele frequency (POPfile) and trait estimates (ENVfiles), along with the covariance matrix, were the input files for Bayenv. Bayenv v2.0 was run independently five times, following the methods of Blair et al. [32]. Each independent run used a unique random seed and had 10 000 iterations. Bayes factors and Spearman's ρ correlation coefficients were estimated for all SNPs and traits; both statistics were averaged across the five independent Bayenv runs before downstream analysis. The log Bayes factors for each trait were individually plotted against the corresponding Spearman's ρ values (i.e. in volcano plots); this allowed us to visually ensure that loci with large Bayes factors did not tend to have small correlation coefficients, as this would be an indicator of false positives.
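The trait standardization used to build the ENVfile can be sketched as follows. This is an illustration only: the text does not specify whether the sample (n − 1) or population (n) standard deviation was used, so the use of the sample standard deviation here is an assumption, as is the function name.

```python
from statistics import mean, stdev


def standardize(population_means):
    """Centre and scale one trait across populations.

    Subtracts the among-population mean and divides by the
    among-population standard deviation (sample SD assumed).
    """
    mu = mean(population_means)
    sd = stdev(population_means)
    return [(value - mu) / sd for value in population_means]


# Toy trait measured as a population mean in three populations.
scaled = standardize([1.0, 2.0, 3.0])
```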
The SNPs from the Bayenv analysis were classified as significant candidates for explaining variance in a trait if they met the following criteria: both the log Bayes factor value and Spearman's ρ fell in or above the 99.9th quantile of their respective distributions (see electronic supplementary material, table S1 for the Bayes factor and Spearman's ρ correlation coefficient significance thresholds for each trait). If a given 50 kbp window contained one or more of these candidate trait SNPs, it was defined as a ‘candidate trait window’.
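A minimal sketch of this two-threshold rule is shown below. Several details are assumptions made for illustration: a nearest-rank quantile is used, the threshold on Spearman's ρ is applied to its absolute value, positions are 1-based, and the input layout and function names are hypothetical. The toy example uses the 75th rather than the 99.9th quantile so that the result is easy to follow.

```python
import math

WINDOW_SIZE = 50_000  # 50 kbp windows, as in the text


def nearest_rank_quantile(values, q):
    """Return the nearest-rank q-quantile of `values` (0 < q <= 1)."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def candidate_windows(snps, q=0.999, window_size=WINDOW_SIZE):
    """Return the set of (chrom, window_index) holding a candidate SNP.

    `snps` is a list of (chrom, position, log_bayes_factor, rho) tuples
    for one trait. A SNP is a candidate when both its log Bayes factor
    and |rho| are at or above the q-quantile of their distributions.
    """
    bf_cut = nearest_rank_quantile([s[2] for s in snps], q)
    rho_cut = nearest_rank_quantile([abs(s[3]) for s in snps], q)
    return {
        (chrom, (pos - 1) // window_size)
        for chrom, pos, log_bf, rho in snps
        if log_bf >= bf_cut and abs(rho) >= rho_cut
    }


toy_snps = [
    ("chrIV", 10_000, 0.1, 0.05),
    ("chrIV", 60_000, 0.2, -0.10),
    ("chrIV", 120_000, 0.3, 0.15),
    ("chrIV", 170_000, 0.4, 0.20),
    ("chrIV", 210_000, 0.5, -0.25),
    ("chrVII", 5_000, 3.0, 0.30),
    ("chrVII", 30_000, 4.0, -0.60),
    ("chrVII", 260_000, 5.0, 0.65),
]
windows = candidate_windows(toy_snps, q=0.75)
```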
(c). Permutation tests to quantify co-localization and enrichment of candidate trait windows and repeatedly differentiated genomic windows
For all permutation tests described below, we ran 10 000 iterations to generate the null distribution, against which empirical estimates were compared to determine statistical significance.
First, we tested whether the number of candidate trait windows on each chromosome was greater than expected by chance. In each iteration, the candidate status of a window (i.e. candidate containing or not) was randomly shuffled among the 21 chromosomes and the total number of candidate trait windows per chromosome was re-estimated. Three permutation tests were run, one for each category of the trait (abiotic, diet or morphology). The same method was used to test for enrichment of repeatedly differentiated windows, with repeatability status of a window randomly shuffled.
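The chromosome-level enrichment test can be sketched as below. This is one plausible implementation, not the study's code: candidate labels are shuffled among all windows, the number of candidate windows per chromosome is recounted, and the p-value for each chromosome is the proportion of permuted counts at least as large as the observed count. Function and variable names are hypothetical.

```python
import random
from collections import Counter


def chromosome_enrichment(window_chroms, is_candidate, n_iter=10_000, seed=1):
    """Return {chromosome: empirical p-value} for candidate-window enrichment.

    `window_chroms[i]` is the chromosome of window i and `is_candidate[i]`
    is True when window i is a candidate trait window.
    """
    rng = random.Random(seed)
    chroms = sorted(set(window_chroms))

    def count_per_chrom(labels):
        return Counter(c for c, flag in zip(window_chroms, labels) if flag)

    observed = count_per_chrom(is_candidate)
    exceed = Counter()
    labels = list(is_candidate)
    for _ in range(n_iter):
        rng.shuffle(labels)  # shuffle candidate status among windows
        null = count_per_chrom(labels)
        for chrom in chroms:
            if null[chrom] >= observed[chrom]:
                exceed[chrom] += 1
    return {c: (exceed[c] + 1) / (n_iter + 1) for c in chroms}


# Toy genome: all 10 candidate windows sit on the small chromosome "chrA".
toy_chroms = ["chrA"] * 10 + ["chrB"] * 90
toy_flags = [True] * 10 + [False] * 90
pvalues = chromosome_enrichment(toy_chroms, toy_flags, n_iter=1000)
```

Because labels are shuffled across all windows, chromosomes with more windows receive more candidates under the null, so the test accounts for differences in the number of windows per chromosome.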
Second, we tested whether more traits within each category type (i.e. morphological, abiotic or diet) mapped to a given window than expected by chance. For each iteration, the presence or absence of a candidate for each trait within a category was randomly shuffled among the genomic windows and the total number of mapped traits per window was re-estimated. In this permutation, the resulting p-values were corrected for multiple testing using the p.adjust function and BH (alias fdr) method in R [29].
Third, we tested whether there were more windows that were repeatedly differentiated and classified as candidate trait windows than expected by chance. To accomplish this test, we randomized the candidate trait windows and separately randomized the windows that were repeatedly differentiated. Here, any window shared by at least two pairs of populations was coded as repeatedly differentiated, although the results were the same if only the significantly repeatedly differentiated windows (i.e. in at least three pairs; see Results) were coded in this way. For each iteration, we re-quantified the total number of windows that were repeatedly differentiated and contained a candidate trait locus.
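The co-localization test can be illustrated with the following sketch, in which both sets of windows are independently redrawn at random from all windows and the size of their intersection is recorded. It assumes that every window is equally likely to be drawn; the function name and the toy numbers are hypothetical.

```python
import random


def overlap_test(n_windows, repeated, candidates, n_iter=10_000, seed=1):
    """Return (observed overlap, empirical p-value).

    `repeated` and `candidates` are collections of window indices in
    range(n_windows). In each iteration both sets are re-drawn at random,
    keeping their sizes fixed, and the overlap is recounted.
    """
    rng = random.Random(seed)
    observed = len(set(repeated) & set(candidates))
    all_windows = range(n_windows)
    exceed = 0
    for _ in range(n_iter):
        random_repeated = set(rng.sample(all_windows, len(repeated)))
        random_candidates = set(rng.sample(all_windows, len(candidates)))
        if len(random_repeated & random_candidates) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_iter + 1)


# Toy example: two sets of 20 windows that coincide exactly, then two disjoint sets.
shared, p_shared = overlap_test(200, range(20), range(20), n_iter=1000)
disjoint, p_disjoint = overlap_test(200, range(20), range(100, 120), n_iter=1000)
```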
Fourth, this same permutation structure was used to test whether there was more overlap between the candidate trait windows of the different categories (e.g. candidates for diet and morphology) than expected by chance. In these tests, the candidate trait windows in each category were individually randomized and the total number of windows containing candidates for both trait categories was re-estimated.
3. Results
(a). Repeatedly differentiated genomic windows
Of the 2513 windows (50 kbp) across the stickleback genome, 1013 windows were highly differentiated (i.e. contained FST outliers) in at least one lake–stream comparison. Across the genome, 377 windows (15% of the 2513 total genomic windows or 37% of the 1013 highly differentiated windows) were outliers in two or more population pairs, indicating that there is some evolutionary repeatability at the genomic level (figure 1). Permutation testing revealed that 42 of these windows (approx. 2% of all windows, 4% of highly differentiated windows) were significantly repeatedly differentiated (p < 0.05) even after correction for false discovery rate. These significant windows were all outliers in multiple watersheds (a minimum of three lake–stream population pairs, a maximum of 10 out of 15 pairs with data; figure 1). It is also worth noting that an additional 126 windows were outliers in three to five population pairs but did not meet the significance threshold after false discovery correction in the permutation testing (see figures 1 and 2 and electronic supplementary material, table S2 for the genomic locations of these windows). There was no difference between the 42 significantly repeatedly differentiated windows and the remaining highly differentiated windows in either recombination rate (difference in recombination rate = 1.50, T475 = 0.73, p = 0.467) or gene density (difference in gene density = 0.019, T1448 = 0.05, p = 0.69).
The outlier window shared by 10 population pairs was located on chromosome XXI. There was significant enrichment (p < 0.05 in a permutation test; see electronic supplementary material, table S3 for individual chromosomal p-values) of repeatedly differentiated windows on chromosomes VIII, XI and XXI. The location of this highly repeatable window on chromosome XXI is biologically reasonable given that previous work has indicated clustering of quantitative trait loci (QTL) on this chromosome [12]. However, very few ecologically relevant traits had been previously mapped to chromosomes VIII or XI. These sites might contain genes affecting traits that differ between lake and stream sticklebacks but that have not been genetically mapped, such as parasite resistance or behaviour.
(b). Candidate trait windows for abiotic factors, diet items and morphological phenotypes
Using a population-level association mapping approach (i.e. Bayenv), we detected windows containing at least one SNP associated with environmental variables or fish traits. Specifically, Bayenv calculates the correlation between population allele frequency and mean trait value across the 32 populations (16 lake and 16 stream), while controlling for their genetic structure. We applied this analysis to three categories of data (abiotic factors, diet items, morphological phenotypes). Electronic supplementary material, figure S2 summarizes the locations of all windows.
All five abiotic factors tested had candidate windows that significantly explained population-level variation (correlation coefficients 0.32–0.60). Salinity and dissolved oxygen had the fewest candidate windows (four each), while the other factors had 17–20 significant candidate windows. All 34 morphological phenotypes considered had multiple candidate windows significantly associated with population-level variation (correlation coefficients 0.28–0.67). Pectoral fin area had the fewest candidates (six) while other traits had 30 or more significant candidate windows. Of the 45 diet items tested, 43 had one or more candidate window(s) and some factors had up to 20–30 significant candidate windows (correlation coefficients 0.27–0.69). See electronic supplementary material, table S2 for locations and summaries of mapped windows for each trait.
(c). Clustering of candidate trait windows across chromosomes
There was significant enrichment (p < 0.05 in a permutation test; see electronic supplementary material, table S3 for individual chromosomal p-values) of candidate windows associated with all trait categories: abiotic factors were enriched on chromosome VII, diet items were enriched on chromosomes VII and XXI, and morphological traits were enriched on chromosomes IV, VII and XX.
(d). Clustering of traits within candidate windows
Across the genome, there was no significant co-mapping (clustering of candidate regions) of the five different abiotic factors to a given window (p > 0.05 for all windows). By contrast, 45 windows (1.8% of all windows) exhibited significant co-mapping of two or more of the 34 morphological phenotypes (p < 0.05 in a permutation test after correction for false discovery rate). Genomic windows containing candidate loci for multiple morphological traits were located primarily on chromosomes IV, VII, XII and XX. For diet, there were 14 windows (0.6%) that had significant co-mapping of candidate loci for the 45 different diet components (p < 0.05 in a permutation test after correction for false discovery rate); these genomic regions were located primarily on chromosomes IV, VII and XV.
Overall, there were 586 windows (23%) that contained candidate loci for at least one category of trait (figure 3a). Three hundred and forty-three of these windows were associated with at least one morphological phenotype, 347 were associated with at least one diet item and 62 were associated with at least one abiotic factor (figure 3b). There was considerable overlap of these candidate windows across the three trait categories (figure 3b). Correspondingly, the degree of genome-wide overlap between candidate trait windows for different trait types (i.e. diet and morphology, morphology and abiotic, diet and abiotic) was significantly more than expected by chance for all three trait combinations (p < 0.0001 for all three permutation tests). Seventeen windows contained candidate loci for all three categories of traits (figure 3b); these windows were found on chromosomes I, IV, VII, VIII, IX and XII.
When we only considered the overlap between candidate windows for diet components and the 18 morphological phenotypes that have a known role in feeding, we found that 85 of the possible 223 regions (38%) were shared (p < 0.0001). These windows appear to be key in determining feeding capabilities and the corresponding prevalence of prey items in the diet; they were primarily located on chromosomes IV, VII, XIII and XX, with 8–14 candidate windows on each of these chromosomes.
(e). Co-localization of repeatedly differentiated windows and candidate trait windows
Eighty-four genomic windows (3.3%) were repeatedly differentiated (shared by two or more pairs) and contained a candidate locus for at least one trait (figure 2 and figure 3a). Six of these windows contained candidate loci for adaptation to abiotic factors, 49 windows contained candidate loci for morphological phenotypes and 49 windows contained candidate loci for diet items (figure 2 and figure 3c). There was an overlap of 18 of these windows among the different trait categories; two windows on chromosome VIII contained candidate loci for all three trait categories (figure 2 and figure 3c). However, the degree of genome-wide overlap between windows with repeated differentiation and windows containing any type of candidate loci was not greater than expected by chance (p > 0.05 in permutation tests for each of the three trait categories). The 40 windows displaying significantly repeatable differentiation (three or more pairs sharing an outlier window) contained candidates for between 1 and 13 individual traits (abiotic, diet or morphological). Interestingly, the four windows differentiated in the most population pairs (on chromosomes IV, VIII and XI) that also contained candidate loci were associated with diet components, swimming, feeding and armour traits (see highlighted windows in figure 2).
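As a rough sanity check on the lack of significant overlap, the counts reported in the Results can be combined under the simplifying assumption that the 377 repeatedly differentiated windows and the 586 candidate trait windows are placed independently among the 2513 windows:

$$E[\text{overlap}] = \frac{377 \times 586}{2513} \approx 87.9$$

The observed overlap of 84 windows is close to this expectation, which is in line with the non-significant permutation results. This back-of-the-envelope calculation pools all trait categories and is only illustrative; the permutation tests themselves were run separately for each of the three trait categories.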
4. Discussion
In this study, we aimed to detect genomic regions with a signature of repeated differentiation across multiple independent pairs of lake–stream stickleback. We then asked whether those regions contained SNPs associated with environmental conditions and morphological divergence. Before discussing our results, we clarify that here we are not looking at parallel evolution in the strict sense, that is, whether the same mutation or allele is used repeatedly (see [33] for a discussion of different usages of the term parallel). Rather, we are interested in genomic regions of repeated differentiation. Repeated differentiation, like repeated fixation of the same mutation, strongly suggests the action of natural selection. However, it is important to note that other evolutionary forces or genomic features could also influence these patterns [10]. In our study, the repeatedly differentiated windows did not have significantly different recombination rates or gene densities from other differentiated windows, suggesting that our results are not an artefact of genome structure. Thus, finding that a region of the genome has a signature of selection (e.g. high FST) in multiple independently derived populations implies that there may be some interesting genes or genetic features located within the region. A limitation of the reduced representation sequencing methods (double-digest RAD) employed by this study is that we are unable to distinguish between patterns of differentiation generated from direct versus linked selection [34]. Thus, we do not attempt any analyses of the specific genetic content of the genomic windows identified here.
(a). Magnitude of repeatable genomic differentiation
Across biological systems, the degree to which differentiation is repeatable appears to be highly variable and differs depending on the biological level (gene versus phenotype versus genomic region) being considered (reviewed by [33]). Previous work in threespine stickleback has shown highly repeatable patterns of differentiation at particular candidate loci, for example, at the Ectodysplasin (Eda) gene, which has been directly shown to underlie the reduction of lateral plating in numerous freshwater populations relative to their marine ancestors [35]. A pattern of high repeatability has also been shown for the sodium/potassium ATPase (ATP1a1) gene, which mediates salinity tolerance [4,36].
Recent work characterizing genome-wide patterns of repeatability in sticklebacks has generally found that a small to moderate fraction of highly differentiated loci/regions evolve in a repeatable fashion [4,37–39]. Among three independent benthic–limnetic ecotype pairs of stickleback from Canada, 33% of outlier markers (SNPs) were shared by two or more of the pairs [36]; this is similar to the 37% seen for our North American lake–stream pairs. This is also similar to the 23.2% of outliers differentiating in parallel in the European anchovy (Engraulis encrasicolus) [7]. By contrast, for European lake–stream stickleback, only 3% of outlier windows were shared for two to four of the five surveyed population pairs [37]. This is similar to what is seen among crab and wave ecotypes of the rough periwinkle (Littorina saxatilis) where 3–13% of outliers are shared between at least two of the three surveyed islands [5]. In general, variable levels of repeatability for patterns of phenotypic and/or genotypic differentiation are thought to be due to a combination of environmental heterogeneity, insufficient genetic variation, variable gene flow and genetic drift [33]. Both gene flow and environmental heterogeneity influence the observed levels of phenotypic repeatability among the 16 lake–stream pairs studied here [24]. However, despite these factors, we still find a non-negligible (albeit low) level of repeatable genomic divergence for these ecotypes.
It is possible that the repeatability we observe for some genomic regions (upwards of five population pairs with shared outliers) is a signature of adaptation from standing genetic variation, as is often the case when marine sticklebacks colonize freshwater habitats [35,39–41]. Recent simulation work has shown that when populations adapt to identical environments and standing variation is present, the same alleles are most often used rather than new mutations [42]. Interestingly, Thompson et al. [42] also show the tendency toward repeated (parallel) evolution diminishes rapidly when selection is not entirely parallel. Indeed, Stuart et al. [24] have previously shown that variation in the degree of parallelism for environmental factors predicted the degree of phenotypic parallelism in the populations studied here. It could be that standing genetic variation also interacts with selective heterogeneity to produce the observed patterns.
(b). Clustering of candidate trait windows
Many of the candidate trait windows identified in this study correspond to genomic regions previously identified as QTL for morphological, behavioural and reproductive traits (reviewed by [12]). In particular, the four chromosomes (IV, VII, XX and XXI) showing enrichment for at least one of the three trait categories (abiotic factors, diet items, morphological phenotypes) mapped in our study are among the five chromosomes previously shown to have more QTL than expected given the physical size or number of genes on the chromosome [12]. Furthermore, we found that chromosomes IV and XX were enriched for candidate trait windows associated with both feeding morphology and diet items, consistent with the previously observed enrichment of QTL associated with feeding morphology on these two chromosomes [12]. Although we did not find evidence of enrichment of candidate trait windows on chromosome XVI, the QTL enrichment on this chromosome appears to be largely driven by loci influencing body shape [12], which was not studied here. It is important to note that the work here also presents many new candidate regions that will merit future fine-scale investigation, particularly for abiotic factors and diet items, which have received relatively little attention in previous mapping studies.
It is possible that the non-random distribution of candidate trait windows is a signature of either pleiotropy or tight physical linkage. In many systems, the tight clustering of multiple loci affecting different traits (sometimes called ‘supergenes’) involved in adaptive divergence has been observed [43]. Clustering of adaptive alleles is thought to be especially important when adaptation is occurring in the presence of gene flow [44], or when it would be maladaptive to have co-adapted phenotypes broken up by recombination [45–47]. Consistent with these theoretical predictions, there is strong evidence to suggest the differentiation of the lake–stream ecotypes studied here is occurring in the presence of ongoing gene flow [24].
(c). Association between windows of repeated genomic differentiation and candidate traits
For the candidate trait windows that were repeatedly differentiated, there was a positive, but marginally non-significant, correlation between the magnitude of repeated differentiation (number of population pairs sharing an outlier window) and the number of environmental or phenotypic traits that mapped to that window (r = 0.15, F1,82 = 2.88, p = 0.09). This suggests that loci are to some degree pleiotropic (i.e. influencing the variance of more than one trait) and may be more frequently used during adaptation to a common agent (or agents) of selection. However, such a pattern could also be generated if different agents of selection are acting in different population pairs on independent traits (and loci) that map to the same windows. Future fine-scale mapping and selection studies will be required to disentangle these alternative mechanisms.
Despite this association, we did not find evidence for significant genome-wide enrichment of candidate trait windows within regions of repeated genetic differentiation. However, we did find that all of the phenotypic traits previously identified to be highly parallel in pairs of lake–stream stickleback [24] mapped to regions of repeated genetic differentiation. This suggests that we are describing a real genetic signature of repeated phenotypic evolution. Yet, an important consideration of this study is that our use of population-level association analyses (Bayenv), rather than a within-population association study, reduced our ability to detect candidate loci. This is because the sensitivity of a population-level analysis increases when a greater number of populations exhibit the same associations between allele frequencies and phenotypes. As a result, candidate loci underlying trait variation in a single population would often be overlooked. Correspondingly, the candidate regions reported here are very likely only a subset of those important for the abiotic factors and traits considered in this study.
The observed mapping of multiple diet, feeding and armour traits to regions of the genome evolving in a repeatable fashion supports the idea that both feeding capacity and predation avoidance are among the drivers of differentiation in this system. We see a higher fraction of candidate regions related to biotic factors (predation avoidance and foraging) mapping to repeatedly differentiated genomic regions (14%) than candidate regions for abiotic factors (9%). This pattern may suggest biotic factors play a relatively greater role than abiotic factors in shaping patterns of repeated differentiation in this system. However, this pattern may also be due in part to greater variance in abiotic factors between the watersheds than between the lake and stream habitats within a watershed [24].
Despite the constraints of our methods discussed above, the repeated genomic differentiation found in this study cannot be explained solely by the variety of ecological factors and morphological traits that we considered here. This is not surprising as we know that selection in threespine stickleback is generally multifarious, involving a multitude of biotic factors such as competition [48], predation [49], parasites [50–52] and pathogens [53], as well as a variety of abiotic factors such as salinity [54], turbidity [55] and temperature [56]. Our results identify regions of the genome that are likely important for adaptation to these other environmental factors and provide a reminder that multifarious agents of selection should be considered in studies of repeated evolution. Furthermore, our results highlight the importance of integrating association mapping studies to identify links between genotypes and phenotypes with population genomic studies to identify links between genotypes and fitness. Combined, these two types of analyses can provide a more holistic view of the ecological and genetic factors that drive repeated phenotypic evolution.
Supplementary Material
Acknowledgements
We thank Andrew Hendry for discussions and support at all stages of this project and Jesse Weber for assistance with the original double-digest RAD data collection.
Data accessibility
All code and input files described in the manuscript are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.pj8c6g2 [57]. The original raw sequence reads, environmental data, and SNP tables are archived by Stuart et al. [24].
Authors' contributions
D.J.R., D.I.B. and C.L.P. conceived and designed the study. D.J.R., D.I.B. and C.L.P. obtained funding for the project. Y.E.S. generated the data. D.J.R. conducted the analyses. D.J.R. interpreted the data and wrote the manuscript with critical input from Y.E.S., D.I.B. and C.L.P. All authors approved the final version of the manuscript.
Competing interests
The authors have no competing interests.
Funding
This research was supported by grants from the U.S. National Science Foundation (DEB 1456462 to Y.E.S. and D.I.B.; DEB 1144773 to D.I.B.; DEB 1144556 to C.L.P.). This project received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement 794277-PLEVOCON (to D.J.R.).
References
- 1. Conte GL, Arnegard ME, Peichel CL, Schluter D. 2012. The probability of genetic parallelism and convergence in natural populations. Proc. R. Soc. B 279, 5039–5047. (doi:10.1098/rspb.2012.2146)
- 2. Stern DL. 2013. The genetic causes of convergent evolution. Nat. Rev. Genet. 14, 751–764. (doi:10.1038/nrg3483)
- 3. Martin A, Orgogozo V. 2013. The loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation. Evolution 67, 1235–1250. (doi:10.1111/evo.12081)
- 4. Jones FC, et al. 2012. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61. (doi:10.1038/nature10944)
- 5. Ravinet M, Westram A, Johannesson K, Butlin R, André C, Panova M. 2016. Shared and nonshared genomic divergence in parallel ecotypes of Littorina saxatilis at a local scale. Mol. Ecol. 25, 287–305. (doi:10.1111/mec.13332)
- 6. Kautt AF, Elmer KR, Meyer A. 2012. Genomic signatures of divergent selection and speciation patterns in a ‘natural experiment’, the young parallel radiations of Nicaraguan crater lake cichlid fishes. Mol. Ecol. 21, 4770–4786. (doi:10.1111/j.1365-294X.2012.05738.x)
- 7. Le Moan A, Gagnaire PA, Bonhomme F. 2016. Parallel genetic divergence among coastal–marine ecotype pairs of European anchovy explained by differential introgression after secondary contact. Mol. Ecol. 25, 3187–3202. (doi:10.1111/mec.13627)
- 8. Harvey PH, Pagel MD. 1991. The comparative method in evolutionary biology. Oxford, UK: Oxford University Press.
- 9. Schluter D, Nagel LM. 1995. Parallel speciation by natural selection. Am. Nat. 146, 292–301. (doi:10.1086/285799)
- 10. Ravinet M, Faria R, Butlin RK, Galindo J, Bierne N, Rafajlović M, Noor MAF, Mehlig B, Westram AM. 2017. Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow. J. Evol. Biol. 30, 1450–1477. (doi:10.1111/jeb.13047)
- 11. Bell MA, Foster SA. 1994. The evolutionary biology of the threespine stickleback. Oxford, UK: Oxford University Press.
- 12. Peichel CL, Marques DA. 2017. The genetic and molecular architecture of phenotypic diversity in sticklebacks. Phil. Trans. R. Soc. B 372, 20150486. (doi:10.1098/rstb.2015.0486)
- 13. Berner D, Adams DC, Grandchamp AC, Hendry AP. 2008. Natural selection drives patterns of lake–stream divergence in stickleback foraging morphology. J. Evol. Biol. 21, 1653–1665. (doi:10.1111/j.1420-9101.2008.01583.x)
- 14. Berner D, Grandchamp AC, Hendry AP. 2009. Variable progress toward ecological speciation in parapatry: stickleback across eight lake-stream transitions. Evolution 63, 1740–1753. (doi:10.1111/j.1558-5646.2009.00665.x)
- 15. Kaeuffer R, Peichel CL, Bolnick DI, Hendry AP. 2012. Parallel and nonparallel aspects of ecological, phenotypic, and genetic divergence across replicate population pairs of lake and stream stickleback. Evolution 66, 402–418. (doi:10.1111/j.1558-5646.2011.01440.x)
- 16. Roesti M, Hendry AP, Salzburger W, Berner D. 2012. Genome divergence during evolutionary diversification as revealed in replicate lake–stream stickleback population pairs. Mol. Ecol. 21, 2852–2862. (doi:10.1111/j.1365-294X.2012.05509.x)
- 17. Deagle BE, Jones FC, Absher DM, Kingsley DM, Reimchen TE. 2013. Phylogeography and adaptation genetics of stickleback from the Haida Gwaii archipelago revealed using genome-wide single nucleotide polymorphism genotyping. Mol. Ecol. 22, 1917–1932. (doi:10.1111/mec.12215)
- 18. Ravinet M, Prodöhl PA, Harrod C. 2013. Parallel and nonparallel ecological, morphological and genetic divergence in lake–stream stickleback from a single catchment. J. Evol. Biol. 26, 186–204. (doi:10.1111/jeb.12049)
- 19. Sharpe DM, Räsänen K, Berner D, Hendry AP. 2008. Genetic and environmental contributions to the morphology of lake and stream stickleback: implications for gene flow and reproductive isolation. Evol. Ecol. Res. 10, 849–866.
- 20. Berner D, Kaeuffer R, Grandchamp AC, Raeymaekers JAM, Räsänen K, Hendry AP. 2011. Quantitative genetic inheritance of morphological divergence in a lake–stream stickleback ecotype pair: implications for reproductive isolation. J. Evol. Biol. 24, 1975–1983. (doi:10.1111/j.1420-9101.2011.02330.x)
- 21. Berner D, Moser D, Roesti M, Buescher H, Salzburger W. 2014. Genetic architecture of skeletal evolution in European lake and stream stickleback. Evolution 68, 1792–1805. (doi:10.1111/evo.12390)
- 22. Lucek K, Sivasundar A, Kristjánsson BK, Skúlason S, Seehausen O. 2014. Quick divergence but slow convergence during ecotype formation in lake and stream stickleback pairs of variable age. J. Evol. Biol. 27, 1878–1892. (doi:10.1111/jeb.12439)
- 23. Oke KB, Bukhari M, Kaeuffer R, Rolshausen G, Räsänen K, Bolnick DI, Peichel CL, Hendry AP. 2016. Does plasticity enhance or dampen phenotypic parallelism? A test with three lake–stream stickleback pairs. J. Evol. Biol. 29, 126–143. (doi:10.1111/jeb.12767)
- 24. Stuart YE, et al. 2017. Contrasting effects of environment and genetics generate a continuum of parallel evolution. Nat. Ecol. Evol. 1, 158. (doi:10.1038/s41559-017-0158)
- 25. Berner D, Roesti M, Hendry AP, Salzburger W. 2010. Constraints on speciation suggested by comparing lake-stream stickleback divergence across two continents. Mol. Ecol. 19, 4963–4978. (doi:10.1111/j.1365-294X.2010.04858.x)
- 26. Lucek K, Sivasundar A, Roy D, Seehausen O. 2013. Repeated and predictable patterns of ecotypic differentiation during a biological invasion: lake–stream divergence in parapatric Swiss stickleback. J. Evol. Biol. 26, 2691–2709. (doi:10.1111/jeb.12267)
- 27. Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. 2012. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7, e37135. (doi:10.1371/journal.pone.0037135)
- 28. Weir BS, Cockerham CC. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370. (doi:10.1111/j.1558-5646.1984.tb05657.x)
- 29. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300.
- 30. Coop G, Witonsky D, Di Rienzo A, Pritchard JK. 2010. Using environmental correlations to identify loci underlying local adaptation. Genetics 185, 1411–1423. (doi:10.1534/genetics.110.114819)
- 31. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. 2012. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328. (doi:10.1093/bioinformatics/bts606)
- 32. Blair LM, Granka JM, Feldman MW. 2014. On the stability of the Bayenv method in assessing human SNP-environment associations. Hum. Genomics 8, 1. (doi:10.1186/1479-7364-8-1)
- 33. Bolnick DI, Barrett RDH, Oke K, Rennison DJ, Stuart YE. 2018. (Non)parallel evolution. Ann. Rev. Ecol. Evol. Syst. 49, 303–330. (doi:10.1146/annurev-ecolsys-110617-062240)
- 34. Burri R. 2017. Interpreting differentiation landscapes in the light of long-term linked selection. Evol. Lett. 1, 118–131. (doi:10.1002/evl3.14)
- 35. Colosimo PF, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307, 1928–1933. (doi:10.1126/science.1107239)
- 36. Jones FC, et al. 2012. A genome-wide SNP genotyping array reveals patterns of global and repeated species-pair divergence in sticklebacks. Curr. Biol. 22, 83–90. (doi:10.1016/j.cub.2011.11.045)
- 37. Feulner PG, et al. 2015. Genomics of divergence along a continuum of parapatric population differentiation. PLoS Genet. 11, e1004966. (doi:10.1371/journal.pgen.1004966)
- 38. Raeymaekers JA, Chaturvedi A, Hablützel PI, Verdonck I, Hellemans B, Maes GE, Meester L, Volckaert FA. 2017. Adaptive and non-adaptive divergence in a common landscape. Nat. Commun. 8, 267. (doi:10.1038/s41467-017-00256-6)
- 39. Miller CT, Beleza S, Pollen AA, Schluter D, Kittles RA, Shriver MD, Kingsley DM. 2007. cis-Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in sticklebacks and humans. Cell 131, 1179–1189. (doi:10.1016/j.cell.2007.10.055)
- 40. O'Brown NM, Summers BR, Jones FC, Brady SD, Kingsley DM. 2015. A recurrent regulatory change underlying altered expression and Wnt response of the stickleback armor plates gene EDA. Elife 4, e05290. (doi:10.7554/eLife.05290)
- 41. Indjeian VB, Kingman GA, Jones FC, Guenther CA, Grimwood J, Schmutz J, Myers RM, Kingsley DM. 2016. Evolving new skeletal traits by cis-regulatory changes in bone morphogenetic proteins. Cell 164, 45–56. (doi:10.1016/j.cell.2015.12.007)
- 42. Thompson KA, Osmond MM, Schluter D. 2019. Patterns of speciation and parallel genetic evolution under adaptation from standing variation. Evol. Lett. 3, 129–141. (doi:10.1002/evl3.106)
- 43. Schwander T, Libbrecht R, Keller L. 2014. Supergenes and complex phenotypes. Curr. Biol. 24, R288–R294. (doi:10.1016/j.cub.2014.01.056)
- 44. Samuk K, Owens GL, Delmore KE, Miller SE, Rennison DJ, Schluter D. 2017. Gene flow and selection interact to promote adaptive divergence in regions of low recombination. Mol. Ecol. 26, 4378–4390. (doi:10.1111/mec.14226)
- 45. Bürger R, Akerman A. 2011. The effects of linkage and gene flow on local adaptation: a two-locus continent–island model. Theor. Popul. Biol. 80, 272–288. (doi:10.1016/j.tpb.2011.07.002)
- 46. Yeaman S, Whitlock MC. 2011. The genetic architecture of adaptation under migration–selection balance. Evolution 65, 1897–1911. (doi:10.1111/j.1558-5646.2011.01269.x)
- 47. Aeschbacher S, Selby JP, Willis JH, Coop G. 2017. Population-genomic inference of the strength and timing of selection against gene flow. Proc. Natl Acad. Sci. USA 114, 7061–7066. (doi:10.1073/pnas.1616755114)
- 48. Schluter D. 1994. Experimental evidence that competition promotes divergence in adaptive radiation. Science 266, 798–801. (doi:10.1126/science.266.5186.798)
- 49. Reimchen TE. 1994. Predators and morphological evolution in threespine stickleback. In The evolutionary biology of the threespine stickleback (eds Bell MA, Foster SA), pp. 240–276. Oxford, UK: Oxford University Press.
- 50. Wegner KM, Reusch TBH, Kalbe M. 2003. Multiple parasites are driving major histocompatibility complex polymorphism in the wild. J. Evol. Biol. 16, 224–232. (doi:10.1046/j.1420-9101.2003.00519.x)
- 51. Weber JN, Kalbe M, Shim KC, Erin NI, Steinel NC, Ma L, Bolnick DI. 2017. Resist globally, infect locally: a transcontinental test of adaptation by stickleback and their tapeworm parasite. Am. Nat. 189, 43–57. (doi:10.1086/689597)
- 52. Stutz WE, Bolnick DI. 2017. Natural selection on MHC IIβ in parapatric lake and stream stickleback: balancing, divergent, both or neither? Mol. Ecol. 26, 4772–4786. (doi:10.1111/mec.14158)
- 53. Waltzek TB, Marty GD, Alfaro ME, Bennett WR, Garver KA, Haulena M, Weber ES, Hedrick RP. 2012. Systemic iridovirus from threespine stickleback Gasterosteus aculeatus represents a new megalocytivirus species (family Iridoviridae). Dis. Aquat. Org. 98, 41–56. (doi:10.3354/dao02415)
- 54. DeFaveri J, Merilä J. 2014. Local adaptation to salinity in the three-spined stickleback? J. Evol. Biol. 27, 290–302. (doi:10.1111/jeb.12289)
- 55. Engström-Öst J, Candolin U. 2006. Human-induced water turbidity alters selection on sexual displays in sticklebacks. Behav. Ecol. 18, 393–398. (doi:10.1093/beheco/arl097)
- 56. Barrett RDH, Paccard A, Healy TM, Bergek S, Schulte PM, Schluter D, Rogers SM. 2011. Rapid evolution of cold tolerance in stickleback. Proc. R. Soc. B 278, 233–238. (doi:10.1098/rspb.2010.0923)
- 57. Rennison DJ, Stuart YE, Bolnick DI, Peichel CL. 2019. Data from: Ecological factors and morphological traits are associated with repeated genomic differentiation between lake and stream stickleback. Dryad Digital Repository. (doi:10.5061/dryad.pj8c6g2)