Abstract
Adaptation in the wild often involves standing genetic variation (SGV), which allows rapid responses to selection on ecological timescales. However, we still know little about how the evolutionary histories and genomic distributions of SGV influence local adaptation in natural populations. Here, we address this knowledge gap using the threespine stickleback fish (Gasterosteus aculeatus) as a model. We extend restriction site‐associated DNA sequencing (RAD‐seq) to produce phased haplotypes approaching 700 base pairs (bp) in length at each of over 50,000 loci across the stickleback genome. Parallel adaptation in two geographically isolated freshwater pond populations consistently involved fixation of haplotypes that are identical‐by‐descent. In these same genomic regions, sequence divergence between marine and freshwater stickleback, as measured by dXY, reaches tenfold higher than background levels and genomic variation is structured into distinct marine and freshwater haplogroups. By combining this dataset with a de novo genome assembly of a related species, the ninespine stickleback (Pungitius pungitius), we find that this habitat‐associated divergent variation averages six million years old, nearly twice the genome‐wide average. The genomic variation that is involved in recent and rapid local adaptation in stickleback has therefore been evolving throughout the 15‐million‐year history since the two species lineages split. This long history of genomic divergence has maintained large genomic regions of ancient ancestry that include multiple chromosomal inversions and extensive linked variation. These discoveries of ancient genetic variation spread broadly across the genome in stickleback demonstrate how selection on ecological timescales is a result of genome evolution over geological timescales, and vice versa.
Keywords: Adaptation, evolutionary genomics, speciation islands, RAD‐seq
Impact Summary.
Adaptation to changing environments requires a source of genetic variation. When environments change quickly, species often rely on variation that is already present (so‐called standing genetic variation) because new adaptive mutations are too rare. The threespine stickleback, a small fish species living throughout the Northern Hemisphere, is well‐known for its ability to rapidly adapt to new environments. Populations living in coastal oceans are heavily armored with bony plates and spines that protect them from predators. These marine populations have repeatedly invaded and adapted to freshwater environments, losing much of their armor and changing in shape, size, color, and behavior.
Adaptation to freshwater environments can occur in mere decades and probably involves a significant amount of standing genetic variation. Indeed, one of the clearest examples we have of adaptation from standing genetic variation comes from a gene, eda, that controls the shifts in armor plating. This discovery involved two surprises that continue to shape our understanding of the genetics of adaptation. First, freshwater stickleback from across the Northern Hemisphere share the same version, or allele, of this gene. Second, the “marine” and “freshwater” alleles arose millions of years ago, even though the freshwater populations studied arose much more recently. While it has been hypothesized that other genes in the stickleback genome may share these patterns, large‐scale surveys of genomic variation have been unable to test this prediction directly.
Here, we use new sequencing technologies to survey DNA sequence variation across the stickleback genome for patterns like those at the eda gene. We find that nearly every region of the genome associated with marine‐freshwater genetic differences shares this pattern to some degree. Moreover, many of these regions are as old or older than eda, stretching back over 10 million years in the past and perhaps even predating the species we now call the threespine stickleback. We conclude that natural selection has maintained this variation over geological timescales and that the same alleles we observe in freshwater stickleback today are the descendants of those under selection in ancient, now‐extinct freshwater habitats. Our findings highlight the need to understand evolution on macroevolutionary timescales to understand and predict adaptation happening in the present day.
The mode and tempo of adaptive evolution depend on the sources of genetic variation affecting fitness (Wright 1932; Orr 2005). While new mutation is the ultimate origin of all genetic variation, recent studies of adaptation in the wild have documented adaptive genetic variation that was either segregating in the ancestral population as standing genetic variation (SGV) (Barrett and Schluter 2008; Domingues et al. 2012; Schrider and Kern 2017), or introgressed from a separate population or species (Huerta‐Sánchez et al. 2014; Fontaine et al. 2015). The use of SGV during evolution appears particularly important when dramatic responses to selection occur on ecological timescales, in dozens of generations or fewer (Barrett and Schluter 2008). When environments change rapidly, SGV can propel rapid evolution in ecologically relevant traits even in populations of long‐lived organisms like Darwin's finches (Grant and Grant 2002), monkeyflowers (Wright et al. 2013), and threespine stickleback fish (Colosimo et al. 2005).
The contribution of SGV to rapid divergence has important consequences for our understanding of evolutionary genetics. Existing genetic variants have evolutionary histories that are often unknown, but which may nonetheless have significant impacts on subsequent adaptation (Kirkpatrick and Barton 2006; Wright et al. 2013). The abundance, genomic distribution, and fitness effects of SGV are themselves the products of evolution (Charlesworth et al. 1993; Colosimo et al. 2005; Kirkpatrick and Barton 2006; Linnen et al. 2009; Stankowski and Streisfeld 2015), and their unknown history raises fascinating questions for the genetics of adaptation in the wild. When did adaptive variants originally arise? How are they structured, across both geography and the genome? Which evolutionary forces shaped their current distribution and how might this history channel future evolutionary change?
Answers to these questions are critical for our understanding of the importance of SGV in nature, as well as our ability to predict the paths available to adaptation on ecological timescales (Wright et al. 2013). Biologists are beginning to probe evolutionary histories of SGV using genome‐wide sequence variation across multiple individuals in numerous populations (Pease et al. 2016), but this level of inference has been unavailable for most natural systems because of methodological limitations that remove phase information (e.g., pool‐seq: Schlotterer et al. 2014) or produce very short reads (e.g., RAD‐seq: Davey et al. 2011).
Here, we investigate the structure and evolutionary history of divergent SGV by modifying the original sheared RAD‐seq method to generate ∼700 bp haplotypes at tens of thousands of loci sampled across the stickleback genome. This approach allows us to accurately measure sequence variation and estimate divergence times across the genome. By collecting more detailed sequence information at each RAD locus, this approach also provides more accurate estimates of polymorphism and divergence at each locus, and with far smaller sample sizes, compared to traditional short‐read methods (Nei 1987 chapters 10 and 13; Wakeley 2009; Cruickshank and Hahn 2014).
SGV has long been postulated to be critical to adaptation in stickleback, and several recent population genomic studies have supported this hypothesis (Hohenlohe et al. 2010; Jones et al. 2012; Feulner et al. 2013; Roesti et al. 2015; Samuk et al. 2017). Marine stickleback have repeatedly colonized freshwater lakes and streams (Bell and Foster 1994b; Jones et al. 2012; Wund et al. 2016), and adaptive divergence in isolated freshwater habitats is highly parallel at the phenotypic (Colosimo et al. 2004; Cresko et al. 2004) and genomic levels (Hohenlohe et al. 2010; Jones et al. 2012; but see Stuart et al. 2017). In addition, analyses of haplotype variation at the genes eda (Colosimo et al. 2005; Roesti et al. 2014) and atp1a1 (Roesti et al. 2014) present two clear results: separate freshwater populations share common “freshwater” haplotypes that are identical‐by‐descent (IBD), and sequence divergence between the major marine and freshwater haplogroups suggests their ancient origins, perhaps over two million years ago in the case of eda (Colosimo et al. 2005). While intriguing, it is not clear whether the deep evolutionary histories of these loci are rarities or representative of more widespread ancient history across the genome.
To address fundamental questions of genealogical relationships and molecular evolution in stickleback, we utilize the new RAD‐seq haplotyping approach to assay genome‐wide variation associated with adaptive divergence in two young freshwater ponds, which formed during the end‐Pleistocene glacial retreat (c. 12,000 years ago: Francis et al. 1986; Cresko et al. 2004, Fig. 1). In addition, we generated a de novo genome assembly of the sister taxon ninespine stickleback (Pungitius pungitius), allowing us to estimate divergence times for genealogies across the genome. Our results clearly demonstrate that the previous findings of deep evolutionary history based upon candidate loci are not exceptions but in fact the rule. A suite of adaptive variation, structured into distinct marine and freshwater haplotypes that evolved over millions of years, forms a deep pool of SGV that fuels repeated and rapid evolution in stickleback.
Figure 1.

Stickleback sampling and RAD sequencing to measure haplotype variation. (A) Threespine stickleback sampling locations in this study. Colors represent habitat type: red: marine; blue: freshwater. The number of haploid genomes sampled is shown. (B–D) We modified the original RAD‐seq protocol to generate local haplotypes. Colored bars represent polymorphic sites. For a detailed description of haplotype construction, see Methods. (B) Overlapping paired‐end reads are anchored to PstI restriction sites. (C) Paired reads mapping to each half‐site are merged into contigs. Contigs mapping to the same restriction site are identified by alignment to the reference genome. (D) Sequences from each half of a restriction site are phased to generate a single RAD locus. RAD tags in the background represent multiple genotypes used in phasing.
Methods
SAMPLE COLLECTION
Wild threespine stickleback were collected from Rabbit Slough (N 61.5595, W 149.2583, n = 5 fish), Boot Lake (N 61.7167, W 149.1167, n = 5 fish), and Bear Paw Lake (N 61.6139, W 149.7539, n = 4 fish) (Fig. 1A). Rabbit Slough is an offshoot of the Knik Arm of Cook Inlet and is known to be populated by anadromous populations of stickleback that are stereotypically oceanic in phenotype and genotype (Cresko et al. 2004; Hohenlohe et al. 2010). Boot Lake and Bear Paw Lake are both shallow lakes formed during the end‐Pleistocene glacial retreat. Fish were collected in the summers of 2009 (Rabbit Slough), 2010 (Bear Paw Lake), and 2014 (Boot Lake) using wire minnow traps and euthanized in situ with Tricaine solution. Euthanized fish were immediately fixed in 95% ethanol and shipped to the Cresko Laboratory at the University of Oregon (Eugene, OR, USA). DNA was extracted from fin clips preserved in 95% ethanol using either Qiagen DNeasy spin column extraction kits or Ampure magnetic beads (Beckman Coulter, Inc.) following manufacturer's instructions. Yields averaged 1–2 μg DNA per extraction (∼30 mg tissue). Treatment of animals followed protocols approved by the University of Oregon Institutional Animal Care and Use Committee (IACUC).
SEQUENCING STRATEGY AND RATIONALE
We designed our sequencing to maximize detection of DNA sequence variation and divergence, with the ultimate goal being the estimation of absolute divergence times of marine and freshwater haplogroups. Previous work by us and others using short sequence reads provided clear evidence of changes in relative frequencies of alleles across stickleback populations (Hohenlohe et al. 2010; Roesti et al. 2014; Lescak et al. 2015; Roesti et al. 2015), but could not sufficiently address questions of haplotype ages. We therefore designed a RAD sequencing approach to (1) accurately estimate sequence diversity within and divergence between threespine stickleback ecotypes and (2) recover sufficient RAD loci that map unambiguously to an outgroup genome sequence from the ninespine stickleback so that we could confidently compare diversity within threespine stickleback to divergence from the ninespine stickleback. We chose the ninespine stickleback as an outgroup because the threespine‐ninespine split is sufficiently old that lineage sorting should be nearly complete (>>4Ne, assuming Ne < 10^6; Aldenhoven et al. 2010) yet recent enough to facilitate sequence mapping between species (Rastas et al. 2015).
To achieve our aims, we designed a sequencing method to produce phased haplotypes of ∼700 bp at each RAD locus (Fig. 1B–D) and to sample the genome densely enough to identify signatures of selection after the likely dropout of RAD loci without clear mapping in the ninespine stickleback genome. We used the single‐digest, sheared RAD approach to limit biases in our estimates of sequence diversity. RAD‐seq has known biases due to mutations in restriction sites causing allele dropout (Arnold et al. 2013; Gautier et al. 2013), the potential for which increases with increasing sequence divergence and leads to underestimates of genetic diversity. Diversity estimates are, however, substantially more accurate with sheared RAD‐seq compared to other RAD‐seq approaches (e.g., double‐digest RAD‐seq: Peterson et al. 2012). Importantly for the coalescent analyses we present here, such allele dropout is unlikely to affect estimates of overall divergence across the clade of alleles. In the rare cases when it does, the bias is toward underestimation of the divergence age (Arnold et al. 2013), which would make our findings of deep divergence even more striking.
Our sequencing design facilitated accurate inference of sequence variation even with smaller population samples than are typical among population genomic studies. While allele frequency‐based statistics like FST have particularly high variance with small sample sizes (Willing et al. 2012), our study is fortunate to be built upon numerous properly powered, previous population genomic studies in stickleback, including extensive previous work in the three populations we analyze here. The genome‐wide patterns of FST we observed using our new approach closely matched multiple previous studies (Hohenlohe et al. 2010; Jones et al. 2012, Fig. 2A). Because of this extensive body of previous work, we relied on FST only to draw inference of larger genomic regions containing tens or hundreds of RAD loci. Instead, as stated above, the focus of this work is to extend these previous findings by addressing the ages of allelic divergence. We therefore do not expect the higher variance associated with smaller sample size to qualitatively influence our results. Importantly, estimation of sequence diversity (π) (Nei 1987) and divergence (dXY) (Nei 1987; Cruickshank and Hahn 2014) at a given locus improves greatly with increases in sequence length. Using equations (10.9) and (13.83) from Nei (1987; Box 1 in Cruickshank and Hahn 2014), the predicted sampling variances in both π and dXY using 700 bp sequences in five individuals are lower than those obtained using standard 100 bp sequences at any sample size (Fig. S1). Therefore, not only is this novel application of RAD‐seq ideally suited for our questions, our findings show that this approach may significantly decrease the necessary sample size, and thus resource expenditure, for many population genomic studies.
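The effect of sequence length on sampling variance can be illustrated with a minimal sketch. It evaluates the Tajima (1983) expression for the variance of per‐site nucleotide diversity under neutrality without recombination, which is assumed here to correspond to the π variance cited above. The function name and the value π = 0.003 are illustrative assumptions, not estimates or code from this study, and the dXY case is not shown.

```python
def pi_variance(pi, n_seqs, n_sites):
    """Variance of per-site nucleotide diversity (Tajima 1983 form).

    Assumes neutrality and no recombination within the locus.
    pi      -- expected per-site pairwise diversity (hypothetical value)
    n_seqs  -- number of sampled haplotypes (two per diploid fish)
    n_sites -- sequence length in bp
    """
    n = n_seqs
    # Term proportional to pi; shrinks as the sequence gets longer.
    linear_term = (n + 1) / (3 * (n - 1) * n_sites) * pi
    # Term proportional to pi squared; does not depend on sequence length.
    quadratic_term = 2 * (n * n + n + 3) / (9 * n * (n - 1)) * pi ** 2
    return linear_term + quadratic_term


# Five diploid fish (10 haplotypes) with 700 bp loci, compared with a
# very large sample (1000 haplotypes) of 100 bp loci.
long_small_sample = pi_variance(0.003, n_seqs=10, n_sites=700)
short_large_sample = pi_variance(0.003, n_seqs=1000, n_sites=100)
```

Under these assumed values, the 700 bp locus sampled in five fish has the lower predicted variance, because the length‐dependent term dominates for short sequences regardless of sample size.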
Figure 2.

The genealogical structure of parallel genomic divergence. (A) Genome‐wide FST for both marine‐freshwater comparisons was kernel‐smoothed using a normally distributed kernel with a window size of 500 kb. Inverted triangles indicate the locations of two genes known to show extensive marine‐freshwater haplotype divergence, eda and atp1a1. Three chromosomal inversions are highlighted in yellow. (B) Lineage sorting patterns were identified from maximum clade credibility trees for each RAD locus. Blue bars: haplotypes from both freshwater populations form a single monophyletic group; red: haplotypes from the marine population form a monophyletic group; black: A RAD locus is structured into reciprocally monophyletic marine and freshwater haplogroups.
LIBRARY PREPARATION
To identify sufficient sequence variation at a RAD locus, and to simplify downstream sequence processing and analysis, we took advantage of longer sequencing reads available on newer Illumina platforms and the phase information captured by paired‐end sequencing. We generated RAD libraries from these samples using the single‐digest sheared RAD protocol of Baird et al. (2008) with the following specifications and adjustments: 1 μg of genomic DNA per fish was digested with the restriction enzyme PstI‐HF (New England Biolabs), followed by ligation to P1 Illumina adaptors with 6 bp inline barcodes. Ligated samples were multiplexed and sheared by sonication in a Bioruptor (Diagenode). To ensure that most of our paired‐end reads would overlap unambiguously and produce longer contiguous sequences, we selected a narrow fragment size range of 425–475 bp. The remainder of the protocol followed Baird et al. (2008). All fish were sequenced on an Illumina HiSeq 2500 using paired‐end 250 bp sequencing reads at the University of Oregon's Genomics and Cell Characterization Core Facility (GC3F).
SEQUENCE PROCESSING
Raw Illumina sequence reads were demultiplexed, cleaned, and processed primarily using the Stacks v1.46 pipeline (Catchen et al. 2011, 2013). Paired‐end reads were demultiplexed with process_shortreads and cleaned with process_radtags using default criteria (throughout this article, names of scripts, programs, functions, and command‐line arguments will appear in italics). Overlapping read pairs were then merged with fastq‐join (Aronesty 2011). Pairs that failed to merge were removed from further analysis. To retain the majority of the sequence data for analysis in Stacks and still maintain adequate contig lengths, merged contigs were trimmed to 350 bp and all contigs shorter than 350 bp were discarded. We aligned these contigs to the stickleback reference genome (Jones et al. 2012; Glazer et al. 2015) using bbmap v35.69 with the most sensitive alignment settings (“vslow = t”; http://jgi.doe.gov/data-and-tools/bbtools/) and required that contigs mapped uniquely to the reference. We then used the pstacks, cstacks, and sstacks components of the Stacks pipeline to identify RAD‐tags and call SNPs using the maximum likelihood algorithm implemented in pstacks, create a catalog of RAD tags across individuals, and match tags across individuals. All data were then passed through the Stacks error correction module rxstacks to prune unlikely haplotypes. We ran the Stacks component program populations on the final dataset to filter loci genotyped in fewer than four individuals in each population and to create output files for sequence analysis. We use the naming conventions of Baird et al. (2008): a “RAD tag” refers to sequence generated from a single end of a restriction site and the pair of RAD tags sequenced at a restriction site comprises a “RAD locus” (Fig. 1D).
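The length filter applied to merged contigs can be sketched as follows. This is an illustration only, not the processing code used in the study. It assumes that trimming keeps the first 350 bp of each contig (the end anchored at the restriction site), and the function and constant names are hypothetical.

```python
CONTIG_LENGTH = 350  # bp, the uniform contig length described in the text


def trim_contigs(contigs, length=CONTIG_LENGTH):
    """Discard contigs shorter than `length` and trim the rest to `length` bp.

    Assumption: trimming removes bases from the end of the contig, so the
    first `length` bases are the ones retained.
    """
    return [seq[:length] for seq in contigs if len(seq) >= length]


# Three toy contigs of 400, 349, and 350 bp; only the second is discarded.
kept = trim_contigs(["A" * 400, "C" * 349, "G" * 350])
```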
We used the program phase v2.1 (Stephens et al. 2001; Stephens and Scheet 2005) to phase pairs of RAD tags originating from the same restriction site. We coded haplotypes present at each RAD tag, which often contain multiple SNPs, into multiallelic genotypes. This both simplified and reduced computing time for the phasing process. We also performed coalescent simulations to generate, “cut,” and rephase haplotypes to demonstrate the high accuracy of this method using sequences and sample sizes similar to those in this study (Fig. S2). Custom Python scripts automated this process and are included as supplementary files. We required that each individual had at least one sequenced haplotype at each tag for phasing to be attempted. If a sample had called genotypes at only one tag in the pair, the sample was removed from further analysis of that locus. The resultant phased haplotypes were uniformly 696 bp in length [(350 bp × 2) − 4 bp PstI overlap] with 690 potential variable sites (6 bp PstI motif was invariant) and were used to generate sequence alignments for import into BEAST.
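The multiallelic coding step can be pictured with a short sketch: every distinct haplotype sequence observed at one RAD tag receives an integer allele code, so that each tag behaves as a single multiallelic marker during phasing. This is an illustration, not the supplementary scripts; the function name, the sample labels, and the input layout are hypothetical, and the output is not the phase input file format.

```python
def code_tag_haplotypes(haplotypes_by_sample):
    """Recode the haplotypes at one RAD tag as integer alleles.

    haplotypes_by_sample -- dict mapping a sample name to its two haplotype
                            sequences at this tag (hypothetical layout)
    Returns (coded genotypes per sample, sequence-to-code lookup table).
    """
    allele_codes = {}
    coded = {}
    for sample in sorted(haplotypes_by_sample):
        pair = haplotypes_by_sample[sample]
        # A new sequence gets the next unused integer; a known one is reused.
        coded[sample] = tuple(
            allele_codes.setdefault(hap, len(allele_codes) + 1) for hap in pair
        )
    return coded, allele_codes


# Toy tag with two SNPs spread over three distinct haplotypes.
genotypes, lookup = code_tag_haplotypes({
    "RS1": ("ACGT", "ACGA"),
    "BL1": ("ACGA", "ACGA"),
    "BP1": ("TCGA", "ACGA"),
})
```

Because the lookup table is returned alongside the coded genotypes, the original sequences can be restored after phasing by inverting it.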
We recovered a total of 236,787 RAD tags after filtering, mapping to 151,813 PstI restriction sites. At 84,974 restriction sites, we recovered and successfully phased adjacent RAD tags (169,948 RAD tags) into single RAD loci. RAD tags with no variable sites were simply concatenated to the adjacent tag to form a single locus. We retained these 84,974 RAD loci for our analysis. For population genetic analyses, inclusion of singleton (i.e. unpaired) RAD tags did not qualitatively change our results. We chose to restrict genealogical analyses to loci of uniform length and to use the same set of loci in analyses of polymorphism and gene tree topologies.
NINESPINE STICKLEBACK GENOME ASSEMBLY
To estimate TMRCA of threespine stickleback RAD alleles, we used the ninespine stickleback (Pungitius pungitius) as an outgroup. RAD sequence analysis, however, relies on the presence of homologous restriction sites among sampled individuals and results in null alleles when mutations occur within a restriction site (Arnold et al. 2013). Because this probability increases with greater evolutionary distance among sampled sequences, we elected to use RAD‐seq only to estimate sequence variation within the threespine stickleback. We then generated a contig‐level de novo ninespine stickleback genome assembly from a single ninespine stickleback individual from St. Lawrence Island, Alaska (collected by J. Postlethwait) using DISCOVAR de novo revision 52488 (https://software.broadinstitute.org/software/discovar/blog/). We used this single ninespine stickleback haplotype to estimate threespine‐ninespine sequence divergence and time calibrate coalescence times within the threespine stickleback. DISCOVAR de novo requires a single shotgun library of paired‐end 250‐bp sequence reads from short‐insert‐length DNA fragments. High molecular weight genomic DNA was extracted from an ethanol‐preserved fin clip by proteinase K digestion followed by DNA extraction with Ampure magnetic beads. Purified genomic DNA was mechanically sheared by sonication and size selected to a range of 200–800 bp by gel electrophoresis and extraction. We selected this fragment range to agree with the recommendations for de novo assembly using DISCOVAR de novo. This library was sequenced on a single lane of an Illumina HiSeq2500 at the University of Oregon's Genomics and Cell Characterization Core Facility (GC3F: https://gc3f.uoregon.edu/). Raw sequence read pairs were first quality filtered, and adaptor sequence contamination was removed, using the program process_shortreads, which is included in the Stacks analysis pipeline. We then assembled the draft ninespine stickleback genome using DISCOVAR de novo, running the assembly on the University of Oregon's Applied Computational Instrument for Scientific Synthesis (ACISS: http://aciss-computing.uoregon.edu).
ALIGNMENT OF RAD TAGS TO THE NINESPINE ASSEMBLY
We included the single ninespine stickleback haplotype in our sequence analyses by aligning a single phased threespine stickleback RAD haplotype from each locus to the ninespine genome assembly. For those that aligned uniquely (59,254 RAD loci), we used a custom Python script to parse the alignment fields of the output BAM file (Li et al. 2009) and reconstruct the ninespine haplotype by introducing threespine‐ninespine substitutions into the threespine RAD locus sequence. The final dataset consists of 57,992 RAD loci that mapped to the 21 threespine stickleback chromosomes and aligned uniquely to the ninespine assembly.
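One way such a reconstruction could work is sketched below. It is not the custom script used in the study. It assumes that the aligner wrote the optional SAM MD tag, that the threespine haplotype aligned on the forward strand, and that the alignment contains only matches and mismatches (no indels or clipping). The function name is hypothetical.

```python
import re


def apply_md_substitutions(query_seq, md_tag):
    """Rebuild the reference (ninespine) sequence from a query and its MD tag.

    The MD tag alternates run lengths of matching bases with the reference
    base at each mismatch, for example "3A3" for a 7 bp alignment.
    Assumption: no deletions ("^") are present in the tag.
    """
    if "^" in md_tag:
        raise ValueError("deletions are not handled in this sketch")
    bases = list(query_seq)
    pos = 0
    for run, ref_base in re.findall(r"(\d+)([A-Z]?)", md_tag):
        pos += int(run)           # skip over matching bases
        if ref_base:
            bases[pos] = ref_base  # introduce the reference base
            pos += 1
    if pos != len(query_seq):
        raise ValueError("MD tag length does not match the query length")
    return "".join(bases)


# Toy example: the query carries T at the fourth site, the reference A.
ninespine = apply_md_substitutions("ACGTACG", "3A3")
```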
LINEAGE SORTING AND TIME TO THE MOST RECENT COMMON ANCESTOR
Allelic divergence can occur by multiple modes of lineage sorting during adaptation. To identify patterns of lineage sorting associated with freshwater colonization, we analyzed gene tree topologies at all RAD loci using BEAST v. 1.7 (Drummond and Rambaut 2007; Drummond et al. 2012). We chose BEAST because it coestimates tree topologies and node ages for sequenced genomic loci. BEAST does not explicitly model natural selection, and this may affect divergence time estimates in genomic regions influenced by selection. However, other methods developed to estimate the age of adaptive alleles make assumptions that are likely not relevant to the evolutionary histories we infer here. First, some models assume a recent origin of an adaptive allele compared to adjacent genomic variation (Peter et al. 2012; Ormond et al. 2016), the opposite of what we describe here, so that measures of variation at linked sites and the decay of linkage disequilibrium can be used to estimate when a sweep began. Selection in the stickleback populations we study likely acted on SGV, as has been supported by previous studies, and we hypothesize that this SGV may be quite old. Therefore, adaptive alleles already existed on distinct haplotype backgrounds, which masks the differences between selected and linked neutral sites.
Second, a recent model developed to infer ages of standing genetic variants assumes that the variant was evolving neutrally at some point during its trajectory through a population (Peter et al. 2012). This assumption is unlikely to hold for many of the variants we detect here, except in the very distant past, and those variants that evolved recently arose in genomic regions already heavily influenced by selection. Rather, the patterns of haplotype variation we observed in the genomic regions that differentiate marine and freshwater populations reflect long‐term maintenance and isolation of separate haplogroups that mimics population structure and even speciation, with selective sweeps being important but constituting a small minority of the time these haplotypes have been segregating in the stickleback metapopulation. For all of these reasons, we chose to estimate tree topologies and divergence times with BEAST, which makes minimal assumptions regarding specific evolutionary processes.
We used blanket parameters and priors for BEAST analyses across all RAD loci. Markov chain Monte Carlo (MCMC) runs of 1,000,000 states were specified, and trees logged every 100 states. We used a coalescent tree prior and the GTR+Γ substitution model with four rate categories and uniform priors for all substitution rates. We identified evidence of lineage sorting by using the program treeannotator v1.7.5 to select the maximum clade credibility (MCC) tree for each RAD locus and the is.monophyletic() function included in the R package “ape” v3.0 (Paradis et al. 2004; Popescu et al. 2012). We determined for each MCC tree whether tips originating from marine (RS) or freshwater (BL + BP) formed monophyletic clades.
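The monophyly test applied to each MCC tree can be illustrated with a small sketch. It does not reproduce the is.monophyletic() implementation in “ape”. A rooted tree is represented here as nested tuples of tip labels, which is an assumption made for brevity, and the tip names and function names are hypothetical.

```python
def collect_clades(node, found):
    """Record the set of tips below every node of a nested-tuple tree."""
    if isinstance(node, str):
        tips = frozenset([node])
    else:
        tips = frozenset().union(*(collect_clades(child, found) for child in node))
    found.add(tips)
    return tips


def is_monophyletic(tree, tips):
    """True if `tips` are exactly the tips descending from one node."""
    found = set()
    collect_clades(tree, found)
    return frozenset(tips) in found


def is_reciprocally_monophyletic(tree, marine, freshwater):
    """True if marine and freshwater tips each form their own clade."""
    return is_monophyletic(tree, marine) and is_monophyletic(tree, freshwater)


# Toy gene tree: marine (RS) and freshwater (BL + BP) haplotypes are sorted.
sorted_tree = ((("RS1", "RS2"), "RS3"), (("BL1", "BP1"), ("BL2", "BP2")))
# Toy gene tree in which one freshwater haplotype nests among marine ones.
mixed_tree = ((("RS1", "BL1"), "RS2"), ("BL2", "BP1"))
```

In the second toy tree the remaining freshwater haplotypes still cluster together, but neither habitat group as a whole forms a clade, so the locus would not be scored as sorted for either group.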
To convert node ages estimated in BEAST into divergence times, in years, we assumed a 15 million‐year divergence time between threespine and ninespine stickleback at each RAD locus (Aldenhoven et al. 2010). The TMRCA of all alleles in each gene tree was set at 15 Mya and each node age of interest was converted into years relative to the total height of the tree. Additionally, to use the ninespine stickleback as an outgroup, we required that threespine stickleback haplotypes at a RAD locus were monophyletic to the exclusion of the ninespine haplotype. Doing so reduced our analysis to 49,672 RAD loci for analyses included in Fig. 4. We used medians of the posterior distributions as point estimates of TMRCA for each RAD locus. Because of the somewhat limited information from any single RAD locus, and because the facts of the genealogical process mean that the true TMRCA at any locus likely differs from the 15 My estimate (Kingman 1982a, b; Tajima 1983), we do not rely heavily on TMRCA estimates at individual RAD loci. Rather, we use these estimates to understand broad patterns of ancestry throughout the threespine stickleback genome.
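The rescaling described above amounts to expressing each node height as a fraction of the total tree height and multiplying by the calibration age. A minimal sketch follows; the function and constant names are hypothetical, and the node heights in the example are arbitrary values in BEAST's relative units rather than results from this study.

```python
CALIBRATION_YEARS = 15_000_000  # assumed threespine-ninespine divergence


def node_age_in_years(node_height, root_height, calibration=CALIBRATION_YEARS):
    """Convert a node height to years, fixing the root at the calibration age."""
    if root_height <= 0:
        raise ValueError("root height must be positive")
    if not 0 <= node_height <= root_height:
        raise ValueError("node height must lie between 0 and the root height")
    return node_height / root_height * calibration


# A node one quarter of the way up the tree is dated to 3.75 million years.
example_age = node_age_in_years(node_height=1.0, root_height=4.0)
```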
Figure 4.

Marine‐freshwater divergence has evolved over millions of years, affecting large genomic regions. We performed Bayesian estimation of the time to the most recent common ancestor (TMRCA) of alleles at threespine stickleback RAD loci. We calibrated coalescence times within threespine stickleback by including a de novo genome assembly from the ninespine stickleback (Pungitius pungitius) and setting threespine‐ninespine divergence at 15 million years ago. (A) Maximum clade credibility RAD gene tree representative of the genome‐wide average TMRCA. Branches within threespine are colored by population of origin. (B) Kernel‐smoothed densities of TMRCA distributions for all RAD loci containing a monophyletic group of threespine stickleback alleles (light gray) and those structured into reciprocally monophyletic marine and freshwater haplogroups. (C) The genomic distribution of reciprocally monophyletic RAD loci (black, as in Fig. 2) is associated with increased TMRCA at a genomic scale. TMRCA outlier windows (those exceeding 99.9% of permuted genomic windows) are shown as gray bars. Genome‐wide TMRCA was kernel‐smoothed using a normally distributed kernel with a window size of 500 kb. Inverted triangles indicate the locations of eda and atp1a1. Three chromosomal inversions are highlighted in yellow.
We determined TMRCA outlier genomic regions by permuting and kernel smoothing the genomic distribution of TMRCA estimates using the same window sizes as we present in the main text. Windows where the smoothed TMRCA exceeded 99.9% of permuted windows were considered outliers. This method controls for the local density of RAD loci (poorly sampled regions will have larger confidence bands) and the size of the windows used.
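The permutation scheme can be sketched as follows, under one reading of the procedure: TMRCA values are shuffled among the observed locus positions, each shuffled dataset is smoothed with the same kernel, and the observed smoothed value at each position is compared with the 99.9% quantile of the permuted values at that position. The function names, the number of permutations, and the way the kernel standard deviation relates to the reported window size are all assumptions.

```python
import math
import random


def gaussian_smooth(positions, values, sigma):
    """Gaussian-weighted mean of `values`, evaluated at each locus position."""
    smoothed = []
    for center in positions:
        weights = [math.exp(-0.5 * ((p - center) / sigma) ** 2) for p in positions]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


def tmrca_outliers(positions, values, sigma, n_perm=200, quantile=0.999, seed=1):
    """Flag positions whose smoothed value exceeds the permutation quantile."""
    rng = random.Random(seed)
    observed = gaussian_smooth(positions, values, sigma)
    null = [[] for _ in positions]
    shuffled = list(values)
    for _ in range(n_perm):
        # Shuffle values among positions, keeping the locus density intact.
        rng.shuffle(shuffled)
        for i, s in enumerate(gaussian_smooth(positions, shuffled, sigma)):
            null[i].append(s)
    flags = []
    for obs, draws in zip(observed, null):
        draws.sort()
        threshold = draws[min(len(draws) - 1, int(quantile * len(draws)))]
        flags.append(obs > threshold)
    return flags
```

Because positions are held fixed while values are shuffled, sparsely sampled regions average over fewer loci in every permutation and therefore receive wider null distributions, which is the density control described above.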
SEQUENCE DIVERSITY AND HAPLOTYPE NETWORKS
We quantified sequence diversity within and among populations and sequence divergence between populations using R v3 (R Core Team 2016). We used the R package “ape” (Paradis et al. 2004) to compute pairwise distance matrices for all alleles at each RAD locus and used these matrices to calculate the average pairwise nucleotide distances, π, within and among populations along with dXY, the average pairwise distance between two sequences using only across‐population comparisons (Nei 1987). We also calculated the haplotype‐based FST from Hudson et al. (1992) implemented in the R package “PopGenome” v2.2.4 (Pfeifer et al. 2014). We used permutation tests written in R to identify differences in variation within‐ and between‐habitat type at divergent RAD loci versus the genome‐wide distributions. Mann–Whitney–Wilcoxon tests implemented in R were used to identify variation in genome‐wide diversity among populations and habitat types.
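For readers who want to see how these statistics relate to one another, the sketch below computes π, dXY, and a Hudson et al. (1992) style FST from aligned haplotypes. It is not the R code used in the study. It assumes uncorrected proportions of differing sites as the distance measure and an unweighted average of the two within‐population diversities, either of which may differ from the settings used in “ape” and “PopGenome”. All function names are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """pi: mean pairwise distance among sequences from one population."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(pop_x, pop_y):
    """dXY: mean pairwise distance using only across-population pairs."""
    pairs = list(product(pop_x, pop_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop_x, pop_y):
    """FST = 1 - Hw / Hb, with Hw the mean within-population diversity."""
    h_within = (nucleotide_diversity(pop_x) + nucleotide_diversity(pop_y)) / 2
    h_between = dxy(pop_x, pop_y)
    if h_between == 0:
        return float("nan")  # undefined when no across-population differences
    return 1 - h_within / h_between


# Toy 4 bp haplotypes: two marine and two freshwater sequences.
marine = ["AAAA", "AAAT"]
freshwater = ["TTAA", "TTAA"]
```

With these toy sequences, π is 0.25 in the marine sample and 0 in the freshwater sample, dXY is 0.625, and FST is 0.8.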
We constructed haplotype networks of the RAD loci at eda and atp1a1 using the infinite sites model with the function haploNet() in the R package “pegas” (Paradis 2010). The atp1a1 network was constructed from a RAD locus spanning exon 15 of atp1a1 and including portions of introns 14 and 15 (chr1: 21,726,729–21,727,381 [BROAD S1, v89]; chr1: 26,258,117–26,257,465 [rescaffolding from Glazer et al. (2015)]). The eda network spans exon 2 and portions of introns 1 and 3 of eda (chr4: 12,808,396–12,809,030).
CODE AVAILABILITY
Scripts used to phase RAD‐tags, summarize gene trees, calculate population genetic statistics, and produce figures and statistics presented in the article are available at https://github.com/thomnelson/ancient-divergence. Scripts for processing raw sequence data are available from the authors upon request.
DATA AVAILABILITY
Raw sequence data supporting these findings are available on NCBI at PRJNA429207. The Pungitius pungitius draft assembly is available at PRJNA429208. The final datasets needed to reproduce the figures and statistics presented in the article are available at https://github.com/thomnelson/ancient-divergence.
Results and Discussion
Parallel adaptation to freshwater environments has been a major theme of stickleback evolutionary history (Bell and Foster 1994a). Stereotypical morphological changes to, for example, bony armor (Colosimo et al. 2004) and craniofacial structures (Kimmel et al. 2005) presumably reflect adaptation to similar selective regimes (Reimchen 1994; Arnegard et al. 2014). These phenotypic changes are accompanied by parallel genomic divergence (Hohenlohe et al. 2010; Jones et al. 2012), which involves large regions spanning many megabases (Schluter and Conte 2009; Roesti et al. 2014), including multiple chromosomal inversions (Jones et al. 2012). The leading hypothesis for the genetics of parallel divergence in stickleback posits that distinct freshwater‐adaptive haplotypes that are identical‐by‐descent (IBD) are shared among freshwater populations due to historical gene flow between marine and freshwater populations (Schluter and Conte 2009). We tested for the presence of these haplotypes directly and at a genomic scale.
PARALLEL DIVERGENCE INVOLVES A SHARED, GENOME‐WIDE SUITE OF HAPLOTYPES
Our sequencing strategy produced 57,992 RAD loci, each with 690 potential variable sites, present across the three threespine stickleback populations and aligned to the ninespine stickleback genome assembly. These data comprise over 40 Mb of sequence, or nearly 10% of the threespine stickleback genome (9.5% of 419 Mb assigned to chromosomes) (Jones et al. 2012; Glazer et al. 2015). All loci we recovered were polymorphic and we observed a median of seven segregating sites per locus (range: 2–155, Fig. S3, Table S1). By including haplotypes from all three populations in these genealogical analyses, we were able to jointly calculate population genetic statistics (FST, π, dXY) and identify patterns of identity‐by‐descent (IBD) among populations, which we defined as haplotypes from two populations forming a monophyletic group to the exclusion of the third population.
We find that parallel population genomic divergence in the two freshwater pond populations consistently involved haplotypes that were identical‐by‐descent (IBD) between the two freshwater populations (Fig. 2). Background FST between populations was moderate, ranging from 0.139 to 0.226 (FST(RS‐BL) = 0.139, FST(RS‐BP) = 0.194, FST(BL‐BP) = 0.226; two‐sided Mann–Whitney test for all pairwise comparisons: P ≤ 1 × 10⁻¹⁰). Genome‐wide differentiation was highest between the freshwater populations, supporting the hypothesis of independent colonization and in agreement with previous morphological (Francis et al. 1986) and genetic data (Hohenlohe et al. 2010). The genomic distributions of marine‐freshwater FST were similar to those previously reported (Hohenlohe et al. 2010), including outlier regions over a broad span of chromosome 4 in which the eda gene is embedded (orange triangle in Fig. 2A), and three regions now known to be associated with chromosomal inversions on chromosomes 1, 11, and 21 (Jones et al. 2012). The gene atp1a1 (green triangle in Fig. 2A) is contained within inv1. As expected, we found distinct haplogroups associated with marine and freshwater habitats at both eda and atp1a1 (Fig. 3, insets).
Figure 3. Extensive sequence divergence between marine and freshwater haplogroups accompanies reciprocal monophyly. For each reciprocally monophyletic RAD locus, we calculated sequence variation (π) within and sequence divergence between habitat types (dXY). Each RAD locus is shown as a pair of lines connecting estimates of π and dXY. Boxplots show distributions across all reciprocally monophyletic RAD loci: Boxes are upper and lower quartiles, including the median; whiskers extend to 1.5 × interquartile range. Dashed lines are the genome‐wide medians. Single RAD loci from within the transcribed regions of eda and atp1a1 are shown as gold and green lines, respectively, and presented as haplotype networks. Dots represent mutational steps. Circle sizes indicate the number of haplotypes and colors indicate population of origin as in Figure 1. Each network = 28 haplotypes.
Strikingly, this finding of habitat‐specific haplogroups was not unique to these well‐studied genes or chromosomal inversions. The two isolated freshwater populations shared IBD haplotypes within all common marine‐freshwater FST peaks even though IBD was rare elsewhere (Fig. 2B). Furthermore, we observed a separate clade of haplotypes representing the marine RS population at the majority (1129 of 2172, 52%) of RAD loci showing freshwater IBD. The result was a genome‐wide pattern of reciprocal monophyly between marine and freshwater haplotypes. Notably, this is the same genealogical structure previously reported at eda (Colosimo et al. 2005; Roesti et al. 2014) and atp1a1 (Roesti et al. 2014). These loci are therefore but a small part of a genome‐wide suite of genetic variation sharing similar habitat‐specific evolutionary histories, and the previous documentation of their genealogies was a harbinger of the much more extensive pattern revealed here. Hereafter, we refer collectively to this class of RAD loci as “divergent loci.”
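The two genealogical classes used here, freshwater IBD and marine‐freshwater reciprocal monophyly, can be illustrated with a short Python sketch. It is a simplified stand‐in for the authors' gene tree summaries, under stated assumptions: each gene tree is already rooted and is represented as nested tuples, tip labels take the hypothetical form "POP_n", and RS is treated as the marine population with BL and BP as the freshwater populations, following the population labels in the text. Function names are hypothetical.

```python
def leaves(node):
    """Set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    out = set()
    for child in node:
        out |= leaves(child)
    return out

def clades(node):
    """Yield the tip set of every subtree, including single tips."""
    yield frozenset(leaves(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if `group` is non-empty and exactly matches one clade of the tree."""
    target = frozenset(group)
    return bool(target) and any(c == target for c in clades(tree))

def classify_locus(tree, marine=("RS",), freshwater=("BL", "BP")):
    """Return (freshwater_ibd, reciprocal_monophyly) for one rooted gene tree."""
    tips = leaves(tree)
    fw = {t for t in tips if t.split("_")[0] in freshwater}
    mar = {t for t in tips if t.split("_")[0] in marine}
    fw_ibd = is_monophyletic(tree, fw)
    reciprocal = fw_ibd and is_monophyletic(tree, mar)
    return fw_ibd, reciprocal

# Hypothetical gene tree with a ninespine (NS) outgroup tip.
tree = ("NS_1", (("RS_1", "RS_2"), (("BL_1", "BP_1"), ("BL_2", "BP_2"))))
print(classify_locus(tree))
```

A locus counts as divergent in this sketch only when both checks pass, mirroring the 1129 of 2172 freshwater‐IBD loci at which the marine haplotypes also formed their own clade.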
ADAPTIVE MARINE‐FRESHWATER SEQUENCE DIVERGENCE INVOLVES ANCIENT ALLELIC ORIGINS
Because the genealogical structure of divergence across the genome mirrors that at eda and atp1a1, we asked whether levels of sequence variation and divergence also showed consistent genomic patterns. At all RAD loci we therefore calculated π within each population, as well as in the combined freshwater populations, and dXY between marine and freshwater habitat types. Genome‐wide diversity was similar across populations and habitat types (mean π RS = 0.0032, π BL = 0.0034, π BP = 0.0026, π FW = 0.0038) and comparable to previous estimates (Hohenlohe et al. 2010). Likewise, genome‐wide dXY among habitat types was modest (0.0049) when compared to π across all populations (π = 0.0042, two‐sided Mann–Whitney test: P ≤ 1 × 10⁻¹⁰; Fig. S4). Among divergent loci, however, we observed reductions in diversity in both habitats (mean π RS‐divergent = 0.0012, π FW‐divergent = 0.0016, two‐sided permutation test: P ≤ 1 × 10⁻⁴, Fig. 3), indicating natural selection in both habitats. Sequence divergence associated with reciprocal monophyly was striking, averaging nearly three times the genome‐wide mean (mean dXY ‐divergent = 0.0124). This divergence ranged over more than an order of magnitude (0.0013–0.0442), from substantially lower than the genome‐wide average to ten times greater than the average. These findings indicate that much of the genetic variation underlying adaptive divergence is vastly older than the diverging freshwater populations themselves. Not only was adaptive variation standing and structured by habitat, but it has been segregating and accumulating for millennia.
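The comparison of divergent loci against the genome‐wide distribution can be sketched as a two‐sided permutation test. The Python below is an illustration, not the authors' R script, and rests on assumptions: the test statistic is the difference between the mean of the focal loci and the genome‐wide mean, the null is built by drawing random locus sets of the same size without replacement, and the P value uses the common (hits + 1)/(permutations + 1) convention. The function name and the number of permutations are hypothetical.

```python
import random

def permutation_test(genome_values, focal_indices, n_perm=10_000, seed=1):
    """Two-sided test of whether focal loci differ in mean from random locus sets."""
    rng = random.Random(seed)
    n = len(focal_indices)
    genome_mean = sum(genome_values) / len(genome_values)
    focal_mean = sum(genome_values[i] for i in focal_indices) / n
    observed = abs(focal_mean - genome_mean)
    hits = 0
    for _ in range(n_perm):
        # Random set of loci of the same size as the focal class.
        sample = rng.sample(genome_values, n)
        if abs(sum(sample) / n - genome_mean) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return focal_mean, p_value
```

With this convention the smallest reportable P value is set by the number of permutations, so a bound such as P ≤ 1 × 10⁻⁴ says as much about how many permutations were run as about effect size.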
These data clearly support the hypothesis of Schluter and Conte (2009) of ancient haplotypes “transported” among freshwater populations. Much of the divergence we observed was ancient in origin, with levels of sequence divergence at some RAD loci exceeding that observed at eda (Fig. 3, gold line) and suggestive of divergence times of at least two million years ago (Colosimo et al. 2005). Our observation that sequence variation was consistently reduced in both habitat types emphasizes that alternative haplotypes at these loci are likely selected for in the marine population as well as the freshwater. These alternative fitness optima—driven by divergent ecologies—provide a favorable landscape for the maintenance of variation (Charlesworth et al. 1997; Lenormand 2002), but also lead to a more potent barrier to gene flow among freshwater populations if there are fitness consequences in the marine habitat for stickleback carrying freshwater‐adaptive variation. Conditional fitness effects through genetic interactions (i.e., dominance or epistasis: Otto and Bourguet 1999; Phillips 2008) and genotype‐by‐environment interactions (McGuigan et al. 2011) could potentially extend the residence time of freshwater haplotypes in the marine habitat. Future work should consider the phenotypic effects of divergently adaptive variation in different external environments (McGuigan et al. 2011; McCairns and Bernatchez 2012).
Adaptive divergence between marine and freshwater stickleback genomes is likely ongoing, with recently derived alleles arising on already highly divergent genomic backgrounds. We found reciprocal monophyly associated with a spectrum of sequence divergence, including a substantial fraction of divergent loci (11.0%, 124/1129) with dXY below the genome‐wide average. Thus, ongoing marine‐freshwater ecological divergence may continue to yield additional marine‐freshwater genomic divergence. Moreover, while this younger variation is shared between the freshwater populations in this study, and localizes to genomic regions of divergence shared globally (Jones et al. 2012), some adaptive variants may be distributed only locally (e.g., limited to southern Alaska or the eastern Pacific basin). Global surveys of shared variation have been performed (Jones et al. 2012), but future work in this system should quantify the distributions of locally or regionally limited genomic variation involved in ecological divergence, because regional pools of variation may contribute substantially to stickleback genomic and phenotypic diversity (Stuart et al. 2017).
HABITAT‐ASSOCIATED GENOMIC DIVERGENCE IS AS OLD AS THE THREESPINE STICKLEBACK SPECIES
Sequence divergence provides an important relative, but ultimately incomplete, evolutionary timescale. To more directly compare the timescales of ecological adaptation and genomic evolution, we translated patterns of sequence variation into the time to the most recent common ancestor (TMRCA) of allelic variation, in years. We find that the divergence of marine and freshwater haplotypes has been ongoing for millions of years and extends back to the split with the ninespine stickleback lineage (Fig. 4B). Genome‐wide variation averaged 4.1 MY old, and TMRCA for the vast majority of RAD loci was under 5 MY old. In contrast, divergence times at habitat‐associated loci averaged 6.4 MYA and, amazingly, the most ancient 10% (118 of 1129) are each estimated at over 10 MY old.
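One simple way to see how sequence divergence maps onto years is a strict‐clock rescaling against the outgroup. The Python sketch below is an assumption‐laden illustration and not necessarily how the TMRCA estimates reported here were obtained: it assumes a constant substitution rate, uses the 15‐million‐year threespine‐ninespine split quoted in the abstract as the calibration, and takes as input two hypothetical distances, one across the deepest split among threespine haplotypes at a locus and one between threespine and ninespine haplotypes at the same locus.

```python
SPLIT_AGE_YEARS = 15e6  # threespine-ninespine split age quoted in the abstract

def tmrca_years(d_within, d_outgroup, split_age=SPLIT_AGE_YEARS):
    """Rescale the depth of the threespine genealogy to years.

    d_within:   mean distance across the deepest split among threespine haplotypes
    d_outgroup: mean distance between threespine and ninespine haplotypes
    Under a strict clock both distances grow linearly with time, so their
    ratio is the age of the threespine split as a fraction of `split_age`.
    """
    if d_outgroup <= 0 or d_within < 0:
        raise ValueError("distances must be non-negative, with d_outgroup > 0")
    return split_age * d_within / d_outgroup

# Hypothetical distances, for illustration only (not values from this study).
print(tmrca_years(d_within=0.01, d_outgroup=0.03))
```

Under this rescaling a locus at which marine and freshwater haplotypes are as divergent from each other as from the ninespine sequence would date to the species split itself.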
This deep genomic divergence not only underscores that local adaptation to marine and freshwater habitats has been occurring throughout the history of the threespine stickleback lineage, for which there is evidence in the fossil record going back 10 million years (Bell et al. 1985), but it also demonstrates that at least some of the variation fueling those ancient events has persisted until the present day. In fact, the most highly divergent regions of the threespine stickleback genome also had the lowest rates of monophyly of threespine stickleback haplotypes (Fig. S6). This is consistent with marine‐freshwater allelic divergence occurring near to or before the divergence of the threespine and ninespine lineages. It is tempting to think that marine‐freshwater ecological divergence, which has evolved convergently in both species and involves similar phenotypes, may involve alleles that are IBD and arose before the split of these species. However, QTL analyses of ecologically relevant armor traits point either to independent genetic architectures in the two species (Shapiro et al. 2009) or to a locus associated with repeated mutational events among threespine stickleback populations (Chan et al. 2010; Shikano et al. 2013). Nevertheless, in some genomic regions marine and freshwater threespine stickleback are as divergent as threespine and ninespine stickleback, which are classified into separate genera.
LONG‐TERM DIVERGENCE MAINTAINS LINKED VARIATION AND PROMOTES GENOMIC STRUCTURAL EVOLUTION
Adaptive divergence has impacted the history of the stickleback genome as a whole (Fig. 4C). We identified 32.6 Mb, or 7.5% of the genome, as having elevated TMRCA (gray boxes in Fig. 4C; two‐sided permutation test of smoothed genomic intervals, P ≤ 0.001). Outside of the nonrecombining portion of the sex chromosome (chr. 19), the oldest regions of the stickleback genome were those enriched for divergent loci. Patterns of ancient ancestry closely mirrored recent divergence in allele frequencies (Fig. 2A) and it appears that historical and contemporary marine‐freshwater divergence has impacted ancestry across much of the length of some chromosomes. Chromosome 4, for example, contains at least three broad peaks in TMRCA and a total of 5.9 Mb identified as genome‐wide outliers (two‐sided permutation test, P ≤ 0.001). This chromosome has been of particular interest because of its association with a number of phenotypes (Colosimo et al. 2004; Miller et al. 2014), including fitness (Barrett et al. 2008). We found the major‐effect armor plate locus eda comprised a local peak (mean TMRCA = 6.4 MYA) nested within a large region of deep ancestry spanning 8.1 Mb. Moreover, at least two other peaks distal to eda, centered at 21.4 Mb and 26.6 Mb, were also several million years older than the genomic average at 6.8 MYA and 7.0 MYA, respectively.
Intriguingly, genomic regions of elevated TMRCA remained outliers even after removing marine‐freshwater relative divergence outlier loci (as measured by FST: Fig. S5). We estimated that 7.5% of the genome had increased TMRCA even though only 1.9% of RAD loci (1129 of 57,992) were classified as divergent based on marine‐freshwater reciprocal monophyly. When we removed these loci, along with loci with elevated marine‐freshwater differentiation (FST > 0.5), many of the regions in which they resided were still TMRCA outliers. It is possible that the remainder of this old variation is neutral with respect to fitness. However, we identified divergence outliers based on only a single axis of divergence: the marine‐freshwater axis. Throughout the entire species range, populations are locally experiencing multiple axes of divergence, including lake‐stream and benthic‐limnetic axes (McKinnon and Rundle 2002), that often share a common genomic architecture (Deagle et al. 2012; Roesti et al. 2015). Our data may indicate underlying similarities in selection regimes. Alternatively, this colocalized ancient variation may represent the accumulation of adaptive divergence along multiple axes in the same genomic regions, whether or not the underlying adaptive variants are the same. Aspects of the genomic architecture, such as gene density, local recombination rates, and segregating structural variants, may in part govern where in the genome adaptive divergence can occur (Feulner et al. 2013; Roesti et al. 2013; Aeschbacher et al. 2017; Samuk et al. 2017). Multiple axes of divergence may therefore act synergistically to maintain genomic variation across the stickleback metapopulation.
Nevertheless, much of the ancient variation we observe may in fact itself be neutral, having been maintained by close linkage to loci under divergent selection between the marine and freshwater habitats (Barton and Bengtsson 1986; Charlesworth et al. 1997). Indeed, the broadest peaks of TMRCA we observe occur in genomic regions with low rates of recombination in other stickleback populations (Roesti et al. 2013; Glazer et al. 2015), which would extend the size of the linked region affected by divergent selection. On ecological timescales, low recombination rates in stickleback are thought to promote divergence by making locally adapted genomic regions resistant to gene flow (Roesti et al. 2013). Our results potentially extend the inferred impact of recombination rate variation on genomic variation to timescales that are 1000‐fold longer, maintaining both multimillion‐year‐old adaptive variation and large stores of linked genetic variation. Future modeling efforts will be needed to explore the range of population genetic parameter values (e.g., selection coefficients, migration rates, and recombination rates) required to produce the extent of divergence we see here.
Lastly, our findings further support the hypothesis that known chromosomal inversions maintain globally distributed, multilocus haplotypes. The three chromosomal inversions known to be associated with marine‐freshwater divergence (Jones et al. 2012; Roesti et al. 2015) (inv1, inv11, and inv21; yellow bars in Fig. 4C) all showed sharp spikes in TMRCA. Genomic signatures of these inversions are distributed throughout the species range, including coastal marine‐freshwater population pairs in the Pacific and Atlantic basins (Jones et al. 2012) and inland lake‐stream pairs in Switzerland (Roesti et al. 2015). Despite our limited geographic sampling, our finding that all three of these inversions are over six million years old is further evidence of single, ancient origins of each, followed by their spread across the species range. Each inversion contained a high density of divergent RAD loci (inv1: 64% of loci divergent; inv11: 60%; inv21: 71%) but we also identified regions within these inversions in which haplotypes from marine or freshwater habitats, or both, were not monophyletic. inv1 and inv11 both contained two regions separated by loci in which neither habitat type was monophyletic; inv21, the largest of the three, contained ten such regions. Additionally, TMRCA and FST decreased sharply to background levels outside of the inversions, demonstrating the potential for gene flow and recombination to homogenize variation in these regions. We interpret this as evidence that these inversions help maintain linkage disequilibrium among multiple divergently adaptive variants in regions susceptible to homogenization (Kirkpatrick and Barton 2006; Guerrero et al. 2012). The presence of these inversions in addition to divergence in regions of generally low recombination (Glazer et al. 2015), therefore, further supports the hypothesis that the recombinational landscape can influence where in the genome adaptive divergence can occur (Roesti et al. 2013; Samuk et al. 2017) and emphasizes the degree to which gene flow among divergently adapted stickleback populations has impacted global genomic diversity.
Conclusions
Selection operating on two very different timescales (the ecological and the geological) has shaped genomic patterns of SGV in the threespine stickleback. On ecological timescales, selection drives phenotypic divergence in decades or millennia by sorting SGV across geography and throughout the genome (Hendry et al. 2002; Hohenlohe et al. 2010; Lescak et al. 2015; Roesti et al. 2015). Our findings show that persistent ecological diversity and continual local adaptation of stickleback have set the stage for long‐term divergent selection and for the accumulation and maintenance of adaptive variation over geological timescales. Some of the genetic variants fueling contemporary, rapid adaptation may even have been present, and under selection, since before the threespine‐ninespine stickleback lineages split. The genomic architecture of ecological adaptation in one population is therefore the product of millions of years of evolution taking place across multiple populations, many of which are now extinct. These findings underscore the need to understand macroevolutionary histories of genetic variation when studying microevolutionary processes, and vice versa.
Associate Editor: J. Slate
Supporting information
Figure S1. Longer sequences reduce variance in estimates of sequence diversity and divergence.
Figure S2. Accurate phasing of RAD loci even at low population‐level sampling.
Figure S3. RAD‐seq effectively samples genome‐wide sequence diversity.
Figure S4. Relative (FST) and absolute (dXY) sequence divergence are positively correlated genome‐wide in two instances of marine‐freshwater divergence.
Figure S5. TMRCA outlier regions remain outliers after removing highly differentiated RAD loci.
Figure S6. Marine‐freshwater divergence in threespine sticklebacks is associated with reduced signal of monophyly of threespine haplotypes.
Table S1. Sequencing summary for threespine stickleback samples.
Table S2. Genome assembly statistics for Pungitius pungitius.
AUTHOR CONTRIBUTIONS
T.C.N. and W.A.C. conceived of the project and designed sampling, sequencing, and analysis. T.C.N. prepared sequencing libraries, wrote software, and performed data analysis. T.C.N. and W.A.C. wrote the article.
ACKNOWLEDGMENTS
We thank P. Phillips, M. Streisfeld, J. Postlethwait, and K. Sterner for valuable input and lively discussion throughout this project. We also thank K. Alligood, E. Beck, S. Bassham, M. Chase, M. Currey, M. Hahn, L. Fishman, P. Ralph, C. Small, S. Stankowski, J. Willis, five anonymous reviewers, and members of the Cresko Lab and the Institute of Ecology and Evolution for advice and comments on previous versions of this manuscript. J. Postlethwait graciously donated ninespine stickleback tissue, collected under awards NIH R01 OD011116 and NIEHS R01 ES019620. We acknowledge National Science Foundation awards NSF DEB 1501423 (W.A.C. and T.C.N.), NSF DEB 0919090 (W.A.C.), and National Institutes of Health award NIH T32GM007413 (T.C.N.).
CONFLICTS OF INTEREST
The authors declare no conflicts of interest.
LITERATURE CITED
- Aeschbacher, S. , Selby J. P., Willis J. H., and Coop G.. 2017. Population‐genomic inference of the strength and timing of selection against gene flow. Proc. Natl. Acad. Sci. USA 114:7061–7066. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aldenhoven, J. T. , Miller M. A., Corneli P. S., and Shapiro M. D.. 2010. Phylogeography of ninespine sticklebacks (Pungitius pungitius) in North America: glacial refugia and the origins of adaptive traits. Mol. Ecol. 19:4061–4076. [DOI] [PubMed] [Google Scholar]
- Arnegard, M. E. , McGee M. D., Matthews B., Marchinko K. B., Conte G. L., Kabir S., Bedford N., Bergek S., Chan Y. F., Jones F. C., et al. 2014. Genetics of ecological divergence during speciation. Nature 511:307–311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Arnold, B. , Corbett‐Detig R. B., Hartl D., and Bomblies K.. 2013. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol. Ecol. 22:3179–3190. [DOI] [PubMed] [Google Scholar]
- Aronesty, E. 2011. ea‐utils: command‐line tools for processing biological sequencing data. Expression Analysis, Durham, NC.
- Baird, N. A. , Etter P. D., Atwood T. S., Currey M. C., Shiver A. L., Lewis Z. A., Selker E. U., Cresko W. A., and Johnson E. A.. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. Plos One 3:e3376. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barrett, R. D. , Rogers S. M., and Schluter D.. 2008. Natural selection on a major armor gene in threespine stickleback. Science 322:255–257. [DOI] [PubMed] [Google Scholar]
- Barrett, R. D. H. , and Schluter D.. 2008. Adaptation from standing genetic variation. Trends Ecol. Evol. 23:38–44. [DOI] [PubMed] [Google Scholar]
- Barton, N. , and Bengtsson B. O.. 1986. The barrier to genetic exchange between hybridizing populations. Heredity 57:357–376. [DOI] [PubMed] [Google Scholar]
- Bell, M. A. , Baumgartner J. V., and Olson E. C.. 1985. Patterns of temporal change in single morphological characters of a miocene stickleback fish. Paleobiology 11:258–271. [Google Scholar]
- Bell, M. A. , and Foster S. A.. 1994a. Evolutionary inference: the value of viewing evolution through stickleback‐tinted glasses Pp. 472–486 in Bell M. A. and Foster S. A., eds. The evolutionary biology of the threespine stickleback. Oxford Univ. Press, New York. [Google Scholar]
- Bell, M. A. , and Foster S. A.. 1994b. Introduction to the evolutionary biology of the threespine stickleback Pp. 1–27 in Bell M. A. and Foster S. A., eds. The evolutionary biology of the threespine stickleback. Oxford Univ. Press, New York. [Google Scholar]
- Catchen, J. , Hohenlohe P. A., Bassham S., Amores A., and Cresko W. A.. 2013. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22:3124–3140. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Catchen, J. M. , Amores A., Hohenlohe P., Cresko W., and Postlethwait J. H.. 2011. Stacks: building and genotyping loci de novo from short‐read sequences. G3—Genes Genom Genet. 1:171–182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chan, Y. F. , Marks M. E., Jones F. C., G. Villarreal, Jr. , Shapiro M. D., Brady S. D., Southwick A. M., Absher D. M., Grimwood J., Schmutz J., et al. 2010. Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science 327:302–305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth, B. , Morgan M. T., and Charlesworth D.. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Charlesworth, B. , Nordborg M., and Charlesworth D.. 1997. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70:155–174. [DOI] [PubMed] [Google Scholar]
- Colosimo, P. F. , Hosemann K. E., Balabhadra S., G. Villarreal, Jr. , Dickson M., Grimwood J., Schmutz J., Myers R. M., Schluter D., and Kingsley D. M.. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of Ectodysplasin alleles. Science 307:1928–1933. [DOI] [PubMed] [Google Scholar]
- Colosimo, P. F. , Peichel C. L., Nereng K., Blackman B. K., Shapiro M. D., Schluter D., and Kingsley D. M.. 2004. The genetic architecture of parallel armor plate reduction in threespine sticklebacks. PLoS Biol. 2:635–641. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cresko, W. A. , Amores A., Wilson C., Murphy J., Currey M., Phillips P., Bell M. A., Kimmel C. B., and Postlethwait J. H.. 2004. Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations. Proc. Natl. Acad. Sci. USA 101:6050–6055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cruickshank, T. E. , and Hahn M. W.. 2014. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23:3133–3157. [DOI] [PubMed] [Google Scholar]
- Davey, J. W. , Hohenlohe P. A., Etter P. D., Boone J. Q., Catchen J. M., and Blaxter M. L.. 2011. Genome‐wide genetic marker discovery and genotyping using next‐generation sequencing. Nat. Rev. Genet. 12:499–510. [DOI] [PubMed] [Google Scholar]
- Deagle, B. E. , Jones F. C., Chan Y. F., Absher D. M., Kingsley D. M., and Reimchen T. E.. 2012. Population genomics of parallel phenotypic evolution in stickleback across stream‐lake ecological transitions. Proc. Roy Soc. B Biol. Sci. 279:1277–1286. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Domingues, V. S. , Poh Y. P., Peterson B. K., Pennings P. S., Jensen J. D., and Hoekstra H. E.. 2012. Evidence of adaptation from ancestral variation in young populations of beach mice. Evolution 66:3209–3223. [DOI] [PubMed] [Google Scholar]
- Drummond, A. J. , and Rambaut A.. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. Bmc. Evol. Biol. 7:214. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Drummond, A. J. , Suchard M. A., Xie D., and Rambaut A.. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29:1969–1973. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feulner, P. G. D. , Chain F. J. J., Panchal M., Eizaguirre C., Kalbe M., Lenz T. L., Mundry M., Samonte I. E., Stoll M., Milinski M., et al. 2013. Genome‐wide patterns of standing genetic variation in a marine population of three‐spined sticklebacks. Mol. Ecol. 22:635–649. [DOI] [PubMed] [Google Scholar]
- Fontaine, M. C. , Pease J. B., Steele A., Waterhouse R. M., Neafsey D. E., Sharakhov I. V., Jiang X., Hall A. B., Catteruccia F., Kakani E., et al. 2015. Mosquito genomics. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347:1258524. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Francis, R. C. , Baumgartner J. V., Havens A. C., and Bell M. A.. 1986. Historical and ecological sources of variation among lake populations of threespine sticklebacks, Gasterosteus aculeatus, near Cook Inlet, Alaska. Can. J. Zool. 64:2257–2265. [Google Scholar]
- Gautier, M. , Gharbi K., Cezard T., Foucaud J., Kerdelhue C., Pudlo P., Cornuet J. M., and Estoup A.. 2013. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Mol. Ecol. 22:3165–3178. [DOI] [PubMed] [Google Scholar]
- Glazer, A. M. , Killingbeck E. E., Mitros T., Rokhsar D. S., and Miller C. T.. 2015. Genome assembly improvement and mapping convergently evolved skeletal traits in sticklebacks with genotyping‐by‐sequencing. G3 Genes Genom Genet. 5:1463–1472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grant, P. R. , and Grant B. R.. 2002. Unpredictable evolution in a 30‐year study of Darwin's finches. Science 296:707–711. [DOI] [PubMed] [Google Scholar]
- Guerrero, R. F. , Rousset F., and Kirkpatrick M.. 2012. Coalescent patterns for chromosomal inversions in divergent populations. Phil. Trans. Roy Soc. B Biol. Sci. 367:430–438. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hendry, A. P. , Taylor E. B., and McPhail J. D.. 2002. Adaptive divergence and the balance between selection and gene flow: lake and stream stickleback in the Misty system. Evolution 56:1199–1216. [DOI] [PubMed] [Google Scholar]
- Hohenlohe, P. A. , Bassham S., Etter P. D., Stiffler N., Johnson E. A., and Cresko W. A.. 2010. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. Plos Genet. 6:e1000862. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hudson, R. R. , Slatkin M., and Maddison W. P.. 1992. Estimation of levels of gene flow from DNA‐sequence data. Genetics 132:583–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huerta‐Sánchez, E. , Jin X., Asan, Bianba Z., Peter B. M., Vinckenbosch N., Liang Y., Yi X., He M., Somel M., et al. 2014. Altitude adaptation in Tibetans caused by introgression of Denisovan‐like DNA. Nature 512:194–197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jones, F. C. , Grabherr M. G., Chan Y. F., Russell P., Mauceli E., Johnson J., Swofford R., Pirun M., Zody M. C., White S., et al. 2012. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484:55–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kimmel, C. B. , Ullmann B., Walker C., Wilson C., Currey M., Phillips P. C., Bell M. A., Postlethwait J. H., and Cresko W. A.. 2005. Evolution and development of facial bone morphology in threespine sticklebacks. Proc. Natl. Acad. Sci. USA 102:5791–5796. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kingman, J. F. C. 1982a. The coalescent. Stochastic Processes Their Appl. 13:235–248. [Google Scholar]
- Kingman, J. F. C. 1982b. On the genealogy of large populations. J. Appl. Prob. 19:27–43. [Google Scholar]
- Kirkpatrick, M. , and Barton N.. 2006. Chromosome inversions, local adaptation and speciation. Genetics 173:419–434. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lenormand, T. 2002. Gene flow and the limits to natural selection. Trends Ecol. Evol. 17:183–189. [Google Scholar]
- Lescak, E. A. , Bassham S. L., Catchen J., Gelmond O., Sherbick M. L., von Hippel F. A., and Cresko W. A.. 2015. Evolution of stickleback in 50 years on earthquake‐uplifted islands. Proc. Natl. Acad. Sci. USA 112:E7204–E7212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li, H. , Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G. R., Durbin R., and Subgroup G. P. D. P.. 2009. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25:2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Linnen, C. R., Kingsley E. P., Jensen J. D., and Hoekstra H. E. 2009. On the origin and spread of an adaptive allele in deer mice. Science 325:1095–1098.
- McCairns, R. J. S., and Bernatchez L. 2012. Plasticity and heritability of morphological variation within and between parapatric stickleback demes. J. Evol. Biol. 25:1097–1112.
- McGuigan, K., Nishimura N., Currey M., Hurwit D., and Cresko W. A. 2011. Cryptic genetic variation and body size evolution in threespine stickleback. Evolution 65:1203–1211.
- McKinnon, J. S., and Rundle H. D. 2002. Speciation in nature: the threespine stickleback model systems. Trends Ecol. Evol. 17:480–488.
- Miller, C. T., Glazer A. M., Summers B. R., Blackman B. K., Norman A. R., Shapiro M. D., Cole B. L., Peichel C. L., Schluter D., and Kingsley D. M. 2014. Modular skeletal evolution in sticklebacks is controlled by additive and clustered quantitative trait loci. Genetics 197:405–420.
- Nei, M. 1987. Molecular evolutionary genetics. Columbia Univ. Press, New York.
- Ormond, L., Foll M., Ewing G. B., Pfeifer S. P., and Jensen J. D. 2016. Inferring the age of a fixed beneficial allele. Mol. Ecol. 25:157–169.
- Orr, H. A. 2005. The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 6:119–127.
- Otto, S. P., and Bourguet D. 1999. Balanced polymorphisms and the evolution of dominance. Am. Nat. 153:561–574.
- Paradis, E. 2010. pegas: an R package for population genetics with an integrated‐modular approach. Bioinformatics 26:419–420.
- Paradis, E., Claude J., and Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290.
- Pease, J. B., Haak D. C., Hahn M. W., and Moyle L. C. 2016. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biol. 14:e1002379.
- Peter, B. M., Huerta‐Sanchez E., and Nielsen R. 2012. Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet. 8:e1003011.
- Peterson, B. K., Weber J. N., Kay E. H., Fisher H. S., and Hoekstra H. E. 2012. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non‐model species. PLoS One 7:e37135.
- Pfeifer, B., Wittelsburger U., Ramos‐Onsins S. E., and Lercher M. J. 2014. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31:1929–1936.
- Phillips, P. C. 2008. Epistasis: the essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet. 9:855–867.
- Popescu, A. A., Huber K. T., and Paradis E. 2012. ape 3.0: new tools for distance‐based phylogenetics and evolutionary analysis in R. Bioinformatics 28:1536–1537.
- Rastas, P., Calboli F. C., Guo B., Shikano T., and Merila J. 2015. Construction of ultradense linkage maps with Lep‐MAP2: stickleback F2 recombinant crosses as an example. Genome Biol. Evol. 8:78–93.
- Reimchen, T. E. 1994. Predators and morphological evolution in threespine stickleback. Pp. 240–276 in Bell M. A. and Foster S. A., eds. The evolutionary biology of the threespine stickleback. Oxford Univ. Press, New York.
- Roesti, M., Gavrilets S., Hendry A. P., Salzburger W., and Berner D. 2014. The genomic signature of parallel adaptation from shared genetic variation. Mol. Ecol. 23:3944–3956.
- Roesti, M., Kueng B., Moser D., and Berner D. 2015. The genomics of ecological vicariance in threespine stickleback fish. Nat. Commun. 6:8767.
- Roesti, M., Moser D., and Berner D. 2013. Recombination in the threespine stickleback genome—patterns and consequences. Mol. Ecol. 22:3014–3027.
- Samuk, K., Owens G. L., Delmore K. E., Miller S. E., Rennison D. J., and Schluter D. 2017. Gene flow and selection interact to promote adaptive divergence in regions of low recombination. Mol. Ecol. 26:4378–4390.
- Schlotterer, C., Tobler R., Kofler R., and Nolte V. 2014. Sequencing pools of individuals‐mining genome‐wide polymorphism data without big funding. Nat. Rev. Genet. 15:749–763.
- Schluter, D., and Conte G. L. 2009. Genetics and ecological speciation. Proc. Natl. Acad. Sci. USA 106(Suppl 1):9955–9962.
- Schrider, D. R., and Kern A. D. 2017. Soft sweeps are the dominant mode of adaptation in the human genome. Mol. Biol. Evol. 34:1863–1877.
- Shapiro, M. D., Summers B. R., Balabhadra S., Aldenhoven J. T., Miller A. L., Cunningham C. B., Bell M. A., and Kingsley D. M. 2009. The genetic architecture of skeletal convergence and sex determination in ninespine sticklebacks. Curr. Biol. 19:1140–1145.
- Shikano, T., Laine V. N., Herczeg G., Vilkki J., and Merilä J. 2013. Genetic architecture of parallel pelvic reduction in ninespine sticklebacks. G3‐Genes Genom. Genet. 3:1833–1842.
- Stankowski, S., and Streisfeld M. A. 2015. Introgressive hybridization facilitates adaptive divergence in a recent radiation of monkeyflowers. Proc. Roy Soc. B Biol. Sci. 282:20151666.
- Stephens, M., and Scheet P. 2005. Accounting for decay of linkage disequilibrium in haplotype inference and missing‐data imputation. Am. J. Hum. Genet. 76:449–462.
- Stephens, M., Smith N. J., and Donnelly P. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978–989.
- Stuart, Y. E., Veen T., Weber J. N., Hanson D., Ravinet M., Lohman B. K., Thompson C. J., Tasneem T., Doggett A., Izen R., et al. 2017. Contrasting effects of environment and genetics generate a continuum of parallel evolution. Nat. Ecol. Evol. 1:0158.
- Tajima, F. 1983. Evolutionary relationship of DNA‐sequences in finite populations. Genetics 105:437–460.
- R Core Team. 2016. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- Wakeley, J. 2009. Coalescent theory: An introduction. Harvard Univ. Press, New York.
- Willing, E. M., Dreyer C., and van Oosterhout C. 2012. Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649.
- Wright, K. M., Lloyd D., Lowry D. B., Macnair M. R., and Willis J. H. 2013. Indirect evolution of hybrid lethality due to linkage with selected locus in Mimulus guttatus. PLoS Biol. 11:e1001497.
- Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proc. Sixth Internat. Con. Genet. 1:356–366.
- Wund, M. A., Singh O. D., Geiselman A., and Bell M. A. 2016. Morphological evolution of an anadromous threespine stickleback population within one generation after reintroduction to Cheney Lake, Alaska. Evol. Ecol. Res. 17:203–224.
Associated Data
Supplementary Materials
Figure S1. Longer sequences reduce variance in estimates of sequence diversity and divergence.
Figure S2. Accurate phasing of RAD loci even at low population‐level sampling.
Figure S3. RAD‐seq effectively samples genome‐wide sequence diversity.
Figure S4. Relative (FST) and absolute (dXY) sequence divergence are positively correlated genome‐wide in two instances of marine‐freshwater divergence.
Figure S5. TMRCA outlier regions remain outliers after removing highly differentiated RAD loci.
Figure S6. Marine‐freshwater divergence in threespine sticklebacks is associated with reduced signal of monophyly of threespine haplotypes.
Table S1. Sequencing summary for threespine stickleback samples.
Table S2. Genome assembly statistics for Pungitius pungitius.
Data Availability Statement
Raw sequence data supporting these findings are available on NCBI at PRJNA429207. The Pungitius pungitius draft assembly is available at PRJNA429208. The final datasets needed to reproduce the figures and statistics presented in the article are available at https://github.com/thomnelson/ancient-divergence.
