Abstract
Currents are unique drivers of oceanic phylogeography and thus determine the distribution of marine coastal species, along with past glaciations and sea-level changes. Here we reconstruct the worldwide colonization history of eelgrass (Zostera marina L.), the most widely distributed marine flowering plant or seagrass, from its origin in the Northwest Pacific, based on nuclear and chloroplast genomes. We identified two divergent Pacific clades with evidence for admixture along the East Pacific coast. Two west-to-east (trans-Pacific) colonization events support the key role of the North Pacific Current. Time-calibrated nuclear and chloroplast phylogenies yielded concordant estimates of the arrival of Z. marina in the Atlantic through the Canadian Arctic, suggesting that eelgrass-based ecosystems, hotspots of biodiversity and carbon sequestration, have only been present there for ~243 ky (thousand years). Mediterranean populations were founded ~44 kya, while extant distributions along western and eastern Atlantic shores were founded at the end of the Last Glacial Maximum (~19 kya), with at least one major refuge being the North Carolina region. The recent colonization and five- to sevenfold lower genomic diversity of the Atlantic compared to the Pacific populations raise concern and opportunity about how Atlantic eelgrass might respond to rapidly warming coastal oceans.
Subject terms: Population genetics, Plant evolution, Marine biology
Ocean currents play a crucial role in the distribution of marine coastal species. Here the nuclear and chloroplast genomes of eelgrass (Zostera marina L.) are used to trace its colonization history from its origin in the Northwest Pacific.
Main
Seagrasses are the only flowering plants that returned to the sea ~67 mya (million years ago). Three independent lineages descended from freshwater ancestors that lived ~114 mya (ref. 1). Seagrasses are foundation species of entire ecosystems thriving in all shallow coastal areas of the global ocean except Antarctica2. By far the most geographically widespread species is eelgrass (Zostera marina), occurring in Pacific and Atlantic areas of the Northern Hemisphere from warm temperate to Arctic environments3, spanning 40° of latitude and a range of ~18 °C in average annual temperatures (Fig. 1a). Eelgrass is a unique foundation species in that no other current seagrass can fill its ecological niche in the cold temperate to Arctic Northern Hemisphere3 (Supplementary Note 1). At the same time, eelgrass meadows provide critical nursery functions and ecosystem services including erosion protection, nutrient cycling and considerable carbon sequestration4.
Given its very wide natural distribution range, which exceeds that of most terrestrial plant species, our goal was to reconstruct the major colonization pathways of eelgrass starting from the putative origin of Z. marina in the West Pacific along the Japanese Archipelago5,6. Currents are unique drivers of phylogeographic processes in the ocean, and we hypothesized that the North Pacific Current, Alaska and California Currents in the Pacific, and the Labrador, Gulf Stream and North Atlantic Drift in the Atlantic drove its worldwide colonization. Because eelgrass is a flowering plant, its rafting seed-bearing shoots stay alive for weeks and have been shown to travel tens to hundreds of kilometres, providing a biological mechanism for long-distance dispersal7 (Supplementary Note 1).
One major objective of the present study was to provide time estimates of major colonization events. We asked how evolutionary contingency—specifically the timing of large-scale dispersal events—may have affected the timing of arrival of eelgrass on East Pacific and North Atlantic coastlines8. To do so, we took advantage of recent extensions of the multi-species coalescent (MSC) as applied at the population level9,10, making it possible to construct a time-calibrated phylogenetic tree from SNP (single-nucleotide polymorphism) data11. Our data set comprised 190 individuals from 16 worldwide locations that were subjected to comprehensive whole-genome resequencing (nuclear and chloroplast).
Superimposed on the general eastward colonization are Pleistocene cycles of glacial and interglacial periods that resulted in frequent latitudinal expansions and contractions of available habitat for both terrestrial and marine biota12. Such local extinctions and subsequent recolonizations from refugial populations are expected to leave their genomic footprint in extant marine populations13–15 and may restrict their potential to rapidly adapt to current environmental change16,17. Hence, we were also interested in how glaciations—in particular the Last Glacial Maximum (LGM; 20 kya (thousand years ago); ref. 18)—have affected the population-wide genomic diversity of Z. marina and which glacial refugia permitted eelgrass to survive this period.
Results
Whole-genome resequencing and nuclear and chloroplast polymorphism
Among 190 Z. marina specimens collected from 16 geographic locations (Fig. 1a and Supplementary Table 1), full-genome sequencing yielded an average read coverage of 53.73x. After quality filtering (Supplementary Data Table 1), SNPs were mapped and called (Supplementary Figs. 1 and 2) based on a chromosomal-level assembly v.3.1 (ref. 19). To avoid reference-related bias, owing to the large Pacific–Atlantic genomic divergence, and to facilitate phylogenetic reconstruction within a conserved set of genes20, we focused on core genes—the set of genes shared by most individuals. From a total of 21,483 genes, we identified 18,717 core genes that were on average observed in 97% of the samples, containing 763,580 SNPs (hereafter ‘ZM_HQ_SNPs’; Supplementary Note 2).
After exclusion of 37 samples owing to missing data, selfing or duplicate clonality, 153 were left for further analyses (Supplementary Tables 2 and 3 and Supplementary Figs. 3 and 4). We also extracted two additionally filtered SNP data sets: one based on synonymous SNPs (‘ZM_neutral_SNPs’, comprising 144,773 sites) and the other based on a further subset in which only sites with a physical distance of >3 kbp were retained (‘ZM_Core_SNPs’, 11,705 SNPs; Supplementary Figs. 1 and 2; see Methods for further explanation).
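The distance-based thinning used for the last data set can be illustrated with a minimal sketch. It assumes a greedy left-to-right pass per chromosome; the function name `thin_by_distance` and the example coordinates are hypothetical, and the filtering actually applied is the one described in Methods.

```python
from collections import defaultdict


def thin_by_distance(snps, min_gap=3000):
    """Greedy thinning (an assumed strategy, for illustration only).

    Keep a SNP only if it lies more than `min_gap` bp downstream of the
    previously kept SNP on the same chromosome.
    `snps` is an iterable of (chromosome, position) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    kept = []
    for chrom in sorted(by_chrom):
        last = None  # position of the last SNP kept on this chromosome
        for pos in sorted(by_chrom[chrom]):
            if last is None or pos - last > min_gap:
                kept.append((chrom, pos))
                last = pos
    return kept


# Hypothetical coordinates, not taken from the data set.
example = [("Chr01", 100), ("Chr01", 2000), ("Chr01", 3101),
           ("Chr01", 6200), ("Chr02", 50)]
thinned = thin_by_distance(example)
```

A greedy pass is only one way to enforce a minimum spacing; it favours the first SNP encountered on each chromosome and does not maximize the number of retained sites.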
A complete chloroplast genome of 143,968 bp was reconstructed from the reference sample21. Median chloroplast sequencing coverage for the samples of the worldwide data set was 6,273x. A total of 151 SNPs were detected along the whole chloroplast genome, excluding 23S and 16S ribosomal RNA gene regions due to possible contamination in some samples and ambiguous calling next to microsatellite regions (132,438 bp), comprising 54 haplotypes.
Gradients of genetic diversity within and among ocean basins
As measures of genetic diversity, we assessed nucleotide diversity (π) and genome-wide heterozygosity (Hobs) (Fig. 1b,c). Consistent with the Pacific origin of the species (Supplementary Note 3), Pacific locations showed a 5.5 (π)- to 6.6 (Hobs)-fold higher genetic diversity compared to the Atlantic ones (Supplementary Table 4). The highest π and Hobs values were observed in Japan-South (JS) followed by Japan-North (JN). Alaska-Izembek (ALI) and Alaska-Safety Lagoon (ASL) showed approximately a third (28% for π; 34% for Hobs) of the diversity in the more southern Pacific sites (average of San Diego (SD), Bodega Bay, California (BB) and Washington State (WAS)). In the Atlantic, a comparable loss of diversity along a south–north gradient was observed. Quebec (QU) showed 42% (π) and 47% (Hobs) of the diversity of North Carolina (NC) and Massachusetts (MA), while the diversity values in Northern Norway (NN) were 31% and 43% of averaged values of Sweden (SW) and Wales, respectively.
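For readers unfamiliar with the two diversity measures, the following sketch shows one common way to compute them from diploid, biallelic genotype calls. It is an illustration under stated assumptions (alleles coded 0/1, missing calls as `None`, π averaged over a user-supplied number of callable sites) and is not the software used in this study; both function names are hypothetical.

```python
def observed_heterozygosity(genotypes):
    """Fraction of heterozygous calls among all non-missing calls.

    `genotypes` is a list of sites; each site is a list of diploid calls,
    either an (allele1, allele2) tuple or None for missing data.
    """
    het = total = 0
    for site in genotypes:
        for call in site:
            if call is None:
                continue
            total += 1
            if call[0] != call[1]:
                het += 1
    return het / total if total else float("nan")


def nucleotide_diversity(genotypes, n_sites):
    """Per-site pi for biallelic SNPs coded 0/1.

    For each variable site the unbiased expected heterozygosity
    n/(n-1) * 2p(1-p) is summed (n = sampled chromosomes), then divided by
    `n_sites`, the assumed number of callable sites including invariant ones.
    """
    total = 0.0
    for site in genotypes:
        alleles = [a for call in site if call is not None for a in call]
        n = len(alleles)
        if n < 2:
            continue
        p = sum(1 for a in alleles if a == 1) / n
        total += (n / (n - 1)) * 2 * p * (1 - p)
    return total / n_sites


# Toy data: two SNPs typed in three individuals (hypothetical values).
toy = [[(0, 1), (0, 0), (1, 1)],
       [(0, 0), (0, 0), (0, 1)]]
h_obs = observed_heterozygosity(toy)
pi = nucleotide_diversity(toy, n_sites=10)
```

Dividing by all callable sites rather than by the number of SNPs is what makes π comparable among populations that differ in how many sites are variable.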
Global population structure of Z. marina
To reveal the large-scale population genetic structure, we performed a principal component analysis (PCA) based on the most comprehensive SNP selection (Supplementary Fig. 1; 782,652 SNPs, Fig. 2a). Within-ocean genetic differentiation in the Pacific was as great as the Pacific–Atlantic split, whereas there was much less variation within the Atlantic. Separate PCAs for each ocean revealed additional structure (Fig. 2c,e), including the separation of the Atlantic and Mediterranean Sea populations (principal component 1, 24.47%, Fig. 2e).
We then used STRUCTURE22, a Bayesian clustering approach, on 2,353 SNPs (20%) randomly selected from the ZM_Core_SNPs. The most likely number of genetic clusters was determined using a combination of the Delta-K method23 and other metrics introduced by ref. 24 (Fig. 2b,d,f), with a qualitative inspection of additional K values as generated from StructureSelector25 in Supplementary Figs. 5–7. In the global analysis (Fig. 2b), two clusters representing Atlantic and Pacific locations were identified. JN contained admixture components with the Atlantic, consistent with a west–east colonization via northern Japan through the North Pacific Current and then north towards the Bering Sea. Given the pronounced nested population structure (Fig. 2a), we then proceeded with separate analyses for Pacific and Atlantic, as recommended in ref. 25. An analysis restricted to Pacific sites supported a role of JN as a dispersal hub, with admixture components from JS and Alaska, suggesting that this site has been a gateway between both locations (Fig. 2c). At K = 3, WAS and BB, located centrally along the east Pacific coastline, were admixed between both Alaskan sites and SD. WAS showed about equal northern and southern components, while BB was dominated by the adjacent southern SD genetic component. Interestingly, under K = 4 (Supplementary Fig. 6), which was supported by the metrics medmeak and maxmeak24, a presumably ancient connection between JN and SD becomes apparent, while at even larger K values, the pattern remains stable for the Pacific side.
In the Atlantic and Mediterranean (Fig. 2f), a less pronounced population structure was present, with only two clearly separated groups representing the Mediterranean (plus Portugal (PO)) and all other Atlantic Ocean sites (both east and west), consistent with the PCA results (Fig. 2e). Further exploration of an additional genetic cluster revealed a connection between PO closest to the Strait of Gibraltar and the East Atlantic at K = 4 (NC, Supplementary Fig. 7, supported by medmeak and maxmeak). A clear split among West and East Atlantic becomes apparent with K = 4 and 5 clusters, for which either the separation time since the LGM or some non-sampled East Atlantic refugia might be responsible.
Population structure of chloroplast DNA
A haplotype network (Fig. 2g) revealed three markedly divergent clades, which were additionally supported by bootstrap values of 98–100% based on a maximum-likelihood phylogeny (Extended Data Fig. 1). In the Pacific, WAS showed haplotypes similar to those of Alaska (ALI and ASL) and JN, while BB showed haplotypes of a divergent clade that also comprises all haplotypes from SD. ASL and JN share the same dominant haplotype, suggesting JN to be a hub between West and East Pacific. In JS, two divergent private haplotypes (separated by nine mutations from other haplotypes) suggest long-term persistence of eelgrass at that location.
On the Atlantic side, only four to six mutations separate the Northeast Atlantic and Mediterranean haplotypes, consistent with a much younger separation. The central (putatively ancestral) haplotype is shared by both MA and NC, with nine private NC haplotypes. A single mutation separates both MA and QU, as well as MA and Wales-North. Also extending from the central haplotype were SW and NN (Fig. 2g). Together with the diversity measures (Fig. 1b,c), this pattern suggests long-term residency of eelgrass on the West Atlantic coast and transport to the Northeast Atlantic via the North Atlantic Drift. Notably, there were no shared chloroplast DNA (cpDNA) haplotypes among Pacific and Atlantic, suggesting that the Atlantic was colonized only once.
Reticulated topology of Z. marina phylogeography
To further explore the degree of admixture and secondary contact, we constructed a split network26 using all ZM_Core_SNPs. Pacific populations were connected in a web-like fashion (Fig. 3a). WAS and BB were involved in alternative network edges (Fig. 3b), either clustering with SD or with both JS and JN. The topology places WAS and BB in an admixture zone with a northern Alaska component (ALI and ASL) and a more divergent southern component from SD, in line with the STRUCTURE results (Fig. 2c). Due to uniparental inheritance mode, the population relationships inferred from chloroplast data were expected to reflect only one of the two topologies. Based on these data, WAS groups with the Alaska component (Fig. 2g and Supplementary Fig. 6), indicating an early divergence from the SD and BB cpDNA haplotypes. In the Atlantic (Fig. 3c), edges among locations were shorter than those on the Pacific side, indicating a more recent divergence among Atlantic populations. A bifurcating topology connected the older Mediterranean populations, while both Northeast and Northwest Atlantic were connected by unresolved, web-like edges, indicating a mixture of incomplete lineage sorting and probable, recent gene flow.
We used Patterson’s D-statistic27 to further test for admixture28 (Extended Data Fig. 2). For the Pacific side, the pairs WAS/SD, BB/ALI and BB/ASL in addition to JN/ALI and JN/ASL showed the highest D values along with statistical significance (D = 0.67; P < 0.001), suggesting substantial admixture. For the Atlantic side, D values indicated recent or ongoing connection between the Atlantic and Mediterranean Sea, consistent with the admixture signal detected by STRUCTURE (SW, Fig. 2f) and with two Atlantic (SW) cpDNA haplotypes that cluster with the Mediterranean ones (Fig. 2g).
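The logic of the D-statistic can be sketched with the frequency-based ABBA/BABA formulation. The example below is a minimal illustration that assumes derived-allele frequencies for three ingroup populations (P1, P2, P3) and an outgroup; the function name and the toy frequencies are hypothetical, and significance testing (typically done by block jackknife) is not included.

```python
def patterson_d(freqs):
    """Patterson's D from per-site derived-allele frequencies.

    `freqs` is an iterable of (p1, p2, p3, p_out) tuples for populations
    P1, P2, P3 and the outgroup. D > 0 indicates an excess of allele
    sharing between P2 and P3; D < 0 an excess between P1 and P3.
    """
    abba = baba = 0.0
    for p1, p2, p3, p_out in freqs:
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    denom = abba + baba
    if denom == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / denom


# Toy input (hypothetical): one pure ABBA site and one pure BABA site.
d_balanced = patterson_d([(0.0, 1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0)])
d_excess = patterson_d([(0.0, 1.0, 1.0, 0.0)])
```

Under a strictly bifurcating history with incomplete lineage sorting only, ABBA and BABA patterns are expected to be equally frequent, so D is expected to be close to zero.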
Time-calibrated MSC analysis of colonization events
Application of the MSC11 (Fig. 4) assumes that populations diverge under a bifurcating model. Hence, three locations (WAS, BB, JN) that showed pronounced admixture (compare with Fig. 2; Extended Data Fig. 2) were excluded, while we explored the effects of including or excluding admixed populations in Supplementary Fig. 9.
As direct fossil evidence is unavailable within the genus Zostera, the divergence time between Z. marina and Zostera japonica was estimated from a calibration point that takes advantage of a whole-genome duplication event previously identified and dated to ~67 mya (ref. 21). The resulting clock rate for fourfold degenerate transversions of paralogous gene sequences yielded a divergence time estimate of 9.86–12.67 mya between Z. marina and Z. japonica (Supplementary Note 4). We then repeated the analysis based on 13,732 SNP sites polymorphic within our target species (Supplementary Fig. 2) after setting a new Z. marina-specific calibration point.
Assuming JS as generally representative of the species origin5 (Supplementary Note 3), we found evidence for two trans-Pacific dispersal events (Fig. 4). The first trans-Pacific dispersal event at ~352 kya (95% highest posterior density (HPD), 422.1–284.9 kya) founded populations close to SD that remained isolated but engaged in admixture to the north (Supplementary Note 5), as also supported by chloroplast-based population structure. A second trans-Pacific dispersal event from JS to the Northeast Pacific seeded the Alaskan populations some 270 kya (95% HPD, 327.5–221.8 kya), likely with JN as stepping stone. Shortly thereafter, the Atlantic was colonized ~243 kya (95% HPD, 294.9–199.6 kya) from populations in or close to Alaska. This estimate is surprising given that the Bering Strait opened as early as 4.8–5.5 mya (ref. 29). Further support for JN being a dispersal hub is its smallest pairwise FST with all Atlantic populations (Supplementary Table 5). Moreover, JN was the only Pacific population that showed a shared genetic component with the Atlantic (Fig. 2b).
In the Atlantic, divergence time estimates were much more recent than in the Pacific. The Mediterranean Sea clade emerged ~43.8 kya (95% HPD, 52.8–35.5 kya). The Northwest and Northeast Atlantic populations also diverged from each other very recently at ~18.8 kya (95% HPD, 22.9–15.1 kya) and shared a common ancestor during the LGM, indicating that they were partially derived from the same glacial refugium in the Northwest Atlantic (likely at or near NC). Some admixture found in the SW population stemming from the Mediterranean gene pool (Fig. 2f, g) likely explains a higher genetic diversity at that location (Fig. 1b,c). Some coalescence runs of the population data set with WAS, BB and JN excluded showed a different topology for the JS–Alaska–Atlantic split, requiring the presence of a third trans-Pacific colonization event that predated the Atlantic colonization (Supplementary Fig. 9a), along with a more recent dispersal to Alaska. Note that divergence time estimates for all other splits, in particular the foundation of the SD lineage and the Atlantic and Mediterranean colonization, were very similar.
In a second coalescent approach10, we used alignments of 617 core genes across all samples (Supplementary Note 2). Based on the same initial calibration as under the MSC, the tree topology was examined using ASTRAL. Despite high incomplete lineage sorting (ASTRAL normalized quartet score = 0.48), the species tree follows geographic patterns with only 2 of 107 individuals showing incongruent topology based on geographic collection sites30 (Supplementary Fig. 11). Subsequent divergence time estimation was performed with StarBEAST2 (ref. 31). This approach resulted in a topology consistent with the SNAPP topology depicted in Fig. 4, while divergence time estimates for the deeper nodes were even more recent (for example, Pacific–Atlantic split at 162 kya). Estimates for the more recent divergence events were nearly identical (Supplementary Fig. 12).
Finally, we used the mutational steps among chloroplast (cpDNA) haplotypes as an alternative dating method. SD and BB along the Pacific East coast showed very different haplotypes, separated by about 30 mutations from the other Pacific and the Atlantic clades. Assuming a synonymous cpDNA mutation rate of 2 × 10⁻⁹ per site per year, this genetic distance corresponds to a divergence time of 392 kya (Supplementary Note 6), comparable to the estimate of 352 kya in the coalescent analysis. Conversely, few mutations (4–7) distinguished major Atlantic haplotypes from the Mediterranean Sea, consistent with a much younger divergence estimate based on nuclear genomes (Fig. 4). The topology had a high bootstrap support in a maximum-likelihood-based phylogenetic tree32 (Extended Data Fig. 1).
Demographic history and post-LGM recolonization
We used the multiple sequentially Markovian coalescent (MSMC)33 to infer past effective population size Ne (Fig. 5). We here focus on time intervals where different replicate runs per population converged, acknowledging that MSMC creates unreliable estimates in recent time34. Almost all eelgrass populations revealed a recent expansion 1,000–100 generations ago, while the magnitude of Ne value minima (at about 10,000–1,000 generations) varied. Given a range of plausible generation times under a mix of clonal and sexual reproduction, it is likely that an Ne minimum shown by several locations coincides with the LGM, which in turn can be used to estimate the long-term generation time. For example, a local minimal Ne at 5,000 generations ago at locations JS, WAS, BB, SD and MA would translate to 3 years per generation × 5,000 generations = 15,000 years, that is, ~15 kya, just after the LGM. In general, lower Ne values were related to lower clonal diversity at sites in northern (NN) and southern Europe (PO; Supplementary Table 3). Within the Pacific, the southernmost population (SD) showed no drop in Ne, while all others showed bottlenecks that became more pronounced from south to north (in the order BB, WAS and ALI/ASL). As for the Atlantic side, the Northwest Atlantic populations NC and MA and the southern European populations PO and CZ (and to a lesser extent Mediterranean FR) showed little evidence for bottlenecks (as local Ne minima), suggesting that these localities were refugia during the LGM (Fig. 5). The opposite applied to QU in the Northwest and NN and SW in the Northeast Atlantic, where we see a pronounced minimal Ne at about 3,000 generations ago.
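The conversion between generations and calendar time used in the example above is simple arithmetic and can be sketched as follows. The two helper names are hypothetical; the 3-year generation time is the value assumed in this study, and the inverse function merely illustrates how an Ne minimum of known age could be used to back-calculate a generation time.

```python
def generations_to_years(generations, generation_time_years=3.0):
    """Convert a time in generations to years before present."""
    return generations * generation_time_years


def implied_generation_time(event_age_years, generations):
    """Generation time implied if an event of known age (for example, an Ne
    minimum attributed to the LGM) is observed `generations` ago."""
    return event_age_years / generations


age_years = generations_to_years(5000)            # 3 years x 5,000 generations
gen_time = implied_generation_time(15000, 5000)   # back-calculation
```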
For the Atlantic, we determined the most likely post-LGM recolonization through approximate Bayesian computations (Do-It-Yourself-Approximate Bayesian Computation - DIY-ABC; Supplementary Fig. 10) and found that the region north from NC to QU was the most likely donor source (Supplementary Note 7).
Discussion
With rapid climate change, information about past climatic shifts and their legacy effects on genetic structure and diversity of extant populations can help to guide restoration efforts to ensure persistence and resilience16,17,35. Z. marina has a circumglobal distribution that provided us with the unique opportunity to reconstruct the natural expansion of a marine plant throughout the Northern Hemisphere starting from the species origin in the Northwest Pacific during a period of strong recurrent climate changes (Fig. 6a,b).
The presence of eelgrass in the Atlantic is surprisingly recent, dating to only ~243 kya. As no other seagrass species is able to fill this ecological niche or form dense meadows in boreal to Arctic regions (>50° N, Supplementary Note 1), historical contingency8 has played a previously underappreciated role for the establishment of this unique and productive ecosystem. The recency of the arrival of eelgrass in the Atlantic may also explain why relatively few animals are endemic to eelgrass beds or have evolved to consume its plant tissue directly (Supplementary Table 6). Greater numbers of species are found to be intimately associated with Z. marina in the Pacific than the Atlantic, including specialist feeders, facultative feeders on green tissue and habitat specialists.
The first dated population-level phylogeny in any seagrass species might also explain why there seems to be little niche differentiation among eelgrass-associated epifauna in the Atlantic compared to the Pacific36. Our study shows how macro-ecology, here the presence of an entire ecosystem, may be strongly determined by the colonization history, specifically the time frame in which eelgrass reached the North Atlantic8, and not by suitable environmental conditions.
We identified the North Pacific Current, which began to intensify ~1 mya (ref. 37), as the major dispersal gateway. It bifurcates north into the Alaska Current and south into the California Current (Fig. 6a), roughly at the latitude of mid-Vancouver Island (Supplementary Note 5). Based on this scenario, SD was colonized by the earliest detectable colonization event roughly 352 kya (Fig. 6a, event 1) and has retained old genetic variation since then, probably owing to the rarity of genetic exchange southward across the Point Conception biogeographic boundary38 and the variable North–South Davidson Current (reviewed in ref. 39). Subsequent trans-Pacific events that headed south at the gateway eventually resulted in an admixture zone involving WAS and BB.
Another trans-Pacific dispersal (Fig. 6a, event 2) at 270 kya moved north through the gateway, colonized Alaska and became the stepping stone for an inter-oceanic dispersal to the Atlantic through the Arctic Ocean some 243 kya (event 3). Further support for the gateway bifurcation comes from two chloroplast mat-K haplotypes present in northern Hokkaido, Japan40, with a split on the East Pacific side: the mat-K2 haplotype went north and was found at 12 sites in the Bering and Gulf of Alaska Large Marine Ecosystems, whereas the mat-K4 haplotype was found south of the gateway at six sites in the California Current Large Marine Ecosystem all the way to Baja (Supplementary Note 5).
Although the Bering Strait may have opened as early as 5.5–4.8 mya (ref. 29), our analyses only support a single colonization event into the Atlantic, in contrast to findings for other amphi-Arctic and boreal marine invertebrates41 and seaweeds42. Genomic variation characteristic of extant Alaskan populations was not detected in any North Atlantic populations, in line with earlier microsatellite data40, corroborating that the Atlantic was only colonized once. While we cannot rule out an earlier colonization, this would require that Z. marina became extinct without leaving any trace in nuclear genomes or cpDNA haplotypes, which we consider unlikely.
The Pacific–Atlantic genetic divide has been recently identified as a ‘Pleistocene legacy’ based on a microsatellite-based genotyping study17. Here we further confirm the presence of two deeply divergent clades in the Pacific that share a complex pattern of secondary contact on the East Pacific side (Supplementary Note 8). In contrast, the genetic separation between West and East Atlantic populations is present but weak, suggesting recent population contractions and expansions driven by the LGM, with the North Atlantic Drift driving repeated west–east colonization events (Fig. 6b).
While our phylogeny (Fig. 4) is also consistent with a scenario in which the deep branching SD population would represent the species’ origin of Z. marina, we consider this extremely unlikely given the long-term prevailing ocean currents (Fig. 6a), the distribution of genetic diversity (Fig. 1b,c) and our current understanding of the emergence of the genus Zostera (~15 mya), including the species Z. marina some 5–1.62 mya (ref. 5) in the Northwest Pacific (Supplementary Note 3). Thus, considering all evidence jointly, we conclude that the Japan region, and not the East Pacific (SD), is the most likely geographic origin of eelgrass and the source of multiple dispersal events with ocean currents.
The NC and Chesapeake Bay region northward to Long Island served as a major refuge and was at least one subsequent source population for the Northeast Atlantic (Fig. 6b, event 5). The coastal areas further north of Cape Cod, Nova Scotia, Quebec and Newfoundland are also known refugia43 and connected by Quebec in our sampling. Additional inclusion of populations from Newfoundland and southern Greenland may modify this view, as may be the case of refugia around southwestern Ireland and the Brittany peninsula44,45 (Supplementary Note 7). Indeed, there is some evidence in our data from the STRUCTURE analysis of higher K modes (Supplementary Figs. 5–7) and admixture signals in SW that additional East Atlantic refugia resulted in a more complex post-LGM genetic composition of extant northern European populations as suggested earlier46 (Supplementary Note 7).
Along with demographic modelling, we identify population contraction and subsequent latitudinal expansion along three coastlines following the LGM (26–19 kya). These are common patterns of many terrestrial12 and intertidal species15,46, with the Northeast Atlantic/North Sea coastline and Beringia being most drastically affected. Interestingly, for Z. marina, the Atlantic region was not more severely influenced by the last glaciations and sea-level changes than the East Pacific (Figs. 5 and 6b), even when considering their relative baseline diversities (Supplementary Table 4). In both oceans, there were dramatic losses of genome-wide diversity. The 5- to 7-fold lower overall genetic diversity in the Atlantic simply amplified LGM effects and resulted in >30-fold differences among populations with the highest (JS) versus lowest (NN) diversity. This observation may have significant but as yet unknown consequences for the adaptive potential and genetic rescue of eelgrass in the Anthropocene.
In conclusion, the relatively low number of extant seagrass species (~65 species in six families47) has been attributed to frequent intermediate extinctions6. Our data suggest a second plausible process, namely multiple long-distance genetic exchanges within and among ocean basins that may have impeded allopatric speciation (see also ref. 48). Our range-wide sampling has allowed an overview of evolutionary history in this lineage of seagrass and opens the door for exploration of functional studies across ocean basins and coasts. Future work will explore the pan-genome of Z. marina with the consideration of how the high diversity and robustness of Pacific populations may be able to contribute to management and rescue of populations along rapidly warming Atlantic coastlines.
Methods
Study species and sampling design
Eelgrass (Z. marina L.) is the most widespread seagrass species of the temperate to Arctic Northern Hemisphere3. It is being developed as a model for studying seagrass evolution and genomics17,19,21,49. Z. marina is a foundation species of shallow water ecosystems17 with a number of critical ecological functions including enhancement of fish and crustacean recruitment50, improvement of water quality51 and the sequestration of ‘blue carbon’52,53.
Eelgrass features a mix of clonal (=vegetative spread of the rhizome system) and sexual reproduction via seeds, with varying proportions across locations46. The mating system is monoecious. While there is the possibility for selfing, that is, self-compatibility54, most populations are outcrossing55. Except for the most extreme cases of mono-clonality56,57, replicated modular units (leaf shoots = ramets) stemming from a sexually produced individual (=genet or clone) are intermingled to form the seagrass meadow. This also implies that generation times are difficult to estimate or average across populations. Nevertheless, we assumed here based on personal observations that in perennial eelgrass populations, individuals become reproductive in year 2 after germination, while attaining their maximal reproductive output in year 3. Extended clone longevity results in overlapping generations, but not in longer generation times. Additional evidence for an average generation time of 3 years used here for later modelling comes from the historical demographic analysis (Fig. 5), specifically the local Ne minima that are indicative of the population bottleneck during the LGM.
We conducted a range-wide sampling collection of 190 Z. marina specimens from 16 geographic locations (Fig. 1a and Supplementary Table 1). The chosen populations feature a mix of sexual and vegetative reproduction with the exception of mostly vegetative reproduction at the sites PO and NN, apparent through extended clones. Chosen locations were a subset of the Zostera Experimental Network sites that were previously analysed using 24 microsatellite loci17. Although a sampling distance of >2 m was maintained to reduce the likelihood of collecting the same genet/clone twice, this was not always successful (compare with Supplementary Table 3) and thus provided an estimate of local clonal diversity.
Plant tissue was selected from the basal meristematic part of the shoot after peeling away the leaf sheath to minimize epiphytes (bacteria and diatoms), frozen in liquid nitrogen and stored at −80 °C until DNA extraction.
DNA extraction, whole-genome resequencing and quality check
About 100–200 mg fresh weight of basal leaf tissue, containing the meristematic region, was ground in liquid N2. Genomic DNA was extracted using the Macherey-Nagel NucleoSpin plant II kit following the manufacturer’s instructions. DNA concentrations were in the range of 50–200 ng µl−1. Quality control was performed following Joint Genome Institute guidelines (https://jgi.doe.gov/wp-content/uploads/2013/11/Genomic-DNA-Sample-QC.pdf). Plate-based DNA library preparation for Illumina sequencing was performed on the PerkinElmer Sciclone NGS robotic liquid handling system using Kapa Biosystems library preparation kit. About 200 ng of sample DNA was sheared to a length of around 600 bp using a Covaris LE220 focused ultrasonicator. Selected fragments were end-repaired, A-tailed and ligated with sequencing adaptors containing a unique molecular index barcode. Libraries were quantified using KAPA Biosystems’ next-generation sequencing library qPCR-kit on a Roche LightCycler 480 real-time PCR instrument. Quantified libraries were then pooled together and prepared for sequencing on the Illumina HiSeq2500 sequencer using TruSeq SBS sequencing kits (v4) following a 2 × 150 bp indexed run recipe to a targeted depth of approximately 40x coverage. The quality of the raw reads was assessed by FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and visualized by MultiQC58. BBDuk (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) was used to remove adapters and for quality filtering, discarding sequence reads (1) with more than one ‘N’ (maxns = 1), (2) shorter than 50 bp after trimming (minlength = 50) and (3) with average quality <10 after trimming (maq = 10). FastQC and MultiQC were used for a second round of quality checks on the clean reads. Sequencing coverage and mapping rate were calculated for each sample (Supplementary Data Tables 1 and 2).
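The three read-level filters listed above can be expressed as a short sketch. This is an illustration of the stated thresholds only, not a reimplementation of BBDuk: it assumes Phred+33 quality encoding and a plain arithmetic mean of quality scores, whereas BBDuk's own averaging may differ, and the function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Arithmetic mean of Phred scores (Phred+33 encoding assumed)."""
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores) if scores else 0.0


def passes_filters(seq, qual, max_n=1, min_length=50, min_avg_quality=10):
    """Return True if a trimmed read survives the three filters:
    (1) at most `max_n` ambiguous bases, (2) length of at least
    `min_length` bp, (3) mean quality of at least `min_avg_quality`."""
    if seq.upper().count("N") > max_n:
        return False
    if len(seq) < min_length:
        return False
    if mean_phred(qual) < min_avg_quality:
        return False
    return True


# Hypothetical reads: 'I' encodes Phred 40, '#' encodes Phred 2.
good = passes_filters("A" * 60, "I" * 60)
two_n = passes_filters("A" * 58 + "NN", "I" * 60)
short = passes_filters("A" * 49, "I" * 49)
low_q = passes_filters("A" * 60, "#" * 60)
```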
Identifying core and variable genes
To analyse genetic loci present throughout the global distribution range of eelgrass, we focused on identifying core genes that are present in the genomes of all individuals. To do so, each of the 190 ramets was de novo assembled using HipMer (k = 51) (ref. 59). To categorize, extract and compare core and variable (shell and cloud) genes, primary transcript sequences (21,483 gene models) from the Z. marina reference (V3.1; ref. 19) were aligned to each de novo assembly using BLAT with default parameters60. Genes were considered present if the transcript aligned with either (1) >60% identity and >60% coverage from a single alignment or (2) >85% identity and >85% coverage split across three or fewer scaffolds. Individual presence–absence-variation calls were combined into a matrix to classify genes into core, cloud and shell categories based on their observation across the population. The total number of genes considered was 20,100. Because identical genotypes and fragmented, low-quality assemblies can bias presence–absence-variation analyses, only 141 ramets, each a single representative of its clone and with more than 17,500 genes, were kept to ensure that only unique, high-quality assemblies were retained. Genes were classified using discriminant analysis of principal components61 into cloud, shell and core gene clusters based on their frequency. Core genes were the largest category, with 18,717 genes that were on average observed in 97% of ramets.
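The two presence rules above can be sketched in Python as follows. The text does not specify how coverage is combined across scaffolds, so the sketch makes an explicit assumption: it takes the best alignment with >85% identity on each scaffold, sums the coverage of the top three scaffolds and ignores overlaps. The function name gene_is_present and the toy alignments are hypothetical.

```python
def gene_is_present(alignments):
    """alignments: list of (scaffold, identity_pct, coverage_pct) tuples for
    one transcript aligned to one de novo assembly."""
    # Rule 1: a single alignment with >60% identity and >60% coverage.
    if any(ident > 60 and cov > 60 for _, ident, cov in alignments):
        return True
    # Rule 2 (assumed interpretation): >85% identity pieces on three or
    # fewer scaffolds whose summed coverage exceeds 85%.
    best_per_scaffold = {}
    for scaffold, ident, cov in alignments:
        if ident > 85:
            best_per_scaffold[scaffold] = max(cov, best_per_scaffold.get(scaffold, 0.0))
    top_three = sorted(best_per_scaffold.values(), reverse=True)[:3]
    return sum(top_three) > 85


# Toy alignments (hypothetical values).
single_hit = gene_is_present([("s1", 70, 65)])
split_hit = gene_is_present([("s1", 90, 40), ("s2", 92, 30), ("s3", 88, 20)])
too_little = gene_is_present([("s1", 90, 40), ("s2", 92, 30)])
too_fragmented = gene_is_present([(f"s{i}", 90, 25) for i in range(4)])
```

Per-assembly calls of this kind would then be stacked into the presence–absence matrix from which gene frequencies across ramets are derived.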
SNP mapping, calling and filtering
The quality-filtered reads were mapped against the chromosome-level Z. marina reference genome V3.1 using BWA MEM62. The alignments were converted to BAM format and sorted using Samtools62. The MarkDuplicates module in GATK4 (ref. 63) was used to identify and tag duplicate reads in the BAM files. The mapping rate for each genotype was calculated using Samtools (Supplementary Data Table 2). HaplotypeCaller (GATK4) was used to generate a Genomic Variant Call Format (GVCF) file for each sample, and all the GVCF files were combined by CombineGVCFs (GATK4). GenotypeGVCFs (GATK4) was used to call genetic variants.
BCFtools64 was used to remove SNPs within 20 base pairs of an indel or other variant type (Supplementary Fig. 1), as these variant types may cause erroneous SNP calls. VariantsToTable (GATK4) was used to extract INFO annotations. SNPs meeting one or more of the following criteria were marked by VariantFiltration (GATK4): MQ < 40.0; FS > 60.0; QD < 10.0; MQRankSum > 2.5 or MQRankSum < −2.5; ReadPosRankSum > 2.5 or ReadPosRankSum < −2.5; SOR > 3.0; DP > 10,804.0 (2 × average DP). Those SNPs were excluded by SelectVariants (GATK4). A total of 3,975,407 SNPs were retained. VCFtools65 was used to convert individual genotypes to missing data when GQ < 30 or DP < 10. Individual homozygous reference calls with one or more reads supporting the variant allele, and individual homozygous variant calls with ≥1 read supporting the reference, were set as missing data. Only bi-allelic SNPs were kept (3,892,668 SNPs). To avoid reference-genome-related biases arising from the large Pacific–Atlantic genomic divergence, we focused on the 18,717 core genes that were on average observed in 97% of ramets. Bedtools66 was used to find overlap between the SNPs and the core genes, and only those SNPs were kept (ZM_HQ_SNPs, 763,580 SNPs). Genotypes that were outside our custom quality criteria were represented as missing data.
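The site-level and genotype-level criteria above can be expressed as a short Python sketch. It only illustrates the thresholds and is not the GATK or VCFtools implementation; the function names, the choice to skip annotations that are absent from a record, and the toy values are assumptions. The rank-sum tests are referred to by their GATK annotation names, MQRankSum and ReadPosRankSum.

```python
def fails_site_filters(info, max_dp=10804.0):
    """info: dict of INFO annotations for one SNP; absent annotations are skipped."""
    checks = [
        ("MQ", lambda v: v < 40.0),
        ("FS", lambda v: v > 60.0),
        ("QD", lambda v: v < 10.0),
        ("MQRankSum", lambda v: abs(v) > 2.5),
        ("ReadPosRankSum", lambda v: abs(v) > 2.5),
        ("SOR", lambda v: v > 3.0),
        ("DP", lambda v: v > max_dp),   # 2 x average DP in the text
    ]
    return any(name in info and test(info[name]) for name, test in checks)


def mask_genotype(gt, gq, dp, ref_reads, alt_reads):
    """Return the genotype tuple, or None if it should be set to missing."""
    if gq < 30 or dp < 10:
        return None
    if gt == (0, 0) and alt_reads >= 1:   # hom-ref with variant-supporting reads
        return None
    if gt == (1, 1) and ref_reads >= 1:   # hom-alt with reference-supporting reads
        return None
    return gt


# Toy records (hypothetical values).
site_ok = {"MQ": 60.0, "FS": 1.2, "QD": 25.0, "MQRankSum": 0.1,
           "ReadPosRankSum": -0.3, "SOR": 0.8, "DP": 5400.0}
site_bad = dict(site_ok, QD=4.0)
kept_het = mask_genotype((0, 1), 99, 40, 20, 20)
masked_homref = mask_genotype((0, 0), 99, 40, 39, 1)
masked_low_gq = mask_genotype((1, 1), 20, 40, 0, 40)
```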
Excluding clone mates and genotypes originating from selfing
Based on the extended data set ZM_HQ_SNPs (763,580 SNPs; Supplementary Fig. 1), possible parent–descendant pairs under selfing (Supplementary Table 2) as well as clonemates were detected based on shared heterozygosity (ref. 67). To ensure that all genotypes assessed originated by random mating, ten ramets showing evidence for selfing were excluded. Seventeen multiply sampled clonemates were also excluded (Supplementary Table 3 and Supplementary Fig. 3). Based on ZM_HQ_SNPs (763,580 SNPs), we calculated the sample-wise missing rate using a custom Python3 script and plotted the results as a histogram (Supplementary Fig. 4). Missing rates were mostly <15%, except for ten ramets (ALI01, ALI02, ALI03, ALI04, ALI05, ALI06, ALI10, ALI16, QU03 and SD08) that were also excluded. After the exclusion of these 37 samples owing to missing data, selfing or clonality, 153 samples were left for further analyses.
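A sample-wise missing rate of the kind described above could be computed as in the following Python sketch. It is not the authors' script; the function name, the toy genotypes and the use of 15% as a hard cut-off are assumptions made for illustration. The last line simply re-derives the sample count given in the text.

```python
def sample_missing_rates(genotypes):
    """genotypes: dict mapping sample -> list of calls, with None for missing."""
    return {sample: sum(call is None for call in calls) / len(calls)
            for sample, calls in genotypes.items()}


# Toy genotype calls (hypothetical ramet names and values).
toy = {
    "ramet_A": [(0, 1), (0, 0), (1, 1), (0, 0), (0, 1)],
    "ramet_B": [(0, 0), None, None, (0, 1), (0, 0)],
}
rates = sample_missing_rates(toy)

MAX_MISSING = 0.15  # assumption: treat the ~15% level in the text as a cut-off
excluded = sorted(s for s, r in rates.items() if r >= MAX_MISSING)

# Bookkeeping from the text: 190 ramets minus selfed, clonal and
# high-missingness ramets.
remaining = 190 - (10 + 17 + 10)
```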
Chloroplast haplotypes
The chloroplast genome was de novo assembled by NOVOPlasty68. The chloroplast genome of Z. marina was represented by a circular molecule of 143,968 bp with a classic quadripartite structure: two identical inverted repeats (IRa and IRb) of 24,127 bp each, a large single-copy region of 83,312 bp, and a small single-copy region of 12,402 bp. All regions were included in SNP calling except for 9,818 bp encoding the 23S and 16S ribosomal RNAs, which were excluded because of bacterial contamination in some samples. The raw Illumina reads of each individual were aligned by BWA MEM to the assembled chloroplast genome. The alignments were converted to BAM format and then sorted using Samtools62. Genomic sites were called as variable positions when the frequency of variant reads was >50% (Supplementary Fig. 8) and the total coverage of the position was >30% of the median coverage (174 variable positions). Then 11 positions likely related to microsatellites and 12 positions reflecting minute inversions caused by hairpin structures69 were removed from the final set of variable positions for the haplotype reconstruction (151 SNPs). For the phylogenetic tree reconstruction, we further selected 108 SNPs that represent parsimony-informative sites (that is, no singletons).
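The two thresholds used to call variable chloroplast positions can be sketched in Python as below. The sketch assumes that the median coverage is taken per sample across all positions, which the text does not state explicitly; the function name and the read counts are hypothetical.

```python
from statistics import median


def variable_positions(ref_counts, alt_counts, min_alt_freq=0.5, min_cov_fraction=0.3):
    """Return 1-based positions where variant reads exceed 50% of reads and
    total coverage exceeds 30% of the median coverage."""
    coverage = [r + a for r, a in zip(ref_counts, alt_counts)]
    med = median(coverage)
    called = []
    for pos, (cov, alt) in enumerate(zip(coverage, alt_counts), start=1):
        if cov > 0 and cov > min_cov_fraction * med and alt / cov > min_alt_freq:
            called.append(pos)
    return called


# Toy read counts for five positions in one sample (hypothetical values).
ref_counts = [100, 5, 10, 98, 40]
alt_counts = [0, 95, 15, 2, 60]
called = variable_positions(ref_counts, alt_counts)
print(called)  # [2, 5]
```

Position 3 has a majority of variant reads but is rejected because its coverage (25) is below 30% of the median (100).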
Putatively neutral and non-linked SNPs
Among the 153 unique samples that were retained for analyses, SnpEff (http://pcingola.github.io/SnpEff/) was used to annotate each SNP as genic or non-genic, and within the former category as synonymous or non-synonymous. To obtain putatively neutral SNPs, we kept only SNPs annotated as ‘synonymous_variant’ (ZM_Neutral_SNPs, 144,773 SNPs). From ZM_Neutral_SNPs, only SNPs without any missing data were kept, which resulted in 44,865 SNPs, the data set used for calculating π (Supplementary Figs. 1 and 2). To obtain putatively non-linked SNP loci for the coalescence runs, we thinned sites using VCFtools to achieve a minimum pairwise distance (physical distance in the reference genome) of 3,000 bp, yielding our core data set, hereafter ZM_Core_SNPs, corresponding to 11,705 SNPs.
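The thinning step can be illustrated with a greedy Python sketch that keeps a site only if it lies at least 3,000 bp from the previously kept site on the same chromosome. It is written in the spirit of the VCFtools thinning option but is not its implementation; the function name, chromosome labels and positions are hypothetical.

```python
def thin_sites(sites, min_distance=3000):
    """sites: iterable of (chrom, pos), sorted by position within chromosome.
    Returns the greedily thinned list of sites."""
    kept = []
    last_kept = {}  # chromosome -> position of the last retained site
    for chrom, pos in sites:
        if chrom not in last_kept or pos - last_kept[chrom] >= min_distance:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


# Toy SNP coordinates (hypothetical).
sites = [("Chr01", 100), ("Chr01", 2000), ("Chr01", 3100),
         ("Chr01", 5000), ("Chr01", 6100), ("Chr02", 150)]
thinned = thin_sites(sites)
print(thinned)
# [('Chr01', 100), ('Chr01', 3100), ('Chr01', 6100), ('Chr02', 150)]
```

Because the procedure is greedy, the retained set depends on the first site encountered on each chromosome.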
Population structure based on nuclear and chloroplast polymorphism
We used R packages to run a global PCA based on ZM_HQ_SNPs (763,580 SNPs). The package vcfR70 was used to load the VCF format file, and the function glPca in the adegenet package was used to conduct the PCA, followed by visualization with the ggplot2 package.
We used Bayesian clustering implemented in STRUCTURE to study population structure and potential admixture22. To reduce the run time, we randomly selected 2,353 SNPs from ZM_Core_SNPs (20%) to run STRUCTURE (length of burn-in period 3 × 10⁵; number of Markov chain Monte Carlo runs 2 × 10⁶). Ten runs were performed for K values 1–10. StructureSelector25 was used to help determine the optimal number of clusters (K) based on the original Delta-K method23 in conjunction with additional metrics proposed by ref. 24 that give an upper limit to the number of clusters. We considered the hierarchical structure of our data set owing to the marked Pacific–Atlantic divide and always performed a qualitative inspection of alternative major and minor K modes.
To detect hidden hierarchical population structure, we further analysed populations from the Atlantic and Pacific alone. Pacific data were extracted from ZM_Neutral_SNPs (144,773 SNPs), excluding monomorphic sites and those with missing data. To obtain putatively independent SNPs, we thinned sites using VCFtools, so that no two sites were within 3,000 bp distance (physical distance in the reference genome) from one another (ZM_Pacific_SNPs, 12,514 SNPs). Those 12,514 SNPs were subjected to PCA, while a set of randomly selected 6,168 SNPs was used in STRUCTURE to reduce run times (length of burn-in period 3 × 10⁵; number of Markov chain Monte Carlo runs 2 × 10⁶) as described above, with possible K values 1–7.
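For orientation, the Delta-K statistic of ref. 23 is the absolute second-order rate of change of the log-likelihood across successive K, divided by the standard deviation of the log-likelihood at K over replicate runs. The Python sketch below computes it from mean log-likelihoods per K; it is not StructureSelector, and the function name and log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """lnp_by_k: dict mapping K -> list of ln P(X|K) from replicate runs.
    Returns Delta-K for every K that has both neighbours K-1 and K+1."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # Delta-K is undefined when replicates are identical
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                result[k] = second_diff / sd
    return result


# Toy log-likelihoods from two replicate runs per K (hypothetical values).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -804.0],
    3: [-790.0, -792.0],
    4: [-789.0, -795.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
```

By construction Delta-K cannot be evaluated for the smallest and largest K tested, which is one reason complementary estimators are consulted as well.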
Polymorphism data for Atlantic and Mediterranean eelgrass were also extracted from ZM_Neutral_SNPs (144,773 SNPs). To obtain putatively independent SNPs, we thinned sites using VCFtools according to the above criteria. The resulting 8,552 SNPs were then used to run another separate PCA and STRUCTURE using the parameters above. For STRUCTURE analysis, K was set from 1 to 5. For each K, we repeated the analysis 10 times independently (Supplementary Figs. 6 and 7).
The population structure of cpDNA was explored using a haplotype network, constructed via the Median Joining Network method71 with epsilon 0 and 1 implemented by PopART72, based on 151 polymorphic sites. The topology was additionally confirmed using a maximum-likelihood phylogenetic tree, reconstructed by IQ-TREE v1.5.5 with 1,000 bootstrap replicates32 based on 108 parsimony-informative polymorphic sites (Extended Data Fig. 1).
Analysis of reticulate evolution using split network
To assess reticulate evolutionary processes, we constructed a split network, a combinatorial generalization of phylogenetic trees designed to represent incompatibilities, using SplitsTree4 (ref. 26). A custom Python3 script was used to generate a fasta format file containing concatenated DNA sequences for all ramets based on ZM_Core_SNPs. As the majority of genotypes were heterozygous, one allele had to be randomly selected to represent the site for an individual. We checked for consistency by rerunning the analysis with different randomly selected SNP sets and found identical topologies and similar split weights. The fasta format file was converted to a nexus format file using MEGAX73, which was then fed to SplitsTree4. The NeighborNet method was used to construct the split network.
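The random selection of one allele per site can be sketched in Python as follows. This is not the deposited vcf2alignment.py script; the function names, the fixed random seed and the toy genotypes are assumptions for illustration.

```python
import random


def pseudo_haploid(genotypes, seed=42):
    """genotypes: dict mapping sample -> list of (allele1, allele2) bases,
    one pair per SNP. Draws one allele at random for every site."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {sample: "".join(rng.choice(pair) for pair in calls)
            for sample, calls in genotypes.items()}


def to_fasta(sequences):
    """Format the concatenated SNP sequences as fasta text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sequences.items())


# Toy diploid genotypes at three SNPs (hypothetical).
toy = {
    "ramet_A": [("A", "G"), ("C", "C"), ("T", "A")],
    "ramet_B": [("A", "A"), ("C", "T"), ("T", "T")],
}
seqs = pseudo_haploid(toy)
fasta = to_fasta(seqs)
```

Homozygous sites are unaffected by the draw, so only heterozygous sites contribute sampling noise between repeated runs with different seeds.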
Genetic diversity
VCFtools was used to calculate nucleotide diversity (π) for each population at all synonymous sites using each of the six chromosomes as replicates for 44,685 SNPs without any missing data (Supplementary Fig. 1). Genomic heterozygosity for a given genotype, HOBS, defined as (number of heterozygous sites)/(total number of sites with available genotype calls), was calculated using a custom Python3 script based on all synonymous SNPs (144,773).
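Both quantities can be illustrated with a short Python sketch. The heterozygosity function follows the ratio given above; the per-site π uses the standard estimator for a bi-allelic site, namely the fraction of pairwise chromosome comparisons that differ, and is offered as an assumption about what is computed rather than as the VCFtools code. Function names and toy values are hypothetical.

```python
import math


def observed_heterozygosity(calls):
    """calls: list of (allele1, allele2) tuples, with None for missing calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return float("nan")
    return sum(a != b for a, b in called) / len(called)


def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity at a bi-allelic site: proportion of
    chromosome pairs that carry different alleles."""
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


# Toy data (hypothetical): one genotype at five sites, one site in two diploids.
h_obs = observed_heterozygosity([(0, 1), (0, 0), None, (1, 1), (0, 1)])
pi_site = site_pi(alt_count=2, n_chrom=4)
```

In the toy example two of four called sites are heterozygous, and four of the six possible chromosome pairs differ at the site.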
Pairwise population differentiation using FST
We used the function stamppFst in the StAMPP-R package74 to calculate pairwise FST based on ZM_Core_SNPs (Supplementary Table 5). P values were generated by 1,000 bootstraps across loci.
D-statistics
Patterson’s D provides a simple and powerful test for the deviation from a bifurcating evolutionary history. The test is applied to three populations, P1, P2 and P3 plus an outgroup O, with P1 and P2 being sister populations. If P3 shares more derived alleles with P2 than with P1, Patterson’s D will be positive. We used Dsuite28 to calculate D values for populations within the Pacific and within the Atlantic Oceans (Extended Data Fig. 2). D was calculated for trios of Z. marina populations based on the SNP core data set (ZMZJ_D_SNPs) (Supplementary Fig. 2), using Z. japonica as outgroup. The Ruby script plot_d.rb (https://github.com/mmatschiner/tutorials/blob/master/analysis_of_introgression_with_snp_data/src/plot_d.rb) was used to plot a heat map that jointly visualizes both the D value and the associated P value for each comparison of P2 and P3. The colour of the corresponding heat map cell indicates the most significant D value across all possible populations in position P1. Red colours indicate higher D values, and more saturated colours indicate greater significance.
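As a reminder of the underlying calculation, D contrasts the counts of ABBA and BABA site patterns, D = (ABBA − BABA)/(ABBA + BABA). The Python sketch below uses the allele-frequency form of these counts and assumes that frequencies are polarized so that the outgroup allele is ancestral. It is not Dsuite, and the function name and frequencies are hypothetical.

```python
import math


def patterson_d(sites):
    """sites: iterable of (p1, p2, p3, p_out) derived-allele frequencies in
    P1, P2, P3 and the outgroup. Returns Patterson's D."""
    abba = baba = 0.0
    for p1, p2, p3, p_out in sites:
        abba += (1 - p1) * p2 * p3 * (1 - p_out)   # derived allele shared by P2 and P3
        baba += p1 * (1 - p2) * p3 * (1 - p_out)   # derived allele shared by P1 and P3
    return (abba - baba) / (abba + baba)


# Toy sites (hypothetical): two ABBA patterns and one BABA pattern.
excess_p2_p3 = patterson_d([(0, 1, 1, 0), (0, 1, 1, 0), (1, 0, 1, 0)])
balanced = patterson_d([(0, 1, 1, 0), (1, 0, 1, 0)])
```

A positive value, as in the first toy case, corresponds to P3 sharing more derived alleles with P2 than with P1.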
Phylogenetic tree with estimated divergence time
To estimate the divergence time among major groups, we used the MSC in combination with a strict molecular clock model11. We used the software SNAPP9 with an input file prepared by the script ‘snapp_prep.rb’ (github.com/mmatschiner/snapp_prep). Two specimens were randomly selected from each of the included populations, and genotype information was extracted from ZMZJ_Neutral_SNPs (Supplementary Figs. 1 and 2). Monomorphic sites were excluded. Only SNPs without any missing data were kept. To obtain putatively independent SNPs, we thinned sites using VCFtools so that no two SNPs were within 3,000 bp (physical distance in the reference genome) of one another (6,169 SNPs). The estimated divergence time between Z. japonica and Z. marina was used as a calibration point, which was implemented as a lognormal prior distribution (Supplementary Note 4, mean = 11.154 mya, s.d. = 0.07).
Most of the 6,169 SNPs above represented the genetic differences between Z. japonica and Z. marina and were monomorphic in Z. marina. To obtain a better estimation among Z. marina populations, we performed a second, Z. marina-specific SNAPP analysis via subsampling from the ZM_Neutral_SNPs (144,773 SNPs) data set, excluding monomorphic sites and missing data. We thinned sites again using VCFtools, so that all sites were ≥3,000 bp distance from one another (13,732 SNPs). The crown divergence for all Z. marina populations, estimated in the first SNAPP analysis, was used as calibration point, assuming a lognormal prior distribution (mean = 0.3564 mya, s.d. = 0.1).
As the MSC model does not account for genetic exchange, the SNAPP analysis was repeated after removing populations showing admixture in STRUCTURE (Fig. 2), SplitsTree (Fig. 3) and D-statistics (Extended Data Fig. 2). We hence reduced the data set by excluding JN (admixed with Alaska), as well as WAS and BB (involved in admixture with SD). We also explored how this exclusion of admixed populations progressively affected the SNAPP phylogenetic tree topology (Supplementary Fig. 11b–d). As an alternative coalescent method, an ASTRAL analysis based on 617 core genes in combination with divergence time estimation using StarBEAST2 was conducted (Supplementary Note 2). Incomplete lineage sorting was examined using ASTRAL quartet analysis30 (Supplementary Fig. 11), and the alternative dating of divergence events is presented in Supplementary Fig. 12.
Demographic analysis
The MSMC33 was run for each genotype per population. We focused on time intervals where different replicate runs per population converged, because MSMC yields unreliable estimates for recent times34. Owing to differences in the relative amount of sexual versus clonal or vegetative reproduction, the generation time of Z. marina varies across populations. We therefore refrained from representing the x axis in absolute time. We first generated one mappability mask file for each of the six main chromosomes using SNPable (http://lh3lh3.users.sourceforge.net/snpable.shtml). Only chromosomal regions that permitted unique mapping of sequencing reads were considered. We generated one mask file for all core genes along each of the six main chromosomes. We generated one ramet-specific mask file based on the BAM format file using bamCaller.py (https://github.com/stschiff/msmc-tools), containing the chromosomal regions with sufficient coverage of any genotype, with minDepth = 10. We also generated a ramet-specific VCF file for each of the six main chromosomes based on ZM_HQ_SNPs using a custom Python3 script.
Recolonization scenarios after the LGM for the Atlantic
Simulations using DIYABC-RF75 were run to distinguish between alternative models of the recolonization history of Z. marina after the LGM. Considering that the Mediterranean Sea had its own glacial refugium, the ABC modelling was conducted for the Atlantic only. We constructed three recolonization scenarios (Supplementary Fig. 10): (1) NC and MA were glacial refugia in the Atlantic, which first recolonized QU as a stepping stone and then the Northeast Atlantic. (2) NC and MA represent the only glacial refugia in the Atlantic. Both QU and Northeast Atlantic were directly recolonized by the glacial refugia. (3) NC and MA represent the southern glacial refugia for the Northwest Atlantic only. Note that this analysis cannot cover any additional East Atlantic refugia that were not sampled (Supplementary Note 7).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This study was supported by a PhD scholarship from the China Scholarship Council to L.Y. (number 201704910807), by a fellowship to M.K. in the Helmholtz School for Marine Data Science (grant number HIDSS-0005) and by a grant to J. Eisen, J.J.S. and J.L.O. from the US Department of Energy Joint Genome Institute Community Sequencing Program (CSP 502951, 2016, Population and evolutionary genomics of host–microbiome interactions in Zostera marina and other seagrasses). The work (proposal 10.46936/10.25585/60000773) conducted by the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a US Department of Energy Office of Science User Facility, is supported by the Office of Science of the US Department of Energy operated under Contract No. DE-AC02-05CH11231. Field sampling was supported by the National Science Foundation (OCE-1336206 to J.E.D. and OCE-1829976 to J.J.S.). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government. We thank X. Zhang for providing the unpublished reference genome of Zostera japonica to predict the coding sequences, Susanne Landis (scienstration) for assisting with figures and illustrations and the many other members of the Zostera Experimental Network. We thank T. Bayer for discussions on bioinformatic problems and Y. Li for assistance with the ABC-RF analysis.
Author contributions
J.A.E., J.J.S., J.S., J.L.O. and T.B.H.R. conceived and designed the study; M.K. analysed the chloroplast data; L.Y., M.M. and A.H. conducted the phylogenetic analyses; A.H. identified the core genes; L.Y. calculated D-statistic with assistance from M.M.; L.Y. conducted all other analyses; B.C. and D.G. assisted with sample acquisition and DNA extraction; J.G., K.K. and C.P. conducted the DNA sequencing; J.G., J.J., S.M., J.S., T.D. and Y.V.d.P. assisted with the bioinformatic analyses; M.C., J.E.D., F.J.F., A.R.H., M.H., M.J., C.K., D.M.M., P.-O.M., M.N., K.R., F.R., J.L.R., S.S., J.J.S., S.T., R.U. and D.H.W. provided access to the sampling sites and performed the specimen sampling; J.J.S. compiled the table on eelgrass-associated fauna; L.Y., M.K., M.M., A.H., J.L.O., T.D. and T.B.H.R. discussed and interpreted the results; L.Y., J.L.O. and T.B.H.R. wrote the paper. All authors commented on earlier versions of the manuscript.
Peer review
Peer review information
Nature Plants thanks Qing-Feng Wang, Richard Hodel and Sandra Lindstrom for their contribution to the peer review of this work.
Funding
Open access funding provided by GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel.
Data availability
Genome data have been deposited in Genbank (short read archive, Supplementary Data Table 3). Coding sequences of Z. japonica and Z. marina for the ASTRAL analysis can be found on figshare (10.6084/m9.figshare.21626327.v1). VCF files of the 11,705 core SNPs can be accessed at 10.6084/m9.figshare.21629471.v1. Source data for Fig. 1b,c are given, as well as statistics of sequencing coverage, mapping rate and further specifications of each sequenced library (Supplementary Tables 1–3). Source data are provided with this paper.
Code availability
Custom-made scripts are deposited on GitHub for SNP filtering (github.com/leiyu37/populationGenomics_ZM.git), for clone mate detection (github.com/leiyu37/Detecting-clonemates.git), for heterozygosity and nucleotide diversity quantification (github.com/leiyu37/populationGenomics_ZM.git) and to prepare SplitsTree input files (https://github.com/leiyu37/populationGenomics_ZM/blob/main/10_SplitsTree/vcf2alignment.py) and SNAPP input files (github.com/mmatschiner/snapp_prep). Scripts for calculating D-statistics are available at github.com/mmatschiner/tutorials/blob/master/analysis_of_introgression_with_snp_data/src/plot_d.rb. Scripts to prepare the gene presence/absence analysis are deposited at https://github.com/leiyu37/populationGenomics_ZM/tree/main/gene_presense_absence_analysis. Further software code for the MSMC analysis is available at http://lh3lh3.users.sourceforge.net/snpable.shtml (generation of a mappability mask file for each of the six chromosomes using SNPable) and at https://github.com/stschiff/msmc-tools (generation of a ramet-specific mask file based on a BAM file using bamCaller.py).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history
8/7/2023
A Correction to this paper has been published: 10.1038/s41477-023-01504-y
Extended data is available for this paper at 10.1038/s41477-023-01464-3.
Supplementary information
The online version contains supplementary material available at 10.1038/s41477-023-01464-3.
References
- 1. Chen L-Y, et al. Phylogenomic analyses of Alismatales shed light into adaptations to aquatic environments. Mol. Biol. Evol. 2022;39:msac079. doi: 10.1093/molbev/msac079.
- 2. Unsworth RKF, Cullen-Unsworth LC, Jones BLH, Lilley RJ. The planetary role of seagrass conservation. Science. 2022;377:609–613. doi: 10.1126/science.abq6923.
- 3. Green, E. P. & Short, F. T. World Atlas of Seagrasses (Univ. California Press, 2003).
- 4. Röhr ME, et al. Blue carbon storage capacity of temperate eelgrass (Zostera marina) meadows. Glob. Biogeochem. Cycles. 2018;32:1457–1475. doi: 10.1029/2018GB005941.
- 5. Coyer JA, et al. Phylogeny and temporal divergence of the seagrass family Zosteraceae using one nuclear and three chloroplast loci. Syst. Biodivers. 2013;11:271–284. doi: 10.1080/14772000.2013.821187.
- 6. Waycott, M., Biffin, E. & Les, D. H. in Seagrasses of Australia: Structure, Ecology and Conservation (eds Larkum, A. W. D., Kendrick, G. A. & Ralph, P. J.) 129–154 (Springer International, 2018).
- 7. Harwell MC, Orth RJ. Long-distance dispersal potential in a marine macrophyte. Ecology. 2002;83:3319–3330. doi: 10.1890/0012-9658(2002)083[3319:LDDPIA]2.0.CO;2.
- 8. Marske KA, Rahbek C, Nogués-Bravo D. Phylogeography: spanning the ecology–evolution continuum. Ecography. 2013;36:1169–1181. doi: 10.1111/j.1600-0587.2013.00244.x.
- 9. Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, RoyChoudhury A. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol. 2012;29:1917–1932. doi: 10.1093/molbev/mss086.
- 10. Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinform. 2018;19:153. doi: 10.1186/s12859-018-2129-y.
- 11. Stange M, Sánchez-Villagra MR, Salzburger W, Matschiner M. Bayesian divergence-time estimation with genome-wide single-nucleotide polymorphism data of sea catfishes (Ariidae) supports Miocene closure of the Panamanian Isthmus. Syst. Biol. 2018;67:681–699. doi: 10.1093/sysbio/syy006.
- 12. Hewitt G. The genetic legacy of the Quaternary ice ages. Nature. 2000;405:907–913. doi: 10.1038/35016000.
- 13. Bringloe TT, Verbruggen H, Saunders GW. Unique biodiversity in Arctic marine forests is shaped by diverse recolonization pathways and far northern glacial refugia. Proc. Natl Acad. Sci. USA. 2020;117:22590–22596. doi: 10.1073/pnas.2002753117.
- 14. Neiva J, et al. Glacial vicariance drives phylogeographic diversification in the amphi-boreal kelp Saccharina latissima. Sci. Rep. 2018;8:1112. doi: 10.1038/s41598-018-19620-7.
- 15. Marko PB, et al. The ‘expansion–contraction’ model of Pleistocene biogeography: rocky shores suffer a sea change? Mol. Ecol. 2010;19:146–169. doi: 10.1111/j.1365-294X.2009.04417.x.
- 16. Hewitt, G. M. & Nichols, R. A. in Climate Change and Biodiversity (eds Lovejoy, T. E. & Hannah, L.) 176–192 (Yale Univ. Press, 2005).
- 17. Duffy JE, et al. A Pleistocene legacy structures variation in modern seagrass ecosystems. Proc. Natl Acad. Sci. USA. 2022;119:e2121425119. doi: 10.1073/pnas.2121425119.
- 18. Clark PU, et al. The Last Glacial Maximum. Science. 2009;325:710–714. doi: 10.1126/science.1172873.
- 19. Ma X, et al. Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass). F1000Research. 2021;10:289. doi: 10.12688/f1000research.38156.1.
- 20. Danilevicz MF, Tay Fernandez CG, Marsh JI, Bayer PE, Edwards D. Plant pangenomics: approaches, applications and advancements. Curr. Opin. Plant Biol. 2020;54:18–25. doi: 10.1016/j.pbi.2019.12.005.
- 21. Olsen JL, et al. The genome of the seagrass Zostera marina reveals angiosperm adaptation to the sea. Nature. 2016;530:331–335. doi: 10.1038/nature16548.
- 22. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
- 23. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 2005;14:2611–2620. doi: 10.1111/j.1365-294X.2005.02553.x.
- 24. Puechmaille SJ. The program structure does not reliably recover the correct population structure when sampling is uneven: subsampling and new estimators alleviate the problem. Mol. Ecol. Resour. 2016;16:608–627. doi: 10.1111/1755-0998.12512.
- 25. Li Y-L, Liu J-X. StructureSelector: a web-based software to select and visualize the optimal number of clusters using multiple methods. Mol. Ecol. Resour. 2018;18:176–177. doi: 10.1111/1755-0998.12719.
- 26. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 2006;23:254–267. doi: 10.1093/molbev/msj030.
- 27. Patterson N, et al. Ancient admixture in human history. Genetics. 2012;192:1065–1093. doi: 10.1534/genetics.112.145037.
- 28. Malinsky M, Matschiner M, Svardal H. Dsuite—fast D-statistics and related admixture evidence from VCF files. Mol. Ecol. Resour. 2021;21:584–595. doi: 10.1111/1755-0998.13265.
- 29. Marincovich L, Gladenkov AY. Evidence for an early opening of the Bering Strait. Nature. 1999;397:149–151. doi: 10.1038/16446.
- 30. Zhang C, Scornavacca C, Molloy EK, Mirarab S. ASTRAL-Pro: quartet-based species-tree inference despite paralogy. Mol. Biol. Evol. 2020;37:3292–3307. doi: 10.1093/molbev/msaa139.
- 31. Ogilvie HA, Bouckaert RR, Drummond AJ. StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Mol. Biol. Evol. 2017;34:2101–2114. doi: 10.1093/molbev/msx126.
- 32. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015;32:268–274. doi: 10.1093/molbev/msu300.
- 33. Schiffels S, Durbin R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 2014;46:919–925. doi: 10.1038/ng.3015.
- 34. Schiffels, S. & Wang, K. in Statistical Population Genomics pp. 147–166 (Humana, 2020).
- 35. Cortés AJ, López-Hernández F, Osorio-Rodriguez D. Predicting thermal adaptation by looking into populations’ genomic past. Front. Genet. 2020;11:564515. doi: 10.3389/fgene.2020.564515.
- 36. Gross CP, et al. The biogeography of community assembly: latitude and predation drive variation in community trait distribution in a guild of epifaunal crustaceans. Proc. R. Soc. B. 2022;289:20211762. doi: 10.1098/rspb.2021.1762.
- 37. Gallagher SJ, et al. The Pliocene to recent history of the Kuroshio and Tsushima Currents: a multi-proxy approach. Prog. Earth Planet. Sci. 2015;2:17. doi: 10.1186/s40645-015-0045-6.
- 38. Burton RS. Intraspecific phylogeography across the Point Conception biogeographic boundary. Evolution. 1998;52:734–745. doi: 10.2307/2411268.
- 39. Checkley DM, Barth JA. Patterns and processes in the California Current System. Prog. Oceanogr. 2009;83:49–64. doi: 10.1016/j.pocean.2009.07.028.
- 40. Talbot SL, et al. The structure of genetic diversity in eelgrass (Zostera marina L.) along the North Pacific and Bering Sea coasts of Alaska. PLoS ONE. 2016;11:e0152701. doi: 10.1371/journal.pone.0152701.
- 41. Laakkonen HM, Hardman M, Strelkov P, Väinölä R. Cycles of trans-Arctic dispersal and vicariance, and diversification of the amphi-boreal marine fauna. J. Evol. Biol. 2021;34:73–96. doi: 10.1111/jeb.13674.
- 42. Coyer JA, Hoarau G, Van Schaik J, Luijckx P, Olsen JL. Trans-Pacific and trans-Arctic pathways of the intertidal macroalga Fucus distichus L. reveal multiple glacial refugia and colonizations from the North Pacific to the North Atlantic. J. Biogeogr. 2011;38:756–771. doi: 10.1111/j.1365-2699.2010.02437.x.
- 43. Maggs CA, et al. Evaluating signals of glacial refugia for North Atlantic benthic taxa. Ecology. 2008;89:S108–S122. doi: 10.1890/08-0257.1.
- 44. Jenkins T, Castilho R, Stevens J. Meta-analysis of northeast Atlantic marine taxa shows contrasting phylogeographic patterns following post-LGM expansions. PeerJ. 2018;6:e5684. doi: 10.7717/peerj.5684.
- 45. Li, J.-J., Hu, Z.-M. & Duan, D.-L. in Seaweed Phylogeography: Adaptation and Evolution of Seaweeds Under Environmental Change (eds Hu, Z.-M. & Fraser, C.) 309–330 (Springer, 2016).
- 46. Olsen JL, et al. North Atlantic phylogeography and large-scale population differentiation of the seagrass Zostera marina L. Mol. Ecol. 2004;13:1923–1941. doi: 10.1111/j.1365-294X.2004.02205.x.
- 47. Larkum, A. W. D., Orth, R. J. & Duarte, C. M. Seagrasses: Biology, Ecology and Conservation (Springer, 2006).
- 48. Palumbi SR. Genetic divergence, reproductive isolation, and marine speciation. Annu. Rev. Ecol. Syst. 1994;25:547–572. doi: 10.1146/annurev.es.25.110194.002555.
- 49. Franssen SU, et al. Transcriptomic resilience to global warming in the seagrass Zostera marina, a marine foundation species. Proc. Natl. Acad. Sci. USA. 2011;108:19276–19281. doi: 10.1073/pnas.1107680108.
- 50. Bertelli CM, Unsworth RKF. Protecting the hand that feeds us: seagrass (Zostera marina) serves as commercial juvenile fish habitat. Mar. Pollut. Bull. 2014;83:425–429. doi: 10.1016/j.marpolbul.2013.08.011.
- 51.Reusch TBH, et al. Lower Vibrio spp. abundances in Zostera marina leaf canopies suggest a novel ecosystem function for temperate seagrass beds. Mar. Biol. 2021;168:149. doi: 10.1007/s00227-021-03963-3. [DOI] [Google Scholar]
- 52.Macreadie PI, et al. Blue carbon as a natural climate solution. Nat. Rev. Earth Environ. 2021 doi: 10.1038/s43017-021-00224-1. [DOI] [Google Scholar]
- 53.Stevenson A, Corcora TCÓ, Hukriede W, Schubert P, Reusch TBH. Substantial seagrass blue carbon pools in the southwestern Baltic Sea are spatially heterogeneous, mostly autochthonous, and include historically terrestrial peatlands. Front. Mar. Sci. 2022;9:949101. doi: 10.3389/fmars.2022.949101. [DOI] [Google Scholar]
- 54.Hämmerli A, Reusch TBH. Flexible mating: experimentally induced sex-ratio shift in a marine clonal plant. J. Evol. Biol. 2003;16:1096–1105. doi: 10.1046/j.1420-9101.2003.00626.x. [DOI] [PubMed] [Google Scholar]
- 55.Reusch TBH. Pollination in the marine realm: microsatellites reveal high outcrossing rates and multiple paternity in eelgrass Zostera marina. Heredity. 2000;85:459–465. doi: 10.1046/j.1365-2540.2000.00783.x. [DOI] [PubMed] [Google Scholar]
- 56.Yu L, et al. Somatic genetic drift and multilevel selection in a clonal seagrass. Nat. Ecol. Evol. 2020;4:952–962. doi: 10.1038/s41559-020-1196-4. [DOI] [PubMed] [Google Scholar]
- 57.Reusch TBH, Boström C, Stam WT, Olsen JL. An ancient eelgrass clone in the Baltic Sea. Mar. Ecol. Prog. Ser. 1999;183:301–304. doi: 10.3354/meps183301. [DOI] [Google Scholar]
- 58.Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Georganas, E. et al. In SC ‘15: Proc. International Conference for High Performance Computing, Networking, Storage and Analysis pp. 1–11 (2015).
- 60.Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202.
- 61.Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24:1403–1405. doi: 10.1093/bioinformatics/btn129.
- 62.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 63.Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (O’Reilly Media, 2020).
- 64.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
- 65.Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 66.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 67.Yu L, Stachowicz JJ, DuBois K, Reusch TBH. Detecting clonemate pairs in multicellular diploid clonal species based on a shared heterozygosity index. Mol. Ecol. Resour. 2023;23:592–600. doi: 10.1111/1755-0998.13736.
- 68.Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017;45:e18. doi: 10.1093/nar/gkw955.
- 69.Petit, R. J. & Vendramin, G. G. in Phylogeography of Southern European Refugia: Evolutionary Perspectives on the Origins and Conservation of European Biodiversity (eds Weiss, S. & Ferrand, N.) 23–97 (Springer, 2007).
- 70.Knaus BJ, Grünwald NJ. vcfr: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 2017;17:44–53. doi: 10.1111/1755-0998.12549.
- 71.Bandelt HJ, Forster P, Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 1999;16:37–48. doi: 10.1093/oxfordjournals.molbev.a026036.
- 72.Leigh JW, Bryant D. popart: full-feature software for haplotype network construction. Methods Ecol. Evol. 2015;6:1110–1116. doi: 10.1111/2041-210X.12410.
- 73.Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018;35:1547–1549. doi: 10.1093/molbev/msy096.
- 74.Pembleton LW, Cogan NOI, Forster JW. StAMPP: an R package for calculation of genetic differentiation and structure of mixed-ploidy level populations. Mol. Ecol. Resour. 2013;13:946–952. doi: 10.1111/1755-0998.12129.
- 75.Collin F-D, et al. Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest. Mol. Ecol. Resour. 2021;21:2598–2613. doi: 10.1111/1755-0998.13413.
- 76.Murphy GEP, et al. From coast to coast to coast: ecology and management of seagrass ecosystems across Canada. FACETS. 2021;6:139–179. doi: 10.1139/facets-2020-0020.
- 77.Jahnke M, et al. Seascape genetics and biophysical connectivity modelling support conservation of the seagrass Zostera marina in the Skagerrak–Kattegat region of the eastern North Sea. Evol. Appl. 2018;11:645–661. doi: 10.1111/eva.12589.
- 78.Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–W296. doi: 10.1093/nar/gkab301.
Data Availability Statement
Genome data have been deposited in GenBank (short read archive; Supplementary Data Table 3). Coding sequences of Z. japonica and Z. marina for the ASTRAL analysis can be found on figshare (10.6084/m9.figshare.21626327.v1). VCF files of the 11,705 core SNPs can be accessed at 10.6084/m9.figshare.21629471.v1. Source data for Fig. 1b,c are provided, together with statistics of sequencing coverage, mapping rate and further specifications of each sequenced library (Supplementary Tables 1–3). Source data are provided with this paper.
Custom scripts are deposited on GitHub for SNP filtering (github.com/leiyu37/populationGenomics_ZM.git), for clonemate detection (github.com/leiyu37/Detecting-clonemates.git), for heterozygosity and nucleotide diversity quantification (github.com/leiyu37/populationGenomics_ZM.git), and for preparing SplitsTree input files (https://github.com/leiyu37/populationGenomics_ZM/blob/main/10_SplitsTree/vcf2alignment.py) and SNAPP input files (github.com/mmatschiner/snapp_prep). Scripts for calculating D-statistics are available at github.com/mmatschiner/tutorials/blob/master/analysis_of_introgression_with_snp_data/src/plot_d.rb. Scripts to prepare the gene presence/absence analysis are deposited at https://github.com/leiyu37/populationGenomics_ZM/tree/main/gene_presense_absence_analysis. Further software for the MSMC analysis is available at http://lh3lh3.users.sourceforge.net/snpable.shtml (generation of a mappability mask file for each of the six chromosomes using SNPable) and at https://github.com/stschiff/msmc-tools (generation of a ramet-specific mask file from a BAM file using bamCaller.py).
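For readers who want a quick look at the deposited VCF of core SNPs before running the full pipeline, the following is a minimal sketch of a per-sample observed heterozygosity summary. It is not one of the authors' scripts: the function name `observed_heterozygosity`, the sample names `ramet_A` and `ramet_B`, and the four inline records are hypothetical and serve only to illustrate the standard VCF genotype layout for a diploid, biallelic call set.

```python
import io


def observed_heterozygosity(vcf_handle):
    """Return {sample: fraction of called diploid genotypes that are heterozygous}.

    Hypothetical helper for illustration; assumes an uncompressed,
    tab-delimited VCF with a GT key in the FORMAT column.
    """
    samples = []
    het = {}
    called = {}
    for line in vcf_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at column 10
            het = {s: 0 for s in samples}
            called = {s: 0 for s in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            genotype = entry.split(":")[gt_index].replace("|", "/")
            alleles = genotype.split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # missing or non-diploid call: not counted
            called[sample] += 1
            if alleles[0] != alleles[1]:
                het[sample] += 1
    return {
        s: (het[s] / called[s]) if called[s] else float("nan")
        for s in samples
    }


# Hypothetical toy records, not taken from the deposited data.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tramet_A\tramet_B",
    "Chr01\t101\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0",
    "Chr01\t250\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1",
    "Chr02\t77\t.\tG\tA\t.\tPASS\t.\tGT:DP\t0/1:12\t./.:0",
    "Chr02\t900\t.\tT\tC\t.\tPASS\t.\tGT\t0/0\t0|1",
]) + "\n"

if __name__ == "__main__":
    for name, value in observed_heterozygosity(io.StringIO(EXAMPLE_VCF)).items():
        print(name, round(value, 3))
```

To apply the same function to a downloaded file, pass an open text handle instead of the `io.StringIO` wrapper. Missing genotypes are excluded from the denominator, so samples with uneven call rates remain comparable; this is a design choice of the sketch and may differ from the deposited scripts.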