Abstract
Twentieth-century industrial whaling pushed several species to the brink of extinction, with fin whales being the most impacted. However, a small, resident population in the Gulf of California was not targeted by whaling. Here, we analyzed 50 whole genomes from the Eastern North Pacific (ENP) and Gulf of California (GOC) fin whale populations to investigate their demographic history and the genomic effects of natural and human-induced bottlenecks. We show that the two populations diverged ~16,000 years ago, after which the ENP population expanded and then suffered a 99% reduction in effective size during the whaling period. In contrast, the GOC population remained small and isolated, receiving fewer than one migrant per generation. Nevertheless, this low level of migration has been crucial for maintaining its viability. Our study exposes the severity of whaling, emphasizes the importance of migration, and demonstrates the use of genome-based analyses and simulations to inform conservation strategies.
Subject terms: Conservation genomics, Genetic variation, Evolutionary biology, Evolutionary genetics
Industrial whaling drove several species to near extinction. From an analysis of 50 whole genomes from fin whale populations, this study shows that the Eastern North Pacific fin whale population was reduced by 99% during whaling but has maintained its genomic diversity, whereas the Gulf of California population remained small and isolated, resulting in increased genetic load.
Introduction
Due to increasing human impacts, many vertebrate species have recently experienced drastic population declines and now persist as small, fragmented populations1–3. Small populations are at higher risk of further declines due to stochastic environmental and genetic factors4–6. Both anthropogenic and naturally occurring declines strengthen genetic drift, which reduces genetic diversity and increases inbreeding and genetic load, diminishing the long-term survival and adaptive potential of populations7,8. However, the impact of these processes depends on population-specific demographic histories and life-history traits, which are often unknown. For example, gene flow as low as one effective migrant per generation may counteract genetic drift and reduce the frequency of deleterious variation9–11, but it might also reduce metapopulation genetic variation12 or introduce strongly deleterious alleles13. Therefore, uncovering population history and determining how detrimental genetic patterns arise in declining populations are challenging, but the answers are critical to developing effective conservation strategies14.
Industrial whaling during the 20th century is arguably one of the most disruptive ecological events caused by humans15: it decimated all great whale species and drove many of them to the brink of extinction16,17. Estimating the decline of whale populations is crucial for evaluating the full impact of whaling, not only on whale abundance but on entire ecosystems, and for designing appropriate recovery policies15,17,18. However, quantifying the magnitude of known recent declines in endangered vertebrate species from contemporary samples has proven difficult, because estimates based on genetic diversity capture long-term effective sizes rather than recent demographic events19,20. Additionally, the long life span and generation time of whales complicate the inference of recent population size changes21, because less generational turnover occurs in a given amount of time. Given these challenges, previous genetic studies using contemporary samples have only indirectly inferred the impact of whaling, by finding that historical abundance estimates obtained from whaling records and ecological studies are orders of magnitude lower than those based on the diversity of a few mitochondrial or nuclear markers17–24, suggesting a slower recovery of whale populations after the end of whaling. To overcome these challenges, we used high-coverage whole-genome sequence data and model-driven approaches, which provide more power and resolution to directly detect recent demographic changes such as whaling19,25.
The fin whale (Balaenoptera physalus) is the second-largest whale species and the one most impacted by industrial whaling worldwide. In the North Pacific alone, more than 75,500 fin whales were harvested26. However, fin whales in the Gulf of California, Mexico, belong to a resident population that was not targeted by whalers27,28. Nevertheless, this population has remained small, with limited gene flow to and from the Pacific, for thousands of years28–31. In contrast, the Eastern North Pacific population was large, interconnected, and overexploited27, although the population along the U.S. west coast has shown evidence of growth at 3% per year since the 1990s32.
Here, we provide direct genome-wide demographic reconstructions of whaling in a previously large population and compare it with a never-whaled but small and isolated population. We analyze and model the whole-genome diversity of fin whale populations with contrasting demographic histories to identify the genetic and evolutionary impacts of population reductions in large, long-lived marine mammals. Understanding the complex interaction between the demographic and evolutionary factors shaping genetic diversity in whale populations is key to improving their conservation, especially given current and future whaling threats and the challenges of climate change and human inputs to marine ecosystems17. Evaluating the genomic consequences of contrasting population reductions in fin whales makes our results relevant to the conservation of other threatened or endangered species.
Results
Sampling, population structure, and differentiation
To assess the genome-wide impact of human-induced and natural bottlenecks on fin whale populations, we generated high-coverage (average 27×) whole-genome resequencing data from 50 free-ranging individuals sampled between 1995 and 2017 (Fig. 1A; Table S1). Thirty individuals are from the Eastern North Pacific (ENP) population, which survived intensive whaling pressure, and were sampled along the coasts of California (CA; N = 9), Oregon (OR; N = 4), Washington (WA; N = 2), British Columbia (BC; N = 3) and Alaska (AK; N = 12). The remaining 20 individuals are from a naturally small population in the Gulf of California, Mexico (GOC), which has maintained a population size of 300 to 600 individuals for thousands of years and avoided the impacts of whaling27,30,31.
The sequences were aligned, genotyped, annotated and filtered using the minke whale genome (BalAcu1.0) as a reference. We also genotyped a subset of ten individuals using a recently available fin whale genome assembly (GCA_023338255.1). Using the minke whale reference overestimated diversity by only 1.5%, which could be due to less accurate mapping (see Supplemental Discussion). Both reference genomes also yielded similar genotyping statistics and genomic diversity results (Table S2; Fig. S1; Supplemental Methods and Results), suggesting that using the minke whale genome as a reference does not introduce significant biases in our analyses (see discussion and significance tests in Supplemental Results and Discussion). Principal component analysis (PCA) separated the ENP and GOC individuals on PC1, with tight clustering of the GOC samples (Fig. 1B). The ENP samples were more dispersed, although the Alaska samples remained relatively clustered, suggesting some degree of differentiation of this northern population from those to the south (Fig. S2). Admixture analysis of all samples supports a K = 2 partition of ENP and GOC samples (Figs. 1C, S3). We identified one ~50% admixed individual from each population (ENPCA09 and GOC010) and a small admixture fraction from GOC in the ENP population (Fig. 1B, C). Additional admixture analysis of only ENP samples supports a K = 1 partition of this population (Fig. S4). FST is higher between the GOC and ENP (FST = 0.073, p = 0.001) than among locations within the ENP (FST = 0–0.008; Table S3). Even at the highest within-ENP FST of 0.008, this substructure would inflate effective population size (Ne) estimates by at most 0.8%33. A phylogenetic analysis also separated the two populations into distinct groups, with no bootstrap support for nodes within the ENP group. The two admixed individuals clustered with ENP but showed early divergence (Fig. S5), suggesting greater genetic differentiation. These results indicate there are two main populations in our sample, one off the Pacific coast and the other in the Gulf of California, consistent with previous microsatellite and mitochondrial data30,31. In addition, our findings confirm the strong isolation of the geographically distinct Gulf population30,34, whereas only weak population substructure was observed in the Eastern North Pacific.
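The 0.8% upper bound can be checked with a one-line calculation. Assuming Wright's island model (our assumption; the text cites ref. 33 without giving the formula), the effective size of a subdivided population relates to the summed size of its subpopulations, N_T, as below, so the relative inflation is F_ST/(1 − F_ST):

```latex
N_e^{\mathrm{meta}} \approx \frac{N_T}{1 - F_{ST}}
\quad\Longrightarrow\quad
\frac{N_e^{\mathrm{meta}} - N_T}{N_T} = \frac{F_{ST}}{1 - F_{ST}} = \frac{0.008}{0.992} \approx 0.0081
```

That is, about 0.8%, in line with the bound stated for the within-ENP substructure.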
Genome-wide patterns of variation and runs of homozygosity
We explored genome-wide diversity in fin whale populations by calculating average genome-wide heterozygosity and per-site heterozygosity in non-overlapping 1-Mb windows. GOC individuals showed reduced variation, with an average of 1.13 heterozygotes per kb (het/kb) and a high proportion of genomic regions with low heterozygosity (46% of windows with <1 het/kb). In contrast, the ENP population had much higher diversity (1.76 het/kb; two-tailed Mann–Whitney U [MWU] test p = 1.15E-10; Fig. 2A) and few regions of low heterozygosity (12% of windows with <1 het/kb; Figs. 2B, S6, S7). These genome-wide results imply contrasting demographic histories: a long-term small population in the Gulf and a large one in the North Pacific30. Compared with other cetaceans that experienced different levels of population contraction, such as the diminutive vaquita porpoise (Phocoena sinus) in the Gulf of California35,36 (0.1 het/kb), the abundant minke whale37 (0.6 het/kb) and the endangered blue whale38 (2.1 het/kb), the GOC fin whales have maintained moderate genome-wide variation (Fig. 2A), suggesting that evolutionary mechanisms such as migration have maintained genetic diversity. However, the GOC population has more 1-Mb windows with null or very low heterozygosity (0–0.1 het/kb) than more endangered mysticete species such as the North Atlantic right whale and blue whale (Fig. S8). This indicates that populations of these endangered species were historically larger than the Gulf of California fin whale population, and suggests that the GOC population may need to be reassessed toward a more threatened status.
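The windowed heterozygosity summary described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes heterozygous calls are already filtered and that each window's count is divided by its number of callable sites (a hypothetical input, `callable_bp`). The "<1 het/kb" fraction then follows directly.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # non-overlapping 1-Mb windows, as described in the text


def window_het_per_kb(het_positions, callable_bp):
    """Heterozygotes per kb in each non-overlapping 1-Mb window for one individual.

    het_positions: iterable of (chrom, pos) for heterozygous calls (0-based pos).
    callable_bp: dict mapping (chrom, window_index) -> callable bp in that window.
        Normalizing by callable sites is an assumption of this sketch.
    """
    counts = defaultdict(int)
    for chrom, pos in het_positions:
        counts[(chrom, pos // WINDOW_BP)] += 1
    return {
        key: counts[key] / (n_bp / 1000)
        for key, n_bp in callable_bp.items()
        if n_bp > 0  # skip windows with no callable sites
    }


def fraction_below(het_per_kb, threshold=1.0):
    """Fraction of windows with fewer than `threshold` het/kb (e.g. <1 het/kb)."""
    values = list(het_per_kb.values())
    return sum(v < threshold for v in values) / len(values) if values else 0.0


# Toy usage: window 0 carries 1,500 hets in 1 Mb, window 1 carries 200 hets in 0.5 Mb.
hets = [("chr1", i * 600) for i in range(1500)]
hets += [("chr1", 1_000_000 + i * 1000) for i in range(200)]
windows = window_het_per_kb(hets, {("chr1", 0): 1_000_000, ("chr1", 1): 500_000})
low_fraction = fraction_below(windows)  # one of two windows is below 1 het/kb -> 0.5
```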
To characterize the history of inbreeding, we identified runs of homozygosity (ROH), genomic stretches within an individual that are assumed to be identical by descent, using two model-based methods39,40 (Fig. S9). Long ROH (≥5 Mb) typically result from recent close inbreeding, whereas shorter ROH indicate either older inbreeding or older reductions in population size41. Overall, GOC individuals contained considerably more ROH segments than ENP individuals (two-tailed MWU test p = 9.42E-08), but most ROH were short (0.1–1 Mb) or intermediate (1–5 Mb) in length (Fig. 2A). Long ROH were present in all GOC individuals except the admixed sample GOC010, but in only three ENP individuals. Nevertheless, they comprise a small fraction of total ROH length in both populations (FROH≥5Mb = 0.4–3.1%; Table S4). To further explore the timing of inbreeding, we estimated, for each population and ROH category, the average time at which the two homologous haplotypes in an ROH coalesce, assuming a recombination rate of 1 cM/Mb42. For short ROH, haplotypes coalesced on average approximately 145 and 250 generations ago in GOC and ENP, respectively, whereas for intermediate ROH the average coalescent times were 28 and 30 generations ago. These findings suggest a lack of recent inbreeding in both populations (Figs. 2A, S10). However, the greater number and length of ROH in the GOC fin whales (Figs. 2A, S9, S10), together with the high proportion of their genomes contained in ROH longer than 1 Mb (FROH≥1Mb(GOC) = 17.5–23.4%; Table S4), indicate that genomic segments in this population share a more recent common ancestor than they do in the Pacific population.
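The conversion from ROH length to coalescence time can be illustrated with the relation g = 100/(2L), where L is the ROH length in cM (Thompson 2013). We assume this is the relation behind the estimates above; the text cites ref. 42 without giving the formula. It shows why the short (0.1–1 Mb) and intermediate (1–5 Mb) classes map to roughly 50–500 and 10–50 generations, bracketing the reported averages of ~145–250 and ~28–30 generations, while ROH ≥5 Mb point to common ancestors within about ten generations.

```python
def roh_coalescence_generations(length_mb, cm_per_mb=1.0):
    """Approximate generations to the common ancestor of the two haplotypes in an ROH.

    Uses the expectation that an ROH whose haplotypes coalesced g generations ago
    has mean length 100 / (2g) cM, i.e. g = 100 / (2 * L_cM).
    """
    length_cm = length_mb * cm_per_mb  # 1 cM/Mb as assumed in the text
    return 100.0 / (2.0 * length_cm)


# Boundaries of the ROH length classes used in the text (in Mb).
class_bounds = {mb: roh_coalescence_generations(mb) for mb in (0.1, 1.0, 5.0)}
# 0.1 Mb -> ~500 generations, 1 Mb -> 50, 5 Mb -> 10
```

Averaging per-ROH values of g within a class and converting the mean ROH length give different answers, so the exact averaging scheme matters when reproducing the reported numbers.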
Finally, we estimated relatedness between individuals in both populations and found a significantly higher average kinship coefficient among GOC individuals (0.054) than in the ENP population (0.0032; two-tailed MWU test p < 2.2E-16), indicating greater identity by descent in the GOC and further supporting higher inbreeding levels in this population (Fig. S11A). When we divided the ENP into location groups to account for its larger geographic coverage, kinship remained significantly higher in the GOC (Fig. S11B, C). In summary, these results reflect the greater historical isolation and small population size of the GOC29, together with a lack of recent inbreeding in both populations.
Demographic inference of whaling, divergence and gene flow
We reconstructed the demographic history of fin whale populations using the site frequency spectrum (SFS) to assess the impact of whaling on the Eastern North Pacific population and to determine the demographic events that have shaped the genomic diversity of the Gulf of California population. First, using the SFS from each population, we tested single-population models of change in effective size (Ne) with coalescent43 (fastsimcoal2) and diffusion approximation44 (∂a∂i) methods. We assumed a generation time of 25.9 years45 and a mutation rate of 2.77E-08 mutations/bp/generation37, and tested several nested models with increasing numbers of size-change epochs (Fig. S12). Both inference methods gave concordant results; ∂a∂i results are reported throughout the text unless noted otherwise (see Tables S5–S7 for fastsimcoal2 results and all 95% confidence interval [CI] values). A 3-epoch model was the best fit for the ENP population (Figs. 3A, B, S13A; Tables S5, S6) and revealed an expansion starting ~115 thousand years ago (kya; 4,424 generations), from an ancestral Ne of 16,479 to 23,913. This was followed by a severe decline only 26 years (one generation, fastsimcoal2 estimate; 95% CI: 0–2 generations) or 52 years (two generations, ∂a∂i estimate; 95% CI: 1.89–2.11 generations) before present to a current Ne of 305 individuals (95% CI: 0–1137; Fig. 3A, B; Table S7), an ~99% reduction. To further verify the timing and magnitude of this recent reduction, we implemented a grid search (Fig. S14; Supplemental Methods and Supplemental Results), performed additional inference runs varying the time of the whaling reduction (Tables S5, S7), used different optimization methods (Table S8), confirmed our power to detect such a recent decline using coalescent SFS simulations under this model (Fig. S15), and ran supplementary inferences on an SFS without genotype-call filtering to avoid bias against rare alleles (Tables S9, S10; Supplemental Methods and Supplemental Discussion). These additional analyses consistently supported a drastic reduction one or two generations ago. Since the average collection year for samples from this population was 2006 (Table S1), the estimated times of the reduction correspond to the years 1954 to 1980, coinciding with the most intense whaling period this population suffered, between 1940 and 198026,27.
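The parameter values above are linked by simple scalings, sketched below. The generation time and mutation rate are those stated in the text; the θ = 4·Ne·μ·L scaling is the usual convention for converting a diffusion-based θ into a reference Ne, and because the sequence length L used in the paper is not given here, it appears only as a hypothetical argument. The arithmetic reproduces the ~115 kya expansion, the 1954–1980 window for the decline, and the ~99% reduction.

```python
GENERATION_TIME_YR = 25.9  # generation time assumed in the text (ref. 45)
MUTATION_RATE = 2.77e-8    # mutations/bp/generation, as in the text (ref. 37)


def ne_from_theta(theta, mu, sequence_length_bp):
    """Reference Ne from a diffusion-style theta, assuming theta = 4 * Ne * mu * L."""
    return theta / (4.0 * mu * sequence_length_bp)


def generations_to_years(generations, gen_time=GENERATION_TIME_YR):
    return generations * gen_time


def percent_reduction(ne_before, ne_after):
    return 100.0 * (1.0 - ne_after / ne_before)


expansion_kya = generations_to_years(4424) / 1000            # ~114.6 kya (reported ~115 kya)
decline_years = [generations_to_years(g) for g in (1, 2)]     # ~26 and ~52 years
decline_calendar = [round(2006 - y) for y in decline_years]   # mean sampling year 2006 -> [1980, 1954]
reduction = percent_reduction(23_913, 305)                    # ~98.7%, i.e. ~99%
```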
For the Gulf of California population, none of the single-population models produced an expected SFS that fit the data well (Fig. S13B). Additionally, the models with the best likelihood did not converge or yield concordant parameter estimates between inference methods (Tables S5, S6, S7), which can indicate overparameterization (see Supplemental Results). We therefore inferred the demographic history of the Gulf whales using a two-population model (described below), because two-population models have been shown to contain more information than single-population models and to improve demographic inference46.
We estimated the time of divergence and migration rates between the two populations by testing several two-population models based on the joint SFS of ENP and GOC (Figs. S16, S17; Table S5). The model with an ancestral size change before the populations diverged fits our data well (Figs. 3C, S17; Table S5), is consistent between inference methods (Tables S11, S12) and is biologically plausible; we therefore chose it as our best model (see Supplemental Results). Under this model, the ancestral population expanded from ~16,000 to ~25,000 effective individuals more than 100 kya (4322 generations), before the populations separated. The populations then split between 16 and 25 kya (616 and 960 generations; ∂a∂i and fastsimcoal2 estimates, respectively). Thereafter, the ENP population remained at Ne = 17,386 until it recently crashed due to whaling, as shown by the single-population model. By contrast, the GOC population remained small after the divergence, at Ne = 114. The model also inferred asymmetrical gene flow, with a higher migration rate from the Pacific into the Gulf population (3.42E-03; fraction of individuals that are migrants) than in the opposite direction (9.24E-05; Table S11). However, when scaled by the receiving population's effective size, these rates correspond to a long-term effective migration of 0.39 immigrants per generation into the Gulf and 1.61 into the Pacific population (Fig. 3C).
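The effective-migration figures follow from multiplying each migration rate (the fraction of the receiving population made up of migrants each generation) by the receiving population's Ne, and the split times follow from the 25.9-year generation time. A minimal sketch of that arithmetic, using only values given in the text:

```python
GENERATION_TIME_YR = 25.9  # generation time assumed in the text


def effective_migrants(m, ne_receiving):
    """Migrants per generation (Nm): migrant fraction m times the receiving population's Ne."""
    return m * ne_receiving


NE_GOC, NE_ENP = 114, 17_386  # two-population model estimates from the text
nm_into_goc = effective_migrants(3.42e-3, NE_GOC)  # ~0.39 migrants/generation
nm_into_enp = effective_migrants(9.24e-5, NE_ENP)  # ~1.61 migrants/generation
split_kya = [g * GENERATION_TIME_YR / 1000 for g in (616, 960)]  # ~16 and ~25 kya
ancestral_expansion_kya = 4322 * GENERATION_TIME_YR / 1000       # ~112 kya
```

The scaling explains the apparent reversal: the per-capita rate into the Gulf is ~37 times higher, but because the GOC is so small, it translates into fewer actual migrants than the much lower rate into the large ENP population.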
To test whether unsampled (ghost) populations contributed migrants to the GOC, we ran additional two-population models incorporating plausible ghost populations: the South Pacific and the western North Pacific (WNP). The model with a ghost WNP population had a higher log-likelihood (Table S13) but did not considerably increase total migration into the Gulf of California (the migration rate and effective migration from the ghost WNP into the GOC were 2.09E-04 and 0.01, respectively; Table S14; Fig. S18), indicating that migration from ghost populations into the GOC is negligible and does not affect our estimates. However, the ghost population models revealed that the divergence between the ancestral ENP and ghost WNP populations, around 4300 generations ago, matches the expansion observed in both the single-population ENP and two-population models (Supplemental Discussion; Figs. 3A, C, S18; Tables S7, S11, S14).
Our results suggest that the GOC population was founded during the Last Glacial Maximum47, toward the end of the Wisconsin glaciation, and has remained small and highly isolated since then, receiving fewer than one migrant per generation (Fig. 3C). These findings differ substantially from estimates based on mitochondrial and microsatellite loci, which predicted more recent divergence times of ~2300 or 9300 years before present (123 or 360 generations ago, respectively) and ~1 migrant per generation30,31 (see Supplemental Discussion). Our results therefore emphasize the greater resolution of whole-genome resequencing data for demographic inference, owing to the large number of independent genealogies sampled20, compared with a handful of microsatellite loci30 or a single maternally inherited, non-recombining marker (mitochondrial DNA).
Putatively deleterious variation and genetic load
Our demographic inference suggests a historically large population size followed by a recent contraction in the ENP population, and a high degree of isolation for the GOC population. To assess how these demographic trajectories have affected fitness, we examined variants in coding regions, which are more likely to have functional effects. Derived alleles were classified into four mutation types: synonymous, tolerated nonsynonymous (SIFT score ≥0.05), putatively deleterious nonsynonymous (SIFT score <0.05), and loss-of-function (LOF; identified using snpEff, details in Methods). Synonymous and tolerated nonsynonymous mutations serve as proxies for neutral variants, whereas putatively deleterious nonsynonymous and LOF mutations are proxies for deleterious variants48. Although amino-acid-changing variants could be candidates for local adaptation, most of them are deleterious49,50. Because dominance coefficients of variants in natural populations are poorly quantified, we considered two extreme scenarios: all variants are either fully recessive (h = 0) or fully additive (h = 0.5).
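The four-way classification can be expressed as a small decision function. This is a sketch under stated assumptions: consequence labels are taken to be Sequence Ontology terms such as those emitted by snpEff, a SIFT score is assumed to be available for each missense variant, and the LOF term list is hypothetical.

```python
from typing import Optional

SIFT_DELETERIOUS_BELOW = 0.05  # SIFT threshold used in the text

# Hypothetical, illustrative set of consequence terms treated as LOF; the paper's
# snpEff-based LOF criteria are described in its Methods and may differ.
LOF_TERMS = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
}


def classify_variant(consequence: str, sift_score: Optional[float] = None) -> Optional[str]:
    """Assign a coding variant to one of the four mutation types used in the text."""
    if consequence in LOF_TERMS:
        return "LOF"
    if consequence == "synonymous_variant":
        return "synonymous"
    if consequence == "missense_variant" and sift_score is not None:
        if sift_score < SIFT_DELETERIOUS_BELOW:
            return "deleterious_nonsynonymous"
        return "tolerated_nonsynonymous"
    return None  # non-coding, unscored missense, or other consequence types
```

Note the boundary: a SIFT score of exactly 0.05 counts as tolerated, matching the "≥0.05" definition in the text.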
For all four mutation types, heterozygosity is significantly depleted and homozygosity significantly elevated in the GOC population (MWU tests p = 2.9E-12 in all comparisons; Table S15). This pattern has not been reported in other fin whale populations or great whale species25 and is consistent with reduced genome-wide heterozygosity and small population size. The number of homozygous-derived putatively deleterious nonsynonymous genotypes per individual was on average 39.68% higher in the GOC (2079) than in the ENP population (1488). Similarly, the number of homozygous-derived LOF genotypes was on average 28.98% higher in the Gulf (140) than in the Pacific population (108; Fig. 4A). Assuming that these putatively deleterious mutations are at least partially recessive, this increased homozygosity in the GOC is predicted to reduce fitness51.
When deleterious mutations act additively, genetic load is determined by the number of derived alleles per genome. As expected52, the ENP and GOC populations carried similar numbers of derived neutral alleles (Table S15). Among putatively deleterious mutations, only nonsynonymous alleles showed a significant elevation (2.03%) in the GOC population (GOC average = 5983, ENP average = 5864, MWU test p = 1.20E-07), whereas the number of LOF alleles was similar in the two populations (p = 0.87; Fig. 4B). Assuming these nonsynonymous alleles are slightly deleterious, the small size of the GOC population likely increased the strength of genetic drift and decreased the efficacy of selection relative to the larger ENP population, allowing deleterious variants to persist in the Gulf. By contrast, the similar number of LOF alleles indicates that, despite the GOC population's small size, purifying selection has remained effective at eliminating the most deleterious mutations. Overall, these results imply a slight increase in genetic load in the GOC population if deleterious mutations are additive.
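Under the two dominance scenarios, the per-individual quantities compared above reduce to simple counts over genotypes coded as the number of derived alleles. A minimal sketch, assuming genotypes are already polarized to the derived allele and split by mutation class (both assumptions; the paper's pipeline is described in its Methods):

```python
from typing import Iterable, Optional, Tuple


def load_proxies(genotypes: Iterable[Optional[int]]) -> Tuple[int, int]:
    """Per-individual load proxies for one mutation class.

    genotypes: derived-allele counts (0, 1 or 2) per site; None marks a missing call.
    Returns (derived homozygotes, derived alleles):
      - recessive model (h = 0): only derived homozygotes reduce fitness
      - additive model (h = 0.5): every derived allele contributes
    """
    called = [g for g in genotypes if g is not None]
    homozygous_derived = sum(1 for g in called if g == 2)
    derived_alleles = sum(called)
    return homozygous_derived, derived_alleles


def percent_elevation(value, reference):
    """Percent by which `value` exceeds `reference`."""
    return 100.0 * (value / reference - 1.0)


example = load_proxies([0, 1, 2, 2, None, 1])     # -> (2, 6)
nonsyn_elevation = percent_elevation(5983, 5864)  # ~2.03% (GOC vs ENP averages)
```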
Finally, we computed the RXY (relative accumulation of derived alleles) and R2XY (relative accumulation of derived homozygotes) statistics, which compare the expected numbers of derived alleles or derived homozygotes found in one population but not the other53 (Fig. 4C). Among the four mutation types, only deleterious nonsynonymous alleles showed a relative accumulation of derived alleles in the GOC (RGOC/ENP = 1.04, Z-score p = 0.02), mirroring the allele-count pattern (Fig. 4B). However, R2XY was significantly elevated in the GOC population for all mutation types (Z-score p < 0.001 for all comparisons), consistent with the higher homozygosity in the GOC (Fig. 4A). To rule out software bias, we repeated these analyses using snpEff's mutation impact categories (high, moderate and low; see Methods) and found similar results (Fig. S19). In summary, these results suggest an increase in genetic load in the GOC population, due both to a shift toward higher homozygosity across all protein-coding variants and to an overall accumulation of putatively deleterious nonsynonymous alleles relative to the ENP population. However, the magnitude of the effect on fitness is unclear, given uncertainties about the selection and dominance coefficients of these mutations51.
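The R_X/Y statistic can be sketched from per-site derived-allele frequencies in the two samples. We follow the commonly used definition of Do et al. (2015), L_X,notY = Σ f_X(1 − f_Y), with an optional normalization by a putatively neutral reference class; we assume, but cannot confirm from the text, that ref. 53 uses this form. R2XY is computed analogously from derived-homozygote frequencies and is omitted here, as is the significance testing behind the reported Z-scores.

```python
from typing import Optional, Sequence


def l_x_not_y(freq_x: Sequence[float], freq_y: Sequence[float]) -> float:
    """Sum over sites of f_x * (1 - f_y): derived alleles expected in X but not in Y."""
    return sum(fx * (1.0 - fy) for fx, fy in zip(freq_x, freq_y))


def r_xy(freq_x: Sequence[float], freq_y: Sequence[float],
         ref_x: Optional[Sequence[float]] = None,
         ref_y: Optional[Sequence[float]] = None) -> float:
    """R_X/Y for one mutation class; optionally normalized by a reference class."""
    ratio = l_x_not_y(freq_x, freq_y) / l_x_not_y(freq_y, freq_x)
    if ref_x is not None and ref_y is not None:
        ratio /= l_x_not_y(ref_x, ref_y) / l_x_not_y(ref_y, ref_x)
    return ratio


symmetric = r_xy([0.2, 0.5], [0.5, 0.2])  # 1.0: no relative accumulation
skewed = r_xy([0.6], [0.2])                # 0.48 / 0.08 = 6.0: accumulation in X
```

Values above 1 indicate a relative excess of derived alleles in population X, so RGOC/ENP = 1.04 corresponds to a modest excess in the GOC.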
Simulations of deleterious variation and genetic load
To further explore how fin whale demographic history and the recent whaling-induced decline have shaped patterns of deleterious variation and the accumulation of genetic load, we ran forward-in-time genetic simulations using SLiM v.3.3.254. We simulated a 10-Mb chromosomal segment containing intergenic, intronic, and exonic regions. Selection coefficients for deleterious nonsynonymous mutations were drawn from a distribution estimated from humans55, and dominance coefficients were set such that the most deleterious mutations were highly recessive, whereas nearly neutral mutations were closer to additive (see Methods for details).
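The qualitative dominance rule described above (recessive for strongly deleterious mutations, closer to additive for nearly neutral ones) can be illustrated outside SLiM. The sketch draws selection coefficients from a gamma distribution of fitness effects and assigns h with a step function; the shape, mean and thresholds are hypothetical placeholders chosen for illustration, not the paper's parameters.

```python
import random

# Hypothetical placeholder parameters, NOT the values used in the paper
# (the human-derived DFE and dominance scheme are given in its Methods).
GAMMA_SHAPE = 0.2
MEAN_ABS_S = 0.01


def draw_selection_coefficient(rng: random.Random) -> float:
    """Draw s <= 0 from a gamma DFE with mean |s| = MEAN_ABS_S, capped at lethal (-1)."""
    scale = MEAN_ABS_S / GAMMA_SHAPE  # gamma mean = shape * scale
    return -min(rng.gammavariate(GAMMA_SHAPE, scale), 1.0)


def dominance_for(s: float) -> float:
    """Illustrative step function: strongly deleterious -> recessive, nearly neutral -> ~additive."""
    if s < -0.01:
        return 0.0
    if s < -0.001:
        return 0.01
    if s < -0.0001:
        return 0.1
    return 0.4


rng = random.Random(1)  # fixed seed for reproducibility
sample = [draw_selection_coefficient(rng) for _ in range(1000)]
dominance = [dominance_for(s) for s in sample]
```

In SLiM itself, dominance is a property of a mutation type, so a rule like this is typically implemented by defining one mutation type per dominance class (or with a fitness callback).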
Using this simulation framework, we first investigated the extent to which the recent whaling bottleneck may have increased genetic load in the ENP population. Specifically, we simulated under our best-fit ENP demographic model, which includes a contraction to Ne = 305 two generations ago (Fig. 3A). After two generations at Ne = 305, we did not observe any changes in genetic load, heterozygosity, or inbreeding, as expected given the short duration of this decline (Fig. 5A). To explore how potential recovery scenarios may affect the future viability of the ENP population, we continued these simulations for an additional 18 generations after the decline, during which genetic load and inbreeding increased, though with minimal impact on genetic diversity (Fig. 5A). To test the effect of a partial recovery in the ENP, we also ran simulations in which the effective population size increased to Ne = 1000 after two generations at Ne = 305. In this scenario, we observed minimal increases in genetic load and inbreeding, suggesting that even a modest recovery would largely avert deleterious genetic effects (Fig. 5A). Overall, these results highlight the importance of a prompt recovery to minimize the deleterious genetic impacts of the whaling bottleneck.
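A back-of-the-envelope check, separate from the SLiM simulations, helps explain why two generations at Ne = 305 leave diversity essentially unchanged. Under neutral drift alone, assuming discrete generations and no mutation, selection or migration, heterozygosity decays as H_t = H_0(1 − 1/(2Ne))^t:

```python
def expected_het_retained(ne: float, generations: int) -> float:
    """Fraction of heterozygosity retained under drift alone: (1 - 1/(2*Ne))**t."""
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


enp_after_2 = expected_het_retained(305, 2)     # ~0.997: whaling bottleneck so far
enp_after_20 = expected_het_retained(305, 20)   # ~0.968: 18 further generations at Ne = 305
goc_isolated = expected_het_retained(114, 616)  # ~0.067: GOC since the split, no migration
```

With the GOC values (Ne = 114 over the 616 generations since the ∂a∂i split time) and no migration, the same expression retains only ~7% of the initial heterozygosity, consistent in direction with the near-complete loss seen in the no-migration simulations. Whether this matches the theoretical calculation in the Supplemental Results is not stated here.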
Our next aim was to assess the importance of low levels of migration (0.39 effective migrants per generation from ENP to GOC) for maintaining genetic diversity and fitness in the small GOC population (Ne = 114) despite long-term isolation (~16 kya). We simulated under our best-fit two-population demographic model, both with the estimated rates of migration between the ENP and GOC (Fig. 3C) and with no migration. In simulations that include the empirically inferred migration rate from ENP to GOC, we observed a 26.7% reduction in heterozygosity and an increase in FROH>1Mb from 0 to 0.10 in the GOC population relative to the ENP population (Fig. 5B), in good agreement with the trends in our empirical dataset (35.7% empirical heterozygosity reduction; Fig. 2). Additionally, average genetic load in the GOC population is elevated to 7.75%, compared with 2.87% in the ENP population (Fig. 5B). However, this increase in genetic load appears to be counteracted by the removal of strongly deleterious recessive mutations (s < −0.01), which are reduced in frequency by 22.9% in the GOC population (Fig. S20). By contrast, we observed minimal differences between populations in the numbers of moderately (−0.01 < s ≤ −0.001) or weakly (−0.001 < s ≤ −0.00001) deleterious alleles per individual (Fig. S20), suggesting that migration has helped keep these mutations from drifting to high frequency in the GOC population. In summary, these results suggest that isolation and small population size in the GOC may have reduced fitness, although these reductions do not appear to have been large enough to threaten population viability.
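Genetic load in such simulations is usually defined as the reduction in fitness relative to an individual free of deleterious mutations, with fitness multiplicative across sites. The sketch below assumes that definition and SLiM's default per-mutation fitness effects (1 + hs in heterozygotes, 1 + s in homozygotes); the paper's Methods give the exact definition behind the 7.75% and 2.87% values. The toy cases show why recessive strongly deleterious mutations matter in small populations: they are invisible to selection in heterozygotes but exposed once drift and inbreeding make them homozygous, which is what allows them to be purged.

```python
from typing import Iterable, Tuple


def genetic_load(mutations: Iterable[Tuple[float, float, int]]) -> float:
    """Genetic load (1 - w) of one diploid individual under multiplicative fitness.

    mutations: (s, h, copies) per carried mutation, with s <= 0 and copies in {1, 2}.
    Heterozygous sites contribute a factor (1 + h*s), homozygous sites (1 + s).
    """
    w = 1.0
    for s, h, copies in mutations:
        w *= (1.0 + s) if copies == 2 else (1.0 + h * s)
    return 1.0 - w


hidden = genetic_load([(-0.1, 0.0, 1)])        # recessive heterozygote: load 0.0
exposed = genetic_load([(-0.1, 0.0, 2)])       # same mutation homozygous: load ~0.1
additive = genetic_load([(-0.1, 0.5, 1)] * 2)  # two additive hets: 1 - 0.95**2 = 0.0975
```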
Without migration, the genetic composition of the GOC population changed far more dramatically. We observed a near-complete loss of genetic diversity, higher inbreeding (FROH>1Mb = 0.11), and a substantial increase in genetic load to 10.3% in the GOC population (Fig. 5B). The loss of diversity was also confirmed by theoretical calculations (see Supplemental Results). This increase in genetic load appears to be driven primarily by the fixation of moderately deleterious alleles (a 9.22% gain in the isolated GOC population compared with the migration scenario; Fig. S20). Thus, these simulations suggest that, in the absence of migration, the GOC population would have accumulated much more genetic load, possibly enough to drive it to extinction. Overall, these results highlight the importance of low levels of migration in maintaining the viability of the GOC population over its long period of isolation.
Discussion
Detecting recent population bottlenecks in endangered species from genetic diversity in contemporary samples has been challenging19,20, especially in long-lived species with long generation times, such as the great whales21,56. Specifically, genetic diversity responds slowly to changes in population size relative to the temporal scale of human-induced events19, and the overall loss of genetic variation depends on the duration of the bottleneck relative to life-history traits57,58 such as life span and generation time. Although genomic data can improve our ability to detect the impact of bottlenecks, most studies analyzing whole-genome data have failed to detect signals of whaling in blue38 and gray whales59, presumably due to small sample sizes. Recently, low-coverage sequencing of North Atlantic fin whales may have recovered a signal of whaling, although the results did not completely rule out the alternative of a more gradual decline over the last 600 years rather than an abrupt whaling bottleneck25; these two scenarios are challenging to disentangle, particularly given the added uncertainties of low-coverage data. Here, we show that anthropogenic population contractions, such as the one imposed on fin whales by 20th-century whaling26,27, can be identified by combining high-coverage genome resequencing (~27×), large samples of individuals (20–30 per population) from a single timepoint, and SFS-based demographic inference (Supplemental Discussion). Beyond our sampling and methodological approaches, the high pre-whaling genetic variation of fin whales in the Eastern North Pacific30,31,34,60, together with an extreme reduction of two orders of magnitude, even if short, likely produced the deficit of low-frequency variants in present-day individuals that we were able to detect20 (Fig. 3B). Therefore, our research demonstrates that even very recent human-driven population bottlenecks leave a detectable genomic footprint in the SFS of genome-wide data from contemporary individuals, and this signal can be used to identify the demographic and genetic effects of recent exploitation and to model current and future impacts on populations.
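A recent, severe bottleneck removes rare variants first, leaving a deficit in the lowest-frequency bins of the SFS relative to the pre-decline expectation. The sketch below only shows how an SFS is tallied from per-site derived-allele counts, folded when ancestral states are uncertain, and compared against the standard-neutral shape θ/i. It assumes a fixed sample of n called chromosomes per site and is not the ∂a∂i or fastsimcoal2 machinery, which fits full demographic models to such spectra.

```python
from typing import Iterable, List


def unfolded_sfs(derived_counts: Iterable[int], n_chromosomes: int) -> List[int]:
    """Counts of sites with i derived copies, for i = 1..n-1 (monomorphic sites dropped)."""
    sfs = [0] * (n_chromosomes + 1)
    for count in derived_counts:
        sfs[count] += 1
    return sfs[1:n_chromosomes]


def fold(sfs: List[int]) -> List[int]:
    """Fold an unfolded SFS (bins i = 1..n-1) by minor-allele count."""
    n = len(sfs) + 1
    folded = [0] * (n // 2)
    for i, count in enumerate(sfs, start=1):
        folded[min(i, n - i) - 1] += count
    return folded


def neutral_expectation(theta: float, n_chromosomes: int) -> List[float]:
    """Standard-neutral expectation E[xi_i] = theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n_chromosomes)]


observed = unfolded_sfs([1, 1, 1, 2, 3, 0, 4], n_chromosomes=4)  # [3, 1, 1]
folded_obs = fold(observed)                                       # [4, 1]
expected = neutral_expectation(6.0, 4)                            # [6.0, 3.0, 2.0]
```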
Our study examines a natural experiment of whale populations that have experienced both natural and anthropogenic bottlenecks, providing contrasts not available in single-population studies25. Despite a 99% decline in effective population size, Eastern North Pacific fin whales have retained most of their pre-whaling genetic diversity (Figs. 2, 5A). They show neither a substantial decrease in genome-wide heterozygosity nor an increase in inbreeding or genetic load (Figs. 2, 4 and 5A), similar to findings in a North Atlantic population25. Because genetic diversity declines exponentially with the number of generations elapsed since a contraction, this lagging impact on genetic diversity is likely a consequence of the long generation time of fin whales45 (~25.9 years) relative to the duration of the whaling bottleneck (~70 years), and of a partial recovery following the whaling moratorium that began in 198532,58,61. The contraction, although severe, lasted only about two generations (see Supplemental Results). However, other detrimental effects remain alarming. The 99% reduction in pre-whaling effective size has likely had strong ecological consequences15,18,62. Additionally, if the ENP population does not fully recover and remains relatively small, it may lose the adaptive potential to resist future climate change or disease63. Furthermore, a reduced effective population size in the ENP could imperil the viability of the Gulf of California population by further diminishing or halting migration into it, which our simulations show can accelerate the accumulation of deleterious load and the loss of genetic diversity. These simulations allowed us to explore genomic consequences under various conservation scenarios (Fig. 5), a perspective not yet adopted in other great whale genomic studies25,38,59. Both empirical and simulation findings show that continuing the current moratorium and increasing population size remain essential for fin whale recovery and long-term persistence17,26.
Regarding the Gulf of California fin whale population, our results show that immigration from ghost populations is negligible (see Supplemental Discussion) and that as few as 0.39 migrants per generation have been sufficient to maintain genetic diversity and fitness in this population over ~16,000 years of isolation (Fig. 5B), which is consistent with other genetic and ecological studies describing the isolation of this population28,30,34. By contrast, when omitting migration from our simulations, we observe a near-complete loss of genetic diversity and a substantial increase in levels of inbreeding and genetic load (Fig. 5B). Thus, these results highlight the importance of gene flow for maintaining population viability over long evolutionary timescales11,64, even when levels of migration are far lower than the classic rule of thumb of ‘one migrant per generation’10. This rule has been widely applied in conservation; however, it is based on a neutral model that makes numerous simplifying assumptions and does not consider deleterious variation12. Here, we combine empirical observations with more realistic models that include deleterious variation to demonstrate that small populations can be maintained by exceedingly low levels of migration, even when modest levels of genetic load may accumulate65. These results have important implications for conserving other small and isolated populations, where maintaining high levels of migration may not be feasible.
Population persistence in the GOC also appears to be enabled in part by the elimination of strongly deleterious mutations, as has been shown in other small vertebrate populations66,67, including marine mammals36. Specifically, our simulations suggest a 22.9% reduction in the frequency of these mutations in the GOC (Fig. S20) due to its long-term small population size, despite gene flow continually reintroducing them13. However, we were unable to detect this reduction in our empirical dataset, where we observed similar numbers of putatively deleterious LOF mutations in the GOC and ENP populations (Fig. 4). This discrepancy could be partially explained by LOF mutations being an imperfect proxy for strongly deleterious variation68,69, as shown in empirical studies48. Although it could be argued that some genomic patterns of deleterious variation reflect local adaptation in the GOC population, this explanation seems unlikely. For example, only drift would increase homozygosity across all mutation categories, as observed; in particular, increased homozygosity at synonymous variants is not expected under a scenario of local adaptation (Fig. 4A, C). Moreover, local adaptive events are rarer than genetic drift and purifying selection, which operate continuously in natural populations70.
Here, we have assessed the genomic impacts of both natural and anthropogenic bottlenecks on the second-largest mammal. We demonstrate that it is possible to confidently estimate the magnitude and timing of recent human-driven population bottlenecks, and to determine the key role that gene flow and potential purging of deleterious variants play in the persistence of small isolated populations by analyzing whole-genome resequencing data from contemporary samples together with individual-based simulations. From a conservation perspective, our findings expose the severity of whaling and indicate that it is necessary to reassess the recovery goals for the ENP fin whales and the regional threatened status of the GOC population, which may warrant specific conservation actions to maintain gene flow and avert additional impacts from climate change, mortality by entanglement28 or microplastic contamination71. Therefore, our study contributes to fulfilling the overdue promise of genomics to conservation biology concerning the genetic effects of very recent population reductions caused by anthropogenic activities and identifying the evolutionary and ecological processes that promote the viability of small populations72. Finally, we demonstrate the importance of using both genomic and simulated data to inform the conservation of intensely exploited species.
Methods
Samples and sequencing
Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol for obtaining skin biopsies from free-ranging cetaceans, which uses a small stainless-steel biopsy dart deployed from a crossbow or rifle73,74. These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented by individuals from the coasts of California [9], Oregon [4], Washington [2], British Columbia [3], and Alaska [12]; Table S1) and the Gulf of California (GOC; N = 20, from seven localities: Bahía de La Paz [3], Loreto [6], Bahía de los Angeles [5], Bahía Kino [3], North of Tiburon Island [1], Puerto Refugio [1] and off Bahía Los Frailes [1]). All samples from the Gulf of California were obtained under the appropriate collecting permits issued by the Mexican Wildlife Agency (Dirección General de Vida Silvestre, Subsecretaría de Gestión para la Protección Ambiental, Secretaría del Medio Ambiente y Recursos Naturales; permit numbers: D0070(2)−0598, D00700(2)−14093, D00750-1537 and SGPA/DGVS/−0576). Samples from the Eastern North Pacific were collected by the Southwest Fisheries Science Center (California, USA) under US Marine Mammal Protection Act permits (NMFS-873, NMFS-1026, NMFS-774-1437, NMFS 0782-1438, NMFS-774-1714, NMFS-774-1437, NMFS-14097, and NMFS-19091). DNA was extracted from the samples using the QIAamp DNA Mini Kit (Qiagen; California, USA. Catalog number: 51304). Genomic libraries were prepared from the extracted DNA using the Illumina TruSeq DNA PCR-free standard kit (Illumina; California, USA. Catalog number: 20015962) following the manufacturer’s instructions. Whole-genome sequencing was performed using the 150-bp paired-end protocol on Illumina HiSeqX or NovaSeq6000 platforms. Library preparation and sequencing were performed at Fulgent Genetics’ sequencing core facility (Fulgent Genetics LLC; California, USA).
To compare the genomic characteristics of fin whales with those of other Mysticeti, we downloaded previously generated whole-genome resequencing fastq data for four representative mysticete species from the NCBI Sequence Read Archive: the minke whale (Balaenoptera acutorostrata; SRR1802584), a stable and abundant rorqual; the humpback whale (Megaptera novaeangliae; SRR5665639), the closest relative of fin whales; and the North Atlantic right whale (Eubalaena glacialis; SRR5665640) and the blue whale (Balaenoptera musculus; SRR5665644), among the most endangered baleen whales (Table S1).
Read processing and alignment
We followed a sequence read processing and genotyping pipeline adapted from the Genome Analysis Toolkit (GATK) Best Practices Guide75. Read quality was first checked using FastQC v.0.11.876. Illumina adapters were removed from the paired-end sequence reads using picard (v.2.20.3) MarkIlluminaAdapters. The adapter-free paired-end reads were aligned against the minke whale (Balaenoptera acutorostrata scammoni) reference genome (GCF_000493695.1 [BalAcu1.0]; scaffold N50: 12,843,668; downloaded on November 12, 2019) using BWA-MEM v.0.7.1777. Mapping statistics were generated using QUALIMAP v.2.278 and samtools v.1.979. We used the minke whale genome as the reference because the available fin whale genome assemblies were either much more fragmented and poorly annotated (GCA_008795845.1; scaffold N50: 871,016) or lacked a publicly available genome annotation as of November 2022 (GCA_023338255.1), and the blue whale genome (GCF_009873245.2) had no genome annotation in 2019 (Supplemental Methods; Table S16; Fig. S21). The fin whale and minke whale belong to the same genus and diverged ~10 million years ago38. The average mapping rate of fin whale reads to the minke whale genome is 99.09 ± 0.21% (Table S1), similar to the 99.49% mapping rate to the most recent fin whale reference genome (GCA_023338255.1; Table S2) obtained for a subset of samples (n = 10; see Supplemental Methods), suggesting that the divergence between fin and minke whales did not strongly affect read alignment.
Genotype calling and filtration
Joint genotype calling at all sites (including invariant positions) across the reference genome was performed using GATK80 (v.3.8). We removed PCR duplicates from the bam files using picard MarkDuplicates. Raw variant calling was performed for each individual using GATK’s HaplotypeCaller with the default settings for removing low-quality reads (min_mapping_quality_score=20; min_base_quality_score=20). Joint genotype calls for the 50 fin whales were generated from the raw variants using GATK GenotypeGVCFs, excluding scaffolds shorter than 1 Mbp. The total scaffold length used for genotyping was 2,324,429,847 bp, with the excluded scaffolds comprising only 4.4% of the total genome length (107,257,851 bp out of 2,431,687,698 bp).
Since we do not have a database of known variants, we did not perform base quality score recalibration (BQSR) or variant quality score recalibration (VQSR). Instead, we applied a stringent set of quality and depth filters to the genotype calls, keeping only high-quality biallelic SNPs and monomorphic sites, the latter including sites where all genotypes were homozygous reference or all were homozygous alternate (Fig. S22). Sites that (1) had a low Phred-scaled quality score (QUAL < 30); (2) failed the GATK-recommended hard filters (QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0 || SOR > 3.0); or (3) fell within repeat regions identified by WindowMasker81 or RepeatMasker82, or within CpG islands identified by the UCSC genome browser (total length: 1,247,900,490 bp), were marked as failing filtration (Fig. S22A). For the sites that passed these filters, we performed genotype-level filtration. Specifically, for each individual, we kept only genotypes with a minimum depth of eight reads and a maximum depth of 2.5 times the mean depth, a minimum Phred score of 20, and the expected allele balance, defined as the read depth for the reference allele divided by the total read depth (≥ 0.9 for homozygous reference genotypes; between 0.2 and 0.8, inclusive, for heterozygous genotypes; and ≤ 0.1 for homozygous alternative genotypes). Genotypes that failed these filters were converted to missing (Fig. S22B). Thereafter, sites were removed if they had more than 20% missing genotypes or more than 75% heterozygous genotypes (Fig. S22A). We repeated the genotype calling and filtration pipeline with the four additional baleen whales included alongside the 50 fin whale samples. The resulting dataset (“f50b4” in the following text) was used only to construct the neighbor-joining tree and to compare genome-wide heterozygosity.
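The genotype- and site-level filters described above can be sketched as two predicates. This is a minimal illustration, not the authors' pipeline: the Genotype record is a hypothetical structure, the "Phred score of 20" is assumed to be the per-genotype quality (GQ), and the heterozygosity fraction is assumed to be computed over called (non-missing) genotypes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Genotype:  # hypothetical record for one individual at one site
    gt: str           # "0/0", "0/1" or "1/1"
    ref_depth: int    # reads supporting the reference allele
    total_depth: int  # total read depth for this individual at this site
    gq: int           # Phred-scaled genotype quality (assumed to be the "Phred score")


def passes_genotype_filters(g: Genotype, mean_depth: float) -> bool:
    """Depth within [8, 2.5 x mean depth], quality >= 20, and an allele
    balance (reference reads / total reads) consistent with the genotype."""
    if g.total_depth < 8 or g.total_depth > 2.5 * mean_depth:
        return False
    if g.gq < 20:
        return False
    ab = g.ref_depth / g.total_depth
    if g.gt == "0/0":
        return ab >= 0.9
    if g.gt == "0/1":
        return 0.2 <= ab <= 0.8
    if g.gt == "1/1":
        return ab <= 0.1
    return False  # any other genotype code is treated as failing


def site_passes(genotypes: Sequence[Optional[str]]) -> bool:
    """Site-level filter after genotype filtering (None = genotype set to missing):
    drop sites with > 20% missing or > 75% heterozygous called genotypes."""
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    missing_frac = (n - len(called)) / n
    het_frac = sum(g == "0/1" for g in called) / len(called)
    return missing_frac <= 0.2 and het_frac <= 0.75
```

Keeping the two stages separate mirrors the order in the text: genotypes are first converted to missing, and only then are site-level missingness and excess heterozygosity evaluated.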
An additional variant dataset (“genotype-filter-free” dataset) for the ENP individuals without any genotype-level filters was generated and used in confirmatory demographic inference (Supplemental Methods). We also performed the same genotyping pipeline using the most recent fin whale genome as reference (GCA_023338255.1) in a subset of 10 individuals (10-fin-ref dataset) to determine if there were significant differences in genomic diversity estimates caused by the reference genome used (minke whale vs fin whale; see Supplemental Methods, Results, and Discussion). The total number of sites that passed all the filters in our genotyping pipeline for the different datasets we analyzed is reported in Table S17.
Variant annotations and identification of neutral regions
We annotated variant sites using two programs, snpEff v.4.3.183 and SIFT4G v.6.084. We used the minke whale genome annotation gtf file to build custom snpEff and SIFT4G databases with default settings. We then annotated variants and predicted their effects using the -canon option in snpEff and the -t option in SIFT4G. The most deleterious effect was selected for each site.
Although a recent fin whale genome assembly (GCA_023338255.1) has been annotated25, this annotation is not currently publicly available, which prevented us from using it to identify putatively neutral regions for our demographic and deleterious variation analyses. Moreover, even if this annotation were available, it would be unlikely to significantly affect our main results and conclusions (see Supplemental Discussion).
We used the minke whale as an outgroup to classify ancestral allele states, treating the alleles in the minke whale reference sequence as ancestral. Because the minke whale lineage has continued to evolve since its common ancestor with these two fin whale populations, the ancestral alleles identified may not represent the true ancestral state. However, this error is not expected to bias the relative comparison of variants between the ENP and GOC fin whales, since both are equally diverged from the minke whale. To detect putatively neutral regions for demographic modeling, we first extracted sites that passed all filters, were at least 20 kb from exons or coding regions, and were not in CpG islands or repetitive regions, using bedtools v.2.28.085. The identified regions were aligned to the zebrafish genome using BLAST v.2.7.186, and regions with a hit with an e-value lower than 1E-10 were removed, as they could represent conserved regions that are not evolving neutrally. In total, 397,627,899 sites were defined as neutral.
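The distance-based part of the neutral-region step is interval arithmetic: pad each exon by 20 kb, merge the padded exons with the masked (repeat and CpG island) intervals, and take the complement. The sketch below performs that logic for one scaffold in pure Python, analogous in spirit to bedtools slop, merge and complement; it is not the authors' command set and omits the BLAST screen against zebrafish. Intervals are assumed to be 0-based and half-open, as in BED files.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), as in BED files


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def candidate_neutral_regions(scaffold_len: int, exons: List[Interval],
                              masked: List[Interval], buffer: int = 20_000) -> List[Interval]:
    """Regions of one scaffold at least `buffer` bp from any exon and outside
    masked (repeat / CpG island) intervals."""
    blocked = [(max(0, s - buffer), min(scaffold_len, e + buffer)) for s, e in exons]
    blocked += list(masked)
    regions: List[Interval] = []
    pos = 0
    for start, end in merge_intervals(blocked):
        if start > pos:
            regions.append((pos, start))
        pos = max(pos, end)
    if pos < scaffold_len:
        regions.append((pos, scaffold_len))
    return regions


# Toy scaffold (hypothetical coordinates): one exon and one repeat.
print(candidate_neutral_regions(100_000, [(50_000, 51_000)], [(90_000, 95_000)]))
```

Intersecting the resulting regions with the sites that passed all filters would then give the candidate neutral sites before the conservation screen.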
Evaluation of population structure
Population structure analyses were performed using the R package SNPRelate v.1.16.087 and gdsfmt v.1.22.088. We selected biallelic sites in the vcf that passed variant filtration criteria and converted them to gds format using function snpgdsVCF2GDS. Linkage disequilibrium pruning was implemented (snpgdsLDpruning) with an r2 cutoff of 0.2, and a minor allele frequency cutoff of 0.10. A total of 30,350 SNPs were kept for PCA, kinship, and FST analyses.
We performed the PCA using the function snpgdsPCA. After observing the overall population structure, we performed an additional PCA within ENP individuals to inspect variation among locations. The kinship between sample pairs was assessed using PLINK’s identity-by-descent method-of-moments approach (snpgdsIBDMoM). We calculated kinship at three levels: (1) populations (groups: ENP and GOC); (2) sampling locations (groups: AK, BC, OR, WA, CA, and GOC); and (3) merged locations, in which samples from the middle ENP locations (BC, WA and OR) were combined into a single group (groups: AK, MENP and GOC). Two-tailed MWU tests were used to compare the average kinship coefficients among groups. FST between populations, sampling locations and merged ENP locations was calculated using the Weir and Cockerham estimator89, allowing a maximum SNP missing rate of 20% (function snpgdsFst, missing.rate = 0.2). The significance of FST was estimated using 999 permutations as described in ref. 90. Due to the low sample sizes at the BC, OR and WA locations, we estimated the significance of FST only between populations and between merged ENP locations. To determine the potential influence of population substructure within the ENP on Ne estimates, we calculated the population size inflation factor as 1/(1 − FST)33, using the highest FST value found in the ENP.
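The inflation factor is a one-line calculation applied to the largest within-ENP FST. In the sketch below the pairwise FST values are hypothetical placeholders, not results from this study; only the formula 1/(1 − FST) comes from the text.

```python
def ne_inflation_factor(fst: float) -> float:
    """Population size inflation factor 1 / (1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return 1.0 / (1.0 - fst)


# Hypothetical pairwise FST values among ENP groups (not study results).
pairwise_fst = {("AK", "MENP"): 0.004, ("AK", "CA"): 0.010, ("MENP", "CA"): 0.002}
highest = max(pairwise_fst.values())
print(f"highest FST = {highest}, inflation factor = {ne_inflation_factor(highest):.4f}")
```

Using the highest observed value gives the most conservative (largest) bound on how much substructure could inflate Ne estimates.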
The LD-pruned SNP set was converted to PLINK ped format using the function seqGDS2VCF in the R package SeqArray v.1.26.288 and PLINK v.1.9091. ADMIXTURE92 (v.1.3.0) analyses were performed using values of K from two to six, with 10 iterations per K. The mean cross-validation (CV) error for each K was used to select the best number of ancestral populations (K). To further test for substructure in the ENP, additional ADMIXTURE analyses were performed within ENP individuals, using values of K from one to six, with the same settings described above. A neighbor-joining phylogenetic tree was constructed from 32,191 LD-pruned SNPs in the “f50b4” dataset using the function nj in the R package ape v.5.393 and visualized using ggtree v.2.0.494. One thousand bootstrap replicates were performed, and the North Atlantic right whale (“EubGla01”) was designated as the outgroup (Fig. S5).
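Selecting K by mean CV error across replicate runs can be sketched as follows. The sketch assumes each ADMIXTURE run was launched with its cross-validation option and that its log contains lines of the form "CV error (K=2): 0.52"; the parsing helper and the example values are hypothetical.

```python
import re
from statistics import mean
from typing import Dict, Iterable, List

# Assumed log-line format for ADMIXTURE cross-validation output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def collect_cv_errors(log_lines: Iterable[str]) -> Dict[int, List[float]]:
    """Gather CV errors per K from log lines (one value per replicate run)."""
    errors: Dict[int, List[float]] = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors.setdefault(int(match.group(1)), []).append(float(match.group(2)))
    return errors


def best_k(cv_errors: Dict[int, List[float]]) -> int:
    """K with the lowest mean cross-validation error across replicate runs."""
    mean_cv = {k: mean(values) for k, values in cv_errors.items()}
    return min(mean_cv, key=mean_cv.get)
```

Averaging over the 10 iterations per K before taking the minimum reduces the chance that a single lucky run drives the choice of K.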
Heterozygosity and identification of runs of homozygosity
We defined heterozygosity as the number of heterozygous genotypes divided by the total number of called genotypes, including monomorphic sites, that passed variant filtration standards48. We first calculated the genome-wide heterozygosity for all scaffolds used for genotyping. Two-tailed MWU tests were used to evaluate if the genome-wide heterozygosity varied significantly between the ENP and GOC populations. We also calculated the per-site heterozygosity in non-overlapping 1 Mb windows across the scaffolds. Windows with more than 80% missing data were excluded. The missing data in these windows derive from regions that failed site filtering criteria described above.
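The heterozygosity definitions above can be sketched for one individual on one scaffold. As a simplifying assumption, the input lists the 0-based positions where this individual has a genotype passing all filters, and every other position in a window counts as missing; the example coordinates are hypothetical and use a toy window size.

```python
from typing import Iterable, List, Optional, Tuple


def genome_wide_heterozygosity(n_het: int, n_called: int) -> float:
    """Heterozygous genotypes / all called genotypes (monomorphic sites included)."""
    return n_het / n_called


def windowed_heterozygosity(called_sites: Iterable[Tuple[int, bool]], scaffold_len: int,
                            window: int = 1_000_000,
                            max_missing: float = 0.8) -> List[Optional[float]]:
    """Per-site heterozygosity in non-overlapping windows of one scaffold.

    called_sites holds (0-based position, is_heterozygous) for each called site;
    windows with more than `max_missing` missing data are reported as None."""
    n_windows = (scaffold_len + window - 1) // window
    called = [0] * n_windows
    het = [0] * n_windows
    for pos, is_het in called_sites:
        w = pos // window
        called[w] += 1
        het[w] += int(is_het)
    result: List[Optional[float]] = []
    for w in range(n_windows):
        window_len = min(window, scaffold_len - w * window)  # last window may be shorter
        missing_frac = 1.0 - called[w] / window_len
        if called[w] == 0 or missing_frac > max_missing:
            result.append(None)
        else:
            result.append(het[w] / called[w])
    return result


# Toy example with 10-bp windows on a 25-bp scaffold (hypothetical data).
sites = [(i, i == 3) for i in range(10)] + [(10, False), (11, True), (12, False), (13, False)]
print(windowed_heterozygosity(sites, 25, window=10))
```

Because monomorphic sites are included in the denominator, these values are per-site heterozygosity rather than the proportion of heterozygous SNPs.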
For identifying ROH, we first split the vcf file into ENP and GOC individuals and re-estimated allele frequencies within each population. ROH were identified using bcftools roh -G30 in bcftools v.1.939. Three individuals were excluded from the bcftools ROH analyses to avoid biasing allele frequency estimates [ENPCA09 and GOC010 due to admixture proportions > 0.25; ENPOR12 due to a low genotyping rate (Fig. S22)]. An additional ROH analysis was performed using the R package RZooRoH v.0.2.340, which can classify ROH segments into different age classes. A model with ten classes (9 ROH and 1 non-ROH) and a successive rate of three was applied (zoomodel, K = 10, base = 3). A minor allele frequency cutoff of 0.05 was used, but no individual was excluded. For both methods, ROH segments shorter than 100 kb were discarded. The remaining segments were divided into three length categories: short (0.1 Mb ≤ ROH < 1 Mb), intermediate (1 Mb ≤ ROH < 5 Mb) and long (≥ 5 Mb). The concordance of the two methods was confirmed (Fig. S9), and the output from the RZooRoH analysis is shown in the main text. The proportion of the genome in ROH (FROH) was calculated as the total length of ROH passing a given length threshold (e.g., ROH > 100 kb) within an individual divided by the total scaffold length used for genotyping (2,324,429,847 bp). We used the two-tailed MWU test to compare the total number of ROH segments in all length categories between the two populations.
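The length classes and FROH can be computed directly from per-individual ROH lengths, as in the sketch below. The genotyped length comes from the text; the 100 kb threshold is applied as "at least 100 kb" to match the category boundaries, which is an assumption about how the edge case is handled.

```python
from collections import Counter
from typing import Iterable, Optional

GENOTYPED_LENGTH_BP = 2_324_429_847  # total scaffold length used for genotyping


def roh_category(length_bp: int) -> Optional[str]:
    """Length class of an ROH segment; segments < 100 kb are discarded (None)."""
    if length_bp < 100_000:
        return None
    if length_bp < 1_000_000:
        return "short"
    if length_bp < 5_000_000:
        return "intermediate"
    return "long"


def froh(roh_lengths_bp: Iterable[int], min_length_bp: int = 100_000,
         genome_length_bp: int = GENOTYPED_LENGTH_BP) -> float:
    """Fraction of the genotyped genome covered by ROH of at least min_length_bp."""
    return sum(l for l in roh_lengths_bp if l >= min_length_bp) / genome_length_bp


def count_by_category(roh_lengths_bp: Iterable[int]) -> Counter:
    """Number of retained ROH segments in each length class."""
    return Counter(c for c in map(roh_category, roh_lengths_bp) if c is not None)
```

Passing a larger min_length_bp (for example 1 Mb) gives FROH restricted to longer, more recent ROH.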
To determine whether the inbreeding observed in both fin whale populations was due to recent or older events, we estimated the average time at which two haplotypes would coalesce in each ROH category (short, intermediate and long). The length of an ROH associated with inbreeding (L) decreases due to recombination in each generation and follows an exponential distribution95–97. The mean ROH length under this exponential distribution is E[L] = 100/(2tr), where E[L] is the mean ROH length (in Mb), the factor 100 converts the expected segment length of 1/(2t) Morgans into centimorgans, t is the number of generations to the common ancestor, and r is the assumed constant recombination rate of 1 cM/Mb42,98. Therefore, we calculated how many generations ago, on average, the two haplotypes shared a common ancestor in each ROH category as t = 100/(2E[L]r)42.
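The rearranged formula t = 100/(2E[L]r) can be evaluated directly. The mean ROH lengths below are hypothetical values chosen for illustration, and the conversion to years uses the ~25.9-year generation time quoted in the Discussion.

```python
def generations_to_common_ancestor(mean_roh_mb: float, r_cm_per_mb: float = 1.0) -> float:
    """t = 100 / (2 * E[L] * r), with E[L] in Mb and r in cM/Mb."""
    return 100.0 / (2.0 * mean_roh_mb * r_cm_per_mb)


GENERATION_TIME_YR = 25.9  # fin whale generation time quoted in the Discussion

# Hypothetical mean ROH lengths (Mb) for the three length classes, not study results.
for label, mean_len in (("short", 0.5), ("intermediate", 2.0), ("long", 10.0)):
    t = generations_to_common_ancestor(mean_len)
    print(f"{label:>12}: ~{t:.0f} generations (~{t * GENERATION_TIME_YR:.0f} years)")
```

Longer ROH map to more recent common ancestors, which is why long ROH are read as a signal of recent inbreeding.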
Projected site frequency spectra
A vcf file comprising only putatively neutral SNPs was used to obtain the site frequency spectrum (SFS) within and between populations. To avoid biasing our demographic inferences through known confounding factors, such as uneven read depth99, admixture proportions44 and highly related individuals100, six individuals were excluded from the SFS projection (low genotype depth: “ENPOR12”; admixture proportion > 0.25: “ENPCA01”, “ENPCA09”, “GOC010”; kinship > 0.15: “GOC080”, “GOC111”). To avoid uncertainties in ancestral state classification, we computed a folded SFS. This SFS was calculated based on a hypergeometric projection implemented in easySFS v.0.0.1 (https://github.com/isaacovercast/easySFS), which minimizes the effects of missing genotypes101 (https://dadi.readthedocs.io/en/latest/user-guide/manipulating-spectra/#projection). From this projection, the number of haploid individuals that maximizes the number of SNPs retained is identified, and this number is then used to construct the folded SFS. We generated both the single-population SFS for each population (projected haploid sample size: ENP = 44, GOC = 30; projected number of SNPs: ENP = 3,410,730, GOC = 1,532,968) and the joint two-population SFS (projected number of SNPs: ENP-GOC = 3,418,226). Thereafter, the count of monomorphic sites was calculated and incorporated as follows. For the single-population SFS, monomorphic sites in the neutral regions that were called in at least the projected number of haploid individuals were added to the 0-bin already calculated by the projection. For the two-population SFS, we counted the monomorphic sites that were called in at least 44 haploid individuals in the ENP population and at least 30 haploid individuals in the GOC population, and added them to the 0-0 bin of the projection.
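The single-population projection, folding and monomorphic-site steps can be sketched as below. This is an illustration of the hypergeometric projection, not easySFS's code: each SNP is summarized as (number of called haploid genomes n, count k of one allele), SNPs called in fewer than m haploids are dropped, and because the spectrum is folded it does not matter which allele k refers to.

```python
from math import comb
from typing import Iterable, List, Tuple


def project_site(n: int, k: int, m: int) -> List[float]:
    """Probability that a random subsample of m of the n called haploid genomes
    carries j = 0..m copies of an allele observed k times (hypergeometric)."""
    total = comb(n, m)
    return [comb(k, j) * comb(n - k, m - j) / total for j in range(m + 1)]


def projected_sfs(sites: Iterable[Tuple[int, int]], m: int) -> List[float]:
    """Unfolded projected SFS from (n_called_haploids, allele_count) per SNP."""
    sfs = [0.0] * (m + 1)
    for n, k in sites:
        if n < m:  # cannot be projected down to m haploids
            continue
        for j, p in enumerate(project_site(n, k, m)):
            sfs[j] += p
    return sfs


def fold(sfs: List[float]) -> List[float]:
    """Fold an SFS of m haploids onto minor-allele counts 0..m//2."""
    m = len(sfs) - 1
    folded = [0.0] * (m // 2 + 1)
    for j, value in enumerate(sfs):
        folded[min(j, m - j)] += value
    return folded


def add_monomorphic(folded: List[float], n_monomorphic: int) -> List[float]:
    """Add monomorphic sites called in at least m haploids to the 0-bin."""
    return [folded[0] + n_monomorphic] + folded[1:]
```

Projection trades a smaller sample size for keeping SNPs with some missing genotypes, which is why the chosen m is the one that retains the most SNPs.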
Demographic history reconstruction
We utilized the projected neutral SFS generated above to reconstruct the demographic history of fin whales surveyed in this study using two methods: ∂a∂i44 (v.2.2.1; Diffusion Approximations for Demographic Inference) and fastsimcoal243 (v.2.6; fast sequential Markov coalescent simulation).
To explore a variety of possible demographic scenarios, we first tested the following single-population models on the ENP and GOC populations separately (Fig. S12; Table S7). All the models are described forward in time. For population size parameters (NANC, NCUR, etc.), all values are in units of numbers of diploids. For time parameters (T, TCUR, etc.), all values are in units of generations. For the ENP population, we explored two additional 3Epoch models fixing the TCUR to two generations (3EpochTcur2) or three generations (3EpochTcur3).
1Epoch: single epoch model with no population size change. This model provides a “null model” that estimates ancestral population size (NANC).
2Epoch: two epoch model with one size change event, from the ancestral size (NANC) to the current size (NCUR) occurring T generations ago.
3Epoch: three epoch model with two size change events. The first event changed from the ancestral size (NANC) to a bottleneck size (NBOT) and lasted for TBOT generations. The second event changed from the bottleneck size (NBOT) to the current size (NCUR) occurring TCUR generations ago.
4Epoch: four epoch model with three size change events. The first event changed from the ancestral size (NANC) to a bottleneck size (NBOT) and lasted for TBOT generations. The second event changed from the bottleneck size (NBOT) to a recovery size (NREC) and lasted for TREC generations. The third event changed from the recovery size (NREC) to the current size (NCUR) occurring TCUR generations ago. For the 3Epoch and 4Epoch models, we note that although the population sizes are named “bottleneck size” or “recovery size”, we did not restrict the direction of size changes (expansion or contraction) for any event.
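The epoch models above are piecewise-constant size histories. The sketch below looks up the population size at a given number of generations before the present. The helper and all parameter values are hypothetical illustrations, not the inferred estimates or the authors' model code.

```python
from typing import List, Tuple


def size_at(t_gen_ago: float, recent_epochs: List[Tuple[float, float]], n_anc: float) -> float:
    """Population size (diploids) t generations before present.

    recent_epochs lists (size, duration_in_generations) from the present backwards;
    beyond the last listed epoch the population has its ancestral size n_anc."""
    elapsed = 0.0
    for size, duration in recent_epochs:
        elapsed += duration
        if t_gen_ago < elapsed:
            return size
    return n_anc


# 3Epoch with hypothetical parameter values (not study estimates), forward in time:
# NANC -> NBOT for TBOT generations -> NCUR for the last TCUR generations.
N_ANC, N_BOT, T_BOT, N_CUR, T_CUR = 10_000, 20_000, 500, 200, 2
three_epoch = [(N_CUR, T_CUR), (N_BOT, T_BOT)]
# A 4Epoch history would be [(NCUR, TCUR), (NREC, TREC), (NBOT, TBOT)].
print([size_at(t, three_epoch, N_ANC) for t in (0, 1, 2, 501, 502)])
```

Because NBOT may be larger or smaller than NANC, the same function covers both expansions and contractions, consistent with the unrestricted direction of size changes.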
Next, we tested the following two-population models (Fig. S16; Table S11) to elucidate the divergence time and gene flow in the ENP and GOC populations:
Split-NoMigration: a simple population split model with no migration. The ancestral population (NANC) diverged into the ENP (NENP) and GOC (NGOC) populations T generations ago. The two populations have remained isolated since then.
Split-SymmetricMigration: an isolation-migration model. The ancestral population (NANC) diverged into the ENP (NENP) and GOC (NGOC) populations occurring T generations ago. The ENP and GOC populations maintained a symmetric migration rate of m.
Split-AsymmetricMigration: another isolation-migration model. This model is similar to model 2 (Split-SymmetricMigration), but the ENP and GOC populations were allowed to have different migration rates, with mENP->GOC measured as the fraction of individuals in the GOC population each generation that are new migrants from the ENP, and vice versa for mGOC->ENP.
Split-AsymmetricMigration-ENPChangeTw2: this model is based on model 3 (Split-AsymmetricMigration), but an ENP population size change event to NENP2 is introduced after population divergence, with a fixed TW = 2 generations before present. This size change event after divergence is used to model the impact of whaling bottleneck.
AncestralSizeChange-Split-AsymmetricMigration: this model is based on model 3 (Split-AsymmetricMigration), but an ancestral size change event from NANC to NANC2 that lasted for TA generations was introduced before population divergence.
AncestralSizeChange-Split-Isolation-AsymmetricMigration: this model is based on model 5 (AncestralSizeChange-Split-AsymmetricMigration), but after population divergence, an isolation period lasting TD generations occurred, during which there was no migration between the ENP and GOC populations. Asymmetric migration between the two populations began TC generations before present.
AncestralSizeChange-Split-AsymmetricMigration-GOCChange: this model is based on model 5 (AncestralSizeChange-Split-AsymmetricMigration), but after population divergence, the GOC population remained at NGOC for TD generations. The GOC population then experienced a size change event from NGOC to NGOC2 that occurred TC generations before present.
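Under the definition used in these models, where mENP->GOC is the fraction of GOC individuals that are new migrants from the ENP each generation, the number of migrants per generation is NGOC × mENP->GOC. The sketch below applies that conversion to hypothetical values. The inference programs may report migration in scaled units (for example, ∂a∂i parameterizes migration as 2NANC·m), which would need to be converted to a per-generation fraction first.

```python
def migrants_per_generation(n_recipient: float, m: float) -> float:
    """Expected new migrant individuals per generation, N_recipient * m, where m is
    the fraction of the recipient population made up of new migrants."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be a fraction between 0 and 1")
    return n_recipient * m


# Hypothetical values, not study estimates: NGOC = 2,000 diploids, mENP->GOC = 1e-4.
print(migrants_per_generation(2_000, 1e-4))
```

Expressed this way, migration estimates can be compared directly with the 'one migrant per generation' rule of thumb discussed above.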
To evaluate whether unsampled (ghost) populations contribute to the total migration into the GOC population, we added two plausible ghost populations to the selected two-population model: the South Pacific (SP) population, which diverged from the North Pacific ~1.8 Mya according to mtDNA data31, and the Western North Pacific (WNP) population, which has been suggested to breed separately from the ENP27, potentially since the recent Pleistocene interglacial periods23. For our demographic inference with ∂a∂i, we ran only one ghost model using the same initial parameters as in our chosen model. The initial value for the divergence time of the ghost population was set to the expansion time in the ENP 3Epoch model, and the size of the ghost population was fixed to the size of the ancestral population before divergence to find the best parameter space. In contrast, for fastsimcoal2 we constrained the lower and upper bounds for the divergence times of the ghost populations, based on the prior knowledge mentioned above, to 35,000 to 200,000 generations ago for the SP population and 100 to 10,000 generations ago for the WNP population. We also fixed the size of the ghost populations to 30,000 haploids, approximately the same size as the ancestral population before the divergence.
Fastsimcoal
The coalescent simulation approach fastsimcoal2 was employed to infer parameters and composite likelihoods for the demographic models specified above. Each inference was performed using the Expectation-Conditional Maximization (ECM) algorithm102 with 60 ECM cycles (-L 60), in which each E-step consisted of 1,000,000 coalescent trees (-n 1000000), computing only the SFS for the minor allele (-m), estimating parameters by maximizing the composite likelihood (-M) and running in quiet mode (-q), with the following command line:
fsc26 -t $header.tpl -e $header.est -n 1000000 -m -M -L 60 -q
The starting parameters were drawn from a uniform distribution with an imposed minimum value and a flexible upper boundary. The expected SFS under the fastsimcoal2 model parameters was compared with the empirical SFS, and the multinomial log-likelihood was calculated. For single-population and joint-population models, we performed 100 and 50 replicates of the inference, respectively, to confirm that both parameters and log-likelihoods converged, and the parameters with the maximum log-likelihood were chosen. The difference in the number of replicates reflects the greater computational cost of inferring two-population model parameters. All estimated size parameters were obtained as numbers of haploids and converted to diploids, whereas time parameters were inferred as numbers of generations before the present. To control for the inflation of log-likelihoods in models with more parameters, we performed a likelihood ratio test (LRT) comparing each model with its immediately simpler nested model (e.g., 2Epoch vs. 1Epoch, 3Epoch vs. 2Epoch) using the statistic −2 × [log-likelihood (simple) − log-likelihood (complex)]. LRT significance was evaluated with a chi-square (χ2) test with one or two degrees of freedom, depending on the difference in the number of parameters between models.
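The LRT described above reduces to a statistic and a chi-square upper-tail probability. The sketch below uses the closed-form chi-square survival functions for one and two degrees of freedom, so it needs only the standard library. It assumes natural-log likelihoods; log-likelihoods reported in log10 units would need to be multiplied by ln(10) before applying it. The log-likelihood values in the example are hypothetical.

```python
import math


def lrt_statistic(ll_simple: float, ll_complex: float) -> float:
    """-2 * [log-likelihood(simple) - log-likelihood(complex)], natural-log units."""
    return -2.0 * (ll_simple - ll_complex)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


# Hypothetical log-likelihoods for a nested pair differing by two parameters.
d = lrt_statistic(ll_simple=-100.0, ll_complex=-95.0)
print(f"D = {d:.1f}, p = {chi2_sf(d, df=2):.4g}")
```

For other degrees of freedom, a general chi-square survival function (for example from a statistics library) would replace the closed forms.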
The parameter confidence intervals were obtained using a parametric bootstrap43, following the simulation functionality described in the fastsimcoal2 manual (http://cmpg.unibe.ch/software/fastsimcoal26/man/fastsimcoal26.pdf, p. 56). For each model, we simulated 100 SNP-based SFS from the best-fit parameters estimated from the observed data, using ~4 million non-recombining segments of 100 bp (3,927,079 for ENP single-population models, 3,908,444 for GOC single-population models and 3,864,185 for two-population models), so that the total simulated length matched the number of observed sites. Parameters were estimated from 20 random starting conditions for each of the 100 bootstrapped SFS datasets, using the same settings as described above for the empirical data. 95% confidence intervals of the best-fit parameters were obtained by adding and subtracting two standard deviations of the 100 bootstrap parameter estimates to and from the empirical best-fit parameters.
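The interval rule above (best-fit estimate plus or minus two standard deviations of the bootstrap estimates) can be written as a small helper. Using the sample standard deviation (n − 1 denominator) is an assumption; the text does not specify which estimator was used. The example values are hypothetical.

```python
from statistics import stdev
from typing import Sequence, Tuple


def bootstrap_ci(point_estimate: float, bootstrap_estimates: Sequence[float]) -> Tuple[float, float]:
    """Approximate 95% CI: best-fit estimate +/- 2 standard deviations of the
    parametric-bootstrap estimates (sample standard deviation assumed)."""
    sd = stdev(bootstrap_estimates)
    return point_estimate - 2.0 * sd, point_estimate + 2.0 * sd


# Hypothetical bootstrap estimates of one parameter (not study results).
print(bootstrap_ci(10.0, [9.0, 10.0, 11.0]))
```

This symmetric interval can extend below zero for small, skewed parameters such as migration rates, so in practice the lower bound may need truncating at zero.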
∂a∂i
For demographic inference using ∂a∂i, the haploid sample sizes plus 5, 15, and 25 were used as extrapolation grid points44. Lower and upper bounds of model parameters were imposed based on prior knowledge of population history, and starting parameters within these bounds were chosen from previous knowledge or from the outputs of nested runs and randomly perturbed (fold = 1). We used the optimize_log function as our optimization algorithm and calculated the multinomial log-likelihood of the expected SFS obtained from each optimization.
Best-fit parameter sets of each model were scaled using NANC, calculated as NANC = θ/(4μL), where L is the total sequence length of the neutral regions (392,707,916 bp for ENP single-population models, 390,844,414 bp for GOC single-population models and 386,418,461 bp for two-population models), μ is the fin whale mutation rate (2.77E-08 mutations/generation/bp)37, and θ is the optimal value of theta for the given model. Population size parameters were multiplied by NANC to obtain numbers of diploids, and time parameters were multiplied by 2NANC to obtain numbers of generations. Model uncertainty was assessed by estimating 95% confidence intervals of the best-fit parameters using the Godambe Information Matrix (GIM) with bootstrapped data103. The bootstrapped data were obtained by dividing the genome into 4 Mb fragments and generating 100 bootstrap pseudo-replicate datasets by resampling these fragments, amounting to 400 Mb of sampled sequence, which approximates the length of the putatively neutral data analyzed in our demographic inferences.
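The scaling from ∂a∂i's relative units to physical units follows from θ = 4NANCμL. The sketch below uses the mutation rate and neutral lengths given above; the θ value in the check and the relative parameters are hypothetical.

```python
MU = 2.77e-8  # fin whale mutation rate, mutations per generation per bp
NEUTRAL_LENGTH_BP = {"ENP": 392_707_916, "GOC": 390_844_414, "ENP-GOC": 386_418_461}


def nanc_from_theta(theta: float, length_bp: int, mu: float = MU) -> float:
    """Ancestral size (diploids) from the optimal theta: NANC = theta / (4 * mu * L)."""
    return theta / (4.0 * mu * length_bp)


def size_in_diploids(nu: float, nanc: float) -> float:
    """Size parameters are relative to NANC."""
    return nu * nanc


def time_in_generations(tau: float, nanc: float) -> float:
    """Time parameters are in units of 2 * NANC generations."""
    return tau * 2.0 * nanc


# Hypothetical relative parameters for an ENP model (not study estimates).
nanc = nanc_from_theta(4 * MU * NEUTRAL_LENGTH_BP["ENP"] * 10_000, NEUTRAL_LENGTH_BP["ENP"])
print(round(nanc), size_in_diploids(0.02, nanc), time_in_generations(1e-4, nanc))
```

Multiplying generations by the ~25.9-year generation time would then convert times to years.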
One hundred replicates of each model were performed with randomized starting parameters to assess convergence of the inferred parameters and composite likelihood. Parameters with the maximum log-likelihood among replicates from each model were selected and the expected SFS under these parameters was compared with the empirical SFS. LRT was calculated as previously described.
Additionally, to ensure that the results from the ENP 3Epoch model in fact reflected the recent bottleneck caused by whaling, we simulated the SFS under the demographic scenario inferred by ∂a∂i using msprime v.0.7.4104. The simulated SFS were generated using a recombination rate of 1E-8 crossover events per base pair per generation and a mutation rate of 2.77E-8 per base pair per generation37, with 1000 replicates and a chunk size of 2 Mb. The fit of the simulated SFS to the empirical data was validated by visual inspection. We also performed ∂a∂i inference on the msprime-simulated SFS using the same settings as for the empirical SFS and tested whether we could recover parameter estimates similar to those from the empirical data, to confirm that we had the power to detect a recent population contraction.
To account for the correlation between current population size (NCUR) and the time of the most recent contraction (TCUR), we carried out grid searches to find the range of parameter pairs whose log-likelihoods are within two units of the maximum likelihood estimate (MLE; see Supplemental Methods).
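The grid search can be sketched as follows. The sketch assumes a hypothetical `loglik(n_cur, t_cur)` callable that returns the log-likelihood of the model with NCUR and TCUR held fixed. In practice, each evaluation would be a ∂a∂i or fastsimcoal2 optimization, which is not shown.

```python
from typing import Callable, List, Sequence, Tuple


def likelihood_region(
    loglik: Callable[[float, float], float],
    n_grid: Sequence[float],
    t_grid: Sequence[float],
    max_drop: float = 2.0,
) -> List[Tuple[float, float, float]]:
    """(N_CUR, T_CUR, log-likelihood) triples within max_drop units of the grid maximum."""
    evaluated = [(n, t, loglik(n, t)) for n in n_grid for t in t_grid]
    best = max(ll for _, _, ll in evaluated)
    return [(n, t, ll) for n, t, ll in evaluated if ll >= best - max_drop]


if __name__ == "__main__":
    # Toy stand-in for a real likelihood surface, for illustration only.
    def toy(n: float, t: float) -> float:
        return -((n - 100) / 10) ** 2 - (t - 5) ** 2

    for n, t, ll in likelihood_region(toy, [80, 90, 100, 110, 120], [3, 4, 5, 6, 7]):
        print(n, t, ll)
```

Reporting the whole set of retained pairs, rather than marginal intervals, preserves the NCUR/TCUR trade-off that motivates the grid search.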
Model selection
We selected the models most likely to represent the demographic history of the populations from among the unconstrained demographic models (i.e., models in which no parameter was fixed to a given value). To select the best demographic model, we considered several features of our demographic inference results. First, the model should have the highest log-likelihood among the models satisfying the following criteria. Second, the expected SFS should fit the empirical SFS well. Third, the parameter estimates from the two inference methods that we used (i.e., fastsimcoal2 and ∂a∂i) should be consistent, especially in the direction of population size change (expansion vs. contraction). Fourth, the log-likelihoods of the top 10 replicate runs for each model should converge; we considered a model to have good convergence if the log-likelihood difference between its best and 10th-best runs was no more than 25 log-likelihood units. Fifth, the model should fit significantly better than the next simpler nested model according to the LRT, and this significance should be consistent between fastsimcoal2 and ∂a∂i. Sixth, the confidence intervals should not be unrealistically wide. Models meeting these criteria were chosen as those representing the demographic history of the fin whale populations. For the ENP single-population model, after choosing the 3Epoch model according to these criteria, we sought to confirm the findings of this unconstrained model by rerunning it with the parameter for the time of the putative whaling bottleneck fixed at 2 and 3 generations. The models with fixed parameters had better log-likelihoods and did not significantly change the parameter values obtained with the unconstrained model, indicating that the estimates of the unconstrained model are a good representation of the demographic history of this population.
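The fourth criterion (convergence of the top 10 replicate runs to within 25 log-likelihood units) reduces to a short check. The following is a sketch with a hypothetical function name.

```python
from typing import Sequence


def has_converged(logliks: Sequence[float], top_k: int = 10, tol: float = 25.0) -> bool:
    """True if the best and the top_k-th best replicate log-likelihoods differ by <= tol."""
    if len(logliks) < top_k:
        raise ValueError(f"need at least {top_k} replicate runs, got {len(logliks)}")
    ranked = sorted(logliks, reverse=True)  # highest (best) log-likelihood first
    return ranked[0] - ranked[top_k - 1] <= tol
```

With 100 replicates per model, `has_converged(replicate_logliks)` would be evaluated once per model.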
For the two-population models, we ran the Split-AsymmetricMigration-ENPChangeTw2 model with the time of the whaling bottleneck fixed at two generations; however, this model was not selected.
Quantifying putatively deleterious variation
Two lines of evidence were used to quantify relative levels of putatively deleterious variation in the ENP and GOC populations. We focused on mutations within protein-coding regions, which are more likely to have direct fitness impacts, and identified derived alleles in four mutation types: synonymous, tolerated nonsynonymous, deleterious nonsynonymous, and LOF. Nonsynonymous mutations were classified as putatively tolerated (SIFT score ≥0.05) or deleterious (SIFT score <0.05) based on phylogenetic constraint using SIFT4G84. LOF mutations are predicted to eliminate or severely inhibit gene function and include splice acceptor, splice donor, start lost and stop gained mutations. LOF mutations were identified using the default settings in snpEff83, which uses the LOF definition in ref. 69. We normalized for differences in missing data across individuals by the average number of called genotypes using the R package vcfR v.1.12.0 (ref. 105). Because dominance coefficients of variants in natural populations are poorly quantified, we assumed two extreme scenarios: (1) all variants are fully recessive (h = 0), so that fitness is reduced only in homozygous derived genotypes; or (2) all variants are additive (h = 0.5), so that fitness decreases linearly with the number of derived alleles. The true fitness impact probably lies between these two scenarios. We did not consider dominant variants (0.5 < h ≤ 1) because segregating deleterious variants are very unlikely to be dominant51.
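The bookkeeping described above can be sketched in Python. This is a hypothetical illustration in which the snpEff effect names are Sequence Ontology terms and genotypes are coded as derived-allele counts. The normalization rule (scale each individual's count by the mean number of called genotypes divided by that individual's number of called genotypes) is our reading of the text, not the authors' vcfR code. Real snpEff annotations may list several effects joined by `&`, which this sketch does not handle.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# Sequence Ontology effect terms treated as LOF, matching the categories in the text.
LOF_EFFECTS = {"splice_acceptor_variant", "splice_donor_variant", "start_lost", "stop_gained"}


def classify(effect: str, sift_score: Optional[float]) -> str:
    """Assign a variant to one of the four mutation types, or 'other'."""
    if effect in LOF_EFFECTS:
        return "LOF"
    if effect == "synonymous_variant":
        return "synonymous"
    if effect == "missense_variant" and sift_score is not None:
        return "deleterious_nonsyn" if sift_score < 0.05 else "tolerated_nonsyn"
    return "other"


def individual_counts(genotypes: Sequence[Optional[int]]) -> Tuple[int, int, int]:
    """Genotypes as derived-allele counts (0, 1, 2) or None when missing.

    Returns (derived alleles [additive proxy], derived homozygotes [recessive proxy],
    number of called genotypes).
    """
    called = [g for g in genotypes if g is not None]
    derived = sum(called)
    homozygotes = sum(1 for g in called if g == 2)
    return derived, homozygotes, len(called)


def normalize(counts: Sequence[float], n_called: Sequence[int]) -> List[float]:
    """Rescale each individual's count to the average number of called genotypes."""
    avg = mean(n_called)
    return [c * avg / n for c, n in zip(counts, n_called)]
```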
First, two-tailed MWU tests were used to evaluate whether the normalized counts of derived alleles and of derived homozygotes differed significantly between the ENP and GOC populations for each of the four mutation types48. The count of derived putatively deleterious alleles (deleterious nonsynonymous and LOF) is considered a proxy for additive genetic load, whereas the count of derived homozygotes provides a proxy for recessive load106,107.
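The text does not state which MWU implementation was used. The sketch below is a self-contained two-sided test using the tie-corrected normal approximation without continuity correction. For real analyses, a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred, and exact p-values may be more appropriate for small samples. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = _average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12.0 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))
```

Here `x` and `y` would be the normalized per-individual counts for GOC and ENP within one mutation type.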
Second, we calculated the relative accumulation of mutations (RXY) and of homozygous mutations (R2XY) for the four mutation types using methods adapted from ref. 53. Here we designated the GOC population as population X and the ENP population as population Y. At each polymorphic site $i$, we defined $d_X^i$ as the count of derived alleles at that site in a sample of $n_X^i$ haploid genomes from population X, and $d_Y^i$ as the count of derived alleles in a sample of $n_Y^i$ haploid genomes from population Y. The expected number of derived mutations observed only in population X but not in population Y is defined as:
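$$L_{X,\mathrm{not}Y} = \sum_{i} \frac{d_X^i}{n_X^i}\left(1 - \frac{d_Y^i}{n_Y^i}\right) \qquad \text{(I)}$$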
Similarly, the expected number of homozygous derived mutations observed only in population X but not in population Y is defined as:
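$$L2_{X,\mathrm{not}Y} = \sum_{i} \frac{d_X^i (d_X^i - 1)}{n_X^i (n_X^i - 1)}\left(1 - \frac{d_Y^i (d_Y^i - 1)}{n_Y^i (n_Y^i - 1)}\right) \qquad \text{(II)}$$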
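The ratio statistics are then defined as:
$$R_{XY} = \frac{L_{X,\mathrm{not}Y}}{L_{Y,\mathrm{not}X}} \qquad \text{(III)}$$
$$R2_{XY} = \frac{L2_{X,\mathrm{not}Y}}{L2_{Y,\mathrm{not}X}} \qquad \text{(IV)}$$
where $L_{Y,\mathrm{not}X}$ and $L2_{Y,\mathrm{not}X}$ are defined analogously, with the roles of X and Y exchanged.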
Standard errors of RXY and R2XY were estimated with a weighted-block jackknife53. If selection has been equally effective and mutation rates are the same in both populations, RXY and R2XY are expected to equal 1. A Z-score test was used to evaluate the significance of deviations from this null expectation.
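Below is a minimal sketch of RXY with a weighted-block jackknife standard error. It assumes the per-site term f_X(1 − f_Y), with f = d/n, from ref. 53. It also assumes that blocks are weighted by their number of polymorphic sites; the block definition is not given in this section, so that part is an assumption. R2XY would follow the same pattern with a homozygote-based per-site term. All names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

# (d_X, n_X, d_Y, n_Y): derived-allele count and haploid sample size in X (GOC) and Y (ENP)
Site = Tuple[int, int, int, int]


def r_xy(sites: Sequence[Site]) -> float:
    """R_XY = L_{X,notY} / L_{Y,notX}, with per-site terms f_X(1 - f_Y) and f_Y(1 - f_X)."""
    l_x_not_y = sum((dx / nx) * (1 - dy / ny) for dx, nx, dy, ny in sites)
    l_y_not_x = sum((dy / ny) * (1 - dx / nx) for dx, nx, dy, ny in sites)
    return l_x_not_y / l_y_not_x


def weighted_block_jackknife(
    blocks: Sequence[Sequence[Site]], stat: Callable[[Sequence[Site]], float]
) -> Tuple[float, float]:
    """(full-data estimate, SE) from a delete-one-block jackknife for unequal block sizes."""
    g = len(blocks)
    sizes = [len(b) for b in blocks]
    if g < 2 or min(sizes) == 0:
        raise ValueError("need at least two non-empty blocks")
    n = sum(sizes)
    theta = stat([s for b in blocks for s in b])
    loo = [stat([s for k, b in enumerate(blocks) if k != j for s in b]) for j in range(g)]
    theta_j = g * theta - sum((1 - m / n) * t for m, t in zip(sizes, loo))
    var = 0.0
    for m, t in zip(sizes, loo):
        h = n / m
        pseudo = h * theta - (h - 1) * t  # pseudo-value for this block
        var += (pseudo - theta_j) ** 2 / (h - 1)
    return theta, math.sqrt(var / g)


def z_score(estimate: float, se: float, null: float = 1.0) -> float:
    """Z score for the deviation of R_XY (or R2_XY) from its null value of 1."""
    return (estimate - null) / se
```

With equal block sizes this reduces to the ordinary delete-one jackknife, which is a convenient sanity check.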
Lastly, we assessed the robustness of our results for the four mutation types using an additional mutation impact scoring system implemented in snpEff. SnpEff classifies the predicted impact of each variant into HIGH, MODERATE, LOW and MODIFIER categories based on its effect type. We excluded the MODIFIER category because these mutations are mostly non-protein-coding. We further restricted the MODERATE and LOW categories to the coding sequence (CDS) regions identified in the GTF annotation, to exclude non-protein-coding mutations as well. Two-tailed MWU tests and the associated analyses were performed as described above to evaluate differences in the counts of derived alleles and homozygotes (Fig. S19). For all of the above analyses, we removed the six individuals that were also discarded in the demographic inference.
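The impact filter can be sketched as follows. `Variant`, `in_cds` and `filter_by_impact` are hypothetical helpers. In practice, the CDS intervals would be parsed from the GTF `CDS` features, which is not shown.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int     # 1-based position
    impact: str  # snpEff impact: HIGH, MODERATE, LOW or MODIFIER


def in_cds(v: Variant, cds: Dict[str, List[Tuple[int, int]]]) -> bool:
    """cds maps chromosome -> sorted, non-overlapping (start, end), 1-based inclusive."""
    intervals = cds.get(v.chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (v.pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= v.pos <= intervals[i][1]


def filter_by_impact(
    variants: Sequence[Variant], cds: Dict[str, List[Tuple[int, int]]]
) -> List[Variant]:
    """Drop MODIFIER; keep MODERATE/LOW only inside CDS; keep HIGH everywhere."""
    kept = []
    for v in variants:
        if v.impact == "MODIFIER":
            continue
        if v.impact in ("MODERATE", "LOW") and not in_cds(v, cds):
            continue
        kept.append(v)
    return kept
```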
Genetic load simulations
We conducted forward-in-time population genetic simulations using SLiM v.3.3.2 (ref. 54). We simulated a 10 Mb chromosomal segment with a uniform recombination rate of 1E-8 cross-over events per base pair per generation and randomly generated intergenic, intronic, and exonic regions, following ref. 108. The 10 Mb length was chosen as a tradeoff between computational efficiency and genomic representation. Within this segment, mutations occurred at a rate of 2.77E-8 per base pair per generation37, with deleterious (nonsynonymous) mutations occurring only in exonic regions at a ratio of 2.31:1 relative to neutral (synonymous) mutations109. Selection coefficients for deleterious mutations were drawn from a distribution estimated from human data55. We assumed an inverse relationship between selection and dominance coefficients, given empirical evidence that strongly deleterious mutations also tend to be highly recessive51,110. Specifically, we assumed that strongly deleterious mutations (s < −0.01) were fully recessive (h = 0.0), moderately deleterious mutations (−0.01 ≤ s < −0.001) were partially recessive (h = 0.1), and weakly deleterious mutations (−0.001 ≤ s ≤ −0.00001) were nearly additive (h = 0.4).
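The dominance rule and the 2.31:1 exonic mutation ratio are mirrored below in Python for clarity. In the study itself, these rules live in the SLiM (Eidos) script, which is not reproduced here. The text does not assign selection coefficients between −0.00001 and 0 to any class; treating them as h = 0.4 is our assumption. Function names are hypothetical.

```python
import random

DELETERIOUS_TO_NEUTRAL = 2.31  # nonsynonymous:synonymous ratio in exons, from the text


def dominance(s: float) -> float:
    """Dominance coefficient h for a deleterious selection coefficient s < 0."""
    if s >= 0:
        raise ValueError("expected a deleterious (negative) selection coefficient")
    if s < -0.01:
        return 0.0  # strongly deleterious: fully recessive
    if s < -0.001:
        return 0.1  # moderately deleterious: partially recessive
    return 0.4      # weakly deleterious: nearly additive (assumed also for s > -1e-5)


def draw_exonic_mutation_class(rng: random.Random) -> str:
    """Label a new exonic mutation deleterious or neutral in a 2.31:1 ratio."""
    p_del = DELETERIOUS_TO_NEUTRAL / (DELETERIOUS_TO_NEUTRAL + 1.0)
    return "deleterious" if rng.random() < p_del else "neutral"
```

Passing an explicit `random.Random` instance keeps the draws reproducible across replicates.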
Using this simulation framework, we simulated under our two best-fit demographic models: a single-population model for the ENP population and a two-population divergence model for the ENP and GOC populations (see above for details). For both models, we used a burn-in period of 10 × NANC generations, where NANC is the ancestral population size. During the simulation, we tracked several quantities for each simulated population: mean genetic load (the reduction in individual fitness, calculated multiplicatively across sites), mean genome-wide heterozygosity, mean inbreeding coefficient (measured here as FROH, with a minimum ROH length of 1 Mb), and the mean number of strongly deleterious (s < −0.01), moderately deleterious (−0.01 ≤ s < −0.001), and weakly deleterious (−0.001 ≤ s ≤ −0.00001) alleles per individual. These quantities were estimated from samples of 40 individuals. For all simulations, we ran 25 replicates and averaged these quantities across replicates.
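The per-individual summaries can be sketched as follows. The sketch assumes SLiM's default fitness convention (1 + hs for heterozygotes and 1 + s for homozygotes, multiplied across sites). It also takes FROH as the summed length of ROH ≥ 1 Mb divided by the simulated segment length. The helper names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def genetic_load(mutations: Iterable[Tuple[float, float, int]]) -> float:
    """1 - product of per-site fitness; entries are (s, h, derived copies 0/1/2)."""
    w = 1.0
    for s, h, copies in mutations:
        if copies == 1:
            w *= 1.0 + h * s  # heterozygote
        elif copies == 2:
            w *= 1.0 + s      # homozygote
    return 1.0 - w


def f_roh(roh_lengths: Sequence[int], genome_length: int, min_len: int = 1_000_000) -> float:
    """Fraction of the segment covered by runs of homozygosity of at least min_len bp."""
    return sum(length for length in roh_lengths if length >= min_len) / genome_length


def s_class(s: float) -> str:
    """Bin a deleterious selection coefficient into the three classes tracked above."""
    if s < -0.01:
        return "strong"
    if s < -0.001:
        return "moderate"
    return "weak"
```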
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We dedicate this work to Robert K. Wayne, a pioneer in the field of conservation genetics and conservation biology. We would like to thank the members of the marine mammal research program at the Universidad Autónoma de Baja California Sur for their help collecting the samples in the Gulf of California. We also thank Sergio Flores-Ramírez for his initial support of the Gulf of California project, Cei Abreu-Goodger for his support and for providing laboratory space during the initial analysis of the data, and Phil Morin for reviewing an early draft of the manuscript. Unpublished genome assemblies and sequencing data for the Rice’s whale and fin whale are used with permission from the DNA Zoo Consortium (dnazoo.org). For the fin whale DNA Zoo assemblies, the sample was collected by The Marine Mammal Center under Marine Mammal Health and Stranding Response Program (MMHSRP) Permit No. 18786-04 issued by the National Marine Fisheries Service (NMFS) in accordance with the Marine Mammal Protection Act (MMPA) and Endangered Species Act (ESA). The work at DNA Zoo was performed under Marine Mammal Health and Stranding Response Program (MMHSRP) Permit No. 18786-03. This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by the UCLA Institute for Digital Research and Education’s Research Technology Group. This work was supported by the Mexican National Council for Science and Technology (CONACYT) grant FONCICYT/50/2016, National Science Foundation (DEB Small Grant #1556705), and UCMEXUS-CONACYT collaborative grant 2006. S.N.-M. was supported by CONACYT Postdoctoral Fellowship 724094 and the Mexican Secretariat of Agriculture and Rural Development Postdoctoral Fellowship. M.L. was supported by the University of California, Los Angeles Department of Ecology and Evolutionary Biology (EEB) Summer Research Fellowship. K.E.L. and C.K. were supported by NIH grant R35GM119856 to K.E.L. A.C.B. was supported by the Biological Mechanisms of Healthy Aging Training Program NIH T32AG066574. M.J.P.-A. was supported by ANID under Grant Program FONDECYT Iniciación 11170182. E.P. and M.J.P.-A. were supported by ANID Millennium Science Initiative Program ICN2021_002.
Author contributions
S.N.-M., A.M.-E., and R.K.W. conceived the study. A.M.-E. and R.K.W. contributed reagents, materials, and analysis tools. J.U.R., L.V.-G., and F.I.A. collected and contributed the samples and sample information. S.N.-M. carried out laboratory work. S.N.-M., M.L., P.N.-V., and C.K. performed the analysis of the data. A.C.B., J.A.R. and A.R. provided scripts for some analyses. A.M.-E., A.C.B., J.A.R., A.R., M.J.P.-A., E.P., and K.E.L. provided guidance and advised the project. A.M.-E., K.E.L., and R.K.W. performed funding acquisition. S.N.-M., M.L., P.N.-V., C.K., and R.K.W. wrote the manuscript with input from all the authors.
Peer review information
Nature Communications thanks Carlos Carreras and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
The raw sequence data generated in this study are deposited in NCBI’s Sequence Read Archive (SRA) database under accession numbers SRR23615109 - SRR23615158 (BioSample SAMN33439338 - SAMN33439387; BioProject PRJNA938516; see Table S1 for details). The sequence data for the additional mysticete species used in this study are available in NCBI’s SRA database under accession numbers SRR5665640, SRR1802584, SRR5665644, and SRR5665639 (see Table S1 for details). The CpG island data are available from the UCSC Genome Browser (http://hgdownload.soe.ucsc.edu/goldenPath/balAcu1/database/). The balaenopterid genome assemblies used for the comparison shown in Table S16 are available in NCBI’s Assembly database under accession numbers GCA_008795845.1, GCA_023338255.1, GCF_000493695.1, GCF_009873245.2, and GCA_004329385.1, or in the DNA Zoo database under accession names Balaenoptera_physalus (https://dnazoo.s3.wasabisys.com/index.html?prefix=Balaenoptera_physalus/) and Balaenoptera_ricei (https://dnazoo.s3.wasabisys.com/index.html?prefix=Balaenoptera_ricei/). Source data are provided with this paper.
Code availability
The scripts used to perform the sequence data processing and analyses are publicly available in a GitHub repository that can be accessed through Zenodo111 at 10.5281/zenodo.7980107.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Sergio F. Nigenda-Morales, Meixi Lin.
Deceased: Robert K. Wayne.
Contributor Information
Sergio F. Nigenda-Morales, Email: snigenda@csusm.edu
Meixi Lin, Email: meixilin@ucla.edu
Kirk E. Lohmueller, Email: klohmueller@g.ucla.edu
Andrés Moreno-Estrada, Email: andres.moreno@cinvestav.mx
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-40052-z.
References
- 1. Ceballos G, Ehrlich PR. Mammal population losses and the extinction crisis. Science. 2002;296:904–907. doi: 10.1126/science.1069349.
- 2. Pimm SL, et al. The biodiversity of species and their rates of extinction, distribution, and protection. Science. 2014;344:1246752. doi: 10.1126/science.1246752.
- 3. Waters CN, et al. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science. 2016;351:aad2622. doi: 10.1126/science.aad2622.
- 4. Lande R. Risks of population extinction from demographic and environmental stochasticity and random catastrophes. Am. Nat. 1993;142:911–927. doi: 10.1086/285580.
- 5. Reed DH, Frankham R. Correlation between fitness and genetic diversity. Conserv. Biol. 2003;17:230–237. doi: 10.1046/j.1523-1739.2003.01236.x.
- 6. Melbourne BA, Hastings A. Extinction risk depends strongly on factors contributing to stochasticity. Nature. 2008;454:100–103. doi: 10.1038/nature06922.
- 7. Frankham R. Genetics and extinction. Biol. Conserv. 2005;126:131–140. doi: 10.1016/j.biocon.2005.05.002.
- 8. Willi Y, Van Buskirk J, Hoffmann AA. Limits to the adaptive potential of small populations. Annu. Rev. Ecol. Evol. Syst. 2006;37:433–458. doi: 10.1146/annurev.ecolsys.37.091305.110145.
- 9. Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97. doi: 10.1093/genetics/16.2.97.
- 10. Mills LS, Allendorf FW. The one-migrant-per-generation rule in conservation and management. Conserv. Biol. 1996;10:1509–1518. doi: 10.1046/j.1523-1739.1996.10061509.x.
- 11. Frankham R. Genetic rescue of small inbred populations: meta-analysis reveals large and consistent benefits of gene flow. Mol. Ecol. 2015;24:2610–2618. doi: 10.1111/mec.13139.
- 12. Wang J. Application of the one-migrant-per-generation rule to conservation and management. Conserv. Biol. 2004;18:332–343. doi: 10.1111/j.1523-1739.2004.00440.x.
- 13. Kyriazis CC, Wayne RK, Lohmueller KE. Strongly deleterious mutations are a primary determinant of extinction risk due to inbreeding depression. Evol. Lett. 2021;5:33–47. doi: 10.1002/evl3.209.
- 14. Díez-del-Molino D, Sánchez-Barreiro F, Barnes I, Gilbert MTP, Dalén L. Quantifying temporal genomic erosion in endangered species. Trends Ecol. Evol. 2018;33:176–185. doi: 10.1016/j.tree.2017.12.002.
- 15. Springer AM, et al. Sequential megafaunal collapse in the North Pacific Ocean: an ongoing legacy of industrial whaling? PNAS. 2003;100:12223–12228. doi: 10.1073/pnas.1635156100.
- 16. Clapham PJ, Young SB, Brownell RL Jr. Baleen whales: conservation issues and the status of the most endangered populations. Mammal. Rev. 1999;29:35–60. doi: 10.1046/j.1365-2907.1999.00035.x.
- 17. Baker CS, Clapham PJ. Modelling the past and future of whales and whaling. Trends Ecol. Evol. 2004;19:365–371. doi: 10.1016/j.tree.2004.05.005.
- 18. Jackson JA, Patenaude NJ, Carroll EL, Baker CS. How few whales were there after whaling? Inference from contemporary mtDNA diversity. Mol. Ecol. 2008;17:236–251. doi: 10.1111/j.1365-294X.2007.03497.x.
- 19. Palsbøll PJ, Peery MZ, Olsen MT, Beissinger SR, Bérubé M. Inferring recent historic abundance from current genetic diversity. Mol. Ecol. 2013;22:22–40. doi: 10.1111/mec.12094.
- 20. Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using genomic data to infer historic population dynamics of nonmodel organisms. Annu. Rev. Ecol. Evol. Syst. 2018;49:433–456.
- 21. Beland SL, Frasier BA, Darling JD, Frasier TR. Using pre- and postexploitation samples to assess the impact of commercial whaling on the genetic characteristics of eastern North Pacific gray and humpback whales and to compare methods used to infer historic demography. Mar. Mammal. Sci. 2020;36:398–420. doi: 10.1111/mms.12652.
- 22. Roman J, Palumbi SR. Whales before whaling in the North Atlantic. Science. 2003;301:508–510. doi: 10.1126/science.1084524.
- 23. Alter SE, Rynes E, Palumbi SR. DNA evidence for historic population size and past ecosystem impacts of gray whales. Proc. Natl. Acad. Sci. USA. 2007;104:15162–15167. doi: 10.1073/pnas.0706056104.
- 24. Ruegg K, et al. Long-term population size of the North Atlantic humpback whale within the context of worldwide population structure. Conserv. Genet. 2013;14:103–114. doi: 10.1007/s10592-012-0432-0.
- 25. Wolf M, de Jong M, Halldórsson SD, Árnason Ú, Janke A. Genomic impact of whaling in North Atlantic Fin Whales. Mol. Biol. Evol. 2022;39:msac094. doi: 10.1093/molbev/msac094.
- 26. Rocha RC, Clapham PJ, Ivashchenko YV. Emptying the oceans: a summary of industrial Whaling catches in the 20th century. Mar. Fish. Rev. 2014;76:37–48. doi: 10.7755/MFR.76.4.3.
- 27. Mizroch SA, Rice DW, Zwiefelhofer D, Waite J, Perryman WL. Distribution and movements of fin whales in the North Pacific Ocean. Mammal. Rev. 2009;39:193–227. doi: 10.1111/j.1365-2907.2009.00147.x.
- 28. Jiménez MEL, Palacios DM, Legorreta AJ, Urbán JR, Mate BR. Fin whale movements in the Gulf of California, Mexico, from satellite telemetry. PLoS ONE. 2019;14:e0209324. doi: 10.1371/journal.pone.0209324.
- 29. Nigenda-Morales S, Flores-Ramirez S, Urban-R J, Vazquez-Juarez R. MHC DQB-1 polymorphism in the Gulf of California fin whale (Balaenoptera physalus) population. J. Heredity. 2008;99:14–21. doi: 10.1093/jhered/esm087.
- 30. Rivera-León VE, et al. Long-term isolation at a low effective population size greatly reduced genetic diversity in Gulf of California fin whales. Sci. Rep. 2019;9:12391. doi: 10.1038/s41598-019-48700-5.
- 31. Pérez-Alvarez MJ, et al. Contrasting phylogeographic patterns among Northern and Southern Hemisphere fin whale populations with new data from the Southern Pacific. Front. Mar. Sci. 2021;8:630233.
- 32. Moore JE, Barlow J. Bayesian state-space model of fin whale abundance trends from a 1991-2008 time series of line-transect surveys in the California Current: Bayesian trend analysis from line-transect data. J. Appl. Ecol. 2011;48:1195–1205. doi: 10.1111/j.1365-2664.2011.02018.x.
- 33. Rousset F. Genetic structure and selection in subdivided populations. Princeton University Press; 2004.
- 34. Bérubé M, Urbán J, Dizon AE, Brownell RL, Palsbøll PJ. Genetic identification of a small and highly isolated population of fin whales (Balaenoptera physalus) in the Sea of Cortez, México. Conserv. Genet. 2002;3:183–190. doi: 10.1023/A:1015224730394.
- 35. Morin PA, et al. Reference genome and demographic history of the most endangered marine mammal, the vaquita. Mol. Ecol. Resour. 2021;21:1008–1020. doi: 10.1111/1755-0998.13284.
- 36. Robinson JA, et al. The critically endangered vaquita is not doomed to extinction by inbreeding depression. Science. 2022;376:635–639. doi: 10.1126/science.abm1742.
- 37. Yim H-S, et al. Minke whale genome and aquatic adaptation in cetaceans. Nat. Genet. 2014;46:88–92. doi: 10.1038/ng.2835.
- 38. Árnason Ú, Lammers F, Kumar V, Nilsson MA, Janke A. Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow. Sci. Adv. 2018;4:eaap9873. doi: 10.1126/sciadv.aap9873.
- 39. Narasimhan V, et al. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016;32:1749–1751. doi: 10.1093/bioinformatics/btw044.
- 40. Bertrand AR, Kadri NK, Flori L, Gautier M, Druet T. RZooRoH: an R package to characterize individual genomic autozygosity and identify homozygous-by-descent segments. Methods Ecol. Evol. 2019;10:860–866. doi: 10.1111/2041-210X.13167.
- 41. Kirin M, Mcquillan R, Franklin CS, Campbell H, Mckeigue PM. Genomic runs of homozygosity record population history and consanguinity. PLoS One. 2010;5:13996. doi: 10.1371/journal.pone.0013996.
- 42. Browning SR. Estimation of pairwise identity by descent from dense genetic marker data in a population sample of haplotypes. Genetics. 2008;178:2123–2132. doi: 10.1534/genetics.107.084624.
- 43. Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. Robust demographic inference from genomic and SNP data. PLOS Genet. 2013;9:e1003905. doi: 10.1371/journal.pgen.1003905.
- 44. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009;5:e1000695. doi: 10.1371/journal.pgen.1000695.
- 45. Taylor BL, Chivers SJ, Larese J, Perrin WF. Generation length and percent mature estimates for IUCN assessments of cetaceans. 2007. http://swfsc.noaa.gov/BarbTaylorPubs.aspx
- 46. McCoy RC, Garud NR, Kelley JL, Boggs CL, Petrov DA. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a nonmodel insect population. Mol. Ecol. 2014;23:136–150. doi: 10.1111/mec.12591.
- 47. Clark PU, et al. The last glacial maximum. Science. 2009;325:710–714. doi: 10.1126/science.1172873.
- 48. Robinson JA, et al. Genomic signatures of extensive inbreeding in Isle Royale wolves, a population on the threshold of extinction. Sci. Adv. 2019;5:eaau0757. doi: 10.1126/sciadv.aau0757.
- 49. Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 2007;8:610–618. doi: 10.1038/nrg2146.
- 50. Boyko AR, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008;4:e1000083. doi: 10.1371/journal.pgen.1000083.
- 51. Huber CD, Durvasula A, Hancock AM, Lohmueller KE. Gene expression drives the evolution of dominance. Nat. Commun. 2018;9:2750. doi: 10.1038/s41467-018-05281-7.
- 52. Simons YB, Turchin MC, Pritchard JK, Sella G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 2014;46:220–224. doi: 10.1038/ng.2896.
- 53. Do R, et al. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat. Genet. 2015;47:126. doi: 10.1038/ng.3186.
- 54. Haller BC, Messer PW. SLiM 3: forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 2019;36:632–637. doi: 10.1093/molbev/msy228.
- 55. Kim BY, Huber CD, Lohmueller KE. Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. Genetics. 2017;206:345–361. doi: 10.1534/genetics.116.197145.
- 56. Baker CS, et al. Abundant mitochondrial DNA variation and world-wide population structure in humpback whales. Proc. Natl. Acad. Sci. 1993;90:8239–8243. doi: 10.1073/pnas.90.17.8239.
- 57. Nei M, Maruyama T, Chakraborty R. The bottleneck effect and genetic variability in populations. Evolution. 1975;29:1–10.
- 58. Amos B. Levels of genetic variability in cetacean populations have probably changed little as a result of human activities. Rep. Int. Whal. Comm. 1996;46:657–658.
- 59. Brüniche-Olsen A, et al. The inference of gray whale (Eschrichtius robustus) historical population attributes from whole-genome sequences. BMC Evol. Biol. 2018;18:87. doi: 10.1186/s12862-018-1204-3.
- 60. Archer FI, et al. Mitogenomic phylogenetics of fin whales (Balaenoptera physalus spp.): genetic evidence for revision of subspecies. PLoS One. 2013;8:e63396. doi: 10.1371/journal.pone.0063396.
- 61. Aguilar A, García-Vernet R. Fin whale: Balaenoptera physalus. In: Encyclopedia of Marine Mammals. Elsevier; 2018. p. 368–371.
- 62. Essington TE. Chapter 5. Pelagic ecosystem response to a century of commercial fishing and whaling. In: Whales, Whaling, and Ocean Ecosystems. University of California Press; 2007. p. 38–49.
- 63. Hoffmann AA, Sgrò CM, Kristensen TN. Revisiting adaptive potential, population size, and conservation. Trends Ecol. Evol. 2017;32:506–517. doi: 10.1016/j.tree.2017.03.012.
- 64. Slatkin M. Gene flow and the geographic structure of natural populations. Science. 1987;236:787–792. doi: 10.1126/science.3576198.
- 65. Hedrick PW, Garcia-Dorado A. Understanding inbreeding depression, purging, and genetic rescue. Trends Ecol. Evol. 2016;31:940–952. doi: 10.1016/j.tree.2016.09.005.
- 66. Xue Y, et al. Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science. 2015;348:242–245. doi: 10.1126/science.aaa3952.
- 67. Grossen C, Guillaume F, Keller LF, Croll D. Purging of highly deleterious mutations through severe bottlenecks in Alpine ibex. Nat. Commun. 2020;11:1–12. doi: 10.1038/s41467-020-14803-1.
- 68. Cooper GM, Shendure J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat. Rev. Genet. 2011;12:628–640. doi: 10.1038/nrg3046.
- 69. MacArthur DG, et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012;335:823–828. doi: 10.1126/science.1215040.
- 70. Johri P, Charlesworth B, Jensen JD. Toward an evolutionarily appropriate null model: jointly inferring demography and purifying selection. Genetics. 2020;215:173–192. doi: 10.1534/genetics.119.303002.
- 71. Fossi MC, et al. Are baleen whales exposed to the threat of microplastics? A case study of the Mediterranean fin whale (Balaenoptera physalus). Mar. Pollut. Bull. 2012;64:2374–2379. doi: 10.1016/j.marpolbul.2012.08.013.
- 72. Shafer AB, et al. Genomics and the challenging translation into conservation practice. Trends Ecol. Evol. 2015;30:78–87. doi: 10.1016/j.tree.2014.11.009.
- 73.Lambertsen RH. A biopsy system for large whales and its use for cytogenetics. J. Mammal. 1987;68:443–445. doi: 10.2307/1381495. [DOI] [Google Scholar]
- 74.Harlin AD, Würsig B, Baker CS, Markowitz TM. Skin swabbing for genetic analysis: application to dusky dolphins (Lagenorhynchus obscurus) Mar. Mammal. Sci. 1999;15:409–425. doi: 10.1111/j.1748-7692.1999.tb00810.x. [DOI] [Google Scholar]
- 75.Van der Auwera GA, et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics. 2013;43:11–10. doi: 10.1002/0471250953.bi1110s43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Andrews, S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
- 77.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32:292–294. doi: 10.1093/bioinformatics/btv566. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.McKenna A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Morgulis A, Gertz EM, Schäffer AA, Agarwala R. WindowMasker: window-based masker for sequenced genomes. Bioinforma. (Oxf., Engl.) 2006;22:134–141. doi: 10.1093/bioinformatics/bti774. [DOI] [PubMed] [Google Scholar]
- 82.Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. http://www.repeatmasker.org (2013).
- 83.Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly. 2012;6:80–92. doi: 10.4161/fly.19695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Vaser R, Adusumalli S, Leng SN, Sikic M, Ng PC. SIFT missense predictions for genomes. Nat. Protoc. 2016;11:1. doi: 10.1038/nprot.2015.123. [DOI] [PubMed] [Google Scholar]
- 85. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 86. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 87. Zheng X, et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606.
- 88. Zheng X, et al. SeqArray – a storage-efficient high-performance data format for WGS variant calls. Bioinformatics. 2017;33:2251–2257.
- 89. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–1370.
- 90. Hudson RR, Boos DD, Kaplan NL. A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 1992;9:138–151. doi: 10.1093/oxfordjournals.molbev.a040703.
- 91. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 92. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
- 93. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–528. doi: 10.1093/bioinformatics/bty633.
- 94. Yu G, Smith D, Zhu H, Guan Y, Lam TT-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 2017;8:28–36. doi: 10.1111/2041-210X.12628.
- 95. Pool JE, Nielsen R. Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics. 2009;181:711–719. doi: 10.1534/genetics.108.098095.
- 96. Thompson EA. Identity by descent: variation in meiosis, across genomes, and in populations. Genetics. 2013;194:301–326. doi: 10.1534/genetics.112.148825.
- 97. Foote AD, et al. Runs of homozygosity in killer whale genomes provide a global record of demographic histories. Mol. Ecol. 2021;30:6162–6177. doi: 10.1111/mec.16137.
- 98. Dumont BL, Payseur BA. Evolution of the genomic rate of recombination in mammals. Evolution. 2008;62:276–294. doi: 10.1111/j.1558-5646.2007.00278.x.
- 99. Han E, Sinsheimer JS, Novembre J. Characterizing bias in population genetic inferences from low-coverage sequencing data. Mol. Biol. Evol. 2014;31:723–735. doi: 10.1093/molbev/mst229.
- 100. Blischak PD, Barker MS, Gutenkunst RN. Inferring the demographic history of inbred species from genome-wide SNP frequency data. Mol. Biol. Evol. 2020;37:2124–2136. doi: 10.1093/molbev/msaa042.
- 101. Beichman AC, et al. Genomic analyses reveal range-wide devastation of sea otter populations. Mol. Ecol. 2022;32:281–298.
- 102. Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993;80:267–278. doi: 10.1093/biomet/80.2.267.
- 103. Coffman AJ, Hsieh PH, Gravel S, Gutenkunst RN. Computationally efficient composite likelihood statistics for demographic inference. Mol. Biol. Evol. 2016;33:591–593. doi: 10.1093/molbev/msv255.
- 104. Kelleher J, Etheridge AM, McVean G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 2016;12:e1004842. doi: 10.1371/journal.pcbi.1004842.
- 105. Knaus BJ, Grünwald NJ. vcfR: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 2017;17:44–53. doi: 10.1111/1755-0998.12549.
- 106. Lohmueller KE, et al. Proportionally more deleterious genetic variation in European than in African populations. Nature. 2008;451:994–997. doi: 10.1038/nature06611.
- 107. Beichman AC, et al. Aquatic adaptation and depleted diversity: a deep dive into the genomes of the Sea Otter and Giant Otter. Mol. Biol. Evol. 2019;36:2631–2655. doi: 10.1093/molbev/msz101.
- 108. Mooney JA, et al. Understanding the hidden complexity of Latin American population isolates. Am. J. Hum. Genet. 2018;103:707–726. doi: 10.1016/j.ajhg.2018.09.013.
- 109. Huber CD, Kim BY, Marsden CD, Lohmueller KE. Determining the factors driving selective effects of new nonsynonymous mutations. Proc. Natl. Acad. Sci. 2017;114:4465–4470. doi: 10.1073/pnas.1619508114.
- 110. Agrawal AF, Whitlock MC. Inferences about the distribution of dominance drawn from yeast gene knockout data. Genetics. 2011;187:553–566. doi: 10.1534/genetics.110.124560.
- 111. Nigenda S, Lin M, Nuñez-Valencia PG. snigenda/Fin_whale_Population_Genomics: V1.0. 2023. doi: 10.5281/zenodo.7980107.
- 112. Vihtakari M. ggOceanMaps: plot data on oceanographic maps using 'ggplot2'. R package version 0.4.3. 2020. https://mikkovihtakari.github.io/ggOceanMaps
- 113. Amante C, Eakins BW. ETOPO1 1 Arc-minute global relief model: procedures, data sources and analysis. NOAA Tech. Memorandum NESDIS. 2009;NGDC-24:19.
- 114. Grant KM, et al. Sea-level variability over five glacial cycles. Nat. Commun. 2014;5:5076. doi: 10.1038/ncomms6076.
Data Availability Statement
The raw sequence data generated in this study are deposited in NCBI's Sequence Read Archive (SRA) under accession numbers SRR23615109–SRR23615158 (BioSamples SAMN33439338–SAMN33439387; BioProject PRJNA938516; see Table S1 for details). The sequence data for the additional mysticete species used in this study are available in NCBI's SRA under accession numbers SRR5665640, SRR1802584, SRR5665644, and SRR5665639 (see Table S1 for details). The CpG island data are available from the UCSC Genome Browser (http://hgdownload.soe.ucsc.edu/goldenPath/balAcu1/database/). The balaenopterid genome assemblies used for the comparison shown in Table S16 are available in NCBI's Assembly database under accession numbers GCA_008795845.1, GCA_023338255.1, GCF_000493695.1, GCF_009873245.2, and GCA_004329385.1, or in the DNA Zoo database under the accession names Balaenoptera_physalus (https://dnazoo.s3.wasabisys.com/index.html?prefix=Balaenoptera_physalus/) and Balaenoptera_ricei (https://dnazoo.s3.wasabisys.com/index.html?prefix=Balaenoptera_ricei/). Source data are provided with this paper.
The scripts used to perform the sequence data processing and analyses are publicly available in a GitHub repository that can be accessed through Zenodo111 at 10.5281/zenodo.7980107.
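The run and BioSample accessions for the fin whale data are given as inclusive ranges. As a minimal sketch, assuming that every number between the two endpoints is a valid accession belonging to this study (the archive does not guarantee contiguity in general), each range can be expanded into an explicit list of IDs. The helper `expand_accession_range` is hypothetical and is not part of the authors' published scripts. Both ranges expand to 50 IDs, consistent with the 50 whole genomes analysed.

```python
from typing import List

DIGITS = "0123456789"


def expand_accession_range(first: str, last: str) -> List[str]:
    """Expand an inclusive accession range such as SRR23615109-SRR23615158.

    Assumption (hypothetical helper): both IDs share the same letter prefix
    and every integer in between is a real accession in the range.
    """
    # Separate the alphabetic prefix (e.g. "SRR", "SAMN") from the number.
    prefix_first = first.rstrip(DIGITS)
    prefix_last = last.rstrip(DIGITS)
    if prefix_first != prefix_last:
        raise ValueError("accessions must share the same prefix")

    start = int(first[len(prefix_first):])
    stop = int(last[len(prefix_last):])
    if stop < start:
        raise ValueError("range end precedes range start")

    # Preserve the zero-padded width of the numeric part.
    width = len(first) - len(prefix_first)
    return [f"{prefix_first}{n:0{width}d}" for n in range(start, stop + 1)]


runs = expand_accession_range("SRR23615109", "SRR23615158")
biosamples = expand_accession_range("SAMN33439338", "SAMN33439387")

print(len(runs), len(biosamples))  # 50 50
print(runs[0], runs[-1])           # SRR23615109 SRR23615158
```

Each run ID could then be passed to a download tool such as `prefetch` from NCBI's SRA Toolkit. Before relying on the expanded list, it is worth checking it against Table S1 or the BioProject PRJNA938516 record.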