Copy Number Variation Shapes Structural Genomic Diversity Associated With Ecological Adaptation in the Wild Tomato Solanum chilense

Kai Wei; Remco Stam; Aurélien Tellier; Gustavo A Silva-Arias

doi:10.1093/molbev/msaf191

Mol Biol Evol. 2025 Aug 5;42(8):msaf191.



Editor: Michael Purugganan

PMCID: PMC12378046 PMID: 40760966

Abstract

Copy number variation (CNV) is a prevalent type of variation that affects large genomic regions and contributes to both genetic diversity and ecological adaptation in plants. In tomato and its wild relatives, the target genes involved in adaptation through CNV remain unexplored at the population level. We therefore characterized the CNV landscape of Solanum chilense, a wild tomato species adapted to dry habitats, using whole-genome short-read data from 35 individuals from 7 populations. We identified 212,207 CNVs, including 160,926 deletions and 51,281 duplications. We found a higher number of CNVs in diverging populations occupying stressful habitats. CNV and single-nucleotide polymorphism (SNP) analyses concordantly revealed the known population structure of the species. This underscores the impact of historical demographic events and recent colonization events in shaping genome-wide CNVs. Furthermore, we identified 3,539 candidate genes with highly divergent CNV profiles across populations. Interestingly, these genes are functionally associated with responses to abiotic stress and linked to multiple pathways of flowering time regulation. Gene CNVs in S. chilense exhibit 2 evolutionary trends. Ancestral lineages distributed in the central and southern coast populations show gene loss. The most recently diverged lineage, from the southern highland region, shows gene gain. Environmental association analysis of the CNVs ultimately linked the dynamics of gene copy number to 6 climatic variables. This suggests that natural selection has likely shaped CNV patterns in stress-response genes, promoting the colonization of contrasting habitats. Our findings provide insights into the role of CNV in adaptation during recent range expansion.

Keywords: flowering time, population genomics, abiotic stress response, ecological adaptation, gene duplication and deletion, range expansion and colonization

Introduction

Copy number variation (CNV) is the primary type of structural variation (SV) caused by genomic rearrangements. It mainly comprises deletion (DEL) and duplication (DUP) events, which result from the loss and gain of DNA segments (Feuk et al. 2006; Żmieńko et al. 2014). CNV is expected to have a larger impact on gene function than single-nucleotide polymorphisms (SNPs) for 2 reasons. It covers more base pairs (Shaikh et al. 2009; Hämälä et al. 2021), and it has a higher per-locus mutation rate (Lupski 2007). CNV is recognized as an essential driver of genomic divergence and local adaptation (Rinker et al. 2019; Hämälä et al. 2021; Marszalek-Zenczak et al. 2023). Genome-wide studies confirm the importance of CNV in stress response and yield improvement in multiple plants, such as maize (Springer et al. 2009), rice (Fuentes et al. 2019; Qin et al. 2021), and Arabidopsis thaliana (Zmienko et al. 2020; Marszalek-Zenczak et al. 2023). So far, however, such studies have been conducted in selfing species and/or crops characterized by small effective population size (N_e) and domestication bottlenecks (Alonso-Blanco et al. 2016; Beissinger et al. 2016; Brumlop et al. 2019). In such species, it is difficult to separate 2 kinds of effects. The first is the genome-wide effect of random evolutionary processes (genetic drift, chromosomal rearrangements, and demographic history), which generate fast and extensive CNV differences between populations. The second is the effect of adaptive processes at specific loci (here, positive selection underpinning environmental adaptation [Johri et al. 2022]). The dynamics of gene copy number result from the population history and from multiple processes, including selection, migration, and recombination (Sudmant et al. 2015; Zhou et al. 2019; Otto et al. 2022; Antinucci et al. 2023; Otto and Wiehe 2023).
Indeed, the N_e of populations determines the amount of genetic diversity (SNPs or CNVs) available, the efficiency of positive and negative selection against genetic drift, and the effect of linked selection around sites under selection, thus being a major determinant of the genome architecture (Lynch and Walsh 2007). It is therefore more difficult to disentangle the effect of neutral processes from that of selection in small populations and/or populations with strong past demographic changes, for example, following range expansions with strong bottlenecks (Johri et al. 2022).

The wild tomato relative Solanum chilense is an excellent model species for studying the genetic basis of adaptive evolution during the colonization of novel habitats (Böndel et al. 2015; Stam et al. 2019b; Wei et al. 2023). Several features result in a high N_e: outcrossing, gene flow, seed banks, and relatively mild bottlenecks during the colonization of new habitats. This high N_e is reflected in high nucleotide diversity and high recombination rates, and gives the species a high adaptive potential (Arunyawat et al. 2007; Stam et al. 2019b; Wei et al. 2023). S. chilense occurs in southern Peru and northern Chile, in mesic to very arid habitats around the Atacama Desert, and is the southernmost species of the tomato clade (Nakazato et al. 2010). Moreover, within S. chilense, 2 lineages expanded southward during 2 independent colonization events (Böndel et al. 2015; Stam et al. 2019b; Raduski and Igić 2021; Wei et al. 2023). One lineage diverged early toward the coastal part of northern Chile (hereafter the southern coast group, SC). The other diverged recently, in post-glacial times, toward the high altitudes of the Chilean Andes (hereafter the southern highland group, SH) (Fig. 1a). The populations currently occurring in the southern coast and southern highland habitats exhibit signatures of past positive selection for adaptation to cold, drought, light (photoperiod), heat, and biotic stress (Xia et al. 2010; Fischer et al. 2011; Nosenko et al. 2016; Böndel et al. 2018; Stam et al. 2019b; Wei et al. 2023). These signatures of past adaptive selection suggest a genetic basis for adaptation to novel habitats during the southward expansion of S. chilense populations toward the arid areas around the Atacama Desert (Wei et al. 2024). Furthermore, these studies show that in S. chilense it is possible, to some extent, to separate 2 signals. One is the local footprint of strong positive selection due to local adaptation. The other is the noise and variation in genome-wide polymorphism patterns caused by past neutral demographic events. As advocated by Johri et al. (2022), we have relied on orthogonal lines of evidence: demographic inference, simulation-guided scans for selective sweeps, and correlation with climatic data. However, these studies detected adaptive signatures using SNP data alone; whether CNV also contributes to adaptation to novel habitats in S. chilense is still unknown.

Fig. 1. — Overview of copy number variation detected in 35 *S. chilense* individuals. a) Map showing the distribution of all *S. chilense* populations held at the Tomato Genetics Resource Center (TGRC), the 7 *S. chilense* populations in this study (black circles), and the 4 population groups (circles with other colors). Orange arrows indicate the 2 reconstructed southward colonization events, first to the southern coast and then to the southern highland. C: central; SH: southern highland; SC: southern coast. b) Number of CNVs pooled across the 5 individuals within each population. DEL: deletion; DUP: duplication. c) Distribution of CNV size. d) CNV density along the genome, expressed as the number of CNVs per 1-Mb window. e) Number of CNVs overlapping various genomic features in each population.

Reference genomes of several species of the tomato clade, including numerous cultivated tomato varieties, have been sequenced and assembled (Ranjan et al. 2012; Sato et al. 2012; Bolger et al. 2014; Stam et al. 2019a). Three tomato SV sets have recently been constructed from tomato-clade pan-genome analyses (Alonge et al. 2020; Zhou et al. 2022; Li et al. 2023). They were built to investigate the impact of genome rearrangements on gene expression and genomic diversity and to provide new genomic resources for tomato improvement. These 3 studies compared cultivated tomato genomes with those of several wild tomato species, including an individual of the S. chilense population LA1969 (belonging to the central group; Fig. 1a). Interestingly, these studies showed that S. chilense harbored the highest number of SVs among all wild and cultivated tomato species. By contrast, the closely related wild species S. peruvianum and S. corneliomulleri showed at most half as many SVs as S. chilense (Li et al. 2023). All 3 species exhibit a similar recent proliferation of transposable elements (Li et al. 2023). Because S. chilense occurs in a wide range of environments, it is of key importance for understanding the role of CNV in speciation and intraspecific diversification in the tomato clade. However, the studies mentioned above focused on the pan-genome level across wild and cultivated species. An understanding of the role of CNV in local (ecological) adaptation is still lacking, especially for adaptation to new arid habitats in southern populations of S. chilense.

In this work, we identified genome-wide CNVs and estimated the copy number (CN) of each gene from whole-genome short-read sequencing data. The data cover 35 S. chilense individuals from 7 populations (5 diploid individuals per population) representing 3 different geographic habitats: 3 central (C) populations, 2 southern highland (SH) populations, and 2 southern coast (SC) populations (Fig. 1a; supplementary Dataset S1, Supplementary Material online). Based on these data, we first identified candidate genes with highly differentiated CN across populations (hereafter CN-differentiated genes), which are likely associated with inter-population differentiation in S. chilense. We then inferred the evolutionary trends of gene CN expansion and contraction for these candidate genes along a population phylogeny. Finally, we associated the dynamics of gene CN with climatic variables to provide evidence that environmental stresses drive CNV dynamics across populations. Our results suggest that CNV contributes to population adaptation to novel habitats in an outcrossing species with a large N_e and high genetic diversity. Our study highlights the importance of including CNV analyses to complement SNP-based genomic scans for recent positive selection.

Results

Summary of CNVs in the Genome of S. chilense and Validation of the Pipeline

We identified a total of 212,207 CNVs (160,926 deletions and 51,281 duplications) by combining 4 CNV callers. The callers were applied to the alignments of each of the 35 whole-genome sequencing datasets (supplementary Dataset S1, Supplementary Material online) to the chromosome-level S. chilense reference genome (Silva-Arias et al. 2025) (supplementary fig. S1, Supplementary Material online; supplementary Dataset S2, Supplementary Material online). We found 73,014 to 94,621 CNVs per population (Fig. 1b; supplementary table S1, Supplementary Material online) and 31,923 to 46,579 CNVs per individual (supplementary table S2, Supplementary Material online). Deletions far outnumber duplications in all individuals and populations (Fig. 1b; supplementary fig. S1, Supplementary Material online). Duplications, however, are larger on average (39,140 ± 104,577 bp vs. 14,052 ± 59,930 bp for deletions). Their size distribution is skewed and differs significantly from that of deletions (Fig. 1c; Kolmogorov–Smirnov test, P < 2.2e−16). In each of the 3 central populations, 37% to 43% of CNVs were private to a single individual. Only 12% to 14% were fixed, i.e. observed in all 5 individuals of the population (supplementary fig. S2, Supplementary Material online). Southern populations (southern coast and southern highland) exhibited more fixed CNVs than the central populations. This was especially true of the 2 southern coast populations (25% in SC_LA2932 and 31% in SC_LA4107; supplementary fig. S2, Supplementary Material online).
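The private and fixed categories above depend only on how many of the 5 sampled individuals in a population carry a given CNV. The minimal Python sketch below shows this bookkeeping. It assumes that CNV calls have already been merged across individuals, so that the same event carries the same identifier in every individual. The input format and the function name are hypothetical, and the merging step itself is not shown.

```python
from collections import Counter


def cnv_sharing_fractions(calls_by_individual):
    """Fractions of a population's CNVs that are private, shared, or fixed.

    calls_by_individual maps each sampled individual to the set of merged
    CNV identifiers it carries (hypothetical input format).
    """
    n_individuals = len(calls_by_individual)
    if n_individuals < 2:
        raise ValueError("need at least 2 individuals to separate private from fixed")

    # Number of carriers for every CNV observed in this population.
    carriers = Counter()
    for cnv_ids in calls_by_individual.values():
        carriers.update(set(cnv_ids))

    total = len(carriers)
    if total == 0:
        raise ValueError("no CNVs observed")
    private = sum(1 for c in carriers.values() if c == 1)            # one carrier only
    fixed = sum(1 for c in carriers.values() if c == n_individuals)  # every individual
    shared = total - private - fixed
    return {"private": private / total, "shared": shared / total, "fixed": fixed / total}


# Toy population of 5 individuals with made-up CNV identifiers.
population = {
    "ind1": {"cnv_a", "cnv_b", "cnv_c"},
    "ind2": {"cnv_a", "cnv_c", "cnv_d"},
    "ind3": {"cnv_a", "cnv_e"},
    "ind4": {"cnv_a", "cnv_c"},
    "ind5": {"cnv_a", "cnv_f"},
}
print(cnv_sharing_fractions(population))
```

Only the CNVs seen in the given population are counted. "Private" and "fixed" are therefore relative to the 5 sampled individuals, not to the population as a whole.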

Deletions and duplications were enriched at both ends of the chromosomes (Fig. 1d), consistent with previous studies (Alonge et al. 2020; Hämälä et al. 2021; Li et al. 2023). Most CNVs (76% to 79% per population) covered intergenic regions (Fig. 1e). About 35% to 38% of CNVs affected coding sequences annotated in the S. chilense reference. Large CNVs spanning multiple genes and intergenic regions were counted in every category they overlap, so these percentages sum to more than 100%. In addition, 45% and 50% of CNVs across populations overlapped putative regulatory regions within 5 kb upstream and 5 kb downstream of genes, respectively. As expected, 68% of deletions and 82% of duplications overlapped at least one transposable element annotated in the S. chilense genome. This supports the view that CNVs are predominantly shaped by transposable elements (Fuentes et al. 2019; Alonge et al. 2020).
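The feature counts above come from testing each CNV against gene bodies and against strand-aware 5-kb flanks. The minimal sketch below shows that overlap logic with half-open intervals. It is not the authors' implementation, and the function names are hypothetical. It also makes 2 simplifications. It uses whole gene intervals instead of annotated coding sequences. It scans genes linearly, whereas a genome-scale analysis would use an interval index or a dedicated tool.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def flanks(gene_start, gene_end, strand, flank=5000):
    """Strand-aware (upstream, downstream) flanking intervals of a gene."""
    left = (max(0, gene_start - flank), gene_start)
    right = (gene_end, gene_end + flank)
    # On the minus strand, upstream lies at higher coordinates.
    return (left, right) if strand == "+" else (right, left)


def cnv_feature_labels(cnv, genes, flank=5000):
    """Label a CNV (chrom, start, end) by the gene-related features it touches.

    genes is a list of (chrom, start, end, strand) tuples. A large CNV can get
    several labels, so per-feature percentages can sum to more than 100%.
    An empty set means the CNV lies outside every gene and its flanks.
    """
    chrom, start, end = cnv
    labels = set()
    for g_chrom, g_start, g_end, strand in genes:
        if g_chrom != chrom:
            continue
        if overlaps(start, end, g_start, g_end):
            labels.add("gene")
        upstream, downstream = flanks(g_start, g_end, strand, flank)
        if overlaps(start, end, *upstream):
            labels.add("upstream")
        if overlaps(start, end, *downstream):
            labels.add("downstream")
    return labels


genes = [("chr1", 10000, 20000, "+"), ("chr1", 50000, 60000, "-")]
for cnv in [("chr1", 7000, 9000), ("chr1", 19000, 23000), ("chr1", 61000, 62000)]:
    print(cnv, sorted(cnv_feature_labels(cnv, genes)))
```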

To validate our pipeline, which combines CNV calls from 4 tools designed for short-read data, we simulated 1,000 deletions and 1,000 duplications. The simulated CNVs range in length from 50 bp to 1 Mb and were sequenced as 150-bp short reads (see Methods). Our pipeline detected approximately 90% of the simulated CNVs, with a false-positive rate much lower than that of any single caller (supplementary table S3, Supplementary Material online). Consistent with previous reports, our results indicate that combining multiple callers can effectively improve CNV detection from short-read data (Kosugi et al. 2019; Mahmoud et al. 2019; Coutelier et al. 2022).
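Benchmarking against simulated CNVs requires a rule for when a call "recovers" a simulated event. The sketch below assumes a common convention: same chromosome, same type (DEL or DUP), and at least 50% reciprocal overlap. The pipeline's actual matching criterion is described in Methods and may differ. Here the false-positive fraction is taken as the share of calls that match no simulated CNV. The helper names are hypothetical.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if CNVs a and b, given as (chrom, start, end, svtype), match.

    Match (assumed criterion): same chromosome, same type, and an overlap
    covering at least min_frac of each interval.
    """
    if a[0] != b[0] or a[3] != b[3]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def evaluate_calls(truth, calls, min_frac=0.5):
    """Return (recall, false-positive fraction) of calls against simulated CNVs."""
    detected = sum(
        1 for t in truth if any(reciprocal_overlap(t, c, min_frac) for c in calls)
    )
    unmatched = sum(
        1 for c in calls if not any(reciprocal_overlap(c, t, min_frac) for t in truth)
    )
    recall = detected / len(truth) if truth else 0.0
    false_pos_frac = unmatched / len(calls) if calls else 0.0
    return recall, false_pos_frac


# Toy benchmark: the DUP call covers only 25% of the simulated DUP, so it fails.
truth = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 9000, "DUP"), ("chr2", 100, 400, "DEL")]
calls = [("chr1", 1100, 2100, "DEL"), ("chr1", 5000, 6000, "DUP"),
         ("chr2", 100, 400, "DEL"), ("chr3", 0, 500, "DEL")]
print(evaluate_calls(truth, calls))
```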

CNVs Effectively Capture the Known Species Population Structure

We compared the results of population structure analyses based on genome-wide SNPs and CNVs. The principal component analysis (PCA) based on the genotyped CNV dataset agreed with the clustering patterns from the genome-wide SNP dataset (Fig. 2a; supplementary fig. S3a, Supplementary Material online). Both analyses suggested a division of our samples into 4 genetic clusters that aligned with the geographic structure of the populations (best K = 4). The first principal component (PC1) separated the southern coast populations from inland (central and southern highland) populations, PC2 separated the southern coast subgroup into 2 genetic clusters (SC_LA2932 and SC_LA4107), and PC3 separated the inland populations into central and southern highland clusters (Fig. 2a; supplementary fig. S3a, Supplementary Material online). The ADMIXTURE analysis confirmed this result (Fig. 2b; supplementary fig. S3b, Supplementary Material online, with K = 4 exhibiting the lowest cross-validation error) and was consistent with the results from the SNP dataset (supplementary fig. S3c, Supplementary Material online).

Fig. 2. — Population structure and differentiation analyses based on the genotyped CNVs. a) Principal component analysis (PCA) based on the genotyped CNVs from 35 individuals from 7 *S. chilense* populations. b) Structure analysis based on genotyped CNVs, assuming between K = 2 and K = 7 subgroups (the best K value determined from the cross-validation error was 4; supplementary fig. S3b, Supplementary Material online). C: central; SH: southern highland; SC: southern coast. c) The correlation between pairwise F_ST/D_xy and pairwise V_ST indicates that CNVs support the known population differentiation.

We further explored population differentiation using the V_ST statistic. This statistic is analogous to the classically used F_ST and D_xy statistics but uses CN values instead of allele frequencies (Redon et al. 2006). V_ST ranges from 0 to 1, where 1 indicates that the populations are fully differentiated. We first computed pairwise V_ST values genome-wide in 1-kb windows using 2 quantitative CN measures. These were CN estimated by Control-FREEC (V_ST[CN]) and read depth (V_ST[RD]) (supplementary table S4, Supplementary Material online). The 2 estimators of pairwise V_ST were highly significantly and positively correlated (Pearson's correlation test, P = 1.06e−07; supplementary fig. S4a, Supplementary Material online). In addition, all duplicated and deleted fragments detected by Control-FREEC were also present in the CNV dataset obtained with the pipeline based on the 4 SV detection tools. Pairwise V_ST revealed structure patterns similar to those reported in previous SNP-based studies (Böndel et al. 2015; Stam et al. 2019b; Raduski and Igić 2021; Wei et al. 2023). Differentiation was high between southern coast and inland populations, and especially between southern coast and southern highland populations (average V_ST[RD] = 0.257 ± 0.039, average V_ST[CN] = 0.198 ± 0.027; supplementary table S4, Supplementary Material online). As expected, both V_ST statistics (V_ST[CN] and V_ST[RD]) were highly significantly and positively correlated with SNP-based F_ST and D_xy (Pearson's correlation test; for P values, see Fig. 2c and supplementary fig. S4b to d, Supplementary Material online).
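The V_ST definition of Redon et al. (2006) can be written as V_ST = (V_T − V_S) / V_T. Here V_T is the variance of CN across all individuals, and V_S is the average within-population variance weighted by population sample size. Below is a minimal Python sketch for a single window or gene; it is not the authors' code, and the function name is hypothetical. The sketch uses population variances (ddof = 0) rather than sample variances, which is an assumption. With that choice, the law of total variance guarantees V_S ≤ V_T, which keeps V_ST within the 0 to 1 range stated above.

```python
from statistics import pvariance


def vst(cn_by_population):
    """V_ST = (V_T - V_S) / V_T for one window or gene.

    cn_by_population maps a population name to a list of per-individual CN
    values. Population (ddof=0) variances are used, so 0 <= V_ST <= 1.
    """
    all_values = [v for values in cn_by_population.values() for v in values]
    v_total = pvariance(all_values)
    if v_total == 0:
        return 0.0  # no CN variation at all: treated as no differentiation (assumption)
    n_total = len(all_values)
    # Within-population variance, weighted by each population's sample size.
    v_within = sum(
        len(values) * pvariance(values) for values in cn_by_population.values()
    ) / n_total
    return (v_total - v_within) / v_total


print(vst({"A": [2, 3], "B": [4, 5]}))  # partially differentiated toy example
print(vst({"A": [2, 2, 2], "B": [4, 4, 4]}))  # fully differentiated toy example
```

Passing 2 populations gives a pairwise V_ST, and passing all 7 gives a global V_ST. Whether the CN values come from Control-FREEC or from read depth changes only the input numbers.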

Differentiation of Gene CN in Different Populations

To explore the role of natural selection in shaping CNV frequencies and distribution across populations, we also calculated global V_ST for each gene (39,245 genes in total), again with both V_ST[CN] and V_ST[RD]. We aimed to capture candidate genes under divergent selective pressures by identifying genes with strong CN differentiation across all populations (supplementary fig. S5, Supplementary Material online). In total, we identified 3,539 candidate genes with outlier CN differentiation across the 7 populations. These are genes whose global V_ST exceeded the 95th percentile of 1,000 permuted V_ST values (supplementary fig. S5, Supplementary Material online; supplementary table S5, Supplementary Material online; supplementary Dataset S3, Supplementary Material online). Of these, 2,192 strongly CN-differentiated genes exceeded the 99th percentile of the permuted values (supplementary fig. S5, Supplementary Material online; supplementary table S5, Supplementary Material online). Supplementary fig. S6, Supplementary Material online, shows the distribution of deletions and duplications for these 3,539 candidate genes. Southern highland populations exhibited comparatively more gene gains (duplications) and fewer gene losses (deletions) than the other populations. In contrast, southern coast populations showed comparatively more deletions than the high-altitude populations. In addition, southern highland and southern coast populations showed comparatively more duplications than central populations (except C_LA3111). This may indicate that duplications played an important role during southward colonization.
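The outlier rule above compares each gene's observed global V_ST with a null distribution from 1,000 permutations. The exact permutation scheme is not described in this section. The sketch below assumes that individuals' population labels are shuffled while population sizes stay fixed, and it uses the nearest-rank percentile. The helper names are hypothetical. The block defines its own copy of the Redon et al. (2006) V_ST so that it runs on its own.

```python
import math
import random
from statistics import pvariance


def vst(groups):
    """Redon et al. (2006) V_ST from a list of per-population CN lists."""
    values = [v for g in groups for v in g]
    v_total = pvariance(values)
    if v_total == 0:
        return 0.0
    v_within = sum(len(g) * pvariance(g) for g in groups) / len(values)
    return (v_total - v_within) / v_total


def nearest_rank_percentile(sorted_values, q):
    """q-th quantile (0 < q <= 1) of already sorted values, nearest-rank method."""
    idx = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[idx]


def permutation_outlier(groups, n_perm=1000, seed=1):
    """Compare a gene's observed V_ST with label-permuted V_ST values."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled values into groups of the original sizes.
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        null.append(vst(permuted))
    null.sort()
    observed = vst(groups)
    p95 = nearest_rank_percentile(null, 0.95)
    p99 = nearest_rank_percentile(null, 0.99)
    return {"observed": observed, "p95": p95, "p99": p99,
            "outlier_95": observed > p95, "outlier_99": observed > p99}


# Toy gene with 3 populations of 5 individuals, each population fixed for a different CN.
print(permutation_outlier([[2] * 5, [3] * 5, [4] * 5]))
```

This sketch builds a separate null distribution for each gene. Pooling one null across genes would be cheaper, but that is a different design choice.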

We performed 4 PCAs based on the Control-FREEC CN values of 4 gene sets:

1. All 23,911 annotated genes with mapped reads (supplementary fig. S7a, Supplementary Material online).
2. The 12,392 genes with V_ST(CN) > 0 (supplementary fig. S7b, Supplementary Material online).
3. The 3,539 differentiated genes, i.e. observed V_ST above the 95th percentile cutoff with both gene CN estimation methods (Fig. 3a).
4. The 2,192 strongly differentiated genes, i.e. observed V_ST above the 99th percentile cutoff (supplementary fig. S7c, Supplementary Material online).

In the PCA based on the 23,911 genes (supplementary fig. S7a, Supplementary Material online), all samples grouped cohesively except those from SC_LA4107. In the PCA based on the 12,392 genes with global V_ST(CN) > 0 (supplementary fig. S7b, Supplementary Material online), the 2 southern coast populations separated from the 5 inland populations (central and southern highland). This suggests a large difference in CN range and composition between southern coast and inland populations. In the PCA based on the differentiated gene set (Fig. 3a; supplementary fig. S7c, Supplementary Material online), PC3 separated the southern highland populations from the central populations. This is consistent with the PCA based on the genotyped CNVs and SNPs (Fig. 2a; supplementary fig. S3a, Supplementary Material online). To rule out an effect of a few outlier individuals on the PCA (supplementary fig. S7a and b, Supplementary Material online), we removed 2 outliers; the PCA results remained consistent (supplementary fig. S8, Supplementary Material online). Notably, however, southern highland populations still showed ca. 20% admixed ancestry with the central populations (Fig. 2b). These admixture signatures may reflect gene flow between southern highland and central populations after the colonization of southern habitats, or a very short divergence time.
Consequently, similar polymorphisms in some parts of the genome were maintained between southern highland and central populations (Wei et al. 2023). These results may indicate that the past demographic history of habitat colonization (and the resulting genetic drift) and gene flow are important evolutionary processes shaping both SNP and CNV frequencies within and between populations of S. chilense.

Fig. 3. — The CN-differentiated genes among 7 populations are linked to responses to multiple environmental stimuli. a) PCA based on the CN of the 3,539 differentiated genes. C: central; SH: southern highland; SC: southern coast. b) Proportions of CN-differentiated genes in GO categories related to responses to external stimuli/stresses that are significantly enriched (P < 0.05). The proportion of gene enrichment is defined as the number of genes enriched in one GO category divided by the number of background genes in this category. The number on each bar is the number of genes enriched in that GO category. c) Proportions of the 25 CN-differentiated genes in the photoperiod pathway of flowering time regulation that overlap with deletion (DEL), duplication (DUP), or absence of CNV (no CNV) in the 7 populations. d) Proportions of the 20 CN-differentiated genes in the vernalization pathway of flowering time regulation that overlap with DEL, DUP, or no CNV in the 7 populations. e) Proportions of the 73 CN-differentiated genes involved in root development that overlap with DEL, DUP, or no CNV in the 7 populations. The pie charts in c, d, and e show the proportions of CN-differentiated genes overlapping with deletion, duplication, or absence of CNV (see also supplementary table S6, Supplementary Material online). The numbers in the pie charts indicate the corresponding numbers of genes.

Copy Number Variation Illuminates Enriched Abiotic Stress-Response Pathways in S. chilense

We performed functional enrichment analysis of the 3,539 CN-differentiated genes using GO biological process categories (supplementary Dataset S4, Supplementary Material online). We classified the significantly enriched GO categories (P < 0.05) into 9 groups (supplementary fig. S9a, Supplementary Material online), ranging from 82 genes (cell wall organization) to 580 genes (cellular metabolic process). Interestingly, 400 (11.30%) CN-differentiated genes were enriched for responses to stimulus/stress that can be linked to multiple environmental factors (supplementary fig. S9a, Supplementary Material online). Examples are responses to drought (water deprivation; 14.35%, 60 genes), cold (17.62%, 37 genes), heat (26.43%, 39 genes), red/far-red light (15.82%, 65 genes), and ultraviolet (UV) light (19.03%, 47 genes) (Fig. 3b). The enrichment for these stress responses is consistent with multiple lines of evidence for adaptation at genes associated with responses to arid conditions along a steep altitudinal gradient in S. chilense (Fischer et al. 2011; Nosenko et al. 2016; Böndel et al. 2018; Blanchard-Gros et al. 2021; Wei et al. 2023). For instance, multiple drought-responsive (HSF and DREB3), cold-responsive (FAD7), and light/cold-responsive genes (FT, GI, and FLD) were found to be involved in flowering regulatory processes (supplementary Dataset S5, Supplementary Material online). These findings are consistent with previous studies suggesting that selection may act on CNVs as well as on point mutations (Ofria et al. 2003; Tan et al. 2017; Lye and Purugganan 2019).
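Fig. 3b reports, for each GO category, the number of CN-differentiated genes in it divided by the number of background genes annotated to it. The sketch below computes that proportion together with a one-sided hypergeometric P value, a common choice for GO over-representation. This section does not state which test the authors used, so treat the test as an assumption. Multiple-testing correction is not shown, and the function name is hypothetical.

```python
from math import comb


def go_enrichment(k, n_candidates, K, N):
    """Over-representation of one GO category among candidate genes.

    k: candidate genes annotated to the category
    n_candidates: total candidate genes
    K: background genes annotated to the category
    N: total background genes
    Returns (proportion as defined for Fig. 3b, one-sided hypergeometric P(X >= k)).
    """
    proportion = k / K
    upper = min(K, n_candidates)
    # Sum the hypergeometric probabilities of observing k or more category genes.
    p_value = sum(
        comb(K, x) * comb(N - K, n_candidates - x) for x in range(k, upper + 1)
    ) / comb(N, n_candidates)
    return proportion, p_value


print(go_enrichment(k=3, n_candidates=4, K=5, N=20))  # toy numbers
```

The definition can be checked against the text. The drought category (60 genes, 14.35%) implies about 60 / 0.1435 ≈ 418 background genes annotated to "response to water deprivation".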

We found 227 CN-differentiated genes associated with flowering (supplementary fig. S9a and b, Supplementary Material online), an important fitness trait underlying local adaptation in plants (Srikanth and Schmid 2011). Flowering is a critical step in the transition from vegetative to reproductive growth and is influenced by multiple environmental conditions. CN-differentiated genes may therefore drive divergence in flowering time linked to local adaptation along the ecological gradient (supplementary fig. S9c, Supplementary Material online). Among the genes involved in flowering regulation, 31 and 36 CN-differentiated genes were linked to responses to light and cold, respectively (supplementary fig. S9c, Supplementary Material online). Of these, 25 and 20 genes were linked to the photoperiod and vernalization pathways, respectively (supplementary fig. S9b, Supplementary Material online). These 2 flowering time regulatory pathways are sensitive to the relative lengths of light and dark periods and to low temperatures, respectively (Srikanth and Schmid 2011; Gaudinier and Blackman 2020). These genes overlapped comparatively often with duplications in southern highland populations (Fig. 3c and d; supplementary fig. S10, Supplementary Material online; supplementary table S6, Supplementary Material online). They included 3 groups of genes:

- putative homologs of the floral integrator genes FT and FD (Liu et al. 2008; Srikanth and Schmid 2011; Putterill and Varkonyi-Gasic 2016);
- putative homologs of CRY2, GI, and ELF3 in the photoperiod pathway (Srikanth and Schmid 2011; Makita et al. 2021);
- a putative homolog of AGL14 in the vernalization pathway (Hecht et al. 2005; Pérez-Ruiz et al. 2015).

The A. thaliana counterparts of these candidate genes are well-known flowering time regulators (supplementary Dataset S5, Supplementary Material online).
Note that these candidate flowering regulation genes were duplicated only in southern highland populations. In central and southern coast populations, they showed either no CNVs or copy loss (Fig. 3c and d; supplementary table S6, Supplementary Material online; t-test, P < 0.05). These findings suggest that CN gains in these genes may promote colonization of, and adaptation to, southern highland habitats by regulating flowering time via the photoperiod and vernalization pathways (Wei et al. 2023). Previous studies in several plant species have shown that duplication of positive regulators of flowering time can increase their expression and thereby promote flowering (Blackman et al. 2010; Díaz et al. 2012; Panchy et al. 2016). This genomic finding is consistent with the phenology observed under glasshouse conditions, where southern highland individuals consistently flower 5 to 10 d earlier than individuals from central populations. The differentiated gene set also contains other potential flowering regulatory genes, namely putative homologs of FY and FLD, which are likely involved in flowering regulation via other pathways (Srikanth and Schmid 2011; Cheng et al. 2017; Bao et al. 2020; supplementary Dataset S5, Supplementary Material online). The FLD gene showed an increased copy number in all populations (supplementary Dataset S5, Supplementary Material online).

We identified 60 drought-responsive CN-differentiated genes directly associated with responses to water deprivation (Fig. 3b). They include duplicated homologs of ABI4 and AFP1 in the abscisic acid (ABA) pathway. They also include a putative homolog of the WRKY33 transcription factor, whose CN varies across populations (supplementary Dataset S5, Supplementary Material online). These genes have been validated as drought stress-responsive in A. thaliana and crops (Xiao et al. 2021; Liu et al. 2022; Luo et al. 2022). In addition, WRKY33 has been linked to temperature stress in tomato (Guo et al. 2022). Furthermore, 11 CN-differentiated genes belong to a drought-responsive metabolic co-expression network that we previously found to be overexpressed under drought compared with well-watered conditions (supplementary fig. S11, Supplementary Material online; t-test, P = 2.68e−05) (Wei et al. 2024). This corroborates their role in adaptive responses. Interestingly, across all populations, similar numbers of water deprivation-response genes overlapped deletions and duplications (supplementary fig. S9d, Supplementary Material online; supplementary table S6, Supplementary Material online). This suggests a species-wide adaptation process in S. chilense through alterations in a metabolic gene network.

Our previous SNP study linked root development genes to putative local adaptation processes, primarily in response to extreme drought. This link was found in 3 low-altitude populations: SC_LA2932, SC_LA4107, and C_LA1963 (Wei et al. 2023). Accordingly, we found 73 CN-differentiated genes involved in root development. These genes showed more CNVs in the 3 low-altitude populations (SC_LA2932, SC_LA4107, and C_LA1963) than in the high-altitude populations (C_LA2931, C_LA3111, SH_LA4117A, and SH_LA4330) (Fig. 3e; supplementary table S6, Supplementary Material online; t-test, P < 0.05). This further suggests that root development may be an important strategy for adaptation to low-altitude environments.

Gene Expansion and Contraction Patterns Show Differences Along Altitudinal Gradients

Our findings indicate that many CN-differentiated genes may be involved in adaptation to local habitats. To investigate the CN evolutionary trends of the 3,539 differentiated genes, we analyzed gene CN expansion (due to gene gain) and contraction (due to gene loss) across populations. The analysis used an ultrametric tree derived from the inferred population genealogy (Fig. 4a). The differentiated genes showed CN expansion (i.e. genes with CN gain) in the inland group, with an expansion rate of 1.788 (Table 1). In contrast, the southern coast group showed CN contraction (i.e. genes with CN loss), with a contraction rate of −0.818. Within the inland group, the southern highland group exhibited CN gain (expansion rate of 0.416). The central group showed a contraction (rate of −0.767), with nearly 3 times as many CN losses (1,013) as CN gains (355) (Table 1). This likely indicates that gene CN in inland populations followed different evolutionary trends along the 2 lineages. The 2 southern highland populations showed CN expansion rates of 1.663 (SH_LA4117A) and 1.375 (SH_LA4330). In the central group, C_LA1963 and C_LA2931 displayed a trend of CN contraction, but C_LA3111 exhibited a CN expansion rate (1.037) similar to those of the 2 southern highland populations (Table 1). The comparable CN expansion in the high-altitude populations (C_LA3111, SH_LA4330, and SH_LA4117A) may be attributed to 3 factors:

1. the recent divergence of the southern highland group from the central group;
2. the recent (re-)colonization of highland habitats after the last glacial maximum (Wei et al. 2023);
3. the ecological similarity of their habitats (Fig. 1a), which may have led to duplication of a similar set of genes in C_LA3111 and the southern highland populations.

Fig. 4. — The expansion and contraction of CN-differentiated genes in different populations relative to the *S. chilense* reference genome. a) The ultrametric phylogenetic tree used in gene expansion and contraction analysis (see Table 1). C: central; SH: southern highland; SC: southern coast. b) The map and pie charts show the trends of gene CN loss and gain in the processes of 2 southward colonization events, first to the southern coast and second to the southern highland (arrows). The proportion of CN gains or losses for each population is defined as the number of CN gains or losses divided by the sum of the number of CN gains and losses. c) The number of CN gains (positive values) or losses (negative values) for 16 rapidly evolving genes in 2 southern coast populations. d) The number of CN gains and CN losses for rapidly evolving genes related to photosynthesis in 4 subgroups representing 4 different habitats (see also supplementary table S8, Supplementary Material online).

Table 1.

Summary of gene expansion and contraction in different groups/populations based on an ultrametric tree

Groups/Populations^a	Number of genes with CN expansion	Number of genes with CN contraction	Number of CN gained	Number of CN lost	Rate of average expansion/contraction^b	Number of rapidly evolving genes^c
inland	40	26	167	49	1.788	15 (+13/−2)
C	163	695	355	1,013	−0.767	20 (+5/−15)
SH	527	525	1,143	705	0.416	37 (+32/−5)
SC	48	359	106	439	−0.818	9 (+2/−7)
C_LA1963	137	416	445	728	−0.512	10 (+3/−7)
C_LA2931	212	458	815	878	−0.094	15 (+3/−12)
C_LA3111	364	266	1,068	444	1.037	23 (+6/−15)
SH_LA4117A	813	342	2,574	653	1.663	52 (+38/−14)
SH_LA4330	446	328	1,766	702	1.375	31 (+22/−9)
SC_LA2932	268	846	427	1,514	−0.935	29 (+7/−22)
SC_LA4107	595	640	1,758	1,098	0.534	35 (+25/−10)


The table shows the expansion and contraction of CN-differentiated genes in different groups/populations based on an ultrametric tree (Fig. 4a). C: central; SH: southern highland; SC: southern coast. Genes with CN expansion or contraction are those for which the birth-and-death model predicts an increase or decrease in gene CN, respectively. The numbers of CN gained or lost give the number of copies gained by the expanded genes and lost by the contracted genes, respectively.

^aGroups and populations denote the branches in the ultrametric tree (Fig. 4a).

^bRate of average expansion/contraction = (Number of CN gained − Number of CN lost)/(Number of genes with CN expansion + Number of genes with CN contraction). Positive values indicate CN expansion and negative values indicate CN contraction.

^cThe rapidly evolving genes indicate significantly higher CN expansion or contraction (P < 0.05) across the different groups/populations. Values outside parentheses represent the total number of the rapidly evolving genes. Positive values in parentheses denote the number of significantly expanded genes and negative values denote the number of significantly contracted genes (see also supplementary Dataset S6, Supplementary Material online).
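Footnote b amounts to a simple per-branch ratio. As a minimal illustration (the function name `expansion_rate` is ours, not from the study), the Python sketch below recomputes the rate from the Table 1 counts for three branches. It reproduces only the summary arithmetic; the birth-and-death model that classifies genes as expanded or contracted is not implemented here.

```python
def expansion_rate(n_expanded: int, n_contracted: int,
                   cn_gained: int, cn_lost: int) -> float:
    """Average CN expansion (>0) or contraction (<0) per gene on one branch.

    Implements footnote b of Table 1:
    (CN gained - CN lost) / (genes with CN expansion + genes with CN contraction)
    """
    n_genes = n_expanded + n_contracted
    if n_genes == 0:
        raise ValueError("branch has no expanded or contracted genes")
    return (cn_gained - cn_lost) / n_genes


# (genes expanded, genes contracted, CN gained, CN lost), copied from Table 1
table1 = {
    "inland": (40, 26, 167, 49),
    "C": (163, 695, 355, 1013),
    "SH_LA4117A": (813, 342, 2574, 653),
}

for branch, counts in table1.items():
    print(f"{branch}: {expansion_rate(*counts):.3f}")
# inland: 1.788
# C: -0.767
# SH_LA4117A: 1.663
```

For the inland branch, for example, (167 − 49)/(40 + 26) = 118/66 ≈ 1.788, matching Table 1.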

Interestingly, opposite results were observed between the 2 southern coast populations. For the 3,539 differentiated genes, gene CN appeared to have contracted in SC_LA2932 (contraction rate of −0.935), while expansion occurred in SC_LA4107 (expansion rate of 0.534; Table 1). This is consistent with our previous observation that the 2 southern coast populations are highly differentiated, possibly as a result of long periods of isolated evolution and environmental differentiation. These results are also consistent with the population structure (Fig. 2) and may reflect the older colonization of the southernmost coastal habitats and the recent colonization of the highlands (Stam et al. 2019b; Wei et al. 2023).

Overall, the copy numbers of these potentially adaptively differentiated genes show an expansion (CN gain) in the 2 previously elucidated southward colonization events (Fig. 4b). Because the reference genome was assembled from population C_LA3111, which probably does not represent the ancestral state of the species, we also performed the same CN expansion and contraction analysis using gene CN data calculated from the reference genome of S. pennellii (supplementary table S7, Supplementary Material online), a drought-adapted wild tomato species. The results were consistent, except for a slight decrease in the proportion of CN gains in C_LA3111 when using the S. pennellii reference genome (supplementary fig. S12, Supplementary Material online) compared with the S. chilense reference (Fig. 4b).

We identified 155 “rapidly evolving genes” among the 3,539 differentiated genes, which exhibited significantly higher CN expansion or contraction (see Methods) across the different groups/populations based on the reference genome of S. chilense (Table 1; supplementary Dataset S6, Supplementary Material online). These 155 rapidly evolving genes also supported the population clusters in the PCA (supplementary fig. S13, Supplementary Material online), although C_LA3111 appeared closer to the southern highland populations than to the other central populations. The highest number of rapidly evolving genes was found in the southern highland populations (91 genes), including 71 genes with significant CN expansion that were enriched for GO terms related to photosynthesis (light reaction), long-day photoperiodism (flowering), and response to UV light and cold. The remaining 20 rapidly evolving genes in these populations were primarily associated with developmental and metabolic processes. We also found 56 genes with rapidly evolving CN in the central populations (Table 1; supplementary Dataset S6, Supplementary Material online), 75% of which exhibited a significant trend of CN contraction.

Among the 51 rapidly evolving genes in the 2 southern coast populations, 16 genes showed opposite CN trends in the phylogeny: a significant contraction in SC_LA2932 versus an expansion in SC_LA4107 (Fig. 4c). These genes included a few homologs of photosystem subunits (i.e. psbB and petD) mainly involved in photosynthesis (supplementary Dataset S5, Supplementary Material online) and may underpin the high CNV-level genetic differentiation between the 2 southern coast populations. In addition, rapidly evolving genes enriched for the same photosynthesis (light reaction) GO categories were also found in the central and southern highland groups (Fig. 4d; supplementary table S8, Supplementary Material online). These putative photosynthetic gene families appear to have contracted (CN loss) in the central group and SC_LA2932 but expanded (CN gain) in the southern highland group and SC_LA4107 (Fig. 4d; supplementary table S8, Supplementary Material online), suggesting that changes in the photosynthetic pathway may also be an important adaptive strategy across the different habitats in S. chilense.

CN-differentiated Genes Are Associated With Climatic Variation Along the Altitudinal Gradient

To further explore CNV as the potential genetic basis of an adaptive response to abiotic factors, we conducted 2 genome–environment association (GEA) analyses between gene CN and 37 climate variables (supplementary Dataset S7, Supplementary Material online).

We first implemented a redundancy analysis (RDA) to identify climate variables significantly associated with CN-differentiated genes across the 7 populations. In the RDA based on 12,391 genes with global V_ST(CN) > 0, 3 climatic variables (Bio7, Bio8, and Bio19) were correlated with CN changes (supplementary fig. S14a, Supplementary Material online). The first 3 RDA axes retained only 22.62% of the putative adaptive gene CNV and only weakly distinguished inland from southern coast populations (permutation test, P < 0.001; supplementary fig. S14a to c, Supplementary Material online). In the RDA based on the 3,539 CN-differentiated genes, 52.11% of the variance in CN was explained by 6 climate variables (explanatory variables; sum of proportions in Fig. 5c) from 5 significant RDA axes (permutation test, P < 0.001; Fig. 5a and c; supplementary fig. S14d, Supplementary Material online). These climatic variables were significantly correlated with the different populations (Mantel test, P < 0.05; Fig. 5b). In concordance with the PCA (Fig. 2a), the 2 main ordination axes clustered the 7 populations into 4 groups corresponding to the main geographical habitats (central, southern highland, and 2 southern coast habitats). RDA axis 1 (RDA1) was correlated with the annual temperature range (Bio7) and potential evapotranspiration during the driest quarter (PETDriestQuarter) and represented the differentiation between the southern coast and inland populations (Fig. 5a and b). RDA axis 2 (RDA2) reflected the differentiation between the 2 southern coast populations, associated with the mean temperature of the wettest quarter (Bio8). RDA2 also summarized a climatic gradient separating the low-altitude population (C_LA1963) from the highland populations, mainly driven by solar radiation (ann_Rmean) and potential evapotranspiration (annualPET and PETColdestQuarter) (Fig. 5a and b).
These 6 climatic variables were primarily associated with the colonization of the southern highland and southern coast habitats (Fig. 5b). The proportions of gene CN differentiation explained by these 6 climatic variables ranged from 0.02 (annualPET) to 0.136 (PETColdestQuarter) (Fig. 5c); PETColdestQuarter and PETDriestQuarter (0.121) had the highest importance and were correlated with inland and southern coast populations, respectively (Fig. 5a to c). The temperature variables (Bio7 and Bio8) together explained about 20.8% of the gene CN differentiation (Fig. 5c). Solar radiation (ann_Rmean) was specifically correlated with high-altitude populations and explained 3.6% of gene CN differentiation (Fig. 5a to c). A consistent RDA model was obtained using the 2,192 strongly CN-differentiated genes (supplementary fig. S14e to g, Supplementary Material online). Finally, the RDA applied to the 20,372 non-CN-differentiated genes identified no significantly associated climate variables or RDA axes (permutation test, P < 0.001; supplementary fig. S14h, Supplementary Material online). This may support the view that the CN-differentiated genes respond to external environmental stimuli in S. chilense.
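To make the RDA logic concrete, the sketch below shows the core computation under simplifying assumptions: gene CN values form an individuals × genes matrix `Y`, climate values form an individuals × variables matrix `X`, the constrained proportion is the unadjusted R² of a multivariate regression, and significance comes from permuting individuals. It is a NumPy illustration with hypothetical function names, not the software or options used in the study; established implementations such as `rda()` in the R package vegan also provide per-axis and per-variable permutation tests.

```python
import numpy as np


def rda(Y, X):
    """Redundancy analysis of responses Y on explanatory variables X.

    Y: individuals x genes matrix of copy numbers.
    X: individuals x climate-variables matrix.
    Returns (constrained proportion of variance, individual scores, eigenvalues).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yc = Y - Y.mean(axis=0)                               # centre each gene
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # standardise climate
    # Multivariate least squares: fit every gene's CN on all climate variables
    B, *_ = np.linalg.lstsq(Xs, Yc, rcond=None)
    Y_fit = Xs @ B
    constrained = (Y_fit ** 2).sum() / (Yc ** 2).sum()    # unadjusted R^2
    # RDA axes are the principal axes of the fitted (constrained) values
    U, s, _ = np.linalg.svd(Y_fit, full_matrices=False)
    k = min(Xs.shape[1], Y.shape[0] - 1)                  # number of RDA axes
    scores = U[:, :k] * s[:k]
    eigenvalues = s[:k] ** 2 / (Y.shape[0] - 1)
    return constrained, scores, eigenvalues


def permutation_test(Y, X, n_perm=999, seed=0):
    """Permutation P value for the constrained proportion (rows of X shuffled)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = rda(Y, X)[0]
    hits = sum(
        rda(Y, X[rng.permutation(X.shape[0])])[0] >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: 35 individuals, 3 climate variables, 200 genes
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(35, 3))
Y_toy = X_toy @ rng.normal(size=(3, 200)) + rng.normal(scale=0.5, size=(35, 200))
prop, p_value = permutation_test(Y_toy, X_toy, n_perm=199)
print(f"constrained proportion = {prop:.3f}, permutation P = {p_value:.3f}")
```

Because the toy CN values are generated from the climate matrix plus noise, the constrained proportion is high and the permutation P value is small; with real data, the constrained proportion plays the role of the 52.11% reported above.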

Fig. 5. — Genome–environment association (GEA) analysis between the gene CN and the climatic data of the different habitats. a) Redundancy analysis (RDA) ordination biplot illustrating the association between the climatic variables (supplementary Dataset S7, Supplementary Material online), individuals, and 3,539 differentiated gene CN. In the RDA, arrows indicate the direction of the climatic variables associated with the different populations, and the projection of arrows onto an ordination axis shows the correlation with that axis. The gray points denote the CN-differentiated genes. C: central; SH: southern highland; SC: southern coast. b) The correlations between 6 overrepresented climate variables and populations, respectively. The bubble chart shows correlations between 6 climate variables. The asterisks (*) indicate the levels of significance of the climate variables for the RDA model (permutation test; * P < 0.05, ** P < 0.01, *** P < 0.0001). The grey boxes to the right of the climatic variables show the populations significantly associated with that climatic variable (Mantel test, P < 0.05). c) The proportion of variance explained by 6 overrepresented climate variables in the RDA model. d) 34 CN-differentiated genes associated with both temperature annual range (Bio7) and solar radiation (ann_Rmean) in 7 populations. The pie charts denote the proportions of CN-differentiated genes with deletion (DEL), duplication (DUP) or absence of CNV (see also supplementary table S9, Supplementary Material online).

We subsequently searched for candidate genes that may be associated with the 6 overrepresented climate variables using latent factor mixed models (LFMM) (supplementary fig. S15a, Supplementary Material online) (Frichot et al. 2013; Caye et al. 2019). Here, we performed the association analysis between the climatic variables and the 3,539 highly CN-differentiated genes (not all genes). We identified 312 CN-differentiated genes significantly associated with the 6 climatic variables (z-test; calibrated P < 0.01; supplementary fig. S15b, Supplementary Material online; supplementary Dataset S8, Supplementary Material online). The PCA based on the CN of these 312 candidate genes displayed population clustering consistent with that found in the RDA model (supplementary fig. S16a, Supplementary Material online; Fig. 5a), supporting that the 6 climate variables reflect gene CN changes across the species distribution. Among these 312 candidates, 217 genes were significantly associated with 3 potential evapotranspiration (PET) climate variables (annualPET, PETDriestQuarter, and PETColdestQuarter), of which 98 genes were shared by at least 2 PET variables (supplementary fig. S15b, Supplementary Material online). PET is the primary variable reflecting the drought status of the habitat. These PET-associated CN-differentiated genes were found across all populations (supplementary fig. S16b and c, Supplementary Material online) and were mainly GO-enriched for metabolic and root development processes. This is consistent with previous genomic and transcriptomic analyses showing that metabolic pathways and root development are important responses to drought stress (Wei et al. 2023, 2024). This result supports that drought is likely the main environmental pressure driving CN evolution across the distribution of S. chilense.
Furthermore, 69% (34 out of 49) of the genes associated with Bio7 were also correlated with ann_Rmean (supplementary fig. S15b, Supplementary Material online), which is likely a consequence of the correlation between Bio7 and ann_Rmean (Fig. 5b; Pearson's correlation = 0.50). These genes were mainly duplicated in the southern highland populations and lost in the southern coast populations (Fig. 5d; supplementary table S9, Supplementary Material online). This result likely reflects that cold and high solar radiation are challenging conditions for the southern highland populations (supplementary Dataset S7, Supplementary Material online). In high-altitude populations, multiple duplicated genes associated with solar radiation (ann_Rmean) were enriched for response to UV light, such as likely homologs of the UV-B receptor ARI12 and the DNA repair gene REV1 (supplementary Dataset S5, Supplementary Material online) (Tossi et al. 2019; Thompson and Cortez 2020). In addition, we found a few CN-differentiated genes statistically associated with solar radiation variables, such as putative homologs of CPD (supplementary Dataset S5, Supplementary Material online), which are related to pigment (anthocyanin) accumulation.
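The “calibrated P” values from the LFMM tests are typically obtained by rescaling the z-scores with a genomic inflation factor. The stdlib-only Python sketch below illustrates that genomic-control step, assuming λ = median(z²)/median(χ²₁) and one z-score per gene and climate variable. Whether the study used exactly this calibration is an assumption, the function names are hypothetical, and the LFMM fit that produces the z-scores is not shown.

```python
import math
import statistics

# Median of the chi-squared distribution with 1 degree of freedom
CHI2_1_MEDIAN = 0.4549364231195724


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(1)."""
    return math.erfc(math.sqrt(x / 2.0))


def calibrate_pvalues(z_scores):
    """Genomic-control calibration of association z-scores.

    lambda = median(z^2) / median(chi2_1); each z^2 is divided by lambda
    before being converted into a P value, which deflates P values when the
    test statistics are globally inflated (lambda > 1).
    """
    chi2 = [z * z for z in z_scores]
    lam = statistics.median(chi2) / CHI2_1_MEDIAN
    return lam, [chi2_1_sf(c / lam) for c in chi2]


# Hypothetical z-scores for five genes tested against one climate variable
lam, pvals = calibrate_pvalues([0.3, -1.1, 0.8, 4.2, -0.5])
candidates = [i for i, p in enumerate(pvals) if p < 0.01]
print(candidates)  # → [3]
```

Calibrating by the median keeps the bulk of (presumably neutral) genes close to the null expectation, so that only genes with unusually strong associations pass the P < 0.01 threshold.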

Finally, we observed that the number of duplicated genes associated with the 6 climatic variables was much higher in the southern coast and especially the southern highland populations than in the central populations (supplementary fig. S16b, Supplementary Material online). The GO enrichment analysis showed that these duplicated genes are involved in responses to light, drought, cold, and UV, and in photosynthesis; examples include the likely homologs of FT, FD, and ABI4, and genes involved in the formation of photosystem subunits (supplementary Dataset S5, Supplementary Material online). The number of candidate genes found as deletions was highly consistent with the RDA results (supplementary fig. S16c, Supplementary Material online; Fig. 5a). For example, a large number of deletions in genes significantly associated with Bio8 occurred in SC_LA2932 (27 genes; supplementary fig. S16c, Supplementary Material online), far more than in other populations, and the RDA likewise showed that CNVs in SC_LA2932 were predominantly associated with Bio8 (Fig. 5a). The GO enrichment analysis showed that most lost genes are related to plant growth and development. Together, the GEA analyses support the adaptive relevance of gene CN expansion and contraction: (i) the CN-differentiated genes in the central group appeared mainly as contracted genes (deletions), whereas they appeared as expanded genes (duplications) in the southern highland populations; (ii) the gene CN changes were linked to the climatic variables and associated with the colonization of novel habitats at the southern edge of the species distribution; and (iii) the expansion/contraction of gene CN in different populations and the RDA model both matched the population structure.

Discussion

In this study, we explored the role of genomic CNV in the ecological adaptation of S. chilense. A set of key genomic CNVs in S. chilense populations was found to be highly correlated with the species' colonization process and with environmental variables; these CNVs are thus likely implicated in the adaptive differentiation between populations, probably because of their major impact on gene expression (Fuentes et al. 2019; Rinker et al. 2019; Alonge et al. 2020; Hämälä et al. 2021; Li et al. 2023). This is consistent with the ubiquitous roles of CNV in adaptive processes in ecology and evolution (Żmieńko et al. 2014; Castagnone-Sereno et al. 2019; Lauer and Gresham 2019; Mérot et al. 2020). To better understand the genetic basis of the fitness effects of CNV in natural populations, we analysed whole-genome (short-read) data for 35 S. chilense individuals from 7 populations, which allowed us to identify genome-wide CNVs. Our CNV calling pipeline resolved hundreds of thousands of CNVs in S. chilense. The number of CNVs in each population of S. chilense was similar to the numbers found in a previous tomato-clade pan-genome CNV study that included a single sample of S. chilense (Li et al. 2023). CNVs were abundant across all chromosomes and frequently resided within, or in close proximity to, genes in the S. chilense genome (Fig. 1). Widespread CNVs in the S. chilense genome performed similarly to SNPs for inferring population structure and differentiation between populations (Fig. 2; supplementary fig. S3, Supplementary Material online). Based on the demographic model we developed previously (Wei et al. 2023), used as a neutral null model, and on the dynamic changes of gene CN during the 2 southward colonization events, our results support that neutral processes likely shape most CNVs (Silva-Arias et al. 2025).
However, a genome-wide perspective allowed us to identify CNVs likely related to the adaptive divergence in recently colonized regions in response to abiotic stress.

We conservatively identified patterns of gene CN differentiation that likely represent footprints of adaptive divergence. CN differences of these genes across populations reflect both neutral processes and divergent selection between populations, demonstrating that CNV must be considered to fully understand how selection shapes genomic structural diversity and local adaptation. Overall, the evolutionary processes generating CNV diversity and divergence were dominated by the demographic history of S. chilense, namely 2 independent southward colonization events. Gene CN appears expanded in the southernmost SC_LA4107 and the southern highland populations, which underwent recent colonization events and exhibit lower population sizes (Stam et al. 2019b; Wei et al. 2023), while gene CN showed a trend of contraction in the central and SC_LA2932 populations (close to the species' center of origin). We therefore propose that CN expansion and contraction likely reflect and underpin selective events during the 2 southward colonization events. Adaptive evolution through gene loss has also been documented in other plant species; for example, adaptive gene loss has been associated with changes in pollinators in Petunia axillaris (Hoballah et al. 2007), Ipomoea quamoclit (Zufall and Rausher 2004), and A. thaliana (Shimizu et al. 2008). Our study suggests that adaptive gene loss may also occur in genes involved in plant growth and development in central populations and in photosynthesis genes in central and SC_LA2932 populations (Fig. 4d). These findings support an important role of gene loss in adaptive evolution. Changes in CN at photosynthetic genes underpin population differentiation between SC_LA2932 (gene loss) and SC_LA4107 (gene gain), 2 populations from different habitats on the southern coast. CN-differentiated genes were also enriched for responses to multiple abiotic stresses, such as red/far-red light, cold, UV, or drought.
These responses can directly affect plant reproduction and growth and regulate flowering (supplementary fig. S9, Supplementary Material online). These findings agree with our SNP-based results showing that the reproductive cycle, namely the regulation of flowering time, may play a key role in adaptation to abiotic stress in S. chilense (Wei et al. 2023).

The regulation of flowering time in response to light (photoperiod) and cold (vernalization) appears to be a key adaptive pathway for S. chilense populations colonizing southern habitats, as suggested by the analysis of genome-wide SNPs (Wei et al. 2023). Here, we obtained further CNV-based candidate genes enriched for flowering regulatory pathways and for responses to changes in photoperiod and cold. These genes (putative FT, FD, and FLD homologs) are duplicated in the southern highland populations (supplementary fig. S10, Supplementary Material online). Solar radiation is also a challenging condition for plants at high altitudes. Many CN-differentiated genes were enriched for functions in response to UV light (Fig. 3b; supplementary Dataset S4, Supplementary Material online), including homologs of genes involved in anthocyanin accumulation in response to UV light. In plants, anthocyanin accumulation can improve tolerance to drought, cold, salt, and biotic stresses (Kaur et al. 2023); in particular, anthocyanins act as potent antioxidants that help eliminate reactive oxygen species (ROS) and protect DNA from damage under UV radiation (Catola et al. 2017; Fang et al. 2019). This may indicate that gene CNVs in the anthocyanin accumulation pathway are important for adaptation in high-altitude populations of S. chilense, consistent with a previous ecological niche study suggesting that S. chilense populations are expanding into high-altitude habitats (Wei et al. 2023). More generally, the large number of gene losses in response to environmental stresses may indicate that genome reduction through gene loss is a powerful evolutionary driver of adaptation (Albalat and Cañestro 2016; Helsen et al. 2020; Monroe et al. 2021). Further functional validation will help to understand the molecular mechanisms through which copy number variation drives adaptive evolution in natural populations.

To provide further evidence for selection (versus the footprints of past demography), our RDA analysis linked the dynamics of gene CN across populations to 6 climatic variables (Fig. 5a and b), 5 of which were consistent with previous RDA results based on SNPs (Wei et al. 2023). Similar CNV–environment interactions have been observed in A. thaliana (DeBolt 2010; Zmienko et al. 2020), S. lycopersicum (Alonge et al. 2020), Theobroma cacao (Hämälä et al. 2021), and Oryza sativa (Fuentes et al. 2019; Qin et al. 2021). Our results also highlight that CNV likely plays an essential role in the response to the environment and in the southward colonization of S. chilense. CNVs, especially duplications in southern highland populations exposed to typical high-altitude stresses, were enriched in genes with functions related to cold, change of photoperiod, and solar radiation. The CN changes of differentiated genes in southern coast populations were mainly correlated with drought stress and involved functions such as root development, cell homeostasis, or cell wall maintenance. Interestingly, gene CN differentiation related to photosynthesis provides evidence for the genetic underpinning of the adaptive differentiation between SC_LA2932 and SC_LA4107, which represent 2 different coastal habitats (Figs. 1a and 4c). These differentiated genes showed opposite CN evolutionary trends between the 2 southern coast populations. Indeed, the 2 habitats differ: SC_LA2932 grows in dry ravines (quebrada) in Lomas formations, whereas SC_LA4107 grows in extremely fine alluvial soil (with even some running water). Moreover, these chloroplast genes were detected in the nuclear genome, consistent with widespread transfers of organellar genes to the nuclear genome in tomatoes (Pesaresi et al. 2014; Lichtenstein et al. 2016; Kim and Lee 2018).
Because the chloroplast genome is much more conserved than the nuclear genome in plants, the transfer of chloroplast genes to the nuclear genome, followed by CNV, likely increases genetic diversity at nuclear copies of chloroplast genes and thereby influences the ecological adaptability of S. chilense (Daniell et al. 2016). These putative adaptive signatures related to photosynthesis were not found in previous genome scans based on SNPs (Wei et al. 2023). The 3 central populations showed mainly a trend toward gene loss and a low correlation with climatic variables (Fig. 5a and b). This is consistent with the limited statistical power of GEA analyses based on current climatic data to detect old adaptive selection signals, whether based on SNPs or CNVs, because of multiple confounding historical processes such as genetic drift, migration, and recombination (De Mita et al. 2013; Manel et al. 2016). The 2 central populations found at high altitudes (C_LA2931 and C_LA3111) exhibit few adaptive duplication signatures but show some possible responses to cold and solar radiation, similar to those observed in the southern highland populations (Stam et al. 2019b; Wei et al. 2023).

Finally, we stress that our study likely underestimates the amount and importance of CNV in S. chilense, as we do not have long-read data for all populations and our detection of outlier CNVs using global V_ST is likely conservative. First, tests with simulations based on short-read data showed that our pipeline, which combines 4 tools to recover CNVs, was likely conservative, meaning that we probably missed some CNVs. Second, there may be a bias in finding footprints of selection when using seeds from accessions maintained and propagated at the Tomato Genetics Resource Center (TGRC; UC Davis, USA), as discussed previously (Wei et al. 2023). Third, the detection of CN-differentiated genes by the global V_ST statistic might be inflated because it is hard to correct for multiple testing (especially without a neutral demographic model of CNV evolution). We refrained from using pairwise V_ST values to search for CNVs under selection because the sample size per population remained rather low (5 diploids); with larger sample sizes, comparing global and pairwise V_ST would pinpoint more precisely the population in which CNVs may have been selected. The availability of a new reference genome (Silva-Arias et al. 2025) and a small number of populations sequenced with long reads (Li et al. 2023) opens the path to sequencing wild populations with long reads and to a complete assessment of the importance of CNV at abiotic stress genes in S. chilense. We highlight that, contrary to common practice in SNP analyses (Wei et al. 2023; recommendations in Johri et al. 2022), there is no standard procedure for detecting CNVs under selection, and we used a permutation method based on V_ST (see also Rinker et al. 2019). Nonetheless, despite our randomization procedure, the V_ST measure may be biased by low-frequency CNVs (as is known for F_ST), and we therefore used RDA to provide orthogonal evidence.
New simulation and inference methods are therefore needed to study, infer, and disentangle the neutral and selective processes driving gene duplication and deletion (Otto et al. 2022; Otto and Wiehe 2023). Such methods would allow the neutral rates of gene duplication/deletion during the species' southward expansion and local adaptation to be quantified, and thereby robust statistical tests for selection on CNVs to be developed. Fourth, instead of using the CN dataset of all genes to perform association analyses with climate variables, we used genes with high CN differentiation. This is because the RDA using the CN dataset of all genes did not identify any associated climate variables, indicating that the large number of genes with weak CN changes greatly reduced the resolution of the analysis (while low-frequency CNVs are not picked up by the RDA). This suggests that the RDA complements our V_ST analysis and supports the footprints of selection (versus those of neutral processes) at our highly CN-differentiated genes.
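Because the V_ST permutation approach underlies several of the caveats above, a minimal sketch may help. It assumes the common form V_ST = (V_T − V_S)/V_T, where V_T is the CN variance across all individuals and V_S is the within-population variance weighted by sample size, and it builds a null distribution by shuffling individuals among populations while keeping sample sizes fixed. The function names and the shuffling scheme are illustrative assumptions and may differ from the randomization procedure used in the study.

```python
import random
import statistics


def v_st(cn_by_pop):
    """V_ST = (V_T - V_S) / V_T for the copy numbers of one gene.

    cn_by_pop maps a population name to the per-individual CN values.
    V_T: variance across all individuals; V_S: within-population variances
    averaged with weights proportional to population sample size.
    """
    all_cn = [cn for values in cn_by_pop.values() for cn in values]
    v_t = statistics.pvariance(all_cn)
    if v_t == 0:
        return 0.0  # invariant gene: no CN differentiation
    v_s = sum(len(v) * statistics.pvariance(v) for v in cn_by_pop.values()) / len(all_cn)
    return (v_t - v_s) / v_t


def permutation_pvalue(cn_by_pop, n_perm=999, seed=1):
    """Compare observed V_ST with V_ST after shuffling individuals among populations."""
    rng = random.Random(seed)
    observed = v_st(cn_by_pop)
    sizes = {pop: len(values) for pop, values in cn_by_pop.items()}
    pooled = [cn for values in cn_by_pop.values() for cn in values]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for pop, n in sizes.items():          # keep original sample sizes
            shuffled[pop] = pooled[start:start + n]
            start += n
        if v_st(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical gene: duplicated in one population, 2 copies in another
obs, p = permutation_pvalue({"SH": [3, 3, 4, 3, 3], "C": [2, 2, 2, 2, 2]})
print(f"V_ST = {obs:.3f}, permutation P = {p:.3f}")
```

Using population (rather than sample) variances keeps V_ST between 0 and 1 in this sketch.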

Despite being conservative regarding the importance of positive selection shaping the CNV diversity in S. chilense, our results reinforced the observation that CNV is an important contributor to adaptation across different ecological habitats (Żmieńko et al. 2014; Rinker et al. 2019; Hämälä et al. 2021; Monroe et al. 2021). The strong selective pressure imposed by the range expansion of S. chilense and the need to adapt to novel stressful habitats have shaped the genetic diversity at SNPs and CNVs. In agreement with previous studies, we suggest that natural selection acting on CNVs can reshape the genomic composition of populations and might form a basis for local adaptation (Iskow et al. 2012; Żmieńko et al. 2014; Rinker et al. 2019; Hämälä et al. 2021).

Materials and Methods

Sample Collection and Sequence Read Processing

The 35 S. chilense plants were grown under standard glasshouse conditions from seeds obtained from the Tomato Genetics Resource Center (TGRC, University of California, Davis, CA, USA). We retrieved whole-genome short-read sequencing data from these 35 plants, representing 7 populations of S. chilense (accessions: C_LA1963, C_LA3111, C_LA2931, SH_LA4330, SH_LA4117A, SC_LA2932, and SC_LA4107; 5 diploid plants per population) from the 3 main geographic groups and environments (Fig. 1a). The data are available from the European Nucleotide Archive (ENA; BioProject accession no. PRJEB47577). We used the same read processing pipeline as in our previous study (Wei et al. 2023), including quality trimming, mapping, and SNP calling based on the reference genome of S. chilense (Silva-Arias et al. 2025). Sequencing and read mapping results are documented in supplementary Dataset S1, Supplementary Material online.

Identification and Genotyping of CNVs

To obtain high-confidence CNVs, including deletions and duplications, we chose 4 SV detection tools based on the evaluation of SV detection tools by Kosugi et al. (2019). That study found that combining SV detection tools gives higher precision and that LUMPY (Layer et al. 2014), Manta (Chen et al. 2016), Wham (Kronenberg et al. 2015), and DELLY (Rausch et al. 2012) showed the best overall performance. These tools implement different calling algorithms that together draw on information from read pairs, split reads, read depth, and de novo assembly.

For LUMPY v0.3.1, we first extracted the discordant paired-end reads with abnormal insert sizes from the mapped reads using the “view” function of Samtools v1.7 (Wysoker et al. 2009), and then extracted the split-read alignments using the “extractSplitReads_BwaMem” script. We sorted the resulting BAM files using the “sort” function of Samtools. Next, we ran LUMPY with the mapped reads, discordant paired-end reads, and split reads as inputs to detect CNVs. We used the default parameters for CNV calling with DELLY v0.7.6 and converted the output file from BCF to VCF format using bcftools v1.9 (Danecek et al. 2011, 2021). We also ran Manta v1.6 and Wham v1.8 with default parameters. We merged the CNV sets obtained with these 4 tools for each individual using SURVIVOR v1.0.7 (Jeffares et al. 2017), setting the minimum CNV length to 50 bp and the maximum CNV length to 1 Mb. Only CNVs of the same type (deletion or duplication) and on the same DNA strand (sense or antisense) detected by different tools were merged, and we retained CNVs called by at least 2 of the 4 tools.
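The consensus rule above (same type and strand, 50 bp to 1 Mb, support from at least 2 of the 4 callers) can be sketched as a simple greedy clustering. This Python example is a hypothetical simplification, not SURVIVOR's algorithm: the breakpoint tolerance of 1,000 bp is an assumed value that the text does not state, and the `Call` record stands in for parsed VCF entries.

```python
from dataclasses import dataclass

MIN_LEN, MAX_LEN = 50, 1_000_000   # CNV length limits stated in the text
MAX_BREAKPOINT_DIST = 1_000        # assumed tolerance; not given in the text


@dataclass(frozen=True)
class Call:
    """One deletion/duplication call from one tool (hypothetical record)."""
    tool: str
    chrom: str
    start: int
    end: int
    svtype: str   # "DEL" or "DUP"
    strand: str   # strand label as reported by the caller


def compatible(a: Call, b: Call) -> bool:
    """Same chromosome, type and strand, with both breakpoints close."""
    return (a.chrom == b.chrom and a.svtype == b.svtype and a.strand == b.strand
            and abs(a.start - b.start) <= MAX_BREAKPOINT_DIST
            and abs(a.end - b.end) <= MAX_BREAKPOINT_DIST)


def consensus(calls, min_tools=2):
    """Greedy clustering of one individual's calls; keep multi-tool clusters."""
    sized = sorted((c for c in calls if MIN_LEN <= c.end - c.start <= MAX_LEN),
                   key=lambda c: (c.chrom, c.svtype, c.start))
    clusters = []
    for call in sized:
        for cluster in clusters:
            if compatible(cluster[0], call):   # compare with the cluster seed
                cluster.append(call)
                break
        else:
            clusters.append([call])
    merged = []
    for cluster in clusters:
        tools = sorted({c.tool for c in cluster})
        if len(tools) >= min_tools:
            starts = sorted(c.start for c in cluster)
            ends = sorted(c.end for c in cluster)
            mid = len(cluster) // 2            # middle (upper median) breakpoint
            merged.append((cluster[0].chrom, starts[mid], ends[mid],
                           cluster[0].svtype, tools))
    return merged


calls = [
    Call("LUMPY", "chr1", 1000, 5000, "DEL", "+"),
    Call("Manta", "chr1", 1020, 4990, "DEL", "+"),
    Call("DELLY", "chr1", 1010, 5010, "DUP", "+"),   # different type: not merged
    Call("Wham", "chr2", 100, 120, "DEL", "+"),      # 20 bp: below MIN_LEN
]
print(consensus(calls))  # → [('chr1', 1020, 5000, 'DEL', ['LUMPY', 'Manta'])]
```

Counting distinct tools (rather than calls) per cluster ensures that two overlapping calls from the same caller cannot satisfy the 2-of-4 rule on their own.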

Finally, we used the merged CNV set as input for SVTyper v0.7.0 to genotype each structural variant from the reads supporting its breakpoints (Chiang et al. 2015). The script we used for CNV calling, merging, and breakpoint estimation is available from our GitLab repository: https://gitlab.lrz.de/population_genetics/s_chilense_cnv/-/blob/main/pipeline_of_CNV_calling_genotyping.

To assess the sensitivity and accuracy of our CNV-calling pipeline, we simulated short-read data containing CNVs using CNV-Sim v0.9.2 (https://github.com/NabaviLab/CNV-Sim). We simulated 1,000 duplications and 1,000 deletions ranging in length from 50 bp to 1 Mb, with 150-bp paired-end reads. We then applied the same pipeline to call CNVs from this simulated dataset (supplementary table S3, Supplementary Material online). The script implementing these CNV simulations is available from https://gitlab.lrz.de/population_genetics/s_chilense_cnv/-/blob/main/CNVs_simulation.
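Sensitivity and precision against the simulated truth set could be computed as in the sketch below. The matching rule (same chromosome and CNV type, at least 50% reciprocal overlap) is an assumption chosen for illustration; the criterion behind supplementary table S3 is not stated in this section.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def matches(x, y, min_frac=0.5):
    """x, y = (chrom, start, end, svtype); same chrom/type and reciprocal overlap >= min_frac."""
    if x[0] != y[0] or x[3] != y[3]:
        return False
    ov = overlap(x[1], x[2], y[1], y[2])
    return ov >= min_frac * (x[2] - x[1]) and ov >= min_frac * (y[2] - y[1])


def evaluate(truth, calls, min_frac=0.5):
    """Sensitivity = share of simulated CNVs recovered; precision = share of calls that are true."""
    if not truth or not calls:
        raise ValueError("truth and calls must be non-empty")
    found = sum(any(matches(t, c, min_frac) for c in calls) for t in truth)
    correct = sum(any(matches(c, t, min_frac) for t in truth) for c in calls)
    return found / len(truth), correct / len(calls)


truth = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 8000, "DUP")]
calls = [("chr1", 1100, 2050, "DEL"), ("chr2", 10, 500, "DEL")]
print(evaluate(truth, calls))  # -> (0.5, 0.5)
```

The pairwise loop is quadratic, which is fine for a few thousand simulated events; an interval tree would be the usual choice for larger sets.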

Population Structure Analysis

We inferred population structure from both the whole-genome SNPs and the genotyped CNVs. We performed principal component analysis (PCA) with GCTA v1.91.4 (Yang et al. 2011) to summarize the clustering pattern among the sampled genomes. To generate the GCTA input, we first converted the VCF files to PLINK format using VCFtools v1.17 (Danecek et al. 2011) and then to binary PLINK format using PLINK v1.9 (Purcell et al. 2007) with the parameters “--noweb --make-bed”. Next, we estimated individual ancestry with ADMIXTURE v1.3.0 (Alexander et al. 2009), using the same input as for the PCA and assessing 6 clustering scenarios (K = 2 to K = 7).
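The core of a GRM-based PCA on a 0/1/2 genotype matrix (SNPs or biallelic CNV genotypes) can be sketched as follows: standardize each variant by its allele frequency, form a genetic relationship matrix, and take its leading eigenvectors as principal components. This is a NumPy illustration, not GCTA's implementation; in particular, GCTA estimates the GRM diagonal with a different formula.

```python
import numpy as np


def grm_pca(genotypes, n_pcs=2):
    """genotypes: (n_individuals, n_variants) array of 0/1/2 alternative-allele counts.
    Returns the top eigenvalues and eigenvectors (PCs) of the standardized-genotype GRM."""
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0  # alternative-allele frequency per variant
    keep = (p > 0) & (p < 1)  # drop monomorphic variants (zero variance)
    G, p = G[:, keep], p[keep]
    Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardize each variant
    A = Z @ Z.T / Z.shape[1]  # genetic relationship matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(A)  # symmetric eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1]  # sort descending
    return eigvals[order][:n_pcs], eigvecs[:, order][:, :n_pcs]
```

Each column of the returned eigenvector matrix gives one PC score per individual; plotting the first two reproduces the kind of clustering summary described above.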

Quantification of Gene Copy Number

We employed 2 strategies to quantify gene copy number (CN). First, we used the read-depth-based method implemented in Control-FREEC v11.6 (Boeva et al. 2012) to estimate CN in 1-kb sliding windows across the entire genome, with the following parameters: ploidy = 2, breakPointThreshold = 0.8, degree = 3, minExpectedGC = 0.3, maxExpectedGC = 0.55, and telocentromeric = 0. We then derived gene CN by intersecting the Control-FREEC output with the gene coordinates. Some genes had more than one CN estimate, most likely because breakpoints cannot be placed more precisely than our window size allows; for these genes, we used the average of the CN values.
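Averaging window-level CN over a gene can be sketched as below. Coordinates are assumed to be half-open, and the sketch takes an unweighted mean over all overlapping 1-kb windows; whether the averaging was over windows or over distinct CN values is not specified, and the two can differ (here 10/3 versus 3). The function name is hypothetical.

```python
def gene_cn(gene, windows):
    """gene: (chrom, start, end); windows: iterable of (chrom, start, end, cn).
    Returns the mean CN of the windows overlapping the gene, or None if none overlap."""
    chrom, g_start, g_end = gene
    values = [
        cn
        for (w_chrom, w_start, w_end, cn) in windows
        if w_chrom == chrom and w_start < g_end and w_end > g_start  # half-open overlap
    ]
    if not values:
        return None
    return sum(values) / len(values)


windows = [("chr1", 0, 1000, 2), ("chr1", 1000, 2000, 2), ("chr1", 2000, 3000, 4), ("chr1", 3000, 4000, 4)]
print(gene_cn(("chr1", 1500, 3500), windows))  # windows with CN 2, 4, 4 overlap the gene
```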

As an alternative strategy, we extracted read depth from the BAM files using Mosdepth v0.3.2 (Pedersen and Quinlan 2018) in 1-kb sliding windows and computed the read depth of each gene from the gene coordinates. We normalized window and gene read depths by the median read depth of all windows and of all genes, respectively, to obtain CN estimates: CN = (read depth/median read depth) × 2, where the factor 2 reflects the diploidy of the species (Rinker et al. 2019).
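The normalization CN = (read depth/median read depth) × 2 amounts to the following, applied separately to the window-level and to the gene-level depths of one individual. The function name is hypothetical.

```python
from statistics import median


def depth_to_cn(depths, ploidy=2):
    """Convert read depths (all windows, or all genes, of one individual) to CN estimates."""
    m = median(depths)
    if m == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / m * ploidy for d in depths]


print(depth_to_cn([10, 20, 20, 40]))  # -> [1.0, 2.0, 2.0, 4.0]
```

Using the median rather than the mean keeps the normalizing factor robust to the few regions with very high depth (e.g., repeats or large duplications).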

Estimation of the Population Differentiation by CNVs

We calculated V_ST to estimate population differentiation. V_ST, analogous to F_ST, identifies loci whose CN differs between populations (Redon et al. 2006; Zhao and Gibbons 2018; Rinker et al. 2019). It is defined as V_ST = (V_T − V_S)/V_T, where V_T is the total CN variance and V_S is the average within-population variance weighted by sample size (5 for all populations in this study). To compare population differentiation measured by SNPs and by CNVs, we first calculated, for each pair of populations, F_ST with VCFtools and V_ST from the CN estimates, both in 1-kb sliding windows across the reference genome. Note that we calculated pairwise V_ST from both CN estimation strategies: Control-FREEC (V_ST[CN]) and read depth (V_ST[RD]).
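A minimal implementation of V_ST = (V_T − V_S)/V_T for any number of populations (2 in the pairwise case) might look like this. It assumes sample variances with an n − 1 denominator, as R's `var()` uses; with that choice V_ST can be slightly negative when populations do not differ.

```python
from statistics import variance


def vst(populations):
    """populations: list of lists of CN values, one list per population (each with >= 2 values)."""
    values = [float(x) for pop in populations for x in pop]
    v_t = variance(values)  # total variance over all individuals
    if v_t == 0:
        return float("nan")  # no CN variation at all: V_ST undefined
    # within-population variance, weighted by population sample size
    v_s = sum(len(pop) * variance([float(x) for x in pop]) for pop in populations) / len(values)
    return (v_t - v_s) / v_t


print(vst([[2, 2, 2, 2, 2], [4, 4, 4, 4, 4]]))  # fixed CN difference -> 1.0
print(vst([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))  # identical populations -> slightly negative
```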

Identification of CNV Candidate Genes Associated With Population Differentiation

We identified candidate genes with significant CN differences between populations, the so-called CN-differentiated genes, using a global V_ST per each gene based on the gene CN (Zhao and Gibbons 2018; Rinker et al. 2019). The per-gene global V_ST calculation follows:

\begin{aligned} V_{ST} = \frac{V_{T} - 5\,(V_{\mathrm{C\_LA1963}} + V_{\mathrm{C\_LA2931}} + V_{\mathrm{C\_LA3111}} + V_{\mathrm{SC\_LA2932}} + V_{\mathrm{SC\_LA4107}} + V_{\mathrm{SH\_LA4330}} + V_{\mathrm{SH\_LA4117A}}) / 35}{V_{T}} \end{aligned}

where V_T is the CN variance over all 35 individuals, V_{pop_x} is the CN variance within population pop_x, 5 is the sample size of each population, and 35 is the total sample size. An R script implementing the per-gene V_ST calculation and candidate-gene identification is available at https://gitlab.lrz.de/population_genetics/s_chilense_cnv/-/blob/main/VST.R. We used permutation tests on the CN values to identify the genes with the strongest inter-population CN differentiation while controlling for sampling bias. For each gene, we randomly permuted the CN values among the 35 individuals and recalculated the global V_ST, repeating this 1,000 times to obtain a null distribution of global V_ST for that gene. We then selected as candidates the genes whose observed global V_ST exceeded the 95th and 99th percentiles of this permuted distribution; these genes show strong CN homogeneity within populations and high CN differentiation between populations. Finally, genes were considered significant when their observed V_ST exceeded the maximum 95% (differentiated) or 99% (strongly differentiated) confidence interval cutoff under both gene CN estimation methods, Control-FREEC (V_ST[CN]) and read depth (V_ST[RD]) (see supplementary table S5, Supplementary Material online, for the V_ST cutoffs).
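The per-gene global V_ST and its permutation cutoffs can be sketched in NumPy as below. The sketch permutes one gene's CN values across individuals while keeping the population labels fixed, which is one way to implement the permutation described above; variances use an n − 1 denominator (an assumption). With 5 individuals per population, the size-weighted within-population term equals the 5 × ΣV_pop/35 term of the equation. The function names and the `seed` argument are hypothetical.

```python
import numpy as np


def global_vst(cn, labels):
    """Global V_ST for one gene: cn and labels are length-N arrays (one entry per individual)."""
    cn = np.asarray(cn, dtype=float)
    labels = np.asarray(labels)
    v_t = cn.var(ddof=1)
    if v_t == 0:
        return np.nan
    v_s = sum(
        (labels == pop).sum() * cn[labels == pop].var(ddof=1) for pop in np.unique(labels)
    ) / cn.size
    return (v_t - v_s) / v_t


def permutation_cutoffs(cn, labels, n_perm=1000, percentiles=(95, 99), seed=0):
    """Null distribution of global V_ST obtained by shuffling CN values across individuals."""
    rng = np.random.default_rng(seed)
    cn = np.asarray(cn, dtype=float)
    null = np.array([global_vst(rng.permutation(cn), labels) for _ in range(n_perm)])
    return {q: float(np.nanpercentile(null, q)) for q in percentiles}


pops = ["C_LA1963", "C_LA2931", "C_LA3111", "SC_LA2932", "SC_LA4107", "SH_LA4330", "SH_LA4117A"]
labels = np.repeat(pops, 5)
cn = np.repeat([2, 2, 2, 4, 4, 6, 6], 5).astype(float)  # toy gene, homogeneous within populations
observed = global_vst(cn, labels)
cutoffs = permutation_cutoffs(cn, labels)
print(observed, cutoffs, observed > cutoffs[99])
```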

Gene Ontology (GO) Analysis

We first aligned the CN-differentiated genes against the A. thaliana TAIR10 dataset using BLAST (Camacho et al. 2009) with an e-value cutoff of 10⁻⁶, and selected the best hit (lowest e-value) as the target homolog for enrichment analysis. We then performed a GO enrichment analysis with the R package clusterProfiler (Yu et al. 2012), using the A. thaliana annotation database as the background. Enriched GO terms were identified after controlling the false discovery rate at 0.05 with the Benjamini–Hochberg method (Benjamini and Hochberg 1995).
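GO over-representation tests of this kind are one-sided hypergeometric tests, the model underlying clusterProfiler's enrichment functions. The pure-Python sketch below computes the P-value for a single GO term before any Benjamini–Hochberg correction; the function and argument names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K of them annotated to the
    GO term, n genes in the query list, k of which carry the term."""
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible
    # configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 5 annotated, all 5 query genes annotated.
print(hypergeom_enrichment_p(5, 5, 5, 10))  # 1/252
```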

Expansion and Contraction of Gene Copy Number

To gain insight into changes in gene CN across populations in a way that accounts for phylogenetic history, we analyzed gene CN expansion and contraction for the set of 3,539 differentiated genes using CAFE v4.2.1 (Han et al. 2013). This program models the evolution of gene CN with a stochastic birth and death model. Given a phylogenetic tree and the gene CN at each tip of the tree (here, each population), CAFE estimates a global birth and death rate (λ) for gene CN. It then infers the most likely gene CN at all internal nodes of the tree and detects genes with accelerated rates of CN gain or loss. Finally, it computes a P-value for each gene and identifies rapidly evolving genes as those with significant P-values.
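To make the birth and death model concrete, the sketch below gives the transition probability of a linear birth–death process with equal birth and death rate λ, the model family on which CAFE is built: the probability that a gene with s copies at the start of a branch of length t has c copies at its end. It illustrates the model only and is not CAFE's code; the branch length in the example is arbitrary.

```python
from math import comb


def bd_transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(c copies at the end of a branch | s copies at its start) under a linear
    birth-death process with birth rate = death rate = lam and branch length t."""
    if s == 0:
        return 1.0 if c == 0 else 0.0  # a lost gene cannot reappear in this model
    alpha = lam * t / (1.0 + lam * t)
    return sum(
        comb(s, j) * comb(s + c - j - 1, s - 1)
        * alpha ** (s + c - 2 * j) * (1.0 - 2.0 * alpha) ** j
        for j in range(min(s, c) + 1)
    )


# Lambda as reported in the text; the branch length of 50 is hypothetical.
print(bd_transition_prob(2, 3, 0.00207, 50.0))
```

For s = 1 and c = 0 the expression reduces to λt/(1 + λt), the familiar extinction probability of a single lineage, and the probabilities over all c sum to 1.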

We first calculated the mean CN of each of the 3,539 CN-differentiated genes in each population. We then constructed a population-level phylogenetic tree from SNPs with TreeMix v1.13 (Pickrell and Pritchard 2012) and converted it into an ultrametric tree (Fig. 4a) with the “force.ultrametric” function of the phytools R package (Revell 2012). Finally, we analyzed gene CN expansion and contraction in the different groups. We first ran CAFE on genes with CN less than 100 to estimate λ (λ = 0.00207 in this study), because genes with large CN can lead to non-informative parameter estimates. We then ran CAFE on genes with CN larger than 100, fixing λ to the value estimated from genes with CN less than 100. We used a significance threshold of P < 0.05 to identify rapidly evolving genes with an excess rate of evolution (expansion or contraction) in different groups/populations. The code we used to analyze CN expansion and contraction is available at https://gitlab.lrz.de/population_genetics/s_chilense_cnv/-/blob/main/run_cafe.sh.

Association Analysis Between Gene Copy Number and Climatic Variables

We obtained 37 climatic variables from 2 public databases, WorldClim2 (Fick and Hijmans 2017) and ENVIREM (Title and Bemmels 2018) (supplementary Dataset S7, Supplementary Material online). To evaluate the relative contribution of the abiotic environment to patterns of genetic variation, we first performed a redundancy analysis (RDA) relating the CN of the 3,539 differentiated genes to the climatic variables. We ran the RDA with the rda function of the R package vegan (Forester et al. 2018), modeling CN as a function of the climatic predictors to obtain constrained axes and representative predictors. We assessed multicollinearity among the predictors using the variance inflation factor (VIF) and excluded all climatic variables with a VIF of 10 or above. We then tested the significance of the RDA ordination axes using the anova.cca function (P < 0.001). The R script for the RDA, including all steps and parameters, is available at https://gitlab.lrz.de/population_genetics/s_chilense_cnv/-/blob/main/RDA.R.
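The VIF filter and the core of an RDA can be sketched in NumPy as follows. VIF is computed as the diagonal of the inverse correlation matrix of the predictors (equivalent to 1/(1 − R²) from regressing each predictor on the others), and the RDA as a multivariate least-squares regression of centred CN on centred predictors followed by an SVD of the fitted values. This is a simplified illustration, not vegan's `rda` or `vif.cca`. It applies the VIF ≥ 10 cut in one pass, as worded above, whereas many workflows drop the highest-VIF variable iteratively; the matrix inverse also requires fewer predictors than independent sampling sites.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (samples x predictors)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


def drop_high_vif(X, names, threshold=10.0):
    """One-pass filter: drop every predictor whose VIF is >= threshold."""
    keep = vif(X) < threshold
    return np.asarray(X)[:, keep], [n for n, k in zip(names, keep) if k]


def rda(Y, X):
    """Minimal RDA of responses Y (samples x genes) on predictors X (samples x variables).
    Returns the constrained fraction of total variance and the constrained axis variances."""
    Yc = np.asarray(Y, dtype=float)
    Yc = Yc - Yc.mean(axis=0)
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # multivariate regression of CN on climate
    Y_fit = Xc @ B
    sv = np.linalg.svd(Y_fit, compute_uv=False)  # PCA of the fitted values
    axis_var = sv ** 2 / (Yc.shape[0] - 1)
    constrained_frac = (Y_fit ** 2).sum() / (Yc ** 2).sum()
    return constrained_frac, axis_var
```

With a near-linear combination among predictors (for example c ≈ a + b), all members of the collinear group receive large VIFs and the one-pass filter removes them together, which is why iterative removal is often preferred.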

The RDA identified 6 climatic variables significantly associated with changes in gene CN across populations (Fig. 5a). To identify candidate genes associated with these variables, we used LFMM2 (latent factor mixed models; Caye et al. 2019) to fit a univariate model for each combination of gene and climatic variable. With the lfmm_ridge function of the R package lfmm, we estimated the latent variable score matrix assuming K = 4 latent factors (as evaluated from the population structure analysis), based on the CN of the 3,539 differentiated genes and the 6 representative climatic variables obtained from the RDA. We then performed association tests with the lfmm_test function. Finally, we adjusted the P-values with the Benjamini–Hochberg method and used a conservative significance threshold of 0.01 to obtain candidate genes associated with the climatic variables. The R script of LFMM we used is available from our GitLab repository: https://gitlab.lrz.de/population_genetics/s_chilense_cnv/-/blob/main/lfmm.R.
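The testing and correction steps can be illustrated with a simplified sketch. Given latent factor scores U (simply supplied here; in the text they come from lfmm_ridge with K = 4), it regresses each gene's CN on one climatic variable plus U, converts the coefficient to a z-score, calibrates the scores with a genomic inflation factor, and applies the Benjamini–Hochberg adjustment. This follows the general logic of LFMM2 testing but is not the lfmm package's implementation; all function names are hypothetical.

```python
from math import erfc, sqrt

import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def association_z(Y, x, U):
    """z-score of climatic variable x for each gene column of Y, adjusting for latent scores U."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    D = np.column_stack([np.ones(n), np.asarray(x, dtype=float), U])  # intercept, x, factors
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ coef
    sigma2 = (resid ** 2).sum(axis=0) / (n - D.shape[1])  # residual variance per gene
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])  # standard error of the x coefficient
    return coef[1] / se


def gif_calibrated_p(z):
    """Divide squared z-scores by the genomic inflation factor; return two-sided P-values."""
    gif = np.median(z ** 2) / CHI2_1DF_MEDIAN
    zc = z / np.sqrt(gif)
    return np.array([erfc(abs(v) / sqrt(2)) for v in zc]), gif


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity from the top
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out
```

Genes would then be called candidates when their adjusted P-value falls below the 0.01 threshold stated above.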

Acknowledgments

We thank the Tomato Genetics Resource Center (TGRC) of the University of California, Davis, for generously providing the seeds of the populations included in this study, and the Greenhouses & Phytochambers Unit of the TUM Plant Technology Center in Dürnast for plant care.

Contributor Information

Kai Wei, Xinjiang Key Laboratory of Biological Resources and Genetic Engineering, College of Life Science and Technology, Xinjiang University, Urumqi 830049, China; Population Genetics, Department of Life Science Systems, School of Life Sciences, Technical University of Munich, Freising 85354, Germany.

Remco Stam, Department of Phytopathology and Crop Protection, Institute of Phytopathology, Faculty of Agricultural and Nutritional Sciences, Christian Albrechts University, Kiel 24118, Germany.

Aurélien Tellier, Population Genetics, Department of Life Science Systems, School of Life Sciences, Technical University of Munich, Freising 85354, Germany.

Gustavo A Silva-Arias, Population Genetics, Department of Life Science Systems, School of Life Sciences, Technical University of Munich, Freising 85354, Germany; Instituto de Ciencias Naturales, Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá 111321, Colombia.

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online.

Author Contributions

K.W., G.A.S.-A., and A.T. planned and designed the study. R.S. and A.T. obtained the sequencing data. K.W. performed the data analyses. K.W. wrote the first draft of the manuscript, and R.S., G.A.S.-A., and A.T. edited and improved the manuscript. All authors approved the final manuscript.

Funding

K.W. was funded by the Chinese Scholarship Council. G.A.S.-A. was funded by the Technical University of Munich. K.W. acknowledges funding from the Natural Science Foundation of Xinjiang Uygur Autonomous Region (grant number 2024D01C216) and the “Tianchi Talents” introduction plan. A.T. acknowledges funding from the German Research Foundation (Deutsche Forschungsgemeinschaft, grant number 317616126). R.S. acknowledges funding from the German Research Foundation (Deutsche Forschungsgemeinschaft, grant number 170483403).

Data Availability

Raw sequence data are available at the European Nucleotide Archive (ENA), BioProject PRJEB47577. The copy number variation resource identified in this study and the custom scripts used for the analyses are available from our GitLab repository: https://gitlab.lrz.de/population_genetics/s_chilense_cnv.

References

Albalat R, Cañestro C. Evolution by gene loss. Nat Rev Genet. 2016:17(7):379–391. 10.1038/nrg.2016.39. [DOI] [PubMed] [Google Scholar]
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009:19(9):1655–1664. 10.1101/gr.094052.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, Suresh H, Ramakrishnan S, Maumus F, Ciren D, et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell. 2020:182(1):145–161.e23. 10.1016/j.cell.2020.05.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
Alonso-Blanco C, Andrade J, Becker C, Bemm F, Bergelson J, Borgwardt KM, Cao J, Chae E, Dezwaan TM, Ding W, et al. 1,135 Genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell. 2016:166(2):481–491. 10.1016/j.cell.2016.05.063. [DOI] [PMC free article] [PubMed] [Google Scholar]
Antinucci M, Comas D, Calafell F. Population history modulates the fitness effects of copy number variation in the Roma. Hum Genet. 2023:142(9):1327–1343. 10.1007/s00439-023-02579-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Arunyawat U, Stephan W, Städler T. Using multilocus sequence data to assess population structure, natural selection, and linkage disequilibrium in wild tomatoes. Mol Biol Evol. 2007:24(10):2310–2322. 10.1093/molbev/msm162. [DOI] [PubMed] [Google Scholar]
Bao S, Hua C, Shen L, Yu H. New insights into gibberellin signaling in regulating flowering in Arabidopsis. J Integr Plant Biol. 2020:62(1):118–131. 10.1111/jipb.12892. [DOI] [PubMed] [Google Scholar]
Beissinger TM, Wang L, Crosby K, Durvasula A, Hufford MB, Ross-Ibarra J. Recent demography drives changes in linked selection across the maize genome. Nat Plants. 2016:2(7):16084. 10.1038/nplants.2016.84. [DOI] [PubMed] [Google Scholar]
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc: B (Methodol). 1995:57(1):289–300. 10.1111/j.2517-6161.1995.tb02031.x. [DOI] [Google Scholar]
Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH. The role of recently derived FT paralogs in sunflower domestication. Curr Biol. 2010:20(7):629–635. 10.1016/j.cub.2010.01.059. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blanchard-Gros R, Bigot S, Martinez J-P, Lutts S, Guerriero G, Quinet M. Comparison of drought and heat resistance strategies among six populations of Solanum chilense and two cultivars of Solanum lycopersicum. Plants. 2021:10(8):1720. 10.3390/plants10081720. [DOI] [PMC free article] [PubMed] [Google Scholar]
Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012:28(3):423–425. 10.1093/bioinformatics/btr670. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bolger A, Scossa F, Bolger ME, Lanz C, Maumus F, Tohge T, Quesneville H, Alseekh S, Sørensen I, Lichtenstein G, et al. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nat Genet. 2014:46(9):1034–1038. 10.1038/ng.3046. [DOI] [PMC free article] [PubMed] [Google Scholar]
Böndel KB, Lainer H, Nosenko T, Mboup M, Tellier A, Stephan W. North–south colonization associated with local adaptation of the wild tomato species Solanum chilense. Mol Biol Evol. 2015:32(11):2932–2943. 10.1093/molbev/msv166. [DOI] [PubMed] [Google Scholar]
Böndel KB, Nosenko T, Stephan W. Signatures of natural selection in abiotic stress-responsive genes of Solanum chilense. R Soc Open Sci. 2018:5(1):171198–171198. 10.1098/rsos.171198. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brumlop S, Weedon O, Link W, Finckh M. Effective population size (Ne) of organically and conventionally grown composite cross winter wheat populations depending on generation. Eur J Agron. 2019:109:125922. 10.1016/j.eja.2019.125922. [DOI] [Google Scholar]
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009:10(1):421. 10.1186/1471-2105-10-421. [DOI] [PMC free article] [PubMed] [Google Scholar]
Castagnone-Sereno P, Mulet K, Danchin EG, Koutsovoulos GD, Karaulic M, Da Rocha M, Bailly-Bechet M, Pratx L, Perfus-Barbeoch L, Abad P. Gene copy number variations as signatures of adaptive evolution in the parthenogenetic, plant-parasitic nematode Meloidogyne incognita. Mol Ecol. 2019:28(10):2559–2572. 10.1111/mec.15095. [DOI] [PubMed] [Google Scholar]
Catola S, Castagna A, Santin M, Calvenzani V, Petroni K, Mazzucato A, Ranieri A. The dominant allele Aft induces a shift from flavonol to anthocyanin production in response to UV-B radiation in tomato fruit. Planta. 2017:246(2):263–275. 10.1007/s00425-017-2710-z. [DOI] [PubMed] [Google Scholar]
Caye K, Jumentier B, Lepeule J, François O. LFMM 2: fast and accurate inference of gene-environment associations in genome-wide studies. Mol Biol Evol. 2019:36(4):852–860. 10.1093/molbev/msz008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, Cox AJ, Kruglyak S, Saunders CT. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016:32(8):1220–1222. 10.1093/bioinformatics/btv710. [DOI] [PubMed] [Google Scholar]
Cheng J, Zhou Y, Lv T, Xie C, Tian C. Research progress on the autonomous flowering time pathway in Arabidopsis. Physiol Mol Biol Plants. 2017:23(3):477–485. 10.1007/s12298-017-0458-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, Marth GT, Quinlan AR, Hall IM. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods. 2015:12(10):966–968. 10.1038/nmeth.3505. [DOI] [PMC free article] [PubMed] [Google Scholar]
Coutelier M, Holtgrewe M, Jäger M, Flöttman R, Mensah MA, Spielmann M, Krawitz P, Horn D, Beule D, Mundlos S. Combining callers improves the detection of copy number variants from whole-genome sequencing. Eur J Hum Genet. 2022:30(2):178–186. 10.1038/s41431-021-00983-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011:27(15):2156–2158. 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM. Twelve years of SAMtools and BCFtools. Gigascience. 2021:10(2):giab008. 10.1093/gigascience/giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Daniell H, Lin C-S, Yu M, Chang W-J. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016:17(1):134. 10.1186/s13059-016-1004-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
DeBolt S. Copy number variation shapes genome diversity in Arabidopsis over immediate family generational scales. Genome Biol Evol. 2010:2:441–453. 10.1093/gbe/evq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
De Mita S, Thuillet A-C, Gay L, Ahmadi N, Manel S, Ronfort J, Vigouroux Y. Detecting selection along environmental gradients: analysis of eight methods and their effectiveness for outbreeding and selfing populations. Mol Ecol. 2013:22(5):1383–1399. 10.1111/mec.12182. [DOI] [PubMed] [Google Scholar]
Díaz A, Zikhali M, Turner AS, Isaac P, Laurie DA. Copy number variation affecting the Photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLoS One. 2012:7(3):e33234. 10.1371/journal.pone.0033234. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fang H, Dong Y, Yue X, Hu J, Jiang S, Xu H, Wang Y, Su M, Zhang J, Zhang Z, et al. The B-box zinc finger protein MdBBX20 integrates anthocyanin accumulation in response to ultraviolet radiation and low temperature. Plant Cell Environ. 2019:42(7):2090–2104. 10.1111/pce.13552. [DOI] [PubMed] [Google Scholar]
Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet. 2006:7(2):85–97. 10.1038/nrg1767. [DOI] [PubMed] [Google Scholar]
Fick SE, Hijmans RJ. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int J Climatol. 2017:37(12):4302–4315. 10.1002/joc.5086. [DOI] [Google Scholar]
Fischer I, Camus-Kulandaivelu L, Allal F, Stephan W. Adaptation to drought in two wild tomato species: the evolution of the Asr gene family. New Phytol. 2011:190(4):1032–1044. 10.1111/j.1469-8137.2011.03648.x. [DOI] [PubMed] [Google Scholar]
Forester BR, Lasky JR, Wagner HH, Urban DL. Comparing methods for detecting multilocus adaptation with multivariate genotype–environment associations. Mol Ecol. 2018:27(9):2215–2233. 10.1111/mec.14584. [DOI] [PubMed] [Google Scholar]
Frichot E, Schoville SD, Bouchard G, François O. Testing for associations between loci and environmental gradients using latent factor mixed models. Mol Biol Evol. 2013:30(7):1687–1699. 10.1093/molbev/mst063. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fuentes RR, Chebotarov D, Duitama J, Smith S, De la Hoz JF, Mohiyuddin M, Wing RA, McNally KL, Tatarinova T, Grigoriev A, et al. Structural variants in 3000 rice genomes. Genome Res. 2019:29(5):870–880. 10.1101/gr.241240.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gaudinier A, Blackman BK. Evolutionary processes from the perspective of flowering time diversity. New Phytol. 2020:225(5):1883–1898. 10.1111/nph.16205. [DOI] [PubMed] [Google Scholar]
Guo M, Yang F, Liu C, Zou J, Qi Z, Fotopoulos V, Lu G, Yu J, Zhou J. A single-nucleotide polymorphism in WRKY33 promoter is associated with the cold sensitivity in cultivated tomato. New Phytol. 2022:236(3):989–1005. 10.1111/nph.18403. [DOI] [PubMed] [Google Scholar]
Hämälä T, Wafula EK, Guiltinan MJ, Ralph PE, dePamphilis CW, Tiffin P. Genomic structural variants constrain and facilitate adaptation in natural populations of Theobroma cacao, the chocolate tree. Proc Natl Acad Sci U S A. 2021:118(35):e2102914118. 10.1073/pnas.2102914118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Han MV, Thomas GW, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol. 2013:30(8):1987–1997. 10.1093/molbev/mst100. [DOI] [PubMed] [Google Scholar]
Hecht V, Foucher F, Ferrándiz C, Macknight R, Navarro C, Morin J, Vardy ME, Ellis N, Beltrán J, Rameau C, et al. Conservation of Arabidopsis flowering genes in model legumes. Plant Physiol. 2005:137(4):1420–1434. 10.1104/pp.104.057018. [DOI] [PMC free article] [PubMed] [Google Scholar]
Helsen J, Voordeckers K, Vanderwaeren L, Santermans T, Tsontaki M, Verstrepen KJ, Jelier R. Gene loss predictably drives evolutionary adaptation. Mol Biol Evol. 2020:37(10):2989–3002. 10.1093/molbev/msaa172. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hoballah ME, Gübitz T, Stuurman J, Broger L, Barone M, Mandel T, Dell'Olivo A, Arnold M, Kuhlemeier C. Single gene-mediated shift in pollinator attraction in Petunia. Plant Cell. 2007:19(3):779–790. 10.1105/tpc.106.048694. [DOI] [PMC free article] [PubMed] [Google Scholar]
Iskow RC, Gokcumen O, Lee C. Exploring the role of copy number variants in human adaptation. Trends Genet. 2012:28(6):245–257. 10.1016/j.tig.2012.03.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jeffares DC, Jolly C, Hoti M, Speed D, Shaw L, Rallis C, Balloux F, Dessimoz C, Bähler J, Sedlazeck FJ. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun. 2017:8(1):14061. 10.1038/ncomms14061. [DOI] [PMC free article] [PubMed] [Google Scholar]
Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, et al. Recommendations for improving statistical inference in population genomics. PLoS Biol. 2022:20(5):e3001669. 10.1371/journal.pbio.3001669. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kaur S, Tiwari V, Kumari A, Chaudhary E, Sharma A, Ali U, Garg M. Protective and defensive role of anthocyanins under plant abiotic and biotic stresses: an emerging application in sustainable agriculture. J Biotechnol. 2023:361:12–29. 10.1016/j.jbiotec.2022.11.009. [DOI] [PubMed] [Google Scholar]
Kim HT, Lee JM. Organellar genome analysis reveals endosymbiotic gene transfers in tomato. PLoS One. 2018:13(9):e0202279. 10.1371/journal.pone.0202279. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol. 2019:20(1):1–18. 10.1186/s13059-019-1720-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kronenberg ZN, Osborne EJ, Cone KR, Kennedy BJ, Domyan ET, Shapiro MD, Elde NC, Yandell M. Wham: identifying structural variants of biological consequence. PLoS Comput Biol. 2015:11(12):e1004572. 10.1371/journal.pcbi.1004572. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lauer S, Gresham D. An evolving view of copy number variants. Curr Genet. 2019:65(6):1287–1295. 10.1007/s00294-019-00980-0. [DOI] [PubMed] [Google Scholar]
Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014:15(6):R84. 10.1186/gb-2014-15-6-r84. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li N, He Q, Wang J, Wang B, Zhao J, Huang S, Yang T, Tang Y, Yang S, Aisimutuola P, et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat Genet. 2023:55(5):852–860. 10.1038/s41588-023-01340-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lichtenstein G, Conte M, Asis R, Carrari F. Chloroplast and mitochondrial genomes of tomato. In: Causse M, Giovannoni J, Bouzayen M, Zouine M, editors. The tomato genome. Berlin, Heidelberg: Springer; 2016. p. 111–137. 10.1007/978-3-662-53389-5_7. [DOI] [Google Scholar]
Liu C, Chen H, Er HL, Soo HM, Kumar PP, Han JH, Liou YC, Yu H. Direct interaction of AGL24 and SOC1 integrates flowering signals in Arabidopsis. Development. 2008:135(8):1481–1491. 10.1242/dev.020255. [DOI] [PubMed] [Google Scholar]
Liu Z, Hou S, Rodrigues O, Wang P, Luo D, Munemasa S, Lei J, Liu J, Ortiz-Morea FA, Wang X, et al. Phytocytokine signalling reopens stomata in plant immunity and water loss. Nature. 2022:605(7909):332–339. 10.1038/s41586-022-04684-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Luo X, Xu J, Zheng C, Yang Y, Wang L, Zhang R, Ren X, Wei S, Aziz U, Du J, et al. Abscisic acid inhibits primary root growth by impairing ABI4-mediated cell cycle and auxin biosynthesis. Plant Physiol. 2022:191(1):265–279. 10.1093/plphys/kiac407. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lupski JR. Genomic rearrangements and sporadic disease. Nat Genet. 2007:39(S7):S43–S47. 10.1038/ng2084. [DOI] [PubMed] [Google Scholar]
Lye ZN, Purugganan MD. Copy number variation in domestication. Trends Plant Sci. 2019:24(4):352–365. 10.1016/j.tplants.2019.01.003. [DOI] [PubMed] [Google Scholar]
Lynch M, Walsh B. The origins of genome architecture. Sunderland (MA): Sinauer Associates; 2007. [Google Scholar]
Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol. 2019:20(1):246. 10.1186/s13059-019-1828-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Makita Y, Suzuki S, Fushimi K, Shimada S, Suehisa A, Hirata M, Kuriyama T, Kurihara Y, Hamasaki H, Okubo-Kurihara E. Identification of a dual orange/far-red and blue light photoreceptor from an oceanic green picoplankton. Nat Commun. 2021:12(1):3593. 10.1038/s41467-021-23741-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Manel S, Perrier C, Pratlong M, Abi-Rached L, Paganini J, Pontarotti P, Aurelle D. Genomic resources and their influence on the detection of the signal of positive selection in genome scans. Mol Ecol. 2016:25(1):170–184. 10.1111/mec.13468. [DOI] [PubMed] [Google Scholar]
Marszalek-Zenczak M, Satyr A, Wojciechowski P, Zenczak M, Sobieszczanska P, Brzezinski K, Iefimenko T, Figlerowicz M, Zmienko A. Analysis of Arabidopsis non-reference accessions reveals high diversity of metabolic gene clusters and discovers new candidate cluster members. Front Plant Sci. 2023:14:1104303. 10.3389/fpls.2023.1104303. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mérot C, Oomen RA, Tigano A, Wellenreuther M. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol Evol. 2020:35(7):561–572. 10.1016/j.tree.2020.03.002.
Monroe JG, McKay JK, Weigel D, Flood PJ. The population genomics of adaptive loss of function. Heredity (Edinb). 2021:126(3):383–395. 10.1038/s41437-021-00403-2.
Nakazato T, Warren DL, Moyle LC. Ecological and geographic modes of species divergence in wild tomatoes. Am J Bot. 2010:97(4):680–693. 10.3732/ajb.0900216.
Nosenko T, Böndel KB, Kumpfmüller G, Stephan W. Adaptation to low temperatures in the wild tomato species Solanum chilense. Mol Ecol. 2016:25(12):2853–2869. 10.1111/mec.13637.
Ofria C, Adami C, Collier TC. Selective pressures on genomes in molecular evolution. J Theor Biol. 2003:222(4):477–483. 10.1016/S0022-5193(03)00062-6.
Otto M, Wiehe T. The structured coalescent in the context of gene copy number variation. Theor Popul Biol. 2023:154:67–78. 10.1016/j.tpb.2023.08.001.
Otto M, Zheng Y, Wiehe T. Recombination, selection, and the evolution of tandem gene arrays. Genetics. 2022:221(3):iyac052. 10.1093/genetics/iyac052.
Panchy N, Lehti-Shiu M, Shiu S-H. Evolution of gene duplication in plants. Plant Physiol. 2016:171(4):2294–2316. 10.1104/pp.16.00523.
Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018:34(5):867–868. 10.1093/bioinformatics/btx699.
Pérez-Ruiz RV, García-Ponce B, Marsch-Martínez N, Ugartechea-Chirino Y, Villajuana-Bonequi M, de Folter S, Azpeitia E, Dávila-Velderrain J, Cruz-Sánchez D, Garay-Arroyo A, et al. XAANTAL2 (AGL14) is an important component of the complex gene regulatory network that underlies Arabidopsis shoot apical meristem transitions. Mol Plant. 2015:8(5):796–813. 10.1016/j.molp.2015.01.017.
Pesaresi P, Mizzotti C, Colombo M, Masiero S. Genetic regulation and structural changes during tomato fruit development and ripening. Front Plant Sci. 2014:5:124. 10.3389/fpls.2014.00124.
Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012:8(11):e1002967. 10.1371/journal.pgen.1002967.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, De Bakker PIW, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007:81(3):559–575. 10.1086/519795.
Putterill J, Varkonyi-Gasic E. FT and florigen long-distance flowering control in plants. Curr Opin Plant Biol. 2016:33:77–82. 10.1016/j.pbi.2016.06.008.
Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, He Q, Ou S, Zhang H, Li X, et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell. 2021:184(13):3542–3558.e16. 10.1016/j.cell.2021.04.046.
Raduski AR, Igić B. Biosystematic studies on the status of Solanum chilense. Am J Bot. 2021:108(3):520–537. 10.1002/ajb2.1621.
Ranjan A, Ichihashi Y, Sinha NR. The tomato genome: implications for plant breeding, genomics and evolution. Genome Biol. 2012:13(8):167. 10.1186/gb-2012-13-8-167.
Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012:28(18):i333–i339. 10.1093/bioinformatics/bts378.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al. Global variation in copy number in the human genome. Nature. 2006:444(7118):444–454. 10.1038/nature05329.
Revell LJ. Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012:3(2):217–223. 10.1111/j.2041-210X.2011.00169.x.
Rinker DC, Specian NK, Zhao S, Gibbons JG. Polar bear evolution is marked by rapid changes in gene copy number in response to dietary shift. Proc Natl Acad Sci U S A. 2019:116(27):13446–13451. 10.1073/pnas.1901093116.
Sato S, Tabata S, Hirakawa H, Asamizu E, Shirasawa K, Isobe S, Kaneko T, Nakamura Y, Shibata D, Aoki K. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012:485(7400):635–641. 10.1038/nature11119.
Shaikh TH, Gai X, Perin JC, Glessner JT, Xie H, Murphy K, O'Hara R, Casalunovo T, Conlin LK, D'arcy M, et al. High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res. 2009:19(9):1682–1690. 10.1101/gr.083501.108.
Shimizu KK, Shimizu-Inatsugi R, Tsuchimatsu T, Purugganan MD. Independent origins of self-compatibility in Arabidopsis thaliana. Mol Ecol. 2008:17(2):704–714. 10.1111/j.1365-294X.2007.03605.x.
Silva-Arias GA, Gagnon E, Hembrom S, Fastner A, Khan MR, Stam R, Tellier A. Patterns of presence–absence variation of NLRs across populations of Solanum chilense are clade-dependent and mainly shaped by past demographic history. New Phytol. 2025:245(4):1718–1732. 10.1111/nph.20293.
Springer NM, Ying K, Fu Y, Ji T, Yeh C-T, Jia Y, Wu W, Richmond T, Kitzman J, Rosenbaum H, et al. Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet. 2009:5(11):e1000734. 10.1371/journal.pgen.1000734.
Srikanth A, Schmid M. Regulation of flowering time: all roads lead to Rome. Cell Mol Life Sci. 2011:68(12):2013–2037. 10.1007/s00018-011-0673-y.
Stam R, Nosenko T, Hörger AC, Stephan W, Seidel M, Kuhn JM, Haberer G, Tellier A. The de novo reference genome and transcriptome assemblies of the wild tomato species Solanum chilense highlights birth and death of NLR genes between tomato species. G3 (Bethesda). 2019a:9(12):3933–3941. 10.1534/g3.119.400529.
Stam R, Silva-Arias GA, Tellier A. Subsets of NLR genes show differential signatures of adaptation during colonization of new habitats. New Phytol. 2019b:224(1):367–379. 10.1111/nph.16017.
Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, Coe BP, Baker C, Nordenfelt S, Bamshad M. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015:349(6253):aab3761. 10.1126/science.aab3761.
Tan D, Han T-S, Hou X-H, Tian Z, Guo Y-L, Li Z-W, Xu Y-C, Yang L, Wu Q, Gu H, et al. Adaptation of Arabidopsis thaliana to the Yangtze river basin. Genome Biol. 2017:18(1):16–19. 10.1186/s13059-016-1142-6.
Thompson PS, Cortez D. New insights into abasic site repair and tolerance. DNA Repair (Amst). 2020:90:102866. 10.1016/j.dnarep.2020.102866.
Title PO, Bemmels JB. ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography. 2018:41(2):291–307. 10.1111/ecog.02880.
Tossi VE, Regalado JJ, Iannicelli J, Laino LE, Burrieza HP, Escandón AS, Pitta-Álvarez SI. Beyond Arabidopsis: differential UV-B response mediated by UVR8 in diverse species. Front Plant Sci. 2019:10:780. 10.3389/fpls.2019.00780.
Wei K, Sharifova S, Zhao X, Sinha N, Nakayama H, Tellier A, Silva-Arias GA. Evolution of gene networks underlying adaptation to drought stress in the wild tomato Solanum chilense. Mol Ecol. 2024:33(21):e17536. 10.1111/mec.17536.
Wei K, Silva-Arias GA, Tellier A. Selective sweeps linked to the colonization of novel habitats and climatic changes in a wild tomato species. New Phytol. 2023:237(5):1908–1921. 10.1111/nph.18634.
Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009:25(16):2078–2079. 10.1093/bioinformatics/btp352.
Xia H, Camus-Kulandaivelu L, Stephan W, Tellier A, Zhang Z. Nucleotide diversity patterns of local adaptation at drought-related candidate genes in wild tomatoes. Mol Ecol. 2010:19(19):4144–4154. 10.1111/j.1365-294X.2010.04762.x.
Xiao S, Jiang L, Wang C, Ow DW. Arabidopsis OXS3 family proteins repress ABA signaling through interactions with AFP1 in the regulation of ABI4 expression. J Exp Bot. 2021:72(15):5721–5734. 10.1093/jxb/erab237.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011:88(1):76–82. 10.1016/j.ajhg.2010.11.011.
Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012:16(5):284–287. 10.1089/omi.2011.0118.
Zhao S, Gibbons JG. A population genomic characterization of copy number variation in the opportunistic fungal pathogen Aspergillus fumigatus. PLoS One. 2018:13(8):e0201611. 10.1371/journal.pone.0201611.
Zhou Y, Minio A, Massonnet M, Solares E, Lv Y, Beridze T, Cantu D, Gaut BS. The population genetics of structural variants in grapevine domestication. Nat Plants. 2019:5(9):965–979. 10.1038/s41477-019-0507-8.
Zhou Y, Zhang Z, Bao Z, Li H, Lyu Y, Zan Y, Wu Y, Cheng L, Fang Y, Wu K, et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature. 2022:606(7914):527–534. 10.1038/s41586-022-04808-9.
Zmienko A, Marszalek-Zenczak M, Wojciechowski P, Samelak-Czajka A, Luczak M, Kozlowski P, Karlowski WM, Figlerowicz M. AthCNV: a map of DNA copy number variations in the Arabidopsis genome. Plant Cell. 2020:32(6):1797–1819. 10.1105/tpc.19.00640.
Żmieńko A, Samelak A, Kozłowski P, Figlerowicz M. Copy number polymorphism in plant genomes. Theor Appl Genet. 2014:127(1):1–18. 10.1007/s00122-013-2177-7.
Zufall RA, Rausher MD. Genetic changes associated with floral adaptation restrict future evolutionary potential. Nature. 2004:428(6985):847–850. 10.1038/nature02489.

Associated Data


Supplementary Materials

msaf191_Supplementary_Data

msaf191_supplementary_data.zip (2.1 MB, zip)

Data Availability Statement


[msaf191-B66] Mérot C, Oomen RA, Tigano A, Wellenreuther M. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol Evol. 2020:35(7):561–572. 10.1016/j.tree.2020.03.002. [DOI] [PubMed] [Google Scholar]

[msaf191-B67] Monroe JG, McKay JK, Weigel D, Flood PJ. The population genomics of adaptive loss of function. Heredity (Edinb). 2021:126(3):383–395. 10.1038/s41437-021-00403-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B68] Nakazato T, Warren DL, Moyle LC. Ecological and geographic modes of species divergence in wild tomatoes. Am J Bot. 2010:97(4):680–693. 10.3732/ajb.0900216. [DOI] [PubMed] [Google Scholar]

[msaf191-B69] Nosenko T, Böndel KB, Kumpfmüller G, Stephan W. Adaptation to low temperatures in the wild tomato species Solanum chilense. Mol Ecol. 2016:25(12):2853–2869. 10.1111/mec.13637. [DOI] [PubMed] [Google Scholar]

[msaf191-B70] Ofria C, Adami C, Collier TC. Selective pressures on genomes in molecular evolution. J Theor Biol. 2003:222(4):477–483. 10.1016/S0022-5193(03)00062-6. [DOI] [PubMed] [Google Scholar]

[msaf191-B71] Otto M, Wiehe T. The structured coalescent in the context of gene copy number variation. Theor Popul Biol. 2023:154:67–78. 10.1016/j.tpb.2023.08.001. [DOI] [PubMed] [Google Scholar]

[msaf191-B72] Otto M, Zheng Y, Wiehe T. Recombination, selection, and the evolution of tandem gene arrays. Genetics. 2022:221(3):iyac052. 10.1093/genetics/iyac052. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B73] Panchy N, Lehti-Shiu M, Shiu S-H. Evolution of gene duplication in plants. Plant Physiol. 2016:171(4):2294–2316. 10.1104/pp.16.00523. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B74] Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018:34(5):867–868. 10.1093/bioinformatics/btx699. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B75] Pérez-Ruiz RV, García-Ponce B, Marsch-Martínez N, Ugartechea-Chirino Y, Villajuana-Bonequi M, de Folter S, Azpeitia E, Dávila-Velderrain J, Cruz-Sánchez D, Garay-Arroyo A, et al. XAANTAL2 (AGL14) is an important component of the complex gene regulatory network that underlies Arabidopsis shoot apical meristem transitions. Mol Plant. 2015:8(5):796–813. 10.1016/j.molp.2015.01.017. [DOI] [PubMed] [Google Scholar]

[msaf191-B76] Pesaresi P, Mizzotti C, Colombo M, Masiero S. Genetic regulation and structural changes during tomato fruit development and ripening. Front Plant Sci. 2014:5:124. 10.3389/fpls.2014.00124. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B77] Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012:8(11):e1002967. 10.1371/journal.pgen.1002967. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B78] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, De Bakker PIW, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007:81(3):559–575. 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]

[msaf191-B79] Putterill J, Varkonyi-Gasic E. FT and florigen long-distance flowering control in plants. Curr Opin Plant Biol. 2016:33:77–82. 10.1016/j.pbi.2016.06.008. [DOI] [PubMed] [Google Scholar]

[msaf191-B80] Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, He Q, Ou S, Zhang H, Li X, et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell. 2021:184(13):3542–3558.e3516. 10.1016/j.cell.2021.04.046. [DOI] [PubMed] [Google Scholar]

[msaf191-B81] Raduski AR, Igić B. Biosystematic studies on the status of Solanum chilense. Am J Bot. 2021:108(3):520–537. 10.1002/ajb2.1621.

[msaf191-B82] Ranjan A, Ichihashi Y, Sinha NR. The tomato genome: implications for plant breeding, genomics and evolution. Genome Biol. 2012:13(8):167. 10.1186/gb-2012-13-8-167.

[msaf191-B83] Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012:28(18):i333–i339. 10.1093/bioinformatics/bts378.

[msaf191-B84] Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al. Global variation in copy number in the human genome. Nature. 2006:444(7118):444–454. 10.1038/nature05329.

[msaf191-B85] Revell LJ. Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012:3(2):217–223. 10.1111/j.2041-210X.2011.00169.x.

[msaf191-B86] Rinker DC, Specian NK, Zhao S, Gibbons JG. Polar bear evolution is marked by rapid changes in gene copy number in response to dietary shift. Proc Natl Acad Sci U S A. 2019:116(27):13446–13451. 10.1073/pnas.1901093116.

[msaf191-B87] Sato S, Tabata S, Hirakawa H, Asamizu E, Shirasawa K, Isobe S, Kaneko T, Nakamura Y, Shibata D, Aoki K. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012:485(7400):635–641. 10.1038/nature11119.

[msaf191-B88] Shaikh TH, Gai X, Perin JC, Glessner JT, Xie H, Murphy K, O'Hara R, Casalunovo T, Conlin LK, D'arcy M, et al. High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome Res. 2009:19(9):1682–1690. 10.1101/gr.083501.108.

[msaf191-B89] Shimizu KK, Shimizu-Inatsugi R, Tsuchimatsu T, Purugganan MD. Independent origins of self-compatibility in Arabidopsis thaliana. Mol Ecol. 2008:17(2):704–714. 10.1111/j.1365-294X.2007.03605.x.

[msaf191-B90] Silva-Arias GA, Gagnon E, Hembrom S, Fastner A, Khan MR, Stam R, Tellier A. Patterns of presence–absence variation of NLRs across populations of Solanum chilense are clade-dependent and mainly shaped by past demographic history. New Phytol. 2025:245(4):1718–1732. 10.1111/nph.20293.

[msaf191-B91] Springer NM, Ying K, Fu Y, Ji T, Yeh C-T, Jia Y, Wu W, Richmond T, Kitzman J, Rosenbaum H, et al. Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet. 2009:5(11):e1000734. 10.1371/journal.pgen.1000734.

[msaf191-B92] Srikanth A, Schmid M. Regulation of flowering time: all roads lead to Rome. Cell Mol Life Sci. 2011:68(12):2013–2037. 10.1007/s00018-011-0673-y.

[msaf191-B93] Stam R, Nosenko T, Hörger AC, Stephan W, Seidel M, Kuhn JM, Haberer G, Tellier A. The de novo reference genome and transcriptome assemblies of the wild tomato species Solanum chilense highlights birth and death of NLR genes between tomato species. G3 (Bethesda). 2019a:9(12):3933–3941. 10.1534/g3.119.400529.

[msaf191-B94] Stam R, Silva-Arias GA, Tellier A. Subsets of NLR genes show differential signatures of adaptation during colonization of new habitats. New Phytol. 2019b:224(1):367–379. 10.1111/nph.16017.

[msaf191-B95] Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, Coe BP, Baker C, Nordenfelt S, Bamshad M. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015:349(6253):aab3761. 10.1126/science.aab3761.

[msaf191-B96] Tan D, Han T-S, Hou X-H, Tian Z, Guo Y-L, Li Z-W, Xu Y-C, Yang L, Wu Q, Gu H, et al. Adaptation of Arabidopsis thaliana to the Yangtze river basin. Genome Biol. 2017:18(1):16–19. 10.1186/s13059-016-1142-6.

[msaf191-B97] Thompson PS, Cortez D. New insights into abasic site repair and tolerance. DNA Repair (Amst). 2020:90:102866. 10.1016/j.dnarep.2020.102866.

[msaf191-B98] Title PO, Bemmels JB. ENVIREM: an expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography. 2018:41(2):291–307. 10.1111/ecog.02880.

[msaf191-B99] Tossi VE, Regalado JJ, Iannicelli J, Laino LE, Burrieza HP, Escandón AS, Pitta-Álvarez SI. Beyond Arabidopsis: differential UV-B response mediated by UVR8 in diverse species. Front Plant Sci. 2019:10:780. 10.3389/fpls.2019.00780.

[msaf191-B100] Wei K, Sharifova S, Zhao X, Sinha N, Nakayama H, Tellier A, Silva-Arias GA. Evolution of gene networks underlying adaptation to drought stress in the wild tomato Solanum chilense. Mol Ecol. 2024:33(21):e17536. 10.1111/mec.17536.

[msaf191-B101] Wei K, Silva-Arias GA, Tellier A. Selective sweeps linked to the colonization of novel habitats and climatic changes in a wild tomato species. New Phytol. 2023:237(5):1908–1921. 10.1111/nph.18634.

[msaf191-B102] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009:25(16):2078–2079. 10.1093/bioinformatics/btp352.

[msaf191-B103] Xia H, Camus-Kulandaivelu L, Stephan W, Tellier A, Zhang Z. Nucleotide diversity patterns of local adaptation at drought-related candidate genes in wild tomatoes. Mol Ecol. 2010:19(19):4144–4154. 10.1111/j.1365-294X.2010.04762.x.

[msaf191-B104] Xiao S, Jiang L, Wang C, Ow DW. Arabidopsis OXS3 family proteins repress ABA signaling through interactions with AFP1 in the regulation of ABI4 expression. J Exp Bot. 2021:72(15):5721–5734. 10.1093/jxb/erab237.

[msaf191-B105] Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011:88(1):76–82. 10.1016/j.ajhg.2010.11.011.

[msaf191-B106] Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012:16(5):284–287. 10.1089/omi.2011.0118.

[msaf191-B107] Zhao S, Gibbons JG. A population genomic characterization of copy number variation in the opportunistic fungal pathogen Aspergillus fumigatus. PLoS One. 2018:13(8):e0201611. 10.1371/journal.pone.0201611.

[msaf191-B108] Zhou Y, Minio A, Massonnet M, Solares E, Lv Y, Beridze T, Cantu D, Gaut BS. The population genetics of structural variants in grapevine domestication. Nat Plants. 2019:5(9):965–979. 10.1038/s41477-019-0507-8.

[msaf191-B109] Zhou Y, Zhang Z, Bao Z, Li H, Lyu Y, Zan Y, Wu Y, Cheng L, Fang Y, Wu K, et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature. 2022:606(7914):527–534. 10.1038/s41586-022-04808-9.

[msaf191-B110] Zmienko A, Marszalek-Zenczak M, Wojciechowski P, Samelak-Czajka A, Luczak M, Kozlowski P, Karlowski WM, Figlerowicz M. AthCNV: a map of DNA copy number variations in the Arabidopsis genome. Plant Cell. 2020:32(6):1797–1819. 10.1105/tpc.19.00640.

[msaf191-B111] Żmieńko A, Samelak A, Kozłowski P, Figlerowicz M. Copy number polymorphism in plant genomes. Theor Appl Genet. 2014:127(1):1–18. 10.1007/s00122-013-2177-7.

[msaf191-B112] Zufall RA, Rausher MD. Genetic changes associated with floral adaptation restrict future evolutionary potential. Nature. 2004:428(6985):847–850. 10.1038/nature02489.
