Genomic exploration of Iranian almond (Prunus dulcis) germplasm: decoding diversity, population structure, and linkage disequilibrium through genotyping-by-sequencing analysis

Soheila Khojand; Mehrshad Zeinalabedini; Reza Azizinezhad; Ali Imani; Mohammad Reza Ghaffari

doi:10.1186/s12864-024-11044-0

BMC Genomics. 2024 Nov 18;25:1101. doi: 10.1186/s12864-024-11044-0

PMCID: PMC11575021 PMID: 39558316

Abstract

This study focuses on the genetic diversity and population structure of Prunus dulcis (almond), a crop of widespread cultivation and commercial importance, particularly in Iran, a region with a long tradition of almond cultivation. The diverse almond collection in Iran encompasses many local varieties, breeding selections, rootstocks, and international cultivars. Characterizing this diversity calls for advanced genotyping techniques that provide insight into genetic diversity, population structure, and linkage disequilibrium (LD). In this paper, genotyping-by-sequencing (GBS) was used to analyze 62 almond germplasm samples, identifying 63,537 high-quality single nucleotide polymorphisms (SNPs) distributed across the eight chromosomes of the almond genome, an average of approximately 7,942 SNPs per chromosome. The analysis yielded an average polymorphism information content (PIC) of 0.315 and an average expected heterozygosity (He) of 0.28, indicating a substantial level of genetic diversity within the studied germplasm. LD decayed rapidly, with an average LD decay distance of approximately 300 kb at an r² value of 0.2, suggesting substantial hybridization among the sampled almond varieties. Principal Component Analysis (PCA) and structure analysis did not differentiate genotypes by geographical origin, providing further evidence of genetic mixing among the studied almond populations. Analysis of Molecular Variance (AMOVA) attributed most of the genetic variation to differences within populations, with only minimal differentiation among populations. This comprehensive study of Iran's almond genotypes offers valuable insights for future breeding and conservation efforts and highlights the abundant genetic diversity and intricate population structure of this agriculturally significant species.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-024-11044-0.

Keywords: Genetic diversity, Genotyping by sequencing, Population structure, Prunus dulcis

Background

Almonds (Prunus dulcis), members of the Rosaceae family, are extensively cultivated for their kernels. Their adaptation to diverse environmental conditions has given rise to desirable genes and tolerance of unfavorable conditions [1]. Iran has excellent ecological conditions for almond cultivation, and almond farming contributes significantly to the country's economy. Local germplasm with promising economic traits has been selected as cultivars for Iranian orchardists. Understanding the genetic diversity and population structure of almond germplasm is vital for securing the genetic resources needed for almond genetic improvement and conservation [2]. Molecular characterization of plant germplasm is the preferred approach for assessing genetic diversity and variation within almond populations [3]. Advanced breeding methods are fundamental to reshaping crop genetics in modern agriculture. Successful breeding requires precise management of plant genetic resources, integration of advanced genotyping and phenotyping techniques, and a thorough understanding of crop genetic structure [4]. DNA markers are used to identify cultivars, assess genetic diversity, determine paternity, characterize almond germplasm collections, construct genetic linkage maps, and identify marker-trait associations [5]. Advances in genotyping and phenotyping have transformed selection procedures in almond breeding. Molecular and taxonomic studies play a pivotal role in species identification within the almond genus [6]. Simple sequence repeats (SSRs), valued for their high polymorphism, codominance, and reproducibility, have been widely used [7]. However, SSRs are of limited use for extensive germplasm screening because they are time-consuming, labor-intensive, and low-throughput [8].
SNPs, by contrast, are abundant, codominant, and biallelic, and have shown considerable promise in many plant species, including almond [9]. Next-generation sequencing (NGS) technology has advanced marker discovery, enabling efficient and comprehensive analysis of germplasm collections [10]. Genome-wide SNP markers derived from NGS play a pivotal role in characterizing germplasm and exploring population structure. Genotyping by sequencing (GBS), a cost-effective approach, has been used successfully for high-throughput genotyping of various woody perennial species, including almond [11–15]. In almond research, GBS protocols have supported SNP identification, linkage map construction, analysis of genetic structure, kinship and inbreeding studies, LD decay analysis, detection of genes for resistance and quality traits, GWAS, and QTL mapping, and they are particularly useful for analyzing genetic diversity, population structure, and LD patterns [4, 5, 9, 14–17].

This study demonstrates the suitability of GBS for efficient genotyping in almond research and is the first application of GBS-derived SNP markers to a diverse collection of Iranian almond genotypes. The results provide valuable insights into the genetic diversity and population structure of Iranian almond genotypes, laying the foundation for future genetic improvement and the development of new almond cultivars.

Materials and methods

Plant materials

The research was conducted at the Horticulture Station of the AREEO in the southern part of Alborz province, Iran (50° E, 36° N), which has a moderate climate [18]. In a previous morpho-pomological study by Khojand et al. [19], 62 genotypes were selected from a pool of more than 228 genotypes, primarily on the basis of key phenotypic characteristics, including the weight and dimensions of the fruit, nut, and kernel. The 62 native and selected almond samples originated from Iran and other countries: two from Italy, two from Spain, two from France, three from the USA, two of unknown origin, and 52 from Iran. Supplementary Table S1 lists all samples with their codes, origin, and pedigree.

DNA extraction, library preparation, and sequencing

The modified CTAB (cetyltrimethylammonium bromide) method [20] was used to extract DNA from young leaves. The quality and quantity of the genomic DNA were assessed with a NanoDrop ND1000, and DNA integrity was evaluated on a 1% (w/v) agarose gel. GBS (genotyping-by-sequencing) DNA libraries were prepared and sequenced by BGI Genomics (China) following the protocol of Elshire et al. [21] with slight modifications. In summary, 100 ng of genomic DNA was digested with the ApeKI restriction enzyme (NEB, United States). Common and barcode adapters were then ligated to the digested DNA fragments of each sample, followed by a 1 h incubation at 22 °C.

Following adapter ligation, the DNA products from each sample were pooled in equal volumes, and the pool was purified with the QIAquick PCR Purification Kit (Qiagen). To enrich adapter-ligated fragments, PCR amplification was performed using PCR Primer Cocktail and PCR Master Mix. The PCR products were size-selected by agarose gel electrophoresis, and fragments of 180 to 480 bp were recovered with the QIAquick Gel Extraction Kit (Qiagen). Library quality was assessed with an Agilent Technologies 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System. The completed GBS libraries were sequenced on an Illumina HiSeq 2000 platform, generating 150 bp paired-end reads on a single lane. The libraries were prepared by digesting DNA with ApeKI in a 96-plex format, with each plate containing a randomly assigned blank well. PCR amplification was conducted to generate the GBS libraries, and the DNA was sequenced on an Illumina Genome Analyzer II in a single flow-cell channel. The average sequencing depth was 30×. The GBS procedure was conducted at the University of Wisconsin-Madison Biotechnology Center, following the method described by Elshire et al. [21].
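
The ApeKI digestion and 180 to 480 bp size selection described above can be sketched in silico. The Python below is a minimal illustration, assuming complete digestion at every G^CWGC site (W = A or T), looking at the top strand only, and applying the size window directly to insert length while ignoring adapter sequence. It is not part of the authors' pipeline, and the function names are hypothetical.

```python
import re

# ApeKI recognizes G^CWGC (W = A or T) and cuts the top strand after the first G.
# A lookahead pattern lets overlapping sites (e.g. GCAGCAGC) all be found.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def apeki_fragments(seq):
    """Top-strand fragments from a simulated complete ApeKI digest of `seq`."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in APEKI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, low=180, high=480):
    """Keep fragments whose length lies in the inclusive [low, high] window."""
    return [f for f in fragments if low <= len(f) <= high]
```

For example, a toy 270 bp sequence built as 10 A's, GCAGC, 200 T's, GCTGC and 50 C's contains two ApeKI sites and yields fragments of 11, 205 and 54 bp, of which only the 205 bp fragment survives size selection.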

Bioinformatic analyses

Quality filtering and SNP calling

The raw sequencing data were processed to ensure data quality and to identify SNPs. The sequencing data were demultiplexed into individual samples using Illumina Experiment Manager version 1.16.0. Raw sequence data were deposited in NCBI BioProject (PRJNA1087167) and NCBI SRA (SRR31052731–SRR31052792). The quality of individual FASTQ files was assessed using FastQC ver. 0.11.5 (Babraham Bioinformatics, Babraham Institute). Sequencing adapters and low-quality bases were removed from the raw reads using Trimmomatic 0.39 [22]. SNPs were identified using version 2 of the Stacks pipeline, as outlined by Catchen et al. [23]. The trimmed reads were aligned to the P. dulcis reference genome assembly (NCBI; GCF_902201215.1_ALMONDv2_genomic.fa) using the BWA-MEM aligner version 0.7.17 [24]. The resulting Sequence Alignment Map (SAM) files were converted to binary (BAM) format and sorted using SAMtools version 1.6. Loci were constructed from the paired-end data using the gstacks module, and PCR duplicates were removed with Picard version 2.23.3 (available at https://github.com/broadinstitute/picard).
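
Trimmomatic's quality trimming is commonly configured with a sliding window that cuts a read once the mean Phred quality inside the window drops too low. The sketch below is a simplified, hypothetical re-implementation of that idea for illustration only; the window size (4) and threshold (15) are assumed values, since the trimming parameters are not reported, and the details may differ from Trimmomatic's own SLIDINGWINDOW implementation.

```python
def phred33(qual_string):
    """Decode a FASTQ Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_trim(seq, quals, window=4, min_avg=15):
    """Simplified 5'->3' sliding-window quality trim (illustrative assumption).

    Scans windows from the 5' end and cuts the read at the start of the first
    window whose mean quality is below `min_avg`. Reads shorter than the
    window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals
```

For instance, a 10 bp read whose last four bases have quality 2 (and the rest 30) is cut to its first five bases, because the window starting at position 5 is the first to average below 15.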

The populations module of Stacks 2.3b was used to generate a variant call format (VCF) file containing loci present in at least two of the four populations (-p 2). SNPs with a minor allele frequency (MAF) below 0.01, missing data exceeding 0.52 at the marker and genotype levels, or excessive heterozygosity were removed using TASSEL (v5.2.64) [25, 26], which was also used to eliminate monomorphic SNPs. To identify loci deviating significantly from Hardy-Weinberg equilibrium, the dataset was analyzed with PLINK version 1.9 [27], using a significance threshold of P < 10⁻⁴ for this open-pollinated crop; loci out of equilibrium were excluded from further analysis [26]. After filtering, the distribution of SNPs across 1-megabase (Mb) windows was plotted with the CMplot R package, as described by Yin et al. [28].
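
The marker filters above can be expressed per SNP. The following sketch assumes genotypes coded as alternate-allele dosages (0, 1, 2; None for missing) and uses the thresholds stated in the text (MAF 0.01, missingness 0.52, HWE P < 10⁻⁴). The HWE test is the exact test of Wigginton, Cutler and Abecasis (2005), the test PLINK 1.9 applies for Hardy-Weinberg filtering; the heterozygosity filter applied in TASSEL is not modeled, and all function names are hypothetical.

```python
# Genotypes are alternate-allele dosages: 0, 1, 2, or None for a missing call.

def passes_maf_missing(genotypes, min_maf=0.01, max_missing=0.52):
    """True if a SNP passes the MAF and missing-data thresholds.

    Monomorphic SNPs (MAF = 0) fail because their MAF is below min_maf.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq) >= min_maf


def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided HWE exact-test P-value (Wigginton et al. 2005 recurrence)."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    n_homr = min(n_hom1, n_hom2)
    rare = 2 * n_homr + n_het            # copies of the rarer allele
    probs = [0.0] * (rare + 1)
    # Start at the most likely heterozygote count (same parity as `rare`).
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk down: fewer heterozygotes, more homozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Walk up: more heterozygotes, fewer homozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    observed = probs[n_het]
    # P-value: total probability of outcomes no more likely than the observed one.
    return min(1.0, sum(x for x in probs if x <= observed) / total)


def keep_snp(genotypes, hwe_alpha=1e-4):
    """MAF/missingness filter, then drop SNPs with HWE P < hwe_alpha."""
    if not passes_maf_missing(genotypes):
        return False
    called = [g for g in genotypes if g is not None]
    p = hwe_exact_p(called.count(1), called.count(0), called.count(2))
    return p >= hwe_alpha
```

A SNP whose 100 genotypes are split evenly between the two homozygotes, with no heterozygotes, is rejected by the HWE step, whereas a 25:50:25 split is retained.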

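Polymorphic information content (PIC) was calculated from the SNP data using the equation of Botstein et al. [29]:

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} P_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 P_i^2 P_j^2$$

where n is the number of alleles and P_i and P_j are the population frequencies of the ith and jth alleles. For a biallelic SNP with allele frequencies p and q, this reduces to 1 − p² − q² − 2p²q², which reaches its maximum of 0.375 when p = q = 0.5.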
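
As a check on Botstein et al.'s PIC formula, the sketch below computes PIC from a locus's allele frequencies with a hypothetical helper, pic(). For a biallelic SNP with both alleles at frequency 0.5 it returns 0.375, the upper end of the range reported in the Results.

```python
from itertools import combinations


def pic(freqs):
    """Botstein et al. (1980) polymorphism information content of one locus.

    freqs: allele frequencies at the locus (must sum to 1).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Subtract matings between identical heterozygotes, which are uninformative.
    correction = sum(2 * pi * pi * pj * pj for pi, pj in combinations(freqs, 2))
    return 1.0 - homozygosity - correction
```

A skewed SNP such as pic([0.9, 0.1]) gives about 0.164, illustrating how PIC falls as one allele becomes common.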

Observed heterozygosity (Ho) and expected heterozygosity (He) were computed using the vcftools software.
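
For a single biallelic SNP, Ho is the fraction of called individuals that are heterozygous and He is the Hardy-Weinberg expectation 2pq. The sketch below assumes dosage-coded genotypes and applies no small-sample correction, so its values may differ slightly from VCFtools output; the helper names are hypothetical.

```python
def site_heterozygosity(genotypes):
    """Observed (Ho) and expected (He) heterozygosity at one biallelic SNP.

    genotypes: alternate-allele dosages (0, 1, 2) with None for missing calls.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    ho = called.count(1) / len(called)      # fraction of heterozygous calls
    p = sum(called) / (2 * len(called))     # alternate-allele frequency
    he = 2 * p * (1 - p)                    # equals 1 - p**2 - (1 - p)**2
    return ho, he


def mean_heterozygosity(snps):
    """Average Ho and He over a list of per-SNP genotype lists."""
    pairs = [site_heterozygosity(g) for g in snps]
    return (sum(h for h, _ in pairs) / len(pairs),
            sum(e for _, e in pairs) / len(pairs))
```

A SNP with only homozygous calls, such as [0, 0, 2, 2], has Ho = 0 but He = 0.5, the kind of heterozygote deficit that drives He above Ho at the group level.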

Linkage disequilibrium (LD)

LD was estimated using PLINK version 1.9 [27], retaining SNP pairs in significant LD at P < 0.01. LD decay was assessed by plotting pairwise LD values (r²) against the physical distance between SNPs, within 300 kb segments on individual chromosomes and across the entire genome, and fitting a nonlinear regression in R with the reshape2 and ggplot2 packages.
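
The sketch below illustrates both steps. It computes r² between two SNPs as the squared correlation of genotype dosages, a common proxy for unphased data (PLINK's own estimator may differ). It then reads off a crude decay distance as the first distance bin whose mean r² falls below a threshold. This binning is a simple substitute for the nonlinear regression used in the study, and the bin size and threshold are illustrative assumptions.

```python
import math
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation of two SNPs' dosage vectors (0/1/2, no missing)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # a SNP monomorphic in this sample has undefined r^2
    return sxy * sxy / (sxx * syy)


def decay_distance(pairs, threshold=0.2, bin_size=10_000):
    """Lower edge (bp) of the first distance bin whose mean r^2 < threshold.

    pairs: iterable of (distance_bp, r2); NaN r^2 values are skipped.
    Returns None if no bin falls below the threshold.
    """
    bins = {}
    for dist, r2 in pairs:
        if not math.isnan(r2):
            bins.setdefault(dist // bin_size, []).append(r2)
    for b in sorted(bins):
        if mean(bins[b]) < threshold:
            return b * bin_size
    return None
```

Fitting a smooth decay curve, as the study did, is less sensitive to noisy bins than this first-crossing rule, which is shown only to make the definition of "decay distance at a given r²" concrete.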

Analysis of genetic diversity and population structure

For the 62 almond genotypes, both the centered identity-by-state (IBS) relationship matrix and the genetic distance matrix were computed using TASSEL (version 5.2.90), following Bradbury et al. [25]. A relationship heatmap and histogram were generated with the ggplot2 R package, as referenced by Warnes and colleagues. Observed (Ho) and expected (He) heterozygosity were assessed using VCFtools (https://vcftools.github.io/man_latest.html), following Danecek et al. [30]. To investigate genetic diversity, a neighbor-joining analysis was performed in TASSEL (version 5.2.90) [25]; the resulting phylogenetic tree was exported in Newick format and visualized with iTOL (Interactive Tree of Life) version 4.3.3, described by Letunic and Bork [31].
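
A simple IBS-based distance between two genotypes is one minus the average fraction of alleles they share across SNPs. The sketch below assumes dosage coding with None for missing calls and skips SNPs missing in either sample. It illustrates the concept and is not TASSEL's implementation, whose scaling may differ; function and sample names are hypothetical.

```python
def ibs_distance(g1, g2):
    """1 minus mean identity-by-state between two genotypes.

    Genotypes are alternate-allele dosages (0/1/2, None = missing). At each SNP
    the two samples share 2, 1 or 0 alleles; dividing by 2 gives a similarity
    in [0, 1]. SNPs missing in either sample are skipped.
    """
    shared = [(2 - abs(a - b)) / 2
              for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return float("nan")
    return 1.0 - sum(shared) / len(shared)


def distance_matrix(samples):
    """Pairwise IBS distances for a dict mapping sample name -> genotype list."""
    names = list(samples)
    return {(a, b): ibs_distance(samples[a], samples[b]) for a in names for b in names}
```

A matrix of this kind is the usual input to neighbor-joining, which repeatedly joins the pair of nodes that minimizes the total branch length of the tree.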

Because the samples were selected for notable morphological traits and comprehensive data on their origins and pedigrees were lacking, the k-means clustering algorithm was applied to partition the genotypes into groups [32].
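
k-means partitions samples by alternately assigning each point to its nearest centroid and moving each centroid to the mean of its members. The sketch below is a plain Lloyd's algorithm, assuming each genotype is represented by numeric coordinates such as principal-component scores. The clustering software and input space used in the study are not specified, so this is illustrative only.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of coordinate tuples.

    Returns (labels, centroids). Initial centroids are k input points drawn
    without replacement by a seeded RNG, so results are reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because k-means only finds a local optimum, in practice it is usually run from several seeds and the solution with the lowest within-cluster sum of squares is kept.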

A multifaceted approach was adopted to analyze cluster formation, beginning with a PCA of individual genotypes performed with the R package ape [33, 34]. The distribution of almond samples along the first two principal components was visualized with ggplot2 [35].
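
A PCA of a genotype matrix can be computed by singular value decomposition after centering each SNP column. This NumPy sketch assumes a complete individuals × SNPs dosage matrix (no missing values) and returns the scores on the leading components together with the proportion of variance each explains. It does not reproduce the R workflow used in the study.

```python
import numpy as np


def genotype_pca(dosages, n_components=2):
    """PCA of an individuals x SNPs dosage matrix (0/1/2, no missing values).

    Returns (scores, explained): per-individual coordinates on the leading
    components and the fraction of total variance each component explains.
    """
    X = np.asarray(dosages, dtype=float)
    X = X - X.mean(axis=0)                       # center each SNP column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]
```

The sign of each component is arbitrary, so the same data can appear mirrored between tools; the explained-variance fractions are what a statement such as "PC1 and PC2 together explained 17.00% of the variance" reports.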

Discriminant Analysis of Principal Components (DAPC) was then applied to the SNP data to sharpen the distinction between clusters. Population structure was explored with ADMIXTURE (version 1.3.0) [36], testing numbers of ancestral populations (K) from 1 to 10. The most suitable K was chosen as the value with the lowest cross-validation (CV) error reported by the software, and the admixture proportions of each genotype were visualized with ggplot2 in R.
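
Choosing K by minimum CV error and labeling low-membership individuals as admixed (the 0.70 cutoff used in the Results) can be sketched as follows. The regular expression assumes that ADMIXTURE's --cv log lines take the form "CV error (K=2): 0.375"; check this against your own logs. Function names are hypothetical.

```python
import re

# Assumed log format: ADMIXTURE run with --cv prints lines like "CV error (K=2): 0.375".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Map K -> CV error from concatenated ADMIXTURE logs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


def classify_admixture(q_rows, threshold=0.70):
    """Label each individual with its majority cluster index, or 'admixed'
    when its largest ancestry proportion is below `threshold`."""
    labels = []
    for q in q_rows:
        top = max(range(len(q)), key=lambda i: q[i])
        labels.append(top if q[top] >= threshold else "admixed")
    return labels
```

Each row of q_rows corresponds to one line of ADMIXTURE's .Q output, that is, one individual's ancestry proportions summing to 1.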

The fixation index (Fst), a measure of genetic differentiation, was computed among the groups of P. dulcis using Genepop version 4.0.9 [37]. To gain further insight into the partitioning of genetic diversity, an analysis of molecular variance (AMOVA) with 1000 permutations was conducted using the poppr R package [38]; missing data were excluded before the AMOVA.
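
For intuition about pairwise Fst, the sketch below computes Hudson's estimator as a ratio of averages across SNPs (Bhatia et al., 2013). This is a swapped-in estimator: Genepop uses a different (Weir and Cockerham-type) estimator, so the values will not match Table 2 exactly. The sketch assumes per-SNP alternate-allele frequencies for two groups and a constant number of sampled allele copies per group.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst across SNPs, combined as a ratio of averages.

    p1, p2: alternate-allele frequencies per SNP in groups 1 and 2.
    n1, n2: sampled allele copies per group (2 x number of diploid individuals).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must have equal length")
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Between-group divergence minus within-group sampling noise ...
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # ... over the between-group mean pairwise difference.
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Summing numerators and denominators separately, rather than averaging per-SNP ratios, keeps low-diversity SNPs from dominating the estimate. Values near zero, like those reported here, indicate that the groups share most of their allele-frequency variation.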

Results

Genotyping and SNP distribution in almond genomes: an in-depth analysis

For the 62 almond genotypes, sequencing generated over 2 million reads, averaging 30,225 reads per sample. This dataset initially yielded 246,121 SNPs, which were reduced to 63,537 SNPs in the final VCF file after multiple filtering steps. Post-filtering SNP counts in 1 Mb windows for each chromosome are shown in Fig. 1. LD pruning further reduced the set to 25,804 SNPs, which were used in non-parametric analyses to limit the effect of correlated markers. The 63,537 filtered SNPs were distributed across all eight chromosomes, averaging 7,942 markers per chromosome, although counts varied among chromosomes, from 5,916 on chromosome 5 to 12,839 on chromosome 1 (Fig. 1). The cumulative distribution spanned 18,355 kilobases (kb) across the eight chromosomes.
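
The per-window and per-chromosome counts summarized above reduce to simple binning of VCF positions. The sketch assumes (chromosome, 1-based position) pairs, as in the VCF CHROM and POS columns; the chromosome names in the example are hypothetical.

```python
from collections import Counter


def snp_density(positions, window=1_000_000):
    """Count SNPs per (chromosome, window index); window 0 covers bp 1..window."""
    counts = Counter()
    for chrom, pos in positions:
        counts[(chrom, (pos - 1) // window)] += 1
    return counts


def snps_per_chromosome(positions):
    """Total SNP count per chromosome."""
    return Counter(chrom for chrom, _ in positions)
```

As a check on the reported average, 63,537 SNPs over eight chromosomes gives 63,537 / 8 ≈ 7,942 SNPs per chromosome.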

Fig. 1 — Depicts the SNP density on the 8 chromosomes within a 1 Mb window size post-filtering

PIC values ranged from 0 to 0.375, with an average of 0.315, and were consistent across all chromosomes. Observed heterozygosity (Ho) ranged from 0.00 to 0.99, with an average of 0.23 (Fig. 2). Among groups, heterozygosity was highest in group 2 (Ho = 0.13699) and lowest in group 3 (Ho = 0.12043). Expected heterozygosity (He) exceeded observed heterozygosity in groups 1 and 3; He values ranged from 0.08 to 0.50, with an average of 0.28 (Table 1).

Fig. 2 — Presents characteristic statistics of SNPs. (A) Observed heterozygosity (Ho), (B) Expected heterozygosity (He), (C) Minor allele frequencies, (D) Polymorphism information content

Table 1.

Observed and expected heterozygosity and homozygosity for the four P. dulcis populations

Pop.	1	2	3	4
Observed heterozygosity	0.12109	0.13699	0.12043	0.12551
Observed homozygosity	0.87891	0.86301	0.87449	0.87449
Expected heterozygosity	0.14203	0.12165	0.13666	0.12235
Expected homozygosity	0.85797	0.87835	0.86337	0.87765


Linkage disequilibrium

LD was estimated for every pairwise SNP combination across the entire germplasm collection. LD decay varied among the eight chromosomes (Fig. 3): chromosome 2 showed the most rapid decay and chromosome 6 the slowest. LD decayed rapidly when r² was between 0.025 and 0.05 and leveled off at r² = 0.025 (Fig. 3). The average genome-wide LD decay distance was 300 kb at r² = 0.15.

Fig. 3 — LD measured by r² plotted against the physical map (bp) between pairs of SNP markers

Genetic diversity and population structure

To assess genetic diversity among the 62 genotypes, the 63,537 SNPs distributed across the eight chromosomes were used, and several complementary methods were applied. Marker-based kinship coefficients were calculated for all pairs of the 62 genotypes and ranged from −0.28 to 1.24, with an average of 0.48 (Supplementary Table S2). The distribution of these relative kinship values across all genotype pairs is shown in Fig. 4A.

A genetic distance matrix based on IBS was constructed to further explore genetic diversity among the almond genotypes (Fig. 4B). Pairwise genetic distances ranged from 0.04784 to 0.2003, with an average dissimilarity of 0.1606 (Supplementary Table S3). Most distances fell between 0.15 and 0.2, indicating moderate genetic diversity among the studied genotypes. The smallest genetic distance (0.04784) was observed between genotypes “KQ1” and “H”, and the largest (0.2003) between “D2” and “48”.

This analysis provides valuable insights into genetic relationships and differentiation among the almond genotypes under investigation.

A neighbor-joining cluster analysis was conducted to further examine genetic relationships among the almond genotypes. It grouped the 62 genotypes into three primary clusters, each containing varieties from different regions, a pattern that points to substantial genetic admixture among the almond varieties (Fig. 5).

Fig. 5 — Depicts a neighbor-joining dendrogram offering insights into the genetic relationships among 62 distinct almond genotypes

Fig. 6 — (A) ADMIXTURE group assignments for the 62 genotypes at K = 2, shown as a bar plot. (B) Admixture plot (K = 2) showing the proportions of each genome shared among clusters. All genotypes in this study are positioned on the X-axis; the Y-axis shows the proportion of each individual's genome derived from each ancestral population

Most of the genotypes fall in Cluster 1, which contains 35 genotypes with an average genetic distance of 0.168 and is divided into three sub-groups: 1A, 1B, and 1C. Sub-group 1A includes five genotypes, all from Iran: one from Zanjan, one from Kashmar, two from Karaj, and one from an unknown location. Sub-group 1B comprises eight genotypes of diverse origins: five from Karaj, one from Spain, one from Italy, and one from Markazi province, Iran. Sub-group 1C is the largest, with 22 genotypes; thirteen are from Karaj, and the others originate from the USA, Italy, Spain, France, Ghazvin, Isfahan, Shahrekord, Tabriz, and one unidentified location.

Cluster 2, with an average genetic distance of 0.129, consists of 14 genotypes divided into two sub-groups, 2A and 2B. Sub-group 2A includes six genotypes from Karaj, Tabriz, the USA, and an undisclosed origin, whereas sub-group 2B comprises eight genotypes, mainly from Tabriz (five), with two from Karaj and one from an unknown location.

Cluster 3, with an average genetic distance of 0.138, contains 13 genotypes: one from the USA, seven from Karaj, and five from unspecified locations.

Population structure was examined with ADMIXTURE to characterize the genetic makeup of the 62 almond genotypes, and the genetic composition of the samples shows a notable admixture pattern across all genotypes (Fig. 6). Models with K values from 1 to 10 were assessed by cross-validation (CV) error. The lowest CV error (0.375) was obtained at K = 2, indicating that two clusters best describe the data. Of the 62 genotypes, 36 (approximately 58%) were assigned to one of these two clusters, whereas the remaining 26 (42%) had membership values below 0.70 and were classified as admixed.

Fig. 7 — PCA scatter plot of P. dulcis based on the first two principal components

Pairwise Fst values (Table 2) were low, ranging from 0.009 between groups 2 and 3 to 0.031 between groups 1 and 4.

Table 2.

Pairwise Fst values between populations of P. dulcis

Pop.	4	1	2
3	0.022639	0.014246	0.009402
4		0.030642	0.020197
1			0.02116


An AMOVA was conducted on the populations defined by the structure analysis to examine genetic differentiation within and among populations in the germplasm collection. Most of the variance (91.12%) was attributed to variation within populations, whereas variation among populations accounted for 2.49% (p = 0.001).

PCA was used to further explore genetic diversity. PC1 and PC2 together explained 17.00% of the total variance. Individuals from population 4 were more widely dispersed and clearly diverged from the other groups (Fig. 7), separating from groups 1, 2, and 3 along the first principal component.

Similar to the PCA analysis, the DAPC scatter plot effectively distinguishes the Pop 4 cluster from the other P. dulcis individuals, as depicted in Fig. 8.

Fig. 8 — DAPC scatter plot of P. dulcis

Discussion

Researchers at the Horticulture Station of the AREEO in Iran analyzed the genetic composition of almonds using 63,537 SNP markers. This research offers a novel approach to understanding the genetic structure of Iranian almond germplasm and provides an in-depth analysis of the genetic diversity, population structure, and characteristics of these varieties. The findings are expected to advance genetics and agricultural science, particularly almond cultivation and breeding in Iran.

Genotyping by sequencing

This study presents a detailed genomic analysis of 62 almond genotypes, generating over two million reads with an average of 30,225 reads per sample. This high-throughput sequencing allowed the initial identification of 246,121 SNPs across the almond genome. After filtering for missing data and minor allele frequency (MAF), the dataset was reduced to 63,537 high-quality SNPs, marking a significant advance in almond genomics and surpassing previous GBS studies. The analysis of SNP distribution in 1 Mb windows revealed considerable variation in SNP density among chromosomes, highlighting the genomic diversity within almond. LD pruning further reduced the dataset to 25,804 independent SNPs suited to non-parametric analyses. The filtered SNPs were spread across the eight almond chromosomes, averaging 7,942 markers per chromosome, with counts ranging from 5,916 to 12,839. This variation reflects differences in chromosome length and gene density, a trend consistent with the findings of Pavan et al. [14]. Overall, the refined SNP dataset covers 18,355 kilobases of the almond genome and provides a valuable resource for identifying genetic associations and advancing almond breeding programs. GBS has proven a powerful tool for assessing genetic diversity [14, 15, 39, 41] and for linking genetic markers to phenotypic traits through quantitative trait locus (QTL) analysis [9, 40, 42] and genome-wide association studies (GWAS) [5], underscoring its potential to advance both genetic research and breeding strategies in almond [43]. PIC is influenced by several variables, including marker type, genetic diversity, the range of genotypes sampled, and the breeding system of the species, as shown by Singh et al. [44].
For SNP markers, PIC values typically fall between 0 and 0.5 because of their biallelic nature, whereas multiallelic SSR markers can range from 0.5 to 1.0 [45]. In our study, PIC values ranged from 0.0 to 0.375, with an average of 0.315. Furthermore, our investigation showed that the average PIC value for EST-SSR loci (0.66) was lower than that for genomic SSR loci (0.80). These findings agree with earlier research showing that EST-SSRs are less polymorphic than genomic SSRs across various almond genotypes [46] and other species [47]. In a separate investigation, PIC values ranged from 0.77 to 0.97 for RAPD primers and from 0.36 to 0.97 for ISSR primers [48, 49]. Despite the limited discriminatory power implied by the PIC values of SNP markers, the 7.28Kb high-quality SNPs proved effective for quantifying genetic diversity and elucidating genetic relationships among almond genotypes. With respect to genetic diversity, observed heterozygosity indicated limited diversity in all populations, and this pattern was consistent across groups, indicating similar levels of heterozygosity. Past studies have likewise reported low to moderate genetic diversity resulting from ongoing almond breeding.

Consequently, there is a pressing need to enlarge population size through controlled breeding programs to safeguard and uncover the latent genetic diversity within Iranian almond germplasm. Notably, the average Ho across the four population clusters of 62 almond genotypes was 0.12, slightly below the average He of 0.13, consistent with the limited heterozygosity prevailing in these almond populations.

Linkage disequilibrium

Linkage disequilibrium (LD) is the non-random association of alleles at different loci. It is influenced by various factors, including recombination, mutation, genetic drift, assortative mating, and population size and structure [50]. In plants, the rate of LD decay depends on the species' reproductive strategy: LD typically decays more rapidly in cross-pollinated species than in self-pollinating ones, because effective recombination is more constrained under self-pollination.

The analysis of LD in the almond genotypes provides critical insights into the genetic architecture of this germplasm collection. LD decayed rapidly on certain chromosomes, particularly chromosome 2, and more slowly on chromosome 6. This variation may reflect differing evolutionary pressures, possibly higher recombination rates or selection effects on chromosome 2 and stabilizing selection on chromosome 6. The average LD decay distance of approximately 300 kb at r² = 0.15 indicates the scale at which LD affects genetic studies. The sharp decrease in LD when r² is between 0.025 and 0.05 reflects the interplay of genetic drift and selection pressures, highlighting the need to consider both factors when interpreting LD patterns in a population-genetic context [51, 52]. Because recombination is less effective in self-pollinating species, LD typically declines faster in cross-pollinating species [53]. In almond, which is self-incompatible, we observed a particularly swift reduction in LD values. Interestingly, this rapid decline mirrors patterns observed in other species, such as the self-compatible peach [54] and the self-incompatible grapevine [55].

Assessment of genetic diversity and population structure in almond varieties

Evaluating genetic diversity and understanding population structure are essential for conserving and using germplasm collections [56]. Previous research has examined the genetic characteristics of renowned almond germplasm collections [57]. The present study aims to elucidate the genetic composition of 62 almond genotypes from the Iranian almond germplasm collection at AREEO in the southern part of Iran's Alborz province, using genotyping-by-sequencing (GBS) data to support Iran's efforts to conserve, improve, and make optimal use of its genetic resources.

Several superior genotypes have been characterized using ampelographic descriptors [57–59]. Despite the significance of these efforts and the conservation strategies implemented at the Agricultural Research Horticulture Station, the molecular characterization of Iranian genotypes has lagged behind that in other almond-producing countries. Previous studies used SSR markers to assess genetic diversity and structure across a broad range of almond genotypes, including cultivated and wild material from various regions of Iran [60–62]. Using GBS markers, this study provides the first comprehensive comparison of genetic diversity, structure, and characteristics among 62 almond genotypes from the Iranian germplasm collection, encompassing national and international varieties.

The examined genotypes are of considerable agronomic and economic value for almond orchards worldwide. This study used several methods to investigate their genetic diversity and population structure. An identity-by-state (IBS) genetic distance matrix and marker-based kinship coefficients were computed to assess genetic diversity, and three complementary clustering methods were applied: hierarchical clustering by neighbor-joining (NJ), PCA, and model-based clustering with ADMIXTURE. The almond genotypes showed a notable degree of admixture, and the structure analysis placed them into two genetic populations (K = 2). The genetic distance coefficients ranged from −0.220 to 1.792, with an average dissimilarity of 0.785, underscoring the substantial genetic diversity among the 62 genotypes.

The kinship values likewise indicate substantial genetic diversity among most of the almond genotypes studied, and the genetic distance values observed here are consistent with those reported in other studies of almond genotypes. The neighbor-joining cluster analysis was conducted without reference to the geographical origin of the genotypes. Although the genotypes grouped into three main clusters, some with sub-clusters, this pattern did not correspond to the geographical source of the samples, indicating a shared genetic background across regions.
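The neighbor-joining step can be illustrated with a minimal implementation of the Saitou-Nei algorithm. It operates on a pairwise distance matrix, such as an IBS matrix, and assumes the matrix is symmetric with a zero diagonal. This is a sketch for illustration only, not the software used in the study, and the function name is hypothetical. The resulting tree is unrooted, so clusters correspond to clades read off the returned Newick string.

```python
from typing import Dict, List


def neighbor_joining(dist: List[List[float]], names: List[str]) -> str:
    """Build an unrooted neighbor-joining tree and return it as a Newick string."""
    # Dict-of-dicts keyed by cluster label, so merged clusters can be added/dropped.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair (a, b) minimising Q = (n - 2) d(a, b) - r(a) - r(b).
        _, a, b = min(
            ((n - 2) * d[x][y] - r[x] - r[y], x, y)
            for i, x in enumerate(nodes)
            for y in nodes[i + 1:]
        )
        # Branch lengths from the new internal node to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node to every remaining cluster.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    # Join the last two clusters with a single branch.
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


# Classic five-taxon textbook example.
labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(D, labels)
```

Ties in the Q criterion are broken by label order here, which is an arbitrary choice. Dedicated phylogenetics packages may resolve ties differently.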

Consistent with our findings, cluster analyses in other studies have also failed to group almond genotypes by geographical origin [62]. However, Ansari and Gharaghani [63], Halász et al. [49], and Mahood and Hama-Salih [48] successfully distinguished between wild and cultivated almond genotypes using cluster analysis. In our data, PCA and population structure analysis likewise failed to separate genotypes of different origins, suggesting gene flow and a high degree of admixture. This agrees with McCord et al. [64], who studied 94 almond genotypes and found limited association between PCA groupings and geographical source. Nevertheless, PCA has been reported to distinguish wild from cultivated genotypes from various regions. In the same study, McCord et al. [64] applied PCA to 10 SSRs across the 94 genotypes from Central Asian and cold-adapted North American germplasm and identified two major groups corresponding to wild and cultivated almonds.
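For readers less familiar with genotype PCA, the sketch below computes first principal component scores from a 0/1/2 genotype matrix using only the Python standard library. Each SNP is centered on its mean dosage, a sample-by-sample covariance matrix is formed, and its leading eigenvector is found by power iteration. The sketch rests on simplifying assumptions: no missing data, no per-SNP standardization, and a single component only. It is not the pipeline used in the study, and the function name is hypothetical. The sign of a principal component is arbitrary, so only the relative positions of samples are meaningful.

```python
import math
import random
from typing import List


def pc1_scores(genos: List[List[float]], iters: int = 500, seed: int = 1) -> List[float]:
    """Scores of each sample on the first principal component (sign is arbitrary)."""
    n, m = len(genos), len(genos[0])
    # Centre each SNP column on its mean allele dosage.
    means = [sum(row[k] for row in genos) / n for k in range(m)]
    x = [[row[k] - means[k] for k in range(m)] for row in genos]
    # Sample-by-sample covariance matrix G = X X^T / m.
    g = [[sum(p * q for p, q in zip(x[i], x[j])) / m for j in range(n)] for i in range(n)]
    # Power iteration converges to G's leading eigenvector (G is positive semi-definite).
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return [0.0] * n  # all samples identical: no variation to summarise
        v = [t / norm for t in w]
        eigenvalue = norm
    # Scale the unit eigenvector by sqrt(eigenvalue) so scores reflect PC1 variance.
    return [t * math.sqrt(eigenvalue) for t in v]


# Two hypothetical groups of three samples with contrasting dosages at four SNPs.
toy = [
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 1, 0],
]
scores = pc1_scores(toy)
```

When groups differ strongly, as in this toy matrix, PC1 places them on opposite sides of zero. Admixed germplasm like that described above would instead yield overlapping score distributions.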

Several factors may explain the lack of correlation between almond varieties and their geographical origins. Extensive genetic admixture, combined with a long history of cultivation and selective breeding, has likely obscured geographical patterns and produced notable genetic similarity among varieties from different regions. The movement of almond varieties through trade and migration has also facilitated the exchange of genetic material, further reducing geographical differentiation. Environmental factors may play a role as well: similar environmental conditions can drive convergent evolution, so that genetically diverse populations exhibit similar traits. Finally, the relatively small sample of 62 genotypes limits the study's ability to fully capture the relationship between genetic diversity and geographical origin. A larger sample could provide more comprehensive insight into these relationships [65].

The AMOVA results show significant genetic variation within populations and minimal variation among populations. This pattern suggests that the substantial within-population variation may reflect extensive genetic exchange or gene flow. These findings are consistent with previous studies of almond, which reported high diversity within populations but low differentiation among them [15, 49].
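The variance partitioning behind AMOVA can be sketched in the two-level formulation introduced by Excoffier, Smouse and Quattro (1992). The sketch takes a matrix of squared pairwise distances and a population label for each sample. It is an illustrative assumption, not the software or settings used in the study. It returns the among- and within-population variance components and Φ_ST, and omits the permutation test normally used to assess significance. Function and key names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence


def amova(d2: List[List[float]], groups: Sequence[str]) -> Dict[str, float]:
    """Two-level AMOVA from squared distances: among vs. within populations."""
    n = len(groups)
    members: Dict[str, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    p = len(members)
    if p < 2 or n <= p:
        raise ValueError("need at least two groups and more samples than groups")
    # Sums of squared deviations, computed from pairwise squared distances.
    ssd_total = sum(d2[i][j] for i, j in combinations(range(n), 2)) / n
    ssd_within = sum(
        sum(d2[i][j] for i, j in combinations(idx, 2)) / len(idx)
        for idx in members.values()
    )
    ssd_among = ssd_total - ssd_within
    # Mean squared deviations and the sample-size coefficient n_c.
    msd_among = ssd_among / (p - 1)
    msd_within = ssd_within / (n - p)
    n_c = (n - sum(len(idx) ** 2 for idx in members.values()) / n) / (p - 1)
    var_among = (msd_among - msd_within) / n_c
    var_within = msd_within
    total = var_among + var_within
    return {
        "var_among": var_among,
        "var_within": var_within,
        "pct_among": 100 * var_among / total,
        "phi_st": var_among / total,
    }


# Toy example: two hypothetical populations of two samples on a 1-D axis.
coords = [0.0, 1.0, 10.0, 11.0]
d2 = [[(a - b) ** 2 for b in coords] for a in coords]
result = amova(d2, ["A", "A", "B", "B"])
```

The toy populations are deliberately well separated, so most variance falls among groups. The pattern reported above, with most variance within populations, would correspond to a small `pct_among` and a Φ_ST close to zero.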

Conclusion

This study used GBS to analyze the genomes of Iran's superior almond cultivars extensively, providing insights into genetic diversity, population structure, and phylogenetic relationships. Population structure analysis revealed two distinct clusters, whereas phylogenetic and PCA analyses revealed four clusters, suggesting gene flow or hybridization. The results indicated substantial genetic mixing across regions and a low to moderate level of genetic diversity in the current breeding program, as evidenced by positive inbreeding coefficients. These findings highlight the need for strategies to increase genetic diversity and mitigate future breeding challenges. This research provides a basis for identifying superior genes and improving breeding approaches through genome-wide association studies (GWAS) and marker-assisted selection (MAS). It also emphasizes the importance of expanding the genetic base and exploring innovative breeding methods for sustainable almond crop improvement.
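As an illustration of the inbreeding coefficient mentioned above, the sketch below computes a per-locus F = 1 - Ho/He for biallelic SNPs coded 0/1/2. Here He = 2p(1 - p) is the heterozygosity expected under Hardy-Weinberg equilibrium, and positive F values indicate fewer heterozygotes than expected. The sketch assumes this particular estimator, which may differ from the one used in the study. The function names are hypothetical.

```python
from typing import List, Optional


def locus_inbreeding(genos: List[Optional[int]]) -> Optional[float]:
    """Per-locus F = 1 - Ho/He; None when there are no calls or the SNP is monomorphic."""
    called = [g for g in genos if g is not None]
    if not called:
        return None
    p = sum(called) / (2 * len(called))  # alternate-allele frequency
    he = 2 * p * (1 - p)                  # expected heterozygosity under HWE
    if he == 0:
        return None
    ho = sum(1 for g in called if g == 1) / len(called)  # observed heterozygosity
    return 1 - ho / he


def mean_inbreeding(snps: List[List[Optional[int]]]) -> Optional[float]:
    """Average F over informative loci (one genotype list per SNP)."""
    values = [f for f in (locus_inbreeding(s) for s in snps) if f is not None]
    return sum(values) / len(values) if values else None


# Hypothetical SNPs scored across five accessions (None = missing call).
snps = [[0, 0, 2, 2, 1], [1, 1, 1, 1, None], [0, 0, 0, 0, 0]]
```

Monomorphic loci are excluded because He is zero there and F is undefined. Averaging over the remaining loci gives a single summary value for a panel of accessions.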

Electronic supplementary material

Supplementary Material 1 (15.9 KB, docx)

Supplementary Material 2 (46.6 KB, txt)

Supplementary Material 3 (27.3 KB, xlsx)

Acknowledgements

We are grateful to the Agricultural Biotechnology Research Institute of Iran (ABRII) and Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran for providing the almond accessions used in this study.

Author contributions

M.Z.: designed and supervised the study; S.Kh.: processed the DNA samples, conceived and conducted the experimental and bioinformatics analyses, and drafted the manuscript; R.A.: supervised the study; A.I.: provided the initial data; M.R.G.: revised the manuscript.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Data availability

All datasets generated in this study are included in this published article. Raw sequence data were deposited in the NCBI BioProject (PRJNA1087167) and NCBI SRA (SRR31052731–SRR31052792).

Declarations

Ethics approval and consent to participate

This article does not contain any research with human participants or animals performed by any of the authors.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Gradziel T, Lampinen B, editors. Defining the limits of almond productivity to facilitate marker assisted selection and orchard management. V International Symposium on Pistachios and Almonds 912; 2009.
2. Kaya HB, Dilli Y, Oncu-Oner T, Ünal A. Exploring genetic diversity and population structure of a large grapevine (Vitis vinifera L.) germplasm collection in Türkiye. Front Plant Sci. 2023;14:1–16.
3. Mas-Gómez J, Cantín CM, Moreno MÁ, Martínez-García PJ. Genetic Diversity and Genome-Wide Association Study of Morphological and Quality Traits in Peach using two Spanish Peach Germplasm collections. Front Plant Sci. 2022;13.
4. Pérez de Los Cobos F, Coindre E, Dlalah N, Quilot-Turion B, Batlle I, Arús P, et al. Almond population genomics and non-additive GWAS reveal new insights into almond dissemination history and candidate genes for nut traits and blooming time. Hortic Res. 2023;10(10):uhad193.
5. Sideli GM, Mather D, Wirthensohn M, Dicenta F, Goonetilleke SN, Martínez-García PJ, et al. Genome-wide association analysis and validation with KASP markers for nut and shell traits in almond (Prunus dulcis [Mill.] D.A.Webb). Tree Genet Genomes. 2023;19(2):13.
6. Martinez-Gomez P, Sánchez-Pérez R, Rubio M, Dicenta F, Gradziel T, Sozzi G. Application of Recent Biotechnologies to Prunus Tree Crop Genetic Improvement. Int J Agric Nat Resour. 2005;32(2):73–96.
7. Gupta PK, Balyan HS, Sharma PC, Ramesh B. Microsatellites in plants: a new class of molecular markers. Curr Sci. 1996;70:45–54.
8. Deschamps S, Llaca V, May GD. Genotyping-by-sequencing in plants. Biology (Basel). 2012;1(3):460–83.
9. Goonetilleke SN, March TJ, Wirthensohn MG, Arús P, Walker AR, Mather DE. Genotyping by sequencing in Almond: SNP Discovery, linkage mapping, and marker design. G3 (Bethesda). 2018;8(1):161–72.
10. Chung YS, Choi SC, Jun T-H, Kim C. Hortic Environ Biotechnol. 2017;58(5):425–31.
11. D’Agostino N, Taranto F, Camposeo S, Mangini G, Fanelli V, Gadaleta S, et al. GBS-derived SNP catalogue unveiled wide genetic variability and geographical relationships of Italian olive cultivars. Sci Rep. 2018;8:15877.
12. Kaya H, Akdemir D, Lozano R, Cetin O, Kaya H, Sahin M, et al. Genome wide association study of 5 agronomic traits in olive (Olea europaea L). Sci Rep. 2019;9:18764.
13. Bielenberg DG, Rauh B, Fan S, Gasic K, Abbott AG, Reighard GL, et al. Genotyping by sequencing for SNP-Based Linkage Map Construction and QTL analysis of Chilling Requirement and Bloom date in Peach [Prunus persica (L.) Batsch]. PLoS ONE. 2015;10(10):e0139406.
14. Pavan S, Delvento C, Mazzeo R, Ricciardi F, Losciale P, Gaeta L, et al. Almond diversity and homozygosity define structure, kinship, inbreeding, and linkage disequilibrium in cultivated germplasm, and reveal genomic associations with nut and seed weight. Hortic Res. 2021;8(1):15.
15. Wu P, Li D, Zhuang R, Zuo H, Pan Z, Yang B, et al. Genome resequencing reveals the population structure and genetic diversity of almond in Xinjiang, China. Genet Resour Crop Evol. 2023;70(8):2713–25.
16. Lotti C, Minervini AP, Delvento C, Losciale P, Gaeta L, Sánchez-Pérez R, et al. Detection and distribution of two dominant alleles associated with the sweet kernel phenotype in almond cultivated germplasm. Front Plant Sci. 2023;14:1171195.
17. Lobato Gómez M, Guajardo V, Solís S, Martínez-García PJ, Gasic K, Moreno M. Genetic study of flower traits in a segregating peach-almond progeny. Acta Hortic. 2021;1307:63–70.
18. Imani A, Amani G, Shamili M, Mousavi A, Hamed R, Rasouli M, et al. Diversity and broad sense heritability of phenotypic characteristic in almond cultivars and genotypes. Int J Hortic Sci Technol. 2021;8(3):281–9.
19. Khojand S, Zeinalabedini M, Azizinezhad R, Imani A, Ghaffari MR. Diversity of nut and Kernel Weight, Oil Content, and the main fatty acids of some almond cultivars and genotypes. J Nuts. 2023;14(1):33–44.
20. Lodhi MA, Ye G-N, Weeden NF, Reisch BI. A simple and efficient method for DNA extraction from grapevine cultivars and Vitis species. Plant Mol Biol Rep. 1994;12(1):6–13.
21. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6(5):e19379.
22. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
23. Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA. Stacks: an analysis tool set for population genomics. Mol Ecol. 2013;22(11):3124–40.
24. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
25. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.
26. Pavan S, Delvento C, Ricciardi L, Lotti C, Ciani E, D’Agostino N. Recommendations for choosing the genotyping method and Best Practices for Quality Control in Crop Genome-Wide Association Studies. Front Genet. 2020;11:447.
27. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
28. Yin L, Zhang H, Li X, Zhao S, Liu X. hibayes: An R Package to Fit Individual-Level, Summary-Level and Single-Step Bayesian Regression Models for Genomic Prediction and Genome-Wide Association Studies. bioRxiv. 2022.
29. Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980;32(3):314–31.
30. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
31. Letunic I, Bork P. Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47(W1):W256–9.
32. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning: with applications in R. Springer New York; 2013.
33. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35(3):526–8.
34. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; 2013.
35. Wickham H. ggplot2. Wiley Interdiscip Rev Comput Stat. 2011;3(2):180–5.
36. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
37. Rousset F. Genepop’007: a complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour. 2008;8(1):103–6.
38. Meirmans PG, Van Tienderen PH. GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Mol Ecol Notes. 2004;4(4):792–4.
39. Lu F, Lipka AE, Glaubitz J, Elshire R, Cherney JH, Casler MD, et al. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet. 2013;9(1):e1003215.
40. Guajardo V, Solís S, Sagredo B, Gainza F, Muñoz C, Gasic K, et al. Construction of high density Sweet Cherry (Prunus avium L.) linkage maps using microsatellite markers and SNPs detected by genotyping-by-sequencing (GBS). PLoS ONE. 2015;10(5):e0127750.
41. Gürcan K, Teber S, Ercisli S, Yilmaz KU. Genotyping by sequencing (GBS) in Apricots and Genetic Diversity Assessment with GBS-Derived single-nucleotide polymorphisms (SNPs). Biochem Genet. 2016;54(6):854–85.
42. Salazar JA, Pacheco I, Shinya P, Zapata P, Silva C, Aradhya M, et al. Genotyping by sequencing for SNP-Based linkage analysis and identification of QTLs linked to Fruit Quality traits in Japanese Plum (Prunus salicina Lindl). Front Plant Sci. 2017;8:476.
43. Miazzi M, D’Agostino N, di Rienzo V, Venerito P, Savino V, Fucilli V, et al. Marginal Grapevine Germplasm from Apulia (Southern Italy) represents an unexplored source of genetic diversity. Agronomy. 2020;10(4):563.
44. Singh N, Choudhury DR, Singh AK, Kumar S, Srinivasan K, Tyagi RK, et al. Comparison of SSR and SNP markers in estimation of genetic diversity and population structure of Indian rice varieties. PLoS ONE. 2013;8(12):e84136.
45. Chen W, Hou L, Zhang Z, Pang X, Li Y. Genetic diversity, population structure, and linkage disequilibrium of a core collection of Ziziphus jujuba assessed with genome-wide SNPs developed by genotyping-by-sequencing and SSR markers. Front Plant Sci. 2017;8:575.
46. Rahemi A, Fatahi R, Ebadi A, Taghavi T, Hassani D, Gradziel T, et al. Genetic diversity of some wild almonds and related Prunus species revealed by SSR and EST-SSR molecular markers. Plant Syst Evol. 2012;298(1):173–92.
47. Tahan O, Geng Y, Zeng L, Dong S, Chen F, Chen J, et al. Assessment of genetic diversity and population structure of Chinese wild almond, Amygdalus nana, using EST- and genomic SSRs. Biochem Syst Ecol. 2009;37(3):146–53.
48. Mahood A, Hama-Salih FM. Characterization of genetic diversity and relationship in almond (Prunus dulcis [Mill.] D.A. Webb.) genotypes by RAPD and ISSR markers in Sulaimani governorate. Appl Ecol Environ Res. 2020;18(1):1739–53.
49. Halász J, Kodad O, Galiba GM, Skola I, Ercisli S, Ledbetter CA, et al. Genetic variability is preserved among strongly differentiated and geographically diverse almond germplasm: an assessment by simple sequence repeat markers. Tree Genet Genomes. 2019;15(1):12.
50. Flint-Garcia SA, Thornsberry JM, Buckler ES. Structure of linkage disequilibrium in plants. Annu Rev Plant Biol. 2003;54:357–74.
51. Font i Forcada C, Oraguzie N, Reyes-Chin-Wo S, Espiau MT, Socias i Company R, Fernandez i Marti A. Identification of genetic loci associated with quality traits in almond via association mapping. PLoS ONE. 2015;10(6):e0127656.
52. Thurow LB, Gasic K, Bassols Raseira MDC, Bonow S, Marques Castro C, Tavaré S. Linkage disequilibrium: what history has to tell us. Trends Genet. 2002;18(2):83–90.
53. Font i Forcada C, Oraguzie N, Igartua E, Moreno MÁ, Gogorcena Y. Population structure and marker–trait associations for pomological traits in peach and nectarine cultivars. Tree Genet Genomes. 2013;9:331–49.
54. Barnaud A, Laucou V, This P, Lacombe T, Doligez A. Linkage disequilibrium in wild French grapevine, Vitis vinifera L. subsp. silvestris. Heredity. 2010;104(5):431–7.
55. Ramanatha Rao V, Hodgkin T. Genetic diversity and conservation and utilization of plant genetic resources. Plant Cell Tissue Organ Cult. 2002;68(1):1–19.
56. Žulj Mihaljević M, Maletić E, Preiner D, Zdunić G, Bubola M, Zyprian E, et al. Genetic Diversity, Population Structure, and Parentage Analysis of Croatian Grapevine Germplasm. Genes (Basel). 2020;11(7).
57. Imani A, Shamili M. Almond nut weight assessment by stepwise regression and path analysis. Int J Fruit Sci. 2018;18(3):338–43.
58. Jamshidi A-R, Imani A, Miri SM. Identification of the pollinizer for a new almond genotype ‘Karaj 33’. J Hortic Postharvest Res. 2021;4(4):521–8.
59. Ranjbar A, Imani A, piri s AV. Changes in some physiological and biochemical characteristics of the selected of Almond cultivars (Prunus dulcis Mill.) Grafted on different rootstocks under Drought stress. Dev Biol. 2018;10(3):15–32.
60. Sorkheh K, Kiani S, Azimkhani R, Mehri N, Halász J. Nut set evaluation in inter-specific almond × peach backcross progenies for self-compatibility selection in almond breeding programme. Euphytica. 2017;213(8):191.
61. Shiran B, Amirbakhtiar N, Kiani S, Mohammadi S, Sayed-Tabatabaei BE, Moradi H. Molecular characterization and genetic relationship among almond cultivars assessed by RAPD and SSR markers. Sci Hortic. 2007;111(3):280–92.
62. Zeinalabedini M, Majidian P, Ashori R, Gholaminejad A, Ebrahimi MA, Martinez-Gomez P. Integration of molecular and geographical data analysis of Iranian Prunus scoparia populations in order to assess genetic diversity and conservation planning. Sci Hortic. 2019;247:49–57.
63. Ansari A, Gharaghani A. A comparative study of genetic diversity, heritability and inter-relationships of tree and nut attributes between Prunus scoparia and P. elaeagnifolia using multivariate statistical analysis. Int J Hortic Sci Technol. 2019;6(1):137–50.
64. McCord P, Singh V, Kaundal A, Roper T. Genetic Diversity of New Almond accessions from central Asian and cold-adapted north American germplasm. J Am Soc Hortic Sci. 2023;148(5):221–8.
65. Fernández i Martí A, Font i Forcada C, Kamali K, Rubio-Cabetas MJ, Wirthensohn M, Socias i Company R. Molecular analyses of evolution and population structure in a worldwide almond [Prunus dulcis (Mill.) D.A. Webb syn. P. amygdalus Batsch] pool assessed by microsatellite markers. Genet Resour Crop Evol. 2015;62:205–19.


[CR1] 1.Gradziel T, Lampinen B, editors. Defining the limits of almond productivity to facilitate marker assisted selection and orchard management. V International Symposium on Pistachios and Almonds 912; 2009.

[CR2] 2.Kaya HB, Dilli Y, Oncu-Oner T, Ünal A. Exploring genetic diversity and population structure of a large grapevine (Vitis vinifera L.) germplasm collection in Türkiye. Front Plant Sci. 2023;14:1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Mas-Gómez J, Cantín CM, Moreno MÁ, Martínez-García PJ. Genetic Diversity and Genome-Wide Association Study of Morphological and Quality Traits in Peach using two Spanish Peach Germplasm collections. Front Plant Sci. 2022;13. [DOI] [PMC free article] [PubMed]

[CR4] 4.Pérez de Los Cobos F, Coindre E, Dlalah N, Quilot-Turion B, Batlle I, Arús P, et al. Almond population genomics and non-additive GWAS reveal new insights into almond dissemination history and candidate genes for nut traits and blooming time. Hortic Res. 2023;10(10):uhad193. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Sideli GM, Mather D, Wirthensohn M, Dicenta F, Goonetilleke SN, Martínez-García PJ, et al. Genome-wide association analysis and validation with KASP markers for nut and shell traits in almond (Prunus dulcis [Mill.] D.A.Webb). Tree Genet Genomes. 2023;19(2):13. [Google Scholar]

[CR6] 6.Martinez-Gomez P, Sánchez-Pérez R, Rubio M, Dicenta F, Gradziel T, Sozzi G. Application of Recent Biotechnologies to Prunus Tree Crop Genetic Improvement. Int J Agric Nat Resour. 2005;32(2):73–96. [Google Scholar]

[CR7] 7.Gupta, P. K., Balyan, H. S., Sharma, P. C., & Ramesh, B. Microsatellites in plants: a new class of molecular markers. Current science. 1996;70:45–54.

[CR8] 8.Deschamps S, Llaca V, May GD. Genotyping-by-sequencing in plants. Biology (Basel). 2012;1(3):460–83. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Goonetilleke SN, March TJ, Wirthensohn MG, Arús P, Walker AR, Mather DE. Genotyping by sequencing in Almond: SNP Discovery, linkage mapping, and marker design. (Bethesda). 2018;G3(1):161–72. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Chung YS, Choi SC, Jun T-H, Kim C, Horticulture. Environ Biotechnol. 2017;58(5):425–31. [Google Scholar]

[CR11] 11.D’Agostino N, Taranto F, Camposeo S, Mangini G, Fanelli V, Gadaleta S, et al. GBS-derived SNP catalogue unveiled wide genetic variability and geographical relationships of Italian olive cultivars OPEN. Sci Rep. 2018;8:15877. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Kaya H, Akdemir D, Lozano R, Cetin O, Kaya H, Sahin M, et al. Genome wide association study of 5 agronomic traits in olive (Olea europaea L). Sci Rep. 2019;9:18764. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Bielenberg DG, Rauh B, Fan S, Gasic K, Abbott AG, Reighard GL, et al. Genotyping by sequencing for SNP-Based Linkage Map Construction and QTL analysis of Chilling Requirement and Bloom date in Peach [Prunus persica (L.) Batsch]. PLoS ONE. 2015;10(10):e0139406. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Pavan S, Delvento C, Mazzeo R, Ricciardi F, Losciale P, Gaeta L, et al. Almond diversity and homozygosity define structure, kinship, inbreeding, and linkage disequilibrium in cultivated germplasm, and reveal genomic associations with nut and seed weight. Hortic Res. 2021;8(1):15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Wu P, Li D, Zhuang R, Zuo H, Pan Z, Yang B, et al. Genome resequencing reveals the population structure and genetic diversity of almond in Xinjiang, China. Genet Resour Crop Evol. 2023;70(8):2713–25. [Google Scholar]

[CR16] 16.Lotti C, Minervini AP, Delvento C, Losciale P, Gaeta L, Sánchez-Pérez R, et al. Detection and distribution of two dominant alleles associated with the sweet kernel phenotype in almond cultivated germplasm. Front Plant Sci. 2023;14:1171195. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Lobato Gómez M, Guajardo V, Solís S, Martínez-García PJ, Gasic K, Moreno M. Genetic study of flower traits in a segregating peach-almond progeny. Acta Hort. 2021;1307:63–70. [Google Scholar]

[CR18] 18.Imani A, Amani G, Shamili M, Mousavi A, Hamed R, Rasouli M, et al. Diversity and broad sense heritability of phenotypic characteristic in almond cultivars and genotypes. Int J Hortic Sci Technol. 2021;8(3):281–9. [Google Scholar]

[CR19] 19.Khojand S, Zeinalabedini M, Azizinezhad R, Imani A, Ghaffari MR. Diversity of nut and Kernel Weight, Oil Content, and the main fatty acids of some almond cultivars and genotypes. J Nuts. 2023;14(1):33–44. [Google Scholar]

[CR20] 20.Lodhi MA, Ye G-N, Weeden NF, Reisch BI. A simple and efficient method for DNA extraction from grapevine cultivars andVitis species. Plant Mol Biology Report. 1994;12(1):6–13. [Google Scholar]

[CR21] 21.Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6(5):e19379. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA. Stacks: an analysis tool set for population genomics. Mol Ecol. 2013;22(11):3124–40. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5. [DOI] [PubMed] [Google Scholar]

[CR26] 26.Pavan S, Delvento C, Ricciardi L, Lotti C, Ciani E, D’Agostino N. Recommendations for choosing the genotyping method and Best Practices for Quality Control in Crop Genome-Wide Association Studies. Front Genet. 2020;11:447. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Yin L, Zhang H, Li X, Zhao S, Liu X. hibayes: An R Package to Fit Individual-Level, Summary-Level and Single-Step Bayesian Regression Models for Genomic Prediction and Genome-Wide Association Studies. BioRxiv. 2022.

[CR29] 29.Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980;32(3):314–31. [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Letunic I, Bork P. Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47(W1):W256–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning: with applications in R. Springer New York; 2013.

[CR33] 33.Paradis E, Schliep K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35(3):526–8. [DOI] [PubMed] [Google Scholar]

[CR34] 34.R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; 2013.

[CR35] 35.Wickham H. ggplot2. WIREs computational statistics. Wiley Interdiscip Rev. 2011;3(2):180–5. [Google Scholar]

[CR36] 36.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Rousset F. Genepop’007: a complete re-implementation of the genepop software for Windows and Linux. Mol Ecol Resour. 2008;8(1):103–6. [DOI] [PubMed] [Google Scholar]

[CR38] 38.Meirmans P, Tienderen P. GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Mol Ecol Notes. 2004;4(4):792–4. [Google Scholar]

[CR39] 39.Lu F, Lipka AE, Glaubitz J, Elshire R, Cherney JH, Casler MD, et al. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet. 2013;9(1):e1003215. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Guajardo V, Solís S, Sagredo B, Gainza F, Muñoz C, Gasic K, et al. Construction of high density Sweet Cherry (Prunus avium L.) linkage maps using microsatellite markers and SNPs detected by genotyping-by-sequencing (GBS). PLoS ONE. 2015;10(5):e0127750. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Gürcan K, Teber S, Ercisli S, Yilmaz KU. Genotyping by sequencing (GBS) in Apricots and Genetic Diversity Assessment with GBS-Derived single-nucleotide polymorphisms (SNPs). Biochem Genet. 2016;54(6):854–85. [DOI] [PubMed] [Google Scholar]

[CR42] 42.Salazar JA, Pacheco I, Shinya P, Zapata P, Silva C, Aradhya M, et al. Genotyping by sequencing for SNP-Based linkage analysis and identification of QTLs linked to Fruit Quality traits in Japanese Plum (Prunus salicina Lindl). Front Plant Sci. 2017;8:476. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Miazzi M, D’Agostino N, di Rienzo V, Venerito P, Savino V, Fucilli V, et al. Marginal Grapevine Germplasm from Apulia (Southern Italy) represents an unexplored source of genetic diversity. Agronomy. 2020;10(4):563. [Google Scholar]

[CR44] 44.Singh N, Choudhury DR, Singh AK, Kumar S, Srinivasan K, Tyagi RK, et al. Comparison of SSR and SNP markers in estimation of genetic diversity and population structure of Indian rice varieties. PLoS ONE. 2013;8(12):e84136. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] 45.Chen W, Hou L, Zhang Z, Pang X, Li Y, Genetic Diversity. Population structure, and linkage disequilibrium of a Core Collection of Ziziphus jujuba assessed with genome-wide SNPs developed by genotyping-by-sequencing and SSR markers. Front Plant Sci. 2017;8:575. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR46] 46.Rahemi A, Fatahi R, Ebadi A, Taghavi T, Hassani D, Gradziel T, et al. Genetic diversity of some wild almonds and related Prunus species revealed by SSR and EST-SSR molecular markers. Plant Syst Evol. 2012;298(1):173–92. [Google Scholar]

[CR47] 47.Tahan O, Geng Y, Zeng L, Dong S, Chen F, Chen J, et al. Assessment of genetic diversity and population structure of Chinese wild almond, Amygdalus nana, using EST- and genomic SSRs. Biochem Syst Ecol. 2009;37(3):146–53. [Google Scholar]

[CR48] 48.Mahood A, Hama-Salih FM. Characterization of genetic diversity and relationship in almond (Prunus dulcis [mill.] d.a. Webb.) Genotypes by rapd and issr markers in Sulaimani governorate. Appl Ecol Environ Res. 2020;18(1):1739–53. [Google Scholar]

[CR49] 49. Halász J, Kodad O, Galiba GM, Skola I, Ercisli S, Ledbetter CA, et al. Genetic variability is preserved among strongly differentiated and geographically diverse almond germplasm: an assessment by simple sequence repeat markers. Tree Genet Genomes. 2019;15(1):12.

[CR50] 50. Flint-Garcia SA, Thornsberry JM, Buckler ES. Structure of linkage disequilibrium in plants. Annu Rev Plant Biol. 2003;54:357–74.

[CR51] 51. Font i Forcada C, Oraguzie N, Reyes-Chin-Wo S, Espiau MT, Socias i Company R, Fernandez i Marti A. Identification of genetic loci associated with quality traits in almond via association mapping. PLoS ONE. 2015;10(6):e0127656.

[CR52] 52. Thurow LB, Gasic K, Bassols Raseira MDC, Bonow S, Marques Castro C, Tavaré S. Linkage disequilibrium: what history has to tell us. Trends Genet. 2002;18(2):83–90.

[CR53] 53. Font i Forcada C, Oraguzie N, Igartua E, Moreno MÁ, Gogorcena Y. Population structure and marker–trait associations for pomological traits in peach and nectarine cultivars. Tree Genet Genomes. 2013;9:331–49.

[CR54] 54. Barnaud A, Laucou V, This P, Lacombe T, Doligez A. Linkage disequilibrium in wild French grapevine, Vitis vinifera L. subsp. silvestris. Heredity. 2010;104(5):431–7.

[CR55] 55. Ramanatha Rao V, Hodgkin T. Genetic diversity and conservation and utilization of plant genetic resources. Plant Cell Tissue Organ Cult. 2002;68(1):1–19.

[CR56] 56. Žulj Mihaljević M, Maletić E, Preiner D, Zdunić G, Bubola M, Zyprian E, et al. Genetic diversity, population structure, and parentage analysis of Croatian grapevine germplasm. Genes (Basel). 2020;11(7).

[CR57] 57. Imani A, Shamili M. Almond nut weight assessment by stepwise regression and path analysis. Int J Fruit Sci. 2018;18(3):338–43.

[CR58] 58. Jamshidi A-R, Imani A, Miri SM. Identification of the pollinizer for a new almond genotype ‘Karaj 33’. J Hortic Postharvest Res. 2021;4(4):521–8.

[CR59] 59. Ranjbar A, Imani A, piri s AV. Changes in some physiological and biochemical characteristics of the selected of almond cultivars (Prunus dulcis Mill.) grafted on different rootstocks under drought stress. Dev Biol. 2018;10(3):15–32.

[CR60] 60. Sorkheh K, Kiani S, Azimkhani R, Mehri N, Halász J. Nut set evaluation in inter-specific almond × peach backcross progenies for self-compatibility selection in almond breeding programme. Euphytica. 2017;213(8):191.

[CR61] 61. Shiran B, Amirbakhtiar N, Kiani S, Mohammadi S, Sayed-Tabatabaei BE, Moradi H. Molecular characterization and genetic relationship among almond cultivars assessed by RAPD and SSR markers. Sci Hort. 2007;111(3):280–92.

[CR62] 62. Zeinalabedini M, Majidian P, Ashori R, Gholaminejad A, Ebrahimi MA, Martinez-Gomez P. Integration of molecular and geographical data analysis of Iranian Prunus scoparia populations in order to assess genetic diversity and conservation planning. Sci Hort. 2019;247:49–57.

[CR63] 63. Ansari A, Gharaghani A. A comparative study of genetic diversity, heritability and inter-relationships of tree and nut attributes between Prunus scoparia and P. elaeagnifolia using multivariate statistical analysis. Int J Hortic Sci Technol. 2019;6(1):137–50.

[CR64] 64. McCord P, Singh V, Kaundal A, Roper T. Genetic diversity of new almond accessions from central Asian and cold-adapted North American germplasm. J Am Soc Hortic Sci. 2023;148(5):221–8.

[CR65] 65. Fernández i Martí A, Font i Forcada C, Kamali K, Rubio-Cabetas MJ, Wirthensohn M, Socias i Company R. Molecular analyses of evolution and population structure in a worldwide almond [Prunus dulcis (Mill.) D.A. Webb syn. P. amygdalus Batsch] pool assessed by microsatellite markers. Genet Resour Crop Evol. 2015;62:205–19.
