Abstract
Wild soybean (Glycine soja), the ancestor of the cultivated soybean (G. max), is a crucial resource for capturing the genetic diversity of soybean species. In this study, we used a set of 78 genome-wide microsatellite markers to analyse the genetic diversity and geographic differentiation patterns in a global collection of 2,050 G. soja accessions and a mini-core collection of G. max stored in two public seed banks. We observed a notable reduction in the genetic diversity of G. max compared with G. soja and identified a close phylogenetic relationship between G. max and a G. soja subpopulation located in central China. Furthermore, we revealed substantial genetic divergence between northern and southern subpopulations, accompanied by diminished genetic diversity in the northern subpopulations. Two clusters were discovered among the accessions from north-eastern China—one genetically close to those from South Korea and Southern Japan, and another close to those from Amur Oblast, Russia. Finally, 192 accessions were assigned to a mini-core collection of G. soja, retaining 73.8% of the alleles detected in the entire collection. This mini-core collection is accessible to those who need it, facilitating efficient evaluation and utilization of G. soja genetic resources in soybean breeding initiatives.
Keywords: soybean, microsatellite, genetic diversity, population structure, core collection
1. Introduction
The process of domesticating plants from wild species to crops has been accompanied by artificial selection for desirable traits, such as the loss of seed dormancy and shattering, lodging tolerance, uniform maturation, and increased palatability.1,2 Because primitive farmers have used only a small portion of the plants from the progenitor species, and only the seeds from the plants with favourable properties in each generation have been propagated, a substantial portion of the genetic diversity has been lost during the domestication process, causing a genetic bottleneck in most crop species.2,3 The narrow range of crop genetic diversity renders them vulnerable to pathogens, pests, and environmental stress.3 Modern molecular and genomic technologies have enabled the dissection of the genetic basis underlying resistance to biotic and abiotic stresses in crop wild relatives. Certain traits, such as acid sulphate tolerance in rice,4 Fusarium wilt resistance in tomato,5 and drought tolerance in wheat,6 have been successfully transferred into susceptible modern crops through conventional breeding and modern transgenic technology. Many domestication-related genes have also been identified,7 and wild species with suitable genetic backgrounds can be rapidly domesticated by targeting domestication-related genes using genome editing.8–10 These achievements have facilitated high-efficiency molecular breeding programs to exploit genetic diversity in crop wild relatives.
Soybean (Glycine max (L.) Merr., 2n = 40) is an economically crucial crop worldwide because it provides not only healthy vegetable oil with reduced saturated fat and without cholesterol but also high-quality proteins with all the essential amino acids. Furthermore, soybeans contain bioactive compounds, such as isoflavones and saponins, and can release bioactive peptides following enzymatic treatment, such as gastrointestinal digestion, food processing, or fermentation.11 Soybean was domesticated 6,000–9,000 years ago from wild soybean (Glycine soja Sieb. et Zucc., 2n = 40), which is native to East Asia, including most of China, Japan, Korea, and the Far East region of Russia.12,13 Because G. soja has substantially higher genetic diversity than G. max,14,15 and as these species are cross-compatible, G. soja germplasm is an invaluable genetic reservoir for the improvement of G. max. Certain G. soja accessions showing strong resistance to biotic16–18 or abiotic stresses,19,20 or with desirable storage protein21 or saponin compositions22 have been identified, and their underlying genetic bases have been uncovered.17,19–23 Furthermore, novel genes/quantitative trait loci that influence various traits, including yield,24 root architecture,25 100-seed weight,26 and seed protein content,27 have been identified in G. soja. Understanding the natural variation in G. soja is crucial for developing high-quality soybean cultivars with high and stable yields.
Genetic diversity in G. max and G. soja has been analysed using various molecular markers, including randomly amplified polymorphic DNA (RAPD),28 amplified fragment length polymorphisms,29 simple sequence repeats (SSRs) or microsatellites,30–33 and single-nucleotide polymorphisms (SNPs).34–36 More recently, many germplasm accessions have been subjected to whole-genome resequencing using next-generation sequencing.37,38 These studies have revealed a higher diversity in G. soja than in G. max and that the phylogenetic relatedness of different G. soja populations tends to be associated with their geographic origins. However, most previous studies were limited in terms of the number of molecular markers, sample size, and geographical coverage. Consequently, inconsistent findings regarding the diversification of wild soybeans have been reported; for example, G. soja populations from north-eastern China, South Korea, and southern Japan were revealed to have a close relationship,30,39 while other studies observed that G. soja populations in South Korea and Japan were phylogenetically closely related to those in southern China.33,40 Additionally, various studies have independently proposed a single origin of G. max in different regions, including southern China,33,41 central China,34 and eastern Japan,35 according to the results of phylogenetic analysis. Therefore, it is necessary to use more germplasm accessions from the more extensive distribution areas of both G. soja and G. max to obtain a comprehensive understanding of their genetic diversity and differentiation.
Currently, global germplasm collections house over 13,000 accessions of G. soja, conserved primarily in China, the United States of America, Japan, South Korea, and Russia.42 Among them, two germplasm collections of G. soja in Japan are publicly accessible to the global community of breeders and scientists. One germplasm collection, comprising 1,633 accessions collected in Japan and 829 collected in other countries, is preserved in the National Agriculture and Food Research Organisation (NARO) Genebank (https://www.gene.affrc.go.jp/about_en.php). In this collection, 825 accessions were transferred from the National Plant Germplasm System (NPGS; www.ars-grin.gov) of the United States Department of Agriculture (USDA). The National BioResource Project (NBRP) Lotus and Glycine (https://legumebase.nbrp.jp/) preserves another germplasm collection comprising 715 G. soja accessions collected from Japan. Cultivation and evaluation of G. soja are time-consuming and labour-intensive due to its wild habits, such as extensive lateral branching, indeterminate growth, and pod shattering. Therefore, developing a mini-core G. soja collection may facilitate its evaluation and utilization. Although two G. soja mini-core collections have been established in previous studies, one covered only Chinese germplasm,43 and the other did not include accessions from Japan’s northernmost main island of Hokkaido, and information regarding its accessions was not publicly available.44
In this study, we used 78 SSR markers covering all 20 chromosomes to investigate the genetic diversity of 2,050 G. soja accessions available at the time of initiation of this study. Furthermore, we selected a mini-core collection comprising 192 accessions to optimize the utilization of G. soja genetic resources in soybean improvement.
2. Materials and methods
2.1 Plant materials
We used 1,421 and 629 G. soja accessions from the NARO Genebank and NBRP Lotus and Glycine, respectively (Supplementary Table S1). Among the accessions from the NARO Genebank, 794 were transferred from the NPGS in the United States. Overall, 1,265 accessions were collected from Japan, 341 from South Korea, 256 from Russia, and 188 from China (Fig. 1). Following a previous study,32 with minor modifications, accessions from China were divided into three geographic subpopulations: north-eastern China (CI), central China (also called the Huang–Huai–Hai Valley; CII), and southern China (CIII). Russian accessions were collected from Khabarovsk and Primorsky Krais, both located in the easternmost part of Russia, and from Amur Oblast in southeast Russia. Accordingly, we grouped the accessions from Khabarovsk and Primorsky Krais into subpopulation RI and those from Amur Oblast into subpopulation RII. All accessions from South Korea were categorized as a single subpopulation (KI). Japanese accessions were collected from the four primary islands of Japan: Hokkaido, Honshu, Shikoku, and Kyushu. Accessions from Hokkaido, Shikoku, and Kyushu were each treated as one subpopulation. Because Honshu is the largest island in Japan, roughly 1,300 km in length, we divided its accessions into three subpopulations. Accordingly, the Japanese accessions were divided into six subpopulations (JI–JVI). For comparison, we used 192 G. max accessions from a global mini-core collection comprising 95 accessions from Japan and 97 from outside Japan (Supplementary Table S2; NARO Genebank), selected from a total of 1,603 accessions globally.45
Figure 1.
Geographic distributions of G. soja accessions collected in Japan, China, Russia, and South Korea. Japan, China, and Russia are divided into six, three, and two regions, respectively. The number of accessions from each region is included in brackets below. Detailed geographical origin information of each accession is shown in Supplementary Table S1. CH: China; JPN: Japan; KOR: South Korea; MNG: Mongolia; PRK: Democratic People’s Republic of Korea; RUS: Russia.
2.2 DNA extraction and genotyping
Eight seeds from each accession were ground to a powder and used for DNA extraction following the protocols of a previous report.46 For genotype analysis, we selected 78 SSR markers, with three or four markers spanning each chromosome (Supplementary Table S3). These markers were developed from genomic or EST sequences of G. max, and high polymorphism rates have been revealed in 87 soybean genotypes comprising eight G. soja accessions and 79 G. max cultivars.46 A fluorescent dye (one of 6-FAM, PET, VIC, or NED) was affixed to the 5ʹ-end of the forward primer in each primer pair. Three or four SSR markers labelled with different fluorophores were arranged in a multiplex analysis unit, and 20 units were established. Multiplex polymerase chain reaction (PCR) was performed in a reaction mixture of 5.5 μl comprising 5 ng of total genomic DNA as a PCR template, each primer at a concentration of 50 nM, and 2.5 μl of Qiagen Multiplex PCR Master Mix (Qiagen, Hilden, Germany). The PCR conditions were as follows: initial denaturation at 95°C for 15 min, followed by 35 cycles of denaturation for 30 s at 95°C, annealing for 90 s at 50°C, extension for 90 s at 60°C, and a final extension for 30 min at 60°C. A mixture containing 1 μl of PCR products, 0.3 μl of GeneScan 600 LIZ Size Standard (Applied Biosystems, Foster City, CA, USA), and 10 μl of Hi-Di formamide (Applied Biosystems) was heated for 3 min at 95°C and then cooled rapidly to 4°C. Capillary electrophoresis was performed on a 3730 DNA Analyzer (Applied Biosystems), and the highest peak of each marker was subjected to allele calling using GeneMapper software v4.0 (Applied Biosystems).
2.3 Data analysis
2.3.1 Genetic diversity
DNA fragments of varying sizes were designated as alleles. The number of different alleles (Na) and Shannon information index (I) for each SSR locus were estimated using GenAlEx software v6.5.47 The gene diversity (GD) and polymorphism information content (PIC) for each SSR locus were calculated using PowerMarker software v3.25.48
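The per-locus statistics named above can be illustrated with a minimal Python sketch. It assumes one major allele per accession (as used in this study) and the textbook definitions I = -Σ p ln p, GD = 1 - Σ p², and PIC = 1 - Σ p² - Σ_{i<j} 2 p_i² p_j². GenAlEx and PowerMarker may apply additional sample-size or inbreeding corrections, so this is not a reproduction of their output; the function names are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for one locus, ignoring missing calls (None)."""
    called = [a for a in alleles if a is not None]
    counts = Counter(called)
    n = len(called)
    return {allele: count / n for allele, count in counts.items()}


def locus_diversity(alleles):
    """Compute Na, Shannon index I, gene diversity GD, and PIC for one locus.

    Assumption: uncorrected textbook formulas, one allele call per accession.
    """
    freqs = list(allele_frequencies(alleles).values())
    na = len(freqs)
    shannon = -sum(p * math.log(p) for p in freqs)
    gene_diversity = 1.0 - sum(p * p for p in freqs)
    # Cross term of PIC: sum over allele pairs i < j of 2 * p_i^2 * p_j^2
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(na)
        for j in range(i + 1, na)
    )
    return {"Na": na, "I": shannon, "GD": gene_diversity, "PIC": gene_diversity - cross}


# Example: fragment sizes (bp) of the major allele in four accessions
stats = locus_diversity([150, 150, 152, 154])
```

Averaging these values over all loci would give population-level summaries of the kind reported per subpopulation.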
2.3.2 Genetic differentiation
Using the ‘poppr.amova’ function implemented in the R package poppr,49 we performed an analysis of molecular variance (AMOVA) to estimate and compare the percentage of genetic variation explained by different hierarchical levels: species (G. max and G. soja), subpopulations, and individuals. The statistical significance of variance components was assessed using the ‘randtest’ function implemented in the R package ade4 with 1,000 permutations.
Fixation index (FST) values and Nei’s genetic distance50 between pairs of subpopulations were estimated using GenAlEx software v6.5.51 The heatmaps of the pairwise FST and Nei’s genetic distance were constructed using the R package pheatmap.52
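As an illustration of the distance measure, the sketch below computes Nei's standard genetic distance between two subpopulations from per-locus allele frequencies, D = -ln(Jxy / sqrt(Jx Jy)). It assumes the standard (1972) form without sample-size correction; whether GenAlEx applies exactly this variant is not stated here, and the function name is hypothetical.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between populations X and Y.

    freqs_x, freqs_y: lists with one dict per locus (same locus order),
    each mapping allele -> frequency in that population.
    Assumption: uncorrected standard form; the populations must share at
    least one allele, otherwise the logarithm is undefined.
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in px.values())
        jy += sum(p * p for p in py.values())
        jxy += sum(p * py.get(allele, 0.0) for allele, p in px.items())
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Example with a single locus
d = nei_standard_distance([{150: 0.5, 152: 0.5}], [{150: 1.0}])
```

A distance of zero indicates identical allele frequencies; larger values indicate greater divergence.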
For visual assessment of differentiation among the subpopulations of wild soybean, we performed the discriminant analysis (DA) of principal components (DAPC), which transforms the dataset based on principal component analysis prior to DA.53 Cross-validation was conducted to determine the appropriate number of principal components (PCs) retained using the ‘xvalDapc’ function implemented in the R package adegenet.54 A DAPC scatter plot of individuals was generated using the ‘scatter’ function.
2.3.3 Population structure
To perform the phylogenetic analysis, we calculated pairwise genetic distances between all accessions using the ‘gd.kosman’ function for assessing genetic dissimilarity developed by Kosman and Leonard,55 which was implemented in the R package PopGenReport.56 The dendrogram was constructed using the neighbour-joining method implemented in the R package ape.57
To investigate the pattern of the population structure of the entire population, including both G. max and G. soja accessions, we used the Bayesian model-based clustering method implemented in STRUCTURE v2.358 to estimate the number of genetically distinct clusters (K) using an admixture model and correlated allele frequencies. The range of possible clusters was set from one to 20, with 20 independent iterations for each K. The analysis parameters included a burn-in length of 100,000 followed by a Markov chain Monte Carlo length of 100,000. The R package Pophelper v2.3.159 was used to process the results from STRUCTURE. The most likely value of K was determined according to the log probability of the data [ln Pr(X|K)]58 and the ad hoc statistic delta K (ΔK) based on the rate of change of ln Pr(X|K) between successive K values, as described by Evanno et al.60
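The ΔK statistic can be sketched as follows. This minimal example assumes the commonly used form ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], where L(K) is the mean of ln Pr(X|K) over replicate runs and sd is the standard deviation across those runs; the input values are made up for illustration and the function name is hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from replicate ln Pr(X|K) values.

    lnp_by_k: dict mapping K -> list of ln Pr(X|K), one value per run.
    Delta K is undefined for the smallest and largest K tested, and for
    any K whose replicate runs have zero standard deviation.
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        delta[k] = second_diff / sd
    return delta


# Illustrative (invented) log probabilities for K = 1..4, two runs each
runs = {1: [-100, -102], 2: [-80, -82], 3: [-50, -52], 4: [-48, -50]}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
```

With 20 runs for each K from 1 to 20, as in this study, ΔK would be defined for K = 2 to 19.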
2.3.4 Assigning a G. soja mini-core collection
The mini-core collection selection was performed using the ‘sampleCore’ function implemented in the R package corehunter,61 using the average entry-to-nearest-entry method. The distance matrix was calculated using the modified Rogers’ distance, as implemented in the package corehunter. Three accessions with high-quality reference genomes, namely, PI549046, PI578357, and PI562565,62 were always selected. Furthermore, 91 accessions with published whole-genome resequencing data62 and the ‘research set’ including 64 accessions selected by NBRP Lotus and Glycine (https://legumebase.nbrp.jp/legumebase/glycineResearchSetBrowseAction.do), were accorded high priority when they were phylogenetically closely related to the accessions selected by the R package corehunter.7
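To make the selection criterion concrete, the following Python sketch evaluates the average entry-to-nearest-entry distance of a candidate core set. It assumes one major allele per locus per accession, so that the modified Rogers' distance reduces to the square root of the fraction of mismatched loci, and it does not handle missing data. It only scores a given subset; the search that corehunter performs to maximize this score is not reproduced, and all names are hypothetical.

```python
import math


def modified_rogers(x, y):
    """Modified Rogers' distance between two accessions.

    x, y: lists of major alleles, one per locus, in the same locus order.
    With 0/1 allele frequencies each mismatched locus contributes 2 to the
    sum of squared frequency differences, which is divided by 2L.
    """
    squared = sum(2 for a, b in zip(x, y) if a != b)
    return math.sqrt(squared / (2 * len(x)))


def entry_to_nearest_entry(core, genotypes):
    """Average distance from each core entry to its nearest other core entry."""
    total = 0.0
    for i in core:
        total += min(
            modified_rogers(genotypes[i], genotypes[j]) for j in core if j != i
        )
    return total / len(core)


# Three invented accessions genotyped at two loci
genotypes = {"A": [150, 200], "B": [150, 202], "C": [160, 210]}
score_all = entry_to_nearest_entry(["A", "B", "C"], genotypes)
score_ac = entry_to_nearest_entry(["A", "C"], genotypes)
```

A higher score means the selected entries are more dissimilar from one another, which is the property this criterion rewards.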
3. Results and discussion
3.1 Genetic diversity of G. max and G. soja
Glycine soja is a self-pollinating plant with an outcrossing rate of approximately 3.4%.63 Using 191 SNP markers, Kaga et al. analysed 264 wild soybean accessions in NARO-GB. They observed that the mean heterozygosity over all loci was 0.012 (range, 0.000–0.068) in Japanese wild soybean and 0.001 (0.000–0.049) in exotic wild soybean.45 However, the genotypes for all of the accessions were determined from a single plant. In our trial experiment, we initially aimed to collect all alleles of each locus; however, some loci in some accessions displayed numerous peaks, making it challenging to differentiate them clearly. The presence of low peaks poses a risk of miscalling. Consequently, in the present study, we opted to consider only the major allele, represented by the highest peak of each marker, as indicative of the genetic character of an accession.
As a result, a high level of variation was detected at 78 SSR loci in the 2,050 wild and 192 cultivated accessions (Table 1). Allele sizes and frequencies for each SSR locus are listed in Supplementary Table S4. Of the 2,637 alleles detected across all loci, most exhibited short tandem repeat variations, and 17 alleles (0.6%) were single-nucleotide indels. These single-nucleotide indels might arise from mutational events occurring outside of the repeat or through the interruption of a perfect repeat.64 On average, 16.27 and 33.15 alleles per locus were detected in G. max and G. soja, respectively, higher than those reported in previous studies (5.45–13.70 in G. max and 17.80–28.00 in G. soja).31,32,34,44 The high rates of polymorphism of these SSR markers observed in the present study suggest that they are effective molecular genetic markers and can be used to estimate the genetic diversity of different populations precisely. The number of alleles with a frequency of more than 5% was only approximately five per locus in both species, indicating that most alleles were rare and possibly geographically localized. The Shannon information index (I), an estimator of genetic diversity, was estimated to be 1.97 in G. max, representing 75.2% of that in G. soja. The I value has been shown to stabilize after reaching the optimum sample size.65 In fact, the I value of the 97 accessions in the World group of G. max reached a stable value. Therefore, despite the different sample sizes in these two species, their respective I values are considered to represent the maximum for each species. The whole-genome resequencing of wild and cultivated accessions revealed that the average number of pairwise nucleotide differences (genetic diversity, π) in G. max was half that of G. soja.38 The different degrees of reduction in genetic diversity between studies may be due to the different estimators and materials used. In rice, Watterson’s estimator (θ) and π of domesticated Asian rice (Oryza sativa L.) were estimated to be approximately 49.1 and 71.7% of those in wild rice (O. rufipogon and O. nivara), respectively.66 These results suggest that using crop wild relatives can broaden the genetic basis of cultivated crops. Furthermore, since only a limited number of wild plants were initially domesticated into ancient crops, the retention of 50–70% or more of the diversity of a crop wild relative in a domesticated crop supports the opinion that these crops have undergone extensive gene flow or introgression from their wild relatives to adapt to the new conditions encountered during expansion.67
Table 1.
Summary statistics for the microsatellite variation analysis in G. max and G. soja populations
| Species | Geographical origin | N | Allelic diversity: Mean Na | Allelic diversity: Na Freq. >= 5% | Genetic diversity: I | Genetic diversity: GD | Genetic diversity: PIC |
|---|---|---|---|---|---|---|---|
| G. max | | 192 | 16.27 | 5.06 | 1.97 | 0.77 | 0.74 |
| | Japan | 95 | 13.86 | 5.53 | 1.69 | 0.75 | 0.72 |
| | World | 97 | 11.85 | 4.72 | 1.97 | 0.77 | 0.74 |
| G. soja | | 2,050 | 33.15 | 5.51 | 2.63 | 0.86 | 0.85 |
| | China (Total) | 188 | 19.91 | 6.03 | 2.36 | 0.83 | 0.82 |
| | CI | 112 | 16.04 | 5.42 | 2.12 | 0.79 | 0.78 |
| | CII | 51 | 11.67 | 6.27 | 1.97 | 0.77 | 0.76 |
| | CIII | 25 | 8.97 | 5.35 | 1.88 | 0.79 | 0.77 |
| | Russia (Total) | 256 | 17.69 | 4.86 | 1.96 | 0.75 | 0.73 |
| | RI | 126 | 15.97 | 4.82 | 2.03 | 0.77 | 0.75 |
| | RII | 130 | 9.92 | 3.81 | 1.40 | 0.63 | 0.59 |
| | South Korea (Total)/KI | 341 | 24.36 | 5.97 | 2.50 | 0.85 | 0.84 |
| | Japan (Total) | 1,265 | 30.55 | 5.55 | 2.53 | 0.85 | 0.84 |
| | JI | 81 | 8.71 | 3.90 | 1.40 | 0.61 | 0.58 |
| | JII | 516 | 22.83 | 5.64 | 2.30 | 0.82 | 0.81 |
| | JIII | 217 | 22.87 | 5.92 | 2.47 | 0.85 | 0.84 |
| | JIV | 160 | 21.64 | 5.95 | 2.44 | 0.84 | 0.83 |
| | JV | 147 | 19.32 | 5.18 | 2.27 | 0.82 | 0.80 |
| | JVI | 144 | 21.31 | 5.64 | 2.47 | 0.85 | 0.84 |
| Total | | 2,242 | 33.83 | 5.62 | 2.65 | 0.87 | 0.86 |
N: no. of samples; Na: no. of different alleles; Freq. >= 5%: no. of different alleles with a frequency >= 5%; I: Shannon’s information index; PIC: polymorphism information content; GD: gene diversity.
The genetic diversity of G. soja in Russia was notably lower than that in the other three countries (Table 1). Although 130 accessions in subpopulation RII from Amur Oblast were analysed in this study, their I was only 1.40. The subpopulation RI from Khabarovsk and Primorsky Krais showed a relatively higher genetic diversity, with an I of 2.03. Using RAPD markers, Seitova et al. analysed 200 accessions of wild soybeans collected in the Far East region of Russia and likewise observed that subgroups collected in Primorsky Krai had higher levels of polymorphism than those in Amur Oblast.68 This regional difference in genetic diversity may be due to their distinct climates. Khabarovsk and Primorsky Krais tend to have a more diverse climate influenced by continental and maritime factors. In contrast, Amur Oblast has a more pronounced continental climate with distinct seasons and greater temperature extremes. Subpopulation JI from Hokkaido, located in northern Japan, showed the lowest I (1.40) among the subpopulations in Japan. The Tsugaru Strait separates Hokkaido and Honshu, which may limit the gene flow between subpopulations in these regions, resulting in a sharp decrease in the genetic diversity of JI. The southern subpopulations (JIII, JIV, and JVI), except for JV, exhibited high genetic diversity, with an I of at least 2.44, gene diversity of at least 0.84, and PIC of at least 0.83. The genetic diversity of subpopulation JV from Shikoku Island was not as high as that of the other southern subpopulations. This difference may be attributed, to some extent, to the island’s smaller size and its earlier separation from Honshu than Kyushu.69,70 Subpopulation KI in South Korea also exhibited high genetic diversity; its genetic diversity estimates were comparable to those in Japan.
Using 46 SSR markers, Lee et al.71 investigated the genetic diversity among 274 wild soybean accessions primarily originating from South Korea and also revealed high genetic diversity in accessions from South Korea. Subpopulation CIII from southern China showed the lowest I (1.88) among the subpopulations in China, possibly because only 25 accessions were analysed in this study. Several previous studies have revealed the genetic diversity of G. soja from southern China to be the highest in China and suggested that southern China is a major centre of diversity for wild soybean in China.32,33 Since only a limited number of G. soja germplasms in CIII have been distributed to other countries, the association between genetic diversity and geographical location could be discussed more precisely in the future by combining the public genomic data of G. soja germplasm covering all habitat areas.
3.2 Genetic differentiation
The AMOVA results revealed that 7.55% of the genetic variation was attributed to inter-species differences (P < 0.001), and 6.47% was attributed to the differences between subpopulations (P < 0.001; Table 2), indicating significant genetic differentiation between wild and cultivated soybeans and between subpopulations. The AMOVA results also revealed that principal genetic variation existed within the subpopulations (85.98%). To represent the relatedness between the subpopulations, we first used DAPC to investigate the genetic structure of the entire population, including G. soja and G. max subpopulations. Cross-validation revealed that the proportion of successful outcome predictions peaked at approximately 600 PCs (Supplementary Fig. S1). Therefore, we retained 600 PCs during the preliminary variable transformation and subsequently performed DA. The scatterplot highlights the differentiation of G. max from G. soja and displays a cline of genetic differentiation among the different subpopulations of G. soja (Supplementary Fig. S2). To obtain a more detailed visual assessment of subpopulation differentiation in G. soja, we performed the DAPC of G. soja subpopulations and created a scatterplot using 400 PCs (Fig. 2 and Supplementary Fig. S3). The subpopulations RI, RII, and CI are shown in the upper left panel of the scatterplot. The subpopulation CI, in which most accessions were from north-eastern China, was closely related to RI. Most of the samples in subpopulations CII, CIII, and KI are shown in the lower left panel. Subpopulation KI was also closely related to subpopulations JIV, JV, and JVI. Subpopulations JI and JII are shown in the upper right panel; JI is clearly separated from the other subpopulations. These results were roughly supported by a pairwise Fst analysis (Supplementary Fig. S4). The pairwise Fst values for KI-JIV and KI-JVI were only 0.01, indicating a low genetic differentiation between these subpopulations. 
Palaeontological and stratigraphic studies have concluded that western Japan and the Asian mainland were connected via a land bridge formed during periods of lowered sea levels associated with glacial maxima at least twice, around 0.63 and 0.43 million years ago (Ma).72 As reported in previous studies on phylogeographic patterns and genetic diversity of land plants in East Asia,73,74 the Korea/Tsushima land bridge during the glacial period might have acted as a temporary genetic corridor and contributed to the gene flow between western Japan and South Korea.
Table 2.
Analysis of molecular variance (AMOVA) between different species and subpopulations
| Source of variation | Df | Sum Ss | Mean Ss | Sigma | Percentage of total variance (%) |
|---|---|---|---|---|---|
| Between species | 1 | 2,323.9 | 2,323.9 | 5.08 | 7.55** |
| Between subpopulations | 12 | 8,862.5 | 738.5 | 4.35 | 6.47** |
| Variations within subpopulations | 2,228 | 128,905.6 | 57.9 | 57.86 | 85.98** |
| Total variations | 2,241 | 140,091.9 | 62.5 | 67.29 | 100.0 |
**Significant (P < 0.001).
Df: degree of freedom. Ss: sum of squares.
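The percentages in Table 2 follow directly from the variance components (Sigma): each component is divided by their sum. A short Python check using the rounded Sigma values from the table is given below; because the inputs are rounded to two decimals, the recomputed percentages can differ from the tabulated ones in the second decimal place. The function name is hypothetical.

```python
def variance_percentages(components):
    """Convert variance components into percentages of the total variance."""
    total = sum(components.values())
    return {level: 100.0 * sigma / total for level, sigma in components.items()}


# Rounded variance components (Sigma) taken from Table 2
sigma = {
    "Between species": 5.08,
    "Between subpopulations": 4.35,
    "Within subpopulations": 57.86,
}
percentages = variance_percentages(sigma)
```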
Figure 2.
Scatterplot of the discriminant analysis results of the principal components in different subpopulations of G. soja. Dots represent individuals, and colours represent subpopulations. The eigenvalues show that the genetic structure is captured by the first two principal components.
3.3 Phylogenetic relationship
The neighbour-joining tree (Fig. 3) revealed that G. soja accessions from the same countries tended to cluster together, further indicating that genetic differentiation is associated with geographic separation. The G. max accessions were divided into three subgroups (I, II, and III), and the accession GmWMC134 belonged to the G. soja clade (Supplementary Fig. S5). Accessions from Japan and other countries tended to cluster in different groups (III and II, respectively). Subgroup I, comprising 11 accessions, was the closest to G. soja. Recently, Kajiya-Kanegae et al.75 performed whole-genome Illumina resequencing of 198 accessions, including the 192 accessions from the G. max mini-core collection and a G. soja accession; their results also showed that GmWMC134 was distinct from other G. max accessions. Furthermore, Kajiya-Kanegae et al.75 divided G. max accessions into three subgroups: ‘Primitive’, ‘World’, and ‘Japan’, a categorization that closely corresponded to the three subgroups identified in our study (Supplementary Table S2). It is possible that subgroup I or ‘Primitive’, including GmWMC134, is derived from the introgression between G. max and G. soja. Notably, the entire G. max group was phylogenetically closely related to a G. soja clade in subpopulation CII, collected from the Huang–Huai–Hai Valley in China (Fig. 3 and Supplementary Fig. S5), which is consistent with the predominant findings of previous studies.34,36 However, chloroplast microsatellite marker analysis of wild and cultivated soybeans suggests the existence of multiple origins for soybeans.76 More recently, analysis of chloroplast genomes in 62 G. soja accessions, 130 landraces, and 110 improved cultivars indicated the presence of numerous maternal lines contributing to the domestication of soybeans.77 The present study (Supplementary Fig. 
S5) and numerous previous studies13,62,78 have revealed that Japanese- and Chinese-cultivated soybeans form significantly different germplasm pools and that Korean-cultivated soybeans are a mix of these two germplasm pools. Furthermore, an increase in seed size is a characteristic that accompanies the process of plant domestication. Archaeological records indicate that soybean seeds recovered at archaeological sites in the Central Highlands of Japan were larger than those recovered in Korea and China 6,000–4,000 years ago, suggesting an earlier domestication in Japan.79 These molecular phylogenetic and archaeological pieces of evidence suggest independent domestication events occurred in various locations. However, the widespread use of soybeans in Japan commenced around 3,000 years ago when rice and millet were introduced from China.79 This supports the hypothesis that most of the ancient domesticates either vanished or were integrated into the domesticated soybean from China, possibly due to its more advantageous traits for cultivation.13
Figure 3.
Neighbour-joining phylogenetic tree of 2,050 G. soja accessions in addition to 192 G. max accessions. Colours represent subpopulations. An expanded display of the subtree from the dashed arc that comprises 192 G. max accessions and their most genetically closely related G. soja clade is shown in Supplementary Fig. S5.
3.4 Population structure of the entire population
Population structure analysis of all G. max and G. soja accessions was conducted using STRUCTURE. The most significant change in ln Pr(X|K) occurred when K increased from 4 to 5 (Fig. 4a), and the highest ΔK value was observed at K = 4 (Fig. 4b), indicating the highest probability for population clustering. In addition, ΔK was also high for K = 17, indicating an additional informative population structure. Therefore, the STRUCTURE results at K = 4 and K = 17 were subjected to population genetic analyses.
Figure 4.
STRUCTURE estimation of the genetic structure in all G. max and G. soja accessions. (a) Estimation of population structure using the mean of estimated log probability of data with cluster values (K) ranging from 1 to 20. (b) Estimation of population structure using delta K with K ranging from 1 to 20. (c) Visualization of population genetic structure with K = 4 (above) and 17 (below). Different subpopulations are demarcated by vertical dashed white lines, with the corresponding subpopulation names indicated under the bottom panel.
The estimated membership coefficient (Q) represents the proportion of an accession’s ancestry derived from the associated cluster. At K = 4, the Q values of Cluster 1 for most G. max accessions were nearly 1.00 (Fig. 4c). In contrast, the Q values of Cluster 1 for the accessions in subgroup I of G. max were 0.44–0.76, and the Q values of Cluster 2 for most of the accessions in subgroup I, except for GmWMC138, were 0.22–0.48 (Supplementary Fig. S6a). These results suggest that G. soja in Cluster 2 share ancestry with G. max. We assigned accessions with Q > 0.76 to the corresponding cluster, and the others were categorized into the admixture group (Supplementary Tables S1 and S2). The frequency of each cluster in a subpopulation is presented in a two-dimensional table (Supplementary Table S5). Cluster 2 is a major cluster, comprising most of the accessions in G. soja subpopulations CII, CIII, KI, JV, and JVI, and 16.7–38.3% of the accessions in CI, RI, and JIV. Most accessions from subpopulation JII and 38.3% of the accessions from subpopulation JI were assigned to Cluster 3. Cluster 4 predominantly comprised accessions from subpopulation RII. Subpopulations CI and RI each included accessions assigned to Cluster 2 and accessions assigned to Cluster 4, indicating obvious population differentiation in these regions. Considering the population differentiation within the initially defined subpopulations based on geographic origin, we redefined the G. soja accessions into 14 subpopulations (Fig. 5), each with an accession number exceeding 20. Nei’s genetic distance between these subpopulations was then estimated. The genetic distance between CI_Cluster4 and CI_Cluster2 was 0.74 (Fig. 5), whereas the distance between CI_Cluster4 and RII_Cluster4 was only 0.09. 
Furthermore, the genetic distances between CI_Cluster2 and the subpopulations in Korea and Japan were smaller than those between CII_Cluster2 (or CIII_Cluster2) and the subpopulations in Korea and Japan (paired t-test, P < 0.001), implying gene flow among the subpopulations in north-eastern China, Korea, and Japan. In some previous studies, G. soja accessions from north-eastern China were considered genetically closer to those from South Korea and Japan than to those from other regions in China,30,39 whereas other studies reported inconsistent results.33,40 These discrepancies might arise from varying proportions of Cluster 2 and Cluster 4 accessions in subpopulation CI. If Cluster 4 accessions made up a high proportion of CI, CI would appear genetically distant from the subpopulations in Korea and Japan.
Figure 5.
Heatmap of Nei’s genetic distances between different G. soja subpopulations. Subpopulations, each comprising more than 20 accessions, were defined based on the geographic origin and population structure (K = 4) of each accession. Dendrograms were constructed using the unweighted pair group method with arithmetic mean (UPGMA). Colours indicate the degree of genetic distance.
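For readers unfamiliar with the distance measure, the following Python sketch computes a pairwise distance from per-locus allele frequencies. It assumes the standard formulation of Nei (1972), D = -ln(Jxy / sqrt(Jx * Jy)), without sample-size correction; the software used in this study may apply different options. The function name `nei_distance`, the allele labels, and the frequencies are hypothetical.

```python
import math
from typing import Dict, List

# One dictionary per locus, mapping allele label -> allele frequency.
Freqs = List[Dict[str, float]]


def nei_distance(pop_x: Freqs, pop_y: Freqs) -> float:
    """Nei's (1972) standard genetic distance between two populations."""
    if len(pop_x) != len(pop_y):
        raise ValueError("Both populations must cover the same loci.")
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())    # homozygosity in X
        jy += sum(p * p for p in fy.values())    # homozygosity in Y
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Hypothetical allele frequencies at two loci (illustrative values only).
pop_a = [{"150": 0.5, "152": 0.5}, {"200": 1.0}]
pop_b = [{"150": 1.0}, {"200": 0.5, "202": 0.5}]

d_ab = nei_distance(pop_a, pop_b)
print(f"D(a, b) = {d_ab:.3f}")
```

Summing over loci rather than averaging gives the same ratio, because both populations are scored at the same set of loci.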
At K = 17, most G. max accessions clearly belonged to Cluster 1 (Fig. 4c). The Q value of Cluster 1 for each accession in subgroup I of G. max was 0.36–0.60 (Supplementary Fig. S6b). Therefore, we assigned accessions with Q > 0.60 to the corresponding cluster and the others to the admixture group (Supplementary Tables S1 and S2). The frequency of each cluster in each subpopulation is presented in Supplementary Table S6. A genetic cluster specific to each subpopulation was identified. For example, Cluster 2 was primarily specific to subpopulation CII, whereas Cluster 12 comprised the majority of subpopulation JV. Most accessions from subpopulation JI belonged specifically to Cluster 14. Overall, the results of the population structure analysis are consistent with those of the DAPC and phylogenetic analyses. Furthermore, the Q values of Cluster 2 in most accessions of subgroup I of G. max exceeded 0.24, and the Q values of Clusters 3 and 4 were also appreciable in some accessions (Supplementary Fig. S6b). Together with the findings above, these observations support the hypothesis13 that local ancient soybean domesticates or G. soja may have introgressed into the early widespread ancestors of soybean, potentially contributing to soybean domestication and subsequent genetic divergence.
3.5 Mini-core collection selection
We developed a G. soja mini-core collection comprising 192 accessions (Supplementary Table S7) to represent the genetic diversity of the 2,050 globally distributed accessions. Seventy-four percent of the alleles were retained in the G. soja mini-core collection (Table 3), which is 1.77 times higher than in the G. max mini-core collection (Table 1). Furthermore, we selected two research sets, comprising 96 and 48 accessions, for small-scale phenotypic evaluations and genetic diversity studies, respectively. The genetic diversity indices of these sets were nearly identical to those of the entire population (Table 3). The mini-core collection covered nearly all clusters identified at K = 4 or 17 in the 2,050 accessions and included 101 accessions from Japan, 38 from China, 31 from South Korea, and 22 from Russia. Fifty-nine accessions belonged to a ‘research set’ comprising the 64 accessions recommended by the NBRP Lotus and Glycine. Within the mini-core collection, whole-genome resequencing data for 41 accessions from NPGS have been deposited in the Genome Sequence Archive in previous studies.38,62 High-quality reference genome sequences of PI549046 (China), PI562565 (South Korea), and PI578357 (Russia) have been assembled de novo.62 We are currently assembling de novo the genome sequence of the Japanese accession JP110755. This accession has been crossed with ‘Fukuyutaka’, the leading cultivar in southwestern Japan, to construct a recombinant inbred line population.17,80 These four representative accessions were separated in the phylogenetic tree (Supplementary Fig. S7). The geographical coverage of this mini-core collection is broader than that of the other two mini-core collections. One of them covered only Chinese germplasm,43 whereas the other did not include accessions from Hokkaido,44 where distinct genetic differentiation was observed, as mentioned earlier.
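The allele retention rate can be checked against the mean numbers of alleles per locus (Mean Na) reported in Table 3. The Python sketch below assumes that every set was scored at the same loci, so that the ratio of Mean Na values equals the ratio of total allele counts; the names `mean_na` and `retention` are illustrative.

```python
# Mean number of alleles per locus (Mean Na), copied from Table 3.
mean_na = {
    "Whole": 33.2,
    "Mini_Core": 24.5,
    "96_set": 21.2,
    "48_set": 17.3,
}


def retention(subset_na: float, whole_na: float) -> float:
    """Percentage of alleles in the whole collection retained by a subset."""
    return 100.0 * subset_na / whole_na


rates = {name: retention(na, mean_na["Whole"]) for name, na in mean_na.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of alleles retained")
```

Because the Mean Na values in Table 3 are rounded to one decimal place, rates computed this way may differ slightly from rates computed on the raw allele counts.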
Table 3.
Summary statistics for microsatellite variation analysis in the selected G. soja mini-core collection and research set with 96 and 48 accessions, respectively
| Population | N | Allelic diversity: Mean Na | Allelic diversity: Na (Freq. >= 5%) | Genetic diversity: I | Genetic diversity: GD | Genetic diversity: PIC |
|---|---|---|---|---|---|---|
| Whole | 2,050 | 33.2 | 5.5 | 2.6 | 0.86 | 0.85 |
| Mini_Core | 192 | 24.5 | 6.0 | 2.6 | 0.88 | 0.87 |
| 96_set | 96 | 21.2 | 6.8 | 2.6 | 0.88 | 0.87 |
| 48_set | 48 | 17.3 | 6.4 | 2.5 | 0.88 | 0.87 |
N = no. of samples. Na = no. of different alleles. Ne = no. of effective alleles. Na (Freq. >= 5%) = no. of different alleles with a frequency >= 5%. I = Shannon’s Information Index. GD = gene diversity. PIC = polymorphism information content.
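As a reminder of what the diversity indices in Table 3 measure, the Python sketch below computes them for a single locus from its allele frequencies. It assumes the common textbook definitions without sample-size correction: I = -Σ p ln p, GD = 1 - Σ p², and PIC = 1 - Σ p² - Σ(i<j) 2 p_i² p_j². The software used in this study may apply corrected estimators, and the function names and frequencies here are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def shannon_index(freqs: Sequence[float]) -> float:
    """Shannon's Information Index, I = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def gene_diversity(freqs: Sequence[float]) -> float:
    """Gene diversity (expected heterozygosity), GD = 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content for one locus."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * pi ** 2 * pj ** 2 for pi, pj in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical locus with four equally frequent alleles.
locus = [0.25, 0.25, 0.25, 0.25]
print(f"I   = {shannon_index(locus):.3f}")
print(f"GD  = {gene_diversity(locus):.3f}")
print(f"PIC = {pic(locus):.3f}")
```

Population-level values such as those in Table 3 would then be obtained by averaging the per-locus values across all loci.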
Most importantly, all accessions in this mini-core collection are accessible to those who need them. We are currently resequencing this G. soja mini-core collection to facilitate the exploitation of novel alleles and the identification of useful genes or genomic regions through genomic analyses and genome-wide association studies. We have propagated the seeds of the accessions in this mini-core collection using the single seed descent method, and these seeds can be obtained by applying directly to NARO-GB (https://www.gene.affrc.go.jp/about_en.php) or NBRP Lotus and Glycine (https://legumebase.nbrp.jp/) in Japan.
4. Conclusions
In this study, using 78 highly polymorphic SSR markers widely distributed across the genome, we analysed the genetic diversity and population structure of a global, publicly available wild soybean collection of 2,050 accessions, as well as a mini-core collection of 192 cultivated soybean accessions. The numbers of wild soybean accessions and SSR loci far exceed those in previous reports.32,44,81 We observed a notable decrease in genetic diversity in G. max compared with G. soja, indicating that utilizing G. soja can broaden the genetic base of G. max. Population structure and phylogenetic analyses indicated a distinct geographic pattern of genetic differentiation in G. soja. The subpopulations in South Korea and southern Japan displayed high genetic diversity and were genetically similar. The Korea/Tsushima land bridge during the glacial period72,82 might have contributed to the widespread gene flow and genetic drift in these regions. We also observed that G. soja accessions from north-eastern China formed two differentiated clusters: one genetically close to subpopulations from South Korea and southern Japan, and the other close to those from Amur Oblast, Russia. In addition, G. max was phylogenetically close to the G. soja subpopulation in central China. Based on the complex hypothesis of soybean domestication13 and the ancestry proportions inferred through structure analysis, we propose that introgression from local ancient domesticates or G. soja may have occurred as the early ancestors of soybean in ancient central China dispersed to other regions, resulting in the replacement of local ancient domesticates. Furthermore, we developed a G. soja mini-core collection comprising 192 accessions, some of which have been used in genomic studies.38,62 We expect that this mini-core collection will be particularly useful for identifying and utilizing novel genes or alleles absent from the cultivated soybean germplasm.
Acknowledgements
This work was partially supported by the Cabinet Office, the Government of Japan, Moonshot Research and Development Program for Agriculture, Forestry and Fisheries (funding agency: Bio-oriented Technology Research Advancement Institution; grant no. JPJ009237) for FL, SH, and MI, and by the National BioResource Project (NBRP) of the Japan Agency for Medical Research and Development (AMED) for MH, HT, and RA. We gratefully acknowledge the Advanced Analysis Centre of NARO for the use of their high-speed processor system and the Advanced Genomics Breeding Section of the Institute of Crop Science of NARO for reconfirming the genotyping data of partial accessions. We thank Drs. Akito Kaga and Ryoichi Yano for providing useful information, and Editage for editing and reviewing this manuscript.
Contributor Information
Feng Li, Division of Crop Design Research, Institute of Crop Science, National Agricultural and Food Research Organization (NARO), Tsukuba, Ibaraki 305-8602, Japan.
Takashi Sayama, Division of Crop Design Research, Institute of Crop Science, National Agricultural and Food Research Organization (NARO), Tsukuba, Ibaraki 305-8602, Japan; Western Region Agricultural Research Center, National Agricultural and Food Research Organization (NARO), Zentsuji, Kagawa 765-8508, Japan.
Yuko Yokota, Division of Crop Design Research, Institute of Crop Science, National Agricultural and Food Research Organization (NARO), Tsukuba, Ibaraki 305-8602, Japan.
Susumu Hiraga, Division of Crop Design Research, Institute of Crop Science, National Agricultural and Food Research Organization (NARO), Tsukuba, Ibaraki 305-8602, Japan.
Masatsugu Hashiguchi, Faculty of Agriculture, University of Miyazaki, Gakuen-kibanadai-nishi-1-1, Miyazaki, 889-2192, Japan.
Hidenori Tanaka, Faculty of Agriculture, University of Miyazaki, Gakuen-kibanadai-nishi-1-1, Miyazaki, 889-2192, Japan.
Ryo Akashi, Faculty of Agriculture, University of Miyazaki, Gakuen-kibanadai-nishi-1-1, Miyazaki, 889-2192, Japan.
Masao Ishimoto, Division of Crop Design Research, Institute of Crop Science, National Agricultural and Food Research Organization (NARO), Tsukuba, Ibaraki 305-8602, Japan.
References
- 1. Gross, B.L. and Olsen, K.M. 2010, Genetic perspectives on crop domestication, Trends Plant Sci., 15, 529–37.
- 2. Doebley, J.F., Gaut, B.S., and Smith, B.D. 2006, The molecular genetics of crop domestication, Cell, 127, 1309–21.
- 3. Tanksley, S.D. and McCouch, S.R. 1997, Seed banks and molecular maps: unlocking genetic potential from the wild, Science, 277, 1063–6.
- 4. Mammadov, J., Buyyarapu, R., Guttikonda, S.K., Parliament, K., Abdurakhmonov, I.Y., and Kumpatla, S.P. 2018, Wild relatives of maize, rice, cotton, and soybean: treasure troves for tolerance to biotic and abiotic stresses, Front. Plant Sci., 9, 886.
- 5. Chitwood-Brown, J., Vallad, G.E., Lee, T.G., and Hutton, S.F. 2021, Breeding for resistance to Fusarium wilt of tomato: a review, Genes (Basel), 12, 1673.
- 6. Reynolds, M., Dreccer, F., and Trethowan, R. 2007, Drought-adaptive traits derived from wheat wild relatives and landraces, J. Exp. Bot., 58, 177–86.
- 7. Meyer, R.S. and Purugganan, M.D. 2013, Evolution of crop species: genetics of domestication and diversification, Nat. Rev. Genet., 14, 840–52.
- 8. Scheben, A., Wolter, F., Batley, J., Puchta, H., and Edwards, D. 2017, Towards CRISPR/Cas crops—bringing together genomics and genome editing, New Phytol., 216, 682–98.
- 9. Zsogon, A., Cermak, T., Naves, E.R., et al. 2018, De novo domestication of wild tomato using genome editing, Nat. Biotechnol., 36, 1211–6.
- 10. Yu, H., Lin, T., Meng, X., et al. 2021, A route to de novo domestication of wild allotetraploid rice, Cell, 184, 1156–1170.e14.
- 11. Chatterjee, C., Gleddie, S., and Xiao, C.W. 2018, Soybean bioactive peptides and their functional properties, Nutrients, 10, 1211.
- 12. Carter, T.E.J., Hymowitz, T., and Nelson, R.L. 2004, Biogeography, local adaptation, Vavilov, and genetic diversity in soybean. In: Werner, D. (ed), Biological resources and migration, Springer, Berlin, Germany, pp. 47–59.
- 13. Sedivy, E.J., Wu, F., and Hanzawa, Y. 2017, Soybean domestication: the origin, genetic architecture and molecular bases, New Phytol., 214, 539–53.
- 14. Hyten, D.L., Song, Q., Zhu, Y., et al. 2006, Impacts of genetic bottlenecks on soybean genome diversity, Proc. Natl. Acad. Sci. USA, 103, 16666–71.
- 15. Lam, H.M., Xu, X., Liu, X., et al. 2010, Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection, Nat. Genet., 42, 1053–9.
- 16. Jiang, C.-J., Sugano, S., Ochi, S., Kaga, A., and Ishimoto, M. 2020, Evaluation of Glycine max and Glycine soja for resistance to Calonectria ilicicola, Agronomy, 10, 887.
- 17. Oki, N., Takagi, K., Ishimoto, M., Takahashi, M., and Takahashi, M. 2019, Evaluation of the resistance effect of QTLs derived from wild soybean (Glycine soja) to common cutworm (Spodoptera litura Fabricius), Breed. Sci., 69, 529–35.
- 18. Hesler, L.S. 2013, Resistance to soybean aphid among wild soybean lines under controlled conditions, Crop Prot., 53, 139–46.
- 19. Tuyen, D.D., Lal, S.K., and Xu, D.H. 2010, Identification of a major QTL allele from wild soybean (Glycine soja Sieb. & Zucc.) for increasing alkaline salt tolerance in soybean, Theor. Appl. Genet., 121, 229–36.
- 20. Qi, X., Li, M.W., Xie, M., et al. 2014, Identification of a novel salt tolerance gene in wild soybean by whole-genome sequencing, Nat. Commun., 5, 4340.
- 21. Tsubokura, Y., Hajika, M., Kanamori, H., et al. 2012, The beta-conglycinin deficiency in wild soybean is associated with the tail-to-tail inverted repeat of the alpha-subunit genes, Plant Mol. Biol., 78, 301–9.
- 22. Yano, R., Takagi, K., Takada, Y., et al. 2017, Metabolic switching of astringent and beneficial triterpenoid saponins in soybean is achieved by a loss-of-function mutation in cytochrome P450 72A69, Plant J., 89, 527–39.
- 23. Zhang, S., Zhang, Z., Bales, C., et al. 2017, Mapping novel aphid resistance QTL from wild soybean, Glycine soja 85-32, Theor. Appl. Genet., 130, 1941–52.
- 24. Li, D., Pfeiffer, T.W., and Cornelius, P.L. 2008, Soybean QTL for yield and yield components associated with Glycine soja alleles, Crop Sci., 48, 571–81.
- 25. Prince, S.J., Song, L., Qiu, D., et al. 2015, Genetic variants in root architecture-related genes in a Glycine soja accession, a potential resource to improve cultivated soybean, BMC Genomics, 16, 132.
- 26. Lu, X., Xiong, Q., Cheng, T., et al. 2017, A PP2C-1 allele underlying a quantitative trait locus enhances soybean 100-seed weight, Mol. Plant, 10, 670–84.
- 27. Leamy, L.J., Zhang, H., Li, C., Chen, C.Y., and Song, B.-H. 2017, A genome-wide association study of seed composition traits in wild soybean (Glycine soja), BMC Genomics, 18, 18.
- 28. Li, Z.L. and Nelson, R.L. 2002, RAPD marker diversity among cultivated and wild soybean accessions from four Chinese provinces, Crop Sci., 42, 1737–44.
- 29. Abe, J., Hasegawa, A., Fukushi, H., Mikami, T., Ohara, M., and Shimamoto, Y. 1999, Introgression between wild and cultivated soybeans of Japan revealed by RFLP analysis for chloroplast DNAs, Econ. Bot., 53, 285–91.
- 30. He, S.L., Wang, Y.S., Li, D.Z., and Yi, T.S. 2016, Environmental and historical determinants of patterns of genetic differentiation in wild soybean (Glycine soja Sieb. et Zucc), Sci. Rep., 6, 22795.
- 31. Kuroda, Y., Kaga, A., Tomooka, N., and Vaughan, D.A. 2006, Population genetic structure of Japanese wild soybean (Glycine soja) based on microsatellite variation, Mol. Ecol., 15, 959–74.
- 32. Wen, Z.X., Ding, Y.L., Zhao, T.J., and Gai, J.Y. 2009, Genetic diversity and peculiarity of annual wild soybean (G. soja Sieb. et Zucc.) from various eco-regions in China, Theor. Appl. Genet., 119, 371–81.
- 33. Guo, J., Wang, Y., Song, C., et al. 2010, A single origin and moderate bottleneck during domestication of soybean (Glycine max): implications from microsatellites and nucleotide sequences, Ann. Bot, 106, 505–14.
- 34. Li, Y.H., Li, W., Zhang, C., et al. 2010, Genetic diversity in domesticated soybean (Glycine max) and its wild progenitor (Glycine soja) for simple sequence repeat and single-nucleotide polymorphism loci, New Phytol., 188, 242–53.
- 35. Jeong, S.C., Moon, J.K., Park, S.K., et al. 2019, Genetic diversity patterns and domestication origin of soybean, Theor. Appl. Genet., 132, 1179–93.
- 36. Han, Y., Zhao, X., Liu, D., et al. 2016, Domestication footprints anchor genomic regions of agronomic importance in soybeans, New Phytol., 209, 871–84.
- 37. Wang, J., Hu, Z., Liao, X., et al. 2022, Whole-genome resequencing reveals signature of local adaptation and divergence in wild soybean, Evol. Appl., 15, 1820–33.
- 38. Zhou, Z., Jiang, Y., Wang, Z., et al. 2015, Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean, Nat. Biotechnol., 33, 408–14.
- 39. Meng, J., Yang, G., Li, X., Zhao, Y., and He, S. 2023, Population structure of wild soybean (Glycine soja) based on SLAF-seq have implications for its conservation, PeerJ, 11, e16415.
- 40. Guo, J., Liu, Y., Wang, Y., et al. 2012, Population structure of the wild soybean (Glycine soja) in China: implications from microsatellite analyses, Ann Bot, 110, 777–85.
- 41. Wen, Z., Zhao, T., Ding, Y., and Gai, J. 2009, Genetic diversity, geographic differentiation and evolutionary relationship among ecotypes of Glycine max and G. soja in China, Science Bulletin, 54, 4393–403.
- 42. Nawaz, M.A., Lin, X., Chan, T.-F., et al. 2020, Korean wild soybeans (Glycine soja Sieb & Zucc.): geographic distribution and germplasm conservation, Agronomy, 10, 214.
- 43. Zhao, L., Dong, Y., Liu, B., Hao, S., Wang, K., and Li, X. 2005, Establishment of a core collection for the Chinese annual wild soybean (Glycine soja), Chin. Sci. Bull., 50, 989–96.
- 44. Kuroda, Y., Tomooka, N., Kaga, A., Wanigadeva, S.M.S.W., and Vaughan, D.A. 2009, Genetic diversity of wild soybean (Glycine soja Sieb. et Zucc.) and Japanese cultivated soybeans [G. max (L.) Merr.] based on microsatellite (SSR) analysis and the selection of a core collection, Genet. Resour. Crop Evol., 56, 1045–55.
- 45. Kaga, A., Shimizu, T., Watanabe, S., et al. 2011, Evaluation of soybean germplasm conserved in NIAS gene bank and development of mini core collections, Breed. Sci., 61, 566–92.
- 46. Sayama, T., Hwang, T.Y., Komatsu, K., et al. 2011, Development and application of a whole-genome simple sequence repeat panel for high-throughput genotyping in soybean, DNA Res., 18, 107–15.
- 47. Peakall, R. and Smouse, P.E. 2012, GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update, Bioinformatics, 28, 2537–9.
- 48. Liu, K. and Muse, S.V. 2005, PowerMarker: an integrated analysis environment for genetic marker analysis, Bioinformatics, 21, 2128–9.
- 49. Kamvar, Z.N., Tabima, J.F., and Grünwald, N.J. 2014, Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction, PeerJ, 2, e281.
- 50. Nei, M. 1972, Genetic distance between populations, Am. Nat., 106, 283–92.
- 51. Peakall, R. and Smouse, P.E. 2012, GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update, Bioinformatics, 28, 2537–9.
- 52. Kolde, R. 2019, Package ‘pheatmap’. Available at: https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf
- 53. Jombart, T., Devillard, S., and Balloux, F. 2010, Discriminant analysis of principal components: a new method for the analysis of genetically structured populations, BMC Genet., 11, 94.
- 54. Jombart, T. 2008, adegenet: a R package for the multivariate analysis of genetic markers, Bioinformatics, 24, 1403–5.
- 55. Kosman, E. and Leonard, K.J. 2005, Similarity coefficients for molecular markers in studies of genetic relationships between individuals for haploid, diploid, and polyploid species, Mol. Ecol., 14, 415–24.
- 56. Adamack, A.T. and Gruber, B. 2014, PopGenReport: simplifying basic population genetic analyses in R, Methods Ecol. Evol., 5, 384–7.
- 57. Paradis, E., Claude, J., and Strimmer, K. 2004, APE: analyses of phylogenetics and evolution in R language, Bioinformatics, 20, 289–90.
- 58. Pritchard, J.K., Stephens, M., and Donnelly, P. 2000, Inference of population structure using multilocus genotype data, Genetics, 155, 945–59.
- 59. Francis, R.M. 2017, pophelper: an R package and web app to analyse and visualize population structure, Mol. Ecol. Resour., 17, 27–32.
- 60. Evanno, G., Regnaut, S., and Goudet, J. 2005, Detecting the number of clusters of individuals using the software structure: a simulation study, Mol. Ecol., 14, 2611–20.
- 61. De Beukelaer, H., Davenport, G.F., and Fack, V. 2018, Core Hunter 3: flexible core subset selection, BMC Bioinf., 19, 203.
- 62. Liu, Y., Du, H., Li, P., et al. 2020, Pan-genome of wild and cultivated soybeans, Cell, 182, 162–176.e13.
- 63. Kuroda, Y., Kaga, A., Tomooka, N., and Vaughan, D.A. 2008, Gene flow and genetic structure of wild soybean (Glycine soja) in Japan, Crop Sci., 48, 1071–9.
- 64. Oppen, M.J.H.V., Rico, C., Turner, G.F., and Hewitt, G.M. 2000, Extensive homoplasy, nonstepwise mutations, and shared ancestral polymorphism at a complex microsatellite locus in Lake Malawi cichlids, Mol. Biol. Evol., 17, 489–98.
- 65. Bashalkhanov, S., Pandey, M., and Rajora, O.P. 2009, A simple method for estimating genetic diversity in large populations from finite sample sizes, BMC Genet., 10, 84.
- 66. Jing, C.Y., Zhang, F.M., Wang, X.H., et al. 2023, Multiple domestications of Asian rice, Nat. Plants, 9, 1221–35.
- 67. Janzen, G.M., Wang, L., and Hufford, M.B. 2019, The extent of adaptive wild introgression in crops, New Phytol., 221, 1279–88.
- 68. Seitova, A.M., Ignatov, A.N., Suprunova, T.P., et al. 2004, Genetic variation of wild soybean Glycine soja Sieb. et Zucc. in the Far East Region of the Russian Federation, Russian Journal of Genetics, 40, 165–71.
- 69. Ohshima, K. 1990, The history of straits around the Japanese islands in the late-Quaternary, Quaternary Research (Daiyonki-Kenkyu), 29, 193–208 (in Japanese with English abstract).
- 70. Tashima, S., Kaneko, Y., Anezaki, T., Baba, M., Yachimori, S., and Masuda, R. 2010, Genetic diversity within the Japanese badgers (Meles anakuma), as revealed by microsatellite analysis, Mamm. Study, 35, 221–6.
- 71. Lee, J.-D., Yu, J.-K., Hwang, Y.-H., et al. 2008, Genetic diversity of wild soybean (Glycine soja Sieb. and Zucc.) accessions from South Korea and other countries, Crop Sci., 48, 606–16.
- 72. Yoshikawa, S., Kawamura, Y., and Taruno, H. 2007, Land bridge formation and proboscidean immigration into the Japanese Islands during the Quaternary, J. Geosci. Osaka City Univ., 50, 1–6.
- 73. Jin, D.P., Lee, J.H., Xu, B., and Choi, B.H. 2016, Phylogeography of East Asian (Fabaceae) based on chloroplast and nuclear ribosomal DNA sequence variations, J. Plant Res., 129, 793–805.
- 74. Park, J.S., Takayama, K., Suyama, Y., and Choi, B.H. 2019, Distinct phylogeographic structure of the halophyte Suaeda malacosperma (Chenopodiaceae/Amaranthaceae), endemic to Korea-Japan region, influenced by historical range shift dynamics, J. Syst. Evol, 305, 193–203.
- 75. Kajiya-Kanegae, H., Nagasaki, H., Kaga, A., et al. 2021, Whole-genome sequence diversity and association analysis of 198 soybean accessions in mini-core collections, DNA Res., 28, dsaa032.
- 76. Xu, D.H., Abe, J., Gai, J.Y., and Shimamoto, Y. 2002, Diversity of chloroplast DNA SSRs in wild and cultivated soybeans: evidence for multiple origins of cultivated soybean, Theor. Appl. Genet., 105, 645–53.
- 77. Fang, C., Ma, Y.M., Yuan, L.C., et al. 2016, Chloroplast DNA underwent independent selection from nuclear genes during soybean domestication and improvement, J. Genet. Genomics, 43, 217–21.
- 78. Abe, J., Xu, D.H., Suzuki, Y., Kanazawa, A., and Shimamoto, Y. 2003, Soybean germplasm pools in Asia revealed by nuclear SSRs, Theor. Appl. Genet., 106, 445–53.
- 79. Takahashi, Y., Nasu, H., Nakayama, S., and Tomooka, N. 2023, Domestication of azuki bean and soybean in Japan: from the insight of archeological and molecular evidence, Breed. Sci., 73, 117–31.
- 80. Kuroda, Y., Kaga, A., Tomooka, N., et al. 2013, QTL affecting fitness of hybrids between wild and cultivated soybeans in experimental fields, Ecol. Evol., 3, 2150–68.
- 81. Kim, K.H., Lee, S., Seo, M.J., et al. 2014, Genetic diversity and population structure of wild soybean (Glycine soja Sieb. and Zucc.) accessions in Korea, Plant Genet. Resour., 12, S45–8.
- 82. Harrison, S.P., Yu, G., Takahara, H., and Prentice, I.C. 2001, Diversity of temperate plants in east Asia, Nature, 413, 129–30.