2025 Oct 23;18(4):e70142. doi: 10.1002/tpg2.70142

Simulations of genomic selection implementation pathways in common bean (Phaseolus vulgaris L.) using parametric and nonparametric models

Isabella Chiaravallotti 1, Valerio Hoyos‐Villegas 1,2
PMCID: PMC12547642  PMID: 41126758

Abstract

We conducted simulations of common bean (Phaseolus vulgaris L.) breeding programs to better understand the interplay between different choices a breeder must make when launching a genomic selection (GS) pipeline. We complement preceding studies on optimizing model parameters and training set makeup by exploring the practical implementation of GS in a common bean breeding program aimed at increasing seed yield. We simulated 24 GS implementation pathways on (1) what generation to train a new prediction model, (2) what generation to select parents for the next cycle, (3) which generation to collect training data, and (4) whether to use a parametric (ridge regression best linear unbiased predictor) or a nonparametric model (artificial neural network) for estimating breeding values. We found that early generation parent selections (also called rapid‐cycle GS) generally resulted in higher gain over three breeding cycles compared to late‐generation parent selections. When implementing a new parametric genomic prediction model, training data should be as diverse as possible, while also matching testing data in terms of genetic makeup and allele frequency. Parametric models showed more consistent genomic estimated breeding value prediction accuracy, while nonparametric models fluctuated, showing both the highest and the lowest prediction accuracy across all pathways. Despite the trade‐off between gains and genetic variance, nonparametric models showed greater balance of allelic diversity and gains. We observed that the key to sustained gains over time is the renewal of genetic variance. Our results indicate a potential for the use of nonparametric models, but more investigation will be required to stabilize their performance.

Core Ideas

  • While ridge regression (RR) remains better than neural network (NN) in overall gain, NN emerges as a genomic selection (GS) tool for maintaining genetic variance and gains.

  • When making early generation genomic estimated breeding value (GEBV) selection, a mixed training dataset of early and late generation is ideal.

  • When making late‐generation selections using GEBVs, it is best to use late‐generation training data.

  • Early generation parents may be useful for maintaining allelic diversity and capturing rare and favorable alleles.

Plain Language Summary

In this study, we conducted simulations of a common bean (Phaseolus vulgaris) breeding program to better understand the interplay between different choices a breeder must make when launching a genomic selection pipeline. We simulated 24 GS implementation pathways, focusing on (1) what generation to train a new prediction model, (2) what generation to select parents for the next cycle, (3) which generation to collect training data, and (4) whether to use a parametric or a nonparametric model for estimating breeding values. We found that early generation parent selections (also called rapid‐cycle GS) generally resulted in higher gain over three breeding cycles compared to late‐generation parent selections. Our results indicate a potential for the use of nonparametric models, but more investigation will be required to stabilize their performance.


Abbreviations

BV

breeding value

GEBV

genomic estimated breeding value

GS

genomic selection

MAF

major allele frequency

MAS

marker‐assisted selection

NN

neural network

QTL

quantitative trait loci

RR

ridge regression

RRBLUP

ridge regression best linear unbiased prediction

SY

seed yield

TBV

true breeding value

1. INTRODUCTION

1.1. Genomic selection to alleviate genetic gain bottlenecks

Plant breeding is a long‐term effort. When selecting on phenotypes alone, it can take years to achieve the desired result. The historic annual rate of genetic improvement for common bean (Phaseolus vulgaris L.) yield hovers between 1.02% and 1.34%, indicating the steady yet slow progress made by traditional breeders in the face of a projected 2% annual increase in demand (Beaver & Osorno, 2009; Chiorato et al., 2010; Crossa et al., 2017; de Faria et al., 2018). Genome‐based methods such as genome‐wide association studies and quantitative trait loci (QTL) mapping have allowed breeders to identify loci that are associated with a desired trait, and breed for these loci directly using marker‐assisted selection (MAS). Using these methods, breeders have achieved gain in areas such as disease resistance, drought tolerance, and flowering time (Assefa et al., 2019; O'Boyle et al., 2007; Raggi et al., 2019; Schneider et al., 1997; X. Wu et al., 2022; Yu et al., 2000). However, when it comes to quantitative traits such as seed yield that are controlled by many small effect loci, it is not feasible to pick up on these small effects and accumulate them individually (Bernardo & Yu, 2007; Jannick et al., 2010). Therefore, despite the implementation of MAS, the rate of genetic gain for seed yield remains stagnant at roughly 1%.

To mitigate this challenge and reach the required annual 2% increase in global demand on the food system (Crossa et al., 2017), genomic selection (GS) may allow breeders to achieve improved selection accuracy on quantitative traits by predicting the genomic estimated breeding value (GEBV) of selection candidates based on a dense set of molecular markers. GS, originally proposed by Bernardo (1994) using restriction fragment length polymorphisms and later implemented by Meuwissen et al. (2001) using single nucleotide polymorphism (SNP) markers, now uses genome‐wide markers to predict breeding values (BVs) of individual selection candidates. GS shows promise in increasing genetic gain, increasing selection accuracy, and reducing time and cost required to achieve desired results in a breeding program. GS claims several benefits over more traditional selection strategies as follows: (1) a reduction in cost where genotyping can replace or augment costly multilocation field trials, (2) the ability to select on complex low‐heritability traits, (3) the ability to select for multiple traits at once, and (4) a faster rate of genetic gain due to an increase in selection accuracy and intensity and a decrease in selection cycle time (Alemu et al., 2024; Crossa et al., 2017; Dreisigacker et al., 2021). For these reasons, GS will likely reduce the costs associated with plant breeding by reducing the cost of phenotyping. Using statistical modeling, we can explore the performance of several different prediction models and breeding scenarios before taking them to the field. This will increase the accuracy and efficiency of breeding programs permitting us to simultaneously increase genetic gain and accumulate desirable alleles (Montesinos López et al., 2022).

1.2. Challenges of breeding common bean

Considering the above benefits of GS, breeders of common bean focusing on seed yield and other complex quantitative traits may benefit from implementing GS in their breeding programs. Originating in South and Central America, the common bean (P. vulgaris L.) spread across the globe following the Columbian exchange and has become one of the world's most important sources of carbohydrates, protein, and nutrition (Assefa et al., 2019). Due to a growing population and an increased interest in plant‐based protein, the common bean market is expected to grow by 4.5% in the next 5 years (Mordor Intelligence, 2024). In addition to their role as a dietary staple, beans have gained popularity as a dietary tool for reducing the incidence of diabetes, heart disease, and cancer (Abdullah et al., 2017). Further, with recent trends toward plant‐based protein sources as a means of mitigating the impact of large‐scale meat production on global climate change, we can expect that the demand for beans will only increase in coming years (Abdullah et al., 2017; Bekkering, 2014; Uebersax et al., 2023). Due to their potential to improve soil health via nitrogen fixation, beans are not only nutritionally significant for humans, but may also improve the longevity of the increasingly tenuous environments in which they are grown (Assefa et al., 2019). Because it is grown in diverse climates under an array of different conditions, continued improvement of the common bean (in terms of agronomic traits such as yield, disease resistance, abiotic stress tolerance, and desirable market traits) will be a necessary component of feeding the world's growing population and meeting market growth (Basavaraja et al., 2020; Blair et al., 2012; Myers & Kmiecik, 2017).

Furthering these challenges, as plant breeders impose intense selection pressure on common bean germplasm, the genetic diversity of breeding populations (from which breeders pull desired traits) inevitably erodes. Without careful consideration for the maintenance of diversity, a breeding program's ability to push genetic gain can be exhausted (Chiaravallotti et al., 2024; J. Lin et al., 2023). Upon domestication, the common bean faced an intense bottleneck, leaving much diversity in wildtype varieties (Arriagada et al., 2024; Cortinovis et al., 2020; Singh, 2001). When seeking sources of diversity in the common bean, breeding programs can look to wild varieties. However, it can be difficult to incorporate a beneficial wild trait without also bringing along undesirable traits (i.e., linkage drag). Breeding programs must balance the competing goals of (1) creating a homogenous cultivar with an array of desirable traits and (2) maintaining vigor and diversity for continued genetic gain (Beaver & Osorno, 2009). Parker et al. (2022) indicated that while a loss in common bean biodiversity could negatively impact crop productivity over time, breeders can implement genomics‐informed breeding to offset this productivity loss within their breeding programs.

1.3. Practical implementation of genomic selection

To implement GS, breeders must first develop a training set consisting of genotypes and corresponding phenotypes. This set is used to train a prediction model that can then predict the GEBV from genotypes alone (without phenotyping).

While common linear models such as GBLUP, ridge regression best linear unbiased prediction (RRBLUP), and Bayesian approaches pick up on linear patterns in the data with considerable accuracy (de los Campos et al., 2013; Habier et al., 2011; Heslot et al., 2012; Sallam et al., 2015; Whittaker et al., 2000), they may miss more complex nonlinear epistatic interactions that contribute to an expressed phenotype. When comparing several machine learning models and classical parametric models, no one model dominates, and the ideal algorithm used for genomic prediction (GP) is dependent on the trait at hand (Azodi et al., 2019; Gianola et al., 2022). Nonparametric machine learning models (whose parameters do not take a predetermined form, unlike those of GBLUP and RRBLUP) have shown promise in the prediction of complex traits controlled by multiple small‐effect loci (Lopez‐Cruz et al., 2021; Ray et al., 2023; Sandhu et al., 2021). Nonparametric models do not have a predetermined relationship between the response variable and the predictors, making them more flexible than classical linear regression and adept at identifying nonlinear patterns, such as epistatic control of a trait (Montesinos López et al., 2022). A review by Montesinos‐López et al. (2021) gives an overview of 23 studies using deep neural networks for GP, concluding that there were no relevant differences between models. In the cases where artificial neural networks (NNs) do outperform linear methods, nonlinear interactions in the genome may be at play. Therefore, because there are no general rules that apply to all cases and model performance will vary by trait and population, it is best to investigate different models before selecting one for implementation in a breeding program. When it comes to transitioning from classical breeding to GS, breeders are faced with many decisions that can make or break the success of their GS program.
For example, breeders must choose which germplasm to use as their training set, what generation the training samples should come from, what type of model to use, and when to employ their new model and launch their GS pipeline. Since the advent of GS, the frontier of GS research has focused primarily on (1) machine learning model optimization, and (2) training set optimization (Alemu et al., 2024). Studies on the size and makeup of the training population are abundant (Atanda et al., 2021; Berro et al., 2019; Fernández‐González et al., 2023; Garcia‐Abadillo et al., 2024; Isidro et al., 2015; Rincent et al., 2012), and several cross‐validation studies have made use of historical data to assess different training set sizes and marker densities (Bernardo & Yu, 2007; Crossa et al., 2010; de los Campos et al., 2009; Norman et al., 2018). Others have assessed different models with or without genotype by environment (G × E) interaction, and various SNP densities and traits training sets (Jarquin et al., 2016; Keller et al., 2020; Morais et al., 2023; Rutkoski et al., 2014). There are also several selection strategies commonly used by breeders, and a breeder must determine which strategy is ideal. Detailed information on these strategies can be found in J. Lin et al. (2023). Here, we will focus on the pedigree method because it is commonly employed by bean breeders. In short, the pedigree method involves making individual‐plant selections from top families at the F2 and F3 generations, and then selecting the top families at F4 and F5 before moving on to yield trials. In the case of common bean where a breeding cycle begins when new parents are crossed and subsequent generations are self‐pollinated, we will see alleles segregating in a Mendelian fashion at the F2 generation (with a 1:2:1 ratio of homozygous AA, heterozygous AB and homozygous BB across loci). 
With each subsequent generation of self‐pollination, the rate of heterozygotic loci is halved and the population becomes more homozygous (Fehr, 1991). Further, because early generation selections are made among families, later generations are not only more homozygous but also have a pattern of decreasing among‐family variance and increasing within‐family variance (Fehr, 1991). This unique population structure will be reflected in the genotypes used to create a training population. Therefore, when initiating a GS pipeline, breeders need to be vigilant that the allele frequencies in the training set resemble those in the test set, to optimize model performance and, in turn, gains. For example, a training set composed of late‐generation data may not be ideal for making selections on early generation lines, as is customary in the pedigree pipeline.
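The halving of heterozygosity under self‐pollination can be illustrated numerically. The following is a minimal sketch, assuming a single biallelic locus, an F1 that is heterozygous at that locus, strict selfing, and no selection; the function names are ours and are not part of the simulation code.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous loci in generation F<generation>.

    Assumes the F1 is heterozygous (fraction 1.0) and that every later
    generation is produced by selfing, which halves heterozygosity.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


def genotype_ratio(generation: int) -> tuple[float, float, float]:
    """Expected (AA, AB, BB) proportions at one locus in generation F<generation>."""
    het = expected_heterozygosity(generation)
    hom = (1.0 - het) / 2.0  # the two homozygous classes are equally frequent
    return (hom, het, hom)


for gen in range(2, 6):
    aa, ab, bb = genotype_ratio(gen)
    print(f"F{gen}: AA={aa:.4f} AB={ab:.4f} BB={bb:.4f}")
```

Under these assumptions the F2 shows the 1:2:1 ratio (0.25, 0.5, 0.25), while by F5 only 6.25% of loci are expected to remain heterozygous, which is why F2 and F5 training sets can differ so much in genotype frequencies.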

To complement existing work on model optimization and training set makeup, this study explores the practical implementation of GS in a breeding program by simulating several GS implementation pathways, focusing on (1) when to train a new prediction model, (2) when to select parents for the next cycle, (3) which generation to collect training data from, and (4) whether to use linear regression or a nonparametric model for estimating BVs.

2. METHODS

2.1. Simulation parameters

Using the R package AlphaSimR (Gaynor et al., 2021), we simulated three cycles of breeding following the pedigree method (Figure 1). The pedigree method was chosen because it is commonly implemented by breeders in self‐pollinated crops, and previous simulations indicate that it is compatible with achieving higher gains compared to other strategies when using GS (J. Lin et al., 2023). AlphaSimR is a stochastic simulation platform using the gene drop method to simulate new haplotypes and a genetic map to simulate recombination (Hickey & Gorjanc, 2012). In short, the gene drop method creates sequences for each chromosome according to an ancestral population, simulating realistic recombination and mutation rates. QTL effects are sampled from a normal distribution for each simulated trait.

FIGURE 1.

Overview of simulation structure. Genotypes were loaded into AlphaSimR. Then, parents were selected from the base population and three breeding cycles were conducted following the pedigree method using genomic selection. The pedigree method was implemented as follows: Parents are crossed in a diallel. F1 plants are bulked. Individual plant selections are made from top families at F2 and F3. Top families are selected at F4 and F5. Replicated yield trials are conducted at F6 and F7. Parent selection was implemented at one of two generations, F2 or F5. The genomic prediction model was retrained and implemented at one of two generations, F2 or F5. Change in color from blue to green indicates that a new model is in use. Two models were tested: ridge regression and neural network. Three training set compositions were tested: F2 data, F5 data, or a mix of F2 and F5 data. Each training set consisted of 120 genotypes from the previous cycle. Pathway 1 selects parents at F2 and retrains the model at F2. Pathway 2 selects parents at F2 and retrains the model at F5. Pathway 3 selects parents at F5 and retrains the model at F2. Pathway 4 selects parents at F5 and retrains the model at F5.

While AlphaSimR has a built‐in capacity to generate a realistic simulated dataset, we used 122 lines from McGill's Pulse Breeding and Genetics Lab's 2021 yield trials to generate the base population in AlphaSimR in order to achieve a breeding population that is as close to ground truth as possible. From this base population, we selected parents to initiate a new breeding cycle and simulated 200 crosses to simulate a realistically sized breeding population (Kelly, 2010). Entries used to generate the base population were genotyped using the BARCBean12K BeadChip (Song et al., 2015). Calls were retrieved as either AA (homozygous with two copies of the major allele), AB (heterozygous), or BB (homozygous with two copies of the minor allele). SNP data used for GP are coded as [2] for the AA call, [1] for the AB call, and [0] for the BB call. However, when creating a population in AlphaSimR using real genotypic data, data for each individual chromosome are required. Therefore when loading SNPs into AlphaSimR, AA calls became [1,1], AB calls became [1,0], and BB calls became [0,0]. Further, because the newMapPop() function used to generate our population in AlphaSimR requires a position for each SNP, SNPs without a position on the genome were eliminated. SNPs with call frequency below 1.0 and minor allele frequency <5% were also eliminated. We were left with 4616 SNP markers.
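The genotype coding and marker filtering described above can be sketched as follows. This is an illustration only: the toy calls, the function names, and the use of `None` for a missing call are our assumptions, and the real pipeline was written in R. Assigning the AB call to the haplotype pair (1, 0) follows the text; the phase is arbitrary.

```python
# Codes taken from the text: dosage for prediction, haplotype pairs for AlphaSimR.
DOSAGE = {"AA": 2, "AB": 1, "BB": 0}
HAPLOTYPES = {"AA": (1, 1), "AB": (1, 0), "BB": (0, 0)}


def to_dosage(calls):
    """Code one line's calls as 2/1/0 copies of the A allele."""
    return [DOSAGE[c] for c in calls]


def to_haplotypes(calls):
    """Split one line's calls into two haplotype rows of 0/1 alleles."""
    pairs = [HAPLOTYPES[c] for c in calls]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def keep_snp(column, min_maf=0.05):
    """Keep a SNP only if every line has a call and its MAF is at least min_maf."""
    if any(c is None for c in column):  # call frequency below 1.0
        return False
    freq_a = sum(DOSAGE[c] for c in column) / (2 * len(column))
    return min(freq_a, 1.0 - freq_a) >= min_maf


# Hypothetical toy data: rows are lines, columns are SNPs.
calls = [
    ["AA", "AB", "AA", "BB"],
    ["AA", "BB", None, "AB"],
    ["AA", "AB", "BB", "AA"],
]
columns = list(zip(*calls))
kept = [j for j, col in enumerate(columns) if keep_snp(col)]
filtered = [[row[j] for j in kept] for row in calls]

print(kept)
print(to_dosage(filtered[0]))
print(to_haplotypes(filtered[0]))
```

In this toy example the first SNP is dropped because it is monomorphic and the third because one call is missing, leaving two SNPs that pass both filters.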

Due to its agronomic importance and quantitative nature, we focused on seed yield in these simulations. The literature reports 42 meta‐QTL for yield and yield components (Arriagada et al., 2022). Using AlphaSimR, we generated four QTL on each of the 11 P. vulgaris chromosomes (44 QTL in total) to approximate this number. Arriagada et al. (2022) also report an average narrow‐sense heritability of 0.14, so the narrow‐sense heritability was set to 0.14 for the simulations. The mean phenotypic value was set on a per‐plant basis at 8.8 g, following the yield observed in the McGill Pulse Breeding and Genetics Lab's 2021 yield trials, which is reported at https://www.pulsebreeding.ca/research/yield‐trials

Additive, epistatic, and G × E effects were simulated. Gaynor (2023) describes how each of these effects is simulated within the AlphaSimR platform. Additive effects are sampled from a normal distribution and assigned to each QTL. Then, the values are scaled to match the desired additive genetic variance, which was set to 0.14. Additive and epistatic effects had a mean of 0 and variance of 1, while G × E variance was $1 \times 10^{-6}$. Epistatic values are limited to interactions between pairs of loci and represent additive‐by‐additive epistasis. As with additive effects, epistatic effects are sampled from a normal distribution with a mean of zero and a standard deviation of one. To model G × E effects, an environmental covariate is sampled from a normal distribution and multiplied by a value for each genotype, which is determined by summing the G × E effect and additive dosage over all QTL.
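The sample‐then‐scale step for additive effects can be sketched as below. This is not AlphaSimR's implementation; it is a simplified illustration that assumes randomly generated QTL dosages, a hypothetical population of 200 lines, and a rescaling of effects so that the variance of the summed additive values equals the target.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_QTL = 44         # four QTL on each of 11 chromosomes, as in the text
N_LINES = 200      # hypothetical population size for this illustration
TARGET_VAR = 0.14  # desired additive genetic variance, as in the text

# Hypothetical QTL dosages (0, 1, or 2 copies of an allele) for each line.
dosages = [[random.choice((0, 1, 2)) for _ in range(N_QTL)] for _ in range(N_LINES)]

# Step 1: draw raw additive effects from a standard normal distribution.
raw_effects = [random.gauss(0.0, 1.0) for _ in range(N_QTL)]


def genetic_values(effects):
    """Additive value of each line: sum of dosage times effect over all QTL."""
    return [sum(d * a for d, a in zip(line, effects)) for line in dosages]


# Step 2: rescale the effects so the population variance matches the target.
raw_var = statistics.pvariance(genetic_values(raw_effects))
scale = (TARGET_VAR / raw_var) ** 0.5
scaled_effects = [a * scale for a in raw_effects]

scaled_var = statistics.pvariance(genetic_values(scaled_effects))
print(round(scaled_var, 6))
```

Because variance scales with the square of a constant multiplier, multiplying every effect by the square root of the ratio of target to observed variance reaches the target exactly for the population at hand.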

The simulations were conducted using a custom‐coded simulation platform, which allowed us to easily test all combinations of implementation pathways (model, parent selection, training generation, and training set makeup). The pipeline was developed in the R programming language (R Core Team, 2025) using the R package “AlphaSimR” (Gaynor et al., 2021) to code the breeding pipeline, the “rrBLUP” package (Endelman, 2011) to implement the ridge regression (RR) model and the “Keras” package (Chollet et al., 2016) to implement the NN. All simulations were run using the Canadian Digital Alliance remote clusters. J. Lin et al. (2023) report that the training set for the prediction model should be regularly updated, so at each new breeding cycle, training data were updated using data from the previous breeding cycle. We ran an initial pedigree pipeline using phenotypic selection (PS) to generate a training set for cycle one. Each pathway was replicated 10 times and the mean performance across replications is reported. Each simulation provides five outputs: genotypes for all individuals; model performance for every instance of BV estimation; mean genetic values for every generation; genetic variance for each generation; and phenotype, genetic value (incorporating all effects), true breeding value (TBV) (incorporating only additive effects), and estimated breeding value (EBV) (computed by the prediction model) for every simulated individual. In order to make appropriate comparisons, genetic variance is reported here as the relative change in overall genetic variance from the base population, where variance is considered 100%. The simulation code can be found at https://github.com/McGillHaricots/peas‐andlove/tree/master/GSPathways.
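Reporting variance relative to the base population amounts to a simple percent change. The sketch below uses hypothetical variance values; the same helper could express genetic gain relative to a base mean such as the 8.8 g per‐plant seed yield.

```python
def percent_change(value: float, base: float) -> float:
    """Relative change from the base population, in percent (base = 100%)."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return 100.0 * (value - base) / base


# Hypothetical genetic variances: base population, then the end of each cycle.
base_variance = 0.14
cycle_variances = [0.126, 0.112, 0.098]
changes = [round(percent_change(v, base_variance), 1) for v in cycle_variances]
print(changes)
```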

Four different implementation pathways were tested:

  • Pathway 1: Train the model at F2 and choose parents at the F2 generation.

  • Pathway 2: Train the model at F5 and choose parents at the F2 generation.

  • Pathway 3: Train the model at F2 and choose parents at the F5 generation.

  • Pathway 4: Train the model at F5 and choose parents at the F5 generation.

For the purposes of this study, we were not interested in F2 and F5 generations per se but used F2 and F5 to represent early and late generation data, respectively. Each pathway was simulated using three different training sets: F5 data (120 observations), F2 data (120 observations), and a mix of F2 and F5 data (120 observations; 60 F2 and 60 F5). We also tested two different models (RRBLUP and NN), totaling 24 simulated GS implementation pathways (4 pathways × 3 training sets × 2 models). Models will be referred to as RRF2 (ridge regression trained with F2 data), RRF5 (ridge regression trained with F5 data), RRMIX (ridge regression trained with an equal mix of F2 and F5 data), NNF2 (neural network trained with F2 data), NNF5 (neural network trained with F5 data), and NNMIX (neural network trained with an equal mix of F2 and F5 data) to represent each model/training set combination. For each pathway, GEBV‐based selections were used to replace phenotype‐based selections at all generations. We also ran two control breeding pipelines using PS: one where parents are selected at F2 (PSF2), and one where parents are selected at F5 (PSF5). We evaluated the pathways according to genetic gain at each generation, change in genetic variance at each generation, and the rate of fixed alleles in the preliminary yield trial stage for each cycle.
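The full factorial design can be enumerated as a quick check on the scenario count. The dictionary layout and key names below are illustrative assumptions; only the pathway definitions, training set sizes, and model labels come from the text.

```python
from itertools import product

# (generation at which the model is retrained, generation at which parents are chosen)
PATHWAYS = {1: ("F2", "F2"), 2: ("F5", "F2"), 3: ("F2", "F5"), 4: ("F5", "F5")}
# Number of training genotypes drawn from each generation of the previous cycle.
TRAINING_SETS = {"F2": {"F2": 120}, "F5": {"F5": 120}, "MIX": {"F2": 60, "F5": 60}}
MODELS = ("RR", "NN")

scenarios = []
for model, train_set, pathway in product(MODELS, TRAINING_SETS, PATHWAYS):
    retrain_at, parents_at = PATHWAYS[pathway]
    scenarios.append({
        "label": f"{model}{train_set}",  # e.g. RRF2, NNMIX
        "pathway": pathway,
        "retrain_at": retrain_at,
        "parents_at": parents_at,
        "n_training": sum(TRAINING_SETS[train_set].values()),
    })

print(len(scenarios))
print(scenarios[0])
```

Every scenario uses 120 training genotypes, so differences among training sets reflect composition rather than size.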

2.2. Prediction models

Predictions were made using either RRBLUP or feedforward NN as the prediction model. The model was trained with new data from the previous cycle either at F2 or F5. RRBLUP is defined as follows:

$$y = XGu + \varepsilon,$$

where $y$ is the response variable, $X$ is the design matrix relating lines to observations, $G$ is the coded matrix of genotypes, $u \sim N(0, I\sigma_u^2)$ is a vector of marker effects, and $\varepsilon$ is the error term. A penalty is applied to the coefficients so that those with a large magnitude are penalized. The solution is described as follows:

$$\hat{u} = Z^{\top}\left(ZZ^{\top} + \lambda I\right)^{-1} y,$$

where $Z = XG$ and the ridge parameter $\lambda = \sigma_e^2 / \sigma_u^2$ is the ratio between the residual variance and the marker variance. The model was implemented using the rrBLUP package (Endelman, 2011). The model was evaluated using Pearson's correlation coefficient between the TBV computed by the simulation and the EBV computed by the prediction model.
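To make the ridge solution and its evaluation concrete, the following is a minimal standard‐library sketch. It is not the rrBLUP implementation: the ridge parameter is fixed by hand rather than estimated from variance components, no intercept is fitted, and the toy genotypes, effects, and residuals are hypothetical. It solves the normal equations $(Z^{\top}Z + \lambda I)\,u = Z^{\top}y$, which give the same marker effects as the ridge solution written in terms of $ZZ^{\top}$.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def ridge_marker_effects(z, y, lam):
    """Marker effects from the ridge normal equations (Z'Z + lam*I) u = Z'y."""
    zt = transpose(z)
    ztz = matmul(zt, z)
    for i in range(len(ztz)):
        ztz[i][i] += lam  # the ridge penalty is added to the diagonal
    zty = [sum(zi * yi for zi, yi in zip(row, y)) for row in zt]
    return solve(ztz, zty)


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy data: 5 lines, 3 markers, centred genotype codes (-1, 0, 1).
Z = [[1, 0, -1], [0, 1, 1], [-1, 1, 0], [1, -1, 0], [0, 0, 1]]
true_effects = [0.5, -0.2, 0.1]
tbv = [sum(z * u for z, u in zip(row, true_effects)) for row in Z]
y = [t + e for t, e in zip(tbv, [0.05, -0.03, 0.02, -0.04, 0.01])]

u_hat = ridge_marker_effects(Z, y, lam=1.0)
gebv = [sum(z * u for z, u in zip(row, u_hat)) for row in Z]
print([round(u, 3) for u in u_hat])
print(round(pearson(tbv, gebv), 3))
```

Setting `lam=0.0` reduces the estimate to ordinary least squares, while larger values shrink the marker effects toward zero.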

NN can be defined as follows:

$$a_{l+1} = F\left(a_l W_l + b_{l+1}\right),$$

where $a_l$ indicates the values of the nodes in layer $l$. Note that $a_{l+1}$ is computed by applying an activation function ($F$) to the previous layer's nodes multiplied by the learned weights ($W_l$) plus the bias ($b_{l+1}$). The output of the NN is determined as follows:

$$y = F\left(\cdots F\left(F\left(X_i W_0 + b_1\right) W_1 + b_2\right) \cdots W_L + b_{L+1}\right),$$

where $y$ is the network output given inputs $X_i$, $L$ is the number of hidden layers, $F$ is the activation function, $W$ denotes weights, and $b$ denotes biases. Weights and biases were updated during model training to minimize the loss function, which in this case was mean squared error. A rectified linear unit activation function (ReLU) was used to correspond to our continuous response variable of seed yield. The input of the model ($X$) is a matrix of genotypes and the output of the model ($y$) is the estimated seed yield of each genotype. As such, the model was evaluated using Pearson's correlation coefficient between the true phenotype computed by the simulation and the estimated phenotype computed by the prediction model.
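The layer‐by‐layer recursion can be sketched as a forward pass in plain Python. The tiny network below (three markers, two hidden layers of two nodes, one output) and all of its weights are hypothetical, and the activation is applied at every layer, including the output, to follow the recursion as written; the network actually trained in Keras may differ in this respect.

```python
def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer before activation: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def forward(x, layers, activation=relu):
    """Apply the layer recursion repeatedly, starting from the input vector x."""
    a = list(x)
    for weights, biases in layers:
        a = activation(dense(a, weights, biases))
    return a


def mse(predicted, observed):
    """Mean squared error, the loss minimized during training."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical weights and biases for a 3 -> 2 -> 2 -> 1 network.
layers = [
    ([[0.5, -1.0], [1.0, 0.5], [-0.5, 1.0]], [0.0, 0.5]),  # input to hidden 1
    ([[1.0, 0.0], [-1.0, 2.0]], [0.5, -1.0]),              # hidden 1 to hidden 2
    ([[2.0], [0.5]], [0.1]),                               # hidden 2 to output
]
genotypes = [[2, 1, 0], [0, 1, 2]]  # marker dosages for two lines
observed = [9.0, 8.0]               # hypothetical per-plant seed yields (g)
predicted = [forward(g, layers)[0] for g in genotypes]
print(predicted, mse(predicted, observed))
```

Training would adjust the weights and biases to reduce the value returned by `mse`; that optimization step is omitted here.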

The architecture of our NN can be viewed in Figure S1. According to Montesinos López et al. (2022), two hidden layers are sufficient to represent any nonlinear function because two layers permit the elucidation of nonlinear decision boundaries between dependent and independent variables. We followed the rule of thumb that, to prevent overfitting while capturing adequate complexity, each hidden layer should contain the mean of the number of neurons in the input and output layers, in this case approximately 2300 neurons. Therefore, the network implemented contained an input layer with one node for each marker (4616 nodes), and two densely connected layers with 2300 nodes each. The output layer has one node, representing the predicted seed yield. Because RR is a regularized model, we also chose to implement a regularization technique in our NN. We included dropout regularization at a rate of 0.05, and batch normalization. Due to its dependability, Adam (adaptive moment estimation) was used as the optimizer; it is a form of gradient descent that allows the model to arrive at the model weights that minimize our loss function (Kingma & Ba, 2014). The Keras package was used to implement the NN.
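The hidden layer size and the resulting number of trainable dense‐layer parameters can be checked with a few lines of arithmetic. This sketch counts only the weights and biases of the fully connected layers; batch normalization parameters are excluded, and the helper name is ours.

```python
N_MARKERS = 4616  # input nodes, one per SNP marker (from the text)
N_OUTPUT = 1      # single output node: predicted seed yield
HIDDEN = 2300     # hidden layer width used in the text

# Mean of the input and output layer sizes, which the text rounds to 2300.
mean_size = (N_MARKERS + N_OUTPUT) / 2

layer_sizes = [N_MARKERS, HIDDEN, HIDDEN, N_OUTPUT]


def dense_parameters(sizes):
    """Weights plus biases for each pair of consecutive fully connected layers."""
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])]


per_layer = dense_parameters(layer_sizes)
print(mean_size)
print(per_layer, sum(per_layer))
```

The total is roughly 15.9 million parameters fitted to 120 training genotypes, which helps explain why dropout and batch normalization were included.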

3. RESULTS

3.1. Comparing models and gain across pathways

Figure 2 shows percent gain (increase in seed yield) at each breeding cycle for each pathway, using each model. The highest rate of genetic gain across all pathways was observed using RRMIX when following Pathways 1 and 2 (both achieved 17% gain). When only F2 data were used to train the model, Pathway 1 with RR resulted in the highest gains (16%). When only F5 data were used to train the model, Pathway 4 with NN resulted in the highest gains (11%).

FIGURE 2.

Heatmap of genetic gain for all pathways. The top panel shows the pathways using RRF2, RRF5, and RRMIX, and the bottom panel shows the pathways using NNF2, NNF5, and NNMIX. The gain for each cycle following each pathway is shown along the horizontal axis as a percent increase in per‐plant seed yield, which is shown in each square. The values are colored according to the percent change in seed yield, with darker colors indicating greater gain and lighter colors indicating lower gain. When using RRF2, gains are highest when the model is trained at F2 and used to predict F2 parents, and gains decrease steadily when we follow Pathways 2–4 (gains of 4.8%, −3.4%, and −7.9%, respectively). Similarly, when using NNF5, gains are highest when the model is trained at F5 and used to predict F5 parents (11%), and gains decrease steadily when we follow Pathways 3–1 (7.1%, 2.4%, and −5.6%, respectively). The values for percent gain indicate mean gain across replications at the end of each cycle.

When using RRF5, the highest performance in terms of gain was seen when the model was implemented at F5 (8.3%), and the lowest performance was seen when the model was implemented at F2 (−0.56%) (Figure 2). Notably, NNF2 resulted in similar gains across all pathways and, although it was not the highest performer, it showed the least fluctuation in terms of gains (4.4%, 3.8%, 2.9%, and 3.2% for Pathways 1–4, respectively). NNMIX performed best under Pathway 1 (11%), followed by Pathway 4 (5.2%), Pathway 3 (5.2%), and Pathway 2 (0.89%).

Figure 3 shows the increase in seed yield (i.e., genetic gain) across pathways for each model. Each curve in Figure 3 represents the mean value across replications and the shaded region represents the standard deviation across replications. Figure 3A indicates that when using RRF2, Pathway 1 resulted in the highest gain, but at some points during the breeding program gains were comparable to the PSF2 control. When using RRF5 (Figure 3B), none of the pathways were able to overcome the PSF2 control, and Pathways 1 and 3 resulted in significantly lower gains than the controls and Pathways 2 and 4. When using RRMIX (Figure 3C), Pathway 1 was able to overcome the PS controls, and Pathway 2 showed a higher rate of gain than the controls at first, but plateaued at the end of Cycle 1.

FIGURE 3.

Genetic gain for three cycles for (A) RRF2, (B) RRF5, (C) RRMIX, (D) NNF2, (E) NNF5, and (F) NNMIX. Percent gain for each implementation pathway and the phenotypic selection controls are plotted as the average across replications. Shading shows standard deviation across replications. Percent change in genetic value indicates the mean percent increase in genetic value across all individuals for a given generation. The white dividing lines represent the end of each breeding cycle.

Using NNF2 (Figure 3D), Pathway 1 showed notably high standard deviation and no significant difference among the pathways emerged. The same was the case for NNMIX (Figure 3F). However, the mean for NNMIX was able to overcome the PSF5 control in terms of gains. Using NNF5 (Figure 3E), Pathway 4 emerged as the pathway with the highest gain; however, this pathway appears to plateau at the end of Cycle 1 and resulted in a final gain similar to that of the F2 control. Pathways 1 and 2 showed notably low gain using this model, with no significant difference between Pathways 3 and 4 and the F5 control.

3.2. Model performance

Figure 4 indicates the maximum, minimum, and mean prediction accuracy for all models across all pathways. These values represent averages across replications, and the minimum and maximum are reported as the lowest and highest accuracy across all instances of the model over all three breeding cycles. The highest prediction accuracy observed across all pathways was 0.85 for NNF5 following Pathway 4 (Figure 4D). However, the variance for NNF5 performance fluctuated greatly throughout the simulated breeding cycles (Figure S2) and the mean for all instances of NNF5 under Pathway 4 was 0.4.

FIGURE 4.

Minimum, maximum, and average prediction accuracy across all models for all pathways. (A) Pathway 1, (B) Pathway 2, (C) Pathway 3, and (D) Pathway 4. Models are in the following order: NNF2, NNF5, NNMIX, RRF2, RRF5, and RRMIX. PCC indicates the Pearson's correlation coefficient between true and estimated breeding values. Each value represents the mean value observed across all replications.

When following Pathway 1 (Figure 4A), the highest accuracy was observed using the RRF2 set (r = 0.33). Using RRMIX also resulted in high performance (r = 0.27), followed by NNMIX with a maximum accuracy of 0.16. When following Pathway 2 (Figure 4B), RRF5 had the highest prediction accuracy (r = 0.52). RRMIX also had relatively high accuracy (r = 0.3). When following Pathway 3 (Figure 4C), NNF5 had the highest prediction accuracy (r = 0.74), followed by RRMIX (r = 0.35). For Pathway 4 (Figure 4D), NNF5 showed the highest prediction accuracy (r = 0.85), followed by RRF5 (r = 0.62).
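As a minimal sketch of how an accuracy of this kind can be computed, the snippet below takes the Pearson correlation between true and estimated breeding values and then summarizes a set of such accuracies as minimum, maximum, and mean. The function names and the toy values are illustrative assumptions and are not taken from the study's code.

```python
import numpy as np

def prediction_accuracy(true_bv, estimated_bv):
    """Pearson correlation between true and estimated breeding values."""
    true_bv = np.asarray(true_bv, dtype=float)
    estimated_bv = np.asarray(estimated_bv, dtype=float)
    return float(np.corrcoef(true_bv, estimated_bv)[0, 1])

def summarize_accuracy(accuracies):
    """Minimum, maximum, and mean accuracy over all instances of one model."""
    acc = np.asarray(accuracies, dtype=float)
    return {"min": float(acc.min()), "max": float(acc.max()), "mean": float(acc.mean())}

# Hypothetical toy values, not data from the study.
tbv = [1.0, 2.0, 3.0, 4.0]
gebv = [1.1, 1.9, 3.2, 3.8]
r = prediction_accuracy(tbv, gebv)
summary = summarize_accuracy([r, 0.2, -0.1])
```

Reporting the minimum and maximum alongside the mean, as in Figure 4, makes unstable models visible even when their average accuracy looks moderate.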

3.3. Change in genetic variance and allele frequency for each pathway

Figure 5 depicts the relative change in genetic variance across all pathways and models. Overall, the loss of genetic variance was greatest using Pathway 4, and most pronounced when using NNF2 (−97%). All scenarios except for Pathway 2 using RRF2 resulted in a decrease in overall genetic variance during the simulated breeding program. For Pathway 1, the greatest decrease in variance was observed using RRF2 (−57%) and the greatest maintenance of variance was observed when using the RRF5 model (−20%). The decrease in variance for all simulated pathways was greater than the decrease in variance for both controls, PSF2 (−5%) and PSF5 (−16%). For Pathway 2, there was an increase in genetic variance (10%) when using RRF2. All other changes in variance were greater than that of the controls (RRF5: −32%, RRMIX: −12%, NNF2: −10%, NNF5: −24%, and NNMIX: 0.016%). For Pathway 3, the greatest decrease in variance occurred when using the RRMIX model (−71%). The greatest maintenance of variance was observed when using the RRF5 model (−4%).
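The percentages above express the final genetic variance relative to the starting variance. A minimal sketch of that calculation follows, assuming genetic variance is taken as the variance of individual genetic values in a population. The function name and the toy values are hypothetical.

```python
import numpy as np

def relative_variance_change(start_values, end_values):
    """Percent change in genetic variance relative to the starting variance."""
    start_var = float(np.var(np.asarray(start_values, dtype=float)))
    end_var = float(np.var(np.asarray(end_values, dtype=float)))
    if start_var <= 0.0:
        raise ValueError("starting variance must be positive")
    return 100.0 * (end_var - start_var) / start_var

# Hypothetical genetic values: the final population is half as spread out,
# so its variance is one quarter of the starting variance.
start = [0.0, 2.0, 4.0, 6.0]
end = [2.0, 3.0, 4.0, 5.0]
change = relative_variance_change(start, end)
print(f"{change:.1f}%")
```

A negative value indicates a loss of variance and a positive value indicates that variance was regenerated over the program.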

FIGURE 5.

Overall change in genetic variance across all models and pathways as a relative percentage of the overall starting variance. Error bars show standard deviation across replications. P1, Pathway 1 (select parents at F2 and train the model at F2); P2, Pathway 2 (select parents at F2 and train the model at F5); P3, Pathway 3 (select parents at F5 and train the model at F2), and P4, Pathway 4 (select parents at F5 and train the model at F5).

Figure 6 depicts the change in the rate of fixed alleles for each breeding cycle following each pathway and using each model. Alleles were considered fixed when the major allele frequency (MAF) was 95% or higher. If we compare Figures 2 and 6, we can discern a general inverse relationship between gain and rate of fixed alleles when following Pathways 1, 3, and 4. However, this trend is less pronounced under Pathway 2.

FIGURE 6.

Heat map showing the rate of fixed alleles with a major allele frequency greater than 95%. The top panel shows the pathways using RRF2, RRF5, and RRMIX, and the bottom panel shows the pathways using NNF2, NNF5, and NNMIX. The rate of fixed alleles at each cycle following each pathway is shown along the horizontal axis as the rate of major alleles with a frequency greater than 95%. The values are colored according to the rate of fixed alleles, with darker colors indicating higher fixation and lighter colors indicating lower fixation.

When using RRF2, the rate of fixation was lowest for Pathways 1 and 2 (0.38 and 0.4) and highest for Pathways 3 and 4 (0.77 and 0.85). When using RRF5, the rate of fixation was lowest under Pathway 2 (0.44). For Pathways 1, 3, and 4, the final rates of allele fixation were 0.87, 0.89, and 0.87, respectively. For RRMIX, the rate of fixed alleles was highest under Pathway 1 (0.38), followed by Pathway 4 (0.67) and Pathways 2 and 3 (both had an ultimate allele fixation rate of 0.53). For NNF2, the highest rate of allele fixation was 0.99 for Pathway 4, followed by 0.8 for Pathway 2, 0.77 for Pathway 1, and 0.73 for Pathway 3. For NNF5, the highest rate of allele fixation was observed under Pathway 1 (0.71), followed by Pathway 4 (0.66), Pathway 3 (0.56), and Pathway 2 (0.44). For NNMIX, the rate of fixation was highest under Pathway 3 (0.62) followed by Pathway 4 (0.46), Pathway 1 (0.42), and Pathway 2 (0.41). Figure 6 indicates that for a given pathway, the model and training set used to implement GS can have a significant impact on which genotypes are selected on, and how much variance is lost. For example, the rate of fixed alleles for Pathway 1 ranged from 0.38 to 0.87 depending on which model and training set were implemented.
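The fixation rate reported in Figure 6 can be sketched as follows, assuming biallelic markers coded as 0, 1, or 2 copies of one allele in an individuals-by-markers matrix. That coding, the function name, and the toy matrix are assumptions for illustration. The threshold follows the "95% or higher" wording used in the text.

```python
import numpy as np

def fixed_allele_rate(genotypes, threshold=0.95):
    """Proportion of markers whose major allele frequency is >= threshold.

    genotypes: individuals x markers array of allele dosages (0, 1, or 2).
    """
    geno = np.asarray(genotypes, dtype=float)
    allele_freq = geno.mean(axis=0) / 2.0            # frequency of the counted allele
    major_freq = np.maximum(allele_freq, 1.0 - allele_freq)
    return float(np.mean(major_freq >= threshold))

# Hypothetical population of four individuals and three markers:
# marker 0 is fixed for the counted allele, marker 1 is fixed for the
# other allele, and marker 2 is still segregating.
toy = np.array([
    [2, 0, 1],
    [2, 0, 1],
    [2, 0, 2],
    [2, 0, 0],
])
rate = fixed_allele_rate(toy)
```

Tracking this rate at the end of each cycle gives one value per pathway and cycle, which is the quantity a heat map like Figure 6 displays.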

4. DISCUSSION

4.1. Best pathway for implementing GS in common bean

First, it is important to acknowledge that the goal of this study was to model a scenario that would lay the groundwork for further investigation into GS in common bean. Testing each of the variables present in this study (training set composition, parent selection, prediction model, and model retraining) would pose significant challenges and require significant resources in the field, making these questions ideal for in silico validation prior to designing future field experiments. In designing these simulations, we had to strike a balance between simulation simplicity and realistic complexity (for the sake of computational feasibility and generalizability), as is the case for all simulations (Bančič et al., 2025). For example, we chose to model the 42 identified meta‐QTL for yield in common bean, knowing that not all of these QTL may be present in a breeding population and that not all of these QTL may be liable to epistatic interactions. However, modeling yield in this fashion provides a reliable simulation environment for testing different breeding scenarios. Our study provides an exploration into GS in common bean, and the simulation approximates reality as closely as feasibility allows. Bančič et al. (2025) point out that there are current areas of improvement for simulation, including G × E interaction, the size of the parameter space, simulation of the genome, recombination, simulating multiple populations, and phenotype simulation. We can conceive of further simulation studies beyond the present study that could parse the details of a breeding program even further. For example, to make an “apples‐to‐apples” comparison between the different pipelines, we have set the heritability to 0.14 throughout the experiment, although in practice heritability may change from generation to generation or increase as we introduce more locations during the preliminary yield trials and advanced yield trials. 
Additionally, we decided to place simulations at a challenging position (low heritability) to set a baseline. The purpose of this study was to compare the 24 different GS pipelines we designed and understand the trends in genetic gain and genetic variance that emerge under each of these different breeding pathways.
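To illustrate what holding heritability at 0.14 implies, the sketch below derives the residual variance needed for a given genetic variance and adds matching noise to a set of genetic values. It assumes the simple definition h2 = Vg / (Vg + Ve); the study's simulation software may define and apply heritability differently, and all names and values here are hypothetical.

```python
import numpy as np

def error_variance_for_heritability(genetic_variance, h2):
    """Residual variance Ve such that h2 = Vg / (Vg + Ve)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    return genetic_variance * (1.0 - h2) / h2

def simulate_phenotypes(genetic_values, h2, rng):
    """Add Gaussian noise scaled so the expected heritability equals h2."""
    g = np.asarray(genetic_values, dtype=float)
    var_e = error_variance_for_heritability(g.var(), h2)
    return g + rng.normal(0.0, np.sqrt(var_e), size=g.shape)

rng = np.random.default_rng(0)
genetic_values = rng.normal(0.0, 1.0, size=1000)  # hypothetical genetic values
phenotypes = simulate_phenotypes(genetic_values, h2=0.14, rng=rng)

# With Vg = 1 and h2 = 0.14, Ve is roughly six times Vg.
print(round(error_variance_for_heritability(1.0, 0.14), 3))
```

Under this definition, a heritability of 0.14 means that noise accounts for most of the phenotypic variance, which is why the text describes it as a challenging baseline.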

Depending on the existing structure of a breeding program and the data that a breeder has to train a prediction model, care should be taken to construct a GS pipeline that will facilitate the greatest possible genetic gain while minimizing the loss of diversity and favorable alleles. Considering our results, an ideal scenario would be to collect a genetically diverse training set based on marker data, train an RR model, and select parents for future cycles in early generations. This scenario will result in the highest gains, while mitigating a diversity bottleneck. This pipeline is also an appealing implementation of GS in common bean because low seed count at early generations makes it challenging to conduct a replicated, multi‐environment trial and make informed decisions at early generations. GS effectively eliminates the challenge of low seed count at early generations by allowing breeders to genotype selection candidates instead of phenotyping them. It should be noted that the benefit of making early generation selections to mitigate a loss of allelic diversity was not observed when using RRF5 and RRMIX. This is likely because, with F5 genotypes present in the training set, the model was trained to identify highly homozygous genotypes with high genetic value. Therefore, a narrow set of genotypes with homozygous loci resembling those observed in the F5 training samples were likely selected from the F2 pool of parent candidates, and although gains were achieved, the rate of fixed alleles was much higher than that of the pathway which used RRF2 (see Figures 2 and 6). This pattern was not observed when using NN. Considering Pathway 1, NNMIX resulted in both the highest gains and lowest allele fixation rate compared to NNF5 and NNF2. This highlights the differing nature of the two models, and the potential benefit of using NNs to simultaneously achieve gains and maintain variance.

4.2. Comparing models and pathways according to genetic gain

The concept of using GS to make early generation parent selections to increase the rate of genetic gain is validated in a handful of empirical studies testing rapid cycle recurrent selection for grain yield in wheat (Dreisigacker et al., 2023), grain yield in maize (Zhang et al., 2017), and grain yield under drought and waterlogging stress for wheat (Das et al., 2020). Barili et al. (2018) and Sandhu et al. (2021) also report that GS can be used to help breeders identify lines of high merit in early generations. Considering the RRF2 model, we can see in Figure 3 that when the model is implemented at the F2 generation and used to select F2 parents (Pathway 1), gains are higher than the other pathways where an F2 trained model is implemented at F5 or used to select F5 parents. Similarly, for the RRF5 and NNF5 models, gains were highest when implemented at F5 and used to select F5 parents. This supports the idea that training data should resemble the test data as much as possible. Although F2 data may be closer to the test set in terms of generational separation, it may not resemble the testing data as closely in terms of allele frequency and patterns of segregation. Breeders should not only think about relatedness in terms of generational distance, but in terms of the nature of the genotypes at a particular generation. Considering a mixed training set, performance in terms of gain was highest when implemented in Pathway 1 for both RRMIX and NNMIX. In other words, the mixed training sets resulted in the greatest gains when employed in the pathways that select parents at the F2 generation, and not in the pathways that select parents at the F5 generation. This is because segregating alleles at the F2 generation will result in a mix of heterozygotic and homozygotic loci. 
Therefore, F5 data can bolster an F2 set being used to train a model making F2 predictions; however, F2 data introduces noise when selecting on largely homozygous genotypes at the F5. Simulations conducted by J. Lin et al. (2023) and Chiaravallotti et al. (2024) arrived at a similar conclusion, reporting that training data should match test data in terms of allele frequencies in the population.
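The contrast between segregating F2 genotypes and largely homozygous F5 genotypes can be made concrete with a small worked example. It assumes a cross between two fully inbred parents that differ at a locus, followed by selfing with no selection, so that expected heterozygosity halves each generation. These are textbook assumptions for illustration and not a description of the simulated populations.

```python
def expected_heterozygosity(generation):
    """Expected fraction of heterozygous individuals at a locus in the F_n
    generation, for a cross of two inbred parents advanced by selfing."""
    if generation < 1:
        raise ValueError("generation must be 1 or greater (F1, F2, ...)")
    return 0.5 ** (generation - 1)

for gen in (2, 5):
    print(f"F{gen}: {expected_heterozygosity(gen):.4f}")
# F2: 0.5000
# F5: 0.0625
```

Under these assumptions, about half of the F2 genotypes at such a locus are heterozygous, compared with about 6% at the F5, which is one way to see why a model trained on one generation may meet unfamiliar genotype patterns in the other.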

When implementing RRF5 at the F5 generation, we can see in Figure 3B that this pathway overlaps with the PSF2 and PSF5 controls. A similar result was observed by Bandillo et al. (2023) when applying PS and GS to soybean lines at the F5 generation in the field. They found that GS performed numerically, but not significantly, better than PS in the field. Provided that genotyping costs are lower than phenotyping costs per sample, a transition from PS to GS would still be worthwhile, not necessarily for an increase in gains per se, but for a decrease in cost without a loss in gains.

When collecting data to construct a training set for RR, breeders may want to choose a highly diverse training set with data from multiple generations, so that the model captures both high‐ and low‐effect alleles and predicts well over a wide range of genotypes. Similarly, Isidro et al. (2015) concluded that both diversity within the training population and the relatedness between the training and testing population should be maximized. Verges and Van Sanford (2020) investigated GS implementation at the preliminary yield trial stage (as early as F5) and report that smaller training sets can be sufficient to make accurate predictions of BVs, provided that the training and test sets are related and the training set is a pooled mix of diverse material, confirming our result that RRMIX performs better than RRF5 or RRF2.

It was also notable that while, in general, the NNF2 model resulted in relatively low gains across all pathways, the gains for NNF2 were the most stable across pathways (Figure 2). This supports the idea that a NN trained with F5 data produced a model of good fit on F5 data that does not generalize well to F2 data. However, a neural net trained on F2 data did not produce a very strong fit, but it was more generalizable across pathways.

4.3. Genetic gain and genetic variance

The implementation pathways that yielded the highest gain over three breeding cycles were Pathways 1 and 2 when RRMIX was used as the prediction model (seen in Figure 2). According to MAF density plots (Figure S3), continued gains likely occurred when these pathways selected parents at the F2 generation, effectively regenerating variance when crossing a segregating population. In Figure 3, we can see that the gains for Pathway 1 continue to increase while gains for Pathway 4 (which recycles the relatively homozygous F5 population as parents) see a plateau, likely due to a loss of variance.

Figure 5 indicates that across all pathways, when using RRMIX, the least amount of variance was lost when following Pathway 2, accounting for the continued gains over the course of the three breeding cycles. We also observed that when using the RRMIX training set and following Pathway 1, the correlation between the GEBVs and the phenotype (SY) was higher at F2 compared to the other three pathways during Cycle 1 (Figure S4C). However, during Cycles 2 and 3, Pathway 2 showed the highest correlation with phenotypes. This suggests that when we are following Pathway 2, and hence selecting F2 parents, a mixed training set was amenable to choosing those parents with high TBV and, similarly, high yield compared to the other pathways. This supports the idea that early generation parent selections can be used to achieve continuous gains without a reciprocal loss in genetic variance. Similar results were observed by Gorjanc et al. (2018) in which continued gains were achieved in a simulated breeding program when parents were selected and recycled rapidly at early generations. When following Pathway 1, Figure 6 indicates that a great deal of variance was lost, and this is likely the cause of inconsistent gains observed in Figure 3A. Gorjanc et al. (2018) propose a two‐part breeding strategy where population improvement and line development are separated into two distinct pipelines. The procedure proposed by Gorjanc et al. (2018) resembles the pipeline implemented in Pathways 1 and 2, in that early generation selections were made to identify and cross new parents whose progeny advanced through a product‐development pipeline until elite materials were obtained. Gorjanc et al. (2018) describe a strategy of taking genetic variance and converting it into genetic gain over the course of the breeding program, which is likely what occurred when RRMIX and NNMIX were used to predict GEBVs in Pathway 1. 
Conversely, in the cases where gains were lowest (Pathway 4 using RRF2 and Pathway 4 using NNF2), a dramatic decrease in genetic variance was observed. This is likely a combination of both (1) eroding variance due to selecting late‐cycle highly homozygous parents, each of which had similar GEBVs (and, therefore, similar genotypes) and (2) a poorly calibrated model that created a bottleneck with parents that had high EBVs but low TBVs (low accuracy). This phenomenon is visible in Figure S5D where many alleles are fixed, or nearly fixed (MAF near 1.0), indicating low variance. We also do not see the desired behavior of presumably favorable alleles becoming fixed over time. Instead, we see a very small shift in MAF, indicating the model was not able to successfully identify individuals with favorable alleles to advance through the breeding pipeline.

According to Figure 3, several of the pathway‐model combinations saw a plateau in gains as well as a dramatic (>40%) decrease in genetic variance, namely RRF2 Pathway 1, RRMIX Pathway 3, RRMIX Pathway 4, NNF2 Pathway 4, and RRF2 Pathway 4. For RRF2 Pathway 1, RRMIX Pathway 3, and RRMIX Pathway 4, the decrease in variance corresponded to an increase in gain, followed by a plateau in gain, indicating that useful variation existed in early cycles but was lost over the course of the three breeding cycles. This was not the case for NNF2 Pathway 4 and RRF2 Pathway 4 where the decrease in variance was high and the gains were also near zero. In this case, the model showed poor performance, decreasing genetic variance without increasing the frequency of desirable alleles. While GS has the potential to increase speed, accuracy, and rate of genetic gain, breeders should take the utmost caution in building a model because if the model has low accuracy, it will not pick out top performing lines, and instead pick out a narrow band of poor‐ or mid‐performing lines, decreasing both genetic value and variance in the population. A plateau in gain due to a loss of genetic variance under GS was also observed in simulation studies by Chiaravallotti et al. (2024), J. Lin et al. (2023), and Z. Lin et al. (2016). Similarly, higher genetic relatedness and inbreeding were observed by Bandillo et al. (2023) under a GS protocol when comparing GS to PS. In each of the pathways where the genetic variance was almost entirely depleted, late‐generation parent selections were made (at the F5), furthering the case for applying GS in a rapid‐cycle early generation selection context. This is not to say that late‐generation GEBV parent selections should not be considered as an implementation pathway. 
We observed pathways where late‐generation GEBV selections were still able to overcome the PS controls in terms of gain without eroding the majority of the variance (RRF5 Pathway 4 and NNF5 Pathway 4). A breeder's ability to either regenerate or introduce new variation is vital for achieving longevity in genetic gain, and the recombinants generated when conducting rapid cycle GS can help mitigate any losses in genetic variance, especially when crosses are made not only within biparental families but also between biparental families (Das et al., 2020; Dreisigacker et al., 2023; Zhang et al., 2017).

4.4. Variation in model performance

Similar to J. Lin et al. (2023), we saw a variation in prediction accuracy according to the selection strategy implemented. Overall, when RR was used as the prediction model, gains were highest (Figures 3 and 4). This is likely because when solving for RR coefficients, we can arrive at one closed‐form solution. However, when it comes to solving for the weights and biases in a NN, there is no single closed‐form solution, as the weights and biases update iteratively during training, making the model inherently less stable. This is evident in Figure 4C (Pathway 3), where the NN prediction accuracy varies from −0.06 to 0.74. In the case where NN prediction accuracy is very low, it is likely that the model was not able to arrive at the optimal weights and biases in the 100 iterations used for training, whereas weights and biases were in fact optimized in the case where higher accuracies were achieved.
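The closed‐form property of RR mentioned above can be sketched with a generic ridge regression on marker dosages. This is an illustration under stated assumptions and not the study's RR‐BLUP implementation: the penalty `lam` is a hypothetical fixed value (in practice it is usually estimated from variance components), and the marker matrix and phenotypes are random toy data.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Marker effects minimizing 0.5*||y - X b||^2 + 0.5*lam*||b||^2.

    For lam > 0 the matrix X'X + lam*I is positive definite, so the
    solution is unique even with more markers than individuals.
    """
    n_markers = X.shape[1]
    A = X.T @ X + lam * np.eye(n_markers)
    return np.linalg.solve(A, X.T @ y)

def ridge_gradient(X, y, beta, lam):
    """Gradient of the ridge objective at beta (zero at the optimum)."""
    return X.T @ (X @ beta - y) + lam * beta

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(20, 50)).astype(float)  # hypothetical dosages
y = rng.normal(size=20)                              # hypothetical phenotypes
lam = 1.0                                            # hypothetical penalty

beta = ridge_closed_form(X, y, lam)
grad = ridge_gradient(X, y, beta, lam)
print(np.allclose(grad, 0.0, atol=1e-6))
```

Because the same data and penalty always give the same coefficients, refitting RR does not depend on a random starting point or a fixed number of training iterations, which is the contrast with NN training drawn in the text.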

We expected NN to perform considerably worse than RR because, in general, nonparametric models are described as “data‐hungry,” requiring a larger training set to achieve accuracy. Another concern in this study is that linear algorithms are used to generate new genotypes, potentially biasing the results in favor of RR. However, this was not observed and, in many cases, NN performance was comparable to, or exceeded, RR performance. Sandhu et al. (2021) report that nonparametric models do not, in fact, always require a simple increase in data volume to perform well. They can be comparable to parametric models, even with a relatively small dataset. Gorjanc et al. (2018) also report that increasing training set size is not a guarantee of a continued increase in prediction accuracy. The cases where the NN performed better than RR are most likely a result of the model's ability to quantify the simulated epistatic interactions between loci as well as additive effects. Derbyshire et al. (2021) report that partitioning variance into additive and epistatic effects resulted in better BV estimations for sclerotinia resistance, which, like seed yield, is a complex quantitative trait. This partitioning is effectively done by the NN architecture during model training. The cases where the NN was not able to contend with RR were likely due to noise introduced by heterozygotic loci, indicated in Figure 2 where highest gains were achieved by the NN when F5 data were used to train the model and F5 parents were selected. While adding noise to an NN can be used as a tool to prevent overfitting, here the noise of the heterozygotic loci resulted in underfitting. We can see evidence of noisy heterozygous loci preventing overfitting when we look at the gains for NNF2. NNF2 gains were stable across all pathways. However, the gains were stable but relatively low, further indicating that optimal weights and biases were not reached. 
This could be a result of an insufficient number of training iterations or an insufficient number of training set observations.

Despite the challenges in reliable and stable NN training, there were cases where the NN was able to converge on the optimal model parameters. Across all instances of GEBV estimation, NN showed the highest observed prediction accuracy (0.85) as well as the lowest observed prediction accuracy (−0.25). A cross‐validation study using a common bean data set also saw a peak performance of up to 80% (Keller et al., 2020). NNs are capable of performing well in the task of GP but they are less stable than RR. Therefore, for achieving reliable gains over the entire course of a breeding program, RR continues to be regarded as the gold standard due to its transferability and reliability. NN may show higher performance in the future as methods for training a stable and transferable model are improved. Further, more advanced architectures such as transformers have shown improved prediction accuracy in recent years (Jubair et al., 2021; Ubbens et al., 2021; C. Wu et al., 2024). Further investigation evaluating the use of these models and their ability to quantify genetic effects may contribute to the success of GP in the future.

5. CONCLUSIONS

GS has the potential to increase the rate of genetic gain in a common bean breeding program. However, the end result of a GS breeding pipeline relies on choices made by the breeder in terms of (1) what germplasm to use as a training set, (2) what generation to use a newly trained model for making selections, (3) when to select parents for the next cycle, and (4) what type of prediction model to use. In general, the benefits of GS are maximized when a diverse training set is used to train a linear regression model, and the model is implemented early in the breeding cycle for parent selections. Another benefit of making early generation parental selections is that breeders will be frequently making crosses, developing new recombinants, selecting parents from a wider pool of candidates that have not undergone selection, and renewing genetic variance that can be rapidly diminished under a GS protocol. When creating a training set, breeders should consider not only distance from the training set in terms of the generation, but also difference from the training set in terms of allele frequency composition. As breeders have limited resources regarding what germplasm can be genotyped and phenotyped, it may be advisable to first assess available germplasm, genotyping capability, and available phenotype data and use this information as a starting point for designing a GS pipeline (in contrast to designing the pipeline first and then trying to retrofit the training set). While a regularized parametric model generally performs better than a regularized nonparametric model, there are cases where the nonparametric model (1) outperforms the parametric model, (2) results in more consistent gains than the parametric model, and (3) maintains a higher level of genetic variation while also maintaining considerable gains. 
Therefore, more investigation into these nonparametric models that are high performers and/or stable performers would be a useful next step in optimizing GS pipelines and ensuring top performing lines are accurately identified by the GP model.

AUTHOR CONTRIBUTIONS

Isabella Chiaravallotti: Data curation; formal analysis; investigation; methodology; software; validation; visualization; writing—original draft; writing—review and editing. Valerio Hoyos‐Villegas: Conceptualization; data curation; methodology; project administration; resources; supervision; validation; writing—original draft; writing—review and editing.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

Supporting information

Supplemental Figure 1: A visual representation of the neural network implemented in this study.

Supplemental Figure 2: The model performance (Pearson's Correlation Coefficient between the true and estimated breeding value) for all pathways and all models over the course of the simulated breeding program.

Supplemental Figure 3: Density plots of major allele frequency for each Pathway using RRF2 (A‐D), RRF5 (E‐H) and RRMIX (I‐L).

Supplemental Figure 4: The correlation between the EBV (estimated breeding value) and phenotype value for each model: Ridge Regression trained with F2 data (RRF2, plot A), Ridge Regression trained with F5 data (RRF5, plot B), Ridge Regression trained with F2 and F5 data (RRMIX, plot C), Neural Network trained with F2 data (NNF2), Neural Network trained with F5 data (NNF5) and Neural Network trained with F2 and F5 data (NNMIX).

Supplemental Figure 5: Density plots of major allele frequency for each Pathway using NNF2 (A‐D), NNF5 (E‐H) and NNMIX (I‐L).

Supplemental Figure 6: Density plots of major allele frequency for each Pathway using the PSF2 control (left) and the PSF5 control (right).

ACKNOWLEDGMENTS

The authors wish to acknowledge FRQNT and Mitacs for their financial support, as well as our colleagues Pietro Polinari‐Cassavin and Dr. Robert McGee for their contributions to the project's code and conceptualization, respectively.

Chiaravallotti, I., & Hoyos‐Villegas, V. (2025). Simulations of genomic selection implementation pathways in common bean (Phaseolus vulgaris L.) using parametric and nonparametric models. The Plant Genome, 18, e70142. 10.1002/tpg2.70142

Assigned to Associate Editor Diego Jarquin.

DATA AVAILABILITY STATEMENT

Data and code are available on GitHub at https://github.com/McGillHaricots/peas-andlove/tree/master/GSPathways.

REFERENCES

  1. Abdullah, M. M. H. , Marinangeli, C. P. F. , Jones, P. J. H. , & Carlberg, J. G. (2017). Canadian potential healthcare and societal cost savings from consumption of pulses: A cost‐of‐illness analysis. Nutrients, 9(7), 793. 10.3390/nu9070793 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alemu, A. , Åstrand, J. , Montesinos‐López, O. A. , Isidro Y Sánchez, J. , Fernández‐Gónzalez, J. , Tadesse, W. , Vetukuri, R. R. , Carlsson, A. S. , Ceplitis, A. , Crossa, J. , Ortiz, R. , & Chawade, A. (2024). Genomic selection in plant breeding: Key factors shaping two decades of progress. Molecular Plant, 17(4), 552–578. 10.1016/j.molp.2024.03.007 [DOI] [PubMed] [Google Scholar]
  3. Arriagada, O. , Arévalo, B. , Cabeza, R. A. , Carrasco, B. , & Schwember, A. R. (2022). Meta‐QTL analysis for yield components in common bean (Phaseolus vulgaris L.). Plants, 12(1), 117. 10.3390/plants12010117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Arriagada, O. , Arévalo, B. , Pacheco, I. , Schwember, A. R. , Meisel, L. A. , Silva, H. , Márquez, K. , Plaza, A. , Pérez‐Diáz, R. , Pico‐Mendoza, J. , Cabeza, R. A. , Tapia, G. , Fuentes, C. , Rodríguez‐Alvarez, Y. , & Carrasco, B. (2024). A past genetic bottleneck from Argentine beans and a selective sweep led to the race Chile of the common bean (Phaseolus vulgaris L.). International Journal of Molecular Sciences, 25(7), 4081. 10.3390/ijms25074081 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Assefa, T. , Assibi Mahama, A. , Brown, A. V. , Cannon, E. K. S. , Rubyogo, J. C. , Rao, I. M. , Blair, M. W. , & Cannon, S. B. (2019). A review of breeding objectives, genomic resources, and marker‐assisted methods in common bean (Phaseolus vulgaris L.). Molecular Breeding, 39, 1–23. 10.1007/s11032-018-0920-0 [DOI] [Google Scholar]
  6. Atanda, S. A. , Olsen, M. , Crossa, J. , Burgueño, J. , Rincent, R. , Dzidzienyo, D. , Beyene, Y. , Gowda, M. , Dreher, K. , Boddupalli, P. M. , Tongoona, P. , Danquah, E. Y. , Olaoye, G. , & Robbins, K. R. (2021). Scalable sparse testing genomic selection strategy for early yield testing stage. Frontiers in Plant Science, 12, 658978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Azodi, C. B. , Bolger, E. , McCarren, A. , Roantree, M. , de Los Campos, G. , & Shiu, S. H. (2019). Benchmarking parametric and machine learning models for genomic prediction of complex traits. G3: Genes, Genomes, Genetics, 9(11), 3691–3702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Bančič, J. , Greenspoon, P. , Gaynor, C. R. , & Gorjanc, G. (2025). Plant breeding simulations with AlphaSimR. Crop Science, 65(1), e21312. [Google Scholar]
  9. Bandillo, N. B. , Jarquin, D. , Posadas, L. G. , Lorenz, A. J. , & Graef, G. L. (2023). Genomic selection performs as effectively as phenotypic selection for increasing seed yield in soybean. The Plant Genome, 16(1), e20285. 10.1002/tpg2.20285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Barili, L. D. , do Vale, N. M. , e Silva, F. F. , Carneiro, J. E. d. S. , de Oliveira, H. R. , Vianello, R. P. , Valdisser, P. A. M. R. , & Nascimento, M. (2018). Genome prediction accuracy of common bean via Bayesian models. Ciência Rural, 48, e20170497. [Google Scholar]
  11. Basavaraja, T. , Pratap, A. , Dubey, V. , Gurumurthy, S. , Gupta, S. , & Singh, N. P. (2020). Molecular and conventional breeding strategies for improving biotic stress resistance in common bean. Accelerated Plant Breeding, 3, 389–421. 10.1007/978-3-030-47306-8 [DOI] [Google Scholar]
  12. Beaver, J. S. , & Osorno, J. M. (2009). Achievements and limitations of contemporary common bean breeding using conventional and molecular approaches. Euphytica, 168, 145–175. 10.1007/s10681-009-9911-x [DOI] [Google Scholar]
  13. Bekkering, E. (2014). Pulses in Canada . Canadian agriculture at a glance (Catalogue no. 96‐325‐X — No. 007). Statistics Canada. http://www.statcan.gc.ca
  14. Bernardo, R. (1994). Prediction of maize single‐cross performance using RFLPs and information from related hybrids. Crop Science, 34(1), 20–25. 10.2135/cropsci1994.0011183X003400010003x [DOI] [Google Scholar]
  15. Bernardo, R. , & Yu, J. (2007). Prospects for genomewide selection for quantitative traits in maize. Crop Science, 47(3), 1082–1090. 10.2135/cropsci2006.11.0690 [DOI] [Google Scholar]
  16. Berro, I. , Lado, B. , Nalin, R. S. , Quincke, M. , & Gutiérrez, L. (2019). Training population optimization for genomic selection. The Plant Genome, 12(3), 190028. 10.3835/plantgenome2019.04.0028 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Blair, M. W. , Soler, A. , & Cortés, A. J. (2012). Diversification and population structure in common beans (Phaseolus vulgaris L.). PLoS ONE, 7(11), e49488. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Chiaravallotti, I. , Lin, J. , Arief, V. , Jahufer, Z. , Osorno, J. M. , McClean, P. , Jarquin, D. , & Hoyos‐Villegas, V. (2024). Simulations of multiple breeding strategy scenarios in common bean for assessing genomic selection accuracy and model updating. The Plant Genome, 17(1), e20388. 10.1002/tpg2.20388 [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Chiorato, A. F. , Carbonell, S. A. M. , Vencovsky, R. , Fonseca, N. d. S., Jr. , & Pinheiro, J. B. (2010). Genetic gain in the breeding program of common beans at IAC from 1989 to 2007. Crop Breeding and Applied Biotechnology, 10, 329–336. [Google Scholar]
  20. Chollet, F. (2016). Building autoencoders in Keras. The Keras Blog. https://blog.keras.io/building‐autoencoders‐in‐keras.html [Google Scholar]
  21. Cortinovis, G. , Frascarelli, G. , Di Vittori, V. , & Papa, R. (2020). Current state and perspectives in population genomics of the common bean. Plants, 9(3), 330. 10.3390/plants9030330 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Crossa, J. , Campos, G. D. L. , Pérez, P. , Gianola, D. , Burgueño, J. , Araus, J. L. , Makumbi, D. , Singh, R. P. , Dreisigacker, S. , Yan, J. , Arief, V. , Banziger, M. , & Braun, H. J. (2010). Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics, 186(2), 713–724. 10.1534/genetics.110.118521 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Crossa, J. , Pérez‐Rodríguez, P. , Cuevas, J. , Montesinos‐López, O. , Jarquín, D. , de Los Campos, G. , Burgueño, J. , González‐Camacho, J. M. , Pérez‐Elizalde, S. , Beyene, Y. , Dreisigacker, S. , Singh, R. , Zhang, X. , Gowda, M. , Roorkiwal, M. , Rutkoski, J. , & Varshney, R. K. (2017). Genomic selection in plant breeding: Methods, models, and perspectives. Trends in Plant Science, 22(11), 961–975. [DOI] [PubMed] [Google Scholar]
  24. Das, R. R. , Vinayan, M. T. , Patel, M. B. , Phagna, R. K. , Singh, S. B. , Shahi, J. P. , Sarma, A. , Barua, N. S. , Babu, R. , Seetharam, K. , Burgueño, J. A. , & Zaidi, P. H. (2020). Genetic gains with rapid‐cycle genomic selection for combined drought and waterlogging tolerance in tropical maize (Zea mays L.). The Plant Genome, 13(3), e20035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. de Faria, L. C. , Melo, P. G. S. , de Souza, T. L. P. O. , Pereira, H. S. , & Melo, L. C. (2018). Efficiency of methods for genetic progress estimation in common bean breeding using database information. Euphytica, 214, 1–10. [Google Scholar]
  26. de los Campos, G. , Hickey, J. M. , Pong‐Wong, R. , Daetwyler, H. D. , & Calus, M. P. (2013). Whole‐genome regression and prediction methods applied to plant and animal breeding. Genetics, 193(2), 327–345. 10.1534/genetics.112.143313 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. de los Campos, G. , Naya, H. , Gianola, D. , Crossa, J. , Legarra, A. , Manfredi, E. , Weigel, K. , & Cotes, J. M. (2009). Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics, 182(1), 375–385. 10.1534/genetics.109.101501 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Derbyshire, M. C. , Khentry, Y. , Severn‐Ellis, A. , Mwape, V. , Saad, N. S. M. , Newman, T. E. , Taiwo, A. , Regmi, R. , Buchwaldt, L. , Denton‐Giles, M. , Batley, J. , & Kamphuis, L. G. (2021). Modeling first order additive × additive epistasis improves accuracy of genomic prediction for sclerotinia stem rot resistance in canola. The Plant Genome, 14(2), e20088. 10.1002/tpg2.20088 [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Dreisigacker, S. , Crossa, J. , Pérez‐Rodríguez, P. , Montesinos‐López, O. A. , Rosyara, U. , Juliana, P. , Mondal, S. , Crespo‐Herrera, L. , Govindan, V. , Singh, R. P. , & Braun, H.‐J. (2021). Implementation of genomic selection in the CIMMYT global wheat program, findings from the past 10 years. Crop Breeding, Genetics and Genomics, 3(2), e210005. [Google Scholar]
  30. Dreisigacker, S. , Pérez‐Rodríguez, P. , Crespo‐Herrera, L. , Bentley, A. R. , & Crossa, J. (2023). Results from rapid‐cycle recurrent genomic selection in spring bread wheat. G3: Genes, Genomes, Genetics, 13(4), jkad025. 10.1093/g3journal/jkad025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Endelman, J. B. (2011). Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome, 4(3). 10.3835/plantgenome2011.08.0024 [DOI] [Google Scholar]
  32. Fehr, W. (1991). Principles of cultivar development: Theory and technique (Vol. 1). Iowa State University. [Google Scholar]
  33. Fernández‐González, J. , Akdemir, D. , & Isidro y Sánchez, J. (2023). A comparison of methods for training population optimization in genomic selection. Theoretical and Applied Genetics, 136(3), 30. 10.1007/s00122-023-04265-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Garcia‐Abadillo, J. , Adunola, P. , Aguilar, F. S. , Trujillo‐Montenegro, J. H. , Riascos, J. J. , Persa, R. , Isidro Y Sanchez, J. , & Jarquín, D. (2024). Sparse testing designs for optimizing predictive ability in sugarcane populations. Frontiers in Plant Science, 15, 1400000. 10.3389/fpls.2024.1400000 [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Gaynor, R. C. (2023). AlphaSimR: An R package for breeding program simulations. G3: Genes, Genomes, Genetics, 11(2), jkaa017. 10.1093/g3journal/jkaa017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Gaynor, R. C. , Gorjanc, G. , & Hickey, J. M. (2021). AlphaSimR: An R package for breeding program simulations. G3: Genes, Genomes, Genetics, 11(2), jkaa017. 10.1093/g3journal/jkaa017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Gianola, D. , Crossa, J. , Gonzalez‐Recio, O. , & Rosa, G. J. M. (2022). Machine learning and genetic improvement of animals and plants: Where are we? In Proceedings of 12th World Congress on Genetics Applied to Livestock Production (WCGALP). Technical and species orientated innovations in animal breeding, and contribution of genetics to solving societal challenges (pp. 1676−1679). Wageningen Academic Publishers. [Google Scholar]
  38. Gorjanc, G. , Gaynor, R. C. , & Hickey, J. M. (2018). Optimal cross selection for long‐term genetic gain in two‐part programs with rapid recurrent genomic selection. Theoretical and Applied Genetics, 131, 1953–1966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Habier, D. , Fernando, R. L. , Kizilkaya, K. , & Garrick, D. J. (2011). Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics, 12, 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Heslot, N. , Yang, H. , Sorrells, M. E. , & Jannink, J. (2012). Genomic selection in plant breeding: A comparison of models. Crop Science, 52(1), 146–160. [Google Scholar]
  41. Hickey, J. M. , & Gorjanc, G. (2012). Simulated data for genomic selection and genome‐wide association studies using a combination of coalescent and gene drop methods. G3: Genes, Genomes, Genetics, 2(4), 425–427. 10.1534/g3.111.001297 [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Isidro, J. , Jannink, J. L. , Akdemir, D. , Poland, J. , Heslot, N. , & Sorrells, M. E. (2015). Training set optimization under population structure in genomic selection. Theoretical and Applied Genetics, 128, 145–158. 10.1007/s00122-014-2418-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Jannink, J.‐L. , Lorenz, A. J. , & Iwata, H. (2010). Genomic selection in plant breeding: From theory to practice. Briefings in Functional Genomics, 9(2), 166–177. [DOI] [PubMed] [Google Scholar]
  44. Jarquin, D. , Specht, J. , & Lorenz, A. (2016). Prospects of genomic prediction in the USDA soybean germplasm collection: Historical data creates robust models for enhancing selection of accessions. G3: Genes, Genomes, Genetics, 6(8), 2329–2341. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Jubair, S. , Tucker, J. R. , Henderson, N. , Hiebert, C. W. , Badea, A. , Domaratzki, M. , & Fernando, W. G. D. (2021). GPTransformer: A transformer‐based deep learning method for predicting Fusarium related traits in barley. Frontiers in Plant Science, 12, 761402. 10.3389/fpls.2021.761402 [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Keller, B. , Ariza‐Suarez, D. , de la Hoz, J. , Aparicio, J. S. , Portilla‐Benavides, A. E. , Buendia, H. F. , Mayor, V. M. , Studer, B. , & Raatz, B. (2020). Genomic prediction of agronomic traits in common bean (Phaseolus vulgaris L.) under environmental stress. Frontiers in Plant Science, 11, 1001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Kelly, J. D. (2010). The story of bean breeding (White paper prepared for BeanCAP & PBG Works on the topic of dry bean production and breeding research in the U.S.). Michigan State University. [Google Scholar]
  48. Kingma, D. P. , & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv. 10.48550/arXiv.1412.6980 [DOI]
  49. Lin, J. , Arief, V. , Jahufer, Z. , Osorno, J. , McClean, P. , Jarquin, D. , & Hoyos‐Villegas, V. (2023). Simulations of rate of genetic gain in dry bean breeding programs. Theoretical and Applied Genetics, 136(1), 14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Lin, Z. , Cogan, N. O. , Pembleton, L. W. , Spangenberg, G. C. , Forster, J. W. , Hayes, B. J. , & Daetwyler, H. D. (2016). Genetic gain and inbreeding from genomic selection in a simulated commercial breeding program for perennial ryegrass. The Plant Genome, 9(1), plantgenome2015.06.0046. 10.3835/plantgenome2015.06.0046 [DOI] [PubMed] [Google Scholar]
  51. Lopez‐Cruz, M. , Beyene, Y. , Gowda, M. , Crossa, J. , Pérez‐Rodríguez, P. , & de Los Campos, G. (2021). Multi‐generation genomic prediction of maize yield using parametric and non‐parametric sparse selection indices. Heredity, 127(5), 423–432. 10.1038/s41437-021-00474-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Meuwissen, T. H. , Hayes, B. J. , & Goddard, M. E. (2001). Prediction of total genetic value using genome‐wide dense marker maps. Genetics, 157(4), 1819–1829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Montesinos López, O. A. , Montesinos López, A. , & Crossa, J. (2022). General elements of genomic selection and statistical learning: Genomic selection. In Multivariate statistical machine learning methods for genomic prediction (pp. 1–34). Springer Nature. 10.1007/978-3-030-89010-0 [DOI] [PubMed] [Google Scholar]
  54. Montesinos‐López, O. A. , Montesinos‐López, A. , Pérez‐Rodríguez, P. , Barrón‐López, J. A. , Martini, J. W. R. , Fajardo‐Flores, S. B. , Gaytan‐Lugo, L. S. , Santana‐Mancilla, P. C. , & Crossa, J. (2021). A review of deep learning applications for genomic selection. BMC Genomics, 22, 1–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Morais, O. P., Jr. , Müller, B. S. F. , Valdisser, P. A. M. R. , Brondani, C. , & Vianello, R. P. (2023). Genomic prediction for drought tolerance using multienvironment data in a common bean (Phaseolus vulgaris) breeding program. Crop Science, 63(4), 2145–2161. [Google Scholar]
  56. Mordor Intelligence. (2024). Dry beans market insights. www.mordorintelligence.com/industry‐reports/dry‐beans‐market.
  57. Myers, J. R. , & Kmiecik, K. (2017). Common bean: Economic importance and relevance to biological science research. In de la Vega M. P., Santalla M., & Marsolais F. (Eds.), The common bean genome (pp. 1–20). Springer Nature. [Google Scholar]
  58. Norman, A. , Taylor, J. , Edwards, J. , & Kuchel, H. (2018). Optimising genomic selection in wheat: Effect of marker density, population size and population structure on prediction accuracy. G3: Genes, Genomes, Genetics, 8(9), 2889–2899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. O'Boyle, P. D. , Kelly, J. D. , & Kirk, W. W. (2007). Use of marker‐assisted selection to breed for resistance to common bacterial blight in common bean. Journal of the American Society for Horticultural Science, 132(3), 381–386. 10.21273/JASHS.132.3.381 [DOI] [Google Scholar]
  60. Parker, T. A. , Gallegos, J. A. , Beaver, J. , Brick, M. , Brown, J. K. , Cichy, K. , Debouck, D. G. , Delgado‐Salinas, A. , Dohle, S. , Ernest, E. , de Jensen, C. E. , Gomez, F. , Hellier, B. , Karasev, A. V. , Kelly, J. D. , McClean, P. , Miklas, P. , Myers, J. R. , Osorno, J. M. , … Gepts, P. (2022). Genetic resources and breeding priorities in Phaseolus beans: Vulnerability, resilience, and future challenges. Plant Breeding Reviews, 46, 289–420. 10.1002/9781119874157 [DOI] [Google Scholar]
  61. R Core Team. (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R‐project.org/ [Google Scholar]
  62. Raggi, L. , Caproni, L. , Carboni, A. , & Negri, V. (2019). Genome‐wide association study reveals candidate genes for flowering time variation in common bean (Phaseolus vulgaris L.). Frontiers in Plant Science, 10, 962. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Ray, S. , Jarquin, D. , & Howard, R. (2023). Comparing artificial‐intelligence techniques with state‐of‐the‐art parametric prediction models for predicting soybean traits. The Plant Genome, 16(1), e20263. 10.1002/tpg2.20263 [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Rincent, R. , Laloë, D. , Nicolas, S. , Altmann, T. , Brunel, D. , Revilla, P. , Rodríguez, V. M. , Moreno‐Gonzalez, J. , Melchinger, A. , Bauer, E. , Schoen, C. C. , Meyer, N. , Giauffret, C. , Bauland, C. , Jamin, P. , Laborde, J. , Monod, H. , Flament, P. , Charcosset, A. , & Moreau, L. (2012). Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: Comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics, 192(2), 715–728. 10.1534/genetics.112.141473 [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Rutkoski, J. E. , Poland, J. A. , Singh, R. P. , Huerta‐Espino, J. , Bhavani, S. , Barbier, H. , Rouse, M. N. , Jannink, J. , & Sorrells, M. E. (2014). Genomic selection for quantitative adult plant stem rust resistance in wheat. The Plant Genome, 7(3), plantgenome2014.02.0006. 10.3835/plantgenome2014.02.0006 [DOI] [Google Scholar]
  66. Sallam, A. H. , Endelman, J. B. , Jannink, J. L. , & Smith, K. P. (2015). Assessing genomic selection prediction accuracy in a dynamic barley breeding population. The Plant Genome, 8(1), plantgenome2014.05.0020. 10.3835/plantgenome2014.05.0020 [DOI] [PubMed] [Google Scholar]
  67. Sandhu, K. S. , Lozada, D. N. , Zhang, Z. , Pumphrey, M. O. , & Carter, A. H. (2021). Deep learning for predicting complex traits in spring wheat breeding program. Frontiers in Plant Science, 11, 613325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Schneider, K. A. , Brothers, M. E. , & Kelly, J. D. (1997). Marker‐assisted selection to improve drought resistance in common bean. Crop Science, 37(1), 51–60. [Google Scholar]
  69. Singh, S. P. (2001). Broadening the genetic base of common bean cultivars: A review. Crop Science, 41(6), 1659–1675. [Google Scholar]
  70. Song, Q. , Jia, G. , Hyten, D. L. , Jenkins, J. , Hwang, E. Y. , Schroeder, S. G. , Osorno, J. M. , Schmutz, J. , Jackson, S. A. , McClean, P. E. , & Cregan, P. B. (2015). SNP assay development for linkage map construction, anchoring whole‐genome sequence, and other genetic and genomic applications in common bean. G3: Genes, Genomes, Genetics, 5, 2285–2290. 10.1534/g3.115.020594 [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Ubbens, J. , Parkin, I. , Eynck, C. , Stavness, I. , & Sharpe, A. G. (2021). Deep neural networks for genomic prediction do not estimate marker effects. The Plant Genome, 14(3), e20147. 10.1002/tpg2.20147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Uebersax, M. A. , Cichy, K. A. , Gomez, F. E. , Porch, T. G. , Heitholt, J. , Osorno, J. M. , Kamfwa, K. , Snapp, S. S. , & Bales, S. (2023). Dry beans (Phaseolus vulgaris L.) as a vital component of sustainable agriculture and food security—A review. Legume Science, 5(1), e155. 10.1002/leg3.155 [DOI] [Google Scholar]
  73. Verges, V. L. , & Van Sanford, D. A. (2020). Genomic selection at preliminary yield trial stage: Training population design to predict untested lines. Agronomy, 10(1), 60. 10.3390/agronomy10010060 [DOI] [Google Scholar]
  74. Whittaker, J. C. , Thompson, R. , & Denham, M. C. (2000). Marker‐assisted selection using ridge regression. Genetics Research, 75(2), 249–252. 10.1017/S0016672399004462 [DOI] [PubMed] [Google Scholar]
  75. Wu, C. , Zhang, Y. , Ying, Z. , Li, L. , Wang, J. , Yu, H. , Zhang, M. , Feng, X. , Wei, X. , & Xu, X. (2024). A transformer‐based genomic prediction method fused with knowledge‐guided module. Briefings in Bioinformatics, 25(1), bbad438. 10.1093/bib/bbad438 [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Wu, X. , Wang, B. , Xin, Y. , Wang, Y. , Tian, S. , Wang, J. , Wu, X. , Lu, Z. , Qi, X. , Xu, L. , & Li, G. (2022). Unravelling the genetic architecture of rust resistance in the common bean (Phaseolus vulgaris L.) by combining QTL‐seq and GWAS analysis. Plants, 11(7), 953. 10.3390/plants11070953 [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Yu, K. , Park, S. J. , & Poysa, V. (2000). Marker‐assisted selection of common beans for resistance to common bacterial blight: Efficacy and economics. Plant Breeding, 119(5), 411–415. 10.1046/j.1439-0523.2000.00514.x [DOI] [Google Scholar]
  78. Zhang, X. , Pérez‐Rodríguez, P. , Burgueño, J. , Olsen, M. , Buckler, E. , Atlin, G. , Prasanna, B. M. , Vargas, M. , San Vicente, F. , & Crossa, J. (2017). Rapid cycling genomic selection in a multiparental tropical maize population. G3: Genes, Genomes, Genetics, 7(7), 2315–2326. 10.1534/g3.117.043141 [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Supplemental Figure 1: A visual representation of the neural network implemented in this study.

Supplemental Figure 2: Model performance (Pearson's correlation coefficient between the true and estimated breeding values) for all pathways and all models over the course of the simulated breeding program.

Supplemental Figure 3: Density plots of major allele frequency for each pathway using RRF2 (A‐D), RRF5 (E‐H), and RRMIX (I‐L).

Supplemental Figure 4: The correlation between the EBV (estimated breeding value) and phenotype value for each model: Ridge Regression trained with F2 data (RRF2, plot A), Ridge Regression trained with F5 data (RRF5, plot B), Ridge Regression trained with F2 and F5 data (RRMIX, plot C), Neural Network trained with F2 data (NNF2), Neural Network trained with F5 data (NNF5) and Neural Network trained with F2 and F5 data (NNMIX).

Supplemental Figure 5: Density plots of major allele frequency for each pathway using NNF2 (A‐D), NNF5 (E‐H), and NNMIX (I‐L).

Supplemental Figure 6: Density plots of major allele frequency for each Pathway using the PSF2 control (left) and the PSF5 control (right).

TPG2-18-e70142-s001.docx (1.2MB, docx)

Data Availability Statement

Data and code are available on GitHub at https://github.com/McGillHaricots/peas‐andlove/tree/master/GSPathways.


Articles from The Plant Genome are provided here courtesy of Wiley
