Abstract
Background
Genomic Best Linear Unbiased Prediction (GBLUP) assumes that all SNPs contribute equally to genetic variance, including those with minimal impact, limiting its accuracy. A major challenge in animal breeding is to develop more scientific models or leverage SNP priors to enhance existing prediction frameworks.
Results
We analyzed 122,672 SNPs from 16,122 Holstein cattle with estimated breeding values (EBVs) for nine traits. SNP weights from GWAS and BayesBπ analyses were incorporated into a non-linear model to develop the Dynamic Prior Attention Neural Network (DPAnet), and into GBLUP to construct the SNP-weighted GBLUP (WGBLUP). These were benchmarked against GBLUP, four Bayesian methods, support vector regression (SVR), and kernel ridge regression (KRR) using fivefold cross-validation with 5 repetitions. Specifically, DPAnet significantly improved average accuracy for FP, PP, and FL by 3.0%, 1.1%, and 1.1%, respectively, over GBLUP. WGBLUP_BayesBπ outperformed GBLUP across all traits, averaging a 1.1% gain in accuracy, notably 4.9% for FP, while WGBLUP_GWAS improved accuracy by 1.3% but at the cost of a 9.1% loss in unbiasedness. Overall, Bayesian models achieved the highest average accuracy (0.625 for BayesR). Even the lowest-performing Bayesian model (BayesCπ, 0.622) outperformed WGBLUP_BayesBπ, WGBLUP_GWAS, DPAnet and GBLUP by 0.8%, 0.6%, 2.2%, and 1.9%, respectively. For the three type traits, hyperparameter-optimized SVR (0.755), KRR (0.743), and DPAnet (0.741) ranked in the top three. However, all these advanced methods required, on average, more than six times the computational time of GBLUP, limiting their practical scalability.
Conclusion
In our dataset, BayesR achieved the highest predictive performance, while GBLUP maintained the best balance between accuracy and computational efficiency. Although weighted models performed well for some traits, their overall performance remained inferior to that of the traditional Bayesian models. As more causal SNPs for complex traits are identified, the predictive accuracy of weighted models is expected to further improve.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-025-12218-0.
Keywords: GBLUP, WGBLUP, DPAnet, Bayesian, Machine learning
Introduction
Genomic selection (GS) is applied to estimate direct genomic values (DGVs) via single nucleotide polymorphisms (SNPs) that cover the whole genome, enabling early and accurate selection of superior individuals without the need for progeny testing [1]. As the most efficient breeding technology, GS has been successfully implemented in both animal and plant breeding programs, especially in dairy cattle [2, 3], where it shortens the generation interval from 7 years to less than 2.5 years, thereby greatly reducing breeding costs and accelerating genetic progress [4, 5]. Genomic prediction accuracy is influenced by factors such as the reference population size [6, 7], marker density [1, 8], trait heritability [9], and statistical models [10–12]. Since 2009, genomic prediction strategies based on 50 K panels and the genomic best linear unbiased prediction (GBLUP) model have been widely applied for dairy cattle [13, 14]. However, despite its widespread use, GBLUP assumes all genetic effects are additive and does not adequately account for complex, non-linear genetic interactions such as dominance, epistasis, or genotype-by-environment effects, which can be important for many traits [15, 16]. To address these limitations and better model the diversity of genetic architectures found in complex traits, researchers have increasingly turned to Bayesian methods (BayesA [1], BayesBπ [1], BayesCπ [17], and BayesR [18]), which offer more flexible assumptions about marker effect distributions and more flexibility in addressing nonlinear effects in high-dimensional data. Some comparative analyses between GBLUP and different Bayesian methods conducted on real data indicate that Bayesian methods often outperform GBLUP in terms of predictive performance [11, 19, 20]. However, these methods require a large amount of computation to estimate their parameters [21].
Machine learning (ML) methods, including support vector regression (SVR) [22] and kernel ridge regression (KRR) [23], have demonstrated superior predictive performance over traditional models for certain traits in animals and plants, without requiring assumptions about marker effect distributions [10, 24, 25]. As a subclass of ML, multilayer perceptrons (MLP) and convolutional neural networks (CNN) can capture complex linkage disequilibrium structures and SNP interactions [26]. Some studies have shown that MLP outperforms GBLUP and Bayesian LASSO in both simulated and real animal datasets, particularly for traits with non-additive genetic architectures [27, 28]. However, MLP performance is inconsistent across datasets [29], and ML models often require trait-specific hyperparameter tuning, limiting their generalizability and increasing computational demands. In this context, the self-attention mechanism offers an alternative approach: by dynamically assigning weights to input features and capturing long-range dependencies and complex interactions, it is well suited for high-dimensional genomic data with polygenic effects [30]. In genomic selection, self-attention can enhance prediction accuracy by adaptively highlighting informative SNPs and modeling non-linear relationships across the genome.
In 2012, China officially launched the national genomic selection project for Holstein young bulls using GBLUP. The large-scale accumulation of genomic data has greatly facilitated the accurate identification of key genetic variations associated with traits. Previous studies have shown that incorporating SNP prior information into SNP-weighted GBLUP (WGBLUP) can moderately increase the prediction accuracy of certain traits [31–34]. Despite these advances, there remains a need for models that can efficiently integrate prior biological knowledge, such as SNP-level functional information, with flexible modeling frameworks. To address this, we propose a neural network approach, called the Dynamic Prior-Attention Network (DPAnet), which incorporates prior information within a deep learning architecture to further improve prediction accuracy. We systematically evaluated the predictive performance of linear (GBLUP), Bayesian (BayesA, BayesBπ, BayesCπ, BayesR), machine learning (SVR and KRR), weighted GBLUP models (WGBLUP) and DPAnet using a reference population of 16,122 Chinese Holstein cattle, providing valuable insights for improving genomic prediction accuracy.
Materials and methods
Dataset collection
To date, the reference population of the national genomic selection project for Holstein cattle contains 253 proven bulls and 15,869 Chinese Holstein cows born between 1984 and 2019 [35]. In this study, the dataset comprises SNP chip data from 16,122 Chinese Holstein cattle, whose estimated breeding values (EBVs) for nine traits were provided by the Dairy Data Center, Dairy Association of China (DAC). These traits included milk yield (MY), fat yield (FY), protein yield (PY), fat percentage (FP), protein percentage (PP), conformation (CONF), mammary system (MS), feet & legs (FL) and somatic cell score (SCS) [36]. De-regressed proof (DRP) values were calculated via the method described by Jairath et al. [37] (Supplementary Figure S1a-c).
The genotyping datasets for these animals consisted of three types of chips, including 5,726 individuals with the BovineSNP50 BeadChip (54,609 SNPs; Illumina Inc., San Diego, CA, USA), 1,505 animals with the GeneSeek GGP-bovine 80 K SNP BeadChip (76,883 SNPs; Neogen, Lincoln, NE, USA), and 8,891 animals with the GGP BovineSNP 150 K (139,376 SNPs; Neogen, Lincoln, NE, USA) (Supplementary Figure S1d-e). We imputed all individuals to the 150 K panel via Beagle v5.0 [38]. The genotype correlation (COR) and genotype concordance rate (CR) of the imputed SNPs were unaffected by MAF, with values above 0.9, whereas the dosage R-squared (DR2) metric exhibited lower accuracy (0.4 to 0.5) when MAF ≤ 0.05 (Supplementary Figure S2a-b). Consequently, we removed SNPs with a minor allele frequency (MAF) below 0.05, a Hardy–Weinberg equilibrium (HWE) test P-value below 1e-6, or a call rate below 0.90, and excluded individuals with call rates below 0.90 via PLINK [39]. Finally, 122,672 SNPs from all the animals were retained. The imputation performance from 50K_to_150K was characterized by CR, COR, and DR2 values of 0.985, 0.967, and 0.986, respectively, whereas imputation from 80K_to_150K yielded higher values of 0.987, 0.971, and 0.991, respectively, ensuring data reliability for subsequent analyses.
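The MAF and call-rate filters described above can be illustrated with a minimal Python sketch. This is not the PLINK procedure the authors ran: the function name `qc_filter`, the order in which individuals and SNPs are filtered, and the toy genotype matrix are assumptions made for illustration, and the HWE test is omitted for brevity.

```python
import numpy as np

def qc_filter(genotypes, maf_min=0.05, snp_call_min=0.90, ind_call_min=0.90):
    """Return boolean masks (keep_individuals, keep_snps).

    genotypes: (n_individuals, n_snps) array coded 0/1/2, with np.nan = missing.
    Individuals are filtered first, then SNPs (an assumed order).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    keep_ind = called.mean(axis=1) >= ind_call_min      # per-individual call rate
    g_kept, called_kept = g[keep_ind], called[keep_ind]
    snp_call = called_kept.mean(axis=0)                 # per-SNP call rate
    with np.errstate(invalid="ignore", divide="ignore"):
        # frequency of the coded allele = mean dosage / 2 over called genotypes
        freq = np.nansum(g_kept, axis=0) / (2 * called_kept.sum(axis=0))
    maf = np.minimum(freq, 1 - freq)
    keep_snp = (snp_call >= snp_call_min) & (maf >= maf_min)
    return keep_ind, keep_snp

# Toy data: SNP 3 is monomorphic, SNP 4 has a missing call.
geno = np.array([
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, 0, 2, np.nan],
    [1, 2, 2, 0],
])
keep_ind, keep_snp = qc_filter(geno, ind_call_min=0.70)
```

In the toy data the third SNP is dropped for MAF = 0 and the fourth for its call rate of 0.75, while every individual passes the relaxed individual threshold used here.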
Genomic prediction models
For the reference population of 16,122 Chinese Holstein cattle, 15,636 individuals were used for five production traits and the SCS trait, while 11,709 individuals were used for three type traits. All models were evaluated with a fivefold cross-validation procedure with 5 repetitions on an HP server (CentOS Linux 7, Intel(R) Xeon(R) Gold 6348 CPU @ 2.60 GHz) with 20 threads. After removing outliers from the results using the interquartile range, a Wilcoxon test was applied to assess the significance of differences.
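The validation scheme can be sketched in plain Python as follows. The helper names, the random seed, and the 1.5 × IQR fence (Tukey's rule) are assumptions, since the text only states that outliers were removed "using the interquartile range".

```python
import random
import statistics

def repeated_kfold(n_individuals, n_folds=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_individuals))
        rng.shuffle(order)                        # new random partition per repetition
        for fold in range(n_folds):
            valid = sorted(order[fold::n_folds])  # every n_folds-th shuffled index
            valid_set = set(valid)
            train = [i for i in range(n_individuals) if i not in valid_set]
            yield rep, fold, train, valid

def drop_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]

splits = list(repeated_kfold(n_individuals=23))   # 5 folds x 5 repetitions
accuracies = [0.60, 0.61, 0.62, 0.61, 0.60, 0.10] # hypothetical per-fold results
kept = drop_iqr_outliers(accuracies)
```

Each repetition partitions all individuals exactly once, so five folds with five repetitions give 25 train/validation rounds per trait, matching the 25 GWAS runs per trait mentioned for WGBLUP.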
GBLUP
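GBLUP is one of the most extensively used regression methods for genomic prediction. The statistical model of GBLUP can be written as:
$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e} \tag{1}$$
where $\mathbf{y}$ is the vector of DRPs of genotyped individuals, $\mu$ is the overall mean, $\mathbf{1}$ is a vector of ones, $\mathbf{g}$ is the vector of DGVs, $\mathbf{e}$ is the vector of random errors, and $\mathbf{Z}$ is an incidence matrix allocating records to $\mathbf{g}$. The random effects follow the normal distributions $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_a^2)$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$, where $\mathbf{G}$ is the genomic relationship matrix, $\mathbf{I}$ is the identity matrix, and $\sigma_a^2$ and $\sigma_e^2$ are the additive genetic variance and the residual variance, respectively. The genetic relationship between individuals i and j is calculated as follows [40]:
$$G_{ij} = \frac{\sum_{k=1}^{m} (M_{ik} - 2p_k)(M_{jk} - 2p_k)}{2\sum_{k=1}^{m} p_k (1 - p_k)} \tag{2}$$
where m is the number of markers, M is the numerical genotype matrix (the AA, AB, and BB genotypes are coded as 0, 1, and 2, respectively), and $p_k$ is the frequency of the coded allele at marker k. GBLUP was implemented via bwgs [41].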
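As an illustration of how such a genomic relationship matrix is built, the following NumPy sketch follows the standard VanRaden form [40]. It is not the bwgs implementation; the function name and the choice to estimate allele frequencies from the same genotype matrix are assumptions.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from a 0/1/2 genotype matrix (individuals x markers)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0          # coded-allele frequency per marker (from the sample)
    Z = M - 2.0 * p                   # centre each marker by twice its allele frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(6, 50)).astype(float)   # toy genotypes
G = vanraden_g(M)
```

Because each marker column is centred on its own sample frequency, the rows of G sum to zero here; with frequencies taken from a base population this would not hold exactly.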
Bayesian methods
Bayesian methods primarily differ in their assumptions about the distribution of SNP effects. BayesA assumes that all SNP effects follow a normal distribution with a mean of zero and different variances and that the effect variance follows an inverse chi-square distribution (Table 1). BayesBπ assumes that a proportion 1-π of SNPs have non-zero effects, each with its own variance following an inverse chi-square distribution with scale parameter S and degrees of freedom ν, where π is not fixed but estimated from the data. BayesCπ is similar to BayesBπ, but the SNPs with non-zero effects share the same variance. BayesR divides markers into four groups on the basis of their effects: large, medium, small, and none. The variance differs between groups in a graded fashion.
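Table 1. Assumption of effect size distribution of markers for the methods in hibayes [43]
| Model | Formula | Joint distribution |
| --- | --- | --- |
| BayesA | $\beta_j \sim N(0, \sigma_j^2)$, $\sigma_j^2 \sim \chi^{-2}(\nu, S)$ | t |
| BayesBπ | $\beta_j \sim \pi\delta_0 + (1-\pi)N(0, \sigma_j^2)$, $\sigma_j^2 \sim \chi^{-2}(\nu, S)$ | Point-t |
| BayesCπ | $\beta_j \sim \pi\delta_0 + (1-\pi)N(0, \sigma^2)$, $\sigma^2 \sim \chi^{-2}(\nu, S)$ | t mixture |
| BayesR | $\beta_j \sim \pi_1\delta_0 + \sum_{k=2}^{4}\pi_k N(0, \sigma_k^2)$ | Point-normal mixture |
$\delta_0$ represents an effect size equal to zero; $\nu$ and $S$ are the degrees of freedom and scale parameter of the inverse chi-square distribution; t represents Student's t-distribution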
For Bayesian methods, the Markov chain Monte Carlo (MCMC) chain length was set to 20,000, with a burn-in of 12,000 and a thinning interval of 5. All Bayesian models were implemented via hibayes [42].
WGBLUP
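First, owing to the five-fold cross-validation with 5 repetitions, we conducted GWAS analysis via GCTA [43] software on the training set for each round of testing across 9 traits, resulting in a total of 25 GWAS runs per trait. The model is as follows:
$$\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{u} + \mathbf{e} \tag{3}$$
where $\mathbf{y}$ is the vector of DRPs; $\boldsymbol{\alpha}$ is the vector of effects of the top five principal components, and $\mathbf{W}$ is an incidence matrix allocating records to $\boldsymbol{\alpha}$; $\mathbf{x}$ is a variable for SNP genotypes, corresponding to three genotypes (0, 1 and 2), and $\beta$ is the effect of the tested SNP; $\mathbf{u}$ is the vector of random polygenic effects, with $\mathbf{u} \sim N(\mathbf{0}, \mathbf{G}\sigma_u^2)$, where $\mathbf{G}$ is the genomic relationship matrix based on 150 K SNPs and $\sigma_u^2$ is the polygenic variance; and $\mathbf{e}$ is the vector of random residual effects, with $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}\sigma_e^2)$, where $\sigma_e^2$ is the residual variance and $\mathbf{D}$ is a diagonal matrix with inverse weights for DRP to account for heterogeneous accuracy.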
The SNP weights from the GWAS summary statistics and BayesBπ were calculated by the method described by Meuwissen et al. [31]:
$$w_k = \frac{\pi \exp(\overline{LR}_k)}{\pi \exp(\overline{LR}_k) + (1 - \pi)} \tag{4}$$
where $w_k$ represents the weight of each SNP; $\pi$ is the prior probability that a SNP has a substantial effect; and $\overline{LR}_k$ denotes the moving average of the log-likelihood ratio (LR) over five SNPs, where LR can be calculated by $LR = \frac{1}{2}(\hat{\beta}/se)^2$, in which $\hat{\beta}$ is the SNP effect estimate and se is its standard error. When SNP priors come from GWAS, $\pi$ is set to 0.001, while when they come from BayesBπ, $\pi$ is the value estimated by BayesBπ. When the SNP priors come from the GWAS summary statistics, we also calculate the weight by
$$w_k = -\log_{10}(P_k) \tag{5}$$
where $P_k$ is the GWAS P-value of the k-th SNP.
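A minimal sketch of this weighting step is given below. It assumes that the log-likelihood ratio of a SNP is 0.5·(β̂/se)² and that the weight is the posterior probability obtained by combining the smoothed log-likelihood ratio with the prior π; both expressions should be checked against Meuwissen et al. [31]. The function name and the zero-padded moving average at the chromosome ends are further assumptions.

```python
import numpy as np

def snp_weights_from_gwas(beta, se, prior=0.001, window=5):
    """Posterior-probability style SNP weights from GWAS effects and standard errors."""
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    log_lr = 0.5 * (beta / se) ** 2                      # assumed log-likelihood ratio
    kernel = np.ones(window) / window
    smoothed = np.convolve(log_lr, kernel, mode="same")  # moving average (zero-padded ends)
    # pi*exp(L) / (pi*exp(L) + 1 - pi), written as a sigmoid of the log-odds for stability
    log_odds = np.log(prior) - np.log1p(-prior) + smoothed
    return 1.0 / (1.0 + np.exp(-log_odds))

beta = np.zeros(21)
beta[10] = 0.5                      # one hypothetical large-effect SNP
se = np.full(21, 0.1)
w = snp_weights_from_gwas(beta, se)
```

With no signal at all, every weight equals the prior (0.001 here); a strong signal raises the weights of the SNP and its neighbours within the smoothing window.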
WGBLUP assigns each SNP a separate weight, replacing the implicit identity weighting of SNPs in the GBLUP G-matrix, and then predicts the DGV via Eq. 1. The derivation of weighting SNPs in constructing a kinship matrix was described by Su et al. [44] and can be formulated as follows:
$$G^{w}_{ij} = \frac{\sum_{k=1}^{m} w_k (M_{ik} - 2p_k)(M_{jk} - 2p_k)}{2\sum_{k=1}^{m} p_k (1 - p_k)} \tag{6}$$
where $w_k$ is the weight of the k-th SNP.
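The weighted relationship matrix can be sketched in NumPy as below, assuming the unnormalised form in which each centred marker is multiplied by its weight before the cross-product; Su et al. [44] may additionally rescale the weights, so the function and the toy weights are illustrative only.

```python
import numpy as np

def weighted_g(M, w):
    """SNP-weighted genomic relationship matrix (individuals x individuals)."""
    M = np.asarray(M, dtype=float)
    w = np.asarray(w, dtype=float)
    p = M.mean(axis=0) / 2.0                  # coded-allele frequency per marker
    Z = M - 2.0 * p                           # centred genotypes
    return (Z * w) @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(5, 40)).astype(float)   # toy genotypes
w = rng.uniform(0.0, 1.0, size=40)                   # hypothetical SNP weights
Gw = weighted_g(M, w)
```

Setting all weights to one recovers the unweighted G-matrix, which is a convenient sanity check when swapping weight sources (GWAS, BayesBπ, or P-values).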
Machine learning
Support Vector Regression (SVR)
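SVR employs linear or nonlinear kernel functions to map the SNP dataset into a higher-dimensional feature space for modelling and prediction. The SVR model is as follows:
$$f(x) = \beta_0 + h(x)^{T}\beta \tag{7}$$
in which $h(x)$ is the feature map induced by the kernel function, $\beta$ is the vector of weights, and $\beta_0$ is the bias. Generally, the formalized SVR is given by minimizing the following restricted loss function: $\min_{\beta_0, \beta} \frac{1}{2}\lVert\beta\rVert^2 + C\sum_{i=1}^{n} V_{\varepsilon}(y_i - f(x_i))$, in which $V_{\varepsilon}(r) = 0$ if $|r| < \varepsilon$ and $V_{\varepsilon}(r) = |r| - \varepsilon$ otherwise. Here $V_{\varepsilon}$ is the ε-insensitive loss and C (cost parameter) is the regularization constant that controls the trade-off between prediction error and model complexity. y is a quantitative response, and ||·|| is the norm in Hilbert space. After optimization, the final form of SVR can be written as $f(x) = \beta_0 + \sum_{i=1}^{n} (\hat{\alpha}_i^{*} - \hat{\alpha}_i) k(x, x_i)$, in which $k(x, x_i) = h(x)^{T} h(x_i)$ is the kernel function.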
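To make the ε-insensitive loss and the role of C concrete, a small NumPy sketch of the primal objective is shown here. It only evaluates the objective for given parameters and is not a solver; the function names and the default ε are assumptions.

```python
import numpy as np

def epsilon_insensitive(residual, epsilon=0.1):
    """Zero inside the epsilon tube, linear outside it."""
    return np.maximum(np.abs(residual) - epsilon, 0.0)

def svr_objective(beta, beta0, H, y, C=1.0, epsilon=0.1):
    """0.5*||beta||^2 + C * sum of epsilon-insensitive losses, with H the mapped features."""
    f = beta0 + H @ beta
    return 0.5 * beta @ beta + C * epsilon_insensitive(y - f, epsilon).sum()

losses = epsilon_insensitive(np.array([-0.05, 0.1, 0.3, -1.0]), epsilon=0.1)
value = svr_objective(np.zeros(2), 0.0, np.eye(2), np.array([1.0, 0.0]), C=2.0)
```

Residuals smaller than ε cost nothing, so only points outside the tube (the support vectors) influence the fit, and a larger C penalises those points more heavily relative to the norm of β.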
Kernel Ridge Regression (KRR)
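KRR is a nonlinear regression method that employs a nonlinear kernel function to map data to a higher-dimensional kernel space, where ridge regression is applied so that the relationship can be modelled linearly. The linear model in this kernel space is determined by minimizing the mean squared error with ridge regularization. The KRR model is as follows:
$$\hat{y}(x_i) = \mathbf{k}'(\mathbf{K} + \lambda\mathbf{I})^{-1}\mathbf{y} \tag{8}$$
where $\mathbf{K}$ is the Gram matrix with entries $K_{jl} = k(x_j, x_l)$; λ is the ridge parameter; $\mathbf{I}$ is the identity matrix; $\mathbf{k}'$ is the row vector with elements $k(x_i, x_j)$ with j = 1,2,3,…,n, n is the number of training samples, and $x_i$ is the test sample. In the expanded form, $\hat{y}(x_i) = \sum_{j=1}^{n} \alpha_j k(x_i, x_j)$ with $\boldsymbol{\alpha} = (\mathbf{K} + \lambda\mathbf{I})^{-1}\mathbf{y}$.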
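The closed-form KRR prediction can be written in a few lines of NumPy. The sketch below uses an RBF kernel as one of the kernels listed in the tuning grid; the function names and toy data are assumptions, and the study itself used scikit-learn.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit_predict(X_train, y_train, X_test, lam=1.0, gamma=0.01):
    """Solve (K + lam*I) alpha = y, then predict k' alpha for each test sample."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)
    return rbf_kernel(X_test, X_train, gamma) @ alpha

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.0], [1.0, 1.0]])  # toy genotypes
y = np.array([0.3, -0.1, 0.8, 0.2])                             # toy DRPs
fit_weak_ridge = krr_fit_predict(X, y, X, lam=1e-10, gamma=1.0)
fit_strong_ridge = krr_fit_predict(X, y, X, lam=1e6, gamma=1.0)
```

With a negligible ridge parameter the model interpolates the training data, while a very large one shrinks every prediction towards zero, which is the trade-off the alpha hyperparameter controls.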
Hyperparameter optimization
For each trial, we employed the Bayesian optimization strategy implemented in the Optuna package to identify the optimal hyperparameters for SVR and KRR using the training data [45]. In SVR, model fitting ability, nonlinear mapping capacity, and generalization performance were improved by tuning the kernel, gamma, and regularization parameter C (kernel ∈ {"linear", "rbf", "poly", "sigmoid"}, gamma ∈ [1e-3, 10], and C ∈ [0.1, 100]). Similarly, in KRR, model fitting, nonlinear modeling, and regularization strength were enhanced by adjusting the kernel, gamma, and alpha parameters (kernel ∈ {"linear", "rbf", "poly", "sigmoid"}, gamma ∈ [1e-3, 10], and alpha ∈ [0, 1]). All ML models were implemented using the scikit-learn library in Python [46].
Dynamic prior-attention network
Multilayer perceptron (MLP) is a feedforward artificial neural network widely used in genomic prediction [27, 47]. An MLP consists of an input layer, one or more hidden layers, and an output layer. Each layer comprises multiple neurons (nodes), with each neuron connected to all neurons in the previous layer, receiving and transmitting signals. Input signals are passed through weighted connections, compared with thresholds, and activated by corresponding functions to generate output values. In this study, we propose the Dynamic Prior-Attention Network (DPAnet), which is based on the MLP architecture and integrates a single-head attention mechanism with Bayesian residual connections. DPAnet can be described by the following formula:
$$\hat{y} = W_{o} h_{L} + b_{o} \tag{9}$$
where $\hat{y}$ is the DGV of the trait; $W_{o}$ is the output layer weight matrix; $h_{L}$ is the final hidden layer feature vector; and $b_{o}$ is the output layer bias term.
The dynamic attention layer is designed to dynamically re-scale and weight each individual SNP based on its prior biological importance. It can be described as:
$$a_j = \sigma\big(W_2\,\mathrm{LeakyReLU}(W_1 [P_j, \mathrm{MAF}_j, \beta_j]^{T})\big) \tag{10}$$
where $a_j$ is the attention weight estimated for the j-th SNP, $a_j \in [0, 1]$; $\sigma$ denotes the Sigmoid activation function; $P_j$, $\mathrm{MAF}_j$ and $\beta_j$ represent the P-value of the j-th SNP estimated by GWAS, the minor allele frequency, and the effect value estimated by BayesB, respectively; LeakyReLU is the Leaky Rectified Linear Unit; and $W_1$ and $W_2$ represent the weight matrices for the first and second fully connected layers, respectively, where $W_1 \in \mathbb{R}^{d \times 3}$ and $W_2 \in \mathbb{R}^{1 \times d}$, with d the width of the hidden attention layer.
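The prior-driven attention step can be sketched as a NumPy forward pass. Everything specific here is an assumption made for illustration: the hidden width d, the absence of bias terms, the use of −log10(P) as the first prior feature, random rather than trained weights, and the element-wise rescaling of genotypes by the attention weights. The published implementation may differ.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_weights(priors, W1, W2):
    """priors: (n_snps, 3); W1: (d, 3); W2: (1, d). Returns one weight in (0, 1) per SNP."""
    hidden = leaky_relu(priors @ W1.T)        # first fully connected layer
    return sigmoid(hidden @ W2.T).ravel()     # second layer squashed to (0, 1)

rng = np.random.default_rng(0)
n_snps, d = 8, 4
priors = np.column_stack([
    rng.uniform(0.0, 8.0, n_snps),    # GWAS significance, -log10(P)
    rng.uniform(0.05, 0.5, n_snps),   # minor allele frequency
    rng.normal(0.0, 0.1, n_snps),     # BayesB effect estimate
])
W1 = rng.normal(0.0, 0.5, (d, 3))     # untrained, random weights for illustration
W2 = rng.normal(0.0, 0.5, (1, d))
a = attention_weights(priors, W1, W2)
genotypes = rng.integers(0, 3, (5, n_snps)).astype(float)
reweighted = genotypes * a            # each SNP column scaled by its attention weight
```

Because the two small matrices are shared across all SNPs, the number of attention parameters does not grow with the number of markers; only the three prior features differ from SNP to SNP.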
The forward propagation of the Bayesian residual layer is defined by the following formula:
![]() |
11 |
where the weight initialization for
in the residual layer is as follows
, and BN is batch normalization.
is the learnable projection matrix with dimensions [Nhidden, NSNP] to unify the dimensions of the residual network, where Nhidden denotes the number of neurons in the first hidden layer and NSNP is the number of input SNP.
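The fully connected network can be described as:
$$h^{(l)} = \phi\big(W^{(l)} h^{(l-1)} + b^{(l)}\big) \tag{12}$$
where $W^{(l)}$ is the weight matrix of the current layer; $h^{(l-1)}$ is the output from the previous layer; and $\phi$ and $b^{(l)}$ denote the activation function and the bias term of the layer.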
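The loss function is defined by the following formula:
$$\mathcal{L} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{13}$$
where y is the DRP of the trait, $\hat{y}$ is the predicted DGV, and n is the number of individuals.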
We also used Optuna for hyperparameter optimization. The training range for the learning rate was [1e-5, 1e-2], with batch sizes of [32, 64, 128, 256, 512]. The two hidden layers had sizes of [64, 512] and [32, 256], dropout rates ranged from [0.1, 0.7], and weight decay was set between [1e-7, 1e-2]. We performed 100 trials, each consisting of 100 epochs, with an early stopping patience of 15 trials. If there was no improvement for 20 trials, the tuning process was terminated early. Afterward, the model was retrained using the optimal hyperparameters from the training process, with 200 epochs and early stopping after 30 epochs without improvement.
Evaluation indicators
The accuracy of genomic prediction is measured by the correlation between the DGV and DRP, which is calculated via the following formula:
$$r = \frac{\mathrm{cor}(\mathrm{DGV}, \mathrm{DRP})}{\bar{r}_{\mathrm{DRP}}} \tag{14}$$
where $\bar{r}_{\mathrm{DRP}}$ is the average DRP accuracy. Unbiasedness is measured by the regression coefficient of the DRP on the DGV and can be calculated via
$$b = \frac{\mathrm{cov}(\mathrm{DGV}, \mathrm{DRP})}{\mathrm{var}(\mathrm{DGV})} \tag{15}$$
The DGV is an unbiased estimate of the DRP when b = 1.
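Both indicators are straightforward to compute; the NumPy sketch below assumes that accuracy is the DGV–DRP correlation divided by the mean DRP accuracy and that unbiasedness is the slope of the regression of DRP on DGV. The function names and toy values are hypothetical.

```python
import numpy as np

def prediction_accuracy(dgv, drp, mean_drp_accuracy=1.0):
    """Correlation between DGV and DRP, scaled by the mean DRP accuracy (assumed form)."""
    return np.corrcoef(dgv, drp)[0, 1] / mean_drp_accuracy

def unbiasedness(dgv, drp):
    """Slope of the regression of DRP on DGV; 1 indicates no inflation or deflation."""
    dgv = np.asarray(dgv, dtype=float)
    drp = np.asarray(drp, dtype=float)
    return np.cov(drp, dgv, ddof=1)[0, 1] / np.var(dgv, ddof=1)

dgv = np.array([0.1, 0.4, 0.2, 0.9, 0.5])
drp = 2.0 * dgv + 1.0                      # toy DRPs, perfectly linear in the DGV
acc = prediction_accuracy(dgv, drp)
slope = unbiasedness(dgv, drp)
```

In this toy case the correlation is perfect but the slope is 2, showing that a model can rank animals correctly while its DGVs remain too compressed; this is why accuracy and unbiasedness are reported separately.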
Data and code availability
The computational codes are freely available via https://github.com/bioramen-Blip/GS-Benchmark-analysis.
Results
GWAS result
To minimize potential information leakage from the validation set, GWAS was performed exclusively within the training population for each test fold. The genomic inflation factor (λ) across analyses averaged 0.963 (Supplementary Table S1), indicating well-controlled population structure and the absence of systematic inflation in the GWAS results. We identified 478 genome-wide significant SNPs across 9 traits (P-value < 4.08e-07). Among these, 61, 16, 55, 147, 159, 39 and 1 SNPs were identified for MY, FY, PY, FP, PP, SCS and CONF, respectively (Fig. 1; Supplementary Table S2). Notably, 37 genes associated with production traits were identified consistently across all testing rounds, including well-known causal genes such as DGAT1 and EEF1D on chromosome 14, ABCG2 on chromosome 6, and GHR on chromosome 20, confirming the reliability of the GWAS results (Fig. 1a-b). Enrichment analysis showed significant enrichment of GO terms related to acetylcholine receptor regulator activity, neurotransmitter receptor regulator activity, and signaling receptor inhibitor activity (Supplementary Table S3). Furthermore, our previous single-cell atlas comprising 1,793,854 cells from 59 tissues and 131 cell types demonstrated a strong association between production traits and excitatory neurons in the brain [48], underscoring the role of the central nervous system in lactation regulation. These findings reflect the complex regulatory role of the brain in the lactation function of dairy cows. By contrast, no stable significant SNPs were detected for the three type traits (Fig. 1c-d).
Fig. 1.
GWAS analysis for nine traits. QQ plots (a, c) and Manhattan plots (b, d) of -log10(P-value) for the five production traits and SCS (a, b) and the three type traits (c, d). The black horizontal line denotes the genome-wide significance level
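The two GWAS diagnostics quoted above can be reproduced with the Python standard library. The significance threshold of 4.08e-07 is numerically consistent with a Bonferroni correction of 0.05 over the 122,672 retained SNPs, although the text does not name the correction, so treating it as Bonferroni is an assumption. The inflation factor uses the usual definition (median observed 1-df chi-square statistic over its expected median); the function names are illustrative.

```python
from statistics import NormalDist, median

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

def genomic_inflation(p_values):
    """Lambda = median observed chi-square (1 df) / expected median (about 0.455)."""
    nd = NormalDist()
    # two-sided P-value -> chi-square with 1 df via the squared normal quantile
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

threshold = bonferroni_threshold(122_672)
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]   # P-values under the null
lam = genomic_inflation(uniform_p)
```

Uniformly distributed P-values give λ close to 1, so an average of 0.963 indicates slight deflation rather than inflation of the test statistics.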
Incorporating SNP priors to increase genomic prediction accuracy
We incorporated SNP weights from GWAS into WGBLUP using two approaches (Eqs. 4 and 5). WGBLUP_GWAS, which derived weights based on SNP effects using log-likelihood ratios, achieved an average prediction accuracy and unbiasedness of 0.615 and 0.908, respectively (Fig. 2a-b; Supplementary Table S4). Specifically, WGBLUP_GWAS significantly outperformed GBLUP in average accuracy by 1.5%, 1.3%, 5.9%, 4.5%, and 1.5% (P-value < 0.05) for MY, FY, FP, PP, and SCS, respectively, but decreased accuracy by 1% for FL. However, the unbiasedness significantly decreased across all traits, with an average reduction of 9.14%. Using GWAS-derived P-values, WGBLUP_P-value reached an average accuracy of 0.588 and unbiasedness of 0.714, representing decreases of 1.4% and 27.9% relative to GBLUP. Only FP and PP showed significant accuracy gains of 3.1% and 1.7%, respectively.
Fig. 2.
Integration of SNP prior information in WGBLUP and DPAnet. a The box plot shows the prediction accuracy of GBLUP, DPAnet, and three WGBLUP models incorporating SNP prior across 9 traits in dairy cattle. WGBLUP_P-value represents SNP weight derived from GWAS -log10(P-value) (Eq. (5)), while WGBLUP_GWAS and WGBLUP_BayesB represent SNP priors derived from GWAS and BayesB estimated effects, respectively, with SNP weight calculated by Eq. (4). Using GBLUP as the baseline, statistical significance between methods is tested by Wilcoxon test, where * indicates P-value < 0.05, ** indicates P-value < 0.01, and *** indicates P-value < 0.001. Absence of a line indicates no significant difference. b The bar chart shows the prediction unbiasedness of GBLUP, DPAnet, and three WGBLUP models incorporating SNP prior information for 9 traits in dairy cattle
Second, we estimated SNP effects via BayesBπ on the same population as the GWAS, obtaining results largely consistent with GWAS. For the five milk production traits and SCS, large-effect SNPs were detected on chromosomes 5, 6, 14, and 20 (Supplementary Figure S3). In contrast, only some small-effect SNPs (|effect| < 0.5) were observed for the three type traits. Incorporating this prior information, WGBLUP_BayesBπ achieved an average accuracy of 0.614 and an unbiasedness of 0.990 (Fig. 2a, b; Supplementary Table S4). WGBLUP_BayesBπ outperformed GBLUP across all traits, with an average accuracy gain of 1.1% and significant improvements for FY (1.4%), FP (4.9%), and PP (1.5%) (P-value < 0.05). Although the unbiasedness slightly decreased by 0.2% on average, this change was not statistically significant.
Previous studies suggest that traditional neural networks can capture non-additive genetic effects such as dominance and epistasis, but often underperform compared to linear models when predicting traits dominated by additive effects [27–29]. In this study, we incorporated GWAS significance level (-log10(P-value)), SNP effect estimates (β) from BayesBπ, and minor allele frequency (MAF) as three-dimensional prior feature vectors into an MLP framework. A single-head attention mechanism was used to dynamically adjust SNP-specific weights, yielding the DPAnet model. DPAnet significantly improved prediction accuracy for FL (1.1%), FP (3.0%), and PP (1.1%), performed comparably to GBLUP for CONF and MS, but decreased accuracy for MY (−1.9%), PY (−2.2%), FY (−1.1%), and SCS (−2.7%). Overall, its average accuracy and unbiasedness were 0.599 and 0.863, corresponding to reductions of 0.3% and 12.9% relative to GBLUP.
In general, WGBLUP_BayesBπ showed the best prediction performance, improving average accuracy by 1.1% over GBLUP without loss of unbiasedness. In contrast, DPAnet achieved the highest prediction accuracy for type traits, reaching an average accuracy of 0.742, a 0.4% improvement over both GBLUP and WGBLUP_BayesBπ.
Distribution of SNP weight
The differences in prediction performance across traits among the three weighted models may be explained by the underlying genetic architectures. For FP, all three methods assigned higher weights to SNPs located on chromosome 5 (SLC15A5), chromosome 6 (ABCG2), chromosome 14 (DGAT1, EEF1D), and chromosome 20 (GHR). These genes have been confirmed to be causally related to production traits (Fig. 3), likely explaining the superior accuracy of the weighted models over GBLUP. For the three yield traits and the health trait (SCS), WGBLUP effectively highlighted key SNPs, whereas DPAnet failed to prioritize these functional loci, resulting in reduced accuracy. In type traits such as FL, DPAnet assigned relatively balanced weights across SNPs (0.55–0.75), avoiding overemphasis on individual loci. This weight distribution is consistent with the polygenic nature of low-heritability traits, where many small-effect loci jointly shape the phenotype. As a result, DPAnet outperformed both GBLUP and WGBLUP for FL, highlighting its advantage in modeling complex genetic architectures.
Fig. 3.
SNP weight distribution. Manhattan plots of SNP weights for FP (a), PY (b), SCS (c) and FL (d). The values range between 0 and 1
Optimization of machine learning models
The prediction accuracy of machine learning is strongly influenced by hyperparameter settings. To maximize accuracy, we applied Bayesian optimization for hyperparameter tuning within the training set of each testing round, and subsequently used the optimized parameters to predict DGV in the validation set. This procedure significantly improved model performance compared with default settings. Specifically, the hyperparameter-optimized KRR (KRR_hyper) exhibited improvements of 19.46% in accuracy and 52.84% in unbiasedness (P-value < 0.05; Fig. 4a, b; Supplementary Table S5). In particular, for production traits, accuracy and unbiasedness were enhanced by 25.74% and 53.6%, respectively. Similarly, SVR demonstrated a 4.6% increase in accuracy and a 20.4% improvement in unbiasedness. Between the two optimized models, KRR outperformed SVR by an average of 1.7%, particularly for production and health traits, where it improved by 3.2% and 2.9%, respectively. By contrast, SVR achieved 1.2% higher accuracy in type traits and exhibited substantially higher unbiasedness, with an average advantage of 12.3% over KRR.
Fig. 4.
Machine learning hyperparameter optimization. The bar chart shows the prediction accuracy (a) and unbiasedness (b) between ML models with hyperparameter optimization and those with default parameters. Statistical significance between groups is tested using Wilcoxon test, where *** indicates P-value < 0.001. Absence of a line indicates no significant difference
Comparison of prediction performance among the WGBLUP, DPAnet, ML, Bayesian and GBLUP methods
To comprehensively benchmark prediction models, we evaluated 11 widely used approaches, including GBLUP, four Bayesian methods, three WGBLUP methods, a neural network, and two hyperparameter-optimized machine learning (ML) methods (Fig. 5a-b; Supplementary Table S4). Overall, BayesR achieved the best predictive performance with an accuracy of 0.625 and an unbiasedness of 1.013, representing a 3.8% improvement in accuracy over GBLUP (0.602), while maintaining comparable unbiasedness (0.992 for GBLUP). Among the other Bayesian methods, BayesA, BayesBπ, and BayesCπ yielded comparable accuracies (0.622–0.623) with unbiasedness close to 1. Although BayesCπ was the least effective within this group, its accuracy (unbiasedness) still surpassed WGBLUP_BayesBπ, WGBLUP_GWAS, and DPAnet by 0.8% (0.1%), 0.6% (8.9%), and 2.2% (12.7%), respectively. For production traits, BayesR again ranked highest (0.581), outperforming GBLUP, WGBLUP_BayesBπ, WGBLUP_GWAS, and DPAnet by 3.7%, 1.9%, 1.1%, and 3.9%, respectively. SVR exhibited the lowest accuracy at 0.585. For type traits, however, SVR demonstrated the highest average accuracy (0.755), followed by KRR (0.743) and DPAnet (0.742), corresponding to improvements of 1.8%, 0.5%, and 0.4%, respectively, over GBLUP.
Fig. 5.
Comparison of genomic prediction performance for different models. The bar plots compare the prediction accuracy (a) and unbiasedness (b) among 11 models across 9 traits in dairy cattle. Using GBLUP as the baseline, statistical significance between methods is tested using Wilcoxon test, where * indicates P-value < 0.05, ** indicates P-value < 0.01, and *** indicates P-value < 0.001. Absence of a line indicates no significant difference
Timeliness is another important indicator for evaluating GS models. In this study, KRR was the most efficient, with a runtime of approximately one-third that of GBLUP (37.40 min; Supplementary Table S6). In contrast, WGBLUP incurred additional computational costs due to GWAS and BayesBπ analyses for SNP prior estimation, averaging 304.74 and 195.17 min, respectively. Bayesian methods showed moderate efficiency, with an average runtime of 221.63 min. Machine learning models and the neural network were more resource-intensive, as hyperparameter optimization substantially increased runtime. Although prediction on the validation set required only 142.48 min for SVR and 26.42 min for DPAnet, their hyperparameter training consumed 582.21 and 360.41 min, respectively.
Discussion
In this study, we used a dataset comprising 122,672 SNP markers and DRP for nine traits from 16,122 Holstein cattle. SNP weights derived from GWAS or BayesBπ analyses were incorporated into a non-linear model to develop DPAnet, and into GBLUP to construct the WGBLUP. These models were benchmarked against GBLUP, four Bayesian methods, hyperparameter-optimized SVR, and KRR using five-fold cross-validation with 5 repetitions. Our results showed that weighted models improved prediction for additive traits such as FP and PP. Machine learning models tuned via Bayesian hyperparameter optimization also enhanced prediction accuracy, with SVR performing best for the three type traits. Bayesian models showed robust performance overall, with BayesR achieving the highest average accuracy, while KRR was the most computationally efficient.
The prediction accuracy of Bayesian models depends on the alignment between the model’s assumptions and the true distribution of marker effects [11]. BayesR achieved the highest average accuracy due to its reasonable assumptions, which effectively reduce the impact of noisy SNPs [49, 50]. Bayesian methods outperformed GBLUP primarily because of the presence of QTLs that significantly influence the traits [51]. Compared to GBLUP, the average accuracy for the production and health traits improved by 3.3%, while it decreased by 0.1% for type traits. In contrast, ML methods do not make distribution assumptions for SNP effects, enabling them to capture nonlinear genotype–phenotype relationships [52]. They can also account for interactions between SNPs, whereas linear models based on pedigree and genomic relationships struggle to capture the SNP effects for complex traits [53]. Consequently, traditional linear models are superior for traits with a purely additive genetic architecture [54], while ML methods perform better for traits influenced by non-additive effects [55]. Of course, optimal hyperparameters are crucial for reducing the risk of overfitting in ML models [56, 57]. In our study, optimizing the hyperparameters using the Bayesian method led to a significant improvement in the predictive accuracy of each ML model. These results are consistent with those obtained by Wang et al. in pigs [10].
GBLUP assumes that all SNP effects follow the same normal distribution [1], although most SNPs have minimal or negligible effects. Increasing the weights of SNPs associated with QTLs can effectively increase prediction accuracy, making the selection of appropriate SNP priors and weighting methods crucial [34, 58, 59]. In this study, significant SNPs identified by GWAS and BayesBπ for production and health traits highlighted the presence of major QTLs, leading to improved performance of WGBLUP and DPAnet compared with GBLUP, consistent with previous studies [32, 60–62]. Although WGBLUP_GWAS showed a slight improvement of 0.2% in average accuracy compared to WGBLUP_BayesBπ, it exhibited a decrease in accuracy for three type traits relative to GBLUP, with a significant decline in unbiasedness. In contrast, WGBLUP_BayesBπ consistently outperformed GBLUP across all traits. This difference likely arises because Bayesian methods account for SNP interactions and estimate joint effects using MCMC, making them more robust for complex traits with multicollinearity. In contrast, GWAS assumes that SNP effects are independent and ignores interactions [1, 63]. Indeed, our findings parallel those of Meuwissen et al. [31], who reported improvements of 2%, 1.1%, 1.4%, and 0.2% in WGBLUP using the same method as WGBLUP_GWAS over GBLUP for MY, FY, PY, and SCC, respectively. However, when the SNP weights are derived from GWAS-estimated P-values, the performance of the WGBLUP decreases across all traits compared to GBLUP. These results indicate that effective identification and precise weight calculation for large-effect SNPs are essential for WGBLUP. DPAnet introduces a dynamic attention mechanism that integrates multiple SNP priors, including GWAS significance (-log10(P-value)), allele frequency, and Bayesian-estimated SNP effect. These features are adaptively combined to capture nonlinear patterns in the genetic architecture.
Compared with WGBLUP, DPAnet provides more flexibility in modeling SNP weights. It outperforms GBLUP and WGBLUP for type traits and surpasses GBLUP, SVR and KRR for FP and PP. However, for three yield and health traits, its performance declined, likely because the learned weight distributions failed to adequately represent true causal variants. Several other weighted models are currently highly competitive. SLEMM integrates weights derived from window SNP-MAF and effect values, achieving accuracy comparable to BayesR while remaining highly efficient on large-scale datasets [64]. KAML optimizes SNP weights through machine learning and incorporates large-effect SNPs as fixed effects in GBLUP, markedly improving prediction accuracy but at considerable computational cost (~11 days for 50k animals and 45k SNPs) [60, 64]. In contrast, DNNGP integrates multi-omics data using three CNN layers, one batch normalization layer (to mitigate overfitting), and two dropout layers, delivering substantial gains in prediction accuracy while maintaining computational efficiency comparable to GBLUP, LightGBM, and SVR [65].
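The idea shared by these weighted models, giving SNPs unequal influence on the genomic relationship matrix, can be sketched as follows. This is a minimal illustration assuming a VanRaden-type matrix G = ZDZ'/Σ_j w_j·2p_j(1 − p_j) with weights proportional to -log10(P). The scaling, function names, and toy data are assumptions for this example and not necessarily the formulation used for WGBLUP_GWAS in this study.

```python
import math

def snp_weights_from_pvalues(p_values):
    """Turn GWAS P-values into SNP weights that average to 1.

    Assumes 0 < P <= 1 for every SNP and at least one P < 1.
    """
    raw = [-math.log10(p) for p in p_values]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]

def weighted_grm(genotypes, weights):
    """Weighted genomic relationship matrix G = Z D Z' / scale.

    genotypes: rows are animals, columns are SNPs coded 0/1/2.
    weights:   one non-negative weight per SNP; all 1.0 gives the
               unweighted matrix used in ordinary GBLUP.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequencies estimated from the genotypes themselves.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Center each SNP column by twice its allele frequency.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    return [[sum(w * a * b for w, a, b in zip(weights, zi, zk)) / scale
             for zk in centered] for zi in centered]

# Toy data: four animals, three SNPs (illustrative values only).
genotypes = [[0, 1, 2],
             [1, 1, 0],
             [2, 0, 1],
             [1, 2, 1]]
p_values = [1e-8, 0.2, 0.5]  # the first SNP mimics a major QTL

weights = snp_weights_from_pvalues(p_values)
G = weighted_grm(genotypes, weights)
for row in G:
    print([round(v, 3) for v in row])
```

With all weights set to 1.0 the function returns the unweighted matrix, which makes it easy to compare how much a few heavily weighted SNPs change the relationships between animals.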
Although Bayesian, WGBLUP, and ML methods generally outperform GBLUP across various traits, their significantly higher computational requirements, with computing time averaging more than six times that of GBLUP, limit their broader application. Bayesian methods rely on extensive MCMC sampling to estimate the full posterior distribution, WGBLUP requires prior GWAS or Bayesian analyses to obtain SNP weights, and ML methods such as DPAnet require pre-training to optimize hyperparameters [66]. However, in practical breeding scenarios with relatively stable reference populations, model parameters can be pre-trained, reducing the runtime of DPAnet and WGBLUP to just 0.7 and 1.1 times that of GBLUP, respectively.
This study has several limitations. First, compared with large-scale international studies involving millions of individuals [64, 67], our sample size is relatively small, which may constrain machine learning performance and reduce the accuracy of SNP weight estimation, thereby affecting weighted prediction models. Second, due to data constraints, SNP priors and genomic predictions were derived from the same population, raising the risk of information leakage and overfitting. Third, although DPAnet allows flexible SNP weighting, the single-head attention mechanism struggles to disentangle complex genetic interactions [68], and the fully connected structure lacks explicit modeling of linkage disequilibrium (LD), limiting the detection of haplotype-level effects. Finally, DPAnet has O(n²) complexity, making it the second most time-consuming approach among the tested models, which limits its scalability for large SNP datasets. Future work should address these limitations by using larger independent populations, integrating multi-head attention and LD-aware architectures, and developing hybrid CNN–Transformer frameworks to capture both local haplotype patterns and cross-region regulatory networks [69, 70].
Conclusion
In this study, we developed DPAnet, a multilayer perceptron with a single-head attention mechanism that leverages GWAS-derived SNP P-values, allele frequencies, and BayesBπ-estimated SNP effects as prior features. DPAnet outperformed GBLUP for FP, PP, and FL, but exhibited reduced accuracy for yield and SCS, indicating that its advantage is closely tied to its ability to capture trait-specific genetic architectures. Overall, BayesR achieved the highest predictive performance, whereas GBLUP remained the model offering the best overall balance between accuracy, robustness, and computational efficiency. Weighted models yielded improvements for specific traits, but their overall performance was inferior to that of traditional Bayesian models and was further limited by the computational demands of training and prior estimation. Nevertheless, as more causal variants are identified for complex traits, the predictive potential of weighted models that integrate functional genomic insights into genomic selection is expected to increase.
Supplementary Information
Acknowledgements
All the authors thank the Dairy Association of China for providing phenotype data of experimental animals. We acknowledge the support of the High-Performance Computing Platform of China Agricultural University (Beijing) and the Xihe High-Performance Computing Platform of the National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing).
Authors’ contributions
DS conceived and designed the experiments and collected the data. WZ contributed to the experimental design, performed the statistical analyses, visualized the data, and edited the manuscript. Qi Z, JH, and BH assisted with experimental design, reviewed the results, and edited the manuscript. Qin Z provided guidance on statistical analysis and contributed to manuscript editing. All authors read and approved the final manuscript.
Funding
This work was financially supported by the National Key R&D Program of China (2021YFF1000700, 2022YFF1000103); STI 2030-Major Projects (2023ZD04069), the Program for Changjiang Scholar and Innovation Research Team in University (IRT_15R62), the 2115 Talent Development Program of China Agricultural University, Innovation and application of modern breeding technology for high-yield and high-quality dairy cows based on the mining of important genes in MIR trait (242N6601Z), and Genetic genomics research and functional gene identification of Hainan cattle (2022KJCX89).
Data availability
The data that support the findings of this study are available from the Dairy Data Center, Dairy Association of China (DAC) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from Dongxiao Sun upon reasonable request and with permission of the Dairy Data Center, Dairy Association of China (DAC).
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29. 10.1093/genetics/157.4.1819.
- 2. Wiggans GR, Cole JB, Hubbard SM, Sonstegard TS. Genomic selection in dairy cattle: the USDA experience. Annu Rev Anim Biosci. 2017;5:309–27. 10.1146/annurev-animal-021815-111422.
- 3. Meuwissen T, Hayes B, Goddard M. Accelerating improvement of livestock with genomic selection. Annu Rev Anim Biosci. 2013;1:221–37. 10.1146/annurev-animal-031412-103705.
- 4. García-Ruiz A, Cole JB, VanRaden PM, Wiggans GR, Ruiz-López FJ, Van Tassell CP. Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection. Proc Natl Acad Sci U S A. 2016;113:E3995–4004. 10.1073/pnas.1519061113.
- 5. Schaeffer LR. Strategy for applying genome-wide selection in dairy cattle. J Anim Breed Genet. 2006;123:218–23. 10.1111/j.1439-0388.2006.00595.x.
- 6. Zhong S, Dekkers JCM, Fernando RL, Jannink J-L. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics. 2009;182:355–64. 10.1534/genetics.108.098277.
- 7. Habier D, Fernando RL, Dekkers JCM. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–97. 10.1534/genetics.107.081190.
- 8. Goddard M. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009;136:245–57. 10.1007/s10709-008-9308-0.
- 9. Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Invited review: Genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009;92:433–43. 10.3168/jds.2008-1646.
- 10. Wang X, Shi S, Wang G, Luo W, Wei X, Qiu A, et al. Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs. J Anim Sci Biotechnol. 2022;13:60. 10.1186/s40104-022-00708-0.
- 11. Meher PK, Rustgi S, Kumar A. Performance of Bayesian and BLUP alphabets for genomic prediction: analysis, comparison and results. Heredity. 2022;128:519–30. 10.1038/s41437-022-00539-9.
- 12. Gao H, Su G, Janss L, Zhang Y, Lund MS. Model comparison on genomic predictions using high-density markers for different groups of bulls in the Nordic Holstein population. J Dairy Sci. 2013;96:4678–87. 10.3168/jds.2012-6406.
- 13. VanRaden PM, Wiggans GR. Derivation, calculation, and use of national animal model information. J Dairy Sci. 1991;74:2737–46. 10.3168/jds.S0022-0302(91)78453-1.
- 14. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23. 10.3168/jds.2007-0980.
- 15. Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de Los Campos G, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22:961–75. 10.1016/j.tplants.2017.08.011.
- 16. Wang Y, Xin C, Gao Y, Li P, Wang M, Wu S, et al. Advancing selective breeding in leopard coral grouper (P. leopardus) through development of a high-throughput image-based growth trait. Agriculture Communications. 2024;2:100042. 10.1016/j.agrcom.2024.100042.
- 17. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186. 10.1186/1471-2105-12-186.
- 18. Erbe M, Hayes BJ, Matukumalli LK, Goswami S, Bowman PJ, Reich CM, et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012;95:4114–29. 10.3168/jds.2011-5019.
- 19. Wang X, Miao J, Chang T, Xia J, An B, Li Y, et al. Evaluation of GBLUP, BayesB and elastic net for genomic prediction in Chinese Simmental beef cattle. PLoS One. 2019;14:e0210442. 10.1371/journal.pone.0210442.
- 20. Haile TA, Heidecker T, Wright D, Neupane S, Ramsay L, Vandenberg A, et al. Genomic selection for lentil breeding: empirical evidence. Plant Genome. 2020;13:e20002. 10.1002/tpg2.20002.
- 21. Varona L, Legarra A, Toro MA, Vitezica ZG. Non-additive effects in genomic selection. Front Genet. 2018;9:78. 10.3389/fgene.2018.00078.
- 22. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1:67–82. 10.1109/4235.585893.
- 23. Saunders C, Gammerman A, Vovk V. Ridge Regression Learning Algorithm in Dual Variables. In: Proceedings of the Fifteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1998. p. 515–21.
- 24. González-Camacho JM, Ornella L, Pérez-Rodríguez P, Gianola D, Dreisigacker S, Crossa J. Applications of Machine Learning Methods to Genomic Selection in Breeding Wheat for Rust Resistance. Plant Genome. 2018;11. 10.3835/plantgenome2017.11.0104.
- 25. González-Recio O, Rosa GJM, Gianola D. Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits. Livest Sci. 2014;166:217–31. 10.1016/j.livsci.2014.05.036.
- 26. Wang Y, Ni P, Sturrock M, Zeng Q, Wang B, Bao Z, et al. Deep learning for genomic selection of aquatic animals. Mar Life Sci Technol. 2024;6:631–50. 10.1007/s42995-024-00252-y.
- 27. Abdollahi-Arpanahi R, Gianola D, Peñagaricano F. Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes. Genet Sel Evol. 2020;52:12. 10.1186/s12711-020-00531-z.
- 28. Waldmann P. Approximate bayesian neural networks in genomic prediction. Genet Sel Evol. 2018;50:70. 10.1186/s12711-018-0439-1.
- 29. Bellot P, de Los Campos G, Pérez-Enciso M. Can deep learning improve genomic prediction of complex human traits? Genetics. 2018;210:809–19. 10.1534/genetics.118.301298.
- 30. Gong R, He S, Tian T, Chen J, Hao Y, Qiao C. FRCNN-AA-CIF: an automatic detection model of colon polyps based on attention awareness and context information fusion. Comput Biol Med. 2023;158:106787. 10.1016/j.compbiomed.2023.106787.
- 31. Meuwissen T, Eikje LS, Gjuvsland AB. GWABLUP: genome-wide association assisted best linear unbiased prediction of genetic values. Genet Sel Evol. 2024;56:17. 10.1186/s12711-024-00881-y.
- 32. Zhang Z, Ober U, Erbe M, Zhang H, Gao N, He J, et al. Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS One. 2014;9:e93017. 10.1371/journal.pone.0093017.
- 33. Zhang Z, Liu J, Ding X, Bijma P, de Koning D-J, Zhang Q. Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix. PLoS One. 2010;5:e12648. 10.1371/journal.pone.0012648.
- 34. Zhang Z, Erbe M, He J, Ober U, Gao N, Zhang H, et al. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix. G3 Genes|Genomes|Genetics. 2015;5:615–27. 10.1534/g3.114.016261.
- 35. Ding X, Zhang Z, Li X, Wang S, Wu X, Sun D, et al. Accuracy of genomic prediction for milk production traits in the Chinese Holstein population using a reference population consisting of cows. J Dairy Sci. 2013;96:5315–23. 10.3168/jds.2012-6194.
- 36. Wu X, Fang M, Liu L, Wang S, Liu J, Ding X, et al. Genome wide association studies for body conformation traits in the Chinese Holstein cattle population. BMC Genomics. 2013;14:897. 10.1186/1471-2164-14-897.
- 37. Jairath L, Dekkers JC, Schaeffer LR, Liu Z, Burnside EB, Kolstad B. Genetic evaluation for herd life in Canada. J Dairy Sci. 1998;81:550–62. 10.3168/jds.S0022-0302(98)75607-3.
- 38. Browning BL, Zhou Y, Browning SR. A one-penny imputed genome from next-generation reference panels. Am J Hum Genet. 2018;103:338–48. 10.1016/j.ajhg.2018.07.015.
- 39. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. 10.1086/519795.
- 40. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–9. 10.1038/ng.608.
- 41. Charmet G, Tran L-G, Auzanneau J, Rincent R, Bouchet S. BWGS: a R package for genomic selection and its application to a wheat breeding programme. PLoS One. 2020;15:e0222733. 10.1371/journal.pone.0222733.
- 42. Yin L, Zhang H, Li X, Zhao S, Liu X. hibayes: An R Package to Fit Individual-Level, Summary-Level and Single-Step Bayesian Regression Models for Genomic Prediction and Genome-Wide Association Studies. bioRxiv. 2022;2022.02.12.480230. 10.1101/2022.02.12.480230.
- 43. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82. 10.1016/j.ajhg.2010.11.011.
- 44. Su G, Christensen OF, Janss L, Lund MS. Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances. J Dairy Sci. 2014;97. 10.3168/jds.2014-8210.
- 45. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework. 2019.
- 46. Yang F, Wang X, Ma H, Li J. Transformers-sklearn: a toolkit for medical language understanding with transformer-based models. BMC Med Inform Decis Mak. 2021;21(Suppl 2):90. 10.1186/s12911-021-01459-0.
- 47. Pedrosa VB, Chen S-Y, Gloria LS, Doucette JS, Boerman JP, Rosa GJM, et al. Machine learning methods for genomic prediction of cow behavioral traits measured by automatic milking systems in North American Holstein cattle. J Dairy Sci. 2024;107:4758–71. 10.3168/jds.2023-24082.
- 48. Fang L, Han B, Li H, Zhang Q, Zheng W, Chen A, et al. Cattle Cell Atlas: a multi-tissue single cell expression repository for advanced bovine genomics and comparative biology. 2024. 10.21203/rs.3.rs-4631710/v1.
- 49. Thavamanikumar S, Dolferus R, Thumma BR. Comparison of genomic selection models to predict flowering time and spike grain number in two hexaploid wheat doubled haploid populations. G3 Genes|Genomes|Genetics. 2015;5:1991–8. 10.1534/g3.115.019745.
- 50. Ma H, Li H, Ge F, Zhao H, Zhu B, Zhang L, et al. Improving genomic predictions in multi-breed cattle populations: a comparative analysis of BayesR and GBLUP models. Genes. 2024;15:253. 10.3390/genes15020253.
- 51. Shi S, Li X, Fang L, Liu A, Su G, Zhang Y, et al. Genomic prediction using Bayesian regression models with global-local prior. Front Genet. 2021;12:628205. 10.3389/fgene.2021.628205.
- 52. Piles M, Bergsma R, Gianola D, Gilbert H, Tusell L. Feature selection stability and accuracy of prediction models for genomic prediction of residual feed intake in pigs using machine learning. Front Genet. 2021;12:611506. 10.3389/fgene.2021.611506.
- 53. Gianola D, Okut H, Weigel KA, Rosa GJ. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet. 2011;12:87. 10.1186/1471-2156-12-87.
- 54. Zingaretti LM, Gezan SA, Ferrão LFV, Osorio LF, Monfort A, Muñoz PR, et al. Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species. Front Plant Sci. 2020;11:25. 10.3389/fpls.2020.00025.
- 55. Long N, Gianola D, Rosa GJM, Weigel KA. Application of support vector regression to genome-assisted prediction of quantitative traits. Theor Appl Genet. 2011;123:1065–74. 10.1007/s00122-011-1648-y.
- 56. Ornella L, Pérez P, Tapia E, González-Camacho JM, Burgueño J, Zhang X, et al. Genomic-enabled prediction with classification algorithms. Heredity. 2014;112:616–26. 10.1038/hdy.2013.144.
- 57. Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, et al. A review of deep learning applications for genomic selection. BMC Genomics. 2021;22:19. 10.1186/s12864-020-07319-x.
- 58. de Los Campos G, Vazquez AI, Fernando R, Klimentidis YC, Sorensen D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 2013;9:e1003608. 10.1371/journal.pgen.1003608.
- 59. Ramstein GP, Evans J, Kaeppler SM, Mitchell RB, Vogel KP, Buell CR, et al. Accuracy of genomic prediction in switchgrass (Panicum virgatum L.) improved by accounting for linkage disequilibrium. G3 Genes|Genomes|Genetics. 2016;6:1049–62. 10.1534/g3.115.024950.
- 60. Yin L, Zhang H, Zhou X, Yuan X, Zhao S, Li X, et al. KAML: improving genomic prediction accuracy of complex traits using machine learning determined parameters. Genome Biol. 2020;21(1):146. 10.1186/s13059-020-02052-w.
- 61. Wang D, Ning C, Liu J-F, Zhang Q, Jiang L. Short communication: replication of genome-wide association studies for milk production traits in Chinese Holstein by an efficient rotated linear mixed model. J Dairy Sci. 2019;102(3):2378–83. 10.3168/jds.2018-15298.
- 62. Medina CA, Kaur H, Ray I, Yu L-X. Strategies to increase prediction accuracy in genomic selection of complex traits in alfalfa (Medicago sativa L.). Cells. 2021;10:3372. 10.3390/cells10123372.
- 63. Hayes B. Overview of Statistical Methods for Genome-Wide Association Studies (GWAS). In: Gondro C, van der Werf J, Hayes B, editors. Genome-Wide Association Studies and Genomic Prediction. Totowa, NJ: Humana Press; 2013. p. 149–69. 10.1007/978-1-62703-447-0_6.
- 64. Cheng J, Maltecca C, VanRaden PM, O’Connell JR, Ma L, Jiang J. SLEMM: million-scale genomic predictions with window-based SNP weighting. Bioinformatics. 2023;39:btad127. 10.1093/bioinformatics/btad127.
- 65. Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Mol Plant. 2023;16:279–93. 10.1016/j.molp.2022.11.004.
- 66. Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: a review for statisticians. J Am Stat Assoc. 2017;112:859–77. 10.1080/01621459.2017.1285773.
- 67. Liang Z, Prakapenka D, Zaabza HB, VanRaden PM, Van Tassell CP, Da Y. A million-cow genome-wide association study of productive life in U.S. Holstein cows. Genet Sel Evol. 2024;56:67. 10.1186/s12711-024-00935-1.
- 68. Liu L, Liu J, Han J. Multi-head or Single-head? An Empirical Comparison for Transformer Training. 2021. 10.48550/arXiv.2106.09650.
- 69. Zeynali A, Tinati MA, Tazehkand BM. Hybrid CNN-transformer architecture with Xception-based feature enhancement for accurate breast cancer classification. IEEE Access. 2024;12:189477–93. 10.1109/ACCESS.2024.3516535.
- 70. Ji L, Hou W, Xiong L, Zhou H, Liu C, Li L, et al. GSCNN: A genomic selection convolutional neural network model based on SNP genotype and physical distance features and data augmentation strategy. 2024. 10.21203/rs.3.rs-3991262/v1.
Data Availability Statement
The computational codes are freely available via https://github.com/bioramen-Blip/GS-Benchmark-analysis.