Briefings in Bioinformatics. 2025 Apr 25;26(2):bbaf191. doi: 10.1093/bib/bbaf191

WheatGP, a genomic prediction method based on CNN and LSTM

Chunying Wang 1,2,#, Di Zhang 3,#, Yuexin Ma 4, Yonghao Zhao 5, Ping Liu 6,7, Xiang Li 8,9
PMCID: PMC12021598  PMID: 40275535

Abstract

Wheat plays a crucial role in ensuring food security. However, its complex genetic structure and trait variation pose significant challenges for breeding superior varieties. In this study, a genomic prediction method for wheat (WheatGP) is proposed. WheatGP is designed to improve the phenotype prediction accuracy by modeling both additive genetic effects and epistatic genetic effects. It is primarily composed of a convolutional neural network (CNN) module and a long short-term memory (LSTM) module. The multilayer CNNs within the CNN module focus on capturing short-range dependencies within the genomic sequence. Meanwhile, the LSTM module, with its unique gating mechanism, is designed to retain long-distance dependency relationships between gene loci in the features. Therefore, WheatGP could comprehensively extract multilevel features from genomic inputs. Compared to ridge regression best linear unbiased prediction (rrBLUP), extreme gradient boosting (XGBoost), support vector regression (SVR), and deep neural network genomic prediction (DNNGP), WheatGP demonstrates a clear advantage in terms of prediction accuracy. The prediction accuracy for wheat yield reaches 0.73, while the prediction accuracies for various agronomic traits range between 0.62 and 0.78. It also exhibits robust performance across other crop types and multi-omics datasets. In addition, SHapley Additive exPlanations (SHAP) is employed to evaluate the contributions of inputs to the predictive model. As a high-performance tool for genomic prediction in wheat, WheatGP opens up new possibilities for achieving efficient and optimized wheat breeding.

Keywords: wheat, phenotype, genomic prediction, long short-term memory

Introduction

Food security is an escalating global concern, exacerbated by rapid population growth and climate change [1]. Improving crop yield and quality through advanced breeding methods, particularly for key crops like wheat, is essential to strengthen food security. The complex genetic structure and trait variation of wheat bring both opportunities and challenges for breeding excellent varieties [2]. Genomic selection (GS) is a breeding method based on whole genome sequencing data and phenotypic data for the prediction of different phenotypes by constructing prediction models [3]. GS has been widely applied to the breeding of new wheat varieties, providing breeders with a reliable reference for germplasm resource screening [4, 5]. Candidate genotypes are selected without measuring phenotypic information, which greatly reduces the time required for the selection process of breeding.

Genomic prediction (GP) is the key step in GS for wheat breeding [6]. GP has emerged as a powerful tool in plant breeding and has made significant strides in recent years [3]. Traditional GP methods, such as regression-based techniques, can capture complex relationships between the predictor and response variables [7]. However, they have limitations when analyzing high-dimensional data. GP methods based on machine learning (ML) can model nonlinear relationships in genotypes without requiring prior knowledge of the underlying genetic architecture [8, 9]. For instance, support vector regression (SVR) and extreme gradient boosting (XGBoost) have been applied to plant breeding [10, 11], achieving performance comparable to or even surpassing that of traditional methods.

GP methods must overcome the challenge of using high-dimensional marker data to accurately predict phenotypes, where the number of genotypic markers is much larger than the population size [12–14]. With the development of deep learning algorithms, DeepGS was proposed as the first genomic selection method based on deep learning [15]. DeepGS yielded a high correlation between the predicted and observed grain lengths, with a Pearson correlation coefficient (PCC) of 0.742. DLGWAS is a deep learning (DL) architecture featuring dual convolutional neural network (CNN) streams, developed for predicting multiple traits [16]. The average prediction accuracy of DLGWAS for multiple phenotypic traits reached 0.563. The Poisson deep neural network (PDNN) method was proposed and outperformed the Bayesian regression and generalized Poisson regression methods in terms of prediction accuracy [17]. A DL framework, SoyDNGP, has been proposed for soybean breeding [18]. The prediction accuracy of SoyDNGP for flowering date, yield and plant height, and hundred-seed weight reached 0.555, 0.508, and 0.836, respectively. Deep neural network–based genome prediction (DNNGP) was proposed by stacking multiple linear and nonlinear processing units in a layer-wise fashion, thereby enabling the learning of complex representations at different levels of abstraction [19]. DNNGP achieved prediction accuracies of 0.68 for wheat yield and 0.766 for wheat grain length. All of these DL-based GP methods employ convolutional neural networks or deep neural networks to model complex, nonlinear relationships between gene sequences and phenotypic traits.

In phenotype prediction of wheat, the additive effect should be considered, and more attention should also be paid to dominance and epistatic effects. Long short-term memory (LSTM), a prominent architecture within the realm of recurrent neural networks (RNNs), exhibits a unique ability to capture the global features of an entire sequence [20, 21]. Although RNN architectures have been employed in previous GP studies, the algorithms within the RNN family have not been explored further [22]. Therefore, a prediction method based on CNN and LSTM (WheatGP) was designed to predict various agronomic traits and yield from genomic data by capturing the cross-spatial dependencies in gene sequences.

Materials and methods

Overview of WheatGP

Multiple modules and blocks are hierarchically stacked as shown in Fig. 1a, which allows WheatGP to learn the intricate features inherent in genomic data at diverse levels of abstraction. In the CNN module, the genotypic input in the form of a one-dimensional vector is evenly divided into five slices, and each slice is processed through the multilayer CNN structure. The CNN module concentrates on learning the local features of the slices. The LSTM module is employed to further extract global features from genomic data. Its long short-term memory mechanism captures the interactions between non-allelic genes in order to model the additive and epistatic effects of genes. The shape adjustment blocks perform a series of adaptive linear operations on the features extracted by the previous network layer and feed them into the subsequent network layer. Ultimately, the fully connected layer in the prediction module maps the extracted distributed feature representations to the sample label space, enabling phenotype prediction of wheat.

Figure 1.

Overview of WheatGP. (a) WheatGP is composed of the following: (1) a CNN module; (2) two shape adjustment blocks; (3) an LSTM module; (4) a prediction module. The input to the model is genomic data from wheat individuals of a specific genotype, either in its 0/1 encoded form or as dimensionality-reduced features. (b) Each sample is divided into five slices and input in the form of a one-dimensional vector, and multiple convolutional layers enable the CNN module to focus on extracting local features and mapping them into a high-dimensional space. (c) The LSTM module is employed to further extract global features from genomic data and to model the long-range dependencies within the genomic sequences.

The feature extraction process of the CNN module is shown in Fig. 1b. The CNN module contains three convolutional layers, and each convolutional layer is followed by a ReLU activation function. The feature information between the local single-nucleotide polymorphisms (SNPs) of the input gene sequence is extracted by convolution in a sliding window manner. A series of feature vectors is generated after the convolutional layer (Equation 1). The one-dimensional sequence is mapped into a high-dimensional feature space through the convolutional layers [23].

$$y_{n,k,i} = \sum_{c=1}^{C_{\mathrm{in}}} \sum_{j=1}^{K} w_{k,c,j}\, x_{n,c,p(i,j)} + b_k \tag{1}$$

where $y_{n,k,i}$ represents the output element at the i-th position of the k-th output channel for the n-th sample (a specific gene sequence). $C_{\mathrm{in}}$ represents the number of input channels, while $K$ represents the size of the convolutional kernel. $w_{k,c,j}$ represents the weight element at the j-th position within the convolutional kernel that connects the c-th input channel to the k-th output channel. $x_{n,c,p(i,j)}$ represents the input element from the n-th sample and c-th input channel, where $p(i,j)$ represents the input position index calculated based on the convolution operation parameters, including kernel size, stride, and padding. $b_k$ represents the bias term for the k-th output channel. The ReLU activation function enhances the nonlinear representation ability of the CNN module. The weights are initialized using the Glorot initialization method, which mitigates the issues of vanishing and exploding gradients.
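The slicing step and the convolution in Equation 1 can be illustrated with a minimal pure-Python sketch. It is not the WheatGP implementation: the function names (`split_into_slices`, `conv1d`, `relu`), the toy marker vector, the kernel weights, and the choice to drop any remainder when the input length is not divisible by five are all assumptions made for illustration.

```python
def split_into_slices(markers, n_slices=5):
    """Split a 1-D marker vector into n_slices equal parts.

    Assumption: any remainder at the end is dropped; the paper only
    says the input is "evenly divided into five slices".
    """
    size = len(markers) // n_slices
    return [markers[s * size:(s + 1) * size] for s in range(n_slices)]


def conv1d(x, w, b, stride=1, padding=0):
    """Direct transcription of Equation 1 for one sample.

    x: input,   shape [C_in][L]
    w: kernels, shape [C_out][C_in][K]
    b: biases,  shape [C_out]
    Returns y with shape [C_out][L_out].
    """
    kernel_size = len(w[0][0])
    padded = [[0.0] * padding + list(ch) + [0.0] * padding for ch in x]
    l_out = (len(padded[0]) - kernel_size) // stride + 1
    y = []
    for k in range(len(w)):                      # output channel k
        row = []
        for i in range(l_out):                   # output position i
            acc = b[k]
            for c in range(len(x)):              # sum over input channels
                for j in range(kernel_size):     # sum over kernel positions
                    acc += w[k][c][j] * padded[c][i * stride + j]
            row.append(acc)
        y.append(row)
    return y


def relu(y):
    """Element-wise ReLU applied to a [C_out][L_out] feature map."""
    return [[v if v > 0.0 else 0.0 for v in row] for row in y]


# Toy 0/1 marker vector (hypothetical data), split into five slices of four.
markers = [0.0, 1.0, 1.0, 0.0] * 5
slices = split_into_slices(markers)

# One input channel, one output channel, kernel size 2 (hypothetical weights).
features = relu(conv1d([slices[0]], [[[1.0, -1.0]]], [0.0]))
```

Here the position index p(i, j) of Equation 1 corresponds to `i * stride + j` on the zero-padded input.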

The feature extraction process of the LSTM module is shown in Fig. 1c. Its main components include the forget gate, input gate, and output gate. The previous information is selectively discarded by the LSTM network through the forget gate (Equation 2). The acceptance of input information by the memory cell is controlled by the input gate (Equation 3).

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{2}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{3}$$

where $f_t$ and $i_t$ represent the outputs of the forget gate and input gate, respectively. They are determined by the sigmoid activation function $\sigma$. $W_f$ and $W_i$ represent the weight matrices of the forget gate and input gate, respectively. $h_{t-1}$ represents the previous hidden state and $x_t$ represents the current input. $b_f$ and $b_i$ represent the bias terms associated with the forget gate and input gate, respectively. Guided by the outputs of the forget gate and the input gate, the memory cell selectively retains a portion of the current information and integrates new vectors derived from the input features (Equations 4 and 5).

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{5}$$

Here, $C_t$ represents the memory cell state at the current time step, $C_{t-1}$ represents the memory cell state at the previous time step, $f_t$ represents the output of the forget gate, $i_t$ represents the output of the input gate, and $\tilde{C}_t$ represents the candidate cell state generated by the input gate. In Equation 5, $W_C$ represents the weight matrix for the candidate cell state and $b_C$ represents the bias term associated with it. The output gate, based on the current cell state, determines the information that is subsequently output to the hidden state of the LSTM unit (Equations 6 and 7).

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{6}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{7}$$

Here, $o_t$ represents the output of the output gate, which is determined by the sigmoid activation function $\sigma$. $W_o$ represents the weight matrix of the output gate and $b_o$ represents the bias term associated with it. In Equation 7, $h_t$ represents the hidden state at the current time step $t$. After the features have been processed by the LSTM module, each sample yields a hidden state vector with 128 dimensions. These hidden state vectors encapsulate the intricate interplay among features. A dropout layer is also used after the LSTM layer to decrease the risk of overfitting.
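As a worked illustration of Equations 2 to 7, the sketch below performs a single LSTM time step in pure Python. It assumes the common formulation in which each gate applies one weight matrix to the concatenation of the previous hidden state and the current input; the function names, the parameter dictionary keys, and the toy sizes are hypothetical and are not taken from the WheatGP code.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, z, b):
    """Compute W . z + b for a list-of-rows matrix W and vectors z, b."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as "W_f" and "b_f" to parameters."""
    z = h_prev + x_t                                              # [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(p["W_f"], z, p["b_f"])]       # Eq. 2, forget gate
    i = [sigmoid(v) for v in affine(p["W_i"], z, p["b_i"])]       # Eq. 3, input gate
    c_tilde = [math.tanh(v) for v in affine(p["W_c"], z, p["b_c"])]  # Eq. 5
    c = [fk * ck + ik * gk                                        # Eq. 4, cell state
         for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    o = [sigmoid(v) for v in affine(p["W_o"], z, p["b_o"])]       # Eq. 6, output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]              # Eq. 7, hidden state
    return h, c


# Toy sizes (hypothetical): hidden size 2, input size 3, all parameters zero,
# so every gate outputs 0.5 and the candidate cell state is 0.
hidden, inputs = 2, 3
zero_w = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {name: zero_w for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_c", "b_o")})

h_t, c_t = lstm_step([1.0, 0.0, 1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters the forget gate halves the previous cell state, which makes the gating arithmetic easy to check by hand.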

Evaluation metrics and training settings

The PCC [19] between the predicted and observed values was used as the evaluation index of prediction accuracy in this study. Given the limited sample sizes of the wheat datasets employed, 10-fold cross-validation [24] was used to assess model accuracy. To prevent WheatGP from converging to local optima during training, a Bayesian-based hyperparameter optimization strategy within the Optuna framework was utilized to autonomously explore the hyperparameter space across multiple datasets [25]. This approach played a critical role in accurately identifying the optimal values of key hyperparameters, including batch size, weight decay, and learning rate.
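The evaluation protocol described above can be sketched as follows: a Pearson correlation coefficient between predicted and observed values, and a shuffled 10-fold split of sample indices. This is a minimal illustration using only the standard library; the function names, the seed, and the round-robin fold assignment are assumptions, and the model-fitting step inside each fold is deliberately left out.

```python
import math
import random


def pearson(pred, obs):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Indices are shuffled once with a fixed seed and dealt round-robin
    into folds, so fold sizes differ by at most one.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    all_idx = set(idx)
    return [(sorted(all_idx - set(test)), sorted(test)) for test in folds]


# 599 samples, matching the size of the wheat599 dataset.
splits = kfold_indices(599, n_folds=10, seed=0)
```

In a full experiment, a model would be trained on each `train_indices` set and `pearson` would be computed on the held-out predictions of the corresponding `test_indices` set.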

Dataset

The wheat599 dataset and the wheat2000 dataset were the main datasets used in this study. Wheat599 is the CIMMYT wheat public dataset [26]. It contains the grain yields (GY) of 599 varieties under four representative environments, and each variety was genotyped using 1279 markers. The wheat2000 dataset contains 2000 local varieties of Iranian bread wheat, each of which was genotyped using 33 709 markers [27]. Six traits from this dataset were used in this study: grain length (GL), grain width (GW), grain hardness (GH), thousand-kernel weight (TKW), test weight (TESTW), and grain protein (GPR). To evaluate the performance of WheatGP more comprehensively, rice and maize datasets were also used. The Rice299 dataset was obtained from the irrigated rice breeding program of the International Rice Research Institute. A total of 73 147 markers and two agronomic traits, plant height (PH) and GL, were used in this study [28]. Maize1404 consists of 1404 progeny descended from 24 Chinese elite inbred maize lines. It includes transcriptomic data for 392 samples, featuring expression levels of 39 484 genes [19, 29], which provides a multi-omics perspective for analysis. From this dataset, two agronomic traits, PH and days to anthesis (DTA), were used.

Result

Prediction accuracy of WheatGP and comparison with other methods

The prediction accuracy of rrBLUP, XGBoost, SVR, DNNGP, and WheatGP was compared on the two wheat datasets, wheat599 and wheat2000. rrBLUP was implemented using the rrBLUP package in R [30], while XGBoost and SVR were constructed in Python. The DNNGP method was constructed based on the architecture proposed by Wang et al. [19], and the code is available from the public GitHub repository of Xie et al. [31]. The values of SNP locus effect were used as input for these methods, with phenotypes such as GY, GH, and GW serving as target variables. The results are presented in Fig. 2.

Figure 2.

The comparison of prediction accuracy among five methods across wheat datasets. The x-axis represents different phenotypic traits, with each trait evaluated using five methods arranged from left to right: rrBLUP, SVR, XGBoost, DNNGP, and WheatGP. (a) Best prediction accuracy for grain yield (GY) in the wheat599 dataset. GY1, GY2, GY3, and GY4 denote grain yield measured under four environments. (b) Best prediction accuracy for six agronomic traits in the wheat2000 dataset, including grain hardness (GH), grain protein (GPR), test weight (TESTW), grain length (GL), grain width (GW), and thousand-kernel weight (TKW).

As shown in Fig. 2a, the variation in prediction accuracy for GY within the wheat599 dataset may primarily stem from differences in environment-specific data noise, while the limited sample size posed a modeling bottleneck in deciphering the nonlinear relationships between this complex quantitative trait and the genome. Compared to the other four methods, rrBLUP differs conceptually in the a priori assumptions assigned to marker effects, and it exhibited deficiencies in capturing complex genetic effects. Among the five methods evaluated for GY prediction on the wheat599 dataset, rrBLUP demonstrated the lowest performance. Although SVR and XGBoost possess nonlinear modeling structures, their predictive accuracy was not consistently outstanding across all scenarios. Specifically, XGBoost demonstrated poor accuracy in predicting GY2 and GY3. In contrast, DNNGP and WheatGP feature more complex nonlinear modeling architectures, making them better suited for addressing nonlinear regression problems. They consistently demonstrated superior prediction accuracy across all environments compared to the other methods. Compared to DNNGP, WheatGP demonstrated an average improvement of 8.5%.

In the wheat2000 dataset, WheatGP achieved higher prediction accuracy than the other four methods, as illustrated in Fig. 2b. Among the six agronomic traits, the highest prediction accuracy was achieved for GL, reaching 0.785. This accuracy was 9.2% higher than that of DNNGP and 12.2% to 16.9% higher than those of the other prediction methods. This outcome could be partially attributed to the higher heritability of GL compared to that of the other traits, indicating a closer relationship between its phenotype and genotype. The accuracy of WheatGP was comparable to that of DNNGP in the prediction of GH. Compared to rrBLUP, SVR, and XGBoost, the accuracy of WheatGP improved by 21.8%, 7.6%, and 2.4%, respectively. All five methods demonstrated limited accuracy in predicting GPR, potentially attributable to the complex genotype–environment interactions influencing this trait. However, WheatGP still demonstrated superior accuracy, outperforming the other four methods by 5.2% to 30.7%. In the prediction of TKW, the ML-based SVR demonstrated superior accuracy compared with the DL-based DNNGP. The accuracy of SVR was only 1.8% lower than that of WheatGP. This could be attributed to the specific statistical properties of the TKW data, which enabled the SVR to model its key features accurately. These results suggest that WheatGP is a better method that can effectively utilize genetic data to predict multiple traits of wheat.

Improving genomic prediction in different environments via transfer learning

Transfer learning was implemented to enhance the performance of WheatGP across multiple environments in this study. The genotype–phenotype relationship patterns learned from source environments were transferred to target environments, allowing feature representations to be effectively reused across different environmental conditions. Specifically, a layer-by-layer unfreezing strategy was implemented for transfer learning, as shown in Fig. 3a. The LSTM module was unfrozen first and trained with a small learning rate to avoid disrupting the learned general features. After both the LSTM and prediction modules had been unfrozen, the model was fine-tuned according to the validation results to learn high-level features and adapt to the specific requirements of the new environment.

Figure 3.

Transfer learning in different environments. (a) The step-by-step unfreezing strategy. Arrows show the training steps. Dashed arrows indicate that the LSTM and prediction modules were unfrozen and fine-tuned, while the CNN module stayed frozen. The model was pre-trained on grain yield (GY) from the three other environments and then used to predict yield in environment 1 (GY1). (b) Scatter plots comparing GY1 predictions before and after using the pre-trained model. R represents the Pearson correlation coefficient; mean absolute error (MAE) and mean squared error (MSE) values were also compared to measure improvement.

Fine-tuning the pre-trained model resulted in a modest improvement in the accuracy of yield prediction in the new environment, as shown in Fig. 3b. Although the increase in accuracy was relatively small, the marked reductions in mean absolute error (MAE) and mean squared error (MSE), by 33.17% and 22.22%, respectively, indicated substantial progress in reducing prediction bias. The result may be attributed to the transfer learning process, which leveraged a broader range of data information, enabling the model to better capture the overall distribution patterns of the data [28]. As a result, the impact of anomalous values on the prediction outcomes was effectively mitigated.
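The error metrics and the percentage reductions quoted above follow from simple arithmetic, sketched below in plain Python. The observed values and the two sets of predictions are toy numbers invented for illustration and are unrelated to the wheat599 results; the function names are likewise hypothetical.

```python
def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def mse(pred, obs):
    """Mean squared error."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def percent_reduction(before, after):
    """Relative reduction of an error metric, in percent of the initial value."""
    return (before - after) / before * 100.0


# Toy data (hypothetical): predictions before and after fine-tuning.
observed = [1.0, 2.0, 3.0, 4.0]
pred_before = [1.5, 2.5, 2.0, 5.0]
pred_after = [1.25, 2.25, 2.5, 4.5]

mae_drop = percent_reduction(mae(pred_before, observed), mae(pred_after, observed))
mse_drop = percent_reduction(mse(pred_before, observed), mse(pred_after, observed))
```

Because MSE squares each residual, halving every error in this toy case halves the MAE but cuts the MSE by three quarters, which is why the two metrics can report different percentage reductions for the same change in predictions.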

In the evaluation of deep learning methods, computational resource requirements are another critical metric besides accuracy. The large number of parameters in WheatGP provides the potential for learning complex patterns, but it also increases resource consumption. When the original genotype matrix was used as input, WheatGP had ~5.23 million total parameters on the wheat599 dataset and ~65.52 million total parameters on the wheat2000 dataset. This was more than 3000 times the number of DNNGP’s total parameters. On an RTX 3090 GPU, the training time differed by ~10 times, and the resource consumption gap widened further without a GPU. By using transfer learning, WheatGP could avoid training from scratch, reducing the training time to about one-tenth. Encouragingly, WheatGP, like DNNGP, demonstrated remarkable computational efficiency during the inference phase, typically completing inference tasks within seconds. To reduce the training cost, the use of a dimensionality-reduced feature matrix as input for WheatGP was considered to accelerate both training and inference speeds.

Enhancing WheatGP’s performance with dimensionality-reduced inputs

The predictive accuracy of deep learning models is markedly influenced by the sample size and input dimensionality [32]. Larger samples provide a more comprehensive view of the genotype–phenotype relationship, thereby typically improving the prediction accuracy of GP [19]. For high-dimensional genotypic data, efficient feature extraction can also enhance the accuracy of the GP model [33], which is as important as increasing the sample size. In the wheat2000 dataset, the original genotype matrix (OGM) is in binary (0/1) form, where “0” denotes a mutation at a specific SNP locus and “1” indicates no mutation. Features derived by dimensionality reduction with principal component analysis (PCA) [34] were used to evaluate the impact of input dimensionality on the accuracy of GP, as shown in Fig. 4.

Figure 4.

The impact of input dimensionality on the prediction accuracy of WheatGP. The experiment was conducted on the wheat2000 dataset using a 10-fold cross-validation approach under fixed NumPy and torch random seeds. To investigate the influence of input dimensionality on WheatGP’s accuracy, five distinct scenarios were established: (1) using the original genotype matrix (OGM) as input, (2) folding the extracted OGM into 5 channels as input, (3) using 1691 principal components (PCs) as input, (4) using 400 PCs as input, and (5) using 200 PCs as input. The upper and lower edges of each box represent the third quartile (75%) and first quartile (25%) of the accuracy, respectively, defining the interquartile range, which reflects the primary distribution characteristics of the accuracy. The black horizontal line within each box indicates the average accuracy.

Compared to using the OGM, WheatGP showed nearly equivalent prediction accuracy when utilizing the principal components (PCs) matrix. It was observed that for the predictions of TESTW, GL, and TKW, the use of OGM yielded average accuracy that was 2.1%, 2.2%, and 1.4% higher, respectively, than that achieved with the best-performing PC matrix. However, in the predictions of GH, GPR, and GW, the best-performing PCs matrix showed slightly higher accuracy than the OGM by 1.0%, 1.2%, and 0.01% in average accuracy, respectively. In the predictions of GH, GPR, TESTW, GL, GW, and TKW, the average accuracy using OGM was higher than that using folded OGM by 0.7%, 1.3%, 1.7%, 2.7%, 0.9%, and 1.1%, respectively. This may be attributed to the folded OGM altering the original data distribution, thereby introducing additional noise into the data. Although folded OGM offers faster training speed, it is not considered a superior strategy. The use of 200 PCs as input yielded the highest average accuracy and demonstrated better robustness. Additionally, owing to the reduced input dimensions, it offered faster training speeds compared to using OGM as the input. The worst performance was obtained when 1691 PCs were used. This suggests that, for WheatGP, while retaining more information with an increased number of PCs, more noise was also retained. Therefore, feature selection became crucial, as it helped minimize the noisy data, thereby improving the prediction accuracy of WheatGP.
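The dimensionality-reduction step can be sketched with NumPy as a PCA computed from a singular value decomposition. This is an illustration under stated assumptions rather than the authors' pipeline: the function names, the random toy genotype matrix, the component count, and the decision to fit PCA on the training rows only and then project the held-out rows are all choices made for this example.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and the top principal axes of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto previously fitted principal axes."""
    return (X - mean) @ components.T


# Toy 0/1 genotype matrix (hypothetical): 50 samples by 300 markers.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 2, size=(50, 300)).astype(float)

train, test = genotypes[:40], genotypes[40:]
mean, components = pca_fit(train, n_components=10)
train_pcs = pca_transform(train, mean, components)
test_pcs = pca_transform(test, mean, components)
```

The resulting `train_pcs` and `test_pcs` matrices would take the place of the OGM as model input; the number of retained components is the quantity varied in Fig. 4.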

SHAP-based interpretability for prognostic analysis

To elucidate the biological rationale behind WheatGP’s predictions, SHapley Additive exPlanations (SHAP) [35] was applied to quantify feature contributions. Specifically, we utilized WheatGP to develop predictive models for grain yield using the wheat599 dataset. Subsequently, the principal component–grain yield relationships and the marker–grain yield relationships were investigated, as illustrated in Fig. 5.

Figure 5.

Contributions of dimensionality-reduced features and SNPs. The larger the absolute value of SHAP, the greater its impact on the model’s prediction results. Positive values indicate a positive contribution to predicted value, while negative values indicate a negative contribution. (a) The distribution of the top 10 principal components with the highest global contribution to models. The value of features from bottom to top represents the increasing values of the principal components. (b) The contribution of SNPs in a single sample. Red dots indicate mutations at the corresponding positions and blue dots indicate absence of mutations.

The contribution of principal components was analyzed based on the entire training set, and the results are shown in Fig. 5a. PC1 had the highest contribution to the model and strongly pushed the predicted value downward. It was also the principal component that explained the most variance in PCA. From the perspective of model interpretability, this alignment indicated that PCA effectively captures key variations in genomic data. Figure 5b shows that a large number of SNPs had SHAP values concentrated near zero, indicating that these SNPs contributed little or almost nothing to the model predictions. The SNPs with large SHAP values significantly influenced the model’s prediction results, indicating their potential importance in regulating gene expression. These results revealed the distribution patterns of feature importance and provided clear directions for subsequent feature selection, model improvement, and exploration of biological significance. Higher training weights could be assigned to high-contribution features to strengthen the model’s focus on them. Meanwhile, analyzing low-contribution features could help identify potential noise, thereby improving the accuracy of WheatGP.
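To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values by enumerating every feature subset for a tiny model. It is not how the SHAP library works internally, which relies on approximations and background datasets, and its cost grows exponentially with the number of features. The linear toy model, the all-zero baseline used for "absent" features, and the function names are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline.

    A feature outside the coalition is replaced by its baseline value
    (an assumption; other ways of marginalising features exist).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical linear predictor over three 0/1 markers."""
    return 2.0 * z[0] - 3.0 * z[1] + 0.0 * z[2] + 1.0


sample = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
```

For this toy model the first marker raises the prediction, the second lowers it, and the third contributes nothing, and the contributions sum to the difference between the prediction for the sample and the prediction for the baseline.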

Scalability of WheatGP

To evaluate the cross-crop adaptability of WheatGP, comparative experiments were conducted on three small-scale datasets of wheat, maize, and rice. Furthermore, to improve WheatGP’s accuracy for other crop species, we introduced several refinements to its architecture, and the comparative methods were constructed as follows: (1) WheatCNN, where the LSTM module was removed; (2) WheatLSTM, where the CNN module was removed; (3) WheatRes, which integrated residual connections into the CNN module; (4) WheatECA, which incorporated an efficient channel attention mechanism into the CNN module; (5) WheatBiLSTM, which implemented a bidirectional propagation structure in the LSTM module; and (6) WheatMulti, which applied a multi-head attention mechanism before the LSTM module. Table 1 shows the best prediction accuracy of two traits (DTA and PH) using 280 PCs from the maize transcriptome, the best prediction accuracy of two traits (GL and PH) using 185 PCs from the rice genome, and the average prediction accuracy for grain yield across four environments using 250 PCs from the wheat genome.

Table 1.

Performance of WheatGP and its modified variants on the wheat, maize, and rice datasets

Model        Wheat GY  Maize DTA  Maize PH  Rice GL  Rice PH  Average score
WheatGP      0.6898    0.7656     0.781     0.6538   0.5898   0.694
WheatMulti   0.6192    0.7787     0.7576    0.6375   0.6      0.6786
WheatRes     0.6064    0.7451     0.7858    0.5666   0.6163   0.664
WheatECA     0.6498    0.7549     0.7478    0.6267   0.5568   0.6672
DNNGP        0.6718    0.7416     0.7444    0.5256   0.5905   0.6548
WheatLSTM    0.683     0.7119     0.7233    0.5608   0.5927   0.6543
WheatCNN     0.6744    0.713      0.6798    0.5349   0.564    0.6332
WheatBiLSTM  0.6069    0.7819     0.7529    0.521    0.5015   0.6328

Overall, WheatGP demonstrated the best average performance among all methods. The results revealed that WheatGP slightly underperformed DNNGP in predicting rice PH by a marginal difference of 0.12%, while outperforming DNNGP in predicting the remaining four phenotypes. Compared to WheatGP, WheatMulti, WheatRes, and WheatBiLSTM led to varying levels of accuracy improvement for Maize DTA, Maize PH, and Rice PH prediction. These results indicated that these architectural enhancements could effectively model complex nonlinear relationships across diverse prediction tasks. In addition, the introduction of attention mechanisms enables the model to prioritize key genomic regions or gene expression patterns most relevant to the target phenotypes. By dynamically assigning higher weights to these critical features, the interpretability of the model’s predictions is also enhanced. Notably, transcriptomic data predicted DTA and PH traits with >0.75 accuracy using fewer markers, highlighting its effectiveness in explaining maize trait variation. However, the attention mechanism reduced accuracy in Maize PH prediction. This suggests that employing algorithms with more complex architectures is not always an optimal strategy, particularly in scenarios with limited sample sizes and substantial data noise.
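The comparisons drawn from Table 1 can be checked with a few lines of Python. The sketch copies the per-trait accuracies of five of the models from the table; the helper names and the choice of denominator in the relative gap are assumptions, since the text does not state how the 0.12% figure was computed.

```python
# Per-trait accuracies copied from Table 1 (subset of models).
scores = {
    "WheatGP":     {"Maize DTA": 0.7656, "Maize PH": 0.781,  "Rice PH": 0.5898},
    "WheatMulti":  {"Maize DTA": 0.7787, "Maize PH": 0.7576, "Rice PH": 0.6},
    "WheatRes":    {"Maize DTA": 0.7451, "Maize PH": 0.7858, "Rice PH": 0.6163},
    "WheatBiLSTM": {"Maize DTA": 0.7819, "Maize PH": 0.7529, "Rice PH": 0.5015},
    "DNNGP":       {"Maize DTA": 0.7416, "Maize PH": 0.7444, "Rice PH": 0.5905},
}


def relative_gap_percent(reference, other):
    """Shortfall of `other` relative to `reference`, in percent."""
    return (reference - other) / reference * 100.0


def models_beating(baseline, trait):
    """Names of models whose accuracy on `trait` exceeds the baseline's."""
    threshold = scores[baseline][trait]
    return sorted(m for m, s in scores.items()
                  if m != baseline and s[trait] > threshold)


# Rice PH: WheatGP relative to DNNGP (denominator choice is an assumption).
rice_ph_gap = round(
    relative_gap_percent(scores["DNNGP"]["Rice PH"], scores["WheatGP"]["Rice PH"]), 2)

better_on = {trait: models_beating("WheatGP", trait)
             for trait in ("Maize DTA", "Maize PH", "Rice PH")}
```

The Rice PH gap rounds to the same two-decimal value whichever of the two accuracies is used as the denominator.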

Discussion

Previous studies have demonstrated the significant potential of DL methods for enhancing the prediction accuracy of complex traits in plant breeding [3, 36]. Most DL-based GP methods have been constructed using CNN [37]. However, unlike diploid species such as maize and rice, wheat’s polyploid structure involves intricate allelic interactions and pronounced epistatic effects [38], which require advanced algorithms to accurately model the genotype-to-phenotype relationship. Although the model trained by WheatGP has a significantly larger number of total parameters than existing methods, this architectural complexity is necessary to address the unique challenges of wheat phenotype prediction. WheatGP has improved the prediction of epistatic gene effects. Specifically, the input sequence was divided into five segments, allowing the multilayer CNNs to focus on capturing short-range dependencies within the sequence. LSTM, with its unique gating mechanism, retained dependencies representing long-distance relationships between gene loci in the features. Thus, with its multilayer structure, WheatGP could extract low-, medium-, and high-level features from wheat genomic data in a dynamic manner.

When dealing with genomic big data, researchers are confronted with a “large p, small n” situation, in which the input dimensions of the genomic data greatly exceed the sample size [39]. As model parameters proliferate, the expressive capability of the GP model increases, but the inherent high correlation and redundancy among the inputs inevitably result in overfitting. Efficient feature extraction was a critical step in the development of WheatGP: using dimensionality-reduced features as inputs projects genotype inputs of varying scales into a lower-dimensional space, thereby alleviating the overfitting issues associated with high-dimensional data. To integrate multi-omics data for wheat phenotype prediction, specialized feature compression and fusion strategies should also be considered to alleviate overfitting.

Through comprehensive evaluations of WheatGP’s scalability, we observed that simpler neural networks exhibited lower accuracy in predicting maize plant height. This limitation likely stems from the nonadditive gene expression patterns in maize [40]: because of their restricted depth, shallow networks cannot adequately capture nonlinear gene–gene interactions. In contrast, more complex neural networks showed reduced accuracy in predicting rice PH. This may be attributed to the relatively compact genome of rice [41], where nonadditive effects are less pronounced than in maize or wheat. These findings demonstrate that, in genomic selection modeling, no single method works best for all scenarios [42]. It is essential to carefully compare different methods based on the specific requirements of each prediction task to identify the most suitable one. Therefore, the improvement of wheat GP methods must not only focus on predictive accuracy but also strive to understand the biological meaning behind the models. However, the inherent opacity of neural networks poses a significant challenge [43, 44]. In this study, the features that made significant contributions to the model’s performance were visualized using SHAP. In the next phase of our research, we plan to incorporate genome-wide association study (GWAS) [45] findings into WheatGP, thereby improving the interpretability of the model.
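To make the idea behind SHAP attributions concrete, the following sketch computes exact Shapley values for a three-marker toy model by averaging marginal contributions over all feature orderings. This is the definition that SHAP approximates, not the SHAP library itself and not the procedure used in this study. The toy phenotype function, its coefficients, and the all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x relative to a baseline input.

    Features are switched from baseline to x one at a time in every
    possible order; each feature is credited with its average marginal
    change in the prediction. Cost grows as n!, so this is only
    feasible for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            value = model(current)
            phi[j] += value - prev
            prev = value
    return [total / len(orders) for total in phi]


def toy_model(g):
    """Hypothetical phenotype: two additive effects plus one epistatic term."""
    return 2.0 * g[0] + 1.0 * g[1] + 3.0 * g[0] * g[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
print(phi)  # [3.5, 1.0, 1.5]
```

The epistatic term 3·g0·g2 is split equally between markers 0 and 2, so marker 2 receives a nonzero attribution even though it has no additive effect of its own. The attributions sum to the difference between the prediction at x and at the baseline (6.0).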

WheatGP holds promising prospects for integration into plant breeding toolkits [46–48], which would make it readily available for large-scale breeding initiatives. However, validation on larger wheat datasets will be essential to evaluate its performance. By incorporating WheatGP into existing breeding platforms and leveraging abundant multi-omics datasets, researchers can continue to refine it in a targeted manner, which promises to further accelerate wheat breeding progress.

Key Points

  • We proposed a genomic prediction method for wheat (WheatGP) by combining CNN and LSTM, which could capture additive and nonadditive genetic effects more effectively.

  • WheatGP demonstrates competitive performance with other methods in predicting wheat grain yield and various agronomic traits, while showing cross-species adaptability and robustness.

  • Dimensionality reduction of input features could enhance WheatGP’s computational efficiency without significant accuracy loss, while SHAP-based visualization effectively identifies key SNPs and principal components contributing to predictions.

Contributor Information

Chunying Wang, State Key Laboratory of Wheat Improvement, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China; Shandong Engineering Research Center of Agricultural Equipment Intelligentization, College of Mechanical and Electronic Engineering, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China.

Di Zhang, Shandong Engineering Research Center of Agricultural Equipment Intelligentization, College of Mechanical and Electronic Engineering, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China.

Yuexin Ma, Shandong Engineering Research Center of Agricultural Equipment Intelligentization, College of Mechanical and Electronic Engineering, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China.

Yonghao Zhao, Shandong Engineering Research Center of Agricultural Equipment Intelligentization, College of Mechanical and Electronic Engineering, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China.

Ping Liu, State Key Laboratory of Wheat Improvement, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China; Shandong Engineering Research Center of Agricultural Equipment Intelligentization, College of Mechanical and Electronic Engineering, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China.

Xiang Li, State Key Laboratory of Wheat Improvement, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China; College of Life Sciences, Shandong Agricultural University, 61 Daizong Street, Tai’an 271018, China.

Author contributions

P.L. conceived and led the research project; C.Y.W. and D.Z. collaborated on the experiments and wrote the manuscript; X.L. reviewed the manuscript and revised the biological content of the article; Y.X.M. reviewed the manuscript and revised the computer science content of the article; Y.H.Z. reviewed the manuscript and provided and preprocessed the data for this study. All the authors approved the final manuscript.

Conflict of interest: None declared.

Funding

This work was supported by the Shandong Provincial Key Research and Development Program Project (2023TZXD004; 2024LZGC006; 2021LZGC013) and Shandong Province Postdoctoral Innovation Project (SDCX-ZG-202400195).

Data availability

The datasets used in this study can be found in the references provided earlier, and the source code for WheatGP and a localized GUI can be obtained from the GitHub repository (https://github.com/Breed-AI/WheatGP.git).

References

  • 1. Zhao J, Zhang Z, Zhao C, et al. Dissecting the vital role of dietary changes in food security assessment under climate change. Commun Earth Environ 2024;5:440. 10.1038/s43247-024-01612-3.
  • 2. Wang Z, Miao L, Chen Y, et al. Deciphering the evolution and complexity of wheat germplasm from a genomic perspective. J Genet Genomics 2023;50:846–60. 10.1016/j.jgg.2023.08.002.
  • 3. Alemu A, Åstrand J, Montesinos-López OA, et al. Genomic selection in plant breeding: key factors shaping two decades of progress. Mol Plant 2024;17:552–78. 10.1016/j.molp.2024.03.007.
  • 4. Guzman C, Peña RJ, Singh R, et al. Wheat quality improvement at CIMMYT and the use of genomic selection on it. Appl Transl Genom 2016;11:3–8. 10.1016/j.atg.2016.10.004.
  • 5. Marulanda JJ, Mi X, Melchinger AE, et al. Optimum breeding strategies using genomic selection for hybrid breeding in wheat, maize, rye, barley, rice and triticale. Theor Appl Genet 2016;129:1901–13. 10.1007/s00122-016-2748-5.
  • 6. Gizachew HG. Genomic selection: a faster strategy for plant breeding. In: Haiping W (ed). Case Studies of Breeding Strategies in Major Plant Species. Rijeka: IntechOpen, 2022, Ch. 2.
  • 7. Crossa J, Pérez-Rodríguez P, Cuevas J, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 2017;22:961–75. 10.1016/j.tplants.2017.08.011.
  • 8. González-Recio O, Rosa GJM, Gianola D. Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits. Livest Sci 2014;166:217–31. 10.1016/j.livsci.2014.05.036.
  • 9. Barreto CAV, das Graças Dias KO, de Sousa IC, et al. Genomic prediction in multi-environment trials in maize using statistical and machine learning methods. Sci Rep 2024;14:1062. 10.1038/s41598-024-51792-3.
  • 10. Lourenço VM, Ogutu JO, Rodrigues RAP, et al. Genomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data. BMC Genomics 2024;25:152. 10.1186/s12864-023-09933-x.
  • 11. Zhou G, Gao J, Zuo D, et al. MSXFGP: combining improved sparrow search algorithm with XGBoost for enhanced genomic prediction. BMC Bioinform 2023;24:384. 10.1186/s12859-023-05514-7.
  • 12. Desta ZA, Ortiz R. Genomic selection: genome-wide prediction in plant improvement. Trends Plant Sci 2014;19:592–601. 10.1016/j.tplants.2014.05.006.
  • 13. Jannink J-L, Lorenz AJ, Iwata H. Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 2010;9:166–77. 10.1093/bfgp/elq001.
  • 14. Schmidt M, Kollers S, Maasberg-Prelle A, et al. Prediction of malting quality traits in barley based on genome-wide marker data to assess the potential of genomic selection. Theor Appl Genet 2016;129:203–13. 10.1007/s00122-015-2639-1.
  • 15. Ma W, Qiu Z, Song J, et al. A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta 2018;248:1307–18. 10.1007/s00425-018-2976-9.
  • 16. Liu Y, Wang D, He F, et al. Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean. Front Genet 2019;10:1091. 10.3389/fgene.2019.01091.
  • 17. Montesinos-Lopez OA, Montesinos-Lopez JC, Salazar E, et al. Application of a Poisson deep neural network model for the prediction of count data in genome-based prediction. Plant Genome 2021;14:e20118. 10.1002/tpg2.20118.
  • 18. Gao P, Zhao H, Luo Z, et al. SoyDNGP: a web-accessible deep learning framework for genomic prediction in soybean breeding. Brief Bioinform 2023;24:1–12. 10.1093/bib/bbad349.
  • 19. Wang K, Abid MA, Rasheed A, et al. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Mol Plant 2023;16:279–93. 10.1016/j.molp.2022.11.004.
  • 20. Lindemann B, Müller T, Vietz H, et al. A survey on long short-term memory networks for time series prediction. Procedia CIRP 2021;99:650–5. 10.1016/j.procir.2021.03.088.
  • 21. Al-Selwi SM, Hassan MF, Abdulkadir SJ, et al. RNN-LSTM: from applications to modeling techniques and beyond—systematic review. J King Saud Univ Comput Inf Sci 2024;36:102068. 10.1016/j.jksuci.2024.102068.
  • 22. Pérez-Enciso M, Zingaretti LM. A guide on deep learning for complex trait genomic prediction. Genes 2019;10:553. 10.3390/genes10070553.
  • 23. Ige AO, Sibiya M. State-of-the-art in 1D convolutional neural networks: a survey. IEEE Access 2024;12:144082–105. 10.1109/ACCESS.2024.3433513.
  • 24. Schrauf MF, de los Campos G, Munilla S. Comparing genomic prediction models by means of cross validation. Front Plant Sci 2021;12:734512. 10.3389/fpls.2021.734512.
  • 25. Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2623–31. Anchorage, AK, USA: Association for Computing Machinery, 2019.
  • 26. Crossa J, de los Campos G, Pérez P, et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 2010;186:713–24. 10.1534/genetics.110.118521.
  • 27. Crossa J, Jarquín D, Franco J, et al. Genomic prediction of Gene Bank wheat landraces. G3 2016;6:1819–34. 10.1534/g3.116.029637.
  • 28. Li J, Zhang D, Yang F, et al. TrG2P: a transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield. Plant Commun 2024;5:100975. 10.1016/j.xplc.2024.100975.
  • 29. Liu H-J, Wang X, Xiao Y, et al. CUBIC: an atlas of genetic architecture promises directed maize improvement. Genome Biol 2020;21:20. 10.1186/s13059-020-1930-x.
  • 30. Endelman JB. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 2011;4:250–55. 10.3835/plantgenome2011.08.0024.
  • 31. Xie Z, Xu X, Li L, et al. Residual networks without pooling layers improve the accuracy of genomic predictions. Theor Appl Genet 2024;137:138. 10.1007/s00122-024-04649-2.
  • 32. Cheng Q, Wang X. Machine learning for AI breeding in plants. Genomics Proteomics Bioinformatics 2024;22:qzae051. 10.1093/gpbjnl/qzae051.
  • 33. Zhao L, Tang P, Luo J, et al. Genomic prediction with NetGP based on gene network and multi-omics data in plants. Plant Biotechnol J 2025;23:1190–201. 10.1111/pbi.14577.
  • 34. Lever J, Krzywinski M, Altman N. Principal component analysis. Nat Methods 2017;14:641–2. 10.1038/nmeth.4346.
  • 35. Wang H, Yan S, Wang W, et al. Cropformer: an interpretable deep learning framework for crop genomic prediction. Plant Commun 2024;6:101223. 10.1016/j.xplc.2024.101223.
  • 36. Feng W, Gao P, Wang X. AI breeder: genomic predictions for crop breeding. New Crops 2024;1:100010. 10.1016/j.ncrops.2023.12.005.
  • 37. Wu H, Han R, Zhao L, et al. AutoGP: an intelligent breeding platform for enhancing maize genomic selection. Plant Commun 2025;6:101240. 10.1016/j.xplc.2025.101240.
  • 38. Tessele A, González-Diéguez DO, Crossa J, et al. Improving genomic selection in hexaploid wheat with sub-genome additive and epistatic models. G3 2025;jkaf031. 10.1093/g3journal/jkaf031.
  • 39. Vogelstein JT, Bridgeford EW, Tang M, et al. Supervised dimensionality reduction for big data. Nat Commun 2021;12:2872. 10.1038/s41467-021-23102-2.
  • 40. Zhou T, Zhang J, Liang Y, et al. Nonadditive regulation confers phenotypic variation in hybrid maize. New Phytol 2025;246:631–44. 10.1111/nph.20453.
  • 41. Wijerathna-Yapa A, Bishnoi R, Ranawaka B, et al. Rice–wheat comparative genomics: gains and gaps. Crop J 2024;12:656–69. 10.1016/j.cj.2023.10.008.
  • 42. Montesinos López OA, Montesinos López A, Crossa J. Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Cham: Springer International Publishing, 2022, 109–39.
  • 43. Danilevicz MF, Gill M, Anderson R, et al. Plant genotype to phenotype prediction using machine learning. Front Genet 2022;13:1–12. 10.3389/fgene.2022.822173.
  • 44. Azodi CB, Tang J, Shiu S-H. Opening the black box: interpretable machine learning for geneticists. Trends Genet 2020;36:442–55. 10.1016/j.tig.2020.03.005.
  • 45. Chen J, Tan C, Zhu M, et al. CropGS-hub: a comprehensive database of genotype and phenotype resources for genomic prediction in major crops. Nucleic Acids Res 2023;52:D1519–29. 10.1093/nar/gkad1062.
  • 46. Xu Y, Zhang X, Li H, et al. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol Plant 2022;15:1664–95. 10.1016/j.molp.2022.09.001.
  • 47. Li H, Li X, Zhang P, et al. Smart Breeding Platform: a web-based tool for high-throughput population genetics, phenomics, and genomic selection. Mol Plant 2024;17:677–81. 10.1016/j.molp.2024.03.002.
  • 48. Zhu W, Han R, Shang X, et al. The CropGPT project: call for a global, coordinated effort in precision design breeding driven by AI using biological big data. Mol Plant 2024;17:215–8. 10.1016/j.molp.2023.12.015.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
