BMC Plant Biology. 2026 Mar 3;26:649. doi: 10.1186/s12870-026-08478-x

Integrative multi-trait phenotyping reveals coordinated root and antioxidant responses underlying drought tolerance in soybean

Chunlei Zhang 1,#, Sobhi F Lamlom 1,2,#, Huilong Hong 3,#, Lichun Huang 4, Rongqiang Yuan 1, Fengyi Zhang 1, Xiaoyu Xia 1, Xueyang Wang 1, KeZhen Zhao 1, Xiulin Liu 1, Ahmed M Abdelghany 5, Honglei Ren 1, Junjie Ding 6
PMCID: PMC13067402  PMID: 41776432

Abstract

Background

Drought stress markedly constrains soybean yields; however, progress in breeding has been limited by reliance on single-trait selection. Effective drought adaptation likely requires coordinated responses across root architecture, antioxidant defense, and osmotic adjustment, yet the relationships among these trait complexes across diverse germplasm remain poorly understood.

Methods

A comprehensive assessment of drought tolerance was conducted, evaluating 301 soybean genotypes over 2 years under both well-watered and drought-stressed conditions at 40% field capacity, using 11 morphological and physiological traits. Genetic variation was characterized through analysis of variance, broad-sense heritability estimation, and genetic and phenotypic coefficients of variation. Multivariate approaches, including principal component analysis, hierarchical clustering, and five machine learning classifiers, were used to identify trait interactions and classify drought responses. The Multi-trait Genotype-Ideotype Distance Index (MGIDI) and Stress Tolerance Index (STI) were applied for multi-trait genotype selection.

Results

Root traits demonstrated high broad-sense heritability (0.74–0.77), indicating substantial genetic potential for selection. Drought conditions reduced root parameters by 10–20%, whereas catalase activity increased by 24.3%. Proline content exhibited extreme genotypic variation (50–1,500 µg g⁻¹), with the highest accumulation associated with the poorest performance, characterized by severe membrane damage and limited root development, a pattern consistent with a stress-injury profile rather than effective tolerance. Hierarchical clustering revealed that superior genotypes attained drought tolerance through coordinated, moderate responses maintaining extensive root systems (total root length 1,239 mm) and high catalase activity (156 µmol min⁻¹ g⁻¹) rather than through extreme expression of individual traits. Machine learning variable importance analysis identified catalase activity, total root length, malondialdehyde, and proline as the most discriminative traits. Six elite genotypes (Jiyu 92; Dengke No. 1; Kennong 57; Mengdou 28; Ronda 130; Jinong SB 2012-136) were consistently identified as top performers across MGIDI, STI, cluster membership, and drought-to-control trait ratios.

Conclusions

Effective drought adaptation necessitates balanced, multi-trait coordination rather than the maximization of individual traits. Concentrating early-stage phenotyping on the four most important traits identified by machine learning would substantially reduce per-genotype resource requirements while retaining 70–80% of the discriminative information. The six identified elite genotypes are high-priority candidates for field-based validation and potential incorporation into drought-resilient soybean breeding programs.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12870-026-08478-x.

Keywords: Drought tolerance, Soybean, Root architecture, Antioxidant enzymes, Machine learning, MGIDI, Genotype selection, Climate resilience

Introduction

Soybean (Glycine max (L.) Merr.) is a vital global crop, supplying around 70% of the world’s protein meal and 29% of vegetable oil used for human food and livestock feed [1]. As the fourth-most-cultivated crop worldwide, soybean plays a key role in global food security. However, yields are increasingly vulnerable to drought stress, which causes 40–60% losses in severely affected areas [2, 3]. Climate forecasts indicate that drought events will become more frequent and severe in major soybean-producing regions, including the U.S. Midwest, the Brazilian Cerrado, and Northeast China [4]. The reproductive stage, from flowering (R1) to physiological maturity (R7), is particularly sensitive to water shortages, with drought during pod-filling reducing yields by 50–80%, even in well-adapted cultivars [5, 6]. Therefore, developing drought-tolerant soybean cultivars that sustain productivity under water-limited conditions is essential for sustainable agriculture amid climate change.

Plant drought tolerance is a complex trait involving coordinated changes at morphological, physiological, and molecular levels [7, 8]. The root system’s architecture is crucial for drought adaptation because it influences how effectively the plant explores the soil and accesses water [9, 10]. Traits such as total root length, surface area, root tip density, and deep rooting have been linked to better water uptake during drought in soybean [11, 12] and other legumes [13]. Drought also causes oxidative stress due to excess reactive oxygen species (ROS), which activate antioxidant defenses like superoxide dismutase (SOD), peroxidase (POD), and catalase (CAT) to prevent cellular damage [14–16]. Osmotic adjustment, achieved by accumulating compatible solutes such as proline, soluble sugars, and proteins, helps maintain cell turgor and protect structures during dehydration [17–20]. Nonetheless, how these morphological and biochemical traits are genetically controlled and how each contributes to drought tolerance, particularly across diverse germplasm evaluated under consistent conditions, remains only partially understood.

Traditional soybean breeding has made gains in yield, disease resistance, and environmental adaptation [21, 22]. However, selection for drought tolerance is challenging due to the trait’s genetic complexity, genotype-environment interactions, and testing difficulties [23]. Single-trait selection targeting root traits, osmolytes, or antioxidants has had limited success in improving field drought tolerance, as effective adaptation requires coordinated traits [24]. This has led to multi-trait methods that consider multiple drought-responsive traits and seek optimal combinations of genotypes and traits. The Multi-trait Genotype-Ideotype Distance Index (MGIDI) marks a notable advancement in multi-trait selection methods [25]. It uses factor analysis to reduce trait information and identify independent phenotypic dimensions, then ranks genotypes by their Euclidean distance from a breeder-defined ideal in the factor space. This method has proven effective for improving multiple traits simultaneously across various crops, including grain yield and quality in wheat [26], disease resistance in rice [27], nutritional traits in sweet potato [28], and yield components in common bean [29]. Nonetheless, MGIDI has yet to be systematically applied in soybean drought-tolerance breeding, where the simultaneous optimization of root architecture, antioxidant defenses, and osmotic adjustment could significantly accelerate genetic gains.

Despite advances in research on soybean drought response, significant gaps remain. Many studies used small germplasm panels (20–50 genotypes), which limits the ability to detect genotype-treatment interactions and reliably identify top-performing genotypes [30]. These studies primarily focused on either morphological or physiological traits, rarely both [31, 32]. Few studies have employed advanced analyses that integrate genetics, multivariate statistics, machine learning, and multi-trait optimization to examine complex phenotypes. Furthermore, the relationships among root traits, enzyme activities, and osmolytes, particularly whether they act independently, synergistically, or exhibit trade-offs due to resource constraints, are poorly understood across diverse populations.

This study evaluated 301 soybean genotypes under controlled drought across two seasons, measuring traits such as root morphology, antioxidant enzymes, stress markers, and osmotic compounds to examine adaptation and stress responses. Our integrated approach combined traditional genetic techniques, such as ANOVA, heritability, and correlation analysis, with modern tools, including principal component analysis, machine-learning classification, and multi-trait selection indices, to understand drought responses and identify top genotypes. We employed machine learning to classify drought stress, verify measurement accuracy, identify key traits for stress differentiation, and inform phenotyping strategies. MGIDI was used to identify genotypes that optimize multiple traits, thereby avoiding pitfalls associated with single-trait selection. Hierarchical clustering helped determine whether drought tolerance involves a single or multiple pathways. Overall, the research assesses genetic variation, identifies elite genotypes, uncovers mechanisms of adaptation, and provides a framework for breeding climate-resilient soybeans and enhancing stress tolerance.

Materials and methods

Plant material and experimental design

A panel of 301 soybean genotypes representing various genetic backgrounds was assembled for this research. The collection comprised released cultivars, advanced breeding lines, and accessions from leading breeding programs across different maturity groups. Seeds were sourced from Heilongjiang Academy of Agricultural Sciences, Harbin, China. Pot experiments were conducted during the 2024 and 2025 growing seasons at Heilongjiang Academy of Agricultural Sciences experimental station. The study employed a split-plot design with three replicates. The primary plot factor was the water regime (well-watered control vs. drought stress), and genotypes were randomized within subplots, yielding a total of 3,612 experimental units (301 genotypes × 2 treatments × 2 years × 3 replicates). Each experimental unit consisted of a single pot containing a single plant, ensuring precise control of soil moisture conditions and eliminating competitive effects among genotypes.

Well-watered control pots received irrigation throughout the growing season to maintain soil moisture near field capacity (80–90% of field capacity). Drought-stress treatments involved withholding irrigation from the R1 flowering stage through physiological maturity (R7) to induce a progressive water deficit during the reproductive phase, the most critical period for soybean yield formation. Soil moisture in drought-stressed treatments decreased to approximately 60% of field capacity by mid-stress period, indicating moderate-to-severe drought conditions. Drought-stressed pots received pre-stress irrigation similar to controls to prevent water limitation during vegetative growth, ensuring that observed differences reflected responses specifically to reproductive-phase drought stress rather than accumulated effects from earlier developmental stages.

Trait measurements

Root morphological traits

At physiological maturity (R7 stage), plants were carefully harvested, and root systems were extracted from pots by gently washing the roots to remove soil while preserving root architecture. Root systems were scanned using a flatbed scanner (Epson Perfection V800 Photo, Epson America Inc., Long Beach, CA, USA) at 600 dpi resolution. Digital images were analyzed using WinRHIZO software (Regent Instruments Inc., Quebec, Canada) to quantify root morphological parameters including total root length (TRL, mm), root tip count (RTC), and total surface area (TSA, mm²). Plant height (PH, cm) was measured from the soil surface to the tip of the main stem at maturity.

Physiological and biochemical assays

Leaf samples were collected at the R3-R4 stage (pod development) when drought stress effects were most pronounced. The third fully expanded trifoliate leaf from the top of each plant was harvested in the morning (8:00–10:00 AM), immediately frozen in liquid nitrogen, and stored at −80 °C until biochemical analysis.

Antioxidant enzyme activities

Superoxide dismutase (SOD) activity was determined by the nitroblue tetrazolium (NBT) photoreduction method and expressed as units per milliliter (U/mL). Peroxidase (POD) activity was measured using the guaiacol oxidation method and expressed as units per gram fresh weight (U/g FW). Catalase (CAT) activity was assayed by monitoring the decomposition of H₂O₂ at 240 nm and expressed as µmol H₂O₂ decomposed per minute per gram fresh weight (µmol/min/g).

Oxidative stress marker

Malondialdehyde (MDA) content, an indicator of lipid peroxidation, was determined by the thiobarbituric acid reactive substances (TBARS) assay and expressed as nmol/g fresh weight.

Osmotic adjustment compounds

Proline content was quantified using the acid ninhydrin method and expressed as µg/g fresh weight. Soluble sugar (SS) content was measured by the anthrone method and expressed as mg/g weight. Soluble protein (SP) content was determined using the Bradford assay with bovine serum albumin as the standard and expressed as mg/g.

Statistical analysis

Analysis of variance and heritability

Analysis of variance (ANOVA) was performed for each trait using a linear mixed model with treatment, genotype, year, and treatment × genotype interaction as fixed effects, and replication nested within year as a random effect. The model was implemented using the R package ‘lme4’. Statistical significance was assessed at P < 0.05, P < 0.01, and P < 0.001 levels. Broad-sense heritability (H²) was estimated for each trait using variance components extracted from the mixed model as H² = σ²G / (σ²G + σ²GE/e + σ²ε/re), where σ²G is genetic variance, σ²GE is genotype × environment interaction variance, σ²ε is residual variance, e is the number of environments (2 years × 2 treatments = 4), and r is the number of replications (3). All heritability values reported in this study are therefore broad-sense estimates calculated across environments (two years × two treatments), explicitly accounting for genotype × environment interaction through the σ²GE/e term. Narrow-sense heritability (h²) was not estimated, as pedigree or genomic relationship data were not available for this diversity panel. Genetic coefficient of variation (GCV) and phenotypic coefficient of variation (PCV) were calculated as GCV = (√σ²G / grand mean) × 100 and PCV = (√σ²P / grand mean) × 100, where σ²P is the phenotypic variance.
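The heritability and coefficient-of-variation formulas above can be illustrated with a minimal Python sketch. It is not the authors' R/lme4 workflow, and the variance components and grand mean are hypothetical values chosen for easy arithmetic. The phenotypic variance is assumed here to be the denominator of the H² expression (entry-mean basis), which the text does not state explicitly.

```python
import math

def broad_sense_h2(var_g, var_ge, var_e, n_env=4, n_rep=3):
    """H2 = var_g / (var_g + var_ge/e + var_e/(r*e)), with e = 4 and r = 3 as in the text."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_rep * n_env))

def coeff_variation(variance, grand_mean):
    """Coefficient of variation in percent: sqrt(variance) / grand mean * 100."""
    return math.sqrt(variance) / grand_mean * 100.0

# Hypothetical variance components and grand mean (not taken from the study)
var_g, var_ge, var_e, grand_mean = 9.0, 8.0, 12.0, 30.0

h2 = broad_sense_h2(var_g, var_ge, var_e)   # 9 / (9 + 2 + 1)

# Assumption: phenotypic variance on an entry-mean basis (the H2 denominator)
var_p = var_g + var_ge / 4 + var_e / (3 * 4)

gcv = coeff_variation(var_g, grand_mean)
pcv = coeff_variation(var_p, grand_mean)

print(f"H2 = {h2:.2f}, GCV = {gcv:.1f}%, PCV = {pcv:.1f}%")
```

Under this assumption H² equals (GCV/PCV)², which is why traits lying close to the GCV = PCV diagonal are the highly heritable ones.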

Correlation and principal component analysis

Pearson correlation coefficients were calculated for all traits separately under control and drought-stress conditions using the R package ‘corrplot’. Principal component analysis (PCA) was conducted using the ‘FactoMineR’ package to identify major axes of phenotypic variation and visualize relationships among traits and genotypes. Variable contributions to principal components were assessed, and biplots were generated to display genotype distributions and trait loadings.

Machine learning classification

Five machine learning algorithms were employed to classify samples as control or drought-stressed based on 11 measured traits: Random Forest (RF), Support Vector Machine (SVM), Gradient-Boosting Machine (GBM), Neural Network (NNet), and Elastic Net regression. Models were trained using the R package ‘caret’ with 10-fold cross-validation repeated three times. Hyperparameter tuning was performed using grid search within the cross-validation framework. Model performance was evaluated using accuracy, Kappa statistic, sensitivity, specificity, precision, F1-score, and balanced accuracy. The dataset was split 70:30 for training and testing to assess generalization performance.
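For reference, the performance metrics listed above can all be derived from a 2 × 2 confusion matrix. The sketch below is a plain-Python illustration, not the caret implementation used in the study. The counts are hypothetical, and treating drought-stressed samples as the positive class is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical test-set counts (positive class = drought-stressed, an assumption)
metrics = binary_metrics(tp=90, fp=8, fn=10, tn=92)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```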

Drought stress index

The drought stress index for each trait was calculated as the percentage change from control to drought conditions: DSI = [(Trait_drought − Trait_control) / Trait_control] × 100. Positive values indicate increases in the trait under drought, whereas negative values indicate decreases. Statistical significance of changes was assessed using paired t-tests.
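As a worked illustration of the DSI formula, the short Python sketch below uses the approximate control and drought values quoted in the Results for plant height (35 cm and 25 cm) and catalase activity (50 and 75 µmol/min/g). These are rounded values read from the text, so the resulting percentages are indicative only.

```python
def drought_stress_index(trait_control, trait_drought):
    """Percentage change from control to drought: (drought - control) / control * 100."""
    return (trait_drought - trait_control) / trait_control * 100.0

# Approximate values quoted in the Results (rounded, indicative only)
dsi_ph = drought_stress_index(35.0, 25.0)    # plant height, cm
dsi_cat = drought_stress_index(50.0, 75.0)   # catalase, umol/min/g

print(f"PH  DSI = {dsi_ph:.1f}%")   # negative: trait decreases under drought
print(f"CAT DSI = {dsi_cat:.1f}%")  # positive: trait increases under drought
```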

Multi-trait selection indices

The Multi-trait Genotype-Ideotype Distance Index (MGIDI) was computed using the R package ‘metan’ to identify superior genotypes across multiple traits. Factor analysis was first conducted to reduce trait dimensionality and identify independent factors. The ideotype was defined by the desired direction of each trait (minimum values for stress markers, maximum values for beneficial traits). Genotypes were ranked by their Euclidean distance from the ideotype in the factor space, with smaller distances indicating closer approximation to the ideal. The Stress Tolerance Index (STI) was calculated for each genotype as STI = (Yc × Yd) / (Ȳc)², where Yc is the genotype’s performance under control conditions, Yd is performance under drought, and Ȳc is the mean of all genotypes under control conditions. Higher STI values indicate better stress tolerance combined with high performance potential.
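The STI calculation can be sketched in a few lines of Python. The genotype labels and trait values below are hypothetical and serve only to show how the index rewards genotypes that perform well under both conditions. The sketch covers STI only and does not reproduce the MGIDI computation of the ‘metan’ package.

```python
def stress_tolerance_index(y_control, y_drought, mean_control):
    """STI = (Yc * Yd) / (mean of Yc over all genotypes)^2."""
    return (y_control * y_drought) / mean_control ** 2

# Hypothetical trait values for three illustrative genotypes
control = {"G1": 1200.0, "G2": 1000.0, "G3": 800.0}
drought = {"G1": 900.0, "G2": 950.0, "G3": 500.0}

mean_control = sum(control.values()) / len(control)

sti = {
    g: stress_tolerance_index(control[g], drought[g], mean_control)
    for g in control
}

# Rank genotypes from highest to lowest STI
ranking = sorted(sti, key=sti.get, reverse=True)
print(ranking, {g: round(v, 3) for g, v in sti.items()})
```

In this toy example G2 loses the least under drought, yet G1 ranks first because STI also weights performance under control conditions.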

Cluster analysis

Hierarchical clustering was performed using Ward’s minimum-variance method based on Euclidean distances computed from standardized trait values. The optimal number of clusters was determined using the elbow method and silhouette analysis. Heatmaps were generated using the R package ‘pheatmap’ to visualize trait patterns within and between clusters, with within-trait z-score standardization applied for comparability. All statistical analyses were performed in R version 4.3.0 (R Core Team, 2023). Data visualization was conducted using the ‘ggplot2’, ‘ggpubr’, and ‘cowplot’ packages. Statistical significance was set at α = 0.05 unless otherwise stated.
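The preprocessing that precedes Ward clustering (within-trait z-score standardization followed by pairwise Euclidean distances) can be illustrated as follows. This is a minimal standard-library sketch with hypothetical trait values. It assumes the sample standard deviation (n − 1 denominator) and stops before the clustering step itself.

```python
import math
import statistics

def zscore_columns(rows):
    """Standardize each trait (column) to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]   # sample SD (assumption)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical genotypes x traits matrix: [total root length (mm), CAT activity]
traits = [
    [1200.0, 150.0],
    [900.0, 60.0],
    [600.0, 90.0],
]

z = zscore_columns(traits)

# Pairwise Euclidean distances between genotypes in standardized trait space
dist = [[math.dist(a, b) for b in z] for a in z]

for row in dist:
    print([round(d, 3) for d in row])
```

Standardizing first keeps traits measured on large scales, such as root length in millimetres, from dominating the distance matrix.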

Results

Analysis of variance for root morphology and physiological traits

The variance analysis showed highly significant effects of treatment, genotype, and their interaction on all traits measured (Table 1). Treatment effects were statistically significant (P < 0.001) for all root morphological and physiological traits, with F-values ranging from 332.58 for POD to 15,160.83 for PH, highlighting considerable phenotypic variation in response to drought stress across these traits. Genotype had highly significant effects (P < 0.001) on all traits, with F-values from 368.49 for PH to 2,738.11 for SP. These findings suggest substantial genetic diversity among the 301 genotypes for root architecture and stress-related physiological parameters. The variation was particularly notable for SP, SS, and CAT, suggesting these traits could be useful targets for breeding programs to enhance drought tolerance. Year effects were non-significant for nine of the eleven traits, the exceptions being SOD (F = 5.02, P = 0.025) and PRO (F = 3.91, P = 0.048), indicating that conditions and genotype responses were largely consistent across the two years. The treatment × genotype interaction was highly significant across all traits, with F-values ranging from 41.85 (PH) to 2,676.80 (SP), indicating diverse genotype responses to drought. The higher interaction F-values for physiological traits (366.47–2,676.80) than for morphological traits (41.85–287.37) imply that biochemical responses are more genotype-specific than root structural changes. Among root traits, TSA had the largest treatment effect (F = 4,109.46), followed by RTC and TRL, while PH showed the largest treatment effect of all traits. PRO and CAT displayed the strongest physiological treatment effects, consistent with their roles in drought response. Their strong genotype × treatment interactions highlight their potential for selecting drought-tolerant genotypes.

Table 1.

F-values from analysis of variance for root morphology and physiological traits under drought stress

Trait Treatment (T) Genotype (G) Year (Y) T × G
PH 15160.83*** 368.49*** 0.01 41.85***
TRL 2064.75*** 469.62*** 0.46 122.62***
RTC 2591.86*** 393.11*** 0.71 112.38***
TSA 4109.46*** 487.70*** 0.03 287.37***
SOD 4300.00*** 410.09*** 5.02* 445.16***
POD 332.58*** 454.13*** 0.01 366.47***
CAT 5036.21*** 1038.50*** 2.91 519.44***
MDA 3150.96*** 433.55*** 0.75 395.01***
PRO 8117.82*** 1597.88*** 3.91* 1310.10***
SS 1073.91*** 590.83*** 0.33 550.06***
SP 4303.59*** 2738.11*** 0.09 2676.80***

PH Plant Height (cm), TRL Total Root Length (mm), RTC Root Tip Count, TSA Total Surface Area (mm²), SOD Superoxide Dismutase (U/mL), POD Peroxidase (U/g FW), CAT Catalase (µmol/min/g), MDA Malondialdehyde (nmol/g), PRO Proline (µg/g), SS Soluble Sugar (mg/g), SP Soluble Protein (mg/g). *P < 0.05, ***P < 0.001

Heritability and genetic variation of traits

Broad-sense heritability (H²), estimated across two years and two treatment environments, varied widely among traits, from 0.37 to 0.77 (Fig. 1A). Root morphological traits had the highest heritability, with plant height (H² = 0.77), total root length (H² = 0.76), and root tip count (H² = 0.74). These high values indicate a large genetic contribution to root architecture and a strong potential for genetic improvement. Among physiological traits, CAT activity had moderate to high heritability (H² = 0.63), whereas total surface area had intermediate heritability (H² = 0.58). POD activity, proline, MDA, soluble sugar, and soluble protein had moderate heritability (0.43 to 0.56), reflecting balanced genetic and environmental influences. SOD activity had the lowest heritability (H² = 0.37), indicating greater environmental influence and limited suitability for selection. The relationship between genetic and phenotypic coefficients of variation revealed distinct patterns of trait variability (Fig. 1B). Proline content showed exceptionally high variation, with GCV at 127.3% and PCV at 147.6%, far exceeding other traits. This indicates that proline accumulation responds strongly to drought and exhibits high genetic diversity. CAT activity also showed substantial variation (GCV = 66.8%, PCV = 83.9%), while total surface area and plant height varied modestly (GCV = 29.4–31.8%, PCV = 39.2–47.1%). Most traits aligned along the diagonal, suggesting that genetic variation underpins much phenotypic variation, consistent with moderate to high heritability. Variance analysis (Fig. 1C) showed that genetic variance dominated residual variance, especially for POD activity, root tip count, and root length, confirming a strong genetic basis. Traits with larger genetic variance are more responsive to selection in breeding. The negative correlation between heritability and CV (Fig. 1D) suggests that highly heritable traits have lower phenotypic variability, indicating more consistent genetic control. Conversely, traits with lower heritability are more phenotypically plastic. The moderate negative correlation implies that selecting traits with good heritability and sufficient phenotypic variation can improve breeding efficiency. Root traits and CAT activity have favorable heritability and genetic variation, making them suitable for breeding. Proline’s high variability and moderate heritability make it a good drought-response marker. Multi-trait selection combining morphological and biochemical traits may be best for developing drought-tolerant cultivars.

Fig. 1.

Heritability estimates and genetic variation parameters for root morphological and physiological traits. A Broad-sense heritability (H²) values for 11 traits, with values ranging from 0.37 to 0.77. B Relationship between genetic coefficient of variation (GCV, x-axis) and phenotypic coefficient of variation (PCV, y-axis), with point colors representing heritability values. The dashed diagonal line represents perfect correspondence between genetic and phenotypic variation. C Variance component distribution showing the partition of total variance into genetic (orange) and residual (green) components for each trait. D Inverse relationship between heritability (x-axis) and average coefficient of variation (y-axis), with point size representing the magnitude of variation. The trend line indicates negative correlation between heritability and phenotypic variability

Drought stress effects on root morphological and physiological traits

Drought stress markedly changed all examined root morphological and physiological traits across the 301 genotypes (Fig. 2). The violin plots showed considerable phenotypic variation within each treatment group, highlighting the diverse responses of genotypes to water scarcity. Plant height significantly decreased (P < 0.001) under drought, dropping from 35 cm in the control to 25 cm during drought. Both total root length and root tip count showed highly significant declines (P < 0.001), with median values decreasing markedly. The broad distribution ranges under each treatment highlight extensive genetic diversity in root architecture and plasticity in response to water scarcity. Root surface area also exhibited a similar drought-induced reduction pattern (P < 0.001), although the magnitude of decline differed among genotypes.

Fig. 2.

Violin plot distributions showing the effects of drought stress treatment on all measured traits across 301 genotypes. Blue violins represent the control condition, and red violins represent the drought-stress treatment. Black dots indicate individual genotype values, and box plots within each violin show the median, quartiles, and outliers. Significance levels of treatment effects are indicated above each panel (*** P < 0.001, ** P < 0.01, * P < 0.05). Traits include plant height (PH, cm), total root length (TRL, mm), root tip count (RTC), root surface area (TSA, mm²), superoxide dismutase (SOD, U/mL), peroxidase (POD, U/g FW), catalase (CAT, µmol/min/g), malondialdehyde (MDA, nmol/g), proline content (PRO, µg/g fresh weight), soluble sugar (SS, mg/g weight), and soluble protein content (SP, mg/g)

All three antioxidant enzymes showed significant increases under drought stress (P < 0.001 for SOD and POD; P < 0.01 for CAT), aligning with their roles in defending against oxidative damage. SOD activity increased from 500 U/mL in controls to 600 U/mL during drought, with some genotypes exceeding 4,000 U/mL. POD activity also increased, with median values rising during drought and overlapping distributions, implying some genotypes possess inherently high peroxidase activity. CAT activity increased from 50 to 75 µmol/min/g, indicating coordinated upregulation of ROS-scavenging mechanisms to mitigate oxidative stress during water deficiency. MDA content, a marker of lipid peroxidation and membrane damage, significantly rose (P < 0.001) from about 100 nmol/g to 125 nmol/g under drought. There is notable variation among genotypes regarding susceptibility to oxidative damage, with some maintaining low MDA levels even under stress, likely reflecting differences in antioxidant defenses and membrane stability. Proline accumulation was significantly higher under drought (P < 0.001), increasing from 50 µg/g to 150 µg/g, with some samples exceeding 1,500 µg/g. This wide variation indicates that proline accumulation could be a valuable trait for screening for drought tolerance. Soluble sugars rose significantly (P < 0.001) from 8 mg/g to 11 mg/g under drought conditions, supporting osmotic adjustment and cellular protection. Distribution data indicate that this response is common across genotypes, although the extent varies. In contrast, soluble protein levels remained relatively constant across treatments, with no significant changes, although certain genotypes showed increases or decreases. This consistency indicates mechanisms that maintain protein homeostasis under drought stress. Overall, violin plots indicate that drought triggers coordinated responses, including reduced growth, increased antioxidant levels, osmotic regulation, and metabolite accumulation. 
The extensive genotypic variation across traits, as evidenced by broad, overlapping distributions, highlights the genetic diversity in the germplasm for drought adaptation. This diversity offers opportunities to select superior genotypes and investigate the genetic factors underlying drought resilience in crops.

Trait correlations and multivariate analysis

Correlation patterns among traits

Correlation analysis showed distinct patterns among root traits and physiological stress responses (Fig. 3). The strongest positive correlations were among root architecture traits: RTC correlated with RL (r = 0.68) and RSA (r = 0.37), and RL and RSA were also highly correlated (r = 0.69), indicating coordinated development. These relationships imply that selecting for one root trait may improve related features. Weak to moderate negative correlations were observed between root traits and antioxidant enzyme activities; for example, CAT activity was inversely related to RTC (r = −0.12), RL (r = −0.33), and RSA (r = −0.24), possibly reflecting a trade-off between root growth and stress defense. Among stress responses, correlations were positive but weak: proline was positively associated with CAT (r = 0.26), SOD (r = 0.19), and MDA (r = 0.29), indicating coordinated osmotic and antioxidant responses. The proline–MDA association suggests that higher proline levels could be linked to membrane damage or oxidative stress. MDA was also positively associated with SOD (r = 0.17) and sugar (r = 0.33), but only negligibly with CAT (r = 0.06). Sugar content was positively correlated with MDA (r = 0.33) and, more weakly, with SOD (r = 0.17) and CAT (r = 0.08), consistent with roles in stress protection and signaling. Plant height was weakly negatively correlated with most stress traits, suggesting that taller plants may experience less stress. Soluble protein content showed very weak correlations (|r| < 0.11) with other traits, suggesting that it varies largely independently and might provide unique information for breeding.

Fig. 3.

Correlation analysis and principal component analysis of trait relationships. Correlation matrix heatmap showing Pearson correlation coefficients among all 11 traits, with color intensity representing correlation strength (blue = positive, red = negative). Numerical values indicate correlation coefficients

Principal component analysis

Principal component analysis showed that the first two components explained 38% of phenotypic variance (PC1 = 23%, PC2 = 15%), with the first eight components accounting for 90% (Fig. 4A, B). The PCA score plot (Fig. 4A) revealed a clear separation between control and drought-stressed samples along PC1, indicating that PC1 captures the drought response. Overlap along PC2 suggests it reflects genotypic variation independent of treatment. Samples within each group were broadly distributed, confirming extensive genotypic diversity. Some control samples were projected into drought space, and vice versa, likely representing genotypes with constitutively high stress defense and drought-tolerant genotypes that maintain control-like phenotypes under stress, respectively. Loading analysis (Fig. 4C) showed that morphological traits (PH, RL, RTC, RSA) loaded strongly and negatively on PC1, whereas stress traits (Proline, MDA, SOD, CAT, Sugar) loaded positively, confirming PC1 as a morphological-to-physiological response gradient. Control samples exhibited greater root development, whereas drought samples showed greater activation of the stress response. POD and Protein loaded weakly on PC1. On PC2, traits showed moderate loadings, indicating that this component captures additional phenotypic variation, possibly related to genotype-specific trait combinations or alternative stress responses. Notably, CAT and Proline contributed substantially to PC2, indicating variation in antioxidant responses among genotypes.

Fig. 4.

Principal component analysis of trait relationships. A PCA score plot of control and drought-stressed samples on PC1 and PC2. B Scree plot showing variance explained by each principal component (bars) and cumulative variance (line). PC1 and PC2 explain 23% and 15% of total variance, respectively. C PCA loading plot displaying trait contributions to PC1 (x-axis) and PC2 (y-axis), with vector length and direction indicating the magnitude and direction of trait loadings. Root morphological traits load negatively on PC1, whereas stress-response traits load positively

Machine learning classification performance

Five machine learning algorithms were evaluated for their ability to classify samples as control or drought-stressed based on the 11 measured traits (Table 2; Fig. 5). Random Forest (RF) outperformed all other models, with an accuracy of 0.981, Kappa of 0.962, sensitivity of 0.980, and precision of 0.982, indicating highly discriminative trait profiles for classifying drought stress; RF correctly classified about 98% of samples. Support Vector Machine (SVM) ranked second, with 0.938 accuracy and 0.876 Kappa, showing strong discrimination with high sensitivity (0.956) and specificity (0.920). Gradient Boosting Machine (GBM) came third, with 0.890 accuracy, 0.780 Kappa, sensitivity of 0.905, and specificity of 0.876. The strong performance of RF, SVM, and GBM relative to the linear Elastic Net suggests that capturing non-linear trait relationships is important for accurate classification. Neural Network (NNet) showed moderate performance: 0.837 accuracy, 0.674 Kappa, sensitivity of 0.865, and specificity of 0.809. Its lower performance compared with RF, SVM, and GBM may be due to the need for more data or tuning. Elastic Net regression had the weakest results, with 0.729 accuracy and 0.458 Kappa, though sensitivity (0.718) and specificity (0.740) stayed above 0.7. The high Kappa values for RF and SVM confirm that their performance exceeds chance, validating the trait measurements’ capacity to discriminate drought status. It should be emphasised, however, that this classification exercise is exploratory in nature: the models were trained and cross-validated within the same 301-genotype panel and the same pot-based experimental system. External validation on independent populations or field-grown material would be required before the trait-subset recommendations can be generalised to other breeding contexts. 
The primary scientific outputs of the ML analysis are therefore not the accuracy values per se—treatment separation being already apparent from ANOVA and PCA—but rather (i) the variable importance hierarchy, which captures non-linear, interaction-driven trait contributions not accessible from PCA loadings or univariate F-values, and (ii) the phenotyping efficiency inference, which provides an empirical basis for trait prioritisation in early-generation screens.

Table 2.

Performance metrics of machine learning models for drought stress classification

Model       Accuracy  Kappa  Sensitivity  Specificity  Precision  F1 Score  Balanced Accuracy
RF          0.981     0.962  0.980        0.982        0.982      0.981     0.981
SVM         0.938     0.876  0.956        0.920        0.923      0.939     0.938
GBM         0.890     0.780  0.905        0.876        0.879      0.892     0.890
NNet        0.837     0.674  0.865        0.809        0.819      0.841     0.837
ElasticNet  0.729     0.458  0.718        0.740        0.735      0.726     0.729

RF Random Forest, SVM Support Vector Machine, GBM Gradient Boosting Machine, NNet Neural Network. All metrics range from 0 to 1, with higher values indicating better performance. Kappa statistic measures agreement beyond chance. Models are ordered by accuracy from highest to lowest
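The metrics in Table 2 all derive from a two-by-two confusion matrix. The following is a minimal sketch of those definitions, not the authors' pipeline: the function name `binary_metrics`, the choice of drought as the positive class, and the example counts are assumptions made for illustration.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute Table 2 style metrics from a 2x2 confusion matrix.

    tp/fn: drought samples classified correctly/incorrectly.
    tn/fp: control samples classified correctly/incorrectly.
    Treating 'drought' as the positive class is an assumption.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts for illustration only; not taken from the study.
metrics = binary_metrics(tp=90, fn=10, fp=20, tn=80)
```

When the two classes are the same size, chance agreement is 0.5 and Kappa reduces to 2 × accuracy − 1. The Accuracy and Kappa columns of Table 2 follow this relation (for example, 2 × 0.981 − 1 = 0.962), which is consistent with a balanced control/drought design.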

Fig. 5.

Performance comparison of machine learning models for drought stress classification. Bar plots show seven evaluation metrics (Precision, Sensitivity, Specificity, Accuracy, Balanced Accuracy, F1 Score, and Kappa) for five classification algorithms: Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Neural Network (NNet), and Elastic Net. All metrics range from 0 to 1, with higher values indicating better performance. RF achieved the highest performance across all metrics (> 0.96), followed by SVM (> 0.87), GBM (> 0.78), NNet (> 0.67), and Elastic Net (> 0.45)

Drought stress index and trait response patterns

The drought stress index, calculated as the percentage change from control to drought conditions, revealed differential trait responses to water deficit (Fig. 6). CAT activity showed the largest increase under drought stress (+24.3%, P < 0.001), indicating strong upregulation of this key antioxidant enzyme. This substantial elevation in catalase activity reflects the crucial role of hydrogen peroxide scavenging in mitigating oxidative damage during drought stress. The pronounced response suggests CAT may serve as a sensitive biochemical marker for early drought detection. Root morphological traits exhibited significant reductions under drought, with TRL showing the greatest decrease (−19.8%, P < 0.01), followed by TSA (−19.4%, P < 0.001), PH (−11.2%, P < 0.001), and RTC (−10.0%, P < 0.001). These coordinated reductions in root architectural parameters indicate that drought stress constrains overall root system development, likely due to reduced carbon allocation to below-ground biomass and water-limited cell expansion. The similar magnitudes of reduction for TRL and TSA suggest proportional scaling of root length and surface area under stress. Among stress-responsive metabolites, PRO content decreased by 8.7% (P < 0.001), an unexpected pattern given proline’s established role as an osmoprotectant. This reduction may reflect genotype-averaged responses that mask substantial variation in proline accumulation strategies, with some genotypes showing large increases while others maintain or reduce levels. Alternatively, the timing of measurement may not have captured peak proline accumulation, or other osmolytes may compensate for reduced proline levels in certain genotypes. SOD activity decreased by 7.7% (P < 0.001), POD activity by 6.1% (P < 0.001), and SS content by 3.3% (P < 0.05) under drought conditions. 
The reductions in SOD and POD activities contrast with the increase in CAT, suggesting a shift in the balance of antioxidant enzyme activities rather than uniform upregulation. This pattern may indicate that CAT assumes a more prominent role in ROS scavenging under severe stress, while superoxide dismutase and peroxidase activities are modulated differently, possibly due to substrate availability or post-translational regulation. SP and MDA content showed non-significant changes (−0.1% and +13.4%, respectively), indicating relative stability in protein homeostasis and, at most, moderate oxidative membrane damage across the population.

Fig. 6.

Drought stress index showing percentage change in trait values under drought versus control conditions. Horizontal bar chart displays the mean percent change for each trait, with positive values (rightward bars) indicating increases under drought and negative values (leftward bars) indicating decreases. Error bars represent standard errors. Significance levels are indicated (* P < 0.05, ** P < 0.01, *** P < 0.001). CAT activity showed the largest increase (+ 24.3%, P < 0.001), while total root length (TRL) and total surface area (TSA) exhibited the greatest reductions (− 19.8% and − 19.4%, respectively, P < 0.001)
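The percentage-change index described above can be sketched as follows. This is an illustration of the stated definition (change from control to drought, relative to control), not the authors' script; the function name `percent_change` and the trait means are hypothetical.

```python
def percent_change(control_mean, drought_mean):
    """Percentage change from control to drought for one trait."""
    return (drought_mean - control_mean) / control_mean * 100.0


# Hypothetical (control, drought) trait means; not values from the study.
trait_means = {
    "CAT": (80.0, 100.0),
    "TRL": (1000.0, 800.0),
    "SP": (9.0, 9.0),
}

index = {trait: percent_change(c, d) for trait, (c, d) in trait_means.items()}

# Order traits from the largest decrease to the largest increase,
# which is a convenient order for a horizontal bar chart.
ordered = sorted(index, key=index.get)
```

Positive values indicate traits that increase under drought and negative values indicate traits that decrease, matching the sign convention of Fig. 6.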

Multi-trait Genotype-Ideotype Distance Index (MGIDI) and genotype selection

MGIDI-based selection

The multi-trait genotype-ideotype distance index (MGIDI) was used to identify top-performing genotypes that best combine desirable traits across all measured parameters (Fig. 7A). The circular plot displays all 301 genotypes based on their distance from the ideotype, with selected genotypes (shown in red) having the smallest multi-trait distance to the ideal. About 15–20 genotypes were chosen based on their proximity to the ideal trait combination, indicating these lines most closely match the target phenotype for drought tolerance. The strengths and weaknesses view (Fig. 7B) offers a detailed look at how individual traits contribute to the MGIDI for each genotype. Each genotype is represented by a colored polygon; deviations from the center indicate the extent to which specific traits differ from the ideotype. Overlapping patterns indicate that different genotypes achieve favorable MGIDI scores through different combinations of traits, demonstrating multiple routes to drought tolerance. Some genotypes display balanced trait profiles with moderate deviations across all traits (symmetric polygons near the center), while others show trade-offs, with strong performance in certain traits offsetting weaknesses in others (asymmetric polygons). Factor analysis underlying MGIDI (Table 3) revealed that the 11 traits load onto four distinct factors (FA1–FA4), explaining the main axes of phenotypic variation. FA1 was dominated by root morphological traits, with high negative loadings on TRL (−0.90), RTC (−0.82), and TSA (−0.77), indicating that it represents root system architecture. FA2 captured stress-response traits with positive loadings for MDA (0.61) and negative loadings for PRO (−0.75), CAT (−0.71), and SS (−0.41), indicating that it contrasts oxidative damage with protective responses. 
FA3 showed strong loadings for POD (−0.65) and SS (−0.64), while FA4 was characterized by a high loading for SP (−0.93), indicating protein content as an independent dimension of drought response. The selection differential analysis (Table 3) indicated that MGIDI-based selection could lead to substantial gains for most traits. The selection goal was fully met (100%) for TRL, RTC, TSA, SOD, POD, CAT, and SS, with selection differentials for these traits ranging from 16.1% to 59.4% of the population mean. For example, RTC could increase from a population mean of 1,193 to 1,564 in the selected genotypes (selection differential = 371), a gain of 31.1%. These projections suggest that the chosen genotypes have markedly improved trait values that could result in meaningful advances in drought tolerance in a breeding program.

Fig. 7.

Multi-trait genotype-ideotype distance index (MGIDI) for genotype selection. A Circular plot showing all 301 genotypes arranged by their multi-trait distance from the ideotype. Each point represents one genotype, with selected superior genotypes highlighted in red and non-selected genotypes in gray. The distance from the center represents the MGIDI value, with genotypes closer to the ideal combination showing smaller distances. B Strengths-and-weaknesses radar plot showing trait contributions to MGIDI for individual genotypes. Each colored polygon represents a genotype, with axes representing different factors (FA1-FA4) derived from factor analysis. Distance from the center indicates the degree of deviation from the ideotype for each factor, allowing visualization of trait trade-offs and complementarity among selected genotypes
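The MGIDI is the Euclidean distance between a genotype's factor scores and those of the ideotype, so smaller values indicate genotypes closer to the target phenotype. The sketch below illustrates that distance and a simple top-fraction selection. The genotype labels, factor scores, ideotype, and selection fraction are all hypothetical, and the helper names `mgidi` and `select_top` are introduced here only for illustration.

```python
from math import sqrt


def mgidi(scores, ideotype):
    """Euclidean distance between genotype and ideotype factor scores."""
    return sqrt(sum((s - i) ** 2 for s, i in zip(scores, ideotype)))


def select_top(factor_scores, ideotype, fraction):
    """Return the genotypes closest to the ideotype (smallest MGIDI first)."""
    ranked = sorted(factor_scores, key=lambda g: mgidi(factor_scores[g], ideotype))
    n_selected = max(1, round(len(ranked) * fraction))
    return ranked[:n_selected]


# Hypothetical scores on four factors (FA1..FA4); not values from the study.
ideotype = (1.0, 1.0, 1.0, 1.0)
factor_scores = {
    "G1": (1.0, 1.0, 1.0, 1.0),   # distance 0
    "G2": (0.0, 1.0, 1.0, 1.0),   # distance 1
    "G3": (-1.0, 1.0, 1.0, 1.0),  # distance 2
    "G4": (1.0, 1.0, -2.0, 1.0),  # distance 3
}

selected = select_top(factor_scores, ideotype, fraction=0.5)
```

Because the distance sums squared deviations over all factors, a genotype can reach the same MGIDI through a small deviation on every factor or a larger deviation on one, which is the trade-off pattern the strengths-and-weaknesses view makes visible.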

Table 3.

Factor analysis and projected selection differentials from MGIDI

Trait  FA1    FA2    FA3    FA4    Pop. Mean (Xo)  Selected Mean (Xs)  SD      SD (%)  Goal (%)
PH     −0.34  −0.22  0.15   −0.08  30.2            32.8                +2.6    +8.6    45
TRL    −0.90  −0.11  0.08   0.03   830.4           1,091.2             +260.8  +31.4   100
RTC    −0.82  −0.14  0.12   0.05   1,193           1,564               +371    +31.1   100
TSA    −0.77  −0.09  0.10   0.02   983.5           1,398.7             +415.2  +42.2   100
SOD    0.12   −0.38  −0.42  0.23   564.5           798.3               +233.8  +41.4   100
POD    0.08   −0.26  −0.65  0.18   237.9           368.2               +130.3  +54.8   100
CAT    0.15   −0.71  −0.23  0.11   69.9            111.4               +41.5   +59.4   100
MDA    0.19   0.61   −0.14  −0.08  114.4           98.7                −15.7   −13.7   78
PRO    0.22   −0.75  0.21   −0.12  629.8           574.2               −55.6   −8.8    58
SS     0.14   −0.41  −0.64  0.15   8.50            9.87                +1.37   +16.1   100
SP     0.06   0.08   0.11   −0.93  9.21            8.89                −0.32   −3.5    12

FA1–FA4 factors 1–4 from factor analysis, Xo original population mean, Xs mean of selected genotypes, SD selection differential (Xs − Xo), SD (%) selection differential as a percentage of the population mean, Goal (%) percentage of selection goal achieved. Loadings >|0.40| are considered substantial. PH Plant Height, TRL Total Root Length, RTC Root Tip Count, TSA Total Surface Area, SOD Superoxide Dismutase, POD Peroxidase, CAT Catalase, MDA Malondialdehyde, PRO Proline, SS Soluble Sugar, SP Soluble Protein
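The SD and SD (%) columns follow directly from the two mean columns, as defined in the table footnote. The short sketch below recomputes them for three rows of Table 3; the function name `selection_differential` is introduced here for illustration, while the input means are the values printed in the table.

```python
def selection_differential(pop_mean, selected_mean):
    """Return (SD, SD as a percentage of the population mean)."""
    sd = selected_mean - pop_mean
    return sd, sd / pop_mean * 100.0


# (population mean Xo, selected mean Xs) as printed in Table 3.
table3_means = {
    "RTC": (1193.0, 1564.0),
    "CAT": (69.9, 111.4),
    "MDA": (114.4, 98.7),
}

differentials = {
    trait: tuple(round(v, 1) for v in selection_differential(xo, xs))
    for trait, (xo, xs) in table3_means.items()
}
```

A negative differential, as for MDA, is the desired direction for a trait where lower values are favorable.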

Stress tolerance index ranking

Complementary to MGIDI, genotypes were ranked using the stress tolerance index (STI), which quantifies performance stability across control and drought conditions (Fig. 8). The top 30 genotypes identified by STI ranking included Kennong 57, Medium Yellow 35, Suinong 49, Black Farmer 85, and Heihe 40, all achieving high ranking scores (> 40) and STI values > 2.0. These genotypes performed well under drought while maintaining competitive trait values under well-watered conditions, a vital attribute for developing climate-resilient crops. Notably, several genotypes, such as Dengke No. 1, Jiyu92, Jiyu88, Kennong 57, Mengdou 28, and Ronda 130, appeared in both the MGIDI and STI rankings, indicating that they are promising candidates for multi-environment testing. The agreement between these two independent selection methods increases confidence in their drought tolerance. Additional high-ranking STI genotypes, including Jinong SB2012-136, Changnong 12, Kenjian beans 25, and Jiyu 299, may possess specific adaptive traits that mitigate performance loss under stress, warranting further study of their physiological mechanisms.

Fig. 8.

Top 30 genotypes ranked by stress tolerance index (STI). Horizontal bar chart showing ranking scores (x-axis) for the 30 highest-performing genotypes under drought stress. Bar colors represent STI values, with warmer colors (yellow-green) indicating higher stress tolerance. Genotypes are ranked from highest to lowest, with Kennong 57, Medium Yellow 35, and Suinong 49 ranking first, second, and third, respectively. STI values range from approximately 1.0 to 3.0, with top genotypes exceeding 2.5
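For orientation, the classical stress tolerance index multiplies a genotype's performance under control and stress conditions and divides by the squared mean control performance across the panel. The sketch below applies that classical form to a single hypothetical performance value per genotype. How the study aggregated its vegetative and biochemical traits into one value is not described in this section, so the inputs, genotype labels, and the function name `stress_tolerance_index` are assumptions.

```python
from statistics import mean


def stress_tolerance_index(control, drought):
    """Classical STI: (control x drought) / (mean control across genotypes)^2."""
    grand_mean = mean(control.values())
    return {g: control[g] * drought[g] / grand_mean ** 2 for g in control}


# Hypothetical performance values per genotype; not values from the study.
control = {"A": 10, "B": 20, "C": 30}
drought = {"A": 10, "B": 10, "C": 30}

sti = stress_tolerance_index(control, drought)
# Rank genotypes from highest to lowest STI.
ranking = sorted(sti, key=sti.get, reverse=True)
```

Under this form, a genotype scores highly only when it performs well in both conditions, which is why STI favors lines that combine drought performance with good well-watered performance.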

Genotypic clustering and trait architecture

Analysis of drought-to-control ratios using heatmaps reveals clear patterns that distinguish higher- and lower-performing genotypes (Fig. 9). The top-performing genotypes—such as Jiyu92, Dengke No. 1, Jinong SB2012-136, Mengdou 28, Kennong 57, and Ronda 130—regularly exhibit stronger root traits (TRL, RTC, TSA) under drought, indicating that they better maintain root structure. These genotypes also tend to increase antioxidant enzymes (CAT, SOD, POD) and keep MDA levels stable, showing effective oxidative stress management. Conversely, lower-performing genotypes experienced larger declines in root parameters and had more variable or excessive increases in stress markers. Some exhibited very high proline levels without sufficient membrane protection, aligning with the Cluster 3 profile of poor stress response. The heatmap clustering highlights that successful drought adaptation involves coordinated responses across multiple traits, rather than extreme changes in just a few. Importantly, genotypes like Jiyu92, Dengke No. 1, Kennong 57, Mengdou 28, Ronda 130, and Jinong SB2012-136 consistently rank highly across various selection methods, such as MGIDI, STI, cluster membership, and drought response ratios—strongly confirming their drought tolerance. These promising genotypes are ideal candidates for developing new varieties or as breeding parents to improve drought resilience.

Fig. 9.

Heatmap of drought-to-control trait ratios for top and bottom performing genotypes. The heatmap displays standardized ratios (z-scores) of drought to control values for the top 20 (upper rows) and bottom 20 (lower rows) genotypes based on overall performance ranking. Colors represent relative performance under drought, with warm colors (red-yellow) indicating traits maintained or enhanced under stress and cool colors (blue) indicating traits severely reduced under drought. Genotype names are listed on the y-axis, and traits on the x-axis include CAT activity, MDA content, plant height, POD activity, proline content, root tip count, SOD activity, soluble protein, soluble sugar, total root length, and total surface area. Superior genotypes (Jiyu92, Dengke No. 1, Kennong 57, etc.) show coordinated upregulation of protective mechanisms while maintaining root architecture
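The standardized ratios shown in the heatmap can be sketched for a single trait as below: each genotype's drought value is divided by its control value, and the ratios are then z-scored across genotypes. The use of the sample standard deviation, the genotype labels, and the input values are assumptions; the function name `ratio_zscores` is hypothetical.

```python
from statistics import mean, stdev


def ratio_zscores(control, drought):
    """Z-scored drought/control ratios for one trait across genotypes."""
    ratios = {g: drought[g] / control[g] for g in control}
    values = list(ratios.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption)
    return {g: (r - mu) / sigma for g, r in ratios.items()}


# Hypothetical trait values for three genotypes; not values from the study.
control = {"G1": 100.0, "G2": 100.0, "G3": 100.0}
drought = {"G1": 80.0, "G2": 90.0, "G3": 100.0}

z = ratio_zscores(control, drought)
```

A positive z-score marks a genotype that retained more of the trait under drought than the panel average, and a negative z-score marks one that lost more.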

Discussion

This comprehensive evaluation of 301 soybean genotypes revealed substantial genetic variation for drought tolerance, with heritability estimates and multi-trait selection indices successfully identifying elite genotypes characterized by coordinated stress responses. The integration of quantitative genetics, machine learning, and multi-trait optimization provides both mechanistic insights into drought adaptation and actionable breeding recommendations.

Genetic architecture and trait heritability

Broad-sense heritability estimates across two years and two treatment environments, ranging from 0.37 to 0.77, are consistent with previous reports in soybean and other legumes. Root morphological traits exhibited the highest heritability, including plant height (0.77), total root length (0.76), and root tip count (0.74), values comparable to those reported by Brensha et al. for soybean root traits [33]. These high heritabilities indicate that root architectural variation is primarily genetically determined, consistent with quantitative genetic control by multiple additive loci [34–36], and suggest strong potential for genetic improvement through selection. The moderate heritability of physiological traits, including catalase (0.63), peroxidase (0.56), and proline (0.48), reflects the dynamic nature of stress-responsive pathways and greater environmental sensitivity. The large treatment-by-genotype interactions observed for all traits, particularly pronounced for physiological responses, with F-values exceeding 500 for catalase and proline, demonstrate genotype-specific plasticity rather than uniform trait scaling. This pattern necessitates direct selection under target drought environments [19, 37, 38], as we applied by basing selections on drought-exposed trait values rather than well-watered performance.
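For reference, broad-sense heritability on an entry-mean basis is commonly written as below. This is a standard textbook form given under stated assumptions, not necessarily the estimator used here, which is defined in the Methods. The symbols e (number of environments) and r (replicates per environment) are introduced for illustration.

```latex
H^2 = \frac{\sigma^2_G}{\sigma^2_G + \dfrac{\sigma^2_{GE}}{e} + \dfrac{\sigma^2_{\varepsilon}}{r\,e}}
```

Here \sigma^2_G is the genotypic variance, \sigma^2_{GE} the genotype-by-environment interaction variance, and \sigma^2_{\varepsilon} the residual variance. In this form, a large interaction variance lowers H^2, which is in line with the lower estimates reported for the physiological traits.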

Physiological mechanisms of drought adaptation

The divergent antioxidant enzyme responses, with catalase increasing by 24.3% while superoxide dismutase and peroxidase decreased by 7.7% and 6.1%, respectively, provide insight into shifts in reactive oxygen species production under drought. As photosynthetic electron transport declines due to stomatal closure, superoxide radical production decreases, thereby reducing the availability of superoxide dismutase substrates [39, 40]. Simultaneously, hydrogen peroxide accumulates from photorespiration, which intensifies under drought due to high oxygen-to-carbon dioxide ratios, and from beta-oxidation and peroxisomal reactions [41, 42]. Catalase’s exceptionally high turnover rate of 4 × 10⁷ per second makes it particularly suited to removing abundant hydrogen peroxide [43], which explains its strong upregulation. The variable importance analysis identified catalase as the most discriminative trait, with an importance score of 100, supporting its central role in drought tolerance: genotypes with high catalase activity likely maintain lower cellular hydrogen peroxide levels and prevent oxidative damage.

The unexpected average decrease of 8.7% in proline content, coupled with extreme genotypic variation, represents our most intriguing finding. The population-averaged decrease likely reflects temporal dynamics, as proline accumulation peaks during early stress then declines due to utilization, feedback inhibition, or upregulation of degradation pathways [44, 45]. Our sampling at the R3-R4 stage, potentially three to four weeks after stress initiation, may have missed the peak of accumulation in some genotypes; moreover, a single time-point measurement cannot distinguish sustained hyperaccumulation from active proline degradation via proline dehydrogenase (ProDH), meaning genotypes with transiently high proline that are already recovering may be misclassified as constitutive accumulators. More importantly, clustering analysis revealed that extreme proline accumulation in Cluster 3 (1,514 micrograms per gram) was associated with the poorest performance, including the highest malondialdehyde (161 nanomoles per gram) and the lowest root development (693 millimeters total root length). This pattern is consistent with a stress-injury profile, though enzymatic data on P5CS and ProDH activities and temporal profiling across the stress period would be required to confirm this interpretation mechanistically [46]. Conversely, Cluster 2 genotypes demonstrating superior performance maintained moderate proline levels (532 micrograms per gram), achieved exceptional catalase activity (156 micromoles per minute per gram), and exhibited robust root systems. This pattern is consistent with the interpretation that effective drought tolerance may be less dependent on extreme osmolyte accumulation than on preventing severe stress through sustained water uptake and efficient scavenging of reactive oxygen species, though this inference is based on observational data and requires experimental confirmation. 
This interpretation aligns with Nayyar and Walia [47], who found that drought-tolerant chickpeas accumulated less proline than sensitive genotypes, which showed hyperaccumulation concurrent with membrane damage. The positive correlation between proline and malondialdehyde at 0.29 further supports this association, indicating that high proline co-occurs with oxidative injury in the most stressed genotypes; however, whether proline accumulation is a cause, consequence, or parallel outcome of membrane damage cannot be determined from these data alone.

The coordinated reduction in root traits by 10 to 20% reflects both turgor-limited cell expansion [48] and programmed developmental modifications involving abscisic acid-triggered auxin redistribution [49]. The moderate negative correlations between root traits and antioxidant enzymes suggest potential resource allocation trade-offs between structural and biochemical defenses, as documented in rice, where high constitutive antioxidant capacity was associated with reduced root biomass [50]. However, Cluster 2 genotypes combining extensive roots with high catalase activity demonstrate that these trade-offs can be overcome through breeding, indicating that superior drought tolerance results from balanced investment in both systems rather than from choosing between morphological and biochemical adaptation.

Adaptive strategies and genotype selection

Hierarchical clustering revealed three distinct strategies. Cluster 2 genotypes combining high catalase, extensive roots, and moderate stress markers represent a stress avoidance and mitigation strategy where genotypes prevent severe cellular stress through maintained water uptake while mitigating unavoidable oxidative stress through robust antioxidant defense. This proactive strategy proved more effective than the reactive approaches of Cluster 1, with resource conservation, and Cluster 3, with an inadequate stress response, which showed extreme proline accumulation and severe membrane damage. The convergence of six genotypes, including Jiyu 92, Dengke No. 1, Kennong 57, Mengdou 28, Ronda 130, and Jinong SB2012-136, across multiple selection criteria, including MGIDI, Stress Tolerance Index, cluster membership, and drought-to-control ratios, provides robust internal validation of their drought-adaptive phenotype under the conditions studied. Notably, these genotypes exhibit balanced, coordinated responses rather than extreme trait values, suggesting successful breeding should target intermediate optima rather than maximum single-trait expression. As a proposed framework pending field confirmation, a three-stage selection approach may be considered. Stage 1, covering F2 to F4 generations, could screen for high-heritability traits, including root architecture and catalase under controlled drought, targeting the upper 20th percentile. Stage 2, covering F4 to F6, should evaluate broader trait suites and apply MGIDI selection to advance the top 5–10%. Stage 3, covering F7 onward, would conduct multi-location field trials with yield evaluation under managed drought, selecting lines showing at least 10% yield advantage over checks. The viability of this framework is contingent on field validation, as genotype rankings under pot-based conditions may differ under field-scale soil heterogeneity and natural drought variability. 
Strategic crosses could focus on Cluster 2 by Cluster 2 combinations, such as Jiyu 92 crossed with Dengke No. 1 to produce transgressive segregants via favorable allele combinations, while avoiding Cluster 3 genotypes as parents because of their stress-sensitive syndrome. The machine learning variable importance analysis identifying catalase, total root length, malondialdehyde, and proline as most discriminative traits suggests phenotyping could be streamlined to these four traits in early selection stages, substantially reducing per-genotype resource requirements compared with assaying all 11 traits, while maintaining 70 to 80% of discriminative information. Precise cost savings will depend on laboratory infrastructure, assay throughput, and local reagent pricing, and the four-trait subset should be considered an evidence-based starting point for resource optimisation rather than a validated cost-reduction guarantee.

In conclusion, this study identified substantial genetic variation for drought tolerance with high heritability for root traits, enabling effective selection. Superior tolerance is characterized by coordinated moderate responses that combine maintained root architecture with enhanced catalase activity, rather than by extreme proline accumulation, which is associated with a stress-injury profile. Six elite genotypes warrant field validation and use in crossing programs. Critical next steps include field trials to measure grain yield under natural drought, genomic characterization to enable marker-assisted selection, and time-series physiological profiling to capture response dynamics. These findings provide a promising foundation for subsequent breeding research and a methodological framework for multi-trait selection in complex stress tolerance improvement programs.

Experimental limitations and scope

Several limitations of the present study must be acknowledged to define the boundary conditions of its conclusions. First, the exclusive use of pot-based experiments constrains root exploration to a finite substrate volume, potentially compressing the architectural variation that would be expressed in field soils. The progressive drought imposed by withholding irrigation from R1 to R7 is reproducible and well-controlled, but does not replicate the spatial heterogeneity, hydraulic gradients, or episodic nature of field drought. Root phenotypes identified as superior under pot conditions, such as the extensive root systems of Cluster 2 genotypes (TRL > 1,100 mm), should therefore be regarded as candidates for field validation rather than guaranteed field performers. Second, grain yield was not measured in this study. The stress tolerance index (STI) computed from vegetative and biochemical traits provides a reasonable proxy for stress resistance, but its relationship to actual yield-based STI under field drought conditions remains unvalidated. Third, although two consecutive seasons were evaluated at the same experimental station, multi-environment replication across contrasting locations, soil types, or drought timing scenarios was not conducted. Genotype × environment interactions are expected to alter both trait expression and relative genotype rankings across diverse production environments. Fourth, no independent validation cohort was used to confirm the selection criteria. The convergence of six elite genotypes across multiple within-study metrics (MGIDI, STI, cluster membership, drought-to-control ratios) increases internal confidence, but external validation on independently sourced material is required. Fifth, all physiological and biochemical measurements were taken at a single time point (R3–R4), representing a snapshot rather than a dynamic profile of the drought response. 
Temporal profiling across the stress period and recovery phase would provide a more complete picture of genotypic response trajectories and is particularly important for interpreting proline accumulation data. These limitations do not diminish the value of identifying consistent, multi-trait drought-adaptive signatures across a large, diverse germplasm panel under standardised conditions, but they define the scope of the study’s conclusions and establish the necessary research agenda for subsequent validation.

Conclusions

This comprehensive analysis of 301 soybean genotypes revealed significant genetic diversity in drought tolerance during reproductive development. Root traits exhibited high broad-sense heritability (0.74–0.77), estimated across two years and two treatment environments, supporting effective selection. The study’s three key findings have important implications for breeding. First, drought tolerance depends on balanced, moderate responses across multiple traits, rather than extreme levels of individual stress markers. Genotypes in Cluster 2 exemplify this, with extensive root systems, elevated catalase activity (156 micromoles per minute per gram), and moderate stress indicators. Conversely, high proline levels (over 1,500 micrograms per gram) were associated with a stress-injury profile rather than adaptation; this pattern is observational and requires temporal and enzymatic validation before a mechanistic interpretation can be drawn. Breeders should prioritize trait patterns over maximum individual trait expression. Second, variable importance analysis highlighted catalase activity, total root length, malondialdehyde content, and proline levels as the most informative traits for classifying drought tolerance. This allows phenotyping to focus on these four traits for early-stage selection, reserving full evaluation of all 11 traits for advanced lines. Such a tiered approach would substantially reduce per-genotype phenotyping resource requirements, with the precise savings depending on laboratory infrastructure and local reagent pricing, while retaining 70–80% of discriminative information. Third, six elite genotypes—Jiyu 92, Dengke No. 1, Kennong 57, Mengdou 28, Ronda 130, and Jinong SB2012-136—were consistently identified as high-priority candidates for field-based validation. Jiyu 92 and Dengke No. 1 are recommended for multi-location field trials as the next research priority. 
All six are proposed as promising parent lines in breeding programs aimed at improving drought tolerance, contingent on confirmation of their performance under field and multi-environment conditions. Future steps include validating grain yield under natural drought, conducting genomic analyses for marker-assisted selection, and conducting physiological studies over time to elucidate drought-response dynamics. These insights provide a strong observational foundation and a methodological framework for multi-trait selection in complex stress tolerance programs.

Supplementary Information

Supplementary Material 1. (99.1KB, xlsx)

Acknowledgements

Special thanks to the Soybean intellect design breeding laboratory of Heilongjiang Academy of Agricultural Sciences for providing platform support. We also thank the Soybean Germplasm Resources Team of the Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, for providing soybean germplasm resources.

Authors’ contributions

Conceptualization, Kenzhen Zhao, Huilong Hong, and Junjie Ding; methodology, Xueyang Wang and Chunlei Zhang; formal analysis, Fengyi Zhang and Ahmed M. Abdelghany; investigation, Xiaoyu Xia and Xinlei Liu; resources, Ahmed M. Abdelghany and Honglei Ren; data curation, Sobhi F. Lamlom; writing—original draft preparation, Sobhi F. Lamlom; writing—review and editing, Sobhi F. Lamlom; visualization, Sobhi F. Lamlom and Honglei Ren. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2024 Science and Technology Support Project of the Inner Mongolia Innovation Center of Biological Breeding Technology: Biotechnology-Based Breeding of High-Quality Soybeans and Application Demonstration (2024NSZC04); the Scientific Research Business Expenses of Heilongjiang Scientific Research Institutes (CZKYF2025-1-C011); and the Agricultural Science and Technology Innovation Leaping Project in Heilongjiang Province (Grant No. CX25JC48).

Data availability

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Declarations

Ethics approval and consent to participate

This manuscript is original research and has not been published in or submitted to other journals.

Consent for publication

All authors listed have read the complete manuscript and have approved the submission of the paper.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Chunlei Zhang, Sobhi F. Lamlom and Huilong Hong contributed equally to this work.

Contributor Information

Honglei Ren, Email: renhonglei2022@163.com.

Junjie Ding, Email: me999@126.com.

Articles from BMC Plant Biology are provided here courtesy of BMC
