A lightweight hybrid transformer approach for hyperspectral imaging-based drought tolerance evaluation in tea plants

Yuchen Li; Yi Zhang; Yu Wang; Hao Chen; Xiao Han; Yilin Mao; Litao Sun; Jiazhi Shen; Zhaotang Ding

doi:10.1186/s13007-025-01487-1

2025 Dec 31;22:11. doi: 10.1186/s13007-025-01487-1

Yuchen Li ¹, Yi Zhang ², Yu Wang ³, Hao Chen ³, Xiao Han ³, Yilin Mao ³, Litao Sun ¹, Jiazhi Shen ¹,✉, Zhaotang Ding ¹,✉

PMCID: PMC12866587 PMID: 41476286

Abstract

Background

In Shandong Province of China, where annual precipitation is below 800 mm, tea plants face persistent drought stress exacerbated by global warming. Breeding drought-tolerant tea cultivars is one of the effective ways to cope with this challenge. However, traditional breeding approaches are still limited by prolonged cycles, low efficiency, and subjective evaluation. To overcome these limitations, the development of rapid and objective germplasm evaluation methods has become critical.

Results

In this study, hyperspectral images of leaves from 12 widely cultivated ‘Lucha series’ tea cultivars in Shandong Province were collected during different drought periods, and drought-related physiological indicators were measured simultaneously. A tea drought tolerance index (TDTI) with enhanced accuracy was then established by integrating the rates of change of the indicators with temporal weights and indicator weights. Subsequently, we developed a novel lightweight Transformer-based hybrid integrated architecture to establish prediction models for the physiological indicators and TDTI. The Transformer-based models combined a Transformer encoder with XGBoost and LightGBM within a lightweight framework that leverages ensemble learning, data augmentation, and regularization to ensure robustness on limited datasets. Finally, we compared the performance of the Transformer-based models against traditional machine learning models. The optimal models for MRP, MDA, Pro, SS, ChlT and TDTI were 1D-CARS-TF, 2D-UVE-SVM, 2D-UVE-BRR, 2D-CARS-SVM, 1D-UVE-TF-CNN, and 2D-UVE-TF, respectively, achieving coefficients of determination (R²) of 0.8992, 0.8307, 0.8929, 0.8373, 0.7894, and 0.7614 on an independent test set. The results demonstrated that the lightweight Transformer-based models, equipped with a multi-head self-attention mechanism, performed particularly well on indicators that require multi-band correlation mining. In addition, feature selection algorithms and overfitting-mitigation strategies played a critical role in enhancing both the accuracy and the stability of the Transformer-based models.

Conclusions

This study established a robust technical foundation for rapid, accurate, and non-destructive comprehensive evaluation of the drought tolerance of tea plant germplasm resources. However, the results were based on a specific set of greenhouse-cultivated samples, and further validation under field conditions with expanded germplasm resources would strengthen generalizability. Nevertheless, the demonstrated potential of the Transformer-based model advances tea plant phenomics toward greater intelligence and efficiency.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13007-025-01487-1.

Keywords: Tea plants, Drought tolerance, Hyperspectral imaging, Lightweight transformer, High-throughput phenotyping, Deep learning

Introduction

Tea plants [Camellia sinensis (L.) O. Kuntze], as a popular beverage crop and also an important economic crop, are widely planted in China. Tea plants prefer humid environments and are highly sensitive to drought [1, 2]. However, with the increasing frequency of extreme weather events caused by global climate warming, tea plants are facing more frequent drought stress [3, 4]. Breeding drought-tolerant cultivars is an effective mitigation strategy, yet traditional tea plant breeding methods are hindered by long cycles and subjective evaluation [4–6]. Hyperspectral imaging (HSI) serves as a non-destructive analytical tool capable of decoding plant phenotypic traits by interpreting variations in their spectral signatures. Combined with machine learning, HSI technology can achieve quantitative analyses and qualitative assessments for key agronomic traits [7, 8], making it possible to evaluate tea plant germplasm resources quickly, accurately and non-destructively.

In recent years, HSI technology has been applied increasingly widely in tea science, including environmental stress diagnosis [4, 9, 10], monitoring of growth and developmental information [11, 12], and detection of quality indicators [13, 14]. The hyperspectral data processing workflow for tea plants typically encompasses four key stages: data acquisition, data preprocessing, feature selection, and model establishment. Data acquisition is completed with hyperspectral imaging systems [9]. The acquired data are often contaminated by noise arising from variations in the collection instruments, light sources, and environmental conditions [15]. To reduce this interference, multiplicative scatter correction (MSC) [16], Savitzky-Golay (S-G) smoothing [17], and derivative [18] algorithms are usually employed for preprocessing, and combining multiple preprocessing methods is standard practice. Following preprocessing, hyperspectral data require feature selection because of their inherently high dimensionality, which introduces substantial computational complexity and makes it essential to extract the most informative features and eliminate redundant spectral bands [3, 19]. In the phenotype analysis of tea plants, Uninformative Variable Elimination (UVE), Competitive Adaptive Reweighted Sampling (CARS), and the Successive Projections Algorithm (SPA) are commonly used feature selection methods [4, 5, 9, 20]. Following feature selection, establishing a robust mathematical model that relates plant traits to optical characteristics becomes the key step of hyperspectral-based phenotype analysis. Among traditional machine learning algorithms, the Support Vector Machine (SVM) and Random Forest (RF) are often used for predicting physiological indicators of tea plants. For instance, in the studies of Chen et al. (2021, 2022), the best models for drought damage degree (DDD) and drought tolerance coefficient (DTC) of tea plants were UVE-SVM and MSC-2D-UVE-SVM, respectively [4, 5]. In the study of Luo et al. (2021), the SVM model also performed best in predicting nitrogen and tea polyphenol content [21].

In addition, deep learning (DL), an advanced subset of machine learning, has shown significant potential in recent years for crop phenotype analysis. This field emphasizes the architecture and function of neural networks [3]. Data are processed through a hierarchical cascade of layers, with the output of one layer feeding into the next, enabling the learning of data representations at increasingly abstract levels [22]. Compared to traditional machine learning approaches, DL models are able to extract more intricate features. DL is now being applied in tea science research. Convolutional neural network (CNN)-based models, for instance, demonstrated enhanced efficacy in assessing low-temperature stress responses in tea plants [9], and long short-term memory (LSTM) architectures demonstrated superior capability in the classification of tea coal diseases [10]. However, hyperspectral analysis of tea plants faces the significant challenge of capturing complex, non-local interactions among spectral bands, which are crucial for accurately assessing plant physiological status. The CNNs and LSTMs employed so far have inherent limitations here. CNNs, although effective in extracting local spatial-spectral features, struggle to model long-range dependencies across the entire spectral range because of their limited receptive fields. LSTMs, designed for sequential data, can suffer from computational inefficiency and have difficulty capturing true long-term interactions in lengthy spectral sequences. To address these limitations, we propose the adoption of the Transformer architecture.

The Transformer, a groundbreaking deep learning architecture initially developed for natural language processing in 2017, has revolutionized multiple domains through its self-attention mechanism [23, 24]. As the core of the Transformer, the self-attention mechanism can effectively handle long-distance dependencies in text sequences while improving computational efficiency [24]. In recent years, Transformers have been extensively applied in image classification research, such as differentiating between normal and defective blueberries [25], detecting tomato leaf diseases [26], and classifying land cover [27]. These studies process images as sequences of image patches. Notably, plant spectral data acquired within specific spectral bands can also be regarded as long sequential data. By leveraging the Transformer’s capacity for modeling long-range dependencies, cross-band spectral interactions can be effectively characterized, leading to enhanced prediction accuracy [28]. Modified Transformer architectures have already been implemented in prediction models. For instance, Transformer-based models showed excellent performance in predicting the canopy water content of alpine shrubs (Wang et al., 2024) [24] and the leaf water content of blueberries (Rahman et al., 2025) [28]. In these studies, the plant spectral data, represented as the average reflectance across spectral bands, constitute a one-dimensional sequential signal. The self-attention mechanism of the Transformer is particularly adept at modeling such sequences, as it can dynamically weigh the importance of different spectral bands and capture complex interactions between distant spectral bands. Additionally, hybrid architectures such as Transformer-CNN, which combine the sequence processing strengths of the Transformer with the feature extraction capabilities of the CNN, have exhibited outstanding results in soil property prediction [29] and soluble solids content analysis of blueberries [30]. The successful applications of the Transformer in these studies confirm its suitability for sequence modeling challenges analogous to hyperspectral data analysis. It therefore constitutes a highly promising yet entirely unexplored approach for tea plant research.

To address this gap, this study proposed a lightweight Transformer-based hybrid integrated architecture to comprehensively evaluate the drought tolerance of different tea plant germplasm resources using hyperspectral imaging technology. Firstly, we acquired hyperspectral images of leaves from different tea plant varieties during the drought treatment and simultaneously measured drought-related physiological indicators (MRP, MDA, Pro, SS and ChlT). Secondly, we established the tea drought tolerance index (TDTI) by integrating the rates of change of the indicators over the time series with temporal weights and indicator weights. Then, we proposed the TF deep learning model, based on the lightweight Transformer-based hybrid architecture, and the TF-CNN deep learning model, which further incorporates a CNN. Finally, we used the traditional machine learning models SVM, RF, and BRR, as well as the deep learning models TF and TF-CNN, to establish prediction models for the physiological indicators and the TDTI. The specific objectives of this study were as follows: (1) Establish a comprehensive index, TDTI, to improve the accuracy of evaluating the intrinsic drought tolerance of tea plants. (2) Develop a lightweight Transformer-based hybrid integrated architecture (TF + trees) to enhance predictive capability for drought-related physiological indicators and the TDTI, and achieve proximal, non-destructive, high-throughput phenotyping of tea plants for drought tolerance screening. (3) Conduct an exhaustive comparison of traditional machine learning models and deep learning models to evaluate the applicability of the lightweight Transformer-based models in the evaluation of tea plant germplasm resources. This study provides technical support for the rapid, accurate, and non-destructive comprehensive evaluation of tea plant germplasm resources. Furthermore, it can advance tea plant phenomics toward greater intelligence and efficiency, providing novel insights for digital breeding in agricultural systems.

Materials and methods

Plant materials and treatments

This study used two-year-old ‘Zhongcha 108’ (ZC 108) and twelve ‘Lucha series’ tea cultivars as experimental materials, all cultivated in a greenhouse. The 12 ‘Lucha series’ tea plant cultivars were LC1, LC2, LC4, LC5, LC6, LC7, LC8, LC10, LC11, LC17, LC21 and LC22. Approximately 100 tea seedlings of each cultivar with similar growth were selected and transplanted into three 32-cell plug trays.

Following transplantation, tea seedlings underwent a 21-day acclimatization period. The greenhouse environment was automatically regulated by an integrated climate control system. Temperature and relative humidity (RH) were continuously monitored by sensors and maintained by the greenhouse management system. During the acclimatization period, the greenhouse environment was set to 28 °C (from 5:00 AM to 9:00 PM, 16-hour photoperiod) and 22 °C (from 9:00 PM to 5:00 AM, 8-hour dark period), with RH sustained at 75% using a Pad-and-Fan System. Sprinkler irrigation was performed every 3 days to keep the depth of water in the trays about 10 mm. Two days after the final sprinkler irrigation of acclimatization, the drought treatment was initiated on May 15, 2024.

During the drought period, the Pad-and-Fan System was closed and the irrigation was suspended, while all other environmental parameters remained constant. The tea samples were harvested at 0d (drought treatment initiation), 6d, 12d, and 18d after drought exposure. Upon completion of the 18d sampling, the drought stress treatment was terminated and rehydration initiated. The environmental parameters were restored to match those during the acclimatization period, and the last sampling occurred 6 days later (24d). When sampling, at least three biological replicates were set for each cultivar. The term “biological replicates” specifically referred to the sampling of leaves from independent individual plants per tea cultivar at each sampling time point. It was important to note that the same individual plants could be sampled repeatedly across different time points throughout the duration of the experiment. The final reported sample size (n) for each physiological indicator represented the cumulative count of all successfully measured samples across all time points and biological replicates, with slight variations among different indicators.

Following sampling, hyperspectral images of the tea leaves were acquired first. The leaves were then quickly freeze-dried in liquid nitrogen and ground for testing.

Data acquisition of tea samples

Acquisition of hyperspectral data

In our study, the hyperspectral imaging acquisition system depicted in Fig. 1 comprised a GaiaField-Pro-V10 hyperspectral camera (Jiangsu Dualix Spectral Image Technology Co. Ltd, China), four 200-watt halogen lights (hsia-ls-t-200 W, China), a dark chamber assembly, and a computer workstation.

Hyperspectral image acquisition and calibration procedures followed established methods from previous studies [4, 5, 9, 20]. Prior to image acquisition, the hyperspectral camera and halogen lights underwent a 30-minute preheating protocol within the dark chamber. Exposure duration was maintained below 19.6 milliseconds during data capture. A whiteboard image with a reflectance of about 100% and a blackboard image with a reflectance of about 0% were captured before and after imaging for the subsequent correction. The acquired hyperspectral data featured 960 × 1101 (Space × Spectrum) pixels with 176 spectral bands (3.5 nm spectral resolution) covering 397–1001 nm.

The hyperspectral image calibration and data extraction were performed by software Specview (Jiangsu Dualix Imaging Technology Co., Ltd, China) and ENVI 5.6 (Research System Inc, Boulder, CO, USA), respectively [9]. Initial black-and-white calibration was implemented using the following formula to mitigate dark current noise and environmental interference:

$$R = \frac{O - B}{W - B} \tag{1}$$

where R is the corrected image, O is the original image, B is the black reference image obtained by covering the lens with a lens cover (reflectance of about 0%), and W is the white reference image of the Spectralon 99% white reference panel. Following calibration, the reflectance of the hyperspectral images lies between 0 and 1. Secondly, lens distortion correction and reflectance calibration of the images were implemented via the analysis tools of Specview. Finally, the tea leaf regions in the hyperspectral images were labeled as regions of interest (ROI) in ENVI. The ROIs were defined by manually outlining the entire healthy leaf area of each sample, excluding veins and any visibly damaged regions. This manual approach ensured precise targeting of the relevant leaf tissue. The size of the ROIs varied with leaf dimensions, typically encompassing between 10,000 and 20,000 pixels per leaf sample. The average reflectance of each ROI was then calculated as the spectral data of the sample, yielding a spectral matrix of variables (spectral bands) × samples.
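
The calibration and ROI-averaging steps described above can be sketched in a few lines of NumPy. This is only an illustration, assuming the usual black-and-white correction R = (O − B)/(W − B); the function names, the toy array sizes, and the clipping to [0, 1] are assumptions and do not reproduce the Specview or ENVI processing used in the study.

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, eps=1e-12):
    """Black-and-white correction R = (O - B) / (W - B), clipped to [0, 1]."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = np.maximum(white - dark, eps)  # guard against division by zero
    return np.clip((raw - dark) / denom, 0.0, 1.0)

def mean_roi_spectrum(cube, roi_mask):
    """Average reflectance over the ROI pixels; cube is (rows, cols, bands)."""
    return cube[roi_mask].mean(axis=0)

# Toy data (made up): a 4 x 4 pixel image with 176 bands
rng = np.random.default_rng(0)
dark = np.full((4, 4, 176), 100.0)
white = np.full((4, 4, 176), 4000.0)
raw = rng.uniform(100.0, 4000.0, size=(4, 4, 176))

reflectance = calibrate_reflectance(raw, dark, white)
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True                            # hypothetical leaf region
spectrum = mean_roi_spectrum(reflectance, roi)  # one value per band
```

Stacking one such spectrum per leaf sample gives the bands × samples matrix used in the later preprocessing and modeling steps.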

Acquisition of drought phenotype images

In order to enable visual comparison of drought stress responses among different tea plant varieties, RGB images of tea plant phenotypes were collected in this study at the completion of drought treatment (on the 18th day). The RGB images were captured using a 1/1.28-inch Sony IMX707 CMOS sensor under natural daylight conditions (10:00–11:00 AM). The imaging system operated at an equivalent focal length of 24 mm (the actual focal length was 7 mm) with an aperture of f/1.9. Auto-exposure parameters covered a shutter speed range of 1/1200s to 1/100s, ISO 50 sensitivity, and 0 EV compensation. Image acquisition was conducted at an approximate working distance of 50 cm from the specimen. All images were recorded at 4096 × 1844 pixels resolution with 24-bit sRGB color depth.

Acquisition of physiological indicators

In this study, five drought-related physiological indicators were determined, including membrane relative permeability (MRP), malondialdehyde content (MDA), proline content (Pro), soluble sugar content (SS) and total chlorophyll content (ChlT). Among them, the MRP was obtained by calculating the ratio of conductivity before and after boiling in distilled water [31]. The contents of MDA, Pro and SS were determined by corresponding assay kits (Suzhou Grace Biotechnology Co., Ltd, Suzhou, China; Catalog numbers: MDA: G0109W; Pro: G0111W; SS: G0501W). The contents of ChlT were determined in accordance with the Ministry of Agriculture of China recommended standard “NY/T 3082 − 2017 Determination of chlorophyll content in fruits, vegetables and derived products — Spectrophotometry method”.

Following the measurements, a one-way ANOVA followed by Tukey’s HSD test was conducted using SPSS software to determine the significance of differences among the different samples.

Acquisition of tea drought tolerance index (TDTI)

In order to comprehensively evaluate the drought tolerance of different tea cultivars, this study calculated the temporal weights and indicator weights respectively based on the sampling time points and the rates of change of physiological indicators by using principal component analysis (PCA), and finally obtained the tea drought tolerance index (TDTI). This method referred to the previous research on the establishment of comprehensive indicators of tea plants [5, 32], with some modifications.

Firstly, the relative change rate (X) of each physiological indicator (j) at each time point (k) was calculated against the baseline at 0d (k₀). After global Z-score normalization of X, the temporal weight coefficients (Wₖ) for sampling time points, calculated through PCA, were quantified as follows: 0.0336 (W₆) at 6d, 0.3232 (W₁₂) at 12d, 0.3261 (W₁₈) at 18d, and 0.3171 (W₂₄) at 24d.

Secondly, the time-weighted sum for each physiological indicator (j) of each tea cultivar (i) was calculated based on the temporal weight (Wₖ) and the standardized X. The indicator weights (V_j) were then obtained through PCA, yielding the following coefficients: 0.1958 (V_MRP), 0.1923 (V_MDA), 0.2084 (V_Pro), 0.2194 (V_SS), and 0.1839 (V_ChlT).

Finally, the TDTI was calculated according to the following formula:

$$\mathrm{TDTI}_i = -\sum_{j} V_j \sum_{k} W_k X_{ijk} \tag{2}$$

where X_ijk is the standardized value of cultivar i for physiological indicator j at time point k, Wₖ is the temporal weight of time point k, and V_j is the indicator weight of physiological indicator j. The negative sign was added because the rate of change was negatively correlated with drought tolerance: the smaller the rate of change, the stronger the drought tolerance.
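
As a rough illustration of how the index is assembled, the sketch below computes a TDTI-like score from a toy array, assuming the form TDTI_i = −Σ_j V_j Σ_k Wₖ X_ijk implied by the description. The weights are the values reported in the text; the definition of the relative change rate as (value − baseline)/baseline, the Z-score taken over all entries, and the toy data are assumptions.

```python
import numpy as np

# Weights reported in the text
W = np.array([0.0336, 0.3232, 0.3261, 0.3171])          # 6d, 12d, 18d, 24d
V = np.array([0.1958, 0.1923, 0.2084, 0.2194, 0.1839])  # MRP, MDA, Pro, SS, ChlT

def relative_change(values):
    """values: (cultivars, indicators, time points), with 0d as the first time point.
    Returns the change relative to 0d for the later time points (assumed definition)."""
    baseline = values[:, :, :1]
    return (values[:, :, 1:] - baseline) / baseline

def tdti(values, W=W, V=V):
    X = relative_change(values)
    X = (X - X.mean()) / X.std()   # global Z-score over all entries (assumption)
    time_weighted = X @ W          # (cultivars, indicators)
    return -(time_weighted @ V)    # one score per cultivar

# Toy example: cultivar 0 does not change, cultivar 1 doubles every indicator
values = np.ones((2, 5, 5))        # 2 cultivars, 5 indicators, 0d..24d
values[1, :, 1:] = 2.0
scores = tdti(values)
```

In this toy case the unchanged cultivar receives the higher score, which matches the stated interpretation that smaller rates of change indicate stronger drought tolerance.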

In the formula (2), the weights Wₖ and V_j were determined objectively using Principal Component Analysis (PCA) implemented in the PCA module from the scikit-learn (v1.6.1) in Python 3.12.6. For the temporal weights (Wₖ), the input data was a matrix of the Z-score normalized rates of change (X_ijk) for all indicators at the four time points (6d, 12d, 18d, 24d) across all cultivars. The first principal component (PC1) of this temporal dataset explained 54.80% of the total variance, indicating it captured the dominant temporal pattern of drought response. The loadings of PC1 on the four time points were: 6d: -0.0601, 12d: 0.5781, 18d: 0.5833, 24d: 0.5674. The absolute values of these loadings were normalized to sum to unity, yielding the final temporal weights: W₆=0.0336, W₁₂=0.3232, W₁₈=0.3261, W₂₄=0.3171. The increasing weight values from 12d to 18d biologically reflected the cumulative and intensifying nature of drought stress over time, with the later stages being more critical for distinguishing cultivar tolerance. For the indicator weights (V_j), the input was a matrix of the time-weighted sums for each cultivar and indicator. The PC1 of this indicator dataset explained 71.74% of the variance, representing the primary trend in physiological responses. The PC1 loadings were: MRP: 0.4369, MDA: 0.4295, Pro: 0.4651, SS: 0.4896, ChlT: 0.4105. After normalization of their absolute values, the final indicator weights were determined as: V_MRP=0.1958, V_MDA=0.1923, V_Pro=0.2084, V_SS=0.2194, V_ChlT=0.1839. The relatively balanced weights among these five indicators suggested that multiple physiological pathways (osmotic regulation and photosynthetic performance) jointly contribute to the comprehensive drought tolerance evaluated by the TDTI. The use of PCA ensured that the TDTI integrates the most significant temporal and physiological patterns in an unsupervised, data-driven manner, enhancing the objectivity of the index.
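
The step from PC1 loadings to weights is simple enough to check by hand. The sketch below normalizes the absolute loadings reported above so that they sum to one, and also shows a NumPy-only stand-in for extracting PC1 (the study used the scikit-learn PCA module; the SVD helper and its name are assumptions for illustration).

```python
import numpy as np

def loadings_to_weights(loadings):
    """Normalize absolute PC1 loadings so that the weights sum to 1."""
    a = np.abs(np.asarray(loadings, dtype=float))
    return a / a.sum()

def pc1_loadings(M):
    """First principal axis and its explained-variance ratio via SVD
    (NumPy stand-in for a PCA routine; the sign of the axis is arbitrary)."""
    Mc = M - M.mean(axis=0)
    _, s, vt = np.linalg.svd(Mc, full_matrices=False)
    return vt[0], s[0] ** 2 / np.sum(s ** 2)

# PC1 loadings reported in the text
W = loadings_to_weights([-0.0601, 0.5781, 0.5833, 0.5674])          # 6d..24d
V = loadings_to_weights([0.4369, 0.4295, 0.4651, 0.4896, 0.4105])   # MRP..ChlT

# Hypothetical matrix, only to show the PCA helper running
rng = np.random.default_rng(0)
axis, ratio = pc1_loadings(rng.normal(size=(12, 4)))
```

Because the published loadings are rounded to four decimals, the recomputed weights agree with the reported ones to roughly three decimal places rather than exactly.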

Preprocessing of hyperspectral data

Hyperspectral data are inherently susceptible to noise contamination due to intrinsic sensor artifacts and extrinsic environmental factors. In order to improve the signal-to-noise ratio of hyperspectral data and reduce the effects of baseline drift caused by environmental interference, as well as diffuse reflection and spectral overlap, preprocessing of the spectral data was required. In this study, the methods of preprocessing for spectral data included MSC [33], S-G [17], first derivative (1D) [34] and second derivative (2D) [18]. The relevant formulas were as follows:

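Multiplicative scatter correction (MSC):

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

$$X_i = k_i \bar{X} + b_i$$

$$X_{i(\mathrm{msc})} = \frac{X_i - b_i}{k_i}$$

where X is the original spectral matrix of the samples, n is the number of samples, X_i is the spectrum of the i-th sample, $\bar{X}$ is the average of all spectral data, k_i and b_i are the regression coefficients of X_i on $\bar{X}$ (scaling factor and baseline offset, respectively), and X_i(msc) is the spectrum after MSC correction.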

Savitzky-Golay (S-G):

$$\hat{X}_i = \sum_{j=-R}^{R} W_j X_{i+j}$$

where X_i is the original spectral data, $\hat{X}_i$ is the Savitzky-Golay smoothed data, and W_j is the (normalized) convolution coefficient for a smoothing window of width 2R + 1.

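First derivative:

$$\frac{dy_i}{d\lambda} \approx \frac{y_i(\lambda + \Delta\lambda) - y_i(\lambda)}{\Delta\lambda}$$

Second derivative:

$$\frac{d^2 y_i}{d\lambda^2} \approx \frac{y_i(\lambda + \Delta\lambda) - 2\,y_i(\lambda) + y_i(\lambda - \Delta\lambda)}{\Delta\lambda^2}$$

where y is the spectral absorbance, λ is the wavelength, y_i is the spectrum of the i-th sample, and Δλ is the wavelength interval.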
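
The preprocessing steps can be sketched as follows. This is a minimal NumPy illustration rather than the MATLAB code used in the study: MSC is implemented as a per-sample linear regression on the mean spectrum, and the derivatives use `np.gradient` (central differences), which is one of several possible finite-difference schemes. Function names and the toy spectra are assumptions.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction; spectra is (samples, bands)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        k, b = np.polyfit(ref, x, deg=1)   # fit x = k * ref + b
        corrected[i] = (x - b) / k
    return corrected

def derivative(spectra, wavelengths, order=1):
    """First (order=1) or second (order=2) derivative along the band axis."""
    out = np.asarray(spectra, dtype=float)
    for _ in range(order):
        out = np.gradient(out, wavelengths, axis=1)
    return out

# Toy spectra: one base shape with sample-specific scale and offset
wavelengths = np.linspace(397.0, 1001.0, 176)
base = 0.3 + 0.2 * np.sin(wavelengths / 80.0)
scales = np.array([0.8, 1.0, 1.3])[:, None]
offsets = np.array([0.05, 0.0, -0.02])[:, None]
spectra = scales * base + offsets

corrected = msc(spectra)                      # scatter effects removed
d1 = derivative(corrected, wavelengths, 1)    # "1D" preprocessing
d2 = derivative(corrected, wavelengths, 2)    # "2D" preprocessing
```

S-G smoothing is left out of the sketch to keep it dependency-free; `scipy.signal.savgol_filter` is one commonly used implementation.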

Feature band screening of hyperspectral data

The hyperspectral data collected in this study spanned 397–1001 nm with 176 spectral bands. To address the computational complexity and model instability caused by high-dimensional spectral variables, we implemented three feature selection algorithms, UVE, CARS, and SPA [35], to identify the feature bands for subsequent modeling. The basic parameters of these algorithms are listed in Table 1.

Table 1.

The main parameters of UVE, CARS, and SPA

Algorithm	Parameter	Symbol	Value
UVE	PLS components	a	5
	Random noise variables	pZ	700
	Cutoff percentile	cutoff	0.99
SPA	Minimum variables	m_min	1
	Maximum variables	m_max	30
	Autoscaling flag	Autoscaling	0 (Mean-centering only)
CARS	Maximal PLS components	A	10
	Cross-validation folds	Fold	10
	Pretreatment method	Method	‘None’
	Monte Carlo sampling runs	Num	300

Establishment of models

In order to determine the optimal prediction model for each physiological indicator and TDTI index, three machine learning models — SVM, RF, and BRR (Bayesian Ridge Regression) — along with two deep learning models based on lightweight Transformer-based architecture were employed in this study. Then, the prediction models were established between the hyperspectral data and physiological indicators of tea leaves.

Establishment of SVM, RF and BRR

Among these algorithms, both SVM and RF have been demonstrated to be robust prediction models for physiological indicators of tea leaves in previous research [9, 11, 36]. The SVM, grounded in structural risk minimization theory, operates by identifying the optimal hyperplane in feature space for classification tasks, with kernel tricks enabling nonlinear separability. When applied to regression problems, it optimizes model performance by regulating the ε-insensitive loss function and tuning kernel parameters [37, 38]. The RF, an ensemble method based on bootstrap aggregation, enhances prediction accuracy and generalizability by constructing multiple decision trees trained on randomly sampled feature subspaces and then integrating their outputs [9]. The main parameters of SVM and RF are shown in Table 2.

Table 2.

The main parameters of models

Algorithm	Parameter	Symbol	Value
SVM	SVM Type	-s	3(epsilon-SVR)
	Kernel Type	-t	2(RBF Kernel)
	Penalty Coefficient	-c	{2^-8, …2^8}
	Kernel Parameter	-g	{2^-8, …2^8}
	Loss Threshold	-p	0.01
RF	Number of Trees	Ntrees	200
	Minimum Leaf Size	Minleaf	5
	Bootstrap Sample Ratio	FBoot	1
	Surrogate Splits	Surrogate	‘On’
	Out-of-Bag Importance	Oobvarimp	‘On’
	Task Type	Method	‘Regression’
BRR	Noise Level	Noise_level	0.02
	Augmentation Ratio	Aug_ratio	0.2
	Correlation Threshold	Corr_threshold	80th percentile
	Polynomial Degree	Interaction_rules	2
	Burn-in Iterations	Burnin	1000
	Total Draws	NumDraws	5000

Bayesian Ridge Regression (BRR) is a probabilistic modeling method that achieves automatic regularization through conjugate prior distributions, enabling stable predictions for high-dimensional data [39]. Building on the classical BRR framework, this study developed a dynamic feature engineering and hierarchical regularization strategy to enhance model robustness. Firstly, zero-variance filtering (threshold = 1e-6) was applied to the training set, followed by Gaussian noise augmentation. All features and target variables were standardized using Z-score normalization based on training-set means and standard deviations. Secondly, interaction terms were generated by screening highly correlated feature pairs using the 80th percentile of the absolute Pearson correlation coefficients derived from the training data as the threshold. Quadratic terms were also constructed for all original features to capture nonlinear relationships. During Bayesian inference, prior parameters were dynamically adjusted according to the feature dimension p. Finally, posterior means were estimated via Gibbs sampling, and predictions were obtained by inverse standardization. The main parameters of BRR are also shown in Table 2, and the dynamic regularization parameters are shown in Table 3.

Table 3.

Dynamic regularization parameters of BRR

Feature dimension range	Prior variance inverse (V)	Shape parameter (A)	Scale parameter (B)
p < 5	1e2	2	1.5
5 ≤ p < 20	1e3	10	1.0
20 ≤ p ≤ 50	5e4	20	0.5
50 < p ≤ 100	1e5	30	0.3
100 < p ≤ 150	2e5	40	0.2
150 < p ≤ 200	5e5	50	0.1
p > 200	1e6	60	0.05
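
The dimension-dependent prior in Table 3 amounts to a lookup on p. A minimal sketch follows; the function name `brr_prior` and the returned tuple layout are assumptions, while the thresholds and values are taken from the table.

```python
def brr_prior(p):
    """Return (V, A, B) for feature dimension p, following Table 3:
    prior variance inverse V, shape parameter A, scale parameter B."""
    if p < 5:
        return 1e2, 2, 1.5
    if p < 20:
        return 1e3, 10, 1.0
    if p <= 50:
        return 5e4, 20, 0.5
    if p <= 100:
        return 1e5, 30, 0.3
    if p <= 150:
        return 2e5, 40, 0.2
    if p <= 200:
        return 5e5, 50, 0.1
    return 1e6, 60, 0.05

# Example: a feature set of 30 selected bands
V, A, B = brr_prior(30)
```

Larger feature sets receive a larger V and a smaller B, so the prior tightens as the number of features grows.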

Establishment of the lightweight transformer-based hybrid architecture

Transformer models generally require large-scale training data [27], whereas the datasets in this study were limited in size. This research therefore proposed a lightweight Transformer-based hybrid integrated model (TF). The construction of the TF model (Fig. 2; Table 4) consisted of three stages:

Fig. 2 — Schematic of the lightweight Transformer-based hybrid integrated architecture

Table 4.

The main parameters of each module in the TF and TF-CNN

Module	Parameter	Value/Range
Data augmentation	Seed	42
	Gaussian noise Std	0.05
	Local masking rate	10%
	Mixup beta parameters	α = β = 0.5
XGBoost	Number of trees	2000
	Learning rate	0.05
	Max depth	6
	Early stopping	100
	L2 regularization	1.0
LightGBM	Number of trees	2000
	Number of leaves	63
	Learning rate	0.05
	Early stopping	100
	L1 regularization	0.1
Transformer	Input dim	192
	Embedding activation function	GELU
	Encoder layers	3
	Attention heads	8
	Encoder activation function	GELU
	Feedforward dim	768
	Feedforward dropout	0.3
	Hidden dim	96
	Regressor activation function	SiLU
	Output dim	1
	Regressor dropout	0.4
	Initial learning rate	1e-4
	Weight decay	1e-4
	Warmup epochs	10
	Max epochs	300
	Early stopping patience	20
CNN	Convolutional layers	2
	Convolution kernel	5, 3
	Output channels	32 → 64
	Fusion dropout	0.5
Meta-model	Cross-validation folds	5
	Bayesian ridge iterations	300

(1) Data augmentation

This study designed three augmentation strategies tailored to spectral data characteristics: Gaussian noise injection, random local masking, and spectral mixing.
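
The three augmentations can be sketched with NumPy using the settings listed in Table 4 (seed 42, noise standard deviation 0.05, 10% masking, Beta(0.5, 0.5) mixing). The source does not state the implementation details, so masking by zeroing one contiguous run of bands and mixing each sample with a randomly permuted partner are assumptions, as are the function names.

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(X, std=0.05):
    """Add zero-mean Gaussian noise to every band."""
    return X + rng.normal(0.0, std, size=X.shape)

def mask_local_segment(X, rate=0.10):
    """Zero out one contiguous run covering `rate` of the bands per sample."""
    Xm = X.copy()
    n_bands = X.shape[1]
    width = max(1, int(round(rate * n_bands)))
    for row in Xm:                       # rows are views, so edits are in place
        start = rng.integers(0, n_bands - width + 1)
        row[start:start + width] = 0.0
    return Xm

def mixup(X, y, alpha=0.5):
    """Mix each sample with a random partner using lambda ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha, size=(len(X), 1))
    perm = rng.permutation(len(X))
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam[:, 0] * y + (1.0 - lam[:, 0]) * y[perm]
    return X_mix, y_mix

# Toy data: 20 samples x 176 bands with a made-up target
X = rng.uniform(0.1, 0.9, size=(20, 176))
y = rng.uniform(0.0, 1.0, size=20)

X_noisy = add_gaussian_noise(X)
X_masked = mask_local_segment(X)
X_mix, y_mix = mixup(X, y)
```

Augmentation of this kind would be applied to the training split only, so that the independent test set stays untouched.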

(2) Base model training

This study conducted parallel training of eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and lightweight Transformer.

XGBoost employed the exact greedy algorithm, achieving higher accuracy in regression tasks on small-sample datasets.
LightGBM utilized the histogram algorithm to efficiently process high-dimensional spectral data.
The Transformer-based model was made lightweight by removing components such as positional encoding and the decoder from the classical Transformer structure [40] and retaining a streamlined encoder. The encoder was designed with an embedding dimension of 192, which projects the input features (e.g., 176 spectral bands) into a higher-dimensional space to enhance representational capacity while maintaining parameter efficiency. This lightweight architecture, comprising a 3-layer encoder with 8-head attention and a single multilayer perceptron regressor, was further enhanced by a dimensional adaptation mechanism to address the gradient decay that traditional sequence models suffer in long-range dependency modeling. This design makes the model particularly suitable for end-to-end regression tasks on spectral data.
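
To make the role of multi-head self-attention concrete, the sketch below runs a single attention layer in NumPy with the reported sizes (embedding dimension 192, 8 heads, GELU embedding activation, no positional encoding). It is not the authors' model: the source does not spell out how the spectrum is tokenized, so treating each of the 176 bands as one token is an assumption, the weights are random, and the feed-forward sublayers, layer normalization, and the regressor are omitted.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, Wq, Wk, Wv, Wo, n_heads):
    """tokens: (n_tokens, d_model). Returns the output and the attention maps."""
    n, d = tokens.shape
    dh = d // n_heads

    def split(M):  # project, then reshape (n, d) -> (heads, n, dh)
        return (tokens @ M).reshape(n, n_heads, dh).transpose(1, 0, 2)

    Q, K, V = split(Wq), split(Wk), split(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, n)
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    heads = attn @ V                                  # (heads, n, dh)
    merged = heads.transpose(1, 0, 2).reshape(n, d)
    return merged @ Wo, attn

rng = np.random.default_rng(0)
n_bands, d_model, n_heads = 176, 192, 8
spectrum = rng.uniform(0.05, 0.6, size=n_bands)       # made-up reflectance values

# Assumed tokenization: one token per band, scalar -> 192-d embedding with GELU
w_embed = rng.normal(0.0, 1.0, size=(1, d_model))
tokens = gelu(spectrum[:, None] @ w_embed)            # (176, 192)

Wq, Wk, Wv, Wo = (rng.normal(0.0, d_model ** -0.5, size=(d_model, d_model))
                  for _ in range(4))
out, attn = multi_head_self_attention(tokens, Wq, Wk, Wv, Wo, n_heads)
pooled = out.mean(axis=0)   # a 192-d summary that a regressor head could consume
```

Each of the 8 attention maps is a 176 × 176 matrix in which every band can attend to every other band directly, which is the property the text relies on for capturing interactions between distant spectral bands.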

Furthermore, given the small-sample datasets in this study, the optimization strategies such as learning rates, regularization, early stopping, and dropout were tailored to different base models to prevent overfitting.

(3) Meta model integration

The predictions from the base models were dynamically weighted and integrated through Bayesian Ridge Regression, with all hyperparameters set to the default configurations of scikit-learn. The generalization performance of this hybrid framework was then evaluated by 5-fold cross-validation. Note that the meta-model (Bayesian Ridge Regression) was trained exclusively on out-of-fold predictions generated from the 5-fold cross-validation process of the base models. This stringent practice prevents information leakage from the training targets into the meta-features, ensuring the generalizability of the stacked ensemble.
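The out-of-fold principle can be sketched in plain Python. To keep the example dependency-free, two toy base learners (a mean predictor and a one-variable linear fit) stand in for XGBoost, LightGBM, and the Transformer, and an ordinary least-squares combination without intercept stands in for Bayesian Ridge Regression. All function names and data are hypothetical.

```python
def kfold_indices(n, k=5):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Toy base learner 2: simple linear regression on one feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k=5):
    """Predict every sample with a model that never saw it during fitting."""
    oof = [None] * len(xs)
    for fold in kfold_indices(len(xs), k):
        held = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        for i in fold:
            oof[i] = model(xs[i])
    return oof

def fit_meta(p1, p2, ys):
    """Least-squares weights for y ~ w1*p1 + w2*p2 (stand-in for Bayesian ridge)."""
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    c = sum(v * v for v in p2)
    d = sum(u * y for u, y in zip(p1, ys))
    e = sum(v * y for v, y in zip(p2, ys))
    det = a * c - b * b
    return (d * c - b * e) / det, (a * e - b * d) / det

xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic, exactly linear target
oof_mean = out_of_fold(fit_mean, xs, ys)
oof_line = out_of_fold(fit_line, xs, ys)
w_mean, w_line = fit_meta(oof_mean, oof_line, ys)
```

Because the meta-learner only sees predictions made on held-out samples, it assigns nearly all weight to the base learner that actually generalizes (here the linear fit) rather than to one that merely memorizes its training targets.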

This multi-faceted design of the hybrid integrated architecture, encompassing lightweighting, ensemble strategies, and regularization, was intentionally crafted to mitigate the risk of overfitting and enhance model generalizability, particularly crucial given the limited sample size of our study.

In addition, based on the above-mentioned TF model, we further proposed a hybrid TF-CNN model that synergistically modeled global dependencies and local morphological patterns in spectral data by integrating, in parallel, the Transformer encoder (3-layer encoder with 8-head self-attention) and a convolutional neural network (CNN) branch (two 1D convolutional layers with kernel sizes 5 → 3) within the aforementioned Transformer-based model. The concatenated 192- and 64-dimensional features from the two pathways were fused via a 128-dimensional fully-connected layer for regression prediction.

Software, dataset partitioning, and model evaluation

All experiments in this study were performed on a laptop computer equipped with an Intel (R) Core (TM) i9-14900HX CPU, 16 GB of RAM, and running the Windows 11 operating system (version 23H2). The data preprocessing, feature selection, and the implementation of SVM, RF, and BRR models were conducted in MATLAB R2023b. The development of TF and TF-CNN models was performed in Visual Studio Code (version 1.99.3) using Python 3.12.6.

This study included the following datasets: MRP (189 samples), MDA (254 samples), SS (254 samples), Pro (243 samples), and ChlT (244 samples). The above-mentioned five datasets were respectively divided into the training set and the independent test set in a ratio of 4:1, ensuring that each sample was exclusively assigned to either the training set or the test set. The independent test set was only used for the final model evaluation, while the hyperparameters were optimized through cross-validation within the training set. This strict partitioning guarantees no data leakage, as the model is evaluated on completely unseen samples. For the TDTI, which integrated temporal and indicator weighting mechanisms, only 34 samples (each sample was composed of TDTI and the corresponding hyperspectral data at 0d) were available due to calculation constraints. Given the limited sample size, the model performances of TDTI were directly evaluated using the aggregated predictions from all validation folds in the 5-fold cross-validation process without independent test set division.
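A 4:1 partition with mutually exclusive training and test sets could be sketched as follows. The text does not state how samples were assigned to each set, so the seeded random shuffle and the rounding of the test-set size used here are assumptions, and the helper name is hypothetical.

```python
import random

def split_4_to_1(n_samples, seed=0):
    """Shuffle sample indices and assign one fifth of them to the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples / 5)  # rounding rule is an assumption
    train, test = sorted(indices[n_test:]), sorted(indices[:n_test])
    return train, test

# Sample counts reported for the five physiological indicators
sizes = {"MRP": 189, "MDA": 254, "SS": 254, "Pro": 243, "ChlT": 244}
splits = {name: split_4_to_1(n) for name, n in sizes.items()}
```

Hyperparameter search by cross-validation would then use only the training indices, leaving the test indices untouched until the final evaluation.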

In this paper, the determination coefficient (R²), the root mean square error (RMSE), the mean absolute error (MAE), and the relative predictive deviation (RPD) were employed to evaluate the performance of the models [39, 41]. R² quantifies the goodness-of-fit between model-predicted values and observed values. Lower RMSE and MAE values and higher RPD values correspond to enhanced predictive ability and improved model robustness. The calculations are given by formulas (9)–(12).

where n is the number of samples in the corresponding dataset, y_i is the measured value, ŷ_i is the predicted value, and ȳ is the average of the measured values.
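The four metrics can be computed as in the sketch below, which uses their commonly cited definitions: R² = 1 − SS_res / SS_tot, RMSE = sqrt(SS_res / n), MAE = mean absolute residual, and RPD = SD / RMSE. Whether the standard deviation in RPD uses n or n − 1 in the denominator is an assumption here (n is used), and the example values are synthetic.

```python
import math

def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE and RPD for paired measured and predicted values."""
    n = len(measured)
    mean_y = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - p) for y, p in zip(measured, predicted)) / n
    sd = math.sqrt(ss_tot / n)  # population SD (assumption; n - 1 is also common)
    return {"R2": 1.0 - ss_res / ss_tot, "RMSE": rmse, "MAE": mae, "RPD": sd / rmse}

# Synthetic example values, not data from this study
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

With the population standard deviation, RPD and R² are linked by RPD = 1 / sqrt(1 − R²); using n − 1 instead scales RPD by sqrt(n / (n − 1)).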

Results

Response of different tea plant varieties to drought

To compare the ability of different tea plant varieties to resist drought stress and to recover after rehydration, we measured the key drought-related indicators (MRP, MDA, Pro, SS, and ChlT) of each variety at 0d, 6d, 12d, and 18d of drought treatment, as well as at 6d after rehydration (24d of treatment). We also recorded the phenotypes of all tea plant varieties at the end of the drought treatment (18d). Judging from the phenotypes, LC17 grew best, followed by LC8, LC10, and LC11.

As shown in Fig. 3, MRP and MDA followed similar patterns, increasing continuously during drought treatment and decreasing slightly after rehydration. This is because MDA is an important indicator of cellular membrane degradation or dysfunction. Drought stress can lead to the generation of large amounts of reactive oxygen species in tea leaf cells, which in turn cause lipid peroxidation, damage to the cell membrane system, increased membrane permeability, and the generation of MDA [42]. Among the tea plant varieties in this study, the MRP and MDA levels of LC1, LC2, LC5 and LC7 were higher on 18d, while those of LC8 were lower and changed little throughout the drought process. According to the SS measurement results, the SS content of LC17 was higher than that of the other varieties, especially in the early stages of drought. In addition, SS contents showed a downward trend as treatment time increased. Soluble sugars are not only osmotic regulators but also important energy substances when plants undergo drought stress. Under abiotic stress conditions, soluble sugar content typically declines in the majority of plant species [43]. This may be because plants facing stress need to consume a large amount of energy for self-protection, and free sugars are the most easily utilized energy source [44]. Proline serves as a vital osmoprotectant [45], capable of protecting the photosynthetic functions of PS II in isolated thylakoid membranes from photodamage [46]. In this study, the Pro contents of these varieties continued to increase during drought treatment and decreased after rehydration. Among them, the changes in LC2 and LC5 were the most pronounced, and their contents were higher than those of the other varieties at 18d. However, the phenotypes of LC2 and LC5 at 18d showed that they were more severely affected by drought stress. 
At the same time, we noticed that LC17 originally had a high Pro content (0d), and the change of Pro content was relatively small during drought treatment. The phenotype at 18d also indicated that LC17 grew well and was less affected by drought stress. In general, when plants are subjected to drought stress, chlorophyll content decreases [47, 48]. However, in this study, the chlorophyll contents of these varieties increased during drought, especially LC1, LC2, and LC5. Paradoxically, although chlorophyll content increased, LC1, LC2, and LC5 did not exhibit strong drought tolerance phenotypically. This result was puzzling. We speculated that drought stress triggered some chlorophyll protection mechanisms in these three varieties, which were originally intended to maintain photosynthesis. However, drought stress also triggered severe peroxidation reactions at the same time, resulting in severe membrane system damage (LC1, LC2, and LC5 indeed had high levels of MRP and MDA at 18d) and impaired photosynthetic system function [49, 50]. Therefore, although the chlorophyll contents were higher, the photosynthetic systems were actually damaged, indicating poor drought tolerance in these tea plant varieties.

Based on the analysis of the five physiological indicators above and their corresponding phenotypes (Fig. 4), we found that it was difficult to accurately evaluate the drought tolerance of tea plants through a single indicator. Therefore, we developed a comprehensive tea drought tolerance index (TDTI) by combining temporal weights with indicator weights. The TDTI results for each tea plant variety are shown in Fig. 4. Compared with ZC108, tea plant varieties such as LC4, LC8, LC10, LC11, LC17, LC21 and LC22 showed higher TDTI, indicating their superior drought tolerance. Specifically, LC17 had the highest TDTI, followed by LC8, LC10, LC11, and LC22. The TDTI values of LC4 and LC21 showed little difference from that of ZC108.

Impact of drought on spectral reflectance curves

In this study, we preprocessed the obtained hyperspectral data by combining MSC and S-G with first and second derivatives, respectively (Fig. 5). The original spectral reflectance curves of hyperspectral data exhibited typical vegetation spectral characteristics [25]. Spectral reflectance declined within the 430 ~ 480 nm and 640 ~ 670 nm ranges, indicating chlorophyll absorption bands in the blue and red spectral regions. Subsequently, plant spectral reflectance exhibited a steep ascent within the 680 ~ 780 nm range, characterizing the typical “red edge effect” [51, 52]. A highly reflective plateau was observed in the 750 ~ 1000 nm range, primarily attributed to light scattering within the spongy mesophyll and the lower absorption of incident radiation by photosynthetic pigments and water within this spectral region [24, 53]. Notably, in the near-infrared region (780 ~ 840 nm), the spectral reflectance curves of all samples showed partial separation. This separation might be attributed to the drought treatment in the experiment, as the reflectance in this wavelength range was sensitive to the internal structure of leaf and strongly influenced by water absorption [28]. After preprocessing with MSC, derivative (1D or 2D), and S-G smoothing, the spectral peaks and valleys became more distinct, and the differences among spectral reflectance curves in this band were amplified. These preprocessing steps were instrumental in mitigating peak overlap interference while enhancing spectral resolution and detection sensitivity [5].
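The MSC step of this preprocessing chain can be sketched in plain Python: each spectrum is regressed on the mean spectrum, and the fitted offset and slope are then removed. The derivative and S-G smoothing steps are omitted here (in Python they are commonly done with `scipy.signal.savgol_filter`), and the spectra are synthetic values chosen only to show the effect.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_bands = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(reference) / n_bands
    ref_var = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Least-squares fit: s ~ intercept + slope * reference
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected, reference

# Synthetic spectra: one base shape with different scatter gains and offsets
base = [0.05, 0.08, 0.04, 0.45, 0.50, 0.48]
spectra = [[a * v + b for v in base] for a, b in [(1.0, 0.0), (1.2, 0.03), (0.8, -0.01)]]
corrected, reference = msc(spectra)
```

Because the synthetic spectra differ only by a gain and an offset, all corrected spectra collapse onto the mean spectrum; real spectra retain their chemical differences after the scatter component is removed.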

Selection of feature bands

The spectral data of this study contained 176 bands. Considering the possible existence of redundant variables, UVE, SPA, and CARS were adopted to eliminate redundant bands and thereby improve the efficiency and reliability of the models. Table 5 lists the number of feature bands screened out for each indicator by the different algorithms (see Supplementary Table S1 for detailed results). Among the feature selection methods for MRP, 1D-UVE selected the largest number of features, totaling 89, whereas 1D-SPA selected the smallest number, with only 8 features. For MDA and SS, 2D-UVE selected the most features (89 and 91), whereas 1D-SPA selected the fewest (16 and 20). For Pro, 2D-UVE selected the most features (109), while 2D-SPA selected the fewest (15). For ChlT, 1D-UVE selected the most features (113), while 1D-CARS and 1D-SPA selected the fewest (15 each). Finally, for TDTI, 2D-UVE selected the most features (66), while 1D-CARS selected the fewest (15). Overall, consistent with other research findings [9, 20, 54], UVE screened out more feature bands than CARS and SPA.

Table 5.

The number of feature bands screened out for each indicator through different algorithms

	MSC + 1D + S-G			MSC + 2D + S-G
	UVE	CARS	SPA	UVE	CARS	SPA
MRP	96	27	8	88	9	13
MDA	82	28	16	89	35	17
SS	85	22	20	91	23	22
Pro	107	22	16	109	16	15
ChlT	113	15	15	89	22	21
TDTI	29	15	17	66	17	22


Establishment and comparison of models

In this study, 154 different indicator models were established using SVM, RF, BRR, TF and TF-CNN, based on the two preprocessing methods (1D and 2D) combined with the feature bands screened out by UVE, SPA and CARS. Furthermore, considering the ability of the Transformer to capture long-range dependencies in sequential data, we also explored direct modeling using TF and TF-CNN without UVE, SPA, or CARS-based feature selection (24 models). Notably, due to the limitations of BRR, many BRR models failed during training. Among the BRR models, only those built with UVE-selected features for the five basic physiological indicators were retained in this study. The determination coefficients (R²) on the test sets for the 178 successful models are shown in Fig. 6. We further compared the optimal traditional machine learning model with the optimal Transformer-based model for each indicator and performed a one-way ANOVA on the R² of the test sets (Table 6). Detailed evaluation records for the other models are provided in Supplementary Table S2.

Fig. 6 — The determination coefficient (R²) of prediction models for each indicator of tea plants. Color intensity represents the metric value. The optimal traditional machine learning model and optimal Transformer-based model for each indicator are outlined in red

Table 6.

The optimal traditional machine learning model and optimal Transformer-based model of each indicator

Indicator	Model	Train Sets				Test Sets
Indicator	Model	R²	RMSE	MAE	RPD	R²	RMSE	MAE	RPD
MRP	1D-CARS-TF	0.7183	0.0490	0.0319	1.8842	0.8992	0.0339	0.0219	3.1500
MRP	2D-UVE-SVM	0.8564	0.0369	0.0176	2.5105	0.8971	0.0394	0.0242	2.7472
MDA	2D-UVE-SVM	0.8423	4.3072	2.9909	2.5101	0.8307***	4.5168	3.5305	2.4526
MDA	1D-SPA-TF-CNN	0.5944	6.8686	5.2923	1.5702	0.7792	5.1547	3.9978	2.1279
Pro	2D-UVE-BRR	0.9808	3.2792	2.2361	7.2399	0.8929***	8.8972	6.8587	3.0875
Pro	2D-TF	0.7715	10.9100	7.8700	2.0919	0.7640	13.2084	8.2682	2.0584
SS	2D-CARS-SVM	0.9248	1.6858	1.0991	3.6354	0.8373***	2.5327	2.0439	2.4031
SS	2D-TF-CNN	0.7618	2.9834	2.2068	2.0491	0.7906	2.7581	2.2087	2.1850
ChlT	1D-UVE-TF-CNN	0.6489	0.1516	0.1184	1.6878	0.7894**	0.1121	0.0855	2.1791
ChlT	2D-SPA-RF	0.9023	0.0879	0.0644	2.9170	0.7500	0.1279	0.0959	1.9296
TDTI	2D-UVE-TF	0.9322	0.1299	0.1048	3.8413	0.7614*	0.2437	0.1914	2.0473
TDTI	2D-UVE-SVM	0.9311	0.1331	0.0676	3.7630	0.6280	0.3106	0.2576	1.6302


* One-way ANOVA: * p < 0.05, ** p < 0.01, *** p < 0.001

Fig. 6 shows that all the MRP models performed well, with R² values all above 0.8. There were 6 models with R² higher than 0.89, including 1D-UVE-TF, 1D-CARS-SVM, 1D-CARS-TF, 1D-CARS-TF-CNN, 2D-UVE-SVM, 2D-UVE-RF, and 2D-SPA-TF-CNN. Among them, the 1D-CARS-TF model had the highest R² and RPD (0.8992 and 3.1500, respectively) and relatively low RMSE and MAE (0.0339 and 0.0219, respectively). Therefore, this model was considered the best model for MRP. Among the MDA models, all except the two CARS-RF models had R² above 0.7. Five models performed particularly well, with R² above 0.8: 2D-UVE-SVM, the two SPA-SVM models, and the two CARS-SVM models. The 2D-UVE-SVM model achieved the highest R² and RPD (0.8307 and 2.4526, respectively) and was identified as the optimal model; its R² was significantly (*** p < 0.001) higher than that of the best Transformer-based model. Among the 30 models of Pro, most had R² values between 0.6 and 0.8, and only five models (1D-UVE-SVM, 2D-UVE-SVM, 2D-UVE-BRR, 2D-SPA-SVM, and 2D-CARS-SVM) had R² values higher than 0.8. Among them, 2D-UVE-BRR performed best, achieving the highest R² value (0.8929), which was significantly (*** p < 0.001) higher than that of the best Transformer-based model, 2D-TF. Most SS models had R² values between 0.7 and 0.8, and only two had R² values higher than 0.8. The best model for SS was 2D-CARS-SVM, with the highest R² and RPD of 0.8373 and 2.4031, respectively. Additionally, its R² was significantly (*** p < 0.001) higher than that of the 2D-TF-CNN model. Of the 30 models developed for ChlT, 22 yielded R² values greater than 0.7, with the top three performers (1D-UVE-TF-CNN, 1D-TF-CNN, and 2D-SPA-RF) exceeding 0.75. 
Notably, 1D-UVE-TF-CNN emerged as the optimal model, with the highest R² and RPD values of 0.7894 and 2.1791, respectively, demonstrating a significant advantage (** p < 0.01) in R² over the traditional machine learning model 2D-SPA-RF. Among the TDTI models, 6 Transformer-based models achieved R² above 0.7. Of these, 2D-UVE-TF and 2D-UVE-TF-CNN performed well, with R² values of 0.7614 and 0.7518 and RPD values of 2.0473 and 2.0071, respectively, both RPD values being greater than 2. The performance of the other 22 models was not satisfactory, especially that of the models based on the SVM and RF algorithms.

Additionally, it is important to highlight that while the results were comprehensively presented here for a fair comparison, the proposed Transformer-based models (TF and TF-CNN) demonstrated distinct advantages for predicting specific physiological indicators such as MRP, ChlT, and TDTI. Specifically, the 1D-CARS-TF model emerged as the optimal model for MRP, while the 1D-UVE-TF-CNN model showed superior performance for ChlT. Notably, for the TDTI, almost all well-performing models were based on the Transformer architecture. These results suggested that these complex traits, which likely involve intricate interactions across multiple spectral bands, were more effectively captured by the global modeling capabilities of the self-attention mechanism in the Transformer-based architecture. A detailed discussion of why the Transformer-based models performed better in these specific cases, and of the limitations of traditional machine learning approaches, is provided in the sections “Transformer-based models: the advantage in capturing complex, multi-band interactions” and “A paradigm for model selection: matching algorithmic strengths to spectral complexity”.

Discussion

A more rational comprehensive index: developing TDTI for accurate evaluation of tea plant drought tolerance

The response of tea plants to drought stress is a complex physiological process, making it difficult to accurately evaluate their drought tolerance using a single indicator. Previous studies used multiple indicators to calculate a comprehensive index to evaluate tea plant varieties. For example, Chen et al. (2021, 2022) established the DDD index and DTC index of tea plants based on multiple drought resistance indexes to evaluate their drought resistance [4, 5]. Following these earlier methods, our study first attempted to establish a comprehensive index from five indicators (MRP, MDA, Pro, SS, and ChlT) to evaluate the drought tolerance of tea plants, but the calculated results corresponded poorly with the phenotypes. We believe there were two reasons for this. On the one hand, these indexes relied solely on indicator weights and primarily reflected the physiological status of tea plants under specific environmental conditions. Their prediction models were therefore more suitable for monitoring the phenotypes of tea plants under drought stress than for evaluating their drought tolerance. On the other hand, these comprehensive indexes were based only on the absolute values of indicators and ignored the “initial values” (intrinsic differences) of different varieties. For example, the LC17 tea cultivar in this study exhibited high levels of MDA and low levels of Pro and ChlT on day 18. Such values are conventionally associated with weak drought tolerance. However, the phenotype of LC17 distinctly demonstrated stronger drought tolerance compared to other cultivars. Further analysis of the physiological measurements of LC17 showed that its five indicators changed relatively little during the experimental treatment (drought and rehydration). 
This might be because LC17 had strong drought tolerance, so the drought stress in the experiment had a relatively small impact and did not cause significant changes in its physiological condition. These results implied that intrinsic genetic differences could make absolute-value-based predictions unreliable across different varieties. On the basis of these two factors, we decided to normalize the data using initial values to accurately quantify stress-induced changes. Accordingly, we calculated the rate of change in the indicators of tea plants at 6d, 12d, 18d, and 24d using the indicator at 0d as the baseline. The smaller the change rate, the stronger the drought tolerance. Finally, by combining temporal weights with indicator weights, the comprehensive tea drought tolerance index TDTI was obtained. The analysis in the section “Response of different tea plant varieties to drought” showed that the validity of the TDTI was strongly corroborated by its high concordance with observed phenotypic responses. Cultivars with higher TDTI scores (such as LC17 and LC8) exhibited superior visual vitality and less drought damage under stress conditions, while low-scoring cultivars showed pronounced stress symptoms, providing robust external validation of the practicality of TDTI.
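The baseline normalization described above can be illustrated with a short sketch. It computes the relative change of each indicator against its 0d value and aggregates the absolute changes with temporal and indicator weights. This is only an illustration of the idea and not the TDTI formula defined in the Methods: the aggregation form, the equal weights, the cultivar labels, and all measurement values are hypothetical.

```python
def change_rates(series, baseline_day=0):
    """Relative change of one indicator at each later time point versus day 0."""
    base = series[baseline_day]
    return {day: (value - base) / base
            for day, value in series.items() if day != baseline_day}

def weighted_change(indicator_series, temporal_w, indicator_w):
    """Weighted sum of absolute change rates (hypothetical aggregation)."""
    total = 0.0
    for name, series in indicator_series.items():
        rates = change_rates(series)
        total += indicator_w[name] * sum(temporal_w[d] * abs(r) for d, r in rates.items())
    return total

# Placeholder weights: equal across time points and indicators
temporal_w = {6: 0.25, 12: 0.25, 18: 0.25, 24: 0.25}
indicator_w = {"MDA": 0.5, "Pro": 0.5}

# Hypothetical measurements keyed by treatment day
stable = {"MDA": {0: 20.0, 6: 21.0, 12: 22.0, 18: 23.0, 24: 22.0},
          "Pro": {0: 50.0, 6: 55.0, 12: 60.0, 18: 65.0, 24: 55.0}}
sensitive = {"MDA": {0: 20.0, 6: 30.0, 12: 40.0, 18: 50.0, 24: 40.0},
             "Pro": {0: 50.0, 6: 100.0, 12: 150.0, 18: 200.0, 24: 100.0}}

stable_score = weighted_change(stable, temporal_w, indicator_w)
sensitive_score = weighted_change(sensitive, temporal_w, indicator_w)
```

Both hypothetical cultivars start from the same 0d values, yet the one whose indicators drift less receives the smaller weighted change, which is the behavior that an index based on absolute values alone cannot capture.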

Traditional machine learning: effective models for indicators with distinct spectral absorption peaks

Based on the above results, the optimal models of MRP, MDA, Pro, SS, ChlT and TDTI were 1D-CARS-TF, 2D-UVE-SVM, 2D-UVE-BRR, 2D-CARS-SVM, 1D-UVE-TF-CNN and 2D-UVE-TF, respectively (Fig. 7). We observed that the traditional machine learning models significantly outperformed the Transformer-based models in modeling MDA, Pro, and SS. The strong performance of traditional machine learning models underscores a key finding of this study: the traditional machine learning models exhibit clear superiority for predicting physiological indicators with distinct spectral absorption peaks. This performance pattern is consistent with previous research. For instance, in studies related to MDA modeling, SVM has indeed demonstrated excellent performance [9, 55]. While prediction models for proline are less common, the work of Angela et al. (2021), which established a relationship between biochemically observed and spectrally predicted values of Pro (validation R² = 0.84), provides a relevant benchmark [56]. Notably, the performance of our 2D-UVE-BRR model (R² = 0.8929) was unexpectedly superior. The BRR is implemented for classification and predictive modeling by specifying covariates as random effects. While demonstrating high competitiveness and stability in genomic prediction, the model exhibits suboptimal performance with hyperspectral reflectance datasets [57]. The excellent performance of BRR in our study might be attributed to manual tuning of hyperparameters, so although the 2D-UVE-BRR model of Pro performed well, it still had certain limitations. Of course, it was undeniable that the results of our study had demonstrated the potential of BRR in the application of hyperspectral data. During the process of identifying 2D-CARS-SVM as the optimal model for SS, we noticed that spectral data preprocessed with the second derivative generally yielded higher modeling accuracy than those preprocessed with the first derivative. The modeling results of Pro also conformed to this rule. 
However, this was not the case for MDA. Further analysis revealed that while MDA, Pro, and SS each exhibited individual absorption peaks, Pro and SS tended to accumulate within the cell, whereas MDA was located closer to the cell membrane [58]. As a result, the spectral features produced by the absorption peaks of Pro and SS were more likely to be masked by other factors and weakened. Therefore, we speculated that the second derivative could enhance curvature information to resolve weak or overlapping spectral features of physiological indicators within cells [59].

Fig. 7 — The optimal model and the distribution of corresponding feature bands for each indicator. A Distribution of MRP feature bands screened by 1D-CARS. B The optimal 1D-CARS-TF model of MRP. C Distribution of MDA feature bands screened by 2D-UVE. D The optimal 2D-UVE-SVM model of MDA. E Distribution of Pro feature bands screened by 2D-UVE. F The optimal 2D-UVE-BRR model of Pro. G Distribution of SS feature bands screened by 2D-CARS. H The optimal 2D-CARS-SVM model of SS. I Distribution of ChlT feature bands screened by 1D-UVE. J The optimal 1D-UVE-TF-CNN model of ChlT. K Distribution of TDTI feature bands screened by 2D-UVE. L The optimal 2D-UVE-TF model of TDTI

Transformer-based models: the advantage in capturing complex, multi-band interactions

Beyond the capabilities of traditional models, the core innovation of our study lies in demonstrating that Transformer-based architectures uniquely address a more complex class of spectral analysis problems. As detailed in Sect. Establishment and comparison of models, the proposed Transformer-based models exhibited superior performance for MRP, ChlT, and the comprehensive index TDTI.

The results showed that all the models of MRP performed well and the 1D-CARS-TF model was the best. In addition, we noticed that MRP screened out only 27 feature bands through 1D-CARS. This indicated that CARS, as a feature variable selection method based on partial least squares algorithm weighted regression coefficients [60], selected a smaller number of feature bands with higher correlation when conducting MRP feature screening. Moreover, MRP could reflect changes in the properties of cell membranes, which were influenced by multiple factors. Consequently, the spectral characteristics of MRP were collectively determined by numerous distinct spectral bands. The Transformer architecture has demonstrated exceptional capability in capturing long-range dependencies through its multi-head self-attention mechanism. Within MRP modeling, this architecture effectively captured complex interdependencies among spectral bands, facilitating extraction of inherent and abstract spectral features [27, 28]. When combined with the high-correlation feature band selection capability of CARS, this integration was shown to account for the superior performance of the 1D-CARS-TF model in modeling of MRP.

During the analysis of the models developed for ChlT, it was observed that the TF-CNN algorithm performed well, as the RPD values of the 1D-UVE-TF-CNN and 1D-TF-CNN models were higher than 2.0, indicating their enhanced predictive capability and practical application potential [61]. This might be due to the inherent synergy between the TF-CNN algorithm and the spectral characteristics of chlorophyll. Chlorophyll has two key absorption bands in the 400 ~ 1000 nm wavelength range (430 ~ 450 nm and 620 ~ 670 nm). The spectral reflectance of these two bands is closely related to chlorophyll content. In addition, the red edge position (670 ~ 780 nm) has also been shown to be closely related to chlorophyll content [62]. Therefore, the spectral characteristics of chlorophyll require the model not only to extract features from these three bands but also to model long-range dependencies. The TF-CNN model was particularly suitable for this task: the CNN is effective at extracting local features, while the Transformer excels at capturing long-range dependencies [29, 30]. Interestingly, the 1D-UVE-TF-CNN model was better than the 1D-TF-CNN model, which might be attributed to the UVE algorithm. The original 176 spectral bands in the hyperspectral image contained some irrelevant information, and the UVE algorithm effectively eliminated these noisy bands while retaining 113 informative features, thereby enhancing the accuracy of the model. This indicated that the efficiency of deep learning models could be further improved when combined with feature selection methods, a conclusion that has been confirmed in other studies [24].

TDTI was a comprehensive index developed by combining the change rates of indicators with temporal weights and indicator weights, and it was more inclined to indicate the original drought tolerance abilities of different tea plant varieties. The prediction model jointly established from TDTI and hyperspectral data at 0d could predict and evaluate the drought tolerance abilities of tea plant varieties without drought treatment. Through analysis of all TDTI models, we found that the overall performance of the Transformer-based models was better than that of SVM and RF. We attributed this result to two main factors: data volume and model adaptability. From the perspective of data volume, firstly, the XGBoost base model, which is built on gradient boosting, had a certain ability to enhance accuracy and robustness on small-sample datasets [63] through ensemble strategies. Secondly, multiple optimization strategies were adopted during the construction of the Transformer-based models: (1) data augmentation expanded the limited training data; (2) regularization, early stopping and dropout effectively mitigated overfitting; (3) model lightweighting reduced parameter count and computational complexity while maintaining performance. These strategies ensured stable operation of the Transformer-based model even with limited data. In addition, the 2D-UVE-TF model performed better than the 2D-UVE-TF-CNN model. We speculated that this might be because the Transformer-based model in this study had already integrated tree-based architectures for local feature extraction, which made the combined TF-CNN algorithm excessively complex for the TDTI dataset with only 34 samples and ultimately led to its inferior performance compared to TF. Moreover, in terms of model adaptability, TDTI was a comprehensive index indicating the original drought resistance abilities of tea plants. 
Its spectral features synthesized characteristics from multiple drought related indicators, representing the high-order interactions between complex physiological features and different spectral bands. Traditional machine learning models struggled to capture intricate cross-band correlations, whereas the Transformer-based models with self-attention architectures demonstrated remarkable capabilities in this aspect [24, 28].

A paradigm for model selection: matching algorithmic strengths to spectral complexity

Through analysis of all models for the physiological indicators and the TDTI index, we found that the selection of the optimal model for each indicator was mainly determined by the spectral characteristics of the indicator. It should first be noted that the quantification of MDA, Pro, and SS conventionally relies on the measurement of absorbance at specific wavelengths. This means that these indicators have clear absorption peaks, which is particularly important for the selection of preprocessing and modeling strategies. Firstly, in the preprocessing strategies, 2D preprocessing demonstrated particular efficacy for Pro, SS, and TDTI. In hyperspectral images, even when the absorption peaks of Pro and SS were identifiable, their spectral features were prone to being masked or weakened by other factors due to their intracellular aggregation. As mentioned earlier, 2D preprocessing could enhance curvature information to make the weak or overlapping spectral features of these physiological indicators clearer. For the comprehensive indicator TDTI, the spectral features were highly complex due to multi-component interactions. In other studies, second derivatives were also particularly useful for modeling similar comprehensive indicators such as the canopy cover of soybean [64], the maturity of rapeseed [65], and the level salinity index (LSI) of lettuce [59]. Therefore, we believe that 2D preprocessing could accentuate these intricate spectral features, laying the foundation for subsequent modeling. Secondly, in the modeling strategies, traditional machine learning models demonstrated superior compatibility with MDA, Pro, and SS, whereas Transformer-based deep learning models exhibited enhanced adaptability for MRP, ChlT, and TDTI. 
This paradigm differentiation originated from intrinsic spectral complexity gradients: indicators with isolated absorption peaks were favored by the localized wavelength-specific analysis of traditional machine learning, while those requiring multi-band correlation mining significantly benefited from the multi-head self-attention mechanism in Transformer-based deep learning. Specifically, the parallel computation across multiple attention heads enabled simultaneous extraction of intensity variations between non-adjacent bands and nonlinear interactions among latent features. These capabilities were unattainable by single-kernel traditional machine learning models.

It is important to acknowledge that this study focused on evaluating model performance under constrained data conditions. While the limited dataset size inherently restricts the broad applicability of the conclusions, it provides valuable insights into the efficacy of the proposed lightweight Transformer-based models in data-scarce scenarios. The implemented strategies (data augmentation, regularization, and hybrid ensemble learning) were effective in controlling overfitting risks, as evidenced by the high RPD values (> 2.0) of the optimal models. Future work will prioritize the collection of larger-scale datasets to further train, validate, and enhance the proposed models for broader application.
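As a reading aid for the RPD threshold cited above, the sketch below computes R² and RPD for a set of test predictions. It assumes RPD is the sample standard deviation of the reference values divided by the root mean square error of prediction, a common chemometric convention; the exact definition used in this study is given in the methods rather than in this section. The function name and the sample values are hypothetical.

```python
import math


def r2_and_rpd(y_true, y_pred):
    """Return (R², RPD) for reference values and predictions.

    Assumption: RPD = sample SD of y_true / RMSE of prediction.
    """
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))
    return 1.0 - ss_res / ss_tot, sd / rmse


# Hypothetical reference values and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.2, 1.9, 3.3, 3.8, 5.1]
r2, rpd = r2_and_rpd(y_true, y_pred)
```

Under this definition the two metrics are linked by RPD = sqrt(n / (n - 1)) / sqrt(1 - R²), so for a reasonably large test set an R² of 0.75 corresponds to an RPD of about 2.0.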

Conclusion

To evaluate the intrinsic drought tolerance of tea plants accurately, we established the tea drought tolerance index (TDTI) by integrating the change rates of five physiological indicators over the time series with temporal weights and indicator weights. Furthermore, to enable non-destructive prediction of this index and the related physiological indicators, we developed various traditional machine learning models and a novel lightweight Transformer-based hybrid integrated architecture, and then compared their performance systematically. The main findings were as follows:

The comprehensive index TDTI demonstrated robust accuracy in evaluating the intrinsic drought tolerance capabilities of different tea plant varieties, and its prediction model was more suitable for the evaluation and screening of germplasm resources.
The optimal models for MRP, MDA, Pro, SS, ChlT and TDTI were identified as 1D-CARS-TF, 2D-UVE-SVM, 2D-UVE-BRR, 2D-CARS-SVM, 1D-UVE-TF-CNN, and 2D-UVE-TF, respectively. All optimal models demonstrated robust predictive performance, with the determination coefficient (R²) and the relative predictive deviation (RPD) exceeding 0.75 and 2.0, respectively.
Compared to traditional machine learning models, the lightweight Transformer-based models equipped with multi-head self-attention exhibited outstanding capabilities in processing indicators requiring multi-band correlation mining.
The high RPD values (> 2.0) of the optimal models demonstrated the effectiveness of the employed strategies (data augmentation, regularization, and hybrid ensemble learning) in controlling overfitting risks on small datasets.

In conclusion, the combination of the Transformer-based deep learning model and the comprehensive index TDTI achieved excellent performance in evaluating and predicting the drought tolerance of tea plants. This advancement established a robust technical foundation for rapid, accurate, and non-destructive comprehensive evaluation of tea plant germplasm resources. However, it is also important to note that this research was intentionally focused on evaluating model performance under constrained data conditions. While the limited dataset size inherently restricts the broad applicability of the conclusions, it provides valuable insights into the efficacy of the proposed models in data-scarce scenarios. In addition, the findings of our study were based on a specific set of greenhouse-cultivated samples, and further validation under field conditions with expanded germplasm resources would strengthen generalizability. Overall, our study demonstrated the potential of Transformer-based models for high-throughput phenotypic analysis in tea plants, advancing tea plant phenomics toward greater intelligence and efficiency. This study also opens further possibilities for digital breeding in agricultural systems.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (12.7 KB, xlsx)

Supplementary Material 2 (32.4 KB, xlsx)

Acknowledgements

Not applicable.

Author contributions

YL designed and performed the experiments, acquired, analyzed and integrated data, created the figures, drafted the original manuscript, and led the final revision. YZ analyzed and integrated data, and proposed critical insights during model development. HC, XH and YM participated in the experimental work and assisted with data acquisition. YW, LS, JS and ZD conceived the hypothesis for this work, designed the experiment, revised the manuscript, and read and approved the submitted version.

Funding

This work was supported by the Innovation Project of Shandong Academy of Agricultural Sciences [CXGC2024F15, CXGC2025F14] and the Natural Science Foundation of Shandong Province [ZR2023QC086].

Data availability

No datasets were generated or analysed during the current study.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Jiazhi Shen, Email: shenjiazhitea@163.com.

Zhaotang Ding, Email: dzttea@163.com.

References

1. Xu R, Shao C, Luo Y, Zhou B, Zhu Q, Qiu S, et al. Tea polyphenol mediated CsMYB77 regulation of CsPOD44 to promote tea plant (Camellia sinensis) root drought resistance. Hortic Res. 2025;12(6).
2. Guo Y, Zhao S, Zhu C, Chang X, Yue C, Wang Z, et al. Identification of drought-responsive miRNAs and physiological characterization of tea plant (Camellia sinensis L.) under drought stress. BMC Plant Biol. 2017;17(1).
3. Luo B, Sun H, Zhang L, Chen F, Wu K. Advances in the tea plants phenotyping using hyperspectral imaging technology. Front Plant Sci. 2024;15.
4. Chen S, Shen J, Fan K, Qian W, Gu H, Li Y, et al. Hyperspectral machine-learning model for screening tea germplasm resources with drought tolerance. Front Plant Sci. 2022;13.
5. Chen S, Gao Y, Fan K, Shi Y, Luo D, Shen J, et al. Prediction of drought-induced components and evaluation of drought damage of tea plants based on hyperspectral imaging. Front Plant Sci. 2021;12.
6. Tao H, Xu S, Tian Y, Li Z, Ge Y, Zhang J, et al. Proximal and remote sensing in plant phenomics: 20 years of progress, challenges, and perspectives. Plant Commun. 2022;3(6).
7. Kuswidiyanto LW, Noh H-H, Han X. Plant disease diagnosis using deep learning based on aerial hyperspectral images: a review. Remote Sens. 2022;14(23).
8. Tang T, Luo Q, Yang L, Gao C, Ling C, Wu W. Research review on quality detection of fresh tea leaves based on spectral technology. Foods. 2023;13(1).
9. Mao Y, Li H, Wang Y, Fan K, Shen J, Zhang J, et al. Low temperature response index for monitoring freezing injury of tea plant. Front Plant Sci. 2023;14.
10. Xu Y, Mao Y, Li H, Sun L, Wang S, Li X, et al. A deep learning model for rapid classification of tea coal disease. Plant Methods. 2023;19:98.
11. Li H, Wang Y, Fan K, Mao Y, Shen Y, Ding Z. Evaluation of important phenotypic parameters of tea plantations using multi-source remote sensing data. Front Plant Sci. 2022;13.
12. Sun J, Zhou X, Hu Y, Wu X, Zhang X, Wang P. Visualizing distribution of moisture content in tea leaves using optimization algorithms and NIR hyperspectral imaging. Comput Electron Agric. 2019;160:153–9.
13. Dutta D, Das PK, Bhunia UK, Singh U, Singh S, Sharma JR, et al. Retrieval of tea polyphenol at leaf level using spectral transformation and multi-variate statistical approach. Int J Appl Earth Obs Geoinf. 2015;36:22–9.
14. Tu Y, Bian M, Wan Y, Fei T. Tea cultivar classification and biochemical parameter estimation from hyperspectral imagery obtained by UAV. PeerJ. 2018;6:e4858.
15. Pu H, Sun D-W, Ma J, Liu D, Cheng J-h. Using wavelet textural features of visible and near infrared hyperspectral image to differentiate between fresh and frozen–thawed pork. Food Bioprocess Technol. 2014;7(11):3088–99.
16. Wu Y, Peng S, Xie Q, Han Q, Zhang G, Sun H. An improved weighted multiplicative scatter correction algorithm with the use of variable selection: application to near-infrared spectra. Chemometr Intell Lab Syst. 2019;185(15):114–21.
17. Yibing L, Wenqing L, Yujun Z, Kai Z, Ying H, Kun Y, et al. An adaptive hierarchical Savitzky-Golay spectral filtering algorithm and its application. Spectrosc Spectr Anal. 2019;9:2657–63.
18. Xiaoli C, Hongfu Y, Wanzhen L. Progress and application of spectral data pretreatment and wavelength selection methods in NIR analytical technique. Progress Chem. 2004;16(4):528–42.
19. Bai J, Zhu S, Hao Y, Li X, Yang C, Wang C, et al. Comparative analysis of the effects of different dimensionality reduction algorithms on hyperspectral estimation of total nitrogen content in wheat soils. Eur J Agron. 2025;168:127660.
20. Xu Y, Mao Y, Li H, Shen J, Xu X, Wang S, et al. A deep learning model based on RGB and hyperspectral images for efficiently detecting tea green leafhopper damage symptoms. Smart Agricultural Technol. 2025;10.
21. Luo D, Gao Y, Wang Y, Shi Y, Chen S, Ding Z, et al. Using UAV image data to monitor the effects of different nitrogen application rates on tea quality. J Sci Food Agric. 2021;102(4):1540–9.
22. Singh AK, Ganapathysubramanian B, Sarkar S, Singh A. Deep learning for plant stress phenotyping: trends and future perspectives. Trends Plant Sci. 2018;23(10):883–98.
23. Zhang L, Wang Y, Yang L, Chen J, Liu Z, Wang J, et al. A multi-range spectral-spatial transformer for hyperspectral image classification. Infrared Phys Technol. 2023;135:104983.
24. Wang Y, Wang C, Wang B, Wang H. Combination of feature selection methods and lightweight transformer model for estimating the canopy water content of alpine shrub using spectral data. Infrared Phys Technol. 2024;139.
25. Deng B, Lu Y, Stafne E. Fusing spectral and spatial features of hyperspectral reflectance imagery for differentiating between normal and defective blueberries. Smart Agricultural Technol. 2024;8.
26. Shehu HA, Ackley A, Marvellous M, Eteng OE. Early detection of tomato leaf diseases using transformers and transfer learning. Eur J Agron. 2025;168.
27. Mohamed S, Haghighat M, Fernando T, Sridharan S, Fookes C, Moghadam P. FactoFormer: factorized hyperspectral transformers with self-supervised pretraining. IEEE Trans Geosci Remote Sens. 2024;62.
28. Rahman MH, Busby S, Ru S, Hanif S, Sanz-Saez A, Zheng J, et al. Transformer-based hyperspectral image analysis for phenotyping drought tolerance in blueberries. Comput Electron Agric. 2025;228.
29. Cao L, Sun M, Yang Z, Jiang D, Yin D, Duan Y. A novel Transformer-CNN approach for predicting soil properties from LUCAS Vis-NIR spectral data. Agronomy. 2024;14.
30. George KSS, AC SNOV, J KM, Francis PK. J, NorBlueNet: hyperspectral imaging-based hybrid CNN-transformer model for non-destructive SSC analysis in Norwegian wild blueberries. Comput Electron Agric. 2025;235.
31. Tian S, Guo R, Zou X, Zhang X, Yu X, Zhan Y, et al. Priming with the green leaf volatile (Z)-3-hexeny-1-yl acetate enhances salinity stress tolerance in peanut (Arachis hypogaea L.) seedlings. Front Plant Sci. 2019;10.
32. Mao Y, Li H, Wang Y, Wang H, Shen J, Xu Y, et al. Rapid monitoring of tea plants under cold stress based on UAV multi-sensor data. Comput Electron Agric. 2023;213.
33. Cheng J, Sun D, Zeng X-a, Pu H. Non-destructive and rapid determination of TVB-N content for freshness evaluation of grass carp (Ctenopharyngodon idella) by hyperspectral imaging. Innovative Food Sci Emerg Technol. 2014;21:179–87.
34. Feng Y, Sun D. Near-infrared hyperspectral imaging in tandem with partial least squares regression and genetic algorithm for non-destructive determination and visualization of Pseudomonas loads in chicken fillets. Talanta. 2013;15:74–83.
35. Mao Y, Li H, Xu Y, Wang S, Yin X, Fan K, et al. Early detection of gray blight in tea leaves and rapid screening of resistance varieties by hyperspectral imaging technology. J Sci Food Agric. 2024;104(15).
36. Li H, Mao Y, Wang Y, Fan K, Shi H, Sun L, et al. Environmental simulation model for rapid prediction of tea seedling growth. Agronomy. 2022;12(12).
37. Yazdian H, Salmani-Dehaghi N, Alijanian M. A spatially promoted SVM model for GRACE downscaling: using ground and satellite-based datasets. J Hydrol. 2023;626.
38. Sun J, Yang F, Cheng J, Wang S, Fu L. Nondestructive identification of soybean protein in minced chicken meat based on hyperspectral imaging and VGG16-SVM. J Food Compos Anal. 2024;125.
39. Wei L, Yuan Z, Wang Z, Zhao L, Zhang Y, Lu X, et al. Hyperspectral inversion of soil organic matter content based on a combined spectral index model. Sensors. 2020;20(10).
40. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. 31st Conference on Neural Information Processing Systems; Long Beach, CA, USA; 2017.
41. Wei L, Yuan Z, Zhong Y, Yang L, Hu X, Zhang Y. An improved gradient boosting regression tree estimation model for soil heavy metal (arsenic) pollution monitoring using hyperspectral remote sensing. Appl Sci. 2019;9(9).
42. Zhu M, Zhang M, Gao D, Zhou K, Tang S, Zhou B, et al. Rice OsHSFA3 gene improves drought tolerance by modulating polyamine biosynthesis depending on abscisic acid and ROS levels. Int J Mol Sci. 2020;21.
43. Begum N, Wang L, Ahmad H, Akhtar K, Roy R, Khan MI, et al. Co-inoculation of arbuscular mycorrhizal fungi and the plant growth-promoting rhizobacteria improve growth and photosynthesis in tobacco under drought stress by up-regulating antioxidant and mineral nutrition metabolism. Microb Ecol. 2021;83(4):971–88.
44. Zaman Qu, Rehman M, Feng Y, Liu Z, Murtaza G, Sultan K, et al. Combined application of biochar and peatmoss for mitigation of drought stress in tobacco. BMC Plant Biol. 2024;24(1).
45. Ozturk M, Turkyilmaz Unal B, García-Caparrós P, Khursheed A, Gul A, Hasanuzzaman M. Osmoregulation and its actions during the drought stress in plants. Physiol Plant. 2020;172(2):1321–35.
46. Wang F, Jiang Z, Wang H, Liang F, Wang Y, Zhang J, et al. Exogenous trehalose alleviates the inhibitory effects of salt and drought stresses in okra plants. Hortic Environ Biotechnol. 2025;66(1):25–38.
47. Allakhverdiev SI. Optimising photosynthesis for environmental fitness. Funct Plant Biol. 2020;47(11).
48. Wahab A, Abdi G, Saleem MH, Ali B, Ullah S, Shah W, et al. Plants’ physio-biochemical and phyto-hormonal responses to alleviate the adverse effects of drought stress: a comprehensive review. Plants. 2022;11:13.
49. Sun C, Johnson JM, Cai D, Sherameti I, Oelmüller R, Lou B. Piriformospora indica confers drought tolerance in Chinese cabbage leaves by stimulating antioxidant enzymes, the expression of drought-related genes and the plastid-localized CAS protein. J Plant Physiol. 2010;167(12):1009–17.
50. Wang P, Sun X, Li C, Wei Z, Liang D, Ma F. Long-term exogenous application of melatonin delays drought-induced leaf senescence in apple. J Pineal Res. 2012;54(3):292–302.
51. Zhu J, Xu J, Cao Y, Fu J, Li B, Sun G, et al. Leaf reflectance and functional traits as environmental indicators of urban dust deposition. BMC Plant Biol. 2021;21(1).
52. Guo B, Qi S, Heng Y, Duan J, Zhang H, Wu Y, et al. Remotely assessing leaf N uptake in winter wheat based on canopy hyperspectral red-edge absorption. Eur J Agron. 2017;82:113–24.
53. Leiva-Valenzuela GA, Lu R, Aguilera JM. Prediction of firmness and soluble solids content of blueberries using hyperspectral reflectance imaging. J Food Eng. 2013;115(1):91–8.
54. Guo Z, Wang M, Shujat A, Wu J, El-Seedi HR, Shi J, et al. Nondestructive monitoring storage quality of apples at different temperatures by near-infrared transmittance spectroscopy. Food Sci Nutr. 2020;8(7):3793–805.
55. Kong W, Liu F, Fang H, He Y. Rapid detection of malondialdehyde in herbicide-stressed barley leaves using spectroscopic techniques. Trans Chin Soc Agricultural Eng. 2012;28(2):171–5.
56. Burnett AC, Serbin SP, Davidson KJ, Ely KS, Rogers A. Detection of the metabolic response to drought stress using hyperspectral reflectance. J Exp Bot. 2021;72(18):6474–89.
57. Yassue RM, Galli G, Fritsche-Neto R, Morota G. Classification of plant growth-promoting bacteria inoculation status and prediction of growth-related traits in tropical maize using hyperspectral image and genomic data. Crop Sci. 2022;63(1):88–100.
58. Hao X, Cao H, Wang Z, Jia X, Jin Z, Pei Y. Hydrogen sulfide improves plant drought tolerance by regulating the homeostasis of reactive oxygen species. Plant Growth Regul. 2024;104(2):803–21.
59. Lara M, Diezma B, Lleó L, Roger J, Garrido Y, Gil M, et al. Hyperspectral imaging to evaluate the effect of irrigation water salinity in lettuce. Appl Sci. 2016;6(12).
60. Shen T, Zhang C, Liu F, Wang W, Lu Y, Chen R, et al. High-throughput screening of free proline content in rice leaf under cadmium stress using hyperspectral imaging with chemometrics. Sensors. 2020;20(11).
61. He X, Fu X, Rao X, Fang Z. Assessing firmness and SSC of pears based on absorption and scattering properties using an automatic integrating sphere system from 400 to 1150 nm. Postharvest Biol Technol. 2016;121:62–70.
62. Ali A, Imran M. Evaluating the potential of red edge position (REP) of hyperspectral remote sensing data for real time estimation of LAI & chlorophyll content of Kinnow mandarin (Citrus reticulata) fruit orchards. Sci Hort. 2020;267.
63. Hu G, Wan M, Wei K, Ye R. Computer vision based method for severity estimation of tea leaf blight in natural scene images. Eur J Agron. 2023;144.
64. Thorp KR, Tian L, Yao H, Tang L. Narrow-band and derivative-based vegetation indices for hyperspectral data. Trans ASAE. 2004;47(1):291–9.
65. Feng H, Chen Y, Song J, Lu B, Shu C, Qiao J, et al. Maturity classification of rapeseed using hyperspectral image combined with machine learning. Plant Phenomics. 2024;6.

[CR1] 1.Xu R, Shao C, Luo Y, Zhou B, Zhu Q, Qiu S et al. Tea polyphenol mediated CsMYB77 regulation of CsPOD44 to promote tea plant (Camellia sinensis) root drought resistance. Hortic Res. 2025;12(6). [DOI] [PMC free article] [PubMed]

[CR2] 2.Guo Y, Zhao S, Zhu C, Chang X, Yue C, Wang Z et al. Identification of drought-responsive MiRNAs and physiological characterization of tea plant (Camellia sinensis L.) under drought stress. BMC Plant Biol. 2017;17(1). [DOI] [PMC free article] [PubMed]

[CR3] 3.Luo B, Sun H, Zhang L, Chen F, Wu K. Advances in the tea plants phenotyping using hyperspectral imaging technology. Front Plant Sci. 2024;15. [DOI] [PMC free article] [PubMed]

[CR4] 4.Chen S, Shen J, Fan K, Qian W, Gu H, Li Y et al. Hyperspectral machine-learning model for screening tea germplasm resources with drought tolerance. Front Plant Sci. 2022;13. [DOI] [PMC free article] [PubMed]

[CR5] 5.Chen S, Gao Y, Fan K, Shi Y, Luo D, Shen J et al. Prediction of drought-Induced components and evaluation of drought damage of tea plants based on hyperspectral imaging. Front Plant Sci. 2021;12. [DOI] [PMC free article] [PubMed]

[CR6] 6.Tao H, Xu S, Tian Y, Li Z, Ge Y, Zhang J et al. Proximal and remote sensing in plant phenomics: 20 years of progress, challenges, and perspectives. Plant Commun. 2022;3(6). [DOI] [PMC free article] [PubMed]

[CR7] 7.Kuswidiyanto LW, Noh H-H, Han X. Plant disease diagnosis using deep learning based on aerial hyperspectral images: A review. Remote Sens. 2022;14(23).

[CR8] 8.Tang T, Luo Q, Yang L, Gao C, Ling C, Wu W. Research review on quality detection of fresh tea leaves based on spectral technology. Foods. 2023;13(1). [DOI] [PMC free article] [PubMed]

[CR9] 9.Mao Y, Li H, Wang Y, Fan K, Shen J, Zhang J et al. Low temperature response index for monitoring freezing injury of tea plant. Front Plant Sci. 2023;14. [DOI] [PMC free article] [PubMed]

[CR10] 10.Xu Y, Mao Y, Li H, Sun L, Wang S, Li X, et al. A deep learning model for rapid classification of tea coal disease. Plant Methods. 2023;19:98. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Li H, Wang Y, Fan K, Mao Y, Shen Y, Ding Z. Evaluation of important phenotypic parameters of tea plantations using multi-source remote sensing data. Front Plant Sci. 2022;13. [DOI] [PMC free article] [PubMed]

[CR12] 12.Sun J, Zhou X, Hu Y, Wu X, Zhang X, Wang P. Visualizing distribution of moisture content in tea leaves using optimization algorithms and NIR hyperspectral imaging. Comput Electron Agric. 2019;160:153–9. [Google Scholar]

[CR13] 13.Dutta D, Das PK, Bhunia UK, Singh U, Singh S, Sharma JR, et al. Retrieval of tea polyphenol at leaf level using spectral transformation and multi-variate statistical approach. Int J Appl Earth Obs Geoinf. 2015;36:22–9. [Google Scholar]

[CR14] 14.Tu Y, Bian M, Wan Y, Fei T. Tea cultivar classification and biochemical parameter Estimation from hyperspectral imagery obtained by UAV. PeerJ. 2018;6:e4858. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Pu H, Sun D-W, Ma J, Liu D, Cheng J-h. Using wavelet textural features of visible and near infrared hyperspectral image to differentiate between fresh and Frozen–Thawed pork. Food Bioprocess Technol. 2014;7(11):3088–99. [Google Scholar]

[CR16] 16.Wu Y, Peng S, Xie Q, Han Q, Zhang G, Sun H. An improved weighted multiplicative scatter correction algorithm with the use of variable selection: application to near-infrared spectra. Chemometr Intell Lab Syst. 2019;185(15):114–21. [Google Scholar]

[CR17] 17.Yibing L, Wenqing L, Yujun Z, Kai Z, Ying H, Kun Y, et al. An adaptive hierarchical Savitzky-Golay spectral filtering algorithm and its application. Spectrosc Spectr Anal. 2019;9:2657–63. [Google Scholar]

[CR18] 18.Xiaoli C, Hongfu Y, Wanzhen L. Progress and application of spectral data pretreatment and wavelength selection methods in NIR analytical technique. Progress Chem. 2004;16(4):528–42. [Google Scholar]

[CR19] 19.Bai J, Zhu S, Hao Y, Li X, Yang C, Wang C, et al. Comparative analysis of the effects of different dimensionality reduction algorithms on hyperspectral Estimation of total nitrogen content in wheat soils. Eur J Agron. 2025;168:127660. [Google Scholar]

[CR20] 20.Xu Y, Mao Y, Li H, Shen J, Xu X, Wang S et al. A deep learning model based on RGB and hyperspectral images for efficiently detecting tea green leafhopper damage symptoms. Smart Agricultural Technol. 2025;10.

[CR21] 21.Luo D, Gao Y, Wang Y, Shi Y, Chen S, Ding Z, et al. Using UAV image data to monitor the effects of different nitrogen application rates on tea quality. J Sci Food Agric. 2021;102(4):1540–9. [DOI] [PubMed] [Google Scholar]

[CR22] 22.Singh AK, Ganapathysubramanian B, Sarkar S, Singh A. Deep learning for plant stress phenotyping: trends and future perspectives. Trends Plant Sci. 2018;23(10):883–98. [DOI] [PubMed] [Google Scholar]

[CR23] 23.Zhang L, Wang Y, Yang L, Chen J, Liu Z, Wang J, et al. A multi-range spectral-spatial transformer for hyperspectral image classification. Infrared Phys Technol. 2023;135:104983. [Google Scholar]

[CR24] 24.Wang Y, Wang C, Wang B, Wang H. Combination of feature selection methods and lightweight transformer model for estimating the canopy water content of alpine shrub using spectral data. Infrared Phys Technol. 2024;139.

[CR25] 25.Deng B, Lu Y, Stafne E. Fusing spectral and Spatial features of hyperspectral reflectance imagery for differentiating between normal and defective blueberries. Smart Agricultural Technol. 2024;8.

[CR26] 26.Shehu HA, Ackley A, Marvellous M, Eteng OE. Early detection of tomato leaf diseases using Transformers and transfer learning. Eur J Agron. 2025;168.

[CR27] 27.Mohamed S, Haghighat M, Fernando T, Sridharan S, Fookes C, Moghadam P. FactoFormer: factorized hyperspectral Transformers with Self-Supervised pretraining. IEEE Trans Geosci Remote Sens. 2024;62.

[CR28] 28.Rahman MH, Busby S, Ru S, Hanif S, Sanz-Saez A, Zheng J et al. Transformer-Based hyperspectral image analysis for phenotyping drought tolerance in blueberries. Comput Electron Agric. 2025;228.

[CR29] 29.Cao L, Sun M, Yang Z, Jiang D, Yin D, Duan Y. A novel Transformer-CNN approach for predicting soil properties from LUCAS Vis-NIR spectral data. Agronomy. 2024;14.

[CR30] 30.George KSS, AC SNOV, J KM, Francis PK. J, NorBlueNet: hyperspectral imaging-based hybrid CNN-transformer model for non-destructive SSC analysis in Norwegian wild blueberries. Comput Electron Agric. 2025;235.

[CR31] 31.Tian S, Guo R, Zou X, Zhang X, Yu X, Zhan Y et al. Priming with the green leaf volatile (Z)-3-Hexeny-1-yl acetate enhances salinity stress tolerance in peanut (Arachis Hypogaea L.) seedlings. Front Plant Sci. 2019;10. [DOI] [PMC free article] [PubMed]

[CR32] 32.Mao Y, Li H, Wang Y, Wang H, Shen J, Xu Y et al. Rapid monitoring of tea plants under cold stress based on UAV multi-sensor data. Comput Electron Agric. 2023;213.

[CR33] 33.Cheng J, Sun D, Zeng X-a, Pu H. Non-destructive and rapid determination of TVB-N content for freshness evaluation of grass carp (Ctenopharyngodon idella) by hyperspectral imaging. Innovative Food Sci Emerg Technol. 2014;21:179–87. [Google Scholar]

[CR34] 34.Feng Y, Sun D. Near-infrared hyperspectral imaging in tandem with partial least squares regression and genetic algorithm for non-destructive determination and visualization of Pseudomonas loads in chicken fillets. Talanta. 2013;15:74–83. [DOI] [PubMed] [Google Scholar]

[CR35] 35.Mao Y, Li H, Xu Y, Wang S, Yin X, Fan K et al. Early detection of Gray blight in tea leaves and rapid screening of resistance varieties by hyperspectral imaging technology. J Sci Food Agric. 2024;104(15). [DOI] [PubMed]

[CR36] 36.Li H, Mao Y, Wang Y, Fan K, Shi H, Sun L et al. Environ Simul Model Rapid Prediction Tea Seedl Growth Agron. 2022;12(12).

[CR37] 37.Yazdian H, Salmani-Dehaghi N, Alijanian M. A spatially promoted SVM model for GRACE downscaling: using ground and satellite-based datasets. J Hydrol. 2023;626.

[CR38] 38.Sun J, Yang F, Cheng J, Wang S, Fu L. Nondestructive identification of soybean protein in minced chicken meat based on hyperspectral imaging and VGG16-SVM. J Food Compos Anal. 2024;125.

[CR39] 39.Wei L, Yuan Z, Wang Z, Zhao L, Zhang Y, Lu X et al. Hyperspectral inversion of soil organic matter content based on a combined spectral index model. Sensors. 2020;20(10). [DOI] [PMC free article] [PubMed]

[CR40] 40.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al. Attention Is All You Need. 31st Conference on Neural Information Processing Systems; Long Beach, CA, USA2017.

[CR41] 41.Wei L, Yuan Z, Zhong Y, Yang L, Hu X, Zhang Y. An improved gradient boosting regression tree Estimation model for soil heavy metal (Arsenic) pollution monitoring using hyperspectral remote sensing. Appl Sci. 2019;9(9).

[CR42] 42.Zhu M, Zhang M, Gao D, Zhou K, Tang S, Zhou B et al. Rice OsHSFA3 gene improves drought tolerance by modulating polyamine biosynthesis depending on abscisic acid and ROS levels. Int J Mol Sci. 2020;21. [DOI] [PMC free article] [PubMed]

[CR43] 43.Begum N, Wang L, Ahmad H, Akhtar K, Roy R, Khan MI, et al. Co-inoculation of arbuscular mycorrhizal fungi and the plant growth-Promoting rhizobacteria improve growth and photosynthesis in tobacco under drought stress by Up-Regulating antioxidant and mineral nutrition metabolism. Microb Ecol. 2021;83(4):971–88. [DOI] [PubMed] [Google Scholar]

[CR44] 44.Zaman Qu, Rehman M, Feng Y, Liu Z, Murtaza G, Sultan K et al. Combined application of Biochar and Peatmoss for mitigation of drought stress in tobacco. BMC Plant Biol. 2024;24(1). [DOI] [PMC free article] [PubMed]

[CR45] 45.Ozturk M, Turkyilmaz Unal B, García-Caparrós P, Khursheed A, Gul A, Hasanuzzaman M. Osmoregulation and its actions during the drought stress in plants. Physiol Plant. 2020;172(2):1321–35. [DOI] [PubMed] [Google Scholar]

[CR46] 46.Wang F, Jiang Z, Wang H, Liang F, Wang Y, Zhang J, et al. Exogenous Trehalose alleviates the inhibitory effects of salt and drought stresses in Okra plants. Hortic Environ Biotechnol. 2025;66(1):25–38. [Google Scholar]

[CR47] 47.Allakhverdiev SI. Optimising photosynthesis for environmental fitness. Funct Plant Biol. 2020;47(11). [DOI] [PubMed]

[CR48] 48.Wahab A, Abdi G, Saleem MH, Ali B, Ullah S, Shah W, et al. Plants’ Physio-Biochemical and Phyto-Hormonal responses to alleviate the adverse effects of drought stress: A comprehensive review. Plants. 2022;11:13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Sun C, Johnson JM, Cai D, Sherameti I, Oelmüller R, Lou B. Piriformospora indica confers drought tolerance in Chinese cabbage leaves by stimulating antioxidant enzymes, the expression of drought-related genes and the plastid-localized CAS protein. J Plant Physiol. 2010;167(12):1009–17. [DOI] [PubMed] [Google Scholar]

[CR50] 50.Wang P, Sun X, Li C, Wei Z, Liang D, Ma F. Long-term exogenous application of melatonin delays drought‐induced leaf senescence in Apple. J Pineal Res. 2012;54(3):292–302. [DOI] [PubMed] [Google Scholar]

[CR51] 51.Zhu J, Xu J, Cao Y, Fu J, Li B, Sun G et al. Leaf reflectance and functional traits as environmental indicators of urban dust deposition. BMC Plant Biol. 2021;21(1). [DOI] [PMC free article] [PubMed]

[CR52] 52.Guo B, Qi S, Heng Y, Duan J, Zhang H, Wu Y, et al. Remotely assessing leaf N uptake in winter wheat based on canopy hyperspectral red-edge absorption. Eur J Agron. 2017;82:113–24. [Google Scholar]

[CR53] 53.Leiva-Valenzuela GA, Lu R, Aguilera JM. Prediction of firmness and soluble solids content of blueberries using hyperspectral reflectance imaging. J Food Eng. 2013;115(1):91–8. [Google Scholar]

[CR54] 54.Guo Z, Wang M, Shujat A, Wu J, El-Seedi HR, Shi J, et al. Nondestructive monitoring storage quality of apples at different temperatures by near‐infrared transmittance spectroscopy. Food Sci Nutr. 2020;8(7):3793–805. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] 55.Kong W, Liu F, Fang H, He Y. Rapid detection of malondialdehyde in herbicide-stressed barley leaves using spectroscopic techniques. Trans Chin Soc Agricultural Eng. 2012;28(2):171–5. [Google Scholar]

[CR56] 56.Burnett AC, Serbin SP, Davidson KJ, Ely KS, Rogers A. Detection of the metabolic response to drought stress using hyperspectral reflectance. J Exp Bot. 2021;72(18):6474–89. [DOI] [PubMed] [Google Scholar]

[CR57] 57.Yassue RM, Galli G, Fritsche-Neto R, Morota G. Classification of plant growth‐promoting bacteria inoculation status and prediction of growth‐related traits in tropical maize using hyperspectral image and genomic data. Crop Sci. 2022;63(1):88–100. [Google Scholar]

[CR58] 58.Hao X, Cao H, Wang Z, Jia X, Jin Z, Pei Y. Hydrogen sulfide improves plant drought tolerance by regulating the homeostasis of reactive oxygen species. Plant Growth Regul. 2024;104(2):803–21. [Google Scholar]

[CR59] 59. Lara M, Diezma B, Lleó L, Roger J, Garrido Y, Gil M, et al. Hyperspectral imaging to evaluate the effect of irrigation water salinity in lettuce. Appl Sci. 2016;6(12).

[CR60] 60. Shen T, Zhang C, Liu F, Wang W, Lu Y, Chen R, et al. High-throughput screening of free proline content in rice leaf under cadmium stress using hyperspectral imaging with chemometrics. Sensors. 2020;20(11).

[CR61] 61. He X, Fu X, Rao X, Fang Z. Assessing firmness and SSC of pears based on absorption and scattering properties using an automatic integrating sphere system from 400 to 1150 nm. Postharvest Biol Technol. 2016;121:62–70.

[CR62] 62. Ali A, Imran M. Evaluating the potential of red edge position (REP) of hyperspectral remote sensing data for real time estimation of LAI & chlorophyll content of Kinnow mandarin (Citrus reticulata) fruit orchards. Sci Hort. 2020;267.

[CR63] 63. Hu G, Wan M, Wei K, Ye R. Computer vision based method for severity estimation of tea leaf blight in natural scene images. Eur J Agron. 2023;144.

[CR64] 64. Thorp KR, Tian L, Yao H, Tang L. Narrow-band and derivative-based vegetation indices for hyperspectral data. Trans ASAE. 2004;47(1):291–9.

[CR65] 65. Feng H, Chen Y, Song J, Lu B, Shu C, Qiao J, et al. Maturity classification of rapeseed using hyperspectral image combined with machine learning. Plant Phenomics. 2024;6.
