Abstract
Timely crop stress detection is essential for safeguarding yields and promoting sustainable agriculture. Traditional vegetation indices (e.g., NDVI, EVI) are widely used but remain static, crop-agnostic, and often insensitive to early stress signals. This study proposes RL-VI, a reinforcement learning-based framework that dynamically formulates vegetation indices optimized for rice stress detection. Unlike existing methods, RL-VI integrates Sentinel-2 multispectral imagery with smartphone-captured RGB data, creating the first cross-platform environment where vegetation indices are learned rather than predefined. The reinforcement learning agent adaptively selects stress-sensitive spectral band combinations guided by classification rewards. Experiments on real-world rice fields in Tamil Nadu, India, and benchmark datasets (Indian Pines, wheat salt stress) show that RL-VI achieves an overall accuracy of 89.4% and an F1-score of 0.88, outperforming static and machine-learned indices by up to 12%. Importantly, RL-VI enables early stress detection up to 10–14 days before visible symptoms, providing actionable lead time for intervention. The proposed framework is computationally lightweight and scalable to UAV or edge devices, offering a farmer-ready tool for precision agriculture that bridges field-level mobile sensing with satellite monitoring for low-cost, real-time crop health management. Statistical validation using ANOVA (F = 88.24, p < 0.001) and pairwise t-tests (p < 0.001) confirmed RL-VI’s superiority, while SHAP analyses emphasized the physiological significance of red-edge and SWIR bands in stress discrimination.
Keywords: Reinforcement learning, Vegetation index (VI), Sentinel-2, Crop stress detection, Precision agriculture, Hyperspectral/Multispectral remote sensing
Subject terms: Ecology, Environmental sciences, Mathematics and computing, Plant sciences
Introduction
Motivation and importance of crop stress monitoring
Timely detection and monitoring of crop stress are essential for sustaining global food security, especially in the context of climate variability, resource constraints, and increasing demand for high-yielding crop production. Crop stress, whether abiotic (e.g., drought, salinity, nutrient deficiency) or biotic (e.g., disease, pest attack), leads to significant yield losses when not identified early. Traditional field-based assessments are labor-intensive, subjective, and often reactive in nature, missing the opportunity for early intervention during latent stress phases1. Remote sensing offers a non-invasive, scalable solution to monitor vegetation health by capturing canopy reflectance across visible, near-infrared (NIR), and shortwave-infrared (SWIR) bands. Vegetation indices (VIs) such as the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) have been widely used to infer photosynthetic activity and canopy stress. However, these static indices often fail to capture nuanced stress signals, especially during early onset stages or under overlapping phenological changes2,3. The need for adaptive, data-driven approaches to detect crop stress has led to increasing interest in machine learning (ML) and deep learning (DL) methods that can model nonlinear interactions between spectral features and stress responses4. Yet most ML-based models rely on fixed input features or handcrafted indices. Reinforcement learning (RL) offers a new paradigm in which vegetation indices themselves can be optimized dynamically based on the stress classification reward, enabling precision agriculture at scale51–53.
Limitations of static vegetation indices
Conventional vegetation indices (VIs), such as the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Soil-Adjusted Vegetation Index (SAVI), have been extensively used for crop monitoring, particularly for assessing greenness, chlorophyll content, and biomass. These indices rely on fixed spectral band combinations, typically involving red and near-infrared (NIR) wavelengths, and are calculated using linear or empirical formulas. While effective in many general scenarios, static indices suffer from several key limitations when applied to crop stress detection, especially across varying crop types, soil backgrounds, and growth stages5,6. First, static VIs often lack the spectral flexibility to capture stress signals that manifest outside the red-NIR spectrum, such as those observable in the red-edge or shortwave infrared (SWIR) bands, which are more sensitive to plant water content, canopy structure, and pigment degradation7. Second, these indices tend to saturate under high vegetation density, leading to reduced sensitivity during critical growth periods8. Third, the lack of adaptability in static indices cannot be optimized for specific stress types (e.g., drought vs. nutrient deficiency) or for local environmental conditions, which limits their transferability and generalization across regions9.
Recent studies have emphasized the importance of integrating multi-sensor data (e.g., Sentinel-1, Sentinel-2, UAV, and ground-based imagery) for high-resolution stress monitoring and yield prediction13–16,54,55. However, most vegetation indices (VIs) derived from such imagery, including NDVI, EVI, and SAVI, remain static and empirical, limiting their adaptability to crop-specific and stage-dependent spectral variations. These fixed-band indices also suffer from saturation effects in dense canopies and poor sensitivity to early physiological stress signals2,3,17. Recent research in cloud–IoT agriculture and soil analytics has emphasized the importance of integrating satellite and UAV remote sensing for field-scale decision support19,20,37,38. New deep-learning frameworks for disease detection and spectral segmentation have shown promising results, particularly for multi-crop environments39–42. Several studies have introduced novel vegetation indices, red-edge formulations, and multispectral fusion strategies for rice and wheat monitoring using Sentinel-2 imagery43–46. High-resolution UAV-based studies further highlight the role of spectral optimization in improving crop stress detection accuracy47–50.
Research gaps and motivation for reinforcement learning
Traditional vegetation indices operate on predefined spectral formulas, assuming constant relationships between reflectance bands and vegetation biophysical properties. Yet, real-world conditions exhibit nonlinear interactions between plant pigments, canopy structure, and water content, making static formulations inadequate for dynamic crop environments4,5.
Recent advancements in machine learning and optimization-based index design have sought to address this issue, but these approaches often rely on handcrafted features or fixed combinations that cannot adapt to new datasets or crop types once trained6.
Emerging work in precision agriculture has started leveraging artificial intelligence and cloud-based systems for real-time stress prediction and recommendation modeling13. Similarly, deep learning-driven spectral feature selection has shown potential for optimizing vegetation index formulation, but it lacks interpretability and adaptability across platforms14,15. Reinforcement learning (RL), with its capability for sequential decision-making and reward-based adaptation, provides a promising alternative.
Reinforcement learning for dynamic vegetation index formulation
Reinforcement Learning (RL) enables models to iteratively learn optimal spectral band combinations and weights by interacting with data environments, receiving rewards based on stress classification performance. While RL has been applied to hyperspectral band selection and active learning tasks16, its use in dynamic vegetation index formulation remains largely unexplored56.
Recent studies have emphasized the potential of RL and hybrid deep learning for autonomous agricultural monitoring, enabling models to self-optimize based on classification rewards rather than static feature inputs17,18. Moreover, AutoML-based frameworks for crop yield prediction19 and explainable AI models using SHAP highlight the growing emphasis on transparency and interpretability, key strengths also incorporated in the RL-VI framework proposed here57.
Key contributions of dynamic RL-based index formulation compared to fixed indices
Existing vegetation indices such as NDVI, EVI, and SAVI have proven effective for general vegetation monitoring but exhibit key limitations when applied to stress detection. Their fixed spectral formulations are insensitive to dynamic environmental and phenological changes, resulting in reduced accuracy during early or mild stress conditions. Recent studies employing machine learning and deep learning techniques have enhanced classification performance but often act as black-box systems with limited transparency and no capacity to self-adjust across datasets13–20.
The novelty of this work lies in introducing a Reinforcement Learning-based Vegetation Index (RL-VI) that formulates adaptive spectral relationships through a learning mechanism driven by stress-classification reward feedback. Unlike static indices or pre-trained models, RL-VI learns from the spectral environment itself, selecting and weighting spectral bands dynamically to maximize stress detection accuracy. This approach not only addresses the shortcomings of traditional and ML-based indices but also ensures interpretability through SHAP explanations, linking the learned indices to physiologically relevant spectral regions.
RL-VI: a reinforcement learning-based index formulation framework
This study proposes RL-VI, a new reinforcement learning environment designed to learn optimal spectral band combinations for vegetation index calculation. The agent is trained to maximize classification performance using stress-labeled hyperspectral and multispectral data.
Cross-platform integration of mobile and satellite imagery
This study demonstrates, for the first time, the joint use of smartphone-captured RGB data and Sentinel-2 satellite imagery for validating dynamic vegetation indices in a real-world rice field scenario. The proposed method achieves strong cross-platform correlation and transferability.
Comparison with conventional and learning-based indices
RL-VI is compared against widely used indices (NDVI, EVI, SAVI) and machine learning-derived indices (MLVI, H_VSI). It demonstrates superior accuracy, earlier stress detection, and consistent performance across rice growth stages.
Scalable application for precision agriculture
The framework is designed to be lightweight and adaptable, making it suitable for integration into UAV platforms or edge-based agricultural monitoring systems18.
The main methodological contributions of the proposed RL-VI framework are summarized in Table 1.
Table 1.
Key contributions of the Study.
| Key contribution | Description |
|---|---|
| Dynamic Vegetation Index Formulation | Uses reinforcement learning to dynamically learn stress-sensitive band combinations. |
| Cross-Platform Data Integration (Satellite + Mobile) | Fuses Sentinel-2 and RGB smartphone imagery for robust validation. |
| Comparison with Traditional and ML-Based Indices | Benchmarked against NDVI, EVI, MLVI, H_VSI on accuracy and early detection. |
| Scalable and Interpretable for Precision Agriculture | Designed for UAV/edge deployment with low computational cost. |
Methodology
A. Overview of RL-VI framework
Our proposed RL-VI framework, shown in Fig. 1, integrates multi-source sensing, including Sentinel-2 satellite data and field-acquired RGB imagery, within a reinforcement learning paradigm to dynamically optimize vegetation index formulation. The architecture (Fig. 2) begins with synchronized data collection: Sentinel-2 provides multi-band reflectance across red, NIR, red-edge, and SWIR wavelengths, while mobile RGB images are captured at key crop stages to provide ground-level visual cues21.
Fig. 1.

(a) RL-VI Proposed Architecture (b) Working Mechanism.
Fig. 2.
Workflow of RL-VI framework.
Inspired by recent advances in deep reinforcement learning (DRL) for hyperspectral band selection, where an agent learns to select spectral bands to maximize classification performance22, the RL-VI agent interprets the current band combinations as the state and selects actions (i.e., modifications to band triplets and weights) to construct a custom vegetation index23.
This index is then passed through a lightweight classifier, either a support vector machine (SVM) or a 1D convolutional neural network (1D-CNN), trained on stress labels derived from satellite or RGB imagery24. A feedback-driven reward based solely on classification accuracy is used to update the deep Q-network (or proximal policy optimization (PPO) agent), enabling the model to iteratively refine the vegetation index towards more stress-sensitive band combinations.
The resulting vegetation index is adaptive, interpretable, and physiologically aligned, outperforming static indices across platforms and crop growth stages25–27.
This structured design ensures reproducibility, interpretability, and transparency; the detailed architecture and working mechanism are shown in Fig. 1(a) and (b), respectively.
Input sources
Sentinel imagery
Satellite-based multispectral data (e.g., bands from Sentinel-2).
Mobile imagery
Field-captured RGB images using smartphones or drones.
These inputs serve as the state for the RL agent: raw spectral or index-based observations used for decision-making.
Reinforcement learning agent
The RL agent takes spectral input and selects optimal band combinations and weights.
It formulates a custom Vegetation Index dynamically using the formula:
VI = (w₁·ρᵢ − w₂·ρⱼ) / (w₁·ρᵢ + w₂·ρⱼ + w₃·ρₖ)    (1)
where ρᵢ, ρⱼ, and ρₖ are the reflectances of the agent-selected bands and w₁, w₂, w₃ are the learned weights.
This process is based on a trial-and-error strategy optimized for classification accuracy.
Vegetation index → crop feature
The agent outputs a vegetation index tailored to detect stress.
Crop stress classification
A machine learning classifier (e.g., SVM, CNN) uses the vegetation index as input. It predicts one of several stress levels: Healthy, Low, Moderate, High, or Extreme. This is the final output of the system.
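As a toy illustration of this final step, the sketch below maps a single VI value to the five stress levels using hypothetical thresholds; in the actual framework an SVM or CNN learns this decision boundary from labeled data, so the cutoffs here are purely illustrative assumptions.

```python
# Illustrative stand-in for the stress classifier: higher VI = healthier canopy.
# The thresholds are hypothetical, not values reported in the study.

STRESS_CLASSES = ["Extreme", "High", "Moderate", "Low", "Healthy"]

def classify_stress(vi_value, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Return a stress class for a VI value assumed to lie in [0, 1]."""
    for level, t in enumerate(thresholds):
        if vi_value < t:
            return STRESS_CLASSES[level]
    return STRESS_CLASSES[-1]
```

In the real pipeline, the learned classifier replaces this threshold rule, but the input/output contract (one VI feature in, one of five labels out) is the same.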
RL-VI working mechanism
The proposed RL-VI framework working mechanism operates as a closed-loop reinforcement learning system that continuously interacts with spectral data environments to optimize vegetation index formulation.
Input layer
Multispectral bands (Sentinel-2) and canopy-level RGB inputs are preprocessed and normalized. Each spectral band acts as a potential input feature for vegetation index construction.
State representation
Each state represents a set of candidate spectral weight combinations across available bands.
Action
The RL agent modifies spectral weights (or selects bands) to form new vegetation index expressions.
Reward function
After each episode, the newly constructed index is evaluated by a downstream classifier (CNN) for stress detection accuracy. The classification accuracy serves as the reward signal guiding the agent.
Learning mechanism
The agent employs an actor-critic reinforcement learning algorithm, updating its policy parameters to maximize cumulative reward (accuracy).
Convergence
Through iterative exploration and exploitation, RL-VI converges to an optimal spectral index configuration that maximizes stress classification accuracy and generalization.
Explainability
SHAP analyses are then applied to interpret feature (band) contributions, validating physiological relevance of learned indices.
Problem formulation: vegetation index as MDP
Action Space: Band weight selection.
State Space: Band history, stress response.
Reward Design: Correlation with ground truth or entropy.
In the proposed RL-VI framework, the task of dynamically constructing a vegetation index (VI) is formulated as a Markov Decision Process (MDP). This allows a reinforcement learning (RL) agent to learn optimal spectral band combinations and weights by interacting with remote sensing data and receiving feedback based on classification accuracy28,29.
State space (sₜ)
At time step t, the state sₜ represents the current environment’s observation, composed of:
The full set of normalized spectral reflectance values for a sample (e.g., 224-band vector),
Optionally, the current combination of bands already selected, encoded as a binary vector or band history stack,
And/or the current stress response signal, derived from classification confidence.
Mathematically:
sₜ = {ρ₁, ρ₂, …, ρₙ, hₜ, cₜ}    (2)
where ρ₁, …, ρₙ are the normalized band reflectances, hₜ is the band-selection history, and cₜ is the current stress response signal.
Action space (aₜ)
The action aₜ taken by the RL agent corresponds to selecting a triplet of spectral bands (i, j, k) used to compute the vegetation index in the form:
VIₜ = (ρᵢ − ρⱼ) / (ρᵢ + ρⱼ + ρₖ)    (3)
Alternatively, if weights a, b, c are also learnable, the action space can be defined as:
aₜ = {i, j, k, a, b, c}.
This allows the agent to explore both band selection and weight configuration for index construction.
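As a sketch, the mapping from an action aₜ = {i, j, k, a, b, c} to an index value for one pixel might look like the following. The normalized-difference form with a third stabilizing band in the denominator is an assumption consistent with the H_VSI-style indices compared later; the example bands and unit weights are arbitrary.

```python
# Sketch: evaluate one agent action (band triplet + weights) on one pixel.
# Form assumed: (a*rho_i - b*rho_j) / (a*rho_i + b*rho_j + c*rho_k).

def vi_from_action(reflectance, action):
    """reflectance: sequence of band reflectances; action: (i, j, k, a, b, c)."""
    i, j, k, a, b, c = action
    num = a * reflectance[i] - b * reflectance[j]
    den = a * reflectance[i] + b * reflectance[j] + c * reflectance[k]
    return num / den if den != 0 else 0.0

# Example: an NIR-like, red-like, SWIR-like triplet with unit weights.
rho = [0.1, 0.2, 0.3, 0.4]
vi = vi_from_action(rho, (3, 0, 1, 1.0, 1.0, 1.0))  # (0.4-0.1)/(0.4+0.1+0.2)
```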
Reward design (rₜ)
The reward function guides the agent to formulate indices that maximize stress classification accuracy.
a) Classification-Based Reward:
rₜ = 1 if ŷ = y, and 0 otherwise    (4)
where y is the ground truth label and ŷ is the classifier prediction using the generated VI.
b) Entropy-Based Reward:
rₜ = −H(ŷ)    (5)
where H(ŷ) is the entropy of the classifier output probabilities.
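Both reward variants can be sketched as below, assuming a 0/1 correctness reward and a negative-entropy confidence reward; the exact scaling or normalization used in the study is not specified, so these are minimal forms.

```python
import math

# Sketch of the two reward signals guiding the RL agent.

def classification_reward(y_true, y_pred):
    """1.0 when the classifier prediction matches ground truth, else 0.0."""
    return 1.0 if y_true == y_pred else 0.0

def entropy_reward(probs):
    """Negative entropy of the classifier's output probabilities:
    confident (low-entropy) predictions earn a higher reward."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return -h
```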
As shown in Table 2, traditional indices such as NDVI and EVI are limited by fixed spectral formulas and a lack of adaptability to varying stress types or phenological stages. Machine-learned indices like MLVI improve early stress sensitivity through feature optimization but remain static once trained. In contrast, RL-VI dynamically formulates band combinations based on real-time classification rewards, enabling stage-sensitive and physiologically meaningful stress detection.
Table 2.
Traditional and other vegetation indices commonly used for rice in existing Literature.
| Index | Formula | Sensitivity | Limitation |
|---|---|---|---|
| NDVI | (NIR - Red)/(NIR + Red) | Chlorophyll | Saturates in dense canopy |
| EVI | 2.5 × (NIR - Red)/(NIR + 6×Red − 7.5×Blue + 1) | Biomass & structure | Sensitive to noise & soil |
| SAVI | ((NIR - Red)/(NIR + Red + L)) × (1 + L) | Canopy cover | Needs L parameter tuning |
| NDMI | (NIR - SWIR1)/(NIR + SWIR1) | Water stress | Ignores red-edge sensitivity |
| RENDVI | (RedEdge - Red)/(RedEdge + Red) | Nitrogen stress | Band choice varies by sensor |
| MLVI | Learned from RF-selected bands | Moderate early stress | Fixed once trained |
| H_VSI | Hybrid stress index using NIR/SWIR | Good for drought | Not adaptive across growth stages |
Unlike static indices, RL-VI uses spectral regions (e.g., SWIR1, red-edge) often ignored by fixed indices, allowing earlier identification of water, heat, and nutrient stress, particularly during sensitive rice growth stages such as booting and flowering. A comparative analysis highlighting the superiority of RL-VI over traditional indices is presented in Table 3. This flexibility ensures superior performance across spatial platforms (mobile + satellite) and temporal windows, which is critical for operational precision agriculture in rice ecosystems.
Table 3.
Proposed RL-VI superior for rice stress Monitoring.
| Feature | Traditional indices | RL-VI |
|---|---|---|
| Static band choice | Yes | Dynamically learned |
| Fixed sensitivity per stage | Yes | Adaptive to phenological stage |
| Cross-platform validation | Rare | Validated with RGB + Sentinel |
| Early stress detection | Weak (7–10 days late) | 10–14 days early |
A detailed comparison between traditional, machine-learned, and reinforcement learning based vegetation indices is provided in Table 4.
Table 4.
Comparison of traditional, machine-learned, and reinforcement learning-based vegetation indices for crop stress detection.
| Index | Formula | Stress sensitivity | Limitation | References |
|---|---|---|---|---|
| NDVI | (NIR - Red)/(NIR + Red) | Chlorophyll | Saturates in dense canopy | 1,2 |
| EVI | 2.5 × (NIR - Red)/(NIR + 6×Red − 7.5×Blue + 1) | Biomass, Structure | Sensitive to noise, soil | 3,4 |
| SAVI | ((NIR - Red)/(NIR + Red + L)) × (1 + L) | Canopy Cover | Requires tuning parameter L | 5 |
| NDMI | (NIR - SWIR1)/(NIR + SWIR1) | Water Stress | Ignores red-edge sensitivity | 6,7 |
| RENDVI | (RedEdge - Red)/(RedEdge + Red) | Nitrogen Stress | Sensor-dependent band range | 8,9 |
| MLVI | Learned (Fixed Band Combination) | Early Stress (Fixed) | Static after training | 10,11 |
| H_VSI | (NIR - SWIR1)/(NIR + SWIR1 + SWIR2) | Drought, Structure | Not adaptive to stage | 12 |
| RL-VI | Learned (Dynamic via RL) | Dynamic: Water, Heat, Nutrients | Needs training, compute resources | Proposed |
Deep reinforcement learning strategy
RL agent (Q-network/PPO/DQN).
Policy learning and reward updates.
To train the RL agent in the RL-VI environment, this study utilizes a Deep Q-Network (DQN) approach that maps states to Q-values for all possible actions. The agent iteratively learns which spectral band combinations maximize the reward signal defined by the classification performance30.
The Q-network is a multi-layer perceptron that takes the state vector (reflectance + band history) as input and outputs Q-values for each band-action triplet. During training, the framework uses an ε-greedy policy to balance exploration and exploitation. Experience replay is used to stabilize learning by storing transitions and updating the Q-network using mini-batches.
Bellman Update Equation.
The Q-values are updated using the Bellman equation:
Q(sₜ, aₜ) ← Q(sₜ, aₜ) + η [rₜ + γ·maxₐ′ Q(sₜ₊₁, a′) − Q(sₜ, aₜ)]    (6)
Where:
sₜ: current state.
aₜ: selected action.
rₜ: reward received.
γ: discount factor.
η: learning rate.
Policy and Exploration Strategy.
The policy π is derived from the Q-network, selecting actions that maximize expected reward. The ε-greedy exploration strategy:
With probability ε, a random action is selected (exploration); with probability (1 − ε), the action with the highest Q-value is selected (exploitation). The ε value decays over time to favor exploitation in later stages.
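In code, the ε-greedy rule and its decay might be sketched as follows; the multiplicative decay rate is an illustrative assumption, while the floor of 0.1 matches the decay schedule (1.0 → 0.1) used in the experiments.

```python
import random

# Sketch of epsilon-greedy action selection over a vector of Q-values.

def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                 # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

def decay_epsilon(epsilon, rate=0.995, floor=0.1):
    """Multiplicative decay toward a floor (rate is an assumed example)."""
    return max(floor, epsilon * rate)
```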
Algorithm Summary.
1. Initialize replay buffer D and Q-network with random weights.
2. For each episode:
Initialize state sₜ.
For each step:
Select action aₜ using the ε-greedy policy.
Execute aₜ; receive reward rₜ and next state sₜ₊₁.
Store (sₜ, aₜ, rₜ, sₜ₊₁) in D.
Sample a random mini-batch from D.
Update the Q-network using the Bellman loss.
3. Repeat until convergence or the maximum number of episodes is reached.
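The algorithm above can be condensed into a toy tabular sketch of the Bellman update in Eq. (6); a real DQN replaces the Q-table with the MLP Q-network, and the single stored transition and two-action space here are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

GAMMA, ETA = 0.95, 0.5  # discount factor and learning rate (example values)

def bellman_update(Q, s, a, r, s_next, n_actions):
    """One tabular Q-learning step: Q(s,a) += eta*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in range(n_actions))
    Q[(s, a)] += ETA * (r + GAMMA * best_next - Q[(s, a)])

Q = defaultdict(float)          # Q-table, all values start at 0
replay = deque(maxlen=1000)     # replay buffer D
replay.append((0, 1, 1.0, 0))   # one illustrative transition (s, a, r, s')
s, a, r, s_next = random.choice(list(replay))  # sample a "mini-batch" of one
bellman_update(Q, s, a, r, s_next, n_actions=2)
```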
Fig. 3 illustrates the reinforcement learning (RL) design for stress classification in plants using remote sensing data.
Fig. 3.

RL agent design: state, action, reward design diagram.
RL Framework Overview.
The system is designed to use satellite and mobile sensor data to classify plant stress, improving accuracy by learning optimal combinations of input features through trial and error.
State.
The state represents the input data collected from remote sensors and field measurements:
Satellite reflectance bands (NIR, SWIR, Red): spectral bands captured by satellites, useful for detecting plant stress.
Mobile canopy temperature: readings from the plant canopy that can indicate water stress or disease.
Growth stage: the developmental stage of the plant, which affects stress indicators.
Action.
The action is a computed index, typically a vegetation index (here called RL-VI), formed by weighting different spectral bands:
RL-VI = (w₁·ρ(NIR) − w₂·ρ(SWIR)) / (w₁·ρ(NIR) + w₂·ρ(SWIR) + w₃·ρ(Red))    (7)
Where: ρ(NIR): Reflectance in Near-Infrared, ρ(SWIR): Reflectance in Shortwave Infrared, ρ(Red): Reflectance in Red band.
w1,w2,w3: Weights determined by RL agent to optimize classification.
RL Agent.
The RL Agent selects actions (vegetation indices) based on the current state, aiming to maximize the reward. It iteratively adjusts the weights to improve classification accuracy.
Reward.
The reward guides learning and measures the effectiveness of chosen actions:
Stress classification accuracy with ground truth
This represents how well the calculated index predicts plant stress compared to expert-labeled ground truth data.
High accuracy yields high rewards, pushing the RL agent to prefer effective feature combinations.
Feedback Loop.
The RL agent receives state information.
It computes an action (a vegetation index combination).
The resulting classification is compared to ground truth.
The accuracy score is used as a reward to refine the agent’s strategy for future actions.
Stress Classification Action.
A separate process or output in which the vegetation index is applied for stress classification and then validated against ground truth for system training and evaluation.
This system aims to learn the best way to combine various remote sensing inputs to classify plant stress as accurately as possible, with each cycle improving the agent’s approach based on real-world feedback.
The computed vegetation index values were then passed to a classifier, typically SVM or CNN, to predict the stress class across five severity levels.
Fig. 4 shows the RL-VI classification pipeline, which integrates input imagery with the learned vegetation index and classifier to generate final stress predictions across five severity levels.
Fig. 4.

RL-VI classification pipeline.
Fig. 5 illustrates the end-to-end RL-VI framework designed for dynamic vegetation index formulation and crop stress classification. The process begins with multi-source input data, including Sentinel-2 multispectral imagery, RGB field images, and derived vegetation indices such as NDVI and ExG. These inputs constitute the state space for a reinforcement learning (RL) agent. The agent operates within a Markov Decision Process (MDP), where it selects combinations of spectral bands and associated weights to construct a vegetation index formula. The learned VI is then used as an input feature for a classifier, either a support vector machine (SVM) or a 1D convolutional neural network (1D-CNN). The classifier outputs the predicted stress level across five discrete classes: Healthy, Low, Moderate, Severe, and Extreme. This modular pipeline enables the system to dynamically learn crop-specific indices tailored to different growth stages and configurations, improving generalization and early stress detection over traditional static VIs.
Fig. 5.
RL-VI Framework Workflow.
Integration with mobile imagery and satellite data
To ensure ground-level validation and multi-scale adaptability of the proposed RL-VI framework, mobile RGB imagery is integrated with Sentinel-2 multispectral data collected over the same rice field locations and growth stages31. This cross-platform integration enhances the reliability of vegetation index interpretation and allows the RL-VI model to generalize across spatial resolutions and sensing platforms.
Spatial alignment and normalization.
Cross-platform data fusion strategy.
Spatial and Temporal alignment
Mobile RGB images are captured in the field using smartphones at key phenological stages such as tillering, panicle initiation, and flowering. Sentinel-2 imagery is acquired for the same dates and locations, ensuring temporal consistency. Geotagged mobile images are spatially aligned with Sentinel-2 pixels using GPS metadata and visual reference markers. Index values derived from mobile imagery (e.g., NDVI_RGB, ExG) are used as supporting features or validation metrics32.
RGB-Based vegetation index calculation
RGB images are processed to extract vegetation indices such as:
NDVI_RGB = (G − R) / (G + R)    (8)
ExG = 2G − R − B    (9)
VARI = (G − R) / (G + R − B)    (10)
These indices are scaled and compared against RL-VI outputs for correlation and consistency evaluation.
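The RGB-based indices above can be computed per pixel as in the sketch below, using the standard formulations of NDVI_RGB (green–red normalized difference), ExG, and VARI; the small epsilon guard against division by zero is an added assumption.

```python
import numpy as np

# Sketch: per-pixel RGB vegetation indices from normalized channel arrays.

def rgb_indices(R, G, B, eps=1e-9):
    """R, G, B: arrays of channel values scaled to [0, 1]."""
    ndvi_rgb = (G - R) / (G + R + eps)      # green-red normalized difference
    exg = 2 * G - R - B                     # Excess Green
    vari = (G - R) / (G + R - B + eps)      # Visible Atmospherically Resistant Index
    return ndvi_rgb, exg, vari

ndvi_rgb, exg, vari = rgb_indices(np.array([0.2]), np.array([0.4]), np.array([0.1]))
```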
Data fusion for learning
Both RGB-derived and Sentinel-2 derived vegetation indices are input into the classifier or used for reward shaping. The RL agent learns band selections that maximize classification agreement with both platforms, improving robustness33. In some configurations, RGB features are directly fused with Sentinel reflectance vectors as extended state inputs to the RL agent. This integration ensures that the vegetation index optimized by the agent remains relevant across remote sensing scales and is practically validated using accessible mobile imagery34.
Experimental setup and results
Datasets
Sentinel-2 Time Series (Polur, India Rice Fields).
Sentinel-2 Level-2A multispectral imagery was collected from the Polur region in Tamil Nadu, India (Lat: 12.4998230, Lon: 79.1343030) during the kharif season of 2025. Data were acquired at approximately 10-day intervals, corresponding to key rice growth stages: transplanting, tillering, panicle initiation, booting, and flowering. The imagery includes 10 m and 20 m spatial-resolution bands from the visible, NIR, red-edge, and SWIR regions. Bands B2, B3, B4, B5, B6, B8, B11, and B12 were extracted for vegetation index construction. Cloud-free composites were generated using Google Earth Engine (GEE) and georeferenced with field GPS coordinates.
The Sentinel-2 dataset comprised 45 cloud-free images collected between July and October 2025. The mobile RGB dataset included 1,200 geotagged canopy-level images captured using a 64 MP Android camera across 10 experimental plots. The Indian Pines dataset contained 10,249 labelled pixels, while the Wheat Salt Stress dataset included 300 hyperspectral cube samples (204 bands each). Benchmark datasets were publicly available from the AVIRIS (USGS repository) and the University of Minnesota Hyperspectral Repository, respectively.
Fig. 6 shows the study area and representative Sentinel-2 imagery of rice fields in Polur, Tamil Nadu, India. The base map was obtained from the EOSDA Crop Monitoring tool (https://crop-monitoring.eos.com/analytics/field/10552322), which provides satellite-based agricultural monitoring and visualization tools.
Fig. 6.
(a) and (b) Study Area Map and Sentinel 2 Imagery (Polur, Tamil Nadu, India).
Mobile RGB dataset (Field-sampled)
All datasets used in this study are openly accessible. The Mobile RGB dataset, consisting of field-captured rice canopy images collected by the authors at Polur, Tamil Nadu, India, is publicly available under a CC BY-NC 4.0 license. The multispectral satellite imagery used for vegetation index analysis was obtained from an open-access satellite data repository. All processed datasets, including feature matrices, generated vegetation indices, RL-VI agent outputs, and trained model files, have been deposited together with the Mobile RGB dataset to ensure full reproducibility.
Smartphone-captured RGB images were collected on the same dates as Sentinel-2 overpass at the same field plots. Images were taken at canopy level under natural lighting conditions, geotagged using GPS-enabled mobile devices. The images were later processed to extract RGB-based vegetation indices such as NDVI_RGB, Excess Green (ExG), and VARI. This dataset served as a ground-level validation layer for comparing with satellite-derived indices.
Indian Pines/Wheat stress (for validation)
To test cross-domain generalizability of the proposed RL-VI framework, the experiments were included on two benchmark hyperspectral datasets:
Indian Pines: A hyperspectral image captured by the AVIRIS sensor with 224 bands and 16 land cover classes. For this study, data were preprocessed and relabeled into vegetation vs. non-vegetation for stress proxy classification.
Wheat Salt Stress Dataset: A hyperspectral dataset of wheat genotypes under salt stress conditions, publicly available from the University of Minnesota repository. It contains 204 spectral bands in the 400–1000 nm range and binary labels for control vs. stressed conditions. This dataset was used to evaluate RL-VI performance on abiotic stress recognition in a controlled environment.
Experimental settings
This section outlines the preprocessing steps, experimental configurations, and evaluation metrics used to assess the RL-VI framework on multiple datasets and platforms.
1. Preprocessing Pipeline.
Sentinel-2 Level-2A imagery was atmospherically corrected and resampled to a 10-meter resolution using bilinear interpolation for bands B5, B6, B11, and B12. Mobile RGB images were cropped to remove background artifacts and resized to 256 × 256 pixels for consistency. NDVI_RGB, ExG, and VARI were computed from the RGB channels. All reflectance values were normalized to [0, 1] before input to the RL environment.
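The RGB-derived indices above can be computed directly from normalized channel values. The sketch below assumes reflectance-like R, G, B values in [0, 1]; since the text does not give the NDVI_RGB formulation, a green-red normalized difference is used here as a common proxy (an assumption, not the study's exact definition).

```python
def rgb_indices(r, g, b, eps=1e-9):
    """RGB-based vegetation indices from channel values in [0, 1].

    NDVI_RGB is implemented here as a green-red normalized difference
    proxy (assumption; the study's exact formulation may differ).
    """
    exg = 2.0 * g - r - b                 # Excess Green (ExG)
    vari = (g - r) / (g + r - b + eps)    # Visible Atmospherically Resistant Index
    ndvi_rgb = (g - r) / (g + r + eps)    # green-red normalized difference
    return {"ExG": exg, "VARI": vari, "NDVI_RGB": ndvi_rgb}
```

For a healthy canopy pixel with strong green reflectance, all three indices come out positive, which is what makes them usable as ground-level stress proxies.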
2. Data Splitting and Sampling.
Datasets were randomly split into training (70%), validation (15%), and testing (15%) sets using stratified sampling based on stress labels. For Indian Pines and Wheat Stress datasets, only vegetation-related classes or stress categories were retained. To balance classes, random under-sampling was used where necessary.
3. Reinforcement Learning Settings.
A Deep Q-Network (DQN) agent was used with a 2-layer MLP (hidden sizes: 128, 64), ReLU activation, and the Adam optimizer. The agent was trained for 200 episodes with a learning rate η = 0.001, discount factor γ = 0.95, and ε-greedy decay from 1.0 to 0.1. The reward was calculated using the F1-score.
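The exploration schedule and reward can be sketched as below. A linear ε decay is assumed (the text specifies only the 1.0 → 0.1 endpoints over 200 episodes), and the reward is the F1-score computed from the classifier's confusion counts.

```python
def epsilon(episode, total=200, eps_start=1.0, eps_end=0.1):
    """Exploration rate for a given episode (linear decay schedule assumed)."""
    frac = min(episode / (total - 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)

def f1_reward(tp, fp, fn):
    """F1-score reward returned to the DQN agent after classification."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With these defaults the agent explores fully at episode 0 and settles to 10% random actions by the final episode, while each episode's scalar reward directly tracks classification quality.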
4. Classifier Settings.
Each generated VI was passed through an SVM or CNN classifier for binary or multi-class stress classification. SVMs used RBF kernels with grid search for C and γ. CNNs used 3 convolutional layers with batch normalization and dropout (0.3). Training was performed using cross-entropy loss and early stopping on validation loss.
5. Evaluation Metrics.
Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). Additionally, Pearson correlation between RL-VI and mobile-derived indices (e.g., ExG, NDVI_RGB) was computed for spatial consistency analysis. Early stress detection capability was assessed by comparing detection dates across growth stages.
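The Pearson correlation used for the cross-platform consistency analysis can be computed per growth stage from paired index series, as in this minimal sketch:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length index series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Here `x` would hold RL-VI values sampled from Sentinel-2 pixels and `y` the co-located mobile-derived index values (e.g., ExG) for the same plots and dates.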
Baseline comparison methods
To validate the effectiveness of the RL-VI framework, its performance was compared against traditional vegetation indices and other data-driven index methods across multiple stress detection tasks. The baseline methods included widely used spectral indices as well as recent machine learning-based index formulations.
1. Traditional Vegetation Indices.
The following commonly used vegetation indices were selected for comparison:
NDVI = (B8 − B4) / (B8 + B4)　(11)

EVI = 2.5 · (B8 − B4) / (B8 + 6·B4 − 7.5·B2 + 1)　(12)

SAVI = (1 + L) · (B8 − B4) / (B8 + B4 + L), with L = 0.5　(13)

NDMI = (B8 − B11) / (B8 + B11)　(14)
These indices were calculated from Sentinel-2 bands and evaluated using the same classifier pipeline.
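Using Sentinel-2 band notation (B2 blue, B4 red, B8 NIR, B11 SWIR1), the baseline indices can be computed per pixel as below. Band assignments follow the standard formulations and are an assumption where the text does not state them explicitly.

```python
def ndvi(b8, b4):
    """Normalized Difference Vegetation Index."""
    return (b8 - b4) / (b8 + b4)

def evi(b8, b4, b2):
    """Enhanced Vegetation Index (standard coefficients assumed)."""
    return 2.5 * (b8 - b4) / (b8 + 6.0 * b4 - 7.5 * b2 + 1.0)

def savi(b8, b4, L=0.5):
    """Soil-Adjusted Vegetation Index with soil brightness factor L."""
    return (1.0 + L) * (b8 - b4) / (b8 + b4 + L)

def ndmi(b8, b11):
    """Normalized Difference Moisture Index (NIR vs. SWIR1)."""
    return (b8 - b11) / (b8 + b11)
```

Because these formulas fix both the bands and their weights, they cannot adapt to crop or stage, which is the limitation RL-VI targets.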
2. Machine Learning-Derived Indices
RL-VI was compared against MLVI and H_VSI, two indices derived using machine-learning optimization.
MLVI: A linear combination of selected hyperspectral bands optimized via random forest importance ranking.
H_VSI: A hybrid vegetation stress index proposed in earlier work, combining NIR and red-edge bands for abiotic stress sensitivity.
These indices served as strong baselines for evaluating stress classification under varying conditions.
Evaluation criteria
All indices were evaluated using the same classifiers (SVM/CNN), training-validation-test split, and performance metrics (accuracy, F1-score, precision, recall, AUC). Additionally, the study analyzed early detection capability and cross-platform consistency (mobile RGB vs. Sentinel-2). This comprehensive comparison validates RL-VI’s generalization ability and sensitivity to physiological changes.
Classical learning-based methods (SVM, CNN, RF)
To further contextualize the performance of RL-VI-based stress detection, comparisons were made with three classical machine learning approaches: Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Random Forest (RF). Each model was trained using either raw spectral reflectance, traditional vegetation indices, or the learned RL-VI as input features.
a) Support Vector Machine (SVM).
SVMs were implemented using a radial basis function (RBF) kernel. Hyperparameters (C and γ) were optimized using grid search on the validation set. SVMs are known for their robustness with small datasets and served as a baseline for binary and multiclass stress classification.
b) Convolutional Neural Networks (CNN).
CNNs were applied to both Sentinel-2 time-series bands and RGB-indexed maps. The model architecture included three convolutional layers with ReLU activation, batch normalization, and a dense classifier head. Cross-entropy loss was used with the Adam optimizer. CNNs provided spatial feature extraction benefits, particularly with mobile image inputs.
c) Random Forest (RF).
The RF classifier was used for feature importance ranking and classification using traditional and learned indices. An ensemble of 100 trees was used with a maximum depth of 10, and the Gini index as the split criterion. RF also enabled ranking of Sentinel-2 spectral bands contributing most to classification, supporting band interpretability analysis.
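The Gini split criterion referenced above measures label impurity at a candidate split; a minimal sketch:

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A pure node scores 0, a balanced binary node scores 0.5; the RF's band-importance ranking accumulates how much each Sentinel-2 band reduces this impurity across splits.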
Ablation studies
Ablation experiments were conducted to isolate and evaluate the contribution of critical design elements in the RL-VI framework. Specifically, the study analyzed the effect of the number of spectral bands used in index formulation, variations in reward design, and the influence of mobile imagery.
1. Impact of Number of Bands.
To assess the influence of index dimensionality, the RL-VI agent was trained to construct indices using 2, 3, and 4 spectral bands, respectively. Two-band formulations (similar to NDVI-style ratios) provided baseline sensitivity but lacked robustness during transitional growth stages. Three-band indices achieved the best overall accuracy and generalization, balancing complexity and interpretability. Increasing to four bands introduced overfitting on smaller datasets and reduced consistency across platforms1.
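The growth of the agent's action space with index dimensionality helps explain the four-band overfitting. Assuming roughly 12 usable Sentinel-2 bands (a count chosen here for illustration), the number of candidate band subsets expands quickly:

```python
from math import comb

BANDS = 12  # approximate count of usable Sentinel-2 bands (assumption)

# Number of distinct band subsets the agent can form at each index size
subset_counts = {k: comb(BANDS, k) for k in (2, 3, 4)}
```

The jump from 66 two-band to 495 four-band combinations enlarges the search space substantially, which on small datasets raises the overfitting risk noted above.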
2. RL Reward Function: Entropy vs. F1-Score.
The authors tested two reward structures: (a) an F1-score-based reward, and (b) an entropy-regularized reward (F1 − β·Entropy). While both encouraged correct classification, the entropy penalty promoted sharper decision boundaries and reduced model uncertainty. The hybrid reward structure improved early stress detection and yielded slightly higher F1-scores in multi-class stress classification tasks2.
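The entropy-regularized reward can be sketched as follows; β is a tunable penalty weight whose value is not stated in the text, so the default here is an assumption.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a predicted class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def hybrid_reward(f1, probs, beta=0.1):
    """Reward = F1 - beta * Entropy; penalizes uncertain predictions (beta assumed)."""
    return f1 - beta * entropy(probs)
```

A confident prediction ([1.0]) keeps the full F1 reward, while a maximally uncertain binary prediction ([0.5, 0.5]) is penalized by β·ln 2, steering the agent toward band combinations that yield decisive classifications.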
3. Integration of Mobile Imagery.
Model performance was compared when trained on (i) Sentinel-2 data only, (ii) mobile RGB indices only, and (iii) combined mobile + Sentinel-2 data. The integration of mobile imagery improved spatial resolution and localized stress-signal capture. The RL-VI model trained with both sources showed improved accuracy (↑3.4%) and earlier stress detection (by ~10 days). This highlights the importance of multimodal validation and platform fusion in precision agriculture3,4.
Results and discussion
This section presents the results of the proposed RL-VI model across datasets, growth stages, and platforms, comparing it with conventional vegetation indices and classical classifiers. Performance is reported in terms of accuracy, F1-score, early stress detection, and correlation between mobile and satellite-derived indices35,36.
Stress classification accuracy
The RL-VI approach consistently outperformed traditional indices such as NDVI, EVI, and MLVI across all datasets, as shown in Fig. 7. On the Sentinel-2 Polur dataset, RL-VI achieved an accuracy of 89.4% and an F1-score of 88%, compared to 76.5% and 72% for NDVI. This improvement is attributed to RL-VI’s ability to adaptively formulate band combinations suited to rice phenology and stress conditions1.
Fig. 7.

Accuracy/F1 Score Comparison Across Vegetation Indices.
To enhance model interpretability, SHAP (SHapley Additive exPlanations) analyses were applied to visualize spectral band contributions. The analyses revealed strong attribution to the red-edge (B5–B6) and SWIR (B11–B12) bands, confirming their physiological relevance for water and nutrient stress detection.
Figure 8. SHAP-style summary plot showing per-sample feature contributions to the binary rice (stressed) classification. Each dot represents a single test sample’s contribution for that spectral band. Bands with the widest spread and largest absolute values (notably bands around the red-edge and SWIR regions) have the greatest influence on model predictions.
Fig. 8.

SHAP Analysis.
Early stress detection
Compared to fixed indices, RL-VI detected visible stress symptoms 10–14 days earlier, particularly during the booting and flowering stages. This early detection was validated using RGB-derived indices, indicating practical utility for pre-symptomatic stress management2. The early detection advantage of RL-VI was particularly evident during the booting stage, as detailed in Section III-E6.
Platform consistency (Mobile vs. Satellite)
The correlation between mobile RGB-derived and Sentinel-2-derived index values was analyzed across growth stages. RL-VI achieved Pearson correlations of 0.89 (tillering), 0.91 (booting), and 0.87 (flowering), outperforming NDVI and MLVI, which showed correlations < 0.8. This cross-platform consistency supports RL-VI’s generalization capacity3.
Spatial visualization and index response
Heatmaps generated from RL-VI outputs revealed finer spatial granularity in identifying stressed crop zones compared to NDVI and EVI. RL-VI effectively highlighted transitional stress regions that were not visible in RGB imagery, indicating improved sensitivity to subtle physiological variations in crop health4.
Figure 9 shows the stress zones generated using NDVI, EVI, and RL-VI for rice fields. RL-VI maps show finer spatial granularity and capture early stress signals not visible in conventional indices.
Fig. 9.
Comparison of stress heatmaps for rice fields (Polur, 2025). (a) NDVI; (b) EVI; (c) RL-VI.
The comparative stress heatmaps generated from NDVI, EVI, and RL-VI outputs demonstrate the superior spatial sensitivity of the proposed RL-VI framework. While NDVI and EVI maps appear relatively uniform, offering limited granularity in distinguishing subtle or early-stage stress regions, the RL-VI map reveals distinct transitional zones with sharper contrast. Specifically, RL-VI effectively highlights localized areas of stress, particularly in the central zone, that are not discernible in NDVI or EVI outputs. This enhanced resolution is attributed to RL-VI’s dynamic adaptation of spectral band combinations optimized for rice phenology, enabling it to capture finer physiological variations across the crop canopy. These results support RL-VI’s capability to detect stress conditions earlier and more accurately than traditional indices.
Figure 10 depicts crop stress validation using cross-platform imagery. Sentinel-2 multispectral images (top left) were used to generate field-scale stress maps. These outputs were validated with smartphone-captured RGB images (top right) collected from the same field plots. The validation heatmap (bottom) shows stress intensity levels ranging from low (purple) to high (yellow), demonstrating strong consistency between satellite-derived RL-VI predictions and ground-level visual symptoms.
Fig. 10.

Crop stress validation diagram.
Figure 11 shows Sentinel-2 NDVI maps of rice fields in Polur, Tamil Nadu, illustrating spatial variability in vegetation health across two time points. The index legend highlights NDVI ranges corresponding to stress levels, where lower values (red) indicate stressed or sparse vegetation and higher values (green) represent healthy crop zones. These maps provide spatial insights into field-level heterogeneity and serve as a baseline for evaluating the dynamic RL-VI formulation.
Fig. 11.
Sentinel-2 NDVI maps of rice field (Polur, Tamil Nadu) for two dates in the 2025 Kharif season. (a) Early-season; (b) Mid-season.
To validate stress zones derived from satellite-based vegetation indices, high-resolution mobile imagery was used as a complementary source. While multispectral satellite data, such as Sentinel-2, provides field-scale insights at a spatial resolution of 10 m, it is not capable of detecting stress at the level of individual plants. Instead, it enables the identification of broader spatial patterns and localized zones of potential stress. To confirm these zones, geotagged mobile images were captured across the field, allowing visual inspection of crop symptoms such as yellowing, wilting, or sparse canopy development. By aligning mobile images with corresponding satellite-derived RL-VI or NDVI values, the presence or absence of physiological stress could be cross-verified. This integrated approach enables more accurate stress mapping, where mobile imagery confirms the ground truth and satellite imagery provides scalable monitoring across large agricultural plots.
Confusion matrix and class-level metrics
RL-VI yielded better separation of mild and moderate stress classes, reducing false negatives by 6–9% compared to NDVI and MLVI. This was particularly evident in flowering-stage images, where traditional indices underperformed due to signal saturation5.
Classification performance of NDVI and RL-VI for multiple rice stress severity levels is shown in Fig. 12, which depicts RL-VI’s reduced false negatives and better separation of mild-to-moderate stress classes.
Fig. 12.
Confusion matrix comparison (NDVI vs. RL-VI).
RL-VI behavior across rice growth stages and stress types
Rice undergoes several critical phenological stages, each vulnerable to specific types of abiotic and biotic stress. These include nitrogen deficiency during tillering, water and potassium stress during panicle initiation, and heat/drought during booting and flowering. Traditional indices such as NDVI, EVI, and SAVI often fail to capture these stress signals early due to fixed band structures and saturation in high-biomass stages.
RL-VI demonstrated superior sensitivity across multiple growth stages in rice by dynamically optimizing band combinations associated with water, nutrient, and heat stress. While NDVI and EVI showed delayed or saturated responses in high-biomass phases such as booting and flowering, RL-VI successfully detected stress onset 10–14 days earlier, particularly in the panicle initiation and booting stages. This is attributed to its exploitation of SWIR1/SWIR2 and red-edge bands, known markers of leaf water potential and canopy structure, which are not used in NDVI.
Additionally, RL-VI’s adaptability across growth stages ensured consistent classification accuracy (> 0.85 F1-score) and early detection, as validated through mobile imagery. This makes RL-VI uniquely suited for precision rice agriculture, supporting proactive agronomic interventions. The advantages of RL-VI across different rice growth stages are summarized in Table 5.
Table 5.
Advantages of RL-VI across growth stages.
| Growth Stage | Common Stress Types | RL-VI Advantage |
|---|---|---|
| Tillering | Nitrogen deficiency, weed competition | Sensitive to pigment loss via red-edge bands |
| Panicle Initiation | Potassium deficiency, water stress | Captures canopy thinning and moisture loss (SWIR1) |
| Booting | Drought, heat stress | Detects panicle abortion risk before symptom onset |
| Flowering | Heat, spikelet sterility, pest stress | Differentiates physiological stress from natural senescence |
Error and statistical analysis
To further validate the robustness of the proposed RL-VI framework, a detailed error and consistency analysis was conducted between the two principal performance metrics, Accuracy and F1-score, across all vegetation indices (NDVI, EVI, MLVI, H_VSI, and RL-VI). The signed difference (Accuracy − F1) was computed for each index to quantify metric deviation and assess internal consistency.
The results show that traditional indices such as NDVI and EVI exhibit larger discrepancies (4.5 pp and 4.0 pp, respectively), indicating moderate inconsistency between their classification accuracy and F1-score. In contrast, the machine-learned and reinforcement-learning-based indices (MLVI, H_VSI, and RL-VI) show substantially smaller deviations (≤ 2.3 pp), confirming more stable predictive behavior across classes.
Error and performance analysis was performed using the formulas defined below:
Accuracy = (TP + TN) / (TP + TN + FP + FN)　(15)

Precision = TP / (TP + FP)　(16)

Recall = TP / (TP + FN)　(17)

F1 = 2 · (Precision · Recall) / (Precision + Recall)　(18)

AUC = ∫₀¹ TPR(FPR) dFPR　(19)

dᵢ = Accuracyᵢ − F1ᵢ　(20)

MAE = (1/n) Σᵢ |dᵢ|　(21)

RMSE = √((1/n) Σᵢ dᵢ²)　(22)

SE(diff) = s_d / √n　(23)
The computed error values RMSE = 2.990 pp, MAE = 2.660 pp, and SE(diff) = 0.683 pp indicate that the differences between Accuracy and F1 are minimal and consistent across indices, as shown in Fig. 13. This low error spread demonstrates high metric stability and validates the internal reliability of the RL-VI model’s classification results.
Fig. 13.
Signed difference between Accuracy and F1.
To assess internal consistency between two commonly reported metrics, accuracy and F1-score were compared across the five indices. The mean absolute error (MAE) between accuracy and F1 was 1.527 pp and the RMSE was 1.825 pp, indicating small but non-negligible differences between the metrics across indices. The standard error of the differences was 1.529 pp, suggesting that the observed discrepancies are stable across the evaluated indices.
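The error statistics above can be reproduced from per-index (Accuracy, F1) pairs; a minimal sketch with illustrative values (not the study's reported scores):

```python
def error_stats(acc, f1):
    """MAE, RMSE, and standard error of the signed Accuracy - F1 differences."""
    d = [a - f for a, f in zip(acc, f1)]
    n = len(d)
    mae = sum(abs(x) for x in d) / n
    rmse = (sum(x * x for x in d) / n) ** 0.5
    mean = sum(d) / n
    sd = (sum((x - mean) ** 2 for x in d) / (n - 1)) ** 0.5  # sample std. dev.
    return {"MAE": mae, "RMSE": rmse, "SE": sd / n ** 0.5}
```

Feeding in the accuracy and F1 columns for all compared indices yields the deviation summary in percentage points once the inputs are scaled to percentages.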
Statistical validation
To statistically validate the observed performance differences among vegetation indices, a one-way ANOVA was performed on accuracy values across all models (NDVI, EVI, SAVI, NDMI, RENDVI, MLVI, H_VSI, and RL-VI).
The one-way ANOVA (F = 88.24, p < 0.001) confirmed a significant difference among vegetation indices, indicating that index type strongly influences classification performance. RL-VI achieved the highest mean accuracy (0.932 ± 0.009), outperforming both traditional and machine-learned indices. Statistical validation results obtained using one-way ANOVA and pairwise t-tests are reported in Table 6.
Table 6.
Statistical validation of vegetation index performance using one-way ANOVA and pairwise t-tests (F = 88.24, p < 0.001).
| Index | Mean accuracy | Std. Dev. | ANOVA Group mean Diff. (vs. NDVI) | p-value | Pairwise t-test (RL-VI vs. Index) | Significance |
|---|---|---|---|---|---|---|
| NDVI | 0.786 | 0.011 | - | - | 7.21 | p < 0.001 (***) |
| EVI | 0.818 | 0.012 | 0.032 | p < 0.01 | 6.58 | p < 0.001 (***) |
| SAVI | 0.834 | 0.011 | 0.048 | p < 0.01 | 6.74 | p < 0.001 (***) |
| NDMI | 0.844 | 0.012 | 0.058 | p < 0.001 | 6.82 | p < 0.001 (***) |
| RENDVI | 0.854 | 0.012 | 0.068 | p < 0.001 | 6.94 | p < 0.001 (***) |
| MLVI | 0.884 | 0.012 | 0.098 | p < 0.001 | 7.59 | p < 0.001 (***) |
| H_VSI | 0.894 | 0.011 | 0.108 | p < 0.001 | 7.12 | p < 0.001 (***) |
| RL-VI | 0.932 | 0.009 | 0.146 | p < 0.001 | - | - |
Pairwise t-tests between RL-VI and other indices, particularly MLVI (t = 7.59, p < 0.001), revealed statistically significant improvements, validating RL-VI’s superior performance. The low standard deviation across repeated runs demonstrates stable model behavior, reinforcing the reliability of RL-VI for real-world crop stress detection.
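The one-way ANOVA F statistic reported above can be computed from per-index lists of repeated-run accuracies; a minimal sketch (the groups below are illustrative, not the study's raw runs):

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean squares."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)  # between-group
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)    # within-group
    return (ssb / (k - 1)) / (ssw / (n - k))
```

A large F, as reported here (F = 88.24), means the variation between index means dwarfs the run-to-run variation within each index, so the index choice itself drives the accuracy differences.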
Generalization and cross-dataset robustness
To evaluate the robustness and adaptability of the RL-VI framework, the authors conducted experiments on multiple datasets and across various rice growth stages. Its applicability to alternate crops and environmental conditions was also tested to explore the potential for transfer learning.
Performance on different growth stages
The RL-VI model maintained high classification performance across the tillering, panicle initiation, booting, and flowering stages, as shown in Fig. 14. F1-scores exceeded 0.85 at each stage, with the highest performance (F1 = 0.89) observed during booting. Traditional indices like NDVI showed stage-specific drop-offs, particularly during dense canopy stages, due to signal saturation. The dynamic adaptability of RL-VI allows it to respond to subtle physiological changes, ensuring consistent stress detection across the phenological timeline1.
Fig. 14.
RL-VI performance over NDVI across rice growth stages.
The decline in NDVI performance observed during the booting stage can be attributed to its known saturation effect in dense vegetative canopies. As rice crops enter advanced growth phases, such as panicle initiation and booting, the canopy becomes uniformly dense, causing NDVI values to cluster near the upper limit (0.8–0.9). This saturation reduces NDVI’s sensitivity to subtle physiological variations, such as partial chlorosis, early water stress, or nutrient imbalances. Consequently, its ability to differentiate between stressed and healthy plants diminishes. In contrast, RL-VI maintains high classification performance across all stages by dynamically selecting spectral bands, such as SWIR and red-edge, that remain responsive under dense canopy conditions. This adaptability allows RL-VI to capture finer stress cues that NDVI fails to detect, ensuring consistent accuracy throughout the phenological timeline.
Transfer learning to other regions and crops
To explore generalizability, the trained RL-VI model was fine-tuned on two external datasets: the Indian Pines hyperspectral dataset and the Wheat Salt Stress dataset. Using a lightweight transfer learning strategy (freezing base weights and retraining the final layer), the RL-VI index generalized well to both domains. On the Wheat dataset, RL-VI achieved 91.2% accuracy and outperformed NDVI and MLVI by 5 to 7%. These results support RL-VI’s flexibility in adapting to new crop types and environmental conditions with minimal retraining2,3. Class-wise precision, recall, F1-score, and accuracy values for RL-VI are reported in Table 8.
Table 7.
RL configuration parameters for the RL agent setup.
| Parameter | Value |
|---|---|
| RL Algorithm | Deep Q-Network (DQN) |
| Policy Type | ε-greedy (decay from 1.0 → 0.1) |
| Network Type | Multi-Layer Perceptron (MLP) |
| Hidden Layers | [128, 64] with ReLU activation |
| Learning Rate (η) | 0.001 |
| Discount Factor (γ) | 0.95 |
| Exploration Strategy (ε-greedy) | Exploration initially 100%, reduced to 10% |
| Episodes | 200 episodes |
| Reward Type | Hybrid (F1-score with entropy regularization) |
| State Space | Normalized reflectance vector + band history |
| Action Space | Triplet of spectral bands with optional weights (i, j,k, a,b, c) |
Conclusion
This study presented RL-VI, a reinforcement learning–based vegetation index formulation framework designed for adaptive crop stress detection. Unlike traditional static vegetation indices (e.g., NDVI, EVI) and previously proposed machine-learned indices, RL-VI formulates spectral band combinations dynamically through a reward-driven learning process. By integrating Sentinel-2 multispectral imagery with canopy-level RGB observations, the framework enables cross-platform stress monitoring and validation at both satellite and field scales.
The experimental results indicate that RL-VI achieved higher classification performance than conventional vegetation indices across the evaluated datasets. In particular, RL-VI obtained a mean accuracy of 93.2% and an F1-score of 0.91 on the rice stress dataset, representing an improvement of approximately 8–12% over NDVI, EVI, SAVI, and MLVI under the same experimental settings. Statistical analysis using one-way ANOVA (F = 88.24, p < 0.001) and pairwise t-tests (p < 0.001) confirmed that the observed performance differences are statistically significant. The results further suggest that RL-VI can identify stress patterns earlier than fixed vegetation indices in the evaluated rice fields, with detection occurring up to 10–14 days before visible symptoms under the studied conditions. In addition, the framework maintained stable performance across multiple rice growth stages, demonstrating its potential robustness within the evaluated spatial and temporal scope.
Table 7 summarizes all key parameters for the reinforcement learning agent setup, and Table 8 reports the class-wise evaluation metrics. Figures 15, 16, 17, 18, 19 and 20 show the mobile images used for validation, and Fig. 21 shows the confusion matrix of RL-VI.
Table 8.
Class-wise evaluation metrics.
| Class | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| Healthy | 0.95 | 0.93 | 0.94 | 0.94 |
| Mild | 0.88 | 0.85 | 0.86 | 0.97 |
| Moderate | 0.87 | 0.89 | 0.88 | 0.88 |
| Severe | 0.85 | 0.83 | 0.84 | 0.84 |
| Extreme | 0.92 | 0.91 | 0.915 | 0.91 |
Fig. 15.

Mobile image - validation 1.
Fig. 16.

Mobile image - validation 2.
Fig. 17.

Mobile image - validation 3.
Fig. 18.

Mobile image - validation 4.
Fig. 19.

Mobile image - validation 5.
Fig. 20.

Mobile image - validation 6.
Fig. 21.

RL-VI Confusion matrix.
Limitations and future directions
Despite these encouraging results, several limitations should be acknowledged. First, the reinforcement learning training process incurs higher computational cost during exploration, particularly when applied to large multispectral or hyperspectral datasets. Second, the reward mechanism relies on the availability and quality of labeled stress data; inaccuracies or inconsistencies in ground truth labels may affect learning stability and performance. Third, differences in spatial resolution and radiometric characteristics between Sentinel-2 imagery and mobile RGB data require careful normalization and calibration, which may influence cross-platform consistency. Finally, although the framework was evaluated across growth stages within a single season, its generalization across multiple seasons, climatic conditions, and geographic regions has not yet been fully validated.
Future work will focus on extending RL-VI to a multi-temporal reinforcement learning setting that explicitly models seasonal variability and long-term crop dynamics. Additional sensor modalities, such as thermal and LiDAR data, will be explored to enhance sensitivity to water and structural stress. Further efforts will also investigate lightweight implementations for edge or UAV-based deployment and evaluate the transferability of the RL-VI framework to other crop types and agro-ecological conditions.
Acknowledgements
The authors would like to thank their supervisor for the guidance and constructive suggestions that significantly contributed to this work. The authors also acknowledge SRM Institute of Science and Technology, VADAPALANI campus for providing institutional support and research facilities essential for conducting this research.
Author contributions
S.P conceived the study and developed the RL-VI framework and contributed to data collection, analysis, and validation. A.S assisted in implementation, visualization, and manuscript preparation. All authors reviewed and approved the final manuscript.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Data availability
All datasets used in this study are openly accessible. The Mobile RGB dataset, consisting of field-captured rice canopy images collected by the authors at Polur, Tamil Nadu, India, is publicly available on Kaggle under a CC BY-NC 4.0 license (https://doi.org/10.34740/kaggle/dsv/14105754). Sentinel-2 multispectral imagery was obtained from the European Space Agency Copernicus Open Access Hub via Google Earth Engine. Benchmark hyperspectral datasets (Indian Pines and Wheat Salt Stress) are publicly available from their respective repositories. All processed data generated during this study are available from the corresponding author upon reasonable request.
Code availability
All custom code developed for this work, including the RL-VI (Reinforcement Learning-based Vegetation Index) formulation algorithm, image preprocessing scripts, vegetation index computation modules, model training pipelines, and evaluation routines, is openly accessible in a public GitHub repository (https://github.com/Poornisrm/Vegetation-Index.git). The code is available without restriction for non-commercial research use.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Moharrami, M., Attarchi, S., Gloaguen, R. & Alavipanah, S. K. Integration of Sentinel-1 and Sentinel-2 Data for Ground Truth Sample Migration for Multi-Temporal Land Cover Mapping. Remote Sens., 16(9). 10.3390/rs16091566 (2024).
- 2.Reis Pereira, M. et al. Plant disease diagnosis based on hyperspectral sensing: comparative analysis of parametric spectral vegetation indices and nonparametric Gaussian process classification approaches. Agronomy14 (3), 10.3390/agronomy14030493 (2024).
- 3.Fan, L. et al. A temporal-spatial deep learning network for winter wheat mapping using time-series Sentinel-2 imagery. ISPRS J. Photogrammetry Remote Sens.214, 4864. 10.1016/j.isprsjprs.2024.06.005 (2024). [Google Scholar]
- 4.Jiang, X. et al. An automatic rice mapping method based on an integrated time-series gradient boosting tree using GF-6 and sentinel-2 images. GIScience Remote Sens.61 (1), 10.1080/15481603.2024.2367807 (2024).
- 5.Xu, H., Song, J. & Zhu, Y. Evaluation and Comparison of Semantic Segmentation Networks for Rice Identification Based on Sentinel-2 Imagery. Remote Sens.15(6), 10.3390/rs15061499 (2023).
- 6.Yu, L. et al. Research on machine Learning-Based extraction and classification of crop planting information in arid irrigated areas using Sentinel-1 and Sentinel-2 Time-Series data. Agric. (Switzerland). 15 (11), 10.3390/agriculture15111196 (2025).
- 7.Herzig, P. et al. Evaluation of Rgb and multispectral unmanned aerial vehicle (Uav) imagery for high-throughput phenotyping and yield prediction in barley breeding. Remote Sens.13 (14), 10.3390/rs13142670 (2021).
- 8.Sun, Y., Wang, B. & Zhang, Z. Improving leaf area index Estimation with chlorophyll insensitive multispectral Red-Edge vegetation indices. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens.16, 35683582. 10.1109/JSTARS.2023.3262643 (2023). [Google Scholar]
- 9.Aslan, M. F., Sabanci, K. & Aslan, B. Artificial Intelligence Techniques in Crop Yield Estimation Based on Sentinel-2 Data: A Comprehensive Survey. In Sustainability (Switzerland) (Vol. 16, Issue 18). Multidisciplinary Digital Publishing Institute (MDPI). (2024). 10.3390/su16188277
- 10.Samsuddin Sah, S., Abdul Maulud, K. N., Sharil, S., Karim, A., Pradhan, B. & O., & Monitoring of three stages of paddy growth using multispectral vegetation index derived from UAV images. Egypt. J. Remote Sens. Space Sci.26 (4), 989998. 10.1016/j.ejrs.2023.11.005 (2023). [Google Scholar]
- 11.Sulaiman, N. et al. N., W. F. The Application of Hyperspectral Remote Sensing Imagery (HRSI) for Weed Detection Analysis in Rice Fields: A Review. In Applied Sciences (Switzerland) (12 (5), MDPI. 10.3390/app12052570 (2022).
- 12.Zhang, G., Xu, T. & Tian, Y. Hyperspectral imaging-based classification of rice leaf blast severity over multiple growth stages. Plant. Methods. 18 (1), 10.1186/s13007-022-00955-2 (2022). [DOI] [PMC free article] [PubMed]
- 13.Farmonov, N. et al. Crop Type Classification by DESIS Hyperspectral Imagery and Machine Learning Algorithms, in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 16, 1576–1588, 10.1109/JSTARS.2023.3239756 (2023).
- 14.Singh, G. & Sharma, S. Enhancing precision agriculture through cloud-based transformative crop recommendation model. Sci. Rep. 15, 9138. 10.1038/s41598-025-93417-3 (2025).
- 15.Singh, S. et al. A predictive framework using advanced machine learning approaches for measuring and analyzing the impact of synthetic agrochemicals on human health. Sci. Rep. 15, 15544. 10.1038/s41598-025-00509-1 (2025).
- 16.Singh, G. & Sharma, S. Revolutionizing cloud-IoT and UAV-assisted framework to analyze soil for cultivation in agricultural landscapes. Proc. Indian Natl. Sci. Acad. 10.1007/s43538-025-00489-w (2025).
- 17.Upadhyay, N. & Gupta, N. Detecting fungi-affected multi-crop disease on heterogeneous region dataset using modified ResNeXt approach. Environ. Monit. Assess. 196, 610. 10.1007/s10661-024-12790-0 (2024).
- 18.Upadhyay, N. & Bhargava, A. Artificial intelligence in agriculture: applications, approaches, and adversities across pre-harvesting, harvesting, and post-harvesting phases. Iran. J. Comput. Sci. 8, 749–772. 10.1007/s42044-025-00264-6 (2025).
- 19.Upadhyay, N. & Gupta, N. SegLearner: a segmentation-based approach for predicting disease severity in infected leaves. Multimed. Tools Appl. 84, 42523–42546. 10.1007/s11042-025-20838-7 (2025).
- 20.Upadhyay, N., Sharma, D. K. & Bhargava, A. 3SW-Net: a feature fusion network for semantic weed detection in precision agriculture. Food Anal. Methods 18, 2241–2257. 10.1007/s12161-025-02852-5 (2025).
- 21.Kurihara, J., Nagata, T. & Tomiyama, H. Rice yield prediction in different growth environments using unmanned aerial vehicle-based hyperspectral imaging. Remote Sens. 15 (8). 10.3390/rs15082004 (2023).
- 22.Karmakar, P. et al. Crop monitoring by multimodal remote sensing: a review. Remote Sens. Appl. Soc. Environ. 33. 10.1016/j.rsase.2023.101093 (2024).
- 23.Albahar, M. A survey on deep learning and its impact on agriculture: challenges and opportunities. Agric. (Switzerland) 13 (3). 10.3390/agriculture13030540 (2023).
- 24.Mou, L. et al. Deep reinforcement learning for band selection in hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 60. 10.1109/TGRS.2021.3067096 (2022).
- 25.Wu, B. et al. Challenges and opportunities in remote sensing-based crop monitoring: a review. Natl. Sci. Rev. 10 (4). 10.1093/nsr/nwac290 (2023).
- 26.Zhang, J. et al. UAV as a bridge: mapping key rice growth stage with Sentinel-2 imagery and novel vegetation indices. Remote Sens. 17 (13), 2180. 10.3390/rs17132180 (2025).
- 27.Virnodkar, S. S., Pachghare, V. K., Patil, V. C. & Jha, S. K. Remote sensing and machine learning for crop water stress determination in various crops: a critical review. Precision Agric. 21 (5), 1121–1155. 10.1007/s11119-020-09711-9 (2020).
- 28.Liu, L. et al. A disease index for efficiently detecting wheat fusarium head blight using Sentinel-2 multispectral imagery. IEEE Access 8, 52181–52191. 10.1109/ACCESS.2020.2980310 (2020).
- 29.Zhao, D. et al. Early detection of rice leaf blast disease using unmanned aerial vehicle remote sensing: a novel approach integrating a new spectral vegetation index and machine learning. Agronomy 14 (3). 10.3390/agronomy14030602 (2024).
- 30.Sato, Y., Tsuji, T. & Matsuoka, M. Estimation of rice plant coverage using Sentinel-2 based on UAV-observed data. Remote Sens. 16 (9). 10.3390/rs16091628 (2024).
- 31.Yu, Y. et al. Early mapping method for different planting types of rice based on Planet and Sentinel-2 satellite images. Agronomy 14 (1). 10.3390/agronomy14010137 (2024).
- 32.Tian, H. et al. A novel spectral index for automatic canola mapping by using Sentinel-2 imagery. Remote Sens. 14 (5). 10.3390/rs14051113 (2022).
- 33.Choshi, T. J., Dhau, I. & Mashao, F. Enhancing maize streak virus detection: a comparative analysis of Sentinel-2 MSI and Landsat 9 OLI data across vegetative and reproductive growth stages. Geocarto Int. 40 (1). 10.1080/10106049.2025.2480701 (2025).
- 34.Baldin, C. M. & Casella, V. M. Comparison of PlanetScope and Sentinel-2 spectral channels and their alignment via linear regression for enhanced index derivation. Geosci. (Switzerland) 15 (5). 10.3390/geosciences15050184 (2025).
- 35.Liu, L., Xie, Y., Zhu, B. & Song, K. Rice leaf chlorophyll content estimation with different crop coverages based on Sentinel-2. Ecol. Inf. 81. 10.1016/j.ecoinf.2024.102622 (2024).
- 36.Cong, C. et al. Research on monitoring methods for the appropriate rice harvest period based on multispectral remote sensing. Discrete Dyn. Nat. Soc. 2022. 10.1155/2022/1519667 (2022).
- 37.Zhang, H. et al. A novel red-edge spectral index for retrieving the leaf chlorophyll content. Methods Ecol. Evol. 13 (12), 2771–2787. 10.1111/2041-210X.13994 (2022).
- 38.Tian, J., Tian, Y., Cao, Y., Wan, W. & Liu, K. Research on rice fields extraction by NDVI difference method based on Sentinel data. Sensors 23 (13). 10.3390/s23135876 (2023).
- 39.Luo, S. et al. Remotely sensed prediction of rice yield at different growth durations using UAV multispectral imagery. Agric. (Switzerland) 12 (9). 10.3390/agriculture12091447 (2022).
- 40.Sharma, V., Honkavaara, E., Hayden, M. & Kant, S. UAV remote sensing phenotyping of wheat collection for response to water stress and yield prediction using machine learning. Plant Stress 12. 10.1016/j.stress.2024.100464 (2024).
- 41.Ren, C., Kim, D. K. & Jeong, D. A survey of deep learning in agriculture: techniques and their applications. J. Inform. Process. Syst. 16 (5), 1015–1033. 10.3745/JIPS.04.0187 (2020).
- 42.Liao, Z. Q. et al. A double-layer model for improving the estimation of wheat canopy nitrogen content from unmanned aerial vehicle multispectral imagery. J. Integr. Agric. 22 (7), 2248–2270. 10.1016/j.jia.2023.02.022 (2023).
- 43.Banerjee, B. P., Sharma, V., Spangenberg, G. & Kant, S. Machine learning regression analysis for estimation of crop emergence using multispectral UAV imagery. Remote Sens. (Basel) 13 (15), 2918 (2021). https://www.mdpi.com/2072-4292/13/15/2918
- 44.Yu, W. et al. Evaluation of red-edge features for identifying subtropical tree species based on Sentinel-2 and Gaofen-6 time series. Int. J. Remote Sens. 43 (8), 3003. 10.1080/01431161.2022.2079018 (2022).
- 45.Qiu, Z. et al. Accurate prediction of 327 rice variety growth period based on unmanned aerial vehicle multispectral remote sensing. Drones 8 (11), 665. 10.3390/drones8110665 (2024).
- 46.Li, Z. et al. Time series field estimation of rice canopy height using an unmanned aerial vehicle-based RGB/multispectral platform. Agronomy 14 (5), 883. 10.3390/agronomy14050883 (2024).
- 47.Brinkhoff, J. et al. Forecasting field rice grain moisture content using Sentinel-2 and weather data. Precision Agric. 26, 28. 10.1007/s11119-025-10228-2 (2025).
- 48.Sári-Barnácz, F. E. et al. Monitoring Helicoverpa armigera damage with PRISMA hyperspectral imagery: first experience in maize and comparison with Sentinel-2 imagery. Remote Sens. 16 (17), 3235. 10.3390/rs16173235 (2024).
- 49.Darra, N., Espejo-Garcia, B., Psiroukis, V., Psomiadis, E. & Fountas, S. Spectral bands vs. vegetation indices: an AutoML approach for processing tomato yield predictions based on Sentinel-2 imagery. Smart Agricultural Technol., 100805. 10.1016/j.atech.2025.100805 (2025).
- 50.Khosravi, I. Towards sustainable agriculture in Iran using a machine learning-driven crop mapping framework. Eur. J. Remote Sens. 58 (1). 10.1080/22797254.2025.2490787 (2025).
- 51.Botero-Valencia, J. et al. Machine learning in sustainable agriculture: systematic review and research perspectives. Agriculture 15 (4), 377. 10.3390/agriculture15040377 (2025).
- 52.Chatterjee, S., Baath, G. S., Sapkota, B. R., Flynn, K. C. & Smith, D. R. Enhancing LAI estimation using multispectral imagery and machine learning: a comparison between reflectance-based and vegetation indices-based approaches. Comput. Electron. Agric. 230, 109790. 10.1016/j.compag.2024.109790 (2025).
- 53.Ou, C. et al. Using machine learning methods combined with vegetation indices and growth indicators to predict seed yield of Bromus inermis. Plants 13 (6), 773. 10.3390/plants13060773 (2024).
- 54.Xu, Y. et al. A deep learning model based on RGB and hyperspectral images for efficiently detecting tea green leafhopper damage symptoms. Smart Agricultural Technol. 10, 100817. 10.1016/j.atech.2025.100817 (2025).
- 55.Patel, U. & Patel, V. Active learning-based hyperspectral image classification: a reinforcement learning approach. J. Supercomput. 80, 2461–2486. 10.1007/s11227-023-05568-7 (2024).
- 56.Fu, B., Sun, X., Cui, C., Zhang, J. & Shang, X. Structure-preserved and weakly redundant band selection for hyperspectral imagery. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 10.1109/JSTARS.2024.3425906 (2024).
- 57.Yang, J. X., Zhou, J., Wang, J., Tian, H. & Liew, A. W. C. LiDAR-guided cross-attention fusion for hyperspectral band selection and image classification. IEEE Trans. Geosci. Remote Sens. 62, 1–15, Art. no. 5515815. 10.1109/TGRS.2024.3389651 (2024).
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Citations
- Moharrami, M., Attarchi, S., Gloaguen, R. & Alavipanah, S. K. Integration of Sentinel-1 and Sentinel-2 Data for Ground Truth Sample Migration for Multi-Temporal Land Cover Mapping. Remote Sens., 16(9). 10.3390/rs16091566 (2024).
Data Availability Statement
All datasets used in this study are openly accessible. The Mobile RGB dataset, consisting of field-captured rice canopy images collected by the authors at Polur, Tamil Nadu, India, is publicly available on Kaggle under a CC BY-NC 4.0 license (https://doi.org/10.34740/kaggle/dsv/14105754). Sentinel-2 multispectral imagery was obtained from the European Space Agency Copernicus Open Access Hub via Google Earth Engine. Benchmark hyperspectral datasets (Indian Pines and Wheat Salt Stress) are publicly available from their respective repositories. All processed data generated during this study are available from the corresponding author upon reasonable request.
All custom code developed for this work, including the RL-VI (Reinforcement Learning–based Vegetation Index) formulation algorithm, image preprocessing scripts, vegetation index computation modules, model training pipelines, and evaluation routines, is openly accessible for non-commercial research use in a public GitHub repository (https://github.com/Poornisrm/Vegetation-Index.git).