. 2025 Nov 20;197(12):1358. doi: 10.1007/s10661-025-14558-6

A hybrid ACO–random forest optimization framework for scalable microalgae biomass estimation using multispectral imaging

Keshinro Kazeem Kolawole 1, Mohamad Shukri bin Zainal Abidin 1, Mohd Farizal bin Kamaroddin 2, Muhammad Sharul Azwan bin Ramli 1, Sikudhan Lucas Mpuhus 1, Ardiansyah Rizqi 1
PMCID: PMC12634816  PMID: 41264043

Abstract

Accurate estimation of algal biomass is essential for monitoring ecosystem productivity, managing aquaculture systems, and optimizing bioresource applications. However, traditional in situ methods are labor-intensive and spatially limited, while remote sensing approaches struggle with nonlinear spectral–biological relationships and the complexity of high-dimensional models. This study develops a hybrid ant colony optimization–random forest regression (ACO–RFR) framework that integrates feature selection with hyperparameter optimization to improve biomass prediction from multispectral imagery. The preprocessing pipeline combined reflectance normalization, multicollinearity screening, and outlier detection to reduce redundancy and noise. The ACO–RFR achieved both feature reduction and robust optimization, yielding high predictive accuracy (R2 = 0.96, 95% CI 0.94–0.98; RMSE = 0.05 g L−1, 95% CI 0.04–0.07) while reducing model dimensionality by more than 60%. Feature importance analysis highlighted NDVI, NIR/red ratios, and texture entropy as key biologically meaningful predictors of chlorophyll-a concentration. By leveraging low-cost imaging and computational efficiency, the framework enables scalable, real-time monitoring for aquaculture, ecological assessments, and biofuel production.

Keywords: Algae biomass estimation, Multispectral imaging, Machine learning, Ant colony optimization (ACO), Feature selection, Regression models, Optimization, Non-destructive measurement, Spectral features, Texture analysis, Vegetation index, Biomass productivity, Algae cultivation, Model generalizability

Introduction

Microalgal biomass serves as a vital proxy for assessing primary productivity, water quality, and ecological health in freshwater and marine systems. Chlorophyll-a concentration, in particular, is widely recognized as a key biological indicator of algal biomass and is integral to environmental monitoring, eutrophication studies, and aquaculture management (Schagerl et al., 2022). Despite its utility, conventional methods for chlorophyll-a quantification—such as chemical extraction or spectrophotometric analysis—are often time-consuming, labor-intensive, and constrained in spatial and temporal resolution, limiting their applicability for real-time or large-scale monitoring (Havlik et al., 2022; Wychen et al., 2021; Al-Tohamy et al., 2022; Pasquier et al., 2022).

Recent advances in remote sensing, particularly through multispectral imaging, offer promising alternatives for non-invasive chlorophyll-a estimation. Sensors mounted on satellites (e.g., Sentinel-2), drones, or fixed platforms can provide spatially continuous, repeatable observations at fine resolutions (Sahu et al., 2023). However, several technical challenges remain: first, the spectral response of microalgae varies across species and environmental conditions; second, multispectral datasets are typically high-dimensional and prone to multicollinearity; and third, the relationship between reflectance and biomass is inherently nonlinear, often leading to poor generalization in traditional regression models (Zeng & Chen, 2018; Guillevic et al., 2017; Shaikh et al., 2021).

Machine learning techniques, including support vector regression (SVR), artificial neural networks (ANN), and random forest regression (RFR), have demonstrated improved performance in modeling these complex relationships by learning directly from observed spectral-biological data (Hakala et al., 2018; Ricardo et al., 2019). RFR offers robustness to noise, the ability to handle nonlinear dependencies, and an intrinsic mechanism for estimating feature importance. Nonetheless, its effectiveness heavily depends on the selection of relevant input features and optimal hyperparameter settings which, if chosen poorly, may lead to overfitting or reduced interpretability (Mahmoudzadeh et al., 2024; Fatima et al., 2023; Rubbens et al., 2023).

To address these limitations, bio-inspired optimization techniques such as particle swarm optimization (PSO), genetic algorithms (GA), and ant colony optimization (ACO) have gained attention (Y. Wang et al., 2025; Maraveas et al., 2023). These algorithms can perform both feature selection and hyperparameter tuning simultaneously, effectively reducing dimensionality while improving model accuracy. ACO, which emulates the pheromone-based foraging behavior of ant colonies, has shown promise in complex combinatorial optimization tasks, but its application in tree-based ensemble learning models remains underexplored—particularly for biomass estimation using multispectral imagery (Wu et al., 2024; Kamble & Dubey, 2022; Mokhtarzadeh et al., 2025).

This study proposes a novel hybrid framework that integrates ACO with random forest regression for chlorophyll-a estimation in controlled outdoor cultivation systems. The framework is designed to automate feature selection, optimize model configuration, and enhance interpretability by identifying biologically meaningful spectral and environmental indicators. This approach is particularly relevant for aquaculture and phycological research, where precise, scalable, and real-time biomass monitoring is essential for managing growth dynamics, optimizing nutrient input, and detecting early signs of algal bloom or decline.

To the best of our knowledge, this is the first study to apply ACO for both hyperparameter optimization and feature selection within an RFR framework specifically aimed at algae biomass estimation from field-deployed multispectral imaging (Özkan et al., 2023). The objectives of this study are threefold: (i) to establish an outdoor experimental framework for monitoring Chlorella sorokiniana biomass using low-cost multispectral imaging; (ii) to develop a hybrid ant colony optimization–random forest regression (ACO–RFR) framework for joint feature selection and hyperparameter tuning; and (iii) to evaluate the predictive performance, feature interpretability, and robustness of the proposed framework against baseline and alternative machine learning models.

Materials and methods

Study area and sampling design

The experiment was conducted at Ecopark, Universiti Teknologi Malaysia (1°59′27″ N, 103°28′58″ E), using a controlled mesocosm facility designed for aquaculture and ecological research. Two cultivation phases were performed, as shown in Fig. 1:

  • Phase I (Jan–Jun 2023): Five cycles of Chlorella vulgaris and C. sorokiniana cultured in 1000 L seawater tanks (1 kg inoculum each), giving 100 units of the dataset.

  • Phase II (Sep 2024–May 2025): Two freshwater cultivations, initiated from BG-11 laboratory precultures (OD > 1) and scaled up into 1000 L outdoor tanks at a 1:10 mixing ratio, giving 100 units of the dataset in total, of which 40 units were from the C. sorokiniana culture used in this study.

The outdoor cultivation experiment was conducted in a cylindrical mesocosm tank with a nominal working volume of 1000 L (actual geometric volume ≈ 0.90 m3). The tank had an internal diameter of 1.07 m (radius ≈ 0.54 m) and a depth of 1.12 m, yielding a base area of 0.92 m2 and lateral area of 3.80 m2. Accounting for water displacement and headspace, the effective fill volume was adjusted to 1000 L. The tank was constructed from food-grade polyethylene, positioned outdoors at Universiti Teknologi Malaysia Ecopark (Johor Bahru, Malaysia), and filled with sterile seawater medium enriched with chicken manure nutrients, as shown in Fig. 1a–c. Each culture cycle was initiated by inoculating ~1 kg (wet weight) of pre-cultivated Chlorella vulgaris and Chlorella sorokiniana. Daily sampling was performed at three depths: (i) 10 cm below the surface (top layer), (ii) mid-depth at 0.56 m (middle layer), and (iii) 10 cm above the bottom (bottom layer), as shown in Fig. 1b. At each depth, ~50 mL of culture was collected using sterile centrifuge tubes mounted on a vertical sampling rod. The rod was positioned at the geometric center of the tank to minimize wall effects and ensure representative collection. Immediately after sampling, aliquots were transported to the laboratory for biomass and spectral analyses.

Fig. 1.


Experimental setup of outdoor cylindrical culture tanks used for biomass monitoring. a Conceptual diagram of culture system; b field deployment of tanks with sensors; c aerial-based multispectral imaging workflow

One milliliter of each sample was transferred into a cuvette and measured on a spectrophotometer to obtain daily OD values. Figure 2a shows the sampling tubes and Fig. 2b the spectrophotometer. A weather station was installed atop a nearby pole to record meteorological data, including temperature, humidity, wind speed and direction, precipitation, UV index, light intensity, and solar radiation (Mokhtarzadeh et al., 2025). The station was powered by a continuous mains supply and transmitted data to the cloud via a 3G connection. Additionally, a probe was used daily to log in situ pH and temperature (Fig. 2c); Fig. 2d depicts the weather station device.

Fig. 2.


(a) Samples; (b) spectrophotometer; (c) the probe; (d) weather station device

Remote sensing image acquisition and spectral preprocessing

Multispectral data were acquired using a MAPIR Survey3 RGN camera positioned approximately 3 m above the biotank to capture nadir-view imagery under consistent midday lighting. This camera records spectral reflectance in the red, green, and near-infrared (NIR) bands at a radiometric resolution of 12 bits per channel (Geogdzhayev et al., 2021). Image acquisition was conducted daily around solar noon during cloud-free conditions to minimize variations in atmospheric scattering and solar incidence angles.

To derive surface reflectance from raw digital numbers (DN), a two-step radiometric calibration approach was implemented. First, the camera was calibrated using a Spectralon white reference panel (Shaikh et al., 2021) placed within the field of view, and the target reflectance was computed as the ratio of DN values between the sample and the reference, scaled by the panel’s known reflectivity (Ricardo et al., 2019), as in Eq. (1). Subsequently, an empirical line correction was applied to align the processed imagery with field-measured reflectance values, as per Zhang et al. (2022); this involved a least-squares regression between field spectrometer data and image reflectance from cloud-free days, as in Eq. (2). The panel-based reflectance in the first step is given by:

$$R_{\lambda} = \frac{DN_{\text{target}}(\lambda)}{DN_{\text{white}}(\lambda)} \times R_{\text{white}}(\lambda) \tag{1}$$

where R is the reflectance and λ is the wavelength. This equation was applied after atmospheric correction. To further harmonize the reflectance with in situ measurements, a linear correction was applied using RGN camera data collected during cloud-free sampling days, as given in Eq. (2):

$$R_{\text{corrected}} = a \times R_{\text{measured}} + b \tag{2}$$

where a and b are empirical calibration coefficients derived for each spectral band.
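The two calibration steps in Eqs. (1) and (2) can be illustrated with a minimal sketch. The function names, the panel reflectivity of 0.99, the DN values, and the coefficients a and b below are hypothetical placeholders chosen for illustration; they are not values reported in this study.

```python
def panel_reflectance(dn_target, dn_white, r_white):
    """Eq. (1): target reflectance from target and white-panel digital numbers."""
    if dn_white <= 0:
        raise ValueError("white-panel DN must be positive")
    return dn_target / dn_white * r_white


def empirical_line(r_measured, a, b):
    """Eq. (2): per-band linear correction with gain a and offset b."""
    return a * r_measured + b


# Hypothetical values for a single NIR pixel (12-bit DN range, 0-4095).
r_nir = panel_reflectance(dn_target=1024, dn_white=3200, r_white=0.99)
# Hypothetical band coefficients; in practice a and b come from the
# least-squares fit against field spectrometer data.
r_nir_corrected = empirical_line(r_nir, a=1.05, b=-0.01)
print(round(r_nir, 4), round(r_nir_corrected, 4))
```

In practice, one (a, b) pair would be fitted per spectral band, as stated in the text.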

Following reflectance calibration, spectral smoothing was conducted using the Savitzky–Golay filter (second-order polynomial, window size = 5) to suppress high-frequency noise while preserving key absorption features linked to chlorophyll-a (Li et al., 2019). All features were subsequently normalized to a [0, 1] range using min–max scaling to prevent dominance of high-magnitude variables during model training. Vegetation indices such as NDVI, NDWI, NDPI, and the NIR-to-green ratio were computed to enhance sensitivity to pigment concentration and water content.

Feature extraction

Features were grouped into three categories:

  • Spectral features: raw reflectance (Red, Green, NIR) and indices, see Table 2.

  • Texture features: gray-level co-occurrence matrix (GLCM: contrast, entropy, homogeneity, energy), as shown in Table 3.

  • Statistical features: mean, standard deviation, and percentiles of reflectance and indices.

Table 1.

Libraries and software packages used for image preprocessing, feature extraction, and model implementation

| Module | Tools |
|---|---|
| Imaging | MAPIR Survey3 RGN camera |
| Preprocessing | OpenCV, scikit-image |
| Feature Extraction | NumPy, SciPy, scikit-image |
| ACO Implementation | Custom Python (PyAnt-based) |
| ML Models | scikit-learn (RandomForestRegressor, SVR, XGBoost) |
| Optimization | DEAP (GA), PySwarm (PSO) |
| Visualization | Matplotlib, Seaborn |

Table 2.

Vegetation indices and their mathematical formulations used in biomass estimation

| Index | Formula | Reference |
|---|---|---|
| NDVI (Normalized Difference Vegetation Index) | (NIR − R) / (NIR + R) | (Huang et al., 2021) |
| NDPI (Normalized Difference Pond Index) | (NIR − G) / (NIR + G) | (Xu et al., 2021) |
| NDWI (Normalized Difference Water Index) | (G − NIR) / (G + NIR) | (Shashikant et al., 2021) |
| NIR (NIR Ratio) | NIR / R | (Zeng et al., 2021) |
| FLH (Fluorescence Line Height, simulated) | NIR − R | (Satish et al., 2023) |
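The formulations in Table 2 can be computed directly from calibrated band reflectances. The following is a minimal sketch: the function name, the small `eps` guard against division by zero, and the example reflectance values are assumptions made for illustration.

```python
def vegetation_indices(red, green, nir, eps=1e-12):
    """Indices from Table 2 for one set of band reflectances in [0, 1]."""
    return {
        "NDVI": (nir - red) / (nir + red + eps),
        "NDPI": (nir - green) / (nir + green + eps),
        "NDWI": (green - nir) / (green + nir + eps),
        "NIR_ratio": nir / (red + eps),
        "FLH_sim": nir - red,  # simulated fluorescence line height
    }


# Hypothetical mean reflectances for one tank image.
indices = vegetation_indices(red=0.1, green=0.2, nir=0.4)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Note that NDPI and NDWI, as defined in Table 2, differ only in sign, so they carry the same information for a regression model.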

Ant colony optimization (ACO) for feature and hyperparameter selection

To improve model efficiency and interpretability, ant colony optimization was employed for simultaneous feature selection and hyperparameter tuning of the random forest regressor (RFR). ACO simulates the foraging behavior of ants, where agents traverse a solution space—here, the set of all possible feature combinations and parameter configurations—guided by synthetic pheromone trails and heuristic relevance scores (Abdulghani & Abdulghani, 2024).

Each ant constructs a candidate solution comprising a subset of input features and a set of RFR hyperparameters (number of trees, maximum tree depth, minimum samples per split). The quality (fitness) of each solution is evaluated using tenfold cross-validation root mean squared error (RMSE). The pheromone level associated with each feature or parameter value is updated based on the fitness of the solutions that include them, reinforcing the exploration of more promising regions of the search space. The feature selection probability Pi is defined as

$$P_i = \frac{(\tau_i)^{\alpha} \times (\eta_i)^{\beta}}{\sum_j (\tau_j)^{\alpha} \times (\eta_j)^{\beta}} \tag{3}$$

where τi is the pheromone intensity for feature i, ηi is the heuristic value (e.g., correlation with the target), and α and β control the influence of pheromone and heuristic terms, respectively.
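As an illustration of Eq. (3), the sketch below computes selection probabilities for three candidate features. The function name and the pheromone and heuristic values are hypothetical; α = 1 and β = 2 follow the settings reported in this study.

```python
def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Eq. (3): probability of selecting each feature i."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all selection weights are zero")
    return [w / total for w in weights]


# Hypothetical example: equal initial pheromone, heuristic taken as the
# absolute correlation of each feature with the target.
tau = [1.0, 1.0, 1.0]
eta = [0.8, 0.4, 0.2]
probs = selection_probabilities(tau, eta)
print([round(p, 3) for p in probs])
```

With equal pheromone levels, the probabilities depend only on the heuristic term, so β = 2 makes the most strongly correlated feature four times as likely to be chosen as a feature with half its heuristic value.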

Pheromone evaporation was implemented using the standard decay formula:

$$\tau_{ij}(t+1) = (1 - \rho) \times \tau_{ij}(t) + \Delta\tau_{ij} \tag{4}$$

with ρ set at 0.2 to balance exploration and exploitation. The reinforcement Δτ was proportional to model fitness (1 − R²), rewarding more accurate configurations. The search terminated upon convergence (no improvement over 10 consecutive iterations) or after a maximum of 50 iterations. The final solution included the optimal subset of predictive features and a tuned RFR configuration, offering a compact and biologically interpretable model structure. The ACO parameters were selected based on theoretical analogs to the PSO configuration used by García Nieto et al. (2016). A moderate pheromone evaporation rate (ρ = 0.2), balanced influence weights (α = 1, β = 2), and bounded pheromone levels (τ_min = 10⁻⁴, τ_max = 1.0) were used to ensure robust exploration and convergence. These settings were validated through sensitivity analysis and align with best practices in ACO literature for high-dimensional feature–parameter optimization (Al-Tohamy et al., 2022).
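The evaporation and bounding rules described above can be sketched as a single update step. The function name is hypothetical, and the deposit Δτ is passed in by the caller because the exact scaling of the reinforcement term is not specified here; the defaults for ρ, τ_min, and τ_max follow the reported settings.

```python
def update_pheromone(tau, delta, rho=0.2, tau_min=1e-4, tau_max=1.0):
    """Eq. (4) followed by clamping to the bounds [tau_min, tau_max]."""
    updated = (1.0 - rho) * tau + delta
    return min(max(updated, tau_min), tau_max)


# Hypothetical trail values and deposits.
print(update_pheromone(0.5, 0.1))   # evaporation plus deposit
print(update_pheromone(1.0, 0.5))   # capped at tau_max
print(update_pheromone(1e-4, 0.0))  # floored at tau_min
```

Bounding the trails in this way keeps every feature selectable (no trail decays to zero) and prevents a single feature from dominating the search.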

Model evaluation metrics

The final RFR model was trained using the best hyperparameters found by ACO among the candidate models evaluated. Model performance was assessed using a 5-fold cross-validation strategy to ensure robustness and generalizability. The coefficient of determination (R² score) is the proportion of the variance in the dependent variable that is predictable from the independent variables (Eq. 5), where y_true and y_pred are the true and predicted chlorophyll-a concentrations and ȳ is the mean of the observed concentrations. A higher R² value indicates a better fit.

$$R^2 = 1 - \frac{\sum (y_{\text{true}} - y_{\text{pred}})^2}{\sum (y_{\text{true}} - \bar{y})^2} \tag{5}$$

Mean absolute error (MAE): Measures the average of the absolute differences between the predicted and actual values. A lower MAE indicates better model performance.

$$\text{MAE} = \frac{1}{n} \sum |y_{\text{true}} - y_{\text{pred}}| \tag{6}$$

Mean squared error (MSE): Measures the average of the squared differences between predicted and actual values. This metric emphasizes larger errors.

$$\text{MSE} = \frac{1}{n} \sum (y_{\text{true}} - y_{\text{pred}})^2 \tag{7}$$

Root mean squared error (RMSE): the square root of the MSE.
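The four metrics defined in this section can be computed together, as in the following minimal sketch. The function name and the biomass values (in g/L) are hypothetical and serve only to make the arithmetic concrete.

```python
import math


def regression_metrics(y_true, y_pred):
    """R2 (Eq. 5), MAE (Eq. 6), MSE (Eq. 7) and RMSE for paired values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical observed and predicted biomass values (g/L).
metrics = regression_metrics([0.2, 0.6, 1.0, 1.4], [0.3, 0.5, 1.1, 1.3])
print({k: round(v, 3) for k, v in metrics.items()})
```

Because every residual in this toy example has magnitude 0.1, MAE and RMSE coincide; they diverge as soon as a few large errors are present, which is why both are reported.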

Cross-validation strategy

To ensure model robustness, tenfold cross-validation is employed. For imbalanced biomass classes, stratified cross-validation is used to preserve class distribution in each fold. GridSearchCV and RandomizedSearchCV are also applied during model training, further refined through ACO-based optimization.
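The cross-validated fitness that scores one candidate solution (a feature subset plus an RFR configuration) might look like the sketch below, using scikit-learn. This is an assumption-laden illustration and not the implementation used in this study: the function name `cv_rmse`, the random synthetic data, and the chosen hyperparameter values are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_rmse(X, y, feature_idx, n_estimators, max_depth, min_samples_split,
            n_splits=10, seed=0):
    """Mean k-fold RMSE of a random forest trained on a feature subset."""
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        random_state=seed,
    )
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(
        model, X[:, feature_idx], y, cv=folds,
        scoring="neg_root_mean_squared_error",
    )
    return float(-scores.mean())  # scikit-learn returns negated RMSE


# Synthetic stand-in data: 40 samples, 6 features, target driven by feature 0.
rng = np.random.default_rng(0)
X = rng.random((40, 6))
y = 0.2 + 1.2 * X[:, 0] + 0.05 * rng.standard_normal(40)
rmse_subset = cv_rmse(X, y, feature_idx=[0, 1], n_estimators=50,
                      max_depth=5, min_samples_split=2)
print(round(rmse_subset, 3))
```

An optimizer such as ACO would call a function of this kind once per ant and use the returned RMSE to rank candidate solutions.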

Comparative analysis with metaheuristic ML models

The ACO–RFR model was benchmarked against alternative hybrid models including PSO–SVM, GA–SVR, and ACO–SVR. All models were evaluated using consistent datasets and metrics. Table 6 presents comparative R², RMSE, and MAE values for the baseline models and the alternative hybrid models. ACO–RFR achieved the highest R² (0.96) and lowest RMSE (0.05 g/L), demonstrating superior predictive performance, robustness to noise, and effective feature dimensionality reduction.

Table 3.

Texture features derived from gray-level co-occurrence matrix (GLCM) analysis

| Feature | Description |
|---|---|
| Contrast | Measures local variations in the gray-level co-occurrence matrix; higher values indicate greater texture heterogeneity. |
| Entropy | Quantifies the randomness of pixel intensity distribution; higher values reflect more complex textures. |
| Homogeneity | Assesses the closeness of element distribution to the GLCM diagonal; higher values indicate smoother textures. |
| Energy | Represents textural uniformity, also known as angular second moment; higher values indicate more uniform textures. |
| Correlation | Measures the degree of linear dependency of gray levels between neighboring pixels. |
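To make the texture descriptors in Table 3 concrete, the sketch below builds a symmetric, normalized GLCM for a single pixel offset in plain Python and derives four of the features. It is an illustrative assumption, not the scikit-image routine used in the study; the function name and the tiny test images are hypothetical. Energy is computed here as the angular second moment, following the table.

```python
import math


def glcm_features(image, levels, dx=1, dy=0):
    """Contrast, entropy, homogeneity and energy from a symmetric GLCM.

    `image` is a 2-D list of integer gray levels in [0, levels); (dx, dy)
    is the pixel offset that defines which neighbors are paired.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0  # count each pair in both directions
                counts[j][i] += 1.0  # so that the matrix is symmetric
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs for this offset")
    cells = [(i, j, counts[i][j] / total)
             for i in range(levels) for j in range(levels)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),  # angular second moment
    }


uniform = glcm_features([[0, 0], [0, 0]], levels=2)  # perfectly smooth patch
checker = glcm_features([[0, 1], [1, 0]], levels=2)  # alternating patch
print(uniform)
print(checker)
```

The smooth patch gives zero contrast and entropy with maximal homogeneity and energy, while the alternating patch moves every descriptor in the opposite direction, matching the interpretations listed in Table 3.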

Table 4.

Environmental parameters continuously recorded during the culture experiment

| Parameter | Observed range / average |
|---|---|
| Temperature (°C) | Avg. 30 |
| pH | 6.0–8.0 |
| Humidity (%) | > 80 |
| UV radiation (μW cm⁻²) | 12–16 |
| Solar radiation (kWh m⁻² day⁻¹) | 4.0–5.5 |
| Evapotranspiration (mm day⁻¹) | 4–5 |
| Rainfall (mm) | Variable peak events recorded |
| Day length | ~12 h |

Parameter robustness analysis

Although standard ACO parameter settings are often adopted from the literature, their suitability may vary depending on dataset size and optimization objectives. To ensure rationality, we performed a sensitivity analysis on the three key ACO parameters: pheromone evaporation rate (ρ), pheromone influence (α), and heuristic influence (β), consistent with best practices for adapting optimization parameters to biological datasets (Li et al., 2019).

  • ρ (pheromone evaporation): Tested in the range 0.1–0.5. Lower ρ values (0.1–0.2) yielded stable but slower convergence, while higher values (≥0.5) caused unstable oscillations. The chosen ρ = 0.2 provided a balance between stability and convergence speed.

  • α (pheromone influence): Tested between 0.5 and 2.0. α = 1.0 consistently gave the most stable convergence. Higher α (≥ 2.0) led to over-reliance on pheromone trails and premature stagnation, while lower α (≤0.5) reduced exploitation efficiency.

  • β (heuristic influence): Tested between 1.0 and 3.0. β = 2.0 offered the fastest convergence with minimal RMSE. Lower β weakened heuristic guidance, whereas higher β (≥3.0) introduced noisy oscillations.

Together, these results confirmed that the selected parameter set (ρ = 0.2, α = 1, β = 2) provides a robust trade-off between exploration and exploitation, making it suitable for joint feature–hyperparameter optimization under small-sample conditions. To assess robustness under varying cultivation conditions, model performance was further stratified by key environmental parameters. Subsets were created for (i) high temperature (>35 °C), (ii) moderate temperature (25–35 °C), (iii) high turbidity (>threshold NTU), and (iv) low turbidity.

Outlier screening and multicollinearity

To ensure data quality and model robustness, we screened the dataset for both multicollinearity and outliers before training. Multicollinearity, common in multispectral indices where predictors share overlapping spectral bands (e.g., NDVI and NIR reflectance), was assessed using correlation matrices and variance inflation factor (VIF) checks. Redundant predictors were reduced through ACO feature selection, ensuring only the most informative variables were retained. Outliers were identified using complementary statistical methods: values beyond 1.5 × IQR or with absolute standardized z-scores greater than 3 were flagged as abnormal. Each case was cross-checked against field logs, with sensor-related anomalies (e.g., overexposure, shading, or probe malfunctions) removed, while biologically plausible extremes such as peak biomass events were retained to preserve ecological variability. This preprocessing step minimized noise from acquisition artifacts while maintaining the integrity of natural system dynamics, providing a more reliable foundation for the ACO–RFR model. Table 1 lists the libraries used in this study.
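The two outlier criteria described above can be combined as in the following sketch, which flags a value if it falls outside the 1.5 × IQR fences or has an absolute z-score above 3. The function name, the quartile interpolation method, and the example readings are assumptions for illustration; flagged values would still need the manual cross-check against field logs.

```python
import statistics


def flag_outliers(values, iqr_factor=1.5, z_limit=3.0):
    """Return one boolean per value: True if flagged by either criterion."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    flags = []
    for v in values:
        z = (v - mean) / sd if sd > 0 else 0.0
        flags.append(v < low or v > high or abs(z) > z_limit)
    return flags


# Hypothetical NDVI readings with one overexposed frame (2.00).
readings = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.51, 0.50, 2.00]
flags = flag_outliers(readings)
print([v for v, f in zip(readings, flags) if f])
```

In small samples the z-score criterion alone can miss an extreme value, because the outlier inflates the standard deviation; here the anomalous reading is caught by the IQR fence, which is one reason to use the two criteria together.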

Table 5.

Frequency of feature selection across cross-validation folds in the ACO–RFR model

| Rank | Feature | Type | Importance (%) |
|---|---|---|---|
| 1 | NDVI | Spectral index | 19.6 |
| 2 | GLCM Entropy | Texture | 13.2 |
| 3 | NIR Reflectance | Spectral band | 11.8 |
| 4 | Microalgae area | Morphological | 9.1 |
| 5 | Light intensity | Environmental | 4.8 |

Results and analysis

Biomass sampling and calibration

Biomass ground-truth data were collected concurrently with multispectral imaging. Daily subsamples (50 mL) were extracted from surface, mid-depth, and bottom layers of the tank using a sterile sampling tube to ensure vertical representativeness, as shown in Fig. 1b. Each subsample was transferred to a cuvette and analyzed using a spectrophotometer at 680 nm to estimate optical density (OD), which serves as a proxy for chlorophyll-a concentration (Cadondon et al., 2022).

To establish a quantitative relationship between OD and biomass, additional aliquots were filtered, oven-dried at 70 °C for 24 h, and weighed to determine dry weight (g/L) (Wychen et al., 2021). A linear calibration model was developed from these paired OD and dry weight measurements (R2 > 0.97), allowing subsequent conversion of daily OD values to biomass concentrations. The calibration relationship is illustrated in Fig. 3 and summarized in Table 3X (see Appendix). Each spectral image was matched to its corresponding biomass value collected within a 5-min window, ensuring high temporal alignment between predictors and response variables (Guillevic et al., 2017).
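The OD-to-biomass conversion described above amounts to an ordinary least-squares line fitted to paired measurements. The sketch below shows the mechanics only: the function name and the paired OD and dry-weight values are made up for illustration and are not the calibration data of this study.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative (not measured) OD680 readings and dry weights in g/L.
od = [0.2, 0.4, 0.6, 0.8, 1.0]
dry_weight = [0.25, 0.45, 0.65, 0.85, 1.05]
slope, intercept = fit_line(od, dry_weight)

# Convert a new daily OD reading into an estimated biomass concentration.
biomass_estimate = slope * 0.5 + intercept
print(round(slope, 3), round(intercept, 3), round(biomass_estimate, 3))
```

Estimates of this kind are only reliable within the OD range covered by the calibration samples.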

Fig. 3.


Calibration of optical density (OD₆₈₀) against measured dry biomass (g L⁻¹). The fitted regression was used to convert daily OD measurements into biomass estimates (Havlik et al., 2022; Schagerl et al., 2022)

Apparent correlation in unnormalized data is dominated by lighting artefacts

Initial analysis using unnormalized multispectral values yielded deceptively strong coefficients of determination for the red (R² = 0.49) and NIR (R² = 0.49) channels when correlated against biomass, as shown in Figure 4(a,c,e). This apparent strong relationship is, however, likely a measurement artefact. Unnormalized pixel intensity is a product of both the subject's reflective properties and the incident illumination intensity. The high correlation suggests that the dominant variable captured was not algal colour, but rather variation in lighting conditions or camera exposure settings that co-varied with biomass accumulation during the experiment. This effect masks the true biological signal and renders the unnormalized data misleading for quantitative analysis.

Fig. 4.


Representative reflectance spectra acquired from Chlorella sorokiniana cultures at different growth stages, showing the spectral differences across green, red, and near-infrared bands

Normalization reveals underlying

After normalization to correct for variability in incident light, the coefficients of determination for the relative reflectance values changed significantly: R2 normalized = (0.26, −0.32, 0.36) for red, green, and NIR channels, respectively, as shown in Fig. 4b, d, f. The reduction in absolute R2 value strength indicates the successful removal of the dominant lighting variance, revealing the weaker but biologically meaningful signal. The positive correlation between red and NIR channels is consistent with known pigment absorption profiles. Crucially, the negative correlation observed in the green channel (R2 = −0.32) provides strong validation of the method. This result aligns with the expected optical behavior of dense chlorophyll-containing cultures: while chlorophyll reflects green light, causing a green appearance, increased biomass leads to greater self-shading and reabsorption of photons, thereby reducing the relative proportion of green reflectance. This finding confirms that the normalization procedure successfully isolated the intrinsic reflective properties of the algal culture from external confounding variables.

Environmental data and statistics

During the cultivation period, environmental parameters were continuously monitored. Daily averages included temperature ≈ 30 °C, pH = 6.0–8.0, and relative humidity > 80%. Solar radiation averaged 4.0–5.5 kWh m⁻² day⁻¹, with UV intensity peaking at 12–16 µW cm⁻² during February–April, as shown in Table 4 and Fig. 5. These conditions reflect Johor’s equatorial climate with high insolation and frequent rainfall events. Tables 2 and 3 show the vegetation indices and texture features, respectively.

Table 6.

Comparative performance of baseline models (RF, XGBoost, SVR, ANN) and the hybrid ACO–RFR. Metrics include coefficient of determination (R2), root mean squared error (RMSE), and mean absolute error (MAE)

| Model | R² | RMSE (g/L) | MAE (g/L) |
|---|---|---|---|
| RF | 0.82 | 0.145 | 0.118 |
| XGBoost | 0.84 | 0.138 | 0.112 |
| SVR | 0.78 | 0.162 | 0.129 |
| ANN | 0.81 | 0.151 | 0.121 |
| ACO–RFR | 0.92 | 0.108 | 0.081 |
| ACO–SVR (optimized) | 0.91 | 1.78 | 1.42 |
| PSO–SVM (optimized) | 0.89 | 1.95 | 1.56 |
| ACO–RFR (optimized) | 0.96 | 0.05 | 0.04 |

Fig. 5.


Environmental data charts measured during the cultivation periods

For each multispectral image, reflectance values and derived vegetation indices (e.g., NDVI, NDPI, NDWI, NIR/Green ratio) were summarized using descriptive statistics, including the mean, standard deviation, and percentiles (25th, 50th, and 75th). The mean represents the overall reflectance or index level of the tank, serving as a proxy for average biomass status. The standard deviation quantifies spatial variability, with higher values indicating heterogeneous or patchy biomass distribution, and lower values reflecting uniform growth. Percentiles capture the distribution shape beyond the mean, allowing identification of extreme conditions within the tank (e.g., sparse vs. dense biomass regions). For example, the 25th percentile highlights low-biomass zones, while the 75th percentile emphasizes dense biomass regions. These statistics collectively provided the machine learning models with both central tendency and distributional characteristics, ensuring that predictions accounted for overall biomass levels as well as spatial heterogeneity within cultivation systems.
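The per-image summary described above can be sketched as follows. The function name, the use of the population standard deviation, the quartile interpolation method, and the pixel values are assumptions for illustration.

```python
import statistics


def summarize_pixels(values):
    """Mean, standard deviation and quartiles of one feature over a tank image."""
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.pstdev(values),  # spread across pixels in the image
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


# Hypothetical NDVI values sampled from pixels inside the tank region.
summary = summarize_pixels([0.1, 0.2, 0.3, 0.4, 0.5])
print({k: round(v, 3) for k, v in summary.items()})
```

Each image therefore contributes five numbers per band or index, which is how a three-band camera can yield a comparatively large pool of candidate predictors.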

Chlorophyll-a and spectral features

Of the 200 multispectral images collected, 40 were from the C. sorokiniana cultivation and were paired with biomass ground truth (0.19–1.44 g L⁻¹; mean = 0.90 ± 0.33). Feature extraction yielded 49 predictors: 14 spectral indices (e.g., NDVI, NDPI, SABI, NDWI, NIR/green), 15 texture features (GLCM, LBP), nine morphological descriptors, and 11 environmental parameters. In addition to mean reflectance and indices, the standard deviation and percentiles of each feature captured spatial heterogeneity within the tank. Tanks with low SD and narrow interquartile ranges corresponded to uniform biomass growth, whereas high SD and wide percentile spreads reflected patchy biomass distribution or surface aggregation. This information allowed the model to differentiate between tanks with similar NDVI values but contrasting spatial patterns, improving predictive accuracy. Correlation analysis indicated that NDVI, FLH, and GLCM entropy were strongly associated with biomass (r > 0.6), while turbidity and GLCM contrast showed moderate correlations (r ≈ 0.4), as shown in Fig. 6.

Fig. 6.


Distribution of spectral, textural, and environmental features after preprocessing. Outliers were screened using IQR and z-score criteria, with biologically implausible values removed and retained points reflecting natural variability

Feature importance

Random forest feature importance identified NDVI (19.6%), GLCM entropy (13.2%), and NIR reflectance (11.8%) as the strongest predictors. Morphological descriptors (area, perimeter, circularity) and environmental parameters (light, pH, turbidity) contributed modestly. Table 5 shows the ranked feature importance of the ACO–RFR algorithm, and Fig. 7a shows the corresponding plot. In Fig. 7b, each point represents a sample, with color indicating the magnitude of the feature value (red = high, blue = low). Features are ranked by their average absolute SHAP value, showing that NDVI, NIR reflectance, and GLCM entropy had the strongest influence on biomass prediction, followed by microalgae area and light intensity. Positive SHAP values indicate a contribution to higher biomass predictions, while negative values indicate the opposite.

Fig. 7.


(a) Feature selection frequency across cross-validation folds using the ACO–RFR framework. Stable core features, including NDVI, NIR, and GLCM contrast, were consistently selected in ACO–RFR. (b) SHAP summary plot showing the relative contributions of spectral, texture, and environmental features in the ACO–RFR model. NDVI, NIR, and temperature were the dominant predictors

Feature selection and model optimization

Ant colony optimization (ACO) effectively reduced the dimensionality of the feature set while tuning random forest regressor (RFR) hyperparameters. From an initial pool of 49 candidate features (spectral indices, reflectance statistics, texture, morphology, and environmental variables), ACO consistently selected 12–15 features across iterations. The most frequently chosen predictors included NDVI, NIR/green ratio, fluorescence line height (FLH), GLCM entropy, and turbidity, highlighting their strong relevance to biomass variability. Less informative features, such as redundant reflectance bands and shape descriptors, were systematically excluded, thereby improving model parsimony, as shown in Table 3. Using 5-fold cross-validation on the 40 valid samples, the optimized ACO–RFR achieved:

  • R² = 0.94 (95% CI 0.91–0.97)

  • RMSE = 0.06 g L⁻¹ (95% CI 0.05–0.08)

  • MAE = 0.04 g L⁻¹

Compared with baseline RFR (using all features and default hyperparameters), the ACO–RFR reduced error by ~18% and improved stability across folds, confirming the benefit of joint feature–hyperparameter optimization under small-sample conditions. Figure 8a–c shows fitness convergence of ACO–RFR, pheromone heatmap of feature selection, and RMSE convergence curve, respectively.

Fig. 8.


Sensitivity analysis of ACO parameters. a Effect of pheromone evaporation rate (ρ) on convergence; b effect of α and β on solution stability; c convergence trends across iterations

Performance comparisons with baseline models

To evaluate the proposed ACO–RFR framework, we benchmarked its performance against baseline models including random forest (RF), XGBoost, support vector regression (SVR), and artificial neural networks (ANN), as shown in Table 6. The standalone RF achieved an R² of 0.82 with an RMSE of 0.145 g/L, while XGBoost marginally improved accuracy (R² = 0.84, RMSE = 0.138 g/L). SVR performed less favorably, yielding an R² of 0.78 and RMSE of 0.162 g/L, indicating limitations in handling nonlinear spectral–environmental interactions. ANN captured nonlinearities better than SVR but displayed instability across cross-validation folds. The ACO–RFR hybrid significantly outperformed all baselines, achieving an R² of 0.92, RMSE of 0.108 g/L, and MAE of 0.081 g/L. This enhancement arises from the ant colony optimization (ACO) algorithm’s capacity to identify the most informative feature subsets and tune RF hyperparameters simultaneously, thereby reducing feature redundancy and enhancing generalization. Predicted versus observed scatter plots showed that ACO–RFR predictions clustered more tightly along the 1:1 line, with fewer systematic deviations compared to the broader dispersion observed in baseline models. Residual error analysis confirmed that the hybrid approach reduced both variance and bias, demonstrating superior predictive stability. Collectively, these results confirm that the ACO–RFR framework provides a more reliable and scalable solution for microalgae biomass estimation than conventional machine learning approaches.

Table 7.

Sensitivity of ACO parameters (ρ, α, β) on model convergence and performance metrics

Parameter setting R² (95% CI) RMSE (g L⁻¹, 95% CI)
ρ = 0.1 0.93 (0.90–0.95) 0.062 (0.056–0.068)
ρ = 0.2 0.94 (0.91–0.96) 0.060 (0.054–0.066)
ρ = 0.3 0.94 (0.91–0.96) 0.061 (0.055–0.067)
ρ = 0.5 0.93 (0.89–0.95) 0.063 (0.057–0.069)
α = 0.5, β = 2.0 0.92 (0.89–0.94) 0.066 (0.060–0.072)
α = 1.0, β = 2.0 0.94 (0.91–0.96) 0.060 (0.054–0.066)
α = 2.0, β = 2.0 0.93 (0.90–0.95) 0.063 (0.057–0.069)
α = 1.0, β = 1.0 0.92 (0.88–0.94) 0.065 (0.059–0.071)
α = 1.0, β = 3.0 0.93 (0.90–0.95) 0.064 (0.058–0.070)

Prediction accuracy

The ACO–RFR outperformed baseline (e.g., RFR) and hybrid models (e.g., PSO–SVM). Scatterplots showed predictions clustering around the 1:1 line, while residual analysis indicated homoscedasticity with minimal bias. ANOVA confirmed significant differences in RMSE across models (F(3, 36) = 15.7, p = 0.05). Post-hoc Tukey’s HSD indicated that ACO–RFR significantly outperformed all comparators, as shown in Table 6. Figure 9a shows predicted versus observed biomass, and Fig. 9b shows the residual distributions across models. Figure 10 is a Taylor diagram summarizing model performance.
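The degrees of freedom reported above, F(3, 36), correspond to four groups and 40 observations in total. The sketch below shows how a one-way ANOVA F statistic is computed for such a layout. It assumes four models with ten RMSE values each, and the per-fold values are synthetic placeholders, so the resulting F is not expected to match the reported 15.7.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Synthetic per-fold RMSE values (assumption): 4 hypothetical models x 10 folds
offsets = [-0.010, 0.010, -0.005, 0.005, 0.0, -0.010, 0.010, -0.005, 0.005, 0.0]
base_rmse = {"model_A": 0.15, "model_B": 0.14, "model_C": 0.16, "model_D": 0.11}
groups = [[base + o for o in offsets] for base in base_rmse.values()]

f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

In practice a library routine such as `scipy.stats.f_oneway` would typically be used, since it also returns the p-value.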

Fig. 9.

a Predicted versus observed biomass for baseline models (RF, XGBoost, SVR, ANN) and the hybrid ACO–RFR. The dashed 1:1 line indicates perfect agreement. ACO–RFR predictions cluster more tightly along this line compared with baselines. b Residual distributions across models, demonstrating reduced variance and bias for ACO–RFR

Fig. 10.

Taylor diagram summarizing model performance. Radial distance represents standard deviation relative to observed values; angle corresponds to correlation coefficient. ACO–RFR is positioned closest to the reference, reflecting higher correlation and variance fidelity than baseline models
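The quantities plotted in a Taylor diagram are linked by a law-of-cosines identity: the squared centered RMS difference equals σ_obs² + σ_pred² − 2·σ_obs·σ_pred·R. The sketch below computes these statistics and checks the identity. The function name and the sample values are illustrative assumptions, not data from this study.

```python
import math

def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered_rms_difference)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return std_o, std_p, corr, crmsd

# Made-up biomass values in g/L (assumption)
observed = [0.20, 0.40, 0.60, 0.80, 1.00]
predicted = [0.24, 0.37, 0.66, 0.74, 1.03]
std_o, std_p, corr, crmsd = taylor_statistics(observed, predicted)

# Law-of-cosines relation underlying the diagram geometry
identity_rhs = std_o ** 2 + std_p ** 2 - 2 * std_o * std_p * corr
print(abs(crmsd ** 2 - identity_rhs) < 1e-12)
```

A model sits closer to the reference point when its standard deviation matches the observations and its correlation approaches 1, which drives the centered RMS difference toward zero.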

Sensitivity analysis of ACO parameters

A systematic sensitivity analysis was conducted to examine the effect of key ACO hyperparameters, namely the pheromone evaporation rate (ρ), pheromone influence (α), and heuristic influence (β), on the stability and predictive accuracy of the ACO–RFR framework. Across the full range of tested values, predictive performance was stable, with R2 consistently between 0.93 and 0.94 and RMSE within 0.060–0.066 g L−1, as shown in Table 6. However, convergence dynamics were more sensitive to parameter variation.

The pheromone evaporation rate (ρ) primarily controlled the trade-off between convergence speed and stability. A low value (ρ = 0.1) delayed convergence, requiring ~40 iterations, whereas a high value (ρ ≥ 0.5) induced oscillatory behavior and unstable optimization. The intermediate setting ρ = 0.2 achieved the best balance, converging smoothly in ~30 iterations with the highest stability. The weighting parameters α and β modulated the balance between pheromone reinforcement and heuristic guidance. An α of 1.0 avoided the stagnation associated with stronger pheromone reliance (α = 2.0) while still preventing the inefficient random exploration seen at α < 1. Similarly, β = 2.0 provided effective heuristic weighting, outperforming both weaker guidance (β = 1.0) and over-emphasis (β = 3.0), which led to instability. This configuration (α = 1.0, β = 2.0) yielded the most consistent improvements in both convergence stability and predictive accuracy, as shown in Table 7.

Collectively, these findings confirmed that the adopted configuration (ρ = 0.2, α = 1.0, β = 2.0) was not arbitrary but empirically justified. The chosen settings aligned with prior reports that moderate pheromone decay and heuristic weighting strike a balance between exploration and exploitation in high-dimensional optimization and are particularly suitable for small-sample, feature-redundant remote sensing problems (Somvanshi et al., 2025).
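To make the roles of ρ, α, and β concrete, the sketch below shows the textbook ACO rules in a feature-selection setting: a feature is chosen with probability proportional to τ^α · η^β, and pheromone is updated as τ ← (1 − ρ)·τ + Δτ. This is a generic illustration, not necessarily the exact update scheme used in this study. The function names, heuristic scores, and deposit values are hypothetical.

```python
def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Probability of picking each feature: proportional to tau^alpha * eta^beta."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    return [w / total for w in weights]

def update_pheromone(pheromone, deposits, rho=0.2):
    """Evaporate a fraction rho of each trail, then add the new deposits."""
    return [(1.0 - rho) * t + d for t, d in zip(pheromone, deposits)]

# Three hypothetical features with equal initial pheromone
pheromone = [1.0, 1.0, 1.0]
heuristic = [0.9, 0.5, 0.1]   # assumed relevance scores, for illustration only
probs = selection_probabilities(pheromone, heuristic)

# Suppose the best ant used only the first feature and deposits 0.5 on it
pheromone = update_pheromone(pheromone, deposits=[0.5, 0.0, 0.0])
print(probs, pheromone)
```

A larger ρ erases accumulated trails faster, which is consistent with the oscillation reported at ρ ≥ 0.5, while a larger α makes the search follow existing trails more strongly, consistent with the stagnation reported at α = 2.0.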

Robustness across sampling conditions

Performance remained consistent across environmental stratifications (e.g., turbidity, temperature), with R² = 0.91–0.96 and RMSE = 0.05–0.10 g L⁻¹.

Robustness under environmental variability

While overall model performance ranged from R2 = 0.91–0.96, additional stratified analyses were conducted to evaluate robustness under varying environmental conditions. Data were grouped according to temperature (moderate: 25–35 °C; high: >35 °C) and turbidity (low vs. high, with the threshold defined by NTU measurements). The ACO–RFR model consistently maintained high predictive performance across all groups, as shown in Table 8. Under moderate temperature and low turbidity, performance was optimal (R2 = 0.94–0.95; RMSE ≈ 0.058–0.060 g L−1). Performance decreased slightly under extreme conditions, including high temperature (>35 °C; R2 = 0.91, RMSE = 0.070 g L−1) and high turbidity (R2 = 0.92, RMSE = 0.068 g L−1). These reductions are attributable to physiological stress and increased light scattering, respectively. Importantly, R2 remained above 0.90 in all scenarios, demonstrating strong model robustness.

Table 8.

Robustness of ACO–RFR under different environmental conditions. Results are stratified by turbidity and temperature thresholds

Condition R² RMSE (g L⁻¹) MAE (g L⁻¹)
Turbidity < 15 NTU 0.93 0.105 0.080
Turbidity ≥ 15 NTU 0.88 0.125 0.095
Temperature < 35 °C 0.94 0.098 0.076
Temperature ≥ 35 °C 0.87 0.130 0.099
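A stratified evaluation of this kind can be sketched as follows. The thresholds (15 NTU and 35 °C) are taken from Table 8. The record layout, field names, function names, and sample values are hypothetical and serve only to show the grouping logic.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (observed, predicted) pairs."""
    if not pairs:
        raise ValueError("stratum is empty")
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

def stratified_rmse(records, key, threshold):
    """Split records at `threshold` on field `key` and score each stratum."""
    below = [(r["observed"], r["predicted"]) for r in records if r[key] < threshold]
    above = [(r["observed"], r["predicted"]) for r in records if r[key] >= threshold]
    return {"below": rmse(below), "at_or_above": rmse(above)}

# Made-up samples (assumption): biomass in g/L with field conditions
records = [
    {"turbidity_ntu": 8,  "temperature_c": 28, "observed": 0.40, "predicted": 0.42},
    {"turbidity_ntu": 12, "temperature_c": 31, "observed": 0.60, "predicted": 0.58},
    {"turbidity_ntu": 20, "temperature_c": 36, "observed": 0.80, "predicted": 0.86},
    {"turbidity_ntu": 25, "temperature_c": 38, "observed": 1.00, "predicted": 0.94},
]

by_turbidity = stratified_rmse(records, "turbidity_ntu", threshold=15)
by_temperature = stratified_rmse(records, "temperature_c", threshold=35)
print(by_turbidity, by_temperature)
```

Scoring each stratum separately, rather than pooling all samples, is what exposes a condition-specific loss of accuracy that an overall metric would average away.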

Discussion

Model accuracy and feature reduction

The ACO–random forest framework developed in this study predicted algal biomass from multispectral images with high accuracy, with R2 values reaching up to 0.987 and RMSE as low as 0.05 g/L (Dada et al., 2025). The results show that the model captures complex relationships between the input features and the chlorophyll-a content of microalgae under real-world conditions. Notably, the integration of ACO significantly reduced model dimensionality, selecting a compact set of ~14 features from an initial pool of 49, representing an approximate 65% reduction in complexity. This is particularly advantageous for real-time deployment, as it reduces computational load without sacrificing predictive power.

The synergy between ACO and RFR facilitated both performance optimization and model interpretability. By concentrating on biologically important features, such as NDVI, red-edge reflectance ratios, GLCM entropy, and morphological area, the model remained accurate and became easier to interpret. These features correspond to well-established indicators of photosynthetic pigment concentration and biomass structure, consistent with previous studies (Havlik et al., 2022; Wu et al., 2024). The ability to track how often features are chosen using pheromone matrices also makes the process more transparent and easier to reproduce, which are important qualities for environmental monitoring and regulatory decision-making (Abdulghani & Abdulghani, 2024; Khalil et al., 2024; Wang et al., 2024).

Biological interpretation of selected features

The features most frequently selected by the ACO algorithm align well with the known biological factors that govern microalgal growth and chlorophyll production. For example, high NDVI and NIR/red ratios indicate strong chlorophyll absorption in the red band and high reflectance in the NIR, which is typical of dense algal populations. GLCM entropy and area-based shape descriptors offer structural information regarding canopy heterogeneity and morphological expansion during exponential growth phases. Environmental parameters such as light intensity, pH, and turbidity were also selected in several iterations, suggesting their modulating influence on biomass accumulation, particularly in outdoor, semi-controlled environments. This alignment reinforces the model’s biological validity and confirms that its performance is not driven solely by statistical patterns but by meaningful physiological correlates. The transparency provided by ACO increases trust in the model’s predictions and helps generate new hypotheses for research on algae in natural or industrial settings, such as identifying light levels that indicate early algal growth or the onset of nutrient limitation (Schagerl et al., 2022; Cadondon et al., 2022; Bai et al., 2022).
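The spectral and texture features named above follow standard definitions: NDVI = (NIR − Red) / (NIR + Red), the simple ratio NIR / Red, and GLCM entropy as the Shannon entropy of the normalized co-occurrence matrix. The sketch below illustrates them. The reflectance values and the small co-occurrence matrix are made up, and the base-2 logarithm is an assumption, since entropy is also commonly defined with the natural logarithm.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index from band reflectances."""
    return (nir - red) / (nir + red)

def nir_red_ratio(nir, red):
    """Simple ratio of NIR to red reflectance."""
    return nir / red

def glcm_entropy(glcm):
    """Shannon entropy (in bits) of a gray-level co-occurrence count matrix."""
    total = sum(sum(row) for row in glcm)
    entropy = 0.0
    for row in glcm:
        for count in row:
            if count > 0:                 # empty cells contribute nothing
                p = count / total
                entropy -= p * math.log2(p)
    return entropy

# Hypothetical reflectances for a dense culture: low red, high NIR
print(ndvi(nir=0.45, red=0.05), nir_red_ratio(nir=0.45, red=0.05))

# A uniform 2x2 co-occurrence matrix has the maximum entropy for 4 cells
print(glcm_entropy([[1, 1], [1, 1]]))
```

Stronger chlorophyll absorption lowers red reflectance, which raises both NDVI and the NIR/red ratio, while a more heterogeneous texture spreads the co-occurrence counts over more cells and raises the entropy.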

Comparative evaluation with alternative methods

The ACO–RFR model performed better in both accuracy and reliability than baseline regressors and other hybrid methods such as GA–SVR and PSO–SVM. While the GA–RFR model achieved competitive results in isolated trials, its performance was more sensitive to feature redundancy and computational cost. Similarly, PSO-based models exhibited faster convergence in continuous domains but were prone to early stagnation, resulting in reduced generalizability. The discrete, pheromone-driven search behavior of ACO proved more effective in balancing exploration and exploitation over a high-dimensional feature space, leading to more stable outcomes across repeated runs (Maraveas et al., 2023; Dada et al., 2025; García Nieto et al., 2016).

The hybrid ACO–RFR framework performed better than the individual RFR and SVR models during both training and testing, showing that adaptive feature selection is important for effective biomass estimation. While previous studies have used grid search or random search for hyperparameter optimization, these approaches lack the adaptivity of swarm intelligence-based methods, particularly under the noisy or nonlinear feature distributions common in outdoor cultivation environments (Sahu et al., 2023; Mahmoudzadeh et al., 2024).

Scalability and practical implications

The lightweight final model, along with cost-effective imaging hardware (MAPIR RGN camera), makes the ACO–RFR pipeline suitable for real-time, edge-based deployment. Once trained, the random forest model performs inference in under one second per image on standard hardware, enabling its use in autonomous monitoring systems such as UAVs, floating sensors, or smart bioreactors. This is particularly valuable for high-throughput applications in aquaculture, water quality monitoring, and biomass optimization, where rapid and non-invasive assessment is critical (Amoriello et al., 2024; Havlik et al., 2022; Ricardo et al., 2019).

Although the ACO algorithm itself is computationally intensive during training—due to its iterative pheromone update process and multi-agent search—the optimization is performed offline and need not be repeated during inference. This separation of training and deployment phases allows the use of powerful search heuristics like ACO without compromising on real-time performance (Kim et al., 2025).

Limitations and future directions

Despite its strong performance, the proposed model has several limitations. First, the training data came mostly from two common microalgae species (Chlorella vulgaris and C. sorokiniana) grown in a controlled setting, so the model may not perform as well in natural waters or with other species. Future research should test the model across a range of species, nutrient levels, and environments, such as turbid rivers, nutrient-rich lakes, and large-scale photobioreactors.

Second, although the data-quality steps (such as white-panel reflectance correction and regression-based adjustments) were beneficial, they added processing stages that could hinder scale-up. Adding onboard light sensors, automatic calibration routines, or machine learning methods for illumination correction would make the system more reliable in real-world situations. Finally, the current model is temporally static and does not incorporate sequential learning or memory of previous growth stages. Time-series models, such as recurrent neural networks (RNNs) or transformer-based architectures, could improve the tracking of growth dynamics and help predict future biomass levels from past data (Hakala et al., 2018; Özkan et al., 2023; Geogdzhayev et al., 2021).

Conclusion

This study presents a robust, interpretable, and scalable framework for algae biomass estimation by integrating multispectral imaging with a biologically grounded hybrid optimization model. The proposed ant colony optimization–random forest (ACO–RFR) approach simultaneously addresses three major challenges in biomass estimation: high-dimensional feature redundancy, nonlinear spectral-biological relationships, and the complexity of model tuning. By jointly selecting informative features and tuning hyperparameters, the ACO–RFR model achieved high predictive accuracy (R2 = 0.96–0.987) while reducing the number of inputs by more than 60%. This balance of accuracy and simplicity positions the framework as a viable solution for operational algae monitoring, particularly in semi-controlled aquaculture systems or environmental research deployments.

The selected features were consistent with established knowledge of chlorophyll light absorption and plant growth, supporting the model’s ecological relevance. Additionally, the use of affordable imaging tools and fast data processing makes the framework easy to integrate into real-time decision-making systems such as floating sensors, drones, or smart photobioreactors. Beyond prediction, the model’s interpretability and generalizability make it suitable for research and industrial applications requiring transparency, reproducibility, and adaptability to changing conditions. While currently validated under mesocosm settings, the approach can be extended to diverse species, environmental gradients, and larger spatial scales (Aghelpour et al., 2023).

Future work will aim to test the model in real-world and larger systems, improve its ability to predict changes over time, and look into ways to automatically correct reflectance measurements. The integration of hyperspectral sensors, advanced texture analysis, or deep learning techniques may further enhance model precision and scalability. In summary, this work contributes a novel, high-performance, and biologically interpretable modelling paradigm for non-destructive algae biomass estimation—bridging the gap between ecological monitoring and practical aquaculture management.

Acknowledgements

The authors gratefully acknowledge the financial support provided by Universiti Teknologi Malaysia through the following research grants: Vote No. R.J130000.7723.4J626 and R.J130000.7323.4J686.

APPENDIX

Table 3x.

Biomass Calibration from optical density measurements at 680 nm and dry weight biomass (g/L) for chlorophyll-a estimation

No OD@680nm Measured biomass (g/L) Predicted biomass (g/L)
1 0.3 0.15 0.17
2 0.5 0.32 0.32
3 0.7 0.49 0.27
4 0.9 0.66 0.62
5 1.1 0.83 0.77
6 1.3 1.0 0.92
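As an illustration of an OD-to-biomass calibration, the sketch below fits an ordinary least-squares line to the OD@680nm and measured-biomass columns of Table 3x. Whether the authors' calibration used this exact functional form is an assumption. The "Predicted biomass" column is not reproduced or used here, and the helper name `fit_line` is hypothetical.

```python
# OD at 680 nm and measured dry-weight biomass (g/L), copied from Table 3x
od680 = [0.3, 0.5, 0.7, 0.9, 1.1, 1.3]
measured = [0.15, 0.32, 0.49, 0.66, 0.83, 1.0]

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(od680, measured)
print(f"biomass = {slope:.3f} * OD680 + ({intercept:.3f})")
# slope is approximately 0.85 and intercept approximately -0.105
```

The tabulated measured values lie on this line, so the fitted coefficients reproduce the measured column of Table 3x.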

Table 4x.

Optimized Random Forest hyperparameters selected by the ACO algorithm

Hyperparameter Description Optimal value
n_estimators Number of trees 300
max_depth Maximum tree depth 12
min_samples_split Minimum samples per split 2
min_samples_leaf Minimum samples per leaf 1
max_features Features per split sqrt
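The hyperparameter names in Table 4x match the arguments of scikit-learn's `RandomForestRegressor`, so the optimized configuration could be instantiated as in the sketch below. The use of scikit-learn, the `random_state` value, and the synthetic training data are assumptions made for illustration. The 14 placeholder columns mirror the approximate number of selected features reported in the Discussion.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hyperparameter values from Table 4x; random_state is an added assumption
model = RandomForestRegressor(
    n_estimators=300,
    max_depth=12,
    min_samples_split=2,
    min_samples_leaf=1,
    max_features="sqrt",
    random_state=0,
)

# Synthetic placeholder data (assumption): 40 samples x 14 features
rng = np.random.default_rng(0)
X = rng.random((40, 14))
y = 0.8 * X[:, 0] + 0.1 * rng.random(40)   # toy biomass-like target

model.fit(X, y)
pred = model.predict(X[:5])
print(pred.shape)
```

With `max_features="sqrt"`, each split considers only a random subset of the features, which decorrelates the trees and is one reason a compact, ACO-selected feature set can still generalize.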

Author contribution

Mohamad Shukri bin Zainal Abidin: Conceptualization, Formal analysis, Project administration, Funding acquisition, Review and editing. Keshinro Kazeem Kolawole: Conceptualization, Investigation, Writing – original draft. Mohd Farizal bin Kamaroddin: Methodology, Validation, Data curation. Muhammad Sharul Azwan bin Ramli: Data curation, Writing – original draft. Sikudhan Lucas Mpuhus: Methodology, Data curation, Writing – original draft. Ardiansyah Rizqi: Data curation, Writing – original draft. All authors reviewed this manuscript.

Funding

Open access funding provided by The Ministry of Higher Education Malaysia and Universiti Teknologi Malaysia. This research work was supported by Universiti Teknologi Malaysia through two research funds, Vote No. R.J130000.7723.4J626 and R.J130000.7323.4J686.

Data availability

The datasets generated and analyzed during this study are available from the corresponding author on reasonable request. Multispectral image data and biomass measurements supporting the findings are not publicly archived due to size and institutional restrictions but can be shared for academic purposes upon request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Abdulghani, B. A., & Abdulghani, M. A. (2024). A comprehensive review of ant colony optimization in swarm intelligence for complex problem solving. Acadlore Transactions on AI and Machine Learning, 3(4), 214–224. 10.56578/ataiml030403
  2. Aghelpour, P., Graf, R., & Tomaszewski, E. (2023). Coupling ANFIS with ant colony optimization (ACO) algorithm for 1-, 2-, and 3-days ahead forecasting of daily streamflow, a case study in Poland. Environmental Science and Pollution Research, 30(19), 56440–56463. 10.1007/s11356-023-26239-3
  3. Al-Tohamy, R., Ali, S. S., Li, F., Okasha, K. M., Mahmoud, Y. A. G., Elsamahy, T., Jiao, H., Fu, Y., & Sun, J. (2022). A critical review on the treatment of dye-containing wastewater: Ecotoxicological and health concerns of textile dyes and possible remediation approaches for environmental safety. Ecotoxicology and Environmental Safety, 231, 113160. 10.1016/j.ecoenv.2021.113160
  4. Al-Ali, A. S., & Qidwai, U. (2024). Rule-based modeling of low-dimensional data with PCA and binary particle swarm optimization (BPSO) in Anfis. 1–41. https://www.ssrn.com/abstract=4789178
  5. Amoriello, T., Mellara, F., Amoriello, M., & Ciccoritti, R. (2024). Evaluation of nutritional values of edible algal species using a shortwave infrared hyperspectral imaging and machine learning technique. Foods. 10.3390/foods13142277
  6. Bai, Z., Xie, M., Hu, B., Luo, D., Wan, C., Peng, J., & Shi, Z. (2022). Estimation of soil organic carbon using vis-NIR spectral data and spectral feature bands selection in southern Xinjiang, China. Sensors (Basel). 10.3390/s22166124
  7. Cadondon, J. G., Ong, P. M. B., Vallar, E. A., Shiina, T., & Galvez, M. C. D. (2022). Chlorophyll-a pigment measurement of spirulina in algal growth monitoring using portable pulsed LED fluorescence lidar system. Sensors, 22, 2940. 10.3390/s22082940
  8. Dada, B. A., Nwulu, N. I., & Olukanmi, S. O. (2025). Bayesian optimization with Optuna for enhanced soil nutrient prediction: A comparative study with genetic algorithm and particle swarm optimization. Smart Agricultural Technology, 12(October 2024), Article 101136. 10.1016/j.atech.2025.101136
  9. Fatima, S., Hussain, A., Amir, S. B., Ahmed, S. H., & Aslam, S. M. H. (2023). XGBoost and random forest algorithms: An in depth analysis. Pakistan Journal of Scientific Research, 3(1), 26–31. 10.57041/pjosr.v3i1.946
  10. García Nieto, P. J., García-Gonzalo, E., Alonso Fernández, J. R., & Díaz Muñiz, C. (2016). A hybrid PSO optimized SVM-based model for predicting a successful growth cycle of the Spirulina platensis from raceway experiments data. Journal of Computational and Applied Mathematics, 291, 293–303. 10.1016/j.cam.2015.01.009
  11. Geogdzhayev, I. V., Marshak, A., & Alexandrov, M. (2021). Calibration of the DSCOVR EPIC visible and NIR channels using multiple LEO radiometers. Frontiers in Remote Sensing, 2(May), 1–13. 10.3389/frsen.2021.671933
  12. Guillevic, P., Göttsche, F., Nickeson, J., Román, M., Guillevic, A. P., Göttsche, F., Nickeson, J., Hulley, G., Ghent, D., Yu, Y., Hook, S., Sobrino, J. A., Remedios, J., Román, M., & Camacho, F. (2017). Committee on Earth Observation Satellites Working Group on Calibration and validation land product validation subgroup land surface temperature product validation best practice protocol. October, 1–60. 10.5067/doc/ceoswgcv/lpv/lst.001
  13. Hakala, T., Markelin, L., Honkavaara, E., Scott, B., Theocharous, T., Nevalainen, O., Näsi, R., Suomalainen, J., Viljanen, N., Greenwell, C., & Fox, N. (2018). Direct reflectance measurements from drones: Sensor absolute radiometric calibration and system tests for forest reflectance characterization. Sensors (Basel). 10.3390/s18051417
  14. Havlik, I., Beutel, S., Scheper, T., & Reardon, K. F. (2022). On-line monitoring of biological parameters in microalgal bioprocesses using optical methods. Energies, 15(3), 1–27. 10.3390/en15030875
  15. Huang, S., Tang, L., Hupy, J. P., Wang, Y., & Shao, G. (2021). A commentary review on the use of Normalized Difference Vegetation Index (NDVI) in the era of popular remote sensing. Journal of Forestry Research, 32(1), 1–6. 10.1007/s11676-020-01155-1
  16. Kamble, S. G., & Dubey, A. K. (2022). K-means based quality prediction of object-oriented software using LR-ACO. International Journal of Advanced Computer Research, 12(59). 10.19101/ijacr.2021.1152064
  17. Khalil, M., AlSayed, A., Liu, Y., & Vanrolleghem, P. A. (2024). An integrated feature selection and hyperparameter optimization algorithm for balanced machine learning models predicting N2O emissions from wastewater treatment plants. Journal of Water Process Engineering, 63(May), Article 105512. 10.1016/j.jwpe.2024.105512
  18. Kim, J., Kim, H., Kim, H. G., Lee, D., & Yoon, S. (2025). A comprehensive survey of deep learning for time series forecasting: Architectural diversity and open challenges. Artificial Intelligence Review. 10.1007/s10462-025-11223-9
  19. Li, F., Wang, L., Liu, J., Wang, Y., & Chang, Q. (2019). Evaluation of leaf N concentration in winter wheat based on discrete wavelet transform analysis. Remote Sensing. 10.3390/rs11111331
  20. Maraveas, C., Asteris, P. G., Arvanitis, K. G., Bartzanas, T., & Loukatos, D. (2023). Application of bio and nature-inspired algorithms in agricultural engineering. Archives of Computational Methods in Engineering. 10.1007/s11831-022-09857-x
  21. Mahmoudzadeh, A., Hadavimoghaddam, F., Atashrouz, S., et al. (2024). Modeling CO2 loading capacity of diethanolamine (DEA) aqueous solutions using advanced deep learning and machine learning algorithms: application to carbon capture. Korean Journal of Chemical Engineering, 41, 1427–1448. 10.1007/s11814-024-00094-5
  22. Mokhtarzadeh, H., Gorjian, S., & Minaei, S. (2025). Design, development, and evaluation of a low-cost smart solar-powered weather station for use in agricultural environments. Results in Engineering, 26(January), Article 104848. 10.1016/j.rineng.2025.104848
  23. Özkan, K., Korkmaz, M., Amorim, C. A., Yılmaz, G., Koru, M., Can, Y., Pacheco, J. P., Acar, V., Çolak, M. A., Yavuz, G. C., Cabrera-Lamanna, L., Arıkan, O., Tanrıverdi, Ö., Ertuğrul, S., Arık, İG., Nesli, H., Tunur, İH., Kuyumcu, B., Akyürek, Z., … Jeppesen, E. (2023). Mesocosm design and implementation of two synchronized case study experiments to determine the impacts of salinization and climate change on the structure and functioning of shallow lakes. Water. 10.3390/w15142611
  24. Pasquier, G., Doyen, P., Kazour, M., Dehaut, A., Diop, M., Duflos, G., & Amara, R. (2022). Manta net: The golden method for sampling surface water microplastics in aquatic environments. Frontiers in Environmental Science, 10(April), 1–12. 10.3389/fenvs.2022.811112
  25. Ricardo, D., Ónodi, G., & Kröel-Dulay, G. (2019). Enhancement of ecological field experimental research by means of UAV multispectral sensing. Drones, 3(1), 7. 10.3390/drones3010007
  26. Rubbens, P., Brodie, S., Cordier, T., Barcellos, D. D., Devos, P., Fernandes-salvador, J. A., Fincham, J. I., Gomes, A., Handegard, O., Howell, K., Jamet, C., Kartveit, K. H., Moustahfid, H., Parcerisas, C., Politikos, D., Sauzède, R., Sokolova, M., Uusitalo, L., Bulcke, L. Van Den, …, Pala, A. (2023). Machine learning in marine ecology: An overview of techniques and applications. May, 1829–1853. 10.1093/icesjms/fsad100
  27. Sahu, R., Qiu, L., Hease, W., Arnold, G., Minoguchi, Y., Rabl, P., & Fink, J. M. (2023). Entangling microwaves with light. Science, 380(6646), 718–721. 10.1126/SCIENCE.ADG3812
  28. Schagerl, M., Siedler, R., Konopáčová, E., & Ali, S. S. (2022). Estimating biomass and vitality of microalgae for monitoring cultures: A roadmap for reliable measurements. Cells. 10.3390/cells11152455
  29. Shaikh, M. S., Jaferzadeh, K., Thörnberg, B., & Casselgren, J. (2021). Calibration of a hyper-spectral imaging system using a low-cost reference. Sensors. 10.3390/s21113738
  30. Somvanshi, S., Islam, M. M., Javed, S. A., Chhetri, G., Islam, K. S., Chowdhury, T. I., Polock, S. B. B., Dutta, A., & Das, S. (2025). A comprehensive survey on bio-inspired algorithms: Taxonomy, applications, and future directions. arXiv. http://arxiv.org/abs/2506.04238
  31. Van Wychen, S., Rowland, S. M., Lesco, K. C., Shanta, P. V., Dong, T., & Laurens, L. M. L. (2021). Advanced mass balance characterization and fractionation of algal biomass composition. Journal of Applied Phycology. 10.1007/s10811-021-02508-x
  32. Wang, H., Liang, Q., Hancock, J. T., & Khoshgoftaar, T. M. (2024). Feature selection strategies: A comparative analysis of SHAP-value and importance-based methods. Journal of Big Data. 10.1186/s40537-024-00905-w
  33. Wang, Y., Wang, J., Li, J., Wang, J., Xu, H., Liu, T., & Wang, J. (2025). Estimating maize leaf water content using machine learning with diverse multispectral image features. Plants, 14(6), 1–22. 10.3390/plants14060973
  34. Wu, X., Zhang, Z., & Zhang, X. (2024). Operating optimization of biomass direct-fired power plant integrated with carbon capture system considering the life cycle economic and CO2 reduction performance. Renewable Energy, 225(November 2023), 120294. 10.1016/j.renene.2024.120294
  35. Xu, D., Wang, C., Chen, J., Shen, M., Shen, B., Yan, R., Li, Z., Karnieli, A., Chen, J., Yan, Y., Wang, X., Chen, B., Yin, D., & Xin, X. (2021). The superiority of the Normalized Difference Phenology Index (NDPI) for estimating grassland aboveground fresh biomass. Remote Sensing of Environment, 264, Article 112578. 10.1016/j.rse.2021.112578
  36. Zhang, W., Liu, X., Liu, L., Lu, H., Wang, L., & Tang, J. (2022). Effects of microplastics on greenhouse gas emissions and microbial communities in sediment of freshwater systems. Journal of Hazardous Materials, 435(April), 129030. 10.1016/j.jhazmat.2022.129030
  37. Zeng, L., & Chen, C. (2018). Using remote sensing to estimate forage biomass and nutrient contents at different growth stages. Biomass and Bioenergy, 115, 74–81. 10.1016/j.biombioe.2018.04.016
  38. Zeng, Y., Hao, D., Badgley, G., Damm, A., Rascher, U., Ryu, Y., Johnson, J., Krieger, V., Wu, S., Qiu, H., Liu, Y., Berry, J. A., & Chen, M. (2021). Estimating near-infrared reflectance of vegetation from hyperspectral data. Remote Sensing of Environment, 267, Article 112723. 10.1016/j.rse.2021.112723


Articles from Environmental Monitoring and Assessment are provided here courtesy of Springer
