Abstract
This study used machine learning models to investigate the potential of biosorbents derived from natural fruit seed waste (apricot, almond, and walnut) for removing a cationic dye. Levulinic acid (LA)-modified powders of almond shell (ASh), apricot kernel shell (APKSh), and walnut shell (WSh) were used to remove methylene blue (MB) from aqueous solution, producing 105 experimental data points under various conditions. The attributes were pH (3–10), adsorbent dose (0.4–6.0 g/L), dye concentration (10–500 mg/L), contact time (30–600 min), and temperature (25–55 °C). Species information was incorporated into the data set using one-hot encoding. The data were normalized using the min-max method, and because the data were not normally distributed, Spearman correlation analysis was used to rank the importance of the attributes. Gradient Boosting (GB), Multilayer Perceptron (MLP), XGBoost (XGB), and Random Forest (RF) algorithms were applied for regression. Based on 5-fold cross-validation, the GB model achieved the highest performance, with R² values of 0.8858 for removal percentage and 0.9532 for adsorption capacity.
Introduction
Discharging chemical pollutants into water bodies due to industrial activities, particularly within the chemical and textile sectors, is a critical environmental issue today. These pollutants encompass many substances, including organic compounds, heavy metals, and synthetic dyes. Among them, dyes used extensively in the textile industry are of particular concern due to their structural complexity and persistence. Structurally, dyes are aromatic compounds containing functional groups such as azo, anthraquinone, methyl, carbonyl, nitro, and aryl methane. Most modern dyes are synthetic and derived from petroleum, making them highly resistant to natural biodegradation. As early as the 19th century, awareness grew regarding the hazardous effects of synthetic dyes, including their toxicity and carcinogenic potential, which spurred efforts to develop effective wastewater treatment strategies. Nevertheless, despite these known environmental risks, the use of synthetic dyes remains widespread due to their vibrant color range, year-round availability, and ease of industrial application.
The removal of dyes from wastewater has been investigated using a range of treatment methods, such as coagulation–flocculation, membrane filtration, advanced oxidation processes, biological treatments, chemical oxidation, and adsorption. Because of its ease of use, adaptability, nontoxicity, and controllable parameters, adsorption has become one of the most popular and effective of these techniques.
Natural materials, such as lignocellulosic and plant-based wastes, clays, biopolymers, biocomposites, and activated carbons derived from biomass, are commonly used as adsorbents because of their eco-friendly origin. Numerous investigations have assessed the performance of these materials in removing various hazardous dyes from wastewater. For instance, agricultural residues like fruit stones and peels have been pyrolyzed to produce activated carbon for decolorization. However, this production process can be costly, and the spent carbon is difficult to replace and dispose of. Consequently, research has increasingly focused on the direct use of inexpensive, readily available waste materials for dye adsorption. Agricultural byproducts, in particular, offer a promising and sustainable alternative to synthetic adsorbents owing to their availability, low cost, and environmental safety. Notable examples of such bioadsorbents include cashew, Terminalia chebula, almond, walnut, and apricot shells, cherry kernel, peanut, peach kernel shell, pistachio, and coconut, all of which have been successfully used to treat dye-laden wastewater.
In recent years, the application of machine learning (ML) techniques has expanded significantly across various scientific domains, including chemistry and environmental engineering. For instance, Shomope et al. developed a multilayer perceptron-based artificial neural network (ANN) to forecast biohydrogen production from organic waste materials. The model was trained on 180 data points collected from 35 different investigations, with hydrogen yield as the output and substrate and inoculum type, concentration, pH, and temperature as the inputs. The model performed well, achieving an R² value of 0.8381 in 5-fold cross-validation. Similarly, Mahata et al. investigated biohydrogen production via dark fermentation and compared the effectiveness of various ML approaches, including ANN, Support Vector Machines (SVM), and response surface methodology. Among these, SVM delivered the best predictive accuracy, with an R² of 0.988. In another study, Hosseinzadeh et al. employed Gradient Boosting (GB), SVM, Random Forest (RF), and AdaBoost algorithms to model hydrogen production, achieving R² scores of 0.893, 0.885, 0.902, and 0.889, respectively.
Further contributions by Shomope et al. focused on hydrogen generation in proton exchange membrane (PEM) electrolysis systems, where the Random Forest and XGBoost algorithms achieved exceptional prediction accuracy, with R² values of 0.9898 and 0.9894, respectively. Similarly, Bilgiç et al. applied an ANN model to examine the effects of magnetic field strength, electrode material, electrolyte type, temperature, and time on hydrogen production in water electrolysis systems, reporting a high correlation (R = 0.973) between model outputs and experimental data. Odabaşı et al. used ML techniques to assess and predict key performance indicators of reverse osmosis (RO) membranes in municipal wastewater recovery, identifying ANN as the most accurate model for pressure prediction, RF for salt passage, and multiple linear regression for permeate flow rate.
AI models can predict the effectiveness of adsorbents in removing pollutants from wastewater and, when combined with suitable data integration in hybrid systems, show considerable potential for water treatment. ML models have been used in the literature to predict the performance of adsorbent materials in dye removal. Liu et al. predicted the adsorption capacity of hydrochar for different dyes and reported GB as the most successful model (R² = 0.9629); in their feature importance analysis, the experimental conditions were the most influential variables. In the study by Hamri et al. on kaolinite (DD3) and its acid-treated form, the GPR-PSO (Gaussian Process Regression–Particle Swarm Optimization) model gave the best result (R² = 0.9978), and the highest adsorption was observed at pH 11. Gamboa et al. combined a GB model with Bayesian optimization for the removal of Congo red using biochar (ABHC) and obtained a best removal efficiency of approximately 90.47%. Rajput et al. found the RF model to be the most accurate for methylene blue removal (R² = 0.94), with the initial dye concentration (C0) as the most influential variable. Kulkarni et al. reported that XGBoost performed best for engineered carbon systems (ECS), with R² = 0.978, and identified dye concentration, ECS dose, and pH as the most influential factors. These studies demonstrate the potential of machine learning to generate precise predictions in water treatment processes involving various adsorbents and dyes, thereby contributing to environmental sustainability.
Kumari et al. removed methylene blue (MB) and crystal violet (CV) dyes with over 98% efficiency using Saccharum officinarum L., and the artificial neural network (ANN) model showed the highest prediction accuracy (R² = 0.9236). In another study, Kumari et al. used a modeling approach and machine learning (ML) techniques to remediate an organic pollutant with a Juglans regia adsorbent. The process was successfully predicted with ANN (R² = 0.9373) and response surface methodology (RSM) (R² = 0.9117) models, and the adsorption was found to be spontaneous and exothermic.
The present study integrates both batch and continuous systems to investigate novel biosorbent-based strategies for dye removal from wastewater. The apricot, almond, and walnut shells used here are commonly available agricultural residues in Turkey. These natural materials are rich in hydroxyl functional groups, which facilitate the adsorption of pollutants and can be further functionalized by introducing sulpho, amino, or carboxyl groups. However, their inherent adsorption capacity is often limited, and studies have shown that chemical modification can significantly improve the efficiency of these biosorbents by enhancing their surface reactivity and functional group density.
Adsorbent performance, typically evaluated through adsorption capacity, is a crucial consideration in the design of adsorption processes. This study uses data mining and machine learning methods to predict two target variables, removal percentage (%R) and adsorption capacity (mg/g), and to rank the importance of the attributes affecting them. The key contributions of this study to the existing literature are as follows:
The use of agricultural byproducts such as apricot kernel shells (APKSh), almond shells (ASh), and walnut shells (WSh), abundant in Turkey, presents a sustainable and eco-friendly approach to waste management, promoting the effective use of domestic, low-cost biosorbent resources.
This study proposes a novel method for improving the performance of biosorbents by chemically treating natural shell wastes, thereby enhancing their adsorption capacity, a subject not previously explored in the literature.
This research integrates batch and continuous systems, allowing the development of more comprehensive and practical solutions for dye removal.
The application of data mining and machine learning techniques in the adsorption process enables the automated prediction of target variables (removal percentage and adsorption capacity) and identification of the most effective parameters. This advancement contributes to developing digital decision support systems for process optimization.
Methodology
Adsorption Studies and Data Acquisition Process
The data sets for this study were obtained from a peer-reviewed article by Kocaman (2020). Methylene blue (MB), a simple cationic dye, was used as the adsorbate; a nitrogen center in its aromatic ring system gives MB a positive charge. Adsorption tests were conducted on natural adsorbents derived from biomass waste (apricot kernel shells (APKSh), almond shells (ASh), and walnut shells (WSh)) modified with levulinic acid (LA). A series of aqueous dye solutions ranging in concentration from 10 to 500 mg/L was prepared in distilled water. A 25 mL volume of each solution was agitated at 150 rpm in a shaking bath with a known amount of adsorbent (0.4–6 g/L). After a designated period (0–10 h), the mixture was centrifuged at 5000 rpm for 5 min to separate the liquid and solid phases. The aqueous phase was analyzed for methylene blue with a UV–vis spectrophotometer at 661 nm, the wavelength of the dye's maximum absorption (λmax). The amount of adsorbed dye was determined from the difference between the initial and final concentrations of the dye solution, and the removal percentage and adsorbent capacity were calculated with Eqs 1 and 2, respectively.
$$R = \frac{C_0 - C_e}{C_0} \times 100 \tag{1}$$
$$q_e = \frac{(C_0 - C_e)\,V}{M} \tag{2}$$
C0: initial dye concentration (mg/L)
Ce: final dye concentration (mg/L)
R: removal percentage (%)
qe: adsorbent capacity (mg/g)
V: volume of dye solution (L)
M: amount of adsorbent (g)
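As a quick numerical check of the two definitions, R = 100(C0 − Ce)/C0 and qe = (C0 − Ce)V/M, the sketch below evaluates both for one hypothetical run. The concentrations and the dose are illustrative values, not measurements from the study, and the function names are hypothetical.

```python
def removal_percentage(c0: float, ce: float) -> float:
    """Removal percentage R (%) from initial and final concentrations (mg/L)."""
    return (c0 - ce) / c0 * 100.0


def adsorption_capacity(c0: float, ce: float, volume_l: float, mass_g: float) -> float:
    """Adsorbent capacity qe (mg/g): milligrams of dye removed per gram of adsorbent."""
    return (c0 - ce) * volume_l / mass_g


# Hypothetical run: 25 mL of 100 mg/L MB treated with a 2.0 g/L dose
c0, ce = 100.0, 20.0          # assumed initial and final concentrations, mg/L
volume_l = 0.025              # 25 mL, as in the batch tests
mass_g = 2.0 * volume_l       # dose (g/L) x volume (L) = 0.05 g of adsorbent

r = removal_percentage(c0, ce)
qe = adsorption_capacity(c0, ce, volume_l, mass_g)
print(f"R = {r:.1f} %, qe = {qe:.1f} mg/g")
```

Here 80 mg/L × 0.025 L = 2 mg of dye is removed by 0.05 g of adsorbent, so qe is 40 mg/g while R is 80%. Because qe scales with C0 and inversely with the dose, the two targets can rank the same experiments quite differently.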
This study investigates the effects of pH (3, 5, 7, 8, 10), adsorbent dose (0.4, 1, 2, 4, 6 g/L), temperature (25, 35, 45, 55 °C), contact time (30, 60, 120, 180, 240, 300, 360, 420, 480, 540, 600 min), and initial concentration (10, 20, 30, 40, 50, 100, 150, 200, 300, 500 mg/L) on MB adsorption. The effect of initial pH is examined first, followed by the effects of adsorbent dose, concentration, time, and temperature on removal percentage (%) and adsorption capacity (qe). This yields 35 experiments per adsorbent and a data set of 105 samples in total. The experimental conditions used with the LA-modified biosorbents (0.025 L; 250 rpm) to investigate each factor are presented in Table 1.
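The sample count follows directly from the one-factor-at-a-time design in Table 1. The sketch below is a hypothetical enumeration, not the authors' code: it builds the condition grid from the listed levels and counts 5 + 5 + 10 + 11 + 4 = 35 runs per adsorbent and 105 in total.

```python
# Factor levels for each series, taken from Table 1
pH_levels = [3, 5, 7, 8, 10]
dose_levels = [0.4, 1.0, 2.0, 4.0, 6.0]                                 # g/L
conc_levels = [10, 20, 30, 40, 50, 100, 150, 200, 300, 500]             # mg/L
time_levels = [30, 60, 120, 180, 240, 300, 360, 420, 480, 540, 600]     # min
temp_levels = [25, 35, 45, 55]                                          # deg C


def design_rows(adsorbent):
    """Yield (adsorbent, pH, dose, conc, time, temp) for each run in Table 1."""
    for p in pH_levels:
        yield (adsorbent, p, 1.0, 10, 300, 25)
    for d in dose_levels:
        yield (adsorbent, 5, d, 10, 300, 25)
    for c in conc_levels:
        yield (adsorbent, 5, 2.0, c, 300, 25)
    for t in time_levels:
        yield (adsorbent, 5, 2.0, 100, t, 25)
    for temp in temp_levels:
        yield (adsorbent, 5, 2.0, 100, 300, temp)


adsorbents = ["ASh", "WSh", "APKSh"]
rows = [row for a in adsorbents for row in design_rows(a)]
print(len(rows) // len(adsorbents), len(rows))
print("distinct input combinations:", len(set(rows)))
```

If the data set follows Table 1 exactly, the same enumeration also shows that a few runs per adsorbent share the reference point of another series (for example, pH 5, 1.0 g/L, 10 mg/L, 300 min, 25 °C appears in both the pH and dose series), leaving 93 distinct input combinations among the 105 rows.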
1. LA-Modified Biosorbents Parameters Used to Explore the Effects of Various Factors on MB Adsorption (0.025 L; 250 rpm).
| Experiment | pH | Dosage (g/L) | Concn (mg/L) | Time (min) | Temp (°C) |
|---|---|---|---|---|---|
| Effect of pH | 3, 5, 7, 8, 10 | 1.0 | 10 | 300 | 25 |
| Effect of dosage | 5 | 0.4, 1.0, 2.0, 4.0, 6.0 | 10 | 300 | 25 |
| Effect of MB concentration | 5 | 2.0 | 10, 20, 30, 40, 50, 100, 150, 200, 300, 500 | 300 | 25 |
| Effect of time | 5 | 2.0 | 100 | 30, 60, 120, 180, 240, 300, 360, 420, 480, 540, 600 | 25 |
| Effect of temperature | 5 | 2.0 | 100 | 300 | 25, 35, 45, 55 |
Data Preprocessing
The data set used in this investigation contains the maximum removal percentage (%) and adsorption capacity (qe) values obtained when removing MB dye from water with different adsorbent types. These adsorbents were prepared by modifying almond, walnut, and apricot kernel shell powders with LA. The attributes of the data set are pH, adsorbent dose, concentration, temperature, and time.
One-hot encoding is a common technique for converting categorical data into numerical form: it creates a separate binary column (0 or 1) for each category of the attribute being encoded. Here, the categorical adsorbent-type attribute, with three values (almond, walnut, and apricot kernel), was converted into three binary attributes using one-hot encoding. The resulting data set consists of 105 samples with eight attributes and two target values; the values and descriptions of the attributes are presented in Table 2. The three material types may differ in physicochemical and surface properties that affect adsorption behavior, and one-hot encoding was adopted so that the models could capture and distinguish variation in adsorption performance attributable to these types. The transformation increased the dimensionality of the data set only marginally (from one categorical attribute to three binary variables), and the cross-validation results showed no adverse impact on model performance. Given the small number of categories and observations, the effect on model complexity was minimal, and the benefit of capturing material-specific effects justified this encoding strategy.
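A minimal sketch of the encoding step, assuming pandas `get_dummies`; the column names and the three example rows are hypothetical. With the default prefix, each species becomes a `species_...` column holding 1 for its own rows and 0 elsewhere, so five numeric attributes plus three binary columns give eight attributes.

```python
import pandas as pd

# Three hypothetical rows; the real data set has 105 rows with these attributes
df = pd.DataFrame({
    "species": ["almond", "walnut", "apricot kernel"],
    "pH": [5, 5, 5],
    "dose_g_per_L": [2.0, 2.0, 2.0],
    "conc_mg_per_L": [100, 100, 100],
    "temp_C": [25, 25, 25],
    "time_min": [300, 300, 300],
})

# One binary column per species: 1 if the row belongs to that species, else 0
encoded = pd.get_dummies(df, columns=["species"], dtype=int)
print(list(encoded.columns))
print(encoded.filter(like="species_"))
```

Exactly one species column is 1 in every row, which is what lets a model attribute differences in removal or capacity to the material type without imposing an artificial order on the categories.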
2. Dataset Attribute Values and Descriptions.
| Attribute | Description | Value |
|---|---|---|
| Almond | If almond is present, the result will be 1; if not, it will be 0. | 0 or 1 |
| Walnut | If walnut is present, the result will be 1; if not, it will be 0. | 0 or 1 |
| Apricot kernel | If apricot kernel is present, the result will be 1; if not, it will be 0. | 0 or 1 |
| pH | MB solution pH | 3, 5, 7, 8, 10 |
| Adsorbent Dose (g/L) | Amount of natural adsorbent | 0.4, 1.0, 2.0, 4.0, 6.0 |
| Concn (mg/L) | Concentration of MB | 10, 20, 30, 40, 50, 100, 150, 200, 300, 500 |
| Temp (°C) | Ambient temperature | 25, 35, 45, 55 |
| Time (min) | Adsorption time | 30, 60, 120, 180, 240, 300, 360, 420, 480, 540, 600 |
| Removal percentage (%) | Target 1 | [22.7, 98.83] |
| Adsorption capacity (mg/g) | Target 2 | [1.63, 99.4] |
The value ranges of the attributes can differ widely, which can bias the analysis results toward attributes with large numeric ranges. To eliminate such scale effects, the data are normalized or standardized; normalization rescales each attribute into a fixed range such as [−1, 1] or [0, 1]. In this study, min-max normalization was applied to each attribute, rescaling it so that its minimum value becomes 0 and its maximum value becomes 1.
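The min-max transformation is x' = (x − min)/(max − min), applied column by column. The sketch below, on a small hypothetical matrix, computes it by hand and checks it against scikit-learn's `MinMaxScaler` (an assumption; the text does not name the library used for this step).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical columns: pH, adsorbent dose (g/L), MB concentration (mg/L)
X = np.array([
    [3.0, 0.4, 10.0],
    [5.0, 2.0, 100.0],
    [10.0, 6.0, 500.0],
])

# Manual min-max scaling per column: x' = (x - min) / (max - min)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# The same transformation with scikit-learn (default feature_range=(0, 1))
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual.round(3))
```

The manual version divides by zero for a constant column, whereas `MinMaxScaler` handles that case internally. When the scaler is used inside cross-validation, fitting it on each training fold only keeps the held-out fold's minimum and maximum out of training; the text does not state which order was used.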
Statistical Analysis
Selecting appropriate statistical methods requires a thorough examination of the data distribution. The Kolmogorov–Smirnov (KS) test compares distribution functions: in its one-sample form, it measures the maximum difference between the empirical distribution of the data and a reference distribution (such as the normal distribution), and in its two-sample form, the maximum difference between the empirical distributions of two data groups. Depending on the distribution characteristics revealed, parametric or nonparametric tests are then applied.
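A one-sample KS normality check can be sketched with SciPy's `scipy.stats.kstest`, assuming the feature is first standardized with its own mean and standard deviation and then compared with N(0, 1). The skewed feature below is synthetic, not data from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature with 105 values, like a concentration attribute
x = rng.lognormal(mean=3.0, sigma=1.0, size=105)

# Standardize, then compare the empirical CDF with the standard normal CDF
z = (x - x.mean()) / x.std(ddof=1)
ks = stats.kstest(z, "norm")
print(f"D = {ks.statistic:.3f}, p = {ks.pvalue:.4f}")
```

`ks.statistic` is the largest vertical gap between the two CDFs. Because the mean and standard deviation are estimated from the same data, the standard KS p-value is conservative here; the Lilliefors variant of the test corrects for this.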
Correlation analysis determines the direction and strength of the relationship between variables. The Pearson correlation coefficient, which measures linear association, should be preferred when the data are normally distributed. Conversely, if the data do not follow a normal distribution, the Spearman rank correlation coefficient, which measures monotonic association between the ranks of the values, is a more appropriate choice.
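The difference between the two coefficients shows up clearly on a relationship that is monotonic but strongly nonlinear. In this illustrative sketch (synthetic data, SciPy's `pearsonr` and `spearmanr`), Spearman's ρ is exactly 1 because the ranks agree perfectly, while Pearson's r is noticeably lower.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 21, dtype=float)
y = np.exp(x / 2.0)   # strictly increasing, but far from linear

pearson_r, _ = stats.pearsonr(x, y)      # linear association
spearman_rho, _ = stats.spearmanr(x, y)  # monotonic association (Pearson on ranks)
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Because Spearman's coefficient works on ranks, it is also less sensitive to the skewness and outliers that make the Pearson coefficient unreliable for non-normal data.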
Machine Learning
Machine learning enables computers to acquire knowledge from data, producing software that improves its performance with experience. ML algorithms have achieved significant success in applications such as data mining across many fields, including extracting valuable information from large data sets and building decision support systems. Within machine learning, classification and regression are fundamental methodologies that predict outcomes by analyzing existing data: classification predicts categorical labels, whereas regression predicts continuous numerical values and is widely used for numerical prediction and trend analysis. Before classification or regression, relevance analysis can be performed to identify important attributes and remove unnecessary ones.
Neural network learning methods have been shown to learn functions with continuous and discrete values effectively, and they are robust to noise in the training data. The backpropagation algorithm is one of the most widely used methods in this field: within a given network architecture, it updates the weights so as to minimize the error by gradient descent. Multilayer feed-forward artificial neural networks (multilayer perceptrons, MLPs) with a sufficient number of hidden neurons can approximate any continuous function on a bounded domain to arbitrary accuracy. A significant benefit of backpropagation is its ability to discover useful intermediate features in the hidden layers that are not explicitly present in the input data but emerge during learning.
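To make the weight-update loop concrete, here is a minimal NumPy sketch of backpropagation with plain gradient descent for a one-hidden-layer network on a toy regression task. The task (y = sin x), layer size, learning rate, and epoch count are all assumptions for illustration; the study's own MLP used the 'lbfgs' solver rather than this loop.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy regression task: learn y = sin(x) on [-3, 3]
X = np.linspace(-3.0, 3.0, 64).reshape(-1, 1)
y = np.sin(X)
n = X.shape[0]

# One hidden layer of 16 tanh units and a linear output unit
W1 = rng.normal(0.0, 0.5, size=(1, 16))
b1 = np.zeros((1, 16))
W2 = rng.normal(0.0, 0.5, size=(16, 1))
b2 = np.zeros((1, 1))
lr = 0.05  # gradient-descent step size (assumed)


def forward(X):
    """Return hidden activations and network output."""
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2


initial_mse = float(np.mean((forward(X)[1] - y) ** 2))

for _ in range(2000):
    h, out = forward(X)
    # Backward pass: chain rule from the MSE loss back to every weight
    grad_out = 2.0 * (out - y) / n                  # dLoss/dOutput
    dW2 = h.T @ grad_out
    db2 = grad_out.sum(axis=0, keepdims=True)
    grad_z1 = (grad_out @ W2.T) * (1.0 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ grad_z1
    db1 = grad_z1.sum(axis=0, keepdims=True)
    # Gradient-descent update, applied after all gradients are computed
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

final_mse = float(np.mean((forward(X)[1] - y) ** 2))
print(f"training MSE before: {initial_mse:.4f}, after: {final_mse:.4f}")
```

The hidden activations `h` are the learned intermediate features mentioned above: they are not part of the input, yet the output layer is a linear combination of them.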
Ensemble methods aim to obtain more robust predictions by combining simple models: weak learners may not perform well individually, but their combination can. Bagging, RF, and boosting are popular ensemble methods. Bagging trains multiple decision trees, each on a different bootstrap sample of the data set, and reduces variance by averaging their predictions. RF extends bagging by considering only a randomly selected subset of the variables at each split, which reduces the correlation between trees and improves the generalization capability of the model.
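The bagging and RF ideas can be sketched with scikit-learn's `DecisionTreeRegressor` as the base learner; the helper name `bagged_trees_predict` and the synthetic data are hypothetical. Setting `max_features` below the number of attributes turns plain bagging into an RF-style ensemble.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagged_trees_predict(X_train, y_train, X_test, n_trees=50, max_features=None, seed=0):
    """Average the predictions of regression trees fit on bootstrap samples.

    max_features=None considers every attribute at each split (plain bagging);
    a smaller value draws a random subset of attributes at each split (RF-style).
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)   # bootstrap sample: n rows drawn with replacement
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(1_000_000)))
        tree.fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_test))
    return np.mean(all_preds, axis=0)      # averaging over trees reduces variance


# Hypothetical data with the study's shape (105 samples, 8 attributes)
rng = np.random.default_rng(1)
X = rng.uniform(size=(105, 8))
y = 100 * X[:, 0] + 10 * rng.normal(size=105)
X_tr, X_te, y_tr, y_te = X[:80], X[80:], y[:80], y[80:]

bagging_pred = bagged_trees_predict(X_tr, y_tr, X_te)
rf_style_pred = bagged_trees_predict(X_tr, y_tr, X_te, max_features=3)
print("test MSE, bagging:", round(float(np.mean((bagging_pred - y_te) ** 2)), 2))
print("test MSE, RF-style:", round(float(np.mean((rf_style_pred - y_te) ** 2)), 2))
```

In scikit-learn's own `RandomForestRegressor`, the default `max_features` is 1.0 (all attributes) in recent versions, so unless it is lowered, the forest behaves like bagged trees.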
Boosting algorithms construct a stronger model by training weak learners sequentially, with each new learner correcting the errors of the previous ones, so that accuracy improves step by step. Gradient boosting applies this idea to a differentiable loss function: at each step, a new tree is fit to the negative gradient of the loss (for squared error, the residuals), gradually reducing the model's prediction errors. XGBoost is a gradient boosting decision tree implementation designed for scalability and efficiency. Like gradient boosting, it builds an additive expansion of trees that minimizes the objective function, and it additionally includes regularization terms in that objective, which helps produce more reliable predictions.
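For squared-error loss, the gradient boosting loop reduces to repeatedly fitting a small tree to the current residuals and adding a shrunken copy of it to the model. The sketch below illustrates that idea from scratch with scikit-learn trees; it is not the implementation used in the study. The 100 trees and learning rate of 0.1 mirror the settings reported later in the text, while `max_depth=3` and the data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Least-squares gradient boosting with regression trees as weak learners."""
    f0 = float(np.mean(y))              # initial model: the constant minimizing squared error
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual y - F(x)
        # (up to a constant factor), so each tree is fit to the current residuals.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)   # shrunken additive update
        trees.append(tree)
    return {"f0": f0, "learning_rate": learning_rate, "trees": trees}


def predict_gradient_boosting(model, X):
    """Sum the initial constant and the shrunken contributions of all trees."""
    out = np.full(X.shape[0], model["f0"])
    for tree in model["trees"]:
        out = out + model["learning_rate"] * tree.predict(X)
    return out


# Hypothetical data with the study's shape (105 samples, 8 attributes)
rng = np.random.default_rng(0)
X = rng.uniform(size=(105, 8))
y = 50 * X[:, 0] + 20 * X[:, 1] ** 2 + rng.normal(size=105)

model = fit_gradient_boosting(X, y)
train_pred = predict_gradient_boosting(model, X)
mse_constant = float(np.mean((y - y.mean()) ** 2))
mse_boosted = float(np.mean((y - train_pred) ** 2))
print(f"training MSE: constant model {mse_constant:.2f}, boosted {mse_boosted:.2f}")
```

XGBoost keeps this additive structure but, per its documentation, uses a second-order approximation of the loss and adds a penalty on tree complexity and leaf weights to the objective.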
Evaluation Metrics
k-Fold Cross Validation
Resampling methods are widely used in statistical analysis. They repeatedly apply the same statistical procedure to different subsets of the training data and can be computationally expensive, although modern computing capacity has largely removed this obstacle. Cross-validation (CV) is used to assess model accuracy or to select an appropriate level of model flexibility. In k-fold CV, the data set is divided into k groups (folds) of approximately equal size; each fold in turn serves as the validation set while the model is trained on the remaining k − 1 folds. Repeating this k times gives k error estimates, and the final model error is their average.
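The k-fold procedure can be sketched with scikit-learn's `KFold`. Here a linear model and synthetic data stand in for the study's models and measurements, and the shuffling and random seed are assumptions, since the text does not specify them. With 105 samples and k = 5, each validation fold holds 21 samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Hypothetical data with the study's shape (105 samples, 8 attributes)
rng = np.random.default_rng(0)
X = rng.uniform(size=(105, 8))
y = X @ rng.uniform(1.0, 5.0, size=8) + rng.normal(scale=0.5, size=105)

kf = KFold(n_splits=5, shuffle=True, random_state=42)   # shuffling and seed are assumptions
fold_scores, fold_sizes, seen = [], [], []
for train_idx, test_idx in kf.split(X):
    # Train on k - 1 folds and validate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    fold_sizes.append(len(test_idx))
    seen.extend(int(i) for i in test_idx)

cv_r2 = float(np.mean(fold_scores))   # final estimate: average over the k folds
print("fold R2:", np.round(fold_scores, 3), "mean:", round(cv_r2, 3))
```

`sklearn.model_selection.cross_val_score(model, X, y, cv=kf, scoring="r2")` performs the same loop in one call; the explicit version shows that every sample is used for validation exactly once.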
RMSE and MAE
The predictive performance of the ML models was evaluated with two error metrics widely used in regression analysis: the mean absolute error (MAE) and the root-mean-square error (RMSE). These metrics quantify how close the model's predictions are to the true values and are widely preferred in the literature for their interpretability.
MAE is the average of the absolute differences between the predicted and true values, while RMSE is the square root of the mean of the squared prediction errors (eq 3).
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2} \tag{3}$$
where n is the number of samples, y_i is the true (observed) value of the dependent variable, and f(x_i) is the value predicted by the regression model.
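A small worked example makes the two metrics concrete. The four observations and predictions below are hypothetical; the hand calculation is checked against scikit-learn's `mean_absolute_error` and `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([90.0, 75.0, 60.0, 98.0])   # hypothetical observed removal (%)
y_pred = np.array([88.0, 80.0, 57.0, 97.0])   # hypothetical model predictions (%)

errors = y_true - y_pred                       # [2, -5, 3, 1]
mae = np.mean(np.abs(errors))                  # (2 + 5 + 3 + 1) / 4 = 2.75
rmse = np.sqrt(np.mean(errors ** 2))           # sqrt((4 + 25 + 9 + 1) / 4) = sqrt(9.75)

assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")   # MAE = 2.75, RMSE = 3.12
```

RMSE is never smaller than MAE; the gap grows when a few predictions have large errors, because squaring weights the −5 error far more than the others.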
R² Score
The primary purpose of regression is to build a model that fits the input data as closely as possible, minimizing the prediction error. Two sums of squares are used to evaluate the performance of a regression model:
- SST (Total Sum of Squares): the squared error of predictions made using only the overall mean of the dependent variable (eq 4).
$$\mathrm{SST} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \tag{4}$$
- SSE (Sum of Squared Errors): the squared error that remains when the dependent variable is predicted by the model (eq 5).
$$\mathrm{SSE} = \sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 \tag{5}$$
In eqs 4 and 5, y_i is the true (observed) value of the dependent variable, ȳ is its mean, and f(x_i) is the value predicted by the regression model.
R² (coefficient of determination) measures the success of the regression model as the proportion of the variance of the dependent variable explained by the model (eq 6). It equals 1 for a perfect fit and 0 for a model no better than predicting the mean; on held-out data, such as cross-validation folds, it can also be negative.
$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \tag{6}$$
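The following sketch computes R² = 1 − SSE/SST by hand for four hypothetical observations, compares it with scikit-learn's `r2_score`, and shows that a poor constant prediction yields a negative value.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([90.0, 75.0, 60.0, 98.0])   # hypothetical observations
y_pred = np.array([88.0, 80.0, 57.0, 97.0])   # hypothetical predictions

sst = np.sum((y_true - y_true.mean()) ** 2)   # mean = 80.75, SST = 846.75
sse = np.sum((y_true - y_pred) ** 2)          # 4 + 25 + 9 + 1 = 39
r2 = 1.0 - sse / sst                          # about 0.954

# A constant prediction far from the data does worse than the mean, so R2 < 0
y_bad = np.full(4, 50.0)
r2_bad = 1.0 - np.sum((y_true - y_bad) ** 2) / sst

print(f"R2 = {r2:.4f} (sklearn: {r2_score(y_true, y_pred):.4f}), constant-50 R2 = {r2_bad:.3f}")
```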
Proposed Framework
The proposed framework is presented in Figure 1 and can be summarized as follows. First, in the data acquisition step, apricot, almond, and walnut shell powders, which are natural fruit seed wastes, were ground and sieved through 53 μm sieves. The shells were then chemically modified with LA and used as adsorbents in an environmentally friendly approach to removing MB dye. The adsorption behavior of MB on these biosorbents was examined over a range of parameters, including pH (3–10), sorbent dose (0.4–6 g/L), initial dye concentration (10–500 mg/L), contact time (30–600 min), and temperature (25–55 °C), and the maximum adsorption capacity (mg/g) and removal percentage (%) values were determined for each case. The data set consists of 35 samples for each of the almond, walnut, and apricot kernel shell adsorbents, for a total of 105 samples. The attributes are pH, sorbent dose, concentration, time, and temperature. The Kolmogorov–Smirnov test indicated that the data were not normally distributed, so the importance ranking of the attributes was determined using Spearman's correlation coefficient between each attribute and the target variables, removal percentage (%) and adsorption capacity (mg/g).
Figure 1. Data acquisition and ML processing workflow.
Figure 2 shows the correlation coefficients between the attributes and the target values (adsorption capacity and removal percentage), from which the importance ranking was established. The species attribute was added to the data set using one-hot encoding: a new column was created for each species, set to 1 if the sample belonged to that species and 0 otherwise. These three columns brought the data set to a total of eight attributes and two target values. Min-max normalization was then applied so that all attribute values ranged from 0 to 1. Several popular ML models were used for regression: GB, XGB, MLP, and RF. Given the limited size of the data set, the models were configured with hyperparameters commonly used in the literature and close to the library defaults, to reduce the risk of overfitting, and they were evaluated with 5-fold cross-validation. The MLP model had two hidden layers with 10 and 20 neurons, respectively, and used the rectified linear unit (ReLU) activation function and the 'lbfgs' solver. The RF model used 100 trees (number of estimators = 100). The GB and XGBoost models were both built with 100 trees and a learning rate of 0.1. Tables 3 and 4 summarize the resulting R², RMSE, and MAE values for removal percentage and adsorption capacity, respectively.
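The model setup described above can be sketched as follows, assuming scikit-learn (whose `MLPRegressor` offers an `lbfgs` solver) and the separate `xgboost` package. Only the hidden-layer sizes, activation, solver, tree counts, and learning rates come from the text; `max_iter`, the random seeds, fold shuffling, and the placeholder data are assumptions, and XGBoost is skipped if it is not installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Placeholder data with the study's shape (105 x 8); substitute the real
# attributes and one target (removal % or qe) to reproduce the workflow.
rng = np.random.default_rng(0)
X = rng.uniform(size=(105, 8))
y = 60 * X[:, 0] - 30 * X[:, 1] + 10 * rng.normal(size=105)

models = {
    "MLP": MLPRegressor(hidden_layer_sizes=(10, 20), activation="relu",
                        solver="lbfgs", max_iter=2000, random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GB": GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0),
}
try:  # XGBoost is a separate package; skip it if unavailable
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
except ImportError:
    pass

cv = KFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    pipe = make_pipeline(MinMaxScaler(), model)   # scaler is refit on each training fold
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
    results[name] = scores
    print(f"{name}: mean R2 = {scores.mean():.4f}, folds = {np.round(scores, 3)}")
```

Placing `MinMaxScaler` inside the pipeline is a design choice of this sketch: it keeps each validation fold's range out of the scaling step.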
Figure 2. Correlation values between target values and attributes.
3. Results of the Regression Estimation Process for the Removal Percentage.
| Model | R² | RMSE (%) | MAE (%) |
|---|---|---|---|
| MLP | 0.8041 | 5.0713 | 3.5355 |
| RF | 0.8268 | 5.0891 | 3.3009 |
| GB | 0.8858 | 4.0837 | 2.6381 |
| XGBoost | 0.8423 | 4.7345 | 3.0092 |
4. Results of the Regression Estimation Process for Adsorption Capacity.
| Model | R² | RMSE (mg/g) | MAE (mg/g) |
|---|---|---|---|
| MLP | 0.8844 | 8.4585 | 4.9902 |
| RF | 0.9086 | 7.0426 | 3.9303 |
| GB | 0.9532 | 5.3473 | 2.8165 |
| XGBoost | 0.9424 | 5.9737 | 3.0242 |
The performance of the regression models was evaluated using the R², RMSE, and MAE metrics. The results for removal percentage are presented in Table 3 and those for adsorption capacity in Table 4. The Gradient Boosting (GB) model achieved the highest R² values for both target variables (0.8858 for removal percentage and 0.9532 for adsorption capacity). These R² scores indicate that the models explain most of the variance of the target variables and have strong predictive ability. However, the absolute RMSE and MAE values appear higher than in some previous studies. The main reason is that these metrics are expressed in the units of the target variable, so the value ranges of the targets must be considered when making direct comparisons. Studies that report low error metrics are generally based on data sets in which the target variable is confined to a narrow range (e.g., 90–100), where error metrics naturally remain small. In this study, removal percentage varies over a wide range, from 22.7 to 98.83%, and adsorption capacity from 1.63 to 99.4 mg/g, and these wide ranges inevitably increase the magnitudes of RMSE and MAE. Error metrics should therefore be interpreted together with the ranges and distributions of the target variables; in this context, R², being scale-independent, provides a more comparable measure of the models' predictive power than the absolute error magnitudes.
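The unit dependence of RMSE and MAE, versus the scale independence of R², can be checked directly: rescaling both targets and predictions by the same factor rescales RMSE and MAE by that factor but leaves R² unchanged. The data below are synthetic and only illustrate this property, not the study's results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
y_true = rng.uniform(20.0, 100.0, size=50)             # hypothetical targets
y_pred = y_true + rng.normal(scale=5.0, size=50)        # hypothetical predictions


def scores(t, p):
    """Return (R2, RMSE, MAE) for observed t and predicted p."""
    return r2_score(t, p), float(np.sqrt(mean_squared_error(t, p))), mean_absolute_error(t, p)


r2_a, rmse_a, mae_a = scores(y_true, y_pred)
# Express the same targets and predictions on a scale 100 times smaller
r2_b, rmse_b, mae_b = scores(y_true / 100.0, y_pred / 100.0)
print(f"original: R2={r2_a:.4f} RMSE={rmse_a:.3f} MAE={mae_a:.3f}")
print(f"rescaled: R2={r2_b:.4f} RMSE={rmse_b:.5f} MAE={mae_b:.5f}")
```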
Figures 3 and 4 show the R² scores achieved by the models in each fold of the 5-fold cross-validation.
Figure 3. Fold-wise R² scores for removal percentage.
Figure 4. Fold-wise R² scores for adsorption capacity.
Table 5 presents the average R² scores obtained via 5-fold cross-validation.
5. Results of the Regression Estimation Process.
| Model | R² Score for Removal Percentage | R² Score for Adsorption Capacity |
|---|---|---|
| GB | 0.8858 | 0.9532 |
| XGBoost | 0.8423 | 0.9424 |
| RF | 0.8268 | 0.9086 |
| MLP | 0.8003 | 0.8614 |
Results and Conclusions
Surface Morphology and Adsorption Mechanism
Surface morphology analyses of raw and LA-modified biomass powders were carried out using scanning electron microscopy (SEM) (Figure 5). The untreated ASh (Figure 5a1) exhibited a relatively compact and fibrous structure with limited surface irregularities and pore development, indicating few accessible adsorption sites for dye molecules. In contrast, the LA-treated ASh (Figure 5a2) showed clear surface disruption, with visible microcracks, flaking, and increased textural heterogeneity, suggesting greater availability of active sites for adsorption due to surface oxidation and partial depolymerization of lignocellulose components.
Figure 5. SEM images of raw and LA-modified biomass powders: (a1) raw ASh, (b1) raw WSh, (c1) raw APKSh; (a2) LA-modified ASh, (b2) LA-modified WSh, (c2) LA-modified APKSh (Mag = 5.00 KX, scale bar 2 μm).
Similarly, the raw WSh (Figure 5b1) showed a dense, fragmented surface with limited porosity and a comparatively smooth texture. After LA treatment (Figure 5b2), however, the surface displayed distinct stratification, roughness, and delamination, implying a more developed porous network and increased surface reactivity. These morphological changes are likely attributable to esterification or hydrogen bonding between LA and the hydroxyl groups of the lignocellulosic matrix, which disturb the original structure and create accessible binding pockets.
Before modification, the APKSh (Figure 5c1) presented a compact, plate-like morphology with minimal surface irregularities. After LA treatment (Figure 5c2), the structure was noticeably more disrupted, with visible fragmentation, porous zones, and increased surface heterogeneity. These features point toward enhanced dye accessibility and the formation of polar functional groups, such as carboxyl and carbonyl moieties, on the surface, which favor electrostatic and π–π interactions with aromatic dye molecules like MB.
Overall, the SEM images indicate that LA modification induces significant changes in surface morphology, increasing roughness and apparent porosity across all biomass types examined. This is consistent with previous findings in the literature, where LA-treated lignocellulosic materials exhibited improved adsorption capacities for organic pollutants owing to chemical activation and surface restructuring.
Levulinic acid (CH3C(O)CH2CH2COOH) is a bifunctional organic acid containing both a ketone and a carboxylic acid group. During modification, the carboxylic acid group of LA can form ester or hydrogen bonds with the hydroxyl (−OH) groups that are abundant in the lignocellulosic structure of the fruit shell powders (APKSh, ASh, WSh). The interaction occurs primarily through esterification or surface grafting, especially under mild heating or catalytic conditions.
This surface modification introduces additional polar oxygen-containing groups (e.g., carbonyl, carboxyl, hydroxyl) onto the adsorbent surface, thereby
increasing the number of active binding sites for dye molecules,
enhancing surface acidity, which improves electrostatic attraction toward cationic dyes like MB,
improving hydrogen bonding interactions between MB and the modified surface, and
slightly increasing the hydrophilicity and swelling behavior, thus improving dye diffusion into surface pores.
Furthermore, the presence of LA-derived carbonyl groups facilitates π–π stacking and dipole–dipole interactions with the aromatic structure of MB, thereby enhancing the dye’s retention.
This mechanistic picture is consistent with prior reports in which LA-modified lignocellulosic adsorbents showed enhanced removal capacities for various organic pollutants owing to increased surface polarity, electronic interactions, and binding affinity. The proposed mechanism for MB adsorption onto the LA-based adsorbents is shown in Figure 6.
Figure 6. Proposed adsorption mechanism of MB on LA-modified bioadsorbents.
Evaluation of Machine Learning Results
This study employed machine learning techniques to model and interpret the dye removal process from wastewater. Natural fruit seed wastes, including apricot, almond, and walnut shells, were ground into powder form and chemically modified with LA to develop environmentally friendly adsorbent materials. The modified adsorbents were then used to remove methylene blue (MB) dye.
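The two target variables modeled here are the adsorption capacity q_e and the removal percentage. They are conventionally computed from batch measurements with the standard relations below. In these relations, C_0 and C_e are the initial and equilibrium (or sampled) dye concentrations (mg/L), V is the solution volume (L), m is the adsorbent mass (g), and D = m/V is the adsorbent dose (g/L). We assume, without having verified it, that the source data set follows these conventional definitions.

```latex
q_e = \frac{(C_0 - C_e)\,V}{m} = \frac{C_0 - C_e}{D},
\qquad
R\,(\%) = \frac{C_0 - C_e}{C_0} \times 100
```

Take hypothetical values of C_0 = 100 mg/L, C_e = 20 mg/L, and D = 2 g/L. These give q_e = 80/2 = 40 mg/g and R = 80/100 × 100 = 80%. The division by D explains why, at a fixed removal percentage, a higher adsorbent dose yields a lower q_e.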
The adsorption capacity q_e (mg/g) and the removal percentage (%) were analyzed under various conditions, including pH (3–10), adsorbent dose (0.4–6 g/L), initial dye concentration (10–500 mg/L), contact time (30–600 min), and temperature (25–55 °C). The data set consisted of 105 samples, each corresponding to the almond, walnut, or apricot shell adsorbent. The attributes were pH, adsorbent dose, dye concentration, temperature, time, and species type. The species variable was incorporated into the data set using one-hot encoding, and all attributes were normalized to the [0, 1] range using min-max normalization.
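The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code. The three example rows are hypothetical values chosen within the reported ranges, and the one-hot column order (almond, apricot, walnut) is an assumption. A library such as scikit-learn would normally replace these hand-written helpers.

```python
# Hypothetical category order for the one-hot columns (an assumption, not from the data set).
SPECIES = ("almond", "apricot", "walnut")


def one_hot(species):
    """Return a 0/1 indicator vector with one column per species."""
    if species not in SPECIES:
        raise ValueError(f"unknown species: {species!r}")
    return [1.0 if species == s else 0.0 for s in SPECIES]


def min_max_columns(rows):
    """Scale each column to [0, 1] with x' = (x - min) / (max - min)."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0 to avoid dividing by zero.
        scaled_cols.append([(x - lo) / span if span else 0.0 for x in col])
    return [list(r) for r in zip(*scaled_cols)]


# Hypothetical rows: (species, pH, dose g/L, C0 mg/L, time min, T degC)
raw = [
    ("almond", 3.0, 0.4, 10.0, 30.0, 25.0),
    ("walnut", 5.0, 6.0, 500.0, 600.0, 55.0),
    ("apricot", 4.0, 2.0, 100.0, 120.0, 35.0),
]

numeric = [list(r[1:]) for r in raw]
scaled = min_max_columns(numeric)
# Final feature vector: three one-hot species columns followed by five scaled numeric columns.
features = [one_hot(r[0]) + s for r, s in zip(raw, scaled)]
for f in features:
    print([round(v, 3) for v in f])
```

One design choice is worth noting. When this scaling is combined with cross-validation, the minimum and maximum are often computed on the training folds only. This prevents information from the held-out fold from leaking into the features.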
The distributional characteristics of the input features were analyzed both statistically and visually. The Kolmogorov–Smirnov test was applied to assess normality, and the features were found not to follow a normal distribution. The histograms in Figure 7 further show that most variables are skewed. This skewness, together with the concentration of values in narrow ranges and the presence of outliers, justifies the use of nonparametric methods. Accordingly, Spearman's rank correlation coefficient was used to evaluate feature relevance.
Figure 7. Histogram plots of features.
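The normality check can be illustrated with a short, dependency-free sketch of the one-sample Kolmogorov–Smirnov statistic. It compares the empirical CDF of a feature with a normal CDF whose mean and standard deviation are estimated from the same sample. This is not the authors' implementation, and both samples below are hypothetical. The cut-off 1.36/√n is the large-sample approximation of the 5% critical value for a fully specified null distribution. Because the parameters are estimated here (the Lilliefors setting), that cut-off is conservative; the exact Lilliefors critical values are smaller.

```python
import math
import statistics


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(sample: list[float]) -> float:
    """Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)| against a normal
    distribution whose mean and standard deviation are estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical samples of n = 15: one right-skewed with an outlier, one evenly spread.
skewed = [1.0] * 10 + [2.0, 3.0, 5.0, 20.0, 100.0]
spread = [float(v) for v in range(1, 16)]

critical = 1.36 / math.sqrt(len(skewed))  # approximate 5% cut-off (conservative here)
for name, data in (("skewed", skewed), ("spread", spread)):
    d = ks_statistic_normal(data)
    verdict = "reject normality" if d > critical else "no evidence against normality"
    print(f"{name}: D = {d:.3f}, cut-off = {critical:.3f} -> {verdict}")
```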
Because the data set was not normally distributed, Spearman's correlation analysis was used to assess feature importance. The attributes were ranked by their correlation with the target variables. These rankings agree with the existing literature, which consistently emphasizes the critical roles of pH, adsorbent dose, and dye concentration in the adsorption process, and this agreement supports the interpretability of the model outputs. Correlation-based ranking is particularly useful for balancing model performance and interpretability when working with small data sets.
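A minimal sketch of the Spearman-based ranking follows. It assumes the common convention that tied values receive the average of the ranks they span, and that features are ordered by the absolute value of ρ with respect to a target. The data are hypothetical toy values, not the study's measurements. In practice, `scipy.stats.spearmanr` would typically be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one target (q_e, mg/g) and two candidate features.
qe = [120.0, 80.0, 45.0, 24.0, 16.0]
features = {
    "dose": [0.4, 1.0, 2.0, 4.0, 6.0],
    "temperature": [25.0, 55.0, 35.0, 45.0, 25.0],
}

rho = {name: spearman(vals, qe) for name, vals in features.items()}
ranking = sorted(rho, key=lambda name: abs(rho[name]), reverse=True)
for name in ranking:
    print(f"{name}: rho = {rho[name]:+.3f}")
```

Ranking by |ρ| treats strong negative and strong positive monotonic relationships as equally informative. This matters here because, in the toy data, q_e falls monotonically as dose rises.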
Widely used machine learning algorithms, namely GB, XGB, MLP, and RF, were applied to the regression of adsorption capacity and removal efficiency and evaluated using 5-fold cross-validation (5-fold CV). The GB model achieved the highest predictive performance, with R² values of 0.8858 for removal efficiency and 0.9532 for adsorption capacity. These results indicate high prediction accuracy and support the effectiveness of ML-based approaches in this field. Beyond predictive accuracy, ML models can capture multidimensional relationships among complex process parameters.
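The evaluation protocol can be made concrete with the following dependency-free sketch. It implements a minimal gradient-boosting regressor for squared loss, following Friedman's formulation: each stage fits a depth-1 regression tree (a stump) to the current residuals. The model is then scored by 5-fold cross-validation, and the mean out-of-fold R² is reported. This is not the authors' pipeline. The number of stages (200), the learning rate (0.1), the use of stumps, the fold seed, and the synthetic data are all illustrative assumptions. Production work would normally rely on scikit-learn's `GradientBoostingRegressor` and `cross_val_score`, or on XGBoost.

```python
import random


def fit_stump(X, r):
    """Fit a least-squares regression stump (one split on one feature) to targets r."""
    best = None  # (sse, feature_index, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for t in values[:-1]:  # the largest value would leave the right side empty
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no valid split (all rows identical): predict the mean
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gb(X, y, n_estimators=200, learning_rate=0.1):
    """Gradient boosting with squared loss: start from the mean, then repeatedly
    fit a stump to the residuals (the negative gradient) and add a shrunken copy."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return f0, learning_rate, stumps


def predict_gb(model, row):
    f0, lr, stumps = model
    return f0 + lr * sum(predict_stump(s, row) for s in stumps)


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_r2(X, y, k=5, seed=0):
    """Mean out-of-fold R^2: train on k-1 folds, score on the held-out fold."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_gb([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict_gb(model, X[i]) for i in test_idx]
        scores.append(r2_score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


# Synthetic, noise-free toy data (hypothetical): two features already scaled to [0, 1].
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [50.0 * a + 20.0 * (b > 0.5) for a, b in X]

score = cross_val_r2(X, y)
print(f"mean 5-fold R^2 = {score:.3f}")
```

Averaging R² over folds is one common convention. The alternative is to compute R² once on the pooled out-of-fold predictions. The two can differ slightly on small data sets such as the 105-sample set used here.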
Adsorption studies based on artificial intelligence and machine learning have increased markedly in the recent literature. In this study, two target variables, removal percentage and adsorption capacity, were estimated with several machine learning methods using a data set covering three adsorbent types. As part of our contribution, the adsorbent types were converted into separate feature columns by one-hot encoding during preprocessing, the data were normalized, and a correlation-based feature importance ranking was performed. Comprehensive analyses were then carried out with multiple machine learning models that are widely used in the literature and known for strong performance. Although the removal and adsorption capacity values spanned a wide range, the R² scores obtained show that the models predicted both targets well. These findings highlight the potential of machine learning to evaluate, systematically and efficiently, complex interactions that are often difficult to reveal through conventional experiments. Such models can also support data-driven decision-making in chemical and environmental engineering by providing scientifically sound, practically applicable outputs for the design of sustainable wastewater treatment technologies.
This study is limited to a specific dye–biosorbent system, and the generalizability of the proposed models to other dyes or biosorbents has not been evaluated. Because adsorption behavior can vary with chemical structure and surface properties, future work should expand the data set to include a wider range of dye types and biosorbent materials. This would improve model applicability and robustness. Methods such as transfer learning or domain adaptation could also be explored to improve generalization across adsorption systems.
Acknowledgments
The data employed in this study were sourced from Kocaman.
The data set used in this study is available via Zenodo at 10.5281/zenodo.15655679 under a Creative Commons license. It includes experimental values of maximum adsorption capacity (q_e) and removal percentage (%) for MB dye removal using levulinic acid-modified natural adsorbents. The data set was constructed based on the data acquisition process described by Kocaman and was reused with appropriate citation.
The manuscript was written with equal contributions from both authors. Both authors have approved the final version of the manuscript.
The authors declare no competing financial interest.
References
- Benkhaya S., M'rabet S., El Harfi A. A review on classifications, recent synthesis and applications of textile dyes. Inorg. Chem. Commun. 2020;115:107891. doi: 10.1016/j.inoche.2020.107891.
- Singh K., Ashok B., Kaur M., Ravishankar B., Chandola H. Hypoglycemic and antihyperglycemic activity of Saptarangyadi Ghanavati: An Ayurvedic compound formulation. AYU. 2014;35:187. doi: 10.4103/0974-8520.146248.
- Zhao C., Zhou J., Yan Y., Yang L., Xing G., Li H., Wu P., Wang M., Zheng H. Application of coagulation/flocculation in oily wastewater treatment: A review. Sci. Total Environ. 2021;765:142795. doi: 10.1016/j.scitotenv.2020.142795.
- Samsami S., Mohamadizaniani M., Sarrafzadeh M.-H., Rene E. R., Firoozbahr M. Recent advances in the treatment of dye-containing wastewater from textile industries: Overview and perspectives. Process Saf. Environ. Prot. 2020;143:138–163. doi: 10.1016/j.psep.2020.05.034.
- Al-Tohamy R., Ali S. S., Li F., Okasha K. M., Mahmoud Y. A. G., Elsamahy T., Jiao H., Fu Y., Sun J. A critical review on the treatment of dye-containing wastewater: Ecotoxicological and health concerns of textile dyes and possible remediation approaches for environmental safety. Ecotoxicol. Environ. Saf. 2022;231:113160. doi: 10.1016/j.ecoenv.2021.113160.
- Bharathi D., Thiruvengadam Nandagopal J. G., Rajamani R., Pandit S., Kumar D., Pant B., Pandey S., Kumar Gupta P. Enhanced photocatalytic activity of St-ZnO nanorods for methylene blue dye degradation. Mater. Lett. 2022;311:131637. doi: 10.1016/j.matlet.2021.131637.
- Islam A., Teo S. H., Taufiq-Yap Y. H., Ng C. H., Vo D. V. N., Ibrahim M. L., Hasan M. M., Khan M. A. R., Nur A. S. M., Awual M. R. Step towards the sustainable toxic dyes removal and recycling from aqueous solution – A comprehensive review. Resour. Conserv. Recycl. 2021;175:105849. doi: 10.1016/j.resconrec.2021.105849.
- Ali S. S., Al-Tohamy R., Sun J., Wu J., Huizi L. Screening and construction of a novel microbial consortium SSA-6 enriched from the gut symbionts of wood-feeding termite, Coptotermes formosanus and its biomass-based biorefineries. Fuel. 2019;236:1128–1145. doi: 10.1016/j.fuel.2018.08.117.
- Saravanan A., Deivayanai V. C., Kumar P. S., Rangasamy G., Hemavathy R. V., Harshana T., Gayathri N., Alagumalai K. A detailed review on advanced oxidation process in treatment of wastewater: Mechanism, challenges and future outlook. Chemosphere. 2022;308:136524. doi: 10.1016/j.chemosphere.2022.136524.
- Kocaman S. Removal of methylene blue dye from aqueous solutions by adsorption on levulinic acid-modified natural shells. Environ. Technol. 2020;22:885–895. doi: 10.1080/15226514.2020.1736512.
- Kocaman S. Synthesis and cationic dye biosorption properties of a novel low-cost adsorbent: coconut waste modified with acrylic and polyacrylic acids. Environ. Technol. 2020;22:551–566. doi: 10.1080/15226514.2020.1741509.
- Kocaman S. Evaluation of adsorption characteristics of new-generation CNT-based adsorbents: characterization, modeling, mechanism, and equilibrium study. Carbon Lett. 2023;33:883–897. doi: 10.1007/s42823-023-00468-5.
- Papadaki M. I., Mendoza-Castillo D. I., Reynel-Avila H. E., Bonilla-Petriciolet A., Georgopoulos S. Nut shells as adsorbents of pollutants: Research and perspectives. Frontiers in Chemical Engineering. 2021;3:640983. doi: 10.3389/fceng.2021.640983.
- Jain H., Yadav V., Rajput V. D., Minkina T., Agarwal S., Garg M. C. An Eco-sustainable Green Approach for Biosorption of Methylene Blue Dye from Textile Industry Wastewater by Sugarcane Bagasse, Peanut Hull, and Orange Peel: A Comparative Study Through Response Surface Methodology, Isotherms, Kinetic, and Thermodynamics. Water Air Soil Pollut. 2022;233:187. doi: 10.1007/s11270-022-05655-0.
- Kumari S., Agrawal N. K., Agarwal A., Kumar A., Malik N., Goyal D., Rajput V. D., Minkina T., Sharma P., Garg M. C. A Prominent Streptomyces sp. Biomass-Based Biosorption of Zinc (II) and Lead (II) from Aqueous Solutions: Isotherm and Kinetic. Separations. 2023;10:393. doi: 10.3390/separations10070393.
- An J., Nhung N. T. H., Ding Y., Chen H., He C., Wang X., Fujita T. Chestnut shell-activated carbon mixed with pyrolytic snail shells for methylene blue adsorption. Materials. 2022;15:8227. doi: 10.3390/ma15228227.
- Senthil Kumar P., Ramalingam S., Senthamarai C., Niranjanaa M., Vijayalakshmi P., Sivanesan S. Adsorption of dye from aqueous solution by cashew nut shell: Studies on equilibrium isotherm, kinetics and thermodynamics of interactions. Desalination. 2010;261:52–60. doi: 10.1016/j.desal.2010.05.032.
- Uddin M. K., Nasar A. Walnut shell powder as a low-cost adsorbent for methylene blue dye: Isotherm, kinetics, thermodynamic, desorption, and response surface methodology examinations. Sci. Rep. 2020;10:1–13. doi: 10.1038/s41598-020-64745-3.
- Cruz-Reina L. J., Fonseca-Bermúdez Ó. J., Flórez-Rojas J. S., Rodríguez-Cortina J., Giraldo L., Moreno-Piraján J. C., Herrera-Orozco I., Carazzone C., Sierra R. Pyrolysis-derived activated carbon from Colombian cashew (Anacardium occidentale) nut shell for valorization in phenol adsorption. Adsorption. 2025;31:1–18. doi: 10.1007/s10450-024-00574-4.
- Shabbir M., Rather L. J., Shahid-ul-Islam, Bukhari M. N., Shahid M., Ali Khan M., Mohammad F. An eco-friendly dyeing of woolen yarn by Terminalia chebula extract with evaluations of kinetic and adsorption characteristics. Journal of Advanced Research. 2016;7:473–482. doi: 10.1016/j.jare.2016.03.006.
- Karod M., Hubble A. H., Maag A. R., Pollard Z. A., Goldfarb J. L. Clay-catalyzed in situ pyrolysis of cherry pits for upgraded biofuels and heterogeneous adsorbents as recoverable by-products. Biomass Conversion and Biorefinery. 2024;14:7873–7885. doi: 10.1007/s13399-022-02921-3.
- Reghioua A., Atia D., Hamidi A., Jawad A. H., Abdulhameed A. S., Mbuvi H. M. Production of eco-friendly adsorbent of kaolin clay and cellulose extracted from peanut shells for removal of methylene blue and Congo red removal dyes. Int. J. Biol. Macromol. 2024;263:130304. doi: 10.1016/j.ijbiomac.2024.130304.
- Soydal U., Kocaman S., Ahmetli G., Avşar S. Methylene blue sorption performance of lignocellulosic peach kernel shells modified with cellulose derivative chitosan as a new bioadsorbent. Int. J. Biol. Macromol. 2024;280:1338–1345. doi: 10.1016/j.ijbiomac.2024.135646.
- Baseri H., Farhadi A. Valorization of pistachio bark as the biosorbent for adsorption of dye and heavy metal ions from the contaminated water. Biomass Conversion and Biorefinery. 2025;15:7239–7250. doi: 10.1007/s13399-024-05473-w.
- Omwoyo F. O., Otieno G. Optimization of Methylene Blue Dye Adsorption onto Coconut Husk Cellulose Using Response Surface Methodology: Adsorption Kinetics, Isotherms and Reusability Studies. Journal of Materials Science and Chemical Engineering. 2024;12:1–18. doi: 10.4236/msce.2024.122001.
- Shomope I., Tawalbeh M., Al-Othman A., Almomani F. Predicting biohydrogen production from dark fermentation of organic waste biomass using multilayer perceptron artificial neural network (MLP-ANN). Comput. Chem. Eng. 2025;192:108900. doi: 10.1016/j.compchemeng.2024.108900.
- Mahata C., Ray S., Das D. Optimization of dark fermentative hydrogen production from organic wastes using acidogenic mixed consortia. Energy Convers. Manag. 2020;219:113047. doi: 10.1016/j.enconman.2020.113047.
- Hosseinzadeh A., Zhou J. L., Altaee A., Li D. Machine learning modeling and analysis of biohydrogen production from wastewater by dark fermentation process. Bioresour. Technol. 2022;343:126111. doi: 10.1016/j.biortech.2021.126111.
- Shomope I., Al-Othman A., Tawalbeh M., Alshraideh H., Almomani F. Machine learning in PEM water electrolysis: A study of hydrogen production and operating parameters. Comput. Chem. Eng. 2025;194:108954. doi: 10.1016/j.compchemeng.2024.108954.
- Bilgiç G., Öztürk B., Atasever S., Şahin M., Kaplan H. Prediction of hydrogen production by magnetic field effect water electrolysis using artificial neural network predictive models. Int. J. Hydrogen Energy. 2023;48:20164–20175. doi: 10.1016/j.ijhydene.2023.02.082.
- Odabaşı Ç., Dologlu P., Gülmez F., Kuşoğlu G., Çağlar Ö. Investigation of the factors affecting reverse osmosis membrane performance using machine-learning techniques. Comput. Chem. Eng. 2022;159:107669. doi: 10.1016/j.compchemeng.2022.107669.
- Kumari S., Chowdhry J., Chandra Garg M. AI-enhanced adsorption modeling: Challenges, applications, and bibliographic analysis. J. Environ. Manage. 2024;351:119968. doi: 10.1016/j.jenvman.2023.119968.
- Liu C., Balasubramanian P., Li F., Huang H. Machine learning prediction of dye adsorption by hydrochar: Parameter optimization and experimental validation. J. Hazard. Mater. 2024;480:135853. doi: 10.1016/j.jhazmat.2024.135853.
- Hamri N., Imessaoudene A., Hadadi A., Cheikh S., Boukerroui A., Bollinger J.-C., Amrane A., Tahraoui H., Tran H. N., Ezzat A. O., Al-Lohedan H. A., Mouni L. Enhanced adsorption capacity of methylene blue dye onto kaolin through acid treatment: Batch adsorption and machine learning studies. Water. 2024;16(2):243. doi: 10.3390/w16020243.
- Gamboa D. M. P., Abatal M., Lima E., Franseschi F. A., Ucan C. A., Tariq R., Elias M. A. R., Vargas J. Sorption behavior of azo dye Congo red onto activated biochar from Haematoxylum campechianum waste: Gradient boosting machine learning-assisted Bayesian optimization for improved adsorption process. Int. J. Mol. Sci. 2024;25(9):4771. doi: 10.3390/ijms25094771.
- Rajput P., Yadav S., Liu C., Balasubramanian P. Predicting biochar adsorption capacity for methylene blue removal using machine learning. J. Water Process Eng. 2025;69:106749. doi: 10.1016/j.jwpe.2024.106749.
- Kulkarni O., Dongare P., Shanmughan B., Nighojkar A., Pandey S., Kandasubramanian B. Machine learning-assisted prediction of engineered carbon systems’ capacity to treat textile dyeing wastewater via adsorption technology. Environ. Monit. Assess. 2025;197(2):223. doi: 10.1007/s10661-025-13664-9.
- Kumari S., Chowdhry J., Sharma P., Agarwal S., Chandra Garg M. Integrating artificial neural networks and response surface methodology for predictive modeling and mechanistic insights into the detoxification of hazardous MB and CV dyes using Saccharum officinarum L. biomass. Chemosphere. 2023;344:140262. doi: 10.1016/j.chemosphere.2023.140262.
- Kumari S., Singh S., Lo S.-L., Sharma P., Agarwal S., Garg M. C. Machine learning and modelling approach for removing methylene blue from aqueous solutions: Optimization, kinetics and thermodynamics studies. J. Taiwan Inst. Chem. Eng. 2025;166(2):105361. doi: 10.1016/j.jtice.2024.105361.
- Bayrakci A. G., Koçar G. Utilization of renewable energies in Turkey’s agriculture. Renew. Sustain. Energy Rev. 2012;16:618–633. doi: 10.1016/j.rser.2011.08.027.
- Zhu M., Yao J., Dong L., Sun J. Adsorption of naphthalene from aqueous solution onto fatty acid modified walnut shells. Chemosphere. 2016;144:1639–1645. doi: 10.1016/j.chemosphere.2015.10.050.
- Kim C., Zhang Z., Wang L., Sun T., Hu X. Core-shell magnetic manganese dioxide nanocomposites modified with citric acid for enhanced adsorption of basic dyes. J. Taiwan Inst. Chem. Eng. 2016;67:418–425. doi: 10.1016/j.jtice.2016.07.015.
- Šoštarić T. D., Petrović M. S., Pastor F. T., Lončarević D. R., Petrović J. T., Milojković J. V., Stojanović M. D. Study of heavy metals biosorption on native and alkali-treated apricot shells and its application in wastewater treatment. J. Mol. Liq. 2018;259:340–349. doi: 10.1016/j.molliq.2018.03.055.
- Nakhli A., Bergaoui M., Toumi K. H., Khalfaoui M., Benguerba Y., Balsamo M., Soetaredjo F. E., Ismadji S., Ernst B., Erto A. Molecular insights through computational modeling of methylene blue adsorption onto low-cost adsorbents derived from natural materials: A multi-model’s approach. Comput. Chem. Eng. 2020;140:106965. doi: 10.1016/j.compchemeng.2020.106965.
- Han J., Kamber M., Pei J. Data mining: Concepts and techniques, 3rd ed.; Morgan Kaufmann: Boston, 2012.
- Kolmogorov A. Sulla determinazione empirica di una legge di distribuzione. Giorn. Ist. Ital. Attuari. 1933;4:83–91.
- Smirnov N. Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics. 1948;19(2):279–281. doi: 10.1214/aoms/1177730256.
- Pearson K., Pearson E. S. On polychoric coefficients of correlation. Biometrika. 1922;14(1):127. doi: 10.2307/2331858.
- Spearman C. Correlation calculated from faulty data. British Journal of Psychology. 1910;3(3):271. doi: 10.1111/j.2044-8295.1910.tb00206.x.
- Mitchell T. M. Machine learning; McGraw-Hill, 1997.
- Breiman L. Bagging predictors. Mach. Learn. 1996;24:123–140. doi: 10.1007/BF00058655.
- Breiman L. Random forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
- James G., Witten D., Hastie T., Tibshirani R. An introduction to statistical learning: With applications in R, 2nd ed.; Springer: New York, 2021. doi: 10.1007/978-1-0716-1418-1.
- Friedman J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001;29:1189–1232. doi: 10.1214/aos/1013203451.
- Chen T. Q., Guestrin C. XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Min. 2016:785–794. doi: 10.1145/2939672.2939785.
- Chai T., Draxler R. R. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014;7(3):1247–1250. doi: 10.5194/gmd-7-1247-2014.
- Tan P. N., Steinbach M., Kumar V. Introduction to Data Mining; Addison-Wesley, 2006.
- Zhang P., O’Connor D., Wang Y., Jiang L., Xia T., Wang L., Tsang D. C. W., Ok Y. S., Hou D. A Green Biochar/Iron Oxide Composite for Methylene Blue Removal. J. Hazard. Mater. 2020;384:121286. doi: 10.1016/j.jhazmat.2019.121286.
- Liu S., Wang K., Yu H., Li B., Yu S. Catalytic preparation of levulinic acid from cellobiose via Brønsted-Lewis acidic ionic liquids functional catalysts. Sci. Rep. 2019;9:1810. doi: 10.1038/s41598-018-38051-y.
- Victor A., Sharma P., Pulidindi I. N., Gedanken A. Levulinic Acid Is a Key Strategic Chemical from Biomass. Catalysts. 2022;12(8):909. doi: 10.3390/catal12080909.
- Khan A., Szulejko J. E., Kim K.-H., Sammadar P., Lee S. S., Yang X., Ok Y. S. A Comparison of Figure of Merit (FOM) for Various Materials in Adsorptive Removal of Benzene under Ambient Temperature and Pressure. Environ. Res. 2019;168:96–109. doi: 10.1016/j.envres.2018.09.019.
- Sessa A., Prete P., Cespi D., Scotti N., Tabanelli T., Antonetti C., Russo V., Cucciniello R. Levulinic Acid Biorefinery in a Life Cycle Perspective. Curr. Opin. Green Sustain. Chem. 2024;50:100963. doi: 10.1016/j.cogsc.2024.100963.