Scientific Reports. 2025 Aug 22;15:30982. doi: 10.1038/s41598-025-17166-z

RETRACTED ARTICLE: Identification of soil texture and color using machine learning algorithms and satellite imagery

Jiyang Wang
PMCID: PMC12373780  PMID: 40846782

Abstract

The demand for high-quality and cost-effective soil information is increasing due to its importance in land-use planning and precision agriculture. This study aimed to estimate soil texture and color using satellite imagery as input variables for support vector regression (SVR) and decision tree regression (DTR) models. Soil properties, including soil texture (clay, silt, and sand) and color components (Hue, Value, and Chroma), were measured. Additionally, a wide range of indices derived from MODIS sensor imagery were calculated. Duncan’s test at a 5% significance level revealed significant temporal differences among the indices, although no significant differences were observed in the mean indices concerning soil texture variability. The results of error metrics, including root mean squared error (RMSE), absolute mean absolute percentage error (AMAPE), mean absolute error (MAE), mean squared error (MSE), and ratio of performance to deviation (RPD), demonstrated the superiority of the SVR method over the DTR method. Soil texture classification using the soil texture triangle and validation methods showed good agreement between measured and predicted data using the SVR approach. The lowest RMSE was observed for Hue, indicating the most accurate prediction, whereas sand showed the highest error. The differences in error metrics, including RMSE, AMAPE, MAE, MSE, and RPD, between SVR and DTR methods were 0, 0.2, 0, 0, and 0.8 for Hue and 0.41, 5, 0.1, 0.1, and 0.87 for sand, respectively. For future research, it is recommended to explore the combination of SVR with optimization techniques such as genetic algorithms to further improve the accuracy of soil texture and color predictions.

Keywords: Regression tree, Soil color, Soil texture, Support vector regression

Subject terms: Environmental sciences, Solid Earth sciences

Introduction

Soil is one of the most essential components of the environment, intricately connected to human life. It serves as a medium for plant growth and agricultural productivity, while also playing a pivotal role in water, nutrient, and carbon cycles1. Therefore, understanding soil properties is crucial for assessing and analyzing its condition2. These properties have significant implications across diverse fields such as civil engineering, agriculture, irrigation, and geology. Recognizing these properties can greatly enhance agricultural practices, support the development of resilient and sustainable infrastructure, and facilitate effective water resource management3–6. Furthermore, detailed insights into changes in soil properties provide a valuable foundation for optimal design and management planning, particularly in agricultural regions. Factors such as climate, topography, and geology greatly influence soil properties at a broad scale. For example, arid and hot climates can result in soils with distinct physical and chemical properties that differ markedly from those found in humid and colder climates. On a more localized scale, human activities, including agriculture, fertilizer and pesticide application, construction, and land-use changes, play a significant role in modifying soil properties7. Considering the fundamental importance of soil to human life and its critical role in environmental sustainability, conducting comprehensive and precise studies of soil properties and the factors affecting them is essential for achieving sustainable development and optimal resource management. Such studies can lay the groundwork for designing and implementing programs that help preserve soil quality and promote its responsible use8.

Soil texture is one of the most significant soil properties, influencing numerous physical and chemical properties and behaviors such as fertility, cation exchange capacity, water retention, and internal drainage. Additionally, soil color serves as a strong indicator for describing other soil properties, such as iron content and organic matter. Determining soil texture and creating maps of its distribution play an essential role in land-use planning, water regime management, and soil conservation studies9. However, conventional methods for soil texture analysis, such as hydrometer and pipette techniques, are time-consuming and costly, making them unsuitable for assessing large numbers of samples and producing high-resolution spatial maps. Moreover, the determination of soil color using the Munsell color chart has limitations, including user sensitivity and environmental conditions10. Consequently, the use of machine learning algorithms using remote sensing data for estimating soil properties has gained increasing attention11.

Rizzo, et al.12 predicted soil color using Landsat satellite images, demonstrating a high degree of agreement between remote sensing data and ground-measured soil color. This highlights the potential of satellite imagery for large-scale and cost-effective soil color detection. Similarly, Sahwan, et al.13 utilized satellite imagery from Landsat-8 and Sentinel-2 in Jordan to evaluate multiple machine learning algorithms for soil color prediction. Their findings revealed that the Support Vector Machine (SVM) algorithm achieved superior accuracy compared to other methods. Moreover, they identified a significant correlation between the spatial distribution of red soil color, the region’s annual average rainfall, and its geomorphological features, emphasizing the environmental influence on soil color variability. Barman and Choudhury14 applied a multi-class SVM approach for soil texture classification in India, developing a digital soil classification system. The model incorporated Hue and Value as key features and achieved an average accuracy of 91.37%, confirming the effectiveness of SVM in soil property classification. The utilization of MODIS satellite imagery proves highly effective for large-scale environmental monitoring, owing to its high availability and extensive spatial coverage. The integration of MODIS remote sensing data with machine learning algorithms such as SVR and Decision Tree Regression (DTR) enables efficient and accurate estimation of soil properties. These methods were selected for their balance between predictive performance and computational efficiency, making them appropriate for handling the spectral complexity of MODIS data and the spatial variability of soil characteristics15,16.

The Shenyang region in northeastern China, with its unique climatic characteristics, is one of the country’s important agricultural areas. This region is primarily dedicated to the cultivation of cereal crops such as wheat and corn, playing a crucial role in ensuring food security. However, agriculture in Shenyang faces significant challenges, including water resource scarcity, soil erosion, declining soil quality, and climatic fluctuations. These issues, compounded by the lack of precise data and comprehensive information on soil characteristics, have impacted agricultural productivity. To address these challenges, the use of modern technologies, particularly machine learning techniques, could provide an effective solution. These methods, by aiding in the detailed study of vast areas, can contribute to the development of agriculture in these regions. In this context, the present study compares DTR and SVR methods for estimating soil texture and color in the Shenyang region in northeastern China.

Materials and methods

Geographic position, climate, agriculture, and soil resources of Shenyang

Shenyang, located in northeastern China, lies between latitudes 41°11′00″ N and 43°02′00″ N, and longitudes 122°25′00″ E and 123°48′00″ E, covering an area of ~13 × 10³ km² (Fig. 1). According to the China Meteorological Administration, the region receives an average annual precipitation of around 600 mm, with a mean annual temperature of 8.6 °C. The elevation of Shenyang, obtained from the Google Earth Engine platform, ranges from ASL to 1025 m, with an average of 47 m ASL. Geographically, Shenyang consists of vast plains in its central and southern regions, while the northern and northeastern parts are characterized by low-lying hills and mountains17. Key agricultural products in Shenyang include corn, rice, wheat, soybeans, potatoes, and a variety of vegetables. However, agriculture in the region is often impacted by natural disasters such as droughts and floods18. The soils of the area are composed of silty, sandy, loamy, and alluvial types. According to the USDA Soil Taxonomy, the soils are predominantly classified as Alfisols and Mollisols, which are generally fertile but may require careful management to prevent nutrient depletion. In the mountainous and riverine regions, Entisols and Inceptisols can also be found19.

Fig. 1.

Geographical boundaries of Shenyang, provincial location, and soil sampling points (https://www.qgis.org/ and https://www.google.com/earth/).

Soil assessment of the study area: from sampling to laboratory analyses

A total of 280 soil samples were collected from the 0–20 cm depth layer (topsoil) using an irregular random sampling technique across the study area. The sampling campaign was conducted during the dry season, between June and August 2023, to minimize the influence of short-term fluctuations in soil moisture. This period was selected to ensure consistency with standard practices in soil science and to facilitate comparability with similar studies. The sampling protocol followed the guidelines proposed by Cochran20. The irregular random sampling design was chosen to effectively capture the spatial variability of soil properties while minimizing systematic bias commonly associated with regular grid sampling. This method is especially suitable for heterogeneous landscapes characterized by diverse land uses and variable topography, as it permits flexible allocation of sampling points21,22. At each sampling location, soil was gathered from three to five sub-points23. The samples from each site were then combined, placed in polyester bags, and transported to the laboratory for further analysis. In the laboratory, the soil samples were air-dried for 24 h in a shaded area with sufficient airflow. After drying, the samples were sieved through a 2-mm mesh to prepare them for detailed analysis. Soil texture was determined by measuring the proportions of sand, silt, and clay using the Bouyoucos24 method. Additionally, the soil color of each sample was identified using the Munsell Soil Color Charts25. After performing the analyses and preparing the samples, the data were divided into two sets: the test dataset (25%; n = 70) and the training dataset (75%; n = 210), for the subsequent steps.
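The 75%/25% partition described above can be sketched as a seeded random shuffle of sample indices. This is only an illustration: the function name `split_indices` and the seed value are assumptions, and the source does not state how the partition was actually drawn.

```python
import random


def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and return (train, test) index lists."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed is an illustrative assumption
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


# 280 samples split into 210 training and 70 test points
train_idx, test_idx = split_indices(280)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the partition repeatable, which matters when results from several model runs are compared.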

Calculation and analysis of remote sensing indices

In this study, raw MODIS Terra and Aqua sensor bands for the summer season of 2023, corresponding to the soil sampling period, were obtained from the Google Earth Engine platform. The native spatial resolutions of these bands vary between 250 m, 500 m, and 1 km. To ensure spatial consistency and accuracy in calculating spectral reflectance, thermal, and combined indices, all bands were resampled to a uniform spatial resolution of 250 m using bilinear interpolation26. The MODIS data used consist of 8-day composite images, which align temporally with the soil sampling campaign, thereby ensuring consistency between satellite observations and ground measurements. Following resampling, spectral reflectance indices (SRIs)27, thermal indices (TIs)28, and combined indices (CIs)29 were manually computed from the resampled bands according to established formulas (Table 1). This approach allows precise alignment of data across bands and enhances the reliability and reproducibility of the derived indices. Temporal variations and soil texture-based differences in the indices were statistically analyzed using the Duncan test. Additionally, the impact of selecting specific satellite index parameters was evaluated and discussed.
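As a rough illustration of the resampling step, the sketch below doubles the resolution of a small grid (for example 500 m to 250 m) with bilinear interpolation. The helper names, the pixel-center alignment, and the clamping at the grid edges are assumptions made for this example; the study's actual resampling was done on the Google Earth Engine platform and may differ in detail.

```python
def bilinear_sample(grid, r, c):
    """Bilinearly interpolate a 2-D grid at fractional (row, col) coordinates."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Clamp to the grid so edge pixels reuse the nearest valid value
    r = min(max(r, 0.0), n_rows - 1.0)
    c = min(max(c, 0.0), n_cols - 1.0)
    r0, c0 = int(r), int(c)
    r1, c1 = min(r0 + 1, n_rows - 1), min(c0 + 1, n_cols - 1)
    fr, fc = r - r0, c - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


def upsample_2x(grid):
    """Resample a coarse grid to twice its resolution using pixel centers."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [bilinear_sample(grid, (i + 0.5) / 2 - 0.5, (j + 0.5) / 2 - 0.5)
         for j in range(2 * n_cols)]
        for i in range(2 * n_rows)
    ]


coarse = [[0.0, 2.0], [4.0, 6.0]]  # hypothetical reflectance values
fine = upsample_2x(coarse)
for row in fine:
    print(row)
```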

Table 1.

List and description of the spectral reflectance, thermal, and combined indices used in this study.

Indices | Name | Specific formula | Symbol | References
SRIs | Normalized Difference Vegetation Index | (NIR − Red)/(NIR + Red) | NDVI | 52
SRIs | Vegetation Condition Index | (NDVI − NDVI min)/(NDVI max − NDVI min) | VCI | Kogan, 1995 (53)
SRIs | Perpendicular Drought Index | [formula image not recovered] (R Red + M × R NIR) | PDI | 54
SRIs | Modified Perpendicular Drought Index | [formula image not recovered] | MPDI | 55
SRIs | Fraction of Vegetation | [formula image not recovered] | Fv | 56
SRIs | Soil-Adjusted Vegetation Index | [formula image not recovered] | SAVI | 57
SRIs | Modified Soil-Adjusted Vegetation Index | [formula image not recovered] | MSAVI | 58
NIR: Near Infrared band, Red: Red band, NDVI min: Minimum value of NDVI during a specified time period, NDVI max: Maximum value of NDVI during a specified time period, M: Slope of the soil line, and L: Correction factor for soil effects.
TIs | Temperature Condition Index | [formula image not recovered] | TCI | Kogan, 1995 (53)
LST: Land surface temperature, LST max: Maximum land surface temperature, LST min: Minimum land surface temperature.
CIs | Temperature Vegetation Index | [formula image not recovered] | TVI | 59
CIs | Vegetation Health Index | [formula image not recovered] | VHI | 1
CIs | Normalized Vegetation Supply Water Index | [formula image not recovered] | NVSWI | 60
CIs | Vegetation Water Content Index | [formula image not recovered] | VWCI | 61
CIs | Rescaled Normalized Difference Vegetation Index | [formula image not recovered] | RNDVI | Kogan, 2001 (62)
CIs | Relative Land Surface Temperature | [formula image not recovered] | RLST | 63
CIs | Modified Canopy Water Content Index | [formula image not recovered] | MCWCI | 64
α: A parameter that depends on temperature and moisture conditions.
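To make a few of the indices concrete, the sketch below computes NDVI, VCI, TCI, and VHI from single pixel values. The formulas are the commonly published Kogan-style forms, not necessarily the exact expressions used in this study (several formulas in Table 1 are not legible in this copy). The scale factor of 100, the default α = 0.5, and all input values are assumptions.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def vci(ndvi_now, ndvi_min, ndvi_max):
    """Vegetation Condition Index, scaled to 0-100 (scaling is an assumption)."""
    return 100.0 * (ndvi_now - ndvi_min) / (ndvi_max - ndvi_min)


def tci(lst_now, lst_min, lst_max):
    """Temperature Condition Index in its commonly cited form (assumed here)."""
    return 100.0 * (lst_max - lst_now) / (lst_max - lst_min)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation Health Index as a weighted mix of VCI and TCI (alpha assumed)."""
    return alpha * vci_value + (1.0 - alpha) * tci_value


# Hypothetical pixel values, for illustration only
pixel_ndvi = ndvi(nir=0.4, red=0.1)
pixel_vci = vci(pixel_ndvi, ndvi_min=0.2, ndvi_max=0.8)
pixel_tci = tci(lst_now=300.0, lst_min=290.0, lst_max=310.0)
pixel_vhi = vhi(pixel_vci, pixel_tci)
print(round(pixel_ndvi, 3), round(pixel_vci, 2), round(pixel_tci, 2), round(pixel_vhi, 2))
```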

Machine learning models

Decision tree regression

To predict soil texture and color, a decision tree regression (DTR) model was employed. DTR is a non-linear technique that forecasts target variable values by recursively splitting the data into subsets based on various features. In this study, the classification and regression trees algorithm was used to construct the DT30. The Gini index was used as the splitting criterion to determine the best feature at each node. To avoid overfitting, post-pruning was applied using cost-complexity pruning. Additionally, a 10-fold cross-validation was performed to evaluate the model’s performance and ensure its generalizability. The DT divides the data into different nodes and leaves using various soil features, and this model ultimately predicts the values for soil texture and color.
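The recursive splitting idea can be illustrated with a single split search on one feature. This is a minimal sketch that assumes a squared-error (variance reduction) criterion, the usual choice for continuous targets, since the Gini index is defined on class labels. The function names and data are hypothetical, and pruning and cross-validation are not shown.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best single split on feature x."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: error of predicting the overall mean
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical feature values and clay percentages
feature = [1, 2, 3, 6, 7, 8]
clay = [20, 21, 22, 30, 31, 32]
threshold, error = best_split(feature, clay)
print(threshold, error)
```

A full tree repeats this search on each resulting subset until a stopping rule is met, and each leaf predicts the mean of its training targets.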

Support vector regression

Support vector regression (SVR) was also used for predicting soil texture and color. SVR is a regression model that predicts target variable values by mapping the data into a high-dimensional feature space and selecting an appropriate kernel. In this study, the radial basis function (RBF) kernel was used as the main kernel, as it effectively models the non-linear relationships between soil features and target variables31. The SVR model seeks to determine the best prediction boundary while minimizing the error.
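Two building blocks of SVR with an RBF kernel can be written in a few lines: the kernel itself and the epsilon-insensitive loss that ignores small residuals. This is an illustrative sketch only. The `gamma` and `epsilon` defaults are assumptions, not values reported in the study, and no optimizer is included.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the residual outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Identical inputs give similarity 1; distant inputs approach 0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))
# Residuals smaller than epsilon are not penalized
print(epsilon_insensitive_loss(10.0, 10.05))
print(epsilon_insensitive_loss(10.0, 10.5))
```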

Model performance evaluation

To evaluate the performance of the DTR and SVR models, several error statistics were utilized. Mean absolute error (MAE) quantifies error by summing the absolute differences between observed and predicted values and dividing by the number of samples. Mean squared error (MSE) sums the squared differences between observed and predicted data and divides by the number of data points. Root mean squared error (RMSE) is the square root of the MSE, which expresses the error in the original units of the data. Absolute mean absolute percentage error (AMAPE) calculates the average percentage difference between actual and predicted values. Lastly, the ratio of performance to deviation (RPD) divides the standard deviation of the observed data by the prediction error, so that higher values indicate more accurate predictions. To ensure the robustness and reproducibility of our results, model training and evaluation were performed over 40 iterations using random resampling. This iterative approach helps reduce the influence of data selection bias and provides a more reliable assessment of model performance. These statistics are essential tools for evaluating and comparing the performance of regression models in predicting various soil properties32,33. In this study, R, ENVI, ArcGIS, and SPSS software were utilized for remote sensing image processing, spatial analyses, and data analysis. The research methodology, including the 40 model evaluation iterations, is illustrated in Fig. 2.
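The error statistics named above can be sketched as follows. The percentage error is computed as the standard mean absolute percentage error, and RPD as the sample standard deviation of the observations divided by RMSE. Both choices are assumptions, since the source does not give explicit formulas, and the data are hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """Return MAE, MSE, RMSE, MAPE (%), and RPD for paired observations."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Percentage error relative to each observed value (observed must be nonzero)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    rpd = sd_obs / rmse  # assumed definition: SD of observations over RMSE
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "RPD": rpd}


# Hypothetical observed and predicted sand percentages
metrics = error_metrics([20.0, 25.0, 30.0, 35.0], [22.0, 24.0, 29.0, 37.0])
for name, value in metrics.items():
    print(name, round(value, 3))
```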

Fig. 2.

Flowchart of the research methodology.

Results

Comparative descriptive statistics for training and test sets

To evaluate the representativeness and variability of the training (n = 210) and test (n = 70) datasets, descriptive statistics (mean, standard deviation, minimum, maximum, and coefficient of variation (CV)) were calculated separately for each subset. In the training dataset, clay content ranged from 11 to 33%, with a mean of 22.5%, standard deviation (SD) of 4.8%, and a CV of 21.3%. Silt content ranged from 21 to 45% (mean = 32.0%, SD = 4.2%, CV = 13.1%), and sand ranged from 30 to 65% (mean = 45.5%, SD = 5.3%, CV = 11.6%). In the test dataset, clay ranged from 12 to 34% (mean = 24.2%, SD = 5.6%, CV = 23.1%), silt from 20 to 44% (mean = 30.5%, SD = 4.9%, CV = 16.1%), and sand from 31 to 63% (mean = 44.1%, SD = 6.0%, CV = 13.6%). The higher CV observed for clay content in both datasets may reflect the influence of flood irrigation, which can redistribute finer particles like clay unevenly across the field, especially in slightly sloping or irregular terrains. Conversely, the relatively lower CVs for silt and sand suggest a more homogeneous spatial distribution, which may be attributed to their coarser nature and lower mobility under surface flow conditions. Texture classification revealed that the training dataset comprised four dominant soil texture classes: clay loam, sandy clay loam, loam, and sandy loam. In contrast, the test dataset included only three classes, with the sandy clay loam category missing. This imbalance is likely due to the natural spatial heterogeneity of soil properties in the study area and the inherent randomness of the irregular random sampling design34. Additionally, both datasets shared consistent soil color categories (5YR 5/4, 7.5YR 5/4, 10YR 5/2, and 10YR 5/4), confirming chromatic similarity across the sampling locations.
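The reported coefficients of variation can be cross-checked from the reported means and standard deviations, since CV = 100 × SD / mean. The short sketch below does this for the six texture summaries quoted above. The dictionary layout is only an illustrative choice.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


# (mean, SD, reported CV) as quoted in the text, all in percent
reported = {
    ("training", "clay"): (22.5, 4.8, 21.3),
    ("training", "silt"): (32.0, 4.2, 13.1),
    ("training", "sand"): (45.5, 5.3, 11.6),
    ("test", "clay"): (24.2, 5.6, 23.1),
    ("test", "silt"): (30.5, 4.9, 16.1),
    ("test", "sand"): (44.1, 6.0, 13.6),
}

for (subset, fraction), (mean, sd, cv_reported) in reported.items():
    cv = coefficient_of_variation(mean, sd)
    print(f"{subset:8s} {fraction:4s} computed CV = {cv:.2f}%, reported = {cv_reported}%")
```

All six recomputed values agree with the reported ones to within rounding.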

Comparison of means

The input dataset consisted of MODIS sensor indices and satellite bands (bands 1, 2, 3, 4, 31, and 32), along with various spectral indices such as NDVI, SAVI, PDI, MPDI, MSAVI, NVSWI, MCWCI, VCI, TCI, TVI, and VHI. To assess the variability of the input data, analyses were conducted considering soil texture types and seasonal variations. The findings indicated no statistically significant differences in the mean values of the input data between the four soil texture classes, except for bands 31 and 32. This lack of significance is likely attributed to the high similarity between the soil texture classes and minimal variation in the proportions of soil texture components across the study area. Seasonal comparisons, however, revealed significant differences in the means of indices such as band 2, band 3, PDI, and MPDI across the three seasons. Similarly, for bands 1 and 4, significant variations were observed between spring and autumn compared to summer. Conversely, for bands 31 and 32, as well as indices like NDVI, SAVI, MSAVI, NVSWI, VCI, TVI, TCI, VHI, and MCWCI, significant differences were noted between spring and summer compared to autumn (Table 2).

Table 2.

Results of Duncan’s test for input variables based on time and soil texture variation at the 5% probability level.

Input variables | Soil texture classes: Clay loam | Sandy clay loam | Loam | Sandy loam | Seasons: Autumn | Summer | Spring
Band-1 | b | a | a | a | b | a | b
Band-2 | b | a | b | a | c | a | b
Band-3 | b | a | b | a | b | a | c
Band-4 | a | a | a | a | b | a | b
Band-31 | a | a | a | b | b | a | a
Band-32 | a | a | a | b | b | a | a
NDVI | a | a | a | a | b | a | a
PDI | a | a | a | a | c | a | b
SAVI | a | a | a | a | b | a | a
MPDI | a | a | a | a | c | a | b
MSAVI | a | a | a | a | b | a | a
TCI | a | a | a | a | a | b | b
TVI | a | a | a | a | a | b | b
VHI | a | a | a | a | a | b | a
NVSWI | a | a | a | a | b | a | a
MCWCI | a | a | b | a | b | a | a
VCI | a | a | b | a | b | a | a
LST | a | a | a | a | b | a | a

Pearson correlation between soil texture and color with satellite image indices

The Pearson correlation analysis between MODIS sensor data and soil properties revealed several significant relationships. Hue showed strong negative correlations with Band 31 (r = − 0.61), Band 32 (r = − 0.61), TCI (r = − 0.59), and VHI (r = − 0.65). Value demonstrated notable positive correlations with Band 1 (r = 0.60), Band 2 (r = 0.55), Band 4 (r = 0.45), Band 31 (r = 0.61), Band 32 (r = 0.55), as well as PDI (r = 0.61), MPDI (r = 0.61), and TVI (r = 0.65). Similarly, Chroma was positively correlated with Band 1 (r = 0.60), Band 2 (r = 0.55), Band 31 (r = 0.61), Band 32 (r = 0.55), and TVI (r = 0.65). Regarding soil texture, clay content showed negative correlations with Band 31 (r = − 0.61), Band 32 (r = − 0.61), TCI (r = − 0.59), and VHI (r = − 0.65), whereas sand content had positive correlations with Band 31 (r = 0.61) and Band 32 (r = 0.55). Silt displayed moderate correlations with Band 31 (r = 0.45) and Band 32 (r = 0.55) (Fig. 3).
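For reference, the Pearson coefficient used in this analysis can be computed directly from paired values. The sketch below is a plain implementation with hypothetical inputs. It is not tied to the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical band values against a soil property
band = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(band, [2.0, 4.0, 6.0, 8.0]))   # perfectly increasing relation
print(pearson_r(band, [8.0, 6.0, 4.0, 2.0]))   # perfectly decreasing relation
```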

Fig. 3.

Pearson correlation between soil texture and color with bands and spectral indices derived from the MODIS sensor.

Modeling soil texture and color using machine learning models

The kernel functions considered for the SVR model were the linear, RBF, and sigmoid functions. A sensitivity analysis was performed for the epsilon-SVR model to assess the impact of the kernel function and the model’s regularization parameter. The average results of the sensitivity analysis for all parameters used are presented in Fig. 4. Based on Fig. 4, the minimum error occurs with the RBF kernel function at a regularization parameter value of 1.5. After the sensitivity analysis, the components of soil texture and color were estimated using the SVR and DTR models. The resulting spatial distributions are shown for soil texture using the SVR model (Fig. 5) and the DTR model (Fig. 6), and for soil color (Fig. 7). In both models, the clay content is higher in the western part of the region, where the sand content is lower; sand follows the opposite spatial pattern to clay. Additionally, the soil texture prediction based on clay and sand particles showed that the measured texture classes corresponded well with the predicted texture classes. The results of soil texture class mapping revealed that the study area consists of four classes: Clay loam, Sandy clay loam, Loam, and Sandy loam. The majority of the soils in the study area fall into the Loam class.
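Assigning a texture class from predicted clay, silt, and sand percentages can be sketched for the four classes reported here. The thresholds below follow the commonly published USDA texture triangle rules as recalled for this example and should be checked against the USDA Soil Survey Manual before use. The function name is hypothetical and all other classes are lumped into "Other".

```python
def usda_texture_class(clay, silt, sand):
    """Classify a sample into one of four USDA texture classes (percent inputs)."""
    if abs(clay + silt + sand - 100.0) > 1.0:
        raise ValueError("clay, silt, and sand should sum to about 100%")
    if 7 <= clay < 20 and sand > 52 and silt + 2 * clay >= 30:
        return "Sandy loam"
    if clay < 7 and silt < 50 and silt + 2 * clay >= 30:
        return "Sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "Loam"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "Sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "Clay loam"
    return "Other"  # the remaining USDA classes are not covered in this sketch


# The training-set mean composition falls in the Loam class
print(usda_texture_class(clay=22.5, silt=32.0, sand=45.5))
print(usda_texture_class(clay=30.0, silt=35.0, sand=35.0))
```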

Fig. 4.

Sensitivity analysis of SVR models.

Fig. 5.

Spatial distribution of clay (a), sand (b), silt (c), and soil texture (d) using SVR model.

Fig. 6.

Spatial distribution of clay (a), sand (b), silt (c), and soil texture (d) using DTR model.

Fig. 7.

Spatial distribution of soil color using SVR (a) and DTR (b) models.

Based on Fig. 8, the error in estimating sand and silt using the SVR method is lower than for the other components. In the estimation of clay using the DTR method, the AMAPE error is higher than for all other components. A general comparison of all error statistics reveals that the DTR method has higher error rates than the SVR method. Additionally, the RPD statistic, a key metric for evaluating the accuracy of machine learning methods, shows that the DTR method performs worse than the SVR method for all measured components (Fig. 8). The reduction in the error criteria (RMSE, AMAPE, MSE, MAE, and RPD) from the DTR method to the SVR method is as follows: for clay, 0.41, 5, 0.1, 0.1, and 0.87 (%), respectively; for sand, 1.77, 0.3, 0.3, 0.3, and 0.18 (%); for silt, 1.24, 3, 0.2, 0.2, and 1 (%); for Hue, 0, 0.2, 0, 0, and 0.8 (%); for Value, 1.04, 3, 0.2, 0.2, and 1 (%); and for Chroma, 0.57, 2, 0.1, 0.1, and 1 (%), respectively (Table 3).

Fig. 8.

Comparison of SVR and DTR for determining soil color and texture.

Table 3.

Analysis of error criteria differences between DTR and SVR methods in estimating measured soil properties.

Soil texture
Error criteria | DTR Clay – SVR Clay | DTR Sand – SVR Sand | DTR Silt – SVR Silt
RMSE | 0.41 | 1.77 | 1.24
AMAPE | 5 | 0.3 | 3
MSE | 0.1 | 0.3 | 0.2
MAE | 0.1 | 0.3 | 0.2
RPD | 0.87 | 0.18 | 1

Soil color
Error criteria | DTR Hue – SVR Hue | DTR Value – SVR Value | DTR Chroma – SVR Chroma
RMSE | 0 | 1.04 | 0.57
AMAPE | 0.2 | 3 | 2
MSE | 0 | 0.2 | 0.1
MAE | 0 | 0.2 | 0.1
RPD | 0.8 | 1 | 1

Performance analysis of SVR and DTR models based on soil texture and color similarity percentages in test data

After obtaining the spatial distribution of soil texture and soil color using the training data (n = 210), their values were extracted at the test points (n = 70). Table 4 presents the percentage similarity between the measured and predicted classes of soil texture and soil color based on the test data. The results indicate that the SVR model outperforms the DTR model across the soil texture and color classes. For instance, in the soil texture classes of Loam, Sandy clay loam, and Clay loam, the SVR model achieved accuracies of 90%, 85%, and 81%, respectively, whereas the DTR model attained accuracies of 85%, 71%, and 61%, respectively. A notable difference was observed in the Sandy loam texture class, where the SVR model reached an accuracy of 100%, while the DTR model showed no match (0%). This discrepancy may be attributed to an insufficient number of sampling points within this class, which likely impaired the predictive capability of the DTR model. Regarding soil color prediction, the SVR model also demonstrated superior performance compared to the DTR model across all measured color classes. Specifically, for the color classes 10YR 5/4 and 10YR 5/2, the SVR model achieved accuracies of 89% and 86%, respectively, while the DTR model achieved 69% and 78%. These results suggest that the SVR model not only provides higher accuracy in soil texture prediction but also offers greater reliability in estimating soil color.

Table 4.

Percentage similarity between predicted and measured classes of soil texture and color using SVR and DTR models based on test data (n = 70).

Soil texture
Measured class | SVR method (%) | DTR method (%)
Clay loam | 81 | 61
Sandy clay loam | 85 | 71
Loam | 90 | 85
Sandy loam | 100 | 0

Soil color
Measured class | SVR method (%) | DTR method (%)
5YR 5/4 | 83 | 69
7.5YR 5/4 | 73 | 66
10YR 5/2 | 86 | 78
10YR 5/4 | 89 | 69
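A per-class similarity percentage of the kind reported in Table 4 can be computed as the share of test points in each measured class whose predicted class matches. The sketch below assumes this definition, which the source does not spell out, and uses hypothetical labels.

```python
from collections import defaultdict


def per_class_agreement(measured, predicted):
    """Percentage of samples in each measured class that were predicted correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for m, p in zip(measured, predicted):
        totals[m] += 1
        if m == p:
            hits[m] += 1
    return {cls: 100.0 * hits[cls] / totals[cls] for cls in totals}


# Hypothetical measured and predicted texture classes at five test points
measured = ["Loam", "Loam", "Clay loam", "Clay loam", "Loam"]
predicted = ["Loam", "Clay loam", "Clay loam", "Clay loam", "Loam"]
agreement = per_class_agreement(measured, predicted)
for cls, pct in agreement.items():
    print(f"{cls}: {pct:.1f}%")
```

With very few samples in a class, a single misclassified point moves this percentage sharply, which is one reason values of 0% or 100% should be read alongside the class size.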

Discussion

In this study, soil texture and color modeling and prediction were conducted using machine learning techniques, including SVR and DTR, in conjunction with MODIS sensor data (Table 1). Soil texture and color, as two key properties, have significant implications for environmental and agricultural processes35. Soil texture, comprising the proportions of clay, silt, and sand particles, plays a critical role in processes such as water retention, nutrient transport, and drainage11,32,36,37. Remote sensing data, such as MODIS imagery, offer valuable tools for more accurate soil property modeling due to their capability for continuous measurement and reduced ground sampling costs16,38. Machine learning techniques, with their ability to analyze complex datasets and produce high-accuracy results, have become leading approaches in soil modeling39,40. Moreover, soil color, influenced by its mineral composition and organic matter content, serves as an important indicator for estimating soil organic carbon and assessing environmental health.

A key challenge in linking qualitative soil color metrics to quantitative remote sensing data lies in the inherent subjectivity of traditional color assessment methods. Soil color is often described using qualitative scales such as the Munsell Soil Color Chart, which relies on human visual interpretation and can introduce variability. To bridge this gap, current remote sensing approaches utilize spectral reflectance data from sensors like MODIS, which provide objective, quantifiable measurements across different wavelengths correlated with soil color properties. These spectral signatures are then processed through machine learning models to predict soil color parameters quantitatively. However, the quantification of soil color remains complex due to factors such as soil moisture, organic matter content, and mineral composition, which influence reflectance values and complicate direct translation from spectral data to perceived color. To address this ambiguity, advanced preprocessing steps such as atmospheric correction and soil surface standardization are applied to remote sensing data, improving the reliability of color estimation. Moreover, developing standardized indices and calibration protocols linking spectral data to established color metrics can enhance clarity and reproducibility in soil color estimation41,42.

The findings of this study demonstrated that the SVR model outperformed the DTR model in estimating soil texture and color properties, providing more accurate predictions with lower error rates (Fig. 8; Table 3). This difference in model performance may stem from variations in the architecture of machine learning methods and the influence of the number and quality of soil sampling points. Previous studies, such as the work by Arrouays, et al.43, have shown that the quantity and accuracy of soil samples can significantly impact modeling results. Furthermore, error criteria like RMSE have been widely employed in many studies as indicators for evaluating model accuracy44. This error criterion often highlights the performance of robust models in comparison to others (Liu et al.37). In this study, the primary focus was on assessing model performance using prediction error criteria, which enabled a more detailed analysis of each technique’s strengths and weaknesses in soil property modeling (Fig. 8). Recent studies by Liu & Xu45 and Sun, et al.46 also affirm that comparative analyses of models under various conditions can provide valuable insights for developing stronger and more accurate models. Over time, these advancements will enhance the predictive capability for soil properties and ensure optimal use of available informational resources.

The results of this study indicated that the spatial distribution of soil texture and color (Figs. 5–7) is significantly influenced by the composition of clay, silt, and sand, as well as by erosion and sedimentation processes47. In the western regions, the highest sand content was observed, which, due to the lighter color of sand, led to a lighter color of soil in these areas. In contrast, the eastern regions exhibited higher clay content, resulting in darker soil colors. These patterns align with the effects of topography and hydrological processes, such as salt transport and sedimentation changes48. Furthermore, the analysis of silt particles revealed that the changes in silt across different areas did not follow a consistent trend36. This variation could be due to the influence of silt particle size and properties, which lie between clay and sand, as well as the impact of very fine sand particles in silt classification. Additionally, due to its intermediate properties, silt exhibits complex variations, making its accurate assessment more challenging11,37.

In this study, two machine learning models, SVR and DTR, were employed for modeling and predicting soil texture and color. Of the two, the SVR model demonstrated superior accuracy, attributed to its higher capability in modeling complex and nonlinear relationships between soil texture and color (Fig. 8; Tables 3 and 4). One important limitation of the SVR model is its sensitivity to the tuning of hyperparameters such as the kernel type, regularization parameter, and epsilon. Improper tuning of these parameters can lead to reduced prediction accuracy or overfitting, so careful optimization using techniques such as grid search or evolutionary algorithms is required. This sensitivity also necessitates high-quality and sufficiently diverse training data to ensure robust model performance. Moreover, compared to decision tree-based models, SVR tends to be computationally more intensive, which can pose challenges when handling large datasets or working in environments with limited computational resources. Despite these limitations, SVR remains a powerful approach due to its superior capability in modeling complex nonlinear relationships in soil properties. However, it is important to note that each soil mapping method has its own advantages and limitations. Achieving accurate results requires precise sampling tailored to regional conditions and the selection of appropriate environmental parameters for estimation. The overarching goal of employing these models is to advance agriculture. As long as life exists on Earth, there will be a need to develop soil science and create models for more accurate estimations. However, the findings of this and similar studies do not necessarily confirm the universal applicability of a single model across all locations and conditions. That said, the use of machine learning techniques represents a promising approach for estimating other soil properties.
Despite their significant advantages, these methods also present challenges, including the need for high-quality data, robust computational capabilities, and technical expertise for implementation and interpretation. Nevertheless, with technological advancements and improved access to data and computational resources, the application of these methods in soil studies is expected to expand and improve. Continuous development of models for more precise soil property predictions is crucial, especially in today’s world, where increasing population and intensified pressure on soil resources demand sustainable management. Developing more accurate models will remain a priority to address these pressing challenges49,50.
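
The hyperparameter tuning discussed above could look roughly like the following sketch, which assumes scikit-learn and NumPy and uses synthetic data in place of the MODIS predictors and soil measurements. The grid values, the five-fold cross-validation, the tree depth, and all variable names are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 4 hypothetical predictors and a nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SVR is sensitive to feature scale, so standardise inside the pipeline.
svr = make_pipeline(StandardScaler(), SVR())
param_grid = {
    "svr__kernel": ["rbf", "linear"],  # kernel type
    "svr__C": [1.0, 10.0, 100.0],      # regularization parameter
    "svr__epsilon": [0.01, 0.1],       # width of the epsilon-insensitive tube
}
search = GridSearchCV(
    svr, param_grid, cv=5, scoring="neg_root_mean_squared_error"
)
search.fit(X_train, y_train)

# Decision tree baseline with an assumed depth limit.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_train, y_train)


def rmse(model):
    """RMSE of a fitted model on the held-out test split."""
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))


svr_rmse = rmse(search)
tree_rmse = rmse(tree)
print(search.best_params_, svr_rmse, tree_rmse)
```

Tuning on cross-validation folds of the training split and reporting RMSE on a separate held-out split keeps the selected hyperparameters from being judged on the data used to select them.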

One limitation of this study was the imbalance in sampling soil texture classes, especially the underrepresentation of the Sandy loam class, which posed challenges for accurate prediction of this category (Table 4). To overcome this, future research should consider employing stratified sampling techniques to ensure adequate representation across all soil texture classes. Stratified sampling can reduce bias and improve model generalizability by creating more balanced datasets21,22. Additionally, seasonal variations significantly affect spectral indices such as NDVI, SAVI, and VCI, influencing the reliability of soil texture and color predictions. Changes in vegetation cover, soil moisture, and atmospheric conditions throughout different seasons cause variability in spectral reflectance, potentially introducing bias and reducing model accuracy. To address these seasonal effects, utilizing multi-temporal datasets covering various seasons can enhance model robustness by capturing full seasonal variability. Moreover, advanced preprocessing methods, including atmospheric correction, cloud masking, and spectral normalization, contribute to consistent data quality. Future modeling efforts may benefit from developing season-specific models or incorporating seasonality as an explicit variable within machine learning frameworks. Implementing these strategies will improve prediction precision and the overall robustness and applicability of soil property mapping51.
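
One way to picture the stratified sampling suggested above is the sketch below, which draws the same number of sites from every stratum of a candidate pool (equal allocation). The function `stratified_sample`, the stratum labels, the pool sizes, and the idea that strata come from a preliminary texture map are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(site_strata, per_stratum, seed=0):
    """Draw up to `per_stratum` site ids at random from every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for site_id, stratum in site_strata.items():
        by_stratum[stratum].append(site_id)

    selected = {}
    for stratum in sorted(by_stratum):
        sites = sorted(by_stratum[stratum])
        # A small stratum contributes all of its sites instead of failing.
        k = min(per_stratum, len(sites))
        selected[stratum] = sorted(rng.sample(sites, k))
    return selected


# Hypothetical candidate pool: site id -> stratum from a preliminary map.
candidates = {}
for i in range(60):
    candidates[i] = "clay loam"
for i in range(60, 100):
    candidates[i] = "loam"
for i in range(100, 108):
    candidates[i] = "sandy loam"

plan = stratified_sample(candidates, per_stratum=10)
for stratum, sites in plan.items():
    print(stratum, len(sites))
```

Under simple random sampling of 28 sites from this pool, the rare stratum would be expected to contribute only about two sites; equal allocation keeps all eight of its candidates in the plan.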

Conclusion

In this study, the performance of two machine learning models, SVR and DTR, was evaluated for predicting soil texture and color properties based on MODIS satellite data. The results indicated that satellite-derived features, including bands 31 and 32, as well as the TCI and VHI indices, exhibited significant correlations with soil texture components and color attributes, highlighting the necessity for precise calibration of parameters associated with these indices. The SVR model demonstrated superior performance compared to the DTR model, owing to its ability to minimize structural risk, achieve an optimal balance between training error and model capacity, and map input data into high-dimensional spaces using kernel functions. To further enhance prediction accuracy, it is recommended that the SVR model be integrated with evolutionary optimization algorithms such as Genetic Algorithm or Particle Swarm Optimization to optimize parameter tuning. From a practical perspective, this advanced modeling approach can assist end-users, including farmers and land resource planners, in optimizing resource management and improving productivity through more accurate soil property predictions. Considering the importance of temporal trend analysis in soil property variations and satellite indices, future studies are encouraged to employ time series methodologies to investigate soil dynamics. Advanced time series models, such as ARIMA or recurrent neural networks (RNN/LSTM), have the potential to capture complex temporal patterns and improve forecasting capabilities. Finally, the limitations of this study include restricted generalizability of the results across different geographic regions and varying soil conditions, necessitating broader evaluation and validation in future research.
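
To make concrete which parameters of these indices need calibration, the sketch below uses the widely used Kogan-type formulation of VCI, TCI, and VHI. It is an assumption that this is the formulation applied in this study; the use of a thermal-band brightness temperature for TCI, the weight `alpha = 0.5`, and the numerical values are likewise illustrative only.

```python
def vci(ndvi, ndvi_min, ndvi_max):
    """Vegetation Condition Index (0-100) from NDVI and its multi-year extremes."""
    return 100.0 * (ndvi - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt, bt_min, bt_max):
    """Temperature Condition Index (0-100); a higher temperature gives a lower score."""
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


def vhi(vci_value, tci_value, alpha=0.5):
    """Vegetation Health Index as a weighted mean of VCI and TCI."""
    return alpha * vci_value + (1.0 - alpha) * tci_value


# Hypothetical pixel values: NDVI is unitless, brightness temperature is in kelvin.
pixel_vci = vci(ndvi=0.45, ndvi_min=0.20, ndvi_max=0.70)
pixel_tci = tci(bt=295.0, bt_min=290.0, bt_max=310.0)
pixel_vhi = vhi(pixel_vci, pixel_tci)
print(round(pixel_vci, 2), round(pixel_tci, 2), round(pixel_vhi, 2))
```

The multi-year minima and maxima and the weight `alpha` are the calibration parameters in this formulation, so the resulting index values depend on the reference period chosen for each pixel.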

Author contributions

Jiyang Wang: Conceptualization, Formal analysis, Investigation, Data curation, Methodology, Software, Writing – original draft, Writing – review & editing.

Funding

This study was supported by the Liaoning Province Science and Technology Plan Joint Program (Natural Science Foundation-General Program), “Research on Intelligent Recognition and Classification Method of Multi type Iron Ore Based on Image Processing” (Project No. 2024-MSLH-347).

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. The satellite imagery and digital elevation model data are available on the Google Earth Engine (GEE) platform.

Declarations

Competing interests

The authors declare no competing interests.

Ethics approval

All authors have read, understood, and have complied as applicable with the statement on “Ethical responsibilities of Authors” as found in the Instructions for Authors and are aware that with minor exceptions, no changes can be made to authorship once the paper is submitted.

Footnotes

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1038/s41598-026-38673-7

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

2/4/2026

This article has been retracted. Please see the Retraction Notice for more detail: 10.1038/s41598-026-38673-7

References

  • 1. Khosravi Aqdam, K., Rezapour, S., Asadzadeh, F. & Nouri, A. An integrated approach for estimating soil health: incorporating digital elevation models and remote sensing of vegetation. Comput. Electron. Agric. 210, 107922. 10.1016/j.compag.2023.107922 (2023).
  • 2. Dadgar, M. & Faramarzi, S. E. Assessing the performance of machine learning models for predicting soil organic carbon variability across diverse landforms. Environ. Earth Sci. 83, 657. 10.1007/s12665-024-11960-0 (2024).
  • 3. Abbaszad, P., Asadzadeh, F., Rezapour, S., Khosravi Aqdam, K. & Shabani, F. Evaluation of Landsat 8 and Sentinel-2 vegetation indices to predict soil organic carbon using machine learning models. Model. Earth Syst. Environ. 10, 2581–2592. 10.1007/s40808-023-01916-x (2024).
  • 4. Barrena-González, J., Gabourel-Landaverde, V. A., Mora, J., Contador, J. F. L. & Fernández, M. P. Exploring soil property spatial patterns in a small grazed catchment using machine learning. Earth Sci. Inf. 16, 3811–3838. 10.1007/s12145-023-01125-1 (2023).
  • 5. Faramarzi, S. E., Pazira, E., Masihabadi, M. H., Torkashvand, M., Motamedvaziri, B. & A. & Modeling and estimating the spatial distribution of soil organic matter content in irrigated lands. Int. J. Environ. Sci. Technol. 19, 7399–7410. 10.1007/s13762-022-03909-2 (2022).
  • 6. Makovníková, J., Kološta, S., Pálka, B. & Flaška, F. Evaluation of the soil quality using health index in temperate European conditions (Slovak Republic). Environ. Earth Sci. 83, 591. 10.1007/s12665-024-11890-x (2024).
  • 7. Mechal, A. & Bayisa, A. Modeling the impacts of climate change on watershed hydrology using climate and hydrological models: the case of the Ziway lake watershed, Ethiopian rift. Environ. Model. Assess. 10.1007/s10666-024-10010-0 (2024).
  • 8. Dong, Y. & Hauschild, M. Z. Indicators for environmental sustainability. Procedia CIRP 61, 697–702. 10.1016/j.procir.2016.11.173 (2017).
  • 9. Ding, X. et al. Model prediction of depth-specific soil texture distributions with artificial neural network: A case study in Yunfu, a typical area of Udults zone, South China. Comput. Electron. Agric. 169, 105217. 10.1016/j.compag.2020.105217 (2020).
  • 10. Stiglitz, R. et al. Soil color sensor data collection using a GPS-enabled smartphone application. Geoderma 296, 108–114. 10.1016/j.geoderma.2017.02.018 (2017).
  • 11. Coblinski, J. A. et al. Prediction of soil texture classes through different wavelength regions of reflectance spectroscopy at various soil depths. CATENA 189, 104485. 10.1016/j.catena.2020.104485 (2020).
  • 12. Rizzo, R. et al. Remote sensing of the Earth’s soil color in space and time. Remote Sens. Environ. 299, 113845. 10.1016/j.rse.2023.113845 (2023).
  • 13. Sahwan, W., Lucke, B., Kappas, M. & Bäumler, R. Assessing the spatial variability of soil surface colors in Northern Jordan using satellite data from Landsat-8 and Sentinel-2. Eur. J. Remote Sens. 51, 850–862. 10.1080/22797254.2018.1502624 (2018).
  • 14. Barman, U. & Choudhury, R. D. Soil texture classification using multi class support vector machine. Inform. Process. Agric. 7, 318–332. 10.1016/j.inpa.2019.08.001 (2020).
  • 15. Khosravi Aqdam, K., Nouri, A., Miran, N., Faramarzi, S. E. & Akhlaghi, M. Enhanced surface soil moisture prediction through dual-satellite spectral fusion. Earth Syst. Environ. 9, 1235–1252. 10.1007/s41748-025-00638-7 (2025).
  • 16. Chen, D., Chang, N., Xiao, J., Zhou, Q. & Wu, W. Mapping dynamics of soil organic matter in croplands with MODIS data and machine learning algorithms. Sci. Total Environ. 669, 844–855. 10.1016/j.scitotenv.2019.03.151 (2019).
  • 17. Hannaway, D. B., Daly, C., Coop, L., Chapman, D. & Wei, Y. GIS-based Forage Species Adaptation Mapping in Grasslands 305–329 (CRC, 2019).
  • 18. Wang, X. & Liu, C. The agricultural product logistics in Shenyang based on the perspective of the supply chain. In International Conference on Mechatronics, Control and Electronic Engineering (MCE-14) 739–742 (2014).
  • 19. Huang, J., Ebach, M. C. & Triantafilis, J. Cladistic analysis of Chinese soil taxonomy. Geoderma Reg. 10, 11–20. 10.1016/j.geodrs.2017.03.001 (2017).
  • 20. Cochran, W. G. Sampling Techniques (John Wiley & Sons, 1977).
  • 21. Minasny, B. & McBratney, A. B. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Comput. Geosci. 32, 1378–1388. 10.1016/j.cageo.2005.12.009 (2006).
  • 22. Wadoux, A. M. C., Brus, D. J. & Heuvelink, G. B. Sampling design optimization for soil mapping with random forest. Geoderma 355, 113913. 10.1016/j.geoderma.2019.113913 (2019).
  • 23. Khosravi Aqdam, K., Asadzadeh, F., Momtaz, H. R., Miran, N. & Zare, E. Digital mapping of soil erodibility factor in Northwestern Iran using machine learning models. Environ. Monit. Assess. 194, 387. 10.1007/s10661-022-10048-1 (2022).
  • 24. Bouyoucos, G. J. Hydrometer method improved for making particle size analyses of soils. Agron. J. 54, 464–465. 10.2134/agronj1962.00021962005400050028x (1962).
  • 25. Munsell, A. H. Munsell soil color charts. Gretagmacbeth (2000).
  • 26. Ma, S. et al. Application of the water-related spectral reflectance indices: A review. Ecol. Ind. 98, 68–79. 10.1016/j.ecolind.2018.10.049 (2019).
  • 27. Jae-Hyun, R., Dohyeok, O. & Jaeil, C. Simple method for extracting the seasonal signals of photochemical reflectance index and normalized difference vegetation index measured using a spectral reflectance sensor. J. Integr. Agric. 20, 1969–1986. 10.1016/S2095-3119(20)63410-4 (2021).
  • 28. Yan, G., Li, H. & Shi, Z. Evaluation of thermal indices as the indicators of heat stress in dairy cows in a temperate climate. Animals 11, 2459. 10.3390/ani11082459 (2021).
  • 29. Yang, B. et al. Combined multivariate drought index for drought assessment in China from 2003 to 2020. Agric. Water Manage. 281, 108241. 10.1016/j.agwat.2023.108241 (2023).
  • 30. Zhang, X. et al. Allocate soil individuals to soil classes with topsoil spectral characteristics and decision trees. Geoderma 320, 12–22. 10.1016/j.geoderma.2018.01.023 (2018).
  • 31. Ballabio, C. Spatial prediction of soil properties in temperate mountain regions using support vector regression. Geoderma 151, 338–350. 10.1016/j.geoderma.2009.04.022 (2009).
  • 32. Agussabti, R., Satriyo, P. & Munawar, A. A. Data analysis on near infrared spectroscopy as a part of technology adoption for cocoa farmer in Aceh province, Indonesia. Data Brief 29, 105251. 10.1016/j.dib.2020.105251 (2020).
  • 33. Bas, M. C., Ortiz, J., Ballesteros, L. & Martorell, S. Forecasting 7BE concentrations in surface air using time series analysis. Atmos. Environ. 155, 154–161. 10.1016/j.atmosenv.2017.02.021 (2017).
  • 34. Roy, D. et al. Impact of long term conservation agriculture on soil quality under cereal based systems of North West India. Geoderma 405, 115391. 10.1016/j.geoderma.2021.115391 (2022).
  • 35. Mgohele, R. N., Massawe, B., Shitindi, M. J., Sanga, H. G. & Omar, M. Prediction of soil texture using remote sensing data. A systematic review. Front. Remote Sens. 5, 1461537. 10.3389/frsen.2024.1461537 (2024).
  • 36. Khosravi Aqdam, K. et al. Predicting the spatial distribution of soil mineral particles using OLI sensor in Northwest of Iran. Environ. Monit. Assess. 193, 377. 10.1007/s10661-021-09163-2 (2021).
  • 37. Liu, F. et al. High-resolution and three-dimensional mapping of soil texture of China. Geoderma 361, 114061. 10.1016/j.geoderma.2019.114061 (2020).
  • 38. Lyu, Y. et al. Machine learning techniques and interpretability for maize yield estimation using time-series images of MODIS and multi-source data. Comput. Electron. Agric. 222, 109063. 10.1016/j.compag.2024.109063 (2024).
  • 39. Driba, D. L., Emmanuel, E. D. & Doro, K. O. Predicting wetland soil properties using machine learning, geophysics, and soil measurement data. J. Soils Sediments 24, 2398–2415. 10.1007/s11368-024-03801-1 (2024).
  • 40. Khawaja, L. et al. Development of machine learning models for forecasting the strength of resilient modulus of subgrade soil: genetic and artificial neural network approaches. Sci. Rep. 14, 18244. 10.1038/s41598-024-69316-4 (2024).
  • 41. Du, Y. et al. A comparative study of four color measurement methods for soil color identification and related properties prediction. Comput. Electron. Agric. 230, 109801. 10.1016/j.compag.2024.109801 (2025).
  • 42. Wang, X., Li, S., Zhang, C., Mao, D. & Wang, L. Satellite images reveal soil color changes in typical black soil region of China: brighter, redder, and yellower. CATENA 254, 108958. 10.1016/j.catena.2025.108958 (2025).
  • 43. Arrouays, D., Lagacherie, P. & Hartemink, A. E. Digital soil mapping across the globe. Geoderma Reg. 9, 1–4. 10.1016/j.geodrs.2017.03.002 (2017).
  • 44. Kaya, F., Başayiğit, L., Keshavarzi, A. & Francaviglia, R. Digital mapping for soil texture class prediction in Northwestern Türkiye by different machine learning algorithms. Geoderma Reg. 31, e00584. 10.1016/j.geodrs.2022.e00584 (2022).
  • 45. Liu, M. & Xu, N. Adaptive neural predefined-time hierarchical sliding mode control of switched under-actuated nonlinear systems subject to Bouc-Wen hysteresis. Int. J. Syst. Sci. 55, 2659–2676. 10.1080/00207721.2024.2344059 (2024).
  • 46. Sun, X. et al. Genesis of Pb–Zn-Ag-Sb mineralization in the Tethys Himalaya, China: early magmatic-hydrothermal Pb–Zn(-Ag) mineralization overprinted by Sb-rich fluids. Miner. Deposita 59, 1275–1293. 10.1007/s00126-024-01264-5 (2024).
  • 47. Vitharana, U. W. A., Mishra, U. & Mapa, R. B. National soil organic carbon estimates can improve global estimates. Geoderma 337, 55–64. 10.1016/j.geoderma.2018.09.005 (2019).
  • 48. Yang, J. et al. Effect of colour calibration on the prediction of soil organic matter content based on original soil images obtained from smartphones under different lighting conditions. Soil Tillage Res. 238, 106018. 10.1016/j.still.2024.106018 (2024).
  • 49. Padarian, J., Minasny, B. & McBratney, A. B. Machine learning and soil sciences: a review aided by machine learning tools. SOIL 6, 35–52. 10.5194/soil-6-35-2020 (2020).
  • 50. Román Dobarco, M., McBratney, A., Minasny, B. & Malone, B. A modelling framework for pedogenon mapping. Geoderma 393, 115012. 10.1016/j.geoderma.2021.115012 (2021).
  • 51. Hesketh, M. & Sánchez-Azofeifa, G. A. The effect of seasonal spectral variation on species classification in the Panamanian tropical forest. Remote Sens. Environ. 118, 73–82. 10.1016/j.rse.2011.11.005 (2012).
  • 52. Tucker, C. J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 8, 127–150. 10.1016/0034-4257(79)90013-0 (1979).
  • 53. Kogan, F. N. Application of vegetation index and brightness temperature for drought detection. Adv. Space Res. 15, 91–100. 10.1016/0273-1177(95)00079-T (1995).
  • 54. Richardson, A. J. & Wiegand, C. Distinguishing vegetation from soil background information. Photogram. Eng. Remote Sens. 43, 1541–1552 (1977).
  • 55. Ghulam, A., Qin, Q., Teyip, T. & Li, Z. L. Modified perpendicular drought index (MPDI): a real-time drought monitoring method. ISPRS J. Photogrammetry Remote Sens. 62, 150–164. 10.1016/j.isprsjprs.2007.03.002 (2007).
  • 56. King, M. D., Kaufman, Y. J., Menzel, W. P. & Tanre, D. Remote sensing of cloud, aerosol, and water vapor properties from the moderate resolution imaging spectrometer (MODIS). IEEE Trans. Geosci. Remote Sens. 30, 2–27 (1992).
  • 57. Huete, A. R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 25, 295–309. 10.1016/0034-4257(88)90106-X (1988).
  • 58. Qi, J., Chehbouni, A., Huete, A. R., Kerr, Y. H. & Sorooshian, S. A modified soil adjusted vegetation index. Remote Sens. Environ. 48, 119–126. 10.1016/0034-4257(94)90134-1 (1994).
  • 59. Nemani, R. R. & Running, S. W. Estimation of regional surface resistance to evapotranspiration from NDVI and thermal-IR AVHRR data. J. Appl. Meteorol. Climatology 28, 276–284. 10.1175/1520-0450(1989)028<0276:EORSRT>2.0.CO;2 (1989).
  • 60. Souza, A. G. S. S., Ribeiro Neto, A. & Souza, L. L. d. Soil moisture-based index for agricultural drought assessment: SMADI application in Pernambuco State-Brazil. Remote Sens. Environ. 252, 112124. 10.1016/j.rse.2020.112124 (2021).
  • 61. Sridhar, B. B. M., Vincent, R. K., Roberts, S. J. & Czajkowski, K. Remote sensing of soybean stress as an indicator of chemical concentration of biosolid amended surface soils. Int. J. Appl. Earth Obs. Geoinf. 13, 676–681. 10.1016/j.jag.2011.04.005 (2011).
  • 62. Kogan, F. N. Operational space technology for global vegetation assessment. Bull. Am. Meteorol. Soc. 82, 1949–1964. 10.1175/1520-0477(2001)082<1949:OSTFGV>2.3.CO;2 (2001).
  • 63. Sobrino, J. A. & Romaguera, M. Land surface temperature retrieval from MSG1-SEVIRI data. Remote Sens. Environ. 92, 247–254. 10.1016/j.rse.2004.06.009 (2004).
  • 64. Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 58, 257–266. 10.1016/S0034-4257(96)00067-3 (1996).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
