Abstract
Geological hazards, such as landslides, pose significant risks to railway infrastructure. This study assesses landslide susceptibility along a critical section of the Hefei-Fuzhou High-Speed Railway, which traverses a geologically complex region. We developed a landslide susceptibility model using the Random Forest (RF) algorithm based on a historical landslide inventory. The model demonstrates excellent predictive performance, and the resulting susceptibility map highlights that a significant portion of the study area falls within the high and extremely high susceptibility zones. These high-risk areas are spatially correlated with steep slopes and high annual precipitation, providing crucial guidance for regional disaster prevention. Furthermore, an assessment of influencing factor importance using the RF algorithm revealed that slope is the dominant factor affecting landslide susceptibility. The varying importance rankings of other factors between the northern and southern sub-regions highlight the complex and diverse landslide mechanisms driven by different geological conditions. This research fills a key gap in geological hazard mapping for a major high-speed railway corridor in southeastern China. The findings and methodological approach offer a valuable framework for regional infrastructure planning and geological risk management in similar developing regions.
Keywords: Landslide Inventory, Landslide Susceptibility, Geological Hazards, Random Forest, Hefei-Fuzhou High-Speed Railway
Subject terms: Natural hazards, Sustainability, Environmental impact
Introduction
Geological hazards, particularly landslides, are significant natural processes that pose a major threat to human lives and infrastructure globally1,2. Geological hazards cause extensive economic losses and casualties each year3, often triggered by a complex interplay of environmental factors (such as topography, geology, climate) and triggering factors (e.g., seismic activity, heavy rainfall)4. To mitigate these risks, landslide susceptibility mapping (LSM) has become an indispensable tool for spatially predicting landslide occurrence and guiding effective disaster management and land use planning5.
Traditionally, LSM relied on heuristic and statistical methods6. In recent years, the field has been revolutionized by the emergence of machine learning (ML) models. These models are increasingly favored due to their superior ability to capture complex, non-linear relationships within high-dimensional data7, leading to more accurate and reliable predictions. Among the various ML algorithms, the random forest (RF) model has gained widespread recognition8,9. It is an ensemble learning algorithm that combines multiple decision trees, offering exceptional stability, high accuracy, and computational efficiency10. Furthermore, a key advantage of the RF model is its ability to quantify the importance of different environmental factors11, providing critical insights into the underlying mechanisms of landslide development. Other ML algorithms have also been applied: support vector machines (SVM) are known for their effectiveness in high-dimensional spaces12, and XGBoost13 and deep learning models14 have been explored to further enhance prediction performance.
Despite the proliferation of ML-based LSM studies, a significant gap remains in applying these advanced techniques to specific, high-stakes linear infrastructure, such as high-speed railways15. Notably, infrastructure projects, particularly railway construction, are critically influenced by geological hazards16. The construction and use of roads can damage surface vegetation, leading to slope instability that threatens the safety of the infrastructure17. Slope failures can not only disrupt transportation and damage property but also cause casualties and adversely affect socio-economic stability and regional development. Therefore, it is particularly important to evaluate the risk of geological hazards along major transport routes that are under construction or in operation.
The Hefei-Fuzhou High-Speed Railway is a critical transportation artery connecting the central and southeastern regions of China. This railway traverses a diverse and complex geomorphological landscape, including plains, hills and mountains, making it highly vulnerable to geological hazards18. In recent years, several catastrophic geological hazards have been reported in nearby regions. For example, an embankment collapse disrupted the Meida Highway and killed 48 travelers in May 202419. A disastrous avalanche in Lishui City claimed 38 lives in November 201520. A major landslide in Chengtian village, Zhejiang, destroyed a section of a provincial highway and blocked local traffic for 17 days21. These catastrophic hazards in nearby regions with similar climatic and topographic characteristics highlight the necessity of mapping regional landslide susceptibility for this crucial linear infrastructure22. Zhou et al.23 and Huang et al.24 mapped landslide susceptibility for a few counties in Jiangxi Province, but no work has been done for Anhui Province, the northern part of our study area. Zhou et al.25 mapped landslide susceptibility for Fujian Province. While these regional-scale LSM studies have been conducted in the provinces along this corridor, they often lack the fine-scale resolution required for infrastructure-specific risk management. Specifically, previous studies have not provided a comprehensive and tailored landslide susceptibility assessment for this crucial railway corridor, nor have they systematically addressed the distinct landslide-driving mechanisms in its geologically different northern and southern sections.
To address these critical knowledge gaps, this study provides a comprehensive landslide susceptibility assessment for the Hefei-Fuzhou High-Speed Railway corridor using the Random Forest model. We aim to offer robust scientific support for disaster prevention and mitigation along this major transportation artery. The central objectives of this study are threefold:
To develop a robust landslide susceptibility model using the Random Forest algorithm and generate a high-resolution susceptibility map for the railway corridor.
To evaluate the importance of various landslide influencing factors to identify the dominant drivers of slope instability in the study area.
To conduct a comparative analysis of the northern and southern sub-regions to reveal segment-specific controlling mechanisms and provide a foundation for tailored risk mitigation strategies.
This study will fill a key gap in geological hazard mapping for one of China’s most important high-speed railway corridors, offering a valuable methodological framework for similar infrastructure projects in developing regions.
Study area
In this study, the Huangshan-Fuzhou section of the Hefei-Fuzhou High-Speed Railway and its vicinity were selected as the study area. This crucial transportation artery, located in southeastern China, traverses a complex geological and geomorphological landscape, passing through five cities in Anhui Province, one city in Jiangxi Province, and two cities in Fujian Province before reaching Fuzhou City (Fig. 1).
Fig. 1.
Location of the study area (The basemap was created using a DEM to produce a hillshade map. The legend and other elements were also generated using QGIS 3.1029).
The study area exhibits significant topographic variability, with elevation ranging from approximately −8 m to over 2000 m. The terrain is dominated by low-to-medium mountains and hills, characterized by steep slopes and deeply incised valleys, which are conducive to slope instability. The region spans multiple tectonic units, including the Yangtze Plate and South China Plate, and exhibits well-developed fault structures and relatively active neotectonic movements. The main exposed strata consist predominantly of Presinian mica-quartz schist and the third and fourth intrusive granites from the Yanshanian period, with developed folds and relatively fractured rock mass, resulting in a complex geological hazard background26.
The region is characterized by a subtropical monsoon climate, with a distinct long rainy season extending from April to September with an average annual rainfall of 1283–2308 mm. The area is also frequently affected by typhoons in the summer and autumn, which can bring short-duration, high-intensity rainfall events. Such extreme precipitation acts as a key triggering factor, leading to soil saturation, increased pore water pressure, and a subsequent decrease in slope stability, thereby significantly increasing landslide risk27.
The study area has a history of significant landslide activity. As of 2020, more than 9,000 landslides of varying scales had been recorded within Jiangxi Province alone28. There are many landslides in the northeast and southwest regions of the corridor, and the distribution is concentrated (the red dots in Fig. 1 represent large-scale landslide traces). These large-scale landslide traces are closely associated with areas of high relief and intense human activity, highlighting the need for a systematic and comprehensive susceptibility assessment.
Methodology
Data
Landslide and non-landslide samples
A comprehensive landslide inventory for the study area was compiled through a multi-step process. Using high-resolution satellite imagery from Google Earth, we employed visual interpretation to identify and delineate the boundaries of landslide traces along the Hefei-Fuzhou High-Speed Railway corridor30,31. This was systematically verified and supplemented by field investigations and a review of existing literature, ensuring the precise spatial positioning and accuracy of the inventory. This process yielded a dataset of 2,299 landslide locations, which were used as our positive samples (landslides distribution shown in Fig. 1).
To create a balanced dataset, an equal number of non-landslide points (negative samples) were generated. We first identified the non-landslide areas by taking the spatial complement of the landslide polygons within the study region. Subsequently, 2,299 random points were generated within these non-landslide areas using GIS software. To ensure spatial representativeness and avoid sampling bias, a minimum distance of 1 km was maintained between any two sample points. The final dataset, consisting of 4,598 landslide and non-landslide points, was carefully screened for invalid values and served as the training data for the machine learning model.
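The negative-sampling step can be illustrated with a minimal sketch. It is not the GIS workflow used in the study: it assumes projected coordinates in metres, represents landslides as points rather than polygons, and uses simple rejection sampling; the function name and its parameters are hypothetical.

```python
import math
import random


def sample_non_landslide_points(n_points, bounds, landslide_xy,
                                min_dist=1000.0, seed=42, max_tries=200_000):
    """Rejection-sample random points that stay at least min_dist (metres)
    away from every landslide point and every previously accepted point.

    bounds is (xmin, ymin, xmax, ymax); landslide_xy is a list of (x, y).
    Fewer than n_points are returned if max_tries is exhausted.
    """
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_points:
            break
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Reject the candidate if it is too close to any existing sample.
        too_close = any(math.dist(candidate, p) < min_dist
                        for p in landslide_xy + accepted)
        if not too_close:
            accepted.append(candidate)
    return accepted
```

In practice the exclusion test would be made against the mapped landslide polygons, and a spatial index would replace the brute-force distance check for thousands of points.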
Environmental factors
The occurrence of landslides is a complex phenomenon influenced by the interaction of multiple environmental factors. A scientifically grounded selection of these factors is crucial for developing an accurate landslide susceptibility model. Based on a comprehensive review of existing literature32,33 and the specific characteristics of the study area, we selected 13 key factors, which were categorized into four groups: topographic, geological, hydrological, and land cover factors (Table 1; Fig. 2).
Table 1.
Environmental factors and data sources.
| Factors | Data sources | Class |
|---|---|---|
| DEM | Shuttle Radar Topography Mission39 | The data resolution is 3 arc-seconds. It was projected to a DEM with a resolution of 100 m. |
| Slope | dz is the elevation difference and dx is the horizontal distance (cell size). | The slope ranges from 0–90°, generated based on the DEM using GIS software. |
| Aspect | dzx is the elevation gradient in the horizontal direction. dzy is the elevation gradient in the vertical direction. | 9 categories: Flat, North, Northeast, East, Southeast, South, Southwest, West, and Northwest. |
| Curvature | | The Curvature is generated based on the DEM using GIS software. Values > 0: concave terrain surfaces. Values < 0: convex terrain surfaces. |
| TPI | Z0 is the elevation of the center cell. Zi is the elevation of the surrounding neighboring cells. n is the number of cells within the neighborhood. | 6 categories: Valleys, Lower Slopes, Gentle Slopes, Steep Slopes, Upper Slopes, Ridges. |
| Relief | Zmax is the highest elevation point. Zmin is the lowest elevation point within the analysis area. | Based on the DEM, set up a circular area search for domain analysis to calculate the Relief. |
| TWI | SCA is the flow accumulation per unit area. Slope is the gradient of the terrain. | The TWI is generated based on the DEM using GIS software. |
| Geology | China Geological Survey and United States Geological Survey (USGS) | 12 categories: Quaternary (Q), Tertiary (Te), Cretaceous (K), Jurassic (J), Triassic (Tr), Permian (P), Carboniferous (C), Devonian (D), Silurian (S), Ordovician (O), Cambrian (∈), and Pre-Cambrian (Pre ∈) |
| Distance to Fault | National Seismic Active Fault Data40,41 | 11 categories: 0–1 km, 1–2 km, 2–3 km, 3–4 km, 4–5 km, 5–6 km, 6–7 km, 7–8 km, 8–9 km, 9–10 km, >10 km |
| Distance to River | Open Street Map (https://www.openstreetmap.org//) | The raster data of distance to river was calculated using Euclidean distance analysis in GIS software. |
| Average Annual Precipitation | Global Climate data42 | Rain (mm) at 1 km resolution was interpolated to 100 m resolution using linear interpolation to match other factors. |
| Landcover | Global Land Cover by National Mapping Organizations43 | Landcover class: 1 Broad-leaved Evergreen Forests, 2 Broadleaf Deciduous Forest, 3 Coniferous Evergreen Forests, 4 Coniferous Deciduous Forests, 5 Mixed Forests, 6 Sparse Forests, 7 Shrublands, 8 Grassy Areas, 10 Sparse Vegetation, 11 Farmlands, 12 Paddy land, 13 Croplands with Other Crops, 15 Wetlands, 16 Consolidated Rock Areas, 18 Urban Areas, 20 Water Bodies. |
| FVC | Global Land Cover by National Mapping Organizations43 | The FVC of land in the study area was set according to 0–100%, and was assigned a value of −1 in the watershed. The data resolution is 1 km, and it was interpolated to 100 m resolution using linear interpolation to match other factors. |
Fig. 2.
Influence factors in the study area: (a) River (b) DEM (c) Relief (d) Slope (e) Curvature (f) TWI (g) Rain (h) FVC (i) TPI (j) Fault (k) Landcover (l) Geology (m) Aspect (Note: This map was generated using QGIS 3.1029 with a base map from Tianditu44.).
Topographic factors
Topography plays a crucial role in controlling landslide occurrence by influencing slope stability, water flow and weathering processes34. Six topographic factors were derived from the Digital Elevation Model (DEM): Slope, Aspect, Curvature, Topographic Position Index (TPI), Topographic Relief, and Terrain Wetness Index (TWI). Slope angle is widely recognized as the most critical factor influencing landslide susceptibility. Steeper slopes are more susceptible to failure as they experience higher shear stress, reducing overall stability. Aspect affects the duration and intensity of solar radiation and rainfall received by a slope. This in turn influences soil moisture, weathering processes, and vegetation distribution, leading to variations in slope stability35. Curvature controls the convergence or divergence of surface runoff and subsurface water flow, which affects soil moisture and pore water pressure, thereby impacting slope stability36. TPI reflects the relative position of a point on a slope compared to its surroundings, which helps to identify different landforms (e.g., valleys, ridges, and slopes) with distinct susceptibility characteristics. Relief measures the complexity of the terrain by quantifying the range of elevation within a neighborhood. TWI represents the spatial distribution of soil moisture, which is a critical hydrological factor37.
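For reference, the commonly used definitions that match the symbols listed in Table 1 can be written as follows. These are the standard textbook forms and are given here as an assumption; the exact expressions implemented by the GIS software may differ. Aspect is usually obtained from a two-argument arctangent of the two elevation gradients, with a sign and origin convention that depends on the software, so it is not written out here.

```latex
\begin{align}
\text{Slope}  &= \arctan\!\left(\frac{dz}{dx}\right) \\
\text{TPI}    &= Z_0 - \frac{1}{n}\sum_{i=1}^{n} Z_i \\
\text{Relief} &= Z_{\max} - Z_{\min} \\
\text{TWI}    &= \ln\!\left(\frac{SCA}{\tan(\text{Slope})}\right)
\end{align}
```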
Geological factors
Among the geological factors, we selected geology and distance to faults as evaluation indicators. The geological map of the study area provides a basis for assessing susceptibility based on the lithological characteristics of a region. Faults are zones of structural weakness and fracturing. The closer a location is to a fault, the higher the likelihood of rock mass fracturing and weathering, making it more susceptible to landslide38.
Hydrological and land cover factors
Hydrological conditions and human activities are also key factors that affect slope stability. River erosion at the toe of a slope can reduce its lateral support, leading to instability. Therefore, proximity to rivers is often a significant factor in landslide susceptibility. Precipitation is a primary triggering factor for landslides. Prolonged and intense rainfall increases soil saturation, raises groundwater levels, and elevates pore water pressure, reducing the shear strength of the slope material. Land cover reflects both natural conditions and human activities; the latter can disturb slope equilibrium and significantly increase landslide risk. Fractional vegetation cover (FVC) reflects the density of vegetation in a region, directly influencing slope stability by affecting the efficiency of rainfall infiltration and the intensity of surface runoff. Higher FVC generally correlates with greater slope stability.
All these factors were processed using GIS software, and their data sources are detailed in Table 1. The spatial distribution of these factors across the study area is shown in Fig. 2.
Methods
The overall workflow of this study is depicted in the flow chart (Fig. 3). First, a landslide inventory was created by visual interpretation of high-resolution satellite images, followed by field investigation. The landslide samples and the randomly generated non-landslide points constitute the model training samples, which were then integrated with the environmental factor data. The RF algorithm was used to train the susceptibility model, which was subsequently applied to the entire study area to predict landslide susceptibility values. Finally, a landslide susceptibility map was generated and classified into five risk levels (Very Low, Low, Moderate, High and Very High) using the Natural Breaks classification method.
Fig. 3.

The overall workflow of this work.
Building landslide susceptibility model
As an ensemble learning algorithm, RF utilizes decision trees as base classifiers. It repeatedly performs bootstrap sampling from the original dataset to generate multiple subsets, and constructs a decision tree for each subset. These individual trees are then combined into an ensemble. During prediction, each tree votes on the classification outcome, and the final result is determined by majority voting across all trees in the forest.
For the model training, we configured the hyperparameters as follows: the forest was composed of 100 decision trees, each with a maximum depth of 8. The minimum number of samples required to split an internal node was set to 10, while the minimum for a leaf node was 5. At each split, the number of features considered was the square root of the total feature count. The model was trained using the 13 environmental factors described previously. To prepare the data, the landslide and non-landslide samples were divided into training and validation sets at an 80/20 ratio, with a fixed random seed (random_state = 42) to ensure reproducibility.
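The configuration above can be sketched as follows. The paper does not name the software used; the parameter name random_state suggests scikit-learn, which is assumed here. The data are synthetic stand-ins, not the study's samples, and the forest's own random_state is an added assumption for repeatability.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sample table (NOT the study's data):
# 600 points x 13 factors, with a label driven mostly by the first two columns.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 13))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

# 80/20 split with the fixed seed reported in the text.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Hyperparameters as reported: 100 trees, depth 8, split >= 10, leaf >= 5,
# sqrt(n_features) candidates per split.
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=8,
    min_samples_split=10,
    min_samples_leaf=5,
    max_features="sqrt",
    random_state=42,  # assumption: the paper fixes the seed only for the split
)
model.fit(X_train, y_train)

# Predicted probability of the landslide class, used as the susceptibility value.
susceptibility = model.predict_proba(X_val)[:, 1]
```

With 13 factors, `max_features="sqrt"` means that three candidate features are examined at each split.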
The choice of 100 trees for our RF model was based on a trade-off between computational efficiency and predictive performance. Although increasing the number of trees (e.g., to 500 or 1,000) could theoretically enhance model stability, preliminary tests indicated that the model’s accuracy and stability plateau around 100 trees, while using more trees significantly increases computation time without a meaningful improvement in predictive power.
Evaluating model performance
Model accuracy
The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are used to comprehensively evaluate the classification performance of the model45. The ROC curve is plotted from the True Positive Rate and False Positive Rate of the model at different thresholds. The AUC is the area under this curve and ranges from 0 to 1. Generally, an AUC value greater than 0.7 indicates good discriminatory ability and a more accurate model46.
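As an illustration of what the AUC measures, the sketch below computes it through its rank interpretation: the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample. The function name is hypothetical, and the pairwise loop is meant for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    This equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])` gives 0.75, because three of the four positive-negative pairs are ranked correctly.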
Model reliability
A calibration curve (reliability diagram) is frequently used to assess model uncertainty47. The Brier score (the mean squared error between predicted probabilities and observed outcomes), the Expected Calibration Error (ECE) and the Maximum Calibration Error (MCE) are three important indicators in model reliability evaluation48. They measure the gap between the predicted probability and the actual probability. The lower these values, the more reliable the model.
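The three indicators can be computed as in the following sketch. The number of bins and the use of equal-width bins are assumptions, since the binning scheme is not stated in the text; the function name is hypothetical.

```python
def calibration_metrics(labels, probs, n_bins=10):
    """Return (Brier score, ECE, MCE) for binary labels and predicted
    probabilities, using equal-width probability bins."""
    n = len(labels)
    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(labels, probs)) / n

    # Group samples into equal-width bins of predicted probability.
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((y, p))

    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        gap = abs(observed - predicted)
        ece += len(members) / n * gap   # gap weighted by bin population
        mce = max(mce, gap)             # worst single-bin gap
    return brier, ece, mce
```

The ECE averages the per-bin gaps, so it describes typical miscalibration, while the MCE reports the worst bin and is therefore more sensitive to sparsely populated bins.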
Quantifying the importance of environmental factors
To quantify the importance of the environmental factors, this study employed a dual-approach analysis. First, we used the Gini Impurity metric, an intrinsic feature importance method of the RF model, to conduct a global assessment49. This method calculates a feature’s overall importance by measuring the average reduction in node impurity it provides when used for splitting, aggregated across all trees in the forest. This allowed us to identify the dominant factors influencing landslide susceptibility across the entire study area. Second, to compare the controlling factors within the distinct sub-regions, we utilized Permutation Importance50. This method assesses a feature’s significance by measuring the decrease in model performance when its values are randomly shuffled. A substantial drop in prediction accuracy indicates that the feature is critical for the specific sub-region. By training the same RF model on data from different sub-regions, we were able to conduct a fine-grained analysis of the varying factor importances. This dual approach allowed us to analyze factor importance at both a macro- and micro-level, providing a more robust and nuanced understanding of the landslide-driving mechanisms.
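The permutation step can be sketched in a model-agnostic way. Here `predict` stands for any fitted classifier wrapped as a function of one sample row; accuracy is assumed as the performance score and ten repeats per feature are assumed, neither of which is specified in the text.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=42):
    """Mean drop in accuracy after shuffling each feature column in turn.

    X is a list of rows (lists); the model itself is never refitted.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Unlike the Gini-based importances, which sum to one by construction, these values are drops in a score and have no fixed scale, so magnitudes from the two methods are not directly comparable.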
Results
Inter-factor correlation analysis
To assess the potential for multicollinearity among the continuous environmental factors (DEM, Slope, Relief, TWI, Curvature, River, Rain, and FVC), the Pearson’s Correlation Coefficient (PCC) was used in this study. Results of the correlation matrix are shown in Fig. 4. Previous works show that PCC > 0.7 can be used as a threshold to identify a high degree of correlation between variables51. As shown in Fig. 4, the highest PCC in our study area is 0.49 and the mean absolute PCC is 0.17, both of which are far lower than 0.7. Therefore, all 13 influencing factors were retained to establish the landslide susceptibility model. The categorical factors (Distance to Fault, Aspect, Geology, Landcover, and TPI) were not included in this analysis.
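The screening step amounts to computing the PCC for every pair of continuous factors and flagging pairs above the threshold. A minimal sketch follows; the function names are hypothetical and the toy values in the usage note are not study data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_collinear(factors, threshold=0.7):
    """Return (name_a, name_b, pcc) for every pair with |PCC| > threshold.

    factors maps a factor name to its list of sampled values.
    """
    names = list(factors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(factors[a], factors[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged
```

Calling `flag_collinear({"Slope": [1, 2, 3, 4], "Relief": [2, 4, 6, 8.5], "Rain": [1, -1, 1, -1]})` flags only the Slope and Relief pair. An empty result, as reported for this study, means no factor needs to be dropped on collinearity grounds.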
Fig. 4.
Plot of correlation coefficients between factors.
Model performances and reliability
The performance of our Random Forest model was evaluated using both accuracy and reliability metrics. The Receiver Operating Characteristic (ROC) curve and its corresponding Area Under the Curve (AUC) value were used to assess the model’s predictive accuracy. As shown in Fig. 5, the AUC values of the training dataset and the validation dataset were 0.93 and 0.85, respectively. While a slight performance difference exists between the two datasets, the model maintains a high discriminant efficiency, confirming its strong prediction ability and capacity to effectively distinguish between landslide-prone and non-prone areas.
Fig. 5.

ROC curves and AUC values.
Furthermore, we evaluated the model’s reliability using the Calibration curve to assess the alignment between predicted probabilities and actual outcomes. A perfect Calibration curve lies on the 45-degree line of the plot, indicating a perfect match between predicted and actual probabilities. As shown in Fig. 6, our model exhibits high reliability, as evidenced by low values for key metrics: a Brier score of 0.1589, an Expected Calibration Error (ECE) of 0.0572, and a Maximum Calibration Error (MCE) of 0.1178. These results collectively demonstrate the model’s robustness and its suitability for landslide susceptibility assessment.
Fig. 6.
Calibration curve.
The landslide susceptibility map and factor importance
To identify the key environmental factors that have significant impacts on the landslide susceptibility model, the importance of each factor was computed (Fig. 7). Slope (0.31), Relief (0.18) and TWI (0.11) are the top three environmental factors contributing to landslide susceptibility in the entire study area.
Fig. 7.
Assessment of the importance of each factor’s characteristics.
By applying the trained RF model to the entire study area, we generated a landslide susceptibility map of the study area. With the Jenks Natural Breaks Classification Method, the results were categorized into five categories: Very Low (< 0.13), Low [0.13–0.31), Medium [0.31–0.50), High [0.50–0.69), and Very High (> 0.69) susceptibility (Fig. 8). The Very High and High susceptibility categories account for a combined 31.87% of the study area (14.36% and 17.51%, respectively). These high-risk areas are mainly distributed in mountainous areas with high elevation, high annual precipitation, and steep slope, spatially correlating well with the distribution of the landslide inventory. In contrast, the Low and Very Low susceptibility zones, which comprise 50.97% of the study area, are typically found in relatively flatter terrain dominated by farmland, paddy land, and urban areas.
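Once the break values are known, assigning classes and computing area shares is a simple lookup. The sketch below hard-codes the break values reported above rather than recomputing the Jenks optimization. It assumes that every raster cell has the same area and that a value exactly on a break belongs to the higher class.

```python
from bisect import bisect_right
from collections import Counter

# Break values reported in the text; class i covers [BREAKS[i-1], BREAKS[i]).
BREAKS = [0.13, 0.31, 0.50, 0.69]
CLASSES = ["Very Low", "Low", "Medium", "High", "Very High"]


def susceptibility_class(p):
    """Map a predicted probability to one of the five susceptibility classes."""
    return CLASSES[bisect_right(BREAKS, p)]


def class_shares(probabilities):
    """Percentage of cells falling in each class (equal-area cells assumed)."""
    counts = Counter(susceptibility_class(p) for p in probabilities)
    total = len(probabilities)
    return {name: 100.0 * counts[name] / total for name in CLASSES}
```

Applied to the full susceptibility raster, `class_shares` would yield percentages of the kind reported above for each class.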
Fig. 8.
Spatial distribution of landslide susceptibility evaluation results. The northern and southern rectangular boxes in the figure represent the two typical sub-regions selected for this study. (Note: This map was generated using QGIS 3.1029 with a base map from Tianditu44.).
Comparison of the South and North sub-regions
Our results show that high and very high susceptibility areas are mainly located at the northern and southern ends of the study area (rectangular boxes in Fig. 8). The results, as shown in Fig. 9, confirm that the dominant factors differ between these two areas. In the northern sub-region, the top three factors are Slope (0.091), Relief (0.040) and Curvature (0.033). This highlights that topography remains the primary driver of landslide susceptibility in this section. However, in the southern sub-region, while Slope (0.135) remains the most important factor, Rain (0.026) ranks as the third most significant factor after Curvature (0.031). The increased importance of precipitation in the south is likely due to its proximity to the coast, which leads to higher annual rainfall and more frequent typhoon events. This comparative analysis demonstrates the segment-specific controls on slope instability and the need for tailored mitigation strategies.
Fig. 9.
Assessment of Permutation Importance in the sub-regions: (a) the northern sub-region; (b) the southern sub-region.
Discussion
This study provides a comprehensive landslide susceptibility assessment along the Hefei-Fuzhou High-Speed Railway, a critical infrastructure corridor in southeastern China, using a RF model. Our discussion is structured to provide a deeper interpretation of our findings beyond mere results, focusing on three key areas: the regional variability of landslide-controlling factors, the robustness of our model, and a critical analysis of its limitations for future research.
Regional variability of landslide susceptibility
A primary contribution of this study is the identification of distinct regional drivers of landslide susceptibility along the railway corridor, which for the first time provides a targeted hazard assessment for this key transportation artery. While topographic factors are universally important, their specific nature and interaction with hydroclimatic conditions vary significantly between the northern and southern sections. This finding underscores the necessity of considering regional environmental characteristics when assessing risks along linear infrastructure that traverses diverse geographic zones.
In the mountainous and hilly terrain of the northern section, the most influential factors are Slope, Relief, and Curvature, which collectively point to a topography-dominated landslide regime. The high susceptibility in this region is primarily driven by steep slopes that generate greater shear stress, and high relief which is indicative of vigorous geomorphic processes and active uplift. The strong correlation with curvature, especially in areas of high relief, suggests that local terrain morphology is key to stress redistribution and the concentration of both surface and subsurface water flow, which ultimately controls slope stability. This aligns with previous studies in similar mountainous terrains where landslides are controlled by stress and structural configurations.
In contrast, the southern section, which lies in Fujian Province, is governed by Slope, Curvature, and Rain. The heightened importance of precipitation in this sub-region highlights a hydroclimatic-driven landslide mechanism, consistent with studies identifying intense, short-duration rainfall as the primary trigger for catastrophic landslides in Fujian52. This difference is rooted in the region’s distinct geological and climatic setting. The southern sub-region is characterized by a warm, humid climate with abundant rainfall and frequent typhoons, which significantly exacerbates slope instability. Intense rainfall events facilitate water infiltration into fractured bedrock and deep weathering layers, leading to elevated pore water pressure that drastically reduces the shear strength of the slope material, particularly along weak lithological layers53.
In addition, the significant differences between the northern and southern sub-regions are further explained by their distinct lithology and land use patterns. The northern sub-region is underlain by Proterozoic metamorphic rocks (such as schist and gneiss) and Mesozoic Yanshanian granites, which weather into strong, coarse-grained soils. In such environments, topographic factors become the primary control on slope stability. Conversely, the southern sub-region is dominated by red sedimentary beds, which weather into clay-rich, expansive soils. This lithology, combined with heavy rainfall, creates ideal conditions for the development of high pore water pressure above weak and slippery layers, leading to rapid destabilization. Additionally, land use practices amplify these disparities. The northern sub-region benefits from well-preserved forests, which provide root reinforcement, while the southern sub-region, with its dense population, experiences intensive agricultural use and deforestation, which disrupt soil structure and increase infiltration capacity, thereby exacerbating landslide susceptibility.
Model reliability and uncertainty analysis
The robustness of our findings is supported by a rigorous data and modeling methodology. The RF model’s high predictive accuracy (AUC = 0.85) and reliability (low Brier score, ECE, and MCE) confirm its suitability for this analysis. Crucially, we extended our reliability assessment by analyzing the model’s predictive standard deviation (SD), which provides a quantitative measure of uncertainty54,55.
As shown in Fig. 10, the spatial distribution of SD reveals that the model’s predictions are highly confident across most of the study area, with over 90% of the region showing an SD of less than 0.3. This low uncertainty indicates a high level of consensus among the individual decision trees within the forest, suggesting that the selected environmental factors effectively and unambiguously distinguish between stable and unstable slope conditions56. It also indirectly reflects the high quality and representativeness of our training inventory57, which sufficiently captures the diverse environmental conditions along the railway corridor.
Fig. 10.
Spatial distribution of Standard Deviation (SD) (Note: This map was generated using QGIS 3.1029 with a base map from Tianditu44.).
However, areas with higher uncertainty (SD > 0.3) are primarily located within transition zones between different susceptibility classes (e.g., between “low” and “high” susceptibility) and in regions with complex geological conditions. Within these transition zones, the values of environmental factors fall near the decision boundaries of many trees, leading to lower inter-tree consensus and, consequently, higher SD. This finding is critical for practical application. For hazard management, the SD map acts as a vital prioritization tool58. Areas classified as “high susceptibility” with “low uncertainty” should be assigned the highest priority for immediate mitigation measures. In contrast, areas with both “high susceptibility” and “high uncertainty” require further field investigation or expert evaluation before committing significant resources59. This approach transforms a static susceptibility map into a more dynamic and risk-informed decision-support system, ensuring the efficient allocation of limited resources.
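The prioritization logic described above can be illustrated with a short Python sketch that combines a susceptibility value with its SD. The SD threshold of 0.3 follows the text, whereas the susceptibility cutoff of 0.6, the label strings, and the function name `prioritize` are hypothetical choices made only for this example.

```python
import numpy as np


def prioritize(susceptibility, sd, high_susc=0.6, high_sd=0.3):
    """Assign a management label to each mapping unit.

    high_susc is a hypothetical cutoff for "high susceptibility";
    high_sd follows the SD > 0.3 criterion quoted in the text.
    """
    susceptibility = np.asarray(susceptibility, dtype=float)
    sd = np.asarray(sd, dtype=float)

    # Default label for units that are not highly susceptible.
    labels = np.full(susceptibility.shape, "routine monitoring", dtype=object)

    is_high = susceptibility >= high_susc
    # High susceptibility, low uncertainty: act first.
    labels[is_high & (sd <= high_sd)] = "mitigate first"
    # High susceptibility, high uncertainty: verify in the field first.
    labels[is_high & (sd > high_sd)] = "field investigation"
    return labels


example = prioritize([0.9, 0.8, 0.2], [0.1, 0.4, 0.35])
print(list(example))
```

In practice the cutoffs would be taken from the susceptibility class breaks adopted for the map rather than from the placeholder values used here.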
Limitations and future research
While this study provides valuable insights, we acknowledge several limitations that offer avenues for future research. First, while the 13 factors used in our model represent a comprehensive suite of common landslide triggers, region-specific phenomena may have been overlooked. For instance, future models could incorporate variables like wind speed and typhoon tracks to better capture the localized dynamics in the southern sub-region60. Second, this study focuses on static susceptibility mapping. A more advanced approach would be to incorporate active slope deformation data, such as from InSAR, to enable a transition from static susceptibility maps to dynamic hazard warnings. This would provide more effective support for risk management and emergency response along the Hefei-Fuzhou High-Speed Railway.
Conclusion
This study successfully developed and validated a Random Forest model to map landslide susceptibility along the Hefei-Fuzhou High-Speed Railway, a critical transportation artery in southeastern China. The model demonstrated high predictive accuracy, providing a robust quantitative basis for landslide risk management.
Our findings reveal two major implications for hazard assessment of linear infrastructure. First, while topographic factors (Slope, Relief, and TWI) are the dominant drivers of landslide susceptibility across the entire corridor, their influence varies significantly between sub-regions. We found a clear distinction between a topography-dominated landslide regime in the northern section and a hydroclimatic-driven regime in the southern section, where rainfall emerges as a critical factor. This pronounced regional variability underscores the necessity of moving beyond a uniform approach and adopting tailored, location-specific strategies for disaster prevention.
Second, this research provides not only a crucial susceptibility map to guide hazard mitigation but also a vital methodological framework. By demonstrating how machine learning can be used to identify and explain complex regional differences in landslide drivers, our work offers a valuable scientific basis for assessing geological hazards for similar large-scale projects in diverse environments. The study’s systematic analysis of model uncertainty, using the standard deviation of RF predictions, provides a practical tool for prioritizing mitigation efforts and allocating resources effectively.
Despite its contributions, the study has certain limitations. The model’s accuracy is affected by uncertainties in multi-source data and data gaps in some coastal areas. Future research could enhance predictive performance and regional applicability by integrating more precise remote sensing data and incorporating dynamic factors such as slope deformation monitoring to enable a shift from static susceptibility analysis to a real-time early warning system for slope instability.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant Nos. 2024YFC3012603 and 2024YFC3012604), the National Natural Science Foundation of China (Grant No. 42407257), the Chongqing Water Resources Bureau, China (Project No. CQS24C00836), and the Key Project of China Railway Design Corporation (Project No. 2023A0226409). Special thanks to Dr. Wentao Yang for his help with the model uncertainty assessment during the revision stage. Special thanks also go to the editors and anonymous reviewers for their invaluable assistance in refining this work.
Author contributions
J. L. and W. Q.: Methodology, Investigation, Writing—original draft. J. L., W. Q., C. X. and Z. X.: Writing—review and editing, Validation. W. Q. and C. X.: Project administration. P.W., J. S., X. Z., J. C., Y. C., J. P., J. W. and Q. S.: Data acquisition, Investigation.
Data availability
The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study did not involve any human participants, animal subjects, or sensitive data requiring ethical approval. No experiments, interventions, or interactions were conducted that necessitate formal review or approval by an ethics committee.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Kirschbaum, D., Stanley, T. & Zhou, Y. Spatial and temporal analysis of a global landslide catalog [J]. Geomorphology 249, 4–15. 10.1016/j.geomorph.2015.03.016 (2015).
- 2. Nadim, F. et al. Global landslide and avalanche hotspots [J]. Landslides 3 (2), 159–173. 10.1007/s10346-006-0036-1 (2006).
- 3. Haque, U. et al. The human cost of global warming: deadly landslides and their triggers (1995–2014) [J]. Sci. Total Environ. 682, 673–684. 10.1016/j.scitotenv.2019.03.415 (2019).
- 4. Fell, R. et al. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning [J]. Eng. Geol. 102 (3), 85–98. 10.1016/j.enggeo.2008.03.022 (2008).
- 5. Chang, Z. et al. Landslide susceptibility prediction using slope unit-based machine learning models considering the heterogeneity of conditioning factors [J]. J. Rock. Mech. Geotech. Eng. 15 (5), 1127–1143. 10.1016/j.jrmge.2022.07.009 (2023).
- 6. Youssef, K. et al. Landslide susceptibility modeling by interpretable neural network [J]. Commun. Earth Environ. 4, 162. 10.1038/s43247-023-00806-5 (2023).
- 7. Tehrani, F. S. et al. Machine learning and landslide studies: recent advances and applications [J]. Nat. Hazard. 114 (2), 1197–1245. 10.1007/s11069-022-05423-7 (2022).
- 8. Ishwaran, H. & Malley, J. D. Synthetic learning machines [J]. Biodata Min. 7 (1), 1–12. 10.1186/s13040-014-0028-y (2014).
- 9. Qi, W., Xu, C. & Xu, X. AutoGluon: A revolutionary framework for landslide hazard analysis [J]. Nat. Hazard. Res. 1 (3), 103–108. 10.1016/j.nhres.2021.07.002 (2021).
- 10. Thi Ngo, P. T. et al. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran [J]. Geosci. Front. 12 (2), 505–519. 10.1016/j.gsf.2020.06.013 (2021).
- 11. Reichenbach, P. A review of statistically-based landslide susceptibility models [J]. Earth-Sci. Rev. 10.1016/j.earscirev.2018.03.001 (2018).
- 12. Bammou, Y. et al. Improving landslide susceptibility mapping in semi-arid regions using machine learning and geospatial techniques [J]. Dysona Appl. Sci. 6 (2), 269–290. 10.30493/das.2025.484839 (2025).
- 13. He, K., Shen, X. & Anagnostou, E. Global burn severity in forest ecoregions: trends, climate drivers, and predictive insights [J]. Npj Nat. Hazards 2 (1), 61. 10.1038/s44304-025-00113-3 (2025).
- 14. Pham, B. T. et al. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India) [J]. Environ. Modell. Softw. 84, 240–250. 10.1016/j.envsoft.2016.07.005 (2016).
- 15. Shi, W. & Lu, P. Intelligent perception of coseismic landslide migration areas along Sichuan–Tibet railway [J]. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 8876–8883. 10.1109/JSTARS.2021.3105671 (2021).
- 16. Marchesini, I. et al. National-scale assessment of railways exposure to rapid flow-like landslides [J]. Eng. Geol. 332, 107474. 10.1016/j.enggeo.2024.107474 (2024).
- 17. Panchal, S. & Shrivastava, A. Landslide hazard assessment using analytic hierarchy process (AHP): A case study of National Highway 5 in India [J]. Ain Shams Eng. J. 13. 10.1016/j.asej.2021.10.021 (2021).
- 18. Pu, H. et al. Railway alignment optimization in mountainous regions considering spatial geological hazards: A sustainable safety perspective [J]. Sustainability 13 (4), 1661. 10.3390/su13041661 (2021).
- 19. Xue, Y. et al. Slope failure mechanism of the 5·1 Meida highway collapse in Guangdong, China: interaction between multi-source water and weathered granite soil [J]. Landslides 22 (4), 1199–1212. 10.1007/s10346-025-02455-5 (2025).
- 20. Chen, X. et al. The catastrophic 13 November 2015 rock-debris slide in Lidong, south-western Zhejiang (China): a landslide triggered by a combination of antecedent rainfall and triggering rainfall [J]. Geomatics Nat. Hazards Risk 9 (1), 608–623. 10.1080/19475705.2018.1455750 (2018).
- 21. Wang, F. et al. The fault-controlled Chengtian landslide triggered by rainfall on 20 May 2021 in Songyang County, Zhejiang Province, China [J]. Landslides 19 (7), 1751–1765. 10.1007/s10346-022-01891-x (2022).
- 22. Feng Wenkai, Z. J. et al. Characteristics and drivers of clustered landslides induced by the extreme rainstorm on June 16 in the Fujian-Guangdong-Jiangxi junction area [J]. Earth Sci. 10.3799/dqkx.2025.087 (2025).
- 23. Zhou, X. et al. Geoinformation-based landslide susceptibility mapping in subtropical area [J]. Sci. Rep. 11 (1), 24325. 10.1038/s41598-021-03743-5 (2021).
- 24. Huang, F. et al. Landslide susceptibility prediction based on a semi-supervised multiple-layer perceptron model [J]. Landslides 17 (12), 2919–2930. 10.1007/s10346-020-01473-9 (2020).
- 25. Zhou, S. et al. Exploring Bayesian network model with noise filtering for rainfall-induced landslide susceptibility assessment in Fujian, China [J]. Front. Earth Sci. 12. 10.3389/feart.2024.1444882 (2024).
- 26. Lin, J. et al. Risk assessment and its influencing factors analysis of geological hazards in typical mountain environment [J]. J. Clean. Prod. 309, 127077. 10.1016/j.jclepro.2021.127077 (2021).
- 27. Zhou, X. et al. Hydrogeochemical evolution, isotopic insights, and genetic models of geothermal water in Anhui Province, China [J]. Water 17 (2), 236. 10.3390/w17020236 (2025).
- 28. Ou, P. et al. Assessment of landslide hazard in Jiangxi using Geo-information [J]. Front. Earth Sci. 9, 648342. 10.3389/feart.2021.648342 (2021).
- 29. QGIS Geographic Information System 3.10. https://www.qgis.org (2019).
- 30. Wang, W. et al. Identification and distribution of 13003 landslides in the northwest margin of Qinghai-Tibet Plateau based on human-computer interaction remote sensing interpretation [J]. China Geol. 7 (2), 171–187. 10.31035/cg2023140 (2024).
- 31. Gao, H. et al. Landslides triggered by the July 2023 extreme rainstorm in the Haihe River Basin, China [J]. Landslides 21 (11), 2885–2890. 10.1007/s10346-024-02322-9 (2024).
- 32. Ma, H. & Wang, F. Factors controlling the formation and movement of clustered shallow landslides triggered by the extreme rainstorm in July 2023 in Beijing, China [J]. Geomorphology 478, 109728. 10.1016/j.geomorph.2025.109728 (2025).
- 33. Ersayin, K. & Uzun, A. A comprehensive analysis of landslide susceptibility in Iyidere basin (NE, Turkey) using machine learning techniques and statistical bivariate methods [J]. Nat. Hazard. 10.1007/s11069-025-07354-5 (2025).
- 34. Dahal, A. et al. Quantifying the influence of topographic amplification on the landslides triggered by the 2015 Gorkha earthquake [J]. Commun. Earth Environ. 5 (1), 678. 10.1038/s43247-024-01822-9 (2024).
- 35. Nguyen, H-H-D. et al. A hybrid approach combining physics-based model with extreme value analysis for temporal probability of rainfall-triggered landslide [J]. Landslides 22 (1), 149–168. 10.1007/s10346-024-02366-x (2025).
- 36. Kumari, K. et al. Spatial footprints of moisture-driven landslides in Western Himalayas from 2007 to 2022 [J]. Nat. Hazard. 121 (6), 7325–7345. 10.1007/s11069-024-07086-y (2025).
- 37. Achu, A. L., Gopinath, G., a C, H. T. J. & D V. Redefining landslide susceptibility under extreme rainfall events using deep learning [J]. Geomorphology 448, 109033. 10.1016/j.geomorph.2023.109033 (2024).
- 38. Yacine et al. How do machine learning techniques help in increasing accuracy of landslide susceptibility maps? [J]. Geosci. Front. 11 (3), 13. CNKI:SUN:GSFT.0.2020-03-012 (2020).
- 39. Jarvis, A. et al. Hole-filled SRTM for the globe Version 4 [J]. (2008).
- 40. Xu, X. et al. Seismotectonic map in China and its adjacent regions [Z]. Beijing: Seismological Press. activefault.china.250.2016.db (2016).
- 41. Wu, X. et al. China active faults database and its web system [J]. Earth Syst. Sci. Data Discuss. 10.5194/essd-2023-119 (2023).
- 42. Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas [J]. Int. J. Climatol. 37 (12), 4302–4315. 10.1002/joc.5086 (2017).
- 43. Tateishi, R. et al. Production of global land cover data - GLCNMO [J]. Int. J. Digit. Earth. (1), 4. 10.1080/17538941003777521 (2011).
- 44. Tianditu. http://www.tianditu.gov.cn/
- 45. Fawcett, T. An introduction to ROC analysis [J]. Pattern Recognit. Lett. 27 (8), 861–874. 10.1016/j.patrec.2005.10.010 (2006).
- 46. Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms [J]. Pattern Recognit. 30 (7), 1145–1159. 10.1016/S0031-3203(96)00142-2 (1997).
- 47. Degroot, M. H. & Fienberg, S. E. The comparison and evaluation of forecasters [J]. J. R Stat. Soc. D. 32 (1), 12–22. 10.2307/2987588 (1983).
- 48. Guo, C. et al. On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning (ICML 2017). 1321–1330.
- 49. Breiman, L. Random forests [J]. Mach. Learn. 45 (1), 5–32. 10.1023/A:1010933404324 (2001).
- 50. Fisher, A., Rudin, C. & Dominici, F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously [J]. J. Mach. Learn. Res. 20. 10.48550/arXiv.1801.01489 (2019).
- 51. Bui, D. T. et al. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree [J]. Landslides 13 (2), 361–378. 10.1007/s10346-015-0557-6 (2016).
- 52. Ma, S., Shao, X. & Xu, C. Landslides triggered by the 2016 heavy rainfall event in Sanming, Fujian Province: distribution pattern analysis and spatio-temporal susceptibility assessment [J]. Remote Sens. 15 (11), 2738. 10.3390/rs15112738 (2023).
- 53. Luo, X., Li, C. & Zhou, J. GIS-based prediction method of shallow landslides induced by heavy rainfall in large mountainous areas [J]. J. Mountain Sci. 21 (5), 1534–1548. 10.1007/s11629-023-8535-2 (2024).
- 54. Huang, F. et al. Uncertainties of landslide susceptibility prediction considering different landslide types [J]. J. Rock. Mech. Geotech. Eng. 15 (11), 2954–2972. 10.1016/j.jrmge.2023.03.001 (2023).
- 55. Huang, F. et al. Comparisons of heuristic, general statistical and machine learning models for landslide susceptibility prediction and mapping [J]. CATENA 191, 104580. 10.1016/j.catena.2020.104580 (2020).
- 56. Takoutsing, B. & Heuvelink, G. B. M. Comparing the prediction performance, uncertainty quantification and extrapolation potential of regression kriging and random forest while accounting for soil measurement errors [J]. Geoderma 428, 116192. 10.1016/j.geoderma.2022.116192 (2022).
- 57. Kuhn, S. et al. Identification of intrusive lithologies in volcanic terrains in British Columbia by machine learning using random forests: the value of using a soft classifier [J]. Geophysics 85 (6), B249–B258. 10.1190/geo2019-0461.1 (2020).
- 58. Das, R. et al. Machine learning-based assessment of regional-scale variation of landslide susceptibility in central Vietnam [J]. PLOS ONE 19 (10), e0308494. 10.1371/journal.pone.0308494 (2024).
- 59. Abbasi, M. et al. A hybrid of random forest and deep Auto-Encoder with support vector regression methods for accuracy improvement and uncertainty reduction of long-term streamflow prediction [J]. J. Hydrol. 597, 125717. 10.1016/j.jhydrol.2020.125717 (2021).
- 60. Lin, Y-C. et al. Strong wind is one of the important factors that trigger landslides [J]. Npj Nat. Hazards 2 (1), 12. 10.1038/s44304-025-00062-x (2025).