Scientific Reports. 2025 Apr 21;15:13801. doi: 10.1038/s41598-025-97147-4

Improving generalization in slope movement prediction using sequential models and hierarchical transformer predictor autoencoder

Praveen Kumar 1, Priyanka Priyanka 1, K V Uday 2, Varun Dutt 1
PMCID: PMC12012136  PMID: 40258904

Abstract

Predicting slope movement is a major challenge, especially in the Himalayan region, where landslides cause severe damage. Machine learning (ML) models can help predict landslide hazards. However, most existing approaches do not capture changes in weather conditions at day, hour, or minute scales, which limits their accuracy in real-time scenarios. These models also generalize poorly because of limited data availability, and they often cannot provide the multi-step-ahead predictions that are crucial for disaster preparedness and timely response. We introduce a hierarchical ML architecture, the hierarchical transformer prediction autoencoder (H-TPA), which predicts slope movement at high temporal resolution with improved generalization. The study is based on a rich dataset from sixty-four landslide locations collected over five years. The training set contained 1,066,009 samples, which were balanced down to 23,328 samples to address class imbalance; the validation set contained 100,000 samples, and the test set contained 164,082 samples. This work also presents a variable sensitivity analysis (VSA) methodology for determining the threshold values of environmental attributes that trigger slope movements. When predicting slope movements 10 min in advance, the H-TPA model achieved F1 scores of 0.889, 0.760, and 0.746 on the training, validation, and test datasets, respectively. We also analyzed how weather conditions and soil moisture affect landslide triggering; the analysis indicated that temperature, humidity, barometric pressure, rainfall, and sunlight intensity contribute to small or large slope movements once they cross certain threshold values.
This study advances the understanding of landslide prediction in the Himalayan region and provides recommendations for improving geoscientific knowledge and mitigation strategies.

Keywords: Landslide, Monitoring, Hierarchical transformer, Variable sensitivity analysis, Environmental factors, Machine learning

Subject terms: Natural hazards, Computational science

Introduction

Natural hazards such as landslides affect many parts of the world, including the Himalayan region. They cause infrastructure damage, economic losses, communication disruption, and loss of life1. Warnings of impending landslides can be generated by continuous monitoring with landslide monitoring systems (LMSs)2, and ML models could be used to predict landslides from LMS data3,4. However, current ML models generalize poorly because of limited data availability and their inability to capture multi-level changes in weather conditions5,6. Such models also often cannot produce the multi-step-ahead predictions that are critical for disaster preparedness7. Newer, more advanced ML models are needed to address these gaps.

Environmental and other factors can influence and trigger slope movement8. However, this influence has yet to be investigated on a large scale, particularly in the Himalayan region. Researchers have developed sensitivity analysis (SA) techniques to understand how sensitive model outputs are to changes in input features9. However, these methods quantify how variations in input variables affect model predictions rather than identifying specific threshold values.

Several ML models have contributed to landslide prediction3,4,9,10. Recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) networks, convolutional neural networks (CNNs), and transformer-based models handle sequential and spatial data exceptionally well. For example, Kumar et al. developed several LSTM variants to predict slope movement and showed that the LSTM model is suitable for modeling slope movements3. Pei et al. developed a one-dimensional CNN model to predict slope movement in the Three Gorges Reservoir area (TGRA)10. Similarly, Xi et al. developed a transformer model to predict slope movement at the Huanglianshu landslide in China11. A comparative analysis of model characteristics suggests that the transformer is better suited for predicting slope movements influenced by multiple factors. Liu et al. introduced a hierarchical transformer model that identifies self-attention patterns at both local and high levels12, and this model performed impressively in computer vision. However, this hierarchical architecture has not yet been adapted in the literature to discover local and high-level self-attention patterns for slope movement prediction. Landslide sequential datasets exhibit long-range dependencies: higher-level features are influenced by environmental factors over extended periods, whereas lower-level features capture short-term dependencies such as current weather conditions. Previous work has developed several hierarchical ML architectures for capturing temporal dependencies. DifFormer employs a hierarchical transformer architecture with a neural differencing attention mechanism for time series forecasting13; to capture temporal patterns efficiently, it dynamically differences the input time series.
Additionally, for time series forecasting, the Crossformer model employs a hierarchical encoder-decoder architecture with segment merging, a two-stage attention layer, and dimension-segment-wise (DSW) embedding14, using its hierarchical structure to synthesize forecasts at several scales. Neural Hierarchical Interpolation for Time Series Forecasting (NHITS) extends the neural basis expansion analysis approach (N-BEATS) by combining neural basis approximation, multi-rate signal sampling, and hierarchical interpolation15; it relies on local nonlinear projections onto basis functions across several blocks. Sharma and Diwakar noted that CNNs have trouble accurately modeling fine-grained temporal dependencies16, and Saha et al. found poor generalization on imbalanced datasets17. Regarding data scarcity, Firoozi et al. stressed the risk of overfitting on limited datasets18, while Lu et al. and Nagarani et al. pointed out the lack of interpretability regarding feature importance and environmental thresholds19,20. Fang et al. demonstrated that advanced architectures, such as stacking ensemble frameworks combining CNNs and RNNs, improved spatial predictions21; however, these models were limited in capturing temporal dependencies and producing accurate multi-step forecasts. For temporal modeling, Nava et al. and Yuan et al. investigated architectures such as Conv-LSTM and ST-Transformer, but these lacked interpretability and could not support granular environmental threshold analysis22,23. Wang et al. integrated SAR-derived data with LSTMs to improve anomaly identification in time-series predictions24, and Khalili et al. demonstrated the superiority of Transformer models in integrating spatial-temporal relationships25. Zhang and Wang, as well as Xu et al., advanced spatiotemporal Transformers by integrating CNNs and LSTMs, but these models provided no insight into the temporal significance of influential environmental factors26,27. Kuang et al. and Li et al. demonstrated improved multi-step prediction with Transformer-enhanced networks but did not provide actionable intelligence through environmental variable analysis28,29. Despite these improvements, ML models for landslide prediction still have several limitations. Imbalanced datasets make them prone to overfitting and poor generalization. The environmental thresholds that trigger landslides are not interpretable. Attention mechanisms often fail to capture key temporal periods, and region-specific optimizations hamper scalability across diverse terrains. Finally, inadequate integration with real-time sensor data diminishes the practical applicability of these models in early warning systems. Addressing these issues is necessary to improve the interpretability, accuracy, and robustness of landslide prediction models. Our work addresses them with a new method for environmental threshold analysis and temporal modeling.

To fill these gaps, we developed temporal CNN, LSTM, and Transformer models, together with a new architecture called the Hierarchical Transformer Prediction Autoencoder (also known as the H-Trans Predictor Autoencoder, or H-TPA). We also developed a novel variable sensitivity analysis (VSA) technique to identify the thresholds of environmental factors that cause slope movement, and we used the transformer’s attention mechanism to identify the periods over which influencing factors act. The study uses five years (2018–2023) of data from sixty-four landslide locations in the Himalayan region; it is the first study of its kind in the region to span sixty-four landslide locations. The findings add significantly to the theoretical and practical aspects of slope movement analysis, with implications not only for landslide prediction but also for the geoscientific understanding of complex geological phenomena.

Method

Study area

Data for our study were gathered from 64 LMSs in the Mandi, Kinnaur, Kangra, Sirmour, and Solan districts of Himachal Pradesh. Because of their steep slopes, weak geology, and heavy rainfall, these districts are among the most landslide-prone in the Indian Himalayan region30–34. Owing to the high frequency of seismic events and rainfall-induced landslides, the area is particularly vulnerable to both shallow and deep-seated slope movements32. The soil becomes saturated mostly during the monsoon season31, which drastically reduces soil shear strength and leads to slope failures. The interplay of topographical, geological, and hydrological factors contributes to frequent landslides that threaten infrastructure, agriculture, and human settlements30,31.

Himachal Pradesh is well known for its high vulnerability to landslides, with both shallow and deep-seated landslides occurring frequently throughout its districts. According to Verma and Khanduri, the Mandi and Kinnaur districts commonly experience shallow translational landslides, usually caused by heavy, short-duration rainfall30, in which soil saturation reduces slope stability and results in mass movement. Srivastava et al. reported that the Solan district is prone to debris-flow-type landslides because of its abundant loose soil deposits and heavy monsoon rainfall31. Sarkar and Paul stated that, owing to geological complexity and tectonic setting, both deep-seated and shallow landslides are common in districts such as Kangra and Sirmour32. In all these regions, human activities such as unauthorized development and deforestation increase the likelihood of landslides. Kumar et al. further showed that rainfall intensity and duration contribute to landslides in the Mandi and Solan districts33, and that poorly consolidated sedimentary rock formations combined with high-intensity rainfall predispose these areas to frequent slope failures. In the Kinnaur and Kangra districts, Borthakur and Singh highlighted the potential of traditional ecological knowledge to reduce landslide risk and recommended sustainable practices such as controlled deforestation and engineering methods for slope stabilization34.

Types of landslides observed in the study area.

  • Shallow translational landslides are among the most common slides in districts such as Mandi, Kinnaur, and Solan and are normally triggered by high-intensity, short-duration rainfall30.

  • Deep-seated landslides in the Kangra district are primarily caused by tectonic movements and long-term geological processes32.

  • Frequent debris flows are observed in the Sirmour and Solan districts, as steep slopes with loose sediment favor the rapid downhill flow of soil and rock31.

  • All districts experience rainfall-induced landslides, and slope failure is strongly influenced by the amount and frequency of rainfall33.

Figure 1 shows the study area and monitoring setup. Panel (a) depicts Himachal Pradesh, highlighted in green, on a map of India. Panel (b) depicts the 64 monitoring stations installed in the Mandi, Kinnaur, Kangra, Sirmour, and Solan districts of Himachal Pradesh. Panel (c) shows field photos of LMSs installed on critical slopes. Each LMS contains weather sensors, a rain gauge, and a soil node. The weather sensors measure temperature, humidity, air pressure, and sunlight intensity; the rain gauge measures precipitation; and the soil node is equipped with an accelerometer and a soil moisture sensor to detect ground movement and moisture content. The map was generated using ArcGIS Pro 3.1.0 (Esri)35.

Fig. 1.

(a) Study area location on the map of India. (b) Sixty-four landslide locations in Himachal Pradesh, Himalayan region. (c) LMS installations at selected landslide sites. Map generated using ArcGIS Pro 3.1.0 (Esri, https://www.esri.com)35.

Data and pre-processing

Several LMSs have been deployed in the Himalayan region to study slope movement and its correlation with environmental factors2. These systems collect data every 10 min and transmit it to the cloud. The LMSs record environmental factors, including temperature (°C), humidity (%), sunlight intensity (lx), barometric pressure (hPa), and rainfall (mm/hr), as well as soil parameters, such as soil acceleration in the x, y, and z directions (m/s²), angular rotation in the x, y, and z directions (°/s²), and soil moisture (%).

Rainfall is considered the main landslide trigger in the Himalayan region, because the resulting increase in soil water content reduces slope stability36: excess rainfall infiltrates the soil, increasing pore water pressure and reducing shear strength. Sudden temperature changes cause the soil to expand and contract, forming cracks37 that weaken the soil over time; water then infiltrates through these cracks and further reduces slope stability. High humidity adds to soil moisture, which can trigger slope movement, and when it coincides with rainfall, it prolongs soil saturation and significantly increases the likelihood of slope failure38. Barometric pressure can subtly affect subsurface water circulation: under low atmospheric pressure, the soil retains water longer, increasing the possibility of slope movement39. Sunlight exposure determines the rate at which soil moisture evaporates; under low sunlight, evaporation is reduced and water remains trapped in the soil longer, increasing susceptibility to landslides40. Soil moisture, the instantaneous water content of the soil, is one of the most important parameters for predicting slope behavior and possible failure40.

Figure 2 shows time-series plots of the 12 sensor features: temperature, humidity, barometric pressure, rainfall, sunlight intensity, motion data, and soil moisture. The motion data consist of acceleration and angular rotation along the x, y, and z axes.

Fig. 2.

Time-series plots illustrating the trends of 12 environmental and motion-related features. Each subplot represents the temporal behavior of a specific feature across the sampled data points, providing a detailed visualization of its dynamics.

Data from 44 randomly selected landslide locations, comprising 1,066,009 data points, were used for training. Data from 10 other locations (100,000 data points) formed the validation set, and data from the remaining 10 locations (164,082 data points) formed the test set. In our dataset, a data point is a single observation described by 13 features. Missing values were replaced with the values recorded by the nearest LMS at the same time.
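The location-level split and the nearest-LMS gap filling can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the function names are hypothetical, readings are assumed to be aligned in a `values[station, time]` array, and “nearest” is assumed to mean the smallest Euclidean distance between projected station coordinates.

```python
import numpy as np

def split_locations(location_ids, n_train=44, n_val=10, seed=0):
    """Randomly assign whole landslide locations (not individual samples)
    to train/validation/test, so no location appears in two sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(location_ids))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def impute_from_nearest(values, coords):
    """Fill NaNs in values[station, time] with the reading of the nearest
    station (Euclidean distance on coords, an assumption) that has a value
    at the same time step. Stations with no donor keep their NaN."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a station cannot be its own donor
    order = np.argsort(dist, axis=1)        # neighbours, nearest first
    filled = values.copy()
    for s, t in zip(*np.where(np.isnan(values))):
        for n in order[s]:
            if n != s and not np.isnan(values[n, t]):
                filled[s, t] = values[n, t]
                break
    return filled

# Example: 64 locations -> 44 / 10 / 10
train_ids, val_ids, test_ids = split_locations(np.arange(64))
```

Splitting by location rather than by sample keeps every reading from a test site unseen during training, which is what makes the validation and test scores a check on generalization to new sites.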

Studies of sensor noise characteristics have shown that accelerometers generate noise within ±1 standard deviation (SD) under steady-state conditions41,42. We therefore classified variations within ±1 SD as ‘no movement,’ separating sensor noise from actual soil movement. Standard-deviation thresholds such as ±1 SD, ±2 SD, and beyond ±2 SD have been widely used for reliable data classification in a number of fields41–45. Accordingly, we classified soil movement using variations in soil acceleration and angular rotation along the X, Y, and Z axes, giving six parameters per observation (three acceleration values and three angular rotation values). Movement classes were defined using the standard deviation of the differences between successive measurements:

  • No movement: changes within ±1 SD in each of the six parameters were designated as ‘no movement.’

  • Small movement: changes exceeding ±1 SD but within ±2 SD in any of the six parameters were considered ‘small movement.’

  • Large movement: changes exceeding ±2 SD in any of the six parameters were considered ‘large movement.’
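The three-way labeling rule above can be sketched as follows. This is a minimal NumPy sketch, not the authors’ implementation; it assumes the SD of successive differences is computed per parameter over one station’s whole record (the text does not state the window), that a larger class takes precedence when several parameters change, and the function and constant names are hypothetical.

```python
import numpy as np

NO_MOVEMENT, SMALL_MOVEMENT, LARGE_MOVEMENT = 0, 1, 2

def label_movement(motion):
    """motion: (T, 6) array of Ax, Ay, Az, Gx, Gy, Gz readings from one station.
    Returns T-1 labels, one per pair of successive readings."""
    motion = np.asarray(motion, dtype=float)
    diffs = np.diff(motion, axis=0)              # change between successive readings
    sd = diffs.std(axis=0)                       # SD of the differences, per parameter
    sd = np.where(sd == 0, np.inf, sd)           # constant parameter -> never exceeds SD
    z = np.abs(diffs) / sd                       # size of each change in SD units
    labels = np.full(len(diffs), NO_MOVEMENT)
    labels[(z > 1).any(axis=1)] = SMALL_MOVEMENT # any parameter beyond +/-1 SD
    labels[(z > 2).any(axis=1)] = LARGE_MOVEMENT # any parameter beyond +/-2 SD wins
    return labels
```

For example, a station whose Ax readings go 0, 2, 0, 1, 0 (all other axes flat) has differences 2, −2, 1, −1 with SD ≈ 1.58, so the first two steps are labeled small movement and the last two no movement.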

We therefore framed the task as multi-class classification with three target labels: ‘no movement,’ ‘small movement,’ and ‘large movement.’ The dataset had 13 features: 12 were taken directly from the LMSs, and the 13th was the historical slope movement at each timestamp.

The training dataset was heavily imbalanced. Of its roughly 1.07 million samples, about 1.03 million belonged to the ‘no movement’ class; as shown in Fig. 3a, the ‘small movement’ class had 27,236 samples and the ‘large movement’ class only 7,776. To balance the classes, we randomly selected 7,776 samples from each class (23,328 samples in total). The class distributions of the validation and test datasets are shown in Fig. 3b and c, respectively. These two datasets were not rebalanced; they were taken directly from the LMSs to reflect the performance of the ML model under real-life conditions.
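The balancing step described above is random undersampling to the size of the rarest class. A minimal NumPy sketch follows; the function name and the fixed seed are illustrative assumptions, not details from the study.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Random undersampling: keep n_min randomly chosen samples of every class,
    where n_min is the size of the rarest class."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)                 # mix classes so batches are not sorted by label
    return X[keep], y[keep]
```

Applied to the training set, n_min is 7,776 (the ‘large movement’ class), which gives 3 × 7,776 = 23,328 samples. Only the training data would pass through this function; the validation and test sets keep their natural class distribution.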

Fig. 3.

(a) Distribution of movement classes in the training data. (b) Distribution of movement classes in the validation data. (c) Distribution of movement classes in the test data. (d) Construction of the input packets for 10, 30, and 60 min ahead prediction.

Input packets were created from the dataset with a sequence length of 144, which corresponds to 24 h (144 × 10 min = 24 h) because data were recorded at 10 min intervals. From the current time step, the models predict movement 10, 30, and 60 min ahead; accordingly, each input sequence ends 10, 30, or 60 min before the prediction target. The construction of the input packets is shown in Fig. 3d. For example, to predict the movement class 10 min (one time step) after time t, we used the 144 observations (24 h) ending at time t.
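The sliding-window packet construction can be sketched as below. This is an illustrative NumPy sketch under the assumption that each window of 144 consecutive observations is paired with the label one, three, or six steps (10, 30, or 60 min) after its last observation; the function name is hypothetical.

```python
import numpy as np

def make_packets(features, labels, seq_len=144, horizon_steps=1):
    """Pair each window of `seq_len` consecutive 10-min observations (144 -> 24 h)
    with the label `horizon_steps` steps after the window's last observation.
    horizon_steps = 1, 3, 6 corresponds to 10, 30, 60 min ahead."""
    features, labels = np.asarray(features), np.asarray(labels)
    n = len(features) - seq_len - horizon_steps + 1   # number of complete packets
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[i:i + seq_len] for i in range(n)])  # (n, seq_len, n_features)
    first_target = seq_len - 1 + horizon_steps
    y = labels[first_target:first_target + n]
    return X, y
```

For a 200-step series with 13 features, `make_packets(features, labels, horizon_steps=6)` yields 51 packets of shape 144 × 13, the first paired with the label at index 149 (one hour after index 143).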

Table 1 lists the minimum (min), maximum (max), mean, and SD values of the key environmental, motion-related (acceleration and angular rotation), and soil moisture attributes recorded at the 64 landslide-prone locations in Himachal Pradesh. Units for each attribute are given in parentheses.

Table 1.

Overview of the Himalayan dataset with attribute descriptions, value ranges, and statistical summaries.

| Attribute | Min | Max | Mean | SD |
|---|---|---|---|---|
| Temperature (°C) | 6 | 41 | 23.6 | 5.29 |
| Humidity (%) | 30 | 99 | 94.71 | 10.92 |
| Barometric pressure (hPa) | 897.35 | 1075.6 | 1012.06 | 14.01 |
| Rainfall (mm/hr) | 0 | 326.25 | 0.67 | 7.39 |
| Sunlight intensity (lx) | 0 | 6344 | 540.88 | 829.52 |
| Acceleration in x-axis (Ax, m/s²) | −10 | 9.96 | 0 | 0.1 |
| Acceleration in y-axis (Ay, m/s²) | −3.68 | 3.68 | 0 | 0.3 |
| Acceleration in z-axis (Az, m/s²) | −2.21 | 2 | 0 | 0.07 |
| Angular rotation in x-axis (Gx, °/s²) | −10 | 10 | 0 | 0.3 |
| Angular rotation in y-axis (Gy, °/s²) | −9 | 10 | 0 | 0.13 |
| Angular rotation in z-axis (Gz, °/s²) | −9.02 | 9 | 0 | 0.1 |
| Soil moisture (%) | 0 | 100 | 48.71 | 42.96 |

The 64 landslide locations in this study are distributed across one of the most varied regions of the Himalayas, where conditions change even over short distances. The locations were intentionally chosen to capture the region’s spatial and environmental diversity, so that the ML models are trained on a heterogeneous dataset. This approach allows the models to identify general patterns and improves their ability to generalize across diverse environmental conditions. Although topographic and geological variability makes the dataset more complex, this complexity pushes the models to learn robust, generally applicable slope movement patterns. In addition, including data on rainfall, humidity, temperature, barometric pressure, and sunlight intensity ensures that the same environmental parameters are analyzed consistently at all sites. This strategy combines site-specific characteristics with regional environmental characteristics, thereby improving the model’s ability to generalize.

Despite anomalies introduced by certain site-specific faults, this work incorporates both spatial and temporal dimensions into a single prediction framework. This methodology not only enhances model robustness but also lays the foundation for scalable landslide prediction systems that can be applied to other geographically diverse regions.

ML models

Transformer

The transformer is a state-of-the-art model introduced by Vaswani et al. for sequence-to-sequence prediction46. First, an input embedding layer projects the input to the model’s embedding size. Because the model does not otherwise account for the order of its inputs, a positional encoding layer adds information about each element’s position in the sequence. The core of the model is self-attention, which operates on query, key, and value vectors, each a linear projection of the input embedding. The dot product of the query and key vectors measures the similarity between the input embeddings at different positions. These similarity calculations let the model identify long-term dependencies, that is, how distant elements in the sequence are related. The output of this step is an attention score for every pair of input embeddings.
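The query/key/value computation described above can be illustrated with single-head scaled dot-product attention as defined by Vaswani et al. (the division by √d_k and the row-wise softmax come from that paper). This NumPy sketch uses random, untrained projection matrices and an illustrative model size of 16; it is not the study’s trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (Vaswani et al.).
    X: (seq_len, d_model) embeddings; Wq, Wk, Wv: (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # query, key, value projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every pair of positions
    weights = softmax(scores, axis=-1)         # attention scores; each row sums to 1
    return weights @ V, weights                # weighted sum of values, plus the scores

rng = np.random.default_rng(0)
X = rng.normal(size=(144, 16))                 # 144 time steps, illustrative d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
context, attn = self_attention(X, Wq, Wk, Wv)
```

`attn` is the 144 × 144 matrix of pairwise attention scores, and `context` is the value-weighted representation of each time step.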

These attention scores are then used to weight the value vectors, producing a contextual representation of each element that incorporates information from the other elements. The model also uses multi-head attention to learn several sets of contextual representations: the self-attention process is applied several times in parallel, each with its own query, key, and value projections, to capture different types of relationships and information. After multi-head attention, a position-wise feed-forward network (PWFFN) was applied independently to each element’s contextual representation to introduce non-linearity. The PWFFN consisted of two linear layers with an expansion factor of 2: the first layer doubled the hidden dimension, a ReLU activation followed, and the second layer projected back to the original dimension. Residual connections and layer normalization were used to stabilize training, and dropout was used for regularization.
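The PWFFN sub-block can be sketched as follows. This is a NumPy sketch, not the study’s code: it assumes post-norm placement (LayerNorm applied after the residual sum, as in Vaswani et al.), omits LayerNorm’s learnable gain and bias, and uses random weights. Multi-head attention is not shown.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each position's feature vector (learnable gain/bias omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pwffn_block(x, W1, b1, W2, b2, dropout=0.1, training=False, rng=None):
    """Position-wise FFN with expansion factor 2, applied to each time step
    independently: d_model -> 2*d_model (ReLU) -> d_model, then dropout,
    residual connection and layer normalization (post-norm, assumed)."""
    h = np.maximum(0.0, x @ W1 + b1)          # expand to 2*d_model, ReLU
    out = h @ W2 + b2                          # project back to d_model
    if training:                               # inverted dropout, training only
        rng = rng if rng is not None else np.random.default_rng()
        keep = rng.random(out.shape) >= dropout
        out = out * keep / (1.0 - dropout)
    return layer_norm(x + out)                 # residual + LayerNorm

d = 16
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(d, 2 * d)) * 0.1, np.zeros(2 * d)
W2, b2 = rng.normal(size=(2 * d, d)) * 0.1, np.zeros(d)
x = rng.normal(size=(144, d))
y = pwffn_block(x, W1, b1, W2, b2)
```

Because the same weights are applied to every row, running the block on a single time step gives the same result as that row of the full output, which is what “position-wise” means.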

A classifier was applied to the transformer output to perform classification. First, a pooling layer aggregated the information from all 144 time steps. Next, a dense layer mapped the pooled representation to a fixed-size output with one logit per class. Finally, a SoftMax activation converted these logits into a probability distribution over the target classes.
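A minimal NumPy sketch of this classification head follows. The text says only “pooling layer”, so mean pooling over time is an assumption here, and the weights are random and untrained.

```python
import numpy as np

def classifier_head(H, W, b):
    """H: transformer output (144, d_model). Mean-pool over time (pooling type
    assumed), apply a dense layer to get one logit per class, then SoftMax."""
    pooled = H.mean(axis=0)                   # (d_model,) summary of all 144 steps
    logits = pooled @ W + b                   # (n_classes,)
    e = np.exp(logits - logits.max())         # stable SoftMax
    return e / e.sum()

rng = np.random.default_rng(0)
probs = classifier_head(rng.normal(size=(144, 32)), rng.normal(size=(32, 3)), np.zeros(3))
```

The predicted movement class is `probs.argmax()` (0, 1, or 2 for no, small, or large movement).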

H-TPA

The H-TPA model uses an encoder-decoder architecture with a compressed latent code in between. The encoder compresses an input packet into an eight-dimensional latent code, and the decoder uses this latent code to construct the next packet in the sequence; the model therefore works as a predictor autoencoder. This bottleneck filters noise from the data and thereby reduces the chance of overfitting, while the latent code captures the critical features and patterns of the input that are relevant for predicting future values or sequences. As shown in Fig. 4, both the encoder and the decoder are organized hierarchically.

Fig. 4.

Architecture of the H-TPA, showing the encoder, decoder, and classifier components.

H-TPA encoder

The encoder input has dimensions 144 × 13, where 144 is the number of time steps and 13 is the number of features. An embedding layer transforms this input to 144 × 512, where 512 is the transformer’s head dimension, and a positional encoding layer then adds positional information. The encoder has a three-level hierarchy. The first level is the minute level, which captures relationships in the minute-level data. Its output passes through the PWFFN, with layer normalization, dropout, and residual connections.

The output of the minute level is then split into 24 segments of six time steps each, so that each segment holds one hour of information (6 × 10 min). The second level, the hour level, pools each segment to capture higher-level features; its output has dimensions 24 × 512, where 24 is the number of segments. At the third level, the day level, a transformer captures information across the 24 segments to find day-level change patterns in the data. The final layer of the encoder is a linear layer that compresses the data into an 8-dimensional latent code with a shape of 24 × 8.
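The tensor shapes through the three encoder levels can be traced with the shape-level sketch below. It is not a working H-TPA encoder: the transformer levels are shape-preserving placeholders, positional encoding is omitted, mean pooling for the hour level is an assumption, and the linear layers use fresh random weights only to trace shapes. All names are hypothetical.

```python
import numpy as np

SEQ, FEAT, D_MODEL, SEG, LATENT = 144, 13, 512, 6, 8
rng = np.random.default_rng(0)

def linear(x, d_out):
    """Stand-in for a trained linear layer: fresh random weights, shape tracing only."""
    W = rng.normal(scale=x.shape[-1] ** -0.5, size=(x.shape[-1], d_out))
    return x @ W

def transformer_level(x):
    """Placeholder for a shape-preserving transformer layer (attention + PWFFN)."""
    return x

def htpa_encoder(x):
    h = linear(x, D_MODEL)                       # embedding: (144, 13) -> (144, 512)
    h = transformer_level(h)                     # minute level over all 144 steps
    h = h.reshape(SEQ // SEG, SEG, D_MODEL)      # 24 one-hour segments of 6 steps
    h = h.mean(axis=1)                           # hour level: pool each segment -> (24, 512)
    h = transformer_level(h)                     # day level across the 24 segments
    return linear(h, LATENT)                     # latent code: (24, 8)

z = htpa_encoder(rng.normal(size=(SEQ, FEAT)))
```

The reshape to 24 × 6 × 512 is where the minute-to-hour grouping happens: 144 steps / 6 steps per hour = 24 hourly segments.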

H-TPA decoder

The decoder input is the 24 × 8 latent code. A linear layer projects it to 24 × 512, matching the transformer’s head dimension. The decoder also has a three-level hierarchy: day level, hour level, and minute level. First, the day-level transformer processes the 24 × 512 representation, which is split into 24 segments of shape 1 × 512. Next, a transposed CNN with six channels regenerates the information, expanding each segment from 1 × 512 to 6 × 512. The hour-level transformer then reintroduces the relationships within each 6 × 512 segment; its output, 24 × 6 × 512, is reshaped into 144 × 512. The minute-level transformer reintroduces the relationships within this 144 × 512 data. The last layer is a linear layer with a ReLU activation function that predicts the next input packet.
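A matching shape-level sketch of the decoder follows. The transposed CNN is modeled, as an assumption, as a kernel-size-1 transposed convolution with one input channel and six output channels, which turns each 1 × 512 segment into 6 × 512. Transformer levels are placeholders and weights are random, so this only traces shapes; the names are hypothetical.

```python
import numpy as np

N_SEG, SEG, D_MODEL, LATENT, FEAT = 24, 6, 512, 8, 13
rng = np.random.default_rng(0)

def linear(x, d_out):
    """Stand-in for a trained linear layer: random weights, shape tracing only."""
    W = rng.normal(scale=x.shape[-1] ** -0.5, size=(x.shape[-1], d_out))
    return x @ W

def transformer_level(x):
    """Placeholder for a shape-preserving transformer layer."""
    return x

def transposed_conv_1_to_6(seg, w, b):
    """Kernel-size-1 transposed convolution with 1 input and 6 output channels:
    (1, 512) -> (6, 512). Each output channel is a scaled, shifted copy."""
    return w[:, None] * seg + b[:, None]

def htpa_decoder(z):
    h = linear(z, D_MODEL)                               # (24, 8) -> (24, 512)
    h = transformer_level(h)                             # day level
    w, b = rng.normal(size=SEG), np.zeros(SEG)
    segs = [transposed_conv_1_to_6(h[i:i + 1], w, b) for i in range(N_SEG)]
    h = np.stack([transformer_level(s) for s in segs])   # hour level: (24, 6, 512)
    h = h.reshape(N_SEG * SEG, D_MODEL)                  # (144, 512)
    h = transformer_level(h)                             # minute level
    return np.maximum(0.0, linear(h, FEAT))              # next packet (144, 13), ReLU

x_next = htpa_decoder(rng.normal(size=(N_SEG, LATENT)))
```

During predictor-autoencoder training, the decoder output would be compared with the actual next packet; that loss is not shown here.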

H-TPA classifier

The classifier is also a transformer model, with a 128-dimensional head, and it receives the latent code from the encoder as input. A pooling layer after the transformer layer aggregates the information into a 1 × 128 representation. The last layer is a linear layer with a ReLU activation function, and a SoftMax activation is applied to predict the three slope movement classes.

Final H-TPA model

In the final model, the decoder was discarded and the encoder was connected directly to the classifier: an input is encoded into a latent code, from which the classifier predicts the movement class.
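The final inference path (encoder, then classifier, with no decoder) can be sketched as below. This is an illustrative composition only: `dense` is a hypothetical helper returning an untrained random linear map, the transformer layers are omitted, and the 8 → 128 projection into the classifier’s head dimension is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(d_in, d_out):
    """Hypothetical helper: an untrained random linear map, for illustration only."""
    W = rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))
    return lambda x: x @ W

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Encoder stand-in: (144, 13) -> latent code (24, 8); transformer levels omitted
embed, to_latent = dense(13, 512), dense(512, 8)
def encoder(x):
    h = embed(x).reshape(24, 6, 512).mean(axis=1)   # embed, then hour-level pooling
    return to_latent(h)

# Classifier stand-in: 128-dim head, pooled to 128, linear + ReLU, SoftMax over 3 classes
to_head, to_logits = dense(8, 128), dense(128, 3)
def classifier(z):
    pooled = to_head(z).mean(axis=0)                # transformer layer omitted; (128,)
    return softmax(np.maximum(0.0, to_logits(pooled)))

def htpa_predict(x):
    """Final H-TPA: the decoder is discarded; the encoder feeds the classifier."""
    return classifier(encoder(x))

probs = htpa_predict(rng.normal(size=(144, 13)))
```

Dropping the decoder at inference means the reconstruction objective only shapes the latent code during training; prediction itself needs just the encoder and classifier.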

Temporal CNN

A temporal CNN was adopted to process the sequential time-series data47. It has 13 input channels, one for each input feature. Kernels are convolved along the 144 time steps, a max-pooling layer follows, and the pooled output is flattened. The flattened output feeds a classifier consisting of two linear layers with a ReLU activation function between them. A SoftMax activation is applied to the final layer, which has three neurons, one for each movement class.
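A forward pass of such a temporal CNN can be sketched in NumPy as follows. The kernel width (5), filter count (16), hidden size (64), pooling size (2), and the ReLU after the convolution are illustrative assumptions not given in the text, and the weights are untrained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d(x, kernels, bias):
    """'Valid' 1-D convolution along time (cross-correlation, as in DL libraries).
    x: (T, C_in); kernels: (K, C_in, C_out); bias: (C_out,) -> (T-K+1, C_out)."""
    K = kernels.shape[0]
    windows = sliding_window_view(x, K, axis=0)          # (T-K+1, C_in, K)
    return np.einsum("tck,kco->to", windows, kernels) + bias

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along time; drops any incomplete last window."""
    T = (x.shape[0] // size) * size
    return x[:T].reshape(-1, size, x.shape[1]).max(axis=1)

def temporal_cnn(x, p):
    h = np.maximum(0.0, conv1d(x, p["k"], p["kb"]))     # conv over 144 steps (+ReLU, assumed)
    h = max_pool1d(h).reshape(-1)                        # max-pool, then flatten
    h = np.maximum(0.0, h @ p["W1"] + p["b1"])           # linear + ReLU
    logits = h @ p["W2"] + p["b2"]                       # final layer: 3 neurons
    e = np.exp(logits - logits.max())
    return e / e.sum()                                   # SoftMax

# Untrained example: 13 input channels, 16 filters of width 5
rng = np.random.default_rng(0)
K, C_OUT, HIDDEN = 5, 16, 64
flat = ((144 - K + 1) // 2) * C_OUT                      # 70 * 16 = 1120
p = {"k": rng.normal(size=(K, 13, C_OUT)) * 0.1, "kb": np.zeros(C_OUT),
     "W1": rng.normal(size=(flat, HIDDEN)) * 0.05, "b1": np.zeros(HIDDEN),
     "W2": rng.normal(size=(HIDDEN, 3)) * 0.1, "b2": np.zeros(3)}
probs = temporal_cnn(rng.normal(size=(144, 13)), p)
```

Treating each feature as a channel lets every kernel mix all 13 features within a short time window, which is how the temporal CNN picks up local temporal patterns.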

LSTM

LSTM is an RNN architecture designed for sequential data48. In this work, we developed an LSTM model with two stacked LSTM layers. The LSTM output was fed to a pooling layer that aggregates over the 144 time steps and then to a classifier consisting of two linear layers with a ReLU activation function between them. A SoftMax activation was applied to the final layer, which has three neurons for predicting the movement classes.
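The two-layer LSTM classifier can be sketched with a from-scratch NumPy LSTM using the standard input, forget, candidate, and output gates. The hidden sizes, mean pooling, weight initialization, and the gate ordering in the stacked weight matrices are assumptions for illustration; the weights are untrained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(x, W, U, b):
    """Run one LSTM layer over a sequence.
    x: (T, d_in); W: (d_in, 4H); U: (H, 4H); b: (4H,). Gate order: i, f, g, o."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])         # input and forget gates
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:]) # candidate cell, output gate
        c = f * c + i * g                                  # update the cell state
        h = o * np.tanh(c)                                 # hidden state for this step
        outputs.append(h)
    return np.stack(outputs)                               # (T, H)

def init_lstm(d_in, H, rng):
    s = 1.0 / np.sqrt(H)
    return (rng.uniform(-s, s, (d_in, 4 * H)), rng.uniform(-s, s, (H, 4 * H)), np.zeros(4 * H))

def lstm_classifier(x, layers, W1, b1, W2, b2):
    h = x
    for W, U, b in layers:                  # two stacked LSTM layers
        h = lstm_layer(h, W, U, b)
    pooled = h.mean(axis=0)                 # pool over the 144 time steps (mean assumed)
    z = np.maximum(0.0, pooled @ W1 + b1)   # linear + ReLU
    logits = z @ W2 + b2                    # final layer: 3 neurons
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # SoftMax

rng = np.random.default_rng(0)
H = 32
layers = [init_lstm(13, H, rng), init_lstm(H, H, rng)]
W1, b1 = rng.normal(size=(H, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)) * 0.1, np.zeros(3)
probs = lstm_classifier(rng.normal(size=(144, 13)), layers, W1, b1, W2, b2)
```

Stacking means the second LSTM layer reads the 144 hidden states of the first, so the pooled representation summarizes a deeper view of the sequence.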

VSA

In the VSA method, we applied a step-by-step process to find the thresholds of the weather parameters, as shown in Fig. 5. A trained ML model defines a fixed decision boundary, which we need to estimate. A decision boundary separates classes in the feature space49; along a single feature, it reduces to a point, and the feature value near this point is the threshold for that class. Varying a feature’s values can shift a class sample toward the decision boundary, and beyond the threshold the sample may be misclassified as another class. The decision boundary can therefore be estimated by systematically varying the feature values and recording the value at which samples of one class are classified as another. In this analysis, we varied feature values using a Gaussian distribution with mean μ and variance 1 to simulate natural variability in real-world scenarios50. The initial mean was the observed class mean; in the forward direction, the mean was increased in steps of 0.05, and in the reverse direction, it was decreased in steps of 0.05. We recorded the number of misclassifications in each iteration. Once 80% of the class samples were misclassified as another class, we calculated the feature threshold by averaging the feature values over the 144 time steps. The same analysis was performed in the reverse direction for each class, and the final decision boundary was approximated by averaging the thresholds obtained in the forward and reverse directions. For example, in the forward direction, no-movement samples may be misclassified as small or large movement, and in the reverse direction, large-movement samples may be misclassified as no or small movement.
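One VSA sweep can be sketched as below. This is a hedged reading of the procedure, not the authors’ code: `predict` is a hypothetical callable returning class labels for a batch of packets, features are assumed to be standardized (so a unit-variance Gaussian is meaningful), and the threshold is assumed to be the feature value averaged over the 144 time steps of the misclassified samples.

```python
import numpy as np

def vsa_threshold(predict, X_c, true_class, feature, direction, step=0.05,
                  frac=0.8, max_iter=2000, seed=0):
    """One VSA sweep for one class and one feature.
    predict:   hypothetical callable, (n, 144, 13) -> (n,) predicted class labels
    X_c:       (n, 144, 13) packets whose true label is `true_class`
    direction: +1 (forward, mean increases) or -1 (reverse, mean decreases)
    Returns the estimated threshold, or None if it is never reached."""
    rng = np.random.default_rng(seed)
    mu = X_c[..., feature].mean()                    # start at the observed class mean
    for _ in range(max_iter):
        mu += direction * step                       # shift the mean by 0.05
        X = X_c.copy()
        X[..., feature] = rng.normal(mu, 1.0, size=X[..., feature].shape)  # N(mu, 1)
        wrong = predict(X) != true_class
        if wrong.mean() >= frac:                     # at least 80% misclassified
            # Assumption: average over the 144 time steps of the misclassified samples
            return X[wrong][..., feature].mean()
    return None

def decision_boundary(t_forward, t_reverse):
    """Final estimate: mean of the forward and reverse thresholds."""
    return 0.5 * (t_forward + t_reverse)
```

With a toy model that predicts class 1 whenever the time-averaged feature exceeds 1.0, a forward sweep from class 0 and a reverse sweep from class 1 bracket that value, and their mean lands close to 1.0.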

Fig. 5.

Flowchart of the VSA process.
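The VSA loop can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `predict` stands for any trained classifier mapping an array of shape (samples, 144, features) to class labels; features are assumed to be standardized so that a variance of 1 is meaningful; one feature is perturbed at a time while the others keep their observed values; and the threshold is the perturbed feature averaged over samples and the 144 time steps. The functions `vsa_threshold` and `vsa_boundary` and the toy classifier are hypothetical.

```python
import numpy as np


def vsa_threshold(predict, X, true_class, feature, step=0.05,
                  misclass_frac=0.8, max_iter=2000, seed=0):
    """One VSA sweep for one feature of one class.

    X: samples of `true_class`, shape (n_samples, 144, n_features).
    step > 0 sweeps forward (mean increases); step < 0 sweeps in reverse.
    Returns the perturbed feature value averaged over samples and time steps
    at the first iteration where >= misclass_frac of the samples change
    class, or None if that never happens within max_iter iterations.
    """
    rng = np.random.default_rng(seed)
    mean = X[:, :, feature].mean()  # start at the observed class mean
    for _ in range(max_iter):
        mean += step
        X_var = X.copy()
        # Redraw the chosen feature from N(mean, variance 1) at every step.
        X_var[:, :, feature] = rng.normal(mean, 1.0, size=X.shape[:2])
        preds = predict(X_var)
        if np.mean(preds != true_class) >= misclass_frac:
            return float(X_var[:, :, feature].mean())
    return None


def vsa_boundary(predict, X_low, low_cls, X_high, high_cls, feature):
    """Average the forward (low class) and reverse (high class) thresholds."""
    fwd = vsa_threshold(predict, X_low, low_cls, feature, step=+0.05)
    rev = vsa_threshold(predict, X_high, high_cls, feature, step=-0.05)
    return fwd, rev, (fwd + rev) / 2


def toy_predict(X):
    # Hypothetical classifier: class 1 if feature 0 averages above 2.0.
    return (X[:, :, 0].mean(axis=1) > 2.0).astype(int)


rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(200, 144, 5))  # stand-in class-0 samples
X1 = rng.normal(4.0, 1.0, size=(200, 144, 5))  # stand-in class-1 samples
fwd, rev, boundary = vsa_boundary(toy_predict, X0, 0, X1, 1, feature=0)
```

With the toy classifier, the true boundary on feature 0 is 2.0; the forward sweep stops slightly above it and the reverse sweep slightly below it, so their average lands close to 2.0. With a real model, the sweep would be repeated for each weather feature and each class.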

Attention-based analysis of the influence of weather parameters on slope movement

In this study, the attention mechanism of the model was used to analyze the influence of weather parameters, namely temperature, humidity, pressure, rainfall, and sunlight, on slope movement prediction51. Because the attention mechanism distributes the model's emphasis among input features according to their relevance to the prediction task52, higher attention scores suggest a closer association between a feature and the model's prediction process53. Attention scores can thus be read as an indication of how significant the model considers each weather parameter, showing which variables are most strongly associated with its slope movement predictions. Features that consistently receive high attention are likely those the model finds most relevant for its predictions. The following steps were used to evaluate the importance of each parameter.

First, the attention score matrix produced by the model was used to evaluate its focus on the different input features across all time steps. The attention score matrix has the dimensions batch size × number of heads × sequence length (144) × number of features (13). Five of these features describe weather conditions, and the remaining eight describe movement parameters.

To measure the impact of each weather parameter, the attention scores for each feature were averaged across all attention heads and time steps54, yielding an overall relevance score for each parameter. A higher average attention score indicates that the model places greater emphasis on that feature and therefore relies more on it to predict slope movement53.
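The averaging step can be sketched in NumPy as below, assuming the attention scores are available as an array of shape (samples, heads, 144, 13) as described above. The positions of the five weather features among the 13 inputs (`WEATHER_IDX`) are hypothetical, since the paper does not list the feature order, and averaging over the sample axis as well as heads and time steps is an assumption. For a per-class ranking, only the samples of that class would be passed in.

```python
import numpy as np

# Hypothetical positions of the five weather features among the 13 inputs.
WEATHER_IDX = {"temperature": 0, "humidity": 1, "pressure": 2,
               "rainfall": 3, "sunlight": 4}


def weather_attention_ranking(attn):
    """Rank weather features by mean attention.

    attn: array (n_samples, heads, 144, 13) of attention scores.
    Averages over samples, heads, and time steps, then returns a list of
    (feature_name, mean_score) sorted from highest to lowest score.
    """
    per_feature = attn.mean(axis=(0, 1, 2))  # shape (13,)
    scores = {name: float(per_feature[i]) for name, i in WEATHER_IDX.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Rank 1 in Table 6 corresponds to the first entry of the returned list for that class.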

Next, we examined the temporal aspect of the attention process by identifying the time steps within the 144-step sequence at which the model's attention was strongest52. For each class, we identified the time step with the highest average attention score, which marks the point in the sequence with the greatest influence on the model's prediction53.
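A minimal sketch of the peak-time-step search, assuming per-sample attention scores of shape (samples, heads, 144, features) and integer class labels per sample. Averaging over heads and all features is an assumption (the paper does not say whether movement features were included), and the function name is hypothetical. Hours are counted as step × 10 min, matching the mapping used in the Discussion (e.g., step 15 corresponds to 2.5 h).

```python
import numpy as np

STEP_MINUTES = 10  # each of the 144 steps spans 10 min (24 h in total)


def peak_time_step(attn, labels, cls):
    """Return (step_index, hours_from_window_start) of peak mean attention.

    attn: (n_samples, heads, 144, n_features); labels: (n_samples,).
    Attention is averaged over the samples of class `cls`, over heads,
    and over features, giving one score per time step.
    """
    profile = attn[labels == cls].mean(axis=(0, 1, 3))  # shape (144,)
    step = int(np.argmax(profile))
    return step, step * STEP_MINUTES / 60.0
```

For example, a peak at step 90 would correspond to 15.0 h after the start of the 24-h input window.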

This attention-based method provided a structured way to evaluate both the importance of individual weather parameters and the critical moments in the time series that the model relied on most heavily across all three classes51. These analyses were performed to gain insights into the interaction between environmental factors and slope movement in each class and to identify key features and time steps relevant to the prediction process for different slope movement categories.

Hyperparameters of the ML models

Table 2 lists the hyperparameters explored in this study and the ranges over which they were varied. For the Transformer model, the number of transformer layers, the number of attention heads, the expansion factor, the dropout rate, and the embedding dimension (64 to 1024) were varied to provide architectural flexibility, and max and average pooling over the outputs were compared. For the H-TPA model, the autoencoder was trained with a mean squared error (MSE) loss, and the latent dimension was set to 5, 8, or 10. For the temporal CNN model, the number of convolution layers, the kernel size, the number of kernels, and the dropout rate were explored, with max pooling throughout. For the LSTM model, the number of LSTM layers, the hidden size, the pooling type (average or max), and the dropout rate were explored. Several settings were shared by all models: the sequence length was fixed at 144; three activation functions (tanh, ReLU, and GELU) were compared; the learning rate was set to 1e-5; Adam was used as the optimizer; batch sizes of 256, 512, and 1024 were explored; dropout varied between 10% and 20%; and cross-entropy was the loss function. Together, these ranges cover a broad set of model architectures and configurations for the objectives of this study.

Table 2.

Hyperparameters of the transformer, H-TPA, Temporal CNN, and LSTM models.

| Model | Hyperparameter | Range |
|---|---|---|
| Transformer | Embedding dimension | 64, 128, 256, 512, 1024 |
| | Transformer layers | 1, 2, 3 |
| | Number of heads | 1, 4, 8, 16, 32 |
| | Expansion factor | 1, 2, 3 |
| | Pooling | Average, max |
| H-TPA | Transformer embedding dimension | 64, 128, 256, 512, 1024 |
| | Classifier embedding dimension | 64, 128, 512 |
| | Transformer layers | 1, 2, 3 |
| | Number of heads | 1, 4, 8, 16, 32 |
| | Expansion factor | 1, 2, 3 |
| | Latent dimension | 5, 8, 10 |
| | Pooling | Average, max |
| | Autoencoder loss | MSE |
| Temporal CNN | Number of convolution layers | 1, 2, 3 |
| | Pooling | Max |
| | Kernel size | 3 × 3 |
| | Number of kernels | 32, 64, 128, 512 |
| LSTM | Number of LSTM layers | 1, 2, 3 |
| | Hidden size | 50, 100, 200, 500 |
| | Pooling | Average, max |
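The search space for one model can be enumerated as sketched below, using the Transformer rows of Table 2 plus the shared settings from the text. The paper does not say whether the space was searched exhaustively, randomly, or manually, so the exhaustive grid is an assumption, as is representing the 10% to 20% dropout range by its two endpoints. The dictionary keys and the `grid` helper are hypothetical names.

```python
from itertools import product

# Settings the text describes as common to all models.
FIXED = {"seq_len": 144, "lr": 1e-5, "optimizer": "adam",
         "loss": "cross_entropy"}

# Transformer rows of Table 2, plus activation, batch size, and dropout.
TRANSFORMER_SPACE = {
    "embed_dim": [64, 128, 256, 512, 1024],
    "num_layers": [1, 2, 3],
    "num_heads": [1, 4, 8, 16, 32],
    "expansion": [1, 2, 3],
    "pooling": ["avg", "max"],
    "activation": ["tanh", "relu", "gelu"],
    "batch_size": [256, 512, 1024],
    "dropout": [0.1, 0.2],  # assumption: only the 10% and 20% endpoints
}


def grid(space, fixed=FIXED):
    """Yield every combination in `space`, merged with the fixed settings."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        # Multi-head attention splits embed_dim evenly across the heads.
        if cfg["embed_dim"] % cfg["num_heads"] == 0:
            yield {**fixed, **cfg}


configs = list(grid(TRANSFORMER_SPACE))
```

Every embedding dimension in Table 2 is divisible by every head count, so all 8,100 combinations survive the check; this shows how quickly an exhaustive grid grows, and random sampling from the same dictionary is a common cheaper alternative.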

Performance measure

The Himalayan dataset has a substantial class imbalance, so relying solely on accuracy may be misleading; accuracy, precision, recall, and F1 score are therefore all reported. Precision measures the fraction of positive predictions that are correct, whereas recall measures the fraction of actual positive cases that are correctly detected. The F1 score is the harmonic mean of precision and recall.
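For the three movement classes, these metrics have to be averaged across classes. The sketch below assumes macro averaging (unweighted mean of per-class values), which the paper does not state. It is consistent with Table 3: recall equals accuracy on the balanced training set but not on the imbalanced validation and test sets, and the reported F1 differs from the harmonic mean of the averaged precision and recall (for H-TPA-10 in training, 2 × 0.915 × 0.890 / (0.915 + 0.890) ≈ 0.902 versus 0.889), as expected when per-class F1 scores are averaged. The function name is hypothetical.

```python
def macro_scores(y_true, y_pred, classes=(0, 1, 2)):
    """Macro-averaged precision, recall, and F1 (means of per-class values)."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Example: 0 = no movement, 1 = low movement, 2 = high movement.
precision, recall, f1 = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

In this example, the per-class F1 scores are 0.5, 0.8, and about 0.667, so the macro F1 is about 0.656, while the harmonic mean of the macro precision (about 0.722) and macro recall (about 0.667) would be about 0.693.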

Figure 6 shows the workflow for predicting slope movement with the H-TPA model. The workflow starts with data collection and pre-processing, followed by model selection (H-TPA). After model selection, the workflow splits into two branches: slope movement prediction and analysis. The analysis branch comprises two approaches: (1) attention-based analysis of weather parameters and (2) environmental attribute threshold analysis, which includes a variable sensitivity analysis (VSA) step. The outputs of this workflow reveal the relationship between weather parameters and model predictions and identify critical environmental attribute thresholds.

Fig. 6.

Workflow for slope movement prediction using the H-TPA model.

Results

Performance of ML models for slope movement predictions

Table 3 shows the performance of the H-TPA, Transformer, LSTM, and temporal CNN models on the training, validation, and test datasets. During training, Transformer-10 and Transformer-60 achieved the highest F1 score (0.920), followed by Transformer-30 (0.915) and LSTM-10 (0.908). On the validation dataset, H-TPA-10 was the best-performing model (F1 score = 0.760), followed by H-TPA-30 (0.624) and H-TPA-60 (0.622); the best non-H-TPA model was Transformer-10 (0.603). On the test dataset, H-TPA-10 was again the best model (F1 score = 0.746), and H-TPA-30 and H-TPA-60 also performed well, with F1 scores of 0.619 and 0.623, respectively.

Table 3.

Performance measures of the ML models on the training, validation, and test datasets.

Train

| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| H-TPA-10 | 0.890 | 0.915 | 0.890 | 0.889 |
| H-TPA-30 | 0.875 | 0.897 | 0.875 | 0.872 |
| H-TPA-60 | 0.858 | 0.888 | 0.858 | 0.852 |
| Transformer-10 | 0.920 | 0.921 | 0.920 | 0.920 |
| Transformer-30 | 0.915 | 0.915 | 0.915 | 0.915 |
| Transformer-60 | 0.921 | 0.920 | 0.921 | 0.920 |
| LSTM-10 | 0.908 | 0.909 | 0.908 | 0.908 |
| LSTM-30 | 0.836 | 0.844 | 0.836 | 0.835 |
| LSTM-60 | 0.834 | 0.839 | 0.834 | 0.833 |
| Temporal CNN-10 | 0.850 | 0.849 | 0.850 | 0.849 |
| Temporal CNN-30 | 0.869 | 0.868 | 0.869 | 0.868 |
| Temporal CNN-60 | 0.842 | 0.841 | 0.842 | 0.841 |

Validation

| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| H-TPA-10 | 0.966 | 0.686 | 0.877 | 0.760 |
| H-TPA-30 | 0.942 | 0.548 | 0.864 | 0.624 |
| H-TPA-60 | 0.943 | 0.548 | 0.848 | 0.622 |
| Transformer-10 | 0.881 | 0.529 | 0.906 | 0.603 |
| Transformer-30 | 0.860 | 0.473 | 0.895 | 0.536 |
| Transformer-60 | 0.864 | 0.470 | 0.905 | 0.533 |
| LSTM-10 | 0.862 | 0.490 | 0.899 | 0.557 |
| LSTM-30 | 0.853 | 0.444 | 0.829 | 0.494 |
| LSTM-60 | 0.831 | 0.439 | 0.826 | 0.483 |
| Temporal CNN-10 | 0.762 | 0.424 | 0.832 | 0.446 |
| Temporal CNN-30 | 0.782 | 0.429 | 0.850 | 0.457 |
| Temporal CNN-60 | 0.748 | 0.417 | 0.827 | 0.432 |

Test

| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| H-TPA-10 | 0.971 | 0.670 | 0.872 | 0.746 |
| H-TPA-30 | 0.953 | 0.542 | 0.862 | 0.619 |
| H-TPA-60 | 0.955 | 0.548 | 0.848 | 0.623 |
| Transformer-10 | 0.902 | 0.522 | 0.916 | 0.600 |
| Transformer-30 | 0.887 | 0.471 | 0.910 | 0.538 |
| Transformer-60 | 0.889 | 0.466 | 0.915 | 0.533 |
| LSTM-10 | 0.890 | 0.484 | 0.903 | 0.554 |
| LSTM-30 | 0.886 | 0.443 | 0.829 | 0.498 |
| LSTM-60 | 0.870 | 0.438 | 0.828 | 0.488 |
| Temporal CNN-10 | 0.807 | 0.423 | 0.850 | 0.453 |
| Temporal CNN-30 | 0.824 | 0.428 | 0.866 | 0.465 |
| Temporal CNN-60 | 0.794 | 0.415 | 0.844 | 0.439 |

Model hyperparameter optimization

Table 4 presents the optimized hyperparameters of the H-TPA, Transformer, LSTM, and temporal CNN models. The Transformer model used an embedding dimension of 512, 16 attention heads, two transformer layers, an expansion factor of two, max pooling, and the ReLU activation function. The H-TPA model was configured similarly but used a single transformer layer, a classifier embedding dimension of 128, and a latent dimension of 8. The temporal CNN model had two convolution layers with 3 × 3 kernels: 128 kernels in the first layer and 64 in the second. The LSTM model had two layers with 100 hidden units each, max pooling, and the tanh activation function.

Table 4.

Optimized hyperparameters of the transformer, H-TPA, temporal CNN, and LSTM models.

| Model | Hyperparameter | Optimized value |
|---|---|---|
| Transformer | Embedding dimension | 512 |
| | Transformer layers | 2 |
| | Number of heads | 16 |
| | Expansion factor | 2 |
| | Pooling | Max |
| | Activation function | ReLU |
| H-TPA | Transformer embedding dimension | 512 |
| | Classifier embedding dimension | 128 |
| | Transformer layers | 1 |
| | Number of heads | 16 |
| | Expansion factor | 2 |
| | Activation function | ReLU |
| | Latent dimension | 8 |
| | Pooling | Max |
| | Autoencoder loss | MSE |
| Temporal CNN | Number of convolution layers | 2 |
| | Pooling | Max |
| | Kernel size | 3 × 3 |
| | Number of kernels | 128 (first layer), 64 (second layer) |
| | Activation function | ReLU |
| LSTM | Number of LSTM layers | 2 |
| | Hidden size | 100 |
| | Pooling | Max |
| | Activation function | tanh |

t-SNE visualization of slope movement classes

Figure 7 shows t-distributed Stochastic Neighbor Embedding (t-SNE) visualizations of the slope movement classes ('no movement,' 'small movement,' and 'large movement') for predictions 10, 30, and 60 min ahead. Panels (a), (b), and (c) show the original data distributions; panels (d), (e), and (f) show the latent distributions learned by the H-TPA models on the training data; and panels (g), (h), and (i) show the latent distributions of the test data for all classes. The t-SNE plots were drawn with Python's Matplotlib 3.7.0 library55.

Fig. 7.

t-SNE visualization of data distributions across prediction intervals. (a) t-SNE plot showing the original data distribution for predictions 10 min ahead. (b) t-SNE plot showing the original data distribution for predictions 30 min ahead. (c) t-SNE plot showing the original data distribution for predictions 60 min ahead. (d) t-SNE plot of the latent distribution during training for the H-TPA model with a 10-minute prediction window. (e) t-SNE plot of the latent distribution during training for the H-TPA model with a 30-minute prediction window. (f) t-SNE plot of the latent distribution during training for the H-TPA model with a 60-minute prediction window. (g) t-SNE plot of the latent distribution during testing for the H-TPA model with a 10-minute prediction window. (h) t-SNE plot of the latent distribution during testing for the H-TPA model with a 30-minute prediction window. (i) t-SNE plot of the latent distribution during testing for the H-TPA model with a 60-minute prediction window. Figures generated using Matplotlib 3.7.0 (Hunter, J. D., https://matplotlib.org/)55.
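A minimal sketch of how such a projection could be computed before plotting. The section names Matplotlib only for drawing, so scikit-learn's `TSNE` with perplexity 30 and PCA initialization is an assumption, and the clustered random vectors of size 8 (the optimized latent dimension in Table 4) are stand-ins for real H-TPA latent codes. The helper `tsne_2d` is hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(latent, perplexity=30.0, seed=0):
    """Project (n_samples, latent_dim) vectors to 2-D with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(latent)


# Stand-in for H-TPA latent vectors: 3 classes, 100 samples each, dim 8.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)
latent = rng.normal(size=(300, 8)) + labels[:, None] * 3.0
coords = tsne_2d(latent)  # shape (300, 2)
```

The resulting coordinates could then be drawn with Matplotlib's `scatter`, coloring each point by its class label, to produce panels like those in Fig. 7. Perplexity must be smaller than the number of samples, and t-SNE distances between well-separated clusters should not be read quantitatively.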

Environmental attribute thresholds for slope movement: insights from VSA

Table 5 presents the VSA-derived threshold values of environmental attributes that separate small from large slope movements. These thresholds indicate the environmental conditions associated with slope movement.

Table 5.

Threshold values of environmental attributes for small and large slope movements, derived by VSA.

| Attribute | Small movement threshold | Large movement threshold |
|---|---|---|
| Temperature | 28 °C | 34 °C |
| Humidity | 71% | 13% |
| Barometric pressure | 950 hPa | 1057 hPa |
| Rainfall | 97 mm/hr | 212 mm/hr |
| Sunlight intensity | 634 lx | 3806 lx |

Attention-based analysis of weather parameters on slope movement

Table 6 presents the attention scores and rankings of the weather parameters (temperature, humidity, pressure, rainfall, and sunlight) for three slope movement classes: Class 0 (no movement), Class 1 (low movement), and Class 2 (high movement). Figs. 8, 9 and 10 present heatmaps of the attention scores across the 144 time steps for each movement class. Rainfall receives the highest average attention in both movement classes (Classes 1 and 2), and humidity ranks first in Class 0 and second in Class 2.

Table 6.

Attention scores and rankings of the weather parameters for the three slope movement classes, indicating the model's focus on each parameter when forecasting.

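| Parameter | Class 0 attention score | Class 0 rank | Class 1 attention score | Class 1 rank | Class 2 attention score | Class 2 rank |
|---|---|---|---|---|---|---|
| Temperature | 0.008 | 2 | 0.007 | 2 | 0.0018 | 3 |
| Humidity | 0.009 | 1 | 0.006 | 3 | 0.0021 | 2 |
| Barometric pressure | 0.007 | 3 | 0.005 | 4 | 0.0018 | 4 |
| Rainfall | 0.005 | 4 | 0.008 | 1 | 0.0022 | 1 |
| Sunlight | 0.004 | 5 | 0.005 | 5 | 0.0014 | 5 |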

Fig. 8.

Heatmap of attention scores for weather parameters (temperature, humidity, pressure, rainfall, and sunlight) over 144 time steps for Class 0 (no movement).

Fig. 9.

Heatmap of attention scores for weather parameters across 144 time steps for Class 1 (low movement).

Fig. 10.

Heatmap of attention scores for weather parameters across 144 time steps for Class 2 (high movement).

Figures 8, 9, and 10 show the attention scores of the weather parameters over 144 time steps for Class 0 (no movement), Class 1 (low movement), and Class 2 (high movement), respectively. Each time step corresponds to 10 min, so the 144 steps span 24 h. In each figure, the x-axis shows the time steps and the y-axis lists the weather parameters (temperature, humidity, pressure, rainfall, and sunlight). The color bar on the right maps the attention scores, with lighter (yellow) shades for higher scores and darker (purple) shades for lower scores. The color gradient reflects how the model's attention to each weather parameter varies over time and highlights periods of high significance.
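A heatmap of this kind can be sketched with Matplotlib as follows. This is an illustration under assumptions, not the authors' plotting code: per-class attention is averaged over samples and heads, the five weather columns are selected (their indices are hypothetical), and the result is transposed so that parameters run along the y-axis and time steps along the x-axis. The `viridis` colormap is assumed because it matches the described purple-to-yellow scale; the attention values are random stand-ins.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

PARAMS = ["Temperature", "Humidity", "Pressure", "Rainfall", "Sunlight"]


def class_heatmap_matrix(attn_cls, weather_idx):
    """attn_cls: (n_samples, heads, 144, n_features) for one class.

    Averages over samples and heads, keeps the weather columns, and returns
    a (len(weather_idx), 144) matrix: parameters x time steps.
    """
    return attn_cls.mean(axis=(0, 1))[:, weather_idx].T


def plot_attention_heatmap(scores, title):
    """Draw a parameters-by-time-steps heatmap with a color bar."""
    fig, ax = plt.subplots(figsize=(12, 3))
    # viridis maps low scores to dark purple and high scores to yellow.
    im = ax.imshow(scores, aspect="auto", cmap="viridis")
    ax.set_yticks(range(len(PARAMS)))
    ax.set_yticklabels(PARAMS)
    ax.set_xlabel("Time step (10 min each; 144 steps = 24 h)")
    ax.set_title(title)
    fig.colorbar(im, ax=ax, label="Attention score")
    return fig, im


attn_cls = np.random.default_rng(0).random((8, 4, 144, 13))  # stand-in data
scores = class_heatmap_matrix(attn_cls, [0, 1, 2, 3, 4])  # hypothetical indices
fig, im = plot_attention_heatmap(scores, "Class 2 (high movement)")
```

Calling `fig.savefig(...)` would write the figure to disk; one such plot per class reproduces the layout described for Figs. 8, 9, and 10.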

Discussion

This work systematically investigated several ML models, including H-TPA, Transformer, LSTM, and temporal CNN, and evaluated their slope movement predictions on the training, validation, and test datasets. The results showed clear differences in performance among the models, and the subsequent analyses identified environmental attribute thresholds and important weather parameters. This section interprets these findings and outlines the key insights on model performance and on the contributions of individual parameters to slope movement prediction.

Section “Performance of ML models for slope movement predictions” shows that the H-TPA-10 model achieved the best slope movement prediction performance on the test dataset. All models degraded when applied to unseen data, indicating intrinsic generalization challenges. Nevertheless, H-TPA-10 ranked first on both the validation and test datasets, demonstrating its ability to generalize and to predict slope movements accurately across different periods.

Although the Transformer models scored highest on the training data, H-TPA delivered the best performance on unseen data. This underlines the strong generalization ability of H-TPA and makes it a promising model for practical, real-world slope movement forecasting. These results show that slope movements can be estimated with reasonable accuracy 10–60 min in advance, with the best performance at the 10-min horizon. A key strength of the H-TPA model is its ability to capture multilevel temporal dependencies at the minute, hour, and day scales, which likely contributed to its handling of both short- and long-term fluctuations in environmental attributes. Learning both short-term fluctuations and long-term trends is crucial for identifying early warning signals and long-term risks of slope instability. The built-in CNN layers are effective at finding spatial patterns in the data, allowing the model to identify localized features and regional dependencies that are important for accurate slope movement prediction across diverse terrains.

In addition, the autoencoder component helps denoise the input data while preserving informative features in the latent representation. By suppressing noise and enhancing meaningful signals, the autoencoder can improve the model's ability to focus on the most critical features.

The combination of these temporal, spatial, and denoising capabilities likely enables H-TPA to generalize better to unseen data, making its slope movement predictions more robust and reliable under varied environmental conditions. The 64 landslide monitoring stations, spread across diverse regions, captured the spatial and environmental variability of the area, including differences in elevation, geological composition, and microclimatic conditions. The results show that the H-TPA model generalized well across these conditions and outperformed the other architectures. Figure 7 shows how H-TPA maps the input data into a latent distribution in which the classes are clearly separated, which may be another reason for its better classification. However, a few data points fall within another class's cluster, which is consistent with the remaining misclassifications.

Section “Environmental attribute thresholds for slope movement: insights from VSA” reports the environmental thresholds obtained from VSA, where temperature, humidity, barometric pressure, rainfall, and sunlight showed clear threshold values that differentiate small and large slope movements. The temperature thresholds for small and large movements were 28 °C and 34 °C, respectively, suggesting that temperature is an important factor in soil stability and that higher temperatures may contribute to increased slope movement. The humidity thresholds of 71% for small movements and 13% for large movements suggest that larger slope movements are associated with lower humidity. The barometric pressure thresholds of 950 hPa for small movements and 1057 hPa for large movements indicate that soil stability is sensitive to changes in atmospheric pressure. The rainfall thresholds of 97 mm/hr for small movements and 212 mm/hr for large movements underline the importance of precipitation intensity in landslide events. Finally, the sunlight intensity thresholds of 634 lx for small movements and 3806 lx for large movements suggest that changes in light conditions may also affect soil stability. Taken together, these results suggest that large movements are more likely when dry weather follows heavy rainfall, whereas rainfall above 97 mm/hr under otherwise normal weather indicates a high probability of small movements.

In the attention-based results of Section “Attention-based analysis of weather parameters on slope movement”, humidity is the dominant parameter in Class 0 (no movement), rainfall dominates Class 1 (low movement), and rainfall and humidity are both critical in Class 2 (high movement). In Class 0, humidity has the highest attention score (0.009), meaning that the model relies most on humidity to predict no-movement scenarios. Temperature follows closely and is also an important feature, whereas rainfall ranks only fourth, suggesting that it is not a critical factor for forecasting no movement. Sunlight ranks fifth in all classes and therefore has the least effect on the forecast. In Class 1 (low movement), rainfall has the highest attention score (0.008), suggesting that it is crucial for predicting low movement. Temperature and humidity remain relevant but rank second and third, while pressure and sunlight are of lower relevance in low-movement conditions. In Class 2 (high movement), rainfall again ranks first (0.0022), underscoring its importance for predicting high slope movement. Humidity ranks second, and temperature has medium importance, below rainfall and humidity. Pressure and sunlight again have the lowest ranks and therefore contribute least to the model's forecast of high movement.

Several observations follow from these results. Rainfall contributes little to Class 0 (no movement) forecasts, ranking fourth in the model's attention, whereas humidity and temperature are the key inputs for the no-movement forecast. As movement intensity increases, rainfall becomes dominant, ranking first in Classes 1 (low movement) and 2 (high movement), which reflects the model's strong reliance on rainfall for forecasting movement. Humidity remains important across all classes (ranked first, third, and second in Classes 0, 1, and 2, respectively), suggesting that it is relevant to both stable and unstable slope conditions. Temperature is important but generally plays a secondary role to rainfall and humidity, especially in the more intense movement classes.

Sunlight is consistently the least important feature, and pressure ranks only third or fourth, so the model relies relatively little on these two features for slope movement forecasts, regardless of movement intensity. Overall, the attention-based analysis identified rainfall and humidity as the most influential predictors, suggesting their broad importance across the monitored regions. However, localized anomalies suggest that elevation and geological composition still affect the predictions. These results support the rationale of the study: the model captures broad patterns but needs further fine-tuning to address site-specific variations. The heatmaps in Figs. 8, 9, and 10 (Section “Attention-based analysis of weather parameters on slope movement”) provide further detail, showing distinct time windows in which certain weather parameters receive higher attention within each movement class. For Class 0 (no movement), Fig. 8 shows that the model pays more attention to temperature and rainfall in certain intervals, especially time steps 10–15 (about 1.7 to 2.5 h) and 65–75 (about 10.8 to 12.5 h). The model thus treats variations in temperature and rainfall during these windows as more important for forecasting no slope movement. Sunlight shows similar attention peaks but is less important overall than temperature and rainfall. In contrast, humidity and pressure receive relatively low attention in this heatmap, which may indicate lower importance for the no-movement forecast.

In Class 1 (low movement), shown in Fig. 9, the model gives similar attention to temperature, humidity, and rainfall, with particular emphasis on time steps 60–70 (about 10 to 11.7 h) and 80–90 (about 13.3 to 15 h). The model's attention peaks in these intervals, suggesting that changes in these weather parameters during these periods are key to forecasting low movement. Temperature and rainfall are the most influential variables, and attention to humidity also rises markedly in this class. Pressure shows similar peaks, whereas sunlight remains insignificant, with attention rising only sporadically around 1.7–2.5 h and 10–11.7 h.

In Class 2 (high movement), shown in Fig. 10, the model places stronger attention on rainfall, humidity, and pressure, mainly at time steps 10–15 (about 1.7 to 2.5 h) and 80–100 (about 13.3 to 16.7 h). These ranges indicate that sudden changes in weather conditions during these periods, especially rainfall, are very important for accurately forecasting high slope movement. Rainfall is the dominant feature and receives most of the model's attention, while humidity and pressure also receive more attention during the same periods. Compared with Class 1, temperature receives less attention, suggesting a minor contribution to the prediction of high movement. Sunlight remains the least significant parameter, receiving little attention at most time steps.

Conclusions

This study addresses a critical gap in the literature on forecasting landslides in the highly landslide-prone Himalayan region. By integrating hierarchical ML techniques, it proposes a robust framework for understanding slope movements and forecasting them under complex environmental conditions. It emphasizes a fine-grained temporal analysis of five main weather parameters (temperature, humidity, rainfall, barometric pressure, and sunlight intensity) and examines how they interact in triggering landslides.

The novel H-TPA architecture outperformed the other models on unseen data by capturing multilevel temporal dependencies and offering higher sensitivity to time-varying data patterns. Unlike conventional approaches, H-TPA segments the data across multiple temporal scales (minute, hour, and day), which enabled it to capture nuances in the patterns and predict slope movement dynamics with high accuracy. By combining autoencoders, transformers, and pooling mechanisms, the model addresses long-term dependencies and enables the multi-step predictions that are critical for proactive disaster management.

Findings such as the identification of rainfall and humidity as key early warning signals point to the practical impact of this research. Environmental thresholds, such as rainfall intensities of 212 mm/hr and 97 mm/hr for large and small slope movements, respectively, could be used as triggers for issuing early warnings. The study also identifies the time windows within the 24-h input sequence that the model relies on most, such as time steps 10–15 (about 1.7–2.5 h) and 80–100 (about 13.3–16.7 h) for high movement, which could improve forecast accuracy and support prompt action.
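As a sketch of how the rainfall thresholds might be turned into a simple alert rule, the hypothetical function below maps a rainfall intensity to one of three levels using the VSA values from Table 5. It is a single-attribute illustration, not an operational warning criterion: the thresholds come from the model's sensitivity analysis rather than field validation, and whether a boundary value itself should trigger an alert (`>=` versus `>`) is an assumption.

```python
SMALL_MM_HR = 97.0   # VSA rainfall threshold for small movement (Table 5)
LARGE_MM_HR = 212.0  # VSA rainfall threshold for large movement (Table 5)


def rainfall_alert_level(rain_mm_hr):
    """Map a rainfall intensity in mm/hr to a hypothetical 3-level alert."""
    if rain_mm_hr >= LARGE_MM_HR:
        return "large-movement alert"
    if rain_mm_hr >= SMALL_MM_HR:
        return "small-movement alert"
    return "no alert"


levels = [rainfall_alert_level(r) for r in (50.0, 97.0, 150.0, 212.0, 300.0)]
```

A practical system would combine several attributes (for example, low humidity following heavy rainfall, as the Discussion suggests) with the model's own class predictions rather than rely on rainfall alone.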

The attention-based analysis identified the parameters the model relies on most for each movement class and clarified their relative importance under different conditions. Humidity and temperature dominate the Class 0 (no movement) forecasts, whereas rainfall becomes the most critical parameter in the higher-displacement classes, followed by temperature in Class 1 and humidity in Class 2. These insights add to scientific knowledge of landslide triggers and provide a basis for building efficient early warning systems.

This study also has limitations, including geographical bias in the datasets used and the lack of real-time sensor integration with the model, which may restrict the generalizability of the results. Future work will focus on expanding the datasets to diverse terrains, incorporating real-time data streams, and developing scalable hierarchical models for broad applicability. This will help ensure that ML models remain adaptable and effective in disaster-prone places around the world.

Overall, this study is a significant step forward in landslide prediction and management, combining state-of-the-art methods with actionable insights that advance both the theory and the practice of disaster preparedness. By facilitating the development of more sophisticated forecasting systems, it can help protect infrastructure, economies, and human lives in some of the world's most exposed regions. Developing real-time systems, diversifying data, and further improving the model remain pivotal to ensuring a better and safer future.

Acknowledgements

The authors would like to thank the DDMA of Mandi, Kangra, Kinnaur of Himachal Pradesh, and the National Mission on Himalayan Studies for providing funds for this research (IITM/NMHS-MoEF/VD/499). We also thank IIT Mandi for providing facilities for this research.

Author contributions

Praveen Kumar contributed to the development of the methodology, analysis of the results, and preparation of the manuscript. Priyanka Priyanka assisted in manuscript preparation and contributed to the method development. K.V. Uday supported the validation of the results. Varun Dutt provided guidance and supervised the study. All authors reviewed and approved the final manuscript.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Ram, P. & Gupta, V. Landslide hazard, vulnerability, and risk assessment (hvra), mussoorie Township, lesser Himalaya, India. Environ. Dev. Sustain.24 (1), 473–450. 10.1007/s10668-021-01449-2 (2022). [Google Scholar]
  • 2.Pathania, A. et al. A low-cost, sub-surface IoT framework for landslide monitoring, warning, and prediction. In Proceedings of 2020 International Conference on Advances in Computing, Communication, Embedded and Secure Systems (2020).
  • 3.Kumar, P., Sihag, P., Chaturvedi, P., Uday, K. & Dutt, V. BS-LSTM: an ensemble recurrent approach to forecasting soil movements in the real world. Front. Earth Sci.9, 696792 (2021). [Google Scholar]
  • 4.Kumar, P. et al. Prediction of real-world slope movements via recurrent and non-recurrent neural network algorithms: a case study of the Tangni landslide. Indian Geotech. J.51 (4), 788–810 (2021). [Google Scholar]
  • 5.Yang, B., Yin, K., Lacasse, S. & Liu, Z. Time series analysis and long short-term memory neural network to predict landslide displacement. Landslides16, 677–694 (2019). [Google Scholar]
  • 6.Maxwell, A. E. et al. Assessing the generalization of machine learning-based slope failure prediction to new geographic extents. ISPRS Int. J. Geo-Information. 10 (5), 293 (2021). [Google Scholar]
  • 7.Sun, S., Wang, X., Li, J. & Lian, C. Landslide evolution state prediction and down-level control based on multi-task learning. Knowl. Based Syst.238, 107884 (2022). [Google Scholar]
  • 8.Zhang, Y. et al. Research on displacement prediction of step-type landslide under the influence of various environmental factors based on intelligent wca-elm in the three Gorges reservoir area. Nat. Hazards. 107, 1709–1729 (2021). [Google Scholar]
  • 9. Molnar, C. Interpretable machine learning. Lulu.com. ISBN: 978-0-244-76852-2 (2020).
  • 10. Pei, H., Meng, F. & Zhu, H. Landslide displacement prediction based on a novel hybrid model and convolutional neural network considering time-varying factors. Bull. Eng. Geol. Environ. 80 (10), 7403–7422 (2021).
  • 11. Xi, N., Yang, Q., Sun, Y. & Mei, G. Machine learning approaches for slope deformation prediction based on monitored time-series displacement data: a comparative investigation. Appl. Sci. 13 (8), 4677 (2023).
  • 12. Liu, Z. et al. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012–10022 (2021).
  • 13. Li, B. et al. DifFormer: Multi-resolutional differencing transformer with dynamic ranging for time series analysis. IEEE Trans. Pattern Anal. Mach. Intell. 45 (11), 13586–13598 (2023).
  • 14. Zhang, Y. & Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In The Eleventh International Conference on Learning Representations (2022).
  • 15. Challu, C. et al. NHITS: Neural hierarchical interpolation for time series forecasting. Proc. AAAI Conf. Artif. Intell. 37 (6), 6989–6997 (2023).
  • 16. Sharma, P. & Diwakar, S. Ascending artificial intelligence: Advanced neural networks for precise prediction of mountain perils. In Proceedings of the International Conference on Climate Change, Disaster Management, and Environmental Sustainability (2024).
  • 17. Saha, S., Majumdar, P. & Bera, B. Deep learning and benchmark machine learning based landslide susceptibility investigation, Garhwal Himalaya (India). Quaternary Sci. Adv. 10, 100075 (2023).
  • 18. Firoozi, A. A., Firoozi, A. A., Aati, K. & Rashid, M. S. Integrated geotechnical modelling and real-time analysis for predicting earthquake-induced landslides and rockfalls in the East African fracture zone. Trends Ecol. Indoor Environ. Eng. 2 (3), 1–19 (2024).
  • 19. Lu, Z. et al. Advancements in technologies and methodologies of machine learning in landslide susceptibility research: current trends and future directions. Appl. Sci. 14 (21), 9639 (2024).
  • 20. Nagarani, N., Ramji, T. B. & Kishorelal, A. Predictive analysis of machine learning algorithms applicable for natural disaster management. In Utilizing AI and Machine Learning for Natural Disaster Management, 65–79 (2024).
  • 21. Li, W., Fang, Z. & Wang, Y. Stacking ensemble of deep learning methods for landslide susceptibility mapping in the Three Gorges Reservoir area, China. Stoch. Env. Res. Risk Assess. 36, 2207–2228 (2022).
  • 22. Nava, L. et al. Landslide displacement forecasting using deep learning and monitoring data across selected sites. Landslides 20 (10), 2111–2129 (2023).
  • 23. Yuan, R. & Chen, J. A novel method based on deep learning model for national-scale landslide hazard assessment. Landslides 20 (11), 2379–2403 (2023).
  • 24. Wang, W. et al. A framework for automated landslide dating utilizing SAR-derived parameters time-series, an enhanced transformer model, and dynamic thresholding. Int. J. Appl. Earth Obs. Geoinf. 129, 103795 (2024).
  • 25. Khalili, M. A. et al. Enhancing landslide prediction through advanced transformer-based models: integrating SAR imagery and environmental data. EJ Nondestruct. Test. 29 (07) (2024).
  • 26. Zhang, Q. & Wang, T. Deep learning for exploring landslides with remote sensing and geo-environmental data: frameworks, progress, challenges, and opportunities. Remote Sens. 16 (8), 1344 (2024).
  • 27. Xu, M., Zhang, D., Li, J. & Wu, Y. An adaptive spatial–temporal prediction model for landslide displacement based on decomposition architecture. Eng. Appl. Artif. Intell. 137, 109215 (2024).
  • 28. Kuang, P. et al. Landslide displacement prediction via attentive graph neural network. Remote Sens. 14 (8), 1919 (2022).
  • 29. Li, Y., Xin, Z., Liao, G., Huang, P. & Yuan, M. Landslide detection for remote sensing images using a multi-label classification network based on Bijie landslide dataset. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 17, 9194–9213 (2024).
  • 30. Verma, S. & Khanduri, V. S. Landslide killing Himalayas: collective study on causal factors and possible remedies. J. Adv. Res. Appl. Sci. Eng. Technol. 19 (1), 28–35 (2020).
  • 31. Srivastava, P., Thakur, M., Singh, S. & Gupta, T. Landslide analysis: a case study on recent landslides in Solan, Shamti (2024).
  • 32. Rudra Paul, S. & Sarkar, R. A critical analysis of landslide susceptibility studies in Himachal Himalaya. J. Geol. Soc. India 100 (11), 1545–1556 (2024).
  • 33. Kumar, P., Sharma, P. K., Kumar, P., Sharma, M. & Butail, N. P. Agricultural sustainability in Indian Himalayan region: constraints and potentials. Indian J. Ecol. 48 (3), 649–661 (2021).
  • 34. Borthakur, A. & Singh, P. Addressing the Climate Crisis in the Indian Himalayas: Can Traditional Ecological Knowledge Help? (eds Borthakur, A. & Singh, P.) 1–293 (Springer, 2024).
  • 35. Esri ArcGIS Pro 3.1.0. Environmental Systems Research Institute, Inc. (2023). https://www.esri.com
  • 36. Dikshit, A., Sarkar, R., Pradhan, B., Segoni, S. & Alamri, A. M. Rainfall induced landslide studies in Indian Himalayan region: a critical review. Appl. Sci. 10 (7), 2466 (2020).
  • 37. Yavari, N., Tang, A. M., Pereira, J. M. & Hassen, G. Effect of temperature on the shear strength of soils and the soil–structure interface. Can. Geotech. J. 53 (7), 1186–1194 (2016).
  • 38. Rahardjo, H., Li, X. W., Toll, D. G. & Leong, E. C. The effect of antecedent rainfall on slope stability. In Unsaturated Soil Concepts and Their Application in Geotechnical Practice, 371–399 (2001).
  • 39. Köhler, H. J. & Schulze, R. Landslide triggering induced by barometric pressure changes. In Proceedings of the ISRM International Symposium (2000).
  • 40. Wang, J. D., Gu, T. F. & Xu, Y. J. Field tests of expansive soil embankment slope deformation under the effect of the rainfall evaporation cycle. Appl. Ecol. Environ. Res. 15 (3), 343–357 (2017).
  • 41. Kumar, P. S. & Vignesh, U. Analysis of the Kalman filter with the MPU6050 accelerometer and gyroscope. In Proceedings of the 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1–6 (2024).
  • 42. Chauhan, T. et al. Non-structural landslide risk mitigation: schemes, application and case studies. Indian Geotech. J. 54, 1960–1972 (2024).
  • 43. Alzahrani, O. et al. A cross-sectional study on the quality of life of adults with sickle cell disease followed-up in outpatient clinics: a single-center experience. Cureus 16 (11), 73970 (2024).
  • 44. Jain, A. & Arolkar, H. Gaussian and impulse noise identification from image using frequency domain analysis. In Proceedings of the International Conference on Smart Computing and Communication, 181–188 (2024).
  • 45. Mistri, D. et al. Cognitive phenotypes in multiple sclerosis: mapping the spectrum of impairment. J. Neurol. 271 (4), 1571–1583 (2024).
  • 46. Vaswani, A. et al. Attention is all you need. Proc. Adv. Neural Inf. Process. Syst. 30, 6000–6010 (2017).
  • 47. Lea, C. et al. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 156–165 (2017).
  • 48. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 (8), 1735–1780 (1997).
  • 49. Lei, S., He, F., Yuan, Y. & Tao, D. Understanding deep learning via decision boundary. IEEE Trans. Neural Networks Learn. Syst. 36, 1533–1544 (2023).
  • 50. DeGroot, M. H. & Schervish, M. J. Probability and Statistics 4th edn (Pearson Education, 2012).
  • 51. Varshney, Y. & Chauhan, N. Construction of attention-based GRU model with effective feature selector for weather forecasting. J. Jilin Univ. 43 (08) (2024).
  • 52. Wei, K., Li, Q., Yao, Y. & Sun, Y. Use of temporal convolutional network with an attention mechanism and a bidirectional gated recurrent unit to capture and predict slope debris flow risk. In Proceedings of the International Conference on Civil Engineering, 55–67 (2023). 10.1007/978-981-97-4355-1_6
  • 53. Simeunović, J., Schubnel, B., Alet, P. J., Carrillo, R. E. & Frossard, P. Interpretable temporal-spatial graph attention network for multi-site PV power forecasting. Appl. Energy 327, 120127. 10.1016/j.apenergy.2022.120127 (2022).
  • 54. Gururani, C. Developing a spatio-temporal model to predict InSAR-derived hillslope deformation. Master’s thesis, University of Twente (2024). http://essay.utwente.nl/100495/
  • 55. Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95. 10.1109/MCSE.2007.55 (2007).

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on request.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group