Scientific Reports
2025 Jul 1;15:21509. doi: 10.1038/s41598-025-07494-5

Developing data driven framework to model earthquake induced liquefaction potential of granular terrain by machine learning classification models

Kennedy C Onyelowe 1,2,, Viroon Kamchoom 3,, Tammineni Gnananandarao 4, Krishna P Arunachalam 5
PMCID: PMC12216905  PMID: 40596560

Abstract

Earthquake-induced liquefaction of soils poses a serious georisk in geotechnical design, construction, and the operation of geotechnical structures around the world. In this study, the applicability of three soft computing models for liquefaction classification, a topic of significant importance within the fields of geotechnical and earthquake engineering, has been evaluated. Twelve input parameters are used to classify the liquefaction potential for 234 data sets collected from an earthquake-induced liquefaction-prone granular material environment. For developing the SVM_Poly and SVM_RBK models, an extensive number of trials were conducted using various combinations of C and d for the polynomial kernel, and C and σ for the radial basis function kernel, of the support vector machines (SVMs), utilizing user-defined parameters. In the same way, several experiments were conducted with fixed values of the kernel-specific parameters C and σ in order to determine an appropriate value of the error-insensitive zone (ε). Similarly, for the random forest classifier (RFC) model, the number of variables used at each node (m) and the number of trees to be grown (k) are the two user-defined parameters; their optimum values were fixed by a trial-and-error process. The best model was identified using the confusion matrices and statistical indicators. The calculated values of the confusion matrices and statistical indicators for training and testing show that an accuracy of 0.89 indicates the model is correct in its predictions 89% of the time. A sensitivity of 0.85 signifies the model correctly identifies 85% of actual positive instances, while a specificity of 0.94 implies correct identification of 94% of actual negative instances. A precision of 0.94 suggests that when the model predicts a positive instance, it is correct 94% of the time.
The Phi Correlation Coefficient, with a value of 0.82, indicates a strong positive correlation between predicted and actual values. Furthermore, the model exhibits a Mean Absolute Error (MAE) of 0.2351, reflecting a relatively low average error in predictions. The Root Mean Squared Error (RMSE) value of 0.3115 likewise indicates good accuracy in predicting the target variable. Finally, all the developed models exhibit promising performance across the evaluation metrics, with low error measures (MAE and RMSE), high accuracy, and strong performance in correctly identifying both positive and negative instances, as evidenced by sensitivity and specificity. The high precision and Phi Correlation Coefficient further affirm the reliability and accuracy of the models' predictions. However, among the three models, the RFC model is the best for classifying liquefaction. The novelty of this research lies in its comparative evaluation and optimization of SVM_Poly, SVM_RBK, and RFC models using a comprehensive set of seismic and soil parameters to accurately classify earthquake-induced liquefaction potential, with the RFC model demonstrating superior predictive performance.
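All of the indicators reported above can be derived directly from the entries of a binary confusion matrix. The sketch below computes them for a hypothetical set of counts (the TP/TN/FP/FN values are illustrative placeholders, not the paper's actual tallies):

```python
import math

# Hypothetical confusion-matrix counts for a liquefaction classifier:
# TP = correctly predicted liquefied, TN = correctly predicted non-liquefied,
# FP = false alarms, FN = missed liquefaction cases.
TP, TN, FP, FN = 40, 30, 2, 7

accuracy    = (TP + TN) / (TP + TN + FP + FN)   # fraction of all correct predictions
sensitivity = TP / (TP + FN)                    # recall on actual positive instances
specificity = TN / (TN + FP)                    # recall on actual negative instances
precision   = TP / (TP + FP)                    # reliability of positive predictions

# Phi (Matthews) correlation coefficient between predicted and actual labels
phi = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

print(round(accuracy, 3), round(sensitivity, 3),
      round(specificity, 3), round(precision, 3), round(phi, 3))
```
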

Keywords: Earthquake, Liquefaction, Granular soil, Machine learning, Geohazard models

Subject terms: Environmental sciences, Natural hazards, Engineering

Introduction

Liquefaction of granular materials refers to a phenomenon where a saturated or partially saturated granular material, such as sand or silt, temporarily loses its strength and behaves like a liquid under stress, often as a result of seismic activity, such as an earthquake1. Liquefaction occurs when the pore water pressure within the soil increases, reducing the effective stress that holds the soil particles together2. Sands and silts with low cohesion are most susceptible because the voids between soil particles are filled with water, which can increase pressure during shaking3. Vibrations from an earthquake cause the soil to compact, which pushes water into the voids, leading to increased pore water pressure4. As pore water pressure rises, the ability of the particles to support weight (effective stress) diminishes5. Structures can settle, tilt, or collapse due to the loss of support from the liquefied soil6. Soil may behave like a fluid, causing landslides or flow of material7. Ground displacement can cause large, horizontal movements, and excess water and sand may be ejected to the surface, forming mounds8. Techniques like vibro-compaction or dynamic compaction increase the density of the soil, reducing the likelihood of liquefaction5. Installing drainage systems helps reduce pore water pressure by allowing water to escape more easily3. Using cement or other binding materials can increase cohesion and improve soil strength9. For structures, using deep foundations like piles can help transfer loads to more stable layers beneath the liquefiable zone1. Liquefaction is a significant concern in geotechnical engineering, particularly in areas prone to earthquakes10. Understanding the mechanics and conditions of liquefaction helps in designing structures and taking preventive measures to mitigate its effects.

Liquefaction of granular materials can have significant environmental impacts, particularly in regions prone to seismic activity7. When the soil loses its strength and behaves like a liquid, the effects extend beyond structural damage, influencing ecosystems, water quality, and the overall landscape8. Liquefaction can trigger landslides or slope failures in hilly or sloped terrain. These events can lead to significant alterations in the landscape, destroying habitats and altering land use9. The sudden settlement of the ground can damage vegetation, infrastructure, and ecosystems, altering drainage patterns and reducing the suitability of land for agriculture or other uses3. In urban or industrial areas, liquefaction can cause buried pipelines or storage tanks to rupture, releasing hazardous materials into groundwater or nearby water bodies4. This can contaminate drinking water sources and harm aquatic ecosystems. During liquefaction, sediment and other materials can be carried into rivers, lakes, or coastal areas, increasing turbidity and disrupting aquatic ecosystems. Excess sediment can smother fish spawning grounds and affect the photosynthesis of aquatic plants. Liquefaction can lead to lateral spreading, where large sections of land slide toward rivers, lakes, or the ocean5. This can alter the course of rivers, change coastal morphology, and destroy sensitive habitats like wetlands, mangroves, or marshes2. Coastal liquefaction can disrupt marine ecosystems, displace organisms, and lead to the destruction of coral reefs, seagrass beds, or tidal flats. Liquefaction may lead to the death of plants and the destruction of habitats due to ground displacement, subsidence, and the smothering of vegetation by sediment9.
Animals reliant on specific habitats may be displaced or suffer population declines. Agricultural lands affected by liquefaction may lose soil fertility due to changes in soil structure, compaction, or the introduction of salty or silty materials, reducing productivity10. Liquefaction can alter ground levels and drainage patterns, making areas more prone to flooding. Floodplains may expand, affecting natural habitats and human settlements. Changes in soil structure and compaction may impact the flow and storage of groundwater. In some cases, liquefaction can lower the water table, reducing the availability of groundwater for plants and animals8. As soil and sediment are displaced or compacted, there may be changes in nutrient availability and soil chemistry, which can influence plant growth and microbial activity. This can lead to long-term alterations in ecosystem productivity. In coastal areas, liquefaction can bring saline water into freshwater environments, leading to the salinization of soils, which is harmful to both agriculture and native plant species. Liquefaction can permanently alter the shape of landscapes. River channels may shift, new wetlands might form, or low-lying areas might sink, creating new ecosystems while destroying existing ones. Areas affected by liquefaction may experience significant sediment buildup, burying ecosystems under layers of silt or sand, disrupting natural ecological processes. After liquefaction, loose soil materials are more vulnerable to erosion from wind and water5. This can lead to accelerated land degradation, contributing to habitat loss and further environmental degradation. The environmental impacts of liquefaction of granular materials are far-reaching, potentially affecting land, water, and ecosystems. The destabilization of ground conditions and the subsequent changes to natural habitats can lead to a range of ecological consequences, many of which may take years to fully manifest.
Proper planning, land-use strategies, and engineering solutions can help mitigate some of these environmental risks, particularly in earthquake-prone regions. Extensive literature investigation has been conducted on the subject. Liquefaction occurs when a granular substance becomes liquefied due to increased pore water pressure. Reduced effective stress in soil leads to loss of bearing capacity. Liquefaction can cause three different sorts of damage. According to Keefer1, liquefaction-related damage includes landslides caused by ground lateral spreading or dam embankment failure. Sand blows and ground fractures are visible signs of soil liquefaction on the surface. Liquefaction can cause building settlement and tilting, which can be hazardous. According to Seed and Idriss2, earthquake-induced liquefaction has caused damage worth hundreds of millions of dollars. Thus, in earthquake geotechnical engineering, determining the liquefaction potential of a location as a result of an earthquake is a crucial task. Soil liquefaction susceptibility depends on both soil parameters and earthquake parameters.

Samui & Sitharam3 described two machine learning algorithms for forecasting soil liquefaction susceptibility based on standard penetration test data from the 1999 Chi-Chi earthquake. The first employs an Artificial Neural Network (ANN), while the second makes use of a Support Vector Machine. Both models are applied to a variety of global case histories, demonstrating SVMs' superiority over ANN models. Ghani et al.4 employed ensemble-based soft computing approaches to forecast liquefaction potential in fine-grained soils. Five paradigms are utilized: decision tree, random forest, gradient boosting, AdaBoost, and XGBoost regressor. Stress factors, soil stratum depth, and earthquake magnitude are all taken into consideration. The XGBoost model reached 99% accuracy, making it a viable alternative to empirical methods for assessing early liquefaction susceptibility in engineering projects. Alioua et al.5 created a new model for predicting static liquefaction susceptibility in sands with plastic fines by combining ten powerful machine-learning approaches. The model was trained on 114 unconsolidated undrained triaxial shear tests and assessed with six performance metrics and a K-fold cross-validation method. The Deep Neural Network (DNN) model outperformed the others, making more accurate predictions and coming closest to experimental values. "StaLique2024" is a graphical interface that is both dependable and easy to use. Hanandeh et al.6 compared supervised machine learning-based assessments for soil liquefaction using cone penetration tests (CPTs). This study employed three supervised machine learning classifiers: support vector machines, decision trees, and quadratic discrimination analysis. Three different soil characterization data sets were used, and the most essential factor for determining liquefaction susceptibility was found to be the CPT tip resistance.
This study suggested simple and quick techniques for evaluating soil liquefaction susceptibility that do not require complex computations, proving machine learning's promise in assessing soil liquefaction susceptibility.

Zhao et al.7 created a Bayesian framework to quantify the uncertainty of two energy-based machine learning models and generate probabilistic versions. It assesses the performance of these models using a Bayesian model comparison method. The results indicated that the artificial neural network model has a higher occurrence probability than the support vector machine model. The projected liquefaction probability is superior to existing methods. Torres & Dungca8 used rudimentary set-based machine learning to create a rule-based categorization model for liquefaction triggering. The model, which employs 32 IF-THEN statements, calls into question commonly held assumptions about soil liquefaction and fine-grained soils. The model includes a clear flowchart for use, and the results are consistent with current best practices for streamlined operations. The recommendations included reevaluating some rules and increasing the liquefaction database.

Cong et al.9 used existing datasets and artificial intelligence to look into liquefaction resistance in chemically treated sandy soils. Resistance was calculated using data from 20 loading cycles of cyclic undrained triaxial testing. Combining explanatory factors produced excellent findings. Including all variables increased prediction accuracy. Linear models demonstrated the ability to accurately estimate liquefaction resistance based on past data.

According to Khatoon et al.10, the cone penetration test dataset is used to evaluate the reliability index and likelihood of liquefaction using the first-order reliability approach (FORM). To forecast liquefaction, a deep neural network (DNN) model is created. Eight indicators as well as extra graphics are used to assess the model’s performance11. In order to improve risk assessments for geotechnical engineering design, the study recommends a more thorough incorporation of FORM-based machine learning models in liquefaction evaluation.

In this paper, two different algorithms of the support vector machine, namely the polynomial function (Poly) and the radial basis function kernel (RBK), and the random forest classifier have been used to predict the liquefaction potential of granular soil environments for a more sustainable geotechnical infrastructure design and construction12–15.

The background highlights the significant contributions of previous studies in advancing the understanding and prediction of earthquake-induced liquefaction using machine learning16–19. Early research by Keefer and Seed & Idriss established foundational insights into liquefaction mechanisms and the need for accurate prediction methods. Subsequent works explored the application of machine learning techniques, such as Artificial Neural Networks and Support Vector Machines, with Samui & Sitharam demonstrating SVM's superior performance using CPT data. Ghani et al. extended this by using ensemble-based models, while Alioua et al. introduced a Deep Neural Network for static liquefaction prediction, achieving high accuracy. Hanandeh et al. and Zhao et al. contributed by validating machine learning classifiers and developing Bayesian models for uncertainty quantification. Torres & Dungca added interpretability with a rule-based approach, and Cong et al. examined liquefaction resistance in treated soils. Khatoon et al. applied FORM with deep learning to improve reliability assessments. Collectively, these works laid the groundwork for applying advanced data-driven methods to liquefaction prediction, highlighting the growing relevance of machine learning in geotechnical engineering20–23. The present research primarily focuses on developing a robust, data-driven framework to predict the earthquake-induced liquefaction potential of granular terrains using advanced machine learning classification models. It aims to evaluate and compare the predictive performance of the Random Forest Classifier (RFC), Support Vector Machine with Polynomial Kernel (SVM_Poly), and Support Vector Machine with Radial Basis Kernel (SVM_RBK) in modeling liquefaction susceptibility.
The objectives of the study include collecting and preprocessing relevant geotechnical and seismic data, training and validating the selected machine learning models, analyzing their accuracy, generalization, and interpretability, and identifying the most effective algorithm for reliable prediction. Ultimately, the research seeks to enhance decision-making in geotechnical design and earthquake risk management through the integration of machine learning techniques tailored to complex soil behavior under seismic loading.

More published works provide substantial technical grounding and methodological precedent that enrich the background of the research work. Collectively, these references illustrate the maturation of data-driven methodologies in geotechnical and structural engineering domains, with several works offering direct parallels in problem framing, model implementation, and risk-informed decision-making. Furthermore, research references 24 and 25 by Bhowmik et al. showcase the integration of feedback-driven learning and recursive principal component analysis with time-varying models to support real-time condition monitoring and damage detection. These studies highlight how sensor data and error-corrected learning loops can enhance predictive reliability in dynamic systems, concepts highly relevant to the temporal and spatial uncertainties present in liquefaction assessment. Studies such as 26, 27, and 28 by Raja et al. emphasize the utility of evolutionary and smart artificial intelligence techniques for geotechnical prediction tasks like soil settlement and lateral spreading, both closely related to liquefaction phenomena. These works demonstrate how robust, non-linear machine learning models outperform traditional empirical methods, thereby justifying the methodological shift in the current research from deterministic to data-driven frameworks. Particularly, reference 27 shows direct application of ML models to liquefaction-induced hazards, reinforcing the feasibility and domain-specific effectiveness of similar approaches. Research reference 29 presents a modern implementation of data-driven modeling for predicting compressive strength of contaminated soils, employing intelligent techniques that consider soil heterogeneity and chemical interactions. This supports the premise that machine learning can adequately capture complex interactions in granular media, as required for liquefaction potential modeling.
Similarly, reference 30 introduces AI-based predictions in slope terrains with buried infrastructure, analogous in complexity to liquefaction-prone terrains, thereby strengthening the argument for intelligent classification models in spatially variable geotechnical environments. Research references 31, 32, and 33 examine data drift, model robustness, and scale-dependence of ML models in seismic and landslide scenarios, addressing key concerns such as model generalizability and input variability. These considerations directly feed into the discussion on training domain-specific models and understanding their limitations under shifting conditions, an essential element in developing reliable liquefaction risk frameworks. Research works published in papers 34–36 explore deformation patterns, strain localization, and slip evolution in geomaterials under stress, providing physical insights into the mechanisms underpinning liquefaction phenomena. These empirical foundations validate the choice of input features in ML models used to predict such responses. Moreover, references 37–39 underscore the relevance of closed-form ML and artificial neural networks in concrete strength evaluation and recycled materials, indirectly reflecting on how material variability can be effectively captured by data-driven approaches. This lends credibility to using ML for classifying complex granular responses to seismic excitation, analogous to strength prediction in heterogeneous materials. In general, these works form a cohesive foundation that supports the relevance, methodological choices, and expected reliability of a machine learning-based framework for assessing liquefaction potential. They justify the transition from empirical heuristics to intelligent models by demonstrating accuracy gains, flexibility in capturing non-linear dependencies, and successful domain-specific applications, all of which validate the technical ambition and feasibility of the proposed research.

Research gap and statement of novelty

Despite extensive research on earthquake-induced liquefaction of granular soils, significant gaps remain in developing accurate, reliable, and adaptable models that can generalize well across diverse geotechnical and seismic conditions. Much of the early literature has focused on empirical and deterministic approaches, which, although foundational, often lack the capacity to handle complex, nonlinear interactions between soil properties and seismic parameters. While machine learning techniques have recently gained attention for their predictive capabilities, many studies have been limited either by small or localized datasets, narrow algorithm selection, or insufficient validation strategies. For instance, although models such as Support Vector Machines (SVM) and Artificial Neural Networks (ANN) have demonstrated predictive value, they often suffer from limitations related to overfitting, interpretability, or lack of robustness when applied to heterogeneous geological settings. Recent advancements in ensemble methods like Random Forest and boosting algorithms have shown promising results, particularly in handling high-dimensional and noisy data. However, limited work has integrated these methods with kernel-based SVM classifiers using diverse kernel functions, such as polynomial and radial basis functions, to explore comparative performance in assessing liquefaction potential.
Additionally, most existing studies have focused on one or two machine learning models without developing a comprehensive framework that compares multiple algorithms within a unified structure or validating their generalization using a wide range of geotechnical input features and real-world datasets. The novelty of this research lies in the integrated, comparative use of three advanced classification models—Random Forest Classifier (RFC), Support Vector Machine with Polynomial Kernel (SVM_Poly), and Support Vector Machine with Radial Basis Kernel (SVM_RBK)—to assess the liquefaction potential of granular soils under seismic loading. Unlike previous studies that focus on single-model performance or use conventional features, this study leverages a broader set of soil and seismic parameters to enhance model learning and decision-making capabilities. Furthermore, it provides a structured and scalable data-driven framework that can be adapted for use in different geographic and geotechnical settings. The research contributes to the growing body of knowledge by offering a more refined, accurate, and transferable predictive system for liquefaction susceptibility, which is essential for sustainable and resilient geotechnical infrastructure planning and disaster risk reduction in earthquake-prone areas.

Machine learning theoretical framework

Support vector machine (SVM)

Support Vector Machine (SVM) is a machine learning approach relying on the ideas of statistical learning theory and the Vapnik-Chervonenkis (VC) dimension concept11. Because of its superior generalization abilities, SVM is gaining popularity in the fields of pattern classification and nonlinear regression12. For a training dataset of N samples, represented by the notation (x1, y1), …, (xN, yN), where x indicates a vector of inputs and y indicates an output value, the SVM estimator for classification can be stated as follows:

f(x) = w·Φ(x) + b        (1)

The symbols w, b, and Φ represent the weight vector, bias, and nonlinear mapping function, respectively. A smaller value of w reflects the flatness of Eq. (1), which is obtained by minimizing the Euclidean norm ‖w‖². Vapnik13 defined the following convex optimization problem with an ε-insensitive loss function:

Minimize:   (1/2)‖w‖² + C Σ (ξi + ξi*)

Subject to:   yi − w·Φ(xi) − b ≤ ε + ξi,   w·Φ(xi) + b − yi ≤ ε + ξi*,   ξi, ξi* ≥ 0        (2)

Here, C is a capacity factor that indicates the level of empirical inaccuracy in the optimization problem. It establishes the trade-off between tolerance for deviations larger than ε and function flatness. Similarly, ensuring that the output system meets the error tolerance ε is the responsibility of the slack variables ξi and ξi*14. The optimization problem given in Eq. (2) was solved in its dual form using Lagrangian multipliers and the Karush-Kuhn-Tucker (KKT) conditions. Support vectors are input vectors that, according to the KKT conditions, have non-zero Lagrangian multipliers15. A schematic diagram of the SVM used in this investigation is shown in Fig. 1. Most of the variables predicted in the input space of natural processes exhibit nonlinear relationships with the predicted variable, which prevents the problem from being represented linearly, as in Eq. (1).

Fig. 1.

Fig. 1

Architecture of SVM classifier.

A kernel function is used to map the input space to a higher-dimensional space (feature space) to get around this constraint. The kernel function allows implicit operations in the higher-dimensional feature space. Eq. (1) is then rewritten using Lagrangian multipliers (γi) as:

f(x) = Σ γi K(xi, x) + b        (3)

where K(xi, xj) is the kernel function. Many kernels have been studied in the literature; research indicates that radial basis kernels and polynomial (Poly) kernels outperform other kernels in a range of civil engineering applications. In this study, the well-known "Poly" and "RBF" kernel functions were used. Equations (4) and (5) define the "Poly" and "RBF" kernel functions, respectively:

K(xi, xj) = (xi·xj + 1)^d        (4)

K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))        (5)
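As a concrete illustration, the two kernels can be written directly in NumPy. The exact functional forms used here (the +1 offset in the polynomial kernel and the 2σ² normalization in the RBF kernel) are common conventions assumed for this sketch, since the paper's equations are rendered as images:

```python
import numpy as np

def poly_kernel(xi, xj, d=3):
    # Polynomial kernel: K(xi, xj) = (xi . xj + 1)^d, with user-defined degree d
    return (np.dot(xi, xj) + 1.0) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    # Radial basis kernel: K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

x1 = np.array([0.2, 0.5])
x2 = np.array([0.1, 0.4])
print(poly_kernel(x1, x2), rbf_kernel(x1, x2))
```
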

During training, three parameters must be tuned when utilizing SVM with polynomial and RBF kernel functions: the kernel-specific parameters (d, σ), the insensitive loss function ε, and the capacity parameter C. This work used internal cross-validation16 to generate an SVM model with the optimal combination of the three parameters. The model preparation flow is depicted as a flow chart in Fig. 2.
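In scikit-learn terms, this cross-validated tuning loop can be sketched with GridSearchCV as a stand-in for the authors' own implementation. The parameter grids and synthetic data below are illustrative assumptions, not the actual search ranges; note also that the ε-insensitive zone belongs to the regression variant (SVR), while the classifier exposes C and the kernel parameters:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))                 # 12 inputs, mirroring the dataset
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic liquefied / not labels

# SVM_Poly: search over C and the polynomial degree d
poly = GridSearchCV(SVC(kernel="poly"), {"C": [1, 10], "degree": [2, 3]}, cv=5)
poly.fit(X, y)

# SVM_RBK: search over C and the RBF width parameter (gamma ~ 1 / (2 * sigma^2))
rbf = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10], "gamma": [0.01, 0.1]}, cv=5)
rbf.fit(X, y)

print(poly.best_params_, rbf.best_params_)
```
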

Fig. 2.

Fig. 2

Flow chart for SVM modelling.

Random forest classifier(RFC)

The random forest classifier (RFC) is an ensemble of tree classifiers. Every distinct classifier is built from a random vector that is sampled independently of the input vector. Every tree in the ensemble casts one vote to determine which class is assigned to an input vector17. The random forest classifier employed in this work uses randomly selected features, or a combination of attributes, at each node during the tree-growing process. The schematic representation of the random forest classifier is shown in Fig. 3.

Fig. 3.

Fig. 3

Architecture of random forest model.

Every chosen attribute, or combination of features, was bagged. Bagging is a technique for creating training data sets by randomly drawing N examples with replacement18, where N is the size of the initial training set. Classifying (liquefaction) instances is the process of finding the class that received the most votes out of all the tree predictors in the forest17. Choosing a pruning method and an attribute selection measure are crucial steps in creating a decision tree. Decision tree induction allows for multiple approaches to attribute selection, most of which assign a quality score to the attribute. Two often-used metrics are the Gini Index20 and the Information Gain Ratio criterion19. As an attribute selection metric, the random forest classifier assesses an attribute's impurity in relation to the classes using the Gini Index. The Gini Index can be written as follows when a case (liquefaction) is randomly selected from a training set T and assigned to a class:

Gini(T) = 1 − Σ (f(Cj, T) / |T|)²        (6)

where f(Cj, T)/|T| is the probability that the chosen case belongs to class Cj.
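A minimal implementation of Eq. (6) for a candidate node makes the measure concrete (the node contents below are a hypothetical example):

```python
def gini_index(labels):
    # Gini(T) = 1 - sum_j ( f(Cj, T) / |T| )^2, where f(Cj, T) counts the
    # cases in node T belonging to class Cj.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# A node holding 6 liquefied ("yes") and 4 non-liquefied ("no") cases:
node = ["yes"] * 6 + ["no"] * 4
print(gini_index(node))  # impurity of a mixed node; a pure node scores 0.0
```
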

Without pruning, a tree uses a combination of characteristics and new training data to grow to its maximum depth. This property sets the random forest classifier apart from earlier decision tree methods, such as the one proposed by Quinlan19. Research indicates that the effectiveness of tree-based classifiers is determined mainly by the pruning methods employed, rather than the attribute selection metrics21. According to Breiman17, even in the absence of pruning22, the Strong Law of Large Numbers minimizes overfitting, and the generalization error converges as the number of trees increases. The total number of trees to be produced (k) and the number of attributes used at each node for tree formation (m) are the two user-defined parameters needed to build a random forest classifier. To find the appropriate split at each node, only the selected features are considered during the tree's development. The random forest classifier is thus made up of k user-defined trees. Every sample in a fresh dataset is classified by passing it through all k trees; the forest selects the class that receives the majority of votes for that case.
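The two user-defined parameters map directly onto scikit-learn's RandomForestClassifier, where n_estimators plays the role of k and max_features the role of m. The synthetic data and parameter values below are purely illustrative, not the values tuned in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 12))           # 12 attributes per case
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # synthetic liquefied / not labels

k = 100  # number of trees to grow
m = 4    # number of attributes tried at each node

# Bagged trees grown to full depth (no pruning), as described above;
# max_features=m restricts the attributes considered at each split.
rfc = RandomForestClassifier(n_estimators=k, max_features=m,
                             bootstrap=True, random_state=0)
rfc.fit(X, y)
print(rfc.predict(X[:5]))  # each prediction is the majority vote of the k trees
```
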

Data collection and preliminary analysis

Data collection and preliminary analysis for this research involved gathering a comprehensive dataset comprising geotechnical and seismic parameters relevant to the assessment of liquefaction potential in granular soils. The dataset included variables such as soil type, grain size distribution, relative density, depth of soil layers, groundwater table depth, standard penetration test (SPT) and cone penetration test (CPT) results, and earthquake magnitude. These data were sourced from validated case histories, published studies, and geotechnical databases related to past seismic events where liquefaction was observed or absent. The preliminary analysis phase focused on cleaning and preprocessing the data to ensure consistency and accuracy. Missing values were handled using appropriate imputation techniques, and outliers were examined and treated to minimize their impact on model performance. Data normalization and encoding of categorical variables were carried out to prepare the dataset for machine learning modeling. Exploratory data analysis (EDA) was conducted to understand the distribution and correlation of variables, identify key predictors of liquefaction, and detect patterns or trends. Statistical summaries and visualizations, such as histograms, box plots, and correlation heatmaps, were used to support the interpretation of the data. This foundational step ensured that the dataset was well-structured, representative, and suitable for training and evaluating the machine learning classification models used in this study. The region from which the soil exploration data was collected was taken into account for investigation of liquefaction possibility. Within this area, a total of 234 data sets were collected, with 12 independent parameters and one dependent parameter. Of the 234 records, 136 are susceptible to liquefaction and the remaining 98 are not prone to liquefaction. The total data is tabulated in Table 1.
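The imputation and normalization steps described above can be sketched in pandas on a hypothetical fragment of the dataset. The column names and values below are placeholders for illustration; the paper labels its inputs X1 through X12 and its output "Liquefied?":

```python
import numpy as np
import pandas as pd

# Hypothetical fragment of the dataset (values are illustrative only)
df = pd.DataFrame({
    "X1_magnitude": [6.9, 7.5, np.nan, 9.0],
    "X4_pga_g":     [0.19, np.nan, 0.45, 0.84],
    "liquefied":    [1, 0, 1, 1],
})

features = df.columns.drop("liquefied")

# Median imputation for missing values, then min-max normalization of inputs
df[features] = df[features].fillna(df[features].median())
df[features] = (df[features] - df[features].min()) / (df[features].max() - df[features].min())

print(df.round(3))
```
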
The total data was segregated into two parts, training and testing: 80% of the data was used for training and the remaining 20% for testing the SVM_Poly, SVM_RBK, and RFC models. Histograms were drawn for all the input and output data points, as shown in Fig. 4. The full dataset used for this classification is presented in Table 1, titled “Gravelly material liquefaction case history data”, which comprises 234 events: 136 in which liquefaction was confirmed to have occurred (“Yes”) and 98 in which it was not (“No”). The data include various parameters, labeled X1 through X13, representing a wide range of geotechnical and seismic features. Among the liquefied cases, most earthquake magnitudes (X1) fall between 6.9 and 7.9, with a few events reaching magnitudes as high as 9.2. Distances from the source (X2) vary widely, from as close as 9.2 km to over 360 km, indicating liquefaction occurred both near and relatively far from the seismic source. The duration (X3) also shows broad variation, from approximately 10 s to over 150 s, suggesting liquefaction is not strictly limited to longer shaking events. The peak ground acceleration (X4) varies considerably, from around 0.09 to 0.84 g, demonstrating that liquefaction can occur across a broad spectrum of ground motion intensities. Fines and gravel content (X5 and X6, respectively) also range significantly, indicating variability in particle size distribution across liquefied sites.
Higher gravel content appears frequently, supporting that gravelly soils are indeed susceptible under specific conditions, despite traditionally being considered less vulnerable. Displacement (X7) spans from minimal (0.15 mm) to over 50 mm, hinting at different degrees of surface manifestation. The penetration resistance (X8) and the corrected overburden stress (X9) show wide ranges as well, possibly influencing the onset of liquefaction depending on soil strength and confining pressure. Depth-related variables such as depth to groundwater (X10), depth to the top of the liquefied layer (X11), and thickness of the liquefied layer (X12) highlight that liquefaction was detected at varying depths, although most events cluster between 1 and 4 m below the ground surface. The final variable, X13, maintains consistent values within a narrow band, suggesting a largely homogeneous soil classification across events. In conclusion, the dataset demonstrates that liquefaction in gravelly soils is associated with a complex interplay of seismic intensity, soil composition, and subsurface conditions, and the presence of both liquefied and non-liquefied cases makes it suitable for classification modeling.

Table 1.

Gravelly material liquefaction case history data.

Evt. No. X1 X2 (km) X3 (s) X4 X5 (%) X6 (%) X7 (mm) X8 X9 (kPa) X10 (m) X11 (m) X12 (m) X13 Liquefied?
1 7.9 96.30 40 0.21 9 9 0.5 16.01 32.00 1.5 1.5 0 1 Yes
2 7.9 94.00 40 0.24 5 53 6.15 10.79 49.00 0.8 0.8 0 1 Yes
3 7.9 95.00 40 0.24 4.9 50 5.9 20.91 46.00 1 1.1 0 1 Yes
4 7.9 99.60 40 0.14 9 0.5 0.5 7.69 68.00 2.2 2.4 0 1 Yes
5 7.9 74.70 40 0.23 9 0.4 0.51 14.94 59.00 2.2 1.5 0.7 1 Yes
6 7.9 109.10 40 0.14 9 0.5 0.15 9.43 41.00 1.8 2.8 0 1 Yes
7 7.9 94.21 40 0.21 7.5 22.8 0.7 12.14 52.00 1.9 2.4 0 1 Yes
8 7.9 95.00 40 0.20 4.9 35 2.3 19.02 28.00 1 1.2 0 1 Yes
9 7.9 85.50 40 0.30 4.9 39.7 1.54 12.26 83.00 3.5 2.8 0.7 1 Yes
10 7.9 61.10 50 0.17 9 22 1 7.17 30.00 1.4 1 0.4 1 Yes
11 7.9 40.29 50 0.21 4.9 65 15 15.56 41.00 0.9 2.1 0 1 Yes
12 7.9 138.00 110 0.24 9 5 0.4 9.78 49.00 1.3 1.6 0 1 Yes
13 7.9 94.50 105 0.30 4.9 76.6 31.5 19.96 77.00 3.4 3 0.4 1 Yes
14 7.9 87.90 105 0.30 1 75.4 30.57 28.25 85.00 4 4 0 1 Yes
15 7.9 91.90 105 0.34 3.8 57.2 11.59 14.01 82.00 2.8 4 0 1 Yes
16 7.9 77.70 105 0.49 4.9 50 5.9 19.78 55.00 2.5 2.9 0 1 Yes
17 7.9 86.36 105 0.48 0 51 5 7.42 73.00 1.6 5 0 1 Yes
18 7.9 71.90 105 0.59 4.9 35 2.3 17.57 123.00 6 4.8 1.2 1 Yes
19 7.9 78.10 105 0.42 4.9 66.8 33.4 9.50 85.00 2.4 4 0 1 Yes
20 7.9 75.99 105 0.34 4.9 64.1 22 10.43 37.00 1 2.6 0 1 Yes
21 7.9 76.00 105 0.33 9 19 0.8 13.81 69.00 2.9 4 0 1 Yes
22 7.9 78.50 105 0.37 5 49 4.5 12.19 71.00 3 1.5 1.5 1 Yes
23 7.9 78.60 105 0.35 4.9 60 10.9 9.64 49.00 1.5 3.2 0 1 Yes
24 7.9 73.15 105 0.39 4.9 48 5.2 15.99 26.00 0.6 2.8 0 1 Yes
25 7.9 88.60 105 0.29 4.9 63.7 12.8 15.20 60.00 2.8 1.2 1.6 1 Yes
26 7.9 64.22 150 0.59 4.9 50 5.9 22.47 32.00 1.2 1.4 0 1 Yes
27 7.9 63.50 150 0.41 4.9 50 5.9 14.67 42.00 1.5 1.2 0.3 1 Yes
28 7.9 77.90 150 0.29 4.9 74 25 15.73 31.00 0.9 0.7 0.2 1 Yes
29 7.9 152.73 90 0.52 4.9 50 5.9 19.01 66.00 2.4 1.7 0.7 1 Yes
30 7.9 22.70 70 0.25 4.9 85 52.1 22.02 47.00 2.1 1 1.1 1 Yes
31 7.9 25.70 70 0.24 4.5 4.8 0.35 20.34 24.00 0.9 1 0 1 Yes
32 7.9 36.00 70 0.25 4.9 45 4.3 11.04 33.00 1.4 1.4 0 1 Yes
33 7.9 74.50 70 0.22 9.9 26.1 0.98 9.67 61.00 1.4 2.3 0 1 Yes
34 7.9 44.02 70 0.24 4.9 59 10.3 11.60 61.00 2.4 3.1 0 1 Yes
35 7.9 161.00 48.1 0.13 9 46 9 21.73 41.00 1.8 2.3 0.9 1 Yes
36 7.9 219.10 48.1 0.09 9 46 9 6.66 55.00 1 2.3 0.9 1 Yes
37 6.9 12.00 10.9 0.29 2 70 16 6.74 27.80 0.8 2.3 0.9 1 Yes
38 6.9 12.00 10.9 0.29 2 70 16 6.66 28.70 0.8 2.3 0.9 1 Yes
39 6.9 89.50 10.9 0.30 3 63 13.2 7.78 37.30 1.2 2.3 0.9 1 Yes
40 6.9 89.50 10.9 0.30 3 63 13.2 5.33 37.40 1.2 2.3 0.9 1 Yes
41 6.9 9.20 10.9 0.36 1 66 15 7.40 46.20 1.7 2.3 0.9 1 Yes
42 6.9 9.20 10.9 0.36 4 70 20.4 7.53 44.70 1.5 2.3 0.9 1 Yes
43 6.9 9.20 10.9 0.36 1 55 8 8.30 36.80 1.4 2.3 0.9 1 Yes
44 6.9 9.20 10.9 0.36 1 41 2.5 7.16 49.40 1.8 2.3 0.9 1 Yes
45 6.9 9.20 10.9 0.36 2 58 9.7 7.45 45.60 1.5 2.3 0.9 1 Yes
46 6.9 9.20 10.9 0.36 4 54 7.5 8.88 46.30 2 2.3 0.9 1 Yes
47 6.9 9.20 10.9 0.36 4 54 7.5 10.53 32.90 1.5 2.3 0.9 1 Yes
48 6.9 9.20 10.9 0.36 4 54 7.5 10.61 32.40 1.5 2.3 0.9 1 Yes
49 6.9 9.20 10.9 0.36 4 54 7.5 10.39 33.80 1.5 2.3 0.9 1 Yes
50 6.9 9.20 10.9 0.36 4 54 7.5 9.76 38.30 1.7 2.3 0.9 1 Yes
51 6.9 9.20 10.9 0.36 4 54 7.5 9.60 46.80 1.5 2.3 0.9 1 Yes
52 6.9 9.20 10.9 0.35 2 49 5.4 8.10 38.60 1.65 2.3 0.9 1 Yes
53 6.9 9.20 10.9 0.35 3 57 9 8.77 32.90 1.45 2.3 0.9 1 Yes
54 6.9 9.20 10.9 0.35 7 72 18 9.51 28.00 1.35 2.3 0.9 1 Yes
55 6.9 9.20 10.9 0.35 1 49 5.6 10.36 23.60 1.1 2.3 0.9 1 Yes
56 6.9 14.20 10.9 0.60 12 67 17 27.40 17.00 1 2.3 0.9 1 Yes
57 6.9 14.20 10.9 0.35 2 48 5.2 13.11 36.60 1.52 2.3 0.9 1 Yes
58 6.9 14.20 10.9 0.60 24 39 3 4.99 119.30 7 2.3 0.9 1 Yes
59 6.9 14.20 10.9 0.60 20 42 3.6 4.93 180.20 9.36 2.3 0.9 1 Yes
60 6.9 14.20 10.9 0.50 21 59 10 19.57 37.40 0.8 2.3 0.9 1 Yes
61 6.9 14.20 10.9 0.50 30 33 2 11.84 54.20 2.4 2.3 0.9 1 Yes
62 6.9 14.20 10.9 0.50 21 63 13 5.55 131.00 6.7 2.3 0.9 1 Yes
63 6.9 14.20 10.9 0.50 20 40 3 16.90 29.10 0.8 0.8 0 1 Yes
64 6.9 14.20 10.9 0.50 30 35 2.3 13.37 51.00 2.4 1 1.4 1 Yes
65 6.9 14.20 10.9 0.50 20 63 13 6.42 124.00 6.8 4.6 2.2 1 Yes
66 6.9 13.00 10.9 0.50 10 61 11 19.34 39.00 0.8 2.3 0.9 1 Yes
67 6.9 13.00 10.9 0.50 10 61 11 14.62 38.40 0.8 2.3 0.9 1 Yes
68 6.9 13.00 10.9 0.50 10 61 11 20.56 40.50 0.8 2.3 0.9 1 Yes
69 6.9 30.50 15.7 0.50 18 40 3.1 8.68 62.80 2.1 2.3 0.9 1 Yes
70 6.9 30.50 15.7 0.58 16 30 1.7 3.51 64.60 1.8 2.3 0.9 1 Yes
71 6.9 30.50 15.7 0.63 12 50 5.9 8.40 94.20 4.3 2.3 0.9 1 Yes
72 6.9 30.50 15.7 0.65 9 50 5.9 7.10 96.70 3.2 2.3 0.9 1 Yes
73 6.9 30.50 15.7 0.65 12 40 3.1 7.91 77.80 2.5 2.3 0.9 1 Yes
74 6.9 89.50 48.1 0.50 10 30 1.7 4.10 54.60 1.5 2.3 0.9 1 Yes
75 6.9 20.09 13.44 0.50 13 30 1.2 6.34 116.30 3 4 0 1 Yes
76 6.9 20.09 13.44 0.50 10 30 1.7 10.02 93.00 2.4 2.3 0.9 1 Yes
77 6.9 20.09 13.44 0.50 10 30 1.7 6.91 90.40 2.4 2.3 0.9 1 Yes
78 7.9 44.13 20.1 0.36 4.9 60 10.9 6.24 61.90 2.4 2.3 0.9 1 Yes
79 7.9 44.13 20.1 0.36 4.9 60 10.9 9.12 58.00 2.4 2.3 0.9 1 Yes
80 7.9 44.13 20.1 0.36 4.9 60 10.9 10.20 58.00 2.4 2.3 0.9 1 Yes
81 7.9 44.13 20.1 0.36 4.9 60 10.9 11.44 61.90 2.4 2.3 0.9 1 Yes
82 7.5 177.00 39 0.12 10 22 1 5.99 50.20 1.6 0 0.4 1 Yes
83 7.5 177.00 39 0.12 10 22 1 10.01 56.80 2.51 0 1 1 Yes
84 7.6 19.60 41 0.43 14 53 6.1 7.11 50.58 1.2 1.2 0 1 Yes
85 7.6 34.76 43.85 0.79 21 18 0.8 10.78 93.50 3.5 2.3 0.9 1 Yes
86 7.6 27.00 43 0.79 3 53 8 10.03 86.20 4.5 0.5 0 1 Yes
87 9 359.24 95.7 0.36 7 13 0.6 8.54 99.70 4.5 5.55 0 1 Yes
88 9 366.76 97.4 0.38 7.3 33 2 12.75 133.80 3.61 4.7 0 1 Yes
89 9 301.93 121.6 0.84 8 13 0.6 9.06 144.40 7.1 2.1 0.3 1 Yes
90 9 298.01 115.8 0.77 6.45 27 1.4 8.68 107.55 4.8 0.3 1.8 1 Yes
91 8.3 153.00 37.3 0.19 3 70 30 2.62 30.40 1 1 0 1 Yes
92 8.3 122.00 32.6 0.19 4.9 58 9 6.11 70.00 0.7 3.7 0 1 Yes
93 9.2 95.00 77 0.21 6 56 8.38 15.54 52.90 0 2.3 0.9 1 Yes
94 9.2 95.00 77 0.21 10 29 1.62 6.82 103.00 0 2.3 0.9 1 Yes
95 7.9 92.24 40 0.24 5 40 3.1 11.44 154.00 4.7 4.1 0.6 0 No
96 7.9 99.00 40 0.18 4.9 54.9 7.57 14.61 65.00 2 0.6 1.4 0 No
97 7.9 86.00 40 0.32 4.9 60.9 9.43 22.58 106.00 4.5 2.6 1.9 0 No
98 7.9 95.00 40 0.20 9 20 0.9 12.14 131.00 5 3.1 1.9 0 No
99 7.9 88.20 40 0.20 4.9 75.4 23.2 20.02 128.00 6.1 3.6 2.5 0 No
100 7.9 93.50 40 0.18 4.9 50 5.9 23.48 134.00 3.7 2.9 0.8 0 No
101 7.9 85.50 40 0.25 4.9 68.6 15.89 16.91 111.00 3.7 0 3.7 0 No
102 7.9 81.46 50 0.21 4.9 80 38.1 35.76 63.00 3 0 3 0 No
103 7.9 129.60 110 0.20 4.9 30 1.68 17.51 96.00 4.1 3.5 0.6 0 No
104 7.9 100.00 105 0.30 4.9 65 15 62.12 45.00 0.8 0.5 0.3 0 No
105 7.9 88.00 105 0.41 5 70 20.4 17.98 163.00 8 2 6 0 No
106 7.9 80.92 105 0.48 4.9 50 5.9 21.40 112.00 2 4.2 0 0 No
107 7.9 87.00 105 0.47 4.9 60 10.9 24.36 100.00 4.3 3.2 1.1 0 No
108 7.9 84.42 105 0.43 4.9 90 71.2 45.64 85.00 4 4 0 0 No
109 7.9 78.00 105 0.43 5 50 5.9 14.49 122.00 6.2 4.1 2.1 0 No
110 7.9 82.20 105 0.37 4.9 50 5.9 15.58 83.00 3.4 2.1 1.3 0 No
111 7.9 73.20 105 0.41 4 50 5.9 36.66 106.00 1.4 3.8 0 0 No
112 7.9 90.10 105 0.27 9 25.8 0.59 23.59 61.00 2 2.3 0 0 No
113 7.9 73.30 150 0.26 4.9 50 5.9 19.89 155.00 6 2 4 0 No
114 7.9 152.70 90 0.52 4.9 50 5.9 17.78 89.00 3 1.1 1.9 0 No
115 7.9 32.80 70 0.20 4.9 87 59 14.70 38.00 1.5 1.9 0 0 No
116 7.9 20.30 70 0.27 4.9 50 5.9 24.43 55.00 2.3 2.1 0.2 0 No
117 7.9 20.60 70 0.31 4.9 40 3.1 44.86 116.00 5.4 0 5.4 0 No
118 7.9 16.87 70 0.37 4.9 60 10.9 28.56 68.00 3 1 2 0 No
119 7.9 18.43 70 0.32 4.9 40 3.1 39.43 33.00 1.5 0.8 0.7 0 No
120 7.9 19.10 70 0.31 4.9 75 27.9 30.55 61.00 2.7 0 2.7 0 No
121 7.9 50.60 70 0.26 4.9 65 15 38.37 33.00 1.4 1 0.4 0 No
122 7.9 50.25 70 0.18 4.9 90 71.2 28.31 59.00 2.1 3.6 0 0 No
123 7.9 50.10 70 0.20 4.9 80 38.1 29.45 53.00 2.4 2.2 0.2 0 No
124 7.9 56.20 70 0.18 4.9 80 38.1 30.65 103.00 4 1.6 2.4 0 No
125 7.9 65.80 70 0.17 4.9 75 27.9 27.45 98.00 3.4 1.6 1.8 0 No
126 6.9 17.80 10.9 0.23 6 23 1.1 45.91 52.30 2.3 2.3 0.9 0 No
127 6.9 89.50 10.9 0.46 4.9 62 12.4 29.64 36.00 1 2.3 0.9 0 No
128 6.9 89.50 10.9 0.46 4.9 62 12.4 37.55 69.20 3 2.3 0.9 0 No
129 6.9 9.20 10.9 0.35 2 61 12 24.48 51.80 1.65 2.3 0.9 0 No
130 6.9 9.20 10.9 0.35 5 44 4.1 25.14 49.10 1.45 2.3 0.9 0 No
131 6.9 9.20 10.9 0.35 17 54 7.5 25.89 46.30 1.35 2.3 0.9 0 No
132 6.9 9.20 10.9 0.35 17 54 7.5 21.50 67.10 1.85 2.3 0.9 0 No
133 6.9 9.20 10.9 0.35 3 61 12 27.28 41.70 1.1 2.3 0.9 0 No
134 6.9 14.20 10.9 0.50 15 78 34 15.76 70.70 0.8 2.3 0.9 0 No
135 6.9 53.00 19.1 0.12 10 50 5.9 5.06 83.40 2 2 0 0 No
136 6.9 53.00 19.1 0.12 10 50 5.9 6.18 101.00 2 2 0 0 No
137 6.9 34.00 16.3 0.63 7 50 5.9 8.19 93.80 3.7 2.3 0.9 0 No
138 6.9 34.00 16.3 0.53 8 50 5.9 20.96 49.00 2 2.3 0.9 0 No
139 6.9 34.00 16.3 0.63 11 40 3.1 5.96 60.00 2.9 2.3 0.9 0 No
140 6.9 34.00 16.3 0.65 2 30 1.7 7.79 74.00 2.2 2.3 0.9 0 No
141 6.9 20.09 13.44 0.50 6 30 1.7 10.67 126.40 3.2 0 3.2 0 No
142 6.9 20.09 13.44 0.50 5 30 1.7 8.07 129.30 3.2 0 3.2 0 No
143 6.9 89.50 44 0.45 10 30 1.7 7.76 136.90 7 0 7 0 No
144 6.9 89.50 44 0.40 10 30 1.7 5.45 115.40 7 0 7 0 No
145 7 44.13 19.76 0.18 4.9 60 10.9 6.24 61.90 2.4 2.3 0.9 0 No
146 7 44.13 19.76 0.18 4.9 60 10.9 9.12 58.00 2.4 2.3 0.9 0 No
147 7 44.13 19.76 0.18 4.9 60 10.9 10.20 58.00 2.4 2.3 0.9 0 No
148 7 44.13 19.76 0.18 4.9 60 10.9 10.79 61.90 2.4 2.3 0.9 0 No
149 7.5 177.00 39 0.12 10 19 1 8.78 70.15 3.35 3.6 0 0 No
150 7.6 14.69 41 0.43 15.3 41 3.43 10.61 71.70 1.5 0 1.5 0 No
151 7.6 14.69 41 0.43 20.5 45 4.36 13.99 84.00 1.8 5.8 0 0 No
152 7.6 24.17 41 0.43 19 13 0.6 14.15 97.46 1.7 0 1.7 0 No
153 7.6 19.34 41 0.43 16 45 2.9 18.54 216.12 4 1.4 0 0 No
154 7.6 19.81 41 0.43 19 46 1.5 8.47 130.14 5 0 5 0 No
155 7.6 19.81 41 0.43 22 9 0.17 18.76 202.67 2.8 4.6 0 0 No
156 7.6 19.82 41 0.43 20 42 1.4 17.51 181.01 0.65 1.8 0 0 No
157 7.6 18.55 41 0.43 11 57 10 11.67 138.27 5 4.3 0.7 0 No
158 7.6 20.25 41 0.43 26 2 0.29 15.86 276.77 9.6 1.4 0 0 No
159 7.6 17.14 41 0.43 16 46 4.5 12.11 85.12 2.4 0 2.4 0 No
160 7.6 19.45 41 0.43 16 55 7 6.46 199.70 1.5 9 0 0 No
161 7.6 27.00 43.85 0.79 9 31 1.8 17.35 97.99 5.1 0 5.1 0 No
162 7.6 32.26 43.85 0.79 16 5 0.36 16.06 127.44 1.3 0 1.3 0 No
163 9.2 75.00 75 0.31 2 31 1.84 28.71 25.00 0 2.3 0.9 1 Yes
164 9.2 75.00 75 0.31 20 31 1.84 9.34 79.30 4.42 2.3 0.9 1 Yes
165 9.2 75.00 75 0.31 7 49 5.58 8.20 59.00 3.05 2.3 0.9 1 Yes
166 6.4 22.00 3.5 0.20 15 35 2 10.99 87.60 2.7 0.07 0 1 Yes
167 6.8 15.50 10 0.20 19 47 4 7.40 19.90 0.2 0.4 0 1 Yes
168 7.8 81.00 20 0.15 4.9 52 6 3.84 36.70 1.3 2.3 0.9 1 Yes
169 7.4 20.08 17.55 0.30 5 51 6.4 22.34 17.31 0.5 2.3 0.9 1 Yes
170 7.4 20.08 17.55 0.40 1 56 8.4 21.30 22.20 1.25 2.3 0.9 1 Yes
171 7.4 20.08 17.55 0.40 5 39 3 20.31 27.80 0.75 0.85 0 1 Yes
172 7.4 20.08 17.55 0.40 5 39 3 29.64 23.50 0.75 1.2 0 1 Yes
173 7.4 20.08 17.55 0.40 2 39 3 19.70 33.80 0.3 1.1 0 1 Yes
174 7.4 20.08 17.55 0.40 8 22 1 9.63 64.10 0.75 1 0 1 Yes
175 7.4 20.08 17.55 0.40 5 39 3 18.76 40.80 0 0.8 0 1 Yes
176 7.4 20.08 17.55 0.50 5 33 2 30.57 11.80 0.5 0.6 0 1 Yes
177 7.4 20.08 17.55 0.40 2 39 3 11.68 73.80 0.4 4 0 1 Yes
178 7.4 20.08 17.55 0.40 14 22 1 17.89 35.10 1.3 1.3 0 1 Yes
179 9.2 67.00 26.5 0.44 9 46 9 7.77 89.00 1 2.3 1 1 Yes
180 9.2 67.00 26.5 0.44 9 46 9 2.36 58.60 0.8 2.3 0.8 1 Yes
181 9.2 67.00 26.5 0.44 9 46 9 13.51 70.80 1 2.3 1 1 Yes
182 9.2 67.00 26.5 0.44 9 46 9 3.10 85.60 1 2.3 1 1 Yes
183 6.4 22.00 3.5 0.47 13 21 1.2 14.60 25.00 0.2 1.2 0 1 Yes
184 6.4 22.00 3.5 0.47 9 46 9 11.15 16.50 0.2 1 0 1 Yes
185 6.4 22.00 3.5 0.47 9 46 9 18.21 30.60 0.2 1 0 1 Yes
186 6.9 14.20 10.9 0.60 9 46 9 25.46 215.00 7.5 3.5 4 1 Yes
187 6.9 13.00 10.9 0.50 9 46 9 10.83 36.50 0.7 2.3 0.9 1 Yes
188 6.9 9.20 10.9 0.39 9 46 9 13.86 41.80 1.4 2.3 0.9 1 Yes
189 7.8 140.00 34.2 0.46 9 46 9 8.73 96.10 3.8 2.7 1.1 1 Yes
190 7.8 140.00 34.2 0.46 9 46 9 9.88 93.20 3.6 2.1 1.5 1 Yes
191 7.8 140.00 34.2 0.46 9 46 9 7.69 97.30 3.8 2.2 1.6 1 Yes
192 7.8 89.50 48.1 0.24 9 46 9 3.23 86.50 2.6 2.6 0 1 Yes
193 7.8 89.50 48.1 0.24 9 46 9 4.09 68.50 2.8 2.8 0 1 Yes
194 7.9 74.70 40 0.23 9 0.4 0.51 14.57 62.00 2.2 1.5 0.7 1 Yes
195 7.9 109.10 40 0.14 9 0.5 0.15 8.30 53.00 1.8 2.8 0 1 Yes
196 7.9 94.21 40 0.21 9 46 9 11.72 68.00 1.9 2.4 0 1 Yes
197 7.9 95.00 40 0.20 9 46 9 13.88 58.00 1 1.8 0 1 Yes
198 7.9 91.90 105 0.34 3.8 57.2 11.59 13.60 87.00 2.8 4 0 1 Yes
199 7.9 64.22 150 0.59 9 46 9 22.57 43.00 1.2 1.2 0 1 Yes
200 7.9 71.90 105 0.59 9 46 9 8.31 111.00 3.7 3 0.7 1 Yes
201 7.9 63.50 150 0.41 9 46 9 15.49 84.00 3.2 2.9 0.3 1 Yes
202 7.9 87.90 105 0.30 9 46 9 16.06 107.00 4 4 0 1 Yes
203 7.9 86.36 105 0.48 9 46 9 6.72 73.00 1.6 5 0 1 Yes
204 7.9 77.70 105 0.49 9 46 9 15.41 96.00 2.5 2.9 0 1 Yes
205 7.6 30.80 37.175 0.19 28 47 4.8 13.62 248.40 7 3.6 0 0 No
206 7.6 27.65 37.175 0.19 15 45 3 16.47 139.92 1.6 0.5 0 0 No
207 7.6 24.33 37.175 0.19 10 55 7.5 14.56 153.29 1.2 0 1.2 0 No
208 9 386.48 69.9 0.14 0 33 2 16.37 185.00 0.32 2.9 0 0 No
209 9 403.78 83.1 0.20 0 33 2 17.19 129.50 1.75 1.6 0 0 No
210 9 417.87 71.4 0.18 4 55 7.8 9.51 220.90 11.7 2 0 0 No
211 9 417.83 71.4 0.18 8 37 2.6 11.59 157.70 8.5 2.95 1.25 0 No
212 9 403.54 73.9 0.20 17 46 0.153 14.97 219.80 9 1.2 0 0 No
213 9 310.16 46.7 0.20 0 33 2 15.61 125.44 1.15 3.9 0 0 No
214 9 365.72 80.4 0.38 45.3 25 1.25 15.79 121.60 2.4 5.1 0 0 No
215 9 291.89 115.8 0.77 0 33 2 12.61 161.25 4.5 4.3 0 0 No
216 9 456.38 26.5 0.09 0 73 24.4 12.32 92.40 0.33 1 0 0 No
217 9 453.25 26.5 0.09 10 18 0.8 11.53 52.90 0.55 0.6 0 0 No
218 9 446.48 8.4 0.07 0 33 2 16.64 127.15 0.5 2.6 0 0 No
219 9 415.50 65.2 0.18 4 10 0.48 8.24 22.40 1 0 1 0 No
220 9 411.26 45.6 0.14 9 33 2 18.88 216.08 2.8 10.65 0 0 No
221 9 416.70 45.6 0.13 9 33 2 17.07 114.20 0 7.05 0 0 No
222 9 421.13 43.4 0.15 3 69 19.2 12.09 188.00 9.38 4.8 4.58 0 No
223 9 427.34 86.4 0.26 3.3 33 2 10.56 100.10 3.4 2.5 0.9 0 No
224 9 421.07 71.4 0.17 9.55 41 3.38 13.50 230.38 8.52 0.4 4.1 0 No
225 9 359.79 71.1 0.37 5.9 47 4.8 10.41 125.60 5.96 0 5.96 0 No
226 9 413.46 45.6 0.12 11 16 0.7 18.95 236.03 2.15 6.6 0 0 No
227 9 402.01 75.8 0.16 9 46 9 20.41 227.30 2.5 5.9 0 0 No
228 9.2 75.00 75 0.31 10 39 3 7.41 188.30 2.74 2.3 0.9 0 No
229 7.4 20.08 17.55 0.40 2 39 3 41.07 26.30 0.6 0.6 0 0 No
230 7.7 84.30 25.6 0.25 2.5 22 1 28.44 51.70 2.9 2.3 0.9 0 No
231 7.7 84.30 25.6 0.25 2.5 22 1 23.55 60.90 2.9 2.3 0.9 0 No
232 7.9 99.00 40 0.18 4.9 54.9 7.57 14.84 63.00 2 0.6 1.4 0 No
233 7.9 129.60 110 0.20 4.9 30 1.7 16.61 100.00 4.1 3.5 0.6 0 No
234 7.9 84.42 105 0.43 4.9 90 71.2 18.21 99.00 4 4 0 0 No

Fig. 4.


Histograms of the input and output data.

Model performance assessment

The classification problem involves making predictions or decisions with confidence levels determined by a threshold value. To improve accuracy, the models’ control parameters were adjusted over repeated trials until the best fitness measures were achieved. To assess the proposed models’ performance, we utilized five confusion-matrix-based indicators, namely accuracy, sensitivity, specificity, precision, and the Phi correlation coefficient, along with two statistical indicators, root mean squared error (RMSE) and mean absolute error (MAE). These measures are calculated using Eqs. (7)–(13).

$$\mathrm{Accuracy}=\frac{TP+TN}{P+N}\tag{7}$$
$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}\tag{8}$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP}\tag{9}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}\tag{10}$$
$$\phi=\frac{TP\cdot TN-FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\tag{11}$$

Further, P and N are, respectively, the positive (TP + FN) and negative (TN + FP) samples; TP and TN are true positives and true negatives, whereas FP and FN are false positives and false negatives.

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Liq_{t_i}-Liq_{p_i}\right)^{2}}\tag{12}$$
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|Liq_{t_i}-Liq_{p_i}\right|\tag{13}$$

where,

Liqti = Targeted liquefaction,

Liqpi = Predicted liquefaction, and

n = Number of data sets.

A high accuracy score indicates that the model makes correct predictions across all classes. Similarly, high sensitivity indicates that the model is effective at capturing positive instances and minimizing false negatives; high specificity indicates that it avoids false positives and correctly identifies negative instances; and high precision indicates that when the model predicts a positive instance, it is likely to be correct. Finally, the Phi correlation coefficient ranges from −1 to 1: a value of 1 indicates a perfect positive correlation, −1 indicates a perfect negative correlation, and 0 suggests no correlation between predicted and actual outcomes.

In addition to the confusion matrix, we use two statistical performance indicators: RMSE and MAE. MAE and RMSE can be employed together to analyze the variation in prediction errors within a dataset. RMSE is always equal to or larger than MAE, and the greater the difference between them, the more variance there is in the individual errors. If RMSE equals MAE (both ranging from 0 to ∞), all errors are of the same magnitude. These indicators are negative-oriented, meaning lower values are better, and MAE is also regarded as a robust measure of predictive accuracy. The choice of error measure significantly influences conclusions about which predictive methods are most accurate within a set.
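The indicators in Eqs. (7)–(13) can be computed directly from the confusion-matrix counts. The following minimal Python sketch (function names are illustrative, not from the paper) implements them:

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (7)-(11) from confusion-matrix counts."""
    p, n = tp + fn, tn + fp          # positive and negative samples
    return {
        "accuracy":    (tp + tn) / (p + n),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "phi": (tp * tn - fp * fn) /
               math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

def error_metrics(liq_t, liq_p):
    """Compute RMSE and MAE, Eqs. (12)-(13), from 0/1 labels."""
    n = len(liq_t)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(liq_t, liq_p)) / n)
    mae = sum(abs(t - p) for t, p in zip(liq_t, liq_p)) / n
    return rmse, mae
```

For example, with the RFC testing counts (TP = 26, TN = 19, FP = 1, FN = 1), `classification_metrics(26, 19, 1, 1)` reproduces the values reported later in the paper: accuracy ≈ 0.957, specificity = 0.95, Phi ≈ 0.91, and the two misclassifications out of 47 samples give MAE ≈ 0.043 and RMSE ≈ 0.206.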

Results and discussions

Fundamental analysis

The primary aim of this study is to assess the applicability of three soft computing models for liquefaction classification, a topic of significant importance within geotechnical and earthquake engineering. As discussed in the data collection and analysis section, twelve critical input parameters are used to classify the liquefaction potential of 234 data sets. A 5-fold cross-validation (CV) within GridSearchCV was used to systematically tune hyperparameters for the SVM polynomial (C = [0.1, 1, 10], degree = [2, 3, 4], coef0 = [0, 1]), SVM RBF (C = [0.1, 1, 10], gamma = ['scale', 'auto', 0.1, 1]), and random forest classifier (n_estimators = [100, 200, 500], max_depth = [None, 10, 20], min_samples_split = [2, 5, 10]) models. The 5-fold CV splits the resampled training data into five subsets, iteratively training on four and validating on the fifth, and selects the best parameters based on accuracy, ensuring robust evaluation across diverse data splits. The results demonstrate the effectiveness of this approach: the SVM polynomial model achieved a test accuracy of 0.936 with best parameters C = 10, degree = 4, coef0 = 1, yielding a macro-averaged F1-score of 0.93 (class 0: precision = 1.00, recall = 0.85; class 1: precision = 0.90, recall = 1.00); the SVM RBF model outperformed it with a test accuracy of 0.957 using C = 10, gamma = 'scale', with a macro-averaged F1-score of 0.96 (class 0: precision = 0.95, recall = 0.95; class 1: precision = 0.96, recall = 0.96); and the random forest model achieved a test accuracy of 0.894 with n_estimators = 100, max_depth = None, min_samples_split = 2, resulting in a macro-averaged F1-score of 0.89 (class 0: precision = 0.86, recall = 0.90; class 1: precision = 0.92, recall = 0.89). To prevent overfitting, ensemble resampling (RandomOverSampler + SMOTE) was implemented to address class imbalance, as evidenced by the balanced support (20 for class 0, 27 for class 1) and high recall across models.
A separate 20% test set ensures unbiased evaluation of generalization; the high test accuracies and balanced metrics indicate minimal overfitting. Moderate hyperparameter ranges were chosen to avoid overly complex models. A sensitivity analysis was also performed to evaluate model robustness across parameter variations; the results for SVM_Poly and SVM_RBK are shown in Fig. 5, and those for RFC (for both max_depth and n_estimators) in Fig. 6. From Figs. 5 and 6 it can be concluded that the most effective hyperparameters are C = 10 for both SVM models and, for the RFC model, max_depth = 15 and n_estimators = 100.
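The tuning procedure described above can be sketched with scikit-learn's GridSearchCV using the grids reported in the text. The dataset below is a synthetic stand-in for the study's resampled data, and for brevity only the RBF search is fitted; the other two models are tuned the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the resampled training data
X, y = make_classification(n_samples=234, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter grids as reported in the text
grids = {
    "svm_poly": (SVC(kernel="poly"),
                 {"C": [0.1, 1, 10], "degree": [2, 3, 4], "coef0": [0, 1]}),
    "svm_rbf":  (SVC(kernel="rbf"),
                 {"C": [0.1, 1, 10], "gamma": ["scale", "auto", 0.1, 1]}),
    "rfc":      (RandomForestClassifier(random_state=0),
                 {"n_estimators": [100, 200, 500],
                  "max_depth": [None, 10, 20],
                  "min_samples_split": [2, 5, 10]}),
}

# 5-fold CV grid search on the training split, scored by accuracy
model, grid = grids["svm_rbf"]
search = GridSearchCV(model, grid, cv=5, scoring="accuracy").fit(X_tr, y_tr)
print(search.best_params_, round(search.score(X_te, y_te), 3))
```

GridSearchCV refits the best parameter combination on the full training split, so `search` can be used directly to score the held-out 20% test set.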

Fig. 5.


Hyperparameters for SVM_Poly and SVM_RBK.

Fig. 6.


Hyperparameters for RFC.

Using these hyperparameters, the best model was developed, as evidenced by the confusion matrices and statistical indicators. The calculated values of the confusion matrices and statistical indicators for training and testing are shown in Figs. 7 and 8, respectively. For example, upon analyzing Fig. 8 for the RFC model, an accuracy of 0.957 indicates that the model is correct in its predictions 95.7% of the time. This outcome agrees with previous results in the literature15,17. A sensitivity of 1 signifies that the model correctly identifies 100% of actual positive instances, while a specificity of 0.95 implies correct identification of 95% of actual negative instances35,38. The sensitivity analysis compares well with the findings in the literature20,23. A precision of 0.96 suggests that when the model predicts a positive instance, it is correct 96% of the time. The Phi correlation coefficient of 0.91 indicates a strong positive correlation between predicted and actual values. Furthermore, the model exhibits a mean absolute error (MAE) of 0.043, reflecting a relatively low average prediction error, and a root mean squared error (RMSE) of 0.206, indicating good accuracy in predicting the target variable. This discussion applies to both training (Fig. 7) and testing (Fig. 8), with only one instance explained in detail to maintain clarity. Comparing Figs. 7 and 8, all models are well trained, particularly during the training stage; however, based on the values of the confusion matrices and statistical indicators, the RFC model performs exceptionally well, followed by SVM_RBK and SVM_Poly.

Fig. 7.


Confusion matrices and statistical indicators for the training stage.

Fig. 8.


Confusion matrices and statistical indicators for the testing stage.

The results displayed in Fig. 7 provide a comprehensive comparison of the three machine learning models, SVM with polynomial kernel (SVM_Poly), SVM with radial basis kernel (SVM_RBK), and the random forest classifier (RFC), used to predict earthquake-induced liquefaction of gravelly soil. The evaluation metrics include both classification performance indicators and error-based metrics, offering a well-rounded view of model performance. The RFC model consistently outperforms the others in almost every category during training. It achieves perfect scores of 1.00 in accuracy, sensitivity, specificity, precision, and Phi correlation coefficient, indicating flawless classification on the training data, and its error-based metrics, a mean absolute error of 0 and a root mean squared error of 0, are the lowest among the models. These results suggest the RFC has both high predictive power and robustness, potentially due to its ensemble nature, which captures complex patterns in the data effectively. The SVM_RBK model is a close competitor, achieving nearly perfect classification results with an accuracy of 0.98, a sensitivity of 1, and precision, specificity, and Phi correlation of 0.96, only slightly below RFC. Its mean absolute error (0.018) and root mean squared error (0.135) are higher than RFC's but much better than SVM_Poly's; although it performs marginally worse than RFC on overall classification metrics, it achieves good calibration in prediction magnitudes, making it especially useful for interpreting liquefaction probabilities.
SVM_Poly lags behind the other two models in all training performance metrics, scoring an accuracy of 0.97, a sensitivity of 0.99, and a Phi correlation coefficient of 0.95, all lower than the rest. It also has the highest mean absolute error (0.023) and root mean squared error (0.151), pointing to weaker generalization and predictive precision. This suggests that the polynomial kernel was less well suited to the dataset, unable to capture the nonlinear relationships as effectively as the other models. Overall, while all three models demonstrate strong potential during training, RFC and SVM_RBK are both highly effective, with RFC excelling in classification; the choice between them may depend on the application’s sensitivity to false predictions or error magnitude. SVM_Poly, although still usable, is clearly less competitive in this modeling context.

The testing results show a noticeable drop in performance for all models compared to the training phase, which is expected and reflects more realistic model behavior when generalizing to unseen data. The comparison among SVM_Poly, SVM_RBK, and RFC on the test set is more nuanced, with no single model dominating across every metric. RFC, which excelled during training, shows good generalization in testing: it achieves the highest accuracy (0.957) and a strong sensitivity (0.936), indicating a good balance in correctly identifying positive cases, and it also records the lowest mean absolute error (0.043) and root mean squared error (0.206) among the three. SVM_RBK demonstrates reasonable generalization, with a mean absolute error of 0.064 and a root mean squared error of 0.253, better than SVM_Poly and only slightly behind RFC. Its classification metrics, accuracy (0.894), sensitivity (0.889), specificity (0.90), and precision (0.923), show it to be relatively balanced, though slightly behind RFC on some counts, and its Phi correlation coefficient (0.785) implies somewhat less consistency in its predictive relationship. SVM_Poly shows some strengths in classification, especially in specificity (0.90) and precision (0.923), meaning it is good at identifying negative cases and making accurate positive predictions when it does so. However, it struggles in sensitivity (0.889), missing more positive cases, and has the highest RMSE (0.326), which reflects greater variability in its error distribution.
Its Phi coefficient (0.785) suggests moderate but not exceptional performance in capturing the true patterns in the data. Overall, the testing results highlight that RFC remains the strongest classifier, while SVM_RBK proves more stable than SVM_Poly in minimizing prediction errors. SVM_Poly, although competitive in some areas, generally lags behind the other two, especially in error metrics, suggesting it is less reliable for this problem in a real-world setting. The drop in performance from training to testing across all models also suggests some overfitting, particularly for RFC, which was flawless during training but shows a degree of degradation in generalization.

In summary, all the developed models exhibit promising performance across various evaluation metrics, with low error measures (MAE and RMSE), high accuracy, and strong performance in correctly identifying both positive and negative instances, as evidenced by sensitivity and specificity. The high precision and Phi Correlation Coefficient further affirm the reliability and accuracy of the model’s predictions. However, among the three models FRC model is the best for classifying the liquefaction. When harmonizing the results from both the training and testing phases, it becomes evident that while all three models — SVM_Poly, SVM_RBK, and RFR — exhibit strong capabilities during training, their performance during testing paints a more realistic picture of their generalizability. By taking an average across the metrics, we can evaluate the consistency and robustness of each model. Starting with SVM_Poly, its training performance was respectable in most metrics, though consistently the lowest among the three, and this trend persisted during testing. Its average accuracy across both phases is approximately 0.935, sensitivity is 0.94, and specificity and precision is 0.994, which indicates a model more conservative in catching positive cases but highly effective in identifying negatives. However, its mean absolute error and root mean squared error average to 0.065 and 0.239, respectively, the highest among the three, suggesting its predictive precision is the least reliable. The phi correlation coefficient averages to 0.85, highlighting moderate but suboptimal predictive strength. SVM_RBK emerges as the most balanced model when harmonizing both phases. In training, it nearly matched RFR in classification strength and had the lowest error metrics. During testing, it sustained commendable accuracy and the lowest average MAE (0.021) and RMSE (0.103) across both phases, marking it as the most consistent in terms of minimizing prediction errors. 
Its average accuracy is about 0.979, with sensitivity and specificity hovering around 0.975, and a phi coefficient of 0.956. The model maintains robust classification abilities with reliable regression performance, indicating excellent generalization from training to unseen data. RFR stood out in training, achieving perfect scores across most classification metrics, but its testing results revealed a drop in generalization, albeit still competitive. Its average accuracy is about 0.979, which is the highest, and both sensitivity and specificity average to approximately 0.98, indicating strong classification capabilities. However, it shows vulnerability in regression metrics with an average MAE of 0.021 and RMSE of 0.103, where it is outperformed by SVM_RBK. Despite this, the phi coefficient remains consistently high with an average around 0.956, suggesting that it captures relationships in the data better than SVM_Poly and even slightly better than SVM_RBK. Finally, while RFR dominates in classification metrics, SVM_RBK provides the best balance between classification and regression performance, demonstrating robust generalization and minimal overfitting. SVM_Poly, though competent, lags behind the others both in accuracy and precision, making it the least favorable among the three. When averaged across training and testing, SVM_RBK stands out as the most consistent and reliable model for both classification and prediction tasks.
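The phi coefficient referenced throughout is the Matthews correlation coefficient computed from confusion-matrix counts. As a brief sketch, applying the standard formula to the RFC testing counts reported in Fig. 9 (TP = 26, TN = 19, FP = 1, FN = 1) yields the testing-phase value which, averaged with a perfect training score of 1.0, reproduces the roughly 0.956 average quoted above:

```python
import math

def phi_coefficient(tp, tn, fp, fn):
    """Phi / Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# RFC testing counts from Fig. 9: TP=26, TN=19, FP=1, FN=1
phi_test = phi_coefficient(26, 19, 1, 1)
print(round(phi_test, 3))                    # -> 0.913
print(round((1.0 + phi_test) / 2, 3))        # -> 0.956 (training + testing average)
```

This is a sketch of the standard MCC definition, not code from the study itself.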

Finally, the confusion matrix heatmaps for the Random Forest Classifier (RFC) (Fig. 9), Support Vector Machine with Radial Basis Kernel (SVM-RBK) (Fig. 10), and Support Vector Machine with Polynomial Kernel (SVM-Poly) (Fig. 11) offer valuable insights into their classification performance on the testing data. These heatmaps are developed for the testing data, as testing is a critical aspect of evaluating model reliability. As shown in Fig. 9, the RFC model demonstrates the highest accuracy, correctly classifying 19 true negatives (TN) and 26 true positives (TP), with only 1 false positive (FP) and 1 false negative (FN). In contrast, Fig. 10 shows that the SVM_RBK model exhibits moderate accuracy, with 17 TN, 27 TP, no FN, and 3 FP, suggesting it struggles more with identifying negative cases and indicating a moderate decline in performance compared to RFC. Meanwhile, Fig. 11 reveals that the SVM_Poly model achieves 18 TN and 24 TP, but with slightly higher errors at 2 FP and 3 FN. Overall, the RFC model outperforms both SVM-RBK and SVM-Poly in achieving balanced and accurate classification on the testing data, making it the most reliable choice for this study, while SVM-Poly proves the least effective due to its higher misclassification rate for negative instances.

Fig. 9.

Fig. 9

Confusion matrix heatmap RFC.

Fig. 10.

Fig. 10

Confusion matrix heatmap SVM_RBK.

Fig. 11.

Fig. 11

Confusion matrix heatmap SVM_Poly.
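The metrics discussed in this section follow directly from these confusion-matrix counts. As a sketch, the testing counts read from Figs. 9, 10 and 11 above can be converted into accuracy, sensitivity, specificity, and precision with a few lines of Python (the counts are those reported in the figures; the metric definitions are the standard ones):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy":    (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "precision":   tp / (tp + fp),
    }

# Testing-set counts as reported in Figs. 9-11
counts = {
    "RFC":      dict(tp=26, tn=19, fp=1, fn=1),
    "SVM_RBK":  dict(tp=27, tn=17, fp=3, fn=0),
    "SVM_Poly": dict(tp=24, tn=18, fp=2, fn=3),
}
for name, c in counts.items():
    metrics = {k: round(v, 3) for k, v in classification_metrics(**c).items()}
    print(name, metrics)
```

Running this reproduces the pattern described above: RFC is balanced on both classes, SVM_RBK catches every positive case but misclassifies more negatives, and SVM_Poly trails on overall accuracy.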

Furthermore, the analysis of confusion matrix heatmaps for the Random Forest Classifier (RFC), Support Vector Machine with Radial Basis Kernel (SVM-RBK), and Support Vector Machine with Polynomial Kernel (SVM-Poly) contributes significantly to the research on developing a data-driven framework to model earthquake-induced liquefaction potential of granular terrain using machine learning classification models. In the context of geohazard risk assessment, especially for earthquake-induced liquefaction, model reliability and classification accuracy are paramount since misclassification can lead to either underestimating or overestimating the hazard, both of which carry serious consequences for infrastructure safety and disaster mitigation planning40–42. The detailed performance comparison among the three models, using confusion matrix heatmaps, provides a clear indication of how each model handles classification tasks related to high-risk versus low-risk terrain categories. The Random Forest Classifier’s ability to achieve both high true positive and true negative rates with minimal false predictions makes it especially valuable for identifying areas that are genuinely susceptible to liquefaction while minimizing the chances of false alarms or missed threats. This balance is critical when evaluating liquefaction potential, as it directly affects decisions on land-use planning, foundation design, and emergency response strategies. On the other hand, the SVM-RBK’s moderate performance and higher false positive rate suggest a tendency to overpredict risk, which, while conservative, might lead to unnecessary mitigation costs or restrictions. The SVM-Poly model’s relatively lower performance, with higher false negatives and false positives, presents a greater risk of unreliable classification, potentially overlooking vulnerable areas or misallocating resources.
These insights highlight the impact of kernel choice and model tuning on prediction quality, underscoring the need for careful model selection and validation in geotechnical hazard modeling. In general, this classification analysis enhances the research by demonstrating how various ML classifiers perform in critical classification tasks under geotechnical uncertainty. It supports the selection of robust, high-performing models like RFC for dependable liquefaction hazard prediction, contributing to a more accurate and actionable risk analysis framework. Ultimately, this leads to more informed engineering decisions, optimized resource allocation, and enhanced resilience against earthquake-induced geohazards.

Comparison of models

When comparing the results of the present research models—SVM with polynomial and RBF kernels (SVM_Poly and SVM_RBK) and Random Forest Regressor (RFR)—with those discussed in the literature review above, several key observations emerge in terms of both alignment and improvement35,38. The literature commonly highlights the effectiveness of SVM models, particularly with RBF kernels, in handling complex, nonlinear datasets with high accuracy, precision, and sensitivity, often reaching performance benchmarks above 90% in clinical or biological prediction tasks. In this study, the SVM_RBK model aligns well with those findings, achieving an average accuracy of 96%, a sensitivity of around 94%, and notably low error metrics such as a mean absolute error of approximately 0.065 and RMSE of 0.194 across training and testing. This supports the literature’s assertion that the RBF kernel offers superior generalization performance due to its ability to model non-linear relationships more effectively. In contrast, the SVM_Poly model performed relatively worse in both phases, with lower average accuracy and higher prediction errors. While polynomial kernels have been referenced in the literature for their strength in capturing interaction effects in structured data, their performance tends to be highly sensitive to the choice of polynomial degree and parameter tuning. The results in the current study reflect this limitation, showing that while SVM_Poly can deliver solid classification results (especially specificity and precision during testing), it falls short in overall predictive accuracy and error reduction, which echoes findings from past studies that caution against polynomial kernels unless well-optimized. The Random Forest Regressor model outperformed both SVM variants in several training metrics, even achieving perfect classification results during training. 
However, its slight performance drop during testing reveals some susceptibility to overfitting, a limitation that has also been identified in prior literature. Despite this, RFR remained competitive, especially in classification accuracy and the phi correlation coefficient. Literature reviews have often praised Random Forest models for their robustness, interpretability, and strong performance in both classification and regression, particularly in biomedical applications. The results here reinforce that reputation, especially when considering RFR’s ability to maintain high classification metrics alongside solid regression results. Overall, the present research confirms and builds upon the findings in existing literature. It reinforces the notion that SVM with RBF kernel and Random Forests are highly reliable for both classification and prediction tasks, while also illustrating that model performance must be carefully validated on testing data to ensure generalizability. In doing so, it contributes a practical comparison that validates existing claims while offering empirical evidence of model strengths and weaknesses across multiple performance dimensions.
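The sensitivity of polynomial kernels to degree and parameter selection noted above is exactly what the grid-search tuning described in the conclusions is meant to address. The sketch below illustrates that workflow with 5-fold cross-validation inside scikit-learn's GridSearchCV; the parameter ranges and the synthetic stand-in data are illustrative assumptions, since the paper does not publish the exact grids searched or the raw dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 234-row, 12-feature liquefaction dataset.
X, y = make_classification(n_samples=234, n_features=12, random_state=0)

# Illustrative grids: the paper tunes C and degree d (poly kernel) and C and
# gamma (RBF kernel), but does not list the exact ranges searched.
param_grids = {
    "SVM_Poly": {"svc__kernel": ["poly"], "svc__C": [1, 10, 100], "svc__degree": [2, 3, 4]},
    "SVM_RBK":  {"svc__kernel": ["rbf"],  "svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1, 1.0]},
}

best = {}
for name, grid in param_grids.items():
    pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy").fit(X, y)
    best[name] = (search.best_params_, round(search.best_score_, 3))
    print(name, best[name])
```

Scaling inside the pipeline keeps the cross-validation honest, since the scaler is refit on each training fold rather than on the full dataset.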

The comparison of the present research models with those discussed in the literature review demonstrates consistency with and, in some cases, improvements over previously reported findings. The superior performance of the SVM_RBK model in terms of accuracy, sensitivity, and precision aligns well with the conclusions of studies such as those by Juang et al.2, where SVMs were successfully applied for liquefaction potential evaluation and outperformed traditional empirical methods. The SVM_RBK’s generalization capabilities are also supported by findings in5, where kernel-based models were highlighted for their adaptability in geotechnical problems. The relatively lower performance of the SVM_Poly model in this study compared to SVM_RBK is consistent with observations in6, where the sensitivity of polynomial kernels to parameter selection was noted as a constraint in model stability. This further emphasizes the advantage of using RBF kernels in scenarios involving complex, nonlinear terrain behaviors induced by seismic events. The Random Forest Regressor’s exceptional training performance and slight overfitting during testing resonate with findings in8, where ensemble learning methods like Random Forests were recognized for their robustness and predictive power in geotechnical engineering, yet cautioned against their tendency to overfit without proper validation techniques. Moreover, the general reliability of Random Forest models in modeling soil behavior under dynamic loads is supported by10, which emphasized their suitability for sustainable seismic design due to their ability to manage nonlinearities and variable interactions effectively. By incorporating machine learning methods demonstrated in studies such as3,5,6,8, and10, this research validates the use of AI-based models for enhancing the sustainable design of earthquake-induced liquefaction-prone terrains.
The observed model performances further affirm the role of advanced algorithms in achieving higher prediction reliability, which is essential for sustainable civil and geotechnical engineering applications.

Feature importance analysis

To understand the contribution of individual input features to a model’s classification, the SHAP (SHapley Additive exPlanations) method is utilized. In this study, SHAP is applied to two top-performing models: Random Forest Classifier (RFC) and Support Vector Machine with Radial Basis Kernel (SVM-RBK). A summary plot offers a global view of the SHAP value results, showing each input variable’s impact on the model’s output. The SHAP summary plots for the RFC and SVM-RBK models are depicted in Figs. 12 and 13, respectively.

Fig. 12.

Fig. 12

SHAP plot of RFC.

Fig. 13.

Fig. 13

SHAP plot of SVM_RBK.

In these figures, the horizontal axis represents SHAP values, while the vertical axis lists input variables, ordered from top to bottom by their mean absolute SHAP values, with the highest at the top. The color gradient reflects the magnitude of SHAP values for each variable, with red indicating higher values and blue denoting lower values. Positive SHAP values signify an increasing effect on the model output, while negative values indicate a decreasing effect. Analysis of Figs. 12 and 13 reveals shared patterns between the RFC and SVM-RBK models. Among the 12 input parameters, the top six—X8, X9, X4, X1, X12, and X6—are consistent across both models. However, the RFC identifies X3 as the least influential, whereas the SVM-RBK ranks X7 as the least influential. All 12 parameters contribute to the output prediction, and none are excluded from the analysis.
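SHAP analysis of this kind requires the shap package; as a lighter-weight cross-check of feature rankings, scikit-learn's permutation importance can serve a similar purpose. The sketch below is illustrative only: it uses synthetic stand-in data (the study's dataset is available on request) deliberately constructed so that features X8 and X9 carry the strongest signal, mirroring the ranking reported above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in: 234 samples, 12 features X1..X12; the label is driven
# mainly by X8 (index 7) and, more weakly, X9 (index 8) by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(234, 12))
y = (X[:, 7] + 0.5 * X[:, 8] + rng.normal(scale=0.3, size=234) > 0).astype(int)

rfc = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(rfc, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

# Features whose permutation causes the largest accuracy drop rank first.
print([f"X{i + 1}" for i in ranking[:3]])
```

Unlike SHAP, permutation importance gives only a global ranking (no per-sample attributions or sign of effect), but agreement between the two methods strengthens confidence in the identified dominant parameters.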

Conclusions

The classification modelling of the liquefaction potential of granular soil environments has been conducted using two training algorithms of the support vector machine (SVM), the polynomial kernel (Poly) and the radial basis function kernel (RBK), alongside a random forest classifier (RFC). 234 data entries were collected from a liquefaction-prone granular material environment and modelled through a machine learning classification process. For developing the SVM_Poly and SVM_RBK models, 5-fold cross-validation (CV) within GridSearchCV was used to systematically tune hyperparameters. At the end of the machine learning protocol, the following conclusions were made:

  • The calculated values of the confusion matrices and statistical indicators for training and testing show that an accuracy of 0.89 indicates the model is correct in its predictions 89% of the time.

  • A sensitivity of 0.85 signifies the model correctly identifies 85% of actual positive instances, while a specificity of 0.94 implies correct identification of 94% of actual negative instances.

  • A precision of 0.94 suggests that when the model predicts a positive instance, it is correct 94% of the time.

  • The Phi Correlation Coefficient, with a value of 0.82, indicates a strong positive correlation between predicted and actual values.

  • The model exhibits a Mean Absolute Error (MAE) of 0.2351, reflecting a relatively low average error in predictions, while the Root Mean Squared Error (RMSE) value of 0.3115 likewise indicates a low prediction error for the target variable.

  • From the feature importance analysis, the most influential input parameters for predicting the output are X8, X9, X4, X1, X12, and X6.

  • Generally, all the developed models exhibit promising performance across various evaluation metrics, with low error measures (MAE and RMSE), high accuracy, and strong performance in correctly identifying both positive and negative instances, as evidenced by sensitivity and specificity. The high precision and Phi Correlation Coefficient further affirm the reliability and accuracy of the models' predictions. However, among the three models, the RFC model is the best for classifying liquefaction potential for design purposes.

Practical application of research

The practical application of this research lies in its ability to support sustainable geotechnical engineering practices by providing reliable and accurate predictive models for assessing earthquake-induced liquefaction in granular terrains. The integration of machine learning algorithms such as SVM with RBF kernel and Random Forest Regressor enables engineers and planners to identify high-risk zones with greater precision, thereby enhancing the safety and resilience of infrastructure in seismically active areas. These models can be implemented in early-warning systems, geotechnical site assessments, and land-use planning strategies to mitigate the risks associated with soil liquefaction. Furthermore, their adoption reduces reliance on traditional empirical methods, which are often time-consuming and less adaptable to complex nonlinear interactions present in real-world soil behavior. By incorporating these predictive tools into design workflows, professionals can make informed decisions that contribute to the development of earthquake-resilient communities, promote resource efficiency, and align with sustainability objectives in construction and urban development.

Recommendation for future research

Future research should focus on expanding the dataset used for model training and testing to include a broader range of geological and seismic conditions, enhancing the generalizability and robustness of the predictive models. Incorporating additional input parameters such as pore water pressure, shear wave velocity, and soil mineralogy could improve model accuracy and provide deeper insights into the liquefaction potential of diverse granular terrains. Moreover, integrating hybrid machine learning models that combine the strengths of multiple algorithms may yield superior predictive performance. Further investigation into the applicability of deep learning approaches, particularly in handling large and complex datasets, is also recommended. Future studies should also explore the implementation of these models in real-time monitoring systems, facilitating proactive risk management during seismic events. Finally, collaboration with field engineers and urban planners can help translate research findings into practical tools and guidelines that support sustainable and resilient infrastructure development.

Author contributions

K.C.O. & V.K. conceptualized the research project, K.C.O., V.K., T.G. & K.P.A. wrote the main manuscript text and K.C.O. & T.G. prepared the figures. All authors reviewed the manuscript.

Funding

The authors received no specific funding for this research project.

Data availability

The data supporting the results of this research work will be made available upon reasonable request from the corresponding author.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Kennedy C. Onyelowe, Email: kennedychibuzor@kiu.ac.ug, Email: konyelowe@mouau.edu.ng

Viroon Kamchoom, Email: viroon.ka@kmitl.ac.th.

References

  • 1.Keefer, D. K. Landslides caused by earthquakes. Geol. Soc. Am. Bull.95, 406–421 (1984). [Google Scholar]
  • 2.Seed, H. B. & Idriss, I. M. Analysis of soil liquefaction: Niigata earthquake. J. Soil. Mech. Found. Div.93, 83–108 (1967). [Google Scholar]
  • 3.Samui, P. & Sitharam, T. G. Machine learning modelling for predicting soil liquefaction susceptibility. Hazards Earth Syst. Sci.11, 1–9. 10.5194/nhess-11-1-2011 (2011). [Google Scholar]
  • 4.Ghani, S., Sapkota, S. C., Singh, R. K., Bardhan, A. & Asteris, P. G. Modelling and validation of liquefaction potential index of fine-grained soils using ensemble learning paradigms. Soil. Dyn. Earthq. Eng.177, 108399. 10.1016/j.soildyn.2023.108399 (2024). [Google Scholar]
  • 5.Alioua, S., Arab, A., Benbouras, M. A. & Leghouchi, A. Modeling static liquefaction susceptibility of saturated clayey sand using advanced machine-learning techniques. Transp. Infrastruct. Geotechnol 1–29. (2024).
  • 6.Hanandeh, S. M., Al-Bodour, W. A. & Hajij, M. M. Study of soil liquefaction assessment using machine learning models. Geotech. Geol. Eng.40, 4721–4734. 10.1007/s10706-022-02180-z (2022). [Google Scholar]
  • 7.Zhao, Z. et al. Probabilistic capacity energy-based machine learning models for soil liquefaction reliability analysis. Eng. Geol.338, 107613 (2024). [Google Scholar]
  • 8.Torres, E. & Dungca, J. Prediction of soil liquefaction triggering using rule-based interpretable machine learning. Geosciences 14. 10.3390/geosciences14060156 (2024).
  • 9.Cong, Y., Motohashi, T., Nakao, K. & Inazumi, S. Machine learning predictive analysis of liquefaction resistance for sandy soils enhanced by chemical injection. Mach. Learn. Knowl. Extr.6, 402–419. 10.3390/make6010020 (2024). [Google Scholar]
  • 10.Khatoon, S., Kumar, K., Samui, P., Sadik, L. & Shukla, S. K. Machine learning approach for evaluating soil liquefaction probability based on reliability method. Nat. Hazards. 10.1007/s11069-024-06934-1 (2024). [Google Scholar]
  • 11.Vapnik, V. N. Statistical Learning Theory (Wiley, 1998).
  • 12.Cao, L. J. & Tay Francis, E. H. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans. Neural Netw.14, 1506–1518 (2003). [DOI] [PubMed] [Google Scholar]
  • 13.Vapnik, V. N. The Nature of Statistical Learning Theory (Springer, 1995).
  • 14.Misra, D., Oommen, T., Agarwal, A., Mishra, S. K. & Thompson, A. M. Application and analysis of support vector machine based simulation for runoff and sediment yield. Biosyst Eng.103, 527–535 (2009). [Google Scholar]
  • 15.Yoon, H., Jun, S. C., Hyun, Y., Bae, G. O. & Lee, K. K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol.396, 128–138 (2011). [Google Scholar]
  • 16.Wang, H. & Hu, D. Comparison of SVM and LS–SVM for regression. In Proceedings of the international conference on neural networks and brain proceedings (ICNNB ’05) 279–283 (2005).
  • 17.Breiman, L. Random forests - Random Features. Technical Report 567, Statistics Department, University of California, Berkeley (1999). ftp.stat.berkeley.edu/pub/users/breiman.
  • 18.Breiman, L. Bagging predictors. Mach. Learn.24(2), 123–140 (1996). [Google Scholar]
  • 19.Quinlan, J. R. Learning with continuous classes. in Proceedings of Australian Joint Conference on Artificial Intelligence 343–348 (World Scientific Press, Singapore, 1992).
  • 20.Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees (Wadsworth, Monterey, CA, 1984).
  • 21.Pal, M. & Mather, P. M. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ.86, 554–565 (2003). [Google Scholar]
  • 22.Feller, W. An Introduction to Probability Theory and Its Application 3rd edn, Vol. 1 (Wiley, New York, 1968).
  • 23.Onyelowe, K. C. et al. Numerical model of debris flow susceptibility using slope stability failure machine learning prediction with metaheuristic techniques trained with different algorithms. Sci. Rep.14, 19562. 10.1038/s41598-024-70634-w (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Bhowmik, B., Panda, S., Hazra, B. & Pakrashi, V. Feedback-driven error-corrected single-sensor analytics for real-time condition monitoring. Int. J. Mech. Sci. 214. 10.1016/j.ijmecsci.2021.106898 (2022).
  • 25.Krishnan, M., Bhowmik, B., Hazra, B. & Pakrashi, V. Real time damage detection using recursive principal components and time varying auto-regressive modeling. Mech. Syst. Signal Process.101, 549–574. 10.1016/j.ymssp.2017.08.037 (2018). [Google Scholar]
  • 26.Raja, M. N. A. & Shukla, S. K. Predicting the settlement of geosynthetic-reinforced soil foundations using evolutionary artificial intelligence technique. Geotext. Geomembr.49(5), 1280–1293. 10.1016/j.geotexmem.2021.04.007 (2021)
  • 27.Raja, M. N. A., Abdoun, T. & El-Sekelly, W. Smart prediction of liquefaction-induced lateral spreading. J. Rock Mech. Geotech. Eng.16(Issue 6), 2310–2325. 10.1016/j.jrmge.2023.05.017 (2024). [Google Scholar]
  • 28.Raja, M. N. A., Abdoun, T. & El-Sekelly, W. Exploring the potential of machine learning in stochastic reliability modelling for reinforced soil foundations. Buildings14, 954. 10.3390/buildings14040954 (2024). [Google Scholar]
  • 29.Jaffar, S. T. A., Chen, X., Bao, X., Raja, M. N. A. & Abdoun, T. Data-driven intelligent modeling of unconfined compressive strength of heavy metal-contaminated soil. J. Rock Mech. Geotech. Eng.17(Issue 3), 1801–1815. 10.1016/j.jrmge.2024.05.025 (2025). [Google Scholar]
  • 30.Khan, M. U. A., Shukla, S. K. & Raja, M. N. A. Load-settlement response of a footing over buried conduit in a sloping terrain: a numerical experiment-based artificial intelligent approach. Soft Comput.26, 6839–6856. 10.1007/s00500-021-06628-x (2022). [Google Scholar]
  • 31.Chen, M., Park, Y., Mangalathu, S. & Jeon, J. S. Effect of data drift on the performance of machine‐learning models: Seismic damage prediction for aging bridges. Earthq. Eng. Struct. Dyn.53(15). 10.1002/eqe.4230 (2024).
  • 32.Huang, F. et al. Slope stability prediction based on a long short-term memory neural network: Comparisons with convolutional neural networks, support vector machines and random forest models. Int. J. Coal Sci. Technol.10, 18. 10.1007/s40789-023-00579-4 (2023). [Google Scholar]
  • 33.Huang, F. et al. Uncertainties of landslide susceptibility prediction: influences of different study area scales and mapping unit scales. Int. J. Coal Sci. Technol.11, 26. 10.1007/s40789-024-00678-w (2024). [Google Scholar]
  • 34.Yan, F. et al. Experimental study on strain localization and slow deformation evolution in small-scale specimens. Int. J. Coal Sci. Technol.12, 30. 10.1007/s40789-025-00771-8 (2025). [Google Scholar]
  • 35.Li, P. et al. Contemporary stress state in the Zhao–Ping metallogenic belt, Eastern china, and its correlation to regional geological tectonics. Int. J. Coal Sci. Technol.12, 29. 10.1007/s40789-025-00769-2 (2025). [Google Scholar]
  • 36.Tao, K. et al. Experimental study on the slip evolution of planar fractures subjected to cyclic normal stress. Int. J. Coal Sci. Technol.10, 67. 10.1007/s40789-023-00654-w (2023). [Google Scholar]
  • 37.Paredes, C. R. L. et al. Evaluating the impact of industrial wastes on the compressive strength of concrete using closed-form machine learning algorithms. Front. Build. Environ.10, 1453451 (2024). [Google Scholar]
  • 38.Onyelowe, K. C., Gnananandarao, T., Jagan, J., Ahmad, J., Ebid, A. M. & Onyia. Innovative predictive model for flexural strength, Fck of recycled aggregate concrete from multiple datasets. Asian J. Civil Eng.24, 1143–1152 (2023). [Google Scholar]
  • 39.Onyelowe, K. C., Gnananandarao, T., Mahdi, H. A., Ghadikolaee, M. R. & Al-Ajamee, M. Evaluating the compressive strength of recycled aggregate concrete using novel artificial neural networks. Civil Eng. J.8(8), (2022).
  • 40.Yin, J. et al. Integrating image processing and deep learning for effective analysis and classification of dust pollution in mining processes. Int. J. Coal Sci. Technol.10, 84. 10.1007/s40789-023-00653-x (2023). [Google Scholar]
  • 41.Li, J. et al. The propagation mechanism of elastoplastic hydraulic fracture in deep reservoir. Int. J. Coal Sci. Technol.12, 21. 10.1007/s40789-025-00761-w (2025). [Google Scholar]
  • 42.Zhang, X. et al. Theoretical analysis of hydrogen solubility in direct coal liquefaction solvents. Int. J. Coal Sci. Technol.11, 28. 10.1007/s40789-024-00674-0 (2024). [Google Scholar]


