Abstract
Background
Early and accurate diagnosis of metabolic dysfunction–associated steatotic liver disease (MASLD) is crucial for implementing effective treatment and management strategies, as the disease can progress to more severe conditions such as cirrhosis and liver cancer. This study aimed to evaluate the performance of machine learning (ML) methods in detecting MASLD and provide a more effective and efficient diagnostic approach.
Methods
Data were collected from outpatient participants undergoing annual health checks at the Pudong District Health Care Service Centers in Shanghai, China. The discovery and independent validation cohorts included 8949 and 5973 participants, respectively. Initially, 47 variables were analyzed, and an ML-driven feature selection method identified 20 variables for MASLD prediction. To enhance clinical utility, a simplified panel of 7 variables was further derived: body mass index, albumin, alanine transaminase, glucose, high-density lipoprotein, triglyceride, and creatinine. Four ML models—k-nearest neighbors (KNN), support vector machines (SVM), logistic regression (LR), and artificial neural networks (ANN)—were trained using both the 7-variable and full-variable datasets.
Results
In the independent test set, the 7-variable models generally matched or outperformed the full-variable models. The AUC values for KNN, SVM, and ANN using the 7-variable set were 0.833, 0.753, and 0.848, respectively, higher than those of the full-variable models (KNN, 0.683; SVM, 0.705; ANN, 0.847). The robustness of the 7-variable panel was further supported by its generalizability across the independent cohort.
Conclusions
This study establishes a streamlined ML-driven diagnostic framework for MASLD, leveraging routinely measured clinical variables to achieve high accuracy. The simplified 7-variable model enhances early detection and risk stratification while reducing diagnostic complexity and cost. By enabling proactive interventions through actionable biomarkers (e.g., glucose and lipid profiles), this approach holds significant potential for large-scale population screening and prevention of MASLD-related complications. The findings underscore the transformative role of ML in optimizing chronic liver disease management and advancing precision hepatology.
Supplementary Information
The online version contains supplementary material available at 10.1007/s11695-025-08096-w.
Keywords: Metabolic dysfunction–associated steatotic liver disease, Advanced machine learning models, Prediction, Older Chinese population
Introduction
Metabolic dysfunction–associated steatotic liver disease (MASLD) is a growing global health concern, affecting an estimated 25% of the population worldwide [1]. Early and accurate diagnosis of MASLD is crucial for implementing effective treatment and management strategies, as the disease can progress to more severe conditions such as cirrhosis and liver cancer.
Traditionally, diagnosis of MASLD has relied on imaging techniques such as ultrasound, liver function tests, and biopsy [2]. However, these methods can be subjective, prone to inter-observer variability, and time-consuming [3]. Liver function tests alone cannot confirm the diagnosis of MASLD because they are an indirect measure that can be influenced by other factors such as medication use; the results should be interpreted by a healthcare professional in the context of the individual’s medical history and additional clinical information. Ultrasound may not provide enough detail to diagnose MASLD accurately, particularly in the early stages of the disease. It is also limited in its ability to differentiate fatty liver from other liver diseases, which can lead to misdiagnosis or overdiagnosis of MASLD, and its accuracy can be affected by factors such as the patient’s weight and the presence of bowel gas. In addition, biopsy is an invasive procedure with associated risks (e.g., infection) and is not available or accessible in all locations, which limits its widespread use for diagnosing MASLD.
In recent years, machine learning (ML) methods have been proposed as a promising alternative for predicting MASLD [4]. ML algorithms can analyze large amounts of data and identify complex patterns that might not be obvious to human observers. This can lead to more accurate predictions of MASLD and improved risk stratification and personalized treatment approaches. Several types of ML algorithms exist, including supervised, unsupervised, semi-supervised, and reinforcement learning [5]. Supervised learning algorithms, e.g., k-nearest neighbors (KNN), support vector machines (SVM), logistic regression (LR), and artificial neural networks (ANN), are the most common ones, and they can be used for both classification and regression [6].
In the context of MASLD, early detection and accurate diagnosis are crucial for effective management. This is where ML can play a significant role. By leveraging advanced computational techniques, ML algorithms can analyze large and complex datasets to make predictions about MASLD. In this study, we aimed to evaluate the performance of these algorithms in detecting MASLD and assess their potential as a diagnostic tool. The results of this research will contribute to a better understanding of the utility of ML in MASLD diagnosis and pave the way for more effective and efficient diagnostic methods.
Materials and Methods
Data Source
The data for this study were gathered from outpatients who participated in annual health checks at the Pudong District Health Care Service Centers in the Zhangjiang area of Shanghai, China, from 2014 to 2019. The inclusion criteria were individuals over 60 years of age who lived in Shanghai and could complete measurements. Participants with mental disorders, malignant tumors, or incomplete medical records were excluded. Anthropometric and clinical traits were recorded for a total of 8949 older Chinese subjects with complete data (3560 males and 5389 females), who were enrolled in 2014–2018. This discovery set was first used to select the variables based on cross-validation in different machine learning models. The independent test set used to evaluate the prediction models comprised 5973 electronic medical records from the Health Care Service Centers from 2019, with 2680 males and 3293 females (see Supplementary Table 1). The diagnosis of MASLD was made using a Philips IU22 Color ultrasound system. The study adhered to the Helsinki Declaration and was approved by the Shanghai University of Traditional Chinese Medicine Ethics Committee. The standard protocol was developed by the Shanghai Innovation Center of Traditional Chinese Medicine Health Service, and all subjects gave their consent.
Questionnaire, Anthropometry, and Physical Examinations
Information on age, gender, alcohol consumption, smoking, medical history, and traditional Chinese medicine constitution was collected by questionnaire. Body mass index (BMI) was calculated as weight (kg) divided by height squared (m²). Trained professionals measured the waistline and hipline using a non-stretch tape. Systolic and diastolic blood pressure (SBP and DBP) and heart rate (HR) were measured with electronic sphygmomanometers (Biospace, Cheonan, South Korea). Waist-to-hip ratio (WHR) was calculated as waist circumference divided by hip circumference. Blood samples were collected from the antecubital vein in the morning after an overnight fast. Fasting glucose, alanine transaminase (ALT), aspartate transaminase (AST), total cholesterol (TC), low-density lipoprotein (LDL), high-density lipoprotein (HDL), triglyceride (TG), hemoglobin, hemameba, erythrocyte, urea, uric acid, total bilirubin, creatinine, and alpha-fetoprotein (AFP) were measured using a biochemistry analyzer (Hitachi, Tokyo, Japan). The tumor marker carcinoembryonic antigen (CEA) was quantitatively determined by an electro-chemiluminescence immunoassay (ECLIA).
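The two derived anthropometric indices defined above can be written out as a minimal sketch. The function names and the example participant are hypothetical and are not taken from the study's code or data.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) divided by height squared (m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """WHR = waist circumference divided by hip circumference."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return waist_cm / hip_cm


# Hypothetical participant, for illustration only.
example_bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
example_whr = waist_to_hip_ratio(waist_cm=90.0, hip_cm=100.0)
```

Both indices are unit-sensitive: height must be in metres for BMI, whereas WHR only requires that waist and hip share the same unit.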
Variable Selection
To improve the discrimination of the model and make it more practical for clinical use with fewer unnecessary variables, a selection of variables was performed to identify a panel of biomarkers with the most discriminatory power for the outcome. Initial analysis using the Spearman correlation showed that several variables were highly correlated with each other and formed small groups, as illustrated in Fig. 1A and 1B. The correlation coefficient values can be found in Supplementary Table 2. This suggests that a simplified model using a small representative group of variables would still have enough discriminatory power.
Fig. 1.
Variable selection results. A Spearman correlation coefficients between all the variables over vectors of all the samples. Detailed correlation coefficient values can be found in Supplementary Table S1. B Variable-way hierarchical clustering results using distance metrics based on Spearman correlation coefficients
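As an illustration of the correlation analysis behind Fig. 1, the following is a minimal pure-Python sketch of the Spearman coefficient (Pearson correlation of average ranks). The conversion to a distance as 1 − |ρ| is an assumption made here for illustration; the text does not state which distance was derived from the coefficients for the hierarchical clustering.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_distance(x, y):
    """Assumed distance for clustering: 1 - |rho| (not specified in the text)."""
    return 1.0 - abs(spearman(x, y))
```

Under this assumed distance, two variables that are perfectly monotonically related (in either direction) are at distance 0 and would fall into the same cluster.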
We used a variable selection method that had previously been successful in protein selection [7, 8]. The method consisted of a two-step, combined greedy feature-selection algorithm that eliminates redundant and irrelevant variables: backward elimination and forward selection [7, 8].
Backward elimination started from all the variables and removed one variable per iteration. In each iteration, the variable to remove was chosen based on the fivefold and tenfold cross-validation (CV) area under the receiver operating characteristic curve (ROC-AUC) of the machine learning model, specifically k-nearest neighbor (KNN) [9] and logistic regression (LR) [10], and the loop ended when the ROC-AUC value stopped improving. The variables remaining after this step were the candidates for the next stage, forward selection. Forward selection started from an empty set and added one candidate variable per iteration based on the ROC-AUC value in KNN and LR until the ROC-AUC value stopped improving. The set obtained at that point was the final set of variables.
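The two-step greedy procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: `score_fn` is a hypothetical callable standing in for the cross-validated ROC-AUC of KNN or LR on a given variable subset, and `toy_score` with its weights is invented purely so that the sketch runs.

```python
from typing import Callable, List, Sequence

ScoreFn = Callable[[Sequence[str]], float]


def backward_elimination(features: Sequence[str], score_fn: ScoreFn) -> List[str]:
    """Drop one feature per iteration while the score keeps improving."""
    current = list(features)
    best = score_fn(current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        trials = [(score_fn([f for f in current if f != drop]), drop)
                  for drop in current]
        top_score, drop = max(trials)
        if top_score <= best:
            break  # no removal improves the score
        best = top_score
        current.remove(drop)
    return current


def forward_selection(candidates: Sequence[str], score_fn: ScoreFn) -> List[str]:
    """Start from an empty set and add one feature per iteration."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trials = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, add = max(trials)
        if top_score <= best:
            break  # no addition improves the score
        best = top_score
        selected.append(add)
        remaining.remove(add)
    return selected


# Toy stand-in for a cross-validated ROC-AUC (hypothetical weights).
USEFUL = {"bmi": 0.20, "tg": 0.10, "alt": 0.05}


def toy_score(subset: Sequence[str]) -> float:
    gain = sum(USEFUL.get(f, 0.0) for f in subset)
    return 0.5 + gain - 0.01 * len(subset)  # small penalty per variable


kept = backward_elimination(["bmi", "tg", "alt", "noise_a", "noise_b"], toy_score)
final = forward_selection(kept, toy_score)
```

In practice `score_fn` would retrain and cross-validate the model for each candidate subset, so the cost grows roughly quadratically with the number of variables.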
In this research, data could be missing because an observed variable was zero at the time of data collection or because the information was omitted during data collection. Consequently, we filled missing values with zeros to reflect this situation to a certain extent. To check that this padding method did not introduce excessive bias or lead to very large changes in the results, we performed two further tests with the 7-variable model. First, we removed all samples with null values (reducing the dataset from 14,922 to 8080 samples) and used the 2019 data as the validation set; the results were close to the original results obtained without removing the null values (Supplementary Table 3). Second, we randomly shuffled the data and held out a quarter as the validation set; the results differed little from the original cross-validation results (Supplementary Table 3).
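The two ways of handling missing values compared above (zero filling versus removing incomplete samples) can be sketched as below. The records and helper names are hypothetical; missing measurements are represented as `None`.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def fill_missing_with_zero(records: List[Record]) -> List[Dict[str, float]]:
    """Replace every missing value with 0.0, keeping all samples."""
    return [{k: (0.0 if v is None else v) for k, v in r.items()}
            for r in records]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Keep only samples in which every variable was observed."""
    return [r for r in records if all(v is not None for v in r.values())]


# Hypothetical records, for illustration only.
records: List[Record] = [
    {"bmi": 26.4, "tg": 1.7, "alt": 24.0},
    {"bmi": 22.9, "tg": None, "alt": 19.0},
    {"bmi": 25.1, "tg": 1.2, "alt": None},
]

zero_filled = fill_missing_with_zero(records)   # 3 samples retained
complete_cases = drop_incomplete(records)       # 1 sample retained
```

Zero filling keeps the sample size but pulls the mean of an affected variable toward zero, which is why a complete-case sensitivity analysis is a useful check.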
Prediction Methods
With the panel of selected variables, we evaluated the performance of four ML methods: KNN [9], support vector machine (SVM) [11, 12], LR [10], and artificial neural network (ANN) [13, 14]. Every model was implemented in Python with the sklearn or tensorflow package. The LR classifier passed a linear combination of variables through a sigmoid function and used the “liblinear” solver with an “L1” penalty. The SVM identified classes by creating a decision boundary in a higher-dimensional feature space; a radial basis function kernel was selected after comparing linear, polynomial, and radial basis function kernels. The KNN classifier used majority voting for prediction, with the hyperparameter k set to 600 after evaluating values from 100 to 1000. For the ANN classifier, we used a tensorflow keras model with three fully connected layers (20, 10, and 1 nodes); the output layer used the sigmoid activation function, binary cross-entropy defined the model loss, and the Adam optimizer was used during training [13, 14].
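A minimal sketch of how the three sklearn classifiers might be configured with the stated hyperparameters is given below. Only the solver, penalty, kernel, and k come from the text; `probability=True` is an assumption added here so that the SVM yields continuous scores, and all other settings are library defaults. The ANN is omitted because the hidden-layer activations are not stated.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    # L1-penalized logistic regression with the liblinear solver.
    "LR": LogisticRegression(solver="liblinear", penalty="l1"),
    # RBF-kernel SVM; probability=True is an assumption, not from the text.
    "SVM": SVC(kernel="rbf", probability=True),
    # KNN with k = 600; default uniform weights give majority voting.
    "KNN": KNeighborsClassifier(n_neighbors=600),
}
```

With k = 600, the KNN classifier needs at least 600 training samples before it can predict, which the discovery set (n = 8949) easily satisfies.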
Model Evaluation
In the discovery set, we used fivefold CV to verify the performance of different models. In each fivefold CV, the data were randomly divided into five folds and five loops were carried out, with four folds used for training the model and one for testing in each loop. The fivefold CV was repeated 100 times to obtain stable estimates, and the random shuffling of the data was recorded for each repetition.
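The repeated fivefold scheme can be sketched with the standard library alone. This is an illustrative index generator under stated assumptions (a fixed seed so that each shuffle is reproducible, and unstratified folds); it is not the authors' code, and the function names are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_folds: int, rng: random.Random) -> List[List[int]]:
    """Shuffle the sample indices and deal them into n_folds folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def repeated_kfold(n_samples: int, n_folds: int = 5, n_repeats: int = 100,
                   seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for every loop."""
    rng = random.Random(seed)  # seeding makes every shuffle reproducible
    for repeat in range(n_repeats):
        folds = kfold_indices(n_samples, n_folds, rng)
        for test_fold in folds:
            # The other four folds together form the training set.
            train = [i for fold in folds if fold is not test_fold for i in fold]
            yield repeat, train, test_fold


splits = list(repeated_kfold(n_samples=23, n_folds=5, n_repeats=3, seed=1))
```

Each sample appears in exactly one test fold per repetition, so averaging a metric over all 5 × 100 loops reduces the dependence on any single random partition.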
The ROC curve was used to evaluate classifier performance by plotting the true positive rate against the false positive rate at different thresholds. The area under the ROC curve (AUC) was used to summarize the ROC curve and provide a single numerical representation of the classifier’s discriminatory power. Matthews correlation coefficient (MCC) is a balanced measure that takes into account true and false positives and negatives, while accuracy reflects the proportion of correct predictions among all predictions. Precision, recall, and F1 score are three important evaluation metrics that measure the classifier’s ability to correctly predict positive cases. Precision measures the proportion of true positive results among the positive predictions, recall measures the proportion of true positive results among all positive cases, and the F1 score is the harmonic mean of precision and recall. Finally, the area under the precision-recall curve (AUPR) was used to evaluate the precision-recall trade-off for positive cases [8]. In addition, because the outputs of the LR, KNN, and ANN models are continuous scores, the F1 score, MCC, accuracy, precision, and recall were calculated based on the biggest cutoff score for each model.
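The threshold-dependent metrics above follow directly from the confusion matrix. The sketch below uses the standard definitions; returning 0.0 when a denominator is zero is a convention assumed here, and the function name is hypothetical.

```python
from math import sqrt
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, fp=10, tn=30, fn=20)

# A classifier that labels every participant positive, using the class
# counts of the independent test set (4343 MASLD, 1630 controls).
all_positive = classification_metrics(tp=4343, fp=1630, tn=0, fn=0)
```

The all-positive case shows why MCC is reported alongside F1: with 72.71% prevalence, such a classifier reaches recall 1.0 and an F1 of about 0.842 while its MCC is 0, so a high F1 with a near-zero MCC suggests that a model is predicting almost everyone as positive.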
Statistical Methods
The Shapiro–Wilk test was used to check the normality of the data using IBM SPSS (26.0). Data that were not normally distributed were transformed to their natural logarithms, after which each variable in this study satisfied the normality assumption. Continuous clinical data are presented as mean ± standard deviation (SD), and categorical data as percentages. All statistical analyses and figures related to ML were produced with Python 3.8.16. The sensitivity, specificity, and positive and negative predictive values (PPV and NPV) are provided in Supplementary Tables 4 and 5.
Results
Characteristics of Subjects
Table 1 provides an overview of the demographic and clinical characteristics of the study population. The average age of participants was 67.52 years, and the average BMI was 24.61. Out of the total 14,922 participants, 8471 (56.77%) were diagnosed with MASLD. The other tables in the Supplementary material (Tables 6 and 7) provide further information on the clinical traits, such as levels of AST, TC, and LDL, as well as information on traditional Chinese medicine constitution and the presence of medical complications such as diabetes, cardiopathy, and hyperlipidemia.
Table 1.
Characteristics of the total subjects, discovery set and independent test set
| Characteristic | Total (n = 14,922) | Discovery set (n = 8949) | Independent test set (n = 5973) |
|---|---|---|---|
| Gender | | | |
| Male | 6240 (41.82) | 3560 (39.78) | 2680 (44.87) |
| Female | 8682 (58.18) | 5389 (60.22) | 3293 (55.13) |
| Age (years) | 67.52 ± 9.63 | 66.60 ± 10.55 | 68.91 ± 7.85 |
| BMI | 24.61 ± 3.46 | 24.51 ± 3.54 | 24.76 ± 3.33 |
| Current smoking | | | |
| Yes | 55 (0.37) | 39 (0.44) | 16 (0.27) |
| No | 14,867 (99.63) | 8910 (99.56) | 5957 (99.73) |
| Alcohol consumption | | | |
| Yes | 85 (0.57) | 67 (0.75) | 18 (0.30) |
| No | 14,837 (99.43) | 8882 (99.25) | 5955 (99.70) |
| MASLD | | | |
| Yes | 8471 (56.77) | 4128 (46.13) | 4343 (72.71) |
| No | 6451 (43.23) | 4821 (53.87) | 1630 (27.29) |
Values are N (%) or mean ± SD. BMI body mass index, MASLD metabolic dysfunction–associated steatotic liver disease
Table 6.
Evaluation of different models in the independent test set
| ML model | AUC | MCC | Accuracy | Precision | Recall | AUPR | F1 score |
|---|---|---|---|---|---|---|---|
| KNN1 | 0.683 | 0.047 | 0.728 | 0.728 | 1.000 | 0.833 | 0.842 |
| KNN2 | 0.833 | 0.462 | 0.790 | 0.848 | 0.867 | 0.921 | 0.857 |
| SVM1 | 0.705 | 0.493 | 0.815 | 0.825 | 0.947 | 0.905 | 0.882 |
| SVM2 | 0.753 | 0.513 | 0.810 | 0.862 | 0.879 | 0.915 | 0.870 |
| LR1 | 0.851 | 0.441 | 0.800 | 0.806 | 0.954 | 0.928 | 0.874 |
| LR2 | 0.849 | 0.494 | 0.810 | 0.842 | 0.909 | 0.927 | 0.874 |
| ANN1 | 0.847 | 0.423 | 0.795 | 0.799 | 0.959 | 0.927 | 0.871 |
| ANN2 | 0.848 | 0.490 | 0.809 | 0.841 | 0.909 | 0.926 | 0.874 |
ANN artificial neural network, AUC area under curve, AUPR area under precision-recall curve, MCC Matthews correlation coefficient, KNN k-nearest neighbor, LR logistic regression, ML machine learning, SVM support vector machine. 1 and 2 indicate all-variable and 7-variable set, respectively
No significant difference in age was observed between the MASLD and control groups in the discovery and independent test sets (Table 2). Weight, SBP, DBP, waistline, hipline, and WHR in the MASLD group were significantly higher than in the control groups of the discovery and independent test sets (p < 0.001 for all). Left and right eye vision and HR in the MASLD group were lower than those in the control groups of the discovery set and the independent test set (p < 0.001 for all).
Table 2.
Characteristics of the discovery set and independent test set
| Characteristic | Discovery set (n = 8949): MASLD (n = 4128) | Discovery set: Control (n = 4821) | Discovery set: p-value | Independent test set (n = 5973): MASLD (n = 4343) | Independent test set: Control (n = 1630) | Independent test set: p-value |
|---|---|---|---|---|---|---|
| Age | 66.64 ± 9.86 | 66.56 ± 11.11 | 0.185 | 68.96 ± 7.41 | 68.80 ± 8.91 | 0.833 |
| Weight (kg) | 68.34 ± 10.71 | 59.02 ± 11.45 | < 0.001 | 66.43 ± 9.78 | 55.51 ± 8.94 | < 0.001 |
| Left eye | 0.49 ± 0.35 | 0.57 ± 0.32 | < 0.001 | 0.48 ± 0.36 | 0.58 ± 0.37 | < 0.001 |
| Right eye | 0.49 ± 0.36 | 0.58 ± 0.33 | < 0.001 | 0.48 ± 0.28 | 0.58 ± 0.39 | < 0.001 |
| HR | 57.99 ± 30.83 | 68.30 ± 19.90 | < 0.001 | 58.00 ± 30.78 | 68.98 ± 18.99 | < 0.001 |
| SBP (mmHg) | 142.63 ± 20.02 | 137.57 ± 20.89 | < 0.001 | 140.63 ± 19.18 | 135.83 ± 20.41 | < 0.001 |
| DBP (mmHg) | 81.19 ± 9.86 | 79.76 ± 9.86 | < 0.001 | 78.71 ± 7.82 | 76.45 ± 7.79 | < 0.001 |
| Waistline (cm) | 88.19 ± 8.54 | 79.08 ± 8.48 | < 0.001 | 84.34 ± 7.76 | 74.85 ± 7.68 | < 0.001 |
| Hipline (cm) | 98.31 ± 6.62 | 93.51 ± 24.00 | < 0.001 | 93.67 ± 5.99 | 88.14 ± 5.24 | < 0.001 |
| WHR | 0.90 ± 0.06 | 0.85 ± 0.07 | < 0.001 | 0.90 ± 0.05 | 0.85 ± 0.06 | < 0.001 |
| Current smoking | | | | | | |
| Yes | 7 | 32 | | 12 | 4 | |
| No | 4121 | 4789 | | 4331 | 1626 | |
| Alcohol consumption | | | | | | |
| Yes | 11 | 56 | | 15 | 3 | |
| No | 4117 | 4765 | | 4328 | 1627 | |
DBP diastolic blood pressure, HDL high-density lipoprotein, HR heart rate, MASLD metabolic dysfunction–associated steatotic liver disease, SBP systolic blood pressure, WHR waist-to-hip ratio
Variable Selection
Table 2 and Supplementary Tables 4 and 5 present the 47 candidate variables, encompassing anthropometric factors, clinical variables, traditional Chinese medicine constitution, and some complications. A variable set was chosen by eliminating irrelevant and redundant features through a combination of backward elimination and forward selection based on the ROC-AUC value of two ML models, KNN and LR. The selection process continued until the ROC-AUC value could no longer be improved.
The discovery set was analyzed using LR and KNN with fivefold and tenfold CV to find the variable set that best predicted MASLD. LR with fivefold and tenfold CV selected 19 and 20 variables, respectively, with accuracy values of 0.7610 and 0.7605. KNN with fivefold and tenfold CV selected 10 and 12 variables, respectively, with accuracy values of 0.7533 and 0.7483. For KNN, fivefold CV gave the higher AUC and AUPR (0.8329 and 0.8054, respectively); for LR, fivefold and tenfold CV gave nearly identical values (AUC, 0.8585 vs. 0.8590; AUPR, 0.8266 vs. 0.8270) (Table 3).
Table 3.
Selecting variables by LR and KNN
| | fivefold LR | fivefold KNN | tenfold LR | tenfold KNN |
|---|---|---|---|---|
| Selected variables | BMI | BMI | BMI | BMI |
| | TG | CEA | TG | AFP |
| | Platelet | Erythrocyte | Platelet | TG |
| | Waistline | TG | Waistline | Albumin |
| | Gender | Gender | Gender | Hemameba |
| | UA | Waistline | Heart rate | Waistline |
| | Glucose | Height | HDL | Creatinine |
| | HDL | ALT | Glucose | Glucose |
| | Total cholesterol | Creatinine | Total cholesterol | Height |
| | Examine age | Albumin | UA | Gender |
| | Heart rate | | Examine age | AST |
| | Creatinine | | Creatinine | Weight |
| | ALT | | ALT | |
| | Hemoglobin | | Hemoglobin | |
| | AST | | Diabetes | |
| | Diabetes | | AST | |
| | Ureophil | | Ureophil | |
| | Weight | | Weight | |
| | WHR | | WHR | |
| | | | Albumin | |
| Accuracy | 0.7610 | 0.7533 | 0.7605 | 0.7483 |
| AUC | 0.8585 | 0.8329 | 0.8590 | 0.8310 |
| AUPR | 0.8266 | 0.8054 | 0.8270 | 0.8023 |
AFP alpha fetoprotein, ALT alanine aminotransferase, AST aspartate aminotransferase, AUC area under curve, AUPR area under precision-recall curve, BMI body mass index, CEA carcinoembryonic antigen, HDL high-density lipoprotein, LDL low-density lipoprotein, KNN k-nearest neighbor, LR logistic regression, TG triglyceride, UA uric acid, WHR waist-to-hip ratio
The combination of accuracy and optimal AUC led to the selection of 20 variables listed in Supplementary Tables 4 and 5. These variables included BMI, TG, platelet, waistline, gender, UA, glucose, HDL, TC, examine age, HR, creatinine, ALT, hemoglobin, AST, diabetes, ureophil, weight, WHR, and albumin. After a close examination and considering prior clinical experience, these variables were further narrowed down to 7 variables that are practical for clinical implementation. These seven variables are BMI, albumin, ALT, glucose, HDL, TG, and creatinine (Table 4). A strong correlation was observed between these seven variables (Fig. 2).
Table 4.
Selected seven variables of the discovery set and independent test set
| Characteristic | Discovery set (n = 8949): MASLD (n = 4128) | Discovery set: Control (n = 4821) | Discovery set: p-value | Independent test set (n = 5973): MASLD (n = 4343) | Independent test set: Control (n = 1630) | Independent test set: p-value |
|---|---|---|---|---|---|---|
| BMI (kg/m²) | 26.40 ± 3.30 | 22.90 ± 2.88 | < 0.001 | 25.76 ± 2.99 | 22.07 ± 2.62 | < 0.001 |
| Albumin (g/L) | 41.02 ± 12.47 | 39.26 ± 14.70 | 0.007 | 44.54 ± 2.32 | 44.17 ± 2.43 | < 0.001 |
| ALT (U/L) | 24.88 ± 18.90 | 19.72 ± 18.18 | < 0.001 | 23.71 ± 19.66 | 18.90 ± 9.60 | < 0.001 |
| Glucose (mmol/L) | 5.97 ± 2.46 | 5.25 ± 2.33 | < 0.001 | 6.18 ± 1.73 | 5.66 ± 1.46 | < 0.001 |
| HDL (mmol/L) | 1.21 ± 0.48 | 1.34 ± 0.61 | < 0.001 | 1.21 ± 0.27 | 1.40 ± 0.32 | < 0.001 |
| TG (mmol/L) | 0.72 ± 1.17 | 0.36 ± 0.81 | < 0.001 | 1.71 ± 1.19 | 1.16 ± 0.71 | < 0.001 |
| Creatinine | 66.02 ± 27.67 | 66.23 ± 38.40 | 0.025 | 73.18 ± 20.37 | 71.29 ± 21.45 | < 0.001 |
ALT alanine aminotransferase, BMI body mass index, HDL high-density lipoprotein, TG triglyceride
Fig. 2.

Correlation between 7-variable set. ALT, alanine aminotransferase; BMI, body mass index; HDL, high-density lipoprotein; TG, triglyceride
Discrimination of Different Models
Eight predictive models were developed: KNN, SVM, LR, and ANN models, each trained on the 7-variable and all-variable sets of the discovery set. The discriminatory ability of each model is illustrated using violin plots in Fig. 3 and Fig. 4, which compare the prediction score distributions of MASLD-positive (left violin) and MASLD-negative (right violin) participants.
Fig. 3.
Discriminative power comparison between different prediction models in the discovery set. A1 and A2 indicate the predicted score distribution in the all-variable set and 7-variable set using KNN, respectively; B1 and B2 indicate the predicted score distribution in the all-variable set and 7-variable set using SVM, respectively; C1 and C2 indicate the predicted score distribution in the all-variable set and 7-variable set using LR, respectively; D1 and D2 indicate the predicted score distribution in the all-variable set and 7-variable set using ANN, respectively. ANN, artificial neural network; KNN, k-nearest neighbor; LR, logistic regression; SVM, support vector machine
Fig. 4.
Discriminative power comparison between different prediction models in the independent set. A1 and A2 indicate the predicted score distribution in the all-variable set and 7-variable set using KNN, respectively; B1 and B2 indicate the predicted score distribution in the all-variable set and 7-variable set using SVM, respectively; C1 and C2 indicate the predicted score distribution in the all-variable set and 7-variable set using LR, respectively; D1 and D2 indicate the predicted score distribution in the all-variable set and 7-variable set using ANN, respectively. ANN, artificial neural network; KNN, k-nearest neighbor; LR, logistic regression; SVM, support vector machine
Among the models in the discovery set, the LR model using the all-variable set showed the highest AUC (0.860), and the KNN, LR, and ANN models had higher AUCs than the SVM model with both variable sets (Table 5 and Fig. 5). The 7-variable set showed a slight decrease in AUC compared to the all-variable set in some models, namely SVM, LR, and ANN, which could be due to overfitting induced by data redundancy in the discovery set.
Table 5.
Evaluation of different models with fivefold cross-validation in the discovery set
| ML model | AUC | MCC | Accuracy | Precision | Recall | AUPR | F1 score |
|---|---|---|---|---|---|---|---|
| KNN1 | 0.810 | 0.459 | 0.719 | 0.655 | 0.828 | 0.768 | 0.731 |
| KNN2 | 0.826 | 0.496 | 0.745 | 0.695 | 0.796 | 0.789 | 0.742 |
| SVM1 | 0.802 | 0.551 | 0.776 | 0.745 | 0.781 | 0.819 | 0.763 |
| SVM2 | 0.778 | 0.529 | 0.765 | 0.733 | 0.769 | 0.809 | 0.751 |
| LR1 | 0.860 | 0.547 | 0.766 | 0.702 | 0.856 | 0.828 | 0.771 |
| LR2 | 0.843 | 0.520 | 0.751 | 0.685 | 0.852 | 0.799 | 0.759 |
| ANN1 | 0.859 | 0.544 | 0.764 | 0.701 | 0.853 | 0.827 | 0.769 |
| ANN2 | 0.846 | 0.526 | 0.754 | 0.689 | 0.851 | 0.800 | 0.761 |
ANN artificial neural network, AUC area under curve, AUPR area under precision-recall curve, MCC Matthews correlation coefficient, KNN k-nearest neighbor, LR logistic regression, ML machine learning, SVM support vector machine. 1 and 2 indicate all-variable and 7-variable set, respectively
Fig. 5.
Receiver operating characteristic curves of different prediction models based on the all-variable and 7-variable sets. A1 and A2 indicate the ROC curves in the all-variable set and 7-variable set of KNN, SVM, ANN, and LR in the discovery set, respectively; B1 and B2 indicate the ROC curves in the all-variable set and 7-variable set of KNN, SVM, ANN, and LR in the independent test set, respectively. ANN, artificial neural network; AUC, area under curve; KNN, k-nearest neighbor; LR, logistic regression; ROC, receiver operating characteristic; SVM, support vector machine
However, in the independent test set, the AUCs of the 7-variable set using the KNN, SVM, and ANN models were higher than those of the all-variable set (KNN: 0.833 vs. 0.683; SVM: 0.753 vs. 0.705; ANN: 0.848 vs. 0.847), although the difference for ANN was marginal (Table 6 and Fig. 5). These results indicate that the 7-variable set maintained its performance in the independent test set, supporting the robustness of the selected 7-variable set. Additionally, the 7-variable set showed higher MCC (KNN: 0.462 vs. 0.047; SVM: 0.513 vs. 0.493; LR: 0.494 vs. 0.441; ANN: 0.490 vs. 0.423) and precision (KNN: 0.848 vs. 0.728; SVM: 0.862 vs. 0.825; LR: 0.842 vs. 0.806; ANN: 0.841 vs. 0.799) than the all-variable set in all four ML models.
Discussion
ML models were applied to predict the occurrence of MASLD based on a large sample in a cross-sectional study of subjects who attended a health examination at the Pudong District Health Care Service Centers in the Zhangjiang area of Shanghai, China. We used ML variable selection methods to screen for risk factors for MASLD. Of the 47 extracted variables, seven were selected based on the fivefold and tenfold CV from ML models in the discovery set and our prior knowledge: BMI, albumin, ALT, glucose, HDL, TG, and creatinine. Notably, the AUCs, MCC, and precision of the 7-variable set using the KNN, SVM, and ANN models were higher than those of the all-variable set in the independent test set, supporting the robustness of the selected 7-variable set.
In the older population, both BMI and waistline are considered essential indicators for predicting the risk of developing MASLD [15]. However, BMI is the more common indicator because it is a simpler, quicker, and less invasive method of measuring body composition than waistline measurement. In addition, BMI provides an overall assessment of body fat levels and distribution, which is crucial in determining the risk of developing MASLD and other related health conditions [16]. Nevertheless, waistline measurement is still considered a valuable tool for assessing the amount of abdominal fat. It has been reported to be a better predictor of frailty risk, given its relationship with metabolic disorders, in community-dwelling older adults in Beijing [17]. Therefore, we believe that whether BMI or waistline should be used as a variable to correlate with MASLD or other metabolic disorders depends on the specific conditions of a particular population.
Albumin was one of the seven predictors and demonstrated high prediction power in this study. Albumin plays a role in the immune system by transporting antibodies and hormones and binding to toxins and waste products [18]. This helps to remove those harmful substances from the body and maintain a healthy immune response. Regarding MASLD, research has shown a correlation between decreased albumin levels and inflammation aggravation, which can worsen liver disease progression [19]. This can also indicate advanced stages of MASLD and the presence of liver fibrosis, which can increase the risk of liver failure and other complications. Hence, determining the amount of albumin can aid in diagnosing and keeping track of MASLD and evaluating the advancement and response to therapy. The relationship between albumin and TG in MASLD was not well established. However, low albumin levels have generally been associated with higher levels of TG and other markers of insulin resistance, which were known risk factors for MASLD [20].
It is worth noting that the accumulation of TG, common in obese and lean individuals with MASLD, can cause inflammation and oxidative stress, leading to liver damage [21]. The TG, glucose, and BMI combination in the TG-glucose-BMI index was considered better for predicting MASLD than using any of these variables alone, as they provide a more comprehensive assessment of metabolic health and the risk of developing the liver disease [22, 23]. This index considers lipid metabolism, glucose metabolism, and body fat distribution critical factors in MASLD’s development and progression. In our study, besides including the three variables mentioned earlier, adding more indicators, i.e., albumin, ALT, HDL, and creatinine, that can reflect metabolic and inflammatory characteristics can improve the prediction of MASLD and monitor changes in metabolic health over time.
ALT is generally considered a more reliable marker for MASLD than AST and is often included in studies of MASLD as a predictor or diagnostic tool [24]. ALT and AST are both liver enzymes commonly used as liver injury markers. However, ALT is considered a more specific indicator of liver damage than AST, as ALT is primarily found in the liver and elevated levels indicate liver disease [25], which is consistent with our study. In contrast, AST is found in other organs as well, and elevated AST levels can also indicate diseases or conditions in other organs, such as the skeletal muscle [26]. It is important to note that the choice of which liver enzyme to include in a study depends on the specific goals of the study, the type of population being studied, and the availability of data.
Cholesterol is not always included as a predictor of MASLD because it is not a direct marker of liver inflammation or damage [27]. Although elevated cholesterol levels are associated with an increased risk of cardiovascular disease and other health problems, cholesterol is not a specific marker for MASLD [28]. Rather, it is a general marker of metabolic health that is influenced by factors such as diet and genetics [29, 30], which makes it less specific to MASLD. In addition, cholesterol subtypes such as HDL and LDL can have different health effects. HDL is often included as a predictor or diagnostic marker for MASLD because it has been shown to have a protective effect against the development of liver disease [30, 31]. HDL was chosen as a predictor variable in the ML models of this study on the basis of the statistical analyses and its ability to predict the outcome of interest.
Notably, creatinine was also included as a predictor of MASLD in this study. Creatinine is a waste product of muscle metabolism that is typically excreted by the kidneys [32]. Its levels vary with several factors, including age, muscle mass, and kidney function [33]. In the older population, creatinine levels can increase due to decreased muscle mass and a decline in kidney function [34]. The relationship between creatinine levels and muscle metabolism is complex, but creatinine levels are believed to reflect muscle mass and its metabolic activity. High creatinine levels can indicate decreased kidney function and muscle wasting, which can be relevant in the context of MASLD [35]. Patients with NASH have been reported to have significantly higher serum creatinine than those with other chronic liver diseases [36, 37]. Creatinine can therefore be an essential factor to consider when evaluating individuals with suspected or confirmed MASLD, as it provides additional information about overall health status and the potential presence of comorbid conditions.
When predicting MASLD with ML, a smaller set of variables, such as the seven selected in this study (BMI, albumin, ALT, glucose, HDL, TG, and creatinine), has several advantages. First, fewer variables make the prediction model more manageable, as they reduce model complexity and the risk of overfitting. Second, these seven variables have been found in numerous studies to be strongly associated with MASLD and to be good predictors of the disease. Third, fewer variables make the model more interpretable and transparent, so the relationships between the variables and the outcome of interest are easier to understand. In summary, although using all available variables might provide more information, the trade-off is greater complexity and a higher risk of overfitting. A carefully selected subset of variables can give a more robust and usable model for predicting MASLD [38].
Notably, Chen et al. [39] reported that the XGBoost model had the best overall ability to diagnose FLD, with the highest AUC (0.882), in Taiwanese subjects. Su et al. [40] found that a two-class neural network achieved a higher AUC for predicting fatty liver (0.868) in a Korean population. In our study, the AUCs of the 7-variable KNN (0.833), SVM (0.753), and ANN (0.848) models in the independent test set were significantly higher than those of the corresponding all-variable models. Taken together, these results suggest that ML algorithms offer benefits for MASLD screening, although the studies used different algorithms. They also indicate that the same predictive model may not be suitable for other patients or ethnic groups. Additionally, although the fatty liver index (FLI) is regarded as a suitable and simple predictor of liver steatosis, Motamed et al. [41] reported that FLI was no more effective than waist circumference in predicting MASLD in an Iranian population. Su et al. [40] observed that ML methods had higher predictive ability than FLI in the Korean population. A modified FLI formula tailored to different ethnic populations is therefore necessary.
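Because the comparisons above rest on AUC values, a minimal sketch of how an AUC can be computed from predicted scores may be helpful. It uses the rank-based interpretation of the AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy data are hypothetical and are not taken from this study, which does not describe its AUC implementation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive case (e.g., MASLD), 0 for a negative case.
    scores: model outputs, where higher means more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Count positive-negative pairs that are ranked correctly;
    # a tie contributes half a correctly ranked pair.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores for illustration only.
    y_true = [1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.7, 0.4, 0.8, 0.6, 0.3]
    print(round(roc_auc(y_true, y_score), 3))
```

The pairwise loop is quadratic in the number of cases and is meant only to make the definition explicit; for cohorts of the size used here, a sorting-based implementation from an established library would be the practical choice.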
The limitations of this study include the small sample size, the restriction to data from one center, and the lack of external validation. The prediction model was based on electronic medical records, which may contain inherent biases but are readily accessible and the most practical option for predicting MASLD. In addition, although biopsy is the gold standard for MASLD diagnosis, we used color ultrasound in this study because of its low cost. Despite these limitations, the proposed combination of variable selection and ML is still noteworthy for the early prediction of MASLD. Our study is based on the elderly population in the community and is therefore informative for the prevention and treatment of MASLD in this special population. More clinical data from various centers operating under varied ethical standards are required to further validate the model’s predictions.
Conclusions
We established prediction models for MASLD by implementing ML algorithms with health examination data. Seven key MASLD predictive variables, i.e., BMI, albumin, ALT, glucose, HDL, TG, and creatinine, were selected with the ML-based variable selection approach. This model not only enhances early detection in aging populations but also offers a practical tool for primary care settings through its reliance on routine clinical parameters. The identification of modifiable biomarkers (e.g., glucose and TG) further bridges precision prevention and public health strategies, while demonstrating the potential of ML to optimize resource-limited chronic disease management.
Supplementary Information
Below is the link to the electronic supplementary material.
Author Contribution
B.L. and G.J. designed research; N.W., S.W., X.S., X.X., W.Z., S.Y., H.S., H.Y., J.W., and L.Z. conducted research; N.W., M.F., and H.Z. analyzed and interpreted data; N.W. and H.Z. wrote the paper; G.J. and B.L. reviewed the manuscript critically. All authors have read and agreed to the published version of the manuscript.
Funding
This study was funded by the National Natural Science Foundation of China (82405047); the Shanghai 2023 “Science and Technology Innovation Action Plan” Qi Ming Xing Cultivation (Yang Fan Project, 23YF1447700); the Shanghai Key Laboratory of Health Identification and Assessment (No. 21DZ2271000); and the Shanghai Science and Technology Innovation Action Plan Technical Standards Project, 2022 (22DZ2206000).
Data Availability
No datasets were generated or analysed during the current study.
Declarations
Consent for Publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Na Wu, Mofan Feng and Hanhua Zhao contributed equally.
Contributor Information
Na Wu, Email: wuna0308@163.com.
Guang Ji, Email: jg@shutcm.edu.cn.
Baocheng Liu, Email: baochliu_lab@163.com.
References
- 1. Younossi Z, Anstee QM, Marietti M, et al. Global burden of NAFLD and NASH: trends, predictions, risk factors and prevention. Nat Rev Gastroenterol Hepatol. 2018;15(1):11–20.
- 2. Chalasani N, Younossi Z, Lavine JE, et al. The diagnosis and management of non-alcoholic fatty liver disease: practice Guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association. Hepatology. 2012;55(6):2005–23.
- 3. Sumida Y, Nakajima A, Itoh Y. Limitations of liver biopsy and non-invasive diagnostic tests for the diagnosis of nonalcoholic fatty liver disease/nonalcoholic steatohepatitis. World J Gastroenterol. 2014;20(2):475–85.
- 4. Li Y, Wang X, Zhang J, et al. Applications of artificial intelligence (AI) in researches on non-alcoholic fatty liver disease (NAFLD): a systematic review. Rev Endocr Metab Disord. 2022;23(3):387–400.
- 5. Hastie T, Tibshirani R, Friedman JH, et al. 2016 The elements of statistical learning: data mining, inference, and prediction. Springer
- 6. Goodfellow I, Bengio Y, and Courville A. Deep learning. 2009 MIT press
- 7. Niroula A, Urolagin S, Vihinen M. PON-P2: prediction method for fast and reliable identification of harmful variants. PLoS ONE. 2015;10(2):e0117380.
- 8. Yang Y, Urolagin S, Niroula A, et al. 2018 PON-tstab: protein variant stability predictor. Importance of Training Data Quality. Int J Mol Sci. 19(4)
- 9. Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc. 2002;97(457):77–87.
- 10. Cramer JS. 2002 The origins of logistic regression
- 11. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST). 2011;2(3):1–27.
- 12. Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1):389–422.
- 13. Aziz R, Verma C, Jha M, et al. Artificial neural network classification of microarray data using new hybrid gene selection method. Int J Data Min Bioinform. 2017;17(1):42–65.
- 14. Rahideh A, and Shaheed MH. 2011 The 2nd international conference on control. Instrum Autom IEEE. 1175–80
- 15. Gu Z, Li D, He H, et al. Body mass index, waist circumference, and waist-to-height ratio for prediction of multiple metabolic risk factors in Chinese elderly population. Sci Rep. 2018;8(1):385.
- 16. Church TS, Kuk JL, Ross R, et al. Association of cardiorespiratory fitness, body mass index, and waist circumference to nonalcoholic fatty liver disease. Gastroenterology. 2006;130(7):2023–30.
- 17. Liao Q, Zheng Z, Xiu S, et al. Waist circumference is a better predictor of risk for frailty than BMI in the community-dwelling elderly in Beijing. Aging Clin Exp Res. 2018;30(11):1319–25.
- 18. Wilde B, and Katsounas A. 2019 Immune dysfunction and albumin-related immunity in liver cirrhosis. Mediat Inflamm. 2019
- 19. Kawaguchi K, Sakai Y, Terashima T, et al. 2021 Decline in serum albumin concentration is a predictor of serious events in nonalcoholic fatty liver disease. Medicine. 100(31)
- 20. Bae JC, Seo SH, Hur KY, et al. Association between serum albumin, insulin resistance, and incident diabetes in nondiabetic subjects. Endocrinol Metab (Seoul). 2013;28(1):26–32.
- 21. Sookoian S, Pirola CJ. Systematic review with meta-analysis: risk factors for non-alcoholic fatty liver disease suggest a shared altered metabolic and cardiovascular profile between lean and obese patients. Aliment Pharmacol Ther. 2017;46(2):85–95.
- 22. Beran A, Ayesh H, Mhanna M, et al. Triglyceride-Glucose index for early prediction of nonalcoholic fatty liver disease: a meta-analysis of 121,975 individuals. J Clin Med. 2022;11(9):2666.
- 23. Al Akl NS, Haoudi EN, Bensmail H, et al. The triglyceride glucose-waist-to-height ratio outperforms obesity and other triglyceride-related parameters in detecting prediabetes in normal-weight Qatari adults: a cross-sectional study. Front Public Health. 2023;11:1086771.
- 24. Wong CA, Araneta MRG, Barrett-Connor E, et al. Probable NAFLD, by ALT levels, and diabetes among Filipino-American Women. Diabetes Res Clin Pract. 2008;79(1):133–40.
- 25. Kim WR, Flamm SL, Di Bisceglie AM, et al. Serum activity of alanine aminotransferase (ALT) as an indicator of health and disease. Hepatology. 2008;47(4):1363–70.
- 26. Nathwani RA, Pais S, Reynolds TB, et al. Serum alanine aminotransferase in skeletal muscle diseases. Hepatology. 2005;41(2):380–2.
- 27. Lazo M, Clark JM. The epidemiology of nonalcoholic fatty liver disease: a global perspective. Semin Liver Dis. 2008;28(4):339–50.
- 28. Sozen E, Ozer NK. Impact of high cholesterol and endoplasmic reticulum stress on metabolic diseases: an updated mini-review. Redox Biol. 2017;12:456–61.
- 29. Ye SQ, Kwiterovich PO Jr. Influence of genetic polymorphisms on responsiveness to dietary fat and cholesterol. Am J Clin Nutr. 2000;72(5):1275s-s1284.
- 30. Wu N, Zhai X, Yuan F, et al. Genetic variation in TBC1 domain family member 1 gene associates with the risk of lean NAFLD via high-density lipoprotein. Front Genet. 2022;13:1026725.
- 31. McCullough A, Previs SF, Dasarathy J, et al. HDL flux is higher in patients with nonalcoholic fatty liver disease. Am J Physiol-Endocrinol Metab. 2019;317(5):E852–62.
- 32. Patel SS, Molnar MZ, Tayek JA, et al. Serum creatinine as a marker of muscle mass in chronic kidney disease: results of a cross-sectional study and review of literature. J Cachexia Sarcopenia Muscle. 2013;4:19–29.
- 33. Perrone RD, Madias NE, Levey AS, et al. Serum creatinine as an index of renal function: new insights into old concepts. Clin Chem. 1992;38(10):1933–53.
- 34. Tao Z, Li Y, Cheng B, et al. Influence of nonalcoholic fatty liver disease on the occurrence and severity of chronic kidney disease. J Clin Transl Hepatol. 2022;10(1):164.
- 35. Cao Y, Deng Y, Wang J, et al. The association between NAFLD and risk of chronic kidney disease: a cross-sectional study. Ther Adv Chron Dis. 2021;12:20406223211048650.
- 36. Park CW, Tsai NT, Wong LL. Implications of worse renal dysfunction and medical comorbidities in patients with NASH undergoing liver transplant evaluation: impact on MELD and more. Clin Transplant. 2011;25(6):E606–11.
- 37. Targher G, Chonchol MB, Byrne CD. CKD and nonalcoholic fatty liver disease. Am J Kidney Dis. 2014;64(4):638–52.
- 38. Uddin S, Khan A, Hossain ME, et al. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19(1):281.
- 39. Chen YY, Lin CY, Yen HH, et al. 2022 Machine-learning algorithm for predicting fatty liver disease in a Taiwanese population. J Pers Med. 12(7)
- 40. Su P-Y, Chen Y-Y, Lin C-Y, et al. Comparison of machine learning models and the fatty liver index in predicting lean fatty liver. Diagnostics. 2023;13(8):1407.
- 41. Motamed N, Sohrabi M, Ajdarkosh H, et al. Fatty liver index vs waist circumference for predicting non-alcoholic fatty liver disease. World J Gastroenterol. 2016;22(10):3023–30.