Gut and Liver. 2025 Jan 8;19(1):126–135. doi: 10.5009/gnl240367

A Novel Point-of-Care Prediction Model for Steatotic Liver Disease: Expected Role of Mass Screening in the Global Obesity Crisis

Jeayeon Park 1,#, Goh Eun Chung 2,#, Yoosoo Chang 3,4,5,#, So Eun Kim 6, Won Sohn 7, Seungho Ryu 3,4,5, Yunmi Ko 1, Youngsu Park 1, Moon Haeng Hur 1, Yun Bin Lee 1, Eun Ju Cho 1, Jeong-Hoon Lee 1, Su Jong Yu 1, Jung-Hwan Yoon 1, Yoon Jun Kim 1
PMCID: PMC11736326  PMID: 39778883

Abstract

Background/Aims

The incidence of steatotic liver disease (SLD) is increasing across all age groups as the incidence of obesity increases worldwide. The existing noninvasive prediction models for SLD require laboratory tests or imaging and perform poorly in the early diagnosis of infrequently screened populations such as young adults and individuals with healthcare disparities. We developed a machine learning-based point-of-care prediction model for SLD that is readily available to the broader population with the aim of facilitating early detection and timely intervention and ultimately reducing the burden of SLD.

Methods

We retrospectively analyzed the clinical data of 28,506 adults who had routine health check-ups in South Korea from January to December 2022. A total of 229,162 individuals were included in the external validation study. Data were analyzed and predictions were made using a logistic regression model with machine learning algorithms.

Results

A total of 20,094 individuals were categorized into SLD and non-SLD groups on the basis of the presence of fatty liver disease. We developed three prediction models: SLD model 1, which included age and body mass index (BMI); SLD model 2, which included BMI and body fat per muscle mass; and SLD model 3, which included BMI and visceral fat per muscle mass. In the derivation cohort, the area under the receiver operating characteristic curve (AUROC) was 0.817 for model 1, 0.821 for model 2, and 0.820 for model 3. In the internal validation cohort, 86.9% of individuals were correctly classified by the SLD models. The external validation study revealed an AUROC above 0.84 for all the models.

Conclusions

As our three novel SLD prediction models are cost-effective, noninvasive, and accessible, they could serve as validated clinical tools for mass screening of SLD.

Keywords: Machine learning, Fatty liver, Obesity, Bioelectrical impedance

INTRODUCTION

The global prevalence of steatotic liver disease (SLD) is rapidly increasing as the obesity crisis emerges worldwide, leading to a significant rise in liver-related complications such as liver cirrhosis and hepatocellular carcinoma.1-5 In response to these growing concerns, various noninvasive biomarkers and prediction models have been developed to diagnose SLD in its early stages and to curb its progression through timely intervention.6-8 However, these methods, requiring hospital visits for laboratory tests or imaging, may be less suitable for young adults who infrequently visit medical facilities and for individuals in healthcare-underserved areas.9,10 As the number of individuals in these screening blind spots increases, the worldwide disease burden caused by SLD is also on the rise.11,12 Therefore, there is a need for easily accessible diagnostic methods for SLD outside of hospital settings.

Among the diagnostic criteria for SLD, body mass index (BMI) is an easily measurable parameter outside clinical settings, derived solely from height and weight, and strongly correlates with total body fat.13 However, relying solely on BMI for predicting SLD has limitations, particularly in the case of lean SLD, which is prevalent among Asians and is known to be associated with sarcopenia.1,14 To enhance the precision of SLD prediction, it can be beneficial to incorporate body fat and muscle mass measurements through bioelectrical impedance analysis (BIA).15 Because BIA is noninvasive and cost-effective, its use is becoming more prevalent.16,17

Early diagnosis and appropriate medical interventions are crucial in reducing the disease burden of SLD. In this study, we aimed to develop an SLD prediction model using only noninvasive tests that can be easily conducted outside of clinical settings, utilizing machine learning algorithms to enhance the accuracy and efficiency of the prediction process. Furthermore, this approach aimed to provide early diagnosis and management opportunities to as many individuals as possible.

MATERIALS AND METHODS

1. Study population

We conducted a retrospective analysis of 28,506 adults who had routine health check-ups at Seoul National University Hospital Gangnam Healthcare Center (Seoul, South Korea) between January 1, 2022, and December 31, 2022. For external validation, 253,602 subjects who underwent comprehensive health examinations at Kangbuk Samsung Hospital between January 1, 2019, and December 31, 2019, were included. Among the 28,506 subjects, 8,412 were excluded due to factors potentially inducing other liver diseases: 666 had chronic hepatitis B, 183 had chronic hepatitis C, 493 were using medication that could induce fatty liver, 11 had autoimmune-related liver disease, 7,056 had a history of cancer diagnosis within 5 years, and 3 were suspected of having congestive hepatopathy. A total of 20,094 subjects were included in this study and randomly divided in a 7:3 ratio, with 14,066 assigned to a derivation cohort and 6,028 to a validation cohort (Fig. 1). Subjects were classified into two groups: the SLD group if hepatic steatosis was confirmed through abdominal ultrasound examination (LOGIQ 9, GE Healthcare, Milwaukee, WI, USA; or iU22, Philips Medical Systems, Bothell, WA, USA), and the non-SLD group if hepatic steatosis was absent.
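The 7:3 random division can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_cohort`, the fixed seed, and the use of integer IDs in place of subject records are all assumptions.

```python
import random

def split_cohort(subject_ids, derivation_fraction=0.7, seed=0):
    """Randomly split subject IDs into derivation and validation cohorts."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    rng.shuffle(ids)
    n_derivation = round(len(ids) * derivation_fraction)
    return ids[:n_derivation], ids[n_derivation:]

# 20,094 eligible subjects, represented here by placeholder integer IDs
derivation, validation = split_cohort(range(20094))
print(len(derivation), len(validation))
```

Rounding 70% of 20,094 to the nearest integer gives cohort sizes of 14,066 and 6,028, the same sizes reported above.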

Fig. 1.


Flow diagram of patient selection for the study. Among the 28,506 individuals screened, 8,412 were excluded due to factors potentially inducing the development of other liver diseases. Consequently, a total of 20,094 individuals were included in this study; 10,878 were assigned to the non-steatotic liver disease (SLD) group, and 9,216 were assigned to the SLD group.

This study conformed to the ethical guidelines of the World Medical Association Declaration of Helsinki and was approved by the Institutional Review Board of Seoul National University Hospital (IRB number: H-2305-141-1434). The requirement for informed consent from patients was waived because of the retrospective nature of the study.

2. Definitions and assessments

Significant alcohol consumption was defined as more than 140 g per week for women and more than 210 g per week for men. Nonalcoholic fatty liver disease (NAFLD), metabolic dysfunction-associated fatty liver disease (MAFLD), and metabolic dysfunction-associated steatotic liver disease (MASLD) subcohorts were defined according to each respective definition (Supplementary Methods).13,18,19 Low-density lipoprotein cholesterol levels were calculated using the Friedewald equation, and definitions of other factors are detailed in the Supplementary Methods.20 An InBody 970 (InBody Co., Ltd., Seoul, Korea) was used for BIA to assess body composition, including measurements of body fat mass (kg), visceral fat area (cm²), and muscle mass (kg).
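Two of these definitions are simple enough to write down directly. The sketch below assumes lipid values in mg/dL, for which the Friedewald equation reads LDL-C = total cholesterol − HDL-C − triglycerides/5. It also refuses triglyceride values of 400 mg/dL or more, a conventional restriction on the equation that this article does not state. The function names are illustrative.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald equation."""
    if triglycerides >= 400:
        # Conventional validity limit of the equation (assumption, not from the article)
        raise ValueError("Friedewald equation is unreliable at TG >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5

def significant_alcohol(grams_per_week, sex):
    """Apply the study thresholds: >140 g/week for women, >210 g/week for men."""
    threshold = 210 if sex == "male" else 140
    return grams_per_week > threshold

print(friedewald_ldl(200, 50, 150))
print(significant_alcohol(150, "female"), significant_alcohol(150, "male"))
```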

3. Statistical analysis

Categorical variables were presented as frequencies (%) and continuous variables as medians and interquartile ranges (IQRs). The Pearson chi-square test or Fisher exact test was used to compare categorical variables, and the Student t-test or the Mann-Whitney U test was used to compare continuous variables.

To enhance predictive accuracy and computational efficiency, we applied the Random Forest machine learning algorithm, known for its ensemble approach that combines multiple decision trees. The model was trained and tested on subdivided derivation cohort datasets, with adjustments made to the number of trees and their depth to optimize performance. After initial development and optimization, the model’s effectiveness was further validated on an independent cohort, evaluating its diagnostic power for SLD by analyzing areas under the receiver operating characteristic curve (AUROC), sensitivities, specificities, and likelihood ratios.
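The AUROC used to evaluate the models has a direct probabilistic reading: it is the chance that a randomly chosen subject with SLD scores higher than a randomly chosen subject without SLD, with ties counted as one half. The sketch below computes it by brute-force pairwise comparison. It illustrates the metric only, not the authors' analysis pipeline, and the scores are made up.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive subject ranked above negative subject
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model outputs for four SLD and four non-SLD subjects
sld_scores = [0.9, 0.8, 0.6, 0.4]
non_sld_scores = [0.7, 0.4, 0.3, 0.2]
print(auroc(sld_scores, non_sld_scores))
```

The pairwise form is quadratic in cohort size, so for cohorts as large as those in this study a rank-based implementation would be the practical choice.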

All statistical analyses were performed using R version 4.2.0 (R Foundation for Statistical Computing, Vienna, Austria) and Python version 3.11.0 (Python Software Foundation, Wilmington, DE, USA). p-values less than 0.05 were considered statistically significant.

RESULTS

1. Baseline characteristics

Among a total of 20,094 subjects, 10,878 were classified into the non-SLD group, and 9,216 into the SLD group (Fig. 1). The baseline characteristics of the total population, derivation cohort, and validation cohort are presented in Tables 1-3. There was no statistically significant difference in baseline characteristics between the derivation and validation cohorts (Table 1). Waist circumference, BMI, body fat mass, visceral fat area, and muscle mass were all significantly increased in the SLD group. Subjects diagnosed with diabetes mellitus (DM), hypertension, and dyslipidemia (DL) were more prevalent in the SLD group. The SLD group also had significantly more subjects drinking alcohol above the threshold (Tables 2 and 3). The baseline characteristics in the NAFLD, MAFLD, and MASLD cohorts also tended to be similar (Supplementary Tables 1-3).

Table 1.

Baseline Characteristics of the Total Population

Variable Derivation cohort (n=14,066) Validation cohort (n=6,028) p-value
Age, yr 51.0 (43.0–59.0) 51.0 (43.0–59.0) 0.21
Male sex 7,587 (53.9) 3,189 (52.9) 0.18
Waist circumference, cm 84.0 (77.3–90.0) 84.0 (77.5–89.9) 0.92
Body mass index, kg/m2 23.1 (20.8–25.3) 23.1 (20.9–25.3) 0.83
Body fat, kg 16.8 (13.4–20.7) 16.9 (13.6–20.7) 0.33
Visceral fat, cm2 80.0 (63.0–97.2) 80.1 (63.8–97.6) 0.60
Muscle, kg 26.5 (20.8–31.7) 26.3 (20.9–31.7) 0.75
Diabetes mellitus 4,460 (31.7) 1,916 (31.8) 0.93
Hypertension 2,756 (19.6) 1,187 (19.7) 0.89
Dyslipidemia 5,221 (37.1) 2,291 (38.0) 0.24
Fatty liver 6,454 (45.9) 2,762 (45.8) 0.95
Alcohol consumption (above threshold*) 2,533 (18.0) 1,117 (18.5) 0.39

Data are presented as median (interquartile range) or number (%).

*140 g/wk for females and 210 g/wk for males.

Table 2.

Baseline Characteristics of the Derivation Cohort

Variable Non-SLD group (n=7,632) SLD group (n=6,434) p-value
Age, yr 50.0 (40.0–58.0) 53.0 (46.0–60.0) <0.001
Sex <0.001
Female 4,730 (62.3) 1,790 (27.6)
Male 2,862 (37.7) 4,684 (72.4)
Waist circumference, cm 79.0 (74.0–84.5) 88.6 (84.0–94.0) <0.001
Body mass index, kg/m2 21.5 (19.7–23.4) 24.9 (23.2–26.8) <0.001
Body fat, kg 14.6 (11.9–17.8) 19.4 (16.3–23.4) <0.001
Visceral fat, cm2 69.5 (53.9–85.7) 91.5 (77.4–107.9) <0.001
Muscle, kg 22.4 (19.9–29.1) 30.0 (24.3–33.4) <0.001
Diabetes mellitus <0.001
Absent 5,973 (78.7) 3,611 (55.8)
Present 1,619 (21.3) 2,863 (44.2)
Hypertension <0.001
Absent 6,642 (87.5) 4,631 (71.5)
Present 950 (12.5) 1,843 (28.5)
Dyslipidemia <0.001
Absent 5,557 (73.2) 3,224 (49.8)
Present 2,035 (26.8) 3,250 (50.2)
Alcohol consumption <0.001
Below threshold* 6,042 (79.6) 5,491 (84.8)
Above threshold* 1,550 (20.4) 983 (15.2)

Data are presented as median (interquartile range) or number (%).

SLD, steatotic liver disease.

*140 g/wk for females and 210 g/wk for males.

Table 3.

Baseline Characteristics of the Validation Cohort

Variable Non-SLD group (n=3,246) SLD group (n=2,782) p-value
Age, yr 50.0 (40.0–59.0) 53.0 (47.0–60.0) <0.001
Sex <0.001
Female 1,974 (61.0) 811 (29.0)
Male 1,262 (39.0) 1,981 (71.0)
Waist circumference, cm 79.0 (74.0–84.5) 89.0 (84.0–94.0) <0.001
Body mass index, kg/m2 21.5 (19.7–23.5) 24.9 (23.2–26.9) <0.001
Body fat, kg 14.5 (11.7–17.7) 19.6 (16.3–23.6) <0.001
Visceral fat, cm2 69.0 (53.3–85.4) 92.4 (77.3–108.9) <0.001
Muscle, kg 22.6 (20.0–29.4) 29.9 (24.0–33.3) <0.001
Diabetes mellitus <0.001
Absent 2,555 (79.0) 1,536 (55.0)
Present 681 (21.0) 1,256 (45.0)
Hypertension <0.001
Absent 2,832 (87.5) 1,997 (71.5)
Present 414 (12.5) 795 (28.5)
Dyslipidemia <0.001
Absent 2,363 (73.0) 1,403 (50.3)
Present 873 (27.0) 1,389 (49.7)
Alcohol consumption <0.001
Below threshold* 2,596 (80.2) 2,370 (84.9)
Above threshold* 640 (19.8) 422 (15.1)

Data are presented as median (interquartile range) or number (%).

SLD, steatotic liver disease.

*140 g/wk for females and 210 g/wk for males.

2. Factors associated with SLD

In univariable logistic regression analyses, age, sex, waist circumference, BMI, body fat mass, visceral fat area, skeletal muscle mass, body fat mass per skeletal muscle mass, visceral fat area per skeletal muscle mass, history of DM, hypertension, DL, and alcohol consumption showed statistically significant differences between the non-SLD and SLD groups in the derivation cohort (Table 4). Multivariable logistic regression analyses were conducted following collinearity analyses to select variables without significant interactions. We developed three distinct models by integrating the selected variables and conducted multivariable analyses for each one. In all models, male sex, BMI, DM, and DL were consistently identified as independent predictors of SLD. In model 2, body fat per muscle mass was identified as an independent predictor, and in model 3, visceral fat area per muscle mass was also an independent predictor of SLD (Table 4).

Table 4.

Logistic Regression Analyses of Factors Associated with Steatotic Liver Disease in the Total Derivation Cohort

Variable Crude OR (95% CI) p-value Model 1 aOR (95% CI) p-value Model 2 aOR (95% CI) p-value Model 3 aOR (95% CI) p-value
Age <0.001 <0.001 0.002 0.609
≤60 yr 1 (reference) 1 (reference) 1 (reference) 1 (reference)
>60 yr 1.02 (1.02–1.03) 1.01 (1.01–1.02) 1.01 (1.01–1.02) 1.01 (1.00–1.01)
Sex <0.001 <0.001 <0.001 <0.001
Female 1 (reference) 1 (reference) 1 (reference) 1 (reference)
Male 4.32 (4.03–4.65) 1.75 (1.60–1.92) 4.62 (3.98–5.38) 3.18 (2.77–3.66)
Waist circumference 1.18 (1.17–1.19) <0.001
Body mass index 1.56 (1.54–1.59) <0.001 1.46 (1.44–1.49) <0.001 1.28 (1.25–1.31) <0.001 1.37 (1.34–1.40) <0.001
Body fat 1.22 (1.21–1.23) <0.001
Muscle 1.14 (1.13–1.14) <0.001
Body fat/muscle 4.01 (3.49–4.60) <0.001 13.32 (9.80–18.16) <0.001
Visceral fat/muscle 1.31 (1.27–1.34) <0.001 1.34 (1.27–1.42) <0.001
Diabetes mellitus <0.001 1.87 (1.70–2.60) <0.001 1.79 (1.62–1.96) <0.001 <0.001
Absent 1 (reference) 1 (reference)
Present 2.93 (2.72–3.15) 1.87 (1.70–2.06)
Hypertension <0.001 0.18 0.19 0.24
Absent 1 (reference) 1 (reference) 1 (reference) 1 (reference)
Present 2.78 (2.55–3.04) 0.93 (0.83–1.04) 0.93 (0.83–1.04) 0.94 (0.84–1.05)
Dyslipidemia <0.001 <0.001 <0.001 <0.001
Absent 1 (reference) 1 (reference) 1 (reference) 1 (reference)
Present 2.75 (2.57–2.95) 1.51 (1.38–1.65) 1.47 (1.35–1.61) 1.48 (1.35–1.62)
Alcohol consumption <0.001 0.94 0.37 0.17
Below threshold* 1 (reference) 1 (reference) 1 (reference) 1 (reference)
Above threshold* 1.43 (1.31–1.56) 1.00 (0.90–1.13) 1.05 (0.94–1.18) 1.08 (0.97–1.22)

OR, odds ratio; CI, confidence interval; aOR, adjusted OR.

*140 g/wk for females and 210 g/wk for males.
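For a single binary predictor, the crude odds ratio from univariable logistic regression equals the cross-product ratio of the corresponding 2×2 table. The sketch below applies this to the sex counts in Table 2, with a Wald-type 95% interval on the log scale. The function name and argument names are illustrative, and the interval method is an assumption about how the reported CIs were obtained.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Cross-product odds ratio with a Wald confidence interval."""
    odds_ratio = (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)
    # Standard error of log(OR) from the four cell counts
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Table 2 (derivation cohort): males and females with SLD, then without SLD
or_male, lower, upper = odds_ratio_ci(4684, 1790, 2862, 4730)
print(round(or_male, 2), round(lower, 2), round(upper, 2))
```

The result is approximately 4.32 (4.03 to 4.65), in line with the crude OR for male sex in Table 4.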

3. The SLD model 1: using age, sex, BMI, and history of DM or DL

In the derivation cohort, the probability of SLD was calculated using a logistic regression model through a machine learning algorithm, represented as

$$P(\mathrm{SLD}) = \frac{e^{z_1}}{1+e^{z_1}}, \qquad z_1 = 0.10\times\mathrm{Age} + 0.28\times\mathrm{Sex} + 1.30\times\mathrm{BMI} + 0.27\times\mathrm{DM} + 0.19\times\mathrm{DL}$$

(sex, female=0 and male=1; DM, absence of DM=0 and presence of DM=1; DL, absence of DL=0 and presence of DL=1). We took the exponent of this formula, $z_1$, and divided each coefficient by the coefficient for age (0.10) to approximate the multiplicative factors as integers. Consequently, we formulated an equation capable of predicting the presence of SLD.

SLD model 1=age+13×BMI (+3, if male; +3, if DM; +2, if DL)
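The integer weights in this score follow from dividing each logistic coefficient by the age coefficient and rounding. A small sketch using the coefficients quoted above (the dictionary keys are illustrative names):

```python
# Logistic regression coefficients reported for SLD model 1
coefficients = {"age": 0.10, "male": 0.28, "bmi": 1.30, "dm": 0.27, "dl": 0.19}

# Rescale so that one year of age is worth one point, then round
reference = coefficients["age"]
points = {name: round(beta / reference) for name, beta in coefficients.items()}
print(points)
```

Dividing every coefficient by the same positive constant preserves the ranking of subjects, so only the rounding step can alter discrimination relative to the original logistic model.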

The median value of SLD model 1 was 331.4 (IQR, 302.8 to 360.9) in the non-SLD group and 381.8 (IQR, 358.5 to 407.1) in the SLD group (p<0.001). The AUROC of SLD model 1 was 0.817 (95% confidence interval [CI], 0.811 to 0.824) (Fig. 2A).

Fig. 2.


Receiver operating characteristic curves of the prediction models for detecting steatotic liver disease (SLD). (A) The AUROC was 0.817 (95% CI, 0.811 to 0.824) in the derivation cohort and 0.822 (95% CI, 0.811 to 0.832) in the validation cohort of SLD model 1. (B) The AUROC was 0.823 (95% CI, 0.816 to 0.829) in the derivation cohort and 0.822 (95% CI, 0.811 to 0.832) in the validation cohort of SLD model 2. (C) The AUROC was 0.820 (95% CI, 0.813 to 0.826) in the derivation cohort and 0.822 (95% CI, 0.812 to 0.832) in the validation cohort of SLD model 3. AUROC, area under the receiver operating characteristic curve; CI, confidence interval.

In the validation cohort, the median value of SLD model 1 was 331.1 (IQR, 303.7 to 360.4) in the non-SLD group and 381.9 (IQR, 358.9 to 408.8) in the SLD group (p<0.001). The AUROC of SLD model 1 was 0.822 (95% CI, 0.811 to 0.832) (Fig. 2A). With a value of <330, model 1 demonstrated the ability to rule out SLD, achieving a sensitivity of 92.7% (95% CI, 91.8% to 93.7%) and a negative likelihood ratio of 0.136 (95% CI, 0.118 to 0.157) (Table 5). At a value of >400, model 1 could detect SLD with a specificity of 94.7% (95% CI, 93.9% to 95.4%) and a positive likelihood ratio of 6.117 (95% CI, 5.241 to 7.139) (Table 5). Based on these cutoff values, 2,467 subjects (86.9% of subjects with model 1 <330 or >400) were correctly classified.
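The score and its two cutoffs can be applied with a few lines of code. The sketch below implements the SLD model 1 equation and the <330 and >400 thresholds. The function names and the category labels ("rule out", "intermediate", "rule in") are illustrative choices, not terminology fixed by the study.

```python
def sld_model_1(age, bmi, male=False, dm=False, dl=False):
    """SLD model 1 = age + 13 x BMI (+3 if male; +3 if DM; +2 if DL)."""
    return age + 13 * bmi + 3 * male + 3 * dm + 2 * dl

def classify_model_1(score, low=330, high=400):
    """Map a model 1 score onto the dual-cutoff categories."""
    if score < low:
        return "rule out"
    if score > high:
        return "rule in"
    return "intermediate"

# Hypothetical subjects
lean_woman = sld_model_1(age=50, bmi=21.5)
obese_man = sld_model_1(age=60, bmi=27.0, male=True, dm=True, dl=True)
print(lean_woman, classify_model_1(lean_woman))
print(obese_man, classify_model_1(obese_man))
```

Scores between the two cutoffs are left unclassified, which is why the reported proportion of correct classifications refers only to subjects below 330 or above 400.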

Table 5.

Predictive Values of Steatotic Liver Disease in the Derivation and Validation Cohorts in the Model 1*

Cohort Low cutoff point (<330) Intermediate (330–400) High cutoff point (>400) Total No.
The derivation cohort
Total 4,179 (29.7) 7,464 (53.1) 2,423 (17.2) 14,066
Steatotic liver disease 465 (11.1) 3,956 (53.0) 2,013 (83.1) 6,434
Sensitivity, % (95% CI) 92.8 (92.1–93.4) 31.3 (30.2–32.4)
Specificity, % (95% CI) 48.7 (47.5–49.8) 94.6 (94.1–95.1)
Positive likelihood ratio, (95% CI) 1.807 (1.766–1.849) 5.824 (5.265–6.442)
Negative likelihood ratio, (95% CI) 0.149 (0.136–0.163) 0.726 (0.714–0.739)
Positive predictive value, % (95% CI) 83.1 (81.6–84.6)
Negative predictive value, % (95% CI) 88.9 (87.9–89.8)
The validation cohort
Total 1,802 (29.9) 3,146 (52.2) 1,038 (17.2) 6,028
Steatotic liver disease 202 (11.2) 1,673 (60.1) 867 (83.5) 2,782
Sensitivity, % (95% CI) 92.7 (91.8–93.7) 32.6 (30.9–34.3)
Specificity, % (95% CI) 49.3 (47.6–51.0) 94.7 (93.9–95.4)
Positive likelihood ratio, (95% CI) 1.843 (1.779–1.909) 6.117 (5.241–7.139)
Negative likelihood ratio, (95% CI) 0.136 (0.118–0.157) 0.712 (0.693–0.731)
Positive predictive value, % (95% CI) 84.0 (81.8–86.2)
Negative predictive value, % (95% CI) 88.8 (87.8–89.7)

Data are presented as number (%) unless indicated otherwise.

*Model 1: age+13×body mass index (+3, if male; +3, if diabetes mellitus; +2, if dyslipidemia).
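The accuracy measures in Table 5 can be recomputed from its count rows. The sketch below does this for the derivation cohort, treating "score ≥330" as test-positive at the low cutoff and "score >400" as test-positive at the high cutoff. The function and key names are illustrative, and the non-SLD total of 7,632 is taken from the header of Table 2.

```python
def cutoff_metrics(positive_sld, positive_total, sld_total, n_total):
    """Diagnostic accuracy of a 'test-positive' rule from cohort counts."""
    tp = positive_sld
    fp = positive_total - positive_sld
    fn = sld_total - positive_sld
    tn = (n_total - sld_total) - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Derivation cohort, Table 5: 14,066 subjects, 6,434 with SLD
# Low cutoff: 4,179 scored <330 (465 with SLD), so the rest are test-positive
low = cutoff_metrics(6434 - 465, 14066 - 4179, 6434, 14066)
# High cutoff: 2,423 scored >400 (2,013 with SLD)
high = cutoff_metrics(2013, 2423, 6434, 14066)
print({k: round(v, 3) for k, v in low.items()})
print({k: round(v, 3) for k, v in high.items()})
```

After rounding, these values agree with the derivation-cohort rows of Table 5, for example a sensitivity of about 92.8% and a negative predictive value of about 88.9% at the low cutoff, and a specificity of about 94.6% at the high cutoff.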

In the NAFLD, MAFLD, and MASLD cohorts, the AUROCs of SLD model 1 were 0.811 (95% CI, 0.797 to 0.825), 0.870 (95% CI, 0.859 to 0.882), and 0.845 (95% CI, 0.832 to 0.857), respectively (Supplementary Fig. 1A). Applying the same cutoff values, the negative predictive values for NAFLD, MAFLD, and MASLD were 88.5%, 96.1%, and 93.1%, respectively, at low cutoff values (Supplementary Table 4).

4. The SLD model 2: using sex, BMI, body fat mass per muscle mass, and history of DM or DL

In the derivation cohort, the probability of SLD using a logistic regression model through a machine learning algorithm was

$$P(\mathrm{SLD}) = \frac{e^{z_2}}{1+e^{z_2}}, \qquad z_2 = 0.80\times\mathrm{Sex} + 0.88\times\mathrm{BMI} + 0.28\times\mathrm{DM} + 0.20\times\mathrm{DL} + 0.61\times(\text{body fat mass per muscle mass})$$

(sex, female=0 and male=1; DM, absence of DM=0 and presence of DM=1; DL, absence of DL=0 and presence of DL=1). We took the exponent of this formula, $z_2$, and scaled the coefficients relative to the coefficient for DL (0.20) to approximate the multiplicative factors as integers or half-integers. This process led us to develop an equation designed to predict the presence of SLD effectively.

SLD model 2=4.5×BMI+3.5×body fat mass per muscle mass (+4, if male; +1.5, if DM; +1, if DL)

The median value of SLD model 2 was 100.8 (IQR, 91.5 to 110.9) in the non-SLD group and 118.8 (IQR, 110.5 to 128.1) in the SLD group (p<0.001). The AUROC of SLD model 2 was 0.823 (95% CI, 0.816 to 0.829) (Fig. 2B).

In the validation cohort, the median value of SLD model 2 was 100.9 (IQR, 91.9 to 110.6) in the non-SLD group and 118.5 (IQR, 110.2 to 127.8) in the SLD group (p<0.001). The AUROC of SLD model 2 was 0.822 (95% CI, 0.811 to 0.832) (Fig. 2B). At the low cutoff value of <100, model 2 demonstrated the ability to rule out SLD, achieving a sensitivity of 92.8% (95% CI, 91.8% to 93.8%) and a negative likelihood ratio of 0.152 (95% CI, 0.132 to 0.174) (Table 6). At the high cutoff value of >125, model 2 could detect SLD with a specificity of 95.1% (95% CI, 94.3% to 95.8%) and a positive likelihood ratio of 6.213 (95% CI, 5.289 to 7.299) (Table 6). Based on these cutoff values, 2,392 subjects (86.9% of subjects with model 2 <100 or >125) were correctly classified.
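SLD model 2 adds one BIA-derived ratio to the inputs. The sketch below assumes that "body fat mass per muscle mass" is body fat mass (kg) divided by muscle mass (kg), the units given in the Methods; the article does not spell out the ratio explicitly. Function names and category labels are illustrative.

```python
def sld_model_2(bmi, body_fat_kg, muscle_kg, male=False, dm=False, dl=False):
    """SLD model 2 = 4.5 x BMI + 3.5 x (body fat / muscle) (+4 male; +1.5 DM; +1 DL)."""
    fat_per_muscle = body_fat_kg / muscle_kg  # assumed kg/kg ratio from BIA
    return 4.5 * bmi + 3.5 * fat_per_muscle + 4 * male + 1.5 * dm + 1 * dl

def classify_model_2(score, low=100, high=125):
    """Dual-cutoff categories for model 2 (<100 and >125)."""
    if score < low:
        return "rule out"
    if score > high:
        return "rule in"
    return "intermediate"

# Hypothetical subjects
score_a = sld_model_2(bmi=21.5, body_fat_kg=14.6, muscle_kg=22.4)
score_b = sld_model_2(bmi=27.0, body_fat_kg=24.0, muscle_kg=30.0, male=True, dm=True)
print(score_a, classify_model_2(score_a))
print(score_b, classify_model_2(score_b))
```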

Table 6.

Predictive Values of Steatotic Liver Disease in the Derivation and Validation Cohorts in the Model 2*

Cohort Low cutoff point (<100) Intermediate (100–125) High cutoff point (>125) Total No.
The derivation cohort
Total 4,208 (29.9) 7,517 (53.4) 2,341 (16.6) 14,066
Steatotic liver disease 493 (11.7) 3,962 (52.7) 1,979 (84.5) 6,434
Sensitivity, % (95% CI) 92.8 (91.8–93.8) 30.8 (29.6–31.9)
Specificity, % (95% CI) 48.0 (46.9–49.1) 95.1 (94.3–95.8)
Positive likelihood ratio, (95% CI) 1.779 (1.739–1.820) 6.213 (5.289–7.299)
Negative likelihood ratio, (95% CI) 0.152 (0.132–0.174) 0.727 (0.715–0.739)
Positive predictive value, % (95% CI) 84.5 (83.1–86.0)
Negative predictive value, % (95% CI) 88.3 (87.3–89.3)
The validation cohort
Total 1,826 (30.3) 3,190 (52.9) 1,012 (16.8) 6,028
Steatotic liver disease 216 (11.8) 1,711 (53.6) 855 (84.5) 2,782
Sensitivity, % (95% CI) 92.2 (91.2–93.2) 30.7 (29.0–32.4)
Specificity, % (95% CI) 49.6 (47.9–51.3) 95.2 (94.4–95.9)
Positive likelihood ratio, (95% CI) 1.830 (1.766–1.897) 6.354 (5.401–7.475)
Negative likelihood ratio, (95% CI) 0.157 (0.137–0.179) 0.728 (0.709–0.747)
Positive predictive value, % (95% CI) 84.5 (82.3–86.7)
Negative predictive value, % (95% CI) 88.2 (86.7–89.7)

Data are presented as number (%) unless indicated otherwise.

*Model 2: 4.5×body mass index+3.5×body fat mass per muscle mass (+4, if male; +1.5, if diabetes mellitus; +1, if dyslipidemia).

In the NAFLD, MAFLD, and MASLD cohorts, the AUROCs of SLD model 2 were 0.820 (95% CI, 0.806 to 0.834), 0.875 (95% CI, 0.864 to 0.887), and 0.851 (95% CI, 0.838 to 0.863), respectively (Supplementary Fig. 1B). Applying the same cutoff values, the positive predictive value was consistent at 82.9% in NAFLD, MAFLD, and MASLD (Supplementary Table 5).

5. The SLD model 3: using sex, BMI, visceral fat area per muscle mass, and history of DM or DL

In the derivation cohort, the probability of SLD using a logistic regression model through a machine learning algorithm was

$$P(\mathrm{SLD}) = \frac{e^{z_3}}{1+e^{z_3}}, \qquad z_3 = 0.58\times\mathrm{Sex} + 1.07\times\mathrm{BMI} + 0.27\times\mathrm{DM} + 0.19\times\mathrm{DL} + 0.40\times(\text{visceral fat area per muscle mass})$$

(sex, female=0 and male=1; DM, absence of DM=0 and presence of DM=1; DL, absence of DL=0 and presence of DL=1). We took the exponent of this formula, $z_3$, and divided each coefficient by the coefficient for DL (0.19) to approximate the multiplicative factors as integers or half-integers. Consequently, we formulated an equation capable of predicting the presence of SLD.

SLD model 3=5.5×BMI+2.0×visceral fat area per muscle mass (+3, if male; +1.5, if DM; +1, if DL)

The median value of SLD model 3 was 126.2 (IQR, 114.6 to 137.6) in the non-SLD group and 146.8 (IQR, 136.9 to 158.4) in the SLD group (p<0.001). The AUROC of SLD model 3 was 0.820 (95% CI, 0.813 to 0.826) (Fig. 2C).

In the validation cohort, the median value of SLD model 3 was 125.8 (IQR, 115.0 to 137.5) in the non-SLD group and 147.3 (IQR, 137.0 to 158.4) in the SLD group (p<0.001). The AUROC of SLD model 3 was 0.822 (95% CI, 0.812 to 0.832) (Fig. 2C). With a value of <125, model 3 demonstrated the ability to rule out SLD, achieving a sensitivity of 92.6% (95% CI, 91.7% to 93.6%) and a negative likelihood ratio of 0.150 (95% CI, 0.133 to 0.175) (Table 7). At a value of >155, model 3 could detect SLD with a specificity of 95.0% (95% CI, 94.3% to 95.8%) and a positive likelihood ratio of 6.288 (95% CI, 5.359 to 7.378) (Table 7). Based on these cutoff values, 2,443 subjects (86.9% of subjects with model 3 <125 or >155) were correctly classified.
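Because SLD model 3 includes a body-composition term, two people with the same BMI can fall on different sides of a cutoff. The sketch below illustrates this with made-up values. It assumes that "visceral fat area per muscle mass" means visceral fat area (cm²) divided by muscle mass (kg), following the units in the Methods, and the function name is illustrative.

```python
def sld_model_3(bmi, visceral_fat_cm2, muscle_kg, male=False, dm=False, dl=False):
    """SLD model 3 = 5.5 x BMI + 2.0 x (visceral fat area / muscle) (+3 male; +1.5 DM; +1 DL)."""
    visceral_per_muscle = visceral_fat_cm2 / muscle_kg  # assumed cm2/kg ratio from BIA
    return 5.5 * bmi + 2.0 * visceral_per_muscle + 3 * male + 1.5 * dm + 1 * dl

# Two hypothetical women with the same BMI of 22 but different body composition
muscular = sld_model_3(bmi=22.0, visceral_fat_cm2=45.0, muscle_kg=30.0)
low_muscle = sld_model_3(bmi=22.0, visceral_fat_cm2=100.0, muscle_kg=20.0)
print(muscular, muscular < 125)      # below the rule-out cutoff
print(low_muscle, low_muscle < 125)  # no longer ruled out
```

In this example the first subject scores below the rule-out cutoff of 125 while the second does not, which a score based on BMI alone could not distinguish.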

Table 7.

Predictive Values of Steatotic Liver Disease in the Derivation and Validation Cohorts in the Model 3*

Cohort Low cutoff point (<125) Intermediate (125–155) High cutoff point (>155) Total No.
The derivation cohort
Total 4,056 (28.8) 7,614 (54.1) 2,396 (17.0) 14,066
Steatotic liver disease 451 (11.1) 3,966 (61.6) 2,017 (84.2) 6,434
Sensitivity, % (95% CI) 93.0 (92.4–93.6) 31.3 (30.2–32.5)
Specificity, % (95% CI) 47.3 (46.1–48.4) 95.0 (94.5–95.5)
Positive likelihood ratio, (95% CI) 1.763 (1.724–1.803) 6.293 (5.669–6.986)
Negative likelihood ratio, (95% CI) 0.148 (0.135–0.163) 0.723 (0.710–0.735)
Positive predictive value, % (95% CI) 84.1 (82.7–85.6)
Negative predictive value, % (95% CI) 88.9 (87.9–89.9)
The validation cohort
Total 1,775 (29.4) 3,218 (53.4) 1,035 (17.2) 6,028
Steatotic liver disease 205 (11.5) 1,704 (53.0) 873 (84.3) 2,782
Sensitivity, % (95% CI) 92.6 (91.7–93.6) 31.4 (29.7–33.1)
Specificity, % (95% CI) 48.4 (46.6–50.1) 95.0 (94.3–95.8)
Positive likelihood ratio, (95% CI) 1.794 (1.732–1.858) 6.288 (5.359–7.378)
Negative likelihood ratio, (95% CI) 0.150 (0.133–0.175) 0.722 (0.703–0.742)
Positive predictive value, % (95% CI) 84.3 (82.1–86.6)
Negative predictive value, % (95% CI) 88.5 (87.0–89.9)

Data are presented as number (%) unless indicated otherwise.

*Model 3: 5.5×body mass index+2.0×visceral fat area per muscle mass (+3, if male; +1.5, if diabetes mellitus; +1, if dyslipidemia).
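The "correctly classified" proportion reported for each model counts only subjects outside the intermediate zone: a subject below the low cutoff is correct if SLD is absent, and a subject above the high cutoff is correct if SLD is present. The sketch below recomputes it from the validation-cohort counts in Table 7; the function name is illustrative.

```python
def correctly_classified(low_total, low_sld, high_total, high_sld):
    """Count and fraction of correct calls among subjects outside the intermediate zone."""
    correct = (low_total - low_sld) + high_sld  # true negatives + true positives
    classified = low_total + high_total         # intermediate scores are excluded
    return correct, correct / classified

# Table 7, validation cohort: 1,775 scored <125 (205 with SLD); 1,035 scored >155 (873 with SLD)
correct, fraction = correctly_classified(1775, 205, 1035, 873)
print(correct, round(100 * fraction, 1))
```

This gives 2,443 subjects, or about 86.9% of the 2,810 subjects who fell outside the intermediate range, matching the figures quoted for model 3.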

In the NAFLD, MAFLD, and MASLD cohorts, the AUROCs of SLD model 3 were 0.819 (95% CI, 0.806 to 0.833), 0.874 (95% CI, 0.862 to 0.885), and 0.850 (95% CI, 0.838 to 0.863), respectively (Supplementary Fig. 1C). Applying the same cutoff values, the negative predictive values were higher in the order of MAFLD, MASLD, and NAFLD. The positive predictive value was consistent at 82.2% in all three cohorts (Supplementary Table 6).

6. External validation

For the external validation, using the same inclusion and exclusion criteria, 229,162 out of 253,602 subjects were included, with 149,214 classified into the non-SLD group and 79,948 into the SLD group (Supplementary Fig. 2). Similar to the Seoul National University Hospital Gangnam Healthcare Center cohort, waist circumference, BMI, body fat mass, visceral fat area, and muscle mass were also significantly higher in the SLD group (Supplementary Table 7). All models showed an AUROC above 0.840 (Supplementary Fig. 3). The predictive performance for NAFLD, MAFLD, and MASLD showed a similar tendency to that observed in internal validation (Supplementary Fig. 4, Supplementary Tables 8-10).

7. Comparison between FLI, HSI, and SLD models

When comparing SLD models with the most commonly used noninvasive predictive tools for fatty liver disease, the Fatty Liver Index (FLI) and Hepatic Steatosis Index (HSI), their predictive performance was similar. For predicting lean SLD in individuals with a BMI of less than 23, the FLI demonstrated the highest AUROC at 0.792 (95% CI, 0.781 to 0.803), followed by SLD model 2 with an AUROC of 0.744 (95% CI, 0.733 to 0.756). The remaining SLD models 1 and 3, along with HSI, showed similar values. In the external validation, the three SLD models and both FLI and HSI also demonstrated similar AUROCs in predicting SLD (Supplementary Fig. 5).

DISCUSSION

In this study, we derived a machine learning-based index for predicting SLD, utilizing simple, easily measurable factors outside the hospital setting. SLD model 1, which uses only age and BMI for prediction, along with models 2 and 3, which include BIA measurements of body fat mass, visceral fat area, and muscle mass, all demonstrated outstanding predictive performance with an AUROC >0.82 in the validation cohort. Furthermore, in models 2 and 3, which incorporate BIA measurements, the negative predictive value exceeded 92% at the low cutoff, and the positive predictive value exceeded 82% at the high cutoff. In the external validation involving 229,162 participants, all three models demonstrated an AUROC >0.84. Above all, these SLD models showed relatively good predictive performance for SLD compared to other noninvasive models, such as FLI and HSI, even without laboratory tests.

Young adults generally have a lower prevalence of most diseases, which means they rarely visit hospitals unless they have a specific family history of medical conditions. Additionally, lower socioeconomic status is associated with increased barriers to health care access, which subsequently leads to worse health outcomes and premature death.21 The SLD index developed in our study enables the prediction of SLD in both young individuals and those of lower socioeconomic status outside hospital settings. This method is also free or minimally expensive, noninvasive, and has proven accuracy. When these SLD prediction models indicate a high probability of SLD, further examinations can be conducted, and timely interventions can be implemented to reduce the disease burden and maximize socioeconomic cost efficiency.

This prediction model's high predictive performance without laboratory tests can be attributed to the use of a machine learning algorithm and the incorporation of BIA as a significant predictive factor. In this study, we selected the Random Forest algorithm over other machine learning algorithms for several reasons. First, Random Forest employs an ensemble learning technique that combines multiple decision trees, thereby enabling more accurate and stable predictions.22 Second, it utilizes the bootstrap sampling method to train on random subsets of the entire dataset. This approach not only enhances the diversity of the model but also helps prevent overfitting and improves generalization performance.22,23 Additionally, Random Forest facilitates the evaluation of the importance of each feature, enabling us to identify which variables most significantly impact predictions.21

BIA, a noninvasive and low-cost method for measuring body composition, has seen increased use for assessing factors such as body fat, visceral fat, and muscle mass, which are closely associated with SLD.24-27 We utilized BIA in our prediction models due to its accuracy and simplicity.24 This study found that the ratio of fat mass to muscle mass is more closely associated with SLD development than fat mass or muscle mass considered independently. This insight has not only enabled more accurate predictions of SLD using BIA but also improved the prediction accuracy for lean SLD patients. In this study, the BIA-based SLD models 2 and 3 demonstrated superior predictive power in cases of lean SLD compared to SLD model 1 and HSI.

This study had some limitations. First, the prediction models, created through retrospective regression analysis, may have selection bias, although we minimized this using a machine learning-based Random Forest algorithm. Second, our SLD models, derived solely from an Asian cohort, need validation in other ethnic groups. However, as obesity patterns in Asia are increasingly mirroring those in the West, it is expected that similar results would be observed in other ethnic groups.

In conclusion, we developed a machine learning-based index for predicting SLD, utilizing simple, easily measurable factors outside hospital settings. The developed SLD models are cost-effective, noninvasive, and readily accessible, offering the opportunity for early detection of SLD in a broad population. By implementing early and proactive interventions, we can significantly reduce the global burden of the disease.

SUPPLEMENTARY MATERIALS

Supplementary materials can be accessed at https://doi.org/10.5009/gnl240367.

gnl-19-1-126-supple.pdf (712.1KB, pdf)

Footnotes

CONFLICTS OF INTEREST

Y.B.L. reports receiving research grants from Yuhan Pharmaceuticals; J.H.L. reports receiving research grants from Yuhan Pharmaceuticals, and lecture fee from GreenCross Cell; S.J.Y. reports receiving research grants from Roche, Yuhan Pharmaceuticals and Daewoong Pharmaceuticals; Y.J.K. reports receiving research grants from Yuhan Pharmaceuticals, AstraZeneca, and Boston Scientific and lecture fees from Bayer HealthCare Pharmaceuticals and MSD Korea; J.H.Y. reports receiving research grants from Roche, AstraZeneca, and Hanmi Pharmaceuticals.

S.J.Y. and Y.J.K. are editorial board members of the journal but were not involved in the peer reviewer selection, evaluation, or decision process of this article. No other potential conflict of interest relevant to this article was reported.

AUTHOR CONTRIBUTIONS

Study concept and design: J.P., G.E.C., Y.C., Y.J.K. Data acquisition: J.P., G.E.C., Y.C. Data analysis and interpretation: J.P., G.E.C., Y.C. Drafting of the manuscript: J.P., G.E.C., Y.C. Critical revision of the manuscript for important intellectual content: all authors. Statistical analysis: J.P., S.R., Y.C. Administrative, technical, or material support; study supervision: Y.J.K. Approval of final manuscript: all authors.

REFERENCES

  • 1. Huang TD, Behary J, Zekry A. Non-alcoholic fatty liver disease: a review of epidemiology, risk factors, diagnosis and management. Intern Med J. 2020;50:1038–1047. doi: 10.1111/imj.14709.
  • 2. Moon AM, Singal AG, Tapper EB. Contemporary epidemiology of chronic liver disease and cirrhosis. Clin Gastroenterol Hepatol. 2020;18:2650–2666. doi: 10.1016/j.cgh.2019.07.060.
  • 3. Wong SW, Chan WK. Epidemiology of non-alcoholic fatty liver disease in Asia. Indian J Gastroenterol. 2020;39:1–8. doi: 10.1007/s12664-020-01018-x.
  • 4. Teng ML, Ng CH, Huang DQ, et al. Global incidence and prevalence of nonalcoholic fatty liver disease. Clin Mol Hepatol. 2023;29(Suppl):S32–S42. doi: 10.3350/cmh.2022.0365.
  • 5. Jeong S, Oh YH, Ahn JC, et al. Evolutionary changes in metabolic dysfunction-associated steatotic liver disease and risk of hepatocellular carcinoma: a nationwide cohort study. Clin Mol Hepatol. 2024;30:487–499. doi: 10.3350/cmh.2024.0145.
  • 6. Lee JH, Kim D, Kim HJ, et al. Hepatic steatosis index: a simple screening tool reflecting nonalcoholic fatty liver disease. Dig Liver Dis. 2010;42:503–508. doi: 10.1016/j.dld.2009.08.002.
  • 7. Bedogni G, Bellentani S, Miglioli L, et al. The Fatty Liver Index: a simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. 2006;6:33. doi: 10.1186/1471-230X-6-33.
  • 8. Shi YW, Fan JG. Surveillance of the progression and assessment of treatment endpoints for nonalcoholic steatohepatitis. Clin Mol Hepatol. 2023;29(Suppl):S228–S243. doi: 10.3350/cmh.2022.0401.
  • 9. Doycheva I, Watt KD, Alkhouri N. Nonalcoholic fatty liver disease in adolescents and young adults: the next frontier in the epidemic. Hepatology. 2017;65:2100–2109. doi: 10.1002/hep.29068.
  • 10. Park JM. Health status and health services utilization in elderly Koreans. Int J Equity Health. 2014;13:73. doi: 10.1186/s12939-014-0073-7.
  • 11. Simon TG, Roelstraete B, Hartjes K, et al. Non-alcoholic fatty liver disease in children and young adults is associated with increased long-term mortality. J Hepatol. 2021;75:1034–1041. doi: 10.1016/j.jhep.2021.06.034.
  • 12. Le MH, Le DM, Baez TC, et al. Global incidence of adverse clinical events in non-alcoholic fatty liver disease: a systematic review and meta-analysis. Clin Mol Hepatol. 2024;30:235–246. doi: 10.3350/cmh.2023.0485.
  • 13. Kanwal F, Neuschwander-Tetri BA, Loomba R, Rinella ME. Metabolic dysfunction-associated steatotic liver disease: update and impact of new nomenclature on the American Association for the Study of Liver Diseases practice guidance on nonalcoholic fatty liver disease. Hepatology. 2024;79:1212–1219. doi: 10.1097/HEP.0000000000000670.
  • 14. Chen M, Cao Y, Ji G, Zhang L. Lean nonalcoholic fatty liver disease and sarcopenia. Front Endocrinol (Lausanne). 2023;14:1217249. doi: 10.3389/fendo.2023.1217249.
  • 15. Hanna DJ, Jamieson ST, Lee CS, et al. "Bioelectrical impedance analysis in managing sarcopenic obesity in NAFLD". Obes Sci Pract. 2021;7:629–645. doi: 10.1002/osp4.509.
  • 16. Orkin S, Yodoshi T, Romantic E, et al. Body composition measured by bioelectrical impedance analysis is a viable alternative to magnetic resonance imaging in children with nonalcoholic fatty liver disease. JPEN J Parenter Enteral Nutr. 2022;46:378–384. doi: 10.1002/jpen.2113.
  • 17. Lukaski HC. Regional bioelectrical impedance analysis: applications in health and medicine. Acta Diabetol. 2003;40 Suppl 1:S196–S199. doi: 10.1007/s00592-003-0064-4.
  • 18. Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD Practice Guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023;77:1797–1835. doi: 10.1097/HEP.0000000000000323.
  • 19. Eslam M, Sanyal AJ, George J; International Consensus Panel. MAFLD: a consensus-driven proposed nomenclature for metabolic associated fatty liver disease. Gastroenterology. 2020;158:1999–2014. doi: 10.1053/j.gastro.2019.11.312.
  • 20. Friedewald WT, Levy RI, Fredrickson DS. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem. 1972;18:499–502. doi: 10.1093/clinchem/18.6.499.
  • 21. McMaughan DJ, Oloruntoba O, Smith ML. Socioeconomic status and access to healthcare: interrelated drivers for healthy aging. Front Public Health. 2020;8:231. doi: 10.3389/fpubh.2020.00231.
  • 22. Breiman L. Random forests. Mach Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
  • 23. Qi Y. Random forest for bioinformatics. In: Zhang C, Ma Y, editors. Ensemble machine learning. New York: Springer; 2012. pp. 307–323.
  • 24. Khalil SF, Mohktar MS, Ibrahim F. The theory and fundamentals of bioimpedance analysis in clinical status monitoring and diagnosis of diseases. Sensors (Basel). 2014;14:10895–10928. doi: 10.3390/s140610895.
  • 25. Kim D, Chung GE, Kwak MS, et al. Body fat distribution and risk of incident and regressed nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol. 2016;14:132–138. doi: 10.1016/j.cgh.2015.07.024.
  • 26. Cheung O, Kapoor A, Puri P, et al. The impact of fat distribution on the severity of nonalcoholic fatty liver disease and metabolic syndrome. Hepatology. 2007;46:1091–1100. doi: 10.1002/hep.21803.
  • 27. Lee YH, Kim SU, Song K, et al. Sarcopenia is associated with significant liver fibrosis independently of obesity and insulin resistance in nonalcoholic fatty liver disease: nationwide surveys (KNHANES 2008-2011). Hepatology. 2016;63:776–786. doi: 10.1002/hep.28376.


Articles from Gut and Liver are provided here courtesy of The Korean Society of Gastroenterology, the Korean Society of Gastrointestinal Endoscopy, the Korean Society of Neurogastroenterology and Motility, Korean College of Helicobacter and Upper Gastrointestinal Research, Korean Association for the Study of Intestinal Diseases, the Korean Association for the Study of the Liver, the Korean Society of Pancreatobiliary Disease, and the Korean Society of Gastrointestinal Cancer
