Diabetes Obes Metab. 2024 Dec 20;27(3):1406–1414. doi: 10.1111/dom.16142

Diagnostic accuracy of Agile‐4 score for liver cirrhosis in patients with metabolic dysfunction‐associated steatotic liver disease. A systematic review and meta‐analysis of diagnostic test accuracy studies

Konstantinos Malandris 1, Anastasia Katsoula 2, Aris Liakos 1, Thomas Karagiannis 1, Emmanouil Sinakos 3, Olga Giouleme 2, Philippos Klonizakis 1, Eleni Theocharidou 1, Eleni Gigi 1, Eleni Bekiari 1, Apostolos Tsapas 1,4
PMCID: PMC11802403  PMID: 39703127

Abstract

Aims

A novel noninvasive score, Agile‐4 score, combining liver stiffness measurements, aspartate aminotransferase/alanine aminotransferase, platelet count, diabetes status and sex has been developed for the identification of cirrhosis in patients with metabolic dysfunction‐associated steatotic liver disease (MASLD). We assessed the performance of Agile‐4 for ruling‐in/out liver cirrhosis in MASLD patients.

Materials and Methods

We searched Medline, Cochrane Library, Web of Science, Scopus and the Echosens website up to May 2024. Eligible studies assessed the accuracy of Agile‐4 at predefined thresholds for ruling‐in (≥0.565) and ruling‐out (<0.251) liver cirrhosis, using biopsy as the reference standard. We calculated pooled sensitivity and specificity estimates with 95% confidence intervals for both Agile‐4 thresholds using bivariate random‐effects models. We assessed the risk of bias using the Quality Assessment of Diagnostic Accuracy Studies‐2 tool.

Results

We included seven studies with 6037 participants. An Agile‐4 score ≥0.565 yielded a pooled specificity of 0.93 (95% CI, 0.86–0.97). Similarly, an Agile‐4 score <0.251 excluded cirrhosis with a summary sensitivity of 0.90 (0.80–0.95). Assuming a cirrhosis prevalence of 30%, the positive predictive value (PPV) for ruling‐in cirrhosis was 80%, while the negative predictive value for ruling‐out cirrhosis was 95%. Most studies were at high or unclear risk for bias due to concerns regarding patient selection and the blinding status of Agile‐4 score interpretation in relation to biopsy results.

Conclusions

Agile‐4 score performs well for ruling‐in/out liver cirrhosis in MASLD patients. Owing to the relatively low PPV, sequential application of the Agile‐4 after fibrosis‐4 index (FIB‐4) testing might further enhance its performance.

Keywords: Agile‐4 score, cirrhosis, MASLD, meta‐analysis, systematic review

1. INTRODUCTION

Metabolic dysfunction‐associated steatotic liver disease (MASLD) affects nearly 30% of the general population and 65% of people with type 2 diabetes (T2D). 1 , 2 Metabolic dysfunction‐associated steatohepatitis (MASH), the progressive form of MASLD, is characterized by liver inflammation and hepatocellular injury that can potentially lead to the development of liver fibrosis and eventually cirrhosis. Patients with bridging fibrosis (fibrosis stage = F3) and cirrhosis (F4) are at increased risk of liver‐related complications and death. 3 Therefore, early identification of such patients is of major importance.

Liver biopsy remains the reference standard for fibrosis assessment. Nevertheless, it is an invasive procedure limited by increased cost, reader variability and sampling error. 4 Among the noninvasive methods available for fibrosis assessment, the fibrosis‐4 index (FIB‐4) and liver stiffness measurement (LSM) by vibration‐controlled transient elastography (VCTE) are the most widely used in clinical practice. 5 , 6 Although both modalities are effective in excluding advanced fibrosis and cirrhosis, respectively, they lack adequate accuracy for ruling‐in higher fibrosis grades. 7 To address this issue, Sanyal et al. 8 introduced the Agile 3+ and Agile 4 scores for the diagnosis of advanced fibrosis (F ≥ F3) and cirrhosis (F4) in MASLD, respectively. Both scores combine LSM by VCTE, demographic characteristics (age, gender, T2D) and common laboratory markers, including platelets, aspartate aminotransferase (AST) and alanine aminotransferase (ALT), to produce a score ranging from 0 to 1. 8

Recently, Dalbeni et al. 9 performed a systematic review and meta‐analysis of the diagnostic performance of Agile 3+, suggesting good accuracy for ruling‐in advanced fibrosis in MASLD with a pooled specificity of 87%. Complementing Agile 3+, which targets advanced fibrosis, Agile 4 enables the noninvasive diagnosis of cirrhosis. Early identification of cirrhosis is crucial for the timely initiation of a screening program to monitor for the development of clinical decompensation, including ascites, encephalopathy, hepatocellular carcinoma and variceal haemorrhage. To provide a thorough summary of existing evidence, we performed a systematic review and meta‐analysis of diagnostic accuracy studies assessing the Agile 4 score for liver cirrhosis in patients with MASLD using liver biopsy as the reference standard.

2. MATERIALS AND METHODS

The protocol for this systematic review and meta‐analysis of diagnostic accuracy studies is registered in Prospero (registration no CRD42024557282). 10 We report our methods and results in line with the Preferred Reporting Items for a Systematic Review and Meta‐analysis of Diagnostic Test Accuracy Studies (PRISMA‐DTA) guidelines (Table S1). 11

2.1. Eligibility criteria

We included cohort or cross‐sectional studies assessing the accuracy of the Agile 4 score for the diagnosis of cirrhosis in adults with MASLD, using liver biopsy as the reference standard. Eligible studies were published in English and evaluated the accuracy of the Agile 4 score at the thresholds previously described by Sanyal et al. 8 for ruling‐in (Agile 4 ≥ 0.565) and ruling‐out (Agile 4 < 0.251) liver cirrhosis. We excluded studies without adequate information to reconstruct 2 × 2 classification tables for the thresholds of interest, as well as case–control studies, because this design may lead to biased diagnostic accuracy estimates. 12 Details on eligibility criteria are presented in Data S1.

2.2. Search strategy and study selection

We searched Medline via PubMed, Scopus, Web of Science, Cochrane Library and the Echosens website up to 18 May 2024. Our search strategy included controlled vocabulary terms and free text words (Tables S2–S6). We used the Polyglot Search Translator to facilitate conversion of search strings across databases. 13

Search results were imported into a reference manager software and duplicate records were removed. Two reviewers working independently assessed the remaining records first at title and abstract levels and then in full text. Any disagreements during the study selection process were resolved through discussion or by a senior reviewer. The study selection process was performed using Covidence web application (Covidence systematic review software, Veritas Health Innovation).

2.3. Data extraction and quality assessment

Two reviewers working independently extracted data from eligible studies using predesigned and pilot‐tested forms. We extracted information on study characteristics, participant baseline characteristics and diagnostic accuracy results in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). In case of overlapping cohorts between different publications, we extracted data from the largest cohort, provided there was adequate information to reconstruct 2 × 2 classification tables.

Two independent reviewers assessed risk of bias and applicability using the Quality Assessment of Diagnostic Accuracy Studies‐2 (QUADAS‐2) tool. 14 We took into consideration the following domains: patient selection, index test, reference standard, and flow and timing. Details on risk of bias and applicability judgements are presented in Data S1. Any disagreements during data extraction and quality assessment were resolved through discussion or by a senior reviewer.

2.4. Data synthesis and analysis

The primary outcome of interest was the accuracy of the Agile 4 score for ruling‐in liver cirrhosis (Agile 4 ≥ 0.565). The secondary outcome was the accuracy of the Agile 4 score for ruling‐out liver cirrhosis (Agile 4 < 0.251).
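The dual‐cutoff rule behind these two outcomes can be written as a short sketch. The function and constant names below are hypothetical and serve only as illustration; only the two thresholds (0.251 and 0.565) and the 0 to 1 score range come from the text.

```python
# Thresholds described by Sanyal et al. and used in this review.
RULE_OUT_CUTOFF = 0.251  # Agile 4 < 0.251: cirrhosis ruled out
RULE_IN_CUTOFF = 0.565   # Agile 4 >= 0.565: cirrhosis ruled in


def classify_agile4(score: float) -> str:
    """Map an Agile 4 score (0 to 1) to a dual-cutoff category.

    Hypothetical helper: the category labels are illustrative only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("Agile 4 score must lie between 0 and 1")
    if score < RULE_OUT_CUTOFF:
        return "rule-out"
    if score >= RULE_IN_CUTOFF:
        return "rule-in"
    # Scores from 0.251 up to (but not including) 0.565 stay unclassified.
    return "grey zone"


if __name__ == "__main__":
    for s in (0.10, 0.251, 0.40, 0.565, 0.90):
        print(f"{s:.3f} -> {classify_agile4(s)}")
```

Scores that fall between the two cutoffs are left unclassified, which is why the size of the grey zone is reported for every study in Table 1.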

For eligible studies, we created 2 × 2 classification tables for both outcomes of interest. Using TP, FP, TN and FN values, we recalculated sensitivity and specificity estimates with their 95% confidence intervals (CIs) for each study, and generated coupled forest plots for both outcomes to visually present these estimates. We produced pooled estimates of sensitivity, specificity, positive likelihood ratio (LRp) and negative likelihood ratio (LRn) for thresholds of interest following a bivariate random‐effect model. 15 , 16 We also provided individual and pooled study estimates in ROC space with 95% confidence and prediction regions.
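As a rough illustration of the per‐study step, the sketch below derives sensitivity, specificity and likelihood ratios from a single 2 × 2 table. Two points are assumptions rather than details taken from the methods: the Wilson score interval is used for the 95% CIs (the interval method is not stated in the text), and the counts in the usage example are hypothetical. The bivariate pooling itself is not reproduced here.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and likelihood ratios from one 2 x 2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Likelihood ratios are undefined at the boundaries; report infinity.
    lr_pos = sens / (1 - spec) if spec < 1 else math.inf
    lr_neg = (1 - sens) / spec if spec > 0 else math.inf
    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": spec,
        "specificity_ci": wilson_ci(tn, tn + fp),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any included study.
    result = accuracy_from_2x2(tp=20, fp=15, fn=9, tn=202)
    for name, value in result.items():
        print(name, value)
```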

We evaluated heterogeneity through visual examination of forest plots and size of prediction regions. 16 We assessed for possible sources of heterogeneity through meta‐regression analysis. Covariates of interest were: age, body mass index (BMI), prevalence of T2D (%) and prevalence of male gender (%). We applied the Cook's distance approach alongside standardized residuals to identify potentially influential studies. 17 We planned prespecified sensitivity analyses excluding influential studies identified with the Cook's distance approach, studies at unclear or high applicability concerns and studies of retrospective design due to concerns related to disease spectrum and overestimation of diagnostic accuracy results. 18

We assessed the clinical utility of Agile 4 score with Fagan nomograms for different pre‐test probabilities reflecting both low and high cirrhosis prevalence settings. Using the pooled estimates of sensitivity and specificity, we calculated positive (PPV) and negative predictive values (NPV) for the same prevalence scenarios. We also generated likelihood ratio scattergrams to present clinical utility based on summary LRs. 19 All statistical analyses were performed with STATA v.11.2 statistical software using ‘metandi’ and ‘midas’ modules.

3. RESULTS

After duplicate record removal, we screened 374 records at title and abstract levels. Following full text screening evaluation, seven studies with 6037 participants were included in the systematic review and meta‐analysis. 8 , 20 , 21 , 22 , 23 , 24 , 25 The study selection process is presented in Figure S1.

3.1. Study and participant characteristics

Table 1 presents the characteristics of included studies and participants. Most studies were multicentre, followed a retrospective design and were conducted in tertiary healthcare facilities. One study was published solely as a conference abstract. 22 The study by Sanyal et al. 8 included four different cohorts (French NAFLD cohort, NASH CRN cohort, Validation set, Training set) with results presented separately. With 3761 participants, this study contributed the largest amount of data, about 62% of the total combined sample size. 8 The median age of participants among studies ranged from 36.5 to 61.0 years. Of the 6037 participants, about half were male (3084 participants, 51.1%), and the prevalence of T2D was 44.6% (2695 participants). The average median BMI was 30.9 kg/m², ranging from 27.8 to 34.6 kg/m². Median ALT and AST levels ranged from 44.0 to 64.9 IU/L and from 37.0 to 49.0 IU/L, respectively. The average median FIB‐4 score across studies was 1.41, ranging from 0.80 to 2.02. Similarly, median LSM values by VCTE ranged from 8.4 to 10.8 kPa. A total of 1438 participants (23.8%) had bridging fibrosis, and the prevalence of cirrhosis was 16.2% (981 participants).

TABLE 1. Baseline characteristics of included studies.

Reference | Country | Centres, design | Participants, N | Males, N (%) | Median age [IQR] | Diabetes, N (%) | Median BMI, kg/m² [IQR] | Median ALT, IU/L [IQR] | Median AST, IU/L [IQR] | Median FIB‐4 [IQR] | Median LSM, kPa [IQR] | Bridging fibrosis, N (%) | Cirrhosis, N (%) | Agile 4 rule‐out cutoff | Agile 4 rule‐in cutoff | Agile 4 grey zone, N (%)
Taru et al. 20 | Romania | Single centre, retrospective | 246 | 133 (54.1) | 52.0 [20.0] | 75 (30.5) | 29.0 [5.1] | 56.0 [66.0] | 43.0 [38.0] | 1.26 [1.29] | 8.4 [7.5] | 44 (17.9) | 29 (11.8) | <0.251 | ≥0.565 | 28 (11.4)
Fan et al. 21 , a | International | Multicentre, NR | 160 | 123 (76.9) | 36.5 [22.0] | 29 (18.1) | NR | 64.9 [95.7] | 40.0 [44.8] | 0.80 [0.70] | 9.3 [6.3] | 10 (6.3) | 13 (8.1) | <0.251 | ≥0.565 | 9 (5.6)
Mehta et al. 22 | India | Multicentre, NR | 381 | 241 (63.2) | 42.0 [20.0] | NR | NR | NR | NR | NR | NR | NR | 38 (10.0) | <0.251 | ≥0.565 | 54 (14.1)
Nakatsuka et al. 23 | Japan | Single centre, retrospective | 300 | 187 (62.3) | 55.0 [21.3] | 91 (30.3) | 27.8 [5.3] | NR | NR | 1.45 [1.40] | 8.7 [5.1] | 69 (23) | 24 (8.0) | <0.251 | ≥0.565 | 74 (24.7)
Noureddin et al. 24 | USA | Multicentre, retrospective | 548 | 190 (35.0) | 58.0 [15.0] | 292 (53.0) | 33.3 [8.5] | 44.0 [42.0] | 36.0 [30.0] | 1.67 [1.61] | 12 [10] | 111 (20) | 144 (26.0) | <0.251 | ≥0.565 | 126 (23.0)
Oeda et al. 25 | Japan | Multicentre, retrospective | 641 | 281 (43.8) | 61.0 [NR] | 352 (54.9) | 27.9 [NR] | 60.0 [NR] | 49.0 [NR] | 2.02 [NR] | 8.9 [NR] | 146 (22.8) | 25 (3.9) | <0.251 | ≥0.565 | 89 (13.9)
Sanyal et al. 8 [Training set] b | International | Multicentre, retrospective | 1434 | 729 (50.8) | 55.0 [16.0] | 723 (50.4) | 31.7 [7.8] | 49.0 [47.0] | 39.0 [31.0] | 1.40 [1.25] | 10.8 [10.2] | 437 (30.5) | 335 (23.4) | <0.251 | ≥0.565 | 244 (17.0)
Sanyal et al. 8 [Validation set] b | International | Multicentre, retrospective | 700 | 359 (51.3) | 55.5 [16.0] | 357 (51.0) | 31.6 [8.1] | 47.0 [45.2] | 38.0 [29.0] | 1.40 [1.20] | 10.5 [9.8] | 215 (30.7) | 165 (23.6) | <0.251 | ≥0.565 | 122 (16.0)
Sanyal et al. 8 [NASH CRN cohort] b | International | Multicentre, retrospective | 585 | 219 (37.4) | 54.0 [17.0] | 268 (45.8) | 34.6 [9.1] | 48.0 [42.0] | 37.0 [28.0] | 1.30 [1.06] | 8.6 [7.2] | 139 (23.8) | 75 (12.8) | <0.251 | ≥0.565 | 76 (13.0)
Sanyal et al. 8 [French NAFLD cohort] b | International | Multicentre, retrospective | 1042 | 622 (59.7) | 58.0 [15.4] | 508 (48.8) | 31.2 [7.7] | 57.0 [45.0] | 39.5 [26.0] | 1.40 [1.17] | 8.5 [6.7] | 267 (25.6) | 133 (12.8) | <0.251 | ≥0.565 | 115 (11.0)

Abbreviations: BMI, body mass index; Bridging fibrosis defined as F = F3; FIB‐4, fibrosis index 4; LSM, liver stiffness by vibration‐controlled transient elastography; N, number; NR, not reported; IQR, interquartile range.

a The study by Fan et al. 21 included three different cohorts providing data on the diagnostic accuracy of Agile 4 score. We used only the data from the Guangzhou cohort, due to some population overlapping concerns of the remaining two cohorts (Hong Kong, Wenzhou) with the study by Sanyal et al. 8

b The study by Sanyal et al. 8 included four different cohorts (Training set, Validation set, NASH CRN cohort, French NAFLD cohort). Baseline data and diagnostic accuracy estimates were reported separately for each cohort.

3.2. Risk of bias and applicability assessment

Most studies were at unclear or high risk of bias due to concerns regarding patient selection and/or interpretation of the Agile 4 score with previous knowledge of liver biopsy results (Figure S2). Three studies raised applicability concerns because of either poor reporting of the VCTE LSM criteria or the use of fewer than 10 measurements to determine the median LSM by VCTE. 8 , 22 , 23

3.3. Accuracy of Agile 4 score ruling‐in/out liver cirrhosis

Figure 1 presents individual study estimates for ruling‐in liver cirrhosis (Agile 4 ≥ 0.565). Sensitivity and specificity estimates across studies ranged from 0.39 to 1.00 and from 0.62 to 0.99, respectively. An Agile 4 score ≥0.565 yielded a pooled sensitivity of 0.67 (95% CI, 0.50–0.80), specificity 0.93 (0.86–0.97), LRp 9.5 (4.8–18.9) and LRn 0.36 (0.23–0.56). For ruling‐out liver cirrhosis, individual study estimates for sensitivity and specificity ranged from 0.71 to 1.00 and from 0.34 to 0.97, respectively (Figure 2). An Agile 4 score <0.251 yielded a pooled sensitivity of 0.90 (0.80–0.95), specificity 0.83 (0.72–0.91), LRp 5.4 (3.2–9.2) and LRn 0.12 (0.06–0.24).
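As a plausibility check, the standard definitions of the likelihood ratios can be applied to the rounded pooled sensitivity (Se) and specificity (Sp) reported above. This is a back‐of‐the‐envelope calculation, not the model output; small differences from the reported LRs are expected, presumably because the reported values derive from unrounded model estimates.

```latex
\begin{aligned}
\text{Rule-in } (\text{Agile 4} \ge 0.565):\quad
  LR_{p} &= \frac{Se}{1 - Sp} = \frac{0.67}{1 - 0.93} \approx 9.6
  && (\text{reported: } 9.5) \\
  LR_{n} &= \frac{1 - Se}{Sp} = \frac{1 - 0.67}{0.93} \approx 0.35
  && (\text{reported: } 0.36) \\[4pt]
\text{Rule-out } (\text{Agile 4} < 0.251):\quad
  LR_{p} &= \frac{0.90}{1 - 0.83} \approx 5.3
  && (\text{reported: } 5.4) \\
  LR_{n} &= \frac{1 - 0.90}{0.83} \approx 0.12
  && (\text{reported: } 0.12)
\end{aligned}
```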

FIGURE 1. Coupled forest plot of sensitivity and specificity of Agile 4 score for ruling‐in liver cirrhosis.

FIGURE 2. Coupled forest plot of sensitivity and specificity of Agile 4 score for ruling‐out liver cirrhosis.

3.4. Heterogeneity assessment and additional analyses

Based on visual inspection of forest plots and the size of prediction regions (Figure S3), there was substantial heterogeneity for both outcomes of interest. For ruling‐in liver cirrhosis, the prevalence of T2D (LRT χ² = 16.2, p < 0.01) and BMI (LRT χ² = 7.2, p = 0.03) were identified as potential sources of heterogeneity (Table S7). Similarly, age (LRT χ² = 15.3, p < 0.01) and T2D prevalence (LRT χ² = 12.4, p < 0.01) were potential sources of heterogeneity for ruling‐out liver cirrhosis.

Results from sensitivity analyses are presented in Table S8. Most of the included studies were retrospective and as such, a sensitivity analysis excluding respective studies was not feasible. Following the exclusion of studies with applicability concerns, sensitivity improved from 0.90 to 0.97 for ruling‐out cirrhosis. Nevertheless, this analysis included only four studies with a considerably lower number of participants (1595 participants). Based on Cook's distance approach and standardized residuals, the studies by Mehta et al. 22 and Nakatsuka et al. 23 were influential for ruling‐in cirrhosis yielding the lowest estimates of specificity (0.66 and 0.62, respectively) (Figure S4). The study by Mehta et al. was identified solely as a conference abstract, and as such detailed data on baseline characteristics were not available. Moreover, the study by Nakatsuka et al. had the lowest median BMI among included studies. An analysis excluding both studies yielded similar specificity estimates to our main analysis. Similarly, the study by Nakatsuka et al. was identified as influential for ruling‐out the target condition (Figure S5). Sensitivity estimates after exclusion of this study were comparable to our main analyses (0.88 vs. 0.90).

3.5. Clinical utility

For ruling‐in liver cirrhosis, assuming a prevalence of 5% or 30%, the probability of having cirrhosis following a positive test was 33% or 80%, respectively (Figure S6). For ruling‐out liver cirrhosis and for the same prevalence scenarios, the post‐test probability after a negative test result was 1% and 5%, respectively (Figure S7). Table 2 presents PPVs and NPVs of Agile 4 for ruling‐in and ruling‐out cirrhosis for different prevalence scenarios. Figures S8 and S9 present scattergrams of individual study and pooled estimates of LRs.
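A Fagan nomogram is a graphical form of Bayes' theorem on the odds scale: post‐test odds equal pre‐test odds multiplied by the likelihood ratio. The minimal sketch below applies that relationship to the pooled LRs reported above (LRp 9.5 for ruling‐in, LRn 0.12 for ruling‐out). The function name is hypothetical, and the arithmetic is an illustration rather than the authors' Stata workflow.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    LR_POS_RULE_IN = 9.5    # pooled LRp for Agile 4 >= 0.565
    LR_NEG_RULE_OUT = 0.12  # pooled LRn for Agile 4 < 0.251
    for prevalence in (0.05, 0.30):
        after_positive = post_test_probability(prevalence, LR_POS_RULE_IN)
        after_negative = post_test_probability(prevalence, LR_NEG_RULE_OUT)
        print(
            f"prevalence {prevalence:.0%}: "
            f"positive test -> {after_positive:.0%}, "
            f"negative test -> {after_negative:.0%}"
        )
```

With these inputs the calculation gives roughly 33% and 80% after a positive test, and roughly 1% and 5% after a negative test, in line with the figures quoted in the text.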

TABLE 2. Positive and negative predictive values for ruling‐in/out liver cirrhosis across different prevalence scenarios.

Cirrhosis prevalence | PPV, Agile 4 score ≥0.565 | NPV, Agile 4 score <0.251
5% | 34% | 99%
10% | 52% | 99%
15% | 63% | 98%
20% | 71% | 97%
25% | 76% | 96%
30% | 80% | 95%

Abbreviations: NPV, Negative predictive value; PPV, Positive predictive value.
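The predictive values in Table 2 follow from Bayes' theorem applied to the pooled sensitivity and specificity. The sketch below recomputes them from the rounded pooled estimates (0.67 and 0.93 for ruling‐in, 0.90 and 0.83 for ruling‐out). The function names are hypothetical, and because the inputs are rounded, a result can differ from the table by a percentage point: at 5% prevalence the PPV comes out at about 33.5%, which sits between the 33% quoted in the text and the 34% in the table.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(no disease | negative test)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Rounded pooled estimates reported in the Results section.
RULE_IN = {"sensitivity": 0.67, "specificity": 0.93}   # Agile 4 >= 0.565
RULE_OUT = {"sensitivity": 0.90, "specificity": 0.83}  # Agile 4 < 0.251

if __name__ == "__main__":
    print("prevalence  PPV (rule-in)  NPV (rule-out)")
    for prevalence in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
        p = ppv(prevalence=prevalence, **RULE_IN)
        n = npv(prevalence=prevalence, **RULE_OUT)
        print(f"{prevalence:>10.0%}  {p:>13.1%}  {n:>14.1%}")
```

The same functions show why testing in a higher‐prevalence population raises the PPV: specificity is fixed, so the false‐positive share of positive results shrinks as prevalence grows.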

4. DISCUSSION

4.1. Summary of findings

In this systematic review and meta‐analysis of diagnostic accuracy studies, we assessed the performance of the Agile 4 score for liver cirrhosis in patients with MASLD using liver biopsy as the reference standard. The Agile 4 score thresholds were set at ≥0.565 for ruling‐in and <0.251 for ruling‐out cirrhosis. Our analysis, which pooled data from seven studies involving over 6000 participants, demonstrated that the Agile 4 score performs well for both outcomes. An Agile 4 score ≥0.565 yielded a summary specificity of 0.93 with an LRp of 9.5. Conversely, an Agile 4 score <0.251 resulted in a pooled sensitivity of 0.90 and an LRn of 0.12. Assuming a prevalence of 30%, the PPV of the Agile 4 score for ruling‐in cirrhosis was 80%, while the NPV for ruling‐out the target condition was 95%.

4.2. Strengths and limitations

Our systematic review and meta‐analysis of diagnostic accuracy studies provides an up‐to‐date evidence synthesis on the diagnostic performance of Agile 4. Using robust methodology, following the latest Cochrane recommendations, 26 we conducted an extensive literature search across multiple databases, identifying seven studies with a combined total of over 6000 participants. Our clinically focused results employ the dual cutoff approach, utilizing the most widely validated Agile 4 thresholds for ruling‐in or ruling‐out liver cirrhosis. Additionally, we evaluated the risk of bias and applicability using the QUADAS‐2 tool.

Certain limitations must be acknowledged. There was substantial heterogeneity for both outcomes of interest. Nevertheless, this phenomenon is common in meta‐analyses of diagnostic test accuracy studies. 26 Through meta‐regression analysis we identified age, BMI and prevalence of T2D as potential sources of heterogeneity. However, this analysis was based on a limited number of studies, and therefore should be interpreted with caution. 16 Similarly, the sparse reporting of relevant data and the small number of included studies precluded subgroup analyses for these variables. However, in the largest study to date by Sanyal et al., the Agile 4 score demonstrated comparable performance across BMI subgroups (<30 vs. ≥30 kg/m²) based on comparisons of areas under the ROC curve (AUROC): 0.88 vs. 0.91 in the internal validation set, 0.96 vs. 0.93 in the NASH CRN cohort and 0.89 vs. 0.89 in the French NAFLD cohort. Notably, in the same study, the Agile 4 score performed better in patients without diabetes than in those with diabetes in the French NAFLD cohort (AUROC: 0.96 vs. 0.82). These findings underscore the need for additional research to elucidate the impact of clinical factors such as BMI, age and diabetes on the diagnostic performance of the Agile 4 score.

Furthermore, most of the included studies were deemed at unclear or high risk of bias, primarily due to concerns regarding patient selection. This stemmed mainly from the retrospective design of the studies and the possibility of convenience sampling (i.e., searching patient records with histology and Agile 4 score components available). Even though this practice is commonly encountered in diagnostic accuracy studies, it may introduce spectrum bias leading to overestimation of results.

Finally, our eligibility criteria limited the analysis to studies that provided data for specific Agile 4 cutoffs. Consequently, we excluded studies that reported diagnostic accuracy estimates for different thresholds. While this approach may be subject to criticism, it ensures homogeneity of cutoffs, allowing for the more clinically meaningful generation of pooled estimates of sensitivity and specificity compared to the less informative area under the ROC curve approach.

4.3. Comparison with previous research

To our knowledge, this is the first meta‐analysis to assess the diagnostic accuracy of Agile 4 score for liver cirrhosis in patients with MASLD. Our results are generally in line with the performance reported in the original study by Sanyal et al. 8 Notably, our findings suggest that the cutoff of <0.251, originally chosen to achieve a sensitivity of ≥0.85, can actually exclude cirrhosis with a pooled sensitivity of around 0.90. Recently, Papatheodoridi et al. 27 assessed the diagnostic accuracy of the Agile 4 score in 912 MASLD participants. In this study, the authors found that an Agile 4 score <0.169 for excluding cirrhosis resulted in a sensitivity of 0.78, while a score >0.388 achieved a specificity of 0.92. These findings are comparable to our pooled estimates for the 0.565 cutoff.

Given the recent development of the Agile 4 score, data regarding its performance compared to other widely used indices are sparse. Existing evidence indicates the superiority of Agile 4 over FIB‐4, 8 , 20 , 25 with comparable performance to the classical VCTE in terms of AUROC comparison. 20 , 24 , 25 Of note, when it comes to patient stratification using the dual cutoff approach, the magnitude of the ‘grey zone’ of the index test remains a concern. In our pooled cohort, the percentage of participants with an Agile 4 score between 0.251 and 0.565 was 16%. Decision‐making for this subpopulation should consider individual patient characteristics, proximity to diagnostic thresholds and results from additional noninvasive tests. Ideally, these additional tests should be independent of Agile 4‐related parameters to ensure independent patient classification. For that purpose, magnetic resonance elastography (MRE) and the Enhanced Liver Fibrosis (ELF) test seem reasonable approaches for further diagnostic evaluation, reserving biopsy as a final option.
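The pooled grey‐zone share quoted above can be checked directly against the per‐cohort counts in Table 1. The sketch below simply sums those counts; it is a crude pooled proportion, not a meta‐analytic estimate, and the dictionary layout is an assumption made for illustration.

```python
# (grey-zone participants, total participants) per cohort, copied from Table 1.
GREY_ZONE_COUNTS = {
    "Taru et al.": (28, 246),
    "Fan et al. (Guangzhou)": (9, 160),
    "Mehta et al.": (54, 381),
    "Nakatsuka et al.": (74, 300),
    "Noureddin et al.": (126, 548),
    "Oeda et al.": (89, 641),
    "Sanyal et al., Training set": (244, 1434),
    "Sanyal et al., Validation set": (122, 700),
    "Sanyal et al., NASH CRN cohort": (76, 585),
    "Sanyal et al., French NAFLD cohort": (115, 1042),
}

total_grey = sum(grey for grey, _ in GREY_ZONE_COUNTS.values())
total_participants = sum(n for _, n in GREY_ZONE_COUNTS.values())
grey_share = total_grey / total_participants

if __name__ == "__main__":
    print(f"{total_grey} of {total_participants} participants "
          f"({grey_share:.1%}) fall in the grey zone")
```

Summing the table gives 937 of 6037 participants, about 15.5%, which rounds to the 16% reported in the text.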

Dalbeni et al. 9 conducted a systematic review and meta‐analysis to evaluate the performance of both Agile 3+ and Agile 4 scores. Nevertheless, at the time of their literature searches, there were insufficient data for an Agile 4 meta‐analysis. It is worth mentioning that Agile 3+ and Agile 4 were introduced to the scientific community simultaneously as complementary scores: Agile 3+ targets the identification of advanced fibrosis (F ≥ F3), while Agile 4 is designed for diagnosing cirrhosis (F = F4). In the meta‐analysis by Dalbeni et al., which included six studies with 6955 participants, an Agile 3+ score ≤0.451 had a pooled sensitivity of 0.88 for ruling out advanced fibrosis, while an Agile 3+ score ≥0.679 had a pooled specificity of 0.87.

4.4. Implications for practice and research

Early identification of cirrhosis is crucial for timely initiation of a screening program to monitor for the development of clinical decompensation, including ascites, encephalopathy, hepatocellular carcinoma and variceal haemorrhage. 28 With a pooled specificity of 0.93, an Agile 4 score ≥0.565 correctly classifies more than 9 out of 10 patients without cirrhosis, and with an LRp of 9.5, a positive test increases the odds of cirrhosis nearly 10‐fold. Given that the PPV of the Agile 4 score for ruling‐in liver cirrhosis reaches only 80% even at a cirrhosis prevalence of 30%, a sequential testing approach could further enhance its performance in identifying cirrhosis. In clinical practice, diagnostic workflows often progress from less resource‐intensive and widely available tests to more advanced diagnostic methods. Within this framework, applying the Agile 4 score after an initial assessment with the FIB‐4 index seems reasonable. The FIB‐4 index, with its high sensitivity and NPV, serves as an effective initial screening tool. Prioritizing Agile 4 testing in patients with FIB‐4 values ≥1.3 helps target individuals more likely to have advanced fibrosis. This approach increases the prevalence of advanced fibrosis and cirrhosis in the tested population, thus improving the PPV of Agile 4 by reducing the proportion of false‐positive results. Data on the performance of this sequential testing approach are limited. Papatheodoridi et al. 27 reported correct classification rates of 80% for Agile 4 after a FIB‐4 ≥ 1.3. However, the Agile 4 thresholds used in their study differed from those chosen in our meta‐analysis.

A similar two‐tier approach underlies the MEFIB score. 29 , 30 The MEFIB score combines FIB‐4 (≥1.3) as an initial screening tool, followed by MRE (≥3.6 kPa) for further evaluation. This approach has demonstrated robust diagnostic performance, with areas under the ROC curve (AUROC) of 0.84 and 0.83 for identifying advanced fibrosis and cirrhosis, respectively. 29 , 30 However, comparative data on the diagnostic performance of the two sequential approaches (FIB‐4 followed by Agile 4 versus FIB‐4 followed by MRE) are currently lacking.

With a pooled sensitivity of 0.90, LRn of 0.12, and NPVs exceeding 90% across different prevalence scenarios, an Agile 4 score <0.251 seems sufficient to exclude cirrhosis without the need for further testing. Nevertheless, in the presence of clinical uncertainty, application of more advanced imaging modalities such as MRE remains a reasonable option with excellent diagnostic performance for fibrosis staging, 31 reserving biopsy as the last resort.

Results from individual studies indicate that Agile 4 yields lower numbers of unclassified participants compared to both conventional VCTE and FIB‐4. 8 , 24 Future studies should focus on validating these findings and explore factors related to ‘grey zone’ results, ideally within the context of an individual patient data meta‐analysis.

Moreover, future studies should evaluate the role of Agile 4 as a predictor of liver‐related events (LRE), including hepatic decompensation and hepatocellular carcinoma. Recently, Lin et al. 32 conducted a multicentre observational study of more than 16 000 MASLD participants suggesting that both Agile scores confer high accuracy for LRE prediction.

5. CONCLUSION

Based on aggregated data from seven studies with 6037 participants, an Agile 4 score ≥0.565 performs well for ruling‐in cirrhosis with a pooled specificity of 0.93. Similarly, an Agile 4 score <0.251 is able to exclude cirrhosis with an overall sensitivity of 0.90. Based on a PPV of 80% for ruling‐in cirrhosis, a sequential testing approach with the Agile 4 score applied after a FIB‐4 examination seems clinically useful.

AUTHOR CONTRIBUTIONS

All the authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Konstantinos Malandris, Anastasia Katsoula, Aris Liakos, Thomas Karagiannis and Philippos Klonizakis. The first draft of the manuscript was written by Konstantinos Malandris and all the authors commented on previous versions of the manuscript. All the authors read and approved the final manuscript.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest for this article.

PEER REVIEW

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/dom.16142.

Supporting information

Data S1. Supporting information.


ACKNOWLEDGEMENTS

None.

Malandris K, Katsoula A, Liakos A, et al. Diagnostic accuracy of Agile‐4 score for liver cirrhosis in patients with metabolic dysfunction‐associated steatotic liver disease. A systematic review and meta‐analysis of diagnostic test accuracy studies. Diabetes Obes Metab. 2025;27(3):1406‐1414. doi: 10.1111/dom.16142

DATA AVAILABILITY STATEMENT

Data set available on reasonable request from corresponding author.


Supplementary Materials

Data S1. Supporting information.

DOM-27-1406-s001.docx (694.2KB, docx)

Articles from Diabetes, Obesity & Metabolism are provided here courtesy of Wiley
