Deep learning radiomics of elastography for diagnosing compensated advanced chronic liver disease: an international multicenter study

Xue Lu; Haoyan Zhang; Hidekatsu Kuroda; Matteo Garcovich; Victor de Ledinghen; Ivica Grgurević; Runze Linghu; Hong Ding; Jiandong Chang; Min Wu; Cheng Feng; Xinping Ren; Changzhu Liu; Tao Song; Fankun Meng; Yao Zhang; Ye Fang; Sumei Ma; Jinfen Wang; Xiaolong Qi; Jie Tian; Xin Yang; Jie Ren; Ping Liang; Kun Wang

2025 Aug 15;8:19. doi: 10.1186/s42492-025-00199-6

Xue Lu ¹,#, Haoyan Zhang ²,³,#, Hidekatsu Kuroda ⁴, Matteo Garcovich ⁵, Victor de Ledinghen ⁶, Ivica Grgurević ⁷, Runze Linghu ⁸, Hong Ding ⁹, Jiandong Chang ¹⁰, Min Wu ¹¹, Cheng Feng ¹², Xinping Ren ¹³, Changzhu Liu ¹⁴, Tao Song ¹⁵, Fankun Meng ¹⁶, Yao Zhang ¹⁷, Ye Fang ¹⁸, Sumei Ma ¹⁹, Jinfen Wang ¹, Xiaolong Qi ²⁰, Jie Tian ²,³, Xin Yang ²,³,✉, Jie Ren ¹,✉, Ping Liang ⁸,✉, Kun Wang ²,³,✉

¹Department of Ultrasound, Guangdong Key Laboratory of Liver Disease Research, Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510630 Guangdong China

²CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190 China

³School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100190 China

⁴Division of Hepatology, Department of Internal Medicine, School of Medicine, Iwate Medical University, Shiwa-Gun, Iwate, 028-3694 Japan

⁵Medicina Interna E Gastroenterologia, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Università Cattolica del Sacro Cuore, Rome, 00168 Italy

⁶Hepatology Unit, University Hospital, CHU Bordeaux, Pessac, & INSERM U1312, Bordeaux University, Bordeaux, 33000 France

⁷Department of Gastroenterology, Hepatology and Clinical Nutrition, University Hospital Dubrava, Zagreb, 10000 Croatia

⁸Ultrasound Department, Chinese PLA General Hospital, Beijing, 100853 China

⁹Department of Ultrasound, Zhongshan Hospital Affiliated to Fudan University, Shanghai, 200032 China

¹⁰Department of Ultrasound, Xiamen Hospital of Traditional Chinese Medicine, Xiamen, 361009 Fujian China

¹¹Department of Ultrasound, Nanjing University Medical School Affiliated Nanjing Drum Tower Hospital, Nanjing, 210008 Jiangsu China

¹²Department of Ultrasound, National Clinical Research Center for Infectious Disease, Department of Ultrasound, Shenzhen Third People’s Hospital, Second Hospital, Affiliated to Southern University of Science and Technology, Shenzhen, 518112 Guangdong China

¹³Department of Ultrasound, School of Medicine, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, 200025 China

¹⁴Department of Ultrasound, Guangzhou Eighth People’s Hospital, Guangzhou Medical University, Guangzhou, 510600 Guangdong China

¹⁵Department of Abdominal Ultrasound, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054 Xinjiang Uygur Autonomous Region, China

¹⁶Department of Ultrasound, Beijing You’an Hospital Affiliated to Capital Medical University, Beijing, 100069 China

¹⁷Department of Ultrasound, Beijing Ditan Hospital, Capital Medical University, Beijing, 100015 China

¹⁸Department of Ultrasound, Ningbo Yinzhou No. 2 Hospital, Ningbo, 315100 Zhejiang China

¹⁹Department of Ultrasound, the First Hospital of Lanzhou University, Lanzhou, 730000 Gansu China

²⁰Department of Radiology, Medical School, Zhongda Hospital, Southeast University, Nanjing, 210009 Jiangsu China

✉ Corresponding author.

# Contributed equally.

PMCID: PMC12354435 PMID: 40813740

Abstract

Accurate, noninvasive diagnosis of compensated advanced chronic liver disease (cACLD) is essential for effective clinical management but remains challenging. This study aimed to develop a deep learning-based radiomics model using international multicenter data and to evaluate its performance by comparing it to the two-dimensional shear wave elastography (2D-SWE) cut-off method covering multiple countries or regions, etiologies, and ultrasound device manufacturers. This retrospective study included 1937 adult patients with chronic liver disease due to hepatitis B, hepatitis C, or metabolic dysfunction-associated steatotic liver disease. All patients underwent 2D-SWE imaging and liver biopsy at 17 centers across China, Japan, and Europe using devices from three manufacturers (SuperSonic Imagine, General Electric, and Mindray). The proposed generalized deep learning radiomics of elastography model integrated both elastographic images and liver stiffness measurements and was trained and tested on stratified internal and external datasets. A total of 1937 patients with 9472 2D-SWE images were included in the statistical analysis. Compared to 2D-SWE, the model achieved a higher area under the receiver operating characteristic curve (AUC) (0.89 vs 0.83, P = 0.025). It also achieved a highly consistent diagnosis across all subanalyses (P values: 0.21–0.91), whereas 2D-SWE exhibited different AUCs in the country or region (P < 0.001) and etiology (P = 0.005) subanalyses but not in the manufacturer subanalysis (P = 0.24). The model demonstrated more accurate and robust performance in noninvasive cACLD diagnosis than 2D-SWE across different countries or regions, etiologies, and manufacturers.

Supplementary Information

The online version contains supplementary material available at 10.1186/s42492-025-00199-6.

Keywords: Deep learning, Elastography, International multicenter study, Compensated advanced chronic liver disease

Introduction

Compensated advanced chronic liver disease (cACLD) was first proposed by the Baveno VI consensus to describe the spectrum of advanced fibrosis and cirrhosis in asymptomatic patients [1]. Early and accurate discrimination of cACLD can maximally improve treatment outcomes [2, 3]. This makes the noninvasive diagnosis of cACLD vital in clinical practice [4].

Liver stiffness measurements (LSMs) obtained by two-dimensional shear wave elastography (2D-SWE) have been recommended by guidelines [5] but with inconsistent cut-off values and unstable accuracies in various studies [6–10]. For example, a meta-analysis of 53 studies showed cut-offs ranging from 7.1 to 14.1 kPa for assessing cACLD in patients with non-alcoholic fatty liver diseases [11]. Moreover, as manufacturers launched 2D-SWE systems individually, intersystem variability ranged up to 12% [12]. Guidelines point out that the LSM cut-off should be system and etiology specific [13–15], which severely limits the reliability and generalizability of 2D-SWE for identifying cACLD. Currently, when a manufacturer wants to release a new 2D-SWE system in a country or region, a multicenter study of cACLD diagnosis is required to obtain specific LSM cut-offs for different etiologies, which may be unreliable in other countries and regions [15–17]. Therefore, this expensive, time-consuming, laborious, and patient-invasive (for obtaining pathological results) process must be performed repeatedly.

Previously, a convolutional neural network (CNN)-based radiomics technique using 2D-SWE images, named deep learning radiomics of elastography (original DLRE model), was developed, and its performance in staging advanced fibrosis (≥ F3) and cirrhosis (F4) was evaluated in a multicenter prospective study [18]. This model was designed to automatically and quantitatively extract high-throughput image features from 2D-SWE, enabling the learning of more comprehensive disease characteristics compared to a single LSM acquisition. However, the original model had relatively inaccurate discrimination of clinically significant fibrosis (≥ F2); an updated CNN model (refined DLRE model) successfully overcame this major defect of the original version [19]. Other studies used similar methods to confirm these findings [20–22].

Both model versions were trained using 2D-SWE images acquired from patients with chronic hepatitis B (CHB) in China solely using the SuperSonic Imagine (SSI) system. From the available information, to date, there have been no reports of deep learning radiomics models whose performance has been evaluated using 2D-SWE images acquired from multi-manufacturer systems and patients with multi-etiology liver disease in multiple countries and regions worldwide.

Therefore, a generalized deep learning radiomics of elastography (DLRE-X) model was developed. Its architecture was specifically designed to eliminate noise features from different countries or regions, etiologies, and ultrasound (US) device manufacturers in 2D-SWE images while retaining the key image features (image biomarkers) derived from crucial pathological knowledge of cACLD. To train and evaluate this updated model for cACLD identification, an international multicenter dataset of patients with chronic liver disease (CLD) was constructed. This dataset consisted of 2D-SWE images obtained from three countries or regions (China, Japan, and Europe), three etiologies (CHB, chronic hepatitis C [CHC], and metabolic dysfunction-associated steatotic liver disease [MASLD]), and three manufacturers (SSI, General Electric [GE], and Mindray).

Thus, this study aimed to develop a deep learning-based radiomics model using international multicenter data and evaluate its accuracy for diagnosing cACLD compared with that of the 2D-SWE cut-off method and its robustness covering multiple countries or regions, etiologies, and US device manufacturers.

Methods

This international multicenter study was performed in accordance with the Declaration of Helsinki and adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. The study protocol was approved by the Ethics Committee of the principal investigator’s (PI) hospital (No. [2021] 02–529), and verbal informed consent was obtained.

Study design and datasets

This study analyzed CLD with three etiologies (CHB, CHC, and MASLD) in patients from 17 academic medical centers in three countries or regions (China, Japan, and Europe). 2D-SWE images were acquired between January 2012 and December 2019 using systems from three manufacturers: SSI (SuperSonics), GE (LOGIQ E9), and Mindray (Resona7). The inclusion criteria were as follows: (1) patients aged 18–80 years; (2) diagnosed with CHB, CHC, or MASLD; (3) underwent LSM on 2D-SWE images acquired by the SSI, GE, or Mindray system; and (4) had liver pathological results. The exclusion criteria were as follows: (1) fewer than three 2D-SWE images of acceptable quality; (2) coinfection with other liver diseases or liver transplantation; (3) received antiviral treatments within six months prior to pathological examination; and (4) unqualified pathological results, including number of liver biopsy specimens < 2, specimen length < 15 mm, and portal areas < 6.

Patients were assigned to the training, internal test, or external test sets in two steps. First, participants with three etiologies examined by three manufacturer systems from hospitals A, B, E, F, H, I, J, K, L, and M in China and hospital N in Japan (hospital names in Supplementary Table 1), accounting for approximately 80% of the total, were assigned to the training set (64%) or internal test set (16%) using simple randomization. Second, the remaining 20% from hospitals C, D, and G in China, hospital N in Japan, and hospitals O, P, and Q in Europe were allocated to the external test set to ensure complete independence. Using this method, each patient was assigned to one of three datasets and not multiple datasets.
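The two-step assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `allocate`, the record layout, the seed, and the synthetic patient list are all hypothetical, and because the text does not say how hospital N's patients were divided between the two steps, hospital N is left out of the toy data.

```python
import random


def allocate(patients, external_hospitals, internal_share=0.2, seed=42):
    """Two-step allocation: hospital-level hold-out, then simple randomization.

    Step 1: every patient from an external hospital goes to the external test set.
    Step 2: the remaining patients are shuffled and split into training and
    internal test sets (internal_share=0.2 of the remainder, i.e. 16% of the
    total when the remainder is 80% of the total).
    """
    external = [p for p in patients if p["hospital"] in external_hospitals]
    development = [p for p in patients if p["hospital"] not in external_hospitals]

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rng.shuffle(development)
    n_internal = round(len(development) * internal_share)

    return {
        "training": development[n_internal:],
        "internal_test": development[:n_internal],
        "external_test": external,
    }


# Synthetic placeholder records: 10 patients for each of 16 hospital codes.
patients = [{"id": i, "hospital": h} for i, h in enumerate("ABCDEFGHIJKLMOPQ" * 10)]
splits = allocate(patients, external_hospitals={"C", "D", "G", "O", "P", "Q"})
sizes = {name: len(members) for name, members in splits.items()}
```

Holding out whole hospitals before randomizing keeps every external-test image source unseen during training, which is the property the external test set is meant to provide.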

The DLRE-X model was developed using the training set and compared to the 2D-SWE cut-off method in all three datasets. Moreover, subanalysis comparisons were performed in terms of the diagnostic accuracy and robustness of cACLD between different countries or regions, etiologies, and manufacturers in the external test set.

2D-SWE image acquisition and quality control

The PI center established the standard 2D-SWE measurement procedure and has promoted it to the participating centers since 2012; this procedure was recommended by many guidelines in later years [13–15, 23]. Hence, all centers followed the standard 2D-SWE measurement procedure [17, 24, 25], which ensured the quality of the 2D-SWE images. Detailed methods are provided in the Supplementary Materials.

Clinical data and pathological evaluation

Demographic data and serological results of eligible patients were collected. All patients underwent liver biopsy within 2 weeks after 2D-SWE. The degree of fibrosis (F0-1, F2, F3, and F4) was evaluated using METAVIR (patients with CHB or CHC) or Ishak (patients with MASLD). These are the two most widely accepted scoring systems for the assessment of liver fibrosis. cACLD was defined as ≥ F3 (METAVIR scoring systems) or ≥ S4 (Ishak scoring systems) [4, 25]. Detailed methods are provided in the Supplementary Materials.
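The binary reference label implied by this definition can be written as a small lookup. The sketch below is illustrative only; the names `CACLD_THRESHOLD` and `is_cacld` are hypothetical, and stages are assumed to be given as integers (F3 as 3, S4 as 4).

```python
# Minimum stage that counts as cACLD under each scoring system, as stated in the text.
CACLD_THRESHOLD = {"METAVIR": 3, "Ishak": 4}


def is_cacld(stage, scoring_system):
    """Return True when the biopsy stage meets the cACLD definition."""
    return stage >= CACLD_THRESHOLD[scoring_system]


labels = [is_cacld(2, "METAVIR"), is_cacld(3, "METAVIR"), is_cacld(4, "Ishak")]
```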

DLRE-X

The 2D-SWE images acquired using the SSI, GE, and Mindray systems showed different appearances, colors, and manual or machine annotations in the measurement areas (Fig. 1a and Supplementary Fig. 1). The model integrated two inputs. The first was the pseudo-color area of a 2D-SWE image, which was semi-automatically cropped as the region of interest (ROI) by a US radiologist (X.L.) with more than 5 years of 2D-SWE experience, who was blinded to the pathological results, using a self-developed algorithm (Fig. 1a, red boxes; the detailed methods are shown in Supplementary Materials). The second input was the textual information of the LSM values (Fig. 1a, yellow arrows).

The model adopted the convolutional network for next generation (ConvNeXt) architecture to extract phenotypic features from 2D-SWE ROIs (Fig. 1b and Supplementary Fig. 2) [26]. Various techniques were employed, including modernizing a standard ResNet, redesigning its macro design, adopting the idea of ResNeXt, using the inverted bottleneck, increasing the kernel size, and applying different micro design choices to achieve high performance while using fewer parameters and requiring fewer computational resources [27–30]. Moreover, a transfer learning strategy was employed using pre-trained weights provided by PyTorch’s official ImageNet model [31–34]. The parameters were then fine-tuned using the training set data [35]. Offline image augmentation was performed, including random cropping, rotation, flipping, affine transformation, normalization, and tensor conversion, to improve the fine-tuning efficiency and mitigate overfitting. Owing to the dual-input design, dimension matching [36] was performed between these two inputs, and their prior weights were adjusted before dual-input feature fusion to ensure that their feature dimensions were of similar magnitudes (Fig. 1b).

All model parameters were determined using the training and internal test sets, and the external test set was used to independently evaluate the model's performance. The performances of the single-input (ROI only) and dual-input (ROI and LSM) versions of the model were compared in all datasets and subanalyses. Detailed information on the model is provided in the Supplementary Materials, and the code for the ROI selection algorithm and the model is available at https://github.com/samadhi-fire/DLRE-3.0.

Statistical analysis

Continuous variables were summarized as means ± SDs, and categorical variables were summarized as numbers and percentages. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated to assess the diagnostic performance of the model and 2D-SWE. The prevalence, sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios (LR+ and LR−) at the cut-off maximizing the Youden index on the estimated ROC curves were calculated with 95%CIs. AUCs with 95%CIs of the model and 2D-SWE were compared in all three datasets and in each subanalysis of the external test set using the DeLong test. P < 0.05 indicated statistical significance. Statistical analysis was performed using SPSS software for Windows, V.20.0 (SPSS).
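To make the quantities named above concrete, the following is a minimal sketch of how an AUC, a Youden-optimal cut-off, and the derived indices can be computed. It is not the SPSS procedure used in the study: the function names are hypothetical, the stiffness values and labels in the toy example are invented, and the AUC is computed with the rank-based (Mann-Whitney) formulation. The 2 × 2 counts passed to `diagnostic_indices` are those of the model in the external test set in Table 2.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a positive case scores higher than a negative one."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its score is >= the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # on ties, the lower cut-off is kept
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


def diagnostic_indices(tp, fn, tn, fp):
    """Sensitivity, specificity, predictive values, and likelihood ratios from counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1.0 - spec),
        "lr_neg": (1.0 - sens) / spec,
    }


# Invented toy data: stiffness-like scores (kPa) with 1 = cACLD, 0 = non-cACLD.
scores = [5.0, 6.2, 7.5, 8.1, 9.4, 11.0, 12.3, 15.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_auc = auc_mann_whitney(
    [s for s, y in zip(scores, labels) if y == 1],
    [s for s, y in zip(scores, labels) if y == 0],
)
toy_cutoff, toy_j = youden_cutoff(scores, labels)

# Counts from Table 2, external test set, the model: 118/143 and 203/252.
external_model = diagnostic_indices(tp=118, fn=25, tn=203, fp=49)
```

The DeLong comparison of two correlated AUCs is not reproduced here, since it requires the paired per-patient scores of both methods.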

Results

Baseline dataset characteristics

Of 2591 patients, 654 were excluded because of an insufficient number of images, co-infection or liver transplantation, antiviral treatment before liver biopsy, or unqualified histological results; the remaining 1937 patients with 9472 2D-SWE images were included in the analysis (Fig. 2a). Among them, 1233 patients with 6150 images, 309 patients with 1543 images, and 395 patients with 1779 images were allocated to the training, internal test, and external test sets, respectively (Supplementary Materials). The detailed distribution of the included patients regarding the country or region, etiologies, manufacturers, and participating centers is shown in Figs. 2b-d and Supplementary Table 1. Table 1 lists the baseline characteristics. There was no evidence of differences among the training, internal test, and external test sets (all P > 0.05).

Fig. 2 — Flowchart of patient enrollment process for the training, internal test, and external test sets. (a) The procedure of patient enrollment. Of 2591 patients, 654 were excluded and 1937 patients with 9472 2D-SWE images were selected for analysis. Distribution of the (b) country or region, (c) etiology, and (d) manufacturer subanalyses

Table 1.

Baseline characteristics of the training, internal test, and external test sets

Characteristics	Training	Internal test set	External test set	P value
No. of patients	1233	309	395	0.54
Age (year)*	58.92 ± 15.63	60.62 ± 12.83	60.85 ± 13.86	0.57
Age range (year)	19–79	21–71	19–79	0.57
Sex (M)	653 (53%)	159 (52%)	216 (55%)	0.54
Sex (F)	580 (47%)	150 (48%)	179 (45%)	0.54
BMI*	25.70 ± 4.87	27.07 ± 5.67	25.45 ± 5.73	0.25
AST (U/L)*	51.43 ± 50.86	47.87 ± 31.35	45.56 ± 28.23	0.57
ALT (U/L)*	62.07 ± 74.97	51.87 ± 34.26	58.57 ± 54.45	0.66
ALB (g/L)*	4.08 ± 0.50	4.10 ± 0.23	4.12 ± 0.57	0.76
TB (umol/L)*	0.73 ± 0.40	0.71 ± 0.29	0.80 ± 0.38	0.31
DB (umol/L)*	0.43 ± 0.22	0.42 ± 0.18	0.48 ± 0.21	0.25
IB (umol/L)*	0.29 ± 0.18	0.27 ± 0.13	0.31 ± 0.17	0.43
GGT (U/L)*	61.66 ± 69.67	57.65 ± 53.52	69.80 ± 119.86	0.70
PT (s)*	12.92 ± 2.09	12.70 ± 1.29	12.96 ± 2.52	0.82
Fibrosis degree
F0-1	460 (38%)	127 (41%)	168 (43%)	0.51
F2	349 (28%)	69 (22%)	84 (21%)	0.61
F3	210 (17%)	60 (19%)	59 (15%)	0.53
F4	214 (17%)	53 (17%)	84 (21%)	0.51
cACLD or not
non-cACLD	809 (66%)	196 (63%)	252 (64%)	0.55
cACLD	424 (34%)	113 (37%)	143 (36%)	0.53

Data are numbers of participants, with percentages in parentheses, for categorical variables

AST aspartate aminotransferase, BMI body mass index, calculated as weight in kilograms divided by height in meters squared, ALT alanine aminotransferase, ALB albumin, TB total bilirubin, DB direct bilirubin, IB indirect bilirubin, GGT gamma-glutamyl transpeptidase, PT prothrombin time, F0-1 no or minor fibrosis, F2 clinically significant fibrosis, F3 advanced fibrosis, F4 cirrhosis

*Data are means ± SDs for continuous variables. P values were calculated between the training, internal test, and external test sets using analysis of variance (ANOVA) for continuous variables and Fisher’s exact test for categorical variables

Overall diagnostic performance comparisons between DLRE-X and 2D-SWE

Figure 3a–c shows the overall diagnostic performances of the model and 2D-SWE in the training, internal test, and external test sets for diagnosing cACLD. The hyperparameters of the model and the LSM cut-off values of 2D-SWE are shown in Supplementary Table 2. In the training set, the model achieved a higher AUC than 2D-SWE (0.92; 95%CI: 0.91, 0.94 vs 0.87; 95%CI: 0.85, 0.89; P < 0.001). The sensitivity, specificity, and other quantitative indices are listed in Table 2 and Supplementary Table 3. However, in the internal test set, no evidence of a difference was found between the model and 2D-SWE (AUC: 0.90; 95%CI: 0.86, 0.93 vs 0.88; 95%CI: 0.83, 0.91; P = 0.49).

Fig. 3 — Overall comparisons between the deep learning-based radiomics model DLRE-X and 2D-SWE. ROC curves of DLRE-X and 2D-SWE in the training (a), internal test (b), and external test sets (c). Heat plot of DLRE-X and 2D-SWE in the external test set (d)

Table 2.

Diagnostic performances of the deep learning-based radiomics model (DLRE-X) and 2D-SWE in the training, internal test, and external test sets

Dataset and method	No. of patients	Prevalence (%)	Sensitivity (%)	Specificity (%)	AUC	P value
Training set
The model	1233	34.3 [424/1233] (31.7, 37.0)	87 [370/424] (85, 90)	84 [678/809] (81, 86)	0.92 (0.91, 0.94)	0.0002
2D-SWE	1233	34.3 [424/1233] (31.7, 37.0)	84 [358/424] (81, 88)	75 [598/799] (72, 78)	0.87 (0.85, 0.89)
Internal test set
The model	309	36.6 [113/309] (31.2, 42.0)	81 [91/113] (74, 86)	86 [168/196] (81, 90)	0.90 (0.86, 0.93)	0.49
2D-SWE	309	36.6 [113/309] (31.2, 42.0)	77 [87/113] (71, 83)	67 [132/196] (59, 75)	0.88 (0.83, 0.91)
External test set
The model	395	36.2 [143/395] (31.4, 41.0)	83 [118/143] (77, 87)	81 [203/252] (76, 84)	0.89 (0.86, 0.91)	0.025
2D-SWE	395	36.2 [143/395] (31.4, 41.0)	72 [103/143] (64, 79)	75 [189/252] (69, 80)	0.83 (0.79, 0.87)
Subanalysis of external test set
Country or region-China
The model	221	40.2 [89/221] (33.8, 46.8)	82 [73/89] (75, 88)	79 [104/132] (73, 84)	0.87 (0.83, 0.91)	0.003
2D-SWE	221	40.2 [89/221] (33.8, 46.8)	64 [57/89] (53, 74)	69 [91/132] (60, 77)	0.75 (0.69, 0.81)
Country or region-Japan
The model	99	36.4 [36/99] (26.7, 46.0)	83 [30/36] (72, 93)	83 [52/63] (74, 90)	0.91 (0.85, 0.95)	> 0.99
2D-SWE	99	36.4 [36/99] (26.7, 46.0)	86 [31/36] (71, 95)	75 [47/63] (62, 85)	0.91 (0.83, 0.96)
Country or region-Europe
The model	75	24.0 [18/75] (14.1, 33.9)	83 [15/18] (67, 96)	82 [47/57] (74, 91)	0.91 (0.84, 0.97)	0.69
2D-SWE	75	24.0 [18/75] (14.1, 33.9)	83 [15/18] (59, 96)	90 [51/57] (79, 96)	0.93 (0.85, 0.98)
Etiology-CHB
The model	195	44.1 [86/195] (37.1, 51.1)	83 [71/86] (76, 89)	78 [85/109] (72, 85)	0.87 (0.82, 0.91)	0.004
2D-SWE	195	44.1 [86/195] (37.1, 51.1)	64 [55/86] (53, 74)	68 [74/109] (58, 77)	0.75 (0.68, 0.81)
Etiology-CHC
The model	71	47.9 [34/71] (36.0, 59.8)	85 [29/34] (75, 94)	65 [24/37] (51, 77)	0.86 (0.78, 0.93)	0.77
2D-SWE	71	47.9 [34/71] (36.0, 59.8)	82 [28/34] (66, 93)	70 [26/37] (53, 84)	0.84 (0.74, 0.92)
Etiology-MASLD
The model	129	17.8 [23/129] (11.1, 24.5)	78 [18/23] (64, 92)	89 [94/106] (83, 93)	0.90 (0.82, 0.96)	0.86
2D-SWE	129	17.8 [23/129] (11.1, 24.5)	87 [20/23] (77, 90)	84 [89/106] (87, 97)	0.91 (0.85, 0.95)
Manufacturer-SSI
The model	148	21.6 [32/148] (14.9, 28.3)	69 [22/32] (54, 81)	81 [94/116] (75, 87)	0.84 (0.78, 0.90)	0.77
2D-SWE	148	21.6 [32/148] (14.9, 28.3)	78 [25/32] (60, 91)	78 [90/116] (69, 85)	0.87 (0.80, 0.92)
Manufacturer-GE
The model	196	50.5 [99/196] (43.4, 57.6)	87 [86/99] (81, 92)	76 [74/97] (69, 83)	0.90 (0.86, 0.93)	0.94
2D-SWE	196	50.5 [99/196] (43.4, 57.6)	69 [68/99] (58, 78)	80 [78/97] (71, 88)	0.87 (0.81, 0.91)
Manufacturer-Mindray
The model	51	23.5 [12/51] (11.5, 35.6)	83 [10/12] (64, 100)	90 [35/39] (81, 97)	0.88 (0.76, 0.99)	0.65
2D-SWE	51	23.5 [12/51] (11.5, 35.6)	83 [10/12] (52, 98)	56 [22/39] (40, 72)	0.81 (0.67, 0.90)

Data in parentheses are 95%CIs, and data in brackets are numbers of patients. The AUC of the model was statistically compared with the AUC of 2D-SWE in each dataset or subanalysis using the DeLong test
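As a reading aid for the bracketed counts and the intervals in parentheses, the sketch below computes a proportion and a normal-approximation (Wald) 95% interval from a count pair. The interval method used for Table 2 is not stated in the text, so the Wald formula here is an assumption chosen for simplicity, and its output need not match the tabulated limits exactly; the function name `wald_interval` is hypothetical.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot for extreme proportions.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Prevalence of cACLD in the external test set: 143 of 395 patients (Table 2).
prevalence, lower, upper = wald_interval(143, 395)
```

For small denominators such as 10/12, an exact or Wilson interval is usually preferred over the Wald approximation.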

In the external test set, the model achieved a higher AUC (0.89; 95%CI: 0.86, 0.91) than 2D-SWE (0.83; 95%CI: 0.79, 0.87) (P = 0.025). It achieved better diagnostic performance for most quantitative indices (Table 2 and Supplementary Table 3). Figure 3d shows the results of both methods for each patient in the external test set, demonstrating that the model provided fewer false negatives (blue stripes in yellow areas, 49 vs 114, P < 0.001) and fewer false positives (yellow stripes in blue areas, 25 vs 90, P < 0.001) than 2D-SWE. Some failure cases of the model are presented in Supplementary Fig. 3, revealing that misdiagnosed cases were similar in appearance to correctly diagnosed cases.

The net benefits of the two methods in the decision curve analysis are shown in Supplementary Fig. 4. The area under the red decision curve (the model) exceeds the blue area (2D-SWE) in all three datasets.

Single- vs dual-input performance of DLRE-X

The results of the single- and dual-input evaluations are shown in Supplementary results and Supplementary Table 4. Dual-input provided a better AUC than single-input in the external test set (0.89; 95%CI: 0.86, 0.91 vs 0.83; 95%CI: 0.79, 0.86; P = 0.02).

Diagnostic accuracy and robustness comparisons between DLRE-X and 2D-SWE among different subanalyses

Country or region

Figure 4a and Supplementary Table 5 show the ROC curves and quantitative indices of the model and 2D-SWE for the three countries or regions (China, Japan, and Europe) in the external test set. In the Chinese subanalysis, the model had a higher AUC than 2D-SWE (0.87; 95%CI: 0.83, 0.91 vs 0.75; 95%CI: 0.69, 0.81; P = 0.003), but in the Japanese and European subanalyses, there was no evidence of differences between the model and 2D-SWE (Japan: 0.91; 95%CI: 0.85, 0.95 vs 0.91; 95%CI: 0.83, 0.96; P > 0.99; Europe: 0.91; 95%CI: 0.84, 0.97 vs 0.93; 95%CI: 0.85, 0.98; P = 0.69).

Fig. 4 — Subanalysis comparisons between DLRE-X and 2D-SWE under the external test set. a ROC curves and AUC values of DLRE-X and 2D-SWE for different country or region (Chinese, Japanese, and European subanalyses); b ROC curves and AUC values of DLRE-X and 2D-SWE for different etiologies (CHB, CHC, and MASLD subanalyses); c ROC curves and AUC values of DLRE-X and 2D-SWE for different manufacturers (SSI, GE, and Mindray subanalyses). AUC values of each method within each subanalysis were statistically compared pairwise using the Delong test. DLRE-X achieved highly consistent diagnosis across all subanalyses (P values: 0.21–0.91), whereas 2D-SWE exhibited different AUCs in country and region (P < 0.001) and etiology (P = 0.005) subanalyses, but not in the manufacturer subanalysis (P = 0.24)

Diagnostic robustness comparisons revealed different consistencies between the two methods in different countries and regions. Using the model to diagnose cACLD in different countries and regions, no differences were found between the AUCs (Fig. 4a, upper bar chart, AUC range: 0.87, 0.91; P = 0.21). However, 2D-SWE exhibited different AUCs between the Chinese, Japanese, and European subanalyses (Fig. 4a, lower bar chart, AUC range: 0.75, 0.93; P < 0.001).

Etiologies

The model provided a better AUC than 2D-SWE in the CHB subanalysis (0.87; 95%CI: 0.82, 0.91 vs 0.75; 95%CI: 0.68, 0.81; P = 0.004) (Fig. 4b and Supplementary Table 6). However, in both the CHC and MASLD subanalyses, there was no evidence of differences between the AUCs of the model and 2D-SWE (CHC: 0.86; 95%CI: 0.78, 0.93 vs 0.84; 95%CI: 0.74, 0.92; P = 0.77; MASLD: 0.90; 95%CI: 0.82, 0.96 vs 0.91; 95%CI: 0.85, 0.95; P = 0.86).

The diagnostic robustness comparisons showed no evidence of differences between AUCs for the model across different etiologies (Fig. 4b, upper bar chart, AUC range: 0.86, 0.90; P = 0.27), but 2D-SWE exhibited different AUCs between them (Fig. 4b, lower bar chart, AUC range: 0.75, 0.91; P = 0.005).

US device manufacturers

When comparing AUC values between the model and 2D-SWE in all three manufacturer subanalyses, there was no evidence of statistical differences: SSI (0.84; 95%CI: 0.78, 0.90 vs 0.87; 95%CI: 0.80, 0.92; P = 0.77), GE (0.90; 95%CI: 0.86, 0.93 vs 0.87; 95%CI: 0.81, 0.91; P = 0.94), Mindray (0.88; 95%CI: 0.76, 0.99 vs 0.81; 95%CI: 0.67, 0.90; P = 0.65) (Fig. 4c, Table 2, and Supplementary Table 7).

Diagnostic robustness comparisons showed no evidence of differences between the AUCs for either the model or 2D-SWE across different manufacturers (Fig. 4c, upper bar chart, AUC range: 0.84, 0.90; P = 0.55; lower bar chart, AUC range: 0.81, 0.87; P = 0.24).

Single- vs dual-input performance of DLRE-X in subanalyses

The results of the single- vs dual-input of the model in the country or region, etiology, and manufacturer subanalyses are presented in Supplementary Tables 8–10. Dual-input provided higher AUCs than single-input in the Chinese (0.87; 95%CI: 0.83, 0.91 vs 0.78; 95%CI: 0.73, 0.84; P = 0.03) and CHB (0.87; 95%CI: 0.82, 0.91 vs 0.77; 95%CI: 0.72, 0.83; P = 0.02) subanalyses, but there was no evidence of differences in the other subanalyses (all P > 0.05).

Poor robustness of 2D-SWE in cACLD discrimination

The poor robustness of 2D-SWE in cACLD discrimination was further demonstrated by listing the optimal LSM cut-offs for each subanalysis in the external test set (Supplementary Table 11). There was a 33.5% [(10.66–7.09)/10.66] difference between the lowest cut-off (7.09 kPa, GE subanalysis) and the highest cut-off (10.66 kPa, Mindray subanalysis).
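The percentage above is the gap between the extreme cut-offs expressed relative to the highest one. A minimal sketch of that arithmetic follows; the helper name `relative_spread` is hypothetical, and only the two cut-offs quoted in the text are used because the remaining values of Supplementary Table 11 are not reproduced here.

```python
def relative_spread(cutoffs):
    """(highest - lowest) / highest over a mapping of subanalysis -> cut-off in kPa."""
    lowest = min(cutoffs.values())
    highest = max(cutoffs.values())
    return (highest - lowest) / highest


# Only the two extreme cut-offs quoted in the text.
cutoffs_kpa = {"GE": 7.09, "Mindray": 10.66}
spread_percent = round(relative_spread(cutoffs_kpa) * 100, 1)
```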

Discussion

Noninvasive and accurate diagnosis of cACLD is necessary to optimize treatment outcomes. However, the lack of a universal cut-off for 2D-SWE limits its reliability and generalizability in identifying cACLD. Thus, the accuracy and robustness of DLRE-X and 2D-SWE for noninvasive diagnosis of cACLD were investigated. A total of 1937 patients with CLD who underwent 2D-SWE and liver biopsy at 17 centers in China, Japan, and Europe were included. Their data were used to develop and evaluate the DLRE-X model as well as to make comparisons with 2D-SWE. The results demonstrate that the model achieved significantly better diagnostic accuracy in the external test set. More importantly, when the robustness of identifying cACLD was compared using these two methods across different subanalyses, DLRE-X was found to achieve highly reproducible AUCs, whereas 2D-SWE exhibited significantly different AUCs in patients from different countries and regions or with different etiologies.

Although a considerable number of studies have assessed the effectiveness of using radiomics models to stage liver fibrosis [18–22], there are no real international data covering multiple manufacturers and etiologies to evaluate their potential impact on the performance of radiomics models. Durot et al. [37] previously proposed a machine learning-based model capable of analyzing point SWE and 2D-SWE images from two manufacturers (Siemens and Philips) to grade significant fibrosis (≥ F2) in patients with CLD of multiple etiologies. However, this was a single-center retrospective study with only 232 patients, using magnetic resonance elastography as the reference standard instead of pathological results. Therefore, this international multicenter study offers solid evidence that eliminates doubts regarding the generalizability of radiomics models.

Of note, the data of European patients were not included in the training and internal test sets but were only used for the independent external test set. One important reason for this was the relatively small sample size of European patients, which made it difficult to effectively support training, internal testing, and external testing simultaneously. Another reason was that such a design could verify whether DLRE-X could achieve an accurate and consistent diagnosis in European patients in the absence of European training data. The results showed that this was the case, suggesting that the model may be applicable to other regions/countries even if patients in those places were not involved in the model training process.

In addition, in the external test set, the AUC of the model was significantly better than that of 2D-SWE only in the Chinese subanalysis (0.87 vs 0.75, P = 0.003). No evidence of significant differences was observed in the European or Japanese subanalyses (both P > 0.05). Moreover, the AUC of 2D-SWE in China was much lower than those in Europe and Japan (China vs Europe vs Japan: 0.75 vs 0.93 vs 0.91, P < 0.001). This was probably because the Chinese subanalysis was subject to more confounding factors. It included 221 patients from three hospitals examined using US devices from three manufacturers, whereas the Japanese and European subanalyses included only 99 patients from one hospital using GE systems and 75 patients from three hospitals using SSI systems, respectively. When a subanalysis included a larger sample size and more hospitals and manufacturers, and was therefore closer to real-world conditions, the performance of 2D-SWE worsened, and the advantage of the model became more obvious, as shown in the Chinese subanalysis comparison. Such limitations of the LSM cut-offs are consistent with those reported in previous studies. A multicenter study by Degos et al. [38] showed a lower AUC than a single-center study by Cardoso et al. [39] (number of patients: 284 vs 202; AUC: 0.78 vs 0.87), even though both studies were performed in the same region using the same manufacturer.

Many other studies have also indicated that the adoption of LSM cut-offs for diagnosing cACLD is severely affected by various factors, including country and region, manufacturer, etiology, inflammation, and steatosis, yielding a wide range of thresholds with varied performances [5–9]. Thus, the worldwide promotion of this technique suffers from the lack of a universally reliable cut-off value [40, 41]. There are more than 1600 Grade IIIA hospitals in China that are responsible for managing more than 440 million patients with CLD in their routine clinical practice [42, 43], but they use US elastography devices from different manufacturers. Several multicenter clinical trials have been conducted to determine optimal LSM cut-offs for different manufacturers [44, 45]. However, such tremendous efforts have already been made in many other countries and regions and will probably have to be repeated whenever new US devices are released. The current study demonstrates that DLRE-X is an effective approach for solving this challenge. The model works for patients with any etiology of CLD and for any 2D-SWE system in any country. Once it is trained with sufficient data, ultrasound radiologists only need to perform a standardized manual selection of the ROI in the daily workflow to obtain an accurate and reliable diagnosis of cACLD.

The biggest challenge of this study was the design and development of DLRE-X because 2D-SWE images differ markedly between manufacturers. To establish this model, a state-of-the-art ConvNeXt [28], a dual-input (LSM and 2D-SWE ROI) design, a transfer learning strategy [32], and the ROI selection and data augmentation protocol inherited from the original and refined DLRE models [18, 19] were adopted. The impacts of single and dual inputs on the performance of the model were also compared, which showed that the dual-input design improved the overall diagnostic accuracy. Therefore, the integration of LSM and pseudo-color images for deep learning analysis is necessary and should be widely used in future clinical applications. The overall performance of the model will continue to improve as more data are added to the training. Such a self-evolution capability is not available in the conventional 2D-SWE method. Another important point is that diagnostic inconsistencies across different devices, etiologies, and regions are not exclusive to 2D-SWE; they also appear in other imaging modalities (such as magnetic resonance elastography). The DLRE-X model offers methodological viability for constructing artificial intelligence models that can be applied across devices, etiologies, and regions and may benefit artificial intelligence models for other imaging modalities.
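The dual-input idea can be illustrated with a minimal late-fusion sketch in Python. It assumes that an image branch (a ConvNeXt backbone in the study) has already reduced the 2D-SWE ROI to a feature vector, appends a scaled LSM value to that vector, and passes the result through a logistic output. The feature values, weights, bias, scaling constant, and function names are all hypothetical; the actual fusion layer and training procedure of DLRE-X are not specified here, so this should be read as a conceptual sketch only.

```python
import math


def fuse_inputs(image_features, lsm_kpa, lsm_scale=20.0):
    """Concatenate image-branch features with a scaled LSM value.

    lsm_scale is an assumed constant that brings the LSM (in kPa)
    to roughly the same range as the image features.
    """
    return list(image_features) + [lsm_kpa / lsm_scale]


def predict_probability(fused, weights, bias):
    """Logistic output on the fused vector (a stand-in for a trained head)."""
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have the same length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical image features for one ROI and hypothetical head parameters.
image_features = [0.2, -0.1, 0.7]
weights = [0.5, -0.3, 0.8, 1.5]  # last weight applies to the LSM input
bias = -1.0

# Same image features, two different LSM values.
p_low = predict_probability(fuse_inputs(image_features, 6.0), weights, bias)
p_high = predict_probability(fuse_inputs(image_features, 18.0), weights, bias)
print(round(p_low, 3), round(p_high, 3))
```

With identical image features, the predicted probability changes with the LSM input, which shows how the second input can contribute information that a single-input image model would not see.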

This study had several limitations. First, it was a retrospective study. Second, there were prevalence biases. Owing to the different disease prevalence in different regions, most patients with CHB were from China, whereas all European patients had MASLD. The ALT levels (U/L) were higher in this study than those in other studies (current study vs Zheng et al. [17]: 60.0, 95% CI: 56.2, 63.7 vs 43.0, 95% CI: 26.0, 87.5; P = 0.01) [25], which was probably a confounding factor affecting the performance of 2D-SWE. Third, the sample size was relatively small for countries/regions other than China. Finally, only three regions, three manufacturers, and three etiologies were included. A larger sample size, including more regions, manufacturers, and etiologies, is required in future studies.

In summary, a deep learning-based radiomics model achieved more accurate and robust performance in the noninvasive diagnosis of cACLD than 2D-SWE across different countries and regions, manufacturers, and etiologies. Future evaluations of this model should build on the current study by including more etiologies, US device manufacturers, and countries and regions.

Conclusions

The model achieved more accurate and robust performance in the noninvasive diagnosis of cACLD than 2D-SWE across different countries and regions, etiologies, and manufacturers.

Supplementary Information

Supplementary Material 1 (47.5 KB, docx)

Supplementary Material 2 (25.1 KB, docx)

Supplementary Material 3 (1.3 MB, docx)

Acknowledgements

We deeply appreciate Prof. Giovanna Ferraioli for facilitating the program communications between China and Europe. We thank all the patients involved in this study. The authors would like to acknowledge the instrumental and technical support of the Multimodal Biomedical Imaging Experimental Platform, Institute of Automation, Chinese Academy of Sciences.

Abbreviations

cACLD: Compensated advanced chronic liver disease
LSM: Liver stiffness measurement
CNN: Convolutional neural network
CHB: Chronic hepatitis B
CHC: Chronic hepatitis C
CLD: Chronic liver disease
MASLD: Metabolic dysfunction-associated steatotic liver disease
ROC: Receiver operating characteristic
AUC: Area under the receiver operating characteristic curve
2D-SWE: Two-dimensional shear wave elastography
SSI: SuperSonic Imagine
GE: General Electric
DLRE-X: Generalized deep learning radiomics of elastography
ROI: Region of interest
Conv2d: Two-dimensional convolutional layer
Dim: Dimension
ConvNeXt: Convolutional network for next generation
AST: Aspartate aminotransferase
BMI: Body mass index
ALT: Alanine aminotransferase
ALB: Albumin
TB: Total bilirubin
DB: Direct bilirubin
IB: Indirect bilirubin
GGT: Gamma-glutamyl transpeptidase
PT: Prothrombin activity
F0-1: No or minor fibrosis
F2: Clinically significant fibrosis
F3: Advanced fibrosis
F4: Cirrhosis

Authors’ contributions

KW, JR, PL, and XY contributed to the study concept and design; XL, HZ, HK, MG, VL, IG, RL, HD, JC, MW, CF, XR, CL, TS, FM, YZ, YF, SM, JW, XQ, and JT contributed to the data acquisition; XL and HZ contributed to the data analysis and drafting of the manuscript; all authors contributed to critical revision and approved the final manuscript.

Funding

The work is supported by the National Key Research and Development Program of China, No. 2023YFF1204600; the National Natural Science Foundation of China, Nos. 82302221, 81971632, 82441010, 92159305, 82272029; the Beijing Science Fund for Distinguished Young Scholars, No. JQ22013; the Guangdong Basic and Applied Basic Research Foundation, No. 2021A1515110553.

Data availability

The 2D-SWE images and datasets generated during or analyzed in this study are not publicly available due to restrictions by privacy laws. The 2D-SWE images, LSM values, and related clinical datasets are held by the Department of Ultrasound, Guangdong Key Laboratory of Liver Disease Research, Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China. Requests for sharing of all data and material should be addressed to the corresponding author within 15 years of the date of publication of this article and should include a scientific proposal. The datasets will be shared after approval by Prof. Ping Liang with a signed data access agreement.

Declarations

Competing interests

All authors declared that they do not have anything to disclose regarding funding or conflict of interest with respect to this manuscript.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xue Lu and Haoyan Zhang contributed equally to this work.

Contributor Information

Xin Yang, Email: xin.yang@ia.ac.cn.

Jie Ren, Email: renj@mail.sysu.edu.cn.

Ping Liang, Email: liangping301@126.com.

Kun Wang, Email: kun.wang@ia.ac.cn.

References

1. de Franchis R, Baveno VI Faculty (2015) Expanding consensus in portal hypertension: report of the Baveno VI Consensus Workshop: stratifying risk and individualizing care for portal hypertension. J Hepatol 63(3):743–752. 10.1016/j.jhep.2015.05.022
2. de Franchis R, Bosch J, Garcia-Tsao G, Reiberger T, Ripoll C, Baveno VII F (2022) Baveno VII - renewing consensus in portal hypertension. J Hepatol 76(4):959–974. 10.1016/j.jhep.2021.12.022
3. Miao L, Targher G, Byrne CD, Valenti L, Qi XL, Zheng MH (2022) Portal hypertension in nonalcoholic fatty liver disease: challenges and perspectives. Port Hypertens Cirrhosis 1:57–65. 10.1002/poh2.8
4. Asesio N, Pollo-Flores P, Caliez O, Munteanu M, Ngo A, Ngo Y et al (2022) Baveno VI criteria as a prognostic factor for clinical complications in patients with compensated cirrhosis. Dig Liver Dis 54(5):645–653. 10.1016/j.dld.2021.09.004
5. European Association for the Study of the Liver (2021) EASL Clinical Practice Guidelines on non-invasive tests for evaluation of liver disease severity and prognosis - 2021 update. J Hepatol 75(3):659–689. 10.1016/j.jhep.2021.05.025
6. Ferraioli G, Tinelli C, Malfitano A, Dal Bello B, Filice G, Filice C et al (2012) Performance of real-time strain elastography, transient elastography, and aspartate-to-platelet ratio index in the assessment of fibrosis in chronic hepatitis C. Am J Roentgenol 199(1):19–25. 10.2214/AJR.11.7517
7. Friedrich-Rust M, Nierhoff J, Lupsor M, Sporea I, Fierbinteanu-Braticevici C, Strobel D et al (2012) Performance of Acoustic Radiation Force Impulse imaging for the staging of liver fibrosis: a pooled meta-analysis. J Viral Hepat 19(2):e212–e219. 10.1111/j.1365-2893.2011.01537.x
8. Bende F, Sporea I, Sirli R, Popescu A, Mare R, Miutescu B et al (2017) Performance of 2D-SWE.GE for predicting different stages of liver fibrosis, using Transient Elastography as the reference method. Med Ultrason 19(2):143–149. 10.11152/mu-910
9. Joo I, Kim SY, Park HS, Lee ES, Kang HJ, Lee JM (2019) Validation of a new point shear-wave elastography method for noninvasive assessment of liver fibrosis: a prospective multicenter study. Korean J Radiol 20(11):1527–1535. 10.3348/kjr.2019.0109
10. Wang JF, Wu ML, Linghu RZ, Chang JD, Wu M, Feng C et al (2022) Usefulness of new shear wave elastography technique for noninvasive assessment of liver fibrosis in patients with chronic hepatitis B: A prospective multicenter study. Ultraschall Med 43(2):e1–e10. 10.1055/a-1376-6734
11. Chon YE, Jin YJ, An J, Kim HY, Choi M, Jun DW et al (2024) Optimal cut-offs of vibration-controlled transient elastography and magnetic resonance elastography in diagnosing advanced liver fibrosis in patients with nonalcoholic fatty liver disease: a systematic review and meta-analysis. Clin Mol Hepatol 30(Suppl):S117–S133. 10.3350/cmh.2024.0392
12. Hall TJ, Milkowski A, Garra B, Carson P, Palmeri M, Nightingale K et al (2013) RSNA/QIBA: shear wave speed as a biomarker for liver fibrosis staging. In: Proceedings of 2013 IEEE international ultrasonics symposium, IEEE, Prague, 21–25 July 2013. 10.1109/ULTSYM.2013.0103
13. Dietrich CF, Bamber J, Berzigotti A, Bota S, Cantisani V, Castera L et al (2017) EFSUMB Guidelines and recommendations on the clinical use of liver ultrasound elastography, update 2017 (Long Version). Ultraschall Med 38(4):e16–e47. 10.1055/s-0043-103952
14. Ferraioli G, Wong VWS, Castera L, Berzigotti A, Sporea I, Dietrich CF et al (2018) Liver ultrasound elastography: an update to the world federation for ultrasound in medicine and biology guidelines and recommendations. Ultrasound Med Biol 44(12):2419–2440. 10.1016/j.ultrasmedbio.2018.07.008
15. Ferraioli G, Barr RG, Berzigotti A, Sporea I, Wong VWS, Reiberger T et al (2024) WFUMB guideline/guidance on liver multiparametric ultrasound: Part 1. Update to 2018 guidelines on liver ultrasound elastography. Ultrasound Med Biol 50(8):1071–1087. 10.1016/j.ultrasmedbio.2024.03.013
16. Yoo HW, Kim SG, Jang JY, Yoo JJ, Jeong SW, Kim YS et al (2022) Two-dimensional shear wave elastography for assessing liver fibrosis in patients with chronic liver disease: a prospective cohort study. Korean J Intern Med 37(2):285–293. 10.3904/kjim.2020.635
17. Zheng J, Guo HY, Zeng J, Huang ZP, Zheng BW, Ren J et al (2015) Two-dimensional shear-wave elastography and conventional US: the optimal evaluation of liver fibrosis and cirrhosis. Radiology 275(1):290–300. 10.1148/radiol.14140828
18. Wang K, Lu X, Zhou H, Gao YY, Zheng J, Tong MH et al (2019) Deep learning Radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut 68(4):729–741. 10.1136/gutjnl-2018-316204
19. Lu X, Zhou H, Wang K, Jin JY, Meng FK, Mu XJ et al (2021) Comparing radiomics models with different inputs for accurate diagnosis of significant fibrosis in chronic liver disease. Eur Radiol 31(11):8743–8754. 10.1007/s00330-021-07934-6
20. Kagadis GC, Drazinos P, Gatos I, Tsantis S, Papadimitroulas P, Spiliopoulos S et al (2020) Deep learning networks on chronic liver disease assessment with fine-tuning of shear wave elastography image sequences. Phys Med Biol 65(21):215027. 10.1088/1361-6560/abae06
21. Dana J, Venkatasamy A, Saviano A, Lupberger J, Hoshida Y, Vilgrain V et al (2022) Conventional and artificial intelligence-based imaging for biomarker discovery in chronic liver disease. Hepatol Int 16(3):509–522. 10.1007/s12072-022-10303-0
22. Lee JH, Joo I, Kang TW, Paik YH, Sinn DH, Ha SY et al (2020) Deep learning with ultrasonography: automated classification of liver fibrosis using a deep convolutional neural network. Eur Radiol 30(2):1264–1273. 10.1007/s00330-019-06407-1
23. Barr RG, Wilson SR, Rubens D, Garcia-Tsao G, Ferraioli G (2020) Update to the society of radiologists in ultrasound liver elastography consensus statement. Radiology 296(2):263–274. 10.1148/radiol.2020192437
24. Trebicka J, Gu WY, de Ledinghen V, Aubé C, Krag A, Praktiknjo M et al (2022) Two-dimensional shear wave elastography predicts survival in advanced chronic liver disease. Gut 71(2):402–414. 10.1136/gutjnl-2020-323419
25. Papatheodoridi M, Hiriart JB, Lupsor-Platon M, Bronte F, Boursier J, Elshaarawy O et al (2021) Refining the Baveno VI elastography criteria for the definition of compensated advanced chronic liver disease. J Hepatol 74(5):1109–1116. 10.1016/j.jhep.2020.11.050
26. Liu Z, Mao HZ, Wu CY, Feichtenhofer C, Darrell T, Xie SN (2022) A convnet for the 2020s. In: Proceedings of 2022 IEEE/CVF conference on computer vision and pattern recognition, IEEE, New Orleans, 18–24 June 2022. 10.1109/CVPR52688.2022.01167
27. He KM, Zhang XY, Ren SQ, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of 2016 IEEE conference on computer vision and pattern recognition, IEEE, Las Vegas, 27–30 June 2016. 10.1109/CVPR.2016.90
28. Xie SN, Girshick R, Dollár P, Tu ZW, He KM (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of 2017 IEEE conference on computer vision and pattern recognition, IEEE, Honolulu, 21–26 July 2017. 10.1109/CVPR.2017.634
29. Sandler M, Howard A, Zhu ML, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of 2018 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Salt Lake City, 18–23 June 2018. 10.1109/CVPR.2018.00474
30. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems, Curran Associates Inc., Long Beach, 4–9 December 2017
31. Liu Z, Lin YT, Cao Y, Hu H, Wei YX, Zhang Z et al (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of 2021 IEEE/CVF international conference on computer vision, IEEE, Montreal, 10–17 October 2021. 10.1109/ICCV48922.2021.00986
32. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359. 10.1109/TKDE.2009.191
33. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Proceedings of the 33rd conference on neural information processing systems, NeurIPS, Vancouver, 8–14 December 2019
34. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of 2009 IEEE conference on computer vision and pattern recognition, IEEE, Miami, 20–25 December 2009. 10.1109/CVPR.2009.5206848
35. Xue LY, Jiang ZY, Fu TT, Wang QM, Zhu YL, Dai M et al (2020) Transfer learning radiomics based on multimodal ultrasound imaging for staging liver fibrosis. Eur Radiol 30(5):2973–2983. 10.1007/s00330-019-06595-w
36. Yan R, Zhang F, Rao XS, Lv ZL, Li JT, Zhang LL et al (2021) Richer fusion network for breast cancer classification based on multimodal data. BMC Med Inform Decis Mak 21(Suppl 1):134. 10.1186/s12911-020-01340-6
37. Durot I, Akhbardeh A, Sagreiya H, Loening AM, Rubin DL (2020) A new multimodel machine learning framework to improve hepatic fibrosis grading using ultrasound elastography systems from different vendors. Ultrasound Med Biol 46(1):26–33. 10.1016/j.ultrasmedbio.2019.09.004
38. Degos F, Perez P, Roche B, Mahmoudi A, Asselineau J, Voitot H et al (2010) Diagnostic accuracy of FibroScan and comparison to liver fibrosis biomarkers in chronic viral hepatitis: a multicenter prospective study (the FIBROSTIC study). J Hepatol 53(6):1013–1021. 10.1016/j.jhep.2010.05.035
39. Cardoso AC, Carvalho-Filho RJ, Stern C, Dipumpo A, Giuily N, Ripault MP et al (2012) Direct comparison of diagnostic performance of transient elastography in patients with chronic hepatitis B and chronic hepatitis C. Liver Int 32(4):612–621. 10.1111/j.1478-3231.2011.02660.x
40. Lazarus JV, Castera L, Mark HE, Allen AM, Adams LA, Anstee QM et al (2023) Real-world evidence on non-invasive tests and associated cut-offs used to assess fibrosis in routine clinical practice. JHEP Rep 5(1):100596. 10.1016/j.jhepr.2022.100596
41. Zeng J, Zheng J, Jin JY, Mao YJ, Guo HY, Lu MD et al (2019) Shear wave elastography for liver fibrosis in chronic hepatitis B: adapting the cut-offs to alanine aminotransferase levels improves accuracy. Eur Radiol 29(2):857–865. 10.1007/s00330-018-5621-x
42. Qiu HB, Cao SM, Xu RH (2021) Cancer incidence, mortality, and burden in China: a time-trend analysis and comparison with the United States and United Kingdom based on the global epidemiological data released in 2020. Cancer Commun (Lond) 41(10):1037–1048. 10.1002/cac2.12197
43. Cao W, Chen HD, Yu YW, Li N, Chen WQ (2021) Changing profiles of cancer burden worldwide and in China: a secondary analysis of the global cancer statistics 2020. Chin Med J 134(7):783–791. 10.1097/CM9.0000000000001474
44. Bâldea V, Bende F, Popescu A, Șirli R, Sporea I (2021) Comparative study between two 2D-Shear Waves Elastography techniques for the non-invasive assessment of liver fibrosis in patients with chronic hepatitis C virus (HCV) infection. Med Ultrason 23(3):257–264. 10.11152/mu-2863
45. Paternostro R, Reiberger T, Bucsics T (2019) Elastography-based screening for esophageal varices in patients with advanced chronic liver disease. World J Gastroenterol 25(3):308–329. 10.3748/wjg.v25.i3.308

