Interactive Explainable Deep Learning Model for Hepatocellular Carcinoma Diagnosis at Gadoxetic Acid–enhanced MRI: A Retrospective, Multicenter, Diagnostic Study

Mingkai Li; Zhi Zhang; Zebin Chen; Xi Chen; Huaqing Liu; Yuanqiang Xiao; Haimei Chen; Xiaodan Zong; Jingbiao Chen; Jianning Chen; Xinying Wang; Xuehong Xiao; Zhiwei Yang; Lanqing Han; Jin Wang; Bin Wu

doi:10.1148/rycan.240332

2025 May 30;7(3):e240332.


PMCID: PMC12130696 PMID: 40445095

Abstract

Purpose

To develop an artificial intelligence (AI) model based on gadoxetic acid–enhanced MRI to assist radiologists in hepatocellular carcinoma (HCC) diagnosis.

Materials and Methods

This retrospective study included patients with focal liver lesions (FLLs) who underwent gadoxetic acid–enhanced MRI between January 2015 and December 2021. All hepatic malignancies were diagnosed pathologically, whereas benign lesions were confirmed with pathologic findings or imaging follow-up. Five manually labeled bounding boxes for each FLL obtained from precontrast T1-weighted, T2-weighted, arterial phase, portal venous phase, and hepatobiliary phase images were included. The lesion classifier component, used to distinguish HCC from non-HCC, was trained and externally tested. The feature classifier, based on a post hoc algorithm, inferred the presence of the Liver Imaging Reporting and Data System (LI-RADS) features by analyzing activation patterns of the pretrained lesion classifier. Two radiologists categorized FLLs in the external testing dataset according to LI-RADS criteria. Diagnostic performance of the AI model and the model’s impact on reader accuracy were assessed.

Results

The study included 839 patients (mean age, 51 years ± 12 [SD]; 681 male) with 1023 FLLs (594 HCCs and 429 non-HCCs). The AI model yielded areas under the receiver operating characteristic curve of 0.98 and 0.97 in the training set and external testing set, respectively. Compared with LI-RADS category 5, the AI model showed higher sensitivity (91.6% vs 74.8%; P < .001) and similar specificity (90.7% vs 96.0%; P = .22). The two readers identified more LI-RADS major features and more accurately classified category LR-5 lesions when assisted versus unassisted by AI, with higher sensitivities (reader 1, 85.7% vs 72.3%; P < .001; reader 2, 89.1% vs 74.0%; P < .001) and unchanged specificities (reader 1, 93.3%; reader 2, 94.7%; P > .99 for both).

Conclusion

The AI model accurately diagnosed HCC and improved the radiologists’ diagnostic performance.

Keywords: Artificial Intelligence, Deep Learning, MRI, Hepatocellular Carcinoma

Supplemental material is available for this article.

See also commentary by Singh et al in this issue.


Summary

An artificial intelligence model developed using gadoxetic acid–enhanced MRI effectively diagnosed hepatocellular carcinoma. Radiologists assisted by the model, which included a post hoc Liver Imaging Reporting and Data System feature identification tool, had improved sensitivity.

Key Points

■ In a retrospective study of 839 patients with 1023 focal liver lesions (594 hepatocellular carcinomas [HCCs] and 429 non-HCCs), the deep learning–based artificial intelligence (AI) model allowed accurate diagnosis of HCC with an area under the receiver operating characteristic curve of 0.97 in the test set (119 HCCs and 75 non-HCCs).
■ Compared with Liver Imaging Reporting and Data System category 5 using the consensus data, the AI model showed higher sensitivity (91.6% vs 74.8%) and similar specificity (90.7% vs 96.0%).
■ Compared with Liver Imaging Reporting and Data System category 5 by individual readers, the AI-assisted strategy improved the sensitivities of readers 1 and 2 from 72.3% to 85.7% and from 74.0% to 89.1%, respectively, with unchanged specificities of 93.3% and 94.7%, respectively.

Introduction

Primary liver cancer is the sixth most frequently diagnosed cancer and the third leading cause of cancer-related death worldwide (1). Around 75%–85% of liver cancer cases are hepatocellular carcinoma (HCC), with a 5-year survival rate of only 18%. Patients with early-stage HCC can undergo potentially curative treatments like surgery and local ablation (2). Thus, there is an urgent need for reliable methods to improve the accurate diagnosis of HCC and enhance patient prognosis.

HCC can be routinely diagnosed based on validated imaging criteria, including CT and MRI examinations, in patients at high risk. The development of HCC is associated with reduced expression of organic anionic transporting polypeptide and the formation of abnormal arterioles. Gadoxetic acid–enhanced MRI provides valuable information regarding not only blood flow but also organic anionic transporting polypeptide changes in the liver. It also offers high contrast between lesions and normal liver tissue, making it superior to CT or MRI with extracellular contrast agents for detecting small HCCs (3,4). The Liver Imaging Reporting and Data System (LI-RADS) algorithm, integrated into the 2018 HCC clinical practice guidance, allows the diagnosis of definite HCC (LI-RADS category 5 [LR-5]) without the need for a biopsy (5). Although gadoxetic acid–enhanced MRI is highly specific (94%), it has relatively lower sensitivity (55%) (6). Additionally, the subjective evaluation of imaging features can lead to differences in interpretation among radiologists (7). This variability can result in unnecessary biopsies, carrying risks like tumor seeding and sampling errors (8).

The era of big data has led to the increasing interest and development of artificial intelligence (AI) models to improve clinical workflow. Radiology workflow is expected to incorporate AI technology to reduce medical costs and improve efficiency (9,10). Several studies have explored deep learning (DL) models for distinguishing liver masses using dynamic contrast-enhanced CT or MRI with extracellular contrast agents, achieving high accuracy (11–20). Such AI tools may lead to substantial improvement in diagnostic management of HCC. Many models performed better than or comparable to experienced radiologists (21). However, in real-world clinical scenarios, radiologists are responsible for the final diagnosis. To date, no studies have evaluated AI models for HCC diagnosis at gadoxetic acid–enhanced MRI and compared performance between AI, radiologists, and computer-aided diagnosis in the same patient cohort. The potential for computer-aided diagnosis to enhance diagnostic accuracy warrants further investigation.

Our study aimed to create a DL-based AI tool, comprising a lesion classifier and feature classifier, to distinguish between HCC and non-HCC and provide visual explanations for the model’s decision based on LI-RADS features. We evaluated the diagnostic performance of this AI tool on an external test set and assessed its impact on reader accuracy.

Materials and Methods

Study Patients and Data Collection

This was a multicenter, retrospective, diagnostic study of patients from five independent hospitals in China (center 1, Third Affiliated Hospital of Sun Yat-sen University; center 2, Third Affiliated Hospital of Sun Yat-sen University Yuedong Hospital; center 3, First Affiliated Hospital of Sun Yat-sen University; center 4, Zhujiang Hospital of Southern Medical University; center 5, Zhongshan City People’s Hospital), with approvals from the institutional review boards of each participating hospital. Patient consent was waived due to the retrospective nature of the study. Gadoxetic acid–enhanced MR images obtained from center 1 between January 2015 and December 2021 were used for AI model development. Data for the external testing dataset were collected from center 2 between January 2015 and December 2017 and centers 3–5 between May 2020 and May 2021. Patients were included if they fulfilled the following criteria: were at least 18 years old, were at high risk for HCC per the LI-RADS criteria (current or prior HCC, cirrhosis, or chronic hepatitis B), and underwent gadoxetic acid–enhanced MRI of at least one solid focal liver lesion (FLL). The exclusion criteria were as follows: chronic liver disease due to vascular disorders or cirrhosis due to congenital hepatic fibrosis, since LI-RADS is not applicable to these conditions; lack of definite diagnosis (ie, immediate local-regional treatment or systemic therapy without pathologic confirmation, histopathologic diagnosis more than 2 months after MRI examination, inconclusive histopathologic diagnosis due to inadequate biopsy sample, or insufficient follow-up [<2 years] to determine size stability); and inadequate image quality (ie, severe motion artifacts occurred frequently on arterial phase [AP]) (Fig 1). We reviewed medical records of each patient included in the study to extract the following data: age, sex, HCC risk factors, α-fetoprotein level, and the required laboratory data to calculate the Child-Pugh score.

Flowchart of patient inclusion and exclusion. Patients with hepatocellular carcinoma (HCC) coexisting with non-HCCs, lesions with histologic evidence, or hemangioma and focal nodular hyperplasia with typical imaging features would also be included for artificial intelligence model development and validation. The goal of the deep learning model was to discriminate between HCCs and non-HCCs. LI-RADS = Liver Imaging Reporting and Data System.

MRI Protocol

MRI scans were acquired according to the LI-RADS guidelines (5). Gadoxetic acid (Primovist; Bayer Schering Pharma) was injected intravenously as a bolus at a flow rate of 1 mL/sec, followed by a 20-mL saline flush. Late AP (20–40 seconds), portal venous phase (PVP, 50–75 seconds), transitional phase (2–5 minutes), and hepatobiliary phase (HBP, 20 minutes) were acquired after contrast material injection according to LI-RADS version 2018 criteria. Details of the MRI scan sequences and parameters are summarized in Table S1.

LI-RADS Scoring

One author (H.C., with more than 5 years of experience in abdominal imaging) not involved in image analysis documented individual FLLs, noted their location and size, and created a list for further assessment. Readers 1 and 2 (X.C. and Y.X., with 8 and 4 years of experience in abdominal imaging, respectively) were then invited to analyze the images independently. They assessed imaging features (major, ancillary, and targetoid) and assigned LI-RADS version 2018 categories to each FLL, all while remaining blind to imaging reports, pathologic results, and clinical data. In cases of disagreement, consensus was reached with a third reader (Jingbiao Chen, with 12 years of experience in liver MRI) to resolve discrepancies.

Image Processing

MR images were exported from datasets of the five hospitals in Digital Imaging and Communications in Medicine format. A registration process, comprising affine registration followed by B-spline registration via a previously developed algorithm based on Elastix (12,13), with a target registration error of approximately 3 mm, was applied to align precontrast T1-weighted, T2-weighted, AP, and PVP images to the HBP images. One author (X.Z., with more than 10 years of experience in hepatic MRI), blinded to all the clinical and pathologic findings, performed image annotation with reference to the list displaying the anatomic location and size of all the target nodules (n = 1023). The image annotation process involved manually labeling a bounding box on the section where the tumor diameter was largest on the clearest sequence. ITK-SNAP software, version 3.8.0 (http://www.itksnap.org), was used to define the starting and ending sections, constructing a three-dimensional (3D) bounding box. Five 3D bounding boxes were obtained for each lesion after the affine registration. The radiologist also reviewed and made manual corrections to ensure proper alignment of the boxes in all phases (Fig 2). The area defined by the bounding boxes created by X.Z. was used for development of the AI model. The 3D bounding boxes were resized to n × 256 × 256 pixels, where n represents the number of layers. A series of intensity projections, including maximum, minimum, and average intensity projections, was applied for each phase from a top-down perspective. Apart from the features extracted from the images processed with maximum, minimum, and average intensity projections, we also extracted the middle section’s grayscale values within the 3D bounding box (constructed from the defined starting and ending sections) and computed the subtraction derived from the maximum and minimum intensity projections as features (Fig S1). 
These five features were stacked to form the 5 × 256 × 256 input features, which were augmented to generate a larger, more complicated and diverse dataset through standard image augmentation, such as random rotation (0°–360°), scaling (0.8 times–1.2 times), and flipping (ratio 0–0.1). These common augmentation techniques help increase the variability of the dataset and reduce overfitting by simulating real-world variability in lesion orientation, size, and positioning (22).
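The five per-phase features described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code (which is in the linked repository): the function name `projection_features`, the channel order, and the random example volume are all assumptions, and the box is assumed to be already resized to n × 256 × 256.

```python
import numpy as np

def projection_features(volume: np.ndarray) -> np.ndarray:
    """Collapse an (n, H, W) lesion volume into five (H, W) feature maps."""
    mip = volume.max(axis=0)               # maximum intensity projection
    minip = volume.min(axis=0)             # minimum intensity projection
    aip = volume.mean(axis=0)              # average intensity projection
    middle = volume[volume.shape[0] // 2]  # middle section of the 3D box
    diff = mip - minip                     # subtraction of max and min projections
    # Channel order is an assumption made for this sketch.
    return np.stack([mip, minip, aip, middle, diff]).astype(np.float32)

# Hypothetical example: a 7-section box already resized to 256 x 256.
rng = np.random.default_rng(0)
volume = rng.random((7, 256, 256), dtype=np.float32)
features = projection_features(volume)
print(features.shape)  # (5, 256, 256)
```

Projecting along the section axis gives every lesion a fixed-size input regardless of how many sections its bounding box spans.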

Development and evaluation of the deep learning framework. (A) Process of image annotation. A bounding box was manually labeled on the layer with the largest tumor appearance on the clearest sequence (usually hepatobiliary phase), and the starting and ending layers were then input to build a three-dimensional bounding box. After the box was replicated on the other four registered phases and alignment was manually checked, five three-dimensional bounding boxes were generated for each lesion. (B) Deep learning framework. For each lesion, raw data within three-dimensional bounding boxes were processed with a series of vertical intensity projections, including maximum intensity projection, minimum intensity projection, average intensity projection, and the subtraction between the maximum and minimum intensity projections. Using ResNet-50 (Linux Foundation), the median layer and the processed intensity projection data from each phase were included for the hepatocellular carcinoma (HCC) diagnosis–oriented artificial intelligence (AI) model development. To determine an optimal combination of the five phases, AI models using different phases were compared with consensus reading of an expert reader. Model A represents an AI model with arterial phase and portal venous phase images. Model B represents an AI model with arterial phase, portal venous phase, and hepatobiliary phase images. Model C represents an AI model with precontrast T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), arterial phase, portal venous phase, and hepatobiliary phase images. Diagnostic results by the Liver Imaging Reporting and Data System (LI-RADS) category 5 were used to represent the LI-RADS version 2018 performances. (C) Flowchart of the approach for lesion and feature classifier development and validation.

To further enhance the model generalizability, random discard and mixup were applied (23,24). Random discard sets a randomly selected run of consecutive volumes, covering 0%–60% of each 5 × 256 × 256 tensor, to zero. For more information, see Appendix S1.
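A minimal sketch of the two regularization steps follows. The text does not specify how the consecutive volumes are laid out, so the sketch assumes one contiguous run over the flattened tensor. The mixup weight is drawn from a Beta(alpha, alpha) distribution as in the usual formulation, with `alpha=0.2` a hypothetical value rather than one reported in the study.

```python
import numpy as np

def random_discard(x, rng, max_fraction=0.6):
    """Zero one contiguous run covering 0-60% of the flattened tensor.

    Treating the tensor as flattened is an assumption of this sketch.
    """
    flat = x.copy().ravel()
    length = int(rng.uniform(0.0, max_fraction) * flat.size)
    start = rng.integers(0, flat.size - length + 1)
    flat[start:start + length] = 0.0
    return flat.reshape(x.shape)

def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Blend two samples and their labels with one Beta(alpha, alpha) weight."""
    lam = rng.beta(alpha, alpha)  # alpha is a hypothetical setting
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

rng = np.random.default_rng(0)
hcc_sample = np.ones((5, 256, 256), dtype=np.float32)       # label 1 = HCC
non_hcc_sample = np.full((5, 256, 256), 2.0, dtype=np.float32)  # label 0 = non-HCC

discarded = random_discard(hcc_sample, rng)
mixed_x, mixed_y = mixup(hcc_sample, 1.0, non_hcc_sample, 0.0, rng)
```

Both steps leave the tensor shape unchanged, so they can be applied after the geometric augmentations without altering the model input size.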

Lesion Classifier Training and Validation

Our DL algorithm was designed to distinguish HCC from non-HCC. The input to the model is a four-dimensional dataset (specifically five phases × five feature extraction methods × image height × image width), for which each data point consists of 5 × 256 × 256 features derived from a single phase of MRI scans. These features are generated from five different intensity projection techniques applied to each phase: precontrast T1-weighted imaging, T2-weighted imaging, AP, PVP, and HBP. Each phase thus contributes a five-feature set, resulting in a four-dimensional input structure that captures the multifaceted characteristics of the lesions across different phases. The two-dimensional ResNet-50 model was adapted to process these four-dimensional inputs by applying the network to each 1 × 256 × 256 section individually within the five-feature set of a single phase. This approach allows for the extraction of image features from each section while reducing the impact of gradient vanishing through the use of residual learning blocks. The model was trained on the datasets from center 1 using 10-fold cross-validation. The model with the smallest validation error was chosen as the final model for each fold, with 10 total models selected. Each iteration of the 10-fold cross-validation was evaluated by averaging the test accuracy across folds at the end of each iteration. The average area under the receiver operating characteristic curve (AUC) and average sensitivity, specificity, and accuracy with a threshold probability of 0.5 were calculated as performance metrics for the training set. These models were tested on the external testing dataset by a simple voting ensemble. To determine an optimal combination of the five phases, AI models using different phases were compared with radiologists’ performance after consensus reading (Fig 2). LR-5 classifications by consensus reading were used to represent the LI-RADS version 2018 performances. 
The best AI model was selected for further analysis. For more information, see Appendix S1. The code used for training and inference of the DL systems is available online (https://github.com/Huatsing-Lau/LiverIPNet).
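The voting step over the 10 fold models can be illustrated as follows. This is a sketch under assumptions: "simple voting" is read here as a majority vote over per-model predictions thresholded at 0.5, the function name `vote_hcc` is hypothetical, and the tie rule (a 5 vs 5 split is called non-HCC) is chosen only for illustration because the text does not state one.

```python
def vote_hcc(fold_probabilities, threshold=0.5):
    """Majority vote over per-fold HCC probabilities for one lesion."""
    votes = sum(p >= threshold for p in fold_probabilities)
    # Assumed tie rule: HCC needs a strict majority of the fold models.
    return votes > len(fold_probabilities) / 2

# Hypothetical probabilities from the 10 cross-validation models.
probabilities = [0.91, 0.88, 0.47, 0.95, 0.73, 0.52, 0.39, 0.81, 0.66, 0.58]
print(vote_hcc(probabilities))  # True (8 of 10 models vote HCC)
```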

Feature Classifier Training and Validation

With reference to the prior workflow (25), selected images with annotated features were fed to the pretrained lesion classifier for feature identification with probabilistic inference. Five imaging features in the LI-RADS lexicon (nonrim AP hyperenhancement [APHE], washout, enhancing capsule, mild to moderate T2 hyperintensity, and HBP hypointensity) were selected because they are observable on gadoxetic acid–enhanced MRI scans and commonly used in day-to-day radiologic practice. For feature classifier development, a subset of 60 FLLs was assembled, with the aim of each feature being represented in at least 15 cases (Table S2). These five features were also annotated by the consensus reader in the external testing dataset. For more information, see Appendix S1.

Evaluation of the AI Tool

We tested the selected lesion classifier in two stages. First, we evaluated the AI model’s performance on FLLs of different sizes, various LI-RADS categories, and specific lesion types. Second, we assessed how the model improved the diagnostic performance of individual radiologists in the external testing dataset. Two radiologists (X.C. and Y.X.), responsible for LI-RADS scoring and unaware of clinical data, annotated all the FLLs in the external testing dataset in the first round. Before the second round of AI-assisted diagnosis, we informed the radiologists of their diagnostic performance as compared with AI. For cases in which there were discrepancies between AI and radiologist diagnoses (specifically, AI predicting HCC diagnosis while humans assigned non–LR-5 categories, or AI predicting non-HCC diagnosis while humans assigned LR-5 categories), the same readers conducted AI-assisted diagnoses 1 month after LI-RADS scoring to prevent recall bias. During AI-assisted readings, radiologists were provided with bounding boxes overlaid on images, including a diagnosis of HCC or non-HCC with estimated probability. They also had access to feature identification through probabilistic inference (Fig 3). We kept records of the initial and final assisted diagnoses for further analysis.
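The rule that selected cases for the second, AI-assisted round can be written as a single comparison. The sketch below is illustrative only: the function name and the string encoding of LI-RADS categories are assumptions.

```python
def needs_ai_assisted_review(ai_predicts_hcc: bool, reader_category: str) -> bool:
    """Flag a lesion when the AI call and the reader's LR-5 call disagree."""
    reader_says_lr5 = reader_category == "LR-5"
    # Discrepancy: AI says HCC but reader assigned non-LR-5, or the reverse.
    return ai_predicts_hcc != reader_says_lr5

print(needs_ai_assisted_review(True, "LR-4"))   # True
print(needs_ai_assisted_review(False, "LR-5"))  # True
print(needs_ai_assisted_review(True, "LR-5"))   # False
```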

Overview of the artificial intelligence (AI)–assisted workflow. HCC = hepatocellular carcinoma, LI-RADS = Liver Imaging Reporting and Data System.

Reference Standard

Histopathologic examinations determined the diagnosis of each malignant lesion, including both HCC and non-HCC malignancies. An experienced liver pathologist (Jianning Chen, with more than 15 years of experience) verified the histologic diagnosis. Benign lesions were diagnosed by pathology-proven diagnosis or composite clinical reference standard diagnosis, defined as the use of imaging criteria and imaging follow-up (stability or regression for at least 2 years) to establish the diagnosis.

Statistical Analysis

Categorical variables were compared using the χ² test or Fisher exact test, or the McNemar test for paired categorical data, as appropriate. For continuous variables, the Student t test was used if they followed a normal distribution, while the Mann-Whitney U test was used for nonnormally distributed data or ranked data (Child-Pugh classification). For comparisons involving more than two groups, the Kruskal-Wallis H test was applied. These analyses were performed using SPSS, version 16.0 (IBM). The roc function in R package pROC (version 4.2.1; R Foundation) was used to perform receiver operating characteristic curve analyses. The accuracy, sensitivity, and specificity of the AI diagnosis for HCC were also determined, with 95% CIs reported for each measure. The DeLong test was applied to compare the AUC values of AI models using different phases. In addition, Brier scores were reported, summing the magnitude of error in probability forecasts between 0.0 and 1.0, in which a model with perfect calibration would score 0 (26).
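As an illustration of two quantities named above, the sketch below computes a 95% CI for a proportion and a Brier score. The text does not name the CI method; the Wilson score interval is assumed here because it reproduces the interval reported in the Results for model C sensitivity (109 of 119). The Brier score uses the standard definition, the mean squared difference between predicted probability and observed outcome.

```python
import math

def wilson_interval(successes: int, total: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

def brier_score(probabilities, outcomes):
    """Mean squared error of probability forecasts; 0 is a perfect score."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)

low, high = wilson_interval(109, 119)
print(f"{100 * low:.1f}, {100 * high:.1f}")  # 85.2, 95.4
```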

Interreader agreement for the major imaging features and LI-RADS category assignments was assessed using the Cohen κ statistic. κ values of 0.01–0.20 indicated slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–0.99, excellent agreement. The McNemar test or Fisher exact test was performed to compare the sensitivity and specificity between AI models using different phases and LR-5 as well as between the two independent readers alone versus the AI-assisted strategy. P < .05 indicated statistical significance.
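For reference, the Cohen κ statistic compares observed agreement with the agreement expected by chance from each reader's marginal frequencies. The sketch below is a plain-Python illustration with hypothetical reader calls, not the study data or the statistical software used.

```python
def cohen_kappa(reader_a, reader_b):
    """Cohen kappa for two readers rating the same lesions."""
    n = len(reader_a)
    labels = set(reader_a) | set(reader_b)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's marginal label frequencies.
    expected = sum(
        (reader_a.count(label) / n) * (reader_b.count(label) / n)
        for label in labels
    )
    # Undefined when expected == 1 (both readers always use one label).
    return (observed - expected) / (1.0 - expected)

# Hypothetical calls for one feature on four lesions.
reader_1 = ["present", "present", "absent", "absent"]
reader_2 = ["present", "present", "absent", "present"]
print(cohen_kappa(reader_1, reader_2))  # 0.5
```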

Results

Patient Characteristics

Among the 1960 adult patients at high risk for HCC, 1121 were excluded (Fig 1). A total of 839 patients (mean age, 51 years ± 12 [SD]; 681 male, 158 female) were included. The clinical characteristics of patients and FLLs are summarized in Table 1. No evidence of differences in age, sex, hepatitis B virus infection, presence of liver cirrhosis, Child-Pugh classification, and α-fetoprotein level was found between the training set and external testing dataset. The lesion size and final diagnoses were similarly distributed between the training and testing dataset. In the training set, 475 FLLs were diagnosed with HCC, and 354 FLLs were non-HCC. In the external test set, 119 FLLs were diagnosed with HCC, and 75 FLLs were non-HCC. All the malignancies were diagnosed with histopathologic examination. Among 384 benign FLLs, 140 FLLs (119 hemangiomas and 21 focal nodular hyperplasias) were diagnosed by clinical composite reference standard, while the remaining 244 FLLs were pathologically confirmed. Details of the FLLs in the training set and external testing dataset are summarized in Table S3.

Table 1:

Characteristics of Patients and Lesions in Training and External Test Sets


Interreader Reliability

Of the four major imaging features, there was substantial interreader agreement for identification of nonrim APHE (κ = 0.787), non-peripheral washout (κ = 0.783), and enhancing capsule (κ = 0.730). The feature “threshold growth” was not included because prior examinations were not available. Substantial agreement was also observed between the two reviewers for LR-5 classification (κ = 0.803) (Table S4).

Diagnostic Performances of the AI Models with Precontrast T1-weighted Imaging, T2-weighted Imaging, AP, PVP, and HBP

Among models using different phases, the AI model using precontrast T1-weighted imaging, T2-weighted imaging, AP, PVP, and HBP (model C) achieved the highest AUC (0.97), followed by the AI model using AP, PVP, and HBP (model B: AUC, 0.95) and the AI model using AP and PVP (model A: AUC, 0.91) (Fig 4). The DeLong test showed that the AUC of the AI model with five phases was significantly higher than those of the other two models (model C vs model A, P < .001; model C vs model B, P = .003). Model C had higher sensitivity (91.6% [109 of 119; 95% CI: 85.2, 95.4] vs 84.9% [101 of 119; 95% CI: 77.4, 90.2]; P = .04), specificity (90.7% [68 of 75; 95% CI: 82.0, 95.4] vs 76.0% [57 of 75; 95% CI: 65.2, 84.3]; P < .001), and precision (94.0% [109 of 116; 95% CI: 88.1, 97.1] vs 84.9% [101 of 119; 95% CI: 77.4, 90.2]; P = .02) compared with model A while maintaining comparable sensitivity (91.6% [109 of 119; 95% CI: 85.2, 95.4] vs 90.8% [108 of 119; 95% CI: 84.2, 94.8]; P > .99), specificity (90.7% [68 of 75; 95% CI: 82.0, 95.4] vs 84.0% [63 of 75; 95% CI: 74.1, 90.6]; P = .08), and precision (94.0% [109 of 116; 95% CI: 88.1, 97.1] vs 90.0% [108 of 120; 95% CI: 83.3, 94.2]; P = .26) to model B. Only the AI model using precontrast T1-weighted imaging, T2-weighted imaging, AP, PVP, and HBP (model C) displayed higher sensitivity (91.6% [109 of 119; 95% CI: 85.2, 95.4] vs 74.8% [89 of 119; 95% CI: 66.3, 81.7]; P < .001) compared with LR-5 consensus data, with similar specificity (90.7% [68 of 75; 95% CI: 82.0, 95.4] vs 96.0% [72 of 75; 95% CI: 88.9, 98.6]; P = .22) (Table S5). As a result, we selected the AI model using five phases for further analysis.
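The model C point estimates follow directly from the counts given in the text: 109 of 119 HCCs called HCC, 68 of 75 non-HCCs called non-HCC, and 116 lesions called HCC in total. The short sketch below reproduces them; the function name is hypothetical.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and precision from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Model C on the external test set, using counts reported in the text.
model_c = diagnostic_metrics(tp=109, fn=10, tn=68, fp=7)
for name, value in model_c.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 91.6%
# specificity: 90.7%
# precision: 94.0%
```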

Graph of diagnostic performance of the artificial intelligence (AI) model with different imaging phases and radiologists for hepatocellular carcinoma diagnosis. Model A represents the AI model with arterial phase and portal venous phase images. Model B represents the AI model with arterial phase, portal venous phase, and hepatobiliary phase images. Model C represents the AI model with precontrast T1-weighted, T2-weighted, arterial phase, portal venous phase, and hepatobiliary phase images. AUC = area under the receiver operating characteristic curve, LR-5 = Liver Imaging Reporting and Data System category 5. Dotted line indicates the performance of a random classifier, serving as a reference line with an AUC of 0.50.

Diagnostic Performances of the AI Tool with Five Phases

The five phases–based AI tool consists of a lesion classifier and a feature classifier. The diagnostic performance of the lesion classifier with five phases in each external test set is summarized in Figure S2. Confusion matrices present the number of correct and incorrect predictions in each center. The lesion classifier yielded excellent AUCs in each external test center. The calibration curves are provided in Figure S3, with Brier scores ranging from 0.03 to 0.08. The lesion classifier yielded excellent diagnostic capacity irrespective of underlying cirrhosis or FLL size (<2 cm vs ≥2 cm). The lesion classifier also maintained robust diagnostic performance across various patient demographics, including different age strata (≤50 years vs >50 years), sex, hepatitis B virus infection status, and α-fetoprotein levels (≤100 μg/L vs >100 μg/L) (Table S6). Of the false-positive cases, 100% (seven of seven) of lesions presented with mild to moderate T2 hyperintensity, compared with 35% (24 of 68) in true-negative cases. Similar lesion size, Child-Pugh classification, and LI-RADS version 2018 major imaging features between false-positive and true-negative cases were observed. There was no evidence of differences in lesion and imaging characteristics between false-negative and true-positive cases (Table S7). No evidence of a difference in AUCs was observed based on different bounding boxes individually placed by different radiologists (Fig S4), indicating that bounding box placement did not affect the results.

Compared with the standard of consensus reading in 194 FLLs annotated with imaging features in the external testing dataset, the feature classifier identified nonrim APHE, washout, enhancing capsule, mild to moderate T2 hyperintensity, and HBP hypointensity with accuracies of 76.3% (148 of 194; 95% CI: 69.8, 81.7), 83.5% (162 of 194; 95% CI: 77.7, 88.1), 75.3% (146 of 194; 95% CI: 68.7, 80.8), 89.7% (174 of 194; 95% CI: 84.6, 93.2), and 80.4% (156 of 194; 95% CI: 74.3, 85.4), respectively (Table S2).

Comparison of Diagnoses Using the LI-RADS version 2018 Categories and Lesion Classifier with Final Diagnoses

Table 2 presents the comparison of the diagnoses performed using the AI model and LI-RADS version 2018 categories with the final diagnoses. In the classification of HCC cases, most (74.8%; 89 of 119) were categorized as LR-5, with smaller proportions in LR-3, LR-4, and LR-M (other malignancy) categories. In the non-HCC group, the largest share (49%; 37 of 75) was classified as LR-1/2, followed by LR-3, LR-4, LR-M, and LR-5. For HCC diagnosis, the AI model achieved similar specificities compared with the LR-5, LR-4/5, and LR-5/M but demonstrated a trend toward higher sensitivities and accuracies, though some differences were not statistically significant (Table 3).

Table 2:

Comparison of Diagnoses Using the AI Model and LI-RADS Version 2018 Categories with Final Diagnoses


Table 3:

Diagnostic Performance for Hepatocellular Carcinoma of Artificial Intelligence Model and LI-RADS Scoring Using Consensus Data


Among 116 FLLs that were classified as HCC by the AI model, 109 FLLs were HCCs, while the other seven FLLs were misclassified: one combined HCC and intrahepatic cholangiocarcinoma, three intrahepatic cholangiocarcinomas, one angiomyolipoma, and two dysplastic nodules. Other than angiomyolipoma and dysplastic nodule, none of the benign FLLs were classified as HCC.

Among LR-1/2, LR-3, LR-4, and LR-5 categories, higher categories indicate higher certainty of HCC diagnosis. The AI predictive probability of HCC increased in a stepwise manner across the above LI-RADS categories (Fig 5A). Kruskal-Wallis test showed a significant positive association of the LI-RADS categories with improved classification accuracy in HCC lesions (P < .001) and reduced classification accuracy in non-HCCs (P < .001) (Fig 5B). Among LR-M categories, two of 13 (15%) HCCs were missed by AI, while three of seven (43%) intrahepatic cholangiocarcinomas were misclassified by the AI model (Fig 5C).

Figure 5: Graphs of diagnostic performance of artificial intelligence (AI) model in different Liver Imaging Reporting and Data System (LI-RADS) version 2018 categories. (A) Association between AI model predictive probability of hepatocellular carcinoma (HCC) and LI-RADS version 2018 categories. (B) Diagnostic performance of AI model in LR-1/2, LR-3, LR-4, and LR-5. (C) Diagnostic performance of AI model in LR-M (other malignancy). FLL = focal liver lesion.

Evaluation of the AI-assisted Strategy

Classification results of LI-RADS version 2018 for individual readers are listed in Table S8. For the cases with discrepancies between AI and radiologists’ diagnoses, AI-assisted diagnosis was conducted by the same individual readers. The readers could accept the model predictions or retain their previous judgments based on LI-RADS version 2018. For LR-5 diagnosis, the AI-assisted strategy improved the sensitivities of readers 1 and 2 from 72.3% (86 of 119; 95% CI: 63.6, 79.5) and 74.0% (88 of 119; 95% CI: 65.4, 81.0) to 85.7% (102 of 119; 95% CI: 78.3, 90.9) and 89.1% (106 of 119; 95% CI: 82.2, 93.5), respectively (both P < .001). Specificity was unchanged by AI assistance for both readers: 93.3% (70 of 75; 95% CI: 85.3, 97.1) for reader 1 and 94.7% (71 of 75; 95% CI: 87.1, 97.9) for reader 2 (Table 4).
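
For readers who want to check the intervals, the 95% CIs quoted above are numerically consistent with the Wilson score interval for a binomial proportion, although this section does not name the method used. The following is a minimal sketch under that assumption; the function name `wilson_interval` and the row labels are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the reader results reported in the text.
rows = [
    ("reader 1 sensitivity, unassisted", 86, 119),
    ("reader 2 sensitivity, unassisted", 88, 119),
    ("reader 1 sensitivity, AI-assisted", 102, 119),
    ("reader 2 sensitivity, AI-assisted", 106, 119),
    ("reader 1 specificity", 70, 75),
    ("reader 2 specificity", 71, 75),
]

for label, k, n in rows:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} (95% CI: {low * 100:.1f}, {high * 100:.1f})")
```

Other interval methods, such as the exact Clopper-Pearson interval, would give slightly different bounds for the same counts.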

Table 4:

Diagnostic Performance of Individual Radiologists and AI-assisted Radiologists When Categorizing LR-5 in the External Testing Set

The feature classifier captured subtle intensity alterations within each 3D bounding box and identified the LI-RADS imaging features after accounting for the similarity between the annotated features fed to the pretrained lesion classifier. For lesions classified as LR-3, both readers retained their original judgments in all cases (three of three) because the AI tool indicated neither a high probability of HCC nor major features. For LR-4 classifications, the two readers accepted the AI predictions of HCC in 77% (10 of 13) and 82% (14 of 17) of cases, respectively, where the AI tool suggested high probabilities for both the lesion diagnosis and the imaging features (Table S9). When lesions were classified as LR-M with rim APHE, the two readers accepted the AI predictions of HCC in 50% (six of 12) and 40% (four of 10) of cases, respectively, because the AI indicated high probabilities of nonrim APHE, suggesting potential heterogeneous enhancement (Fig 6). The AI tool helped each reader identify an additional 18 imaging features (reader 1: seven HCCs with atypical APHE, nine HCCs with unremarkable PVP washout, and two HCCs with enhancing capsule; reader 2: eight HCCs with atypical APHE and 10 HCCs with unremarkable PVP washout). Aware of the high specificity of LR-5, neither reader changed any LR-5 diagnosis.

Figure 6: Example of the AI model predicting a Liver Imaging Reporting and Data System (LI-RADS) other malignancy category (LR-M) focal liver lesion (FLL). (A–F) Axial gadoxetic acid–enhanced MRI scans show a 16-mm FLL in a 39-year-old male patient. (A) The FLL was seen as hypointensity on precontrast T1-weighted image (T1WI). (B) Mild to moderate T2 hyperintensity was noted. There was (C) rim arterial phase hyperenhancement (APHE), (D) nonperipheral washout at portal venous phase (PVP), and (D–E) enhancing capsule at PVP and transitional phase (TP). Ancillary features such as (E) TP hypointensity (arrow) and (F) hepatobiliary phase (HBP) hypointensity were noted. Since rim APHE could not be confirmed by reader 1, this FLL was categorized into LR-M with use of the original LI-RADS version 2018. After accessing five three-dimensional (3D) bounding boxes for each lesion, the AI model applied a series of intensity projections (IP), including maximum IP (MAIP), minimum IP (MIIP), and average IP (AVIP), to each 3D bounding box obtained from precontrast T1-weighted imaging, T2-weighted imaging (T2WI), AP, PVP, and HBP. Additionally, the median layer of the phase and the difference between the MAIP and MIIP were calculated as feature values. As a result, it was recategorized as hepatocellular carcinoma (HCC) by the lesion classifier with a probability of 0.9992. The heatmap of the deep learning model analyzing the hepatic nodule indicates the contribution of the corresponding area to the model prediction (1.00 indicates the region with the most important contribution, while 0 indicates no contribution). In this case, the feature classifier identified nonrim APHE with a probability of 0.978, and heterogeneous enhancement (arrows) was noted after reviewing the (G) coronal precontrast T1-weighted images and (H) coronal AP images. The presumptive diagnoses were changed to definite HCC. After surgical resection, it was confirmed as HCC.
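
The intensity projections named in the caption can be illustrated with a short sketch. It is not the authors' implementation: it assumes each 3D bounding box is stored as slices × rows × columns, that projections are taken along the slice axis, and that the "median layer" is the middle slice. The names `project`, `projection_features`, and `toy_volume` are hypothetical, and plain Python lists stand in for the array library a real pipeline would likely use.

```python
from statistics import mean


def project(volume, reducer):
    """Collapse a volume (slices x rows x columns) along the slice axis."""
    return [
        [reducer(voxels) for voxels in zip(*rows)]  # same (row, col) across slices
        for rows in zip(*volume)                    # same row index across slices
    ]


def projection_features(volume):
    """Return the five 2D maps described in the caption (assumed layout)."""
    maip = project(volume, max)    # maximum intensity projection
    miip = project(volume, min)    # minimum intensity projection
    avip = project(volume, mean)   # average intensity projection
    median_layer = volume[len(volume) // 2]  # assumption: middle slice
    ip_range = [
        [hi - lo for hi, lo in zip(hi_row, lo_row)]
        for hi_row, lo_row in zip(maip, miip)
    ]
    return {
        "MAIP": maip,
        "MIIP": miip,
        "AVIP": avip,
        "median_layer": median_layer,
        "MAIP_minus_MIIP": ip_range,
    }


# Toy 3-slice, 2 x 2 bounding box with made-up intensities.
toy_volume = [
    [[1, 2], [3, 4]],
    [[5, 0], [2, 8]],
    [[3, 6], [7, 1]],
]

features = projection_features(toy_volume)
for name, image in features.items():
    print(name, image)
```

Under these assumptions, each of the five sequences would yield five 2D maps of this kind as input for the classifier.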

Discussion

This study explored the performance of an AI model based on gadoxetic acid–enhanced images for HCC and non-HCC classification. The DL-based diagnosis model had a higher sensitivity than LR-5 (91.6% vs 74.8%; P < .001) and a similar specificity (90.7% vs 96.0%; P = .22). Our study also demonstrated that the AI-assisted strategy achieved higher sensitivity (reader 1: 85.7% vs 72.3%; reader 2: 89.1% vs 74.0%; both P < .001) for definite HCC diagnosis beyond LI-RADS version 2018 without compromising specificity, providing the opportunity for prompt treatment.

Several studies have shown promise for classifying FLLs using DL approaches. A previously proposed model was able to generate diagnostic evaluation of specific subtypes with typical features on contrast-enhanced MRI scans, with an accuracy of approximately 90%. However, assessment of FLLs with typical features does not represent current clinical practice well. Another study developed an AI model capable of distinguishing HCC from non-HCC among indeterminate lesions. Despite the relatively small sample size from a single center, the results were promising. In contrast to these two previous studies, the current study used multicenter data and a wider spectrum of FLLs in patients at high risk for HCC to develop an AI model for HCC diagnosis using a liver-specific MRI contrast agent. To determine an optimal combination of sequences, we compared AI models built on different sequence combinations with LI-RADS version 2018. Ultimately, the AI model using precontrast T1-weighted imaging, T2-weighted imaging, AP, PVP, and HBP achieved the highest AUCs, and this model displayed higher sensitivity than LR-5 with similar specificity (Table 3), indicating that additional information from precontrast T1-weighted imaging, T2-weighted imaging, and HBP can assist in HCC diagnosis. Precontrast T1-weighted imaging and T2-weighted imaging provide identification of different tissue components, including the fibrotic changes, water and fat contents, the entity of vascularization, and metabolites (27). HBP images provide additional information about the organic anionic transporting polypeptide, a marker correlated with hepatocarcinogenesis (4). Some imaging features identified in these sequences contribute to the final diagnosis. According to the LI-RADS algorithm, examples of LR-2 (probably benign) include otherwise unremarkable T1 hyperintense nodules, T2 hypointense nodules, and hyperintense nodules observed during HBP.
Due to the vascular nature of hemangiomas, T2-weighted images usually show marked hyperintensity (28). Mild to moderate T2 hyperintensity is an ancillary feature listed in the LI-RADS lexicon, supporting the diagnosis of malignancies. This feature is regarded as one of the predictors, indicating the malignant transformation of hypovascular nodules (29–31). As reported by previous studies, more than 90% of observed focal nodular hyperplasias are iso- or hyperintense during HBP (32–34). A comprehensive evaluation of the observation on AP, PVP, and HBP may enable differentiation of dysplastic nodule and early HCCs, as the signal intensities of the lesion on these sequences reflect the dynamic change of unpaired arterioles, venous drainage, and organic anionic transporting polypeptide expression during hepatocarcinogenesis (35).

In the current study, the AI model accurately diagnosed HCC in patients at high risk, with an AUC of 0.97 in the external testing dataset. One of the major factors contributing to the better performance of the AI model is the integration of multiple MRI sequences, including precontrast T1-weighted imaging, T2-weighted imaging, AP, PVP, and HBP. These sequences provide complementary information, enabling the AI model to make more informed decisions (27–35). Another key strength of this AI model is its training on a large dataset encompassing a wide range of lesions and various scanner types (Tables S1–S2), thereby enhancing the model’s generalizability.

Currently, the broad adoption of the LI-RADS algorithm is limited by the subjectivity and complexity of LI-RADS scoring, which uses both major and ancillary features (36,37). Similar to previous studies (38,39), our study showed substantial interreader agreement for identification of major imaging features supporting the diagnosis of LR-5. A DL-based approach enables more objective evaluations of medical images. The AI-assisted strategy used in this study allowed individual readers to make a judgment based on LI-RADS categories (ordinal categories) and probabilities of HCC provided by AI (continuous numbers). High probabilities indicated by AI could enhance the readers’ certainty in identifying HCC. Meanwhile, the retained LR-5 diagnosis ensured high specificity. Combined AI and human reading may permit the optimal diagnosis. However, the successful integration of AI in clinical practice requires clinician training and entails high development costs for more user-friendly software. These challenges may affect the adoption of AI tools in routine radiology workflows, but their potential to enhance diagnostic accuracy and efficiency remains promising.

Similar to a previous study (13), classic-appearing FLLs were generally classified with high accuracy. The misclassification of intrahepatic cholangiocarcinoma and combined HCC and intrahepatic cholangiocarcinoma is likely related to the small proportion of these lesion types (4.0%; 33 of 829) in the training set (Table S3). Since patients at high risk for HCC per the LI-RADS criteria were included, a large proportion of patients with non-HCC malignancies were not eligible due to the absence of chronic hepatitis B or cirrhosis. Moreover, transitional phase and diffusion-weighted imaging were not included in model training. Delayed central enhancement and targetoid restriction contribute to the LR-M categorization. Additionally, the external testing sample in this study did not include metastatic liver lesions, so the model’s ability to distinguish metastatic liver lesions from HCC was not evaluated. Future research should incorporate metastatic liver lesions to further assess the model’s performance.

This study had limitations. First, the retrospective design introduces the risk of selection bias. In the AI-assisted workflow, the second reads were performed only for cases with disagreement between the initial assessments, rather than for a mix of cases. Although this approach aimed to address the most ambiguous cases, it may have introduced a potential source of bias. Although efforts were made to minimize recall bias, such as incorporating a 1-month interval, future studies with larger datasets and a protocol including second reads for a broader mix of cases are warranted to enhance the robustness of our findings. Second, we did not include transitional phase images in the model training due to the different scanning times between different centers (5 minutes after injection in center 1 and 3 minutes after injection in other centers). However, transitional phase hypointensity may be misinterpreted as it reflects the hyperenhancement of the background liver and the reduction of transporters for gadoxetic acid rather than the real washout (40). Third, the ability of gadoxetic acid to enhance the liver and characterize FLLs would be compromised in patients with severe iron overload (41), posing a challenge for image annotation on HBP. Fourth, this study includes only patients at high risk for HCC. It is unclear whether the model’s performance would be similarly high in a general population with a lower prevalence of HCC. Although our AI model demonstrated promising performance in HCC diagnosis, its generalizability to Western populations and other patient groups with differing risk factors remains uncertain; additional validation using datasets from diverse demographics and MRI protocols is necessary to ensure the model’s robustness and applicability across different clinical settings. Finally, a composite clinical reference standard was applied for the determination of benignity. 
However, since a biopsy is not always available for highly suspected benign lesions, determining the final diagnosis of hemangioma and focal nodular hyperplasia with the clinical composite reference standard better represents clinical practice.

In conclusion, our results show that a DL-based model allows accurate diagnosis of HCC on gadoxetic acid–enhanced MRI scans. Moreover, readers showed improved sensitivity, without evidence of a difference in specificity, for HCC diagnosis when assisted by AI. These results suggest that the AI-assisted strategy may facilitate prompt interventions for HCC. Larger studies with more diverse datasets are needed to validate these findings and further assess model robustness.

Acknowledgments

We thank all other staff (Jianwen Li, MD, and Jie Zhu, MD) from the Department of Radiology and the Department of Gastroenterology, The Third Affiliated Hospital of Sun Yat-sen University, for their support and assistance.

M.L. and Z.Z. contributed equally to this work.

J.W. and B.W. are co–senior authors.

Funding: This study was funded by grants from the National Natural Science Foundation of China (nos. 82070574, U1501224), Natural Science Foundation Team Project of Guangdong Province (no. 2018B030312009), and the Key Research and Development Program of Guangzhou City (no. 2023B03J1298).

Disclosures of conflicts of interest: M.L. No relevant relationships. Z.Z. No relevant relationships. Z.C. No relevant relationships. X.C. No relevant relationships. H.L. No relevant relationships. Y.X. No relevant relationships. H.C. No relevant relationships. X.Z. No relevant relationships. Jingbiao Chen No relevant relationships. Jianning Chen No relevant relationships. X.W. No relevant relationships. X.X. No relevant relationships. Z.Y. No relevant relationships. L.H. No relevant relationships. J.W. No relevant relationships. B.W. No relevant relationships.

Abbreviations:

AI: artificial intelligence
AP: arterial phase
APHE: arterial phase hyperenhancement
AUC: area under the receiver operating characteristic curve
DL: deep learning
FLL: focal liver lesion
HBP: hepatobiliary phase
HCC: hepatocellular carcinoma
LI-RADS: Liver Imaging Reporting and Data System
LR: Liver Imaging Reporting and Data System category
PVP: portal venous phase
3D: three-dimensional

References

1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71(3):209–249.
2. Vogel A, Meyer T, Sapisochin G, Salem R, Saborowski A. Hepatocellular carcinoma. Lancet 2022;400(10360):1345–1362.
3. Kim SH, Kim SH, Lee J, et al. Gadoxetic acid-enhanced MRI versus triple-phase MDCT for the preoperative detection of hepatocellular carcinoma. AJR Am J Roentgenol 2009;192(6):1675–1681.
4. Sano K, Ichikawa T, Motosugi U, et al. Imaging study of early hepatocellular carcinoma: usefulness of gadoxetic acid-enhanced MR imaging. Radiology 2011;261(3):834–844.
5. Chernyak V, Fowler KJ, Kamaya A, et al. Liver Imaging Reporting and Data System (LI-RADS) Version 2018: imaging of Hepatocellular Carcinoma in At-Risk Patients. Radiology 2018;289(3):816–830.
6. Marrero JA, Kulik LM, Sirlin CB, et al. Diagnosis, Staging, and Management of Hepatocellular Carcinoma: 2018 Practice Guidance by the American Association for the Study of Liver Diseases. Hepatology 2018;68(2):723–750.
7. Rimola J, Sapena V, Brancatelli G, et al. Reliability of extracellular contrast versus gadoxetic acid in assessing small liver lesions using liver imaging reporting and data system v.2018 and European association for the study of the liver criteria. Hepatology 2022;76(5):1318–1328.
8. Seehofer D, Öllinger R, Denecke T, et al. Blood Transfusions and Tumor Biopsy May Increase HCC Recurrence Rates after Liver Transplantation. J Transplant 2017;2017:9731095.
9. Liu HF, Wang M, Lu YJ, et al. CEMRI-Based Quantification of Intratumoral Heterogeneity for Predicting Aggressive Characteristics of Hepatocellular Carcinoma Using Habitat Analysis: comparison and Combination of Deep Learning. Acad Radiol 2024;31(6):2346–2355.
10. Xu Y, Zhou C, He X, et al. Deep learning-assisted LI-RADS grading and distinguishing hepatocellular carcinoma (HCC) from non-HCC based on multiphase CT: a two-center study. Eur Radiol 2023;33(12):8879–8888.
11. Yasaka K, Akai H, Abe O, Kiryu S. Deep Learning with Convolutional Neural Network for Differentiation of Liver Masses at Dynamic Contrast-enhanced CT: a Preliminary Study. Radiology 2018;286(3):887–896.
12. Hamm CA, Wang CJ, Savic LJ, et al. Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI. Eur Radiol 2019;29(7):3338–3347.
13. Oestmann PM, Wang CJ, Savic LJ, et al. Deep learning-assisted differentiation of pathologically proven atypical and typical hepatocellular carcinoma (HCC) versus non-HCC on contrast-enhanced MRI of the liver. Eur Radiol 2021;31(7):4981–4990.
14. Liu X, Khalvati F, Namdar K, et al. Can machine learning radiomics provide pre-operative differentiation of combined hepatocellular cholangiocarcinoma from hepatocellular carcinoma and cholangiocarcinoma to inform optimal treatment planning? Eur Radiol 2021;31(1):244–255.
15. Stollmayer R, Budai BK, Tóth A, et al. Diagnosis of focal liver lesions with deep learning-based multi-channel analysis of hepatocyte-specific contrast-enhanced magnetic resonance imaging. World J Gastroenterol 2021;27(35):5978–5988.
16. Zhong X, Guan T, Tang D, et al. Differentiation of small (≤ 3 cm) hepatocellular carcinomas from benign nodules in cirrhotic liver: the added additive value of MRI-based radiomics analysis to LI-RADS version 2018 algorithm. BMC Gastroenterol 2021;21(1):155.
17. Zhen SH, Cheng M, Tao YB, et al. Deep Learning for Accurate Diagnosis of Liver Tumor Based on Magnetic Resonance Imaging and Clinical Data. Front Oncol 2020;10:680.
18. Jiang H, Liu X, Chen J, et al. Man or machine? Prospective comparison of the version 2018 EASL, LI-RADS criteria and a radiomics model to diagnose hepatocellular carcinoma. Cancer Imaging 2019;19(1):84.
19. Wu J, Liu A, Cui J, Chen A, Song Q, Xie L. Radiomics-based classification of hepatocellular carcinoma and hepatic haemangioma on precontrast magnetic resonance images. BMC Med Imaging 2019;19(1):23.
20. Jansen MJA, Kuijf HJ, Veldhuis WB, Wessels FJ, Viergever MA, Pluim JPW. Automatic classification of focal liver lesions based on MRI and risk factors. PLoS One 2019;14(5):e0217053.
21. Chatzipanagiotou OP, Loukas C, Vailas M, et al. Artificial intelligence in hepatocellular carcinoma diagnosis: a comprehensive review of current literature. J Gastroenterol Hepatol 2024;39(10):1994–2005.
22. Tomita N, Abdollahi B, Wei J, Ren B, Suriawinata A, Hassanpour S. Attention-Based Deep Neural Networks for Detection of Cancerous and Precancerous Esophagus Tissue on Histopathological Slides. JAMA Netw Open 2019;2(11):e1914645.
23. Mei X, Lee HC, Diao KY, et al. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat Med 2020;26(8):1224–1228.
24. Zhang B, Liu X, Chen L, Zhu J. Convolution neural network toward Monte Carlo photon dose calculation in radiation therapy. Med Phys 2022;49(2):1248–1261.
25. Wang CJ, Hamm CA, Savic LJ, et al. Deep learning for liver tumor diagnosis part II: convolutional neural network interpretation using radiologic imaging features. Eur Radiol 2019;29(7):3348–3357.
26. Gentry-Maharaj A, Burnell M, Dilley J, et al. Serum HE4 and diagnosis of ovarian cancer in postmenopausal women with adnexal masses. Am J Obstet Gynecol 2020;222(1):56.e51–56.e17.
27. Bartolozzi C, Battaglia V, Bozzi E. HCC diagnosis with liver-specific MRI–close to histopathology. Dig Dis 2009;27(2):125–130.
28. Nie P, Wu J, Wang H, et al. Primary hepatic perivascular epithelioid cell tumors: imaging findings with histopathological correlation. Cancer Imaging 2019;19(1):32.
29. Lee S, Kim SS, Bae H, Shin J, Yoon JK, Kim MJ. Application of Liver Imaging Reporting and Data System version 2018 ancillary features to upgrade from LR-4 to LR-5 on gadoxetic acid-enhanced MRI. Eur Radiol 2021;31(2):855–863.
30. Hyodo T, Murakami T, Imai Y, et al. Hypovascular nodules in patients with chronic liver disease: risk factors for development of hypervascular hepatocellular carcinoma. Radiology 2013;266(2):480–490.
31. Rhee H, Kim MJ, Park YN, Choi JS, Kim KS. Gadoxetic acid-enhanced MRI findings of early hepatocellular carcinoma as defined by new histologic criteria. J Magn Reson Imaging 2012;35(2):393–398.
32. Purysko AS, Remer EM, Coppa CP, Obuchowski NA, Schneider E, Veniero JC. Characteristics and distinguishing features of hepatocellular adenoma and focal nodular hyperplasia on gadoxetate disodium-enhanced MRI. AJR Am J Roentgenol 2012;198(1):115–123.
33. Gupta RT, Iseman CM, Leyendecker JR, Shyknevsky I, Merkle EM, Taouli B. Diagnosis of focal nodular hyperplasia with MRI: multicenter retrospective study comparing gadobenate dimeglumine to gadoxetate disodium. AJR Am J Roentgenol 2012;199(1):35–43.
34. Grazioli L, Bondioni MP, Haradome H, et al. Hepatocellular adenoma and focal nodular hyperplasia: value of gadoxetic acid-enhanced MR imaging in differential diagnosis. Radiology 2012;262(2):520–529.
35. Park HJ, Choi BI, Lee ES, Park SB, Lee JB. How to Differentiate Borderline Hepatic Nodules in Hepatocarcinogenesis: emphasis on Imaging Diagnosis. Liver Cancer 2017;6(3):189–203.
36. Sirlin CB, Kielar AZ, Tang A, Bashir MR. LI-RADS: a glimpse into the future. Abdom Radiol (NY) 2018;43(1):231–236.
37. Fowler KJ, Tang A, Santillan C, et al. Interreader Reliability of LI-RADS Version 2014 Algorithm and Imaging Features for Diagnosis of Hepatocellular Carcinoma: a Large International Multireader Study. Radiology 2018;286(1):173–185.
38. Kwag M, Choi SH, Choi SJ, et al. Simplified LI-RADS for Hepatocellular Carcinoma Diagnosis at Gadoxetic Acid-enhanced MRI. Radiology 2022;305(3):614–622.
39. Ehman EC, Behr SC, Umetsu SE, et al. Rate of observation and inter-observer agreement for LI-RADS major features at CT and MRI in 184 pathology proven hepatocellular carcinomas. Abdom Radiol (NY) 2016;41(5):963–969.
40. Fowler KJ, Sirlin CB. Is It Time to Expand the Definition of Washout Appearance in LI-RADS? Radiology 2019;291(3):658–659.
41. Roberts LR, Sirlin CB, Zaiem F, et al. Imaging for the diagnosis of hepatocellular carcinoma: a systematic review and meta-analysis. Hepatology 2018;67(1):401–421.


[r24] 24. Zhang B , Liu X , Chen L , Zhu J . Convolution neural network toward Monte Carlo photon dose calculation in radiation therapy . Med Phys 2022. ; 49 ( 2 ): 1248 – 1261 . [DOI] [PubMed] [Google Scholar]

[r25] 25. Wang CJ , Hamm CA , Savic LJ , et al . Deep learning for liver tumor diagnosis part II: convolutional neural network interpretation using radiologic imaging features . Eur Radiol 2019. ; 29 ( 7 ): 3348 – 3357 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[r26] 26. Gentry-Maharaj A , Burnell M , Dilley J , et al . Serum HE4 and diagnosis of ovarian cancer in postmenopausal women with adnexal masses . Am J Obstet Gynecol 2020. ; 222 ( 1 ): 56.e1 – 56.e17 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[r27] 27. Bartolozzi C , Battaglia V , Bozzi E . HCC diagnosis with liver-specific MRI–close to histopathology . Dig Dis 2009. ; 27 ( 2 ): 125 – 130 . [DOI] [PubMed] [Google Scholar]

[r28] 28. Nie P , Wu J , Wang H , et al . Primary hepatic perivascular epithelioid cell tumors: imaging findings with histopathological correlation . Cancer Imaging 2019. ; 19 ( 1 ): 32 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[r29] 29. Lee S , Kim SS , Bae H , Shin J , Yoon JK , Kim MJ . Application of Liver Imaging Reporting and Data System version 2018 ancillary features to upgrade from LR-4 to LR-5 on gadoxetic acid-enhanced MRI . Eur Radiol 2021. ; 31 ( 2 ): 855 – 863 . [DOI] [PubMed] [Google Scholar]

[r30] 30. Hyodo T , Murakami T , Imai Y , et al . Hypovascular nodules in patients with chronic liver disease: risk factors for development of hypervascular hepatocellular carcinoma . Radiology 2013. ; 266 ( 2 ): 480 – 490 . [DOI] [PubMed] [Google Scholar]

[r31] 31. Rhee H , Kim MJ , Park YN , Choi JS , Kim KS . Gadoxetic acid-enhanced MRI findings of early hepatocellular carcinoma as defined by new histologic criteria . J Magn Reson Imaging 2012. ; 35 ( 2 ): 393 – 398 . [DOI] [PubMed] [Google Scholar]

[r32] 32. Purysko AS , Remer EM , Coppa CP , Obuchowski NA , Schneider E , Veniero JC . Characteristics and distinguishing features of hepatocellular adenoma and focal nodular hyperplasia on gadoxetate disodium-enhanced MRI . AJR Am J Roentgenol 2012. ; 198 ( 1 ): 115 – 123 . [DOI] [PubMed] [Google Scholar]

[r33] 33. Gupta RT , Iseman CM , Leyendecker JR , Shyknevsky I , Merkle EM , Taouli B . Diagnosis of focal nodular hyperplasia with MRI: multicenter retrospective study comparing gadobenate dimeglumine to gadoxetate disodium . AJR Am J Roentgenol 2012. ; 199 ( 1 ): 35 – 43 . [DOI] [PubMed] [Google Scholar]

[r34] 34. Grazioli L , Bondioni MP , Haradome H , et al . Hepatocellular adenoma and focal nodular hyperplasia: value of gadoxetic acid-enhanced MR imaging in differential diagnosis . Radiology 2012. ; 262 ( 2 ): 520 – 529 . [DOI] [PubMed] [Google Scholar]

[r35] 35. Park HJ , Choi BI , Lee ES , Park SB , Lee JB . How to Differentiate Borderline Hepatic Nodules in Hepatocarcinogenesis: emphasis on Imaging Diagnosis . Liver Cancer 2017. ; 6 ( 3 ): 189 – 203 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[r36] 36. Sirlin CB , Kielar AZ , Tang A , Bashir MR . LI-RADS: a glimpse into the future . Abdom Radiol (NY) 2018. ; 43 ( 1 ): 231 – 236 . [DOI] [PubMed] [Google Scholar]

[r37] 37. Fowler KJ , Tang A , Santillan C , et al . Interreader Reliability of LI-RADS Version 2014 Algorithm and Imaging Features for Diagnosis of Hepatocellular Carcinoma: a Large International Multireader Study . Radiology 2018. ; 286 ( 1 ): 173 – 185 . [DOI] [PubMed] [Google Scholar]

[r38] 38. Kwag M , Choi SH , Choi SJ , et al . Simplified LI-RADS for Hepatocellular Carcinoma Diagnosis at Gadoxetic Acid-enhanced MRI . Radiology 2022. ; 305 ( 3 ): 614 – 622 . [DOI] [PubMed] [Google Scholar]

[r39] 39. Ehman EC , Behr SC , Umetsu SE , et al . Rate of observation and inter-observer agreement for LI-RADS major features at CT and MRI in 184 pathology proven hepatocellular carcinomas . Abdom Radiol (NY) 2016. ; 41 ( 5 ): 963 – 969 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[r40] 40. Fowler KJ , Sirlin CB . Is It Time to Expand the Definition of Washout Appearance in LI-RADS? Radiology 2019. ; 291 ( 3 ): 658 – 659 . [DOI] [PubMed] [Google Scholar]

[r41] 41. Roberts LR , Sirlin CB , Zaiem F , et al . Imaging for the diagnosis of hepatocellular carcinoma: a systematic review and meta-analysis . Hepatology 2018. ; 67 ( 1 ): 401 – 421 . [DOI] [PubMed] [Google Scholar]
