eBioMedicine. 2025 Dec 2;122:106060. doi: 10.1016/j.ebiom.2025.106060

A generative vision-language model for holistic pathological assessment using preoperative imaging in hepatocellular carcinoma

Liyang Wang a,b,c,t, Fa Tian d,t, Fan Li e,t, Min Wu f,t, Lingxuan Hou g, Jitao Wang b,c, Jing Zhao h, Xiaobin Feng b,c,i, Chengquan Li b,c,i, Xiaojuan Wang b,c,i, Haoming Xia b,c,i, Kaixin Du j, Xuehong Liao k, Mingli Jin l, Xiaoli Hu m, Ruishan Liu e, Xu Feng n, Jinming Cao o, Zhichao Hu p, Jiabin Cai q,r,s,, Shizhong Yang b,c,∗∗, Jiahong Dong a,b,c,i,∗∗∗
PMCID: PMC12719682  PMID: 41337935

Summary

Background

Pathological evaluation of hepatocellular carcinoma (HCC) traditionally relies on surgical resection, posing risks of infection and complications while failing to provide comprehensive pathological insights preoperatively. This study aims to develop HepaPathGPT, which utilises preoperative imaging to deliver detailed pathological interpretations, enabling non-invasive, real-time pathological assessments for patients with HCC.

Methods

This retrospective study included 1091 patients with HCC from 10 independent cohorts. SegFormer-B5 segmented tumour regions, and vision-language alignment mapped imaging features to pathology descriptions. We fine-tuned four pretrained frameworks using Low-Rank Adaptation (LoRA) to efficiently translate imaging features into structured histological reports, enabling real-time evaluation via an interactive interface.

Findings

HepaPathGPT showed robust tumour segmentation (mean Intersection over Union: 0.883 ± 0.007, Dice: 0.934 ± 0.006) and an average accuracy of 0.697 ± 0.024 for six pathological markers in external validation (n = 109). For text generation, BLEU-4 and ROUGE-1 scores were 62.7 ± 1.7 and 84.2 ± 1.1, respectively. Five pathologists rated 92.5% and 87.4% of reports as acceptable for accuracy and completeness, respectively.

Interpretation

HepaPathGPT offers an approach for non-invasive pathological analysis in patients with HCC. This technology holds significant clinical value for decision-making in patients with HCC and promises scalability to other diseases in the future.

Funding

National Natural Science Foundation of China (82090053, 82090052, 12326618, 82272703, 82473201); Tsinghua University Initiative Scientific Research Program of Precision Medicine (2022ZLA007); CAMS Innovation Fund for Medical Sciences (2019-I2M-5-056); Elite Youth Project of Natural Science Foundation of Fujian Province (2023J06056); Science-Health Joint Medical Scientific Research Project of Chongqing (2023MSXM092).

Keywords: Hepatocellular carcinoma, Vision-language model, Generative AI, Imaging-pathology integration


Research in context.

Evidence before this study

Preoperative pathological assessment of hepatocellular carcinoma (HCC) traditionally relies on invasive biopsy or postoperative histopathology, which carries risks and delays decision-making. Although AI models have been developed for HCC diagnosis and prognosis, most studies focus on isolated imaging biomarkers or single-modal predictions rather than a holistic pathological assessment. We searched PubMed and Web of Science for peer-reviewed articles published in any language up to January 31, 2025, using the search terms “(Vision-Language Model OR Generative AI) AND (pathology prediction OR histopathological features) AND (Hepatocellular Carcinoma OR HCC).” No previous studies have integrated vision-language models (VLMs) to predict comprehensive histopathological features and immunohistochemical markers from preoperative imaging.

Added value of this study

This study introduces a generative vision-language model (VLM) for holistic pathological assessment of HCC using preoperative imaging. Our AI framework, HepaPathGPT, leverages multimodal learning to integrate radiological imaging with histopathological and immunohistochemical data, enabling non-invasive, real-time pathology prediction. Unlike conventional models that predict only individual biomarkers, HepaPathGPT generates structured pathological reports, offering a comprehensive assessment of tumour characteristics. The model was validated on a 10-center dataset comprising 30,289 paired imaging-pathology samples and achieved high clinical acceptance among pathologists. This innovation represents a significant advancement in AI-assisted preoperative evaluation and personalised treatment planning for patients with HCC.

Implications of all the available evidence

Our study bridges the gap between radiology and pathology by utilising generative AI for preoperative pathological assessment. By providing non-invasive and accurate pathological insights, HepaPathGPT has the potential to transform clinical workflows, reduce reliance on invasive biopsies, and enhance precision oncology for patients with HCC. Moreover, this framework can be extended to other tumour types, establishing a new paradigm for AI-driven, multimodal pathology prediction.

Introduction

Hepatocellular carcinoma (HCC) is one of the most lethal malignancies worldwide, characterised by complex pathological features and poor prognosis. With over 800,000 new cases annually, the global 5-year survival rate for HCC remains below 20%.1 Due to its asymptomatic nature in the early stages, HCC is often diagnosed at advanced stages,2,3 leading to postoperative recurrence rates exceeding 70% and posing significant challenges to long-term survival.4,5 Precision clinical decision-making can markedly improve patient outcomes.6 Pathology is widely regarded as the “gold standard” for HCC diagnosis, enabling detailed analysis of critical markers such as proliferation indicators (such as Ki-67) and immune evasion-related proteins (such as PD-L1), which reveal the biological behaviour of tumours.7 These insights are essential for tumour staging, prognosis evaluation, and individualised treatment planning.8

Currently, pathological assessment of HCC primarily relies on postoperative resection or biopsy samples. These invasive procedures carry risks of infection and fail to provide timely guidance for preoperative treatment.9 Consequently, traditional clinical decision-making often lacks comprehensive biological information during the formulation of initial treatment strategies.10 Accurate prediction of pathological features through non-invasive imaging techniques could considerably enhance the clinical management of HCC. Non-invasive pathological prediction provides critical information for preoperative staging and disease evaluation, offering insights into tumour biology before treatment. This supports personalised and precise therapeutic decision-making.

Recent advancements in artificial intelligence (AI) have enabled the use of radiomics and machine learning for predicting HCC pathological markers.11, 12, 13 For example, Hectors et al.14 demonstrated that radiomics features could serve as non-invasive predictors of HCC immunological characteristics. Xia et al.15 developed a machine learning model using preoperative multiphase computed tomography (CT) images to predict microvascular invasion (MVI) and identify related differentially expressed genes. Zhao et al.16 constructed a radiomics model to predict Ki-67 expression; in contrast, other studies employed dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to predict Ang-2 expression in HCC.17

Despite their promise, current AI models face several challenges. First, their prediction accuracy remains suboptimal, and many models lack robustness in real-world clinical scenarios. Furthermore, most studies rely on small datasets with limited large-scale validation, reducing their applicability to multi-center and heterogeneous datasets. These models also tend to focus on specific classification tasks, such as predicting one or two markers, rather than providing comprehensive predictions of multiple biomarkers. Critically, existing models lack effective human–computer interaction capabilities, making it difficult for clinicians to interact with them in real time or receive feedback. This limits the flexibility of AI systems in clinical decision-making. Addressing these challenges requires the development of more accurate, scalable, and interactive intelligent systems for HCC pathology prediction.

The rapid evolution of large language models (LLMs) and vision-language models (VLMs) has introduced new possibilities for foundation-level pathological predictions using preoperative imaging.18 VLMs can analyse multi-level features within imaging data and align visual and linguistic modalities,19 linking imaging information to pathology text to generate rich and semantically coherent predictions. These models have shown potential in medical applications. For instance, SkinGPT-4 aligns a pretrained Vision Transformer (ViT) with the Llama-2-13b-chat model to create a multimodal dermatological diagnostic system capable of autonomously analysing skin images, identifying features, and providing diagnostic recommendations.20 Similarly, BiomedGPT, an open-source foundation VLM, has demonstrated notable performance in radiological question answering, report generation, and summarisation tasks.21 However, no studies have yet developed LLMs for foundational pathological generation based on preoperative imaging.

This study introduces HepaPathGPT, a vision-language-based generative AI model designed for pathological assessment in HCC. By integrating preoperative CT/MRI with language generation techniques, HepaPathGPT enables comprehensive prediction of HCC pathological features. The model employs multimodal alignment strategies to deeply extract imaging features and associate them with pathological text. Fine-tuning on leading VLM frameworks allows it to generate personalised pathology reports. Compared with traditional AI models, HepaPathGPT achieves universal prediction capabilities while providing exceptional human–computer interaction features, allowing clinicians to receive real-time feedback during the diagnostic process. Fig. 1 illustrates the usage diagram of HepaPathGPT. Imaging data are anonymised and pixelated before upload, requiring only JPG-format CT/MRI images. Furthermore, HepaPathGPT supports local deployment, fully adhering to Health Insurance Portability and Accountability Act (HIPAA) standards. These features make the model suitable for clinical applications without compromising sensitive information.

Fig. 1.

Fig. 1

Schematic of HepaPathGPT's workflow for generating pathological reports based on preoperative imaging in hepatocellular carcinoma.

Methods

The overall workflow of this study is shown in Fig. 2.

Fig. 2.

Fig. 2

Overview of the study workflow. A) CT/MRI data and corresponding pathology reports were collected from 10 hospitals, using data exclusively from patients with surgically resected primary HCC. Imaging and text data were paired to create the foundational training dataset. B) During data annotation, doctors marked HCC lesion areas, and these annotations were used to train a deep-learning segmentation model for automated lesion identification and localisation. C) In the multimodal feature extraction and alignment phase, imaging data were encoded to extract features such as shape, boundary, size, and texture, while pathology report text was encoded to generate corresponding text embeddings. D) HepaPathGPT was fine-tuned based on pre-trained vision-language models such as LLaVA 1.5-7B, LLaVA 1.5-Med-7B, Qwen2VL-7B-Instruct, and DeepSeek-VL-7B-Chat, to generate pathological descriptions from input imaging data. E) The generated pathology reports were displayed through a web-based interface and tested on 109 patients. Five pathologists provided feedback on multiple dimensions, including accuracy, completeness, logical coherence, professionalism, consistency, and clinical practicality. CT/MRI, computed tomography/magnetic resonance imaging; HCC, hepatocellular carcinoma.

Patient enrolment

Inclusion and exclusion criteria

Patient enrolment at each center was conducted in a retrospective, consecutive manner, whereby all eligible patients meeting the inclusion and exclusion criteria were enrolled without additional filtering. A unified protocol and centralised quality control were applied across centres to minimise selection bias and ensure representativeness. This study included patients with primary HCC who underwent liver resection at 10 hospitals across mainland China: Zhongshan Hospital, Fudan University (ZS); Chifeng Municipal Hospital (CF); The Affiliated Hospital of Southwest Medical University (XN); Third Affiliated Hospital of Chongqing Medical University (CY3); The People's Hospital of Yubei District of Chongqing City (CYB); The Third Hospital of Mianyang (MY); Luzhou Traditional Chinese Medicine Hospital (LZ); Leshan Hospital, Chengdu University of Traditional Chinese Medicine (LS); Beijing Anzhen Nanchong Hospital (NC); and The Second People's Hospital of YiBin (YB). A total of 2462 patients with HCC were initially screened, and 1091 patients were ultimately included based on the following inclusion and exclusion criteria. Sex data were obtained retrospectively from patients' electronic medical records.

Inclusion criteria:

  • 1) Postoperative histopathological confirmation of primary HCC.
  • 2) Availability of imaging and clinical data within 1 month prior to surgery.
  • 3) Detailed postoperative pathological reports, including IHC information, for tumour biology analysis.
  • 4) Child-Pugh class A liver function and age of 18–75 years, excluding patients with hepatic insufficiency or extreme age-related bias.
  • 5) No history of severe organ dysfunction or other malignancies.

Exclusion criteria:

  • 1) Patients with incomplete clinical data or pathological reports.
  • 2) Those receiving preoperative anti-tumour therapy or prior liver resection.
  • 3) Patients with imaging evidence of major vascular invasion or extrahepatic metastasis.

To minimise center-specific biases, identical inclusion and exclusion criteria were applied at all sites, and imaging data were harmonised by voxel resampling and intensity normalisation. Pathology reports were standardised in terminology and formatting, and all annotations underwent centralised quality control by senior hepatobiliary surgeons. Furthermore, to test cross-institutional adaptability, we deliberately reserved one independent center (MY) as the external validation cohort, where imaging data were acquired using different vendors and acquisition protocols compared to the training centres.

Ethics

This study complied with the Declaration of Helsinki and was approved by the ethics committees of all 10 participating centres (Ethics Approval Number: ZS: B2024-184R; CF: CK2024123; XN: KY2024429; CY3: 2023-95; CYB: 2022SA12-2; MY: 2023-179; LZ: 2024-27; LS: 2024014-3; NC: 2024-149; YB: 2024-226-01). The requirement for informed consent was waived for all patients. Data privacy was protected in strict adherence to HIPAA principles,22 with sensitive information such as names, ages, and sexes removed during data anonymisation. The study design, participant inclusion, model development, and evaluation followed the STROBE and TRIPOD-AI guidelines to ensure transparency and reproducibility.

Constructing the dataset

Collection of imaging-pathology text pairs

The dataset consisted of imaging and pathology text pairs provided by the 10 centres. Details of imaging equipment models and scanning parameters are listed in Supplementary Table S1. Imaging methods included upper abdominal contrast-enhanced CT or DCE-MRI, with arterial phase images specifically selected for model development, as our preliminary experiments demonstrated that this phase provided the most reliable lesion conspicuity and yielded superior predictive performance compared to other phases. Pathology results were collected following standardised procedures: surgical specimens were submitted for examination, processed through cutting, dehydration, embedding, sectioning, and staining, and then evaluated under a microscope by pathologists, who generated reports detailing the pathological features. These reports included descriptions of the specimens (such as size and location), histological characteristics (including differentiation grade and tumour type), lesion properties, and IHC staining results. For example, a sample pathology report might state: “(Left hepatic tumour tissue) Primary moderately differentiated hepatocellular carcinoma, measuring approximately 3.5 × 3 × 2.8 cm, with no definite satellite nodules or vascular invasion observed. “Resection margins” show no cancerous tissue—IHC: AFP (+), Glypican-3 (weak+), HSA (+), CK19 (−), P53 (−), CD31 and CD34 (vessels +), Ki-67 (+, ∼5%).” Each pathology text was paired with its corresponding imaging data, ensuring a one-to-one alignment between the modalities.

Imaging preprocessing

To address imaging differences caused by varying equipment across centres, all images underwent resampling and normalisation.23 Specifically, voxel sizes were resampled to 1 mm³ to ensure consistent spatial resolution. Intensity values were normalised to a uniform range to mitigate contrast deviations due to equipment variations. Lesion annotations were performed using 3D-Slicer software (https://www.slicer.org/) by two experienced radiologists and hepatobiliary surgeons, each with over 10 years of expertise (F.L. and S.Z.Y.). F.L. conducted the initial lesion annotations, which were subsequently reviewed by S.Z.Y. to ensure accuracy. Imaging data were converted from the DICOM to JPG format, removing all patient identifiers and retaining only imaging information. From the original dataset of 225,164 slices, 30,289 slices containing annotated lesions were retained for subsequent training. Additionally, to support the automated application of HepaPathGPT, we trained a lightweight deep-learning model based on SegFormer-B524,25 for the automatic segmentation of liver lesions. Details of the model architecture and training process are provided in Supplementary Tables S2 and S3.
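To illustrate the harmonisation steps described above, a minimal sketch is shown below. The use of SimpleITK, the file name, and the percentile clipping range are illustrative assumptions; they do not reproduce the exact preprocessing code of this study.

```python
# Illustrative sketch of voxel resampling and intensity normalisation
# (SimpleITK is assumed; paths and the clipping range are hypothetical).
import SimpleITK as sitk
import numpy as np

def resample_to_isotropic(image: sitk.Image, spacing=(1.0, 1.0, 1.0)) -> sitk.Image:
    """Resample a CT/MRI volume to 1 mm^3 voxels with linear interpolation."""
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    new_size = [int(round(osz * osp / nsp))
                for osz, osp, nsp in zip(original_size, original_spacing, spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), spacing, image.GetDirection(), 0.0,
                         image.GetPixelID())

def normalise_intensity(volume: np.ndarray) -> np.ndarray:
    """Rescale intensities to [0, 1] to mitigate scanner-dependent contrast shifts."""
    lo, hi = np.percentile(volume, [0.5, 99.5])   # clip extreme values (assumed range)
    volume = np.clip(volume, lo, hi)
    return (volume - lo) / (hi - lo + 1e-8)

image = sitk.ReadImage("case_0001_arterial.nii.gz")   # hypothetical file name
array = sitk.GetArrayFromImage(resample_to_isotropic(image))
array = normalise_intensity(array)
```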

HepaPathGPT architecture

HepaPathGPT is designed to generate pathological interpretations based on preoperative imaging. Tumour regions segmented by SegFormer-B5 serve as the foundation for extracting deep imaging features using a ViT. These features are then aligned with corresponding pathological text features via multimodal alignment techniques, such as Contrastive Language-Image Pretraining (CLIP),26 to enable downstream transfer learning. Four advanced pretrained VLM frameworks were employed: LLaVA 1.5-7B,27 LLaVA 1.5-Med-7B,28 Qwen2VL-7B-Instruct,29 and DeepSeek-VL-7B-Chat.30 These frameworks were fine-tuned specifically for HCC pathology generation tasks (further details on the working principles are provided in Supplementary Fig. S1).
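To make the alignment stage concrete, the sketch below shows a CLIP-style symmetric contrastive loss over paired image and report embeddings. The embedding dimension, batch size, and temperature are illustrative assumptions rather than the exact HepaPathGPT configuration.

```python
# Sketch of CLIP-style vision-language alignment (dimensions are illustrative;
# the actual encoders and projection sizes used by HepaPathGPT may differ).
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb: torch.Tensor,
                          txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)            # image -> matching report
    loss_t2i = F.cross_entropy(logits.t(), targets)        # report -> matching image
    return 0.5 * (loss_i2t + loss_t2i)

# Random embeddings standing in for ViT / text-encoder outputs.
image_features = torch.randn(16, 512)
text_features = torch.randn(16, 512)
loss = clip_contrastive_loss(image_features, text_features)
```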

Model fine-tuning and inference

VLM fine-tuning

The segmented tumour slices from each patient were aggregated into a unified multi-slice representation. Specifically, all slices containing non-zero tumour masks were stacked and concatenated into a single high-dimensional tensor, preserving both intra-slice structural information and inter-slice continuity. This aggregated tensor was then used as the vision encoder input and paired one-to-one with the corresponding pathology report. In this way, each patient contributed a strictly aligned mapping between the complete tumour imaging profile and its ground-truth pathology outcomes, ensuring comprehensive coverage of intra- and peri-tumoral features during model training.
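A minimal sketch of this per-patient slice aggregation is shown below; array shapes and variable names are illustrative and do not reflect the released implementation.

```python
# Sketch of per-patient aggregation of tumour-containing slices
# (shapes and the random placeholders are illustrative only).
import numpy as np

def aggregate_tumour_slices(volume: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Stack all slices whose segmentation mask is non-zero into one tensor.

    volume: (S, H, W) preprocessed arterial-phase slices
    masks:  (S, H, W) binary tumour masks from the segmentation module
    returns (K, H, W), where K is the number of tumour-containing slices,
    preserving the original slice order (inter-slice continuity).
    """
    keep = masks.reshape(masks.shape[0], -1).any(axis=1)   # slices with any tumour pixel
    return volume[keep]

volume = np.random.rand(120, 512, 512).astype(np.float32)          # hypothetical study
masks = (np.random.rand(120, 512, 512) > 0.999).astype(np.uint8)   # hypothetical masks
patient_tensor = aggregate_tumour_slices(volume, masks)            # paired with one report
```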

Fine-tuning was performed using the Low-Rank Adaptation (LoRA) method to enhance the model performance in generating HCC pathology reports.31 LoRA decomposes the large weight matrices of transformer self-attention modules into two low-rank matrices (rank = 16, scaling factor = 8 in our experiments), introducing only a small number of trainable parameters while preserving the pretrained backbone. To avoid catastrophic forgetting of general multimodal knowledge, we froze the embedding and lower transformer layers, which primarily encode transferable low-level visual features (e.g., edges, textures, contrast) and basic linguistic embeddings. In contrast, the upper transformer blocks and cross-attention layers were fine-tuned, as they govern high-level semantic alignment between tumour imaging features and domain-specific pathology descriptors (e.g., “microvascular invasion,” “CK19 positivity”). This selective adaptation strategy provided a balance between computational efficiency and domain specialisation. The initial learning rate was set to 2e-5, and the AdamW optimiser was used to stabilise weight updates. A batch size of 16 was employed to manage computational resources efficiently. The decoding temperature was set to 0.7 to ensure clinically logical outputs; a sensitivity analysis showed that model performance peaked consistently at 0.7 across BLEU and ROUGE metrics (Supplementary Fig. S2), so this value was adopted as the default setting for all subsequent evaluations.
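The sketch below illustrates how a LoRA configuration with the stated hyperparameters (rank 16, scaling factor 8, learning rate 2e-5, batch size 16) could be set up with the PEFT library. The checkpoint identifier, target modules, and dropout value are assumptions for illustration, not the exact training script of this study.

```python
# Hedged sketch of LoRA fine-tuning with the hyperparameters reported above.
# The checkpoint id, target modules, and dropout are illustrative assumptions.
import torch
from transformers import LlavaForConditionalGeneration, TrainingArguments
from peft import LoraConfig, get_peft_model

base = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16)   # assumed public checkpoint

lora_config = LoraConfig(
    r=16,                        # rank reported in the Methods
    lora_alpha=8,                # scaling factor reported in the Methods
    lora_dropout=0.05,           # assumed value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()       # only the low-rank adapters are trainable

args = TrainingArguments(
    output_dir="hepapathgpt-lora",       # hypothetical output directory
    per_device_train_batch_size=16,      # batch size reported in the Methods
    learning_rate=2e-5,                  # initial learning rate reported in the Methods
    max_steps=1000,                      # convergence point reported in the Results
    optim="adamw_torch",                 # AdamW optimiser
    fp16=True,
)
```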

Experiments were conducted on a Linux-based operating system (Ubuntu 20.04) with a 64-core AMD EPYC processor and five NVIDIA RTX 3090 GPUs (total memory: 120 GB). The setup used CUDA 12.2 for GPU acceleration, Python 3.11, and Torch 2.4.0 for efficient deep-learning model training and inference. Core frameworks included Transformers (4.41.2–4.45.0) and Datasets (2.16.0–2.21.0), with Accelerate and PEFT for model fine-tuning and distributed training. Auxiliary tools included Gradio (≥4.0.0) for the interface and Pandas (≥2.0.0) and Matplotlib (≥3.7.0) for data processing and visualisation; the backend used FastAPI and Uvicorn with SSE-Starlette for efficient server support. Libraries such as NumPy (<2.0.0), SciPy, and SentencePiece handled numerical computation and preprocessing, ensuring efficient execution and flexible deployment. The LoRA library (lora-adaptor 0.1.2) supported the low-rank decomposition fine-tuning process.

Model inference

During inference, HepaPathGPT processes preprocessed imaging inputs to perform feature analysis and text generation. The inference strategy employs layer-wise decoding and dynamic weight adjustment to ensure the accuracy of the generated content. To enable users to deploy the model locally and safeguard information privacy, we developed an intuitive web interface and made the source code and model weights publicly available on GitHub (https://github.com/wangliyang123/HepaPathGPT.git).
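As an illustration of how a locally deployed interface can wrap the inference step, a minimal Gradio sketch is given below (Gradio is among the listed dependencies). The interface layout and the generate_report placeholder are assumptions; the released interface in the GitHub repository may differ.

```python
# Illustrative local web interface around model inference.
# generate_report() is a placeholder, not the released HepaPathGPT pipeline.
import gradio as gr

def generate_report(image):
    """Placeholder for the real pipeline: segment the lesion, aggregate slices,
    run the fine-tuned VLM (temperature 0.7), and return the report text."""
    # report = hepapathgpt.generate(image, temperature=0.7)   # hypothetical call
    return "Generated pathological overview would appear here."

demo = gr.Interface(
    fn=generate_report,
    inputs=gr.Image(type="pil", label="Anonymised arterial-phase slice (JPG)"),
    outputs=gr.Textbox(label="Pathology-aligned report"),
    title="HepaPathGPT (local deployment)",
)

if __name__ == "__main__":
    demo.launch(server_name="127.0.0.1", share=False)   # keep data on the local machine
```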

Model evaluation

External validation was conducted using 109 patients with HCC from the MY cohort, while the other nine centres were used exclusively for model training. The evaluation consisted of two complementary stages. In the first stage, we quantitatively assessed the performance of HepaPathGPT across segmentation, biomarker classification, and text generation tasks. For tumour segmentation, accuracy was measured using the mean Intersection over Union (mIoU) and Dice score. For biomarker prediction, classification accuracy, recall, precision, and F1 score were reported. All IHC markers were binarised into positive vs. negative categories, with mismatches in polarity (e.g., “Ki-67 (–, ∼15%)” vs. “Ki-67 (+, ∼15%)”) treated as incorrect. Omitted markers were counted as false negatives and hallucinated markers as false positives, ensuring conservative and clinically faithful evaluation. For text generation, ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L) and BLEU-4 were used to evaluate the similarity between generated outputs and ground-truth pathology reports.32
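A minimal NumPy sketch of the segmentation metrics and the binarised marker scoring rules described above is shown below; variable names and the toy inputs are illustrative only.

```python
# Sketch of Dice/IoU and the conservative marker-level confusion counts
# (1 = positive, 0 = negative; toy inputs are illustrative placeholders).
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Dice score and IoU for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

def marker_confusion(pred_markers: dict, true_markers: dict):
    """Omitted markers count as false negatives, hallucinated markers as
    false positives, and polarity mismatches as errors."""
    tp = fp = fn = tn = 0
    for marker, truth in true_markers.items():
        pred = pred_markers.get(marker)
        if pred is None:                 # marker omitted from the generated report
            fn += 1
        elif pred == truth:
            tp += truth
            tn += 1 - truth
        else:                            # polarity mismatch, e.g. (+) reported as (-)
            fp += 1 - truth
            fn += truth
    fp += sum(1 for m in pred_markers if m not in true_markers)   # hallucinated markers
    return tp, fp, fn, tn

tp, fp, fn, tn = marker_confusion({"Ki-67": 1, "CK19": 0}, {"Ki-67": 1, "CK19": 1})
accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)
```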

In the second stage, five board-certified pathologists with more than 10 years of diagnostic experience (K.X.D., X.H.L., K.N.Y., Z.S.W., and Y.Q.J.) independently reviewed H&E-stained slides of validation patients to establish baseline diagnoses. They then evaluated HepaPathGPT by providing the standardised prompt: “Please generate a pathological overview of this patient with hepatocellular carcinoma.” The AI-generated outputs were compared against the actual pathology reports. Following this, the pathologists completed a structured six-question survey (Supplementary Table S4), using a four-level scoring system to assess the generated outputs along six dimensions: accuracy, completeness, logical coherence, professionalism, consistency, and practicality. The survey results were aggregated to provide an expert-based evaluation of the clinical acceptability and reliability of HepaPathGPT.

Statistics

We saved multiple checkpoints during training and evaluated model performance across six representative checkpoints. Reported results are expressed as mean ± SEM to reflect the robustness of performance across training stages. Unless specifically stated otherwise, all reported p-values are computed using the Wilcoxon Rank–Sum Test (Mann–Whitney U test), with differences considered significant at p < 0.05.
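For clarity, the sketch below shows how the reported summaries (mean ± SEM across checkpoints) and the Wilcoxon Rank–Sum comparison could be computed with SciPy; the score arrays are illustrative placeholders, not study data.

```python
# Sketch of the reported statistics: mean ± SEM over six checkpoints/runs and a
# Wilcoxon Rank-Sum (Mann-Whitney U) comparison between two models.
import numpy as np
from scipy import stats

scores_model_a = np.array([0.71, 0.69, 0.72, 0.70, 0.68, 0.71])   # placeholder scores
scores_model_b = np.array([0.61, 0.63, 0.60, 0.62, 0.59, 0.61])   # placeholder scores

mean_a, sem_a = scores_model_a.mean(), stats.sem(scores_model_a)
u_stat, p_value = stats.mannwhitneyu(scores_model_a, scores_model_b,
                                     alternative="two-sided")
print(f"Model A: {mean_a:.3f} ± {sem_a:.3f}; Mann-Whitney U p = {p_value:.4f}")
# Differences are considered significant at p < 0.05, as stated above.
```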

Role of funders

The funders played no direct roles in the study design, data collection, analysis, interpretation, or the writing of the manuscript. None of the authors have been paid to write this article by a pharmaceutical company or other agency.

Results

Patient enrolment results

During enrolment, patients who did not meet the criteria were excluded. This included 747 patients lacking complete imaging data or essential pathological information, 292 patients who had undergone preoperative anti-tumour therapy, and 332 patients with major vascular invasion or extrahepatic metastasis. Ultimately, 1091 patients met the inclusion criteria and were included. The detailed screening process for each center is provided in Supplementary Fig. S3. We performed descriptive analyses of baseline clinical characteristics across the ten participating hospitals. Because this study was retrospective, the scope and completeness of baseline laboratory and clinical examinations varied between centres, and some indicators had substantial missing values. Therefore, we restricted analysis to common variables available across all centres, and summarised their overall distributions (Supplementary Fig. S4). These statistics highlight the heterogeneity of the enrolled cohort, reflecting real-world patient diversity.

Statistical analyses were performed on the pathological data of the included patients. For each participating center, the frequency of pathological markers was calculated and visualised as word clouds (Supplementary Fig. S5), which highlight inter-center variations in reporting styles and diagnostic emphasis. We also summarised the overall expression distributions of the seven key pathological biomarkers across the entire cohort (Supplementary Fig. S6). These pie charts illustrate the proportion of positive vs. negative (or high vs. low) expression for Ki-67, differentiation grade, CK19, CK18, CD34, p53, and Arg-1, providing a cohort-level view of biomarker prevalence.

We further analysed the geographic distribution and institutional composition of the enrolled patients to ensure the representativeness of the cohort. The 1091 cases were contributed by 10 tertiary or municipal hospitals across mainland China, spanning eastern (ZS), western (XN and LS), northern (CF), southern (LZ), central (CY3 and CYB), and southwestern regions (MY, NC, and YB). This distribution captures both metropolitan and regional populations, thereby increasing the diversity of etiological backgrounds and imaging practices. The largest single-center contribution accounted for 30.2% of the cohort (ZS), indicating that no single institution dominated the dataset.

Tumour segmentation module

The SegFormer-B5 model was employed for lesion segmentation tasks, demonstrating excellent performance on complex liver imaging datasets. On the validation set, it achieved high segmentation metrics, with a mean Intersection over Union (mIoU) of 0.8825 ± 0.0073 and a Dice score of 0.9337 ± 0.0061. These results confirm the suitability of this model as the automated lesion segmentation module of HepaPathGPT (see Supplementary Fig. S7 for performance comparisons).

To systematically benchmark SegFormer-B5 against representative state-of-the-art models, we compared it with three widely recognised architectures under identical preprocessing, augmentation, and evaluation protocols: (i) Swin-UNet, a ViT-based model with strong performance on multiple medical imaging benchmarks; (ii) nnU-Net (2D), widely considered the gold standard “autoconfiguration” framework in medical image segmentation; and (iii) UNet++, a recent refinement of U-Net with redesigned skip connections and deep supervision. Performance was evaluated over repeated runs with different seeds, reporting mean ± SEM for Dice and mIoU. Supplementary Table S5 summarises the results, which demonstrate that SegFormer-B5 achieves the best overall balance, with the top Dice and mIoU scores while maintaining computational efficiency.

VLM performance

Fine-tuning process

Four models—LLaVA 1.5-7B, LLaVA 1.5-Med-7B, Qwen2VL-7B-Instruct, and DeepSeek-VL-7B-Chat—underwent layer-wise fine-tuning to optimise their performance in generating HCC pathology reports. Each model converged after 1000 steps, minimising overfitting risks. During the initial 200 steps, the loss values decreased sharply, indicating rapid learning of the correspondence between pathological text and imaging features. The loss curves subsequently stabilised, showing a gradual decline until convergence. Fig. 3 shows the performance comparison of four different models. Fig. 3a illustrates the loss trajectories for all models during training.

Fig. 3.

Fig. 3

Performance comparison of four vision-language models (LLaVA 1.5-Med-7B-en, LLaVA 1.5-7B-en, Qwen2VL-7B-Instruct-en, and DeepSeek-VL-7B-Chat-en) in generating hepatocellular carcinoma pathology descriptions. (A) Loss curve variations during the fine-tuning of LLaVA 1.5-Med-7B-en (top left), LLaVA 1.5-7B-en (top right), Qwen2VL-7B-Instruct-en (bottom left), and DeepSeek-VL-7B-Chat-en (bottom right). (B) Performance metrics (Accuracy [Acc], Precision [Prec], Recall, and F1-score) for predicting key markers, including Ki67, CK19, P53, CD34, Arginase-1, Differentiation, and CK8/18 (mean ± SEM across six inference runs).

Comparative model performance

The performance of the fine-tuned vision–language models was evaluated on the external validation cohort for seven representative biomarkers of hepatocellular carcinoma: Ki-67, CK19, P53, CD34, CK18, differentiation grade, and Arginase-1. Accuracy, precision, recall, and F1 scores were calculated for each biomarker, and results are presented as mean values with standard error of the mean (SEM) across six independent inference runs (Fig. 3b). Among the compared models, LLaVA 1.5-7B exhibited the most robust and consistent performance across biomarkers. It achieved the highest F1 scores for Ki-67 (0.705 ± 0.034), CK19 (0.727 ± 0.068), CD34 (0.490 ± 0.031), and CK18 (0.526 ± 0.030), significantly outperforming LLaVA 1.5-Med-7B in these tasks (Wilcoxon Rank–Sum Test, p < 0.05). For P53 and differentiation grade, LLaVA 1.5-7B and DeepSeek-VL-7B-Chat showed comparable performance, while other VLM models remained substantially lower. DeepSeek-VL-7B-Chat achieved competitive results in Arginase-1 prediction (Acc: 0.836 ± 0.014) but underperformed for CK18 and CD34, with wider SEM ranges indicating less stability. Qwen2VL-7B-Instruct consistently demonstrated the weakest outcomes, particularly for Ki-67 (Acc: 0.451 ± 0.036) and CK18 (Acc: 0.448 ± 0.004), reflecting poor reproducibility across runs. These results demonstrate that LLaVA 1.5-7B provides the most balanced and reproducible accuracy across key biomarkers, combining high mean scores with narrow error margins. This stability under repeated inference supports its suitability as the backbone model for pathology-aligned report generation in preoperative HCC imaging. Additionally, we implemented a CNN-based baseline that extracted imaging features for direct classification of biomarker status to further benchmark HepaPathGPT against conventional pipelines. As shown in Supplementary Fig. S8, VLMs outperformed the CNN baseline across accuracy, precision, recall, and F1, confirming the superiority of vision–language alignment over traditional feature-classification strategies for imaging-to-pathology prediction.

The quality of text generation was assessed using BLEU-4 and ROUGE scores, reported as mean ± SEM across six inference runs (Fig. 4). LLaVA 1.5-7B consistently outperformed other backbones, achieving the highest BLEU-4 score (62.7 ± 1.7) and ROUGE-1 score (84.2 ± 1.1), with significant advantages over Qwen2VL-7B-Instruct and LLaVA 1.5-Med-7B (Wilcoxon Rank–Sum Test, p < 0.05). It also led in ROUGE-2 (71.7 ± 1.6) and ROUGE-L (73.6 ± 1.6), reflecting stronger sentence-level coherence and overall completeness of the generated reports. Qwen2VL-7B-Instruct demonstrated competitive results in ROUGE-1 (79.9 ± 1.2) and ROUGE-L (67.9 ± 1.9) but lagged behind LLaVA 1.5-7B in BLEU-4 and ROUGE-2. DeepSeek-VL-7B-Chat achieved intermediate performance, with reasonable BLEU-4 (57.2 ± 1.8) and ROUGE-1 (80.6 ± 1.2) scores, but was less stable across repeated runs. In contrast, LLaVA 1.5-Med-7B scored substantially lower on BLEU-4 (35.7 ± 2.0) and ROUGE-2 (46.9 ± 1.9), indicating weaker lexical and phrasal coverage. These findings show that LLaVA 1.5-7B not only achieved the most accurate biomarker predictions but also generated the most coherent, complete, and clinically faithful pathology-aligned reports, supporting its selection as the backbone model for HepaPathGPT.

Fig. 4.

Fig. 4

Accuracy of model-generated compared to actual pathology reports, assessed using ROUGE-1, ROUGE-2, ROUGE-L, and BLEU-4 metrics (mean ± SEM across six runs). p-values are computed using the Wilcoxon Rank–Sum Test (Mann–Whitney U test) (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001).

Evaluation by pathologists

Five pathologists, each with over 10 years of diagnostic experience, provided initial diagnoses for pathology slides from 109 patients. HepaPathGPT was then used to generate pathology reports based on identical prompts, which were compared with real pathology reports (the evaluation results are shown in Fig. 5). The results were evaluated through a structured questionnaire (Fig. 5a). The responses revealed that the proportions of “completely accepted” or “mostly accepted” ratings for accuracy, completeness, logical coherence, professionalism, consistency, and practicality were 92.5%, 87.4%, 87.3%, 91.4%, 85.6%, and 91.8%, respectively. These findings underscore the high satisfaction ratings of HepaPathGPT across all evaluation dimensions. Additionally, a comparative analysis of diagnostic time showed that, in addition to its non-invasive preoperative prediction capabilities, HepaPathGPT significantly reduced diagnostic time (Fig. 5b).

Fig. 5.

Fig. 5

Multidimensional evaluation of HepaPathGPT-generated reports and diagnostic time comparisons. (A): Score distributions from five pathologists evaluating HepaPathGPT-generated reports on six dimensions: accuracy, completeness, logical coherence, consistency, professionalism, and clinical practicality. Scores are categorised as “completely unacceptable,” “partially acceptable,” “mostly acceptable,” and “completely acceptable.” (B): Diagnostic time comparison between five pathologists (left y-axis, minutes) and HepaPathGPT (right y-axis, seconds). Box plots show minima, maxima, interquartile ranges (25th–75th percentiles), and medians.

Visualisation of HepaPathGPT outputs

We present a representative case from the external validation cohort (Fig. 6). The ground-truth pathology report (Fig. 6a) documented a primary poorly differentiated hepatocellular carcinoma in the left hepatic lobe with necrosis, capsule involvement, and cirrhotic background, together with detailed immunohistochemistry (IHC) results (e.g., PCK (+), Glypican-3 (+), Arginase-1 (+), HepPar-1 (+), CD34 (+), CK8/18 (+), CK19 (−), TS (+), Ki-67 (+)).

Fig. 6.

Fig. 6

Comparison of various vision-language models in generating pathological interpretations. (A) Preoperative imaging scan of a patient with HCC and the ground truth pathology report. (B) The interactive interface of HepaPathGPT, demonstrating the process of generating pathological interpretations after users upload imaging data. Panels (C), (D), and (E) compare results generated by fine-tuned vision-language models (LLaVA 1.5-Med-7B, Qwen2VL-7B-Instruct, and DeepSeek-VL-7B-Chat); in contrast, panel (F) presents the response from GPT-4o.

When prompted with the standardised instruction “Please generate a pathological overview of this patient with hepatocellular carcinoma”, HepaPathGPT (Fig. 6b) produced a comprehensive and well-structured report. It not only described tumour location and necrosis but also accurately summarised key IHC marker expressions and tumour differentiation, closely mirroring the reference pathology document. In comparison, LLaVA 1.5-Med-7B (Fig. 6c) captured several core features, including necrosis and some biomarker profiles, but its descriptions were less detailed and occasionally omitted secondary findings. DeepSeek-VL-7B-Chat (Fig. 6d) produced brief and loosely structured text, focussing mainly on general tumour presence without sufficient diagnostic granularity. Qwen2VL-7B-Instruct (Fig. 6e) correctly localised the lesion and noted major pathological traits, but some hallucinations still occurred. GPT-4o (Fig. 6f) generated a readable and logically coherent report, yet its marker-level specificity was inferior to HepaPathGPT. These comparisons highlight that HepaPathGPT generates pathology-aligned reports with the highest degree of accuracy, completeness, and interpretability, reinforcing its suitability as a clinically relevant preoperative decision-support tool.

Discussion

This study presents HepaPathGPT, a vision-language alignment model with generative capabilities designed to produce pathological reports from preoperative HCC imaging. HepaPathGPT demonstrates strong clinical potential by integrating imaging features with the pathological text through multimodal alignment techniques and fine-tuned LLaVA 1.5-7B pretrained frameworks. This enables the model to accurately capture tumour microfeatures and generate detailed, diagnostic-quality reports. Furthermore, the interactive interface of HepaPathGPT allows clinicians to generate pathology descriptions in real time with simple prompts, offering a practical tool for non-invasive preoperative lesion evaluation.

Studies have supported the feasibility of predicting pathological features from preoperative HCC imaging.33,34 Imaging characteristics, such as texture, density, and boundary properties, have been closely linked to tumour biology.14,35 Machine learning and radiomics approaches have successfully extracted subtle imaging features associated with tumour differentiation, invasiveness, and biomarker expression, indicating that imaging features can reflect the molecular and immune characteristics of tumours.16 Advances in AI technology have enhanced the ability to analyse these features, improving insights into the tumour microenvironment through imaging.36, 37, 38 These developments provide a strong foundation for the design and implementation of HepaPathGPT.

The study evaluated four mainstream VLM pretrained frameworks and conducted fine-tuning to efficiently map imaging features to pathological text. HepaPathGPT employs SegFormer-B5 for precise lesion segmentation. Leveraging a ViT-based architecture, it extracts fine-grained tumour features, including location, shape, and boundaries, which are subsequently aligned through multimodal mapping to generate detailed pathological descriptions. During text generation, the model processes imaging features at multiple levels, using a progressive mechanism to iteratively refine its output. This approach transforms imaging features into clinical text that adheres to pathological standards. The disparity in model performance likely stems from a trade-off between general-purpose capabilities and specialised medical knowledge. LLaVA 1.5-7B's superior performance suggests that its robust foundational architecture and broad pre-training provided a stronger base for fine-tuning, enabling more accurate and coherent text generation. In contrast, models pre-trained specifically on medical data may have inherited limitations in general visual-linguistic reasoning or conversational fluency, which proved critical for this task.

HepaPathGPT offers significant clinical utility by providing detailed preoperative pathological information, including tumour differentiation and biomarker expression, which aids in risk assessment and treatment planning for patients with HCC. By non-invasively obtaining critical pathological details preoperatively, the model enables physicians to make more informed decisions about surgical resection or alternative interventions, reducing the risk of treatment errors due to insufficient data.39 Unlike previous studies that primarily focus on predicting individual biomarkers, HepaPathGPT delivers comprehensive pathological reports covering both macro features, such as tumour differentiation and margin status, and detailed immunomarker expressions. Our choice to generate pathology-aligned reports from imaging is motivated by their interpretability and traceability as a bridge toward prognostic and therapeutic modelling. The biomarkers predicted by HepaPathGPT are established correlates of recurrence risk, tumour aggressiveness, and treatment response in HCC, and presenting them in a standardised, clinically readable format provides a transparent intermediate representation that clinicians can audit and that downstream prognostic or treatment-policy models can readily consume. Thus, HepaPathGPT should be viewed as a preoperative phenotyping assistant that complements, rather than replaces, histopathology, with future work aimed at coupling these phenotypic outputs with outcome models and guideline-based decision frameworks to inform strategy selection.

In addition to performance improvements, we also considered the potential risks of hallucination and opacity commonly associated with generative models. To address this, we implemented a human-in-the-loop validation strategy, whereby five board-certified pathologists independently reviewed the generated pathology reports alongside ground-truth pathology. Expert ratings demonstrated high acceptance across six key dimensions (accuracy, completeness, logical coherence, professionalism, consistency, and practicality). These findings suggest that HepaPathGPT substantially reduces the likelihood of misleading or clinically unsafe outputs. Looking forward, we plan to further mitigate hallucination risks by integrating retrieval-augmented generation (RAG) grounded on authoritative guidelines, medical knowledge graph alignment, and automated factual consistency checks to ensure generated texts remain faithful to validated evidence. Taken together, these measures will transform HepaPathGPT into a hybrid knowledge-grounded assistant, thereby enhancing reliability and clinical applicability while preserving the flexibility of generative language output.

Currently, HepaPathGPT supports two-dimensional imaging data and requires the upload of all tumour-containing slices from contrast-enhanced CT or DCE-MRI studies, which are automatically segmented, aggregated, and then used for pathological predictions. Although this design allows rapid processing, the model may underperform in cases with complex tumour morphologies due to the lack of three-dimensional data.

Despite its promise, HepaPathGPT faces some limitations. First, its reliance on two-dimensional data restricts its ability to effectively analyse irregular boundaries or complex lesion structures. Second, as a generation-based model dependent on LLMs, it is prone to hallucination and opacity issues, where generated content may not align with actual data or may be difficult to interpret. These challenges are critical in pathological reporting, where accuracy and transparency are essential. Lastly, although the model performed well on a limited dataset, broader validation across larger datasets is needed. Because this retrospective study focused on imaging-to-pathology generation, follow-up data were not uniformly available and Cox regression analysis was not feasible. Nevertheless, the selected biomarkers are well established as independent prognostic factors in HCC, providing a clinically meaningful foundation for evaluating HepaPathGPT. Moreover, HepaPathGPT is intended as a phenotyping aid within confirmed HCC cases following radiologic diagnosis, rather than as an independent detector or differential diagnostic tool. This study focuses exclusively on HCC, and the model's generalisability to other tumour types remains untested. Future research should incorporate data from diverse tumour types to confirm its applicability across varied clinical contexts. Pathological correlation was based on the dominant lesion rather than all coexisting nodules in multifocal HCC. This approach reflects routine clinical pathology practice but may overlook biological heterogeneity among smaller lesions, which should be further investigated in prospective lesion-level studies. We acknowledge that further validating the model's performance across a wider spectrum of imaging protocols and scanner manufacturers would strengthen its generalisability. Therefore, a key future direction will be to quantitatively evaluate HepaPathGPT's robustness against such technical variations in prospective, multi-vendor cohorts. In future work, we also plan to explore adaptive low-rank allocation, collaborative cross-modal optimisation, and integration with retrieval-augmented or knowledge-graph modules to improve factual alignment and reduce hallucination, enabling HepaPathGPT to evolve into a more adaptive, knowledge-grounded assistant.

This study aimed to develop a generative AI model based on vision-language question answering to predict detailed pathological information in patients with HCC using preoperative imaging. We employed four mainstream pretrained VLM frameworks, combining image segmentation with fine-tuning techniques to enable the model to predict histological features and biomarker expression. Experimental results demonstrate that HepaPathGPT exhibits high accuracy and versatility in the imaging-pathology generation task, with generated reports receiving favourable evaluations from pathologists regarding clinical applicability.

Contributors

L.W. and F.T. conceived the presented idea. L.W. designed the computational framework and analysed the data. F.L., M.W., L.H., J.Z., and X.F. conducted the clinical evaluation. L.W. and C.L. supervised the findings of this work. F.T., X.W., H.X., K.D., X.L., and Z.H. provided valuable intellectual input during the software refinement process. L.W., F.L., M.J., X.H. and J.C. took the lead in writing the manuscript and supplementary information. L.W., F.T., and C.L. accessed and verified the underlying data. S.Y. and J.D. provided funding support for this research. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final version of the manuscript.

Data sharing statement

The data supporting this study's findings are not publicly available due to privacy and confidentiality restrictions. However, the datasets can be obtained from the corresponding author upon reasonable request, subject to approval and compliance with relevant data protection regulations. Our source code is publicly available on GitHub (https://github.com/wangliyang123/HepaPathGPT.git) and has been archived on Zenodo with DOI: https://doi.org/10.5281/zenodo.17581413.40

Declaration of interests

The authors declare no competing interests.

Acknowledgements

We would like to thank the following centres for providing the imaging and pathology data: Zhongshan Hospital, Fudan University; Chifeng Municipal Hospital; Affiliated Hospital of Southwest Medical University; Third Affiliated Hospital of Chongqing Medical University; People's Hospital of Yubei District of Chongqing City; The Third Hospital of Mianyang; Luzhou Traditional Chinese Medicine Hospital; Leshan Hospital, Chengdu University of Traditional Chinese Medicine; Beijing Anzhen Nanchong Hospital, Capital Medical University; and The Second People's Hospital of YiBin. Additionally, we are grateful to the pathologists Kaixin Du, Xuehong Liao, Kaining Ye, Zhenshui Wu, and Youqin Jiang for their valuable contributions to the results evaluation. We also thank Dreamstime (https://www.dreamstime.com), Vecteezy (https://www.vecteezy.com), illustAC (https://en.ac-illust.com), iStock (https://www.istockphoto.com), Shutterstock (https://www.shutterstock.com), and Freepik (https://www.freepik.com) for providing materials for Fig. 1.

Footnotes

Appendix A

Supplementary data related to this article can be found at https://doi.org/10.1016/j.ebiom.2025.106060.

Contributor Information

Jiabin Cai, Email: cai.jiabin@zs-hospital.sh.cn.

Shizhong Yang, Email: ysza02008@btch.edu.cn.

Jiahong Dong, Email: dongjiahong@mail.tsinghua.edu.cn.

Appendix A. Supplementary data

Supplementary Materials
mmc1.docx (5.8MB, docx)

References

  • 1. Ferrante N.D., Pillai A., Singal A.G. Update on the diagnosis and treatment of hepatocellular carcinoma. Gastroenterol Hepatol. 2020;16(10):506–516.
  • 2. Koshy A. Evolving global etiology of hepatocellular carcinoma (HCC): insights and trends for 2024. J Clin Exp Hepatol. 2025;15. doi: 10.1016/j.jceh.2024.102406.
  • 3. Vitale A., Cabibbo G., Iavarone M., et al. Personalised management of patients with hepatocellular carcinoma: a multiparametric therapeutic hierarchy concept. Lancet Oncol. 2023;24(7):e312–e322. doi: 10.1016/S1470-2045(23)00186-9.
  • 4. Papaconstantinou D., Tsilimigras D.I., Pawlik T.M. Recurrent hepatocellular carcinoma: patterns, detection, staging and treatment. J Hepatocell Carcinoma. 2022;9:947–957. doi: 10.2147/JHC.S342266.
  • 5. Tabrizian P., Jibara G., Shrager B., Schwartz M., Roayaie S. Recurrence of hepatocellular cancer after resection: patterns, treatments, and prognosis. Ann Surg. 2015;261(5):947–955. doi: 10.1097/SLA.0000000000000710.
  • 6. Jones C., Thornton J., Wyatt J.C. Enhancing trust in clinical decision support systems: a framework for developers. BMJ Health Care Inform. 2021;28(1). doi: 10.1136/bmjhci-2020-100247.
  • 7. Yang X., Ni H., Lu Z., et al. Mesenchymal circulating tumor cells and Ki67: their mutual correlation and prognostic implications in hepatocellular carcinoma. BMC Cancer. 2023;23(1):10. doi: 10.1186/s12885-023-10503-3.
  • 8. Burkhart R.A., Ronnekleiv-Kelly S.M., Pawlik T.M. Personalized therapy in hepatocellular carcinoma: molecular markers of prognosis and therapeutic response. Surg Oncol. 2017;26(2):138–145. doi: 10.1016/j.suronc.2017.01.009.
  • 9. Teufel A., Kudo M., Qian Y., et al. Current trends and advancements in the management of hepatocellular carcinoma. Dig Dis. 2024;42(4):349–360. doi: 10.1159/000538815.
  • 10. Di Tommaso L., Spadaccini M., Donadon M., et al. Role of liver biopsy in hepatocellular carcinoma. World J Gastroenterol. 2019;25(40):6041–6052. doi: 10.3748/wjg.v25.i40.6041.
  • 11. Fan Y., Yu Y., Wang X., Hu M., Hu C. Radiomic analysis of Gd-EOB-DTPA-enhanced MRI predicts Ki-67 expression in hepatocellular carcinoma. BMC Med Imaging. 2021;21(1):100. doi: 10.1186/s12880-021-00633-0.
  • 12. Yan Y., Lin X.S., Ming W.Z., et al. Radiomic analysis based on Gd-EOB-DTPA enhanced MRI for the preoperative prediction of Ki-67 expression in hepatocellular carcinoma. Acad Radiol. 2024;31(3):859–869. doi: 10.1016/j.acra.2023.07.019.
  • 13. Wang F., Zhan G., Chen Q.Q., et al. Multitask deep learning for prediction of microvascular invasion and recurrence-free survival in hepatocellular carcinoma based on MRI images. Liver Int. 2024;44(6):1351–1362. doi: 10.1111/liv.15870.
  • 14. Hectors S.J., Lewis S., Besa C., et al. MRI radiomics features predict immuno-oncological characteristics of hepatocellular carcinoma. Eur Radiol. 2020;30(7):3759–3769. doi: 10.1007/s00330-020-06675-2.
  • 15. Xia T.Y., Zhou Z.H., Meng X.P., et al. Predicting microvascular invasion in hepatocellular carcinoma using CT-based radiomics model. Radiology. 2023;307(4). doi: 10.1148/radiol.222729.
  • 16. Zhao Y.M., Xie S.S., Wang J., et al. Added value of CE-CT radiomics to predict high Ki-67 expression in hepatocellular carcinoma. BMC Med Imaging. 2023;23(1):138. doi: 10.1186/s12880-023-01069-4.
  • 17. Zheng J., Du P.Z., Yang C., et al. DCE-MRI-based radiomics in predicting angiopoietin-2 expression in hepatocellular carcinoma. Abdom Radiol (NY). 2023;48(11):3343–3352. doi: 10.1007/s00261-023-04007-8.
  • 18. Benary M., Wang X.D., Schmidt M., et al. Leveraging large language models for decision support in personalized oncology. JAMA Netw Open. 2023;6(11). doi: 10.1001/jamanetworkopen.2023.43689.
  • 19. Lu M.Y., Chen B., Williamson D.F.K., et al. A multimodal generative AI copilot for human pathology. Nature. 2024;634(8033):466–473. doi: 10.1038/s41586-024-07618-3.
  • 20. Zhou J., He X., Sun L., et al. Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4. Nat Commun. 2024;15(1):5649. doi: 10.1038/s41467-024-50043-3.
  • 21. Zhang K., Zhou R., Adhikarla E., et al. A generalist vision-language foundation model for diverse biomedical tasks. Nat Med. 2024;30(11):3129–3141. doi: 10.1038/s41591-024-03185-2.
  • 22. Rosenbloom S.T., Smith J.R.L., Bowen R., Burns J., Riplinger L., Payne T.H. Updating HIPAA for the electronic medical record era. J Am Med Inform Assoc. 2019;26(10):1115–1119. doi: 10.1093/jamia/ocz090.
  • 23. Seoni S., Shahini A., Meiburger K.M., et al. All you need is data preparation: a systematic review of image harmonization techniques in multi-center/device studies for medical support systems. Comput Methods Programs Biomed. 2024;250. doi: 10.1016/j.cmpb.2024.108200.
  • 24. Zhou T., Wang W. Cross-image pixel contrasting for semantic segmentation. IEEE Trans Pattern Anal Mach Intell. 2024;46(8):5398–5412. doi: 10.1109/TPAMI.2024.3367952.
  • 25. Yu H., Ye X., Hong W., Shi R., Ding Y., Liu C. A cascading learning method with SegFormer for radiographic measurement of periodontal bone loss. BMC Oral Health. 2024;24(1):325. doi: 10.1186/s12903-024-04079-y.
  • 26. Radford A., Kim J.W., Hallacy C., et al. Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning. PMLR; 2021:8748–8763.
  • 27. Liu H., Li C., Li Y., et al. Improved baselines with visual instruction tuning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024:26296–26306.
  • 28. Li C., Wong C., Zhang S., et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. Adv Neural Inf Process Syst. 2023;36:28541–28564.
  • 29. Wu W., Luo M., Wang H., et al. Towards general continuous memory for vision-language models. arXiv [Preprint]. 2025. Available from: https://arxiv.org/abs/2505.17670.
  • 30. Lu H., Liu W., Zhang B., et al. DeepSeek-VL: towards real-world vision-language understanding. arXiv [Preprint]. 2024. Available from: https://arxiv.org/abs/2403.05525.
  • 31. Wang H., Gao C., Dantona C., Hull B., Sun J. DRG-LLaMA: tuning LLaMA model to predict diagnosis-related group for hospitalized patients. NPJ Digit Med. 2024;7(1):16. doi: 10.1038/s41746-023-00989-3.
  • 32. Akhmetov I., Mussabayev R., Gelbukh A. Reaching for upper bound ROUGE score of extractive summarization methods. PeerJ Comput Sci. 2022;8. doi: 10.7717/peerj-cs.1103.
  • 33. Xu X., Zhang H.L., Liu Q.P., et al. Radiomic analysis of contrast-enhanced CT predicts microvascular invasion and outcome in hepatocellular carcinoma. J Hepatol. 2019;70(6):1133–1144. doi: 10.1016/j.jhep.2019.02.023.
  • 34. Lv K., Cao X., Du P., Fu J.Y., Geng D.Y., Zhang J. Radiomics for the detection of microvascular invasion in hepatocellular carcinoma. World J Gastroenterol. 2022;28(20):2176–2183. doi: 10.3748/wjg.v28.i20.2176.
  • 35. Cho E.S., Choi J.Y. MRI features of hepatocellular carcinoma related to biologic behavior. Korean J Radiol. 2015;16(3):449–464. doi: 10.3348/kjr.2015.16.3.449.
  • 36. Chou Y.C., Lao I.H., Hsieh P.L., et al. Gadoxetic acid-enhanced magnetic resonance imaging can predict the pathologic stage of solitary hepatocellular carcinoma. World J Gastroenterol. 2019;25(21):2636–2649. doi: 10.3748/wjg.v25.i21.2636.
  • 37. Rong D., Liu W., Kuang S., et al. Preoperative prediction of pathologic grade of HCC on gadobenate dimeglumine-enhanced dynamic MRI. Eur Radiol. 2021;31(10):7584–7593. doi: 10.1007/s00330-021-07891-0.
  • 38. Joo I., Kim S.Y., Kang T.W., et al. Radiologic-pathologic correlation of hepatobiliary phase hypointense nodules without arterial phase hyperenhancement at gadoxetic acid-enhanced MRI: a multicenter study. Radiology. 2020;296(2):335–345. doi: 10.1148/radiol.2020192275.
  • 39. Chen V.L., Sharma P. Role of biomarkers and biopsy in hepatocellular carcinoma. Clin Liver Dis. 2020;24(4):577–590. doi: 10.1016/j.cld.2020.07.001.
  • 40. Wang L., Tian F., Li F., et al. Code for the paper "A generative vision-language model for holistic pathological assessment using preoperative imaging in hepatocellular carcinoma". Zenodo; 2025. doi: 10.5281/zenodo.17581413.

