Abstract
Artificial intelligence (AI) is rapidly reshaping oncological research and the advancement of personalized clinical interventions. Convergent progress in three interconnected areas, namely the development of methods and algorithms for training AI models, the evolution of specialized computing hardware, and increased access to large volumes of cancer data such as imaging, genomics, and clinical information, has enabled promising new applications of AI in cancer research. These applications can be organized by cancer type and clinical domain, encompassing the elucidation and prediction of biological mechanisms, the identification of patterns within clinical data to improve patient outcomes, and the analysis of complex epidemiological, behavioral, and real-world datasets. When applied ethically and with scientific rigor, AI-driven approaches hold the promise of accelerating progress in cancer research and ultimately improving health outcomes for all populations. We review examples of the integration of AI within oncology, highlighting cases where deep learning has addressed challenges once deemed insurmountable, and discuss the barriers that must be overcome to enable broader adoption of these technologies.
Keywords: Artificial intelligence (AI), Precision oncology, Cancer biology, Clinical translation, Real-world challenges
Introduction
Cancer remains a principal cause of mortality worldwide [1], with projections estimating approximately 35 million new cases annually by 2050 [2]. This alarming rise underscores the imperative to accelerate progress in cancer research and the development of therapeutic strategies.
Over the last decade, there has been a renewed and growing interest in the integration of artificial intelligence (AI) within the medical field, propelled by the advent of advanced deep-learning algorithms, significant advancements in computational hardware, and the rapid growth of data leveraged for clinical decision-making [3–5]. Furthermore, its application in oncology exhibits remarkable and expanding potential, encompassing foundational scientific pursuits such as protein folding predictions [6, 7], translational initiatives such as biomarker discovery [8, 9], and clinical progress in the organization and management of trials [10, 11].
In this review, we aim to provide a comprehensive overview of the present state and evolving landscape of AI in the realm of oncology. We initiate our discussion by summarizing the major types of AI models and input data modalities. Next, we review recent advancements in AI across six key domains: cancer screening and diagnosis, precision treatment, cancer surveillance, drug discovery, health care delivery, and mechanisms of cancer. Finally, we highlight the principal obstacles impeding the widespread clinical integration of AI and propose strategic, actionable approaches to catalyze future innovations in this rapidly evolving field.
AI models and data modalities
Artificial intelligence enables systems to learn from data, recognize patterns, and make decisions [12]. In oncology, AI draws on diverse data modalities, including medical imaging, genomics, and clinical records, to address complex challenges [13]. The choice of AI model depends on the data type and clinical objective [3]. Structured data such as genomic biomarkers and laboratory values are often analyzed with classical machine learning (ML) models, including logistic regression and ensemble methods, for tasks such as survival prediction or therapy response [14]. Imaging data, including histopathology and radiology, are typically processed with deep learning (DL) architectures such as convolutional neural networks (CNNs), which extract spatial features for tumor detection, segmentation, and grading [15]. Sequential or text data, such as genomic sequences and clinical notes, are modeled with transformers or recurrent neural networks (RNNs) to capture long-range dependencies, facilitating tasks such as biomarker discovery or electronic health record (EHR) mining [16]. Recent advances in large language models (LLMs) such as GPT-5 enhance knowledge extraction from scientific literature and clinical text, accelerating hypothesis generation in cancer research. For an overview of AI model evolution and technical specifications, see Appendix (Fig. 1).
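As an illustration of the first pairing above (classical ML applied to structured data), the following minimal sketch fits a logistic regression by plain gradient descent to a synthetic "responder vs. non-responder" biomarker table. The data, labels, and hyperparameters are entirely invented for demonstration and carry no clinical meaning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic structured "biomarker" table: two informative features.
# Labels are a toy linear rule, purely illustrative.
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # toy "responder" label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression fitted by batch gradient descent
w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)          # predicted response probability
    w -= lr * (X.T @ (p - y)) / n   # gradient of the log-loss w.r.t. w
    b -= lr * np.mean(p - y)        # gradient w.r.t. the intercept

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

In practice such models are fitted with established libraries and evaluated on held-out cohorts; the sketch only makes concrete what "classical ML on structured data" means in the taxonomy above.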
Fig. 1.
Overview of AI models and corresponding data modalities in precision oncology. A Developmental history of AI and key models. B Clinical origin datasets leveraged in AI models. CT, computed tomography; MRI, magnetic resonance imaging; H&E, hematoxylin and eosin; IHC, immunohistochemistry; CNNs, convolutional neural networks; GNNs, graph neural networks
AI applications in cancer research and care
The integration of AI in cancer research and clinical practice encompasses advancing screening and diagnostic accuracy, improving precision cancer treatment, enhancing cancer surveillance, accelerating drug discovery, optimizing healthcare delivery, and elucidating cancer mechanisms. These applications collectively strive to improve patient outcomes and streamline clinical workflows, fostering more efficient and personalized cancer management (Fig. 2).
Fig. 2.
Key applications of artificial intelligence across the cancer care continuum. LLMs, large language models; FDA, Food and Drug Administration; EHR, electronic health record
Expediting cancer screening, detection and diagnosis
AI plays an increasingly important role in cancer screening and detection by improving the speed, accuracy, and reliability of existing methods. The extensive and continually growing datasets generated by current screening programs offer a remarkable opportunity for the development and implementation of advanced AI applications. As AI continues to evolve, its application across diverse cancer detection modalities promises to revolutionize early diagnosis and improve clinical decision-making (Table 1).
Table 1.
Performance comparison of AI systems vs. human experts in oncology applications
| Cancer type | Modality | Task | AI system | Dataset size | Gold-standard labels | Sensitivity | Specificity | Area Under the Curve (AUC) | External validation cohorts | Evidence level | Ref |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Colorectal cancer | Colonoscopy | Malignancy detection | CRCNet |
Training: 464,105 images from 12,179 patients Testing: 2,263 patients across 3 cohorts |
A group of five skilled endoscopists |
AI vs. Human: 1st test set: 91.3% vs. 83.8% (p < 0.001) 2nd test set: 82.9% vs. 87.6% (p = 0.29) 3rd test set: 96.5% vs. 90.3% (p = 0.006) |
AI vs. Human: 1st test set: 85.3% (AI) 2nd test set: 95.0% (AI) 3rd test set: 99.2% (AI) |
1st test set: 0.882 (95% CI: 0.828–0.931) 2nd test set: 0.874 (0.820–0.926) 3rd test set: 0.867 (0.795–0.923) |
Three independent cohorts: 1. Tianjin Cancer Hospital (internal) 2. Tianjin First Central Hospital (external) 3. Tianjin General Hospital (external) |
Retrospective multicohort diagnostic study with external validation | [17] |
| Colonoscopy/Histopathology | Histological classification of colorectal polyps (neoplastic vs. nonneoplastic) | Real-time image recognition system (bag-of-features with SIFT descriptors + SVM classifier) | 118 lesions (from 41 patients) | Histopathology (by a single GI pathologist blinded to real-time image recognition system) | AI: 95.9% (detection of neoplastic lesions) | AI: 93.3% (identification of nonneoplastic lesions) | NR | None (single-center study at Hiroshima University Hospital) | Prospective diagnostic accuracy study with blinded gold standard | [18] | |
| Colonoscopy/Histopathology | Compare autonomous AI to AI-assisted human (AI-H) optical diagnosis | CADx |
AI: 238 patients/158 polyps AI-H: 229 patients/179 polyps |
Histopathology by board-certified pathologists |
AI vs. AI-H: 84.8% (95% CI: 76.8–90.9) vs. 83.6% (75.4–90.0) |
AI vs. AI-H: 64.4% (48.8–78.1) vs. 63.8% (51.3–75.0) |
NR | None (single-center study at Montreal University Hospital Center) | Randomized controlled trial | [19] | |
| Breast cancer | 2D Mammography | Screening detection | Ensemble of three DL models (lesion-, breast-, and case-level analysis) |
UK: 25,856 women US: 3,097 women Reader study: 500 cases (465 analyzed) |
Biopsy-confirmed cancer within extended follow-up: UK: 39 months US: 27 months Reader study: 27 months |
UK vs. 1st reader: + 2.7% (P = 0.004 for non-inferiority) US vs. radiologists: + 9.4% (P < 0.001) |
UK vs. 1st reader: + 1.2% (P = 0.0096) US vs. radiologists: + 5.7% (P < 0.001) |
UK: 0.889 (95% CI 0.871–0.907) US: 0.8107 (95% CI 0.791–0.831) Reader study (US): 0.740 |
Model trained only on UK data tested on US dataset | Diagnostic case–control study with comparison to radiologists | [20] |
| 2D/3D Mammography |
1. Screening detection 2. Early cancer detection |
Progressively trained RetinaNet with MSP for DBT |
Reader Study (Site D):131 index cancers + 154 confirmed negatives 120 pre-index cancers + 154 negatives |
Cancers: Biopsy-proven malignancy Negatives: BI-RADS 1/2 + subsequent confirmed negative screen (9–39 months) Pre-index: Prior "negative" exams (12–24 mo pre-diagnosis) |
Index exams: + 14.2% absolute increase (95% CI: 9.2–18.5%) at avg. reader specificity Pre-index exams: + 17.5% absolute increase (95% CI: 6.0–26.2%) at avg. reader specificity |
Index exams: + 24.0% absolute increase (95% CI: 17.4–30.4%) at avg. reader sensitivity Pre-index exams: + 16.2% absolute increase (95% CI: 7.3–24.6%) at avg. reader sensitivity |
Reader Study (Site D): Index: 0.94 OMI-DB: 0.963 ± 0.003 Site A-DM: 0.927 ± 0.008 Site A-DBT: 0.957 ± 0.010 Site E (China): 0.971 ± 0.005 Site E (size-adjusted): 0.956 ± 0.020 |
1. OMI-DB (UK): Screening population 2. Site A-DM (US): GE equipment 3. Site A-DBT (US): Hologic equipment 4. Site E (China): Diagnostic population, Hologic equipment 5. Reader Study (Site D, US): Hologic equipment |
Diagnostic case–control study with comparison to radiologists | [21] | |
| B-mode and color Doppler ultrasound |
1. Malignancy detection 2. Localization (saliency maps for lesion detection) |
Weakly supervised DL model (ResNet-18 backbone) Trained with breast-level labels (no per-image/pixel annotations) |
Internal (NYU): 288,767 exams (5,442,907 images) from 143,203 patients External (BUSI): 780 images from 600 patients |
Pathology reports (biopsy/surgery within 30–120 days of ultrasound) Test set filtering: Positive exams: Cancers confirmed visible in ultrasound Negative exams: ≥ 1 cancer-negative follow-up or biopsy |
AI: 94.5% (at radiologists' avg. specificity) Radiologists (avg): 90.1% Improvement: + 4.4% (95% CI: − 0.3%, 7.5%; P = 0.0278) |
AI: 85.6% (at radiologists' avg. sensitivity) Radiologists (avg): 80.7% Improvement: + 4.9% (95% CI: 3.0%, 7.1%; P < 0.001) |
Internal test: 0.976 (95% CI: 0.972–0.980) Reader study (AI): 0.962 (95% CI: 0.943–0.979) Radiologists (avg): 0.924 ± 0.02 External validation (BUSI): 0.927 (95% CI: 0.907–0.959) |
BUSI dataset (Egypt): 780 images (437 benign, 210 malignant, 133 normal) |
Retrospective reader study | [22] | |
| B-mode and color Doppler ultrasound | Improve the detection rate of early breast cancer while reducing misdiagnosis | EDL-BC |
Training: 7,076 lesions (14,152 images) Internal validation: 879 lesions External validation 1 (TS): 448 lesions (Tangshan People's Hospital) External validation 2 (DZ): 245 lesions (Dazu People's Hospital) |
Pathology biopsy (malignant) or 3-year follow-up confirmation (benign) |
AI alone (threshold 0.76): 94.4% (internal), 100% (TS), 80% (DZ) Radiologists alone: 12.1% (average) Radiologists with AI assistance: 54.5% |
AI alone (threshold 0.76): 84.2% (internal) Radiologists alone: 99.1% Radiologists with AI assistance: 98.2% |
Internal validation: 0.950 (95% CI: 0.909–0.969) External validation (TS): 0.956 (95% CI: 0.939–0.971) External validation (DZ): 0.907 (95% CI: 0.877–0.938) Radiologists with AI assistance: 0.899 (vs. 0.716 without AI; p < 0.0001) |
Two independent cohorts: 1. Tangshan (TS): 448 lesions (391 patients), northern China 2. Dazu (DZ): 245 lesions (235 patients), southwestern China |
Retrospective, multicenter diagnostic study | [23] | |
| Ultrasound | Binary classification of metastatic vs. non-metastatic lymph nodes | Custom CNN (3 × 3 convolutional kernels, 9 layers including 6 residual layers) |
Total: 338 images (169 patients) Training: 248 images (124 patients) Testing: 90 images (45 patients) |
Core needle biopsy pathology (ground truth) Metastatic: 64 patients Non-metastatic: 105 patients |
AI: 65.5% (SD ± 28.6) | AI: 78.9% (SD ± 15.1) | 0.72 (SD ± 0.08) | None (single-center retrospective study) | Retrospective diagnostic study without external validation | [24] | |
| Histopathology (H&E/HES-stained whole slide images of biopsies) |
1. Malignancy detection 2. Subtype classification 3. Feature identification (e.g. TILs, ALI) |
Galen Breast (Ibex Medical Analytics) Architecture: Ensemble of CNNs |
Internal test: 2,252 slides (1,090 cases) External validation: 841 slides (436 cases) Clinical deployment: 12,031 slides (5,954 cases) |
Pathologist consensus (blinded review by ≥ 2 senior breast pathologists) Discrepancies resolved by third pathologist |
AI: Invasive carcinoma detection: 95.51% (95% CI: 91.03–97.81%) DCIS/ADH detection: 93.20% (95% CI: 86.63–96.67%) |
AI: Invasive carcinoma detection: 93.57% (95% CI: 90.07–95.90%) DCIS/ADH detection: 93.79% (95% CI: 88.63–96.70%) |
Invasive carcinoma detection: 0.990 (95% CI: 0.984–0.995) DCIS/ADH detection: 0.980 (95% CI: 0.967–0.993) |
1. Institut Curie (France) 2. Maccabi Healthcare Services (Israel) |
Prospective validation in clinical setting with real-time deployment | [25] | |
| Histopathology (H&E-stained whole-slide images) |
1. Binary classification of NHG 1 vs. NHG 3 2. Risk stratification of NHG 2 tumors into DG2-low/DG2-high |
DeepGrade: Ensemble of 20 CNNs |
Internal development: 1,567 patients External validation: 1,262 patients |
Nottingham Histological Grade (NHG) assigned by pathologists during routine clinical diagnostics | NR | NR |
NHG 1 vs. NHG 3 classification: - Internal AUC: 0.919–0.937 (95% CI: 0.884–0.987) - External AUC: 0.907 (95% CI: 0.885–0.930) |
the Sweden Cancerome Analysis Network—Breast (SCAN-B) Lund cohort (n = 1,262, fully independent) |
Retrospective cohort study with external validation | [26] | |
| Histopathology (H&E-stained whole-slide images) | Detection of metastases in lymph nodes (LNs) (micrometastases, macrometastases, and negative cases) | LYmph Node Assistant (LYNA) | Total: 70 digitized slides | Established by 3 board-certified pathologists (≥ 7 years' experience) using H&E and IHC Discrepancies resolved via adjudicated review |
For micrometastases: Assisted: 91.2% (95% CI: 86.0–96.5%) Unassisted: 83.3% (95% CI: 76.4–90.2%) P = 0.023 |
NR |
LYNA algorithm alone: Overall: 99.0% Micrometastases + negative slides: 98.5% |
None. Study used internal test data only | Multireader, multicase retrospective study with intra-dataset validation | [27] | |
| Histopathology (H&E-stained whole-slide images) | Detection of metastases in LNs (macrometastases, micrometastases, isolated tumor cells) | Visiopharm Integrator System (VIS) metastasis AI algorithm |
Total: 594 LNs SLN validation cohort: 234 sentinel LNs SLN consensus cohort: 102 sentinel LNs NSLN cohort: 258 nonsentinel LNs |
Consensus review by 3 subspecialized breast pathologists | 100% across all cohorts |
SLN validation cohort: 41.5% NSLN cohort: 78.5% |
NR | None (single-center retrospective study) | Retrospective diagnostic cohort study | [28] | |
| Lung cancer | Chest radiography (X-ray) | Detection of pulmonary metastases from mixed primary cancers | Lunit INSIGHT CXR 1 (v1.1.2.0), DL-based CAD |
Main study: CAD-assisted: 2,916 CXRs Conventional: 5,681 CXRs (after propensity matching) Subanalysis (reader test): 215 CXRs with paired CT |
Clinical/pathological diagnosis + CT confirmation |
Subanalysis (per-examination): Standalone CAD: 80% (73/91) Radiologists without CAD: 71% (257/364) Radiologists with CAD: 77% (279/364) |
Subanalysis (per-examination): Standalone CAD: 75% (93/124) Radiologists without CAD: 73% (360/496) Radiologists with CAD: 75% (374/496) |
NR | None (single-center retrospective study) | Retrospective diagnostic cohort study | [29] |
| Chest Radiography (X-ray) | Detection of actionable lung nodules (Lung-RADS category 4) | AI-based CAD software (Lunit INSIGHT CXR v2.0.2.0) | 10,476 participants (AI group: 5,238; non-AI group: 5,238) | CT scans within 3 months post-X-ray (actionable nodules: solid > 8 mm or subsolid solid portion > 6 mm) |
AI-assisted radiologists: 56.4% (31/55 actionable nodules detected) Non-AI radiologists: 23.2% (13/56 actionable nodules detected) |
AI group: 97.6% Non-AI group: 97.7% |
NR | None (single-center trial: Seoul National University Hospital, Korea) | Prospective randomized controlled trial | [30] | |
| Low-dose computed tomography (LDCT) | End-to-end malignancy risk prediction and localization | 3D DL model |
Training/Tuning/Test (NLST): 42,290 CT cases (14,851 patients; 578 biopsy-confirmed cancers within 1 year) External validation: 1,139 cases (907 patients; 27 biopsy-confirmed cancers) from a US academic medical center |
Cancer-positive: Biopsy/surgical confirmation within screening year Cancer-negative: Cancer-free on 1-year follow-up LDCT |
Without prior CT: AI: + 5.2% improvement (95% CI: 0.38–9.9) at LuMAS 3+ vs. Lung-RADS 3+ Early-stage cancers: + 24.4% sensitivity improvement |
Without prior CT: AI: + 11% improvement (95% CI: 7.8–15.1) at LuMAS 3+ vs. Lung-RADS 3+ With prior CT: Significant improvement for Lung-RADS 4A + threshold |
NLST test set: 94.4% (95% CI: 91.1–97.3) External validation set: 95.5% (95% CI: 88.0–98.4) |
Independent cohort from a US academic medical center (n = 1,139 cases) | Retrospective cohort study with external validation | [31] | |
| LDCT | Malignancy risk estimation of pulmonary nodules | DL algorithm using ensemble of 2D (ResNet50) and 3D (Inception-v1) CNNs |
Development: 16,077 nodules (1,249 malignant) from NLST External validation: 883 nodules (65 malignant) from DLCST |
Histopathologic confirmation for malignancies; ≥ 2 years CT follow-up for benign nodules |
Full DLCST cohort (90% specificity): AI: 84% (54/65) PanCan model: 63% (41/65) Cancer-enriched subset A: AI: 91% (54/59) at 90% specificity Clinicians (avg): NR Cancer-enriched subset B: AI: 54% (32/59) at 90% specificity Clinicians: Higher than 8/11 clinicians |
Fixed at 90% for sensitivity comparisons (as per study design) |
Full DLCST cohort: AI: 0.93 (95% CI: 0.89–0.96) PanCan model: 0.90 (95% CI: 0.86–0.93) Cancer-enriched subset A: AI: 0.96 (95% CI: 0.93–0.99) Clinicians (avg): 0.90 (95% CI: 0.87–0.94) Cancer-enriched subset B: AI: 0.86 (95% CI: 0.80–0.91) Clinicians (avg): 0.82 (95% CI: 0.77–0.86) |
DLCST cohort: n = 883 Cancer-enriched subset A: n = 175 Cancer-enriched subset B: n = 177 |
Retrospective cohort study with external validation | [32] | |
| LDCT | Malignancy risk stratification of pulmonary nodules | Chinese Lung Nodules Reporting and Data System (C-Lung-RADS) |
Primary cohort (MCC): 45,064 participants Independent testing cohort (MSC): 14,437 participants |
Pathologically confirmed malignancy (label 4) or clinician ratings (labels 1–3) |
C-Lung-RADS vs. Lung-RADS v2022 (Human): Internal testing: 79.9% vs. 60.3% (P < 0.001) External testing: 87.1% vs. 63.3% (P < 0.001) |
C-Lung-RADS vs. Lung-RADS v2022 (Human): Internal testing: 91.8% (derived from 8.2% FPR) External testing: 94.1% (derived from 5.9% FPR) |
Overall performance: Internal testing: 0.918 (95% CI: 0.918–0.919) External testing: 0.927 (95% CI: 0.926–0.928) |
Mobile screening cohort (MSC): 14,437 participants screened via mobile CT units across Western China (Sichuan Province) | Retrospective cohort study with external validation using real-world data | [33] | |
| Liquid biopsy (blood-based cfDNA fragmentome analysis) | Early detection of lung cancer in screening-eligible individuals | Machine learning classifier |
Total: 958 individuals Training: 576 (181 cases, 395 controls) Validation: 382 (248 cases, 134 controls) |
Pathologically confirmed lung cancer diagnosis (cases) vs. negative thoracic CT ± 12-month clinical follow-up (controls) | Overall (validation): 84% (95% CI: 79–88%) | Overall (validation): 53% (95% CI: 45–61%) | NR | Independent held-out cohort from the same prospective study (DELFI-L101, NCT04825834), but no truly external cohort used | Prospective case–control study with independent validation split | [34] | |
| Blood-based liquid biopsy (ctDNA methylation analysis via ELSA-seq) | Early detection of lung cancer | Soft-margin support vector machine classifier |
Total: 569 individuals Cases: 308 lung cancer patients Controls: 261 age-/sex-matched non-cancer individuals |
Cases: Pathologically confirmed lung cancer Controls: Negative thoracic CT ± 12-month clinical follow-up |
Overall: 52–81% (stage-dependent) Subgroup (115 individuals): 100% sensitivity at 100% specificity |
Overall: 96% (95% CI: 93–98%) Subgroup (115 individuals): 100% (95% CI: 91–100%) |
Training/Validation: 0.93 Independent test set: 0.90 |
Independent cohorts from two hospitals (Peking Union Medical College Hospital and Shanghai Chest Hospital) | Prospective case–control study with independent validation cohorts | [35] | |
| Histopathology (H&E-stained whole-slide images) |
1. Classification: LUAD vs. LUSC vs. normal 2. Mutation prediction: 6 oncogenes |
Google Inception v3 (deep CNN) |
TCGA: 1,634 slides (459 normal, 567 LUAD, 608 LUSC) Tiles: 987,931 (512 × 512 px) |
TCGA consensus pathology diagnosis + NYU pathologist annotations (for external cohorts) | AI: 89% | AI: 93% |
Classification (LUAD/LUSC/Normal): AUC: 0.97 (TCGA test set) Mutation prediction (LUAD): • AUC range: 0.733–0.856 (for 6/10 genes: STK11, EGFR, FAT1, SETBP1, KRAS, TP53) |
Independent cohorts (NYU Langone Medical Center): Frozen sections (n = 98) FFPE sections (n = 140) Biopsies (n = 102) EGFR mutation prediction (n = 63) |
Prospective validation in multiple external cohorts | [36] | |
| Prostate cancer | Biparametric magnetic resonance imaging (bpMRI) |
1. Prostate segmentation 2. Detection of clinically significant PCa (csPCa) lesions 3. Automated report generation |
AutoProstate framework |
Training: PROSTATEx (n = 204 patients, 299 Lesions: 76 csPCa, 223 low-grade or benign lesions) External validation: PICTURE (n = 247 patients, 210 Lesions: 147 csPCa, 63 low-grade or benign lesions) |
PROSTATEx (train): MR-guided targeted biopsy + histopathology (csPCa: Gleason ≥ 3 + 4). Annotations by radiologists PICTURE (test): Transperineal Template Prostate-Mapping (TTPM) biopsy + MR-targeted biopsy (csPCa: Gleason ≥ 3 + 4). Annotations by radiologists using histopathology correlation |
Lesion-Level (external validation): AI: 76% Radiologist: 78% (p > 0.05, not statistically significant) |
Lesion-Level (external validation): AI: 57% Radiologist: 48% (p > 0.05, not statistically significant) |
Lesion-Level (external validation): AI: 0.70 Radiologist: 0.64 (p > 0.05, not statistically significant) |
PICTURE dataset (n = 247) | External validation using a prospective cohort from a different institution | [37] |
| bpMRI |
1. Detection of index lesions 2. Classification: PCa vs. benign; csPCa 3. PI-RADS feature-based explanations |
Explainable AI (XAI) model |
Internal: 1,224 patients (3,260 lesions) Training/validation: 1,108 patients (3,094 lesions) Internal test: 116 patients (166 lesions) External: PROSTATEx dataset (204 patients, 330 lesions) |
Histopathologic analysis (biopsy/prostatectomy) |
csPCa detection (patient-level): 93% (95% CI: 87–98%) Lesion-level: 77% (CI: 69–85%) for csPCa |
Lesion-level: 89% (CI: 84–95%) for csPCa |
Internal: 0.89 (CI: 0.85–0.93) for csPCa External (PROSTATEx): 0.87 (CI: 0.81–0.93) for csPCa |
PROSTATEx dataset | Retrospective diagnostic study with external validation | [38] | |
| bpMRI and multiparametric MRI (mpMRI) | Detection and localization of csPCa (Gleason grade group ≥ 2) | Ensemble of 5 DL models |
Total: 10,207 MRI exams (9,129 patients) Training: 9,207 exams Testing: 1,000 exams (including 197 external cases) |
Histopathology (biopsy/prostatectomy) + ≥ 3-year follow-up (median 5 years) |
vs. Radiologists (reader study): 89.4% (matched specificity) vs. Clinical practice: 96.1% |
vs. Radiologists (reader study): 79.1% (50.4% fewer false positives) vs. Clinical practice: 68.9% |
Reader study cohort: 0.91 (95% CI 0.87–0.94) Full testing cohort: 0.93 (0.91–0.94) |
Testing cohort included 197 exams from St. Olav's Hospital, Norway (unseen centre) | Large retrospective diagnostic study with independent external validation | [39] | |
| Digital histopathology (whole-slide images of biopsies) |
1. PCa detection 2. Gleason grading (ISUP Grade Groups 1–5) |
15 DL algorithms (ensemble-based CNNs) |
Development: 10,616 biopsies (2,113 patients) Internal validation: 545 biopsies External validation: US (741 biopsies), EU (330 biopsies) |
Internal: Consensus of 3–4 uropathologists US external: majority vote of 6 uropathologists + immunohistochemistry EU external: single expert uropathologist |
US external: 98.6% (95% CI: 97.6–99.3) Human pathologists (US): 91.9% (95% CI: 89.3–95.5) |
US external: 75.0% (95% CI: 61.2–82.7) Human pathologists (US): 95.0% (95% CI: 87.4–98.1) |
NR |
US cohort: 741 biopsies from 3 sites EU cohort: 330 biopsies from Karolinska Hospital |
Retrospective diagnostic study with independent external validation | [40] | |
| H&E-stained histopathology | Gleason grading of biopsies | Extended U-Net architecture |
Internal: 5,759 biopsies (1,243 patients) External: 245 TMA cores |
Consensus reference standard by three uropathology experts | 95.4% (AI) vs. pathologists' median | 95.2% (AI) vs. pathologists' median |
0.990 (benign vs. malignant) 0.978 (Grade Group ≥ 2) 0.974 (Grade Group ≥ 3) |
Tissue microarray set (public dataset from University Hospital Zurich): (n = 245 cores) | Retrospective diagnostic study with external validation | [41] | |
| Brain cancer | mpMRI | Binary classification of glioma grades (HGG vs. LGG) | Three machine learning classifiers | 285 patients (210 HGG, 75 LGG) | Histopathological grading confirmed by surgical specimens | 95.08% | 70.22% | 0.903 | None | Retrospective diagnostic study with cross-validation | [42] |
| Multi-modal MRI (3D black-blood [BB] + 3D gradient echo [GRE]) | Automatic detection and segmentation of brain metastases | 3D U-Net with encoder-decoder architecture + reconstruction decoder for regularization |
Training: 188 patients (917 lesions) Test: 45 patients (203 lesions) + 49 metastasis-free patients |
Manual segmentation by radiologists, confirmed/modified by senior neuroradiologist | 93.10% | 69.4% | 0.78 (0.85–0.91) | None | Retrospective diagnostic study without external validation | [43] | |
| mpMRI | Non-invasive prediction of 1p/19q co-deletion status | Deep CNN based on ResNet-34 architecture |
Total: 555 patients Training: 330 Internal validation: 123 External testing (TCIA): 102 |
1p/19q status determined by fluorescence in situ hybridization (FISH) on surgical tissue samples | Testing cohort: 93.0% (95% CI: 89.43–96.03) | Testing cohort: 93.6% (95% CI: 90.45–96.46) |
Training: 0.999 Internal validation: 0.986 Testing (external): 0.983 (95% CI: 0.9110–0.9549) |
the cancer imaging archive (TCIA) dataset (n = 102) | Retrospective multi-center validation with external cohort | [44] | |
|
Pan-Cancer (BRCA, COAD, LUNG, LIHC, PRAD) |
cfDNA methylation profiling |
1. Early cancer detection 2. Tumor localization (tissue-of-origin) |
SRFD-Bayes (Semi-reference-free deconvolution + Bayesian diagnostic model) |
Training: 207 normal controls + 113 late-stage cancer patients Validation: 207 normal controls + 79 early-stage cancer patients + 191 pre-diagnosis samples |
Core needle biopsy pathology | AI: 86.1% (early detection) | AI: 94.7% (early detection) | Close to 0.98 (SRFD-Bayes for cancer detection) |
Three independent cohorts: 1. GSE129374 (21 cirrhosis vs. 22 HCC) 2. HCC cohort (n = 1,050, stages I–IV) 3. Pre-diagnosis cohort (n = 191) |
Retrospective diagnostic study | [45] |
Abbreviations: ADH Atypical ductal hyperplasia, AI Artificial intelligence, ALI Angiolymphatic invasion, BRCA Breast invasive carcinoma, BUSI Breast Ultrasound Images dataset, CADe Computer-aided detection, CADx Computer-aided diagnosis, cfDNA Cell-free DNA, CI Confidence interval, CNN Convolutional neural network, COAD Colon adenocarcinoma, ctDNA Circulating tumor DNA, CXR Chest radiography, DBT Digital breast tomosynthesis, DCIS Ductal carcinoma in situ, DG DeepGrade, DL Deep learning, DLCST Danish Lung Cancer Screening Trial, DM Digital mammography, EDL-BC Ensemble deep learning for breast cancer, FPR False-positive rate, HCC Hepatocellular carcinoma, HES Hematoxylin–eosin-saffron, HGG High-grade gliomas, H&E Hematoxylin and eosin, ISUP International Society of Urological Pathology, LGG Low-grade gliomas, LIHC Liver hepatocellular carcinoma, LUAD Lung adenocarcinoma, LUMAS Lung malignancy scores, LUNG Lung squamous cell carcinoma & lung adenocarcinoma, Lung-RADS Lung Imaging Reporting and Data System, LUSC Lung squamous cell carcinoma, MCC Medical checkup cohort, MSP Maximum suspicion projection, NLST National Lung Screening Trial, NR Not reported, NYU New York University, OMI-DB Optimam mammography imaging database, PCa Prostate cancer, SIFT Scale-invariant feature transform, SLN Sentinel lymph node, SVM Support vector machine, TILs Tumor-infiltrating lymphocytes, TMA Tissue microarray, UK United Kingdom, US United States, 2D Two-dimensional
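Many entries in Table 1 report the area under the ROC curve (AUC). This metric equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (with ties counted as half). A minimal sketch with invented scores, not drawn from any cited study, makes the rank-based definition concrete:

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive scores above a random negative
    (ties count 0.5), i.e. the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical malignancy scores from a classifier:
value = auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2])
print(f"AUC = {value:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the 0.9+ values in Table 1 indicate strong discriminative performance.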
Colorectal cancer
Colonoscopy remains the gold standard for colorectal cancer (CRC) screening, though its efficacy varies with operator expertise, resulting in variable adenoma detection rates and missed lesions that contribute to the incidence of interval cancers [46–48]. To address these challenges, AI has emerged as a transformative tool in colonoscopy: DL and ML models trained on extensive, annotated datasets of colonoscopic images enable automated polyp detection, classification, and quality assessment in real time [49, 50]. For example, Zhou et al. [17] introduced CRCNet, a DL model designed for the detection of CRC within endoscopic images, demonstrating high performance across three independent datasets. Multiple companies have obtained Food and Drug Administration (FDA) clearance or European Union (EU) certification for computer-aided detection (CADe) systems that identify polyps in colonoscopy images (e.g., K211951, K223473), and a series of randomized controlled trials have provided compelling evidence of these technologies' clinical efficacy and potential to enhance diagnostic accuracy [51–56]. However, Mangas-Sanjuan et al. [57] designed a randomized controlled trial primarily to assess whether computer-assisted colonoscopy increases the detection of advanced colorectal neoplasias in patients with positive fecal immunochemical test (FIT) results in organized CRC screening programs; they concluded that CADe did not improve the colonoscopic identification of advanced colorectal neoplasias. In a meta-analysis of data from 18,232 patients across 21 randomized trials, the investigators found that using CADe for polyp detection during colonoscopy increased the detection of adenomas but not advanced adenomas, and led to higher rates of unnecessary removal of nonneoplastic polyps [58].
Whether detecting and removing these small polyps improves outcomes remains a matter of debate [59].
AI-driven computer-aided diagnosis (CADx) systems employ advanced imaging analytics to distinguish benign from malignant lesions, and the integration of AI-derived histopathological predictions may serve as a valuable tool for improving diagnostic accuracy [46, 60]. An initial study comparing real-time image recognition system analysis with narrow-band imaging diagnosis, along with assessing the correlation between image analysis and pathological results, showed promising performance, achieving approximately 90% accuracy and a negative predictive value (NPV) over 90% [18]. Data from the past five years shed light on the NPV of CADx for neoplastic histologic prediction in small rectosigmoid polyps [61–64]. In three studies, CADx achieved an NPV of 90% or higher, meeting the performance threshold proposed by the American Society for Gastrointestinal Endoscopy (ASGE) for the 'do not resect' strategy [65]. Research published in 2024 suggested that autonomous AI-based diagnosis is noninferior in accuracy to endoscopist-based diagnosis [19]. Both autonomous AI and AI-assisted human (AI-H) diagnosis showed relatively low accuracy for optical diagnosis, but autonomous AI achieved higher agreement with pathology-based surveillance intervals [19]. Notably, Hassan et al. [66] conducted a meta-analysis involving 7,400 diminutive polyps, 3,769 patients, and 185 endoscopists from 11 studies; they found that CADx provided neither benefit nor harm for the resect-and-discard strategy, questioning its value in clinical practice. Improving the accuracy and explainability of CADx therefore remains a priority.
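The NPV threshold discussed above is straightforward to compute from a diagnostic confusion matrix. A brief sketch with hypothetical counts (invented for illustration, not taken from any cited study):

```python
def negative_predictive_value(tn: int, fn: int) -> float:
    """NPV = TN / (TN + FN): among lesions the system calls nonneoplastic,
    the fraction that truly are nonneoplastic."""
    return tn / (tn + fn)

# Hypothetical: 180 lesions correctly called nonneoplastic (TN),
# 15 neoplastic lesions incorrectly called nonneoplastic (FN).
npv = negative_predictive_value(tn=180, fn=15)
print(f"NPV = {npv:.1%}")  # 92.3%, above the 90% ASGE threshold
```

Because NPV depends on disease prevalence as well as test characteristics, a CADx system meeting the 90% bar in one screening population will not necessarily meet it in another.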
In terms of histological diagnosis, Graham et al. [67] designed a comprehensive CNN architecture that preserves maximum information during feature extraction, which is crucial for successful gland instance segmentation in colon histology images. They also emphasized the generalizability of their method by processing whole-slide images (WSIs) from a different center with high accuracy. Zhao et al. [68] developed a DL model to quantify the tumor-stroma ratio (TSR) based on histologic WSIs of CRC and demonstrated its prognostic validity for patient stratification of overall survival (OS) in two independent CRC patient cohorts. This fully automatic method allows for objective and standardized assessment while reducing pathologists' workload.
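To make the tumor-stroma ratio concrete, the quantity such a model standardizes can be sketched in a few lines. The tile-label encoding and the stroma-fraction convention below are illustrative assumptions, not Zhao et al.'s implementation; in practice the tile labels would come from a trained DL segmentation model applied to the WSI.

```python
import numpy as np

def tumor_stroma_ratio(tile_labels: np.ndarray) -> float:
    """Compute a tumor-stroma ratio from a grid of tile-level class
    labels (0 = background, 1 = tumor, 2 = stroma), as a DL model
    might produce for a whole-slide image. Here TSR is reported as
    the stroma fraction of the combined tumor + stroma area, one
    common convention."""
    tumor = np.count_nonzero(tile_labels == 1)
    stroma = np.count_nonzero(tile_labels == 2)
    if tumor + stroma == 0:
        raise ValueError("no tumor or stroma tiles found")
    return stroma / (tumor + stroma)

# Toy example: a 4x4 tile map with 6 tumor and 2 stroma tiles
labels = np.array([
    [0, 1, 1, 0],
    [1, 1, 2, 0],
    [1, 1, 2, 0],
    [0, 0, 0, 0],
])
print(tumor_stroma_ratio(labels))  # prints 0.25, i.e. 2 / (6 + 2)
```

The value of automating this step lies less in the arithmetic than in removing inter-observer variability from the tile classification that feeds it.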
Breast cancer
The application of computer-assisted techniques to enhance the accuracy of medical imaging for breast cancer (BC) screening and detection has a storied history. Notably, CADe received approval from the FDA in 1998 for X-ray mammography, subsequently leading to its widespread integration into clinical practice [69]. However, CADe has been unable to fully meet the increasing demands for better mammographic performance due to limitations such as a high incidence of false positives and elevated recall rates [70]. In recent years, the growing use of AI has transformed the capabilities of automated BC detection through mammographic interpretation [71].
In 2020, McKinney et al. [20] elegantly introduced an AI system that outperforms radiologists in the clinically relevant task of BC identification. Their system was trained and tested on two-dimensional (2D) mammograms from the United Kingdom (UK) and the United States (US), demonstrating its ability to generalize from training on UK data to testing on data collected from a US clinical site. Following this important study, Lotter et al. [21] developed a DL approach that effectively uses both strongly and weakly labeled data by gradually training in stages while keeping localization-based interpretability. Their approach also extends to 3D mammography, which is especially important given its increasing use as a primary screening method and the additional time needed for interpreting it. There are now several FDA-cleared AI products designed to aid radiologists in the detection of BC from mammograms (K220105, K211541, K200905), and a prospective study in Sweden is under way to assess the clinical utility of these products in real-world healthcare settings [72–75]. Furthermore, Yala et al. [76] developed the Mirai system, capable of predicting future five-year BC risk directly from mammograms, and it has been retrospectively validated across multiple hospitals. Vachon et al. [77] demonstrated that AI imaging algorithms not only enhance the detection of BC on mammography but also possess the potential to inform long-term risk prediction of invasive BC.
In addition to X-ray mammography, the availability of unique features and comprehensive imaging datasets such as ultrasound, magnetic resonance imaging (MRI), and positron emission tomography/computed tomography (PET/CT) offers opportunities for developing clinically impactful AI applications. Shen et al. [22] introduced a radiologist-level AI system that can automatically identify malignant lesions in breast ultrasound images. By validating its performance on an external dataset, they also provided initial results supporting its ability to generalize across a patient cohort with different demographic composition and image acquisition protocols. Moreover, ensemble DL models can detect subtle features in BC lesion images, thereby improving both diagnostic accuracy and efficiency in ultrasound interpretation, while the integration of ultrasound and elastography through DL techniques holds promise for more precise prediction of axillary lymph node metastasis, potentially reducing false positives and unnecessary biopsies [23, 24]. Additionally, FDA-approved AI algorithms have been developed to facilitate the interpretation of MRI (DEN1700, RRID:SCR_012945) and breast ultrasound examinations (K190442, K210670, P150043; RRID:SCR_012945) [78].
In recent years, liquid biopsy of body fluid samples, which include a diverse range of tumor-derived components, has gained considerable attention and momentum for the detection and characterization of BC. Zhou and colleagues [45] evaluated the application of circulating cell‐free DNA (cfDNA) methylation profiling for the characterization of tumor constituents and the detection of nascent neoplasms, utilizing a semi-reference deconvolution algorithm that integrates tumor scores and ML models, ultimately achieving high sensitivity and specificity in early BC detection.
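The idea behind reference-based deconvolution of cfDNA methylation can be illustrated with a deliberately simplified sketch. The reference profiles below are invented, and ordinary least squares with clipping stands in for the semi-reference algorithm and ML models described by Zhou and colleagues, which this does not reproduce.

```python
import numpy as np

# Hypothetical reference methylation profiles (rows = CpG sites,
# columns = cell types, here [leukocyte, tumor]); values are beta values.
reference = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.95],
    [0.20, 0.85],
])

def deconvolve(sample: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Crude reference-based deconvolution: fit the sample as a linear
    mixture of the reference profiles by least squares, then clip to
    non-negative values and renormalize so the estimated cell-type
    fractions sum to 1."""
    coef, *_ = np.linalg.lstsq(ref, sample, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum()

# A synthetic cfDNA sample that is 70% leukocyte and 30% tumor
sample = reference @ np.array([0.7, 0.3])
fractions = deconvolve(sample, reference)
print(np.round(fractions, 2))  # prints [0.7 0.3]
```

A rising tumor fraction estimated this way is the kind of signal that, fed into downstream ML classifiers, supports early detection claims.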
The advent of innovative pathological techniques has yielded more comprehensive and intricate large-scale datasets encompassing breast fine-needle aspiration samples and tissue specimens [79]. For example, Sandbank et al. [25] introduced a DL model capable of classifying invasive and noninvasive BC subtypes and generating predicted clinical and morphological features, which was validated using external datasets from the Institut Curie and subsequently piloted as a second-reader diagnostic system at Maccabi Healthcare Services in Israel. Wang et al. [26] developed and validated a novel method, DeepGrade, for histological grading of BC, focused on re-stratification of the Nottingham histological grade (NHG) 2 cases. The approach provides a cost-effective alternative to molecular profiling to extract information relevant for clinical decisions. Another key focus in this field has been the detection of metastatic lesions in sentinel lymph nodes, given that treatment strategies in BC are frequently contingent upon the identification of such lesions [80]. For instance, Google [27] developed the LYmph Node Assistant (LYNA) and reported that pathologists employing this system achieved improved accuracy and efficiency in detecting micrometastases while reducing their review time. Challa et al. [28] reported that the Visiopharm Integrator System (VIS) metastasis AI algorithm demonstrated 100% sensitivity and NPV in detecting lymph node metastasis, with readily recognizable causes of false negatives and shorter pathologist review times compared with immunohistochemistry (IHC) slides. This suggests its potential as a useful screening tool in routine clinical digital pathology workflows to improve efficiency. Recently, the Paige Lymph Node, an AI-assisted diagnostic tool developed by the company Paige, has received FDA Breakthrough Device Designation, recognizing its potential to aid pathologists in the detection of BC metastases within lymph node tissue [81].
In addition, quantitative assessment of tissue biomarkers, including Ki-67 and human epidermal growth factor receptor 2 (HER2), is essential for BC evaluation. However, the interpretation of these markers remains influenced by inherent subjectivity [82, 83]. Data from several studies demonstrated that AI-assisted assessment of Ki-67 could achieve a lower mean error [84] and a lower standard deviation of error [85]. As another example, Dy et al. reported AI's potential to standardize Ki-67 scoring, especially in the 5–30% proliferation index range [86]. In 2025, Albuquerque et al. [87] conducted a diagnostic meta-analysis to assess AI's ability to classify HER2 IHC scores. They found that AI holds promising potential for accurately identifying HER2-low patients and excels in distinguishing 2+ and 3+ scores.
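The quantity an AI-assisted Ki-67 workflow standardizes is itself simple; what the model contributes is consistent nuclei detection and classification. A minimal sketch, in which the counts are assumed to come from such a model rather than manual reading:

```python
def ki67_proliferation_index(positive: int, negative: int) -> float:
    """Ki-67 proliferation index: percentage of tumor nuclei staining
    positive out of all tumor nuclei counted. In an AI-assisted
    workflow, `positive` and `negative` would be produced by a nuclei
    detection/classification model applied to the stained slide."""
    total = positive + negative
    if total == 0:
        raise ValueError("no tumor nuclei counted")
    return 100.0 * positive / total

# 230 Ki-67-positive nuclei out of 1,000 counted tumor nuclei
print(ki67_proliferation_index(230, 770))  # prints 23.0
```

Consistency matters most near clinical decision thresholds, which is why the reported gains concentrate in the intermediate proliferation range.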
Lung cancer
AI-assisted interpretation for chest radiography has been one of the earliest applications of AI in medical imaging, as detecting abnormalities, especially small pulmonary nodules, remains a challenging task even for experienced thoracic radiologists [88]. Early studies demonstrated that AI achieved radiologist-level or superior performance in detecting various abnormalities on chest radiographs, and radiologists' detection accuracy was further improved when assisted by AI [89, 90]. In recent years, the real-world impact of AI CADe tools on interpretation accuracy has started to be documented. For example, a retrospective diagnostic cohort study [29] documented that a DL–based CADe system improved the diagnostic yield for newly visible metastasis on chest radiographs in patients with cancer with a similar false-referral rate. Nam et al. [30] conducted a randomized controlled trial involving patients undergoing chest radiography for health check-ups, demonstrating that AI CADe-assisted reading improved the detection rate of actionable lung nodules.
Currently, low-dose CT (LDCT) screening is the only exam proven to reduce mortality from lung cancer (LC) [91, 92]. A variety of software devices have been approved by the FDA to improve workflow efficiency and performance through enhanced detection of lung nodules [92–94]. In addition, assessing the risk of malignancy in detected lung nodules is another crucial aspect of comprehensive lung nodule evaluation. Impressively, Ardila et al. [31] employed advanced DL techniques to develop models with cutting-edge performance by utilizing full 3D LDCT volumes, pathology-confirmed case results, and previous volumes. If clinically validated, these models could assist clinicians in both localization and lung cancer risk assessment tasks. Venkadesh et al. [32] illustrated that DL–based models can predict the malignancy risk of pulmonary nodules with greater accuracy than radiologists' interpretations or existing risk prediction models. Wang et al. [33] presented the Chinese Lung Nodules Reporting and Data System (C-Lung-RADS), a multiphase approach to evaluate the malignancy risk of pulmonary nodules, improving early lung cancer detection while optimizing healthcare resources.
For blood-based liquid biopsy, Mazzone et al. [34] described the development and validation of a new blood-based LC screening test that uses a highly affordable, low-coverage genome-wide sequencing platform to analyze cfDNA fragmentation patterns. The test could increase LC screening rates, leading to substantial public health benefits. Furthermore, Liang et al. [35] demonstrated that deep methylation sequencing, combined with an ML classifier analyzing methylation patterns, facilitates the detection of circulating tumor DNA (ctDNA) at dilution ratios as low as 1 in 10,000, thereby providing advantages in cancer screening and the assessment of treatment efficacy.
The histologic diagnosis of non–small cell lung cancer (NSCLC), such as distinguishing lung adenocarcinoma (LUAD) from lung squamous cell carcinoma (LUSC), is relatively straightforward, and AI has the potential to support pathologists in the near future by handling such routine tasks. In line with this notion, there are already documented instances of AI-based approaches capable of effectively distinguishing LUAD from LUSC using hematoxylin and eosin (H&E) slides [95, 96]. Impressively, Coudray et al. [36] developed a DL model trained on The Cancer Genome Atlas (TCGA) H&E images to classify NSCLC subtypes, and their study was also among the first to predict mutation status of key driver genes directly from H&E images. Lu et al. [97] introduced a method named clustering-constrained-attention multiple-instance learning (CLAM) that can localize well-known morphological features on WSIs without needing spatial labels. It outperforms standard weakly supervised classification algorithms and is adaptable to independent test cohorts, smartphone microscopy, and varying tissue content. As another example, Wang et al. [98] developed an automated cell type classification pipeline, ConvPath, which includes nuclei segmentation, CNN-based classification of tumor cells, stromal cells, and lymphocytes, as well as extraction of tumor microenvironment-related features for LC pathology images. However, rare cases such as various neuroendocrine carcinoma subtypes frequently require additional time and specialized expertise from pathologists to achieve accurate diagnoses, and currently, the utility of clinical AI tools in these complex and atypical diagnostic scenarios remains questionable [99].
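CLAM belongs to the family of attention-based multiple-instance learning methods, in which a slide-level label supervises a model that learns to weight individual tiles. The core pooling idea can be sketched in numpy; the random weights below stand in for learned parameters, and this is an illustration of the general technique, not the CLAM implementation.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(tile_embeddings: np.ndarray, w_attn: np.ndarray):
    """Minimal attention-based MIL pooling: each tile receives a scalar
    attention score, the scores are softmax-normalized, and the slide
    embedding is the attention-weighted sum of tile embeddings. The
    attention weights also localize which tiles drove the prediction,
    without any tile-level (spatial) labels."""
    scores = tile_embeddings @ w_attn          # (n_tiles,)
    weights = softmax(scores)                  # non-negative, sum to 1
    slide_embedding = weights @ tile_embeddings
    return slide_embedding, weights

rng = np.random.default_rng(0)
tiles = rng.normal(size=(5, 8))   # 5 tiles, 8-dim feature embeddings
w = rng.normal(size=8)            # stand-in for learned attention weights
slide, attn = attention_pool(tiles, w)
print(slide.shape)                # prints (8,)
```

A slide-level classifier is then trained on `slide_embedding`, so only the reported diagnosis is needed as supervision.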
Prostate cancer
According to international guidelines, multiparametric MRI (mpMRI) of the prostate is recommended for radiologists to effectively identify lesions harboring clinically significant prostate cancer (csPCa) prior to conducting confirmatory pathological biopsies [100]. Various AI-powered methodologies have been developed to facilitate the analysis of MRI images for the detection, staging, and segmentation of PCa [101, 102]. For example, Mehta et al. [37] introduced AutoProstate, which uses patient data and biparametric MRI (bpMRI) to generate an automatic web-based report that includes segmentations of the entire prostate, prostatic zones, and potential csPCa lesions. It also presents several derived characteristics with clinical value. Hamm et al. [38] developed an explainable AI (XAI) model for detecting and classifying csPCa. The model improved confidence and reduced reading time for nonexperts while providing visual and textual explanations using well-established imaging features. Additionally, several prostate segmentation AI algorithms are commercially available, all of which have received FDA approval for use in the US, including Prostate MR® (Siemens) [103], Quantib Prostate® (Quantib) [104], OnQ Prostate® (Cortechs.ai) [105], PROView® (GE Medical Systems) [106], and qp-Prostate® (Quibim) [107].
Several AI data challenges are currently under way. Among them, the Prostate Imaging–Cancer AI (PI-CAI) challenge stands out as a new grand challenge, featuring over 10,000 carefully curated prostate MRI exams to validate modern AI algorithms and assess radiologists' performance at csPCa detection and diagnosis [39]. The prostate cancer grade assessment (PANDA) framework exemplifies this approach through its two-phase study design (development and validation) [40]. In the validation phase, submitted algorithms underwent independent evaluation by expert pathologists under strictly blinded conditions to ensure unbiased assessment. These initiatives establish a transparent and rigorous framework for the evaluation and benchmarking of AI algorithms, thereby fostering the progression of future algorithmic innovations.
The prevailing standard for PCa diagnosis entails the histopathological examination of biopsy specimens. Currently, numerous studies have demonstrated encouraging outcomes in the application of DL for autonomous cancer detection within digital WSIs of prostate biopsy specimens [108–110]. In 2019, Campanella et al. [108] introduced a multiple instance learning-based DL system that uses only the reported diagnoses as labels for training, thereby avoiding costly and time-consuming pixel-wise manual annotations. This model was further examined by other researchers and was later named Paige Prostate, which has received FDA approval for clinical use in the automated detection of PCa in core needle biopsies [111–113]. Beyond the initial diagnosis of PCa, DL–based approaches have been extensively investigated to improve the precision of Gleason grading, a method for determining the aggressiveness of PCa based on microscopic examination of tissue samples. For instance, Bulten et al. [41] demonstrated that an automated DL system achieved performance similar to that of pathologists in Gleason grading and could potentially aid in PCa diagnosis. The system might assist pathologists by screening biopsies, offering second opinions on grade groups, and providing quantitative measurements of volume percentages. Similarly, Nagpal et al. [114] developed a DL model that could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable.
Brain cancer
Brain tumors are frequently characterized initially by MRI before being further diagnosed through histopathological examination, and AI-driven tools have the potential to assist neuroradiologists in lesion detection and differential diagnosis [115, 116]. For example, Cho et al. [42] showed that glioma grades could be accurately determined by combining high-dimensional imaging features, an advanced feature selection method, and ML classifiers. Park et al. [43] demonstrated that DL-based models markedly improve both the detection and segmentation of brain metastases, with a particular emphasis on accurately identifying small metastases. In a retrospective multi-center study, researchers [44] found that a deep CNN model built from preoperative mpMRI could predict the 1p/19q status in patients with lower-grade gliomas (LGG) with high accuracy, sensitivity, and specificity. Such imaging-based DL has the potential to serve as a noninvasive tool for predicting key molecular markers in adult diffuse gliomas.
In terms of pathological diagnosis, CNNs trained on WSIs of gliomas have been utilized to deliver unbiased diagnoses of gliomas. For instance, Ertosun et al. [117] trained two CNNs on publicly available H&E-stained images of gliomas from TCGA. One CNN was designed to differentiate glioblastoma (GBM) from LGG, while the other aimed to distinguish between grade 2 and grade 3 gliomas. Li et al. [118] used DL on H&E-stained histopathology images to classify central nervous system (CNS) tumors and to assess biomarkers including IDH1 mutation and p53 mutation. While conventional methods that include imaging and tissue biopsies reliably identify many brain tumors, there are exceptions such as high-grade astrocytoma with piloid features, which was introduced in the 2021 WHO classification [119]. This condition requires methylome profiling for accurate diagnosis [120]. To address this, Vermeulen et al. [121] evaluated the use of rapid nanopore sequencing combined with ML to improve intraoperative diagnosis of CNS tumors, developing 'Sturgeon', a neural network trained to subclassify CNS tumors during surgery using sparse methylation profiles obtained through nanopore sequencing. Similarly, Hoang et al. [122] developed Deep lEarning from histoPathoLOgy and methYlation (DEPLOY), a DL model that classifies CNS tumors into ten major categories from histopathology within a clinically relevant short time frame.
Facilitating precision cancer treatment
While cancer diagnosis and classification remain crucial for informing patient treatment, emerging AI algorithms are increasingly being developed to directly enhance therapeutic interventions by assisting in treatment selection, designing personalized treatments, and providing guidance during treatment delivery. This form of cancer care often involves analyzing large volumes of data with advanced computational approaches to help clinicians in decision-making and to facilitate the evaluation of biomarkers for prognostic purposes.
Colorectal cancer
Surgical intervention remains the principal and most efficacious approach for the management of patients with CRC. Advancements in AI have ushered in a new era in CRC surgery, exemplified by the substantial progress achieved with the da Vinci surgical system, which has the potential to enhance surgical precision, visualization, and surgeon ergonomics, thereby reducing tissue trauma, accelerating recovery, and decreasing complication rates [123, 124].
AI tools are being increasingly integrated into clinical workflows to optimize the radiotherapy treatment process. For instance, AI integration in magnetic resonance-guided radiation therapy (MRgRT) represents a significant leap forward in oncologic treatment by utilizing real-time MRI for highly precise visualization of tumors and adjacent structures. It is especially advantageous in the treatment of complex soft tissue neoplasms, delivering enhanced image resolution and precise localization that significantly exceed the standards set by conventional imaging-guided radiation therapy [125].
Prognostic AI models for CRC have been extensively developed utilizing various data modalities such as histopathological analysis [126, 127] and multiplex imaging approaches [128–130]. For example, DoMore Diagnostics company [131, 132] has a conformité européenne (CE)-marked product that predicts CRC prognosis from H&E slides, and several validation studies of histopathology-based prognostic models have been conducted, including research demonstrating that a prognostic feature initially identified through AI analysis can be effectively learned and applied by pathologists [133, 134]. In addition to directly predicting clinical outcomes, several studies have employed AI to forecast previously characterized prognostic and predictive biomarkers, with emphasis on the development and validation of models for predicting microsatellite instability (MSI), a key biomarker associated with treatment response and clinical prognosis in immunotherapy [135, 136]. This application is increasingly approaching commercialization, exemplified by Owkin’s CE-marked product that predicts MSI directly from H&E images [137].
Breast cancer
AI is gradually revolutionizing the surgical landscape in BC management, offering advancements in oncology aesthetics, preoperative planning, intraoperative guidance, and postoperative assessment. Pfob et al. [138] utilized ML algorithms to predict breast satisfaction during follow-up in women contemplating mastectomy and reconstruction as part of their BC treatment strategy, which provided a personalized reference. In addition, ensuring clear margins is crucial to prevent the recurrence of BC in breast-conserving surgery. Kothari et al. [139] reported that combining Laser Raman spectroscopy (LRS) with two ML algorithms offers rapid, quantitative, and probabilistic tumor assessment with real-time error analysis. This approach can detect cellular changes characteristic of cancer tissue in vivo during surgery, enabling real-time margin evaluation.
Advancements in AI are accelerating the evolution of drug therapies for BC treatment. Park et al. [140] developed an interpretable DL model to predict responses to palbociclib, a cyclin-dependent kinase 4/6 inhibitor (CDK4/6i) used in BC therapy, based on a reference map of multiprotein assemblies in cancer. This study provides an integrated assessment of how a tumor’s genetic profile influences resistance to CDK4/6 inhibitors. Moreover, Sammut et al. [141] demonstrated that ML models integrating clinical, genomic, and transcriptomic data from patients undergoing chemotherapy or targeted therapy substantially outperform those relying solely on clinical variables in predicting BC outcomes, with the high accuracy observed in external validation indicating their robustness and potential to inform therapeutic decision-making in future clinical trials.
In addition to surgical intervention and drug therapy, numerous AI-driven predictive models grounded in histological analysis have been developed to augment clinical decision-making processes. For example, Ogier du Terrail et al. [142] demonstrated that federated learning enables collaborative multi-center training of ML models on WSIs to predict histological response to neoadjuvant chemotherapy in triple-negative breast cancer (TNBC), outperforming local models and clinical baselines while identifying predictive features such as tumor-infiltrating lymphocytes (TILs), apocrine tumor cells, and fibrosis through interpretability techniques. In another study, Amgad et al. [143] introduced the Histomic Prognostic Signature (HiPS), an interpretable and comprehensive scoring system that assesses survival risk based on the morphological characteristics of the BC microenvironment. HiPS leverages DL techniques to precisely map cellular and tissue architectures, enabling the quantification of features related to epithelial, stromal, immune components, and their spatial interactions. Additional investigations have employed AI techniques on H&E images to directly predict the presence of TILs, a prognostic and predictive biomarker in BC, as well as the expression status of programmed death ligand-1 (PD-L1), a key biomarker for immunotherapy response [144–146]. Furthermore, DL can directly determine molecular biomarker status from routine histology. In line with this notion, Shamai et al. [147] developed a DL system termed morphological-based molecular profiling (MBMP) that predicts BC molecular biomarker expression, such as estrogen receptor (ER), progesterone receptor (PR), and HER2, directly from standard H&E-stained tissue images. Their results suggest that tissue morphology encodes molecular information, providing a potentially faster, cheaper alternative to IHC for biomarker profiling in many patients.
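The federated setup used in such multi-center studies can be sketched in miniature: each site trains locally on its private data, and only model parameters, never raw slides or patient records, are shared and averaged. The linear model, synthetic data, and hyperparameters below are illustrative assumptions, not the study's pipeline.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=20):
    """One client's local training: a few epochs of gradient descent
    on a linear least-squares loss, starting from the global weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(global_w, clients, rounds=10):
    """Federated averaging (FedAvg): each round, every client trains
    locally on its own data, then the server averages the returned
    weights, weighted by client dataset size. Raw data never leaves
    a participating site."""
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in clients:
            updates.append(local_update(global_w, X, y))
            sizes.append(len(y))
        global_w = np.average(updates, axis=0, weights=np.asarray(sizes, float))
    return global_w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
# Two "hospitals", each holding private data drawn from the same model
clients = []
for n in (40, 60):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))
w = fedavg(np.zeros(2), clients)
print(np.round(w, 2))  # prints [ 2. -1.]
```

With heterogeneous real-world cohorts the averaged model does not converge this cleanly, which is one reason interpretability checks on the learned features matter.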
Lung cancer
The application of AI in non-invasive liquid biopsy for personalized treatment of LC has seen remarkable advancements. Assaf et al. [148] demonstrated that alterations in ctDNA, modeled within an ML framework and validated across both a hold-out test set and an external cohort of NSCLC patients, can enhance patient risk stratification and enable the sensitive detection of treatment arm differences at early stages within clinical trial contexts. Widman et al. [149] presented an ML-guided whole-genome sequencing platform for ctDNA single-nucleotide variant detection that enables plasma-only (non-tumor-informed) disease monitoring in advanced LC, providing clinically valuable tumor fraction assessments for patients undergoing immune checkpoint inhibition. Heeke et al. [150] introduced minimal residual disease (MRD)-EDGE, an ML-enhanced plasma whole-genome sequencing platform that improves ctDNA detection sensitivity by approximately 300-fold for single-nucleotide variants (SNVs) and reduces the required aneuploidy for copy-number variant (CNV) detection from 1 Gb to 200 Mb. This enables ultrasensitive monitoring of tumor burden, MRD, immunotherapy response, and even ctDNA shedding from precancerous lesions.
Additionally, numerous studies have highlighted the potential of AI to predict immunotherapy biomarkers from pathological data in patients with NSCLC, including PD-L1 expression and TILs. For example, Park et al. [151] developed an AI-powered spatial analyzer of TILs in H&E images, which correlates with tumor response and progression-free survival in patients with advanced NSCLC undergoing ICI therapy, potentially serving as a supplementary biomarker to the tumor proportion score evaluated by a pathologist. Developed by the AI startup Lunit, this product is currently CE-marked and approved for quantifying PD-L1 expression [152]. Vanguri et al. [153] developed a multimodal ML model (DyAM) integrating radiology (CT scans), pathology (digitized PD-L1 slides), and genomics to predict response to PD-(L)1 blockade immunotherapy in advanced NSCLC patients.
Furthermore, AI has revolutionized the identification and analysis of genomic profiles, playing a vital role in forecasting disease prognosis, treatment responses, and survival rates. For instance, Wang et al. [154] proposed a fully automated artificial intelligence system (FAIS), which offers a non-invasive method to detect epidermal growth factor receptor (EGFR) genotype and identify patients with an EGFR mutation at high risk of tyrosine kinase inhibitor (TKI) resistance. The superior performance of FAIS over tumor-based DL methods indicates that genotype and prognostic information can be obtained from the whole lung instead of only tumor tissues. Apart from the driver mutation EGFR, Rakaee et al. [155] proposed that ML-based immune phenotyping, which analyzes the spatial distribution of T cells in resected NSCLC, can identify patients at greater risk of disease recurrence after surgical resection. Specifically, LUADs with concurrent KEAP1 and STK11 mutations are enriched for altered and desert immune phenotypes. In addition, Ricciuti et al. [156] highlighted the genomic and immunophenotypic heterogeneity of immune checkpoint inhibitor (ICI) resistance in patients with NSCLC, utilizing comprehensive tumor genomic profiling and ML-based assessment of TILs.
Prostate cancer
AI systems can also enhance situational awareness, optimize surgical approaches, and improve patient outcomes during robotic-assisted radical prostatectomy (RARP) [157]. For instance, automated performance metrics collected by the da Vinci surgical robot have been utilized to predict postoperative length of stay following RARP [158]. AI also helps integrate interactive 3D imaging systems for preoperative and intraoperative planning. In a prospective study, a real-time 3D augmented reality system was developed to identify prostate lesions at the neurovascular bundle level, thereby enabling enhanced nerve-sparing techniques during RARP [159].
AI has also been extensively investigated for non-surgical treatment planning in PCa. Data from a prospective study demonstrated that fully automated, ML-generated therapeutics are realizable in a clinical environment. ML delivers reproducible, high-performance radiation therapy treatments for patients with PCa and provides time savings that allow for better reallocation of human resources [160]. Nouranian et al. [161] developed an advanced ML-based multi-label segmentation algorithm aimed at providing rapid and clinically relevant segmentations for seed implantation planning in low-dose-rate prostate brachytherapy, a treatment modality involving the placement of small radioactive seeds within or adjacent to the prostate gland.
Choosing the best therapy for a PCa patient is challenging, as oncologists must find a treatment with the highest chance of success and the lowest risk of side effects. International guidelines for predicting outcomes rely on non-specific and semi-quantitative tools, often resulting in over- and under-treatment [162]. To address this, Esteva et al. [163] developed a multimodal deep learning model (MMAI) using clinical data and digital histopathology images from randomized trials to predict long-term PCa outcomes (e.g., metastasis, survival). The MMAI outperformed standard risk stratification from the National Comprehensive Cancer Network (NCCN) across all endpoints, offering a globally accessible tool for therapy personalization through digital pathology. Furthermore, Parker and colleagues [164] aimed to evaluate whether the MMAI algorithm could predict outcomes in very advanced PCa using data from four phase 3 trials of the STAMPEDE platform protocol. They found that diagnostic prostate biopsy samples contain prognostic information in patients with, or at high-risk of, radiologically overt metastatic PCa. The MMAI algorithm combined with disease burden improves prognostication of advanced PCa.
Efforts are also ongoing to leverage AI for enhancing PCa prognostication. For example, Elmarakeby et al. [165] elegantly showed that P-NET, a sparse DL model that analyzes molecular profiling data within a biologically informed, pathway-driven framework, can predict PCa states such as metastasis, with the resulting prediction scores shown to independently correlate with patient prognosis. Kartasalo et al. [166] developed an AI model utilizing H&E images to detect perineural invasion, a critical prognostic marker associated with adverse outcomes in PCa. Additionally, ArteraAI, an AI-focused company, has developed and validated an AI-based predictive model that can identify patients with predominantly intermediate-risk prostate cancer who are likely to benefit from short-term androgen deprivation therapy (ADT) [167]. This innovative technology has been commercialized as the ArteraAI Prostate Test and is now available for clinical use through a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory.
Brain cancer
Approximately 96% of patients with GBM have isocitrate dehydrogenase-1 (IDH1)-wildtype tumors, and the success of their treatment (i.e., concomitant and adjuvant temozolomide (TMZ) therapy) can be predicted via the methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) gene promoter [168]. To address this, Le et al. [168] developed a radiomics model using XGBoost and F-score feature selection to non-invasively classify MGMT promoter methylation status in IDH1-wildtype GBM patients. Using nine key MRI-derived radiomics features, the model achieved high accuracy, outperforming other methods and potentially supporting treatment planning. Furthermore, Do et al. [169] proposed a hybrid ML feature selection model to identify the most informative radiomics feature set and meticulously evaluate its capability of accurately classifying MRI images into methylated and unmethylated ones.
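The F-score feature selection step in such radiomics pipelines has a standard closed form: each feature is ranked by its between-class separation relative to its within-class variance, and only the top-ranked features are passed to the classifier. A minimal sketch under the common Chen–Lin formulation for a binary label (the synthetic "radiomics" data are illustrative, not from the cited studies):

```python
import numpy as np

def f_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """F-score feature ranking for a binary label (0/1): for each
    feature, the squared deviations of the class means from the grand
    mean, divided by the sum of the within-class sample variances.
    Higher scores indicate more discriminative features."""
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / den

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 50)            # e.g. unmethylated vs. methylated
X = rng.normal(size=(100, 3))        # three candidate radiomics features
X[:, 0] += 3 * y                     # feature 0 is strongly class-associated
scores = f_scores(X, y)
print(scores.argmax())               # prints 0 — the informative feature ranks first
```

The selected features would then be fed to a gradient-boosted classifier such as XGBoost, as in the cited model.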
AI-based methods excel at predicting therapy responses, facilitating more effective treatment planning. Kawahara et al. [170] reported that AI techniques can predict responses to gamma knife radiosurgery for metastatic brain tumors by utilizing radiomic features extracted from contrast-enhanced MRI scans. There are also several studies in which radiomics features were used to improve the accuracy of distinguishing necrosis from tumor progression and to enable early detection of adverse radiation events after radiotherapy of brain tumors [171–173].
In brain tumor care, AI plays a crucial role in enhancing prognostic capabilities, enabling more accurate predictions of disease progression and patient outcomes. For instance, Macyszyn et al. [174] integrated diverse imaging markers to accurately predict patient survival and classify GBM molecular subtypes using ML on preoperative MRI, revealing distinctive radiographic phenotypes that can improve diagnosis and guide treatment without additional invasive testing. Zheng et al. [175] presented a scalable approach to elucidate the transcriptional heterogeneity of GBM and establish a vital link between spatial cellular architecture and clinical outcomes.
Improving cancer surveillance
Beyond individual patient care, AI plays a crucial role in population-level cancer surveillance, enabling more efficient data collection, analysis, and risk prediction to inform public health strategies. Cancer surveillance involves the continuous collection and analysis of patient data and epidemiological statistics. AI techniques are increasingly used to speed up the extraction of relevant information for surveillance reports and to discern meaningful patterns within population-level cancer datasets. For example, a collaboration between the National Cancer Institute (NCI) and the Department of Energy, known as Modeling Outcomes using Surveillance data and Scalable Artificial Intelligence for Cancer (MOSSAIC), employs advanced AI methodologies to expedite the submission of data to NCI’s Surveillance, Epidemiology, and End Results (SEER) program, thereby enhancing the efficiency and timeliness of cancer data reporting [176]. As part of this initiative, Alawad et al. [177] developed sophisticated AI algorithms capable of automatically extracting tumor characteristics from unstructured clinical narratives, thereby saving thousands of hours of manual effort. Chandrashekar et al. [178] developed a long-sequence AI transformer called Path-BigBird, which extracts data from six SEER cancer registries to provide cancer researchers and clinicians with more accurate information on cancer diagnosis and management, supported by the MOSSAIC initiative. Moreover, NCI-supported researchers are harnessing DL algorithms trained on population-scale disease data to predict individual risk for pancreatic cancer, thereby laying the groundwork for earlier detection and improved patient outcomes [179].
Additionally, LLMs used for EHR surveillance are assisting researchers in gaining deeper insights into social determinants of health, which are potentially vital for the prevention, early detection, and effective treatment of cancer [180]. Pan et al. [181] proposed that an LLM-based pipeline can facilitate the interpretation of EHR notes without needing manually curated labels, thereby enabling comprehensive and real-time disease surveillance.
Revolutionizing cancer drug discovery
AI is also transforming the upstream process of cancer drug development, accelerating the identification of novel targets, therapeutic candidates, and optimization strategies.
Enhancing target identification and validation
AI-driven computational predictions of protein structures can substantially advance structure-based drug design, facilitating the development of novel therapeutic targets. Recent advancements in DL-based AI models have revolutionized the prediction of protein three-dimensional structures, exemplified by AlphaFold [6], the modeling of molecular interactions such as those achieved by AlphaFold 3 [182], and the simulation of dynamic processes including protein folding and unfolding, as demonstrated by AI2BMD [183]. In addition, target validation through the utilization of cellular and animal models constitutes a critical phase in the target discovery process, and an expanding array of AI-identified candidates are moving toward successful validation. For example, Ren et al. [184] reported that a highly potent small-molecule inhibitor, designed using generative AI, exhibited selective antiproliferative activity in a hepatocellular carcinoma cell line.
Accelerating drug discovery
AI holds the potential to markedly compress drug discovery timelines and reduce costs relative to conventional methodologies. Impressively, Zhavoronkov et al. [185] developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design, which optimizes synthetic feasibility, novelty, and biological activity, and used it to discover a potent TKI in just 21 days. Ren et al. [184] combined AlphaFold-predicted structures with generative AI (Chemistry42) to identify a novel CDK20 inhibitor for hepatocellular carcinoma. Within 30 days of target selection, their first hit molecule (ISM042-2-001) was validated after synthesizing only seven compounds. A second AI-driven optimization round yielded ISM042-2-048 within 60 days in total. The entire process required the synthesis of only 13 compounds, a marked improvement over traditional drug discovery, which typically takes 10–15 years and costs up to $2.8 billion [186].
Beyond de novo design, AI-enabled drug repurposing has emerged as a complementary strategy to further compress timelines and reduce risk. For example, Tran et al. [187] used AI (Standigm Insight™) to rapidly repurpose antiviral drug Z29077885 as a novel serine/threonine kinase 33 (STK33)-targeting anticancer agent. By leveraging its existing safety profile, this approach bypassed early-stage development costs, with efficacy validated through in vitro/in vivo studies. Recently, Abdel-Rehim et al. [188] utilized GPT-4 to identify unexpected combinations of everyday drugs that could help treat BC. One example is the combination of simvastatin (commonly used to lower cholesterol) and disulfiram (used in alcohol dependence), which showed an inhibitory effect on BC cells. By focusing on affordable, FDA-approved drugs not typically associated with cancer, these treatments have a higher potential to be fast-tracked for real-world application.
Improving drug design and optimization
AI-driven approaches are transforming cancer drug development by improving precision, interpretability, and treatment optimization. For example, Vries et al. [189] introduced a revolutionary AI 'fingerprint' technology that can accurately show how cancer cells respond to new drugs, by simply observing changes in their shape. Zhao et al. [190] developed an ensemble of predictive models that reveal the impact of cancer mutations on responses to common DNA replication stress-inducing agents, enabling both multidrug response prediction and mechanistic interpretation.
Improving access to cancer care
Finally, the advent of AI-powered patient engagement tools (e.g., chatbots, virtual assistants) shows promise in supporting cancer care delivery. Chatbots emulating human conversation can enhance medication adherence and self-management. Chaix et al. [191] demonstrated that the use of chatbots enhanced medication adherence among patients with BC. Similarly, Tawfik et al. [192] developed ChemoFreeBot, a chatbot on the Microsoft Azure platform, to educate women with BC, aiming to enhance self-care behaviors and reduce chemotherapy-related side effects through personalized information and improved access to real-time, high-quality data. However, limitations include potential reinforcement of health inequities if training data lacks diversity, and challenges in handling complex patient queries [127, 193, 194]. Future integration should focus on complementing (not replacing) clinician-led care.
Advancing fundamental knowledge of cancer biology
AI methods are increasingly utilized to deepen our understanding of the mechanisms underlying cancer initiation, progression, and drug resistance. The extensive scientific literature offers a wealth of information and insights on cancer. Experts in AI are leveraging LLMs to develop innovative computational tools aimed at enhancing the extraction of knowledge from research publications. For example, complex molecular regulatory pathways (MRPs) are essential for deciphering the mechanisms that govern cancer biology, and knowledge graphs (KGs) have emerged as indispensable tools for organizing and analyzing MRPs, providing structured frameworks to represent intricate biological interactions. Wu et al. [195] illustrated that reguloGPT, an innovative GPT-4-based framework, performs end-to-end joint named entity recognition, N-ary relationship extraction, and context prediction from a sentence describing regulatory interactions within MRPs. Using reguloGPT predictions on 400 annotated PubMed titles centered on N6-methyladenosine (m6A) regulation, they constructed the m6A knowledge graph (m6A-KG) and showed its effectiveness in elucidating the regulatory mechanisms of m6A in cancer phenotypes across multiple cancer types, thereby underscoring the transformative potential of reguloGPT in advancing the extraction of biological insights from scientific literature [195].
Additionally, researchers are harnessing AI to model the atomic dynamics of the RAS protein, one of the most frequently mutated proteins involved in cancer [196]. The AI-driven multiscale investigation of the RAS/RAF activation lifecycle (ADMIRRAL) project employs a novel integration of molecular dynamics, coarse-grained modeling, and DL to comprehensively investigate the RAS/RAF interaction across progressively longer, biologically relevant time scales. A more comprehensive understanding of the interactions between RAS and its associated proteins could unveil new therapeutic opportunities for targeting oncogenic mutations within the RAS gene. Moreover, transposons are integral to processes such as evolution, gene regulation, and cancer development, yet existing methods for their identification often lack standardized frameworks, including a unified taxonomy scheme and consistent output file formats [197]. Riehl et al. [198] developed TransposonUltimate, an AI-powered software platform designed for the classification, detection, and annotation of transposon-related events. Wang et al. [199] introduced DeepBIO, a comprehensive platform integrating 42 DL algorithms aimed at enhancing the accuracy of functional annotation, visualization analysis, and automated interpretation of high-throughput biological sequence predictions. It offers streamlined biological sequence analysis with minimal programming effort, thereby providing detailed functional insights at both the nucleotide and sequence levels.
Challenges and opportunities for AI in oncology
Data privacy
In healthcare, concerns about data privacy and security are more urgent than in nearly any other industry, particularly in data-intensive domains such as oncology where AI relies on sensitive multimodal data (imaging, genomics, EHRs) [200–202]. It is well recognized that the average individual remains largely unaware of the extensive scope of their data that is collected, stored, sold, and shared across various entities [203]. This opacity is intensified by re-identification risks even with pseudonymized data and by the loss of individual control as commercial holders govern health information [204, 205]. These persistent challenges necessitate the development and implementation of concrete strategies such as privacy-enhancing technologies (PETs) to mitigate privacy risks while enabling critical oncology AI advancements.
Federated learning (FL) is widely used in healthcare, enabling multi-institutional model training without exposing raw data beyond institutional boundaries and thereby preserving local control while supporting collaborative analytics [206, 207]. For example, Elbachir et al. [208] proposed a federated 3D U-Net on the BraTS 2020 dataset for brain tumor segmentation, training across institutions without sharing patient data and demonstrating the feasibility of privacy-preserving, multi-institutional learning in medical imaging. Despite its promise, FL faces challenges such as the statistical diversity of data across clients and communication constraints. Techniques such as Agnostic FL and q-Fair FL have been developed to address data heterogeneity [209, 210]. Meanwhile, client selection protocols, model compression, and update reduction strategies focus on boosting communication efficiency [200, 211, 212]. Additionally, incorporating secure multi-party computation (SMPC) and differential privacy in FL frameworks enhances data privacy and security, making FL a valuable tool for healthcare applications [211, 213]. A 2025 study indicates that combining FL with differential privacy (DP) effectively balances data privacy and model accuracy in BC diagnosis [214]. The findings show that FL outperforms traditional centralized models, demonstrating its capability to generalize across decentralized data without compromising predictive performance.
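The core FedAvg mechanics behind such multi-institutional training can be sketched on a toy linear model. The three simulated "hospitals" and the least-squares model below are illustrative stand-ins (the cited work trains a far larger 3D U-Net on imaging data); the key point is that only model weights, never patient records, leave each site:

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One gradient-descent step on a client's private data (stays on-site)."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg_round(w, clients):
    """One FedAvg round: each site trains locally; only weights travel."""
    updates = [local_step(w, X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)  # size-weighted mean

rng = np.random.default_rng(1)
true_w = np.array([1.5, -2.0])
clients = []
for _ in range(3):                       # three hypothetical hospitals
    X = rng.normal(size=(80, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=80)))

w = np.zeros(2)
for _ in range(200):                     # communication rounds
    w = fedavg_round(w, clients)
```

After enough rounds the shared weights converge toward the solution a pooled dataset would have produced, even though no client ever shared its raw data.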
Synthetic data augmentation technology is increasingly adopted in healthcare to support privacy-preserving research, algorithm training, and patient profiling. By mimicking the statistical characteristics of real data without revealing identifiable information, it helps balance innovation and data protection [215]. For instance, Zhou et al. [216] introduced a novel technique named DiffGuard that generates diverse synthetic medical images with annotations, indistinguishable from real images even to experts, to replace real data for DL model training, severing the direct connection to patient data and enhancing privacy. Walonoski et al. [217] developed Synthea, an open-source software that simulates the lifespans of synthetic patients, modeling the 10 most common reasons for primary care encounters and the 10 chronic conditions with the highest morbidity in the US. These innovative approaches are crucial for preserving patient privacy and maintaining AI model integrity, enabling the development and testing of healthcare AI applications without risking the exposure of sensitive patient information.
Source-free domain adaptation (SFDA) has recently gained attention in the medical field. It aims to adapt a model trained on a source domain to target domains without accessing the source-domain data or requiring target-domain labels, thus promoting privacy protection and annotation efficiency [218, 219]. For example, Wang et al. [220] developed a dual reference strategy to select domain-invariant and domain-specific representative samples from a specific target domain for annotation and model fine-tuning without relying on source-domain data. This method ensures data privacy and reduces the workload for oncologists, as it requires annotating only a few representative samples from the target domain and does not need access to the source data. However, Guichemerre et al. [221] analyzed the effectiveness of four representative SFDA methods for weakly supervised object localization (WSOL) in histology images, and their results indicated that these SFDA methods typically perform poorly for localization after adaptation when optimized for classification. Therefore, while SFDA offers promising advantages in terms of privacy and efficiency, its application, particularly in large datasets, remains challenging and requires further research to improve effectiveness.
Blockchain technology offers a secure, decentralized method for storing and managing health data, employing a distributed ledger system to ensure that patient records remain immutable and verifiable, thereby preventing unauthorized access and potential data breaches. It also facilitates secure data sharing among verified entities [222, 223]. In parallel, homomorphic encryption provides a solution for conducting computations on encrypted data without decryption. This enables AI to process and learn from the data while it remains encrypted, thereby upholding privacy even when third-party analysis is involved [202, 224]. Lastly, differential privacy (DP) adds controlled noise to datasets to obscure individual identities, permitting the training of AI models on population-representative data without compromising individual privacy [61, 200]. Together, these technologies create a comprehensive framework for safeguarding patient privacy in AI-driven healthcare. Their implementation signifies a commitment to upholding ethical standards in healthcare AI and to respecting and protecting patient rights, especially concerning privacy and autonomy. In addition, by adopting these technologies, healthcare providers can comply with laws such as the Health Insurance Portability and Accountability Act (HIPAA) in the US and the General Data Protection Regulation (GDPR) in Europe, which enforce strict data privacy and security standards [200, 225].
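As a minimal illustration of the DP idea, the Laplace mechanism below releases an aggregate count over a hypothetical cohort while bounding what any single patient's record can reveal. The function name, cohort, and query are illustrative, not drawn from any of the cited frameworks:

```python
import numpy as np

def dp_count(values, threshold, epsilon, rng):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query has L1 sensitivity 1 (adding or removing one patient
    changes the count by at most 1), so Laplace noise with scale
    1 / epsilon suffices for epsilon-DP.
    """
    true_count = int(np.sum(np.asarray(values) >= threshold))
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(7)
ages = rng.integers(30, 90, size=10_000)            # hypothetical cohort ages
noisy_count = dp_count(ages, threshold=65, epsilon=1.0, rng=rng)
```

Smaller values of epsilon inject more noise and give stronger privacy; the analyst sees only the perturbed count, which for large cohorts remains close enough to the truth to be statistically useful.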
Regulatory frameworks and compliance
Most of the described AI applications are subject to oversight by regulatory agencies, which varies based on regional regulations and the intended use of the technology. In the US, the FDA (https://www.fda.gov/) handles authorizations and regulates AI applications under the umbrella of Software as a Medical Device (SaMD), with regulatory processes similar to those for other SaMD devices and non-AI algorithms [226]. The requirements for FDA clearance primarily depend on the device's class and intended use, with classes ranging from I (lowest risk) to III (highest risk). Typically, the FDA evaluates medical devices through an appropriate premarket pathway, such as premarket clearance (510(k)), De Novo classification, or premarket approval [227]. For example, for the FDA's De Novo pathway authorization of DermaSensor, a novel moderate-risk AI device, manufacturers needed to submit clinical evidence from three key studies [228]. First, the pivotal DERM-SUCCESS trial, involving 1,005 patients and 1,579 lesions, demonstrated 95.5% sensitivity, with 96.6% of device-negative results confirmed benign by biopsy, though specificity was low at 20.7%. Additionally, a supplemental melanoma validation study, DERM-ASSESS, was conducted, along with a clinical utility study showing that use of the device halved cancers missed by primary care physicians, from 18% to 9%. Importantly, post-market surveillance requires performance testing in underrepresented populations, such as those with Fitzpatrick skin types IV–VI, to address trial disparities, given that 97.1% of participants were White. This approach contrasts with predecessors such as MelaFind, which followed the premarket approval pathway but was discontinued due to 10% specificity. In addition, the FDA has sought to distinguish CADe systems from the more stringently regulated computer-assisted diagnosis devices.
The former are designed to 'identify, mark, highlight, or in any other manner direct attention' to imaging features, rather than autonomously diagnose, stage, or triage pathology. To further reduce the regulatory burden on SaMD developers, the FDA is considering reclassifying CADe systems used for visualizing breast lesions and lung nodules into a lower-risk category, requiring a 510(k) submission instead of premarket approval [226].
In contrast to the US FDA's single SaMD guidance, the EU employs two comprehensive regulations for medical device safety and efficacy, including AI applications: the Medical Device Regulation (MDR), which governs medical devices, including those used on or implanted within the body, and the In Vitro Diagnostic Devices Regulation (IVDR), which covers devices testing specimens taken from the human body. In the EU, device assessments are not conducted by a single central agency but by accredited organizations authorized to issue CE marks. Oncology AI systems also fall under the EU AI Act, the first comprehensive AI regulation enacted by a major regulator globally. According to the EU AI Act, AI systems used in health care are generally 'high-risk'. This classification necessitates implementing a risk-management system, data governance controls (including documentation of the quality and representativeness of training, validation, and test data), technical documentation, transparency and human oversight, assurance of accuracy, robustness, and cybersecurity, as well as a post-market monitoring plan [229]. In Europe, some mammography AI readers and digital pathology triage tools have obtained CE marking under these regulations [230–232]. For example, in a retrospective study evaluating the CE-marked AI system (Vara version 2.8) in BreastScreen Norway, the evidence required for regulatory pathways included large-scale real-world validation across diverse screening populations. The study analyzed 1,017,208 screening examinations from ten centers and achieved an area under the receiver operating characteristic curve (AUROC) of 0.921–0.927 for detecting screen-detected and interval cancers. The system showed potential to reduce radiologists' workload and to increase the sensitivity of mammography [230]. Post-market surveillance could be maintained through ongoing retrospective studies and registry linkages.
Firstly, subgroup validations based on mammographic density and equipment vendors can ensure consistent performance across clinical settings. Secondly, integration with other national cancer registries could enable long-term tracking of false negatives and cancer outcomes.
Reliability and scalability
AI today represents the frontier of technological innovation, much as dial-up internet once did; widespread adoption remains distant, and ongoing challenges related to interoperability, data quality refinement, and other issues are unavoidable.
Code sharing for AI models is a crucial step toward ensuring transparency and reproducibility, thereby enhancing their suitability for clinical application [233]. While most published studies validate their models on external datasets, true clinical relevance and translatability require that these models be independently reproducible by other researchers, just as with any other robust scientific discovery. In addition to code sharing, disparities exist between the ease of acquiring data from various platforms and the accessibility of that data for independent use by external institutions, particularly regarding private or controlled-access datasets [234]. Moreover, these advanced models are currently niche, complex, and costly, rendering widespread consumer-level adoption a considerable distance into the future. Beyond these barriers to adoption, the path towards reliable, equitable, and clinically impactful AI faces significant challenges rooted in data representativeness, algorithmic fairness, and methodological heterogeneity.
Data representativeness
In the clinical context, access to data that comprehensively represents the diverse human population is essential for developing robust AI models [235]. It is increasingly evident that disparities related to race, gender, and socio-economic status collectively influence disease risk and recurrence among individuals [236]. However, many of the datasets routinely used to train and evaluate AI models in cancer research remain fundamentally biased toward specific racial and ethnic groups [237]. For example, TCGA, the largest repository of diverse cancer datasets, has a median of 83% European ancestry individuals (range 49–100%) [238]. A 2016 genomic analysis of TCGA demonstrated insufficient representation among all ethnic minority populations, thereby restricting the capacity to reliably identify potentially clinically significant genetic alterations [239]. The under-representation of non-European ancestries in widely used resources such as TCGA poses a critical challenge to equitable AI development. AI models trained on biased datasets can perpetuate and amplify existing health disparities. If a model is trained primarily on data from one population, it may not perform well on individuals from other populations due to differences in genetic architecture and disease susceptibility [239, 240]. Another example is the genome-wide association study (GWAS) catalog, the largest genomic database with detailed ancestry classification, which currently consists of approximately 95% European data [241]. A study on GWAS data showed that model effectiveness is linked to the size of population samples [242]. Populations with limited or no representation have greater disparities in disease model performance and gain minimal benefit from benchmark models. 
In addition, a survey of cell-line data suggested that just about 5% of transcriptomes and 2% of analyzed genomes come from people of African descent, even though Africa contains greater human genetic diversity than all other continents combined [243–245]. Consequently, models largely based on European-ancestry data might inaccurately assess risk, overlook biology specific to certain ancestries, or detect false correlations that reflect population structure instead of true causal signals. To address this under-representation of non-European ancestries, Smith et al. [246] recently introduced PhyloFrame, an ML method aimed at equitable genomic precision medicine. PhyloFrame corrects for ancestral bias by integrating functional interaction networks and population genomics data with transcriptomic training data.
Besides the under-representation of non-European ancestries in TCGA and other large public databases, commercial medical imaging datasets used to train diagnostic AI models also often lack diversity in race, geography, and socioeconomic status. Models trained on data from urban, high-resource settings may fail in rural or low-income populations due to differences in imaging equipment, acquisition protocols, and disease prevalence. These repositories often contain data primarily from specific demographics or clinical settings, potentially excluding individuals from other backgrounds, and the process of selecting images for inclusion in a repository can introduce bias, favoring certain types of cases or patients over others [247, 248]. Efforts to address bias include incorporating data from various demographic groups, using rigorous testing and validation protocols, and conducting ongoing monitoring of model performance. For example, Pinaya et al. [249] utilized generative adversarial networks and latent diffusion models to create synthetic datasets, such as brain MRI data conditioned on age, sex, and brain structure volumes. They found that signals identifying race are also present in these synthetic datasets. In theory, models could be validated more easily on such controlled datasets because specific subgroup sizes can be predetermined, but further research is needed for proper validation.
Algorithmic fairness
Moving beyond data representativeness, achieving algorithmic fairness presents a distinct and critical future challenge. While biased data is a primary cause, unfair outcomes arise from complex interactions within model development and deployment, manifesting as systematic disparities in model performance (e.g., accuracy, false negative rates, calibration) across protected subgroups such as those defined by race, ethnicity, gender, or socioeconomic status [250–252]. This constitutes algorithmic disparate impact, where models unintentionally cause disproportionate harm, even if disparate treatment (intentional discrimination) is absent [253, 254].
Addressing this requires algorithmic bias mitigation strategies that span the entire modeling pipeline and work in a complementary manner. In pre-processing, practitioners reweight samples or transform features to weaken correlations between protected attributes or their proxies and the outcome before any model is trained [255–257]. During in-processing, fairness constraints are incorporated directly into the learning objective; for instance, adversarial debiasing penalizes models for learning representations tied to protected attributes in order to promote subgroup-invariant features, and other approaches enforce statistical parity metrics within the optimization procedure to guide the learner toward fairer solutions [258–261]. After training, post-processing methods modify model outputs by setting subgroup-specific decision thresholds to meet criteria such as equalized odds, which targets equal false positive and false negative rates, or predictive parity, which targets equal positive predictive value [262–265]. Even with these techniques in place, significant challenges remain, as accuracy and fairness often entail unavoidable trade-offs, and when prevalences differ across groups it is mathematically impossible to satisfy all fairness criteria at once [266–268].
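As a minimal sketch of the post-processing idea, the snippet below picks subgroup-specific decision thresholds that equalize true positive rates, one component of equalized odds. The score distributions, group labels, and helper names are illustrative synthetic stand-ins, not drawn from any cited study:

```python
import numpy as np

def group_threshold(scores, labels, target_tpr):
    """Lowest threshold whose true-positive rate is at least target_tpr."""
    pos = np.sort(scores[labels == 1])
    k = int(np.ceil(target_tpr * len(pos)))  # positives to keep above the cut
    return pos[len(pos) - k]

def tpr(scores, labels, threshold):
    """Fraction of true positives scored at or above the threshold."""
    return float(np.mean(scores[labels == 1] >= threshold))

rng = np.random.default_rng(3)
# Synthetic scores for two subgroups; the model is miscalibrated on group B,
# whose scores run systematically lower for both classes.
scores_a = np.concatenate([rng.normal(0.70, 0.1, 500), rng.normal(0.40, 0.1, 500)])
scores_b = np.concatenate([rng.normal(0.55, 0.1, 500), rng.normal(0.30, 0.1, 500)])
labels = np.array([1] * 500 + [0] * 500)

t_a = group_threshold(scores_a, labels, target_tpr=0.9)
t_b = group_threshold(scores_b, labels, target_tpr=0.9)
```

With a single shared threshold tuned on group A, group B's cancers would be missed far more often; the per-group thresholds restore equal sensitivity, at the cost of group-specific false positive rates that would also need auditing.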
Implementing these approaches in clinical AI faces unique hurdles because performance and fairness can degrade under dataset shift when models encounter distributions not represented during training, including genetic variation, differences in imaging protocols, changes in disease prevalence, and evolving taxonomies, which can disproportionately affect underrepresented groups [269–272]. This challenge is compounded by the frequent unavailability of protected attributes due to strict regulations, since sensitive data such as self-reported race are often needed both to audit fairness and to apply many mitigation techniques [273–275]. In addition, models must contend with concept drift as clinical definitions evolve over time, with revisions to coding systems and disease classifications, such as updates to the International Classification of Diseases (ICD) and the Banff classification, changing the meaning of labels and thereby undermining model validity and fairness, which in turn affects equity assessments and real-world performance monitoring [276–278].
Promising paths forward include the use of FL, which enables training on diverse, distributed datasets without centralizing raw data and can inherently improve representation while helping to mitigate population and acquisition shifts [279–282]. Building on this, FL frameworks that integrate adversarial learning or disentanglement are designed to learn fairer, site-invariant representations across participating institutions [283, 284]. To ensure that such technical advances translate into equitable real-world performance, rigorous fairness auditing should be required, with stratified performance reporting across relevant subgroups during both validation and ongoing monitoring, complemented by attribution techniques such as Shapley additive explanations (SHAP) values to identify and quantify sources of disparity [285, 286]. In parallel, explainability is critical for trust, so developing interpretable methods that help clinicians understand model behavior and diagnose why disparities arise is essential for regulatory approval and for sustainable clinical adoption [287, 288].
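A minimal stratified audit might look like the sketch below, which computes a rank-based AUROC per subgroup on synthetic scores. The `fairness_audit` helper and the data distributions are illustrative assumptions (SHAP-based attribution is omitted), but the pattern of reporting performance per protected subgroup is exactly what such audits require:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (rank-based, no sklearn)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fairness_audit(scores, labels, groups):
    """Stratified report: model AUROC within each protected subgroup."""
    return {g: auroc(scores[groups == g], labels[groups == g])
            for g in np.unique(groups)}

rng = np.random.default_rng(5)
groups = np.array(["A"] * 1000 + ["B"] * 1000)
labels = rng.integers(0, 2, size=2000)
noise = np.where(groups == "A", 0.3, 1.5)   # the model is noisier on group B
scores = labels + noise * rng.normal(size=2000)

report = fairness_audit(scores, labels, groups)
```

A pooled AUROC would hide the gap the per-group report exposes, which is why stratified reporting during validation and ongoing monitoring is a prerequisite for detecting disparate impact at all.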
Heterogeneity in AI study designs
Heterogeneity in study design across AI oncology studies hinders the interpretation and comparison of reported results and limits assessment of clinical utility. Major sources of variability and potential bias include differences in dataset size, gold-standard labels, and external validation cohorts.
Various studies use datasets ranging from small, single-institution cohorts (n < 100) [27] to large, multi-center collections (n > 10,000) [17, 20, 31, 33, 39, 40]. Smaller datasets are inherently more susceptible to overfitting, potentially leading to inflated performance metrics (e.g., accuracy, AUC) during internal validation that fail to hold in broader populations. While larger datasets generally offer greater robustness, their quality and representativeness remain crucial.
The quality and consistency of the gold standard used for training and validation are equally important. Heterogeneity in medical AI studies stems from several key sources. First, pathologist variability is a well-documented challenge, encompassing both inter- and intra-observer disagreement in areas such as histopathological diagnosis, grading (e.g., Gleason, Nottingham), and biomarker assessment (e.g., PD-L1, Ki-67); AI models trained on labels from a single pathologist, or from a small group with particular biases, will inherit those limitations. Second, inconsistent application of diagnostic criteria (e.g., for sessile serrated lesions in colonoscopy or rare neuroendocrine subtypes) further complicates labeling. Third, differences in how regions of interest are annotated on images (e.g., WSIs, radiology scans) or how specific features are defined and labeled introduce noise, and the lack of standardized annotation guidelines across studies makes aggregating data or comparing models difficult. Finally, variations in the clinical definitions used as labels (e.g., endpoints such as overall survival vs. progression-free survival) directly affect what the model learns and reports.
One of the most critical factors affecting real-world applicability is external validation. Many studies rely solely on internal validation, which optimistically biases performance estimates. Studies employing external validation often use cohorts from similar institutions or healthcare systems, limiting generalizability to truly diverse settings. The scarcity of robust, prospective, multi-center external validation studies represents a major barrier to clinical trust and adoption. Performance frequently degrades when models encounter data from different scanners, acquisition protocols, patient demographics, or clinical practices not represented in the training set.
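One simple way to screen for the acquisition shifts described above is to compare feature distributions between the training data and a new site, for instance with the population stability index (PSI). The sketch below uses synthetic Gaussian features and the common (but not universal) rule of thumb that PSI > 0.25 flags a major shift; both the feature and the threshold are assumptions for illustration.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-era ('expected') and a new-site ('actual')
    feature distribution, using decile bins of the training data."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(2)
train_feature = rng.normal(0, 1, 5000)          # e.g., a radiomic intensity feature
same_protocol = rng.normal(0, 1, 5000)          # new data, same acquisition protocol
new_scanner = rng.normal(0.8, 1.3, 5000)        # shifted distribution, different scanner
psi_same = population_stability_index(train_feature, same_protocol)
psi_shifted = population_stability_index(train_feature, new_scanner)
print(f"PSI same protocol: {psi_same:.3f}, PSI new scanner: {psi_shifted:.3f}")
```

A high PSI does not prove the model will fail at the new site, but it identifies exactly the situation in which external validation, rather than internal metrics, must be trusted.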
Addressing these challenges requires concerted efforts towards standardization. Key priorities include defining minimum dataset size recommendations for different tasks, establishing clear and consistent gold-standard diagnostic and outcome definitions (leveraging expert consensus panels), mandating rigorous independent external validation using prospectively collected cohorts from diverse settings as a prerequisite for claims of clinical utility, and ensuring transparent reporting of all methodologies and data characteristics (e.g., adhering to guidelines such as CONSORT-AI and STARD-AI) [289, 290]. Ultimately, future research must prioritize robustness and generalizability alongside technical accuracy to ensure AI models deliver meaningful benefits in real-world clinical application.
Social impact
AI remains a human creation, crafted by individuals who may possess malicious intent or be inherently fallible. Consequently, issues of bias and fairness can emerge, stemming from the biases of its creators or even from ethically benign sources such as inadequate sampling methods [291].
An additional critical consideration is the issue of interpersonal responsibility when human oversight is minimized or removed [292]. For example, if AI systems make errors, questions arise regarding who should be held liable, such as in malpractice suits, and how this might influence insurance coverage and liability scope [293]. These are long-term, unresolved questions, but it is essential that industries address them thoughtfully and ethically to ensure that accountability is not simply transferred onto consumers.
Clinical integration and validation
Currently, nearly all AI models developed for cancer diagnosis rely on clinical data collected at the time of development, which may include patient reports or sequencing results. This raises the question of whether there are AI systems capable of recommending additional diagnostic tests or treatment options, or even assisting in prescribing anticancer medications, without dependence on traditional clinical data. As multiomics technologies continue to advance, incorporating diverse data types such as methylation profiles and fragmentomics, it is conceivable that once an AI model's dataset reaches sufficient scale and diversity, it could potentially predict the likelihood of cancer development solely based on data from healthy individuals [294]. Furthermore, by comparing sequencing results from cancer patients against an extensive database, it may become possible to recommend personalized chemotherapy regimens.
Looking ahead, prevention rather than treatment may emerge as the most compelling application of AI in cancer care. Pioneering research has already enabled the scientific community to compile a comprehensive portfolio of cancer risk factors, paving the way for more effective early intervention strategies. Advances in technology have facilitated multiple methods of collecting data at the individual patient level. In addition to genetic testing and EHRs, wearable biosensors have revolutionized healthcare, especially in cancer detection and monitoring [295]. Embedded in smartwatches, patches, and clothing, these sensors continuously collect extensive physiological and biochemical data streams to enhance diagnosis and treatment [296]. Traditional methods such as biopsies and imaging are invasive and costly, limiting their use. Advances in microfluidics and surface engineering have broadened the scope of wearable biosensors, utilizing body fluids such as sweat, saliva, tears, and interstitial fluid for non-invasive tumor biomarker detection [297]. Moreover, AI paired with smartphone-based imaging for in vivo cancer detection is an emerging field that leverages advancements in 5th generation (5G) and 6G sensing technology. The high-speed data transmission capabilities of 5G networks enable the efficient processing of large data volumes, allowing AI algorithms to quickly analyze smartphone-captured images and provide cancer diagnoses. For example, the integration of advanced imaging sensors in smartphones, such as high-resolution cameras and thermal and multispectral sensors, enhances image quality and aids in the accurate identification of potential tumors [298, 299].
These diverse, real-time data streams have the potential to feed into integrated AI platforms, creating dynamic, personalized risk profiles by synthesizing information from genomics, EHRs, lifestyle factors (via sensors), physiological fluctuations (from wearables), and visual changes (from smartphone imaging). This offers real-time management guidance for modifiable risk factors, such as setting personalized activity goals based on wearable data insights. It also enables personalized early intervention recommendations, such as prompting high-risk individuals for specific screening tests based on smartphone-based imaging. Furthermore, these integrated AI systems could enable proactive remote monitoring of cancer survivors and alert clinicians to early physiological or behavioral changes that may indicate recurrence or complications long before traditional follow-up schedules would.
Clinical validation is the pivotal step that determines whether AI systems meaningfully improve cancer care beyond technical performance in development datasets. Unlike technical validation, which focuses on accuracy metrics within held-out or cross-validated data, clinical validation evaluates generalizability, calibration, safety, and impact in real patients and workflows. Robust external validation across institutions, geographies, devices, and time is essential to assess resilience to dataset shift and to quantify whether models maintain discrimination and calibration in populations that differ from the training data. Appropriate endpoints should extend from diagnostic accuracy to decision impact, downstream care processes, patient-centered outcomes, and harms, with transparent, pre-specified performance thresholds and failure analyses. For tools that learn or are periodically updated, adaptive or platform trial structures and learning health system approaches help align evaluation with model change. Equity considerations require planned subgroup analyses by demographics, tumor subtype, site, scanner or device, and socioeconomic context, coupled with bias mitigation and reporting of heterogeneous effects. Regulatory pathways (Sect. "Regulatory frameworks and compliance") emphasize intended use, benefit–risk assessment, Good Machine Learning Practice (GMLP), change control for adaptive algorithms, and post-market surveillance. After deployment, continuous monitoring for performance drift, re-calibration when needed, incident reporting, and human-in-the-loop safeguards are necessary to maintain safety and effectiveness.
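The post-deployment monitoring and re-calibration loop mentioned above can be sketched as follows: track calibration-in-the-large (observed event rate minus mean predicted risk) on recent data, and when it drifts, apply a simple logit-shift (intercept) re-calibration. The drifted risks below are simulated, and the grid-search recalibration is one deliberately simple choice among many (e.g., Platt scaling or a full model refit).

```python
import numpy as np

def calibration_in_the_large(y, p):
    """Observed event rate minus mean predicted risk; approximately 0
    for a well-calibrated model."""
    return float(y.mean() - p.mean())

def recalibrate_intercept(y, p):
    """Find the logit-intercept shift that makes the mean predicted risk
    match the observed event rate on recent monitoring data."""
    logit = np.log(p / (1 - p))
    deltas = np.linspace(-3, 3, 601)            # simple 1-D grid search
    means = np.array([(1 / (1 + np.exp(-(logit + d)))).mean() for d in deltas])
    return deltas[np.argmin(np.abs(means - y.mean()))]

rng = np.random.default_rng(3)
p = np.clip(rng.beta(2, 5, 2000), 0.01, 0.99)   # model-predicted risks
y = rng.binomial(1, np.clip(1.6 * p, 0, 0.99))  # drift: true risk is now higher
delta = recalibrate_intercept(y, p)
p_recal = 1 / (1 + np.exp(-(np.log(p / (1 - p)) + delta)))
print(calibration_in_the_large(y, p), calibration_in_the_large(y, p_recal))
```

In a deployed system the same statistic would be computed on rolling windows, with pre-specified thresholds triggering incident review and human oversight before any automatic update.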
In oncology, much of the current literature remains retrospective and single-center [24, 28, 29, 43], with fewer prospective impact evaluations [25, 35, 36] and only a limited number of randomized studies [19, 30]. Where rigorous prospective validation has been conducted, results have sometimes confirmed utility and at other times revealed diminished performance, underscoring the need for careful and context-specific evaluation. Ultimately, AI systems for cancer care should be considered ready for routine use only after they demonstrate reliable, equitable clinical benefit in well-designed studies and are supported by ongoing governance and surveillance in real-world practice.
However, it must be emphasized that, despite the rapid advancements and promising prospects of AI, it can never fully replace clinicians and will ultimately serve as a crucial tool to assist healthcare professionals in their practice (Fig. 3).
Fig. 3.
Challenges and future pathways for AI integration in oncology
Limitations
This review has several limitations that warrant acknowledgment. First, our synthesis of evidence relies predominantly on retrospective studies and proof-of-concept trials (e.g., CADe validation for colonoscopy in Sect. "Expediting cancer screening, detection and diagnosis"). While these demonstrate technical feasibility, they may overstate real-world performance due to idealized datasets and limited external validation. Prospective randomized trials assessing patient survival or cost-effectiveness remain underrepresented in our analysis, including trials of AI-driven surveillance interventions. Second, our assessment of algorithmic fairness is constrained by inconsistent reporting in primary literature. Although we highlight biases in training data (for example, TCGA’s under-representation of non-European ancestries), we do not provide a quantitative synthesis of performance disparities across demographic groups, including for prognostic models in breast and prostate cancer (Sect. "Improving cancer surveillance"). This limits our ability to assess equitable generalizability. Third, the regulatory analysis in Sect. "Clinical integration and validation" focuses on FDA/CE pathways but omits emerging frameworks in Asia (e.g., China’s National Medical Products Administration guidelines for oncology AI). This geographical gap reflects our inclusion bias toward English-language publications and may reduce relevance for global audiences. Finally, our evaluation of clinical integration barriers prioritizes technological hurdles over socioeconomic determinants. We provide limited analysis of how factors such as hospital resources and payer policies shape adoption, a critical gap in our review. These limitations highlight constraints specific to our methodology but do not undermine the challenges discussed in Sect. "Challenges and opportunities for AI in oncology".
Future studies in this field would benefit from standardized bias reporting, prospective registry data, and deliberate inclusion of global regulatory cases.
Conclusion
Fueled by the exponential increase in data, advancements in AI algorithms, and enhanced computational capabilities, AI possesses the transformative potential to revolutionize precision oncology across the entire continuum of cancer care, encompassing prevention, diagnosis, treatment, and drug development. Achieving this potential relies on coordinated efforts focused on translational research, which encompasses expanding access to extensive and diverse datasets, developing robust and interpretable AI models, and conducting rigorous validation through unbiased, prospective clinical trials. Furthermore, the development of robust regulatory frameworks is crucial to ensure the safe, equitable, and effective deployment of AI technologies. By prioritizing these initiatives and fostering interdisciplinary collaboration among key stakeholders, we can expedite the integration of AI into routine clinical practice, thereby maximizing its impact on patient outcomes and ultimately alleviating the global burden of cancer.
Looking ahead, key research priorities must be addressed to unlock the full potential of AI in oncology: (a) Development of multi-institutional, multi-modal reference datasets: Establishing large-scale, curated datasets that integrate diverse data modalities (imaging, genomics, transcriptomics, proteomics, pathology, EHRs, liquid biopsy) across multiple institutions and diverse patient populations is paramount. Initiatives such as MOSSAIC demonstrate the value of such efforts, but broader collaboration and standardized data sharing frameworks are essential to overcome biases, enhance model generalizability, and fuel discovery across rare cancer types and underrepresented groups. (b) Creation of XAI methods that clinicians find trustworthy: Moving beyond 'black box' predictions is critical for clinical adoption. Research must focus on developing intuitive and reliable XAI techniques (e.g., visual saliency maps, feature attribution, natural language explanations) that provide clinicians with actionable insights into why an AI model arrived at a specific diagnosis, risk prediction, or treatment recommendation. Building trust requires transparency that aligns with clinical reasoning pathways, enabling effective human-AI collaboration, as highlighted by the need for interpretability in models predicting treatment response or biomarker status [300, 301]. (c) Integration of socio-behavioral and real-world data into risk-prediction and care algorithms: To achieve truly personalized prevention and care, AI models must evolve to incorporate factors beyond traditional clinical and molecular data. This includes integrating data on social determinants of health, behavioral patterns (e.g., from wearables or patient-reported outcomes), environmental exposures, and longitudinal real-world evidence. 
Developing AI capable of synthesizing this complex tapestry of information, while carefully addressing privacy and equity concerns, will be crucial for predicting individual cancer risk with higher fidelity, tailoring screening strategies, optimizing treatment adherence, and understanding the impact of non-biological factors on outcomes, as envisioned in future AI-driven prevention models. Addressing these priorities, alongside ongoing advancements in algorithms, computing, and ethical frameworks, will be instrumental in translating the promise of AI into tangible improvements in cancer prevention, early detection, therapeutic efficacy, and ultimately, patient survival and quality of life.
At the time of writing, shortly after the release of GPT-5, Sam Altman, the chief executive officer (CEO) of OpenAI, the company behind ChatGPT, said in an interview that by 2035 AI will cure or at least treat many diseases that currently plague humanity [302, 303]. For example, in the era of GPT-8, we might ask it to 'cure a certain type of cancer.' It will first read all existing research and data and come up with some treatment ideas. Then it will tell us, 'I need you to find an experimenter to conduct these nine experiments and tell me the results.' After two months of cell culture, when the experimenter sends the results back to GPT-8, it may say, 'Well, there's an unexpected discovery. I need to conduct one more experiment.' Then it will tell you, 'Synthesize this molecule and test it on mice.' If it works, then conduct human trials. Finally, it will say, 'Okay, here's the process for submitting it to the FDA.' These scenarios underscore AI's transformative potential in biomedical discovery and care, but realizing them will require rigorous validation, ethical safeguards, robust oversight, and integration with clinical expertise to ensure safety, equity, and reproducibility.
Appendix: evolution of AI models and data modalities
At a workshop held at Dartmouth College in the summer of 1956, McCarthy et al. introduced the term 'artificial intelligence', also known as 'thinking machines' [304]. The first ML algorithms were developed from the 1950s to the 1970s; Frank Rosenblatt's perceptron was one of the earliest neural network models, primarily employed for binary classification tasks [305]. From the 1980s to the 2010s, ML research led to the development and application of a number of 'shallow' learning algorithms, including earlier generalized classic linear models such as logistic regression, Bayesian algorithms, decision trees, and ensemble methods [306, 307]. The early 2010s signified a pivotal turning point with the advent of DL, which exhibited superiority over traditional ML methods in applications such as image and speech recognition [308]. In computer science, ML constitutes a specialized branch of AI, while DL represents a particular subset of ML that emphasizes the development and application of deep artificial neural networks [309]. The late 2010s marked the introduction of the transformer architecture, a revolutionary DL framework distinguished by its attention mechanism, which proficiently models long-range dependencies within sequential data such as text and video [310]. Transformers have further significantly enhanced the capabilities of AI, particularly through their pivotal role in the development of LLMs. Through training on extensive datasets sourced from the internet using advanced supercomputing infrastructure, LLMs such as ChatGPT, DeepSeek, LLaMA and Grok demonstrate extraordinary capabilities in comprehending human input and producing responses that closely emulate human communication [311].
Acknowledgements
All figures were created with BioRender.
Authors’ contributions
CHC researched data for this manuscript. All authors discussed the content of the manuscript, and wrote, reviewed and/or edited the manuscript prior to submission.
Funding
This study is supported by funds from the Research Grants Council-General Research Fund (14101321; 24100520); RGC-Collaborative Research Fund (C4008-23WF, C4042-24GF); Health and Medical Research Fund (06170686; 08190706); and The National Key Research and Development Program of China (2021YFF1201300).
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Cillian H. Cheng, Email: chenghc95@gmail.com
Su-sheng Shi, Email: drshisusheng@163.com.
References
- 1.Siegel RL, et al. Cancer statistics, 2025. CA Cancer J Clin. 2025;75(1):10–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Bray F, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–63. [DOI] [PubMed] [Google Scholar]
- 3.Chang T-G, et al. Hallmarks of artificial intelligence contributions to precision oncology. Nat Cancer. 2025;6(3):417–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Yates J, Van Allen EM. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell. 2025;43(4):708–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Wang J, et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat Med. 2025;31(2):609–17. [DOI] [PubMed] [Google Scholar]
- 6.Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Topol EJ. Learning the language of life with AI. Science. 2025;387(6733):eadv4414. [DOI] [PubMed] [Google Scholar]
- 8.Gong D, et al. Spatial oncology: translating contextual biology to the clinic. Cancer Cell. 2024;42(10):1653–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Ligero M, et al. Artificial intelligence-based biomarkers for treatment decisions in oncology. Trends Cancer. 2025;11(3):232–44. [DOI] [PubMed] [Google Scholar]
- 10.Kleppe A, et al. A clinical decision support system optimising adjuvant chemotherapy for colorectal cancers by integrating deep learning and pathological staging markers: a development and validation study. Lancet Oncol. 2022;23(9):1221–32. [DOI] [PubMed] [Google Scholar]
- 11.Rosenthal JT, Beecy A, Sabuncu MR. Rethinking clinical trials for medical AI with dynamic deployments of adaptive systems. NPJ Digit Med. 2025;8(1):252. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Xu Y, et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation (Camb). 2021;2(4):100179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Dlamini Z, et al. Artificial intelligence (AI) and big data in cancer and precision oncology. Comput Struct Biotechnol J. 2020;18:2300–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Sharma A, et al. Advances in AI and machine learning for predictive medicine. J Hum Genet. 2024;69(10):487–97. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Mostavi M, et al. Convolutional neural network models for cancer type prediction based on gene expression. BMC Med Genomics. 2020;13(5):44. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Mienye ID, Swart TG, Obaido G. Recurrent neural networks: a comprehensive review of architectures, variants, and applications. Information. 2024;15(9):517. [Google Scholar]
- 17.Zhou D, et al. Diagnostic evaluation of a deep learning model for optical diagnosis of colorectal cancer. Nat Commun. 2020;11(1):2961. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Kominami Y, et al. Computer-aided diagnosis of colorectal polyp histology by using a real-time image recognition system and narrow-band imaging magnifying colonoscopy. Gastrointest Endosc. 2016;83(3):643–9. [DOI] [PubMed] [Google Scholar]
- 19.Djinbachian R, et al. Autonomous artificial intelligence vs artificial intelligence-assisted human optical diagnosis of colorectal polyps: a randomized controlled trial. Gastroenterology. 2024;167(2):392-399.e2. [DOI] [PubMed] [Google Scholar]
- 20.McKinney SM, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89–94. [DOI] [PubMed] [Google Scholar]
- 21.Lotter W, et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat Med. 2021;27(2):244–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Shen Y, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat Commun. 2021;12(1):5645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Liao J, et al. Artificial intelligence-assisted ultrasound image analysis to discriminate early breast cancer in Chinese population: a retrospective, multicentre, cohort study. EClinMed. 2023;60:102001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Sun S, et al. Deep learning prediction of axillary lymph node status using ultrasound images. Comput Biol Med. 2022;143:105250. [DOI] [PubMed] [Google Scholar]
- 25.Sandbank J, et al. Validation and real-world clinical application of an artificial intelligence algorithm for breast cancer detection in biopsies. NPJ Breast Cancer. 2022;8(1):129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Wang Y, et al. Improved breast cancer histological grading using deep learning. Ann Oncol. 2022;33(1):89–98. [DOI] [PubMed] [Google Scholar]
- 27.Steiner DF, et al. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am J Surg Pathol. 2018;42(12):1636–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Challa B, et al. Artificial intelligence-aided diagnosis of breast cancer lymph node metastasis on histologic slides in a digital workflow. Mod Pathol. 2023;36(8):100216. [DOI] [PubMed] [Google Scholar]
- 29.Hwang EJ, et al. Deep learning for detection of pulmonary metastasis on chest radiographs. Radiology. 2021;301(2):455–63. [DOI] [PubMed] [Google Scholar]
- 30.Nam JG, et al. AI improves nodule detection on chest radiographs in a health screening population: a randomized controlled trial. Radiology. 2023;307(2):e221894. [DOI] [PubMed] [Google Scholar]
- 31.Ardila D, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med. 2019;25(6):954–61. [DOI] [PubMed] [Google Scholar]
- 32.Venkadesh KV, et al. Deep learning for malignancy risk estimation of pulmonary nodules detected at low-dose screening CT. Radiology. 2021;300(2):438–47. [DOI] [PubMed] [Google Scholar]
- 33.Wang C, et al. Data-driven risk stratification and precision management of pulmonary nodules detected on chest computed tomography. Nat Med. 2024;30(11):3184–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Mazzone PJ, et al. Clinical validation of a cell-free DNA fragmentome assay for augmentation of lung cancer early detection. Cancer Discov. 2024;14(11):2224–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Liang N, et al. Ultrasensitive detection of circulating tumour DNA via deep methylation sequencing aided by machine learning. Nat Biomed Eng. 2021;5(6):586–99. [DOI] [PubMed] [Google Scholar]
- 36.Coudray N, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24(10):1559–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Mehta P, et al. AutoProstate: towards automated reporting of prostate MRI for prostate cancer assessment using deep learning. Cancers. 2021;13. 10.3390/cancers13236138. [DOI] [PMC free article] [PubMed]
- 38.Hamm CA, et al. Interactive explainable deep learning model informs prostate cancer diagnosis at MRI. Radiology. 2023;307(4):e222276. [DOI] [PubMed] [Google Scholar]
- 39.Saha A, et al. Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study. Lancet Oncol. 2024;25(7):879–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Bulten W, et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat Med. 2022;28(1):154–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Bulten W, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 2020;21(2):233–41. [DOI] [PubMed] [Google Scholar]
- 42.Cho H-H, et al. Classification of the glioma grading using radiomics analysis. PeerJ. 2018;6:e5982. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Park YW, et al. Robust performance of deep learning for automatic detection and segmentation of brain metastases using three-dimensional black-blood and three-dimensional gradient echo imaging. Eur Radiol. 2021;31(9):6686–95. [DOI] [PubMed] [Google Scholar]
- 44.Yan J, et al. Predicting 1p/19q co-deletion status from magnetic resonance imaging using deep learning in adult-type diffuse lower-grade gliomas: a discovery and validation study. Lab Invest. 2022;102(2):154–9. [DOI] [PubMed] [Google Scholar]
- 45.Zhou X, et al. Tumor fractions deciphered from circulating cell-free DNA methylation for cancer early diagnosis. Nat Commun. 2022;13(1):7694. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Misawa M, Kudo S-E. Current status of artificial intelligence use in colonoscopy. Digestion. 2025;106(2):138–45. [DOI] [PubMed] [Google Scholar]
- 47.Chitca DD, et al. Advancing colorectal cancer diagnostics from barium enema to AI-assisted colonoscopy. Diagnostics. 2025. 10.3390/diagnostics15080974. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Hewett DG. Measurement of polyp size at colonoscopy: addressing human and technology bias. Dig Endosc. 2022;34(7):1478–80. [DOI] [PubMed] [Google Scholar]
- 49.Babu B, et al. A narrative review on the role of Artificial Intelligence (AI) in colorectal cancer management. Cureus. 2025;17(2):e79570. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Chen J, et al. Ai support for colonoscopy quality control using CNN and transformer architectures. BMC Gastroenterol. 2024;24(1):257. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.FDA. Available from: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K211951. Cited 2025 Aug 15.
- 52.FDA. Available from: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K223473. Cited 2025 Aug 15.
- 53.Wang P, et al. Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study. Lancet Gastroenterol Hepatol. 2020;5(4):343–51. [DOI] [PubMed] [Google Scholar]
- 54.Repici A, et al. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology. 2020. 10.1053/j.gastro.2020.04.062. [DOI] [PubMed] [Google Scholar]
- 55.Karsenti D, et al. Effect of real-time computer-aided detection of colorectal adenoma in routine colonoscopy (COLO-GENIUS): a single-centre randomised controlled trial. Lancet Gastroenterol Hepatol. 2023;8(8):726–34. [DOI] [PubMed] [Google Scholar]
- 56.Shaukat A, et al. Computer-aided detection improves adenomas per colonoscopy for screening and surveillance colonoscopy: a randomized trial. Gastroenterology. 2022;163(3):732–41. [DOI] [PubMed] [Google Scholar]
- 57.Mangas-Sanjuan C, et al. Role of artificial intelligence in colonoscopy detection of advanced neoplasias: a randomized trial. Ann Intern Med. 2023;176(9):1145–52. [DOI] [PubMed] [Google Scholar]
- 58.Hassan C, et al. Real-time computer-aided detection of colorectal neoplasia during colonoscopy : a systematic review and meta-analysis. Ann Intern Med. 2023;176(9):1209–20. [DOI] [PubMed] [Google Scholar]
- 59.Reynolds S. Available from: https://www.cancer.gov/news-events/cancer-currents-blog/2023/colonoscopy-cad-artificial-intelligence. Cited 2025 Aug 15.
- 60.Yeasmin MN, et al. Advances of AI in image-based computer-aided diagnosis: a review. Array. 2024;23:100357.
- 61.Hassan C, et al. Artificial intelligence allows leaving-in-situ colorectal polyps. Clin Gastroenterol Hepatol. 2022. 10.1016/j.cgh.2022.04.045.
- 62.Rondonotti E, et al. Artificial intelligence-assisted optical diagnosis for the resect-and-discard strategy in clinical practice: the artificial intelligence BLI characterization (ABC) study. Endoscopy. 2023;55(1):14–22.
- 63.Barua I, et al. Real-time artificial intelligence-based optical diagnosis of neoplastic polyps during colonoscopy. NEJM Evid. 2022;1(6):EVIDoa2200003.
- 64.Li JW, et al. Real-world validation of a computer-aided diagnosis system for prediction of polyp histology in colonoscopy: a prospective multicenter study. Am J Gastroenterol. 2023;118(8):1353–64.
- 65.Rex DK, et al. The American Society for Gastrointestinal Endoscopy PIVI (preservation and incorporation of valuable endoscopic innovations) on real-time endoscopic assessment of the histology of diminutive colorectal polyps. Gastrointest Endosc. 2011;73(3):419–22.
- 66.Hassan C, et al. Computer-aided diagnosis for the resect-and-discard strategy for colorectal polyps: a systematic review and meta-analysis. Lancet Gastroenterol Hepatol. 2024;9(11):1010–9.
- 67.Graham S, et al. MILD-net: Minimal information loss dilated network for gland instance segmentation in colon histology images. Med Image Anal. 2019;52:199–211.
- 68.Zhao K, et al. Artificial intelligence quantified tumour-stroma ratio is an independent predictor for overall survival in resectable colorectal cancer. EBioMedicine. 2020;61:103054.
- 69.Giger ML, Chan HP, Boone J. Anniversary paper: history and status of CAD and quantitative image analysis: the role of medical physics and AAPM. Med Phys. 2008;35(12):5799–820.
- 70.Gao Y, et al. New frontiers: an update on computer-aided diagnosis for breast imaging in the age of artificial intelligence. AJR Am J Roentgenol. 2019;212(2):300–7.
- 71.Chen Y, et al. AI in breast cancer imaging: an update and future trends. Semin Nucl Med. 2025;55(3):358–70.
- 72.FDA. Available from: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K220105. Cited 2025 Aug 15.
- 73.FDA. Available from: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K211541. Cited 2025 Aug 15.
- 74.FDA. Available from: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K200905. Cited 2025 Aug 15.
- 75.Lång K, et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol. 2023;24(8):936–44.
- 76.Yala A, et al. Multi-institutional validation of a mammography-based breast cancer risk model. J Clin Oncol. 2022;40(16):1732–40.
- 77.Vachon CM, et al. Impact of artificial intelligence system and volumetric density on risk prediction of interval, screen-detected, and advanced breast cancer. J Clin Oncol. 2023;41(17):3172–83.
- 78.Lotter W, et al. Artificial intelligence in oncology: current landscape, challenges, and future directions. Cancer Discov. 2024;14(5):711–26.
- 79.Liu Y, et al. Applications of artificial intelligence in breast pathology. Arch Pathol Lab Med. 2023;147(9):1003–13.
- 80.Zhang-Yin J, Mauel E, Talpe S. Update on sentinel lymph node methods and pathology in breast cancer. Diagnostics (Basel). 2024. 10.3390/diagnostics14030252.
- 81.Paige. Available from: https://www.businesswire.com/news/home/20231026045607/en/U.S.-FDA-Grants-Paige-Breakthrough-Device-Designation-for-Cancer-Detection-in-Breast-Lymph-Nodes. Cited 2025 Aug 15.
- 82.Brown JR, et al. Quantitative assessment Ki-67 score for prediction of response to neoadjuvant chemotherapy in breast cancer. Lab Invest. 2014;94(1):98–106.
- 83.Casterá C, Bernet L. HER2 immunohistochemistry inter-observer reproducibility in 205 cases of invasive breast carcinoma additionally tested by ISH. Ann Diagn Pathol. 2020;45:151451.
- 84.Cai L, et al. Improving Ki67 assessment concordance by the use of an artificial intelligence-empowered microscope: a multi-institutional ring study. Histopathology. 2021;79(4):544–55.
- 85.Bodén ACS, et al. The human-in-the-loop: an evaluation of pathologists’ interaction with artificial intelligence in clinical practice. Histopathology. 2021;79(2):210–8.
- 86.Dy A, et al. AI improves accuracy, agreement and efficiency of pathologists for Ki67 assessments in breast cancer. Sci Rep. 2024;14(1):1283.
- 87.Albuquerque DAN, et al. Systematic review and meta-analysis of artificial intelligence in classifying HER2 status in breast cancer immunohistochemistry. NPJ Digit Med. 2025;8(1):144.
- 88.Hwang EJ, Goo JM, Park CM. AI applications for thoracic imaging: considerations for best practice. Radiology. 2025;314(2):e240650.
- 89.Nam JG, et al. Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology. 2019;290(1):218–28.
- 90.Hwang EJ, et al. Development and validation of a deep learning-based automated detection algorithm for major thoracic diseases on chest radiographs. JAMA Netw Open. 2019;2(3):e191095.
- 91.Aberle DR, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395–409.
- 92.RadNet. Available from: https://www.radnet.com/corp/artificial-intelligence/lung. Cited 2025 Aug 16.
- 93.DeepHealth. Available from: https://deephealth.com/population-health/deephealth-lung/. Cited 2025 Aug 16.
- 94.Qure.ai. Available from: https://www.qure.ai/news_press_coverages/qure.ai-launches-FDA-cleared-AI-solution-for-advanced-lung-nodule-quantification-on-CT-scans-at-AABIP-2024. Cited 2025 Aug 16.
- 95.Yu K-H, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7(1):12474.
- 96.Khosravi P, et al. Deep convolutional neural networks enable discrimination of heterogeneous digital pathology images. EBioMedicine. 2018;27:317–28.
- 97.Lu MY, et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng. 2021;5(6):555–70.
- 98.Wang S, et al. Convpath: a software tool for lung adenocarcinoma digital pathological image analysis aided by a convolutional neural network. EBioMedicine. 2019;50:103–10.
- 99.Hofman P, et al. Artificial intelligence for diagnosis and predictive biomarkers in non-small cell lung cancer patients: new promises but also new hurdles for the pathologist. Lung Cancer. 2025;200:108110.
- 100.Kasivisvanathan V, et al. MRI-targeted or standard biopsy for prostate-cancer diagnosis. N Engl J Med. 2018;378(19):1767–77.
- 101.Turkbey B, Haider MA. Deep learning-based artificial intelligence applications in prostate MRI: brief summary. Br J Radiol. 2022;95(1131):20210563.
- 102.Winkel DJ, et al. A novel deep learning based computer-aided diagnosis system improves the accuracy and efficiency of radiologists in reading biparametric magnetic resonance images of the prostate: results of a multireader, multicase study. Invest Radiol. 2021;56(10):605–13.
- 103.Siemens. Available from: https://www.siemens-healthineers.com/magnetic-resonance-imaging/clinical-specialities/prostate-mri. Cited 2025 Aug 16.
- 104.Quantib. Available from: https://www.quantib.com/en/solutions/quantib-prostate. Cited 2025 Aug 16.
- 105.Cortechs.ai. Available from: https://www.cortechs.ai/onq-prostate/. Cited 2025 Aug 16.
- 106.GeHealthCare. Available from: https://www.gehealthcare.com/products/magnetic-resonance-imaging/signa-works/proview-body. Cited 2025 Aug 16.
- 107.Quibim. Available from: https://quibim.com/qp-prostate/. Cited 2025 Aug 16.
- 108.Campanella G, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25(8):1301–9.
- 109.Pinckaers H, et al. Detection of prostate cancer in whole-slide images through end-to-end training with image-level labels. IEEE Trans Med Imaging. 2021;40(7):1817–26.
- 110.Chen C-M, et al. A computer-aided diagnosis system for differentiation and delineation of malignant regions on whole-slide prostate histopathology image using spatial statistics and multidimensional densenet. Med Phys. 2020;47(3):1021–33.
- 111.Raciti P, et al. Clinical validation of artificial intelligence-augmented pathology diagnosis demonstrates significant gains in diagnostic accuracy in prostate cancer detection. Arch Pathol Lab Med. 2023;147(10):1178–85.
- 112.da Silva LM, et al. Independent real-world application of a clinical-grade automated prostate cancer detection system. J Pathol. 2021;254(2):147–58.
- 113.Perincheri S, et al. An independent assessment of an artificial intelligence system for prostate cancer detection shows strong diagnostic accuracy. Mod Pathol. 2021;34(8):1588–95.
- 114.Nagpal K, et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med. 2019;2(1):48.
- 115.Khalighi S, et al. Artificial intelligence in neuro-oncology: advances and challenges in brain tumor diagnosis, prognosis, and precision treatment. NPJ Precis Oncol. 2024;8(1):80.
- 116.Musthafa N, Memon QA, Masud MM. Advancing brain tumor analysis: current trends, key challenges, and perspectives in deep learning-based brain MRI tumor diagnosis. Eng. 2025;6(5):82.
- 117.Ertosun MG, Rubin DL. Automated grading of gliomas using deep learning in digital pathology images: a modular approach with ensemble of convolutional neural networks. AMIA Annu Symp Proc. 2015;2015:1899–908.
- 118.Li Z, et al. Vision transformer-based weakly supervised histopathological image analysis of primary brain tumors. iScience. 2023;26(1):105872.
- 119.Louis DN, et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro-oncol. 2021;23(8):1231–51.
- 120.Bender K, et al. High-grade astrocytoma with piloid features (HGAP): the Charité experience with a new central nervous system tumor entity. J Neuro-Oncol. 2021;153(1):109–20.
- 121.Vermeulen C, et al. Ultra-fast deep-learned CNS tumour classification during surgery. Nature. 2023;622(7984):842–9.
- 122.Hoang D-T, et al. Prediction of DNA methylation-based tumor types from histopathology in central nervous system tumors with deep learning. Nat Med. 2024;30(7):1952–61.
- 123.Kim HS, et al. Single-incision robotic colorectal surgery with the da Vinci SP® surgical system: initial results of 50 cases. Tech Coloproctol. 2023;27(7):589–99.
- 124.Picciariello A, et al. Evaluation of the da Vinci single-port system in colorectal cancer surgery: a scoping review. Updates Surg. 2024;76(7):2515–20.
- 125.Di Costanzo G, et al. Artificial intelligence and radiomics in magnetic resonance imaging of rectal cancer: a review. Explor Target Antitumor Ther. 2023;4(3):406–21.
- 126.Kather JN, et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 2019;16(1):e1002730.
- 127.Reichling C, et al. Artificial intelligence-guided tissue analysis combined with immune infiltrate assessment predicts stage III colon cancer outcomes in PETACC08 study. Gut. 2020;69(4):681–90.
- 128.Wu Z, et al. Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens. Nat Biomed Eng. 2022;6(12):1435–48.
- 129.Foersch S, et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat Med. 2023;29(2):430–9.
- 130.Jiang X, et al. An MRI deep learning model predicts outcome in rectal cancer. Radiology. 2023;307(5):e222223.
- 131.Arole V, et al. Clinical validation of Histotype Px colorectal in patients in a U.S. colon cancer cohort. J Clin Oncol. 2024;42(16_suppl):3622.
- 132.DoMore. Available from: https://www.domorediagnostics.com/news/do-more-diagnostics-achieves-ce-mark-43lh6. Cited 2025 Aug 16.
- 133.L’Imperio V, et al. Pathologist validation of a machine learning-derived feature for colon cancer risk stratification. JAMA Netw Open. 2023;6(3):e2254891.
- 134.Skrede O-J, et al. Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet. 2020;395(10221):350–60.
- 135.Echle A, et al. Clinical-grade detection of microsatellite instability in colorectal tumors by deep learning. Gastroenterology. 2020. 10.1053/j.gastro.2020.06.021.
- 136.Kather JN, et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med. 2019;25(7):1054–6.
- 137.Owkin. Available from: https://www.owkin.com/diagnostics/msintuit-crc. Cited 2025 Aug 16.
- 138.Pfob A, et al. Towards patient-centered decision-making in breast cancer surgery: machine learning to predict individual patient-reported outcomes at 1-year follow-up. Ann Surg. 2023;277(1):e144–52.
- 139.Kothari R, et al. Raman spectroscopy and artificial intelligence to predict the Bayesian probability of breast cancer. Sci Rep. 2021;11(1):6482.
- 140.Park S, et al. A deep learning model of tumor cell architecture elucidates response and resistance to CDK4/6 inhibitors. Nat Cancer. 2024;5(7):996–1009.
- 141.Sammut S-J, et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature. 2022;601(7894):623–9.
- 142.Ogier du Terrail J, et al. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat Med. 2023;29(1):135–46.
- 143.Amgad M, et al. A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer. Nat Med. 2024;30(1):85–97.
- 144.Saltz J, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 2018;23(1):181–93.
- 145.Binder A, et al. Morphological and molecular breast cancer profiling through explainable machine learning. Nat Mach Intell. 2021;3(4):355–66.
- 146.Shamai G, et al. Deep learning-based image analysis predicts PD-L1 status from H&E-stained histopathology images in breast cancer. Nat Commun. 2022;13(1):6753.
- 147.Shamai G, et al. Artificial intelligence algorithms to assess hormonal status from tissue microarrays in patients with breast cancer. JAMA Netw Open. 2019;2(7):e197700.
- 148.Assaf ZJF, et al. A longitudinal circulating tumor DNA-based model associated with survival in metastatic non-small-cell lung cancer. Nat Med. 2023;29(4):859–68.
- 149.Widman AJ, et al. Ultrasensitive plasma-based monitoring of tumor burden using machine-learning-guided signal enrichment. Nat Med. 2024;30(6):1655–66.
- 150.Heeke S, et al. Tumor- and circulating-free DNA methylation identifies clinically relevant small cell lung cancer subtypes. Cancer Cell. 2024;42(2):225–237.e5.
- 151.Park S, et al. Artificial intelligence-powered spatial analysis of tumor-infiltrating lymphocytes as complementary biomarker for immune checkpoint inhibition in non-small-cell lung cancer. J Clin Oncol. 2022;40(17):1916–28.
- 152.Lunit. Available from: https://www.lunit.io/en/company/news/lunit-ai-solution-for-pd-l1-expression-analysis-receives-ce-mark. Cited 2025 Aug 17.
- 153.Vanguri RS, et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat Cancer. 2022;3(10):1151–64.
- 154.Wang S, et al. Mining whole-lung information by artificial intelligence for predicting EGFR genotype and targeted therapy response in lung cancer: a multicohort study. Lancet Digit Health. 2022;4(5):e309–19.
- 155.Rakaee M, et al. Machine learning-based immune phenotypes correlate with STK11/KEAP1 co-mutations and prognosis in resectable NSCLC: a sub-study of the TNM-I trial. Ann Oncol. 2023;34(7):578–88.
- 156.Ricciuti B, et al. Genomic and immunophenotypic landscape of acquired resistance to PD-(L)1 blockade in non–small-cell lung cancer. J Clin Oncol. 2024;42(11):1311–21.
- 157.Khanna R, et al. Artificial intelligence in the management of prostate cancer. Nat Rev Urol. 2025;22(3):125–6.
- 158.Hung AJ, et al. Utilizing machine learning and automated performance metrics to evaluate robot-assisted radical prostatectomy performance and predict outcomes. J Endourol. 2018;32(5):438–44.
- 159.Checcucci E, et al. Three-dimensional automatic artificial intelligence driven augmented-reality selective biopsy during nerve-sparing robot-assisted radical prostatectomy: a feasibility and accuracy study. Asian J Urol. 2023;10(4):407–15.
- 160.McIntosh C, et al. Clinical integration of machine learning for curative-intent radiation treatment of patients with prostate cancer. Nat Med. 2021;27(6):999–1005.
- 161.Nouranian S, et al. Learning-based multi-label segmentation of transrectal ultrasound images for prostate brachytherapy. IEEE Trans Med Imaging. 2016;35(3):921–32.
- 162.Daskivich TJ, et al. Limitations of the National Comprehensive Cancer Network® (NCCN®) guidelines for prediction of limited life expectancy in men with prostate cancer. J Urol. 2017;197(2):356–62.
- 163.Esteva A, et al. Prostate cancer therapy personalization via multi-modal deep learning on randomized phase III clinical trials. NPJ Digit Med. 2022;5(1):71.
- 164.Parker CTA, et al. External validation of a digital pathology-based multimodal artificial intelligence-derived prognostic model in patients with advanced prostate cancer starting long-term androgen deprivation therapy: a post-hoc ancillary biomarker study of four phase 3 randomised controlled trials of the STAMPEDE platform protocol. Lancet Digit Health. 2025. 10.1016/j.landig.2025.100885.
- 165.Elmarakeby HA, et al. Biologically informed deep neural network for prostate cancer discovery. Nature. 2021;598(7880):348–52.
- 166.Kartasalo K, et al. Detection of perineural invasion in prostate needle biopsies with deep neural networks. Virchows Arch. 2022;481(1):73–82.
- 167.Spratt DE, et al. Artificial intelligence predictive model for hormone therapy use in prostate cancer. NEJM Evid. 2023;2(8):EVIDoa2300023.
- 168.Le NQK, et al. XGBoost improves classification of MGMT promoter methylation status in IDH1 wildtype glioblastoma. J Pers Med. 2020;10(3):128.
- 169.Do DT, et al. Improving MGMT methylation status prediction of glioblastoma through optimizing radiomics features using genetic algorithm-based machine learning approach. Sci Rep. 2022;12(1):13412.
- 170.Kawahara D, et al. Predicting the local response of metastatic brain tumor to Gamma Knife radiosurgery by radiomics with a machine learning method. Front Oncol. 2020;10:569461.
- 171.Peng L, et al. Distinguishing true progression from radionecrosis after stereotactic radiation therapy for brain metastases with machine learning and radiomics. Int J Radiat Oncol Biol Phys. 2018;102(4):1236–43.
- 172.Zhang B, et al. Machine-learning based MRI radiomics models for early detection of radiation-induced brain injury in nasopharyngeal carcinoma. BMC Cancer. 2020;20(1):502.
- 173.Kocher M, et al. Applications of radiomics and machine learning for radiotherapy of malignant brain tumors. Strahlenther Onkol. 2020;196(10):856–67.
- 174.Macyszyn L, et al. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques. Neuro-oncol. 2016;18(3):417–25.
- 175.Zheng Y, et al. Spatial cellular architecture predicts prognosis in glioblastoma. Nat Commun. 2023;14(1):4122.
- 176.Hsu E, et al. Machine learning and deep learning tools for the automated capture of cancer surveillance data. J Natl Cancer Inst Monogr. 2024;2024(65):145–51.
- 177.Alawad M, et al. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks. J Am Med Inform Assoc. 2020;27(1):89–98.
- 178.Chandrashekar M, et al. Path-BigBird: an AI-driven transformer approach to classification of cancer pathology reports. JCO Clin Cancer Inform. 2024;8:e2300148.
- 179.Placido D, et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat Med. 2023;29(5):1113–22.
- 180.Guevara M, et al. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med. 2024;7(1):6.
- 181.Pan J, et al. Integrating large language models with human expertise for disease detection in electronic health records. Comput Biol Med. 2025;191:110161.
- 182.Abramson J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500.
- 183.Wang T, et al. Ab initio characterization of protein molecular dynamics with AI2BMD. Nature. 2024;635(8040):1019–27.
- 184.Ren F, et al. Alphafold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor. Chem Sci. 2023;14(6):1443–52.
- 185.Zhavoronkov A, et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol. 2019;37(9):1038–40.
- 186.Vijayan RSK, et al. Enhancing preclinical drug discovery with artificial intelligence. Drug Discov Today. 2022;27(4):967–84.
- 187.Tran NL, et al. Artificial intelligence-driven new drug discovery targeting serine/threonine kinase 33 for cancer treatment. Cancer Cell Int. 2023;23(1):321.
- 188.Abdel-Rehim A, et al. Scientific hypothesis generation by large language models: laboratory validation in breast cancer treatment. J R Soc Interface. 2025;22(227):20240674.
- 189.De Vries M, et al. Geometric deep learning and multiple-instance learning for 3d cell-shape profiling. Cell Syst. 2025;16(3):101229.
- 190.Zhao X, et al. Cancer mutations converge on a collection of protein assemblies to predict resistance to replication stress. Cancer Discov. 2024;14(3):508–23.
- 191.Chaix B, et al. When chatbots meet patients: one-year prospective study of conversations between patients with breast cancer and a chatbot. JMIR Cancer. 2019;5(1):e12856.
- 192.Tawfik E, Ghallab E, Moustafa A. A nurse versus a chatbot: the effect of an empowerment program on chemotherapy-related side effects and the self-care behaviors of women living with breast cancer: a randomized controlled trial. BMC Nurs. 2023;22(1):102.
- 193.Park S. AI chatbots and linguistic injustice. J Univ Lang. 2024;25(1):99–119.
- 194.Kataoka Y, et al. Development and early feasibility of chatbots for educating patients with lung cancer and their caregivers in Japan: mixed methods study. JMIR Cancer. 2021;7(1):e26911.
- 195.Wu X, et al. reguloGPT: harnessing GPT for knowledge graph construction of molecular regulatory pathways. bioRxiv [Preprint]. 2024.
- 196.Ingólfsson HI, et al. Machine learning-driven multiscale modeling: bridging the scales with a next-generation simulation infrastructure. J Chem Theory Comput. 2023;19(9):2658–75.
- 197.Lee M, Ahmad SF, Xu J. Regulation and function of transposable elements in cancer genomes. Cell Mol Life Sci. 2024;81(1):157.
- 198.Riehl K, et al. Transposonultimate: software for transposon classification, annotation and detection. Nucleic Acids Res. 2022;50(11):e64.
- 199.Wang R, et al. DeepBIO: an automated and interpretable deep-learning platform for high-throughput biological sequence prediction, functional annotation and visualization analysis. Nucleic Acids Res. 2023;51(7):3017–29.
- 200.Williamson SM, Prybutok V. Balancing privacy and progress: a review of privacy challenges, systemic oversight, and patient perceptions in AI-driven healthcare. Appl Sci. 2024;14(2):675.
- 201.Wang C, et al. Privacy protection in using artificial intelligence for healthcare: Chinese regulation in comparative perspective. Healthcare. 2022. 10.3390/healthcare10101878.
- 202.Khalid N, et al. Privacy-preserving artificial intelligence in healthcare: techniques and applications. Comput Biol Med. 2023;158:106848.
- 203.Pool J, et al. A systematic analysis of failures in protecting personal health data: a scoping review. Int J Inf Manage. 2024;74:102719.
- 204.Murdoch B. Privacy and artificial intelligence: challenges for protecting health information in a new era. BMC Med Ethics. 2021;22(1):122.
- 205.Panagopoulos A, et al. Incentivizing the sharing of healthcare data in the AI era. Comput Law Secur Rev. 2022;45:105670.
- 206.Li M, et al. From challenges and pitfalls to recommendations and opportunities: implementing federated learning in healthcare. Med Image Anal. 2025;101:103497.
- 207.Rieke N, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3(1):119.
- 208.Elbachir YM, et al. Federated learning for multi-institutional 3D brain tumor segmentation. In: 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE; 2024. p. 1–8.
- 209.Mohri M, Sivek G, Suresh AT. Agnostic federated learning. In: International Conference on Machine Learning. PMLR; 2019. p. 4615–25.
- 210.Li T, et al. Fair resource allocation in federated learning. arXiv [Preprint]. 2019. arXiv:1905.10497.
- 211.Xu J, et al. Federated learning for healthcare informatics. J Healthc Inform Res. 2021;5(1):1–19. 10.1007/s41666-020-00082-4.
- 212.Almanifi ORA, et al. Communication and computation efficiency in federated learning: a survey. Internet of Things. 2023;22:100742.
- 213.Singh JP, et al. Privacy-aware hierarchical federated learning in healthcare: integrating differential privacy and secure multi-party computation. Future Internet. 2025;17(8):345.
- 214.Shukla S, et al. Federated learning with differential privacy for breast cancer diagnosis enabling secure data sharing and model integrity. Sci Rep. 2025;15(1):13061.
- 215.Nisevic M, Milojevic D, Spajic D. Synthetic data in medicine: legal and ethical considerations for patient profiling. Comput Struct Biotechnol J. 2025;28:190–8.
- 216.Zhou Z, et al. Privacy enhancing and generalizable deep learning with synthetic data for mediastinal neoplasm diagnosis. NPJ Digit Med. 2024;7(1):293.
- 217.Walonoski J, et al. Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018;25(3):230–8.
- 218.Li J, et al. A comprehensive survey on source-free domain adaptation. IEEE Trans Pattern Anal Mach Intell. 2024;46(8):5743–62.
- 219.Peng D, et al. Unsupervised domain adaptation via domain-adaptive diffusion. IEEE Trans Image Process. 2024. 10.1109/TIP.2024.3424985.
- 220.Wang H, et al. Dual-reference source-free active domain adaptation for nasopharyngeal carcinoma tumor segmentation across multiple hospitals. IEEE Trans Med Imaging. 2024;43(12):4078–90.
- 221.Guichemerre A, et al. Source-free domain adaptation of weakly-supervised object localization models for histology. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. arXiv:2404.19113.
- 222.Agbo CC, Mahmoud QH, Eklund JM. Blockchain technology in healthcare: a systematic review. Healthcare (Basel). 2019. 10.3390/healthcare7020056.
- 223.Pokharel BP, et al. BlockHealthSecure: integrating blockchain and cybersecurity in post-pandemic healthcare systems. Information. 2025;16(2):133.
- 224.Munjal K, Bhatia R. A systematic review of homomorphic encryption and its contributions in healthcare industry. Complex Intell Syst. 2022;9(4):1–28. [DOI] [PMC free article] [PubMed]
- 225.Lawlor RT. The impact of GDPR on data sharing for European cancer research. Lancet Oncol. 2023;24(1):6–8. [DOI] [PubMed] [Google Scholar]
- 226.Harvey HB, Gowda V. How the FDA regulates AI. Acad Radiol. 2020;27(1):58–61. [DOI] [PubMed] [Google Scholar]
- 227.FDA. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-software-medical-device. Cited 2025 Aug 12.
- 228.Venkatesh KP, Kadakia KT, Gilbert S. Learnings from the first AI-enabled skin cancer device for primary care authorized by FDA. NPJ Digit Med. 2024;7(1):156. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 229.Fehrmann RSN, van Kruchten M, de Vries EGE. How to critically appraise and direct the trajectory of AI development and application in oncology. ESMO Real World Data and Digital Oncology. 2024;5:100066. [Google Scholar]
- 230.Hovda T, et al. Retrospective evaluation of a CE-marked AI system, including 1,017,208 mammography screening examinations. Eur Radiol. 2025. 10.1007/s00330-025-11521-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 231.Aiforia. Available from: https://www.aiforia.com/press-releases/ivdr-certification-and-new-ceivd-marked-products. Cited 2025 Aug 12.
- 232.Aiosyn. Available from: https://ioplus.nl/en/posts/ai-cancer-detection-software-aiosyn-gets-european-certification. Cited 2025 Aug 12.
- 233.Lvovs D, et al. Balancing ethical data sharing and open science for reproducible research in biomedical data science. Cell Rep Med. 2025;6(4):102080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 234.Cheng C, et al. A general primer for data harmonization. Sci Data. 2024;11(1):152. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 235.Bhinder B, et al. Artificial intelligence in cancer research and precision medicine. Cancer Discov. 2021;11(4):900–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 236.Zavala VA, et al. Cancer health disparities in racial/ethnic minorities in the United States. Br J Cancer. 2021;124(2):315–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 237.Bouguettaya A, Stuart EM, Aboujaoude E. Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. npj Digit Med. 2025;8(1):332. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 238.Yuan J, et al. Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell. 2018. 10.1016/j.ccell.2018.08.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 239.Spratt DE, et al. Racial/ethnic disparities in genomic sequencing. JAMA Oncol. 2016;2(8):1070–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 240.Ju D, et al. Importance of including non-European populations in large human genetic studies to enhance precision medicine. Annu Rev Biomed Data Sci. 2022;5:321–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 241.Sollis E, et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2023;51(D1):D977–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 242.Martin AR, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 243.Dutil J, et al. An interactive resource to probe genetic diversity and estimated ancestry in cancer cell lines. Cancer Res. 2019;79(7):1263–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 244.Wonkam A, Adeyemo A. Leveraging our common African origins to understand human evolution and health. Cell Genom. 2023;3(3):100278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 245.Duda P, Jan Z. Human population history revealed by a supertree approach. Sci Rep. 2016;6(1):29890. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 246.Smith LA, et al. Equitable machine learning counteracts ancestral bias in precision medicine. Nat Commun. 2025;16(1):2144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 247.Koçak B, et al. Bias in artificial intelligence for medical imaging: fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects. Diagn Intervent Radiol (Ankara, Turkey). 2025;31(2):75–88. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 248.Norori N, et al. Addressing bias in big data and AI for health care: a call for open science. Patterns. 2021;2(10):100347. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 249.Pinaya WHL, et al. Brain imaging generation with latent diffusion models. In: Deep Generative Models: Second MICCAI Workshop, DGM4MICCAI 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings. Singapore: Springer-Verlag; 2022. pp. 117–26. [Google Scholar]
- 250.McCradden MD, et al. Ethical limitations of algorithmic fairness solutions in health care machine learning. The Lancet Digital Health. 2020;2(5):e221–3. [DOI] [PubMed] [Google Scholar]
- 251.Seyyed-Kalantari L, et al. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat Med. 2021;27(12):2176–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 252.Chen RJ, et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng. 2023;7(6):719–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 253.Buolamwini J, Gebru T. Gender shades: intersectional accuracy disparities in commercial gender classification. In: Friedler SA, Wilson C, editors. Proceedings of the 1st Conference on Fairness, Accountability and Transparency. PMLR: Proceedings of Machine Learning Research; 2018. pp. 77–91.
- 254.Diao JA, et al. Clinical implications of removing race from estimates of kidney function. JAMA. 2021;325(2):184–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 255.Kamiran F, Calders T. Data preprocessing techniques for classification without discrimination. Knowl Inf Syst. 2012;33(1):1–33. [Google Scholar]
- 256.Krasanakis E, et al. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In: Proceedings of the 2018 World Wide Web Conference. Lyon, France: International World Wide Web Conferences Steering Committee; 2018. pp. 853–62.
- 257.Jiang H, Nachum O. Identifying and correcting label bias in machine learning. In: Chiappa S, Calandra R, editors. Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics. PMLR: Proceedings of Machine Learning Research; 2020. pp. 702–12.
- 258.Kamishima T, et al. Fairness-aware classifier with prejudice remover regularizer. In: Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2012). Berlin, Heidelberg: Springer; 2012.
- 259.Zafar MB, et al. Fairness constraints: mechanisms for fair classification. In: Singh A, Zhu J, editors. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. PMLR: Proceedings of Machine Learning Research; 2017. pp. 962–70.
- 260.Goel N, Yaghini M, Faltings B. Non-discriminatory machine learning through convex fairness criteria. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. New Orleans, LA, USA: Association for Computing Machinery; 2018. p. 116.
- 261.Corbett-Davies S, et al. Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017. arXiv:1701.08230.
- 262.Hardt M, Price E, Srebro N. Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems. 2016. arXiv:1610.02413.
- 263.Corbett-Davies S, et al. The measure and mismeasure of fairness. J Mach Learn Res. 2023;24(312):1–117.
- 264.Kleinberg J, Mullainathan S, Raghavan M. Inherent trade-offs in the fair determination of risk scores. arXiv. 2016, arXiv:1609.05807.
- 265.Pleiss G, et al. On fairness and calibration. arXiv. 2017, arXiv:1709.02012.
- 266.Chouldechova A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data. 2017;5(2):153–63. [DOI] [PubMed] [Google Scholar]
- 267.Pfohl SR, Foryciarz A, Shah NH. An empirical characterization of fair machine learning for clinical risk prediction. J Biomed Inform. 2021;113:103621. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 268.Zhao H, Gordon GJ. Inherent tradeoffs in learning fair representations. arXiv. 2022, arXiv:1906.08386.
- 269.Giguere S, et al. Fairness guarantees under demographic shift. In: Proceedings of the 10th International Conference on Learning Representations (ICLR). 2022.
- 270.Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics (Oxford, England). 2020;21(2):345–52. [DOI] [PubMed] [Google Scholar]
- 271.Guo LL, et al. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci Rep. 2022;12(1):2726. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 272.Castro DC, Walker I, Glocker B. Causality matters in medical imaging. Nat Commun. 2020;11(1):3673. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 273.Hashimoto T, et al. Fairness without demographics in repeated loss minimization. In: Proceedings of the 35th International Conference on Machine Learning. PMLR: Proceedings of Machine Learning Research; 2018. pp. 1929–38.
- 274.Wang S, et al. Robust optimization for fairness with noisy protected groups. arXiv. 2020, arXiv:2002.09343.
- 275.Duchi JC, Namkoong HJ. Learning models with uniform performance via distributionally robust optimization. Ann Stat. 2021;49(3):1378–406. [Google Scholar]
- 276.Heslin KC, et al. Trends in opioid-related inpatient stays shifted after the US transitioned to ICD-10-CM diagnosis coding in 2015. Med Care. 2017;55(11):918–23. [DOI] [PubMed] [Google Scholar]
- 277.Guo LL, et al. Systematic review of approaches to preserve machine learning performance in the presence of temporal dataset shift in clinical medicine. Appl Clin Inform. 2021;12(4):808–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 278.Bifet A, Gavalda R. Learning from time-changing data with adaptive windowing. In: Proceedings of the 7th SIAM International Conference on Data Mining (SDM). 2007.
- 279.Hao M, et al. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans Ind Inform. 2020;16(10):6532–42. [Google Scholar]
- 280.Yang Q, et al. Federated machine learning: concept and applications. ACM Trans Intell Syst Technol. 2019;10(2):Article 12.
- 281.Bonawitz K, et al. Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Dallas, Texas, USA: Association for Computing Machinery; 2017. pp. 1175–91.
- 282.Rieke N, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3:119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 283.Wang Y, et al. Intelligent fault diagnosis with deep adversarial domain adaptation. IEEE Trans Instrum Meas. 2020;70:1–9.
- 284.Bercea CI, et al. Feddis: Disentangled federated learning for unsupervised brain pathology segmentation. arXiv. 2021, arXiv:2103.03705.
- 285.Wexler J, et al. Probing ML models for fairness with the What-If Tool and SHAP: hands-on tutorial. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. Barcelona, Spain: Association for Computing Machinery; 2020. p. 705.
- 286.Meng C, et al. Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Sci Rep. 2022;12(1):7166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 287.Jacovi A, et al. Formalizing trust in artificial intelligence: prerequisites, causes and goals of human trust in AI. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021. pp. 624–35.
- 288.Floridi L. Establishing the rules for building trustworthy AI. Nat Mach Intell. 2019;1(6):261–2. [Google Scholar]
- 289.Liu X, et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med. 2020;26(9):1364–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 290.Sounderajah V, et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open. 2021;11(6):e047709. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 291.Murikah W, Nthenge JK, Musyoka FM. Bias and ethics of AI systems applied in auditing - a systematic review. Sci Afr. 2024;25:e02281. [Google Scholar]
- 292.Durán JM, Pozzi G. Trust and trustworthiness in AI. Philos Technol. 2025;38(1):16. [Google Scholar]
- 293.Buiten MC. Product liability for defective AI. Eur J Law Econ. 2024;57(1):239–73. [Google Scholar]
- 294.Zhang C, et al. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment. J Hematol Oncol. 2023;16(1):114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 295.Smith AA, Li R, Tse ZTH. Reshaping healthcare with wearable biosensors. Sci Rep. 2023;13(1):4998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 296.Vo D-K, Trinh KTL. Advances in wearable biosensors for healthcare: current trends, applications, and future perspectives. Biosensors. 2024;14(11):560. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 297.Kashaninejad N, et al. Wearable biosensors for cancer detection and monitoring. In: Mahato K, Pandya A, editors. Progress in Molecular Biology and Translational Science. Academic Press; 2025. pp. 311–54. [DOI] [PubMed]
- 298.Song B, Liang R. Integrating artificial intelligence with smartphone-based imaging for cancer detection in vivo. Biosens Bioelectron. 2025;271:116982. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 299.Hunt B, Ruiz AJ, Pogue BW. Smartphone-based imaging systems for medical applications: a critical review. J Biomed Opt. 2021;26(4):040902. [DOI] [PMC free article] [PubMed]
- 300.Markus AF, Kors JA, Rijnbeek PR. The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J Biomed Inform. 2021;113:103655. [DOI] [PubMed] [Google Scholar]
- 301.Hassija V, et al. Interpreting black-box models: a review on explainable artificial intelligence. Cogn Comput. 2024;16(1):45–74. [Google Scholar]
- 302.Altman S. Available from: https://www.youtube.com/watch?v=hmtuvNfytjM. Cited 2025 Aug 17.
- 303.36kr. Available from: https://eu.36kr.com/en/p/3418599926402693. Cited 2025 Aug 17.
- 304.McCarthy J, et al. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI Mag. 2006;27(4):12. [Google Scholar]
- 305.Fradkov AL. Early history of machine learning. IFAC-PapersOnLine. 2020;53(2):1385–90. [Google Scholar]
- 306.Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97. [Google Scholar]
- 307.Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106. [Google Scholar]
- 308.Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90. [Google Scholar]
- 309.Sarker IH. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci. 2021;2(6):420. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 310.Vaswani A, et al. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, California, USA: Curran Associates Inc.; 2017. pp. 6000–6010.
- 311.Sapkota R, Raza S, Karkee M. Comprehensive analysis of transparency and accessibility of ChatGPT, DeepSeek, and other SOTA large language models. arXiv. 2025, arXiv:2502.18505.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
No datasets were generated or analysed during the current study.