Skip to main content
Advanced Science logoLink to Advanced Science
. 2025 May 29;12(31):e02833. doi: 10.1002/advs.202502833

An Explainable Multimodal Artificial Intelligence Model Integrating Histopathological Microenvironment and EHR Phenotypes for Germline Genetic Testing in Breast Cancer

Zijian Yang 1, Changyuan Guo 2, Jiayi Li 3,4,5, Yalun Li 6, Lei Zhong 7, Pengming Pu 4,5, Tongxuan Shang 4,5, Lin Cong 4,5, Yongxin Zhou 4,5, Guangdong Qiao 6, Ziqi Jia 4, Hengyi Xu 8, Heng Cao 4, Yansong Huang 4,5, Tianyi Liu 9, Jian Liang 10, Jiang Wu 4, Dongxu Ma 4, Yuchen Liu 4, Ruijie Zhou 4, Xiang Wang 4, Jianming Ying 2,, Meng Zhou 1,, Jiaqi Liu 4,8,
PMCID: PMC12376594  PMID: 40439693

Abstract

Genetic testing for pathogenic germline variants is critical for the personalized management of high‐risk breast cancers, guiding targeted therapies and cascade testing for at‐risk families. In this study, MAIGGT (Multimodal Artificial Intelligence Germline Genetic Testing) is proposed, a deep learning framework that integrates histopathological microenvironment features from whole‐slide images with clinical phenotypes from electronic health records for precise prescreening of germline BRCA1/2 mutations. Leveraging a multi‐scale Transformer‐based deep generative architecture, MAIGGT employs a cross‐modal latent representation unification mechanism to capture complementary biological insights from multimodal data. MAIGGT is rigorously validated across three independent cohorts and demonstrated robust performance with areas under receiver operating characteristic curves of 0.925 (95% CI 0.868 – 0.982), 0.845 (95% CI 0.779 – 0.911), and 0.833 (0.788 – 0.878), outperforming single‐modality models. Mechanistic interpretability analyses revealed that BRCA1/2‐mutated associated tumors may exhibit distinct microenvironment patterns, including increased inflammatory cell infiltration, stromal proliferation and necrosis, and nuclear heterogeneity. By bridging digital pathology with clinical phenotypes, MAIGGT establishes a new paradigm for cost‐effective, scalable, and biologically interpretable prescreening of hereditary breast cancer, with the potential to significantly improve the accessibility of genetic testing in routine clinical practice.

Keywords: BRCA1/2, breast cancer, deep learning, electronic health records, tumor microenvironment, whole‐slide image


This study presents an explainable multimodal AI model, MAIGGT (Multimodal Artificial Intelligence Germline Genetic Testing), that combines whole‐slide histopathology with clinical data for accurate germline BRCA1/2 mutation prescreening. By integrating digital pathology and EHR phenotypes, MAIGCT enables cost‐effective, scalable hereditary breast cancer risk assessment, improving genetic testing accessibility in clinical practice.

graphic file with name ADVS-12-e02833-g004.jpg

1. Introduction

Breast cancer is one of the most prevalent malignancies in women worldwide, with 5–10% of all cases attributable to germline pathogenic variants in cancer susceptibility genes.[ 1 ] In particular, pathogenic variations in the highly penetrant genes BRCA1 (17q21) [OMIM ID: 113705] or BRCA2 (13q12–13) [OMIM ID: 600185] account for ≈15– 20% of familial breast cancer cases.[ 1 , 2 ] These germline mutations are associated with highly aggressive phenotypes, including an increased risk of ovarian and contralateral breast cancer and higher tumor grades.[ 3 ] Moreover, BRCA1/2‐related breast cancers are particularly responsive to platinum and poly (ADP‐ribose) polymerase (PARP) inhibitors.[ 4 ] Thus, identifying germline BRCA1/2 mutations is critical for precise treatment, genetic counselling,[ 5 ] and the optimization of health care costs.[ 6 ] Since the first recommendation for germline BRCA1/2 mutation testing in clinical guidelines in 2011,[ 1 ] subsequent guidelines have broadened testing criteria to include patients regardless of age at diagnosis or family history.[ 7 ] Clinical trials have further highlighted the importance of more effective screening methods for BRCA1/2 mutations across different breast cancer phenotypes.[ 8 ] Despite being one of the most commonly used criteria sets for prioritizing BRCA1/2 genetic testing, current National Comprehensive Cancer Network criteria fail to identify nearly half of breast cancer patients with a clinically actionable germline BRCA1/2 mutation.[ 9 ]

Previous studies have attempted to predict BRCA1/2 mutation carriers based on clinical information, such as age, gender and family history of cancer, but achieved limited predictive performance.[ 10 ] Recent efforts to incorporate detailed pathological features into phenotype‐driven prediction models have demonstrated higher accuracy in Asian populations.[ 3 , 10 ] This improved performance has shifted the focus of screening methods to pathological image‐based models, particularly given the well‐established genotype‐phenotype correlations in breast cancer. Tumors with specific genetic mutations often exhibit distinct pathological characteristics and corresponding image features.[ 11 ] Whole‐slide images (WSIs) of breast cancer tissue provide a high‐resolution, detailed representation of the cell composition within the tumor microenvironment (TME), allowing for the quantitative characterization of cell‐level intratumoral heterogeneity (ITH).[ 12 ] Recent advances in artificial intelligence (AI), particularly deep learning algorithms, have introduced innovative methods for analyzing pathological images.[ 13 ] This technological advancement is further exemplified by Bilal et al.’s CAIMAN system, which demonstrated that AI‐based prescreening can achieve 99% sensitivity in identifying normal colorectal biopsies, significantly reducing pathologists workload while maintaining diagnostic accuracy for abnormal cases.[ 14 ] These developments collectively highlight the show promise for predicting prognosis and therapeutic response related to breast pathology.[ 15 ]

However, the clinical application of image‐based AI for genotypic prediction remains underexplored. First, given that somatic mutations are more strongly associated than germline mutations to the tumor in genotype‐phenotype correlations, most image‐based AI models have been designed to predict somatic mutations.[ 16 ] Second, previous models attempting to predict germline mutations from pathology images were trained on limited sample sizes without external validation and achieved inferior performance.[ 17 ] Third, the biological significance of WSI features in image‐based AI models and their complementarity to clinical characteristics is largely unknown. On the other hand, current predictions of BRCA1/2 mutation carriers are often based on single‐modal data, while the exploration of multi‐modal feature representation, such as aligning pathological images with phenotypic data to optimize clinical decision‐making and meet real‐world clinical applicability, remains insufficient.

Here, we developed MAIGGT (Multimodal Artificial Intelligence for Germline Genetic Testing), a novel deep learning framework that synergistically integrates histopathological features from WSIs with clinical phenotypes from electronic health records (EHRs) to accurately predict pathogenic BRCA1/2 mutations. Through rigorous validation across multiple independent cohorts, we demonstrate that MAIGGT not only achieves state‐of‐the‐art predictive performance but also provides novel insights into the spatial biology of BRCA1/2‐associated tumors, offering a transformative approach to precision cancer risk assessment.

2. Results

2.1. Study Overview and Patient Characteristics

The MAIGGT framework initially incorporates a pathology image‐based model, WISE‐BRCA (Whole‐slide Images Systematically Extrapolate BRCA1/2 mutations), which extracts high‐level histopathological feature representations from WSIs and predicts BRCA1/2 mutation status using a multi‐scale transformer architecture. We validated the predictive performance of WISE‐BRCA through extensive multi‐center evaluation and interpretability analyzes. We then developed MAIGGT, which integrates histopathological microenvironment features from WSIs with clinical phenotypes from EHRs for precise multimodal joint prediction and assessed the complementarity between modalities.

We collected 2278 H&E‐stained WSIs from 634 untreated breast cancer patients with matched clinical features across three Chinese hospitals: Cancer Hospital of the Chinese Academy of Medical Sciences in Beijing (CHCAMS), Yantai Yuhuangding Hospital in Shandong Province (YYH), and Harbin Medical University Clinical Hospitals in Heilongjiang Province (HMUCH) (Figure 1A; Figure S1, Supporting Information). Baseline patient characteristics for each cohort are listed in Table  1 . After matching, 92/374 (24.6%) and 27/106 (25.5%) patients with germline BRCA1/2 mutations were in the CHCAMS discovery set and internal test set, respectively. Germline BRCA1/2 mutation carriers and non‐carriers had similar ages of onset, tumor sizes, lymph node statuses, and hormone receptors (HR) and human epidermal growth factor receptor 2 (HER2) status in the CHCAMS cohort (all p > 0.05; Figure S2, Supporting Information). However, carriers showed higher rates of family and personal history of breast and ovarian cancer, as well as a higher phenotype‐based risk score for BRCA1/2 mutations (DrABC score)[ 10 ] compared to non‐carriers (all P<0.05; Figure S2, Supporting Information). For the external validation cohorts, 30/133 (22.6%) and 10/21 (47.6%) patients were classified as BRCA1/2 mutation carriers in the YYH and HMUCH cohorts, respectively.

Figure 1.

Figure 1

Study design and workflow of WISE‐BRCA. A) Multi‐center patient recruitment and sampling. Blood sampling, genetic sequencing, and pathogenic slide staining have done retrospectively. B) WSI pre‐processing and extraction of representative features. C) WSI sampling and schematic workflow of WISE‐BRCA. Abbreviations: WISE‐BRCA, Whole‐slide Images Systematically Extrapolate BRCA1/2 mutations; EHR, electronic health record; FFPE, formalin‐fixed, paraffin‐embedded; CHCAMS, Cancer Hospital, Chinese Academy of Medical Sciences; YYH, Yantai Yuhuangding Hospital; HMUCH, Harbin Medical University Clinical Hospitals.

Table 1.

Baseline characteristics of patients enrolled in this study.

Characteristics CHCAMS YYH HMUCH
No. of patients 480 133 21
No. of WSIs 1995 205 78
Age at diagnosis (Mean  ±  SD) 41.2  ±  9.3 46.5  ±  9.7 53.6  ±  9.8
Germline BRCA1/2 mutation status
BRCA1 mutation carrier 65 (13.5%) 14 (10.5%) 9 (42.9%)
BRCA2 mutation carrier 58 (12.1%) 16 (12.0%) 4 (19.0%)
Non‐carrier 361 (75.2%) 103 (77.4%) 11 (52.4%)
Personal history
Any cancer 32 (6.7%) 19 (14.3%) 3 (14.3%)
Breast cancer 19 (4.0%) 9 (6.8%) 2 (9.5%)
Ovarian cancer 3 (0.6%) 2 (1.5%) 1 (4.8%)
Family history
Any cancer 214 (44.6%) 32 (24.0%) 7 (33.3%)
Breast cancer 110 (22.9%) 11 (8.3%) 6 (28.6%)
Ovarian cancer 18 (3.8%) 1 (0.8%) 1 (4.8%)
Pancreatic cancer 12 (2.5%) 0 (0.0%) 1 (4.8%)
Male breast cancer 2 (0.4%) 0 (0.0%) 0 (0.0%)
Tumor size
≤2cm 232 (48.3%) 59 (44.4%) 9 (42.9%)
>2cm 184 (38.3%) 60 (45.1%) 12 (57.1%)
Grade
I 19 (4.0%) 11 (8.3%) 0 (0.0%)
II 183 (38.1%) 58 (43.6%) 9 (42.9%)
III 206 (42.9%) 41 (30.8) 12 (57.1%)
Estrogen receptor
Positive 176 (36.7%) 76 (57.1%) 6 (28.6%)
Negative 252 (52.5%) 57 (42.9%) 15 (71.4%)
Progesterone receptor
Positive 171 (35.6%) 82 (61.7%) 7 (33.3%)
Negative 252 (53.5%) 51 (38.3%) 14 (66.7%)
Ki67
≤30% 193 (40.2%) 57 (42.9%) 9 (42.9%)
>30% 235 (49.0%) 76 (57.1%) 12 (57.1)
HER2
Positive 36 (7.5%) 15 (11.3%) 1 (4.8%)
Negative 335 (69.8%) 118 (88.7%) 20 (95.2%)
Lymph nodes status
Positive 209 (43.5%) 81 (60.9%) 15 (71.4%)
Negative 205 (42.7%) 39 (29.3%) 6 (28.6%)

Abbreviations: CHCAMS, Cancer Hospital, Chinese Academy of Medical Sciences; YYH, Yantai Yuhuangding Hospital; HMUCH, Harbin Medical University Clinical Hospitals; WSI, whole slide image; SD, standard deviation; HER2, human epidermal growth factor receptor 2.

2.2. Predicting Germline BRCA1/2 Mutation from Histology Images Using WISE‐BRCA

We developed a multi‐scale Transformer‐based deep learning model, WISE‐BRCA, to predict pathogenic germline BRCA1/2 mutation carriers from H&E‐stained WSIs. While developing WISE‐BRCA, we initially evaluated the performance of the tumor segmentation model on the BCSS test set. The results show that the tumor segmentation model has a superior performance with an AUC of 0.894 (95% CI 0.882 – 0.906) for 224 × 224 pixel patches and 0.887 (95% CI 0.858 – 0.916) for 512 × 512 pixel patches (Figure S3, Supporting Information).

For the development and evaluation of WISE‐BRCA, the CHCAMS cohort was split into a discovery set for model training and a test set for internal validation at the patient level in an 8:2 ratios (Figure  2A). In the discovery set, WISE‐BRCA achieved a mean AUC of 0.747 (95% CI 0.719 – 0.775; 0.733 – 0.763 at the slide level) and a mean AUC of 0.755 (95% CI 0.720 – 0.789; 0.736 – 0.789 at the patient level) in the 4‐fold cross‐validation (Figure 2B). We further evaluated the performance of WISE‐BRCA in the CHCAMS test set and two external cohorts. WISE‐BRCA showed consistently high performance in predicting germline BRCA1/2 mutation status, with AUCs of 0.807 (95% CI 0.757 – 0.857), 0.800 (95% CI 0.716 – 0.884), and 0.783 (95% CI 0.678 – 0.889) at the slide level and AUCs of 0.824 (95% CI 0.725 – 0.923), 0.798 (95% CI 0.682 – 0.913), and 0.800 (95% CI 0.590 – 1.000) at the patient level in the CHCAMS test set, YYH cohort, and HMUCH cohort, respectively (Figure 2C). The probabilities generated by WISE‐BRCA were significantly different between non‐carriers and BRCA1/2 mutation carriers at both the slide and patient level (Figure 2D, P < 0.001). Furthermore, the proportion of BRCA1/2 mutation carriers increased with higher prediction scores (Figure 2E). We also compared WISE‐BRCA with other H&E image‐based methods that are widely used in computational pathology, and achieved the best performance (Table S1, Supporting Information).

Figure 2.

Figure 2

WISE‐BRCA's predictive power for germline BRCA1/2 mutations. A) Model development and evaluation design. B) Slide‐level and patient‐level prediction efficacy of four‐fold cross‐validation in the discovery set (n = 358). C) Slide‐ and patient‐level prediction efficacy in the CHCAMS test set (n = 90) and YYH (n = 74) and HMUCH (n = 21) cohorts. D) Slide‐ and patient‐level prediction scores of the CHCAMS test set and YYH and HMUCH cohorts. E) Proportion of germline BRCA1/2 mutation carriers across different model prediction score percentiles and odds ratios for associations between model prediction scores and germline BRCA1/2 mutation status. Abbreviations: WISE‐BRCA, Whole‐slide Images Systematically Extrapolate BRCA1/2 mutations; CHCAMS, Cancer Hospital, Chinese Academy of Medical Sciences; YYH, Yantai Yuhuangding Hospital; HMUCH, Harbin Medical University Clinical Hospitals.

2.3. Model Reliability and Subgroup Analyses

To investigate the reliability and robustness of WISE‐BRCA, we first assessed model performance across patients with different numbers of slides in the three cohorts (Figure 3A; Figure S4, Supporting Information). We observed significant improvements in model performance, as the number of slides per patient increased from 1 to ≥5, with AUCs increasing from 0.786 to 0.906 in the discovery set, from 0.780 to 0.891 in the test set, from 0.776 to 1.000 in the YYH cohort, and from 0.778 to 0.873 in the HMUCH cohort (Figure 3B).

Figure 3.

Figure 3

Model reliability and subgroup analyses. A) Distribution of the number of WSIs and percentage of patients with different numbers of WSIs in the CHCAMS cohort (Discovery set, n = 358; Test set, n = 90). B) Model prediction performance significantly improved as the number of slides for each patient increased from 1 to 5 or more (YYH cohort, n = 74; HMUCH cohort, n = 21). For a given number of slides 𝑛, we sampled 𝑛 slides from each patient with at least 𝑛 slides and averaged the prediction scores of these 𝑛 slides to obtain the patient‐level prediction result. This sampling was performed 1000 times for each number of slides, and the numbers shown represent the average model performance across the 1000 samples. C) Prediction performance on core needle biopsy samples (CHCAMS‐CNB, n = 16; YYH‐CNB, n = 59). D) Prediction performance in BRCA1 (CHCAMS, n = 247; YYH, n = 65; HMUCH, n = 17) or BRCA2 (CHCAMS, n = 241; YYH, n = 65; HMUCH, n = 12) mutation carriers. E) Prediction performance in patients with different tumor sizes (CHCAMS, n = 124; YYH, n = 37; HMUCH, n = 12). F) Prediction performance in patients with different breast cancer molecular subtypes (TNBC: CHCAMS, n = 85; YYH, n = 34; HMUCH, n = 6; HR+HER2‐: CHCAMS, n = 131; YYH, n = 30; HMUCH, n = 14). Abbreviations: WISE‐BRCA, Whole‐slide Images Systematically Extrapolate germline BRCA1/2 mutations; SL, slide‐level; PL, patient‐level; TNBC, triple‐negative breast cancer; HR, hormone receptor; HER2, human epidermal growth factor receptor 2; CHCAMS, Cancer Hospital, Chinese Academy of Medical Sciences; YYH, Yantai Yuhuangding Hospital; HMUCH, Harbin Medical University Clinical Hospitals.

We conducted further validation of WISE‐BRCA across various subgroups. WISE‐BRCA demonstrated acceptable performance on CNB samples, with a slide‐level AUC of 0.770 (95% CI 0.587 – 0.952) and patient‐level AUCs of 0.783 (95% CI 0.550 – 1.000) and 0.697 (95% CI 0.528 – 0.865) in the CHCAMS‐CNB and YYH‐CNB sets, respectively (Figure 3C). WISE‐BRCA maintained comparable efficacy between BRCA1 and BRCA2 mutation carriers, with slide‐level AUCs of 0.766 – 0.854 and 0.760 – 1.000, and patient‐level AUCs of 0.782 – 0.857 and 0.738 – 1.000 across the three cohorts (Figure 3D).

Notably, WISE‐BRCA achieved superior performance in patients with larger tumors (>2 cm), with slide‐level AUCs of 0.805 – 0.867 and patient‐level AUCs of 0.832 – 0.906 (Figure 3E). In the molecular subtype analysis, WISE‐BRCA shows lower AUCs in triple‐negative breast cancer (TNBC), with slide‐level AUCs of 0.656 – 0.722 and patient‐level AUCs of 0.615 – 0.875, demonstrating greater heterogeneity in the TNBC subtype[ 18 ] compared to higher AUCs for the HR‐positive/HER2‐negative subtype (slide‐level: 0.756 – 0.886 and patient‐level: 0.750 – 0.881) (Figure 3F).

2.4. Morphological ITH of Tumor Nuclei and Immune Cell Infiltration

To establish the interpretability of WISE‐BRCA and to identify BRCA1/2 mutation‐specific histomorphological patterns, we used an Integrated Gradients (IG)‐based attribution method to calculate gradient values for patches of different sizes. Visualization of the gradient maps revealed distinct mutation‐related contextual features in the WSIs from BRCA1/2 mutation carriers versus non‐carriers (Figure 4A). We extracted and quantitatively analyzed representative patches critical for germline BRCA1/2 prediction from 400 WSIs of non‐carriers and carriers in the CHCAMS cohort.

Figure 4.

Figure 4

Biological interpretability for WISE‐BRCA and BRCA1/2 mutation‐specific histomorphological patterns. A) Visualization of the gradient map from germline BRCA1/2 mutation carriers and non‐carriers in patches of sizes 224 × 224 and 512 × 512 pixels. B) Densities of epithelial and TME cells in and BRCA1/2 mutation carriers and non‐carriers. C) Odds ratios for associations between cell densities and germline BRCA1/2 mutation status. D) Low‐scale geometric and textural features of cells in BRCA1/2 mutation carriers and non‐carriers. E) Representative cell‐cell interaction metrics. F) High‐scale distribution of cell‐cell interactions in germline BRCA1/2 mutation carriers and non‐carriers. G) Odds ratios for associations between cell‐cell interactions and BRCA1/2 mutation status. H) Representation of genotype‐related TME features from representative patches showed enriched inflammatory cell infiltration and necrosis features in BRCA1‐mutated tumors, close cell connectivity in BRCA2‐mutated tumors, and stromal abundance and fewer lymphocytes in non‐carriers. Abbreviations: WISE‐BRCA, Whole‐slide Images Systematically Extrapolate BRCA1/2 mutations. TME, tumor microenvironment.

Based on genotype‐phenotype correlations, WISE‐BRCA facilitated biological interpretation in multiple dimensions: cell type, cell shape, and cell‐to‐cell interaction. BRCA1/2 mutation carriers showed significantly higher enrichment of immune cells (P < 0.001) and non‐tumor stromal cells (< 0.001) than non‐carriers. Additionally, immune cell density was significantly higher in BRCA1 mutation carriers compared to BRCA2 mutation carriers (P < 0.05) (Figure 4B,C; Figure S5, Supporting Information).

Low‐scale geometric and texture analysis revealed that non‐carrier cells had a large area, perimeter homogeneity, and small contrast, indicating relatively larger nuclei with less heterogeneity. Conversely, cells from BRCA1/2 mutation carriers had a larger ASM and smaller area (P < 0.001), suggesting a closer cellular arrangement. Cells from BRCA2 mutation carriers also showed greater eccentricity compared to BRCA1 mutation carriers (P < 0.001), indicating more invasive growth patterns (Figure 4D; Figures S6, and S7, Supporting Information).

We further analyzed spatial cell‐cell interactions and observed distinct distributions of interactions between BRCA1/2 mutation carriers and non‐carriers (Figure 4E). BRCA1/2 mutation carriers, particularly BRCA1, show a significantly higher enrichment of immune cell‐related interactions (P < 0.001), which are marked by abundant immune cell infiltration and an inflammatory response (Figure 4F,G; Figures S6 and S7, Supporting Information). In contrast, non‐carriers showed less immune infiltration but more frequent stromal‐neoplastic cell interactions. These results indicate that BRCA1/2 mutations may influence cell morphology and shape different TME. Representative patches reflecting these genotype‐related TME features are shown in Figure S8 (Supporting Information), with corresponding diagrams illustrating the enriched inflammatory infiltrates and necrosis in BRCA1‐mutated tumors, close tumor cell connectivity and more frequent mitosis in BRCA2‐mutated tumors, and stromal abundance with fewer lymphocytes in non‐carriers (Figure 4H).

2.5. Joint Prediction of BRCA1/2 Mutation Carriers from Histology Images and Electronic Health Records Using MAIGGT

Building on WISE‐BRCA, the MAIGGT framework integrates learned high‐level histopathological feature representations with clinical phenotypes from EHRs (Table S2, Supporting Information) through a unified latent representation space (Figure 5A,B). MAIGGT was initially developed using the CHCAMS cohort (Figure S9, Supporting Information) and then independently validated in two external cohorts. As a result, MAIGGT achieved AUCs of 0.845 (95% CI 0.779 – 0.911) and 0.925 (95% CI 0.868 – 0.982) in the YYH and HMUCH cohorts, respectively (Figure 5C), outperforming WISE‐BRCA, which relied solely on histology images for prediction. Finally, we assessed the relative contribution of individual features for the MAIGGT through SHAP, the overall SHAP values of the top‐10 pathology features and all phenotypic features are depicted in Figure S10 (Supporting Information). Among the phenotypic features, HER2 status, age at diagnosis, family history of tumor, and family history of breast cancer contributed the most significantly, and the pathological features derived from WISE‐BRCA demonstrated a consistently stable contribution to the model prediction (Figure S10, Supporting Information).

Figure 5.

Figure 5

Efficacy of joint prediction of germline BRCA1/2 mutation status from pathology images and clinical information. A) Development of the multi‐modal model MAIGGT. B) Joint prediction of germline BRCA1/2 mutation status from pathology images and clinical information by MAIGGT. C) Patient‐level prediction efficacy in the YYH (n = 74) and HMUCH (n = 21) cohorts. D) MAIGGT achieved higher prediction accuracy than the other single‐modality models at 80% sensitivity. Abbreviations: MAIGGT, Multimodal Artificial Intelligence Germline Genetic Testing; EHR, electronic health record; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; AR, androgen receptor; FC, full connected; CHCAMS, Cancer Hospital, Chinese Academy of Medical Sciences; YYH, Yantai Yuhuangding Hospital; HMUCH, Harbin Medical University Clinical Hospitals.

By incorporating clinical phenotypes, MAIGGT demonstrated superior prediction accuracy compared to other single‐modality models. At a sensitivity of 0.808, the multi‐modal MAIGGT achieved a specificity of 0.757, an improvement of 0.305 and 0.103 over the phenotype‐based and histology‐based WISE‐BRCA models, respectively (Figure 5C,D).

3. Discussion

In this study, we introduce MAIGGT, the first interpretable multi‐modal deep learning framework for predicting pathogenic germline BRCA1/2 mutations in breast cancer patients using routine EHR and histology images. By synergistically integrating histopathological features from WSIs with clinical phenotypes from EHR through a unified latent representation space, MAIGGT achieves state‐of‐the‐art performance while providing novel biological insights into BRCA1/2‐associated tumor biology.

Pathological image‐based deep learning predictive models play an increasingly important role in many aspects of breast cancer management, including screening, diagnosis, staging (counting mitotic figures[ 19 ] or identifying metastatic axillary lymph nodes[ 20 ]), biomarker evaluation (classifying molecular features[ 21 ]), prognostication, and therapeutic response prediction.[ 15 ] According to our previous study, pathological features outperform clinical characteristics, such as age and family history,[ 10 ] for predicting genetic mutations. Because patients with germline BRCA1/2 mutations exhibit specific pathological features (e.g., absence of HR and HER2 expression[ 5 , 22 ]), deep learning can help predict germline[ 17a ] or somatic[ 16a ] BRCA1/2 mutations in breast and ovarian cancer[ 3 , 16 ] but is limited by small study cohorts,[ 17a ] lack of external validation,[ 16a,c ] and insufficient information on germline mutations.[ 3 , 16 ] Notably, the clinical utility of AI‐based prescreening has been independently validated in other cancer types. For example, Saillard et al. demonstrated that MSIntuit, a deep learning model for microsatellite instability (MSI) detection in colorectal cancer, achieved high sensitivity (96‐98%) across diverse cohorts and scanner platforms,[ 23 ] highlighting the generalizability of such approaches. Our proposed WSI‐based WISE‐BRCA outperforms previous models due to its application to large populations of breast cancer patients, multi‐center external validation, germline mutation prediction, and innovative AI model construction. While WISE‐BRCA does not predict homogeneous recombination deficiency (HRD) status or somatic mutations, as achieved in some earlier studies,[ 3 , 24 ] its focus on germline BRCA1/2 status is clinically relevant, particularly as an indicator for PARP inhibitor treatment.[ 25 ] Moreover, HRD measurement in clinical practice often relies on germline BRCA1/2 status. Despite the weaker association between germline mutations and phenotypic expression compared to somatic mutations, the strong predictive performance of WISE‐BRCA underscores the close relationship between germline mutations and pathological phenotypes, effectively capturing genotype information by analyzing detailed pathology image features.

WISE‐BRCA was developed using a transformer‐based architecture and a weakly supervised learning method that does not require costly manual annotation. Recognizing that WSIs at different magnifications reveal different levels of pathological features, WISE‐BRCA is designed with a multi‐scale architecture, and processes patches of different sizes to systematically analyze WSIs. WISE‐BRCA also introduces cross‐attention mechanisms to facilitate interactions between low‐scale morphological features and high‐scale, spatial‐oriented representations, resulting in a more synergistic and enriched histopathological feature representation. Based on the promising results achieved by pathology‐driven models, we aim to further integrate patient phenotypic data to explore potential performance improvements from multimodal information. Pathological images are high‐dimensional, complex, and unprocessed data, whereas clinical phenotypic information is low‐dimensional, manually refined and structured data that encapsulates human expertise. There are significant differences between the two in terms of dimensionality, data representation and information granularity. Considering that our MAIGGT uses a shared and unified deep generative modelling method to integrate the learned high‐level histopathological information from WISE‐BRCA with clinical phenotypic data, rather than simply fusing the raw data from the two modalities. Through multimodal prior distribution constraints and reconstruction, MAIGGT learns a joint common latent space between histopathological and phenotypic data, effectively aligning and integrating these modalities to extract intrinsic complementary features related to germline BRCA1/2 mutations, thereby achieving superior predictive performance compared to single‐modality models and other integration methods (Table S3, Supporting Information).

Although achieving excellent performance, the interpretability of deep learning models remains a challenge, particularly in clinical settings where pathology is central to diagnosis. To address this barrier, we conducted a visualization analysis of regions of interest identified by WISE‐BRCA. The model revealed that BRCA1 mutations were associated with higher inflammatory infiltration and necrosis, as well as off‐center nuclei and pushing tumor margins, while BRCA2 mutations were associated with more active mitotic phases. In contrast, non‐carriers showed an abundance of stromal cells and tumor cells with centrally located nuclei. These results corroborate previous findings on the pathogenic mechanism of BRCA1/2, which indicate immune activation and nuclear morphological heterogeneity in BRCA1/2‐mutated breast cancers.[ 26 ] Notably, although the use of HoverNet may introduce some bias, since our data are external datasets for HoverNet, any errors or biases introduced by HoverNet are generally uniformly distributed across the slides of both mutation and non‐mutation patients, without significant specificity. This may introduce bias into the numerical estimates within individual groups in our interpretability analysis, such as cell density estimates, but it will have little impact on the differences between groups that we primarily focus on, such as the differences in cell density between mutation carriers and non‐carriers. Moreover, clinical trials and drug development efforts suggest patients with BRCA1/2 mutations may respond well to immunotherapy,[ 27 ] and BRCA2‐mutated tumors are being investigated for potential resistance to CDK4/6 inhibitors because they regulate mitosis and the cell cycle.[ 28 ] In this context, MAIGGT represents a more objective, feature‐naïve approach that innovatively adopts morphological classification and EHR phenotypes to systematically infer germline genotypes and their corresponding potential therapeutic targets.

Despite these promising findings, our study has several limitations. First, the retrospective nature of the study involving clinically matched patients introduces inherent selection bias and heterogeneity into the imaging data, which requires further validation in real‐world contexts. In addition, due to the clinical focus, we did not incorporate other genes or evaluate HRD status within DNA damage response pathways. However, MAIGGT is specifically designed to address clinical indications for intervention and to directly inform treatment selection. Consequently, the significant disparity between germline phenotypes and pathological features may lead to an underestimation of predictive results. Notably, our model demonstrated relatively lower performance in TNBC cases compared to hormone receptor‐positive subtypes. This may reflect the inherent biological heterogeneity of TNBC, where BRCA1‐associated tumors share histological features (e.g., immune infiltration, necrosis) with sporadic TNBCs, making discrimination difficult. In addition, the underrepresentation of TNBC cases (≈30% of our cohort) and the divergent molecular profiles of BRCA1‐ versus BRCA2‐associated tumors may contribute to this disparity. Future studies with expanded TNBC cohorts and subtype‐specific modeling approaches could further refine predictive accuracy in this clinically critical subgroup. Meanwhile, although MAIGGT demonstrates excellent predictive performance, it still lacks validation across different regions and ethnic groups. Therefore, larger, prospective, and cross‐ethnic multi‐center cohorts are needed to extend the application scenario of this promising deep learning model.

4. Conclusion

MAIGGT introduces a new paradigm for precision cancer risk assessment by demonstrating that deep learning can extract clinically relevant genetic insights from routine histopathology images. By bridging digital pathology with clinical genotypic data, this approach improves the accessibility, efficiency, and scalability of genetic testing. Ultimately, MAIGGT has the potential to transform personalized cancer prevention and treatment strategies, enabling early risk stratification, targeted interventions, and improved clinical decision‐making.

5. Experimental Section

Study Design and Patient Cohorts

Participants from the CHCAMS cohort were sorted at patient‐level into a separate discovery set for model training and underwent four‐fold cross‐validation (n = 374), along with an internal test set for validation (n = 106). External validation was conducted independently in the YYH and HMUCH cohorts (n = 133 and 21, respectively) (Figure 1A). To avoid the potential influence of clinical characteristics and to alleviate data imbalance during model training, germline BRCA1/2 mutation carriers and non‐carriers were matched from the CHCAMS and YYH cohorts in a 1: 2 – 3 ratio based on their age at diagnosis, tumor size, lymph node status, and molecular subtype. Participants were enrolled consecutively in the HMUCH cohort to mimic a real‐world scenario for breast cancer patients with high BRCA1/2 mutation risk. Participants were excluded if they had received neoadjuvant chemotherapy, endocrine therapy, or radiotherapy prior to specimen collection. Treatment history was verified through electronic health records and pathological review to ensure that all samples analyzed represented treatment‐naïve tumor biology.

Additionally, the Breast Cancer Semantic Segmentation (BCSS) dataset,[ 29 ] which includes WSIs from The Cancer Genome Atlas Breast Invasive Carcinoma cohort annotated by pathologists into six tissue regions, was incorporated. The BCSS dataset was split into a discovery set (80%) and test set (20%) for 5‐fold cross‐validation and testing of the tumor segmentation model as described later in this section.

The ethics committees of each participating institute approved the study (2021041314383902 for CHACMS, 2020–289 for YYH, LC2024‐052 for HMUCH). Written informed consent from patients was waived because of the complete anonymization of the study and the non‐identifying and retrospective nature of the collected data. The authors state that they have followed the principles outlined in the Declaration of Helsinki for all human experimental investigations.

Genotype and Phenotypic Characteristics

Germline BRCA1/2 mutations were previously analyzed through next‐generation sequencing of genomic DNA extracted from peripheral blood or saliva.[ 10 ] The clinical significance of each mutation was evaluated according to guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology through an in‐house pipeline.[ 10 , 30 ] Finally, the germline BRCA1/2 mutation status of each participant was determined independently by two clinical geneticists.

Clinical and pathological data, including age at diagnosis, family and personal cancer history, pathological features, molecular subtype, and clinical stage, were extracted from each participant's EHR (Figure 1A). Molecular subtypes were defined based on the immunohistochemical status of hormone receptors, such as estrogen receptor (ER), progesterone receptor (PR), and HER2.

Image Curation and Pre‐Processing

For patients with multiple slides, all available slides were retained to assess the effectiveness of the deep learning model at both the slide and patient level. In the CHCAMS and YYH cohorts, H&E‐stained slides were prepared from surgically resected and core needle biopsy (CNB) specimens using formalin‐fixed, paraffin‐embedded (FFPE) tissue blocks. These slides were digitized at 20× magnification using scanners (KF‐PRO‐040, FKBIO, China and 3D Panoramic SCAN SC150‐214705, DynaMax, China). In the HMUCH cohort, all H&E‐stained slides were prepared from surgical resection specimens and digitized at 40× magnification using the EasyScan One scanner (Motic, China). Poor‐quality slides were filtered out, including those with large areas of debris, folds, pen marks, blurred regions, missing diagnostic time information, poor visual quality, or abnormal staining. The filtering process was performed by two clinical pathologists who were blinded to patient identity, and in case of any disagreement, a third pathologist was involved for joint review.

To facilitate the analysis of gigapixel‐sized WSIs and capture information at different scales, WSIs meeting quality standards were tessellated into non‐overlapping patches of 224 × 224 and 512 × 512 pixels at 20× magnification. Patches with <50% tissue area were excluded from the analysis. To reduce staining variation and batch effects, patches from the external cohorts were subjected to staining normalization using a parameter‐variable staining normalization network.[ 31 ] Similarly, WSIs from the BCSS dataset were tessellated into patches of different sizes, with patches containing >25% tumor area classified as tumor patches and the remainder as non‐tumor patches.

Tumor Area Segmentation and Image Sampling

Emerging evidence indicates tumor areas are more closely associated with gene mutation compared to peri‐tumor and other non‐cancerous regions that may alter the tumor phenotype.[ 32 ] Therefore, a tumor segmentation model was developed to automatically extract tumor areas from each WSI and to predict germline BRCA1/2 mutations within these areas. The tumor segmentation model used ResNet50, pre‐trained on ImageNet, as the backbone and was trained on labelled patches of different sizes from the BCSS dataset for multi‐scale tumor area segmentation. Specifically, two tumor segmentation models were trained on patches of sizes 224 and 512, respectively. Both models used the cross‐entropy loss as an object function and were optimized using the AdamW optimizer with a batch size of 64, a learning rate of 7e‐4, and a weight decay of 5e‐4. Five‐fold cross‐validation was used to develop two models in the BCSS discovery set, with each fold trained for 100 epochs. The model with the highest AUC in the five‐fold cross‐validation was selected for further use.

To reduce computational complexity, a clustering‐based sampling strategy was implemented to extract patches with distinct histomorphological features from the tumor area (Figure 1B,C). CTransPath was used,[ 33 ] a convolutional neural network and SwinTransformer‐based pathology foundation model, to generate 768‐dimensional feature representations for each patch. The K‐means algorithm was used to cluster patches into Nc clusters, and Ns patches were randomly sampled from each cluster, yielding a total of Nc × Ns patches per WSI (Figure S11, Supporting Information). Compared to random sampling, this clustering‐based sampling method comprehensively captures more complete information from WSIs and reduces the risk of missing significant patches with less common histomorphological features.

Architecture of WISE‐BRCA

As shown in Figure 1C, WISE‐BRCA consists of three main components: embedding layer, multi‐scale Transformer block and fusion classification head. The embedding layer includes a linear layer, a batch normalization layer, and a ReLU activation function. In practical applications, WISE‐BRCA sets two embedding layers to embed the feature representations Zl={z1,1,z1,2,z1,3,,zcl,sl},zi,jRd and Zh={z1,1,z1,2,z1,3,,zch,sh},zi,jRd of 224 × 224 an 512 × 512 sized patches from cluster sampling, with output embedding features Inline graphic and Inline graphic, where cl , sl and ch , sh represent the number of clusters and the number of samples at different sampling sizes, respectively, and d represents the dimension of patch embeddings.

Then, Zl and Zh were treated as tokens and forwarded to twelve stacked multi‐scale Transformer blocks. Each block contains a low‐scale branch and a high‐scale branch, which process Zl and Zh respectively, to extract BRCA1/2 mutation‐related multi‐scale information for prediction. Traditional self‐attention does not allow for cross‐scale information interaction and fusion due to the independence of branches. Therefore, WISE‐BRCA introduces cross‐attention to bridge the gap between low‐scale morphological features and high‐scale spatially oriented representations, allowing information interaction between the branches and generating systematic high‐level discriminative features. The cross‐attention would be used interchangeably with self‐attention in consecutive multi‐scale Transformer blocks. Suppose the Xl , Xclsl Rclsl×d, and Xh , Xclsh Rchsh×d are the patch tokens and class tokens output by the previous multi‐scale Transformer block, the calculation process of the cross attention was as follows:

graphic file with name ADVS-12-e02833-e014.jpg (1)

where Ql , Kl , Vl Rclsl×dk and Qh , Kh , Vh Rchsh×dk, are the query, key and value matrices; WlQ, WlK, WlV Rd×dk and WhQ, WhK, WhV Rd×dk are learnable parameters. Multiple heads were also allocated in cross‐attention as follows:

graphic file with name ADVS-12-e02833-e005.jpg (2)

where Hli=Attention(Qli,Kli,Vli) and Hhi=Attention(Qhi,Khi,Vhi) for i{1,2,n}; WLRndk×d and WHRndk×d are learnable parameters; n represents the number of cross attention heads. Here, cl , sl , ch , sh and n were set to 30, 60, 30, 60, and 12, respectively. The dimension of d and dk were 768 and 64.

Finally, a systematic feature representation of 1536 dimensions was generated by concatenating the class tokens from the low‐scale and high‐scale branches. Meanwhile, this discriminative feature was forward to the linear fusion classification head to obtain the slide‐level BRCA1/2 mutation prediction result.

During training, binary cross‐entropy was used with logits loss as the object function and AdamW as the optimizer. The initial learning rate was set to 1e‐5, weight decay was 1e‐3 and batch size was 32. WISE‐BRCA was trained using 4‐fold cross‐validation, with each fold trained for 150 epochs and early stopping after 20 epochs of no improvement. The model with the highest validation AUC among the 4‐fold cross‐validation was selected as the best model for internal test and external validation. To expedite model convergence, the GradualWarmupScheduler was used for a five epochs warm‐up, followed by CosineAnnealingLR with T‐max set to 95 epochs to adjust the learning rate.

Attribution and Visualization of WISE‐BRCA Predictions

To ensure the intrinsic interpretability of WISE‐BRCA and reveal biological changes associated with BRCA1/2 mutations from an AI perspective, model predictions were attributed using the Integrated Gradients (IG) method.[ 34 ]

For the patch feature representations Zl and Zh of the input WSI, IG calculates an importance‐related gradient value for each patch as follows:

graphic file with name ADVS-12-e02833-e020.jpg (3)

where F(x) is the WISE‐BRCA model; X and X′ represent the feature representations and baseline features of all input patches; i represents the i‐th patch feature representation in X.

After the computation, IG will output IG gradient values of the same size as the input. The IG gradient values were then averaged across patches at different scales as the gradient value for each patch. The IG gradient values were used to quantify the importance of each patch in the model's prediction, with larger gradient values indicating a stronger association with BRCA1/2 mutations. For slide‐level visualisation, cluster sampling was performed 500 times per WSI, and predictions were made for each sampling. The IG gradients for each patch were averaged to obtain macro‐attribution results for the WSI. At the patch level, 200 WSIs were selected from germline BRCA1/2 mutation carriers with the highest prediction probabilities, and 200 WSIs were selected from non‐carriers with the lowest prediction probabilities. The top 10 highest gradient value patches from germline BRCA1/2 mutation carriers and the top 10 lowest gradient value patches from non‐carriers were then selected for morphological and spatial interaction analyses.

Nuclei Segmentation and Feature Extraction

Nuclei segmentation was performed using HoVer‐Net[ 35 ] and pre‐trained on the PanNuke annotated pan‐cancer pathology image dataset to classify nuclei into six types: neoplastic, inflammatory, connective, dead, non‐neoplastic stromal, and unlabeled (Figure S12, Supporting Information). Based on the centroid and contour of the nucleus within its corresponding patch, 14 total morphological and texture features were obtained to quantitatively characterize the properties of each nucleus (Table S4, Supporting Information).[ 36 ]

Spatial Cell‐Cell Interaction Analysis

Cells were defined as interacting if their segmentation masks were in direct contact (i.e., the contour of the nuclei was connected); spatial interactions for all cells were calculated based on this criterion. Furthermore, cell type annotations were incorporated, which identified 15 cell‐cell interactions between different cell types. The abundance of each interaction type within a patch was calculated as the total number of interactions between two cell types divided by the total number of cells.

Fine‐Tuning WISE‐BRCA on Fresh‐Frozen Samples

To improve the generalization of WISE‐BRCA, transfer learning was performed on fresh frozen (FF) samples. Specifically, the model was fine‐tuned, trained initially on FFPE samples, using the Finetune set for 50 epochs with an early stopping criterion of 10 epochs. Independent performance evaluations were then conducted on the CHCAMS‐CNB and YYH‐CNB cohorts. During the fine‐tuning, the binary cross‐entropy was used with logits loss as the object function and AdamW as the optimizer with a learning rate of 1e‐3, weight decay of 5e‐4 and a batch size of 2. The learning rate was initially adjusted using the GradualWarmupScheduler for the first 10 epochs, followed by CosineAnnealingLR with T‐max set to 20 epochs to further modulate the learning rate.

Architecture of MAIGGT

Based on WISE‐BRCA, clinical information of patients (Table S2, Supporting Information) was incorporated to design a multi‐modal deep learning model, MAIGGT, for jointly predicting germline BRCA1/2 mutations. An unsupervised deep generative multi‐channel variational auto‐encoder (mcVAE)[ 37 ] was employed to learn a shared common latent space and generating unified representations for histopathological and phenotypic data. For the input pathological image representation Xpatho derived from WISE‐BRCA and standardized clinical information Xpheno (The standardization method was the same as the previous study.),[ 10 ] MAIGGT employs two pairs of encoders and decoders (Table S5, Supporting Information) for training. To optimize the Evidence Lower Bound (ELBO), achieving cross‐modal learning and reasoning capabilities, two loss functions Lrec and Lreg were delineated. Among them, Lrec represents the reconstruction term for intra‐modal self‐reconstruction and inter‐modal cross‐reconstruction between pathological and phenotypic data. Lreg is the regularization term that constrains the distributions of the latent variables for both modalities remain close to the prior distribution. The detailed description is as follows:

Lrec=EZpatho,Zphenoi,jlogpXi|ZjLreg=β·DKLqZpatho|Xpatho||pZ+DKLqZpheno|Xpheno||pZ (4)

Here, Zpathoq(Zpatho|Xpatho) and Zphenoq(Zpheno|Xpheno) are the posterior distributions of pathological and phenotypic features, respectively. p(Z)=N(0,I) represents the standard Gaussian distribution. β represents the weight coefficient of the constraint term. i and j represent the pathological or phenotypic channels. DKL represents the Kullback‐Leibler Divergence, which is formulated by

DKL(p||q)=12k=1Kσk2+μk21logσk2 (5)

where p=N(μ1,σ12), and q=N(μ2,σ22). Finally, the reconstruction and regularization losses were combined as the objective function L as follows:

L=LregLrec (6)

During training, the encoders of each channel generate the posterior distributions for the inputs of their respective modalities, while the decoders reconstruct the multimodal data from the reparameterized sample Zi=μi+εi·σi (εiN(0,I)). The model was trained using AdamW optimizer with a batch size of 32, a learning rate of 5e‐4, and a weight decay of 5e‐4 for 2000 epochs, with an early stopping criterion set at 100 epochs. To expedite model convergence, the GradualWarmupScheduler was used for 50 epochs of warm‐up, followed by CosineAnnealingLR with T‐max of 50 epochs to adjust the learning rate.

After that, the learned unified latent representation of histopathological and phenotypic data was fed into a multi‐modal fusion classification head (Table S6, Supporting Information) for prediction and uses the binary cross‐entropy with logits loss as the object function. The multimodal fusion classification head was also optimized using the AdamW optimizer and trained for 100 epochs with a batch size of 60, a learning rate of 5e‐4, weight decay of 5e‐4, a dropout rate of 0.4, and an early stopping criterion set at 10 epochs.

Hardware and Software

The PyTorch (version 1.13.1), Pandas (version 1.3.4), Numpy (version 1.24.4), OpenCV (version 4.5.3.56), histolab (version 0.5.1), and Feather (0.4.1) were used in this project. All deep learning models were implemented on two NVIDIA 4090 GPUs with CUDA11.7 and cuDNN 8.5.0.

Statistical Analysis

Statistical analysis was performed using R (version 4.2.2) and Python (version 3.8). The unpaired t‐test was used to compare continuous variables with a normal distribution, while the Mann‐Whitney U test or Kruskal‐Wallis test was used for non‐normally distributed continuous variables. Means and standard deviations (SDs) summarized continuous variables, whereas categorical variables were expressed as percentages. The area under the curve (AUC)‐receiver operating characteristic was calculated using the pROC package (version 1.18.0). Odds ratios were calculated using univariate logistic regression. The 95% confidence intervals (CIs) were calculated using the non‐parametric bootstrap method. All tests were two‐tailed, with a significance level set at P < 0.05.

Conflict of Interest

The authors declare no conflict of interest.

Author Contributions

Z.J.Y., C.Y.G., J.Y.L., Y.L.L., and L.Z. contributed equally to the work. J.Q.L. and M.Z. conceived the idea and designed the study. J.Y.L., Y.L.L., L.Z., P.M.P., Y.X.Z., T.X.S. and L.C. collected and double‐checked the clinical data. Z.J.Y. and M.Z. developed the deep learning algorithm. Z.J.Y., JQ.L., J.Y.L., and C.Y.G. analyzed the results. C.Y.G., J.Y.L., Y.L.L., L.Z., and J.M.Y. analyzed the pathological images. Z.J.Y., J.Y.L., C.Y.G., J.Q.L., and M.Z. wrote and revised the manuscript. All the authors participated in revising the manuscript before submission and the formal revision, including literature searching, information extraction, and text modification.

Supporting information

Supporting Information

Acknowledgements

The authors thank all individuals, families and physicians involved in the study for their participation. The authors thank BioScience Writers, LLC, for editing support. This research was funded in part by the National Natural Science Foundation of China (82272938 to J.L.), Beijing Nova Program (20220484059 to J.L.), the CAMS Innovation Fund for Medical Sciences (2021‐I2M‐1‐014 to J.L.), and Beijing Technology Innovation Foundation for University or College Students (2024dcxm054 to J.L.).

Yang Z., Guo C., Li J., Li Y., Zhong L., Pu P., Shang T., Cong L., Zhou Y., Qiao G., Jia Z., Xu H., Cao H., Huang Y., Liu T., Liang J., Wu J., Ma D., Liu Y., Zhou R., Wang X., Ying J., Zhou M., Liu J., An Explainable Multimodal Artificial Intelligence Model Integrating Histopathological Microenvironment and EHR Phenotypes for Germline Genetic Testing in Breast Cancer. Adv. Sci. 2025, 12, e02833. 10.1002/advs.202502833

Contributor Information

Jianming Ying, Email: jmying@cicams.ac.cn.

Meng Zhou, Email: zhoumeng@wmu.edu.cn.

Jiaqi Liu, Email: j.liu@cicams.ac.cn.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Codes used in this study are publicly available at https://github.com/ZhoulabCPH/MAIGGT/.

References

  • 1. Balmana J., Diez O., Rubio I. T., Cardoso F., Group E. G. W., Ann. Oncol. 2011, 22, vi31. [DOI] [PubMed] [Google Scholar]
  • 2.a) Wooster R., Weber B. L., N. Engl. J. Med. 2003, 348, 2339; [DOI] [PubMed] [Google Scholar]; b) Venkitaraman A. R., Cell 2002, 108, 171. [DOI] [PubMed] [Google Scholar]
  • 3. Ang B. H., Ho W. K., Wijaya E., Kwan P. Y., Ng P. S., Yoon S. Y., Hasan S. N., Lim J. M. C., Hassan T., Tai M. C., Allen J., Lee A., Taib N. A. M., Yip C. H., Hartman M., Lim S. H., Tan E. Y., Tan B. K. T., Tan S. M., Tan V. K. M., Ho P. J., Khng A. J., Dunning A. M., Li J., Easton D. F., Antoniou A. C., Teo S. H., J. Clin. Oncol. 2022, 40, 1542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Nolan E., Lindeman G. J., Visvader J. E., Cell 2023, 186, 1708. [DOI] [PubMed] [Google Scholar]
  • 5. Pujol P., Barberis M., Beer P., Friedman E., Piulats J. M., Capoluongo E. D., Garcia Foncillas J., Ray‐Coquard I., Penault‐Llorca F., Foulkes W. D., Turnbull C., Hanson H., Narod S., Arun B. K., Aapro M. S., Mandel J. L., Normanno N., Lambrechts D., Vergote I., Anahory M., Baertschi B., Baudry K., Bignon Y. J., Bollet M., Corsini C., Cussenot O., De la Motte Rouge T., Duboys de Labarre M., Duchamp F., et al., Eur. J. Cancer 2021, 146, 30. [DOI] [PubMed] [Google Scholar]
  • 6. Grindedal E. M., Heramb C., Karsrud I., Ariansen S. L., Maehle L., Undlien D. E., Norum J., Schlichting E., BMC Cancer 2017, 17, 438. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Daly M. B., Pal T., Maxwell K. N., Churpek J., Kohlmann W., AlHilli Z., Arun B., Buys S. S., Cheng H., Domchek S. M., Friedman S., Giri V., Goggins M., Hagemann A., Hendrix A., Hutton M. L., Karlan B. Y., Kassem N., Khan S., Khoury K., Kurian A. W., Laronga C., Mak J. S., Mansour J., McDonnell K., Menendez C. S., Merajver S. D., Norquist B. S., Offit K., Rash D., et al., J. Natl. Compr. Cancer Netw. 2023, 21, 1000. [DOI] [PubMed] [Google Scholar]
  • 8. Tutt A. N. J., Garber J. E., Kaufman B., Viale G., Fumagalli D., Rastogi P., Gelber R. D., de Azambuja E., Fielding A., Balmana J., Domchek S. M., Gelmon K. A., Hollingsworth S. J., Korde L. A., Linderholm B., Bandos H., Senkus E., Suga J. M., Shao Z., Pippas A. W., Nowecki Z., Huzarski T., Ganz P. A., Lucas P. C., Baker N., Loibl S., McConnell R., Piccart M., Schmutzler R., Steger G. G., et al., N. Engl. J. Med. 2021, 384, 2394. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.a) Beitsch P. D., Whitworth P. W., Hughes K., Patel R., Rosen B., Compagnoni G., Baron P., Simmons R., Smith L. A., Grady I., Kinney M., Coomer C., Barbosa K., Holmes D. R., Brown E., Gold L., Clark P., Riley L., Lyons S., Ruiz A., Kahn S., MacDonald H., Curcio L., Hardwick M. K., Yang S., Esplin E. D., Nussbaum R. L., J. Clin. Oncol. 2019, 37, 453; [DOI] [PMC free article] [PubMed] [Google Scholar]; b) Yang S., Axilbund J. E., O'Leary E., Michalski S. T., Evans R., Lincoln S. E., Esplin E. D., Nussbaum R. L., Ann. Surg. Oncol. 2018, 25, 2925; [DOI] [PubMed] [Google Scholar]; c) Yadav S., Hu C., Hart S. N., Boddicker N., Polley E. C., Na J., Gnanaolivu R., Lee K. Y., Lindstrom T., Armasu S., Fitz‐Gibbon P., Ghosh K., Stan D. L., Pruthi S., Neal L., Sandhu N., Rhodes D. J., Klassen C., Peethambaram P. P., Haddad T. C., Olson J. E., Hoskin T. L., Goetz M. P., Domchek S. M., Boughey J. C., Ruddy K. J., Couch F. J., J. Clin. Oncol. 2020, 38, 1409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Liu J., Zhao H., Zheng Y., Dong L., Zhao S., Huang Y., Huang S., Qian T., Zou J., Liu S., Li J., Yan Z., Li Y., Zhang S., Huang X., Wang W., Li Y., Wang J., Ming Y., Li X., Xing Z., Qin L., Zhao Z., Jia Z., Li J., Liu G., Zhang M., Feng K., Wu J., Zhang J., et al., Genome Med. 2022, 14, 21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Lakhani S. R., Van De Vijver M. J., Jacquemier J., Anderson T. J., Osin P. P., McGuffog L., Easton D. F., J. Clin. Oncol. 2002, 20, 2310. [DOI] [PubMed] [Google Scholar]
  • 12. Liu Y., Han D., Parwani A. V., Li Z., Arch. Pathol. Lab. Med. 2023, 147, 1003. [DOI] [PubMed] [Google Scholar]
  • 13. Bi W. L., Hosny A., Schabath M. B., Giger M. L., Birkbak N. J., Mehrtash A., Allison T., Arnaout O., Abbosh C., Dunn I. F., Mak R. H., Tamimi R. M., Tempany C. M., Swanton C., Hoffmann U., Schwartz L. H., Gillies R. J., Huang R. Y., Aerts H., CA Cancer J. Clin. 2019, 69, 127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Bilal M., Tsang Y. W., Ali M., Graham S., Hero E., Wahab N., Dodd K., Sahota H., Wu S., Lu W., Jahanifar M., Robinson A., Azam A., Benes K., Nimir M., Hewitt K., Bhalerao A., Eldaly H., Raza S. E. A., Gopalakrishnan K., Minhas F., Snead D., Rajpoot N., Lancet Digit Health 2023, 5, 786. [DOI] [PubMed] [Google Scholar]
  • 15. Ahn J. S., Shin S., Yang S. A., Park E. K., Kim K. H., Cho S. I., Ock C. Y., Kim S., J. Breast Cancer 2023, 26, 405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.a) Qu H., Zhou M., Yan Z., Wang H., Rustgi V. K., Zhang S., Gevaert O., Metaxas D. N., NPJ Precis. Oncol. 2021, 5, 87; [DOI] [PMC free article] [PubMed] [Google Scholar]; b) Nero C., Boldrini L., Lenkowicz J., Giudice M. T., Piermattei A., Inzani F., Pasciuto T., Minucci A., Fagotti A., Zannoni G., Valentini V., Scambia G., Int. J. Mol. Sci. 2022, 23, 11326; [DOI] [PMC free article] [PubMed] [Google Scholar]; c) Ektefaie Y., Yuan W., Dillon D. A., Lin N. U., Golden J. A., Kohane I. S., Yu K. H., NPJ Breast Cancer 2021, 7, 147; [DOI] [PMC free article] [PubMed] [Google Scholar]; d) Kather J. N., Heij L. R., Grabsch H. I., Loeffler C., Echle A., Muti H. S., Krause J., Niehues J. M., Sommer K. A. J., Bankhead P., Kooreman L. F. S., Schulte J. J., Cipriani N. A., Buelow R. D., Boor P., Ortiz‐Bruchle N. N., Hanby A. M., Speirs V., Kochanny S., Patnaik A., Srisuwananukorn A., Brenner H., Hoffmeister M., van den Brandt P. A., Jager D., Trautwein C., Pearson A. T., Luedde T., Nat. Cancer 2020, 1, 789; [DOI] [PMC free article] [PubMed] [Google Scholar]; e) Wagner S. J., Reisenbüchler D., West N. P., Niehues J. M., Zhu J., Foersch S., Veldhuizen G. P., Quirke P., Grabsch H. I., van den Brandt P. A., Cancer Cell 2023, 41, 1650. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.a) Wang X., Zou C., Zhang Y., Li X., Wang C., Ke F., Chen J., Wang W., Wang D., Xu X., Xie L., Zhang Y., Front. Genet. 2021, 12, 661109; [DOI] [PMC free article] [PubMed] [Google Scholar]; b) Zhao S., Yan C. Y., Lv H., Yang J. C., You C., Li Z. A., Ma D., Xiao Y., Hu J., Yang W. T., Jiang Y. Z., Xu J., Shao Z. M., Fundam. Res. 2024, 4, 678. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Morganti S., Marra A., De Angelis C., Toss A., Licata L., Giugliano F., Taurelli Salimbeni B., Berton Giachetti P. P. M., Esposito A., Giordano A., Bianchini G., Garber J. E., Curigliano G., Lynce F., Criscitiello C., JAMA Oncol. 2024, 10, 658. [DOI] [PubMed] [Google Scholar]
  • 19. Veta M., van Diest P. J., Willems S. M., Wang H., Madabhushi A., Cruz‐Roa A., Gonzalez F., Larsen A. B., Vestergaard J. S., Dahl A. B., Ciresan D. C., Schmidhuber J., Giusti A., Gambardella L. M., Tek F. B., Walter T., Wang C. W., Kondo S., Matuszewski B. J., Precioso F., Snell V., Kittler J., de Campos T. E., Khan A. M., Rajpoot N. M., Arkoumani E., Lacle M. M., Viergever M. A., Pluim J. P., Med. Image Anal. 2015, 20, 237. [DOI] [PubMed] [Google Scholar]
  • 20. Ehteshami Bejnordi B., Veta M., Johannes van Diest P., van Ginneken B., Karssemeijer N., Litjens G., van der Laak J., the C. C., Hermsen M., Manson Q. F., Balkenhol M., Geessink O., Stathonikos N., van Dijk M. C., Bult P., Beca F., Beck A. H., Wang D., Khosla A., Gargeya R., Irshad H., Zhong A., Dou Q., Li Q., Chen H., Lin H. J., Heng P. A., Hass C., Bruni E., Wong Q., et al., JAMA, J. Am. Med. Assoc. 2017, 318, 2199. [Google Scholar]
  • 21. Zhao Y., Zhang J., Hu D., Qu H., Tian Y., Cui X., Micromachines (Basel) 2022, 13, 2197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Atchley D. P., Albarracin C. T., Lopez A., Valero V., Amos C. I., Gonzalez‐Angulo A. M., Hortobagyi G. N., Arun B. K., J. Clin. Oncol. 2008, 26, 4282. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Saillard C., Dubois R., Tchita O., Loiseau N., Garcia T., Adriansen A., Carpentier S., Reyre J., Enea D., von Loga K., Kamoun A., Rossat S., Wiscart C., Sefta M., Auffret M., Guillou L., Fouillet A., Kather J. N., Svrcek M., Nat. Commun. 2023, 14, 6695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Lazard T., Bataillon G., Naylor P., Popova T., Bidard F. C., Stoppa‐Lyonnet D., Stern M. H., Decenciere E., Walter T., Vincent‐Salomon A., Cell Rep. Med. 2022, 3, 100872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Dibitetto D., Widmer C. A., Rottenberg S., Trends Cancer 2024, 10, 857. [DOI] [PubMed] [Google Scholar]
  • 26.a) Solinas C., Marcoux D., Garaud S., Vitoria J. R., Van den Eynden G., de Wind A., De Silva P., Boisson A., Craciun L., Larsimont D., Piccart‐Gebhart M., Detours V., t'Kint de Roodenbeke D., Willard‐Gallo K., Cancer Lett. 2019, 450, 88; [DOI] [PubMed] [Google Scholar]; b) Byrne A., Savas P., Sant S., Li R., Virassamy B., Luen S. J., Beavis P. A., Mackay L. K., Neeson P. J., Loi S., Nat. Rev. Clin. Oncol. 2020, 17, 341. [DOI] [PubMed] [Google Scholar]
  • 27. Hong R., Xu B., Cancer Commun. (Lond) 2022, 42, 913,. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Morrison L., Loibl S., Turner N. C., Nat. Rev. Clin. Oncol. 2024, 21, 89. [DOI] [PubMed] [Google Scholar]
  • 29. Amgad M., Elfandy H., Hussein H., Atteya L. A., Elsebaie M. A., Abo Elnasr L. S., Sakr R. A., Salem H. S., Ismail A. F., Saad A. M., Bioinformatics 2019, 35, 3461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Richards S., Aziz N., Bale S., Bick D., Das S., Gastier‐Foster J., Grody W. W., Hegde M., Lyon E., Spector E., Voelkerding K., Rehm H. L., Committee A. L. Q. A., Genet. Med. 2015, 17, 405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Choqueuse V., Frunza A., Belouchrani A., Azou S., Morel P., in 2022 30th European Signal Processing Conference(EUSIPCO) , IEEE, Piscataway, NJ: 2022, 1616. [Google Scholar]
  • 32.a) Shia J., Schultz N., Kuk D., Vakiani E., Middha S., Segal N. H., Hechtman J. F., Berger M. F., Stadler Z. K., Weiser M. R., Wolchok J. D., Boland C. R., Gonen M., Klimstra D. S., Mod. Pathol. 2017, 30, 599; [DOI] [PMC free article] [PubMed] [Google Scholar]; b) Yan R., Shen Y., Zhang X., Xu P., Wang J., Li J., Ren F., Ye D., Zhou S. K., Med. Image Anal. 2023, 87, 102824. [DOI] [PubMed] [Google Scholar]
  • 33. Wang X., Yang S., Zhang J., Wang M., Zhang J., Yang W., Huang J., Han X., Med. Image Anal. 2022, 81, 102559. [DOI] [PubMed] [Google Scholar]
  • 34. Sundararajan M., Taly A., Yan Q., Int. Conf. Mach. Learn. PMLR 2017, 70, 3319. [Google Scholar]
  • 35. Graham S., Vu Q. D., Raza S. E. A., Azam A., Tsang Y. W., Kwak J. T., Rajpoot N., Med. Image Anal. 2019, 58, 101563. [DOI] [PubMed] [Google Scholar]
  • 36.a) Zhao S., Chen D.‐P., Fu T., Yang J.‐C., Ma D., Zhu X.‐Z., Wang X.‐X., Jiao Y.‐P., Jin X., Xiao Y., Nat. Commun. 2023, 14, 6796; [DOI] [PMC free article] [PubMed] [Google Scholar]; b) Gao R., Yuan X., Ma Y., Wei T., Johnston L., Shao Y., Lv W., Zhu T., Zhang Y., Zheng J., Cell Rep. Med. 2024, 5, 101536. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Antelmi L., Ayache N., Robert P., Lorenzi M., Int. Conf. Mach. Learn. PMLR 2019, 97, 302. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Codes used in this study are publicly available at https://github.com/ZhoulabCPH/MAIGGT/.


Articles from Advanced Science are provided here courtesy of Wiley

RESOURCES