Advanced Science. 2026 Jan 26;13(19):e18066. doi: 10.1002/advs.202518066

His‐MMDM: Multi‐Domain and Multi‐Omics Translation of Histopathological Images with Diffusion Models

Zhongxiao Li 1,2,3, Tianqi Su 4, Bin Zhang 1,2, Wenkai Han 1,2, Sibin Zhang 4, Guiyin Sun 4, Yuwei Cong 5, Xin Chen 6, Jiping Qi 5, Yujie Wang 4, Shiguang Zhao 6,, Hongxue Meng 7,, Peng Liang 4,, Xin Gao 1,2,
PMCID: PMC13045368  PMID: 41588694

ABSTRACT

Generative AI (GenAI) has advanced computational pathology through various image translation models. These models synthesize histopathological images from existing ones, facilitating tasks such as color normalization and virtual staining. Current models, while effective, are mostly dedicated to specific source‐target domain pairs and lack scalability for multi‐domain translations. Here, we introduce His‐MMDM, a diffusion model‐based framework enabling multi‐domain and multi‐omics histopathological image translation. His‐MMDM is not only effective in performing existing tasks such as transforming cryosectioned images to FFPE ones and virtual immunohistochemical (IHC) staining but can also facilitate knowledge transfer between different tumor types and between primary and metastatic tumors. Additionally, it performs genomics‐ and/or transcriptomics‐guided editing of histopathological images, illustrating the impact of driver mutations and oncogenic pathway alterations on tissue histopathology and educating pathologists to recognize them. These capabilities position His‐MMDM as a versatile tool in the GenAI toolkit for future pathologists.

Keywords: deep learning, diffusion models, generative artificial intelligence, histopathological image analysis


His‐MMDM is a diffusion model‐based framework for scalable multi‐domain and multi‐omics translation of histopathological images, enabling tasks ranging from virtual staining and cross‐tumor knowledge transfer to omics‐guided image editing.


1. Introduction

Generative artificial intelligence (GenAI) in imagery has marked a groundbreaking advance in the field of artificial intelligence (AI), offering transformative capabilities across a wide range of applications. At the forefront, the state‐of‐the‐art text‐to‐image models such as Stable Diffusion [1], Midjourney [2], and DALL‐E 3 [3], transform text descriptions into vivid visual representations. Conversely, image‐to‐text models, such as GPT‐4V [4], interpret and convert visual data into descriptive text, enabling open‐ended dialogs about the image content with users. There are also image‐to‐image translation models that use existing images as templates to generate new ones, enabling applications such as image editing, resolution enhancement, and sketch‐to‐image synthesis [1, 5]. Collectively, these models are revolutionizing the way we create, interpret, and interact with visual contents.

Such a revolution is also gradually taking place in the field of computational pathology. Early explorations have demonstrated the effectiveness of using generative adversarial networks (GANs) [6] or diffusion models (DMs) [7] to generate synthetic histopathological images [8, 9, 10], and have been mainly used as a data augmentation strategy [11, 12, 13]. Concurrently, visual language foundation models and copilot systems are also being developed in computational pathology to automate medical visual question answering [14, 15, 16, 17, 18]. As for image‐to‐image translation, dedicated models have been developed for various common tasks such as color normalization [19], stain transfer [20], virtual staining [20, 21, 22, 23], and transforming cryo‐sectioned or stimulated Raman images to formalin‐fixed paraffin‐embedded (FFPE) ones [24, 25].

In general, there are two types of image‐to‐image translation tasks: paired and unpaired. The paired image translation task requires matched source and target domain images during training, which are usually not readily available [26]. In contrast, the unpaired image translation task requires the algorithm to learn the mapping between a source and target domain on its own, making it more useful in real‐world applications [26]. It is thus not surprising that almost all the existing image translation models in computational pathology are unpaired. Although the abovementioned image translation models serve important computational pathology applications, their full potential has not yet been realized. First, current applications are mostly limited to GANs, whereas recent developments of DMs show their superior performance on conditional generation compared to GANs [27]. Second, the current GAN‐based histopathological image translation models are limited to translation between a few specific source‐target domain pairs, and their extension to translation between multiple domains is not straightforward [28]. However, simultaneous translation across multiple domains has a wider range of applications in computational pathology. For example, one can edit one real histopathological image into multiple synthesized versions corresponding to altered genomic or transcriptomic profiles, which can potentially model the impact of mutations and altered gene expression on the histopathological appearance of the tissue. Furthermore, multi‐domain translation may also alleviate the inefficiency of pairwise translation, reducing n(n − 1)/2 individual pairwise models to a single multi‐domain model. Third, current GAN‐based models have specifically designed components or loss functions that may hinder their extensibility to new applications.
For example, a virtual immunohistochemical (IHC) staining model [23] incorporated a dedicated network within the discriminator and employed loss functions to enforce consistent staining patterns between cells in real and virtual IHC images. While these design principles are effective for this specific application, they may limit the model's extensibility to translation tasks unrelated to virtual staining.
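The pairwise-scaling argument above is easy to make concrete with a one-line count (a minimal sketch; the function name is ours, not from the paper):

```python
# Number of dedicated unpaired translation models needed to cover all
# unordered source-target pairs among n domains, versus a single
# multi-domain model such as His-MMDM.
def pairwise_model_count(n: int) -> int:
    return n * (n - 1) // 2

# For the 19 TCGA primary tumor types used later in this study:
print(pairwise_model_count(19))  # 171 pairwise models vs. 1 multi-domain model
```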

Due to the above limitations, in this study, we propose a novel versatile framework, Histopathological image Multi‐domain Multi‐omics translation with Diffusion Models (His‐MMDM). His‐MMDM achieves image translation by diffusing the source domain images into noisy images following a Gaussian distribution and then denoising the noisy images back into target domain images. Compared with previous image translation models in histopathology, His‐MMDM stands out with two unique capabilities: (1) it can be efficiently trained to translate images between an unlimited number of categorical domains (through class‐conditional inputs to the model), and (2) it can perform genomics‐ or transcriptomics‐guided editing of histopathological images (through multi‐omics inputs to the model). For categorical domain translation, we demonstrated His‐MMDM's performance through four comprehensive tasks using four independent datasets: the translation of cryosectioned slide images to FFPE ones, virtual IHC staining of hematoxylin and eosin (H&E) images, primary tumor type translation, and tumor organ site translation. Most notably, we showed through comprehensive experiments that His‐MMDM's generated contents can improve the performance of recent histopathological foundation models [29, 30, 31], which were trained mainly on FFPE images, on various discriminative and retrieval tasks. Using the histopathological images in The Cancer Genome Atlas (TCGA) that are equipped with matched genomic and transcriptomic profiles, we demonstrated His‐MMDM's effectiveness in performing genomics‐ and transcriptomics‐guided editing of the images. We then systematically discussed and illustrated through notable examples the effects of high‐frequency somatic mutations and of alterations in gene/pathway expression levels on tissue histopathology from a generative model's perspective.
We then demonstrated the value of these AI‐generated contents (AIGC) by showing that they can educate pathologists to recognize these underlying conditions. Overall, we believe that His‐MMDM showcases the possibility of generic multi‐domain and multi‐omics translation and generation in histopathology, provides a powerful tool for pathologists, and could serve as an integral component of GenAI copilot systems in the future.

2. Results

2.1. The Design of His‐MMDM Model

His‐MMDM is trained with histopathological images based on the principle of diffusion models (DMs) [7] and denoising diffusion implicit bridges (DDIB) [28] (Methods). The DM in DDIB comprises a forward diffusion and a backward denoising process (Figure 1). In the forward diffusion procedure, a histopathological image (X0) is gradually diffused into a noisy image (XT) following a Gaussian distribution using a pre‐determined schedule, and in the backward denoising procedure, the noisy image is gradually denoised back into a histopathological image (Methods). At each iteration of the denoising procedure, a U‐net‐based [7, 32] network is used to predict the noise from the image at the current iteration step, which is utilized in the subsequent denoising step derived from the inversion of the forward procedure (Methods) [33]. Conditional generation is achieved by adding condition information to the U‐net through embedding vectors added to each layer of the U‐net blocks (Figure 1; Table S1, Methods). The DM used in His‐MMDM can handle three types of condition information: (1) categorical labels, such as FFPE/cryosection, (2) genomics, i.e., the mutation status of the corresponding sample, and (3) transcriptomics, i.e., the transcriptomic profile of the sample. After His‐MMDM is trained, it achieves image translation by (1) executing the forward diffusion procedure of the trained DM while conditioning the U‐net on the source domain condition information and then (2) executing the backward denoising procedure while conditioning the U‐net on the target domain condition information (Figure 1, detailed in Methods). In this way, His‐MMDM establishes a general, multi‐domain translator of images. To enable large‐scale experiments, we employed a procedure (detailed in Methods) to empirically determine an optimal acceleration strategy for the DM sampling process, which speeds up diffusion sampling without significantly compromising image quality.
We trained four separate His‐MMDM models for four distinct categorical domain translation tasks, as well as one His‐MMDM model for multi‐omics profile‐guided editing (Table 1, detailed in Table S1). His‐MMDM's categorical domain translation and multi‐omics profile‐guided editing abilities are supported by the extensive experiments detailed in the following sections.
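The two-stage translation procedure described above can be sketched as follows. This is a minimal, illustrative rendition of DDIB-style translation with a deterministic DDIM update; `eps_model`, the schedule handling, and all names here are simplifications of ours, not His-MMDM's actual implementation:

```python
import numpy as np

def ddim_step(x, eps, a_t, a_next):
    # Deterministic DDIM update: estimate the clean image from the
    # predicted noise, then re-noise it to the next schedule level.
    x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps

def translate(x_src, eps_model, alphas, src_cond, tgt_cond):
    """alphas: cumulative noise schedule, decreasing from ~1 (clean) to ~0 (noise)."""
    # 1) Forward: DDIM inversion under the *source* condition encodes the
    #    image into the shared Gaussian latent space.
    x = x_src
    for t in range(len(alphas) - 1):
        x = ddim_step(x, eps_model(x, t, src_cond), alphas[t], alphas[t + 1])
    # 2) Backward: denoising under the *target* condition decodes the
    #    latent into a target-domain image.
    for t in reversed(range(1, len(alphas))):
        x = ddim_step(x, eps_model(x, t, tgt_cond), alphas[t], alphas[t - 1])
    return x
```

With identical source and target conditions the round trip reduces to the identity, which is the cycle-consistency property DDIB inherits from deterministic DDIM sampling.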

FIGURE 1.

Architecture of His‐MMDM. The architecture of Histopathological image Multi‐domain Multi‐omics translation with Diffusion Models (His‐MMDM). Based on the principle of DMs and DDIBs, the forward diffusion procedure gradually diffuses a histopathological image into a noisy image following a Gaussian distribution. The backward procedure takes in the noisy image and gradually denoises it back into a real histopathological image. A U‐net‐based network predicts the noise for a given input and takes in conditional information in the format of categorical labels, genomic profiles, and transcriptomic profiles through its embedding layers. The procedures to perform the categorical translation tasks, i.e., cryosection to FFPE conversion, virtual staining, and tumor type translation, as well as the genomics‐/transcriptomics‐guided editing task, are shown alongside the general diffusion model procedure. After the U‐net‐based network is trained, translation is achieved by executing the forward process with the model conditioned on the source domain ('cryosection', 'H&E', 'Tumor Type A', or 'Genomics/Transcriptomics Profile A'), and then executing the backward process with the model conditioned on the target domain ('FFPE', 'IHC Marker', 'Tumor Type B', or 'Genomics/Transcriptomics Profile B').

TABLE 1.

Major Tasks performed by His‐MMDM.

Task Name | Description | Figures | Task Type | Remarks
Cryosection to FFPE conversion | Translation from cryosectioned histopathological image patches to FFPE | Figure 2 | Categorical domain |
Virtual IHC staining | Translation from H&E‐stained histopathological image patches to IHC‐stained | Figure 2 | Categorical domain | Supports an unrestricted number of IHC markers
Translation across primary tumor types | Translation from one primary tumor type in TCGA to another | Figure 3 | Categorical domain | Novel application
Translation across primary and metastatic organ sites | Translation of lung tumor images between primary (lung) and metastatic (brain, lymph node) organ sites | Figure 4 | Categorical domain | Novel application
Genomics‐guided editing | Editing images based on modifications in genomics profiles | Figure 5 | Multi‐omics profile | Novel application; supports the freeform manipulation of multiple genes/gene combinations
Transcriptomics‐guided editing | Editing images based on modifications in transcriptomics profiles | Figure 6 | Multi‐omics profile | Novel application; supports the freeform quantitative manipulation of multiple genes/gene combinations

2.2. Cryosection to FFPE Conversion and Virtual IHC Staining of Histopathological Images

We first evaluated His‐MMDM on a previously studied domain translation task: converting cryosectioned slide images into formalin‐fixed, paraffin‐embedded (FFPE) images [24].

To convert cryosectioned slide images into FFPE ones with His‐MMDM, we trained it to conditionally generate TCGA slide images of their respective slide type. We collected histopathological image patches from a total of 19 tumor types from five families, including five gastrointestinal (GI) (ESCA, STAD, COAD, READ, and PAAD), two lung (LUAD and LUSC), three kidney (KIRP, KICH, and KIRC), three pan‐gynecological (GYN) (OV, BRCA, and UCEC), and six other tumor types (LIHC, HNSC, SARC, THCA, BLCA, and PRAD) (Methods, statistics shown in Table S2, abbreviations defined in Table S3). We ran the forward diffusion process on a cryosectioned slide image under the condition 'cryosection', followed by the backward denoising process under the condition 'FFPE' (Figure 2A, Methods, Table S1). We translated the test image patches from all 19 tumor types in TCGA using our pre‐trained model and used the reduction of the Frechet Inception Distance (FID) before and after translation (ΔFID = FIDbefore − FIDafter) as the metric for translation performance (Figure 2B; Table S4). Although His‐MMDM is not specifically designed for this task, it achieved performance comparable to the dedicated method AI‐FFPE [24] across most tumor types, even surpassing it in seven out of 19 tumor types (p < 0.05 in four tumor types). In the remaining tumor types where AI‐FFPE has an edge, the performance gap between His‐MMDM and AI‐FFPE is not greater than that observed with other methods. Consistent with the original publication, AI‐FFPE performs best when its application‐specific components, namely self‐regulation (SR) and the spatial attention block (SAB), are enabled, and these components appear to have synergistic effects (Figure 2B). When these application‐specific components are disabled, His‐MMDM outperforms AI‐FFPE in the majority of tumor types (18/19, p<0.05 in 12/19) (Figure 2B; Table S4).
Overall, His‐MMDM achieved better performance than other general GAN‐based models, CycleGAN [26], CUT [34], and StyleGAN2 [35]. Among other diffusion model‐based image translation methods, His‐MMDM performs better than the few‐shot translation model D2C [36]. Interestingly, His‐MMDM scores higher than AI‐FFPE on the Improved Precision metric (Figure S1A) but lower on the Improved Recall metric (Figure S1B), which suggests a lower diversity of the His‐MMDM translated images compared with those of AI‐FFPE (Figure S1C). We found that His‐MMDM can alleviate the tissue dehydration and nuclear deformation that are common in cryosectioned slides and generate a better presentation of contrasting regions and arrangement of the histopathological contents (Figure 2C). In a formal visual examination and rating (on a 1–5 scale) by the pathologists (Methods), each of these aspects received an average rating above the neutral value of 3 (Figure S1D).
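The ΔFID metric used throughout this section rests on the standard Fréchet distance between Gaussians fitted to Inception feature distributions. A minimal sketch, assuming the features have already been extracted (the helper names are ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    # Frechet distance between Gaussians fitted to two feature sets:
    # ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a + cov_b - 2.0 * covmean))

def delta_fid(real, before, after):
    # Positive when translation moves images toward the real distribution.
    return fid(real, before) - fid(real, after)
```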

FIGURE 2.

Cryosection to FFPE conversion and virtual IHC staining of histopathological images. (A) A schematic that illustrates how His‐MMDM achieves cryosection to FFPE conversion. (B) Evaluation of His‐MMDM (in terms of ΔFID scores) in performing cryosection to FFPE conversion and comparison with other dedicated (AI‐FFPE and its ablations, AI‐FFPE w/o SR and AI‐FFPE w/o SAB), non‐dedicated GAN‐based (CycleGAN, CUT, StyleGAN2), and diffusion model‐based (D2C) image translation models. Error bars indicate 95% confidence intervals. Inner p value indicators are for the comparison between His‐MMDM and AI‐FFPE when His‐MMDM has an edge. Outer p value indicators are for the comparison between His‐MMDM and AI‐FFPE w/o SR or AI‐FFPE w/o SAB, whichever performs better. * p<0.05, ** p<0.01, *** p<0.001. (C) Examples of cryosection to FFPE conversion. (D) His‐MMDM converted cryosectioned slide images improve the image retrieval performance (in terms of MV@3) of the pre‐trained models PLIP, CHIEF, CLIP, and Inception V3. FFPE: using FFPE images, Cryo: using original cryosection images, His‐MMDM: using His‐MMDM translated images. Results are reported as mean ± std, with performance gains from His‐MMDM highlighted in a separate, color‐coded column. (E,F) His‐MMDM converted cryosectioned slide images improve the performance of the foundation model CHIEF on IDH1 mutation prediction in glioma (E) and MSI prediction in CRC (F). (G) The 'inverted normalized FID scores' of the virtually stained images for each IHC marker in each glioma and meningioma subtype. (H,I) Examples of virtual staining of H&E histopathological images in glioblastoma (H) and atypical meningioma (I). His‐MMDM can perform virtual staining of the glioma‐specific markers, the meningioma‐specific markers, as well as the common markers used by both primary brain tumor types. The positive regions (indicated by the brown‐colored DAB stain) are also in agreement with the respective markers' expected locations of expression (nucleus, cytoplasm, and extracellular matrix).

We emphasize that translating cryosectioned slide images into FFPE images has broader implications beyond merely improving image quality for pathologists' observation. Although recent histopathological foundation models [14, 30, 31] are pre‐trained on diverse multi‐center cohorts, the majority of their training data consists of FFPE slide images rather than cryosectioned ones. Consequently, directly applying these models to cryosectioned slide images risks significant performance degradation. We systematically assessed whether this degradation could be mitigated by first translating cryosectioned slides into FFPE images using His‐MMDM. Following a similar image‐to‐image retrieval method as in the PLIP paper [14], we evaluated the performance of various image encoders on an image‐to‐image retrieval task (Figure S2A, Methods). With His‐MMDM, the retrieval accuracy increased by 0.30, 0.27, 0.08, and 0.08 for pretrained PLIP, CHIEF [31], CLIP [37], and InceptionV3 [38], respectively (Figure 2D). Additionally, His‐MMDM improved the text‐guided classification accuracy of PLIP by 0.39 (macro‐averaged, Methods, Figure S2A,B). His‐MMDM also enhances CHIEF on tasks such as inferring IDH1 mutation status in gliomas (Figure 2E) and microsatellite instability (MSI) in colorectal (CRC) tumors (Figure 2F).
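Our reading of the MV@3 retrieval metric (a majority vote over the labels of the three nearest gallery images by cosine similarity) can be sketched as follows; the function and its signature are illustrative, not PLIP's actual evaluation code:

```python
import numpy as np
from collections import Counter

def mv_at_k(query_emb, gallery_emb, gallery_labels, query_labels, k=3):
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T
    hits = 0
    for i, row in enumerate(sims):
        topk = np.argsort(row)[::-1][:k]  # indices of the k nearest gallery images
        vote = Counter(gallery_labels[j] for j in topk).most_common(1)[0][0]
        hits += int(vote == query_labels[i])
    return hits / len(query_labels)
```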

To further assess His‐MMDM's general capability in translating images across categorical domains, we applied it to a previously studied multi‐domain image translation task—virtual IHC staining—without any task‐specific modifications to the model architecture [22, 23, 39]. For IHC virtual staining of histopathological images, we collected patch images of 13 common markers used in brain tumor diagnosis spanning five glioma subtypes and five meningioma subtypes from The First Affiliated Hospital of Harbin Medical University (hereafter referred to as the 'HMU‐1st dataset') (Methods, Table S2). Of the 13 IHC markers, five are routinely used in glioma, four in meningioma, and four in both (Table S5). In this dataset, the H&E slides and the IHC slides of each patient are simultaneously available. His‐MMDM is then trained on this dataset to perform conditional generation of both H&E images and each of the 13 IHC markers. It achieves virtual staining by running, in sequence, the forward diffusion process conditioned on the label 'H&E' and the backward denoising process conditioned on the label of the desired IHC marker (Figure S2C).

We evaluated the 'inverted normalized FID scores' (Methods) of the translated images in each subtype of glioma and meningioma (Figure 2G). The IHC markers into which the H&E images are translated tend to show higher performance when the marker is more compatible with the brain tumor type/subtype. For instance, the glioma markers (GFAP, ATRX, IDH‐1, MGMT, and Oligo‐2) tend to perform better in gliomas, as do the meningioma markers (EMA, PR, and SSTR2) in meningiomas (Figure 2G). The positive regions of the virtually stained images (as indicated by the brown‐colored diaminobenzidine (DAB) stain) are in agreement both with the marker's expected location of expression (in the nucleus or the cytoplasm and extracellular matrix) (Figure 2H,I; Figure S2D,E) and with the corresponding glioma subtype (Figure S2F–H; Section S1).

2.3. Knowledge Transfer of Histopathological Images Across Primary Tumor Types

The general categorical‐domain image translation capability of His‐MMDM can enable novel applications. We next experimented with His‐MMDM to 'style‐transfer' [40] histopathological images of one tumor type to another. Using discriminative models, previous studies [41, 42] demonstrated high cross‐classification performance on many tumor type pairs, i.e., the performance of a tumor classifier trained on one tumor type and tested on another, which suggests conserved tumor image features across different types of tumors. We were therefore interested in whether His‐MMDM could 'style‐transfer' an image from one tumor type to another while keeping its content, i.e., the outline of the tissues and cells.

The categorical domains selected for this task are the above‐mentioned 19 primary tumor types of The Cancer Genome Atlas (TCGA) (Methods). We tasked His‐MMDM with translating the histopathology images between each pair of these 19 primary tumor types (Figure 3A). We systematically evaluated the translation performance of His‐MMDM and other baseline methods and discussed the heterogeneous performance across different translation pairs (Figure S3A,B; Section S2). The translation process generally preserved the cellular arrangement and structure of the source image while making the contents compatible with the target tumor type. For example, colorectal adenocarcinoma (COAD) usually features glandular structures formed by the epithelial cells (Figure 3B). When the colorectal images are translated to other adenocarcinomas, those structures are kept intact (Figure 3B). However, when they are translated to non‐adenocarcinomas, those structures are attenuated and replaced with target type‐specific contents (Figure 3B). As before, pathologists conducted a formal visual examination and assigned ratings on a 1–5 scale (Methods). On average, they gave each aspect a rating higher than the neutral value of 3 (Figure S3C).

FIGURE 3.

Translation of histopathological images across primary tumor types. (A) Schematic that illustrates how His‐MMDM translates histopathological images across primary tumor types. (B) Image examples showing the effect of translation across tumor types. The COAD patches were translated across three adenocarcinomas (READ, STAD, and PAAD) and four non‐adenocarcinomas (LUSC, LIHC, KICH, and SARC). The typical glandular structures of adenocarcinomas were retained in the former group but attenuated in the latter group. (C) The classification performance (F1 score) of binary tumor classifiers trained on other tumor types (as targets of translation) using either translated or untranslated images, without additional fine‐tuning ('ratio for fine‐tuning = 0') or with additional fine‐tuning ('ratio for fine‐tuning = 0.3, 0.6, 0.9'), in each tumor type (as the source of translation and the goal of classification). Data for each source of translation are aggregated for a particular tumor family through macro‐averaging. Error bars indicate 95% confidence intervals.

Translation of images from one tumor type to another enables new applications. For example, suppose we have a discriminative model for tumor type A and we would like to apply it to another tumor type B where such a model is unavailable. Instead of directly applying it to the images from B, which is the cross‐classification strategy as in Noorbakhsh et al. [41], we could first adapt the images into type A and then use the model to classify the adapted images, or further fine‐tune the model on them. In this way, we could use His‐MMDM to achieve knowledge transfer across tumor types. We compared the performance (in terms of F1 score) of binary tumor classification models trained on one tumor type (the target of translation) and cross‐classifying images from another tumor type (the source of translation/the goal of classification) using either the original images or the translated images. The classification performance in four out of five tumor families, corresponding to 12 out of 19 tumor types, was on average higher using the translated images (Figure 3C; Figure S3D,E, 'ratio for fine‐tuning = 0'). Additionally, to improve the classification models' performance, we fine‐tuned the classification models using 30%, 60%, and 90% of the translated images (Figure 3C; Figure S3D,E, 'ratio for fine‐tuning = 0.3, 0.6, 0.9'). The number of tumor types with higher performance using the translated images increases to 13 at 60% and 15 at 90%. Inference of tumor of unknown primary (TUP) has recently gained considerable attention due to the strong performance of machine learning models on such tasks [43, 44]. We demonstrated that the translation between different primary tumor types enabled by His‐MMDM is useful for the inference of tumor origin from a generative model's perspective, and we defer the detailed discussion to the Supplementary Materials (Section S1; Figure S4A–D).
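The fine-tuning protocol above (holding out 30%, 60%, or 90% of the translated images for adaptation, evaluating on the rest) can be sketched with a simple reproducible split helper; this is our illustrative reading, not the paper's actual code:

```python
import random

def split_for_finetuning(items, ratio, seed=0):
    # Shuffle reproducibly, then carve off `ratio` of the translated
    # images for fine-tuning the type-A classifier; the remainder is
    # used to evaluate cross-type classification.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_ft = int(len(items) * ratio)
    return items[:n_ft], items[n_ft:]

# e.g., the 'ratio for fine-tuning = 0.3' setting:
finetune_set, eval_set = split_for_finetuning(range(100), ratio=0.3)
```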

2.4. Knowledge Transfer of Histopathological Images Between Primary and Metastatic Organ Sites

Tumor metastasis is one of the leading factors that reduce a patient's quality of life and survival. Therefore, the prediction of tumor metastasis has important value in clinical diagnosis. From a histopathological perspective, tumors displaying higher levels of malignancy tend to have a higher tendency to metastasize [45]. Several previous studies have already attempted to build discriminative models that infer the metastatic potential of tumors from such images [46, 47, 48].

By training His‐MMDM on tumor histopathological images from different primary/metastatic organ sites, it can acquire a 'style‐transfer' capability analogous to that described above. However, we sought to take this a step further by addressing a key question about knowledge transfer: can His‐MMDM leverage the knowledge gained from generating images of metastatic organ sites to enable prognostic and staging assessments for primary site images? With this, we explored the model's potential in predicting the metastatic potential and prognosis of patients, but from the perspective of a generative model.

For this purpose, we trained His‐MMDM on 475 lung tumor histopathological WSIs, including 315 from primary lung tumors, 76 from lymph node metastases, and 84 from brain metastases, from the Harbin Medical University Cancer Hospital (the 'HMU‐C dataset', Methods, Table S2). During training, His‐MMDM was conditioned on the site (lung, lymph node, and brain) of the tumor tissue, and during inference, His‐MMDM was asked to translate the images from one site to another (Figure S5A). As before, we computed ΔFID between each pair of the translations. The ΔFID scores are much higher when translating from a non‐solid organ site (lymph node) to a solid organ site (lung and brain), whereas translation within solid organ sites (lung and brain) tends to result in lower ΔFIDs (Figure 4A). Comparison of the original and translated histopathological images showed that His‐MMDM preserved the cellular arrangements and characteristics of the tumor subtypes during translation, while modifying the tumor growth environment to match the designated organ sites (Figure 4B). As before, pathologists conducted a formal visual examination and on average gave these aspects ratings above the neutral value of 3 (Figure S5B).

FIGURE 4.

Translation of histopathological images between primary and metastatic organ sites. (A) The improvement of FID score before and after translation between each pair of organ sites (lung, lymph node, and brain). Bar plots show the averaged ΔFID for a particular translation source site (rows) or target site (columns). (B) Histopathological image examples of three major lung tumor subtypes (adenocarcinoma, squamous cell carcinoma, and small cell carcinoma) illustrating the effect of translating images from the primary site (lung) to the common metastatic sites of lung tumor (brain and lymph node). (C) Schematic and visualization of running the outlier detection model (Isolation Forest) on the translated images from the lung to the lymph nodes. The images are visualized according to their structure in the tSNE space and the ‘inlier scores’ are superimposed. The images with low ‘inlier scores’ (out of distribution) tend to be eccentric‐looking and are located on the edge. (D) The relationship between the brain inlier score and the stagings of cases in the TCGA lung tumor cohort. The stagings include the M (distant metastasis) and N (extent of regional lymph node spread) stagings in the TNM staging system. The dashed lines inside the violin plots indicate 25%, 50%, and 75% quartiles. (E,F) The relationship between the lymph node (E) and brain (F) inlier score to TCGA patients’ overall survival. The inlier scores of the images are aggregated per patient through averaging.

We systematically visualized the images translated from the primary organ site (lung) to the metastatic organ sites, including the lymph node (Figure 4C) and brain (Figure S5C), according to their Inception V3 features in the 2D tSNE space. When assessed by pathologists, images located in the interior of the distribution were more likely to be properly translated, characterized by properly arranged tumor cell nuclei and a more realistic background tumor environment (Figure 4C; Figure S5C,D). Conversely, eccentric‐looking translated images (characterized by incorrect cell shapes, infiltration, and unrealistic backgrounds) tended to be located at the edge of the distribution, indicating less successful translation (Figure 4C). We speculate that the low quality of such translated images reflects an inherent difficulty in translating these primary site images to the target organ site, possibly due to their incompatibility with the target organ. To quantify this phenomenon, we trained an outlier detection model, Isolation Forest [49], on the Inception V3 features of the translated images and produced an 'inlier score' (in the range [−1, 1], −1 for the most out‐of‐distribution and 1 for the most in‐distribution) for each lung image to measure its compatibility with the metastatic organ site (Figure 4C; Figure S5C). We hypothesized that tumors whose images are easy to translate to a particular metastatic site would have a higher 'inlier score' (compatibility) with that organ site. To verify this, we translated primary lung tumor histopathological images (LUAD and LUSC) from TCGA to the metastatic organ sites (brain and lymph node) using the trained His‐MMDM model and computed their inlier scores using the outlier detection model.
Interestingly, we found that the inlier scores indeed displayed associations with the staging of the patients. Specifically, a patient's high brain inlier score is associated with M1 staging and an overall staging of III-IV, and a patient's high lymph node inlier score is associated with N1-N3 staging and an overall staging of II-IV (Figure 4D). Moreover, a high lymph node inlier score is associated with poorer survival of the patients (Figure 4E, p = 5.6e-04). This trend also holds for the brain inlier score (Figure 4F, p = 0.039). We additionally compared the lymph node and brain inlier scores with established clinical indicators of lung tumor severity, including Stage Group (Stages I-IV) and TNM staging, and found that the inlier scores' associations with a patient's survival are as significant as those of these clinical indicators (Figure S5E,F).
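The per-patient aggregation and staging-association analysis above can be sketched as follows; the data, patient identifiers, and column names are synthetic and illustrative, and a rank-based two-group test is used here as one plausible choice for comparing inlier scores between staging groups.

```python
# Sketch: average per-image inlier scores per patient, then compare
# the aggregated scores between two hypothetical N-staging groups.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "patient": rng.integers(0, 100, size=1000),    # image -> patient mapping
    "inlier_score": rng.uniform(-1, 1, size=1000), # per-image inlier score
})
# Aggregate the inlier scores per patient through averaging.
per_patient = df.groupby("patient")["inlier_score"].mean()

# Hypothetical N staging per patient (N0 vs N1-N3), randomly assigned here.
n_stage = pd.Series(rng.choice(["N0", "N1-N3"], size=per_patient.size),
                    index=per_patient.index)

stat, p = mannwhitneyu(per_patient[n_stage == "N0"],
                       per_patient[n_stage == "N1-N3"],
                       alternative="two-sided")
```

With the synthetic data above, p is of course uninformative; the point is only the aggregation-then-test structure.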

2.5. Genomics‐Guided Editing of Histopathological Images

Somatic mutations in oncogenes and tumor suppressors are the driver factors in tumorigenesis. Previous studies have reported strong associations of driver mutations with histopathological image features, demonstrated through the predictive power of discriminative models [50, 51, 52, 53, 54, 55, 56]. More recently, pan-cancer studies have shown the identifiability of common driver mutations (e.g., TP53 and PTEN) shared across multiple tumor types using deep transfer learning [57]. In addition to mutations, transcriptomic dysregulation in tumors has also been linked to alterations in histopathological image features, as shown by previous studies that have attempted to predict transcriptomic expression levels from histopathological images with varying degrees of success [57, 58].

We explored the capability of His-MMDM to decipher these associations, but from a generative model's perspective. To this end, we trained another His-MMDM model on the TCGA histopathological images, conditioned simultaneously on the genomic mutation profiles and the transcriptomic profiles of the corresponding samples (Methods). In total, the genomic embeddings of His-MMDM take into account the 334 genes from the ten common oncological signaling pathways [59] and 188 other genes with a high pan-cancer mutation rate or discriminative performance [57] in the TCGA cohort (Table S6). Similarly, the transcriptomic embeddings include 4,193 genes from the collection of 50 hallmark pathways of MSigDB (v2023.2) [60, 61], as well as an additional 268 genes that either displayed the greatest variability between tumor and normal samples or have previously reported discriminative performance [58] (Table S6).
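One plausible way to assemble the multi-omics conditioning inputs described above is sketched below: a binary non-synonymous mutation vector over the 522 genomic genes and a standardized expression vector over the transcriptomic genes. The gene lists and cohort statistics are placeholders; the actual embedding layers are part of the His-MMDM network and are described in the Methods.

```python
# Sketch of multi-omics conditioning vectors (placeholder gene names).
import numpy as np

genomic_genes = [f"G{i}" for i in range(522)]         # 334 pathway + 188 other genes
transcriptomic_genes = [f"T{i}" for i in range(4461)] # hallmark + additional genes

def genomic_condition(mutated):
    """1 if a gene carries a non-synonymous mutation, else 0."""
    mutated = set(mutated)
    return np.array([g in mutated for g in genomic_genes], dtype=np.float32)

def transcriptomic_condition(expr, mean, std):
    """Standardize expression against cohort statistics before embedding."""
    return ((expr - mean) / (std + 1e-8)).astype(np.float32)

g = genomic_condition(["G0", "G10"])
t = transcriptomic_condition(np.ones(4461), np.zeros(4461), np.ones(4461))
```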

We first performed in silico genomic modification experiments using the trained His-MMDM model. For a given histopathological image, the forward diffusion process is executed conditioned on its original genomic mutation profile, e.g., gene X is wild-type (WT) and gene Y is mutated, and the backward denoising process is then executed conditioned on a modified genomic profile, e.g., gene X is mutated and gene Y is WT (Figure 5A). In this way, we are essentially asking the model to edit the given histopathological image into the version it would have if the mutation statuses of gene X and gene Y were as designated.
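The forward-then-backward editing loop can be sketched as a deterministic DDIM-style inversion under the source condition followed by resampling under the target condition. The noise predictor below is a toy linear stand-in for the trained conditional denoising network, and the noise schedule is a generic linear one; both are assumptions for illustration only.

```python
# Toy sketch of condition-swapped editing: diffuse forward under the
# original condition, denoise backward under the modified condition.
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)  # cumulative alpha-bar schedule

def eps_model(x, t, cond):
    # Stand-in for the conditional noise predictor epsilon(x_t, t, cond).
    return 0.1 * x + 0.01 * cond

def ddim_step(x, t_from, t_to, cond):
    # Deterministic DDIM update between two timesteps (either direction).
    e = eps_model(x, t_from, cond)
    x0 = (x - np.sqrt(1 - abar[t_from]) * e) / np.sqrt(abar[t_from])
    return np.sqrt(abar[t_to]) * x0 + np.sqrt(1 - abar[t_to]) * e

def edit(image, cond_src, cond_trg):
    x = image
    for t in range(T - 1):            # forward diffusion under source condition
        x = ddim_step(x, t, t + 1, cond_src)
    for t in range(T - 1, 0, -1):     # backward denoising under target condition
        x = ddim_step(x, t, t - 1, cond_trg)
    return x

img = np.zeros((128, 128, 3))
edited = edit(img, cond_src=0.0, cond_trg=1.0)
```

Swapping `cond_src` and `cond_trg` corresponds to, e.g., turning a "gene X WT" image into its "gene X mutated" counterpart.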

FIGURE 5.

The genomics-guided editing of histopathological images. (A) Schematic that illustrates how His-MMDM achieves histopathological image editing guided by genomic and transcriptomic profiles. (B) The effect of genetic mutations on histopathological images by each tumor type, measured by the cosine distance between the WT-version image and the mutated image. The top 30 genes with the highest pan-cancer mutation rates are selected. The cosine distances, aggregated for each gene-tumor type pair through averaging, are normalized either by each tumor type (circle color) or across tumors (circle size). Bar plots aggregate per gene (rows) or per tumor type (columns). The mutations of genes have a greater effect on histopathology in the gene-tumor type pairs with previously reported associations by Fu et al. (Wilcoxon rank-sum test; all gene-tumor pairs: p = 6.7e-05; only high-frequency (>10%) mutations: p = 0.03). (C) The mutation of genes APC, PTEN, TP53, and CDKN2A increases the number of malignant cells detected by Hover-Net in COAD, UCEC, OV, and ESCA, respectively. P values are from the Wilcoxon signed-rank test. Each dot represents a particular image without (blue) or with (red) a particular mutation. (D) Histopathological image examples illustrating the effect of mutations of APC, PTEN, TP53, and CDKN2A in COAD, UCEC, OV, and ESCA, respectively. Structural distance maps are shown in the last row to highlight the differences between the two versions. (E,F) The computation of the change in cosine distance of a query image (w/o mut) to its nearest neighbors in the database when the image is edited into a version with mutation (w/ mut) is illustrated in (E). 18 of the top 25 tumor type-mutation pairs where His-MMDM's edits resulted in the most notable feature changes displayed a statistically significant reduction in the distance to their nearest neighbors (F). q values are from the Wilcoxon signed-rank test adjusted by the Benjamini-Hochberg procedure.
(G) Performance of pathologists' recognition of the top mutations in four tumors (COAD, UCEC, OV, and ESCA) before and after observing His-MMDM's generated images as a tutorial. In the left four panels, the statistical significance of improvement is calculated based on simulating a null distribution of the percentage of correct answers when the pathologists' decisions are random. In the rightmost panel, the results from different tumor type-mutation pairs are pooled together and the p value from the Wilcoxon signed-rank test is shown. (H) Summary of the effect of five tumor suppressors and three epigenetic factors on the development and metastasis of prostate tumors reported by Cai et al. (I) The effect of mutations in the tumor suppressors and epigenetic factors on the histopathological images. Boxplots represent median ± IQR; tails: min/max excluding outliers (±1.5 × IQR). (J) An example illustrating His-MMDM's ability to infer images at different tumor developmental stages (mutation in one tumor suppressor gene, mutation in all tumor suppressor genes, and mutation in all tumor suppressor and epigenetic factor genes) from a normal image. Cosine distances are computed w.r.t. the initial normal image (w/o mutation). (K) An example illustrating His-MMDM's ability to infer images with more or fewer mutations than the observed image (mutation in TP53 and KMT2D). Such inference can be made along different paths, i.e., the solid line and the dashed line, representing mutations accumulating in different orders. Cosine distances are computed w.r.t. the initial inferred image (w/o mutation).

We first investigated the effect of single mutations among the top 30 genes with the highest mutation rates across the aforementioned 19 tumor types on the appearance of histopathological images. For each image, we used His-MMDM to produce one version in which all genes are WT. We then produced a series of other versions in which each one of the genes is mutated. The mutation of different genes resulted in modifications of different aspects of the histopathological image. For example, in COAD, the mutation of APC had the most significant effect on the nuclear shape of the cells, TP53 on the cellular density, and SMAD4 on the background and microenvironment (Figure S6A). To measure the visual effect of a mutation, we computed the cosine distance between the WT image and each 'mutated image'. Specifically, for each gene in a specific tumor type, the cosine distances of the Inception V3 features of the two versions were averaged and normalized both within- and cross-tumor (Figure 5B). Some tumor types, such as the kidney tumors (KICH, KIRC, and KIRP), showed the greatest overall effect of single mutations on their visual appearance, while other tumor types showed a greater effect within their tumor categories, such as COAD and READ among the GI tumors and OV among the GYN tumors (Figure 5B). Mutations in the HIPPO, NOTCH, and RTK-RAS pathways tend to have the greatest pan-cancer effect on histopathology. Interestingly, the influential genes in NOTCH and RTK-RAS are NCOR2 and ERBB4, not NOTCH1 and KRAS themselves (Figure 5B). The mutations of genes have a greater effect on histopathology in tumor types with previously reported associations based on the discriminative performance reported by Fu et al. [57] (all gene-tumor pairs: p = 6.7e-05; only high-frequency (>10%) mutations: p = 0.03).
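The visual-effect measure above can be sketched with numpy: cosine distances between WT- and mutated-version feature vectors, averaged per gene-tumor pair and then normalized. Min-max scaling is used here as one plausible normalization; the feature matrix is a random stand-in for the per-pair averaged distances.

```python
# Sketch: cosine distance between feature vectors plus within-tumor
# and cross-tumor normalization of per-(gene, tumor type) averages.
import numpy as np

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(2)
n_genes, n_tumors = 30, 19
# Stand-in for the mean WT-vs-mutated cosine distance per (gene, tumor) pair.
D = rng.uniform(0.0, 0.5, size=(n_genes, n_tumors))

# Normalize within each tumor type (columns; cf. circle color in Fig. 5B) ...
within = (D - D.min(axis=0)) / (D.max(axis=0) - D.min(axis=0))
# ... and across all tumor types (cf. circle size in Fig. 5B).
cross = (D - D.min()) / (D.max() - D.min())
```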
We repeated the experiments using the structural distance and the visual features computed by the pre-trained histopathological image model CHIEF instead of Inception V3 (Methods), and a very high correlation was observed among these measures (Figure S6B-E). We applied the cell detection algorithm Hover-Net [62] to the generated images of the genes with high mutation rates across multiple tumors. Most notably, the generated images corresponding to the mutated versions of APC, PTEN, TP53, and CDKN2A have higher numbers of malignant cells compared to their WT counterparts in COAD, UCEC, OV, and ESCA, respectively (Figure 5C,D). Hereafter, a structural distance map (Methods) is shown for each edited image to highlight the edited regions (Figure 5D).

We further validated the results of genomics-guided image editing by searching both the original and edited images in a curated histopathological image database with accompanying genomics profiling data (Methods). The database is curated to include real histopathological images that represent various mutations across different tumor types. We compared the original images (w/o mutation) and edited images (w/ mutation), calculating the cosine distance (Inception V3 features) between each image and its nearest neighbor in the database (Figure 5E). For the top 25 tumor type and mutation pairs where His-MMDM's edits resulted in the most notable feature changes (Figure 5B), 18 pairs showed a statistically significant reduction in the distance to their nearest neighbors (Figure 5F). Additionally, we used these paired images (w/ and w/o mutation) as educational material for pathologists, assessing their ability to recognize such mutations after reviewing these examples (Methods, Supplementary Data). We observed an improvement in the pathologists' accuracy following exposure to these examples for seven of the eight mutations in the four tumor types COAD, UCEC, OV, and ESCA (Figure 5G; Figure S6F,G), and this improvement is greater than if unpaired real images were used (Figure S6H). We further trained a binary classification model on 1K-4K His-MMDM-generated images with or without the above mutations and achieved an F1 score higher than 0.7 for three of the eight (Figure S6I). These findings suggest that the image content generated by His-MMDM aligns well with reality.
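The nearest-neighbor validation above can be sketched as follows; the feature matrices are random placeholders, the distance reduction is tested with a Wilcoxon signed-rank test, and a small Benjamini-Hochberg helper is included for the multiple-testing correction across pairs.

```python
# Sketch: nearest-neighbor cosine distances before/after editing,
# a paired signed-rank test, and BH adjustment of p values.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.stats import wilcoxon

rng = np.random.default_rng(3)
database = rng.normal(size=(1000, 2048))  # real images carrying the mutation
original = rng.normal(size=(50, 2048))    # w/o-mutation query features
edited = original + 0.1 * rng.normal(size=(50, 2048))  # w/-mutation edits

nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(database)
d_orig = nn.kneighbors(original)[0].ravel()
d_edit = nn.kneighbors(edited)[0].ravel()

# One-sided test for a reduction in distance after editing.
stat, p = wilcoxon(d_orig, d_edit, alternative="greater")

def benjamini_hochberg(pvals):
    """BH-adjusted q values for a vector of p values."""
    p_arr = np.asarray(pvals, dtype=float)
    order = np.argsort(p_arr)
    ranked = p_arr[order] * len(p_arr) / (np.arange(len(p_arr)) + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty_like(q)
    out[order] = np.clip(q, 0, 1)
    return out
```

With real features, applying `benjamini_hochberg` across the 25 tumor type and mutation pairs would yield the q values reported in Figure 5F.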

Apart from single mutations, we also investigated the effect of accumulating mutations on the histopathological images generated by His-MMDM. For a total of 331 genes in nine oncological pathways (one pathway was excluded due to its low number of genes), we sorted the genes in each pathway according to their mutation rate (from high to low, Table S6) and edited the images by sequentially mutating 25%, 50%, 75%, and 100% of the genes in each pathway. We then compared the cosine similarity of the Inception V3 features of each edited version to the WT version of the image (Figure S7A). Some pathways, such as the Cell Cycle and WNT signaling pathways in COAD, displayed continuous changes as mutations accumulated (Figure S7A,B). More frequently, however, a saturation effect was observed after the initial 25% of the mutations had already produced the most significant change (Figure S7A,B).
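The accumulating-mutation experiment can be sketched as below; `edit_image` is a hypothetical placeholder for His-MMDM's genomics-guided editing call, and the pathway genes and mutation rates are illustrative.

```python
# Sketch: mutate a pathway's genes cumulatively in 25% increments,
# ordered from the highest to the lowest mutation rate.
import numpy as np

def edit_image(image, mutated_genes):
    # Placeholder for the His-MMDM genomics-guided edit.
    return image + 0.01 * len(mutated_genes)

def accumulate(image, pathway_genes, mutation_rates):
    order = np.argsort(mutation_rates)[::-1]        # highest rate first
    sorted_genes = [pathway_genes[i] for i in order]
    versions = {}
    for frac in (0.25, 0.5, 0.75, 1.0):
        k = max(1, int(round(frac * len(sorted_genes))))
        versions[frac] = edit_image(image, sorted_genes[:k])
    return versions

v = accumulate(np.zeros((4, 4)), ["APC", "TP53", "KRAS", "SMAD4"],
               np.array([0.6, 0.5, 0.3, 0.2]))
```

Comparing each version in `v` against the WT image (as in Figure S7A) would reveal either the continuous change or the saturation effect described above.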

Using a CRISPR/Cas9 mouse model, a recent study [63] confirmed that loss-of-function mutations in five tumor suppressor genes (PTEN, TRP53 (the mouse homolog of human TP53), RB1, STK11, and RNaseL) are sufficient to induce prostate tumor progression, and that three additional epigenetic factors (KMT2C, KMT2D, and ZBTB16) initiate tumor metastasis (Figure 5H). Among them, only RNaseL and ZBTB16 are not included in the His-MMDM genomic embeddings. We therefore attempted to edit the PRAD images in the TCGA cohort according to the mutations of the remaining seven genes. For each image, we used His-MMDM to produce one version free of mutations in all tumor suppressors and epigenetic factors (WT), and several mutant versions with (1) individual mutations in the tumor suppressors and epigenetic factors, (2) combined mutations in all tumor suppressors, (3) combined mutations in all epigenetic factors, and (4) combined mutations in all tumor suppressors and epigenetic factors. We then computed the cosine distances of the Inception V3 features of the mutated images versus the WT images (Figure 5I). Among the tumor suppressors, STK11 induced the greatest changes in tissue histopathology (Figure 5I). The tumor suppressors generally had a higher impact than the epigenetic factors, consistent with the previous study's finding that it is these factors that mainly contributed to the in situ tumorigenesis of prostate tumors. His-MMDM also enabled in silico interpolation of histopathological images corresponding to different mutation statuses of the tumor. For images from normal tissue, His-MMDM can infer their appearance when one or more of the tumor suppressors/epigenetic factors are mutated (Figure 5J). For images from tumor tissue with existing somatic mutations, His-MMDM can trace back to their WT appearance or infer later stages of the tumor with more accumulated mutations.
For instance, given a prostate tumor image with existing mutations in TP53 and KMT2D, His-MMDM can edit it toward more or fewer mutations in the tumor suppressors/epigenetic factors and can freely modify it into versions in which the mutations of tumor suppressors and epigenetic factors are applied either together or alone (Figure 5K).

2.6. Transcriptomics‐Guided Editing of Histopathological Images

We subsequently investigated the effect of modifying transcriptomic profiles on the generated histopathological images. As transcriptomic profiles are shaped by the regulation of pathways, we first investigated the effect of the top 10 MSigDB transcriptomic pathways that are most dysregulated in TCGA tumor samples (Methods). To exaggerate the effect of transcriptomics on the generated images, for each histopathological image in the TCGA cohort, we used His-MMDM to produce one version corresponding to when a pathway was knocked up and another when it was knocked down. Similarly, we computed the distance between the two versions of each image using Inception V3 features (Figure 6A), CHIEF features (Figure S8A), and the structural distance (Figure S8B) and visualized them accordingly. Most notably, the manipulation of the 'Mitotic Spindle' pathway resulted in the greatest change in histopathology in almost all tumor types (Figure 6A; Figure S8A,B). This result is not surprising, as tumor proliferation through mitosis usually correlates strongly with histopathological grading. Targets of the transcription factor Myc ('Myc Targets V1') are influential in the tumorigenesis of multiple GI tumors, and accordingly, they display a high impact on the histopathology of COAD, READ, and STAD (Figure 6A; Figure S8A-C). The manipulation of transcriptomic pathways also had a greater histopathological effect in tumor types whose genes had previously reported associations with histopathology based on the discriminative performance reported by Fu et al. [57] (p = 5.2e-03). Similar to genomics-guided editing, 20 of the top 25 tumor type and pathway pairs displayed a statistically significant reduction in the distance to their nearest neighbors in the database curated for transcriptomic pathway up-regulation (Figure S8D, Methods).
Using these images as educational materials (Supplementary Data), we also observed a greater improvement in the pathologists' accuracy following exposure to these examples (Figure S8E) than with unpaired real images (Figure S8F). A binary classification model trained on 1K-4K His-MMDM-generated images with or without the above pathway alterations (Methods) achieved an F1 score higher than 0.65 for three of the eight (Figure S9A). Using Hover-Net, we observed a significant reduction in the number of malignant cells when either the 'G2M checkpoint' (in 9/19 tumor types) or the 'DNA repair' (in 7/19 tumor types) pathway was knocked up compared to when it was knocked down (Figure 6B,C; Figure S9B,C), suggesting that His-MMDM can indeed capture the effect of the activation of such pathways in limiting tumor growth potential.

FIGURE 6.

Transcriptomics-guided editing of histopathological images. (A) The effect of transcriptomic pathway manipulations on the histopathological images by each tumor type, measured by the cosine distances between a knocked-down version and a knocked-up version. The top 10 pathways with the highest t-statistics between TCGA tumor and normal samples are selected for the experiment. The cosine distances, aggregated for each pathway-tumor type pair through averaging, are normalized either by each tumor type (circle color) or across tumors (circle size). Bar plots aggregate per pathway (rows) or per tumor type (columns). The manipulation of pathways has a greater effect on histopathology if the pathways contain genes with previously reported histopathological associations by Fu et al. (Wilcoxon rank-sum test; p = 5.2e-03). (B) The knock-up of the 'G2M checkpoint' pathway reduces the number of malignant cells detected by Hover-Net in OV and UCEC, respectively. The knock-up of the 'DNA repair' pathway reduces the number of malignant cells detected by Hover-Net in UCEC and COAD, respectively. Each dot represents a particular image when a particular pathway is knocked down (blue) or knocked up (red). (C) Histopathological image examples illustrating the effect of knock-up and knock-down of the 'G2M checkpoint' and 'DNA repair' pathways in OV, UCEC, and COAD. Structural distance maps are shown to highlight the differences. (D) The effect of transcriptomic manipulations of high-frequency SCNV genes reported by Aaltonen et al., measured by the cosine distances between a knocked-down version and a knocked-up version. Bar plots aggregate per gene (rows) or per tumor type (columns). The manipulation of genes has a greater effect on histopathology if the genes have previously reported histopathological associations by Fu et al. (Wilcoxon rank-sum test; all gene-tumor pairs: p = 0.04; only genes with high amplification rates (>10%): p = 0.02).
(E) Editing of histopathological images based on transcriptomic signatures of immune profiles results in a significantly higher number of necrotic cells detected by Hover-Net. An arrow is drawn from one immune subtype to another if the increase in such cells is statistically significant (Wilcoxon signed-rank test, p < 0.001). (F) Histopathological image examples illustrating the effect of editing images into different immune subtypes from their original ones.

Aside from pathway-level alterations, transcriptional changes in individual genes can also be critical in tumorigenesis. Most notably, the amplification of genomic segments can directly alter the transcriptomic expression levels of genes through somatic copy number variations (SCNVs). We investigated the five genes with high pan-cancer SCNV rates reported by Aaltonen et al. [64] that are also available in the transcriptomic embeddings of His-MMDM. These include the prominent oncogenic transcription factor MYC, a core component of the PI3K signaling pathway, PIK3CA, two genes from the cyclin family, CCNE1 and CCND1, and one gene from the Bcl-2 family, MCL1. We used His-MMDM to produce one image with the average expression level of the genes in the normal samples and compared it against another version in which each of their expression levels was elevated (3 × s.d. above the average). Most notably, we observed a greater effect of the cyclin genes, CCNE1 and CCND1, on tumor histopathology, suggesting that SCNVs that promote the progression of the cell cycle may have a universal effect across tumor types (Figure 6D; Figure S9D). In line with the results of Myc target genes in the pathway analysis, the gene MYC itself also induced strong histopathological alterations in multiple GI tumor types (Figure 6D; Figure S9D). As before, for these SCNV genes, there is also an association between their greater histopathological effect and the previously reported discriminative performance on their transcriptomic levels (p = 0.04 overall and p = 0.02 if only tumor types where the genes have high amplification rates are considered).
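The SCNV expression manipulation above amounts to setting each gene's conditioning value to the normal-sample mean and, alternatively, to the mean plus three standard deviations; a minimal sketch with synthetic cohort statistics:

```python
# Sketch: derive baseline vs elevated (mean + 3 s.d.) expression values
# for the five SCNV genes from (synthetic) normal-sample statistics.
import numpy as np

rng = np.random.default_rng(4)
# Rows: normal samples; columns: MYC, PIK3CA, CCNE1, CCND1, MCL1 (order illustrative).
normal_expr = rng.normal(loc=5.0, scale=1.0, size=(200, 5))
mean, sd = normal_expr.mean(axis=0), normal_expr.std(axis=0)

baseline = mean.copy()        # average expression in normal samples
elevated = mean + 3.0 * sd    # 3 x s.d. above the average
```

The two vectors would then condition His-MMDM to produce the two image versions whose cosine distance quantifies the gene's histopathological effect.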

The immune microenvironment of tumors plays an essential role in tumor development, subtyping, and prognosis [65]. Thorsson et al. systematically characterized the immunogenomic signatures in TCGA and classified each tumor sample into six immune subtypes (C1-C6) based on the clustering of transcriptomic profiles of key immune pathways [65]. As tumor samples in these immune subtypes are reported to display distinct characteristics, such as tumor proliferation and the types of infiltrating immune cells, we investigated whether His-MMDM can reproduce such observations. We selected the three most abundant immune subtypes (C1, C2, and C3) of the six studied by Thorsson et al. and dropped the three kidney tumor types due to their low numbers of C1 and C2 samples (see Figure 1D of Thorsson et al. [65]). We then used His-MMDM to edit the images from one immune subtype into another (e.g., editing C1 images into C2 or C3) by manipulating the five underlying pathways that define the immune subtypes (Methods, Table S6). Detecting cell types with Hover-Net on the images before and after His-MMDM's edits revealed an increase (with a few exceptions) in the number of necrotic cells when translating images from C1 to C3 (in five tumor types) and from C2 to C3 (in five tumor types) (Figure 6E,F; Figure S9E), which is consistent with C3's lower tumor cell proliferation and high inflammatory response. As a negative control, we used randomly selected gene sets (of the same sizes) in place of the underlying immune pathways (Table S6) and did not observe statistically significant increments in most cases (Figure S9F). Finally, we demonstrate through examples how His-MMDM can be guided by the combination of mutation and transcriptomic profiles to edit histopathological images between BRAF-like and RAS-like thyroid tumors and between the consensus molecular subtypes (CMS) of colon tumors [66].
We delegate these discussions to the Supplementary Materials (Section S3; Figure S9G,H; Figure S10A,B).

3. Discussion

In this study, we have developed His-MMDM, a histopathological image translation model that translates histopathological images between multiple categorical domains or edits them guided by genomic or transcriptomic profiles (Table 1). Substantial previous evidence suggests the feasibility of building such general image translation models across histopathological domains. First, several previous studies have pointed out the existence of conserved histopathological image features across different tumor types [41, 42]. His-MMDM also demonstrates this by explicitly translating images and transferring knowledge between them (Figures 3 and 4). Second, pan-cancer studies have demonstrated through discriminative models the high predictive performance of the tumor origin from histopathological images [43], of multiple genetic mutations [57], and of multiple genes' expression levels [58]. We demonstrated, for the first time, that such relationships can be manifested through a generative model, by explicitly generating histopathological images corresponding to in silico altered tumor types or genomic and transcriptomic profiles. Third, several dedicated histopathological image translation models have already been proposed in specific application domains, such as cryosection-to-FFPE translation and in silico virtual staining, demonstrating the applicability of image translation in histopathology [23, 24]. We have shown that the general His-MMDM model can achieve performance comparable to these dedicated models. Most notably, we illustrated that translation from cryosection to FFPE not only improves the visual quality for pathologists but also improves the performance of existing histopathological foundation models.

Very recently, several new generative models that synthesize histopathological images based on genomics or transcriptomics have begun to emerge. For instance, Dolezal et al. proposed a conditional GAN (cGAN)-based model [67] that can synthesize images of different subtypes in lung, breast, head & neck, and thyroid tumors. In particular, the model can convert thyroid tumor images from BRAFV600E-like ones to RAS-like ones. Carrillo-Perez et al. successively developed a GAN-based [68] and a diffusion-based generative model [69] for the synthesis of histopathological images from bulk RNA-seq profiles. It is worth noting that these previous works are generative models conditioned only on a genomic/transcriptomic profile that synthesize new images from scratch. His-MMDM, however, distinguishes itself as an image translation model that can edit existing images into new ones. This property is indispensable in our analyses of the effect of genomic/transcriptomic manipulations on real histopathological images. It also makes His-MMDM more useful in real-world applications, in that it achieves multi-omics-guided editing of existing images rather than the unconstrained generation of new ones.

In recent years, GenAI-based copilot systems have demonstrated significant potential to transform how we interact with computers [70, 71, 72]. In parallel, early efforts in biomedicine suggest that such systems could greatly assist biologists and healthcare professionals [17, 18, 73, 74]. As a GenAI model itself, His-MMDM fits naturally into these paradigms. For example, it can be integrated as a plugin into interactive whole-slide image (WSI) viewers such as QuPath [75] and ImageJ [76], allowing pathologists to interactively select regions in WSIs for image translation or editing. By simultaneously viewing both the original image and the AI-generated content (AIGC), pathologists can gain more intuitive insights into the presence of specific somatic mutations or transcriptomic dysregulations within selected regions. Additionally, His-MMDM can be invoked as a tool [77] within LLM-based histopathological copilot systems, thereby extending their functionality to include image generation capabilities.

One limitation of His‐MMDM concerns its high computational demand. Due to the iterative nature of the diffusing and denoising processes of diffusion models, it still takes minutes to translate a single batch of histopathological image patches on a typical machine equipped with eight NVIDIA V100 GPUs. Strategies for developing more efficient diffusion models [78] can be adopted to streamline the process of image translation. Experiments with more efficient diffusion model solvers [79, 80] can also be an interesting exploration for future works. Currently, for simplicity, we trained His‐MMDM with fixed resolution (128×128) under a unified magnification level in each cohort. Extending His‐MMDM to simultaneously work at different magnification levels will further unleash its application potential.

4. Materials and Methods

4.1. Datasets

In this research, three cohorts of histopathological image datasets were used to train and test the His‐MMDM model under different image translation settings.

The Cancer Genome Atlas cohort (TCGA). We downloaded 22 596 whole-slide histopathological images from The Cancer Genome Atlas (TCGA) web portal (https://portal.gdc.cancer.gov/) of the 19 cancer types following the practice of [41]. The 19 cancer types comprise five gastrointestinal (ESCA, STAD, COAD, READ, and PAAD), two lung (LUAD and LUSC), three kidney (KIRP, KICH, and KIRC), three pan-gynecological (OV, BRCA, and UCEC), and six other tumor types (LIHC, HNSC, SARC, THCA, BLCA, and PRAD) (statistics shown in Table S2). These tumor types in TCGA were specifically chosen because they contain sufficient numbers of tumor and normal slides, as well as sufficient numbers of FFPE and cryosectioned slides, for analyses. The patients' metadata and clinical information were downloaded along with the WSIs. The processed genetic mutations (TCGA Unified Ensemble “MC3” gene-level mutation calls) and the bulk transcriptomic profiles (Illumina HiSeq) of the corresponding samples were downloaded from the UCSC Xena website (https://tcga.xenahubs.net) [81]. This cohort is used for training and testing His-MMDM to translate across different primary tumor types and different genomic and transcriptomic profiles. For genomic profiles, the 522 genes we considered in total include the 334 genes from the ten oncological signaling pathways defined in [59] and 188 other genes with a high pan-cancer mutation rate or discriminative performance [57] in the TCGA cohort. We strictly followed the “MC3” gene-level mutation calls' definition, and only non-synonymous mutation statuses were considered for each gene.
For transcriptomic profiles, we considered a total of 4 461 genes that are available in the TCGA transcriptomic profiles, including 4 193 genes that are included in the collection of 50 hallmark pathways of MSigDB (v2023.2) [60, 61], as well as an additional 268 genes that either displayed the greatest variability between tumor and normal samples (in terms of the t-statistic) or have previously reported discriminative performance [57]. During the evaluation of the His-MMDM model, we prioritized the top 30 genes in terms of pan-cancer mutation rate from nine oncological pathways for genomics-guided editing and the top 10 pathways in terms of pan-cancer t-statistic from MSigDB for transcriptomics-guided manipulation.

The Harbin Medical University Cancer Hospital Cohort (HMU‐C). This cohort contains 475 primary and metastatic lung tumor histopathological WSIs of 400 patients from the Harbin Medical University Cancer Hospital. This HMU‐C cohort consists of 315 WSIs from primary lung tumor tissues, 76 WSIs from lymph node metastases, and 84 WSIs from the brain metastases of the same set of patients. The primary and metastatic tumor tissues are from the most common lung tumor subtypes: adenocarcinoma, squamous cell carcinoma, and small cell carcinoma (statistics in Table S2). All slides in this cohort are FFPE slides.

The First Affiliated Hospital of Harbin Medical University Cohort (HMU-1st). This cohort contains 6 200 brain tumor histopathological WSIs of 557 patients from the First Affiliated Hospital of Harbin Medical University. Among the 6 200 WSIs, 2 660 are glioma slides and 3 540 are meningioma slides; 2 753 were H&E-stained, and 3 447 were IHC-stained with 14 different markers (Table S5). Some of these markers are useful for confirming the source of the tumor tissue, such as ATRX; some are used to display specific cellular processes, such as Ki67; while others are useful in tumor subtyping and forecasting prognosis, such as IDH-1 and MGMT (Table S5). Among them, six markers are commonly used in both meningiomas and gliomas, four are used exclusively in gliomas, and the other four are used exclusively in meningiomas (Table S5).

The protocols for the human studies comply with all relevant ethical regulations and were approved by the Ethics Committees of The First Affiliated Hospital of Harbin Medical University and The Harbin Medical University Cancer Hospital (KY2021-42). The requirement for patient consent was waived before this research was carried out under the retrospective research protocol of the institutions.

4.2. Patch Selection and Patch Feature Extraction

The WSIs from each cohort were first segmented to separate tissue regions from the empty slide background. Each slide image was converted to a binary mask by applying Otsu's thresholding to a Gaussian‐blurred version of the saturation channel. Subsequently, using a sliding window‐based approach, the segmented tissue area was completely covered with image patches of size 256 × 256 (at 20× objective magnification) or 512 × 512 (at 40× objective magnification). To select patches from the WSIs that contain representative tumor tissues, we extracted the image patches' features using a ViT‐L‐16 [83] model pretrained on ImageNet [82]. From each cohort's extracted patches, we asked the pathologists to select a small set of positive patches (containing representative tumor cells and microenvironment) and a small set of negative patches (in which tumor tissue covers less than 10% of the area). We then classified the patches of each WSI as positive or negative based on their similarity to these positive/negative patch sets (measured by the cosine similarity of their ViT‐L‐16 feature vectors). From each WSI, we randomly sampled a certain number of the 'positive patches' determined above and incorporated them into the train/test sets of the His‐MMDM model. The patches were resized to 128 × 128 before being processed by the generative models. The statistics of the extracted patches in each cohort are shown in Table S2.
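The exemplar‐based patch classification described above can be sketched as follows. The aggregation rule (taking the best cosine similarity against each exemplar set) and the function name are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def select_positive_patches(patch_feats, pos_feats, neg_feats, n_sample=50, seed=0):
    """Classify patches as tumor-positive by comparing their ViT-L-16 feature
    vectors' cosine similarity against pathologist-selected exemplar sets,
    then randomly sample from the positive ones."""
    def cos_sim(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a @ b.T  # (n_patches, n_exemplars)

    # A patch counts as 'positive' if its best match among the positive
    # exemplars beats its best match among the negative exemplars.
    pos_score = cos_sim(patch_feats, pos_feats).max(axis=1)
    neg_score = cos_sim(patch_feats, neg_feats).max(axis=1)
    positive_idx = np.where(pos_score > neg_score)[0]

    # Randomly sample a fixed number of positive patches per WSI.
    rng = np.random.default_rng(seed)
    n = min(n_sample, len(positive_idx))
    return rng.choice(positive_idx, size=n, replace=False)
```
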

4.3. The Multi‐Domain Multi‐Omics Image Translation Model

His‐MMDM is an image translation model that learns a mapping F: X^(src) → X^(trg) from a source‐domain image X^(src) to a target‐domain image X^(trg). If the images in the source domain are distributed as p_src(X) and those in the target domain as p_trg(X), a well‐trained image translation model is expected to have the property that F(X^(src)) is distributed as p_trg(X). Although it is possible to translate images between different dimensions [28], we assume in this paper that X^(src) and X^(trg) have the same dimensions for simplicity. This assumption is valid because images from slides with different magnification levels were resized to represent the same physical dimensions.

Su et al. demonstrated theoretically that such a mapping F corresponds to a deterministic solution of the probability flow ordinary differential equations (PF‐ODEs) of the Schrödinger Bridge Problem (SBP) between the two distributions p_src and p_trg [28]. The SBP from p_src to p_trg aims to establish the most likely evolutionary path of a distribution from p_src to p_trg. They also established the connection between the SBP and the score‐based generative modeling (SGM) of DMs and proved that SGM is a special case of the SBP in which the evolution is from a particular data distribution (either p_src or p_trg) to the multivariate Gaussian distribution. In this way, the mapping F can be found by composing the forward (p_src → N(0, I)) and backward (N(0, I) → p_trg) diffusion processes of the SGM. This framework was named dual diffusion implicit bridges (DDIBs) [28]. Concretely, during the forward diffusion process, we obtain the latent noisy image X^(lat) from a source‐domain image X^(src) with

X^{(lat)} = \mathrm{ODESolve}\left(X^{(src)};\ f^{(src)},\ t_1 = 0,\ t_2 = T\right)

where f^(src) is the denoising diffusion model trained in the SGM of the source domain, ODESolve(· ; t_1 = 0, t_2 = T) represents the numerical approximation of the forward PF‐ODE that maps the source‐domain image X^(src) to the latent image X^(lat), and T is the preset number of discretization steps of the solver. If the SGM models the source‐domain score function perfectly and there is no discretization error in solving the PF‐ODE, X^(lat) should be distributed as N(0, I). Symmetrically, we can then transform the latent image X^(lat) into the target domain with

X^{(trg)} = \mathrm{ODESolve}\left(X^{(lat)};\ f^{(trg)},\ t_1 = T,\ t_2 = 0\right)

where X^(trg) is the generated target‐domain image, ODESolve(· ; t_1 = T, t_2 = 0) represents the numerical approximation of the backward PF‐ODE that maps the latent image to the target‐domain image, and f^(trg) is the denoising diffusion model of the target‐domain SGM.

His‐MMDM focuses on the translation of histopathological images belonging to different domains. Such domains include categorical domains, such as different organ sites of the tumor and different types of stains, as well as multi‐omic domains that correspond to different genomic or transcriptomic profiles. Therefore, the denoising diffusion model used in His‐MMDM needs to handle three types of condition information simultaneously. For clarity, we first write the categorical condition as c ∈ C, where C is a finite set containing all possible categorical conditions handled by the model. Second, we use G = {(g_i, m_i)}_{i=1}^{|G|} to hold the genomic mutation profiles of all the genes considered by the model (denoted by gene set G) for a particular train/test example, in which g_i is the name of the ith gene and m_i is the Boolean value of its mutation status. Third, we use T = {(g_i, t_i)}_{i=1}^{|G|} to hold the transcriptomic expression levels of the same genes, in which t_i is the quantile‐normalized value of the expression level of gene g_i. In this way, the source‐domain model f^(src) and the target‐domain model f^(trg) can be obtained by providing a single model parameterized by θ (f_θ) with different condition information, i.e., f_θ(· ; c^(src), G^(src), T^(src)) and f_θ(· ; c^(trg), G^(trg), T^(trg)), respectively. As the denoising diffusion model is implemented as a U‐Net architecture [32], the three types of condition information are added to each layer of the U‐Net via the categorical embedding and the two multi‐omics embeddings:

\mathrm{Layer}_{l+1} = \mathrm{Layer}_{l} + e_{\mathrm{cat}} + e_{\mathrm{genomic}} + e_{\mathrm{transcriptomic}}

The categorical embeddings are initialized randomly and trained together with the parameters of the U‐Net. As for the genomic embeddings, for each gene g, His‐MMDM utilizes one embedding for the wild‐type (WT) version of the gene (e_g^{WT}) and another for the mutated version (e_g^{mut}), and uses a multi‐layer perceptron (MLP) to transform them:

e_{\mathrm{genomic}} = \sum_{i=1}^{|G|} \mathrm{MLP}_{\mathrm{genomic}}\left(e_{g_i}^{\mathrm{WT}} \cdot \mathbb{I}[m_i \text{ is false}] + e_{g_i}^{\mathrm{mut}} \cdot \mathbb{I}[m_i \text{ is true}]\right)

To account for the quantitative effect of transcriptomics, e_transcriptomic is implemented with two different networks (both MLPs) that handle gene identities and expression levels separately:

e_{\mathrm{transcriptomic}} = \sum_{i=1}^{|G|} \mathrm{MLP}_{\mathrm{transcriptomic}}\left(e_{g_i}\right) \times \mathrm{MLP}_{\mathrm{exp}}\left(e_{g_i}, t_i\right)
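The two embedding formulas above can be illustrated with a minimal numpy sketch. The toy dimensionality, randomly initialized weights, and the three‐gene panel are purely illustrative assumptions; in the actual model, these embeddings and MLPs are trained jointly with the U‐Net:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # embedding / hidden width (illustrative)

def mlp(x, W1, W2):
    # Two-layer perceptron with ReLU, standing in for MLP_genomic etc.
    return np.maximum(x @ W1, 0.0) @ W2

# Per-gene embedding tables: one vector for the wild-type allele and one
# for the mutated allele of each gene (hypothetical 3-gene panel).
genes = ["TP53", "KRAS", "EGFR"]
e_wt   = {g: rng.normal(size=D) for g in genes}
e_mut  = {g: rng.normal(size=D) for g in genes}
e_gene = {g: rng.normal(size=D) for g in genes}  # gene-identity embedding

W1g, W2g = rng.normal(size=(D, D)), rng.normal(size=(D, D))
W1t, W2t = rng.normal(size=(D, D)), rng.normal(size=(D, D))
W1e, W2e = rng.normal(size=(D + 1, D)), rng.normal(size=(D, D))

def genomic_embedding(mutations):  # mutations: {gene: bool}
    # e_genomic = sum_i MLP(e_wt or e_mut, selected by mutation status m_i)
    return sum(mlp(e_mut[g] if m else e_wt[g], W1g, W2g)
               for g, m in mutations.items())

def transcriptomic_embedding(expr):  # expr: {gene: quantile-normalized level}
    # e_transcriptomic = sum_i MLP(e_g_i) * MLP_exp([e_g_i, t_i])
    total = np.zeros(D)
    for g, t in expr.items():
        total += mlp(e_gene[g], W1t, W2t) * mlp(np.append(e_gene[g], t), W1e, W2e)
    return total
```

Flipping a single gene's mutation flag changes the summed genomic embedding, which is how the condition reaches every U‐Net layer.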

We first pre‐train the denoising diffusion model f_θ using the standard classifier‐guided diffusion training procedure on the domains of images between which we wish to translate [84]. We used Algorithm 1 of [7] for the training procedure.

ALGORITHM 1. Image‐translation with His‐MMDM.

Function TRANSLATE‐IMAGE(X^(src), f_θ, {c^(src), G^(src), T^(src)}, {c^(trg), G^(trg), T^(trg)})

Inputs:

X^(src): the source image to translate

f_θ: the trained denoising diffusion model to use

{c^(src), G^(src), T^(src)}: the three types of condition information of the source image

{c^(trg), G^(trg), T^(trg)}: the three types of condition information of the target image

Constants:

T: number of steps of the DDIM solver

Outputs:

X^(trg): the translated image

X_0^{(src)} \leftarrow X^{(src)}

for t = 0 … T − 1 do // implementation of ODESolve(X^(src); f^(src), t_1 = 0, t_2 = T)

X_{t+1}^{(src)} \leftarrow \sqrt{\bar{\alpha}_{t+1}}\, \hat{x}_0\big(X_t^{(src)}, f_\theta(\cdot \mid c^{(src)}, G^{(src)}, T^{(src)})\big) + \sqrt{1-\bar{\alpha}_{t+1}}\, \hat{\varepsilon}\big(X_t^{(src)}, f_\theta(\cdot \mid c^{(src)}, G^{(src)}, T^{(src)})\big)

end

X_T^{(trg)} \leftarrow X_T^{(src)}

for t = T … 1 do // implementation of ODESolve(X^(lat); f^(trg), t_1 = T, t_2 = 0)

X_{t-1}^{(trg)} \leftarrow \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_0\big(X_t^{(trg)}, f_\theta(\cdot \mid c^{(trg)}, G^{(trg)}, T^{(trg)})\big) + \sqrt{1-\bar{\alpha}_{t-1}}\, \hat{\varepsilon}\big(X_t^{(trg)}, f_\theta(\cdot \mid c^{(trg)}, G^{(trg)}, T^{(trg)})\big)

end

X^{(trg)} \leftarrow X_0^{(trg)}

return X^(trg)

After f_θ is successfully trained, we use denoising diffusion implicit models (DDIMs) as the ODESolve for image translation. The translation process is summarized in Algorithm 1.
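Algorithm 1's two deterministic DDIM loops amount to composing the forward and backward PF‐ODE solvers. A minimal numpy sketch, with the conditioned denoisers replaced by generic noise‐prediction callables (x̂₀ is recovered from the predicted noise), could look like this:

```python
import numpy as np

def ddib_translate(x_src, eps_model_src, eps_model_trg, alpha_bar):
    """DDIM-based translation: deterministically encode the source image to
    the latent (forward loop), then decode under the target-domain condition
    (backward loop). eps_model_* are callables (x, t) -> predicted noise,
    standing in for the conditioned f_theta; alpha_bar[t] is the cumulative
    noise schedule with alpha_bar[0] = 1."""
    T = len(alpha_bar) - 1

    def ddim_step(x, t_from, t_to, eps_model):
        a_from, a_to = alpha_bar[t_from], alpha_bar[t_to]
        eps = eps_model(x, t_from)
        # x0_hat from the eps-parameterization, then re-noise to level t_to.
        x0_hat = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
        return np.sqrt(a_to) * x0_hat + np.sqrt(1.0 - a_to) * eps

    x = x_src
    for t in range(T):            # forward: image -> latent
        x = ddim_step(x, t, t + 1, eps_model_src)
    for t in range(T, 0, -1):     # backward: latent -> target image
        x = ddim_step(x, t, t - 1, eps_model_trg)
    return x
```

When the source and target conditions coincide (the same noise model in both loops), the two loops are exact inverses and the input image is recovered, which is the cycle‐consistency property that DDIB‐style translation relies on.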

To accelerate the diffusion sampling process without significantly sacrificing the quality of the sampled images, we employed the following procedure to empirically determine an optimal acceleration strategy for the DDIM sampling process:

  1. On each dataset, we perform sampling on randomly selected samples using the full number of DDIM timesteps, i.e., 1000 steps.

  2. On the same samples, we run the sampling procedure with a reduced number of evenly distributed timesteps and compare the Structural Similarity Index (SSIM) between the resulting images and those obtained in step (1).

  3. We determine the optimal number (N*) of reduced timesteps using the elbow method.

  4. We determine the best allocation of the N* timesteps (N* = N 1 + N 2) in the first (N 1) and the second (N 2) halves of the sampling process.

In our experiments, the optimal number of reduced timesteps varied between 100 and 800 across the datasets (Figure S10C). Step (4) is motivated by the empirical observation that allocating more sampling steps to the first half of the process often leads to better structural similarity in the resulting images (Figure S10C). In this way, we reduced the computational burden of the diffusion sampling procedure to 10%, 17.5%, and 80% of the full schedule on our three datasets, respectively.
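Steps (2)–(4) above can be sketched as follows. Treating the 'halves' as the two halves of the timestep index range is our reading of the procedure, and the function is illustrative rather than the actual implementation:

```python
import numpy as np

def allocate_timesteps(T, n1, n2):
    """Build a reduced DDIM schedule over [0, T]: n1 evenly spaced steps in
    the first half and n2 in the second half (n1 + n2 = N*). Allocating more
    steps to the first half empirically preserved structure better."""
    first = np.linspace(0, T // 2, n1, endpoint=False).astype(int)
    second = np.linspace(T // 2, T, n2 + 1).astype(int)
    # Merge, deduplicate, and sort into one increasing schedule.
    return np.unique(np.concatenate([first, second]))
```

For example, `allocate_timesteps(1000, 60, 40)` spends 60 of the 100 reduced steps on the first half of the full 1000‐step schedule.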

The details of the model architecture, hyperparameters and other configurations regarding training and translation are specified in Table S1.

We compared the performance of His‐MMDM with two GAN‐based models, CycleGAN [26] and CUT [34]. To translate images from domain X to domain Y, CycleGAN simultaneously establishes two GANs, one translating images from X to Y and the other from Y to X. To ensure the consistency of the translated images, a cycle‐consistency loss is imposed to enforce invariance when an image is translated from X to Y and then back from Y to X. The other GAN‐based model, CUT, takes a different approach: it establishes only one GAN model to achieve one‐way translation from X to Y and ensures consistency of translation using patchwise contrastive learning [85, 86], which encourages image patches at the same location in the source and translated images to be similar.

4.4. Image Translation Tasks and the Evaluation of the Synthesized Images

We summarize the specific settings of the image translation tasks, including the specifications of the model checkpoints, the datasets used, and the condition information of the translation tasks, in Table S1. The Fréchet Inception Distance (FID) [87] was used to evaluate the quality of a set of synthesized images against a set of real images. For the baseline models (AI‐FFPE with its ablations, CycleGAN, CUT, StyleGAN2, and D2C), we trained and tested them on the same train/test split as His‐MMDM. The image features after the last pooling layer of an Inception V3 model [38] were used for the computation. Let (μ, Σ) and (μ_0, Σ_0) denote the means and covariances of the Inception V3 features of the synthesized and real images, respectively. The FID score is defined as

d_{\mathrm{FID}}\left((\mu, \Sigma), (\mu_0, \Sigma_0)\right) = \left\lVert \mu - \mu_0 \right\rVert_2^2 + \mathrm{Tr}\left(\Sigma + \Sigma_0 - 2\left(\Sigma \Sigma_0\right)^{1/2}\right)
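The FID score can be computed directly from the feature means and covariances; the matrix square root is the only numerically delicate part. A minimal sketch using scipy:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu, sigma, mu0, sigma0):
    """Frechet Inception Distance between two Gaussians fitted to the
    Inception V3 features of synthesized and real image sets."""
    covmean = sqrtm(sigma @ sigma0)
    covmean = covmean.real  # discard tiny imaginary parts from numerics
    return float(np.sum((mu - mu0) ** 2)
                 + np.trace(sigma + sigma0 - 2.0 * covmean))
```

For identical Gaussians the score is 0; a pure mean shift of distance d gives a score of d².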

To quantify the effect of image translation from cryosection to FFPE, across primary tumor types, and across tumor organ sites, we computed the difference in FID scores before and after image translation, i.e., ΔFID = d_FID((μ_before, Σ_before), (μ_0, Σ_0)) − d_FID((μ_after, Σ_after), (μ_0, Σ_0)). Although both CycleGAN and CUT are GAN‐based algorithms, CycleGAN utilizes cycle consistency for image translation while CUT leverages patch‐wise contrastive learning [85, 86]. We utilized the same discriminator MLP network, GAN loss, and PatchNCE loss as in the original CUT experiments. Based on the reported performance on TCGA, although patch‐wise contrastive learning is an effective objective for training image translation models between photography domains, it appears less effective in histopathological domains, possibly because the repetitive content of histopathological images confuses the contrastive objective. Although StyleGAN2 was mainly used for paired image translation tasks in its original publication [35], we evaluated its performance in unpaired image translation by using the StyleGAN2 network as the generator backbone in the CUT framework. D2C [36] is an image translation model based on latent diffusion. It performs image translation by using a discriminative classifier to guide the sampling process within the latent space. Designed primarily for few‐shot translation, D2C keeps the diffusion model, encoder, and decoder fixed during adaptation to new conditions. This makes it well‐suited for editing images with distinct, localized regions but less effective for broader image translation tasks, such as those in histopathology (Figure 2B).

To compute the magnitude of the effect of genomic‐ and transcriptomic‐guided editing, we used cosine distance rather than FID as we were interested in the editing effect on a single image rather than a collection of images. Besides the cosine distance of the Inception V3 deep features, we also evaluated the difference between image pairs based on their deep features extracted by the pre‐trained model CHIEF [31], or the structural similarity index (SSIM) [88]. SSIM between two image regions x and y is defined as,

\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x \mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^2 + \mu_y^2 + c_1\right)\left(\sigma_x^2 + \sigma_y^2 + c_2\right)}

where μ_x and μ_y are the pixel mean values of x and y, σ_x^2 and σ_y^2 are the pixel variance values of x and y, σ_xy is the covariance of x and y, and c_1 and c_2 are constant parameters. The full SSIM score of two images is computed and averaged using a sliding‐window approach that fully iterates over the two images, using the implementation from scikit‐image [89]. We convert SSIM into a 'structural distance' by reversing and scaling it into the range [0, 1], using the formula (1 − SSIM(x, y))/2. Both the structural distance map (per sliding window) and the structural distance score (averaged over all sliding windows) are used in reporting.
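For a single window, the SSIM formula and the derived structural distance can be sketched as follows. The stabilizing constants c_1 and c_2 here are illustrative values; scikit‐image derives them from the image dynamic range:

```python
import numpy as np

def ssim_window(x, y, c1=1e-4, c2=9e-4):
    """SSIM for a single window; the full score averages this quantity over
    all sliding windows of the two images."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx**2 + my**2 + c1) * (vx + vy + c2)))

def structural_distance(x, y):
    # Reverse and rescale SSIM (range [-1, 1]) into a distance in [0, 1].
    return (1.0 - ssim_window(x, y)) / 2.0
```

Identical windows give SSIM = 1 and hence a structural distance of 0.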

Due to the vast difference between H&E and IHC‐stained images, we evaluated the quality of IHC virtual staining differently from the previous two tasks. Instead, we computed the ratio of the FID score between two different sets of real IHC images to the FID score between synthesized and real IHC images, i.e., d_{FID}((μ_0', Σ_0'), (μ_0, Σ_0)) / d_{FID}((μ_syn, Σ_syn), (μ_0, Σ_0)). We refer to this as the 'inverted normalized FID score' in the main text. In this way, both the effect of image translation and the inherent variability of images of an IHC stain are taken into account, and the inversion makes the metric larger when the synthetic IHC images are more similar to the real IHC images.

4.5. Enhancing the Performance of Pre‐Trained Foundation Models in Histopathology

Previous studies have demonstrated the practical value of image translation models, which convert cryosectioned slide images into FFPE ones, in enhancing intra‐operative cryosectioned slide images [24, 25]. However, in this work, we highlight a different perspective by showing that these image translation models can improve the performance of existing pre‐trained foundation models in histopathology. This is especially relevant since many foundation models, such as PLIP [14], Prov‐GigaPath [30], and CHIEF [31] have been trained predominantly on vast collections of FFPE slides rather than cryosectioned ones.

Similar to the experiments conducted in PLIP, we performed image‐to‐image retrieval using image features encoded by the PLIP image encoder. The retrieval process involved searching a database of TCGA FFPE images from 14 sites, based on their cosine distance to the query image. Following the methodology outlined by Chen et al., 2022 [90], we assessed retrieval accuracy using a majority voting scheme (“mMV”@k), which measures the accuracy of the predicted tumor site of the query image based on the majority vote of the top‐k retrieved images. To evaluate the improvement in PLIP's performance on text‐guided classification using His‐MMDM's translated images on TCGA cryosectioned slides (Figure S1A), we had our pathologists annotate a small subset of image patches from the cryosectioned slides of the TCGA colon cancer cohort, focusing on the five most abundant classes in the Kather colon dataset [91]. We employed prompts in the format ‘a xx histopathological image,’ where ‘xx’ corresponded to one of the following: ‘adipose,’ ‘background,’ ‘epithelium,’ ‘lymphocyte,’ or ‘mucus.’ The accuracy of the PLIP model was then evaluated based on the consistency between the model's prediction (the prompt whose PLIP embedding was closest to the image embedding) and our pathologist's annotations.
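The majority‐vote retrieval metric ('mMV'@k) described above can be sketched as follows; the function name and data layout are hypothetical:

```python
import numpy as np
from collections import Counter

def mmv_at_k(query_feat, db_feats, db_labels, k=5):
    """Majority-vote retrieval: return the most common tumor-site label among
    the k nearest database images by cosine distance of PLIP-style features."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    dist = 1.0 - db @ q                 # cosine distance to every DB image
    topk = np.argsort(dist)[:k]         # k nearest neighbors
    return Counter(db_labels[i] for i in topk).most_common(1)[0][0]
```

The mMV@k accuracy is then the fraction of query images whose majority‐vote label matches their true tumor site.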

In contrast to the multi‐modal PLIP model, the CHIEF model [31] is an image‐only histopathological model pretrained on vast amounts (60,530 WSIs) of real‐world data. Although the dataset consists of images from diverse tumor types (19 anatomical sites), they are still mainly FFPE rather than cryosectioned images. We obtained the cryosectioned slide images from the colorectal tumor types (COAD and READ) and the glioma tumor types (GBM and LGG) of TCGA. We then evaluated the fine‐tuning performance of microsatellite instability (MSI) and IDH1 mutation status prediction of the pretrained CHIEF model using either the original cryosectioned images or the FFPE images synthesized from them by His‐MMDM. The fivefold cross‐validation results are reported.

4.6. Stain Quantification

To compute the positivity of the IHC stains in the translated images, we applied the color deconvolution algorithm [92] (implemented in the 'scikit‐image' package [89]) to the images and obtained the intensities of the two stain components used in our HMU‐1st cohort, i.e., hematoxylin (as background) and 3,3'‐diaminobenzidine (DAB, as the positive signal). The average intensity of the DAB stain is used as the metric of the positivity of the patch.
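A minimal sketch of this quantification, assuming rounded Ruifrok–Johnston H‐DAB stain vectors (the actual analysis used scikit‐image's calibrated matrices, which may differ slightly):

```python
import numpy as np

# Optical-density vectors of the stains in RGB space (rows: approximate
# Ruifrok-Johnston values; an assumption here -- scikit-image's
# `hed_from_rgb` uses a calibrated matrix).
rgb_from_stains = np.array([
    [0.65, 0.70, 0.29],   # hematoxylin
    [0.07, 0.99, 0.11],   # eosin (completes the basis; unused channel)
    [0.27, 0.57, 0.78],   # DAB
])
stains_from_rgb = np.linalg.inv(rgb_from_stains)

def dab_positivity(rgb):
    """Average DAB concentration of an RGB patch with values in [0, 1]."""
    od = -np.log10(np.clip(rgb, 1e-6, 1.0))      # Beer-Lambert optical density
    conc = od.reshape(-1, 3) @ stains_from_rgb   # unmix into stain channels
    return float(conc[:, 2].mean())              # DAB channel
```

As a sanity check, unmixing a pixel synthesized from a known DAB concentration recovers that concentration exactly, since the stain vectors form an invertible basis.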

4.7. Outlier Detection of Translated Histopathological Images

When His‐MMDM translates an image in the source domain that is unlikely to have a corresponding image in the target domain, the translated image tends to look very eccentric (Figure 4C). To quantify the eccentricity of the translated images, we fitted an outlier detection model (Isolation Forest [49], implemented in the 'scikit‐learn' package [93]) on their Inception V3 features. Using the fitted model, we obtained an 'inlier score' in the range [−1, 1] for each sample, where a score close to 1 indicates that the image is a perfect inlier and a score close to −1 indicates that the image is likely an outlier. Superimposing this score on the tSNE plot of the Inception V3 features of the translated patch images also shows that the patches with low scores tend to be located on the edge of the tSNE structure. We then applied the trained His‐MMDM model and the fitted outlier detection model to the TCGA LUAD and LUSC cohorts. To obtain the inlier score of a WSI, we sampled 50 image patches from it and averaged their scores. Similarly, we obtained the score of each patient by averaging the inlier scores of all their available WSIs. Finally, we analyzed the relationship between a patient's inlier score and the patient's staging and survival information. The Kaplan‐Meier method was used to estimate the survival function of the patients under each condition. P values of the fitted Cox proportional hazards models were reported as statistical significance values.
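The fitting and patch‐to‐WSI‐to‐patient score aggregation can be sketched with scikit‐learn's IsolationForest. Here `decision_function` serves as the inlier score (higher = more inlier); the exact mapping to the [−1, 1] range described above may differ:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def patient_inlier_scores(patch_feats_by_wsi, wsi_by_patient, seed=0):
    """Fit an Isolation Forest on all translated-patch features, average the
    patch scores per WSI, then average the WSI scores per patient."""
    all_feats = np.vstack(list(patch_feats_by_wsi.values()))
    forest = IsolationForest(random_state=seed).fit(all_feats)
    wsi_score = {w: forest.decision_function(f).mean()
                 for w, f in patch_feats_by_wsi.items()}
    return {p: float(np.mean([wsi_score[w] for w in wsis]))
            for p, wsis in wsi_by_patient.items()}
```

A patient whose WSIs contain many eccentric (outlying) translated patches receives a lower aggregated score than a patient whose patches sit inside the bulk of the feature distribution.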

4.8. Additional Evaluation of Categorical Histopathological Translation Tasks

We trained ResNet50 [94] models for the binary classification tasks performed on the translated images in the primary tumor type translation task (Figure 3C). We report the F1 score of the classification performance of the trained ResNet50 model using the real and His‐MMDM‐translated images. In addition to the quantitative evaluation metrics described earlier, we conducted an independent qualitative assessment by asking four pathologists (T.S., G.S., T.F., L.Z.) to review the His‐MMDM‐synthesized images across three tasks: cryosection‐to‐FFPE translation, tumor type translation, and organ site translation. For the cryosection‐to‐FFPE task, the pathologists rated each image patch on a 1–5 scale across four criteria: nucleus shape, cell distribution, background contrast/enhancement, and improvement of low‐quality areas or artifact removal. For the tumor type and organ site translation tasks, they evaluated each patch on a 1–5 scale based on two criteria: retention of cellular content and appropriate modification of the background to reflect the target tumor type or organ site. Scores for each patch were averaged across the four pathologists. Statistical significance was assessed using Wilcoxon signed‐rank tests, comparing the average score of each patch against the neutral score of 3.

4.9. Additional Evaluation of Genomics and Transcriptomics‐Guided Editing of Histopathological Images

In addition to qualitatively evaluating genomics‐ and transcriptomics‐guided image editing by comparing and visualizing the edited images against the originals, we implemented supplementary tests to further validate their alignment with real‐world data. First, we curated a database of histopathological images, each characterized by a specific somatic mutation or the up‐/down‐regulation of transcriptional pathways from MSigDB. These images were sourced from TCGA slides that were reserved exclusively for testing and not used during model training. The selected images represent regions with high epithelial tissue content, as annotated by pathologists. For each somatic mutation or pathway alteration, 200 patch images were stored per condition based on whole exome sequencing and transcriptomic profiling. This database serves as a comprehensive resource for examining the histopathological appearance of somatic mutations and pathway changes. We then used the original (q) and edited (q̃) histopathological images as queries to this database. Images {r_i} from a certain condition were retrieved based on the cosine similarity of their PLIP features with the query images. For the top‐K retrieved images, we calculated the average cosine distance to assess the overall similarity,

\bar{d}_{\cos}(q) = \frac{1}{K} \sum_{i=1}^{K} d_{\cos}(q, r_i)

When q is the original image without a certain mutation or pathway dysregulation and q̃ is the His‐MMDM‐edited image with such a condition enabled, we use the percentage reduction in this metric, (d̄_cos(q) − d̄_cos(q̃))/d̄_cos(q), as the effectiveness of image editing. A higher value of this metric indicates better consistency between the His‐MMDM‐edited images and the real images under a particular genomic or transcriptomic condition. K is set to 5 in practice.
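The retrieval‐based editing‐effectiveness metric can be sketched as follows; the function name and data layout are illustrative:

```python
import numpy as np

def editing_effectiveness(q_feat, q_edit_feat, db_feats, k=5):
    """Percentage reduction in the mean cosine distance of the top-K images
    retrieved from the condition-positive database after editing."""
    def mean_topk_dist(q):
        qn = q / np.linalg.norm(q)
        dbn = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
        d = 1.0 - dbn @ qn              # cosine distances to the database
        return np.sort(d)[:k].mean()    # mean over the K nearest images
    d_before = mean_topk_dist(q_feat)
    d_after = mean_topk_dist(q_edit_feat)
    return (d_before - d_after) / d_before
```

A value near 1 means the edited image moved almost all the way toward the real condition‐positive images in feature space; a value near 0 means editing had little retrieval‐level effect.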

In addition to the evaluation using the curated database, we further utilized the two sets of images generated by His‐MMDM (with or without mutations or pathway up‐/down‐regulation) as training data for an independent deep learning model or as educational examples for pathologists. For the independent deep learning model, we trained a ResNet50 [94] using 1K, 2K, 3K, and 4K His‐MMDM‐generated images and evaluated its performance in classifying real histopathological images with/without a particular mutation or pathway alteration. For the pathologists, we randomly sampled ten histopathological image pairs generated by His‐MMDM. We further sampled real histopathological images from the previous database, ensuring an equal number of positive and negative classes, and assessed the pathologists' accuracy in identifying mutations and pathway dysregulations both before and after reviewing the His‐MMDM‐generated content (Supplementary Data). We recruited two of our pathologists (G.S. and L.Z.) for this experiment. As a comparison, we also utilized real, unpaired histopathological images as training materials for the other two pathologists (T.S. and T.F.) and compared the improvement in the pathologists' classification accuracy under the two settings.

4.10. Statistical Analysis

Data pre‐processing in this study focused primarily on whole‐slide images (WSIs). For each cohort, WSIs were first processed to separate tissue regions from background. Specifically, the slide image was converted into a binary mask using Otsu's thresholding applied to a Gaussian‐blurred saturation channel. A sliding‐window procedure was then used to tile the identified tissue regions into image patches of size 256 × 256 pixels for slides scanned at 20× magnification or 512 × 512 pixels for those scanned at 40×. Bar and line plots are presented with 95% confidence intervals. No statistical methods were used to prospectively determine sample size. All available patient samples were included in the analyses. Wilcoxon rank‐sum tests and Wilcoxon signed‐rank tests were performed using the statistical functions from the Scipy package (scipy.stats) [95]. Statistical significance (p values) was reported in the respective figures (* p < 0.05, **p < 0.01, *** p < 0.001, **** p < 0.0001, ns – no significant difference).

Author Contributions

Z.L., P.L., and X.G.: conceptualization. Z.L. and X.G.: methodology. Z.L., T.S., B.Z., W.H., S.Z., G.S., Y.C., X.C., J.Q., and Y.W.: investigation. Z.L. and T.S.: visualization. S.Z., H.M., P.L., and X.G.: supervision. Z.L., T.S., B.Z., and X.G.: writing – original draft. Z.L., T.S., B.Z., W.H., S.Z., G.S., Y.C., X.C., J.Q., Y.W., S.Z., H.M., P.L., and X.G.: writing – review and editing.

Funding

This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No REI/1/5289‐01‐01, REI/1/5992‐01‐01, URF/1/6713‐01‐01, FCC/1/5932‐12‐11, URF/1/6599‐01‐02, FCC/1/5940‐20‐03, Center of Excellence for Smart Health (KCSH), under award number 5932, and Center of Excellence on Generative AI, under award number 5940.

Conflicts of Interest

The authors declare no conflict of interest.

Supporting information

Supporting File: advs74015‐sup‐0001‐SuppMat.pdf

ADVS-13-e18066-s001.pdf (12.1MB, pdf)

Acknowledgements

We gratefully acknowledge the contributions of pathologists Tianjiao Fu (TF) from Harbin Medical University Cancer Hospital and Lu Zheng (LZ) from The Fourth Affiliated Hospital of Harbin Medical University for their participation in the evaluation of the histopathological images. The results presented here are, in whole or in part, based on data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

Contributor Information

Shiguang Zhao, Email: guangsz@hotmail.com.

Hongxue Meng, Email: menghongxue@hrbmu.edu.cn.

Peng Liang, Email: liangpeng@hrbmu.edu.cn.

Xin Gao, Email: xin.gao@kaust.edu.sa.

Data Availability Statement

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.12636449, reference number 12636449.

References

  • 1. Rombach R., Blattmann A., Lorenz D., Esser P., and Ommer B., “High‐resolution Image Synthesis with Latent Diffusion Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , (IEEE; 2022): 10674–10685, 10.1109/CVPR52688.2022.01042. [DOI] [Google Scholar]
  • 2. Midjourney Inc. Midjourney , accessed July 12, 2024, www.midjourney.com 2022.
  • 3. Betker J., Goh G., Jing L., et al., “Improving Image Generation with Better Captions,” Computer Science 2, no. 3 (2023): 8. [Google Scholar]
  • 4. Yang Z., Li L., Lin K., et al., “The Dawn of Lmms: Preliminary Explorations with gpt‐4v (ision),” arXiv preprint (2023): 17421. [Google Scholar]
  • 5. Zhang L., Rao A., and Agrawala M., “Adding Conditional Control to Text‐to‐image Diffusion Models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision , (IEEE, 2023): 3813–3824, 10.1109/ICCV51070.2023.00355. [DOI] [Google Scholar]
  • 6. Goodfellow I., Pouget‐Abadie J., Mirza M., et al., “Generative Adversarial Nets,” in NeurIPS Proceedings , (NIPS, 2014). [Google Scholar]
  • 7. Ho J., Jain A., and Abbeel P., “Denoising Diffusion Probabilistic Models,” Advances in Neural Information Processing Systems 33 (2020): 6840–6851. [Google Scholar]
  • 8. Hanna M. H., Dolezal J., Doytcheva K., et al., “Latent Transcriptional Programs Reveal Histology‐Encoded Tumor Features Spanning Tissue Origins,” BioRxiv (2023): 533810. [Google Scholar]
  • 9. Moghadam P. A., Dalen S. V., Martin K. C., et al., “A Morphology Focused Diffusion Probabilistic Model for Synthesis of Histopathology Images,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision , (IEEE, 2023), 10.1109/WACV56688.2023.00204. [DOI] [Google Scholar]
  • 10. Shrivastava A. and Fletcher P. T., “NASDM: Nuclei‐Aware Semantic Histopathology Image Generation Using Diffusion Models,” arXiv preprint (2023): 11477. [Google Scholar]
  • 11. Wong W. S., Amer M., Maul T., Liao I. Y., and Ahmed A., “Conditional Generative Adversarial Networks for Data Augmentation in Breast Cancer Classification,” in Recent Advances on Soft Computing and Data Mining: Proceedings of the Fourth International Conference on Soft Computing and Data Mining (SCDM 2020) , (Springer, 2020). [Google Scholar]
  • 12. Pandey S., Singh P. R., and Tian J., “An Image Augmentation Approach Using Two‐Stage Generative Adversarial Network for Nuclei Image Segmentation,” Biomedical Signal Processing and Control 57 (2020): 101782, 10.1016/j.bspc.2019.101782. [DOI] [Google Scholar]
  • 13. Xue Y., Zhou Q., Ye J., et al., “Synthetic Augmentation and Feature‐based Filtering for Improved Cervical Histopathology Image Classification,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference , (Springer, 2019). [Google Scholar]
  • 14. Huang Z., Bianchi F., Yuksekgonul M., Montine T. J., and Zou J., “A Visual–Language Foundation Model for Pathology Image Analysis Using Medical Twitter,” Nature Medicine 29 (2023): 2307–2316, 10.1038/s41591-023-02504-3. [DOI] [PubMed] [Google Scholar]
  • 15. Chen R. J., Ding T., Lu M. Y., et al., “Towards a General‐Purpose Foundation Model for Computational Pathology,” Nature Medicine 30, no. 3 (2024): 850–862, 10.1038/s41591-024-02857-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Lu M. Y., Chen B., Williamson D. F. K., et al., “A Visual‐Language Foundation Model for Computational Pathology,” Nature Medicine 30, no. 3 (2024): 863–874, 10.1038/s41591-024-02856-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Lu M. Y., Chen B., Williamson D. F. K., et al., “A Multimodal Generative AI Copilot for Human Pathology,” Nature 634 (2024): 466–473. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Zhou J., He X., Sun L., et al., “Pre‐Trained Multimodal Large Language Model Enhances Dermatological Diagnosis Using SkinGPT‐4,” Nature Communications 15, no. 1 (2024): 5649, 10.1038/s41467-024-50043-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Zanjani F. G., Zinger S., Bejnordi B. E., and AWM van der Laak J., “Histopathology Stain‐color Normalization Using Deep Generative Models,” in 1st Conference on Medical Imaging with Deep Learning (MIDL 2018) , (MIDL, 2018). [Google Scholar]
  • 20. de Haan K., Zhang Y., Zuckerman J. E., et al., “Deep Learning‐Based Transformation of H&E Stained Tissues into Special Stains,” Nature Communications 12, no. 1 (2021): 4884, 10.1038/s41467-021-25221-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Rivenson Y., Wang H., Wei Z., et al., “Virtual Histological Staining of Unlabelled Tissue‐Autofluorescence Images via Deep Learning,” Nature Biomedical Engineering 3, no. 6 (2019): 466–477, 10.1038/s41551-019-0362-y. [DOI] [PubMed] [Google Scholar]
  • 22. Ghahremani P., Li Y., Kaufman A., et al., “Deep Learning‐Inferred Multiplex Immunofluorescence for Immunohistochemical Image Quantification,” Nature Machine Intelligence 4, no. 4 (2022): 401–412, 10.1038/s42256-022-00471-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Pati P., Karkampouna S., Bonollo F., et al., “Accelerating Histopathology Workflows with Generative AI‐Based Virtually Multiplexed Tumour Profiling,” Nature Machine Intelligence 6 (2024): 1077–1093, 10.1038/s42256-024-00889-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Ozyoruk K. B., Can S., Darbaz B., et al., “A Deep‐Learning Model for Transforming the Style of Tissue Images from Cryosectioned to Formalin‐Fixed and Paraffin‐Embedded,” Nature Biomedical Engineering 6, no. 12 (2022): 1407–1419, 10.1038/s41551-022-00952-9. [DOI] [PubMed] [Google Scholar]
  • 25. Liu Z., Chen L., Cheng H., et al., “Virtual Formalin‐Fixed and Paraffin‐Embedded Staining of Fresh Brain Tissue via Stimulated Raman CycleGAN Model,” Science Advances 10, no. 13 (2024): eadn3426, 10.1126/sciadv.adn3426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Zhu J.‐Y., Park T., Isola P., and Efros A. A., “Unpaired Image‐to‐Image Translation Using Cycle‐Consistent Adversarial Networks,” in Proceedings of the IEEE International Conference on Computer Vision , (IEEE, 2017): 2242–2251. [Google Scholar]
  • 27. Dhariwal P. and Nichol A., “Diffusion Models Beat GANs on Image Synthesis,” Advances in Neural Information Processing Systems 34 (2021): 8780–8794. [Google Scholar]
  • 28. Su X., Song J., Meng C., and Ermon S., “Dual Diffusion Implicit Bridges for Image‐to‐Image Translation,” arXiv preprint (2022): 2203.08382.
  • 29. Lu M. Y., Chen B., and Mahmood F., “Harnessing Medical Twitter Data for Pathology AI,” Nature Medicine 29, no. 9 (2023): 2181–2182, 10.1038/s41591-023-02530-1. [DOI] [PubMed] [Google Scholar]
  • 30. Xu H., Usuyama N., Bagga J., et al., “A Whole‐Slide Foundation Model for Digital Pathology from Real‐World Data,” Nature 630 (2024): 181–188, 10.1038/s41586-024-07441-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Wang X., Zhao J., Marostica E., et al., “A Pathology Foundation Model for Cancer Diagnosis and Prognosis Prediction,” Nature 634 (2024): 970–978, 10.1038/s41586-024-07894-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Ronneberger O., Fischer P., and Brox T., “U‐Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer‐Assisted Intervention (MICCAI 2015), (Springer International Publishing, 2015), 234–241. [Google Scholar]
  • 33. Song J., Meng C., and Ermon S., “Denoising Diffusion Implicit Models,” arXiv preprint (2021): 2010.02502.
  • 34. Park T., Efros A. A., Zhang R., and Zhu J. Y., “Contrastive Learning for Unpaired Image‐to‐Image Translation,” in European Conference on Computer Vision , (Cham: Springer International Publishing, 2020), 319–345. [Google Scholar]
  • 35. Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., and Aila T., “Analyzing and Improving the Image Quality of StyleGAN,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , (IEEE, 2020): 8107–8116, 10.1109/CVPR42600.2020.00813. [DOI] [Google Scholar]
  • 36. Sinha A., Song J., Meng C., and Ermon S., “D2C: Diffusion‐Decoding Models for Few‐Shot Conditional Generation,” Advances in Neural Information Processing Systems 34 (2021): 12533–12548. [Google Scholar]
  • 37. Radford A., Kim J. W., Hallacy C., and Ramesh A., “Learning Transferable Visual Models from Natural Language Supervision,” in International Conference on Machine Learning , (PMLR, 2021): 8748–8763. [Google Scholar]
  • 38. Szegedy C., Vanhoucke V., Ioffe S., Shlens J., and Wojna Z., “Rethinking the Inception Architecture for Computer Vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , (IEEE, 2016): 2818–2826, 10.1109/CVPR.2016.308. [DOI] [Google Scholar]
  • 39. He B., Bukhari S., Fox E., et al., “AI‐Enabled in Silico Immunohistochemical Characterization for Alzheimer's Disease,” Cell Reports Methods 2, no. 4 (2022): 100191, 10.1016/j.crmeth.2022.100191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Gatys L. A., Ecker A. S., and Bethge M., “Image Style Transfer Using Convolutional Neural Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , (IEEE, 2016): 2414–2423, 10.1109/CVPR.2016.265. [DOI] [Google Scholar]
  • 41. Noorbakhsh J., Farahmand S., Foroughi pour A., et al., “Deep Learning‐Based Cross‐Classifications Reveal Conserved Spatial Behaviors within Tumor Histological Images,” Nature Communications 11, no. 1 (2020): 6367, 10.1038/s41467-020-20030-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Menon A., Singh P., Vinod P. K., and Jawahar C. V., “Exploring Histological Similarities across Cancers from a Deep Learning Perspective,” Frontiers in Oncology 12 (2022): 842759. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Lu M. Y., Chen T. Y., Williamson D. F. K., et al., “AI‐Based Pathology Predicts Origins for Cancers of Unknown Primary,” Nature 594, no. 7861 (2021): 106–110, 10.1038/s41586-021-03512-4. [DOI] [PubMed] [Google Scholar]
  • 44. Tian F., Liu D., Wei N., et al., “Prediction of Tumor Origin in Cancers of Unknown Primary Origin with Cytology‐Based Deep Learning,” Nature Medicine 30 (2024): 1309–1319, 10.1038/s41591-024-02915-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Quaedvlieg P. J. F., Creytens D. H. K. V., Epping G. G., et al., “Histopathological Characteristics of Metastasizing Squamous Cell Carcinoma of the Skin and Lips,” Histopathology 49, no. 3 (2006): 256–264, 10.1111/j.1365-2559.2006.02472.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Wang X., Chen Y., Gao Y., et al., “Predicting Gastric Cancer Outcome from Resected Lymph Node Histopathology Images Using Deep Learning,” Nature Communications 12, no. 1 (2021): 1637, 10.1038/s41467-021-21674-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Huang S.‐C., Chen C.‐C., Lan J., et al., “Deep Neural Network Trained on Gigapixel Images Improves Lymph Node Metastasis Detection in Clinical Settings,” Nature Communications 13, no. 1 (2022): 3347, 10.1038/s41467-022-30746-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Krogue J. D., Azizi S., Tan F., et al., “Predicting Lymph Node Metastasis from Primary Tumor Histology and Clinicopathologic Factors in Colorectal Cancer Using Deep Learning,” Communications Medicine 3, no. 1 (2023): 59, 10.1038/s43856-023-00282-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Liu F. T., Ting K. M., and Zhou Z. H., “Isolation Forest,” in 2008 Eighth IEEE International Conference on Data Mining , (IEEE, 2008): 413–422, 10.1109/ICDM.2008.17. [DOI] [Google Scholar]
  • 50. Zou Y., Bai H. X., Wang Z., and Yang L., “Comparison of Immunohistochemistry and DNA Sequencing for the Detection of IDH1 Mutations in Gliomas,” Neuro‐Oncology 17, no. 3 (2015): 477–478. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Bilal M., Raza S. E. A., Azam A., et al., “Development and Validation of a Weakly Supervised Deep Learning Framework to Predict the Status of Molecular Pathways and Key Mutations in Colorectal Cancer from Routine Histology Images: A Retrospective Study,” The Lancet Digital Health 3, no. 12 (2021): e763–e772, 10.1016/S2589-7500(21)00180-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Chen M., Zhang B., Topatana W., et al., “Classification and Mutation Prediction Based on Histopathology H&E Images in Liver Cancer Using Deep Learning,” npj Precision Oncology 4, no. 1 (2020): 14, 10.1038/s41698-020-0120-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Jain M. S. and Massoud T. F., “Predicting Tumour Mutational Burden from Histopathological Images Using Multiscale Deep Learning,” Nature Machine Intelligence 2, no. 6 (2020): 356–362, 10.1038/s42256-020-0190-5. [DOI] [Google Scholar]
  • 54. Coudray N., Ocampo P. S., Sakellaropoulos T., et al., “Classification and Mutation Prediction from Non–small Cell Lung Cancer Histopathology Images Using Deep Learning,” Nature Medicine 24, no. 10 (2018): 1559–1567, 10.1038/s41591-018-0177-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Qu H., Zhou M., Yan Z., et al., “Genetic Mutation and Biological Pathway Prediction Based on Whole Slide Images in Breast Carcinoma Using Deep Learning,” npj Precision Oncology 5, no. 1 (2021): 87, 10.1038/s41698-021-00225-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Li Z., et al., “Vision Transformer‐Based Weakly Supervised Histopathological Image Analysis of Primary Brain Tumors,” iScience 26, no. 1 (2023): 105872, 10.1016/j.isci.2022.105872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Fu Y., Jung A. W., Torne R. V., et al., “Pan‐Cancer Computational Histopathology Reveals Mutations, Tumor Composition and Prognosis,” Nature Cancer 1, no. 8 (2020): 800–810, 10.1038/s43018-020-0085-8. [DOI] [PubMed] [Google Scholar]
  • 58. Schmauch B., Romagnoni A., Pronier E., et al., “A Deep Learning Model to Predict RNA‐Seq Expression of Tumours from Whole Slide Images,” Nature Communications 11, no. 1 (2020): 1–15, 10.1038/s41467-020-17678-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Sanchez‐Vega F., Mina M., Armenia J., et al., “Oncogenic Signaling Pathways in the Cancer Genome Atlas,” Cell 173, no. 2 (2018): 321–337, 10.1016/j.cell.2018.03.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Subramanian A., Tamayo P., Mootha V. K., et al., “Gene Set Enrichment Analysis: A Knowledge‐based Approach for Interpreting Genome‐wide Expression Profiles,” Proceedings of the National Academy of Sciences 102, no. 43 (2005): 15545–15550, 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Liberzon A., Birger C., Thorvaldsdóttir H., Ghandi M., Mesirov J. P., and Tamayo P., “The Molecular Signatures Database Hallmark Gene Set Collection,” Cell Systems 1, no. 6 (2015): 417–425, 10.1016/j.cels.2015.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Graham S., Vu Q. D., Raza S. E. A., et al., “Hover‐Net: Simultaneous Segmentation and Classification of Nuclei in Multi‐Tissue Histology Images,” Medical Image Analysis 58 (2019): 101563, 10.1016/j.media.2019.101563. [DOI] [PubMed] [Google Scholar]
  • 63. Cai H., Zhang B., Ahrenfeldt J., et al., “CRISPR/Cas9 Model of Prostate Cancer Identifies Kmt2c Deficiency as a Metastatic Driver by Odam/Cabs1 Gene Cluster Expression,” Nature Communications 15, no. 1 (2024): 2088, 10.1038/s41467-024-46370-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Aaltonen L. A., et al., “Pan‐Cancer Analysis of Whole Genomes,” Nature 578, no. 7793 (2020): 82–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Thorsson V., Gibbs D. L., Brown S. D., et al., “The Immune Landscape of Cancer,” Immunity 48, no. 4 (2018): 812–830, 10.1016/j.immuni.2018.03.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Guinney J., Dienstmann R., Wang X., et al., “The Consensus Molecular Subtypes of Colorectal Cancer,” Nature Medicine 21, no. 11 (2015): 1350–1356, 10.1038/nm.3967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Dolezal J. M., et al., “Deep Learning Generates Synthetic Cancer Histology for Explainability and Education,” npj Precision Oncology 7, no. 1 (2023): 49, 10.1038/s41698-023-00399-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Carrillo‐Perez F., Pizurica M., Ozawa M. G., et al., “Synthetic Whole‐Slide Image Tile Generation with Gene Expression Profile‐infused Deep Generative Models,” Cell Reports Methods 3, no. 8 (2023): 100534, 10.1016/j.crmeth.2023.100534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Carrillo‐Perez F., Pizurica M., Zheng Y., et al., “Generation of Synthetic Whole‐slide Image Tiles of Tumours from RNA‐sequencing Data via Cascaded Diffusion Models,” Nature Biomedical Engineering 9 (2024): 320–332, 10.1038/s41551-024-01193-8. [DOI] [PubMed] [Google Scholar]
  • 70. AutoGPT, 2023, accessed July 12, 2024, https://news.agpt.co/.
  • 71. Cao Y., “A Comprehensive Survey of AI‐Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT,” arXiv preprint (2023): 2303.04226.
  • 72. Roziere B., Gehring J., Gloeckle F., et al., “Code Llama: Open Foundation Models for Code,” arXiv preprint (2023): 2308.12950.
  • 73. Lu M. Y., Chen B., Williamson D. F. K., et al., “A Foundational Multimodal Vision Language AI Assistant for Human Pathology,” arXiv preprint (2023): 2312.07814.
  • 74. Zhou J., Zhang B., Li G., et al., “An AI Agent for Fully Automated Multi‐Omic Analyses,” BioRxiv (2023): 556814. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Bankhead P., Loughrey M. B., Fernández J. A., et al., “QuPath: Open Source Software for Digital Pathology Image Analysis,” Scientific Reports 7, no. 1 (2017): 16878, 10.1038/s41598-017-17204-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Abràmoff M. D., Magalhães P. J., and Ram S. J., “Image Processing with ImageJ,” Biophotonics International 11, no. 7 (2004): 36–42. [Google Scholar]
  • 77. Schick T., et al., “Toolformer: Language Models Can Teach Themselves to Use Tools,” Advances in Neural Information Processing Systems 36 (2023): 68539–68551. [Google Scholar]
  • 78. Ulhaq A., Akhtar N., and Pogrebna G., “Efficient Diffusion Models for Vision: A Survey,” arXiv preprint (2022): 2210.09292.
  • 79. Lu C., Zhou Y., Bao F., Chen J., Li C., and Zhu J., “DPM‐Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps,” Advances in Neural Information Processing Systems 35 (2022): 5775–5787. [Google Scholar]
  • 80. Lu C., Zhou Y., Bao F., Chen J., Li C., and Zhu J., “DPM‐Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models,” arXiv preprint (2022): 2211.01095.
  • 81. Goldman M., Craft B., Hastie M., et al., “The UCSC Xena Platform for Public and Private Cancer Genomics Data Visualization and Interpretation,” BioRxiv (2018): 326470. [Google Scholar]
  • 82. Krizhevsky A., Sutskever I., and Hinton G. E., “Imagenet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems , (NIPS, 2012). [Google Scholar]
  • 83. Dosovitskiy A., Beyer L., Kolesnikov A., et al., “An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale,” arXiv preprint (2020): 2010.11929. [Google Scholar]
  • 84. Nichol A. Q. and Dhariwal P., “Improved Denoising Diffusion Probabilistic Models,” in International Conference on Machine Learning , (PMLR, 2021): 8162–8171. [Google Scholar]
  • 85. Henaff O., “Data‐Efficient Image Recognition with Contrastive Predictive Coding,” in International Conference on Machine Learning , (PMLR, 2020). [Google Scholar]
  • 86. Bachman P., Hjelm R. D., and Buchwalter W., “Learning Representations by Maximizing Mutual Information across Views,” in Advances in Neural Information Processing Systems 30 , (NIPS, 2019). [Google Scholar]
  • 87. Heusel M., Ramsauer H., Unterthiner T., Nessler B., and Hochreiter S., “GANs Trained by a Two Time‐Scale Update Rule Converge to a Local Nash Equilibrium,” in Advances in Neural Information Processing Systems 30 , (NIPS, 2017). [Google Scholar]
  • 88. Wang Z., Bovik A. C., Sheikh H. R., and Simoncelli E. P., “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing 13, no. 4 (2004): 600–612, 10.1109/TIP.2003.819861. [DOI] [PubMed] [Google Scholar]
  • 89. van der Walt S., Schönberger J. L., Nunez‐Iglesias J., et al., “Scikit‐Image: Image Processing in Python,” PeerJ 2 (2014): e453, 10.7717/peerj.453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90. Chen C., Lu M. Y., Williamson D. F. K., Chen T. Y., Schaumberg A. J., and Mahmood F., “Fast and Scalable Image Search for Histology,” arXiv preprint (2021): 2107.13587.
  • 91. Kather J. N., Krisam J., Charoentong P., et al., “Predicting Survival from Colorectal Cancer Histology Slides Using Deep Learning: A Retrospective Multicenter Study,” PLOS Medicine 16, no. 1 (2019): e1002730, 10.1371/journal.pmed.1002730. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Ruifrok A. C. and Johnston D. A., “Quantification of Histochemical Staining by Color Deconvolution,” Analytical and Quantitative Cytology and Histology 23, no. 4 (2001): 291–299. [PubMed] [Google Scholar]
  • 93. Pedregosa F., Varoquaux G., Gramfort A., et al., “Scikit‐Learn: Machine Learning in Python,” Journal of Machine Learning Research 12 (2011): 2825–2830. [Google Scholar]
  • 94. He K., Zhang X., Ren S., and Sun J., “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , (IEEE, 2016): 770–778, 10.1109/CVPR.2016.90. [DOI] [Google Scholar]
  • 95. Virtanen P., Gommers R., Oliphant T. E., et al., “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods 17, no. 3 (2020): 261–272, 10.1038/s41592-019-0686-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data
Supplementary Materials

Supporting File: advs74015‐sup‐0001‐SuppMat.pdf

ADVS-13-e18066-s001.pdf (12.1MB, pdf)

Data Availability Statement

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.12636449, reference number 12636449.
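For readers who want to retrieve the deposited data programmatically, the record number in the statement above maps onto Zenodo's public REST API. The sketch below is an illustrative helper (not part of the article's code): it derives the record ID from the Zenodo DOI suffix and builds the standard metadata URL of the form `https://zenodo.org/api/records/<id>`, which can then be fetched with any HTTP client.

```python
def zenodo_record_url(doi: str) -> str:
    """Build the public Zenodo REST API metadata URL from a Zenodo DOI.

    Illustrative helper only: it assumes the conventional
    '10.5281/zenodo.<record_id>' DOI layout used by Zenodo deposits.
    """
    prefix = "10.5281/zenodo."
    if not doi.startswith(prefix):
        raise ValueError(f"not a Zenodo DOI: {doi}")
    record_id = doi[len(prefix):]  # e.g. '12636449' for this article's deposit
    return f"https://zenodo.org/api/records/{record_id}"


# URL for the dataset cited in the Data Availability Statement.
url = zenodo_record_url("10.5281/zenodo.12636449")
print(url)
# The record metadata (title, file list, download links) could then be
# fetched with the stdlib, e.g.:
#   import json, urllib.request
#   meta = json.load(urllib.request.urlopen(url))
```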


Articles from Advanced Science are provided here courtesy of Wiley