This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Jul 23:2024.07.19.604355. [Version 1] doi: 10.1101/2024.07.19.604355

AI-driven Discovery of Morphomolecular Signatures in Toxicology

Guillaume Jaume 1,2,3,4, Thomas Peeters 1,10, Andrew H Song 1,2,3,4, Rowland Pettit 1,2, Drew F K Williamson 1,8, Lukas Oldenburg 1, Anurag Vaidya 1,2,3,4,9, Simone de Brot 5,6,7, Richard J Chen 1,2,3,4, Jean-Philippe Thiran 10, Long Phi Le 2,11, Georg Gerber 1,9, Faisal Mahmood 1,2,3,4,11,*
PMCID: PMC11291055  PMID: 39091765

Abstract

Early identification of drug toxicity is essential yet challenging in drug development. At the preclinical stage, toxicity is assessed by histopathological examination of tissue sections from animal models to detect morphological lesions. To complement this analysis, toxicogenomics is increasingly employed to understand the compound's mechanism of action and ultimately to identify lesion-specific safety biomarkers for which in vitro assays can be designed. However, existing works that aim to identify morphological correlates of expression changes rely on qualitative or semi-quantitative morphological characterization and remain limited in scale or morphological diversity. Artificial intelligence (AI) offers a promising approach for quantitatively modeling this relationship at an unprecedented scale. Here, we introduce GEESE, an AI model designed to impute morphomolecular signatures in toxicology data. Our model was trained to predict 1,536 gene targets on a cohort of 8,231 hematoxylin and eosin-stained liver sections from Rattus norvegicus across 127 preclinical toxicity studies. The model, evaluated on 2,002 tissue sections from 29 held-out studies, yields pseudo-spatially resolved gene expression maps, which we correlate with six key drug-induced liver injuries (DILI). From the resulting 25 million lesion-expression pairs, we established quantitative relations between up- and downregulated genes and lesions. Validation of these signatures against toxicogenomic databases, pathway enrichment analyses, and human hepatocyte cell lines supported their relevance. Overall, our study introduces new methods for characterizing toxicity at an unprecedented scale and granularity, paving the way for AI-driven discovery of toxicity biomarkers.

Introduction

Identifying and characterizing the potential toxicity of a drug early in its development is a major challenge for the pharmaceutical industry [1–3]. At the preclinical phase, toxicity is assessed in animal models through histological examination of tissue sections, in which pathologists report drug-induced lesions and abnormalities to determine the dose-response relationship of the compound (Fig. 1a). Despite advances in in vitro assays for early toxicity detection, safety concerns remain the leading cause of drug attrition at the preclinical stage [4]. For these reasons, preclinical research increasingly relies on toxicogenomics [5–8], such as gene expression profiling, to develop a mechanistic understanding of drug action. By correlating changes in gene expression with specific morphological lesions, such as cellular necrosis, investigators can characterize the morphomolecular response to a compound. When validated across multiple studies and compounds, lesion-specific genetic biomarkers can serve as novel indicators for early toxicity detection in in vitro testing, thereby increasing the likelihood of successfully transitioning to early-stage clinical development [9].

Figure 1: Study overview.

a. At the preclinical stage, drug candidates undergo toxicity assessment in animal models to characterize the dose-response relationship of the compound based on histological examination. Toxicogenomics can be employed to complement the compound characterization. b. Overview of TG-GATEs, comprising 156 preclinical safety studies (and compounds) and 10,234 pairs of hematoxylin and eosin (H&E) whole-slide images and gene expression profiles. TG-GATEs is split into a development set (127 studies, 8,232 slides) and a test set (29 studies, 2,002 slides). c. We developed two independent prediction models: (1) a morphological lesion prediction model (denoted as Lesion classifier), which classifies 256×256-pixel (128 μm) image patches into six lesions, and (2) a gene expression regression model (GEESE), which predicts bulk expression of 1,536 gene targets from an input tissue section. Feature attribution enables GEESE to derive patch-level expression profiles, yielding pseudo-spatially resolved expression maps. d. The resulting output forms a dataset of 25 million predicted patch-level morphology-expression pairs, which we use to infer and validate morphomolecular signatures across several scales: from patches (small regions of interest) to slides (entire tissue sections) to compounds (which can include dozens of slides), then across several compounds, and finally across species (rat in vivo to human in vitro).

However, prior studies investigating relationships between gene expression changes and morphological lesions have relied on pathology reports for morphological characterization of the tissue, which provide limited information compared with the tissue itself [10–14]. These assessments therefore remain qualitative or semi-quantitative and may be subject to high interobserver variability, especially for lesion severity scoring, which limits the ability to detect subtle associations. In addition, existing works and toxicity databases aggregating findings, such as the Comparative Toxicogenomics Database [15–17], are limited to reporting biomarkers linked to drug-induced injury rather than linking specific lesions, such as fatty change, to expression changes.

To overcome these limitations, artificial intelligence (AI) and computational pathology offer a promising approach for quantitatively and spatially modeling the relationship between morphology and gene expression changes at scale [18–20]. Multiple works have shown the ability to predict molecular profiles, such as gene mutations [21–27], microsatellite instability [28], and gene expression [29–32], directly from whole-slide images (WSIs). This direction holds promise for identifying morphological correlates of molecular alterations and ultimately for biomarker discovery [25,33]. Existing works in this area have primarily focused on cancer cohorts, such as those from The Cancer Genome Atlas Program (TCGA), typically with fewer than 1,000 cases per disease. Moreover, most studies have been limited to qualitative analyses, such as examining attention weights or gradient attributions, rather than quantitative assessments. Consequently, there remains an unmet need for quantitative, objective, and scalable methods to analyze morphological correlates of gene expression.

Here, we introduce the first AI model designed to identify and impute morphomolecular signatures in toxicology data by connecting specific morphologies to expression changes. Our model, named Gene Expression Regressor (GEESE), is a deep learning architecture that predicts bulk expression levels of 1,536 selected gene targets from digitized hematoxylin and eosin (H&E)-stained liver sections (whole-slide images, WSIs). GEESE employs a weakly supervised training approach to predict slide-level labels (i.e., gene expression) without requiring patch-level annotations, thereby enabling scalable training on large datasets. We trained GEESE on 8,231 H&E WSIs of Rattus norvegicus liver, spanning 127 preclinical drug safety studies. The model was evaluated on an independent test set of 29 studies comprising 2,002 WSI and expression profile pairs (Fig. 1b). GEESE's architecture enables fine-grained attribution of gene expression predictions, yielding pseudo-spatially resolved gene expression maps for all test samples (Fig. 1c). To connect gene expression with morphological lesions, we additionally developed a morphological classification model to identify six common drug-induced liver injuries (DILI), including necrosis, fatty change, and increased mitosis. By correlating the pseudo-spatially resolved expression predictions from GEESE with the lesion predictions, we generated a dataset of 25 million morphology-expression pairs.

From this analysis, we established robust, quantitative relations between up- and downregulated genes and morphological lesions, with several associations preserved across multiple compounds (Fig. 1d). We curated lists of genes linked to each of the studied lesions and identified biomarkers that were corroborated by public databases, such as the Comparative Toxicogenomics Database [15–17], and by pathway-enrichment analyses. We further validated these gene signatures against in vitro primary human hepatocyte cell lines, providing an initial assessment of translatability to humans. Overall, our study introduces new methods to understand toxicity and its underlying morphomolecular mechanisms at an unprecedented scale and granularity, paving the way for enhanced prediction and mechanistic understanding of compounds.

Results

Study overview

Our study leverages the publicly available Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System (TG-GATEs) dataset [34], a collection of preclinical drug safety studies (Materials and Methods, section TG-GATEs protocol and table S1). TG-GATEs studies were acquired by the Japanese Toxicogenomics Project consortium to test the hepatotoxicity of known drugs and chemicals after in vivo compound exposure in Rattus norvegicus. Here, we collected data from 156 drug safety studies, accounting for 10,234 pairs of hematoxylin and eosin (H&E)-stained whole-liver tissue sections (20× magnification, 0.49 μm/px) and the corresponding gene expression profiles measured with mRNA microarrays [35]. Each slide represents the morphological changes caused by administering a specific dose of a compound, observed at a particular time after administration. In addition, each slide was annotated by toxicologic pathologists with the morphological lesions they identified, such as hepatocellular hypertrophy; these lesions may be drug-induced or spontaneous (Materials and Methods, section Histopathology acquisition and annotation).

Analogously, bulk gene expression profiling was performed to capture the molecular landscape of the tissue section, which the compound may have altered. Gene expression was measured with mRNA microarrays, which provide a bulk, whole-transcriptome description after drug administration. We restrict our study from the 31,042 microarray probes to a set of 1,536 genes selected to cover a broad biological space relevant to toxicology (Materials and Methods, section Gene expression of in vivo rat studies). Specifically, through complementary a priori knowledge-driven and data-driven approaches, we filtered the full set of genes to a subset satisfying either of two conditions: (1) genes biologically relevant to toxicology (linked to liver metabolism and liver response to injury), as identified through works such as T1000 [36]; or (2) genes whose measured expression levels were highly correlated with the presence of drug-induced lesions (e.g., TMBIM1, a gene involved in death receptor binding activity and necrosis).
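
The two-criterion gene filter can be illustrated with a small sketch. It assumes a slide-by-gene matrix of measured log2 fold changes, a binary per-slide lesion indicator, and a curated prior gene list standing in for T1000. The function name, the use of absolute Pearson correlation, and the threshold `r_min` are hypothetical choices for illustration; the actual selection criteria are given in Materials and Methods.

```python
import numpy as np


def select_genes(expr, lesion_present, gene_names, prior_genes, r_min=0.5):
    """Keep genes that are in a curated prior list OR track lesion presence.

    expr:           (n_slides, n_genes) measured log2 fold changes
    lesion_present: (n_slides,) 1 if the slide reports a lesion, else 0
    gene_names:     list of n_genes gene names
    prior_genes:    set of curated gene names (hypothetical stand-in for T1000)
    r_min:          hypothetical threshold on |Pearson r|
    """
    x = np.asarray(expr, dtype=float)
    y = np.asarray(lesion_present, dtype=float)
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Pearson r between each gene and the binary lesion indicator
    # (point-biserial correlation); zero-variance genes get r = 0.
    r = (xc * yc[:, None]).sum(axis=0) / np.where(denom == 0, np.inf, denom)

    knowledge_driven = {g for g in gene_names if g in prior_genes}
    data_driven = {g for g, ri in zip(gene_names, r) if abs(ri) >= r_min}
    return sorted(knowledge_driven | data_driven)
```

Taking the union means a gene only needs to satisfy one of the two conditions, matching the "either of two conditions" wording above.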

To rigorously assess the generalizability of our model to unseen studies, we split TG-GATEs into a training and validation set comprising 127 studies and 8,232 slides, and a test set comprising 29 studies and 2,002 slides (Fig. 1b and Materials and Methods, section Dataset split). Overall, 1,788 slides (17% of the TG-GATEs dataset) report morphological lesions: 1,141 (17%) in the training set, 283 (17%) in the validation set, and 364 (18%) in the test set. In this study, we focus our analysis on compounds that induce the following six commonly found morphological lesions: abnormal increases in mitotic figures (2.4% of slides in TG-GATEs), necrosis (5.2% of slides), cellular infiltration (5.1% of slides), bile duct proliferation (0.9% of slides), fatty change (2.2% of slides), and hepatocellular hypertrophy (8.0% of slides).
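
Holding out entire studies, rather than individual slides, keeps all slides of a compound on one side of the split, so test performance reflects generalization to unseen compounds. The sketch below assumes a simple seeded random draw of test studies from a hypothetical slide-to-study mapping; the actual split procedure is described in Materials and Methods (Dataset split) and may differ.

```python
import random


def split_by_study(slide_to_study, n_test_studies, seed=0):
    """Hold out whole studies so that no study contributes slides to both sets.

    slide_to_study: dict mapping a slide identifier to its study identifier
    n_test_studies: number of studies to hold out for testing
    """
    studies = sorted(set(slide_to_study.values()))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    test_studies = set(rng.sample(studies, n_test_studies))
    train_slides, test_slides = [], []
    for slide, study in slide_to_study.items():
        if study in test_studies:
            test_slides.append(slide)
        else:
            train_slides.append(slide)
    return train_slides, test_slides, test_studies
```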

Weakly-supervised expression prediction

We introduce the gene expression regressor (GEESE), a deep learning model based on multiple instance learning (MIL) [37–39] that predicts the expression profile associated with an input whole-slide image (WSI). Following the MIL paradigm, we tessellate the slide into 256×256-pixel patches and use a pretrained vision encoder to extract an embedding from each patch. To reduce the domain gap between the source training domain of commonly used encoders (such as natural images [40] or human histology [41–43]) and the target domain (rodent histology), we trained a Vision Transformer from scratch with iBOT [44–46], a self-supervised learning (SSL) method [47,48]. Our SSL model was trained on 15 million patches extracted from 46,734 TG-GATEs slides (Materials and Methods, section iBOT pretraining). We then trained a MIL regression model that predicts gene expression from the patch embeddings within a slide. GEESE extracts patch-level gene expression scores using a multilayer perceptron (MLP) that maps each patch embedding to an expression profile. By summing patch-level predictions, GEESE derives slide-level expression levels, which represent the predicted log2 fold change between the control and the tested configuration (zero means no change). We trained GEESE end-to-end on the patch embeddings by minimizing the mean squared error (MSE) between predicted and measured expression. This MIL formulation directly yields patch-level expression predictions, enabling a pseudo-spatially resolved expression map whose resolution is set by the patch size [29]. This differs from widely used attention-based methods [38,39] and gradient attribution methods [49], which rely on surrogate attribution. Additional information is provided in the Materials and Methods, section Gene expression regressor (GEESE).
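
The additive MIL formulation described above can be summarized as: a shared MLP scores each patch, and the slide-level prediction is the sum of the patch scores, compared with measured log2 fold changes through the MSE. The numpy sketch below shows only the forward pass and the loss. The layer sizes, the ReLU activation, the initialization, and the class name are assumptions rather than GEESE's published configuration, and a real implementation would train the MLP with an autodiff framework.

```python
import numpy as np


class AdditiveMILRegressor:
    """Forward pass of an additive MIL regression head (illustrative only).

    A shared two-layer MLP maps every patch embedding to one score per gene;
    the slide-level prediction is the sum of the patch-level scores.
    """

    def __init__(self, embed_dim, hidden_dim, n_genes, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, size=(embed_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, size=(hidden_dim, n_genes))
        self.b2 = np.zeros(n_genes)

    def patch_scores(self, patch_embeddings):
        # (n_patches, embed_dim) -> (n_patches, n_genes): the pseudo-spatial map
        hidden = np.maximum(patch_embeddings @ self.w1 + self.b1, 0.0)  # ReLU
        return hidden @ self.w2 + self.b2

    def forward(self, patch_embeddings):
        scores = self.patch_scores(patch_embeddings)
        # Slide-level predicted log2 fold change = sum over patches
        return scores.sum(axis=0), scores


def mse(pred, target):
    """Mean squared error between predicted and measured log2 fold changes."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))
```

For example, `AdditiveMILRegressor(384, 256, 1536).forward(patches)` on an `(n_patches, 384)` array returns a 1,536-dimensional slide prediction and an `(n_patches, 1536)` patch map (the 384-dimensional embedding is an assumed size). Because the slide output is literally the sum of the patch outputs, each patch's contribution is exact rather than a surrogate attribution.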

We trained GEESE on 127 studies (8,232 slides) and evaluated it on 29 held-out studies (2,002 slides) (figs. S1 and S2). Predictions are assessed using the Pearson correlation and the area under the ROC curve (AUC) adapted to regression (Materials and Methods, section Evaluation and implementation). Predicted expression levels of several genes, such as TNFRSF12A (involved in inflammation and cell death) and SLC10A1 (part of the sodium/bile acid co-transporter family), are highly correlated with observed expression levels: r=0.722 (95% CI: [0.675, 0.762]) and r=0.688 (95% CI: [0.630, 0.733]), respectively. Across all selected genes, the average correlation is 0.29, increasing to 0.63 for the top 100 best-predicted genes. We attribute this large variation across genes to three factors: (1) the expression state of some genes (expressed or not) may not be reflected in tissue morphology, making it undetectable by GEESE; (2) other genes may have low expression in the test studies, making detection harder due to noise; and (3) some expression profiles associated with certain morphologies during training may not generalize to the test studies.
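
For reference, a gene-level Pearson correlation with a 95% confidence interval can be computed as below. The percentile bootstrap over slides is an assumption made for this sketch; the exact CI procedure is given in Materials and Methods (Evaluation and implementation), and the function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation; NaN when either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")


def pearson_with_bootstrap_ci(pred, obs, n_boot=1000, alpha=0.05, seed=0):
    """Pearson r plus a percentile-bootstrap (1 - alpha) CI, resampling slides."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(pred)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample slides with replacement
        boot.append(pearson_r(pred[idx], obs[idx]))
    # Degenerate resamples (constant values) yield NaN and are ignored
    lo, hi = np.nanpercentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return pearson_r(pred, obs), (float(lo), float(hi))
```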

We additionally investigated model performance stratified by compound. Specifically, we identified the top 100 best-predicted genes and computed the average Pearson correlation for each study (Fig. 2a). Overall, we observe large variations from one study to another. In some studies, such as thioacetamide and methylene dianiline, predictions are highly correlated with the ground truth (r >0.8), while in others, such as carboplatin, correlations are lower (r <0.5). To understand such discrepancies, we computed the percentage of slides per study that reported any of the six considered lesions (background or drug-induced). Studies with poorly predicted expression usually corresponded to compounds with few or no reported lesions. This suggests that the model relies on the presence of morphological lesions to predict expression.
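
The per-study summary in Fig. 2a can be sketched as follows: rank genes by their correlation over the pooled test slides, keep the top k, then average the gene-level correlations within each study. The function names are hypothetical, and ranking on the pooled test set is an assumption of this sketch.

```python
import numpy as np


def _gene_correlations(pred, obs):
    """Column-wise Pearson r between predicted and observed expression."""
    pc = pred - pred.mean(axis=0)
    oc = obs - obs.mean(axis=0)
    denom = np.sqrt((pc ** 2).sum(axis=0) * (oc ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (pc * oc).sum(axis=0) / denom  # NaN for constant genes


def per_study_topk_correlation(pred, obs, study_ids, k=100):
    """Average Pearson r of the k best-predicted genes, reported per study.

    pred, obs: (n_slides, n_genes) predicted and measured log2 fold changes
    study_ids: (n_slides,) study label of each slide
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    study_ids = np.asarray(study_ids)

    overall = _gene_correlations(pred, obs)
    # Rank genes on the pooled test set; NaN correlations sort last
    ranked = np.argsort(np.where(np.isnan(overall), -np.inf, overall))[::-1]
    top = ranked[:k]

    per_study = {}
    for study in np.unique(study_ids):
        mask = study_ids == study
        r = _gene_correlations(pred[mask][:, top], obs[mask][:, top])
        per_study[str(study)] = float(np.nanmean(r))
    return top, per_study
```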

Figure 2: Gene expression profiling from whole-slide images using GEESE.

a. Gene expression prediction of the top 100 best-predicted genes, evaluated using Pearson correlation and stratified by compound in the test set. The percentage of slides with lesions is displayed in red. Boxes indicate quartile values of gene-level Pearson correlation, with the center line indicating the 50th percentile. Whiskers extend to data points within 1.5× the interquartile range. b. Example of a liver section after exposure to bromobenzene (left). Overlay of patch-level necrosis predictions: predictions below 90% confidence are shown in blue and high-probability predictions in red (center-left). Pseudo-spatially resolved gene expression heatmaps of TNFRSF12A (center-right) and ATF3 (right). Examples of high-probability necrotic patches from the slide, with their pseudo-gene expression of ATF3, TNFRSF12A, MDM2, DDIT3 (also known as CHOP), HMOX1, and GADD45A (bottom).

We further inspected the slide-level expression predictions and ground truth of SLC10A1 [50] and TNFRSF12A [51] in thioacetamide and methylene dianiline, the two compounds for which these genes are best predicted. Notably, we observe high Pearson correlations (r ∈ [0.8, 0.9]), where the presence of a lesion (as per TG-GATEs annotations) appears to be linked to SLC10A1 downregulation and TNFRSF12A upregulation (fig. S1c). Yet, samples with lesions show a wide range of expression levels (from 0 to 4 log2 fold change in TNFRSF12A expression in methylene dianiline), which calls for quantitative methods for deeper analyses.

Analysis of morphological correlates of expression changes

As part of routine histological assessment, identified morphological lesions are reported with a score describing the extent and severity of the lesion (in TG-GATEs: minimal, slight, moderate, or severe). However, lesion scoring remains based on qualitative or semi-quantitative assessment and therefore lacks consistency within and across studies, making it impractical for robust quantitative analyses. Instead, we leverage our in-domain vision encoder as a foundation for training a lesion classifier. We gathered patch-level annotations for six commonly found lesions: increased mitosis, necrosis, cellular infiltration, bile duct proliferation, fatty change, and hypertrophy (Materials and Methods, section Patch encoding and lesion classification, table S3 and table S4). In total, we acquired 24,631 lesion patch annotations from 3,458 liver slides and 13,888 normal patch annotations from 3,531 slides, for a total of 38,519 patch annotations. The vision encoder was then fine-tuned on these annotations to minimize a multilabel binary cross-entropy objective over the six lesions of interest (a patch can include multiple lesions). The fine-tuned lesion classifier reaches an average macro-AUC of 98.9% across all lesions, making it a reliable tool for subsequent analysis.
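
The multilabel objective treats each lesion as an independent sigmoid output, so one patch can be positive for several lesions at once. A minimal numpy sketch of the loss and of the macro-AUC metric follows. The rank-based AUC (probability that a positive outranks a negative) is a standard equivalent formulation chosen here for self-containment, not necessarily the authors' implementation, and its pairwise comparison matrix is only practical for modest patch counts.

```python
import numpy as np


def multilabel_bce_with_logits(logits, targets):
    """Mean binary cross-entropy over patches and lesions.

    Each lesion has its own sigmoid output, so one patch can be positive for
    several lesions. Uses the numerically stable form
    max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    z = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    loss = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(loss.mean())


def binary_auc(scores, labels):
    """ROC AUC as the probability a positive outranks a negative (ties = 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def macro_auc(scores, labels):
    """Unweighted mean of per-lesion AUCs; inputs are (n_patches, n_lesions)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    return float(np.mean([binary_auc(scores[:, j], labels[:, j])
                          for j in range(labels.shape[1])]))
```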

By running lesion classification on all 2,002 test slides, we obtain high-quality patch-level predictions for 25 million patches, where each prediction describes the likelihood that a given lesion is present in the patch. We can then compare the lesion predictions with the pseudo-spatially resolved gene expression obtained with GEESE (Fig. 2b). For instance, when analyzing a sample from the bromobenzene study (100 mg/kg administered daily for four days), we observe that the expression heatmaps of TNFRSF12A [51] and ATF3 [52] highlight the same regions, and that both regions largely overlap with the predicted necrosis. This suggests that GEESE uses the presence of necrosis to predict the increased expression of both genes. This finding is not limited to TNFRSF12A and ATF3: other genes, such as MDM2, DDIT3, HMOX1, and GADD45A, show the same trend. These genes are involved in a diverse array of processes often related to inflammation, oxidative stress, and cell death, such as positive regulation of the apoptotic process or response to endoplasmic reticulum (ER) stress.

When conducting a similar analysis across several test studies, such as thioacetamide, ethionamide, methylene dianiline, and methyldopa, we observe that the presence of necrosis aligns with the upregulation of TNFRSF12A and ATF3, suggesting that this finding is compound-agnostic, or that the mechanisms of action of these compounds involve both genes (fig. S2).

While this analysis was conducted with necrosis, similar behavior is observed for other lesions. The upregulation of CCNA2, KNSTRN, CDKN3, CCNB1, ARHGAP11A, and HMMR aligns with the presence of increased mitosis (fig. S3); the upregulation of CXCL1, BCL2A1, S100A4, CXCL10, EVI2A, and FILIP1L aligns with the presence of cellular infiltration (fig. S4); the upregulation of SERPINA7, BEX4, CDH13, CLDN7, and CD24 and the downregulation of SERPINA4 align with the presence of bile duct proliferation (fig. S5); the upregulation of ACOT1, GSTP1, ACOT2, CYP1A1, and HID1 and the downregulation of SLC6A6 align with the presence of fatty change (fig. S6); and the upregulation of GSTA3, ALDH1A1, ADGRG2, and RGD1559459 and the downregulation of PLVAP and LOX align with the presence of hypertrophy (fig. S7). Overall, by comparing patch-level lesion predictions with the pseudo-gene expression maps from GEESE, we observe strong relationships between the presence of each of the six lesions and the up- or downregulation of specific genes.

Single-study morphomolecular analysis

We expanded the analysis to all WSIs within a study to refine and characterize the morphomolecular relationships identified above. We rely on three quantitative measures for in-depth investigation: (1) the gene expression (normalized as log2 fold change) measured with mRNA microarrays, (2) the predicted pseudo-spatially resolved expression map, and (3) the predicted size, type, and location of each lesion. We emphasize that the latter two measures are enabled by our proposed lesion classifier and the GEESE architecture.

By measuring the Pearson correlation between the measured slide-level expression and the predicted lesion size, we observe a subset of genes with high correlation (r >0.7, Fig. 3a). This is exemplified by increased mitosis after administration of danazol (Fig. 3), cellular infiltration in methylene dianiline (fig. S8), bile duct proliferation in methylene dianiline (fig. S9), fatty change in ethionamide (fig. S10), and hypertrophy in hexachlorobenzene (fig. S11). This analysis further shows that the genes with a high correlation between measured expression and predicted lesion size overlap with the genes best predicted by GEESE. In addition, we observe that certain genes are not associated with any of the considered lesions.
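
Lesion size per slide can be estimated as the fraction of its patches whose predicted lesion probability exceeds a threshold, and then correlated gene by gene with the measured expression across the slides of a study. In the sketch below, the default threshold of 0.9 and the r > 0.7 cut-off are illustrative (the actual lesion-specific thresholds are defined in Materials and Methods), and all names are hypothetical.

```python
import numpy as np


def lesion_extent(patch_probs, threshold):
    """Fraction of a slide's patches whose lesion probability reaches threshold."""
    return float((np.asarray(patch_probs, dtype=float) >= threshold).mean())


def genes_tracking_lesion(slide_patch_probs, measured_expr, threshold=0.9, r_min=0.7):
    """Correlate per-slide lesion extent with measured expression, gene by gene.

    slide_patch_probs: list with one 1D array of patch probabilities per slide
                       (for a single lesion type)
    measured_expr:     (n_slides, n_genes) measured log2 fold changes
    threshold, r_min:  hypothetical cut-offs
    """
    extent = np.array([lesion_extent(p, threshold) for p in slide_patch_probs])
    expr = np.asarray(measured_expr, dtype=float)
    ec = extent - extent.mean()
    xc = expr - expr.mean(axis=0)
    denom = np.sqrt((ec ** 2).sum() * (xc ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (xc * ec[:, None]).sum(axis=0) / denom
    # Constant genes give NaN correlations and are never selected
    hits = np.flatnonzero(np.nan_to_num(r, nan=0.0) > r_min)
    return extent, r, hits
```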

Figure 3: Morphomolecular analysis of increased mitosis in danazol.

a. For each slide-expression pair, we correlate the predicted percentage of mitosis in the slide with the measured gene expression. Genes with a high correlation between measured expression and predicted lesion extent can be considered part of the morphomolecular signature associated with the compound. The original TG-GATEs annotations report increased mitosis in 8 of 64 danazol slides. b. Correlation between the estimated percentage of mitosis in a slide and the gene expression of CCNB2 and CDCA3 (left: measured, right: predicted). P-values are derived from testing the two-sided null hypothesis of no correlation. c. Density distribution of patch-level expression for patches predicted as normal by the lesion classifier (n=746,699 patches) and patches predicted as containing mitosis (n=26,429). Patches were extracted from the 64 WSIs of the danazol study.

When focusing on specific genes, such as CCNB2 (Cyclin B2) and CDCA3 (Cell Division Cycle Associated 3) in danazol, we observe how the measured expression varies with the predicted lesion size (Fig. 3b). CCNB2 and CDCA3 are increasingly upregulated as the predicted lesion size increases (r>0.7). The same observation holds for the slide-level gene predictions (r>0.7 between predicted expression and predicted mitosis). We further analyzed the distribution of the predicted patch-level expression for these two genes. To this end, we assign a label to each patch (lesion or normal) according to whether the predicted lesion probability crosses lesion-specific thresholds (Materials and Methods, section Lesion classifier). We observe a bimodal distribution, where normal patches (i.e., no sign of lesion) have predicted expression centered around zero and patches with mitosis are predicted as upregulated (Fig. 3c). Overall, by comparing gene expression in patches with and without lesions, we can establish precise, quantitative relationships between the up- and downregulation of genes and the extent of lesions.
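
The patch labeling and group comparison can be sketched as below: a patch is labeled with the lesion of interest when its probability crosses that lesion's threshold, and labeled normal when no lesion crosses its threshold. The threshold values and function names are illustrative assumptions; the actual thresholds are set in Materials and Methods.

```python
import numpy as np


def label_patches(lesion_probs, thresholds, lesion_idx):
    """Return boolean masks (is_lesion, is_normal) for one lesion of interest.

    lesion_probs: (n_patches, n_lesions) classifier probabilities
    thresholds:   (n_lesions,) lesion-specific thresholds (hypothetical values)
    A patch is 'lesion' if its probability for lesion_idx reaches the threshold,
    and 'normal' if no lesion reaches its threshold.
    """
    probs = np.asarray(lesion_probs, dtype=float)
    above = probs >= np.asarray(thresholds, dtype=float)  # broadcast per lesion
    return above[:, lesion_idx], ~above.any(axis=1)


def mean_expression_by_group(patch_expr, is_lesion, is_normal):
    """Mean predicted patch-level log2 fold change in normal vs lesion patches."""
    expr = np.asarray(patch_expr, dtype=float)  # (n_patches, n_genes)
    return expr[is_normal].mean(axis=0), expr[is_lesion].mean(axis=0)
```

A density plot like Fig. 3c would histogram `patch_expr[is_normal, g]` and `patch_expr[is_lesion, g]` for a gene `g` instead of reducing each group to its mean.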

Cross-study analysis of morphomolecular signatures

Next, we studied whether these signatures are preserved across studies, for instance, whether the molecular correlates of mitosis identified in response to danazol are also found for other compounds. This allows us to determine whether genetic markers linked to particular morphologies remain consistent across compounds, which would suggest specific toxicological mechanisms. To this end, we compiled a list of studies in which a given lesion is observed (table S5 and table S6), for instance, the eight studies from the TG-GATEs test set with mitosis (fig. S12). For each slide of each study, we extracted all patches with lesions (for instance, 157,414 patches predicted with mitosis). The number of patches extracted for each study is detailed in table S7. We then computed the macro-average of the predicted expression over the selected patches across all studies and ranked genes from highest to lowest. Genes with the highest absolute expression are expected to be the most closely linked to the lesions (Materials and Methods, section Post-hoc morphomolecular signature analysis). We conducted similar analyses for all lesions: necrosis (four studies, fig. S13), cellular infiltration (four studies, fig. S14), bile duct proliferation (two studies, fig. S15), fatty change (four studies, fig. S16), and hypertrophy (eleven studies, fig. S17).
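
A sketch of the cross-study ranking follows. It assumes that "macro-average" means first averaging the lesion patches within each study and then averaging the study means with equal weight, so that studies with many lesion patches do not dominate. The data layout and function name are hypothetical.

```python
import numpy as np


def cross_study_signature(study_patch_expr, gene_names, top_n=40):
    """Macro-averaged lesion signature across studies.

    study_patch_expr: dict study -> (n_lesion_patches, n_genes) predicted
                      log2 fold changes of patches predicted with the lesion
    Returns the macro-average plus the top_n up- and downregulated genes.
    """
    # 1) average within each study, 2) average the study means equally
    study_means = np.stack([np.asarray(x, dtype=float).mean(axis=0)
                            for x in study_patch_expr.values()])
    macro = study_means.mean(axis=0)
    order = np.argsort(macro)  # ascending
    down = [(gene_names[i], float(macro[i])) for i in order[:top_n]]
    up = [(gene_names[i], float(macro[i])) for i in order[::-1][:top_n]]
    return macro, up, down
```

With study A contributing patches `[1, 0, -1]` and `[3, 0, -1]` and study B a single patch `[0, 0, -3]`, the macro-average is `[1, 0, -2]`, whereas pooling all three patches would give about 1.33 for the first gene; the averaging choice therefore affects the ranking when studies differ in lesion extent.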

An overview of the analyses is shown in Fig. 4, where we highlight the 40 most upregulated (red) and 40 most downregulated (blue) genes for each lesion. Some genes are consistently upregulated (such as ABCC3 and GPX2) or downregulated (such as NOX4, CAR3, and OAT) across multiple lesions (table S8). These genes do not appear to be lesion-specific but may instead indicate general toxic exposure (for instance, ABCC3 is involved in transporting organic anions and drugs out of cells). Interestingly, some of these genes, such as KLF6, NOX4, and CAR3, are among those best predicted by GEESE, with Pearson correlations of 0.717 (95% CI: [0.672, 0.759]), 0.639 (95% CI: [0.605, 0.672]), and 0.698 (95% CI: [0.659, 0.736]), respectively, between predicted and observed expression on all test studies (fig. S1). A large number of the identified genes (both up- and downregulated) are linked to drug-induced liver injury according to the Comparative Toxicogenomics Database (CTD) [15–17], either through direct therapeutic or mechanistic evidence (genes with a filled black star in Fig. 4) or via inferred evidence from network-based analysis (star with a black contour in Fig. 4). For instance, 23 of the 40 most upregulated genes associated with hypertrophy are referenced in CTD as direct or inferred evidence (inference score>400), such as CYP1A1, which encodes a member of the cytochrome P450 superfamily of enzymes known for catalyzing reactions involved in drug metabolism [53,54]. This analysis supports the ability of GEESE to identify gene sets relevant to toxicity.

Figure 4: Cross-study morphomolecular analysis.

Heatmap illustrating the mean expression of genes in patches displaying specific lesions, showing the top 40 upregulated and top 40 downregulated genes for each lesion type. Panels a. mitosis, b. cellular infiltration, c. bile duct proliferation, d. fatty change, e. hypertrophy, and f. necrosis detail the gene expression dynamics, where each analysis spans multiple compounds (table S5). Genes are ranked by their absolute expression, with upregulated genes in red (descending order) and downregulated genes in blue (ascending order). Genes specific to each lesion are highlighted in purple. Genes with known connections to drug-induced liver injury (DILI) are marked with a star (either direct or inferred evidence). Direct evidence refers to known mechanistic and/or therapeutic connections between the gene and DILI. Inferred evidence refers to genes with an inference score>400 in the Comparative Toxicogenomics Database (CTD) [15–17] (the inference score measures the similarity between CTD chemical–gene–disease networks and a similar scale-free random network).

Certain genes exhibit changes in expression only for a specific lesion, indicating a closer association with the lesion itself than with general toxic exposure. To identify these lesion-specific genes, we compared the average predicted expression in patches containing the lesion of interest with the expression in all other patches with lesions. A gene was considered lesion-specific if its expression was significantly larger than for each of the other five lesions (Materials and Methods, section Post-hoc morphomolecular signature analysis). For each lesion, we identified a varying number of lesion-specific genes (genes marked in purple in Fig. 4). For instance, mitosis exhibits a distinct molecular signature, likely due to its unique nature compared with other lesions, such as cellular infiltration and fatty change, which can co-occur with other, more prominent lesions in our dataset. To assess the relevance of the identified lesion-specific molecular signatures, we conducted a pathway enrichment analysis using the Rat Genome Database [55], which aggregates previously established biological processes in rats and humans (Fig. 5a,b,c and figs. S12b, S13b, S14b, S15b, S16b, and S17b for lesion-wise analyses). GEESE-identified gene sets are significantly enriched for pathways linked to the different lesions. For instance, of the 51 genes uniquely linked to mitosis (Fig. 5a and table S10), such as CDK1 [56] and CCNB1, 40 are involved in the cell cycle pathway in rats (p-value=6.53E-34) and 41 in the equivalent human pathway (p-value=5.31E-33); 29 genes are involved in the chromosome segregation pathway in rats and humans (p-value=2.07E-36 and p-value=1.69E-33, respectively); and 25 genes are involved in cell division in rats (p-value=2.09E-26) and 34 in the equivalent human pathway (p-value=1.96E-37). When conducting a similar analysis for necrosis (Fig. 5b and table S10), we found that of the 33 identified upregulated genes, such as ATF3 [52], TNFRSF12A [51], DDIT3 [57–59], and TRIB3 [14], 21 are involved in the apoptotic process in rats and humans (p-value=1.30E-14 and p-value=1.25E-13, respectively), and 20 are involved in the cellular response to stress in rats (p-value=1.24E-13) and 21 in humans (p-value=3.18E-14). The 18 upregulated genes linked to hypertrophy (e.g., ALDH1A1, ACSM2 [60], and VNN1 [61]) are also significantly enriched for several metabolic pathways, such as fatty acid, lipid, and glucose metabolism. Similar analyses conducted on the three other lesions further support the relevance of the GEESE-identified genes. This illustrates the ability of GEESE to identify relevant lesion-specific genetic biomarkers with promising transferability to humans.
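
Enrichment p-values of this kind are commonly obtained from a one-sided hypergeometric test: given a background of N genes, K of which are annotated to a pathway, how surprising is it that k of n query genes fall in that pathway? The sketch below assumes that test and leaves the background universe N as a parameter. Which test and universe the Rat Genome Database tools used is not stated here, so the sketch is not expected to reproduce the reported p-values, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the query set (e.g. lesion-specific genes)
    k: query genes annotated to the pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum exact integer terms, then divide once to limit rounding error;
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```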

Figure 5: Pathway enrichment analysis and human in vitro validation.


a. Pathway enrichment analysis of GEESE-identified genes uniquely linked to necrosis in rat and human biological processes. b,c. Enrichment analysis conducted for hypertrophy and increased mitosis. Additional analysis is provided in fig. S12 to S17. d. Translation to in vitro primary human hepatocyte cell lines of the gene set uniquely identified as linked to necrosis in vivo, focusing on the 33 upregulated and 33 downregulated genes identified as related to necrosis.

Translation to in vitro human cell lines

We further assessed the translatability of the GEESE-identified genes associated with necrosis to human biology. To this end, we leveraged data from in vitro primary human hepatocyte (PHH) cell lines collected as part of TG-GATEs on a subset of 140 compounds. For each tested compound, expression changes were measured using mRNA microarrays at fixed time points (Materials and Methods, section Gene expression of in vitro human studies). Specifically, based on the in vivo rat experiments, we defined two groups of compounds: (1) a group of 10 compounds for which at least five necrotic slides were found (corresponding to 42 high-dose samples in vitro), and (2) a control group of 83 compounds without any necrotic slide (corresponding to 331 high-dose samples in vitro). For each group, we computed the average gene expression (log2 fold change) of the high-dose samples for the genes identified by GEESE in vivo (rat). We then reported the expression difference between the first and second groups.
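
The group comparison described above reduces to two per-gene means and a difference. A minimal pandas sketch follows, assuming a table with one row of log2 fold changes per high-dose in vitro sample and a parallel series of compound names; the function name and the toy compound names and values are hypothetical and do not come from TG-GATEs.

```python
import pandas as pd


def necrosis_group_difference(log2fc: pd.DataFrame, compound: pd.Series,
                              necrotic: set, control: set, genes: list) -> pd.Series:
    """Per-gene mean log2 fold change of high-dose PHH samples from
    necrosis-inducing compounds minus that of necrosis-free compounds.

    log2fc:   one row per high-dose in vitro sample, one column per gene.
    compound: compound name of each sample (same index as log2fc).
    """
    # Average over samples within each group (not over compounds first).
    mean_necrotic = log2fc.loc[compound.isin(necrotic), genes].mean(axis=0)
    mean_control = log2fc.loc[compound.isin(control), genes].mean(axis=0)
    return mean_necrotic - mean_control


# Toy example with hypothetical values for two genes.
toy = pd.DataFrame({"ATF3": [2.0, 1.0, 0.0, 0.5], "MDM2": [0.1, 0.3, 0.2, 0.2]})
names = pd.Series(["cmpd_a", "cmpd_a", "cmpd_b", "cmpd_c"])
diff = necrosis_group_difference(toy, names, {"cmpd_a"}, {"cmpd_b", "cmpd_c"},
                                 ["ATF3", "MDM2"])
```

Averaging over samples rather than over compounds follows the wording of the text; under this choice, a compound with more high-dose samples weighs more in its group.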

Several GEESE-identified genes associated with necrosis, including ATF3, TRIB3, and MAFF, were also differentially expressed in PHH treated with necrosis-inducing compounds (Fig. 5d and fig. S18). This suggests that these genes could be conserved markers of necrosis across species and experimental systems. These findings align with our pathway enrichment analysis and with the comparison against the Comparative Toxicogenomics Database (CTD). Other genes, such as MDM2 and AVPR1A, did not show the same differential expression patterns in PHH as in rat livers, which could be explained by several factors: (1) the doses and time points used in the PHH experiments may not elicit the full extent of necrosis-related gene expression changes observed in vivo; (2) PHH cell lines have known limitations in fully recapitulating the complexity of the intact liver, such as its 3D architecture, zonation, and cross-talk with non-parenchymal cells and other organs62; and (3) inherent differences between rat and human hepatocytes could prevent a direct translation for some genes. Despite the translational gap across species and systems, GEESE identified several genes that show evidence of conservation as biomarkers of liver lesions and could be investigated further.

Discussion

In this study, we demonstrated that large deep learning models can be used to elucidate morphological correlates of molecular changes in toxicity studies. To this end, we built GEESE, a weakly supervised regression model trained on over 8,000 liver tissue sections from 127 preclinical safety studies to predict bulk gene expression changes of 1,536 gene targets. In addition to slide-level profiling, GEESE can derive local gene expression changes by predicting pseudo-spatially resolved expression maps of the 1,536 genes. By combining GEESE with a morphological lesion classifier that can precisely locate and quantify six commonly found morphological lesions in the liver, we extracted from 29 held-out safety studies a dataset of 25 million image patches, each associated with a pseudo-expression profile and lesion labels. This analysis enabled us to identify various morphomolecular associations within and across multiple compounds. For example, the upregulation of 33 genes, such as ATF3, TNFRSF12A, and DDIT3, known for their involvement in stress response pathways and apoptotic signaling63, was associated with the presence of hepatocellular necrosis in multiple studies. Similarly, the presence of mitosis was consistently linked with the upregulation of 52 genes, such as CCNA2, CCNB2, KNSTRN, and CDKN3, which are involved in the regulation of the cell cycle, DNA replication, and mitotic spindle formation. This analysis allowed us to curate comprehensive gene sets associated with each of the six studied lesion types, which we further validated against public toxicogenomic databases, pathway enrichment analyses, and in vitro human cell line data. Overall, GEESE enables the discovery of subtle and robust morphomolecular associations within and across compounds at an unparalleled scale.

Although the cohorts used in this study are of unprecedented size in toxicology, our study has limitations. First, our analysis focuses on rat liver tissue, which limits our findings to a single organ. Analyzing other tissues and confirming the conservation of signatures across species would strengthen their translational relevance. While we validated our analysis with in vitro primary human hepatocyte cell lines, these data remain limited in representing the complexity of an intact liver. Future studies employing more advanced in vitro models that better mimic the in vivo liver microenvironment, such as 3D spheroids65 or organ-on-a-chip systems66, possibly combined with high-content imaging64, could accelerate such translational efforts. Additionally, the number of compounds considered in the downstream analysis is limited to 29 in vivo rat studies (out of 156 in total, with 127 used for model training) and 140 in vitro human studies. Therefore, our analysis cannot encompass the full morphomolecular diversity that the administration of arbitrary compounds can induce. Scaling to thousands of preclinical studies (and millions of slides) is needed to increase the diversity of the discovered morphomolecular signatures. Lastly, validating the accuracy of the GEESE pseudo-spatially resolved expression profiles remains challenging: immunohistochemistry (IHC) is not routinely performed in toxicology studies, and spatial transcriptomics (ST) data remain scarce due to their high cost.

We envision the incorporation of GEESE into the preclinical workflow as a valuable tool for toxicogenomic profiling from histology and for the imputation of spatially resolved gene expression maps. In addition, GEESE can be used to automatically identify, quantify, and characterize the relationships between morphological lesions and gene expression changes. These capabilities are crucial, as toxicity remains a major cause of drug attrition, and preclinical toxicology studies are a critical checkpoint within the estimated $1.8 billion total cost of bringing a new molecular entity to market2,67,68. Given the high attrition rate of drug candidates due to toxicity during preclinical testing, with only about 5% of compounds that enter preclinical studies ultimately receiving approval1, GEESE can contribute to streamlining toxicity assessment by reducing manual semi-quantitative evaluations and by providing molecular insights cost-effectively, without requiring specialized techniques such as ST or IHC.

While additional validation will be required to confirm some of our findings, our approach to morphomolecular signature discovery can seamlessly scale to more compounds, gene targets, and species. In addition, joint efforts from the pharmaceutical industry and academia, such as the BigPicture initiative69, will gather large cohorts of preclinical studies, which can be harnessed as additional training or validation data. Furthermore, integrating additional toxicological data, such as from the DrugMatrix10 and ToxCast70 programs, could help bridge the translational gap between rodent studies and human toxicity. Finally, establishing community-wide standards and infrastructure for structuring toxicological data for AI methods and foundation model development, such as through the eTRANSAFE71 consortium, will bring us closer to translating AI methods into practical tools for drug and biomarker discovery. Overall, our study lays the foundations for several promising avenues in AI-driven toxicology research and preclinical drug safety assessment.

Materials and Methods

Ethics statement

The study involves a retrospective examination of previously collected liver sections from Rattus norvegicus, which are part of a public archive. Examination of the original study’s documentation confirms that the experimental protocol was subject to ethical review and subsequently received approval from both the Ethics Review Committee for Animal Experimentation at the National Institute of Health Sciences (NIHS) and the relevant contract research organizations.

Study design

TG-GATEs protocol:

Four contract research organizations conducted animal experiments on male Crl:CD Sprague-Dawley (SD) rats. Animals were allocated into groups of 20 using a computerized stratified random grouping method based on body weight34. Two types of dose administration were conducted: single-dose and repeated-dose. In single-dose experiments, groups of 20 animals were administered a compound, and five animals were sacrificed at each of 3, 6, 9, and 24 hours after administration. In repeated-dose experiments, groups of 20 animals received a dose every day, and five animals were sacrificed at each of 4, 8, 15, and 29 days after administration. For each sample group (unique compound, dose, and sacrifice time), three animals underwent toxicogenomic analysis with mRNA microarrays. Animals were not fasted before sacrifice. The compounds examined (as detailed in table S1) were chosen through literature reviews and agreement among toxicologists from the pharmaceutical industry and the Japanese government. For most compounds, three dose levels were tested, with a low:middle:high dose ratio of 1:3:10.
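
The text does not detail the randomization procedure, so the following is only a hypothetical sketch of one common form of body-weight-stratified randomization: rank animals by weight, cut them into strata the size of the number of groups, and randomly spread each stratum across groups. The function name and the use of 80 animals split into four groups of 20 are assumptions for illustration.

```python
import numpy as np


def stratified_random_groups(body_weights, n_groups, seed=0):
    """Assign animals to n_groups groups with balanced body weight.

    Animals are ranked by weight and cut into consecutive strata of n_groups
    animals; within each stratum, animals are randomly spread across groups.
    Returns one group index per animal.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(body_weights, dtype=float)
    order = np.argsort(weights)  # lightest to heaviest
    groups = np.empty(len(weights), dtype=int)
    for start in range(0, len(order), n_groups):
        stratum = order[start:start + n_groups]
        # Each animal in this weight stratum goes to a different group.
        groups[stratum] = rng.permutation(n_groups)[:len(stratum)]
    return groups


# Hypothetical example: 80 animals, four groups of 20.
weights = np.linspace(200.0, 280.0, 80)
assignment = stratified_random_groups(weights, 4)
```

Because every stratum contributes at most one animal to each group, group sizes stay equal and mean body weights stay close across groups.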

Histopathology acquisition and annotation:

All liver sections were stained with H&E (hematoxylin and eosin) and mounted on glass slides. Tissue sections were digitized using a ScanScope AT scanner (Aperio Technologies Inc., CA, USA) at 20× magnification (0.49 μm/px). The histopathology data from TG-GATEs include annotations that describe the lesions observed in the slides. These annotations are not standardized: different studies use different terminologies and taxonomies to describe identical findings. In total, 66 different lesion types were reported across 23,136 liver sections. However, many of these are either synonyms or more specific subtypes of broader lesion categories. In our analysis, we grouped related lesions into six lesions of interest: increased mitosis, necrosis, cellular infiltration, bile duct proliferation, fatty change, and hypertrophy. A description of each lesion is provided in table S4.

Gene expression of in vivo rat studies:

The raw transcriptomic data consist of microarrays (Affymetrix Rat Genome 230 2.0 Array GeneChip) with 31,042 probes. All data were normalized probe-wise using the log2 fold change with respect to a control group. The log2 fold change quantifies the proportional difference, on a logarithmic scale, between the expression levels of a particular probe under two conditions: a control group (on average 22 slides per study in TG-GATEs) and a sample group (a defined combination of compound, dose, and sacrifice time). The log2 fold change values were not further normalized before being processed by our models. Each probe was mapped to a unique gene name identifier, resulting in 13,404 gene expression measurements per sample. From there, we reduced the number of genes analyzed to (1) discard genes unrelated to liver metabolism, drug administration, and toxicity, (2) simplify training of the gene expression prediction model (GEESE), and (3) simplify the post-hoc analysis. We selected genes based on two strategies to ensure a biologically diverse set. First, we included the T1000 gene set36, a set of 1,000 genes responsive to chemical exposures, from which we retrieved 867 genes. Second, we used a data-driven approach, where we computed the Pearson correlation between each measured gene expression and the slide-level lesion labels (as reported in TG-GATEs annotations) in the training studies. We then retained genes with a Pearson correlation larger than a threshold of 0.15. The threshold was chosen arbitrarily to include promising genes that may have a morphological correlate while keeping the total number of genes analyzed around 1,500. Combining the genes from both strategies yields a consolidated subset of 1,536 gene targets.
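
The two gene-selection strategies can be combined as a set union. The NumPy sketch below is illustrative only: the function name is hypothetical, and it assumes that a gene is kept when its signed Pearson correlation with at least one lesion label exceeds the threshold (the text does not say whether the maximum over lesions or an absolute value is used). Correlations are meant to be computed on training studies only, as described.

```python
import numpy as np


def select_gene_targets(expr, lesion_labels, gene_names, t1000_genes, threshold=0.15):
    """Union of (T1000 genes measured on the array) and (genes whose Pearson
    correlation with at least one slide-level lesion label exceeds threshold).

    expr:          (n_slides, n_genes) log2 fold changes from training studies.
    lesion_labels: (n_slides, n_lesions) binary slide-level lesion annotations.
    """
    x = np.asarray(expr, dtype=float)
    y = np.asarray(lesion_labels, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        # z-score each column; the mean product of z-scores is Pearson's r.
        xz = (x - x.mean(axis=0)) / x.std(axis=0)
        yz = (y - y.mean(axis=0)) / y.std(axis=0)
        corr = xz.T @ yz / x.shape[0]  # (n_genes, n_lesions)
    # Constant columns give NaN correlations; treat them as never selected.
    best = np.where(np.isnan(corr), -np.inf, corr).max(axis=1)
    data_driven = {g for g, r in zip(gene_names, best) if r > threshold}
    knowledge_driven = set(t1000_genes) & set(gene_names)  # T1000 genes present on the array
    return sorted(knowledge_driven | data_driven)


# Hypothetical toy data: 4 slides, 3 genes, 1 lesion label.
toy_expr = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]])
toy_labels = np.array([[1], [0], [1], [0]])
selected = select_gene_targets(toy_expr, toy_labels, ["g0", "g1", "g2"], ["g2", "gX"])
```

In the toy example, `g0` enters through the data-driven criterion, `g2` through the knowledge-driven list, and `gX` is dropped because it is not measured, mirroring how only 867 of the 1,000 T1000 genes were retrieved.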

Gene expression of in vitro human studies:

In vitro human experiments were conducted on primary human hepatocyte (PHH) cell lines using the Affymetrix human U133 Plus assay with 54,613 probes34,72. This assay was conducted on a subset of 140 compounds at three dose levels (low, medium, and high), with samples collected after 2 h, 8 h, and 24 h.

Dataset split:

To avoid compound-specific information leakage when training the gene expression regressor, we held out 29 studies for testing (N=2,002 slides) and kept 127 studies (8,232 slide-expression pairs) for training and validation. We further split these slides into a training set (N=6,585 slides) and a validation set (N=1,647 slides). Of the 6,585 samples in the training set, 1,141 (17%) were annotated as containing one or multiple lesions: 154 with increased mitosis (2.3%), 314 with necrosis (4.8%), 321 with cellular infiltration (4.9%), 54 with bile duct proliferation (0.8%), 143 with fatty change (2.2%), and 526 with hypertrophy (8.0%). Of the 2,002 samples in the test set, 364 were annotated as containing one or multiple lesions: 53 with increased mitosis (2.6%), 150 with necrosis (7.5%), 116 with cellular infiltration (5.8%), 21 with bile duct proliferation (1.0%), 47 with fatty change (2.3%), and 166 with hypertrophy (8.3%). The complete distribution of lesions in test studies is provided in table S6.
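
Holding out whole studies, rather than individual slides, is what prevents a compound from contributing slides to both partitions. A minimal NumPy sketch is shown below; choosing the held-out studies uniformly at random is an assumption (the text does not say how the 29 test studies were selected), the function name is hypothetical, and the subsequent train/validation split is not covered.

```python
import numpy as np


def split_by_study(study_ids, n_test_studies, seed=0):
    """Return (train_val_idx, test_idx) with whole studies held out for testing,
    so that no study contributes slides to both partitions."""
    rng = np.random.default_rng(seed)
    study_ids = np.asarray(study_ids)
    test_studies = rng.choice(np.unique(study_ids), size=n_test_studies, replace=False)
    is_test = np.isin(study_ids, test_studies)
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)


# Hypothetical example: 9 slides from 3 studies, one study held out.
ids = np.array(["s1"] * 3 + ["s2"] * 2 + ["s3"] * 4)
train_idx, test_idx = split_by_study(ids, 1)
```

With the real data, `n_test_studies=29` would reproduce the 29 held-out studies, although not necessarily the same ones.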

Deep learning modeling

All slides are preprocessed by first segmenting tissue regions and then tessellating the slide into patches (see Tissue segmentation and patching). Then, we train a vision encoder with self-supervised learning to derive a compressed representation of image patches (see iBOT pretraining). The learned patch embeddings serve as input to the expression regressor (GEESE) following the Multiple Instance Learning (MIL) paradigm38,39 (see Gene expression regressor). In addition, patch-level annotations are used to fine-tune the patch encoder and extract pseudo-lesion labels (see Lesion classifier).

Tissue segmentation and patching

Before MIL training, each slide was segmented using the CLAM toolbox39, which includes modules for automatic detection of tissue vs. background. After segmentation, non-overlapping 256×256-pixel patches were extracted at 20× magnification (0.5 μm/px) and then resized to 224×224-pixel image patches.
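
A simplified stand-in for the patching step (not CLAM's implementation) is sketched below: given a boolean tissue mask at 20× magnification, it keeps the top-left corners of non-overlapping 256×256 patches whose tissue fraction reaches a cut-off. The cut-off `min_tissue=0.5` and the function name are hypothetical; the subsequent resize to 224×224 pixels is not shown.

```python
import numpy as np


def patch_coordinates(tissue_mask, patch_size=256, min_tissue=0.5):
    """Top-left (x, y) corners of non-overlapping patch_size x patch_size patches
    whose fraction of tissue pixels is at least min_tissue.

    tissue_mask: boolean (H, W) array at the patching magnification (20x).
    """
    h, w = tissue_mask.shape
    coords = []
    for y in range(0, h - patch_size + 1, patch_size):      # stride equals patch size,
        for x in range(0, w - patch_size + 1, patch_size):  # so patches do not overlap
            if tissue_mask[y:y + patch_size, x:x + patch_size].mean() >= min_tissue:
                coords.append((x, y))
    return coords


# Hypothetical 512 x 768 mask with tissue in two patch-sized regions.
mask = np.zeros((512, 768), dtype=bool)
mask[:256, :256] = True
mask[256:, 512:] = True
coords = patch_coordinates(mask)
```

At 0.5 μm/px, each retained patch covers 128 μm × 128 μm of tissue.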

iBOT pretraining

We employed the iBOT framework46, a state-of-the-art self-supervised learning approach, to build compressed morphological descriptors (patch embeddings) of image patches. iBOT employs a student-teacher knowledge distillation strategy designed for pretraining Vision Transformers (ViT)45. iBOT uses two main objectives: a self-distillation loss47,73, which aims to align the representations of a student and a teacher network, and a masked image modeling loss74, which aims to reconstruct the original image from partially observed inputs. We trained a ViT-Base model, which yields 768-dimensional embeddings, on 15 million patches extracted from 46,734 WSIs. We trained the network for 1,176,640 iterations (80 epochs). The specific hyperparameters used for training are listed in table S15.
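
The self-distillation objective can be illustrated with a small NumPy sketch of the DINO-style cross-entropy between a centered, sharpened teacher distribution and the student distribution, plus the exponential moving average used for the teacher and the center. The temperatures (0.1 for the student, 0.04 for the teacher) are typical DINO-style defaults assumed here, not the values of table S15; the masked image modeling term, multi-crop augmentation, and projection heads are omitted, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(z, temperature):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def self_distillation_loss(student_logits, teacher_logits, center,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution and the
    student distribution, averaged over the batch. During training the teacher
    side is a constant target (no gradient flows through it)."""
    p_teacher = np.exp(log_softmax(teacher_logits - center, teacher_temp))
    log_p_student = log_softmax(student_logits, student_temp)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


def ema(old, new, momentum):
    """Exponential moving average, used both for the teacher weights (updated from
    the student) and for the center (updated from the mean teacher output)."""
    return momentum * old + (1.0 - momentum) * new


# Hypothetical batch of 4 samples with 8 output prototypes.
rng = np.random.default_rng(0)
student, teacher = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
loss = self_distillation_loss(student, teacher, center=np.zeros(8))
new_center = ema(np.zeros(8), teacher.mean(axis=0), momentum=0.9)
```

The lower teacher temperature sharpens the targets, and centering discourages one prototype from dominating; together they are the usual guard against representation collapse in this family of methods.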

Gene expression regressor (GEESE)

We cast gene expression prediction as a weakly supervised regression task, where we learn a pooling function to aggregate the iBOT patch embeddings into a slide-level gene expression prediction. We propose the gene expression regressor, denoted as GEESE, which enables joint derivation of patch and slide prediction scores using slide-level supervision only. Namely, each patch embedding, denoted as $h_i \in \mathbb{R}^d$, is passed to a patch regressor network $f(\cdot)$. The slide-level prediction is then obtained by taking the arithmetic mean over all $N$ patch-level regression scores. Formally, we define it as:

$$s = \frac{1}{N}\sum_{i=1}^{N} s_i = \frac{1}{N}\sum_{i=1}^{N} f(h_i), \tag{1}$$

where $s \in \mathbb{R}^{N_G}$ denotes the slide-level log2 fold change gene expression scores, with $N_G$ the number of gene targets. As the slide prediction is directly defined as the mean of the individual patch contributions, denoted as $s_i \in \mathbb{R}^{N_G}$, the gene-wise patch importance can be readily obtained without the need to analyze attention scores39 or gradient attributions33,75. The resulting patch attribution $s_i$ can, therefore, be seen as a pseudo-spatially resolved gene expression, where the resolution is given by the patch size (for patches of 256×256 pixels at 0.5 μm/px, the resolution is 128 μm). Here, $f(\cdot)$ is implemented as a 4-layer MLP patch regressor with LayerNorm, dropout (0.1) between all layers, and GELU activations (see table S16). This formulation has connections with AdditiveMIL76.
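
Eq. (1) can be sketched as a NumPy forward pass to make the additive structure explicit. Several details are assumptions not stated in the text: a hidden width of 256, LayerNorm followed by GELU (tanh approximation) after each hidden layer, LayerNorm without learnable parameters, and random weights. The actual configuration is in table S16, the function names are hypothetical, and training (MSE loss, AdamW) is not shown.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # LayerNorm without the learnable scale/shift, for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def init_mlp(dims, seed=0):
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def patch_regressor(h, layers):
    """f(.): maps patch embeddings (N, d) to per-gene patch scores (N, N_G)."""
    x = h
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:      # hidden layers only
            x = gelu(layer_norm(x))  # dropout (0.1) would follow here during training
    return x


def geese_forward(h, layers):
    s_i = patch_regressor(h, layers)  # pseudo-spatially resolved expression, one row per patch
    s = s_i.mean(axis=0)              # Eq. (1): slide prediction is the mean of patch scores
    return s, s_i


# 4 linear layers: 768-d iBOT embeddings -> 1,536 gene targets.
layers = init_mlp([768, 256, 256, 256, 1536])
h = np.random.default_rng(1).normal(size=(10, 768))
s, s_i = geese_forward(h, layers)
```

Because each patch score depends only on its own embedding, scoring any subset of patches returns the same per-patch values, which is what makes the patch-level maps directly interpretable. With attention-based pooling, by contrast, a patch's contribution to the slide prediction depends on the other patches in the slide.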

Lesion classifier

We used the iBOT model as a foundation for classifying six common liver lesions at the patch level (each patch is 256×256 pixels, or 128 μm). As the TG-GATEs cohort only includes slide-level labels, we curated a set of patch annotations using four complementary approaches. (1) Public annotations: we used publicly available annotations provided by Bayer Pharmaceuticals and Aignostics GmbH (https://zenodo.org/record/7541930). These consist of polygonal annotations in 230 whole slide images from TG-GATEs, which we converted into patch annotations by retaining patches based on their overlap with the annotated polygons. (2) Human-in-the-loop annotation: candidate patches were proposed by a weakly supervised slide classification system, and a subsequent manual review selected the true positive examples. (3) Normal patches: to include normal tissue, we extracted ten random patches from lesion-free slides and thoroughly examined each of them to exclude small lesions such as mitosis or single-cell necrosis. (4) Manual annotation: human annotations were performed using the QuPath software77 to capture lesions missing from the other sources, such as fatty change. This process yielded 24,631 patch annotations with lesions from 3,458 slides and 13,888 normal patch annotations from 3,531 slides (see table S3).

The pretrained iBOT vision encoder was fine-tuned on these annotations to minimize a multilabel binary cross-entropy loss over all six classes. Note that each patch can either be normal (no lesion detected) or include one or multiple lesions. We used a class-stratified 80/20 train/validation split. The network was fine-tuned for 20 epochs using the AdamW optimizer with an initial learning rate of 4e-4 and a layer-wise learning rate decay of 0.65. Basic patch augmentations (random color jittering, mirroring, and rotation) were applied during fine-tuning. The lesion classifier provides the likelihood that each patch contains each of the six lesions (expressed as a probability after sigmoid activation). To ensure that we only include positive patches, we used conservative classification thresholds set to 0.95 for cellular infiltration, 0.9 for necrosis, 0.9 for bile duct proliferation, 0.99 for fatty change, and 0.9 for increased mitosis. Each threshold was determined using an independent set of patches for each lesion.
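
Two details of the fine-tuning and inference setup can be sketched together: per-class thresholding of sigmoid outputs into multilabel patch labels, and a layer-wise learning rate decay schedule. Assumptions: the threshold test uses `>=`, the decay follows the common BEiT-style rule (the head keeps the base rate and each step toward the input multiplies it by the decay factor), and the encoder has the 12 transformer blocks of a standard ViT-Base. The hypertrophy threshold is not given in this section, so it is left out; all names are hypothetical.

```python
import numpy as np

# Post-sigmoid thresholds stated in the text; the hypertrophy threshold is not
# given in this section and is deliberately omitted.
THRESHOLDS = {
    "cellular_infiltration": 0.95,
    "necrosis": 0.9,
    "bile_duct_proliferation": 0.9,
    "fatty_change": 0.99,
    "increased_mitosis": 0.9,
}


def assign_lesions(logits, classes, thresholds=THRESHOLDS):
    """Multilabel decision per patch: keep every lesion whose sigmoid probability
    reaches its threshold; an empty list means the patch is treated as normal."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [[c for c, p in zip(classes, row) if c in thresholds and p >= thresholds[c]]
            for row in probs]


def layerwise_lrs(base_lr=4e-4, decay=0.65, n_blocks=12):
    """Learning rate per depth: depth 0 is the patch embedding, depths 1..n_blocks
    are transformer blocks, and depth n_blocks + 1 is the classification head."""
    return [base_lr * decay ** (n_blocks + 1 - depth) for depth in range(n_blocks + 2)]


labels = assign_lesions([[10.0, -10.0], [-10.0, -10.0]], ["necrosis", "fatty_change"])
lrs = layerwise_lrs()
```

The decay keeps early, more generic layers close to their pretrained weights while letting the later layers and the head adapt to the lesion classes.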

Post-hoc morphomolecular signature analysis

Using GEESE, we inferred the patch-level pseudo-expression on the 2,002 slides of the TG-GATEs test set. We proceeded analogously to extract patch-level lesion predictions using our morphological lesion classifier. Each patch is thus assigned 1,536 gene expression scores (expressed as log2 fold changes) and one or more morphological labels, such as necrosis, cellular infiltration, and fatty change, or the label normal if no lesion was detected. In total, this operation yielded 25 million lesion-expression pairs, all of which were used for the downstream quantitative analyses.

Gene identification

We present the detailed steps of the post-hoc analysis for necrosis; a similar process is conducted for the other five lesions. We start by selecting compounds associated with necrosis (for instance, we selected 4 of the 29 studies in the TG-GATEs test set; see table S5). Out of the 25 million lesion-expression pairs, we then selected patches that contain necrosis in the selected studies, yielding 53,542 patches (fig. S13). Using the patch-level expression of the selected patches, we extract the most upregulated and most downregulated genes out of the 1,536 gene targets. Specifically, we compute the average patch expression per selected compound and further average across all selected compounds. We then extract genes with a mean expression above 1 log2 fold change (upregulated genes) and below −1 (downregulated genes). We proceed similarly for other lesions, using thresholds of 0.5 and −0.5 (instead of 1 and −1) for increased mitosis, fatty change, cellular infiltration, and hypertrophy. These thresholds were set arbitrarily so that a pool of genes could be identified for further analysis and validation; varying them controls the number of genes retained for additional investigation.
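
The two-level averaging (per compound, then across compounds) can be sketched with a pandas `groupby`. The table layout (one row per lesion-positive patch, a `compound` column, and one column per gene) and the function name are assumptions; the toy values are hypothetical.

```python
import pandas as pd


def up_down_genes(patches: pd.DataFrame, gene_cols, threshold=1.0):
    """patches: one row per patch predicted to contain the lesion of interest, with a
    'compound' column and one column per gene (predicted log2 fold change).
    Returns (upregulated, downregulated) gene lists."""
    # Per-compound averaging gives each compound equal weight in the final mean.
    per_compound = patches.groupby("compound")[list(gene_cols)].mean()
    overall = per_compound.mean(axis=0)
    up = overall[overall > threshold].index.tolist()
    down = overall[overall < -threshold].index.tolist()
    return up, down


# Hypothetical toy data: compound A has two necrotic patches, compound B one.
toy = pd.DataFrame({"compound": ["A", "A", "B"],
                    "ATF3": [2.0, 2.0, 0.5],
                    "MAFF": [0.0, 0.0, -3.0]})
up, down = up_down_genes(toy, ["ATF3", "MAFF"])
```

The ordering matters: pooling the three patches directly would give MAFF a mean of exactly −1.0, which fails the strict `< -1` test, whereas averaging per compound first yields −1.5. This is presumably why compounds are averaged before patches are pooled, so that compounds with many lesion patches do not dominate.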

We refine the gene selection as a final step to focus on genes that are only differentially expressed for a single lesion (e.g., CCNA2 is only upregulated in the presence of mitosis, whereas ABCC3 is associated with multiple lesions, such as necrosis, cellular infiltration, and fatty change). A gene is selected as specific to necrosis if it satisfies the following three conditions simultaneously: (1) no other lesion is more differentially expressed than necrosis, (2) its absolute average expression across the five other lesions is at least two times lower, and (3) the absolute Pearson correlation between measured and predicted slide-level gene expressions, for slides identified as containing necrosis, is above 0.3. Formally, we express these three conditions as,

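$$\text{Condition-(1):}\quad \left|\text{GeneExpression}_{\text{necrosis}}\right| > c \times \max\left(\left|\text{GeneExpression}_{\text{other\_lesions}}\right|\right) \tag{2}$$
$$\text{Condition-(2):}\quad \left|\text{GeneExpression}_{\text{necrosis}}\right| > 2 \times \operatorname{mean}\left(\left|\text{GeneExpression}_{\text{other\_lesions}}\right|\right) \tag{3}$$
$$\text{Condition-(3):}\quad \left|\operatorname{Pearson}(\text{measured}, \text{predicted})\right| > 0.3 \tag{4}$$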

where the constant c is set to 1.5. Intuitively, Condition-(1) screens an initial subset of genes associated with necrosis. Condition-(2) refines this subset, with the factor set arbitrarily to 2, to identify the genes most strongly associated with necrosis while removing those primarily indicative of generic toxic exposure effects. Condition-(3) ensures that only genes that GEESE can satisfactorily predict from tissue morphology are retained. We follow the same process for the other lesions, except for cellular infiltration, for which the first condition is relaxed such that no other lesion is more differentially expressed than 75% of its value (c=0.75). This relaxation reflects the fact that cellular infiltration often co-occurs with necrosis and bile duct proliferation. For the same reason, in the analysis of bile duct proliferation and necrosis, we exclude cellular infiltration from Condition-(1) and Condition-(2).
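
The three conditions can be transcribed into a single per-gene check. This sketch assumes that absolute values are compared in Conditions (1) and (2), as the text's "absolute average expression" suggests, and that Condition (2) uses the mean of the absolute values over the remaining lesions; the function name, argument layout, and toy values are hypothetical.

```python
import numpy as np


def is_lesion_specific(mean_expr, target, pearson_target, c=1.5,
                       factor=2.0, min_abs_pearson=0.3, exclude=()):
    """Check Conditions (1) to (3) for one gene.

    mean_expr:      dict lesion -> average predicted log2 fold change of the gene
                    in patches containing that lesion.
    pearson_target: Pearson r between measured and predicted slide-level expression
                    on slides containing the target lesion.
    exclude:        lesions left out of Conditions (1) and (2).
    """
    target_abs = abs(mean_expr[target])
    others = np.array([abs(v) for lesion, v in mean_expr.items()
                       if lesion != target and lesion not in exclude])
    cond1 = target_abs > c * others.max()          # Condition (1)
    cond2 = target_abs > factor * others.mean()    # Condition (2)
    cond3 = abs(pearson_target) > min_abs_pearson  # Condition (3)
    return bool(cond1 and cond2 and cond3)


# Hypothetical mean expressions of one gene across the six lesions.
expr = {"necrosis": 2.0, "increased_mitosis": 0.5, "fatty_change": -0.4,
        "hypertrophy": 0.3, "cellular_infiltration": 1.8, "bile_duct_proliferation": 0.2}
specific = is_lesion_specific(expr, "necrosis", 0.5, exclude=("cellular_infiltration",))
```

In the toy example, the gene passes for necrosis only when cellular infiltration is excluded from the comparison, which is exactly the situation the exclusion rule is meant to handle.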

Pathway analysis

To validate our findings, we identified the biological pathways related to the previously identified genes. The goal of this analysis is two-fold: (1) to confirm the biological relevance of the identified molecular signatures and (2) to identify new biomarkers that have previously been poorly explored and characterized. To this end, we used the Rat Genome Database11,55, a state-of-the-art public resource for multi-species pathway enrichment analysis that includes both rat and human pathways. Specifically, we employed the Multi-Ontology Enrichment Tool (MOET), available within the Rat Genome Database, to identify the most relevant biological processes based on the Gene Ontology database. For this analysis, overlaps were computed on the child terms of the term biological process (GO:0008150), which include 20,292 process sets for rats and 19,761 process sets for humans. This allowed us to identify the biological pathways in which the genes from each lesion's gene list are most involved.
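
Over-representation tools commonly score a term with a one-sided hypergeometric test; the sketch below assumes, without having checked MOET's documentation, that its p-values are of this form. It uses only the standard library. The background size and term sizes behind the reported p-values are not given in the text, so this sketch cannot reproduce them, and multiple-testing correction is not shown. The function name is hypothetical.

```python
from math import comb


def enrichment_p_value(k, n_list, k_term, n_background):
    """One-sided hypergeometric p-value P(X >= k): the chance that at least k of
    n_list genes fall in a term annotating k_term of n_background genes."""
    upper = min(n_list, k_term)
    hits = sum(comb(k_term, i) * comb(n_background - k_term, n_list - i)
               for i in range(k, upper + 1))
    return hits / comb(n_background, n_list)  # exact integer arithmetic until here


# Hypothetical example: all 3 listed genes fall in a term covering 5 of 10 genes.
p = enrichment_p_value(3, 3, 5, 10)
```

For a lesion gene list, `k` would be the number of listed genes annotated to a GO process (for example, 40 of the 51 mitosis genes for the cell cycle process) and `n_background` the number of annotated genes in the reference set.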

Evaluation and implementation

Training details

All weakly supervised expression regression models were trained using the AdamW optimizer with an initial learning rate of 1e-04 and a mean squared error objective, for a maximum of 40 epochs with early stopping on the validation loss (patience of 10 epochs).
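
The stopping rule can be sketched as a small training-loop helper. The callback interface and function name are hypothetical; the sketch only tracks the best validation loss, and restoring the best checkpoint at the end is a common practice that the text does not state.

```python
def train_with_early_stopping(train_one_epoch, validate, max_epochs=40, patience=10):
    """Run up to max_epochs; stop once the validation loss has not improved for
    `patience` consecutive epochs. Returns (best_epoch, best_val_loss)."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        val_loss = validate(epoch)
        if val_loss < best_loss:
            best_loss, best_epoch, waited = val_loss, epoch, 0
            # A checkpoint of the model weights would typically be saved here.
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss


# Hypothetical validation curve: improves for 6 epochs, then plateaus.
losses = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5] + [1.0] * 34
epochs_run = []
best = train_with_early_stopping(epochs_run.append, lambda e: losses[e])
```

With this curve, training stops after epoch 15 (ten epochs without improvement), well before the 40-epoch cap.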

Metrics

GEESE predictive performance is evaluated using the Pearson correlation, Area under the ROC Curve (AUC), log2 fold change, R2, and Mean Squared Error.

Pearson correlation

Pearson correlation describes the linear relationship between two sets of scalars. It varies between −1 and +1, with 0 implying no linear correlation. The corresponding two-sided p-value tests the null hypothesis of no correlation. We employ the pearsonr implementation from the Python package SciPy version 1.13.0.
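
A per-gene evaluation with SciPy's `pearsonr` can be sketched as follows, assuming one correlation per gene computed across slides (rows) between measured and predicted log2 fold changes; the function name and toy values are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def per_gene_pearson(measured, predicted):
    """Pearson r and two-sided p-value for each gene (column) across slides (rows)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    results = []
    for g in range(measured.shape[1]):
        r, p = pearsonr(measured[:, g], predicted[:, g])
        results.append((float(r), float(p)))
    return results


# Hypothetical 3 slides x 2 genes: gene 0 perfectly correlated, gene 1 anti-correlated.
res = per_gene_pearson([[1, 2], [2, 1], [3, 0]], [[2, 0], [4, 1], [6, 2]])
```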

AUC

AUC is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate as the classification threshold is varied. This metric is mainly used for classification tasks but can be adapted to regression by measuring how often pairs of samples are ordered consistently by the predicted and the true values. Formally, we define AUC for regression as,

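$$\mathrm{AUC} = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=i+1}^{n} \mathbb{1}\{y_{\text{true}}[i] \neq y_{\text{true}}[j]\}\left(\mathbb{1}\{\operatorname{sign}(y_{\text{true}}[i]-y_{\text{true}}[j]) = \operatorname{sign}(y_{\text{pred}}[i]-y_{\text{pred}}[j])\} + \tfrac{1}{2}\,\mathbb{1}\{y_{\text{pred}}[i] = y_{\text{pred}}[j]\}\right) \tag{5}$$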

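where $y_{\text{true}} \in \mathbb{R}^n$ and $y_{\text{pred}} \in \mathbb{R}^n$ are vectors with the true and predicted regression values, and N is the number of valid pairs (i, j), that is, pairs with $y_{\text{true}}[i] \neq y_{\text{true}}[j]$.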
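
Eq. (5) translates directly into a vectorized NumPy function over all pairs i < j. This is a sketch with a hypothetical function name; it returns NaN (with a runtime warning) if all true values are tied, since N is then zero.

```python
import numpy as np


def regression_auc(y_true, y_pred):
    """Pairwise-concordance AUC for regression, following Eq. (5)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    i, j = np.triu_indices(len(y_true), k=1)  # all pairs with i < j
    valid = y_true[i] != y_true[j]            # tied targets carry no ordering information
    dt = np.sign(y_true[i] - y_true[j])[valid]
    dp = np.sign(y_pred[i] - y_pred[j])[valid]
    # Concordant pairs count 1, tied predictions count 1/2, discordant pairs 0.
    score = (dt == dp) + 0.5 * (dp == 0)
    return float(score.sum() / valid.sum())


perfect = regression_auc([1, 2, 3], [10, 20, 30])
constant = regression_auc([1, 2, 3], [5, 5, 5])
```

Under this definition, the regression AUC coincides with Harrell's concordance index without censoring: 1 for a perfect ranking, 0.5 for uninformative (for example, constant) predictions, and 0 for a fully reversed ranking.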

Log2 fold change

Log2 fold change is a measurement commonly used to quantify the relative change between two experimental conditions. It is calculated by taking the base 2 logarithm of the ratio between the percentage of a certain lesion under some conditions (such as high dose, sacrifice time of 29 days) and the percentage of that same lesion in the control group.
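
The lesion-level log2 fold change reduces to one line. The optional pseudocount is an assumption added for illustration: the text does not say how a control group with 0% prevalence (an undefined ratio) is handled, so the default of 0.0 matches the formula as written. The function name is hypothetical.

```python
import math


def lesion_log2_fold_change(pct_condition, pct_control, pseudocount=0.0):
    """log2 of the ratio between a lesion's prevalence under a condition and in controls.

    A positive pseudocount keeps the ratio finite when the control prevalence is 0."""
    return math.log2((pct_condition + pseudocount) / (pct_control + pseudocount))


# Hypothetical example: 8% of samples affected at high dose vs 2% in controls.
lfc = lesion_log2_fold_change(8.0, 2.0)
```

A value of 2 means the lesion is four times more frequent under the condition than in the control group.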

R2

R2 is a metric used to assess the quality of a regression model: 1 indicates a perfect fit, 0 corresponds to a model that always predicts the mean of the observed values, and negative values indicate an even worse fit. R2 measures the goodness of fit and represents the proportion of variance in the dependent variable that is explained by the model. We employ the metrics.r2_score implementation from the Python package scikit-learn version 1.2.1.

Mean Squared Error (Standardized)

The standardized mean squared error computes the average of the squared errors and quantifies how far model predictions are from the measured values. Before computing it, the measured and predicted gene expression values were standardized by subtracting the mean and dividing by the standard deviation of the measured gene expression, which makes the metric comparable across genes. We employ the metrics.mean_squared_error implementation from the Python package scikit-learn version 1.2.1.
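
The standardization can be sketched with scikit-learn's `mean_squared_error`. Using the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the text does not specify it; the function name and toy values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def standardized_mse(measured, predicted):
    """MSE after z-scoring both vectors with the mean and standard deviation
    of the measured values (population std, ddof=0)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mu, sigma = measured.mean(), measured.std()
    return mean_squared_error((measured - mu) / sigma, (predicted - mu) / sigma)


# Usage on hypothetical values for one gene.
y, y_hat = [0.2, -1.0, 3.1, 0.7], [0.0, -0.5, 2.0, 1.0]
smse, r2 = standardized_mse(y, y_hat), r2_score(y, y_hat)
```

Because shifting and scaling both vectors by the measured statistics divides the squared errors by the measured variance, this quantity equals 1 − R² for the same gene under the `ddof=0` assumption, so the two metrics carry the same information per gene.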

Statistical analysis

The reported error bars correspond to 95% confidence intervals derived by non-parametric bootstrapping with 100 bootstrap iterations.
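
A minimal percentile-bootstrap sketch follows. Two points are assumptions not stated in the text: paired (true, predicted) values are resampled with replacement at the level of individual samples, and the interval is taken from the 2.5th and 97.5th percentiles of the bootstrap distribution. The function name is hypothetical; `metric` can be any of the metrics above.

```python
import numpy as np


def bootstrap_ci(metric, y_true, y_pred, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)


def mean_error(t, p):
    return float(np.mean(t - p))


ci_constant = bootstrap_ci(mean_error, np.ones(50), np.zeros(50))
ci_noisy = bootstrap_ci(mean_error, np.random.default_rng(0).normal(size=50), np.zeros(50))
```

With only 100 iterations, the extreme percentiles rest on a few resamples each, so the interval bounds themselves vary noticeably with the random seed.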

Computing hardware and software

In this study, all coding was conducted using Python version 3.9. The neural networks were implemented with PyTorch version 2.1.0 with CUDA version 11.7. For whole slide image (WSI) pre-processing and manipulation, we used OpenSlide version 4.3.1 and openslide-python version 1.2.0. Metrics were implemented using scikit-learn version 1.2.1 and SciPy version 1.13.0. Data processing tasks were performed using Pandas version 1.4.2, NumPy version 1.21.5, Pillow version 9.3.0, and OpenCV-python version 3.3.1. Matplotlib version 3.7 was employed for generating plots. The training of the patch encoder was based on the original iBOT implementation, which is available at1. The pretraining of iBOT was carried out on 8 × 80GB NVIDIA A100 GPUs, configured for multi-GPU training using distributed data parallelism. Downstream experiments were conducted on 3 × 24GB NVIDIA RTX 3090 GPUs. Slide annotation and visualization were done using QuPath version 0.4.3. Finally, rat microarray probes were converted using SynGoPortal, accessible at2, and human microarray probes were converted using the Python API of PythonBio, version 1.83. The viewer used for the online demo is based on OpenSeadragon (version 4.1.0) and JavaScript (version ES13). The GO processes for rats and humans were queried using the MOET tool (Multi-Ontology Enrichment Tool) from the Rat Genome Database, accessible at3. Genes linked to chemical and drug-induced liver injury were retrieved using the Comparative Toxicogenomics Database, accessible at4.

Supplementary Material


Acknowledgements

This work was supported in part by BWH & MGH Pathology, BWH President’s Fund, Massachusetts Life Sciences Center, NIGMS R35GM138216 (F.M.), and BWH President’s Scholar fund (G.G.) and NIGMS R35GM149270 (G.G.). R.J.C. was also supported by the NSF Graduate Fellowship. L.O. was supported by the German Academic Exchange Service (DAAD) Fellowship. We thank Dr. Pierre Moulin for the early-stage discussions on toxicity assessment and computational toxicology.

Footnotes

Code availability

Upon publication, the authors will release code and pre-trained models for extracting patch-level embeddings and lesion classification, performing weakly-supervised gene expression regression, and analyzing gene expression predictions and lesion predictions.

Data availability

The TG-GATEs data, which include histopathology whole-slide images and labels, are openly accessible on the National Institute of Biomedical Innovation portal at5. A subset of 230 TG-GATEs slides with pixel annotations can be freely accessed from Zenodo at6. Patch annotations, as well as pseudo-patch annotations generated by the fine-tuned patch encoder, are available on a case-by-case basis, depending on specific needs. The microarray data, part of the Japanese Toxicogenomics Project, were obtained from the Toxygates portal, accessible at7.

References

  • 1. Cook D. et al. Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework. Nature Reviews Drug Discovery 13, 419–431 (2014).
  • 2. Waring M. J. et al. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nature Reviews Drug Discovery 14, 475–486 (2015).
  • 3. Weaver R. J. & Valentin J.-P. Today’s Challenges to De-Risk and Predict Drug Safety in Human “Mind-the-Gap”. Toxicological Sciences 167, 307–321 (2019).
  • 4. Seyhan A. A. Lost in translation: the valley of death across preclinical and clinical divide – identification of problems and overcoming obstacles. Translational Medicine Communications 4, 1–19 (2019).
  • 5. Waters M. D. & Fostel J. M. Toxicogenomics and systems toxicology: aims and prospects. Nature Reviews Genetics 5, 936–948 (2004).
  • 6. Hoeng J. et al. Hayes’ Principles and Methods of Toxicology, chap. Toxicopanomics: Applications of Genomics, Transcriptomics, Proteomics, and Lipidomics in Predictive Mechanistic Toxicology (2023).
  • 7. Huang R. et al. Modelling the Tox21 10K chemical profiles for in vivo toxicity prediction and mechanism characterization. Nature Communications 7, 1–10 (2016).
  • 8. Banerjee P., Eckert A. O., Schrey A. K. & Preissner R. ProTox-II: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Research 46, W257–W263 (2018).
  • 9. Pognan F. et al. The evolving role of investigative toxicology in the pharmaceutical industry. Nature Reviews Drug Discovery 22, 317–335 (2023).
  • 10. Ganter B., Snyder R. D., Halbert D. N. & Lee M. D. Toxicogenomics in drug discovery and development: mechanistic analysis of compound/class-dependent effects using the DrugMatrix® database. Pharmacogenomics (2006).
  • 11. Smith B. P. et al. Identification of early liver toxicity gene biomarkers using comparative supervised machine learning. Scientific Reports 10, 19128 (2020).
  • 12. Sutherland J. J. et al. Toxicogenomic module associations with pathogenesis: a network-based approach to understanding drug toxicity. Pharmacogenomics J. 18, 377–390 (2018).
  • 13. Podtelezhnikov A. A. et al. Quantitative Transcriptional Biomarkers of Xenobiotic Receptor Activation in Rat Liver for the Early Assessment of Drug Safety Liabilities. Toxicological Sciences 175, 98–112 (2020).
  • 14. Callegaro G. et al. Identifying multiscale translational safety biomarkers using a network-based systems approach. iScience 26 (2023).
  • 15. Davis A. P. et al. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Research 39, D1067–D1072 (2011).
  • 16. Davis A. P. et al. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Research 45, D972–D978 (2017).
  • 17. Davis A. P. et al. Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Research 51, D1257–D1262 (2023).
  • 18. LeCun Y., Bengio Y. & Hinton G. Deep learning. Nature 521, 436–444 (2015).
  • 19. Van der Laak J., Litjens G. & Ciompi F. Deep learning in histopathology: the path to the clinic. Nature Medicine 27, 775–784 (2021).
  • 20. Song A. H. et al. Artificial intelligence for digital and computational pathology. Nature Reviews Bioengineering 1, 930–949 (2023).
  • 21. Saldanha O. L. et al. Self-supervised attention-based deep learning for pan-cancer mutation prediction from histopathology. npj Precision Oncology 7, 35 (2023).
  • 22. Wagner S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell 41 (2023).
  • 23. Kather J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nature Cancer 1, 789–799 (2020).
  • 24. Echle A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. British Journal of Cancer 124, 686–696 (2021).
  • 25. Fu Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nature Cancer 1, 800–810 (2020).
  • 26. Wang S. et al. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning. European Respiratory Journal 53 (2019).
  • 27. Loeffler C. M. L. et al. Artificial intelligence–based detection of FGFR3 mutational status directly from routine histology in bladder cancer: A possible preselection for molecular testing? European Urology Focus 8, 472–479 (2022).
  • 28. Kather J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nature Medicine 25, 1054–1056 (2019).
  • 29. Schmauch B. et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nature Communications 11 (2020).
  • 30. El Nahhas O. S. et al. Regression-based deep-learning predicts molecular biomarkers from pathology slides. Nature Communications 15, 1253 (2024).
  • 31. He B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nature Biomedical Engineering 4, 827–834 (2020).
  • 32. Alsaafin A., Safarpoor A., Sikaroudi M., Hipp J. D. & Tizhoosh H. Learning to predict RNA sequence expressions from whole slide images with applications for search and classification. Communications Biology 6, 304 (2023).
  • 33. Chen R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).
  • 34. Igarashi Y. et al. Open TG-GATEs: a large-scale toxicogenomics database. Nucleic Acids Research 43, D921–D927 (2014).
  • 35. Nyström-Persson J. et al. Toxygates: interactive toxicity analysis on a hybrid microarray and linked data platform. Bioinformatics 29, 3080–3086 (2013).
  • 36. Soufan O. et al. T1000: A reduced toxicogenomics gene set for improved decision making. PeerJ (2019).
  • 37. Dietterich T. G., Lathrop R. H. & Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence 89, 31–71 (1997).
  • 38. Ilse M., Tomczak J. & Welling M. Attention-based deep multiple instance learning. In Proceedings of the 35th International Conference on Machine Learning, 2132–2141 (2018).
  • 39. Lu M. Y. et al. Data efficient and weakly supervised computational pathology on whole slide images. Nature Biomedical Engineering (2021).
  • 40. Deng J. et al. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (IEEE, 2009).
  • 41. Wang X. et al. TransPath: Transformer-based self-supervised learning for histopathological image classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention (2021).
  • 42. Lu M. et al. Towards a visual-language foundation model for computational pathology. Nature Medicine (2024).
  • 43. Chen R. J. et al. Towards a general-purpose foundation model for computational pathology. Nature Medicine (2024).
  • 44. Vaswani A. et al. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
  • 45. Dosovitskiy A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021).
  • 46. Zhou J. et al. iBOT: Image BERT pre-training with online tokenizer. In International Conference on Learning Representations (ICLR) (2022).
  • 47. Caron M. et al. Emerging properties in self-supervised vision transformers. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 9630–9640 (2021).
  • 48. He K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16000–16009 (2022).
  • 49. Lee D. H., Choi S., Kim H. J. & Chung S.-Y. Unsupervised visual representation learning via mutual information regularized assignment. Advances in Neural Information Processing Systems 35, 29610–29623 (2022).
  • 50. Goutam K., Ielasi F. S., Pardon E., Steyaert J. & Reyes N. Structural basis of sodium-dependent bile salt uptake into the liver. Nature 606, 1015–1020 (2022).
  • 51. Liao M. et al. Hepatic TNFRSF12A promotes bile acid-induced hepatocyte pyroptosis through NFκB/caspase-1/GSDMD signaling in cholestasis. Cell Death Discovery 9, 26 (2023).
  • 52. Xu Y. et al. Hepatocytic Activating Transcription Factor 3 Protects Against Steatohepatitis via Hepatocyte Nuclear Factor 4α. Diabetes 70, 2506 (2021).
  • 53. Vorrink S. U., Severson P. L., Kulak M. V., Futscher B. W. & Domann F. E. Hypoxia perturbs aryl hydrocarbon receptor signaling and CYP1A1 expression induced by PCB 126 in human skin and liver-derived cell lines. Toxicology and Applied Pharmacology 274, 408–416 (2014).
  • 54. Shimizu Y. et al. Association of CYP1A1 and CYP1B1 inhibition in in vitro assays with drug-induced liver injury. The Journal of Toxicological Sciences 46, 167–176 (2021).
  • 55. Vedi M. et al. 2022 updates to the Rat Genome Database: a Findable, Accessible, Interoperable, and Reusable (FAIR) resource. Genetics 224, iyad042 (2023).
  • 56. Santamaría D. et al. Cdk1 is sufficient to drive the mammalian cell cycle. Nature 448, 811–815 (2007).
  • 57. Jauhiainen A. et al. Distinct Cytoplasmic and Nuclear Functions of the Stress Induced Protein DDIT3/CHOP/GADD153. PLoS One 7, e33208 (2012).
  • 58. Yong J. et al. Chop/Ddit3 depletion in β cells alleviates ER stress and corrects hepatic steatosis in mice. Science Translational Medicine 13 (2021).
  • 59. Ajoolabady A. et al. Endoplasmic reticulum stress in liver diseases. Hepatology 77, 619 (2022).
  • 60. Boomgaarden I., Vock C., Klapper M. & Döring F. Comparative Analyses of Disease Risk Genes Belonging to the Acyl-CoA Synthetase Medium-Chain (ACSM) Family in Human Liver and Cell Lines. Biochem. Genet. 47, 739–748 (2009).
  • 61. van Diepen J. A. et al. PPAR-alpha dependent regulation of vanin-1 mediates hepatic lipid metabolism. Journal of Hepatology 61, 366–372 (2014).
  • 62. Godoy P. et al. Recent advances in 2D and 3D in vitro systems using primary hepatocytes, alternative hepatocyte sources and non-parenchymal liver cells and their use in investigating mechanisms of hepatotoxicity, cell signaling and ADME. Archives of Toxicology 87, 1315–1530 (2013).
  • 63. Hai T., Wolfgang C. D., Marsee D. K., Allen A. E. & Sivaprasad U. ATF3 and Stress Responses. Gene Expr. 7, 321 (1999).
  • 64. Wink S. et al. Quantitative High Content Imaging of Cellular Adaptive Stress Response Pathways in Toxicity for Chemical Safety Assessment. Chemical Research in Toxicology 27, 338–355 (2014).
  • 65. Costa E. C. et al. 3D tumor spheroids: an overview on the tools and techniques used for their analysis. Biotechnol. Adv. 34, 1427–1441 (2016).
  • 66. Zhang B., Korolj A., Lai B. F. L. & Radisic M. Advances in organ-on-a-chip engineering. Nature Reviews Materials 3, 257–278 (2018).
  • 67. Paul S. M. et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nature Reviews Drug Discovery 9, 203–214 (2010).
  • 68. DiMasi J. A., Grabowski H. G. & Hansen R. W. Innovation in the pharmaceutical industry: new estimates of R&D costs. Journal of Health Economics 47, 20–33 (2016).
  • 69. Moulin P. et al. IMI-Bigpicture: A central repository for digital pathology. Journal of Toxicologic Pathology (2021).
  • 70. Dix D. J. et al. The ToxCast Program for Prioritizing Toxicity Testing of Environmental Chemicals. Toxicological Sciences 95, 5–12 (2007).
  • 71. Sanz F. et al. eTRANSAFE: data science to empower translational safety assessment. Nature Reviews Drug Discovery 22 (2023).
  • 72. Castell J. V., Jover R., Martínez-Jiménez C. P. & Gómez-Lechón M. J. Hepatocyte cell lines: their use, scope and limitations in drug metabolism studies. Expert Opinion on Drug Metabolism & Toxicology 2, 183–212 (2006).
  • 73. Grill J.-B. et al. Bootstrap your own latent - a new approach to self-supervised learning. In Larochelle H., Ranzato M., Hadsell R., Balcan M. & Lin H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 21271–21284 (Curran Associates, Inc., 2020).
  • 74. Devlin J., Chang M.-W., Lee K. & Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (2019).
  • 75. Jaume G. et al. Quantifying explainers of graph neural networks in computational pathology. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8106–8116 (2021).
  • 76. Javed S. A. et al. Additive MIL: Intrinsically interpretable multiple instance learning for pathology. In Advances in Neural Information Processing Systems (NeurIPS) (2022).
  • 77. Bankhead P. et al. QuPath: Open source software for digital pathology image analysis. Scientific Reports 7 (2017).
  • 78. Thoolen B. et al. Proliferative and nonproliferative lesions of the rat and mouse hepatobiliary system. Toxicologic Pathology 38, 5S–81S (2010).



Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
