Abstract
The rapid expansion of chemical diversity presents substantial challenges for health and environmental risk assessment, necessitating the development of alternative, high-throughput computational methodologies. A key hurdle in toxicity prediction lies in the heterogeneous nature of adverse health outcomes at the tissue and cellular levels, as biological processes exhibit cell-type-specific and context-dependent responses. Effective prediction of individual-level health effects thus requires the integration of multimodal data capturing both the structural properties of chemicals and the biological perturbations they induce. We present GenotoxNet, a multimodal deep learning framework that enhances genotoxicity prediction by systematically integrating chemical structures, high-throughput in vitro assay data, and transcriptomics data. Through this multimodal integration, GenotoxNet captures cellular heterogeneity and mechanistic complexity, enabling a more comprehensive evaluation of chemical-induced genotoxicity. The model outperformed single-modality approaches, achieving an AUCROC of 0.891 ± 0.017 on the internal test set and demonstrating superior predictive capability over models relying solely on chemical structures or individual biological features. On an external set of 44 chemicals lacking verified labels, the model's predictions were partially corroborated by literature evidence. Beyond classification, GenotoxNet facilitates mechanistic interpretation by aligning multimodal feature representations of genotoxic chemicals with adverse outcome pathways (AOPs). This framework not only offers a robust approach for predicting genotoxicity but also supports the development of preventive strategies and regulatory decisions aimed at mitigating the health risks posed by hazardous chemicals.
Keywords: multimodal deep learning, systems biology, transcriptomics, adverse outcome pathways, genotoxicity
Graphical Abstract
Introduction
The rapid expansion of industrial and environmental chemicals has raised significant concerns regarding their potential health impacts. Traditional in vitro and in vivo toxicity assessments, while crucial, pose ethical and logistical challenges, particularly under the 3R principles (i.e. Reduce, Refine, and Replace animal testing) [1]. To address these limitations, in silico approaches such as Quantitative Structure–Activity Relationship (QSAR) modeling have been widely adopted for toxicity prediction [2]. QSAR models leverage structural similarities between chemicals to infer toxicological properties; however, they often fail to capture cell-type-specific and tissue-specific toxicological responses and are confounded by “activity cliffs” [3, 4], where structurally similar compounds exhibit markedly different toxic effects.
Genotoxicity, characterized by chemical-induced genetic damage, illustrates this problem: it encompasses a range of mechanisms beyond direct mutagenicity, including chromosomal breakage and nonteratogenic mutations. Traditional QSAR approaches have sought to address these complexities through extensive descriptor-based modeling, screening large numbers of molecular fingerprints and machine learning (ML) algorithms and selecting the best-performing combinations [5]. However, even sophisticated structure-based models struggle to distinguish between closely related chemicals with divergent toxicological outcomes. A notable example is methyl parathion (CAS No. 298-00-0) and parathion (CAS No. 56-38-2), which differ only in their O-alkyl substituents (methyl, -CH3, versus ethyl, -C2H5), yet only methyl parathion exhibits genotoxicity [6]. Such discrepancies underscore the limitations of structure-dependent prediction methods in capturing the cellular heterogeneity and multitarget interactions that drive systemic toxicity.
Beyond structural features, biological response data, such as transcriptomics and high-throughput bioassay data, offer deeper mechanistic insights by capturing cell-specific molecular perturbations following chemical exposure. This additional layer of information can help disentangle the complex interactions between chemicals and biological systems, providing a more nuanced understanding of toxicity mechanisms. For instance, Li et al. [7] constructed a large-scale toxicogenomic dataset comprising 6000 gene expression profiles from HepG2 cells exposed to 330 chemicals at multiple doses. Their random forest (RF) model achieved an area under the receiver operating characteristic curve (AUCROC) of 72.2% for carcinogenicity and 82.3% for genotoxicity, demonstrating that short-term transcriptomic responses can partially predict long-term toxicological outcomes. However, the effectiveness of transcriptomic and high-throughput screening (HTS) data is often constrained by experimental variability and by their inability to fully capture the Molecular Initiating Events (MIEs) that trigger toxicity cascades.
To enhance predictive accuracy and mechanistic interpretability, integrating chemical structure, gene expression, and bioassay data within a multimodal framework has emerged as a promising solution. Liu et al. [8] combined ToxCast bioactivity descriptors with ML algorithms to predict hepatotoxicity for 677 chemicals in the ToxRefDB database. However, their approach relied primarily on input-level data fusion, limiting the model’s ability to capture cross-modal relationships and deep feature interactions. Meanwhile, advances in artificial intelligence (AI), and in deep learning (DL) in particular, have shown great potential for improving prediction accuracy and broadening application scope, owing to the ability of DL to automatically extract high-level features from data and to its flexible model architectures [9–12]. Recent multimodal deep learning (MMDL) approaches address the limitations of input-level fusion by employing intermediate feature fusion, which allows models to learn complex cross-modal relationships while preserving modality-specific features [13, 14]. For instance, Yang et al. [15] developed GPDRP, a multimodal model for drug response prediction that integrates molecular graphs with gene pathway activity scores, while XGDP [16] incorporated cancer cell gene expression profiles to refine predictions. These approaches, based on graph neural networks (GNNs) and DL architectures, demonstrate the potential of cross-modal integration [17] for improving biological response modeling.
Building upon these advancements, we introduce GenotoxNet, an MMDL model designed to predict genotoxicity by integrating chemical structure features, transcriptomics data, and ToxCast HTS bioassays. By employing intermediate fusion, the model captures cellular heterogeneity, preserving critical biological interactions while enabling complex cross-modal feature extraction. GenotoxNet demonstrates superior predictive performance compared with recent QSAR and single-modality models, achieving state-of-the-art results across multiple evaluation metrics. To further enhance interpretability, we combine multimodal feature representations of typical genotoxic chemicals with adverse outcome pathway (AOP) analysis, bridging in silico predictions with mechanistic toxicology. This approach not only facilitates a deeper understanding of genotoxic mechanisms but also provides a robust framework for risk assessment, regulatory decision-making, and preventive strategies aimed at mitigating the health risks associated with hazardous chemicals.
Materials and methods
Data collection and processing
Data of genotoxic substances
The dataset used in this study was sourced from The Carcinogenome Project and includes 261 chemical structures with genotoxicity labels: 100 genotoxic and 161 nongenotoxic [7]. To ensure the validity of the molecular structures, each structure was inspected and cleaned based on its Simplified Molecular Input Line Entry Specification (SMILES). The open-source cheminformatics toolkits MolVS and RDKit [18] were used to standardize functional groups, handle hydrogen atoms properly, recalculate stereochemistry, remove covalently bound impurities, ensure correct dissociation of acids and bases, generate neutral forms of molecules, standardize tautomers, and remove salts, covalently bound metals, mixtures, and solvents.
Gene expression profiles of genotoxic substances
Gene expression profiles of the 261 substances were obtained from CRCGN_ABC (https://clue.io/data/CRCGN), which includes data for 978 landmark genes and 21 290 inferred genes [19]. Each substance in the dataset is associated with multiple gene expression profiles that vary with exposure dose and duration, and profiles were extracted relative to the corresponding control group. For more details of the experimental treatment, please refer to Text S1. To ensure data quality, only profiles derived from the HepG2 cell line were selected, prioritizing those with the highest transcriptional activity scores (TAS). TAS combines the intensity of transcriptional activity (based on the number of significantly differentially expressed genes) with transcriptional consistency (the reproducibility of gene transcription levels across parallel samples). For each chemical, the HepG2 profile with the highest TAS was retained as the raw gene dataset (referred to as rG), which contains the 978 L1000 landmark genes for all substances. A cleaned subset, G, was then derived by retaining the 295 genotoxicity-related genes annotated in The Carcinogenome Project (highclass_genotoxicity = POSITIVE), together with their moderated z-scores (MODZ). For each landmark gene and each unique perturbation (combination of compound and dose), MODZ expresses the number of standard deviations by which the expression of the gene deviates from the mean of its control-group distribution [7].
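The per-chemical profile selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Profile` record, its field names, and all numeric values are hypothetical toy data and do not reflect the actual CRCGN_ABC file layout. It only shows the rule of keeping the highest-TAS HepG2 profile for each chemical and then restricting that profile to a fixed gene list (rG to G).

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical record; real CRCGN_ABC metadata uses its own column names.
    chemical: str
    cell_line: str
    dose_um: float
    hours: int
    tas: float
    modz: dict = field(default_factory=dict)  # gene symbol -> MODZ value


def select_max_tas(profiles, cell_line="HepG2"):
    """Keep, for each chemical, the profile with the highest TAS in one cell line."""
    best = {}
    for p in profiles:
        if p.cell_line != cell_line:
            continue  # profiles from other cell lines are discarded outright
        if p.chemical not in best or p.tas > best[p.chemical].tas:
            best[p.chemical] = p
    return best


def restrict_to_genes(best, genes):
    """Turn each retained profile into a MODZ vector over a fixed gene order."""
    return {chem: [p.modz[g] for g in genes] for chem, p in best.items()}


# Toy values only: the MCF7 profile has the highest TAS but is ignored.
profiles = [
    Profile("chem_1", "HepG2", 10.0, 24, 0.31, {"CCNF": 1.2, "KDM3A": -0.8}),
    Profile("chem_1", "HepG2", 30.0, 24, 0.55, {"CCNF": 2.1, "KDM3A": -1.4}),
    Profile("chem_1", "MCF7", 30.0, 24, 0.90, {"CCNF": 0.3, "KDM3A": 0.1}),
]
G = restrict_to_genes(select_max_tas(profiles), ["CCNF", "KDM3A"])
```

Filtering on cell line before comparing TAS ensures that a higher-scoring profile from another cell line can never displace the HepG2 choice.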
Processing and integration of high-throughput screening data
High-throughput screening assay data were obtained from ToxCast, which provides binary response values for 9224 substances across 1473 in vitro bioassay endpoints. Endpoint categories were derived from the official ToxCast in vitro bioassay endpoint summary, as specified by the biological_process_target annotation. Based on prior studies of genotoxicity mechanisms [20–27], nine relevant categories of bioassay endpoints were selected: cell proliferation, detection of DNA damage, cell apoptosis, regulation of DNA repair, detection of DNA structure, regulation of morphogenesis, cell cycle, regulation of transcription factor activity, and cell death (Table S1). After selecting the assays within these categories, the final cleaned dataset (referred to as B) included binary response data for 279 substances across 434 bioassay endpoints. For comparative analysis, the raw dataset (rB), containing 941 bioassays, was also retained for model evaluation.
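The category-based endpoint selection can be sketched as follows. This is a hedged illustration, not the study's pipeline: the endpoint names (`assay_a`, and so on), the "receptor binding" category, and the active pairs are hypothetical. It also assumes that any substance–endpoint pair not called active is encoded as 0, which conflates "inactive" with "not tested".

```python
# The nine biological_process_target categories listed in the text.
SELECTED_CATEGORIES = {
    "cell proliferation", "detection of DNA damage", "cell apoptosis",
    "regulation of DNA repair", "detection of DNA structure",
    "regulation of morphogenesis", "cell cycle",
    "regulation of transcription factor activity", "cell death",
}


def select_endpoints(annotations):
    """annotations: endpoint name -> biological_process_target value."""
    return sorted(name for name, cat in annotations.items() if cat in SELECTED_CATEGORIES)


def binary_matrix(active_pairs, substances, endpoints):
    """Rows = substances, columns = endpoints; 1 if the pair was called active.

    Assumption: pairs absent from active_pairs are encoded as 0.
    """
    return [[1 if (s, e) in active_pairs else 0 for e in endpoints] for s in substances]


annotations = {
    "assay_a": "cell cycle",
    "assay_b": "receptor binding",  # hypothetical category outside the nine, so dropped
    "assay_c": "detection of DNA damage",
}
endpoints = select_endpoints(annotations)
B = binary_matrix({("chem_1", "assay_c")}, ["chem_1", "chem_2"], endpoints)
```

A real pipeline would probably keep untested pairs as missing values rather than zeros; the zero-fill here only keeps the sketch short.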
To construct an integrated dataset combining gene expression profiles with HTS data, the 244 substances shared across the data sources were selected, comprising 92 genotoxic and 152 nongenotoxic substances (Table S2). The proportions of genotoxic and nongenotoxic substances were kept consistent across all data subsets, and no substance was duplicated between subsets.
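Selecting the substances available in every modality amounts to a set intersection followed by a class count. The sketch below assumes each source is a mapping keyed by a shared substance identifier; the identifiers and values are hypothetical toy data, and real matching would first need harmonized identifiers (for example, CAS numbers or standardized SMILES) across sources.

```python
def integrate_modalities(labels, structures, expression, bioassay):
    """Keep substances present in every source; return sorted ids and class counts.

    labels: substance id -> 1 (genotoxic) or 0 (nongenotoxic).
    structures / expression / bioassay: mappings keyed by substance id.
    """
    shared = set(labels) & set(structures) & set(expression) & set(bioassay)
    ids = sorted(shared)
    n_genotoxic = sum(labels[i] for i in ids)
    return ids, n_genotoxic, len(ids) - n_genotoxic


labels = {"a": 1, "b": 0, "c": 0, "d": 1}
ids, n_pos, n_neg = integrate_modalities(
    labels,
    structures={"a": "CCO", "b": "CCN", "c": "CCCl", "d": "c1ccccc1"},
    expression={"a": [0.1], "b": [0.4], "d": [1.3]},  # "c" has no profile
    bioassay={"a": [1, 0], "b": [0, 0], "c": [0, 1], "d": [1, 1]},
)
```

Substance "c" drops out because it lacks an expression profile, which mirrors how the 244-substance set is smaller than any single source.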
Multimodal deep learning model architecture
In the molecular graph G = (V, E), V represents the set of all atoms (nodes) in the molecule, and E denotes the molecular bonds (edges) between them. Each node vi corresponds to atom i, and the edges indicate the connections between these atoms. Each molecule is characterized by a node feature matrix F of size n × m and an adjacency matrix A of size n × n, where n is the total number of atoms and m is the number of atomic features (set to 75). These atomic features include properties such as atom type and hybridization state (Table S3), with each feature and bond represented as a corresponding vector. The feature matrix F and the adjacency matrix A serve as inputs to the multimodal neural network. Each row of F is a binary vector of length 75, where fik is set to 1 if atom i possesses feature k and 0 otherwise. The adjacency matrix A encodes the bonding relationships between atoms: aij equals 1 if there is a chemical bond between atom i and atom j, and 0 otherwise. Further details on the molecular graph convolution process of GenotoxNet are provided in Text S2.
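The construction of F and A can be illustrated on a toy molecule. This sketch assumes a hypothetical 8-column feature vocabulary (five atom types plus three hybridization states) instead of the 75 features in Table S3, and it takes atoms and bonds as hand-written lists rather than parsing SMILES with a cheminformatics toolkit.

```python
# Hypothetical, reduced feature vocabulary (the model itself uses 75 features).
ATOM_TYPES = ["C", "N", "O", "S", "Cl"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3"]


def one_hot(value, vocab):
    """Binary indicator vector: 1 at the position of value, 0 elsewhere."""
    return [1 if value == v else 0 for v in vocab]


def build_graph_matrices(atoms, bonds):
    """atoms: list of (symbol, hybridization); bonds: list of (i, j) index pairs.

    Returns the n x m feature matrix F and the n x n adjacency matrix A.
    """
    F = [one_hot(sym, ATOM_TYPES) + one_hot(hyb, HYBRIDIZATIONS) for sym, hyb in atoms]
    n = len(atoms)
    A = [[0] * n for _ in range(n)]
    for i, j in bonds:
        A[i][j] = A[j][i] = 1  # bonds are undirected, so A is symmetric
    return F, A


# Ethanol (CH3-CH2-OH), heavy atoms only, hydrogens implicit.
F, A = build_graph_matrices([("C", "SP3"), ("C", "SP3"), ("O", "SP3")], [(0, 1), (1, 2)])
```

For ethanol, n = 3 and m = 8 here; A has ones only between C1–C2 and C2–O, and each row of F sets exactly one atom-type bit and one hybridization bit.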
The architecture of the MMDL model, GenotoxNet, is shown in Fig. 1. GenotoxNet consists of three graph convolution layers (GraphConv), two sets of fully connected (FC) layers, and three 2D convolution layers (2D_Conv). The feature matrix and adjacency matrix are processed through the three GraphConv layers, generating output feature matrices of sizes 48 × 256, 256 × 256, and 256 × 100, respectively. These outputs are then transformed into a feature vector of length 100 via a global maximum pooling layer (global_maxpooling). In parallel, gene expression profiles and ToxCast assay data are each processed by two tandem FC layers: the first generates a 256-dimensional feature vector, and the second applies the ReLU activation function to reduce its dimension to 100. The structure embedding and the two biological embeddings are subsequently concatenated and passed through a 300-dimensional FC layer. The three 2D_Conv layers operate on the fused feature vector, using convolution kernels of sizes 30/150, 10/5, and 5/5, each followed by a two-dimensional maximum pooling layer (2D_maxpooling) with pooling kernel sizes of 1 × 2, 1 × 3, and 1 × 3, respectively. Finally, an FC layer with a Sigmoid activation function serves as the output layer, compressing the result into the range [0, 1]. This output represents the probability that a substance is genotoxic and is converted into a binary label (genotoxic or nongenotoxic) using a specified threshold. Details of the evaluation metrics and the hyperparameter optimization process are provided in Text S3 and Text S4, respectively. Model construction and training were carried out on a graphics workstation equipped with an NVIDIA GeForce RTX 3090. More detailed information about the toolkit is provided in Text S5.
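The structure branch and the intermediate-fusion step can be sketched in plain Python. This is a simplified stand-in, not the GraphConv variant described in Text S2: it assumes a generic propagation rule (sum each atom's features with its neighbours' via A + I, apply a linear map W, then ReLU), uses toy 2-dimensional features and an identity weight matrix, and represents the gene and assay embeddings as hypothetical short vectors.

```python
def matmul(X, Y):
    """Plain-Python matrix product of X (p x q) and Y (q x r)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def relu(X):
    return [[max(0.0, v) for v in row] for row in X]


def graph_conv(A, H, W):
    """Simplified graph convolution: aggregate self + neighbours, project, ReLU."""
    n = len(A)
    A_self = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    return relu(matmul(matmul(A_self, H), W))


def global_max_pool(H):
    """Collapse an (atoms x features) matrix into one molecule-level vector."""
    return [max(col) for col in zip(*H)]


def fuse(structure_vec, gene_vec, assay_vec):
    """Intermediate fusion: concatenate the per-modality embeddings."""
    return structure_vec + gene_vec + assay_vec


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # three-atom chain
H = [[1, 0], [0, 1], [1, 1]]           # toy atom features
W = [[1, 0], [0, 1]]                   # identity weights, for readability
H1 = graph_conv(A, H, W)
mol_vec = global_max_pool(H1)
fused = fuse(mol_vec, [0.5], [1.0])    # hypothetical gene and assay embeddings
```

In GenotoxNet the fused vector would be 300-dimensional (three 100-dimensional embeddings) and would then pass through the 2D convolution and Sigmoid output layers, which this sketch omits.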
Figure 1.
Overview of the data processing workflow and architecture of GenotoxNet for genotoxicity prediction. (A) The workflow begins with data collection from three sources: the Carcinogenome project (chemical structure), CRCGN_ABC (gene expression data), and ToxCast (bioassay data). Feature extraction involves constructing an adjacency matrix and a feature matrix from molecular graphs, as well as selecting 295 genes associated with genotoxicity and 434 relevant bioassay endpoints. The hybrid architecture of GenotoxNet integrates molecular graph information processed by a GCN module with gene expression data and bioassay data processed through fully connected layers. The concatenated features are passed through a 2D CNN for classification, predicting genotoxic or nongenotoxic outcomes. (B) Schematic of the dataset split and 5-fold cross-validation (5-CV) used in the study.
Model training and evaluation
Optimal hyperparameters for GenotoxNet were selected via grid search with 5-fold cross-validation (5-CV), a procedure that reduces the variance associated with any single split. The hyperparameter set yielding the highest mean AUCROC across the five folds was retained (Table S4), and the fold with the highest validation AUCROC (the optimal fold) was selected. Then, to obtain stable performance estimates with limited data, 50% of the optimal validation fold was repeatedly drawn by stratified sampling to form an internal test set of five independent subsets, preserving class balance and enabling consistent evaluation (Fig. 1B). The final model was retrained using the optimal hyperparameters and the training set of the optimal fold. Classification performance was assessed using AUCROC, area under the precision-recall curve (AUCPRC), F1 score, and Matthews correlation coefficient (MCC), and is reported as the mean ± standard deviation over the five resamples.
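The stratified 50% resampling of the validation fold can be sketched as follows. Assumptions: indices within one subset are drawn without replacement, the five subsets are drawn independently (so they may overlap with each other), the fixed seed is arbitrary, and the toy fold composition is hypothetical.

```python
import random


def stratified_subsets(labels, n_subsets=5, fraction=0.5, seed=0):
    """Draw n_subsets stratified samples, each holding `fraction` of every class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    subsets = []
    for _ in range(n_subsets):
        chosen = []
        for idxs in by_class.values():
            k = round(len(idxs) * fraction)  # Python rounds exact .5 cases to even
            chosen.extend(rng.sample(idxs, k))  # without replacement within a subset
        subsets.append(sorted(chosen))
    return subsets


# Toy validation fold: 10 genotoxic (1) and 16 nongenotoxic (0) substances.
fold_labels = [1] * 10 + [0] * 16
subsets = stratified_subsets(fold_labels)
```

Sampling each class separately is what keeps the genotoxic/nongenotoxic ratio fixed in every subset; a plain random half of the fold would let that ratio drift.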
Model comparison
To benchmark genotoxicity classification, GenotoxNet was compared with ML and MMDL baselines under identical data splits and tuning protocols. Modalities are denoted S (chemical structure), G (gene expression), and B (bioassay). Two ML models using concatenated features (S + G + B) were evaluated: XGBoost and RF. The MMDL baseline GPDRP [15] was considered in its original two-modality form, which fuses molecular graphs with gene features (GPDRP-2M; S + G). For comparability in the three-modality setting, an augmented variant incorporating an additional bioassay branch with intermediate fusion consistent with the original design was implemented (GPDRP-3M; S + G + B). All baselines were trained and evaluated on the same datasets as GenotoxNet, with hyperparameters selected via grid search. Performance is reported as mean ± standard deviation across five stratified resamples.
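The reported metrics can be computed from predicted probabilities as sketched below. This is a minimal stdlib illustration, not the evaluation code from Text S3: AUCROC is computed as the probability that a random positive outranks a random negative (ties count 0.5), a decision threshold of 0.5 is assumed for F1 and MCC, AUCPRC is omitted for brevity, and the labels and scores are toy values.

```python
import math
import statistics


def auc_roc(y_true, scores):
    """AUCROC via pairwise ranking of positives against negatives."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_and_mcc(y_true, scores, threshold=0.5):
    """F1 and Matthews correlation coefficient after thresholding probabilities."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, mcc


def mean_sd(values):
    """Mean and sample standard deviation across resamples."""
    return statistics.mean(values), statistics.stdev(values)


y = [1, 1, 0, 0]
p = [0.9, 0.4, 0.6, 0.1]
auc = auc_roc(y, p)
f1, mcc = f1_and_mcc(y, p)
```

Reporting mean ± SD then reduces to calling `mean_sd` on the five per-resample values of each metric.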
Identification of key genes and adverse outcome pathway analysis
The absolute gradient values of genes, obtained from the FC layers corresponding to the gene expression profiles, were used to evaluate each gene's contribution to the model. A gene's gradient is the partial derivative of the predicted genotoxicity score with respect to that gene's expression feature. To identify key regulatory genes, the absolute gradient values of each of the 295 genes were summed across the 244 substances, and genes were ranked by this sum. Aggregating over all substances reduces the instability and chance effects of attributions computed for any single substance. Then, seven representative genotoxic substances (N-nitrosomorpholine, acrolein, methyl yellow, bromodichloromethane, 1,3-dichloropropene, furfural, and N-nitrosodiethylamine (NDEA)) were selected for analysis; these are pesticides and industrial chemicals, five of which are classified as Hazardous Air Pollutants (HAPs) by the U.S. Environmental Protection Agency (EPA). To systematically analyze the toxicological mechanisms of these substances, the AOP framework was used. AOPs provide a structured approach for describing how chemicals or environmental stressors initiate MIEs, which propagate through Key Events (KEs) and ultimately lead to Adverse Outcomes (AOs). To identify AOP events related to these genotoxic substances, AOP-helpFinder [28, 29] and AOP-Wiki (https://aopwiki.org) were used to retrieve relevant MIEs, KEs, and AOs. Additionally, the EPA AOP-DB [30, 31] and the EPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/) were consulted to extract complete AOP pathways (MIE → KE → AO) and to explore the relationships between AOPs and bioassay data. The process for linking genotoxic substances to AOP events and constructing mechanistic pathways is detailed in Text S6.
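The gradient-based gene ranking can be illustrated with a model simple enough to differentiate by hand. In GenotoxNet the gradients come from backpropagation through the full network; here, as an explicitly swapped-in stand-in, a one-layer logistic model p = sigmoid(w · x + b) is used, for which the partial derivative with respect to gene j is p(1 − p)w_j. Gene names, weights, and expression values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradients(x, w, b):
    """dp/dx_j for p = sigmoid(w . x + b), which equals p * (1 - p) * w_j."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return [p * (1.0 - p) * wj for wj in w]


def rank_genes(X, genes, w, b):
    """Sum absolute gradients over all substances, then sort genes by that sum."""
    totals = [0.0] * len(genes)
    for x in X:
        for j, g in enumerate(input_gradients(x, w, b)):
            totals[j] += abs(g)
    return sorted(zip(genes, totals), key=lambda t: t[1], reverse=True)


genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene names
w, b = [0.2, -1.5, 0.7], 0.0             # hypothetical trained weights
X = [[1.0, 0.5, -0.2], [0.0, 2.0, 1.0]]  # two substances x three genes (toy MODZ)
ranking = rank_genes(X, genes, w, b)
```

In this linear toy the ranking simply follows |w_j|, because p(1 − p) is shared by all genes within a substance; in a nonlinear network the gradients vary with each input, which is why summing over all substances gives a more stable ranking.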
Results and discussion
Analysis of datasets and comparison of substances features
Toxicological effects of small molecules are driven by their interactions with specific biological targets, which initiate downstream responses. Chemical structure defines these initial interactions, while HTS assay data and gene expression profiles capture subsequent pathway alterations and biological effects. Because each feature type provides information about a distinct stage of the toxicity pathway, their integration is crucial for enhancing prediction accuracy and mechanistic understanding. Structural similarity was assessed using 881-bit PubChem fingerprints and the Tanimoto coefficient; the mean pairwise similarity was 0.26. The pairwise similarity matrix for the 244 substances (Fig. 2A) shows predominantly low similarity (blue), with only a small subset of pairs showing higher similarity (red), consistent with substantial structural diversity and minimal redundancy in the dataset. To further quantify how the feature types differ, Pearson correlation coefficients (PCCs) were calculated among genotoxic and among nongenotoxic substances for each of the three feature types (Fig. 2B–D). Bioassay data showed the highest within-group correlations (genotoxic: 0.527 ± 0.201; nongenotoxic: 0.583 ± 0.208), chemical structure ranked second, and gene expression data showed the lowest (genotoxic: 0.199 ± 0.129; nongenotoxic: 0.154 ± 0.115). These differences indicate that each modality provides distinct but complementary information. The high bioassay correlations indicate that substances within each group share similar functional response profiles, structural features capture shared chemical properties, and the low gene expression correlations reflect the complexity and variability of transcriptional responses.
These findings suggest that while chemical structure and bioassays align well with known toxicological targets, gene expression provides unique mechanistic insights that may diverge from structure-based predictions.
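The two similarity measures used in this analysis can be sketched as follows. Assumptions: fingerprints are represented as sets of "on" bit indices (real 881-bit PubChem fingerprints would come from a cheminformatics toolkit), two empty fingerprints are assigned a similarity of 0 by convention, and within-group correlation is read here as the mean of pairwise PCCs between substance feature vectors. All inputs are toy values.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient for binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise(items, sim):
    """Average of sim over all unordered pairs of distinct items."""
    vals = [sim(items[i], items[j])
            for i in range(len(items)) for j in range(i + 1, len(items))]
    return sum(vals) / len(vals)


fps = [{1, 2}, {1, 2}, {3}]
mean_tanimoto = mean_pairwise(fps, tanimoto)
```

The same `mean_pairwise` helper applies to PCCs by passing `pearson` and a list of gene-expression or bioassay vectors for one class.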
Figure 2.
Distribution of datasets and feature correlations among genotoxic and nongenotoxic substances. (A) Heat map of pairwise Tanimoto similarity coefficients among substances. (B–D) 3D scatter plots depicting the relationships among chemical structure, gene expression, and bioassay responses across substance groups: genotoxic versus nongenotoxic substances (B), nongenotoxic substances (C), and genotoxic substances (D).
Further analysis revealed that structural correlations among genotoxic substances were relatively high but more variable, as reflected in the larger standard deviation (structure axis in Fig. 2B–D). This underscores the limitations of using structure alone for toxicity classification, as structurally similar compounds can exhibit markedly different toxicological profiles. To illustrate feature differences explicitly, we further analyzed three representative pairs of substances (Fig. 3). For example, chloroacetic acid (CAS No. 79-11-8) and dichloroacetic acid (CAS No. 79-43-6), two structurally similar disinfection by-products found in drinking water, exhibit opposite genotoxicity outcomes, a typical activity cliff. Despite their structural resemblance, their gene expression profiles and bioassay responses correlate weakly: only six differentially expressed genes overlap between the two compounds, and no bioassay endpoints are shared, indicating that structural similarity alone is insufficient for predicting genotoxic effects. A similar pattern is observed for HC Blue No. 2 (CAS No. 33229-34-4) and 2-(4-amino-2-nitroanilino)ethanol (CAS No. 2871-01-4), both hair dye ingredients sharing hundreds of structural features. However, their differentially expressed gene profiles show minimal overlap, suggesting distinct molecular mechanisms of action despite their structural similarity. Conversely, the structurally distinct substances tetrachlorobenzidine and thiram showed substantial overlap in gene expression and bioassay responses, suggesting common mechanistic pathways. These findings underscore the importance of integrating chemical structure, gene expression, and bioassay data: relying solely on structural similarity can lead to misleading toxicity predictions, whereas multimodal approaches provide a more comprehensive understanding of genotoxic mechanisms.
Figure 3.
Shared and distinct features across chemical structure, gene expression, and bioassay modalities for selected substance pairs. Genotoxic substances are marked in bold. Distinct markers represent the former and the latter of the substance pair, respectively.
Model performance and ablation analysis
To quantify the contribution of each input modality, models were trained with chemical structure (S), gene expression (G), and bioassay (B) data individually and in combination (S + G + B). Under the optimal hyperparameters, GenotoxNet (S + G + B) achieved a mean AUCROC of 0.898 ± 0.051 across the five validation folds (Table S5), indicating strong predictive performance that does not depend strongly on a specific data split. The best-performing fold was retained for subsequent training. Chemical-space coverage was examined by projecting S (881-bit PubChem fingerprints), G (295 genes), and B (434 endpoints) into two dimensions using t-distributed stochastic neighbor embedding (t-SNE). The optimal training and validation sets exhibited similar spatial patterns with only a few outliers (Fig. S1), suggesting that the test set lies largely within the model’s applicability domain. On the internal test set, GenotoxNet (S + G + B) achieved an AUCROC of 0.891 ± 0.017 and an AUCPRC of 0.874 ± 0.045 (Fig. 4A, S + G + B), with small standard deviations indicating robust generalization across resamples.
Figure 4.
Feature visualization and performance comparison for GenotoxNet. (A) Performance of models trained with various combinations of data types (S, G, and B) on the internal test set. S represents chemical structure; G represents cleaned gene expression data; B represents cleaned bioassay data. (B) Performance comparison of GenotoxNet with other ML/MMDL models on the internal test set. (C) t-SNE projections of chemical structure, gene expression profiles, and bioassay data before and after training.
Ablation experiments further elucidated modality contributions (Fig. 4A). In these analyses, one or more modalities were removed, and the resulting models were trained and evaluated under the same protocol as the full model. Hyperparameters for all ablation models were optimized via 5-CV on the identical training data, and comparisons were made on the same five internal test sets. The full model (S + G + B) outperformed all single-modality and two-modality variants, demonstrating the benefit of multimodal integration. The structure-only model achieved the highest AUCROC (0.847 ± 0.037) among single-input models, indicating that structural information can support effective early screening when biological data are unavailable. In head-to-head comparisons with other structure-based ML/DL baselines (Fig. S2), more complex GNNs (e.g. AttentiveFP [32]) underperformed, likely because of overfitting on limited data, whereas simpler ML models generalized better; the S model surpassed all structure-only baselines across metrics, highlighting its practical utility for classification.
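The ablation grid implied by removing one or more modalities can be enumerated mechanically. The sketch below only lists the configurations; per the protocol above, each configuration would then get its own 5-CV hyperparameter search and evaluation on the same five internal test sets, which is not shown.

```python
from itertools import combinations

MODALITIES = ("S", "G", "B")  # structure, gene expression, bioassay


def ablation_configs(modalities=MODALITIES):
    """All non-empty modality subsets: single inputs, pairs, and the full model."""
    return [" + ".join(c)
            for r in range(1, len(modalities) + 1)
            for c in combinations(modalities, r)]


configs = ablation_configs()
```

With three modalities this yields seven models (three single-input, three two-modality, one full), which matches the variants compared in Fig. 4A.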
Across biological inputs, G consistently outperformed B on all metrics, indicating that transcriptomic responses provide a richer mechanistic representation of cellular perturbations than individual bioassay endpoints. The B-only model showed the weakest performance (AUCROC = 0.582 ± 0.103), consistent with previous reports [33] and with the high inter-assay correlation observed between genotoxic and nongenotoxic substances (Fig. 2B), which can reduce discriminative power when relying solely on phenotypic readouts. Among two-modality models, S + G delivered the strongest F1 and MCC, whereas G + B performed worst, suggesting that chemical structure contributes more to predictive power than bioassays. Both S + G and S + B outperformed G + B on AUCROC and AUCPRC, emphasizing that structural information plays a dominant role in genotoxicity prediction. Nevertheless, integrating gene expression and bioassay data enriches the feature set, as reflected in the full model outperforming every one- and two-modality variant.
The impact of data cleaning was evaluated by comparing models trained on raw (rB, rG) versus cleaned (B, G) datasets (Fig. S3 and Table S6). The S + G + B model trained on cleaned data achieved the best overall performance, highlighting the importance of noise reduction. Notably, the S + G + rB model (cleaned G, raw B) outperformed the S + rG + B model (cleaned B, raw G), indicating that gene expression data quality exerts a greater influence on model performance than bioassay data quality. This suggests that reducing variability in transcriptomic inputs enhances mechanistic clarity and yields more reliable genotoxicity predictions.
To contextualize these findings, GenotoxNet was benchmarked against classical ML and MMDL baselines under identical data splits and tuning (Fig. 4B). Models that integrated multiple modalities, either via multimodal DL or simple concatenation, generally exhibited better generalization than structure-only baselines. GenotoxNet achieved the highest performance across all metrics, followed by GPDRP-3M (S + G + B) and GPDRP-2M (S + G). Traditional ML models (RF and XGBoost) using concatenated S + G + B features performed slightly worse than the deep-learning counterparts, though they still benefited from multimodal inputs. Overall, GenotoxNet’s multimodal integration effectively captures cross-modal dependencies and delivers superior genotoxicity predictions, even with limited data.
The model’s ability to extract informative features was further examined using t-SNE projections before and after training. Before training, positive and negative samples were intermixed without clear boundaries, both for individual feature types and for their concatenation, indicating a high degree of data complexity (Fig. 4C, left). After preliminary training, the feature space began to show separation between genotoxic and nongenotoxic substances, suggesting that the model had started learning discriminative patterns (Fig. 4C, right). After final training, the concatenated features formed clearly distinct clusters for the two classes, supporting the model's ability to extract meaningful representations from multimodal inputs (Fig. 4C, bottom).
Generalization assessment on an external chemical set
To extend the assessment beyond internal sampling, an external chemical set was assembled from compounds in The Carcinogenome Project that lacked verified genotoxicity annotations. Cross-referencing these compounds with CRCGN_ABC and ToxCast identified 44 overlapping compounds, whose structures, gene expression profiles, and bioassay data were curated and cleaned to construct the external set (Table S7). Because gold-standard labels were unavailable, predictions were evaluated against the literature. Among the eight predicted positives, 2-nitrosotoluene (CAS 611-23-4) has been reported to arise via metabolic conversion in vivo and to exhibit genotoxic activity [34]. The Stockholm Convention notes that α-hexachlorocyclohexane (CAS 319-84-6) shows potential genotoxicity, while emphasizing the need for additional evidence. By contrast, nifedipine (CAS 21829-25-4) was predicted as genotoxic, yet extensive long-term safety evaluations classify it as nongenotoxic, making it a false positive. For 2,6-dinitrotoluene (CAS 606-20-2), the literature indicates that liver-mediated metabolic activation is required for genotoxicity and that only slight chromosomal damage occurs at high concentrations [35]. Its borderline predicted probability (~0.5) is consistent with this weak, condition-dependent genotoxicity. These case analyses partially corroborate the external screening results, suggesting that GenotoxNet generalizes reasonably beyond its training data and highlighting its potential for prioritizing uncharacterized compounds. As additional experimental evidence accrues, prospective validation will clarify the status of other candidates flagged by the model.
Analysis of the molecular mechanisms of genotoxicity
To further elucidate the mechanisms captured by the model, the 20 genes with the highest sums of absolute gradients were profiled across the seven genotoxicants (Fig. 5A). For some genes, such as Cyclin F (CCNF) and Lysine Demethylase 3A (KDM3A), attribution magnitudes aligned with differential expression (MODZ), suggesting that gradient-based attributions captured biologically coherent signals. CCNF was consistently upregulated, especially under methyl yellow and NDEA, consistent with impaired DNA repair and heightened genotoxicity [36], whereas KDM3A was predominantly downregulated, suggesting weakened control of cell-cycle and DNA-damage responses; a similar pattern has been implicated in N-nitrosomorpholine-mediated genotoxicity [37]. Extending beyond individual gene responses, a substance–AOP network linking the genotoxic substances to MIEs, KEs, and AOs revealed convergent disruptions in DNA damage, oxidative stress, and cell-cycle regulation (Fig. 5B).
Figure 5.
Elucidation of the toxic mechanism pathway of genotoxic substances. (A) Differential gene expression and absolute gradient values of genotoxicity-related genes across seven genotoxic substances. The size of the circle represents the absolute gradient value. The fill intensity reflects the magnitude of gene expression changes, with positive and negative scales indicating upregulation and downregulation. The bar chart on the right represents the ranking of the sum of the gradient values. (B) Network representation of genotoxic substances and their associated AOP events, encompassing seven MIEs, 11 KEs, and six AOs. (C) Mechanistic AOP framework for NDEA-induced genotoxicity.
For NDEA, two MIEs drive the downstream cascade (Fig. 5C). Cyp2E1 activation generates reactive intermediates and reactive oxygen species (ROS), causing hepatotoxicity and sustained proliferation that progress toward liver cancer (AOP220). In parallel, DNA alkylation by reactive metabolites introduces covalent adducts that overwhelm repair systems, yielding inadequate DNA repair as a KE. Accumulated DNA lesions promote genomic instability, connecting to increased mutations (AOP15), general carcinogenesis (AOP139), and reduced sperm count (AOP322). Mapping gene attributions onto this cascade implicates several high-ranking genes. Inhibitor of Nuclear Factor Kappa B Kinase Subunit Epsilon (IKBKE), ranked second, is upregulated under oxidative stress and activates NF-κB, fostering chronic inflammation, amplifying ROS, inducing DNA strand breaks, and promoting proliferation while dampening apoptosis; these events are consistent with AOP220 and AOP322 [38–40]. CCNF upregulation modulates the inadequate DNA repair KE by disrupting checkpoints, increasing strand-break accumulation, and facilitating mutation and apoptosis (AOP15, AOP139, and AOP322). ABHD4 also showed elevated attribution; although its role in genotoxicity remains to be defined, emerging evidence points to involvement in stress-response lipid signaling. Concordant ToxCast assay evidence supports these linkages: the TOX21_RT_HEPG2_FLO cell-viability assay (fluorescence detection of DNA from dead cells) was broadly active, reinforcing the AOP220 trajectory for NDEA. Inter-AOP dependencies were also evident. The oxidative stress KE acts as an upstream event of both increased mutation and increased DNA strand breaks [41]. In AOP322, the inadequate DNA repair KE leads to increased DNA strand breaks, and these events are in turn interlinked [42]. Together, these results integrate gene-level attributions, differential expression, and assay endpoints into a coherent AOP-centric narrative supported by experimental and database evidence.
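A substance–AOP network can be represented as a directed graph in which paths run from a chemical through MIEs and KEs to AOs. The Python sketch below encodes a simplified, illustrative reading of the NDEA cascade described in this paragraph. The node names and edges are paraphrased from the text and are not exported from AOP-Wiki or AOP-DB. The sketch then enumerates every chemical-to-outcome path by depth-first search.

```python
# Simplified, illustrative NDEA cascade paraphrased from the text (not a database export)
AOP_GRAPH = {
    "NDEA": ["MIE: Cyp2E1 activation", "MIE: DNA alkylation"],
    "MIE: Cyp2E1 activation": ["KE: oxidative stress"],
    "KE: oxidative stress": ["KE: hepatotoxicity", "KE: increased DNA strand breaks"],
    "KE: hepatotoxicity": ["KE: sustained proliferation"],
    "KE: sustained proliferation": ["AO: liver cancer (AOP220)"],
    "MIE: DNA alkylation": ["KE: inadequate DNA repair"],
    "KE: inadequate DNA repair": ["KE: increased DNA strand breaks"],
    "KE: increased DNA strand breaks": ["KE: genomic instability"],
    "KE: genomic instability": [
        "AO: increased mutations (AOP15)",
        "AO: cancer (AOP139)",
        "AO: reduced sperm count (AOP322)",
    ],
}


def paths_to_outcomes(graph, start):
    """Return every simple path from `start` to a node whose label begins with 'AO:'."""
    paths = []

    def dfs(node, path):
        if node.startswith("AO:"):
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on the path to avoid cycles
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths


for p in paths_to_outcomes(AOP_GRAPH, "NDEA"):
    print(" -> ".join(p))
```

In this toy graph, both MIEs converge on the increased DNA strand breaks KE. Shared nodes like this make inter-AOP dependencies explicit when paths are listed side by side.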
By systematically integrating S, G, and B, GenotoxNet provides a mechanistic AOP-centric framework, capturing how molecular perturbations propagate through toxicological pathways. This integrative approach enhances model interpretability, providing a data-driven foundation for refining AOP-based risk assessments and regulatory decision-making.
Conclusion
Traditional toxicity assessments centered on chemical structure alone often overlook the intricate biological processes that convert molecular interactions into adverse health outcomes. While structural properties can provide preliminary clues about potential toxic effects, cellular and organism-level responses are mediated by multiple, interlinked molecular pathways that structure-based methods alone cannot capture. Multilevel systems biology data, including transcriptomic profiles and high-throughput bioassays, can yield deeper mechanistic insight into how chemicals modulate gene expression, disrupt signaling cascades, and ultimately induce pathological states. In this work, genotoxicity serves as a demonstration of the methodological framework rather than as an endpoint-specific endeavor, and the methodology is potentially applicable to a range of complex toxicological endpoints (e.g. carcinogenicity, neurotoxicity, and developmental toxicity). By uniting multilevel biological data with advanced AI for toxicity prediction, we show how transcriptomics, bioassay outputs, and structural information can jointly resolve cellular heterogeneity and early toxicity signals. The same strategy may also be relevant to complex diseases and other health outcomes.
Several methodological challenges remain. The primary challenges are limited data size and the incomplete availability of biological modalities in real-world settings; however, these constraints can be reframed as opportunities for principled extension. Limited data can be mitigated through self-supervised pretraining, transfer and multitask learning across related endpoints, and active learning that prioritizes the most informative compounds for follow-up assays. Missing modalities can be handled with modality imputation and generative models that predict gene or assay responses from structure, coupled with calibrated uncertainty and contrastive alignment across structure, gene, and assay spaces; in particular, future work could infer missing gene-expression or bioassay data for new substances directly from their chemical structures. Additional limitations merit attention. Label quality is imperfect because genotoxicity annotations can be context dependent; weak-supervision strategies and probabilistic labels would better reflect this uncertainty. Class imbalance and assay redundancy may bias decision boundaries; cost-sensitive losses, focal objectives, and decorrelation penalties can mitigate these effects. Batch effects and domain shifts across laboratories, cell systems, doses, and exposure times limit portability; standardized curation, batch correction, and domain adaptation are needed. The biological scope is constrained by reliance on a single cell line and on landmark-gene panels rather than full transcriptomes, with limited time- and dose–response resolution; future studies should incorporate multi-cell-line or primary-cell data, time-series designs, single-cell readouts, and high-content phenotypes such as Cell Painting. External validation was limited by the scarcity of gold-standard labels; prospective, blinded evaluations and federated analyses on privacy-constrained datasets would strengthen generalization claims.
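Of the remedies for class imbalance listed above, the focal objective is the least self-explanatory. A minimal NumPy sketch of the standard binary focal loss follows. It is not GenotoxNet's training objective. The defaults gamma = 2 and alpha = 0.25 are widely used starting values, not settings tuned for genotoxicity data.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_t is the predicted probability assigned to the true class; alpha weights
    the positive class and (1 - alpha) the negative class. With gamma = 0 this
    reduces to an alpha-weighted binary cross-entropy.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


# Usage: a confidently correct example is down-weighted far more than a hard one
easy = binary_focal_loss([1], [0.9], gamma=2.0, alpha=0.5)
hard = binary_focal_loss([1], [0.3], gamma=2.0, alpha=0.5)
print(easy, hard)
```

Raising gamma shrinks the contribution of well-classified compounds, which are often from the majority class. Training signal therefore concentrates on the hard examples, which are often from the minority class.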
Finally, interpretability remains associational; pathway-level, AOP-aware, or causal representation learning paired with targeted perturbation experiments (e.g. CRISPR or RNAi) will help convert model attributions into testable mechanistic hypotheses suitable for regulatory use.
Key Points
Introduces GenotoxNet, a multimodal deep learning framework using genotoxicity as a case study; fuses chemical structure, transcriptomics, and bioassays, outperforming single-modality and structure-only baselines.
Demonstrates strong cross-modal complementarity; structure is a strong baseline, gene expression provides mechanistic signal, and bioassays offer functional corroboration; integrated fusion with low-noise data is most robust.
Provides mechanism-aware interpretability: attributions and assay evidence map to AOP elements, linking substances to downstream KEs and AOs.
Supplementary Material
Contributor Information
Xin Zhang, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Huazhou Zhang, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Xiao Yun, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; Sino-Danish College, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Wenxiao Pan, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Qiao Xue, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Xian Liu, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Jianjie Fu, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 7 Shilongshan Street, Xihu District, Hangzhou 310012, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Aiqian Zhang, State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, 18 Shuangqing Road, Haidian District, Beijing 100085, P. R. China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 7 Shilongshan Street, Xihu District, Hangzhou 310012, P. R. China; College of Resources and Environment, University of Chinese Academy of Sciences, 1 Yanqihu East Rd, Huairou District, Beijing 100190, P. R. China.
Author contributions
Xin Zhang and Huazhou Zhang (Methodology, Investigation, Formal analysis, Data curation, Visualization, Writing–original draft); Xian Liu (Resources, Conceptualization, Software, Writing–review & editing, Supervision, Funding acquisition); Xiao Yun, Wenxiao Pan, Qiao Xue, and Jianjie Fu (Software, Validation, Writing–review & editing); and Aiqian Zhang (Software, Funding acquisition, Writing–review & editing).
Conflict of interest: The authors declare no competing financial interest.
Funding
This research was supported by the National Natural Science Foundation of China (grant numbers 22276197, 22193053, 22022611, and 92143301), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0750100), and the Youth Innovation Promotion Association of CAS (grant number Y2022020).
Data availability
All data and source code used in this study are publicly available at: https://github.com/Zzxin89/GenotoxNet. This repository includes the processed datasets, model implementation, and instructions for reproducing the experiments.
References
- 1. Nawroth C, Krause ET. The academic, societal and animal welfare benefits of Open Science for animal science. Front Vet Sci 2022;9:810989. 10.3389/fvets.2022.810989
- 2. Tunkel J, Mayo K, Austin C. et al. Practical considerations on the use of predictive models for regulatory purposes. Environ Sci Technol 2005;39:2188–99. 10.1021/es049220t
- 3. Stumpfe D, Bajorath J. Exploring activity cliffs in medicinal chemistry. J Med Chem 2012;55:2932–42. 10.1021/jm201706b
- 4. Stumpfe D, Hu H, Bajorath J. Evolving concept of activity cliffs. ACS Omega 2019;4:14360–8. 10.1021/acsomega.9b02221
- 5. Fan D, Yang H, Li F. et al. In silico prediction of chemical genotoxicity using machine learning methods and structural alerts. Toxicol Res 2018;7:211–20. 10.1039/C7TX00259A
- 6. Oki NO, Edwards SW. An integrative data mining approach to identifying adverse outcome pathway signatures. Toxicology 2016;350–352:49–61. 10.1016/j.tox.2016.04.004
- 7. Li A, Lu X, Natoli T. et al. The Carcinogenome project: in vitro gene expression profiling of chemical perturbations to predict long-term carcinogenicity. Environ Health Perspect 2019;127:047002.
- 8. Liu J, Mansouri K, Judson RS. et al. Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure. Chem Res Toxicol 2015;28:738–51. 10.1021/tx500501h
- 9. Le NQK. Predicting emerging drug interactions using GNNs. Nat Comput Sci 2023;3:1007–8. 10.1038/s43588-023-00555-7
- 10. Zhao Z, Gui J, Yao A. et al. Improved prediction model of protein and peptide toxicity by integrating channel attention into a convolutional neural network and gated recurrent units. ACS Omega 2022;7:40569–77. 10.1021/acsomega.2c05881
- 11. Januszewski M, Jain V. Next-generation AI for connectomics. Nat Methods 2024;21:1398–9. 10.1038/s41592-024-02336-0
- 12. Li Y, Huang C, Ding L. et al. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods 2019;166:4–21. 10.1016/j.ymeth.2019.04.008
- 13. Li Y, Wu F-X, Ngom A. A review on machine learning principles for multi-view biological data integration. Brief Bioinform 2018;19:325–40. 10.1093/bib/bbw113
- 14. Baltrusaitis T, Ahuja C, Morency L-P. Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 2019;41:423–43. 10.1109/TPAMI.2018.2798607
- 15. Yang Y, Li P. GPDRP: a multimodal framework for drug response prediction with graph transformer. BMC Bioinf 2023;24:484. 10.1186/s12859-023-05618-0
- 16. Wang C, Kumar GA, Rajapakse JC. Drug discovery and mechanism prediction with explainable graph neural networks. Sci Rep 2025;15:179. 10.1038/s41598-024-83090-3
- 17. Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022;23:bbab569. 10.1093/bib/bbab569
- 18. Grisoni F, Merk D, Byrne R. et al. Scaffold-hopping from synthetic drugs by holistic molecular representation. Sci Rep 2018;8:16469.
- 19. Subramanian A, Narayan R, Corsello SM. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 2017;171:1437–1452.e17. 10.1016/j.cell.2017.10.049
- 20. Christmann M, Kaina B. Transcriptional regulation of human DNA repair genes following genotoxic stress: trigger mechanisms, inducible responses and genotoxic adaptation. Nucleic Acids Res 2013;41:8403–20. 10.1093/nar/gkt635
- 21. Eastmond DA, Hartwig A, Anderson D. et al. Mutagenicity testing for chemical risk assessment: update of the WHO/IPCS harmonized scheme. Mutagenesis 2009;24:341–9. 10.1093/mutage/gep014
- 22. Kaina B. DNA damage-triggered apoptosis: critical role of DNA repair, double-strand breaks, cell proliferation and signaling. Biochem Pharmacol 2003;66:1547–54. 10.1016/S0006-2952(03)00510-0
- 23. Lopez Perez R, Münz F, Kroschke J. et al. Cell cycle-specific measurement of γH2AX and apoptosis after genotoxic stress by flow cytometry. J Vis Exp 2019;151:59968. 10.3791/59968
- 24. Ng C-T, Li JJ, Bay B-H. et al. Current studies into the genotoxic effects of nanomaterials. J Nucleic Acids 2010;2010:947859.
- 25. Nohmi T, Tsuzuki T. Chapter 4: Possible mechanisms underlying genotoxic thresholds: DNA repair and translesion DNA synthesis. In: Nohmi T, Fukushima S (eds.), Thresholds of Genotoxic Carcinogens. Boston: Academic Press, 2016, 49–66.
- 26. Vinson RK, Hales BF. DNA repair during organogenesis. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis 2002;509:79–91. 10.1016/S0027-5107(02)00223-3
- 27. Wu Z-H, Shi Y, Tibbetts RS. et al. Molecular linkage between the kinase ATM and NF-κB signaling in response to genotoxic stimuli. Science 2006;311:1141–6. 10.1126/science.1121513
- 28. Carvaillo J-C, Barouki R, Coumoul X. et al. Linking Bisphenol S to adverse outcome pathways using a combined text mining and systems biology approach. Environ Health Perspect 2019;127:47005. 10.1289/EHP4200
- 29. Jaylet T, Coustillet T, Jornod F. et al. AOP-helpFinder 2.0: integration of an event-event searches module. Environ Int 2023;177:108017. 10.1016/j.envint.2023.108017
- 30. Pittman ME, Edwards SW, Ives C. et al. AOP-DB: a database resource for the exploration of adverse outcome pathways through integrated association networks. Toxicol Appl Pharmacol 2018;343:71–83. 10.1016/j.taap.2018.02.006
- 31. Mortensen HM, Senn J, Levey T. et al. The 2021 update of the EPA’s adverse outcome pathway database. Sci Data 2021;8:169. 10.1038/s41597-021-00962-3
- 32. Xiong Z, Wang D, Liu X. et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J Med Chem 2020;63:8749–60. 10.1021/acs.jmedchem.9b00959
- 33. Thomas RS, Black MB, Li L. et al. A comprehensive statistical analysis of predicting in vivo hazard using high-throughput in vitro screening. Toxicol Sci 2012;128:398–417. 10.1093/toxsci/kfs159
- 34. Watanabe C, Egami T, Midorikawa K. et al. DNA damage and estrogenic activity induced by the environmental pollutant 2-nitrotoluene and its metabolite. Environ Health Prev Med 2010;15:319–26. 10.1007/s12199-010-0146-1
- 35. Suzuki H, Imamura T, Koeda A. et al. Genotoxicity studies of 2,6-dinitrotoluene (2,6-DNT). J Toxicol Sci 2011;36:499–505. 10.2131/jts.36.499
- 36. D’Angiolella V, Donato V, Forrester FM. et al. Cyclin F-mediated degradation of ribonucleotide reductase M2 controls genome integrity and DNA repair. Cell 2012;149:1023–34. 10.1016/j.cell.2012.03.043
- 37. Baker M, Petasny M, Taqatqa N. et al. KDM3A regulates alternative splicing of cell-cycle genes following DNA damage. RNA 2021;27:1353–62. 10.1261/rna.078796.121
- 38. Gao B, Wu X, Bu L. et al. Atypical inflammatory kinase IKBKE phosphorylates and inactivates FoxA1 to promote liver tumorigenesis. Sci Adv 2024;10:eadk2285. 10.1126/sciadv.adk2285
- 39. Siomek A. NF-κB signaling pathway and free radical impact. Acta Biochim Pol 2012;59:323–31.
- 40. Wang W, Mani AM, Wu Z-H. DNA damage-induced nuclear factor-kappa B activation and its roles in cancer progression. J Cancer Metastasis Treat 2017;3:45–59. 10.20517/2394-4722.2017.03
- 41. Roginskaya M, Razskazovskiy Y. Oxidative DNA damage and repair: mechanisms, mutations, and relation to diseases. Antioxidants 2023;12:1623. 10.3390/antiox12081623
- 42. Schipler A, Iliakis G. DNA double-strand-break complexity levels and their possible contributions to the probability for error-prone processing and repair pathway choice. Nucleic Acids Res 2013;41:7589–605. 10.1093/nar/gkt556