BMC Pharmacology & Toxicology. 2026 Jan 28;27:38. doi: 10.1186/s40360-026-01092-5

Decrypting potential mechanisms linking ochratoxin A to hepatocellular carcinoma: an integrated approach combining toxicology, machine learning, molecular docking, and molecular dynamics simulation

Junyi Zhuo 1, Hua Wu 2, Xiaoling Zhou 3, Xi Wang 3, Tianqi Qiu 1, Min Lin 1, Yu Tang 1,
PMCID: PMC12924422  PMID: 41593793

Abstract

Background

Ochratoxin A (OTA), a common food-borne mycotoxin, is a potential human carcinogen, yet the specific molecular mechanisms linking it to hepatocellular carcinoma (HCC) remain unclear.

Methods

We integrated network toxicology to predict OTA targets and intersected them with HCC transcriptomic data to identify key candidate genes. Functional enrichment analysis was then conducted. Multiple machine learning algorithms were applied to screen and validate core genes. Furthermore, molecular docking and molecular dynamics (MD) simulations were employed to evaluate the binding stability between OTA and key target proteins.

Results

A total of 50 key genes were identified as candidate targets in potential OTA-associated hepatocarcinogenesis. Enrichment analysis showed that these genes were significantly involved in critical processes such as xenobiotic metabolism and the oxidative stress response. Machine learning analysis prioritized eight core genes (AURKA, GABARAPL1, CA2, PARP1, LMNA, SLC27A5, EPHX2, and GSTP1), and a combined diagnostic model achieved high discriminative performance (AUC = 0.986). Molecular docking and MD simulations predicted stable binding interactions between OTA and these core targets.

Conclusions

This integrated computational study identifies a set of candidate genes through which OTA may interact with HCC-associated molecular networks. The robust binding predicted between OTA and the core targets provides a structural basis for these putative interactions. These findings offer a prioritized list of targets and a theoretical framework for subsequent experimental validation and investigation of OTA’s toxicological role in HCC.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40360-026-01092-5.

Keywords: Ochratoxin A, Hepatocellular carcinoma, Bioinformatics, Machine learning, Molecular docking, Molecular dynamics simulation

Introduction

Hepatocellular carcinoma (HCC) presents a significant global health challenge. Epidemiological data from 2020 estimated approximately 906,000 new cases and 830,000 deaths from primary liver cancer worldwide, making it the sixth most common malignancy and the third leading cause of cancer-related mortality, with HCC accounting for 75% to 85% of these cases [1, 2]. The etiology of HCC is multifactorial, involving a complex interplay of chronic viral infections, metabolic disorders, and environmental exposures. Hepatitis B and C viruses, along with alcohol consumption, are widely recognized as major causative agents [3]. Beyond these established factors, the pronounced geographical disparities in HCC incidence and mortality underscore the importance of environmental and dietary exposures, which may act independently or in synergy with these established risk factors [4, 5].

Among dietary contaminants, mycotoxins (toxic secondary metabolites produced by fungi) are prevalent in improperly stored food crops [6]. Aflatoxin B1 (AFB1) and ochratoxin A (OTA) are two of the most concerning mycotoxins [7, 8]. AFB1, classified as a Group 1 human carcinogen by the International Agency for Research on Cancer (IARC), has been extensively studied for its role in hepatocarcinogenesis, particularly its synergistic effect with hepatitis B virus in increasing HCC risk in regions such as sub-Saharan Africa and East Asia [9, 10]. In contrast, OTA, a mycotoxin produced by certain Aspergillus and Penicillium species that commonly contaminates grains, wine, coffee, spices, and nuts [11, 12], is classified by the IARC as a Group 2B possible human carcinogen. It is highly stable and heat-resistant, making it difficult to eliminate from contaminated food [13, 14]. Notably, OTA shows unusual toxicokinetics in humans, with a prolonged plasma half-life of approximately 840 h (35 days), leading to persistent tissue accumulation [15]. Experimental studies have shown that OTA can induce liver adenomas and carcinomas in mice and rats, and in vitro evidence further indicates that OTA causes DNA damage, oxidative stress, and apoptosis in human hepatocytes and hepatic carcinoma cells [14, 16]. Recent mechanistic studies suggest that OTA may trigger oxidative stress and apoptosis, disrupt lipid metabolism, and cause mitochondrial dysfunction via the aryl hydrocarbon receptor (AhR)-regulated phase I response, thereby potentially contributing to liver injury and hepatocarcinogenesis in experimental models [17]. Multi-omics studies have also shown that OTA exposure induces early hepatotoxicity and promotes metabolic disorders in mouse models [18].
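To make the accumulation point concrete, the reported half-life can be plugged into a standard one-compartment, first-order elimination model. This is an illustrative calculation, not a result of this study: it assumes (hypothetically) a constant intake repeated every τ = 1 day and ignores inter-individual variation.

```latex
\begin{aligned}
t_{1/2} &= \frac{840\ \mathrm{h}}{24\ \mathrm{h\,day^{-1}}} = 35\ \mathrm{days}, \qquad
k = \frac{\ln 2}{t_{1/2}} \approx 0.0198\ \mathrm{day^{-1}},\\
R_{\mathrm{acc}} &= \frac{1}{1 - e^{-k\tau}} \approx \frac{1}{1 - e^{-0.0198}} \approx 51 \qquad (\tau = 1\ \mathrm{day}).
\end{aligned}
```

Under these assumptions, the steady-state body burden would be roughly 51 times that reached after a single daily intake.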

The emergence of network toxicology offers a powerful, systems-oriented framework for deciphering complex toxicological mechanisms. By integrating multi-omics data—including genomics, transcriptomics, proteomics, and metabolomics—this approach constructs a multi-modal “compound-target-pathway” interaction network, effectively translating intricate mechanisms into more comprehensible visual representations [19]. It is particularly adept at managing the complex interactions among multiple components, targets, and pathways. Complementarily, machine learning algorithms can efficiently analyze complex biological data, identify key features, and predict disease-related genes or biomarkers [20].

While the aforementioned research provides preliminary insights into OTA-induced liver injury and its potential role in HCC, most studies are limited to rodent models or observations of cellular phenotypes. A significant gap remains in systematically linking pathway enrichment signals to specific molecular targets and validating these targets at the structural level; as a result, no comprehensive “pathway-to-target-to-structure” evidence chain has yet been established for OTA-driven hepatocarcinogenesis.

To address this research gap, our study employs an integrative computational approach that combines network toxicology, multi-dataset machine learning, molecular docking, and molecular dynamics simulations. We aim to systematically identify critical targets and pathways involved in OTA-associated HCC, utilize SHapley Additive exPlanations (SHAP) analysis for transparent attribution of feature importance, and validate the binding stability between OTA and core targets at the structural level. This work seeks to elucidate the molecular nexus between OTA and HCC, proposing interpretable key pathways and potential targets supported by structural evidence, thereby providing a theoretical foundation for subsequent experimental validation and risk assessment.

Materials and methods

Collection of HCC-related targets and data preprocessing

This study integrated five publicly available HCC transcriptomic datasets (GSE25097, GSE36376, GSE14811, GSE54236, and GSE76427) from the NCBI GEO database [21–28]. The datasets were selected based on the following criteria: (1) well-annotated tumor and adjacent non-tumor liver tissue samples from HCC patients; (2) diverse patient cohorts and microarray platforms; (3) complete expression matrices with reliable probe annotation; (4) prior use in published HCC bioinformatics studies; (5) sufficient sample size. The clinical and platform characteristics of each dataset are summarized in Supplementary Table S1, which shows that the selected datasets were generated using different platforms. For each dataset, platform-specific annotation files were used to map probes to official gene symbols. Probes mapping to multiple genes or to no gene were removed. Expression values were log2-transformed after adding an offset of 1 [log2(x + 1)] to ensure numerical stability, with negative values set to zero prior to transformation. Each dataset was independently normalized using the ‘normalizeBetweenArrays’ function in the limma package. To ensure robust downstream analysis, low-expression genes were removed. Specifically, genes with expression below the 20th percentile in more than 50% of samples within each dataset were excluded. Additionally, only genes commonly detected across all five datasets were retained for cross-dataset integration. GSE25097 and GSE36376 were merged as the training cohort, while GSE14811, GSE54236, and GSE76427 comprised the external validation cohort. To mitigate technical batch effects, the merged expression matrix was adjusted using the ComBat algorithm with an empirical Bayes framework. The effectiveness of batch correction was assessed by comparing boxplots of global expression distributions and principal component analysis (PCA) plots before and after adjustment. After batch correction, PCA was used to identify potential outlier samples.
No samples fell outside ± 3 standard deviations along the first two principal components. Therefore, all samples were retained for subsequent analysis. The overall analytical workflow is illustrated in Fig. 1.
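The transformation, low-expression filter, and outlier rule above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the 20th percentile is assumed to be computed over all values of a dataset, the PC scores are assumed to come from a separate PCA (for example R's `prcomp`), and limma normalization and ComBat are not reproduced. The function names are hypothetical.

```python
import math
import statistics


def log2_offset(values):
    """log2(x + 1), with negative intensities set to zero first."""
    return [math.log2(max(v, 0.0) + 1.0) for v in values]


def filter_low_expression(matrix, pct=20, max_frac=0.5):
    """Keep genes that are NOT below the dataset-wide `pct`th percentile
    in more than `max_frac` of samples.

    matrix: dict mapping gene -> list of per-sample expression values.
    Assumption: the percentile is computed over all values in the dataset.
    """
    all_values = [v for row in matrix.values() for v in row]
    # quantiles(n=100) returns the 1st..99th percentiles, so index pct-1 is the pct-th
    cutoff = statistics.quantiles(all_values, n=100)[pct - 1]
    return {
        gene: row
        for gene, row in matrix.items()
        if sum(v < cutoff for v in row) / len(row) <= max_frac
    }


def flag_outliers(pc_scores, n_sd=3.0):
    """Indices of samples outside mean +/- n_sd * SD on any principal component.

    pc_scores: list of per-sample tuples, e.g. (PC1, PC2), computed elsewhere.
    """
    flagged = set()
    for axis in range(len(pc_scores[0])):
        column = [s[axis] for s in pc_scores]
        mu, sd = statistics.mean(column), statistics.stdev(column)
        flagged.update(i for i, v in enumerate(column) if abs(v - mu) > n_sd * sd)
    return sorted(flagged)
```

An empty list from `flag_outliers` corresponds to the situation reported above, where all samples were retained.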

Fig. 1.


Workflow of the dataset analysis in this study

Prediction of OTA targets

The molecular structure and SMILES notation of OTA were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) using the keyword “ochratoxin A”. Potential biological targets of OTA were systematically predicted through a multi-source approach: the ChEMBL database for ligand-receptor interaction profiling (https://www.ebi.ac.uk/chembl/), SwissTargetPrediction for ligand-based forecasting (https://www.swisstargetprediction.ch/), and PharmMapper for reverse pharmacophore mapping (https://lilab-ecust.cn/pharmmapper) [29–31]. ChEMBL provides experimentally validated ligand-target interaction data, with its predictions based on the assumption that compounds with similar chemical structures may share biological targets. SwissTargetPrediction employs a ligand-based approach, comparing the query molecule’s 2D and 3D descriptors against a curated library of known active compounds to infer potential targets. Its accuracy depends on the coverage and quality of the reference ligand set. PharmMapper performs reverse pharmacophore mapping by matching the input molecule against a database of pharmacophore models derived from protein binding sites, thereby identifying proteins whose binding pocket features complement the ligand’s pharmacophoric pattern. All three tools operate under the common premise that predicted interactions are plausible within the human proteome, although false positives may arise due to limitations in model training data or chemical space coverage. All predicted targets were restricted to the Homo sapiens proteome. Additionally, the ADMETlab 2.0 (https://admetmesh.scbdd.com/) and ProTox 3.0 platforms (https://tox.charite.de/protox3/#) were used to evaluate the toxicological properties of OTA, generating a comprehensive toxicological prediction report [32, 33].

Differential gene expression analysis

Differential expression analysis was performed using the limma R package. Prior to differential analysis, expression data were log2-transformed when necessary and normalized using the normalizeBetweenArrays function to minimize technical variability across samples. A linear model was fitted for each gene, followed by empirical Bayes moderation to obtain moderated t-statistics. Genes with an absolute log2 fold change above 0.585 (|log₂FC| > 0.585, corresponding to a 1.5-fold change) and a false discovery rate (FDR)-adjusted p-value < 0.05, calculated using the Benjamini-Hochberg method, were considered significantly differentially expressed. Differential expression results were visualized using volcano plots and heatmaps.
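limma's moderated t-statistics are not reproduced here; the sketch below only illustrates the thresholding step, assuming per-gene log2 fold changes and raw p-values are already available. The Benjamini-Hochberg adjustment follows the standard step-up procedure (the same quantity R's `p.adjust(method = "BH")` returns), and `call_degs` is a hypothetical helper name.

```python
import math

# |log2FC| > 0.585 is log2(1.5), i.e. a 1.5-fold change in either direction
LOG2FC_CUT = math.log2(1.5)


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(results, lfc_cut=LOG2FC_CUT, fdr_cut=0.05):
    """results: dict gene -> (log2FC, raw p-value). Returns dict gene -> 'up' or 'down'."""
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    calls = {}
    for g, q in zip(genes, fdr):
        lfc = results[g][0]
        if q < fdr_cut and abs(lfc) > lfc_cut:
            calls[g] = "up" if lfc > 0 else "down"
    return calls
```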

Weighted gene co-expression network analysis (WGCNA)

A scale-free co-expression network was constructed using the WGCNA R package to identify functional modules strongly associated with HCC. After filtering out genes with low expression and low variance, an optimal soft-thresholding power (β) was selected to achieve a scale-free topology fit index (R²) > 0.9. Gene modules were identified through dynamic hierarchical clustering of the topological overlap matrix (TOM), using a minimum module size of 50 and a merge cut height of 0.25. Modules significantly correlated with the HCC phenotype (P < 0.05) were identified. Genes with high intramodular connectivity (kME > 0.8) and gene significance (GS > 0.2) within these modules were defined as hub genes. The results were visualized using dendrograms, while the module-trait relationships were illustrated using heatmaps.
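The WGCNA quantities named above can be sketched in plain Python for illustration. The sketch assumes an unsigned network (adjacency |r|^β), takes the scale-free fit index for each power as precomputed input (in the WGCNA package this comes from `pickSoftThreshold`), and compares absolute kME and GS values; these choices and the helper names are assumptions, not details stated in the text.

```python
def soft_threshold_adjacency(corr, beta):
    """Unsigned WGCNA-style adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)] for i in range(n)]


def connectivity(adj):
    """Whole-network connectivity k_i = sum over j of a_ij."""
    return [sum(row) for row in adj]


def pick_soft_power(fit_by_power, r2_cut):
    """Smallest power whose scale-free fit index (signed R^2) exceeds r2_cut, else None."""
    for beta in sorted(fit_by_power):
        if fit_by_power[beta] > r2_cut:
            return beta
    return None


def hub_genes(kme, gs, kme_cut=0.8, gs_cut=0.2):
    """Module genes with |kME| > kme_cut and |GS| > gs_cut (absolute values assumed)."""
    return sorted(g for g in kme if abs(kme[g]) > kme_cut and abs(gs.get(g, 0.0)) > gs_cut)
```

Choosing the smallest qualifying power keeps as much connectivity as possible while still approximating a scale-free topology, which is why the criterion is phrased as a minimum.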

Identification of OTA-HCC intersection targets

HCC-related targets were defined as the union of DEGs and genes derived from the most disease-relevant WGCNA modules. Potential key targets through which OTA may contribute to HCC pathogenesis were identified by intersecting HCC-related targets with OTA-predicted targets. This intersection strategy aims to isolate genes that are not only dysregulated in HCC but also putative molecular interactors of OTA, thereby prioritizing candidates with dual relevance to both the toxin and the disease. The intersection was visualized using the VennDiagram R package.
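The intersection strategy reduces to set operations. The sketch below assumes all identifiers have already been mapped to the same gene-symbol namespace; the function name is hypothetical, and the gene lists in the usage line are toy inputs, not the study's data.

```python
def ota_hcc_intersection(chembl, swiss, pharmmapper, degs, module_genes):
    """Return sorted genes that are both predicted OTA targets and HCC-related."""
    # De-duplicated union of the three target-prediction sources
    ota_targets = set(chembl) | set(swiss) | set(pharmmapper)
    # HCC-related genes: union of DEGs and genes from the selected WGCNA module
    hcc_related = set(degs) | set(module_genes)
    return sorted(ota_targets & hcc_related)


# Hypothetical toy inputs, for illustration only
print(ota_hcc_intersection(["PARP1", "CYP1A1"], ["GSTP1", "PARP1"], ["EPHX2"],
                           ["PARP1", "AURKA"], ["EPHX2", "CA2"]))  # ['EPHX2', 'PARP1']
```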

Protein-protein interaction network and functional enrichment analysis

The intersecting genes were submitted to the STRING database (https://string-db.org/) to construct a protein-protein interaction (PPI) network [34]. The network was constructed with the default evidence settings, which incorporate multiple complementary sources of interaction evidence, including experimentally determined data, curated biological databases, co-expression patterns, computational predictions, and text mining. A STRING confidence score threshold of 0.4 was used for primary network construction to ensure sufficient interaction coverage. To assess robustness, a sensitivity analysis was conducted by reconstructing the PPI network using progressively more stringent thresholds (0.5, 0.6, and 0.7). The resulting network was imported into Cytoscape (version 3.9.0) for visualization and further analysis. Functional annotation (Gene Ontology) and pathway enrichment (KEGG) analyses were performed on the intersecting genes using the clusterProfiler R package, with a significance threshold of FDR-adjusted p-value < 0.05 (Benjamini-Hochberg method).
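The threshold sensitivity analysis can be expressed as filtering a STRING edge list by combined score and checking which genes of interest still have at least one edge. This sketch assumes scores on STRING's 0-1 scale and that isolated nodes are dropped; the `(gene_a, gene_b, score)` edge format and the function names are assumptions for illustration.

```python
def connected_nodes(edges, min_score):
    """Nodes with at least one edge whose combined score is >= min_score.

    edges: iterable of (gene_a, gene_b, score) tuples, score on a 0-1 scale.
    """
    nodes = set()
    for a, b, score in edges:
        if score >= min_score:
            nodes.update((a, b))
    return nodes


def sensitivity_table(edges, genes_of_interest, thresholds=(0.4, 0.5, 0.6, 0.7)):
    """For each threshold, list the genes of interest that remain connected."""
    edges = list(edges)  # allow a one-shot iterator to be scanned once per threshold
    wanted = set(genes_of_interest)
    return {t: sorted(wanted & connected_nodes(edges, t)) for t in thresholds}
```

A gene that disappears from the table at a higher threshold is one whose only supporting interactions had lower combined scores, which matches how isolated nodes are lost in the network views.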

Machine learning-based screening of core genes

An integrated machine learning framework was developed to identify core genes based on the expression profiles of the intersecting genes. The raw data were preprocessed by removing missing values and outliers, followed by Z-score normalization. The dataset was then partitioned into training and validation sets using stratified random sampling. Twelve classical machine learning algorithms were employed to build 127 predictive models: Lasso, Ridge, Elastic Net, Random Forest (RF), Gradient Boosting Machine (GBM), eXtreme Gradient Boosting (XGBoost), glmBoost, Stepglm, Linear Discriminant Analysis (LDA), Naive Bayes, plsRglm, and Support Vector Machine (SVM). Hyperparameters were optimized via 10-fold cross-validation. Model performance was evaluated on the independent validation set using the area under the receiver operating characteristic curve (AUC) as the primary metric. A stacking ensemble strategy was applied to integrate the predictions of the top-performing single models. Feature genes frequently selected by high-confidence models (AUC > 0.9) were ranked by their selection frequency to identify the final core genes. Model performance was visualized using the ComplexHeatmap package, and the SHapley Additive exPlanations (SHAP) method was used to assess the contribution of each core gene to the model predictions.
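Two pieces of this framework are simple enough to sketch: AUC computed as the Mann-Whitney probability that a randomly chosen HCC sample scores higher than a randomly chosen non-tumor sample, and the ranking of genes by how often high-confidence models (AUC > 0.9) select them. The input format (one AUC and selected-gene list per model) and the function names are assumptions; because the text does not state how many top-ranked genes were retained, the function returns the full ranking.

```python
from collections import Counter


def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability that a positive outscores a negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_by_selection_frequency(models, auc_cut=0.9):
    """models: list of (validation_auc, selected_genes) pairs, one per model.

    Counts how often each gene is selected by models with AUC > auc_cut and
    returns (gene, count) pairs, most frequent first, ties broken alphabetically.
    """
    counts = Counter()
    for auc, genes in models:
        if auc > auc_cut:
            counts.update(set(genes))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

The pairwise AUC is quadratic in sample count, which is fine for illustration; library implementations compute the same statistic from ranks.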

Molecular docking validation

To evaluate the binding interactions between OTA and the core target proteins, three-dimensional structures of the core proteins were retrieved from the UniProt database, and their corresponding PDB (Protein Data Bank) IDs (Supplementary Table S2) were used for docking analysis. The molecular structure of OTA in SDF format was obtained from PubChem. Molecular docking was performed using the CB-DOCK2 online platform (https://cadd.labshare.cn/cb-dock2/) [35]. The CB-DOCK2 workflow comprises three key steps: (i) automated detection of potential binding cavities on the entire protein surface, (ii) docking with the Vina scoring function to sample ligand conformations, and (iii) ranking of poses by predicted binding affinity (kcal/mol). Prior to docking, protein structures were prepared within CB-DOCK2 by adding hydrogen atoms, assigning partial charges, and removing water molecules. The OTA ligand was similarly prepared, with rotatable bonds defined. To assess the robustness of the docking predictions, the binding poses and affinity rankings for key complexes were cross-validated using AutoDock Vina (version 1.1.2), run locally with default parameters. The conformation with the most favorable (most negative) binding energy from the primary CB-DOCK2 analysis was selected as the representative binding pose for each OTA-target complex. All resulting complexes were visualized using PyMOL (version 3.0.4).
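Selecting the representative pose is a minimum over predicted affinities. The sketch also converts an affinity into a rough dissociation constant via ΔG = RT ln K_d; this is an order-of-magnitude reading only, since Vina-type scores are empirical estimates rather than measured free energies. The pose format and function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def best_pose(poses):
    """poses: list of (pose_id, affinity_kcal_per_mol); returns the most negative one."""
    return min(poses, key=lambda pose: pose[1])


def approx_kd(affinity_kcal, temp_k=298.15):
    """Rough dissociation constant in mol/L from dG = RT ln(Kd).

    Docking scores are empirical estimates, so treat this as order-of-magnitude only.
    """
    return math.exp(affinity_kcal / (R_KCAL * temp_k))
```

For example, −7 kcal/mol at 298 K corresponds to a K_d of roughly 7 µM under this conversion.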

Molecular dynamics simulation

To investigate the binding stability of the OTA-target complexes in a dynamic environment, molecular dynamics (MD) simulations were conducted for 100 ns using GROMACS version 2024.2 [36]. The AMBER99SB force field and the SPC water model were employed, with the system temperature maintained at 300 K. Prior to the MD simulations, the system underwent energy minimization, consisting of 3000 steps of steepest descent followed by 2000 steps of conjugate gradient optimization. Throughout the simulation, key properties including the root mean square deviation (RMSD) and root mean square fluctuation (RMSF) were monitored to assess overall conformational stability and local flexibility of the complex. Additionally, the number of hydrogen bonds (H-bonds), radius of gyration (Rg), and solvent-accessible surface area (SASA) were analyzed to characterize the binding interface in terms of interaction strength, structural compactness, and solvent exposure. All results were visualized using QtGrace, and the free energy landscape (FEL) was plotted to reveal conformational distribution and dynamic stability [37].
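GROMACS provides these analyses through its own tools (for example `gmx rms`, `gmx rmsf`, `gmx gyrate`, and `gmx sham`); the plain-Python sketch below only spells out the definitions behind the reported quantities. It assumes frames have already been least-squares fitted to the reference structure and that the FEL is built from histogram counts of sampled conformations; the helper names are hypothetical.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K) (i.e. the gas constant R)


def rmsd(frame, ref):
    """RMSD (same length unit as the input) between two pre-fitted coordinate sets."""
    sq = sum((a - b) ** 2 for p, q in zip(frame, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))


def rmsf(frames):
    """Per-atom RMSF about the time-averaged position over pre-fitted frames."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        msd = sum((f[i][d] - mean[d]) ** 2 for f in frames for d in range(3)) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted Rg = sqrt(sum m_i |r_i - r_cm|^2 / sum m_i)."""
    total = sum(masses)
    com = [sum(m * c[d] for m, c in zip(masses, coords)) / total for d in range(3)]
    spread = sum(m * sum((c[d] - com[d]) ** 2 for d in range(3))
                 for m, c in zip(masses, coords))
    return math.sqrt(spread / total)


def free_energy(counts, temp_k=300.0):
    """Relative free energy per histogram bin, G = -k_B T ln(P / P_max), in kJ/mol.

    Empty bins are reported as +inf.
    """
    p_max = max(counts)
    return [-K_B * temp_k * math.log(c / p_max) if c > 0 else math.inf for c in counts]
```

The most populated bin sits at G = 0, so low-energy basins in the FEL correspond to the conformations sampled most often during the trajectory.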

Results

Screening of HCC-related target genes

To evaluate the effect of data integration, PCA was performed on the merged expression matrix before and after batch correction. Before correction, samples exhibited clear separation along PC1, which accounted for 98.6% of the variance, while PC2 explained only 0.2% (Fig. 2A). After ComBat correction, the sample distributions along PC1 and PC2 overlapped, with the contribution of PC1 decreasing to 9.6% and that of PC2 increasing to 6.8% (Fig. 2B). Boxplots of global gene expression showed markedly different distributions between the two datasets (GSE25097 and GSE36376) before batch correction, whereas after correction the distributions became highly comparable, with aligned medians and reduced inter-dataset variability, indicating successful mitigation of batch effects (Fig. 2C and D). Based on the predefined thresholds (|log₂FC| > 0.585, FDR < 0.05), a total of 734 significantly differentially expressed genes (DEGs) were identified, with red and blue dots representing upregulated and downregulated genes, respectively (Fig. 2E). Heatmap analysis further confirmed consistent expression patterns of the DEGs across samples (Fig. 2F). In the WGCNA, a soft-thresholding power of 7 was selected after systematically evaluating powers from 1 to 20, as it was the minimum value that satisfied the scale-free topology criterion (R² > 0.85) (Fig. 2G). Using this power, a topological overlap matrix (TOM) was constructed, and hierarchical clustering of the TOM identified co-expression modules. Excluding the grey module, which collects genes lacking strong co-expression with others, eight distinct gene modules were identified, each assigned a unique color for visualization. After highly similar modules were merged, the final modular structure became more distinct (Fig. 2H).
Among these, the green module exhibited the strongest positive correlation with disease status (r = 0.91, p < 1e-300) and was therefore selected for subsequent analysis (Fig. 2I). All 192 genes in the green module met the criteria for hub genes (kME > 0.8, GS > 0.2). By integrating DEGs and WGCNA module genes, a total of 798 HCC-related targets were obtained (Fig. 3A).

Fig. 2.


Screening of HCC-related target genes. (A) Principal component analysis (PCA) before batch-effect correction, revealing pronounced batch separation. (B) PCA after batch-effect correction. PC1 and PC2 percentages indicate the proportion of variance explained. Successful correction is evidenced by reduced dataset clustering and overlapping sample distributions. (C and D) Boxplots showing gene expression distributions across samples before (left) and after (right) batch effect removal. Pronounced batch-dependent differences were observed before correction, whereas expression distributions became highly comparable after correction, indicating effective mitigation of batch effects. (E) Volcano plot of differentially expressed genes (DEGs). Red dots represent significantly upregulated genes (log₂FC > 0.585, FDR < 0.05), blue dots represent significantly downregulated genes (log₂FC < −0.585, FDR < 0.05), and grey dots represent non-significant genes. (F) Heatmap depicting the expression patterns of the top 50 most significant DEGs across tumor and adjacent non-tumor samples. The color scale from blue to red indicates low to high expression. (G) Determination of the soft-thresholding power (β) in WGCNA. The left panel shows the scale-free topology fit index (R²) as a function of β. The chosen power (β = 7) is marked in red, where R² exceeds 0.85. The right panel shows the mean connectivity as a function of β. (H) Cluster dendrogram of genes and module identification in WGCNA. The top panel shows the hierarchical clustering tree of genes based on topological overlap dissimilarity. Each colored band below the dendrogram represents an assigned co-expression module (the grey module contains unassigned genes). The bottom panel shows the module eigengene dendrogram after merging highly similar modules (merge cut height = 0.25). (I) Module–trait relationship heatmap.
Each cell contains the correlation coefficient and the corresponding p-value between a module eigengene and the HCC phenotype. The color intensity of the cell indicates the strength of the correlation (red for positive, blue for negative). The green module showed the strongest positive correlation with HCC (r = 0.91, p < 1e-300)

Fig. 3.


Identification of OTA-related disease targets in HCC. (A) Potential targets of OTA were aggregated and de-duplicated across ChEMBL, SwissTargetPrediction, and PharmMapper. (B) HCC-related genes were identified via the union of differentially expressed genes (Diff Gene) with genes from WGCNA green module (ModuleGenes_green). (C) Venn diagram illustrating the overlap between potential OTA targets and HCC-related genes, resulting in 50 intersection genes. (D) Protein–protein interaction (PPI) network visualizes interactions among potential genes. Node colors indicate gene expression direction in HCC: orange for upregulated, green for downregulated. Edges represent predicted or known interactions. (E) Gene Ontology (GO) enrichment analysis of the potential targets, including Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) categories, visualized as a bar plot ranked by − log10 (p value), with bar length indicating the number of enriched targets. (F) Circular visualization of GO enrichment results, in which the outer ring represents individual GO terms, and the inner rings indicate the number of enriched targets and the rich factor (ratio of enriched targets to the total number of targets associated with each GO term); color gradients correspond to enrichment significance expressed as − log10 (p value). (G) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the potential targets shown as a bar plot, where bar length represents the number of enriched targets and color intensity reflects the statistical significance (p value). (H) Bubble plot of KEGG pathway enrichment, with the x-axis indicating the GeneRatio, bubble size representing the number of enriched targets, and bubble color denoting enrichment significance (p value)

Prediction of OTA toxicity profiles

Toxicity prediction using the ADMETlab 2.0 platform indicated that OTA has potential human hepatotoxicity and carcinogenicity, with a high predicted probability of inducing liver injury (Supplementary File S1). Further analysis with the ProTox 3.0 platform predicted that OTA may induce multiple toxic effects, including carcinogenicity, nephrotoxicity, respiratory toxicity, cytotoxicity, and clinical toxicity (Supplementary File S2).

Identification of OTA target proteins

The SMILES notation of OTA (C[C@@H]1CC2=C(C=C(C(=C2C(=O)O1)O)C(=O)N[C@@H](CC3=CC=CC=C3)C(=O)O)Cl) was retrieved from the PubChem database to serve as a standardized identifier for cross-database searches (Supplementary Fig. 1). By integrating prediction results from three complementary databases (ChEMBL, PharmMapper, and SwissTargetPrediction), a total of 560 unique potential targets of OTA were identified after removing duplicates (Fig. 3B).

Identification of intersection genes linking OTA to HCC

Intersection analysis between OTA potential targets and HCC-related genes identified 50 key targets potentially involved in OTA-associated HCC (Fig. 3C). A PPI network was constructed using the STRING database and visualized with Cytoscape, where orange and green nodes represent upregulated and downregulated genes, respectively (Fig. 3D).

Functional and pathway enrichment analysis

GO enrichment analysis revealed that the intersecting genes were significantly enriched (FDR-adjusted p < 0.05) in biological processes such as steroid metabolic process and response to xenobiotic stimulus, molecular functions including steroid binding and serine hydrolase activity, and cellular components such as spindle pole and collagen-containing extracellular matrix (Fig. 3E and F). KEGG pathway analysis indicated that these genes were primarily involved (FDR-adjusted p < 0.05) in chemical carcinogenesis (DNA adducts), bile secretion, metabolism of xenobiotics by cytochrome P450, drug metabolism by cytochrome P450, linoleic acid metabolism, arachidonic acid metabolism, phenylalanine metabolism, arginine biosynthesis, and arginine and proline metabolism (Fig. 3G and H). Notably, the enriched pathways align closely with the known hepatotoxic and carcinogenic mechanisms of OTA, including genotoxicity, metabolic disruption, and oxidative stress, which supports the biological relevance of the identified intersecting genes.
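Over-representation analysis of this kind rests on a one-sided hypergeometric test. The sketch below computes that p-value together with the GeneRatio and rich factor shown in the enrichment plots; the variable naming (k, n, M, N) and the function names are assumptions, and multiple-testing correction across terms would still be applied afterwards.

```python
import math


def hypergeom_enrichment(k, n, big_m, big_n):
    """One-sided over-representation p-value P(X >= k).

    k: query genes annotated to the term; n: query genes with any annotation;
    big_m: background genes annotated to the term; big_n: background size.
    """
    total = math.comb(big_n, n)
    upper = min(n, big_m)
    hits = sum(math.comb(big_m, i) * math.comb(big_n - big_m, n - i)
               for i in range(k, upper + 1))
    return hits / total


def gene_ratio_and_rich_factor(k, n, big_m):
    """GeneRatio = k / n (bubble-plot x-axis); rich factor = k / M (term-size normalized)."""
    return k / n, k / big_m
```

With 50 query genes, even a handful of hits in a small, specific pathway can give a small p-value, which is why xenobiotic-metabolism terms with few members can rank highly.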

Sensitivity analysis of the protein-protein network

To assess the robustness of our network-based findings relative to the interaction confidence threshold, we reconstructed PPI networks at three higher stringency levels: 0.5, 0.6, and 0.7. As anticipated, network sparsity increased with threshold stringency. Five of the eight core genes (AURKA, PARP1, LMNA, EPHX2, GSTP1) remained connected across all thresholds. SLC27A5 was lost at thresholds ≥ 0.6, while CA2 and GABARAPL1 were present only at the 0.4 threshold (Supplementary Fig. 2). These observations reflect the limited availability of high-confidence interaction evidence for certain proteins in the STRING database, resulting in their classification as isolated nodes that are removed during network construction or visualization when only connected nodes are retained. Furthermore, functional enrichment analyses on genes from networks at each threshold consistently highlighted the same core pathways related to OTA toxicity (e.g., xenobiotic metabolism, chemical carcinogenesis), confirming the stability of our biological interpretations (Supplementary Fig. 3).

Machine learning-based core gene screening and model construction

Based on the 50 intersecting genes, we constructed a machine learning framework comprising 127 algorithm combinations (Fig. 4A). Among these, the Random Forest (RF) model performed the best, achieving an average AUC of 0.946. By integrating feature genes selected by high-confidence models (AUC > 0.9), we identified eight core genes based on selection frequency: AURKA, GABARAPL1, CA2, PARP1, LMNA, SLC27A5, EPHX2, and GSTP1. Expression pattern analysis revealed that GABARAPL1, CA2, SLC27A5, EPHX2, and GSTP1 were downregulated, while AURKA, PARP1, and LMNA were upregulated (Fig. 4B). ROC analysis of individual genes demonstrated good predictive performance (AUC range: 0.791–0.934), and the multi-gene combined model achieved a further improved AUC of 0.986 (Fig. 4C and D). A nomogram was developed to translate the expression levels of the core genes into a quantitative risk score, which demonstrated a positive correlation with disease probability (Fig. 4E). The calibration curve indicated high consistency between model predictions and actual observations (Fig. 4F).
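A calibration curve compares the mean predicted probability with the observed HCC frequency within probability bins. The sketch below computes the apparent (uncorrected) version with equal-width bins; the bin count and function name are assumptions, and bias-corrected curves such as those obtained by bootstrap resampling (for example `calibrate` in the R `rms` package) are not reproduced.

```python
def calibration_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width probability bins and return
    (mean predicted probability, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            table.append((mean_p, observed, len(b)))
    return table
```

Points lying close to the 45-degree line, where mean predicted probability equals observed frequency, indicate good calibration.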

Fig. 4.


Machine learning-based identification of core genes in OTA-associated HCC. (A) Heatmap summarizing the performance of 127 machine learning models. Rows represent different base algorithms and their hyperparameter combinations. Columns represent performance metrics evaluated on the training and validation sets. The color scale indicates performance level. The Random Forest (RF) model achieved the highest average AUC (0.946). (B) Expression pattern of the eight prioritized core genes. A volcano plot highlights the eight core genes (AURKA, GABARAPL1, CA2, PARP1, LMNA, SLC27A5, EPHX2, GSTP1) among the 50 intersection genes. Their log₂ fold change (x-axis) and statistical significance (-log₁₀(FDR), y-axis) in HCC vs. normal tissue are shown. Upregulated genes are in red, downregulated in blue. (C) Receiver Operating Characteristic (ROC) curves for individual core genes. The diagnostic performance (AUC values ranging from 0.791 to 0.934) of each core gene for distinguishing HCC from normal liver tissue in the training cohort is shown. (D) ROC curve of the combined diagnostic model. The stacking ensemble model integrating the eight core genes demonstrated superior diagnostic accuracy (AUC = 0.986). (E) Nomogram for predicting HCC risk. The nomogram translates the expression levels (Z-scores) of the eight core genes into points, which are summed to calculate a total score corresponding to an individual’s predicted probability of having HCC. (F) Corrected calibration curve of the nomogram, comparing the model-predicted probability of HCC (x-axis) with the observed frequency (y-axis) in the validation cohort. The dashed 45-degree line represents perfect prediction. The solid line and the bias-corrected line indicate excellent agreement between predictions and observations

Model interpretability analysis

Comparison of the ROC curves of the top-performing single models showed that the RF model performed best (AUC = 0.980; Fig. 5A). SHAP analysis further quantified the contributions of the core genes to the predictions, identifying AURKA (mean |SHAP| = 0.126), GABARAPL1 (0.095), GSTP1 (0.091), and LMNA (0.078) as the largest contributors (Fig. 5B and C). SHAP dependence plots showed that higher expression of upregulated genes was associated with higher predicted risk, whereas higher expression of downregulated genes was associated with lower predicted risk (Fig. 5D). Waterfall plots illustrated the contribution of each feature to the prediction for individual samples (Fig. 5E).
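Feature importance of this kind is usually summarized as the mean absolute SHAP value per gene. Given a matrix of per-sample SHAP values computed elsewhere (for example with the Python `shap` package), the ranking and the additivity behind the waterfall plots can be sketched as follows; the input layout and function names are assumptions.

```python
def mean_abs_shap(shap_rows, feature_names):
    """shap_rows: one list of SHAP values per sample, ordered like feature_names.

    Returns (feature, mean |SHAP|) pairs, most important first.
    """
    n = len(shap_rows)
    importance = [
        (name, sum(abs(row[j]) for row in shap_rows) / n)
        for j, name in enumerate(feature_names)
    ]
    return sorted(importance, key=lambda kv: kv[1], reverse=True)


def reconstruct_output(base_value, shap_row):
    """SHAP additivity: base value + sum of a sample's SHAP values = model output
    (on the scale being explained); a waterfall plot draws these steps."""
    return base_value + sum(shap_row)
```

Taking absolute values before averaging keeps genes with strong effects in both directions from cancelling out, which is why the bar plot ranks magnitude rather than direction.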

Fig. 5.


Interpretability analysis of the machine learning model using SHAP (SHapley Additive exPlanations). (A) ROC curves of the top five performing single machine learning models. The Random Forest (RF) model achieved the best performance (AUC = 0.980). (B) Summary bar plot of mean absolute SHAP values. The eight core genes are ranked by their mean absolute SHAP value, which represents their average impact on the model’s output magnitude across all samples. AURKA, GABARAPL1, and GSTP1 showed the highest feature importance. (C) Beeswarm summary plot of SHAP values. Each point represents a sample. The x-axis displays the SHAP value, indicating the impact on the model’s prediction: positive values push the prediction toward HCC, while negative values push it toward normal. The color represents the actual gene expression level in each sample (golden yellow: high, purple: low). The plot demonstrates that high expression of upregulated genes (e.g., AURKA) and low expression of downregulated genes (e.g., GABARAPL1) contribute to an increased predicted risk of HCC. (D) SHAP dependence plots for selected core genes. Each plot shows how the SHAP value (impact) for a specific gene varies with its expression value (feature value). Non-linear relationships and potential interactions are visualized. (E) SHAP waterfall plot for a representative sample. This plot explains the model’s prediction for a single HCC sample, showing how each feature (gene) shifts the base model output (average prediction) to the final predicted value. Features are ordered by the magnitude of their impact
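The waterfall plot in panel E relies on SHAP's additivity property: the base value plus the sum of a sample's SHAP values equals the model output for that sample. The sketch below computes exact Shapley values by enumerating all feature coalitions and replacing absent features with a single baseline vector. This is a simplified stand-in chosen for illustration; SHAP implementations typically average over a background dataset and, for tree models such as RF, use the specialized TreeSHAP algorithm. The function name, toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to the baseline value (a single
    reference vector), a simplification of SHAP's background averaging.
    """
    n = len(x)

    def value(coalition):
        # Coalition features keep the sample's values; the rest use the baseline
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        contribution = 0.0
        for size in range(n):  # coalitions of size 0..n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical toy model with an interaction between features 1 and 2
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


x, base = [1, 2, 3], [0, 0, 0]
phi = exact_shapley(toy_model, x, base)
print(phi)  # feature 0 gets its linear term; the interaction is split evenly
# Additivity: base output + sum of Shapley values matches the sample's output
print(toy_model(base) + sum(phi), toy_model(x))
```

With eight genes, each feature needs 2^7 = 128 coalitions, so exact enumeration is cheap at this scale; the cost grows exponentially with the number of features.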

Molecular docking validation

Molecular docking predicted favorable binding between OTA and all eight core target proteins, with all binding energies below −7 kcal/mol (Fig. 6A). Visualization of the docked conformations showed hydrogen-bond interactions and good spatial complementarity between OTA and the target proteins (Fig. 6B–I). Furthermore, the binding affinities predicted by the two docking approaches (the CB-Dock2 online platform and AutoDock Vina) showed highly consistent trends across all target proteins: EPHX2 and PARP1 showed strong binding affinities for OTA with both methods, whereas relatively weaker interactions were observed for AURKA and LMNA (Supplementary Fig. 4). These results suggest that the docking outcomes are robust and not dependent on a single docking algorithm.
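The agreement between CB-Dock2 and AutoDock Vina is described qualitatively as consistent trends. One way to quantify such agreement (an assumption here, not a statistic reported by the authors) is Spearman's rank correlation between the two sets of per-target scores, which compares the rank order of targets rather than absolute energies. The helper names are hypothetical, and tied scores receive average ranks.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("all values tied in one list; correlation undefined")
    return cov / (var_a * var_b) ** 0.5


# Placeholder scores (kcal/mol) for four targets from two hypothetical runs,
# not the study's docking results
method_1 = [-9.0, -8.5, -7.6, -7.2]
method_2 = [-10.1, -9.4, -8.0, -8.3]
print(spearman_rho(method_1, method_2))  # 0.8: same order except the last two targets
```

A value near 1 means both programs order the targets similarly, even if one of them systematically reports more negative energies.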

Fig. 6.


Molecular docking validation of OTA binding to core target proteins. (A) Heatmap of molecular docking scores (binding energies in kcal/mol). The eight core target proteins are listed. Darker blue indicates stronger (more negative) binding affinity, confirming favorable interactions between OTA and all targets (all energies < -7 kcal/mol). (B-I) 3D visualization of predicted binding poses for OTA with each core target. For each panel: the protein surface or cartoon is shown in grey/light blue. The OTA ligand is represented as sticks. Key residues involved in hydrogen bonding (dashed yellow lines) or hydrophobic interactions are highlighted. Specific binding pockets are indicated. (B) AURKA, (C) GABARAPL1, (D) CA2, (E) PARP1, (F) LMNA, (G) SLC27A5, (H) EPHX2, (I) GSTP1
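To put the −7 kcal/mol threshold in perspective, a binding free energy can be converted to an implied dissociation constant through ΔG = RT ln K_d. This is an order-of-magnitude reading only: docking scores come from empirical scoring functions and are not measured free energies. The function name and the default temperature of 298.15 K are assumptions.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def implied_kd(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))


for dg in (-7.0, -8.0, -9.0):
    kd = implied_kd(dg)
    print(f"{dg:.1f} kcal/mol -> Kd ~ {kd * 1e6:.2f} uM")
```

Taken at face value, −7 kcal/mol corresponds to a K_d of roughly 7 µM at 298 K, and each additional −1.36 kcal/mol (RT ln 10) corresponds to about tenfold tighter predicted binding.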

Molecular dynamics simulation analysis

Based on the molecular docking results, the three targets with the strongest binding affinities (EPHX2, PARP1, and SLC27A5) were selected for molecular dynamics (MD) simulations. The root mean square deviation (RMSD), which measures the deviation of atomic positions from the initial structure, was used to evaluate the conformational stability of each protein–ligand complex. All three complexes reached equilibrium and remained conformationally stable over the 100 ns trajectory. The PARP1–OTA complex showed the most stable RMSD profile, fluctuating around 0.4 nm. The SLC27A5–OTA system converged within the first 10 ns and remained stable thereafter, while the EPHX2–OTA complex, despite a brief fluctuation toward the end of the simulation, generally fluctuated around 0.5 nm (Fig. 7A). The root mean square fluctuation (RMSF) was used to characterize the flexibility of protein residues. In the EPHX2–OTA system, several residue regions showed higher fluctuations, suggesting enhanced local flexibility. The PARP1–OTA complex showed generally low RMSF values, although fluctuations increased in residues 70–120, possibly reflecting local conformational adjustments induced by ligand binding. In contrast, the SLC27A5–OTA system showed consistently low RMSF values (< 0.2 nm) after equilibration, indicating a relatively rigid structure (Fig. 7B). The radius of gyration (Rg) revealed distinct dynamic behaviors among the complexes. The PARP1–OTA and SLC27A5–OTA complexes showed minimal Rg fluctuations (± 0.1 nm), indicating stable folded conformations, whereas the EPHX2–OTA complex showed greater variability (± 0.2 nm), suggesting conformational rearrangements within specific regions (Fig. 7C). To further characterize the dynamic properties of the complexes, we analyzed the solvent-accessible surface area (SASA), hydrogen bonding, and the free energy landscape (FEL).

The SASA profiles suggested greater solvent exposure and structural flexibility in the EPHX2–OTA and SLC27A5–OTA complexes than in PARP1–OTA, which had a more compact and stable interface (Fig. 7D). Persistent hydrogen bonds between OTA and the protein were observed in all systems, supporting the stability of the OTA binding pose (Fig. 7E). The FEL, constructed by projecting the simulation trajectories onto the first two principal components, provided insight into conformational sampling and stability. The broad, multi-basin landscape of EPHX2–OTA indicates conformational heterogeneity, whereas the narrow, deep minimum of PARP1–OTA reflects a rigid and stable complex. The SLC27A5–OTA system showed an intermediate profile, with a dominant global minimum accompanied by low-energy microstates, implying a stable yet somewhat adaptable conformation (Fig. 7F).
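The three stability metrics above have simple definitions. The sketch below implements them in plain Python for coordinates given as (x, y, z) tuples: RMSD of a frame against a reference, per-atom RMSF around the time-averaged position, and the (optionally mass-weighted) radius of gyration. It assumes frames have already been superposed on the reference; production tools such as the GROMACS utilities `gmx rms`, `gmx rmsf` and `gmx gyrate` typically take care of the fitting step. Function names are hypothetical.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the frame has already been superposed on the reference.
    """
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate lists must be non-empty and equally long")
    sq = sum(sum((a[d] - b[d]) ** 2 for d in range(3))
             for a, b in zip(frame, reference))
    return math.sqrt(sq / len(frame))


def rmsf(trajectory):
    """Per-atom RMS fluctuation around the time-averaged position.

    trajectory: list of frames, each a list of (x, y, z) tuples.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        mean = [sum(frame[atom][d] for frame in trajectory) / n_frames for d in range(3)]
        msd = sum(sum((frame[atom][d] - mean[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords, masses=None):
    """Radius of gyration about the centre of mass (unit masses if none given)."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    com = [sum(m * c[d] for m, c in zip(masses, coords)) / total for d in range(3)]
    sq = sum(m * sum((c[d] - com[d]) ** 2 for d in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


# Hypothetical two-atom toy system (arbitrary units)
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(rmsd(shifted, ref), rmsf([ref, shifted]), radius_of_gyration(ref))
```

Per-residue RMSF profiles like Fig. 7B are usually obtained from a subset of atoms (for example backbone or C-alpha atoms) rather than every atom.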

Fig. 7.


Molecular dynamics simulation assessment of OTA binding stability with the three selected targets. (A) Root Mean Square Deviation (RMSD) of the protein backbone (solid lines) and the OTA ligand (dashed lines) over the 100 ns simulation trajectory. Stable plateaus indicate conformational equilibration. (B) Root Mean Square Fluctuation (RMSF) per residue for the protein backbone. Peaks indicate regions of high flexibility. Active site or binding pocket residues are typically less flexible. (C) Radius of Gyration (Rg) of the protein-ligand complex over time, reflecting overall compactness. (D) Solvent Accessible Surface Area (SASA) of the binding pocket or the ligand over time, indicating solvent exposure. (E) Number of hydrogen bonds formed between OTA and the target protein throughout the simulation. Persistent H-bonds contribute to complex stability. (F) Free Energy Landscape (FEL) projected onto the first two principal components (PC1 and PC2) derived from the simulation trajectory. The color scale from blue to red represents low to high free energy. Deep blue basins correspond to stable conformational states sampled by the complex. The EPHX2-OTA complex shows a broader, multi-basin landscape (higher conformational flexibility), while the PARP1-OTA complex exhibits a deep, single basin (high stability).
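Panel F is a free energy landscape computed from the joint distribution of the trajectory's projections on PC1 and PC2, using G(x, y) = −k_B T ln[P(x, y)/P_max], so the most populated state sits at 0. A minimal histogram-based sketch, assuming the projections are already available as two lists and a temperature of 300 K (the simulation temperature is not stated in this section); function names and the bin count are hypothetical. GROMACS provides `gmx sham` for building comparable landscapes.

```python
import math

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant per mole (= R), kJ/(mol*K)


def _bin_index(value, lo, hi, bins):
    """Map a value to a bin in [0, bins - 1]; the maximum falls in the last bin."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def free_energy_landscape(pc1, pc2, bins=20, temperature_k=300.0):
    """Grid of G = -kT ln(P / P_max) in kJ/mol from PC1/PC2 projections.

    Empty bins are None (free energy undefined at this sampling depth).
    """
    if len(pc1) != len(pc2) or not pc1:
        raise ValueError("projections must be non-empty and equally long")
    lo1, hi1, lo2, hi2 = min(pc1), max(pc1), min(pc2), max(pc2)
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(pc1, pc2):
        counts[_bin_index(x, lo1, hi1, bins)][_bin_index(y, lo2, hi2, bins)] += 1
    c_max = max(max(row) for row in counts)
    kt = KB_KJ_PER_MOL_K * temperature_k
    # Normalization cancels in P / P_max, so raw counts can be used directly
    return [[-kt * math.log(c / c_max) if c else None for c in row] for row in counts]
```

A single deep basin (as described for PARP1–OTA) shows up as one region near 0 kJ/mol surrounded by steeply rising values, whereas several separated low-energy regions indicate multiple metastable states.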

Discussion

This study integrated network toxicology, multi-dataset machine learning, and molecular structural simulation to systematically identify candidate molecular targets and pathways linking OTA exposure to HCC development. The foundation of our study is the identification of 50 candidate genes at the intersection of HCC transcriptomic signatures and predicted OTA targets. To assess the biological relevance of this gene set, we first examined its functional enrichment. These genes were significantly enriched in pathways including xenobiotic metabolism by cytochrome P450 and chemical carcinogenesis involving DNA adducts. Importantly, these pathways align closely with OTA’s established toxicological mechanisms: metabolic activation and genotoxicity, oxidative stress, and disruption of lipid metabolism [14, 16, 17, 38]. This concordance supports our intersection strategy, indicating that it captured genes central to OTA’s mechanistic landscape.
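Pathway enrichment of this kind is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given N background genes of which K are annotated to a pathway, how surprising is an overlap of at least k among n query genes (here n = 50)? The section does not state which tool or background set the authors used, so this is a generic sketch; the function name and toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes annotated to the pathway,
    n: query genes (e.g. the intersection set), k: query genes in the pathway.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)  # exact integer arithmetic until this final division


# Toy counts (not the study's): 5 of 50 query genes in a 100-gene pathway,
# against a 20,000-gene background
print(hypergeom_enrichment_p(5, 50, 100, 20000))
```

Because many pathways are tested at once, the resulting p-values are normally adjusted for multiple testing, for example with the Benjamini–Hochberg procedure, before a pathway is called significant.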

Building on this gene set, a PPI network revealed the functional interconnectivity of its members. Machine learning applied within this network context prioritized eight core genes: AURKA, GABARAPL1, CA2, PARP1, LMNA, SLC27A5, EPHX2, and GSTP1. The relevance of these network hubs is supported by prior experimental studies of hepatic carcinogenesis. AURKA protein expression is significantly elevated in HCC tumors, and AURKA has been identified as a central hub gene with critical functions in HCC [39, 40]. In p53-altered liver cancers, AURKA forms a complex with MYC, stabilizing MYC and driving addiction-like proliferation [41]. PARP1 serves as a transcriptional coregulator, interacting with NF-κB and STAT5 to promote the expression of pro-inflammatory and tumorigenic genes, thereby fostering chronic inflammation, cell proliferation, and angiogenesis in HCC [42]. Overactivation of PARP1 triggered by OTA-induced DNA damage consumes excessive NAD+, ultimately disrupting hepatic energy metabolism [38]. PARP1 is highly expressed in human embryonic stem cells, and its expression is reactivated in residual HCC tumors after sorafenib treatment, suggesting a role in stem-cell pluripotency and sorafenib resistance in HCC [43]. LMNA encodes nuclear lamins involved in chromatin organization, gene expression regulation, the DNA damage response, and mechanotransduction [44]. High LMNA expression is associated with poor prognosis in HCC patients, and LMNA knockout reduces the tumorigenicity of HepG2 cells, suggesting an oncogenic role in HCC [45]. GABARAPL1, a key autophagy-related protein involved in autophagosome formation, is associated with tumor cell proliferation, invasion, and mitochondrial homeostasis [46, 47]. Its expression is decreased in HCC tissues, and its overexpression can inhibit cell growth [48].
SLC27A5, also known as fatty acid transport protein 5 (FATP5), primarily facilitates long-chain fatty acid transport and bile acid conjugation [49]. Its deficiency in mice promotes hepatic fibrosis via cholic acid (CA)-induced activation of hepatic stellate cells (HSCs) [50]. SLC27A5 has also been reported to act as a tumor suppressor in HCC, partly by regulating the AMPK/mTOR pathway [51]. EPHX2 (epoxide hydrolase 2) is a bifunctional enzyme with epoxide hydrolase and lipid phosphatase activities, playing important roles in xenobiotic metabolism and lipid homeostasis [52]. Downregulated EPHX2 expression has been detected in HCC tissue samples and cell lines [53]. GSTP1 is a crucial phase II enzyme involved in cellular detoxification, oxidative stress regulation, and cancer development [54]. OTA may downregulate GSTP1 expression by inhibiting the Nrf2 pathway, thereby compromising hepatic detoxification capacity and exacerbating oxidative damage [55]. GSTP1 gene polymorphisms can serve as independent prognostic indicators in HCC patients [56]. GSTP1 promoter hypermethylation is common in HBV-related early HCC, and inhibiting or silencing GSTP1 can enhance HCC resistance to chemotherapeutic drugs [57]. Taken together, the concordance between the computationally identified hubs and genes with independent experimental support strengthens the biological plausibility of the PPI network.

The eight core genes showed high discriminative performance for HCC, with the combined model achieving an AUC of 0.986, and SHAP analysis quantified the contribution of each gene to the model’s predictions. Molecular docking predicted strong binding affinities (binding energy < −7 kcal/mol) between OTA and all eight target proteins, and subsequent molecular dynamics simulations supported the stability of the three simulated complexes, particularly PARP1–OTA and SLC27A5–OTA. This multi-level evidence, spanning diagnostic utility and structural interaction, supports these genes as candidate molecular interfaces in potential OTA-associated hepatocarcinogenesis.

Although this study proposes a multi-target mechanism for OTA-associated HCC, several limitations should be noted. First, despite batch effect correction, residual platform-specific biases arising from the multi-source transcriptomic data and clinical heterogeneity may persist. Second, the conclusions are derived primarily from bioinformatic and computational analyses and lack direct functional validation of the core targets in vitro or in vivo. Third, for the structural analyses we prioritized experimentally determined PDB structures for molecular docking, because these provide the most reliable representation of binding-pocket geometry for small-molecule pose prediction, in line with standard practice in the field; comparative docking against structures predicted by tools such as AlphaFold2 remains a valuable avenue for future research. Future work should therefore focus on establishing an OTA exposure risk assessment system and on using experimental models to functionally validate the roles of these core targets in HCC pathogenesis.

Conclusion

In summary, our integrated computational approach suggests that OTA may be linked to HCC pathogenesis through the dysregulation of a specific gene network, with molecular docking and dynamics simulations supporting stable OTA–target binding. These in silico findings provide a testable mechanistic framework for OTA-associated hepatocarcinogenesis and highlight critical candidate targets. Future studies are warranted to functionally validate the roles of prioritized core genes, such as AURKA and PARP1, in OTA-induced hepatotoxicity using experimental models.

Supplementary Information

Below is the link to the electronic supplementary material.

40360_2026_1092_MOESM1_ESM.pdf (133.7KB, pdf)

Supplementary Material 1: Supplementary File 1. ADMETlab2- Prediction of toxicity of OTA

40360_2026_1092_MOESM2_ESM.pdf (941.5KB, pdf)

Supplementary Material 2: Supplementary File 2. ProTox-3.0 - Prediction of toxicity of OTA

40360_2026_1092_MOESM3_ESM.tif (2.4MB, tif)

Supplementary Material 3: Supplementary Figure 1. Chemical structure of OTA sourced from the PubChem website

40360_2026_1092_MOESM4_ESM.jpg (2.5MB, jpg)

Supplementary Material 4: Supplementary Figure 2. Sensitivity analysis by constructing protein-protein interaction (PPI) networks using four progressively stringent confidence thresholds of 0.4 (A), 0.5 (B), 0.6 (C), and 0.7 (D). Five genes (AURKA, PARP1, LMNA, EPHX2, and GSTP1) were consistently retained as connected nodes across all thresholds (0.4–0.7). SLC27A5 was retained at thresholds of 0.4 and 0.5 but was excluded at higher thresholds (0.6 and 0.7), while CA2 and GABARAPL1 were retained only at the 0.4 threshold

40360_2026_1092_MOESM5_ESM.jpg (1.4MB, jpg)

Supplementary Material 5: Supplementary Figure 3. Consistency of functional enrichment across PPI networks built at different confidence thresholds. GO and KEGG enrichment analyses were performed on the gene sets from the PPI networks shown in Supplementary Fig. 2 (thresholds: A=0.4, B=0.5, C=0.6, D=0.7). The top enriched terms are displayed. Despite varying network density, the core enriched biological themes remain consistently significant across all thresholds, underscoring the stability of the biological interpretation

40360_2026_1092_MOESM6_ESM.jpg (368.2KB, jpg)

Supplementary Material 6: Supplementary Figure 4. Binding affinities predicted by the two docking approaches exhibited highly consistent trends across all target proteins

40360_2026_1092_MOESM7_ESM.docx (15.8KB, docx)

Supplementary Material 7: Supplementary Table S1. Samples and clinical characteristics in each dataset

40360_2026_1092_MOESM8_ESM.docx (14.9KB, docx)

Supplementary Material 8: Supplementary Table S2. PDB IDs used for the docking analysis

Acknowledgements

Not applicable.

Author contributions

Zhuo J wrote the main manuscript text. Tang Y reviewed and edited the manuscript. Wu H and Zhou X were responsible for data analysis. Wang X and Qiu T performed data visualization. Lin M curated and summarized the data.

Funding

This work was supported by the Sichuan Association of the Integration of Traditional Chinese and Western Medicine (Grant No. ZXY2025010); the Municipal Key Science and Technology Program of Leshan (Grant No. 24ZDYF0098); the Sichuan Medical Science and Technology Innovation Research Association (No. YCH-KY-YCZD2024-135).

Data availability

No datasets were generated or analyzed during the current study.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Rumgay H, et al. Global burden of primary liver cancer in 2020 and predictions to 2040. J Hepatol. 2022;77(6):1598–606. 10.1016/j.jhep.2022.08.021.
  • 2. Sung H, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49. 10.3322/caac.21660.
  • 3. Liu J, et al. The spatio-temporal trends and determinants of liver cancer attributable to specific etiologies: a systematic analysis from the global burden of disease study 2021. Glob Health Res Policy. 2025;10(1):22. 10.1186/s41256-025-00416-y.
  • 4. Li Q, et al. Global epidemiology of liver cancer 2022: an emphasis on geographic disparities. Chin Med J (Engl). 2024;137(19):2334–42. 10.1097/cm9.0000000000003264.
  • 5. Choi S, et al. Global burden of primary liver cancer and its association with underlying aetiologies, sociodemographic status, and sex differences from 1990–2019: a DALY-based analysis of the global burden of disease 2019 study. Clin Mol Hepatol. 2023;29(2):433–52. 10.3350/cmh.2022.0316.
  • 6. Wild CP, et al. Mycotoxins and human disease: a largely ignored global health issue. Carcinogenesis. 2010;31(1):71–82. 10.1093/carcin/bgp264.
  • 7. Felizardo RJ, et al. Hepatocellular carcinoma and food contamination: aflatoxins and ochratoxin A as a great prompter. World J Gastroenterol. 2013;19(24):3723–5. 10.3748/wjg.v19.i24.3723.
  • 8. Bryden WL. Mycotoxins in the food chain: human health implications. Asia Pac J Clin Nutr. 2007;16(Suppl 1):95–101.
  • 9. Song C, et al. Mechanisms and transformed products of aflatoxin B1 degradation under multiple treatments: a review. Crit Rev Food Sci Nutr. 2024;64(8):2263–75. 10.1080/10408398.2022.2121910.
  • 10. Rumgay H, et al. Global burden of cancer in 2020 attributable to alcohol consumption: a population-based study. Lancet Oncol. 2021;22(8):1071–80. 10.1016/s1470-2045(21)00279-5.
  • 11. Heussner AH, et al. Comparative ochratoxin toxicity: a review of the available data. Toxins (Basel). 2015;7(10):4253–82. 10.3390/toxins7104253.
  • 12. Harris JP, et al. Biosynthesis of ochratoxins by Aspergillus ochraceus. Phytochemistry. 2001;58(5):709–16. 10.1016/s0031-9422(01)00316-8.
  • 13. el Khoury A, et al. Ochratoxin A: general overview and actual molecular status. Toxins (Basel). 2010;2(4):461–93. 10.3390/toxins2040461.
  • 14. Pfohl-Leszkowicz A, et al. Ochratoxin A: an overview on toxicity and carcinogenicity in animals and humans. Mol Nutr Food Res. 2007;51(1):61–99. 10.1002/mnfr.200600137.
  • 15. Petzinger E, et al. Mycotoxins in the food chain: the role of ochratoxins. Livest Prod Sci. 2002;76(3):245–50. 10.1016/s0301-6226(02)00124-0.
  • 16. Tao Y, et al. Ochratoxin A: toxicity, oxidative stress and metabolism. Food Chem Toxicol. 2018;112:320–31. 10.1016/j.fct.2018.01.002.
  • 17. Shin HS, et al. Ochratoxin A-induced hepatotoxicity through phase I and phase II reactions regulated by AhR in liver cells. Toxins (Basel). 2019;11(7). 10.3390/toxins11070377.
  • 18. Qi XZ, et al. Ochratoxin A induced early hepatotoxicity: new mechanistic insights from microRNA, mRNA and proteomic profiling studies. Sci Rep. 2014;4:5163. 10.1038/srep05163.
  • 19. Cheng M, et al. Exploring the mechanism of PPCPs on human metabolic diseases based on network toxicology and molecular docking. Environ Int. 2025;196:109324. 10.1016/j.envint.2025.109324.
  • 20. Yan B, et al. Identification of key fatty acid metabolism-related genes in Alzheimer’s disease. Mol Neurobiol. 2025;62(7):9399–415. 10.1007/s12035-025-04857-x.
  • 21. Clough E, et al. NCBI GEO: archive for gene expression and epigenomics data sets: 23-year update. Nucleic Acids Res. 2024;52(D1):D138–44. 10.1093/nar/gkad965.
  • 22. Gao J, et al. Integrating machine learning and molecular docking to decipher the molecular network of aflatoxin B1-induced hepatocellular carcinoma. Int J Surg. 2025;111(7):4539–49. 10.1097/js9.0000000000002455.
  • 23. Lim H-Y, et al. Prediction of disease-free survival in hepatocellular carcinoma by gene expression profiling. Ann Surg Oncol. 2013;20(12):3747–53. 10.1245/s10434-013-3070-y.
  • 24. Kim B-Y, et al. Feature genes of hepatitis B virus-positive hepatocellular carcinoma, established by its molecular discrimination approach using prediction analysis of microarray. Biochim Biophys Acta. 2004;1739(1):50–61.
  • 25. Villa E, et al. Neoangiogenesis-related genes are hallmarks of fast-growing hepatocellular carcinomas and worst survival. Results from a prospective study. Gut. 2015;65(5):861–9. 10.1136/gutjnl-2014-308483.
  • 26. Grinchuk OV, et al. Tumor-adjacent tissue co-expression profile analysis reveals pro-oncogenic ribosomal gene signature for prognosis of resectable hepatocellular carcinoma. Mol Oncol. 2017;12(1). 10.1002/1878-0261.12153.
  • 27. Sung W-K, et al. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat Genet. 2012;44(7):765–9. 10.1038/ng.2295.
  • 28. Lamb JR, et al. Predictive genes in adjacent normal tissue are preferentially altered by sCNV during tumorigenesis in liver cancer and may rate limiting. PLoS ONE. 2011;6(7):e20090. 10.1371/journal.pone.0020090.
  • 29. Wang X, et al. PharmMapper 2017 update: a web server for potential drug target identification with a comprehensive target pharmacophore database. Nucleic Acids Res. 2017;45(W1):W356. 10.1093/nar/gkx374.
  • 30. Wang X, et al. Enhancing the enrichment of pharmacophore-based target prediction for the polypharmacological profiles of drugs. J Chem Inf Model. 2016;56(6):1175–83. 10.1021/acs.jcim.5b00690.
  • 31. Liu X, et al. PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach. Nucleic Acids Res. 2010;38(Web Server issue):W609. 10.1093/nar/gkq300.
  • 32. Fu L, et al. ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res. 2024;52(W1):W422–31. 10.1093/nar/gkae236.
  • 33. Banerjee P, et al. ProTox 3.0: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Res. 2024;52(W1):W513–20. 10.1093/nar/gkae303.
  • 34. Szklarczyk D, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638–46. 10.1093/nar/gkac1000.
  • 35. Liu Y, et al. CB-Dock2: improved protein-ligand blind docking by integrating cavity detection, docking and homologous template fitting. Nucleic Acids Res. 2022;50(W1):W159. 10.1093/nar/gkac394.
  • 36. Van Der Spoel D, et al. GROMACS: fast, flexible, and free. J Comput Chem. 2005;26(16):1701–18.
  • 37. Filipe HAL, et al. Molecular dynamics simulations: advances and applications. Molecules. 2022;27(7). 10.3390/molecules27072105.
  • 38. Ma W, et al. Ochratoxin A induces abnormal tryptophan metabolism in the intestine and liver to activate AMPK signaling pathway. J Anim Sci Biotechnol. 2023;14(1):125. 10.1186/s40104-023-00912-6.
  • 39. Grisetti L, et al. The role of Aurora kinase A in hepatocellular carcinoma: unveiling the intriguing functions of a key but still underexplored factor in liver cancer. Cell Prolif. 2024;57(8):e13641. 10.1111/cpr.13641.
  • 40. Du R, et al. Targeting AURKA in cancer: molecular mechanisms and opportunities for cancer therapy. Mol Cancer. 2021;20(1):15. 10.1186/s12943-020-01305-3.
  • 41. Dauch D, et al. A MYC-aurora kinase A protein complex represents an actionable drug target in p53-altered liver cancer. Nat Med. 2016;22(7):744–53. 10.1038/nm.4107.
  • 42. Hu K, et al. PARP-1 in liver diseases: molecular mechanisms, therapeutic potential and emerging clinical applications (Review). Mol Med Rep. 2025;32(6). 10.3892/mmr.2025.13689.
  • 43. Yang XD, et al. PARP inhibitor olaparib overcomes sorafenib resistance through reshaping the pluripotent transcriptome in hepatocellular carcinoma. Mol Cancer. 2021;20(1):20. 10.1186/s12943-021-01315-9.
  • 44. Vahabikashi A, et al. Nuclear lamins: structure and function in mechanobiology. APL Bioeng. 2022;6(1):011503. 10.1063/5.0082656.
  • 45. Liu H, et al. LMNA functions as an oncogene in hepatocellular carcinoma by regulating the proliferation and migration ability. J Cell Mol Med. 2020;24(20):12008–19. 10.1111/jcmm.15829.
  • 46. Jacquet M, et al. The functions of Atg8-family proteins in autophagy and cancer: linked or unrelated? Autophagy. 2021;17(3):599–611. 10.1080/15548627.2020.1749367.
  • 47. Boyer-Guittaut M, et al. The role of GABARAPL1/GEC1 in autophagic flux and mitochondrial quality control in MDA-MB-436 breast cancer cells. Autophagy. 2014;10(6):986–1003. 10.4161/auto.28390.
  • 48. Liu C, et al. Low expression of GABARAPL1 is associated with a poor outcome for patients with hepatocellular carcinoma. Oncol Rep. 2014;31(5):2043–8. 10.3892/or.2014.3096.
  • 49. Anderson CM, et al. SLC27 fatty acid transport proteins. Mol Aspects Med. 2013;34(2–3):516–28. 10.1016/j.mam.2012.07.010.
  • 50. Wu K, et al. Loss of SLC27A5 activates hepatic stellate cells and promotes liver fibrosis via unconjugated cholic acid. Adv Sci (Weinh). 2024;11(2):e2304408. 10.1002/advs.202304408.
  • 51. Wang MD, et al. Fatty acid transport protein-5 (FATP5) deficiency enhances hepatocellular carcinoma progression and metastasis by reprogramming cellular energy metabolism and regulating the AMPK-mTOR signaling pathway. Oncogenesis. 2021;10(11):74. 10.1038/s41389-021-00364-5.
  • 52. Gautheron J, et al. The multifaceted role of epoxide hydrolases in human health and disease. Int J Mol Sci. 2020;22(1). 10.3390/ijms22010013.
  • 53. Zhan K, et al. Identification and validation of EPHX2 as a prognostic biomarker in hepatocellular carcinoma. Mol Med Rep. 2021;24(3). 10.3892/mmr.2021.12289.
  • 54. Qu K, et al. Polymorphisms of glutathione S-transferase genes and survival of resected hepatocellular carcinoma patients. World J Gastroenterol. 2015;21(14):4310–22. 10.3748/wjg.v21.i14.4310.
  • 55. Więckowska M, et al. Ochratoxin A-The current knowledge concerning hepatotoxicity, mode of action and possible prevention. Molecules. 2023;28(18). 10.3390/molecules28186617.
  • 56. Wang J, et al. Detection of aberrant promoter methylation of GSTP1 in the tumor and serum of Chinese human primary hepatocellular carcinoma patients. Clin Biochem. 2006;39(4):344–8. 10.1016/j.clinbiochem.2006.01.008.
  • 57. Marin JJG, et al. Models for understanding resistance to chemotherapy in liver cancer. Cancers (Basel). 2019;11(11). 10.3390/cancers11111677.


Articles from BMC Pharmacology & Toxicology are provided here courtesy of BMC
