CPT: Pharmacometrics & Systems Pharmacology. 2026 Mar 15;15(3):e70178. doi: 10.1002/psp4.70178

Unraveling the Link Between Azathioprine and Acute Pancreatitis: Integrating Network Toxicology, Machine Learning, and Mendelian Randomization

Zhijun Xie 1,2, Hang Lei 2, Pengcheng Zhang 2, Zhi Li 1,3,4, Wenfu Tang 1,2
PMCID: PMC13097569  PMID: 41832938

ABSTRACT

Azathioprine (AZA), a widely used immunosuppressant, can induce acute pancreatitis (AP), yet the underlying molecular mechanisms remain unclear. This study employed an integrative multiomics strategy—combining network toxicology, machine learning, Mendelian randomization (MR), and molecular docking—to elucidate the biological basis of AZA‐induced AP. AZA‐associated genes were first identified through bioinformatics databases and analyzed using protein–protein interaction networks and GO/KEGG functional enrichment. Least absolute shrinkage and selection operator (LASSO) regression and support vector machine recursive feature elimination (SVM‐RFE) were applied to prioritize key differentially expressed genes for diagnostic modeling. MR was then used to examine potential causal links between gene expression and AP risk, followed by molecular docking to assess AZA–protein interactions. Sixty‐eight candidate genes related to AZA‐induced AP were identified. Enrichment analyses indicated involvement in lipid metabolic regulation, inflammatory pathways, and energy homeostasis. Machine learning highlighted seven key genes—CES1, CTSK, JAK1, NR3C2, PLIN5, WEE1, and RORA—as central to AP development. MR analysis further demonstrated that decreased expression of CES1 and CTSK may mediate AZA‐related AP susceptibility. Docking simulations revealed strong, specific binding between AZA and both CES1 and CTSK. Overall, this study identifies CES1 and CTSK as genetically protective factors and mechanistic mediators in AZA‐triggered AP. These findings offer new molecular insights into the genomic and biochemical pathways underlying this adverse drug reaction.

Keywords: acute pancreatitis, azathioprine, machine learning, mendelian randomization, molecular docking, network toxicology

Study Highlights

  • What is the current knowledge on the topic?
    • Azathioprine (AZA), a key immunosuppressant, increases acute pancreatitis (AP) risk, but the molecular mechanisms of this drug‐induced AP (DIAP) remain poorly understood, with DIAP posing diagnostic and preventive challenges.
  • What question did this study address?
    • What are the molecular mechanisms underlying AZA‐induced AP?
  • What does this study add to our knowledge?
    • CES1 and CTSK are causal protective genes against AZA‐induced AP; AZA may downregulate/inhibit them, disrupting lipid metabolism, inflammatory signaling, and energy homeostasis to drive AP.
  • How might this change drug discovery, development, and/or therapeutics?
    • Enables CES1/CTSK as AP risk biomarkers; guides targeted interventions to preserve their function, mitigating AZA’s pancreatic toxicity and enhancing drug safety.

Abbreviations

AP: acute pancreatitis
AUC: area under the curve
AZA: azathioprine
BPs: biological processes
CCs: cellular components
CE: cholesterol ester
CES: carboxylesterase
CES1: carboxylesterase 1
CPs: cysteine proteases
CTSK: cathepsin K
DEGs: differentially expressed genes
DG: diglyceride
DIAP: drug‐induced acute pancreatitis
ECM: extracellular matrix
eQTLs: expression quantitative trait loci
ER: endoplasmic reticulum
FABPs: fatty acid‐binding proteins
FFAs: free fatty acids
FXR: farnesoid X receptor
GEO: Gene Expression Omnibus
GO: gene ontology
GWAS: genome‐wide association study
IBD: inflammatory bowel disease
IVs: instrumental variables
IVW: inverse‐variance weighted
KEGG: Kyoto Encyclopedia of Genes and Genomes
LASSO: least absolute shrinkage and selection operator
LD: linkage disequilibrium
MFs: molecular functions
MODS: multiple organ dysfunction syndrome
MR: Mendelian randomization
PPARα/γ: peroxisome proliferator‐activated receptors α/γ
PPI: protein–protein interaction
PUFAs: polyunsaturated fatty acids
ROC: receiver operating characteristic
SEA: similarity ensemble approach
SIRS: systemic inflammatory response syndrome
SNP: single nucleotide polymorphism
SVM‐RFE: support vector machine recursive feature elimination
TG: triglyceride

1. Introduction

Azathioprine (AZA), a cornerstone immunosuppressant, is highly valued for its efficacy in managing chronic active inflammatory bowel disease (IBD) and is recommended as a maintenance therapy for steroid‐dependent patients [1]. Despite its clinical benefits, the utility of AZA is frequently curtailed by treatment‐limiting adverse effects, most notably acute pancreatitis (AP), in addition to nausea, malaise, and arthralgia [2]. Epidemiological studies have established a substantial, approximately 5.82‐fold increased risk of AP in IBD patients exposed to AZA compared to untreated cohorts [3], with prospective data indicating an incidence of AZA‐induced AP around 7.3% [4]. Although AZA‐associated AP typically presents as a mild condition [5], its molecular pathogenesis remains incompletely characterized, posing a significant challenge to its prediction and prevention.

AP is an inflammatory disorder of the gastrointestinal tract, primarily triggered by the aberrant activation of pancreatic digestive enzymes. This typically leads to autodigestion of the pancreas, microcirculatory disturbances, and a cascade of inflammatory cytokines [6]. A significant proportion of patients, estimated at 20%–30%, develop systemic inflammatory response syndrome (SIRS) and multiple organ dysfunction syndrome (MODS), which pose severe threats to survival [7]. While biliary calculi and excessive alcohol consumption are the primary etiological factors for AP, drug‐induced causes represent clinically significant contributors. To date, over 1175 medications have been reported as potential triggers of AP [8]. Drug‐induced acute pancreatitis (DIAP), though accounting for a relatively small percentage (2%–5%) of global AP cases, demonstrates an upward trend in specific populations and presents substantial clinical challenges due to its often elusive diagnosis and poorly defined mechanisms [9]. DIAP lacks distinctive clinical manifestations, and its pathogenesis may involve hypersensitivity reactions, direct cytotoxic effects on acinar cells, sphincter of Oddi dysfunction, or the accumulation of toxic metabolites [10]. Therefore, elucidating the underlying mechanisms and establishing effective preventive strategies against DIAP, particularly for commonly used drugs like AZA, remain critical unresolved priorities.

Traditional reductionist research approaches, which often focus on isolated biomarkers or localized pathology, tend to fall short in capturing the systemic homeostatic regulation and complex pathophysiological networks involved in drug‐induced toxicities. Consequently, these methods impede a comprehensive mechanistic dissection of AZA‐induced AP. Given the multisystem interactions and intricate molecular cascades implicated in this adverse event, novel integrative methodologies are imperative to delineate risk profiles and guide precision prevention strategies.

Recent advances in bioinformatics and computational biology have provided novel tools for exploring the complex interactions between drug‐induced toxicity and diseases. Network toxicology, an emerging field derived from network pharmacology, integrates multi‐omics data with biological interaction networks to construct relational models that visually represent intricate toxicological mechanisms, thereby facilitating their analysis and prediction [11]. Molecular docking complements this by elucidating the mechanistic basis of toxicity at the molecular level, evaluating the binding potential between drug molecules and their target proteins [12]. The combined strategy of network toxicology and molecular docking has emerged as a pivotal methodology for identifying molecular targets and pathways implicated in disease pathogenesis [13]. Furthermore, machine learning algorithms demonstrate substantial potential in disease prediction and the identification of high‐precision biomarkers from complex biological datasets [14]. Mendelian randomization (MR) offers a powerful approach to infer potential causal relationships between modifiable risk factors (such as gene expression) and disease outcomes by leveraging genetic variants as instrumental variables (IVs), thereby minimizing biases from confounding and reverse causation inherent in observational studies [15].

These four approaches—network toxicology, machine learning, MR, and molecular docking—exhibit complementary synergism. Their integration can significantly enhance research depth, strengthen causal inference capabilities, and improve the reliability of results. This study leverages this integrated strategy to comprehensively investigate the potential toxicological mechanisms of AZA and its risk of inducing AP. By identifying key molecular entities involved in this process, as depicted in the research design workflow (Figure 1), this work aims to elucidate AZA's toxicological profiles and molecular mechanisms. The findings are anticipated to contribute novel insights for drug toxicity assessment, support the development of mechanism‐driven targeted interventions to mitigate AZA's adverse effects, and ultimately guide safer medication use, holding significant public health value. The systematic approach employed herein addresses some of the inherent challenges in DIAP research, such as its incompletely defined pathogenesis, by providing a multi‐layered investigation into AZA‐induced AP.

FIGURE 1.

Overview of research design. Assumption 1 (relevance) requires cis‐eQTLs to be strongly associated with gene expression; Assumption 2 (independence) requires the genetic instrumental variables to be uncorrelated with confounders of the relationship between gene expression and AP; Assumption 3 (exclusion restriction) requires cis‐eQTLs to influence AP solely through gene expression, not via other biological pathways. Circles marked with a red “X” indicate pathways that must be absent for the assumptions to hold (i.e., pleiotropic or confounded paths that would make an instrumental variable invalid). Confounders are third‐party variables (e.g., population stratification, lifestyle factors, or environmental exposures) that simultaneously influence both gene expression (exposure) and AP (outcome). They are a common source of bias in traditional observational studies. In MR, when Assumption 2 holds, the confounding pathways are blocked; together with Assumptions 1 and 3, this allows the MR estimate to be interpreted as a causal effect.

2. Methods and Materials

The methodological framework of this study was designed to rigorously investigate the molecular mechanisms of AZA‐induced AP by integrating computational toxicology, bioinformatics, machine learning, genetic epidemiology, and structural biology techniques. Each step was performed with attention to detail to ensure reproducibility and robustness of the findings.

2.1. Identification of AZA Toxicity Profiles

The chemical structure and molecular characteristics of AZA, including structural identifiers (e.g., CAS number, InChIKey), molecular descriptors (e.g., molecular weight, logP), and known bioactivity profiles, were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov). Subsequently, the ProTox‐3.0 toxicity prediction platform (https://tox.charite.de/protox3) was employed as a primary screening tool. This platform utilizes a combination of 2D similarity‐based methods and pharmacophore‐based machine learning models to predict various toxicological endpoints. This step provided foundational data on AZA‐induced toxicity, helping to frame the subsequent targeted investigations [16].

2.2. Collection of AZA Target Genes

Based on AZA's chemical structure obtained from PubChem, potential human target genes were predicted using three established databases to ensure comprehensive coverage: the TargetNet database (http://targetnet.scbdd.com), the SwissTargetPrediction database (http://www.swisstargetprediction.ch/), and the Similarity Ensemble Approach (SEA) database (https://sea.bkslab.org/). For all database queries, the species filter was set to Homo sapiens. Predicted targets obtained from TargetNet were initially identified by UniProt IDs. To standardize gene nomenclature across all datasets, these UniProt IDs were converted to official gene symbols using the UniProt database (https://www.uniprot.org/). The predicted target lists from all three databases were then merged, and duplicate entries were removed to establish a consolidated, non‐redundant set of potential AZA target genes for further analysis [17]. Note that “predicted targets” refers to proteins that the drug may directly bind to or regulate, as predicted by structure‐ or similarity‐based algorithms, and not to drug‐induced differentially expressed genes (DEGs).
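
As a minimal sketch of the integration step, assuming each database's output has already been converted to official gene symbols, the merge reduces to a set union. The function name `merge_targets` and the three toy lists are hypothetical illustrations, not the study's actual predictions.

```python
def merge_targets(*target_lists):
    """Union several predicted-target lists into one sorted, non-redundant list."""
    merged = set()
    for targets in target_lists:
        # Strip stray whitespace so "CTSK " and "CTSK" collapse to one entry.
        merged.update(t.strip() for t in targets if t.strip())
    return sorted(merged)


# Hypothetical toy outputs standing in for TargetNet, SwissTargetPrediction, and SEA.
targetnet = ["CES1", "JAK1", "WEE1"]
swiss_target_prediction = ["JAK1", "CTSK"]
sea = ["CTSK ", "RORA"]

consolidated = merge_targets(targetnet, swiss_target_prediction, sea)
print(len(consolidated), consolidated)
```

The sketch compares symbols case‐sensitively; any further harmonization (aliases, retired symbols) would need a dedicated mapping step.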

2.3. Collection of Pancreatitis‐Associated Genes

To systematically identify genes associated with pancreatitis, this study utilized two publicly available transcriptomic datasets retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/): GSE143754 and GSE194331. The GSE143754 dataset comprises six chronic pancreatitis samples, 11 pancreatic adenocarcinoma samples, and nine normal control samples. The GSE194331 dataset contains samples from 32 normal subjects and 87 pancreatitis patients. For the purpose of this study, normal samples from both datasets served as controls, while pancreatitis samples were designated as the experimental groups.

Given the disparate research platforms and experimental conditions underlying these datasets, potential batch effects, which can introduce systematic non‐biological variations, were a significant concern. To ensure the reliability of subsequent integrated analyses, the ComBat algorithm, implemented within the sva R package, was applied to the merged expression matrix to correct for these batch effects. This correction aimed to eliminate systematic variations attributable to technical factors rather than biological differences. Following batch correction, differential expression analysis was performed using the limma package in R. To maximize the capture of potential AP‐associated candidate genes and prevent the inadvertent exclusion of biologically significant genes with modest expression changes, the threshold for identifying DEGs was set at an absolute log2 fold change (∣log2 FC∣) > 0.5 with an adjusted p‐value (false discovery rate, FDR) < 0.05. Through this limma analytical pipeline, significantly dysregulated DEGs identified in the integrated and batch‐corrected dataset were preliminarily classified as candidate genes associated with AP pathogenesis, forming the basis for subsequent network toxicological analysis [18].
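
The thresholding rule can be sketched as follows. The study ran limma in R; this Python fragment only illustrates how the two cut‐offs (|log2 FC| > 0.5 and FDR < 0.05) partition a results table. The function `select_degs`, the gene labels, and the numbers are hypothetical.

```python
def select_degs(results, lfc_cutoff=0.5, fdr_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    `results` maps gene -> (log2 fold change, adjusted p-value).
    A gene is a DEG only if it passes BOTH thresholds.
    """
    up, down = [], []
    for gene, (log2fc, adj_p) in results.items():
        if adj_p < fdr_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Hypothetical differential-expression results.
toy_results = {
    "GENE_A": (1.2, 0.001),   # passes both thresholds, upregulated
    "GENE_B": (-0.8, 0.02),   # passes both thresholds, downregulated
    "GENE_C": (0.3, 0.001),   # significant but fold change too small
    "GENE_D": (-1.5, 0.20),   # large fold change but not significant
}

up_genes, down_genes = select_degs(toy_results)
print("up:", up_genes, "down:", down_genes)
```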

2.4. Screening of AZA‐AP Core Targets and PPI Network Construction

To identify potential core targets that may mediate the interactions between AZA and AP, a Venn analysis was performed. This involved finding the intersection between the AZA target gene set (derived in Section 2.2) and the set of AP‐associated DEGs (identified in Section 2.3). The genes common to both sets were considered potential core targets for AZA's action in AP pathogenesis.
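
The Venn analysis amounts to a set intersection. The sketch below, with hypothetical gene sets and a hypothetical helper `venn_regions`, shows how the three regions of a two‐set Venn diagram could be derived.

```python
def venn_regions(set_a, set_b):
    """Return the three regions of a two-set Venn diagram."""
    return {
        "only_a": set_a - set_b,
        "both": set_a & set_b,      # candidate core targets
        "only_b": set_b - set_a,
    }


# Hypothetical toy inputs, not the study's gene lists.
aza_targets = {"CES1", "CTSK", "JAK1", "ABCB1"}
ap_degs = {"CES1", "CTSK", "JAK1", "PLIN5", "RORA"}

regions = venn_regions(aza_targets, ap_degs)
print(sorted(regions["both"]))
```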

This list of core target genes was subsequently submitted to the STRING database (version 11.5, https://string‐db.org/) to construct a protein–protein interaction (PPI) network. The analysis was conducted under standardized parameters: “Species: Homo sapiens”, “Minimum required interaction score: > 0.4 (medium confidence)”, and “Disconnected nodes in the network: Hidden”. This confidence score threshold was chosen to balance network interpretability against interaction reliability, filtering out low‐confidence interactions that might represent false positives [19]. The PPI data, including interaction pairs and scores, were then imported into Cytoscape software (version 3.10.3, https://cytoscape.org/) for topological analysis and network visualization. Within Cytoscape, nodes (representing the core target proteins) were ranked by descending degree centrality, a fundamental graph metric that counts the number of direct connections a node has. Visual encoding parameters were applied to map connectivity to node attributes: darker coloration and larger node diameters indicate higher degree centrality [20].
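
Degree centrality after score filtering can be illustrated with a short sketch. It assumes interaction scores on a 0 to 1 scale and an undirected edge list; the function `degree_ranking` and the toy edges are hypothetical and do not reproduce the STRING or Cytoscape output.

```python
from collections import Counter


def degree_ranking(edges, min_score=0.4):
    """Rank proteins by degree after dropping low-confidence interactions.

    `edges` is an iterable of (protein_a, protein_b, score) tuples.
    Proteins left without any retained edge do not appear in the result,
    mirroring the "hide disconnected nodes" setting.
    """
    kept = set()
    for a, b, score in edges:
        if a != b and score > min_score:
            kept.add(frozenset((a, b)))  # undirected, duplicates collapse
    degree = Counter()
    for pair in kept:
        for protein in pair:
            degree[protein] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical interaction pairs and scores.
toy_edges = [
    ("JAK1", "CTSK", 0.70),
    ("JAK1", "CES1", 0.50),
    ("CES1", "PLIN5", 0.90),
    ("WEE1", "RORA", 0.20),  # below threshold, both nodes become isolated
]

ranking = degree_ranking(toy_edges)
print(ranking)
```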

2.5. GO/KEGG Enrichment Analysis

To elucidate the biological functions and signaling pathways underlying the AZA‐AP core targets identified in Section 2.4, systematic enrichment analyses were conducted using the Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov/). Gene ontology (GO) enrichment analysis was performed to evaluate biological processes (BPs), Cellular Components (CCs), and molecular functions (MFs), thereby characterizing the principal biological roles of the core target genes [21]. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was employed to identify significantly enriched signaling and metabolic pathways [22]. Particular attention was given to pathways mechanistically linked to AP pathophysiology, including those involved in inflammation, lipid metabolism, apoptosis, and immune response, to delineate the core signaling networks potentially modulated by AZA. These analyses aimed to reveal key biological contexts where the core targets exert critical roles and to demonstrate their coordinated regulation of AP‐associated cellular events through enriched KEGG pathways. The results of these enrichment analyses were visualized and interpreted using the Microbiostats platform (https://www.bioinformatics.com.cn) to enable an intuitive presentation of essential biological functions and pathway insights.

2.6. Machine Learning‐Based Development and Validation of an AZA‐AP Prediction Model

To identify and validate core diagnostic gene biomarkers predictive of the association between AZA and AP, a machine learning workflow was implemented. The initial candidate feature gene set was formed by integrating the core intersection targets from Section 2.4 with the broader set of AP‐associated DEGs from Section 2.3. The batch‐corrected combined dataset (GSE143754 and GSE194331), as described in Section 2.3, served as the training set, comprising normal samples (controls) and pancreatitis samples (experimental group).

Two distinct feature selection algorithms—Least Absolute Shrinkage and Selection Operator (LASSO) regression and Support Vector Machine‐Recursive Feature Elimination (SVM‐RFE)—were applied in parallel to screen for optimal feature gene subsets. LASSO regression, a penalized regression method, selected genes with non‐zero regression coefficients through regularization path analysis, effectively performing variable selection and shrinkage [14]. SVM‐RFE iteratively trains an SVM model and removes features with the smallest ranking criteria, identifying gene combinations that exert the maximal impact on classifier performance [23]. Genes that were independently identified by both algorithms were then intersected to establish a high‐confidence diagnostic biomarker set. This dual‐algorithm approach enhances the robustness of biomarker selection, as genes identified by multiple methods are more likely to be genuinely significant.

The predictive efficacy of these key genes was evaluated through receiver operating characteristic (ROC) curve analysis and calibration curve assessment [14]. ROC analysis was used to calculate the capacity of individual genes and gene signatures to discriminate between normal and AP samples, quantified by the area under the curve (AUC). An AUC value approaching 1 indicates perfect diagnostic accuracy. Calibration curves were generated to measure the agreement between the predicted probabilities of AP and the observed incidence, with proximity to the diagonal line reflecting better prediction reliability.

In this study, we used the pROC package in R to evaluate the discriminative ability of each candidate gene between AP and non‐AP samples. The outcome variable was defined as the presence of acute pancreatitis (AP = 1, non‐AP = 0), and the predictor was the normalized expression value of the corresponding gene for each sample. The continuous expression values of each gene across all samples were then used as the predictor to compute the ROC curve and AUC.
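
For intuition, the single‐gene AUC can be written as the probability that a randomly chosen AP sample has a higher expression value than a randomly chosen non‐AP sample, with ties counted as one half. The study used pROC in R; the Python sketch below is only an illustration of that definition, with a hypothetical function `auc_from_expression` and made‐up expression values. It assumes higher expression points toward AP, so a gene that is lower in AP yields a value below 0.5.

```python
def auc_from_expression(labels, expression):
    """AUC as the fraction of (AP, non-AP) pairs ranked correctly.

    labels: 1 for AP, 0 for non-AP.
    expression: normalized expression value of one gene per sample.
    """
    cases = [x for y, x in zip(labels, expression) if y == 1]
    controls = [x for y, x in zip(labels, expression) if y == 0]
    if not cases or not controls:
        raise ValueError("both AP and non-AP samples are required")
    concordant = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                concordant += 1.0
            elif case_value == control_value:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical expression values for one gene in six samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_expression = [2.1, 1.8, 0.9, 1.0, 0.7, 0.4]

toy_auc = auc_from_expression(toy_labels, toy_expression)
print(round(toy_auc, 3))
```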

2.7. Selection of Genetic IVs

For the MR analyses aimed at exploring potential causal relationships between the expression levels of key genes (identified in Section 2.6) and AP, IVs were sourced from the eQTLGen Consortium (https://www.eqtlgen.org). The eQTLGen Consortium is a large‐scale international project that has systematically identified expression quantitative trait loci (eQTLs) in blood by integrating genome‐wide genotype data with whole‐blood transcriptomic data from over 30,000 individuals of predominantly European ancestry [24]. This resource provides robust genetic variants (primarily single nucleotide polymorphisms, SNPs) that regulate gene expression.

SNPs significantly associated with the expression levels of the key genes identified by machine learning were selected as IVs, adhering to the core assumptions of MR: Relevance (The IVs must be strongly associated with the exposure); Independence (The IVs must be independent of any unmeasured confounders that could affect both gene expression and AP risk); Exclusion Restriction (The IVs must influence the outcome solely through their effect on the exposure and not via any alternative pathways).

Only cis‐eQTLs, defined as SNPs located within a ±500 kb window of the target gene's transcription start site, were included. This increases the likelihood that the SNPs directly regulate the proximal gene's expression [25]. SNP‐gene associations were required to meet genome‐wide significance (p < 5 × 10⁻⁸). Significant SNPs underwent linkage disequilibrium (LD) clumping based on European reference populations (1000 Genomes Project data) using an r² threshold < 0.3. This step ensures that the selected IVs for each gene are largely independent of each other. Finally, F‐statistics were calculated for each IV using the formula F = [R² × (N − 2)]/(1 − R²), where R² is the proportion of variance in gene expression explained by the SNP and N is the sample size of the eQTL study. IVs with F‐statistics > 10 were considered robust, minimizing potential weak instrument bias [26]. The selection of IVs according to these established criteria is fundamental to the validity of MR analyses.
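
The instrument‐strength filter follows directly from the formula above. In this sketch the function names, SNP labels, R² values, and sample size are hypothetical placeholders chosen only to show one instrument passing and one failing the F > 10 rule.

```python
def f_statistic(r2, n):
    """F = R^2 * (N - 2) / (1 - R^2) for a single-SNP instrument."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return r2 * (n - 2) / (1.0 - r2)


def strong_instruments(snp_r2, n, threshold=10.0):
    """Keep SNPs whose F-statistic exceeds the weak-instrument threshold."""
    return sorted(snp for snp, r2 in snp_r2.items() if f_statistic(r2, n) > threshold)


# Hypothetical variance-explained values for two candidate cis-eQTLs.
toy_r2 = {"snp_strong": 0.01, "snp_weak": 0.0002}
toy_n = 30000  # illustrative eQTL sample size

kept_snps = strong_instruments(toy_r2, toy_n)
print(kept_snps)
```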

2.8. Selection of Outcome Factor

To investigate the potential causal relationships between the expression levels of AZA‐affected key genes and AP, AP was selected as the outcome for the MR analysis. Summary‐level genetic association data for AP were derived from a publicly available genome‐wide association study (GWAS) conducted by the FinnGen research project (https://www.finngen.fi/en). FinnGen is a large‐scale biobank initiative that integrates Finnish national health registry data with genomic data from Finnish participants. The AP GWAS dataset used in this study encompassed 198,166 participants of European ancestry (3022 AP cases and 195,144 controls) and included data for 16,380,428 SNPs.

To ensure the validity and reliability of the MR analysis and to prevent confounding bias arising from population stratification, this study strictly limited the outcome dataset to European‐ancestry participants. This aligns with the predominant ancestry (European) of the eQTLGen exposure data used for IV selection (Section 2.7). Such ancestry matching between exposure and outcome datasets is a critical prerequisite for conducting valid two‐sample MR analyses, as it mitigates the risk of spurious associations due to systematic differences in allele frequencies and genetic architecture between populations.

2.9. MR Analysis

To evaluate the potential causal effects of key AZA‐AP target gene expression on acute pancreatitis risk, a two‐sample MR analysis was conducted, following established research protocols [26]. The primary causal estimates were obtained using a random‐effects inverse‐variance weighted (IVW) model to account for heterogeneity across IVs. To assess robustness and address potential violations of MR assumptions, particularly horizontal pleiotropy, four complementary MR methods were applied: MR‐Egger regression, the weighted median, simple mode, and weighted mode. Consistency across these approaches would reinforce the reliability of the IVW estimates.
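
As a rough sketch of the primary estimator, the IVW estimate is the weighted mean of per‐SNP Wald ratios (SNP‐outcome effect divided by SNP‐exposure effect), with first‐order weights equal to the squared exposure effect over the outcome variance. The multiplicative random‐effects variant shown here inflates the standard error when Cochran's Q exceeds its degrees of freedom. The function name and the summary statistics are hypothetical, and this is not a substitute for the established MR software the study would have relied on.

```python
import math


def ivw_random_effects(beta_exp, beta_out, se_out):
    """Return (estimate, standard error, Cochran's Q) for the IVW method."""
    if not (len(beta_exp) == len(beta_out) == len(se_out)) or len(beta_exp) < 2:
        raise ValueError("need at least two SNPs with matching inputs")
    # First-order weights: w_j = beta_exp_j^2 / se_out_j^2
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]  # Wald ratios
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    # Cochran's Q: weighted squared deviations of the ratios from the estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    se_fixed = math.sqrt(1.0 / total_weight)
    # Multiplicative random effects: scale the SE up under heterogeneity,
    # but never below the fixed-effect value.
    scale = math.sqrt(max(1.0, q / (len(weights) - 1)))
    return estimate, se_fixed * scale, q


# Hypothetical summary statistics for three cis-eQTL instruments.
toy_beta_exp = [0.10, 0.20, 0.15]     # SNP effect on gene expression
toy_beta_out = [-0.020, -0.050, -0.030]  # SNP effect on AP (log odds)
toy_se_out = [0.01, 0.01, 0.01]

ivw_est, ivw_se, cochran_q = ivw_random_effects(toy_beta_exp, toy_beta_out, toy_se_out)
print(ivw_est, ivw_se, cochran_q)
```

A negative estimate in this toy example would correspond to higher genetically predicted expression being associated with lower AP risk.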

Sensitivity analyses were conducted to further ensure result robustness. Cochran's Q statistic (IVW and MR‐Egger) was used to test for heterogeneity, while the MR‐Egger intercept evaluated directional horizontal pleiotropy. MR‐PRESSO was employed to detect and correct for outlier SNPs contributing to pleiotropy or heterogeneity, with causal estimates recalculated after outlier removal when necessary. Additionally, leave‐one‐out analysis was conducted by iteratively excluding each SNP to determine whether the overall causal effect was disproportionately influenced by any single instrument, thereby strengthening confidence in the final MR findings.
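
The leave‐one‐out procedure can be sketched as repeated re‐estimation with one instrument removed at a time. The helper names and the four toy SNPs below are hypothetical; only the IVW point estimate is recomputed, and the example requires at least two SNPs.

```python
def ivw_estimate(snps):
    """IVW point estimate from (beta_exposure, beta_outcome, se_outcome) tuples."""
    numerator = sum(bx * by / se ** 2 for bx, by, se in snps)
    denominator = sum(bx * bx / se ** 2 for bx, by, se in snps)
    return numerator / denominator


def leave_one_out(snp_table):
    """Map each SNP to the IVW estimate obtained when that SNP is excluded."""
    results = {}
    for left_out in snp_table:
        kept = [stats for snp, stats in snp_table.items() if snp != left_out]
        results[left_out] = ivw_estimate(kept)
    return results


# Hypothetical instruments: (effect on expression, effect on AP, SE of AP effect).
toy_snps = {
    "snp1": (0.10, -0.020, 0.01),
    "snp2": (0.20, -0.050, 0.01),
    "snp3": (0.15, -0.030, 0.01),
    "snp4": (0.10, 0.040, 0.01),  # effect in the opposite direction
}

overall = ivw_estimate(list(toy_snps.values()))
loo = leave_one_out(toy_snps)
for snp, estimate in loo.items():
    print(snp, round(estimate, 4), "shift:", round(estimate - overall, 4))
```

If every leave‐one‐out estimate keeps the same sign and a similar magnitude as the overall estimate, no single instrument is driving the result.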

2.10. Molecular Docking

To validate the structural plausibility of interactions between AZA and key AZA‐AP protein targets, molecular docking analyses were conducted. The 3D structure of AZA was retrieved from PubChem, while high‐resolution (≤ 2.5 Å) human protein crystal structures with intact ligand‐binding sites were obtained from the RCSB Protein Data Bank. When multiple structures were available, complexes with endogenous ligands or the highest‐resolution apo forms were prioritized.

Docking was performed using CB‐Dock2, which identifies putative binding pockets and conducts blind docking without predefined binding‐site constraints. Binding affinity was evaluated using the predicted binding free energy (ΔG_bind). Values below 0 kcal/mol indicate thermodynamically favorable interactions, whereas ΔG_bind < −5.0 kcal/mol is generally regarded as evidence of strong and stable ligand–receptor binding, reflecting thresholds commonly applied in computational drug‐discovery studies [14].

3. Results

3.1. Toxicity Evaluation of AZA

A comprehensive analysis using multi‐source computational toxicology prediction tools, primarily ProTox‐3.0, was conducted to systematically characterize the potential toxicity profile of AZA.1 The predictive model analysis indicated that AZA possesses potential toxicological liabilities across several key endpoints (Data S1). As illustrated in Figure 2, AZA (represented in blue) showed predicted risks for Immunotoxicity, Mutagenicity, and Carcinogenicity when compared against the average profile of active molecules (represented in orange). Additionally, the predictive results suggested that AZA exhibits potential for organ‐specific toxicity, particularly hepatotoxicity. Collectively, these computational predictions delineated the multidimensional potential toxicological characteristics of AZA. These findings provided an important target‐oriented foundation for the subsequent in‐depth exploration of the molecular mechanisms underlying AZA‐induced adverse effects, including its potential role in pancreatitis, which often involves immune and metabolic dysregulation.

FIGURE 2.

Potential toxicity profile of AZA. Blue represents AZA, while orange denotes the average of active molecules. This figure demonstrates that the potential toxicological target sites of AZA are primarily enriched in key toxicological endpoints such as Immunotoxicity, Mutagenicity, and Carcinogenicity.

3.2. Collection of AZA Target Genes

Leveraging the molecular structure of AZA (Data S2), this study systematically screened for potential human protein targets of AZA by integrating predictive results from three distinct databases. After merging the data from these sources and removing redundant entries, a consolidated list of 436 potential human target genes associated with AZA was ultimately identified. This comprehensive list of potential AZA targets (provided in Data S3) formed the basis for subsequent intersection with pancreatitis‐associated genes to identify common molecular players.

3.3. Collection of Pancreatitis‐Associated Genes

This study integrated two transcriptomic datasets, GSE143754 and GSE194331, to identify genes differentially expressed in pancreatitis. To eliminate the impact of potential batch effects on the analysis results, rigorous batch correction was performed using the ComBat algorithm. The effectiveness of this correction was confirmed by visualizing the data distribution before and after the procedure. Pre‐correction boxplots (Figure 3A) and Principal Component Analysis (PCA) plots (Figure 3C) showed significant differences in expression level distributions and systematic separation of samples based on their dataset of origin, indicative of prominent batch effects. Post‐correction boxplots (Figure 3B) and PCA plots (Figure 3D) demonstrated converged expression value distributions and systematic integration of samples, respectively, confirming the successful removal of these systematic, non‐biological variations.

FIGURE 3.

(A) Pre‐correction boxplots show significant differences in expression level distributions among samples from different GEO datasets, with evident systematic bias, indicating prominent batch effects. (B) Post‐correction boxplots (using ComBat batch correction) demonstrate converged expression value distributions across samples, with significantly reduced box height variability and markedly improved overall data consistency, suggesting effective elimination of batch effects. (C) Pre‐correction PCA plots reveal systematic separation of samples from datasets of different origins in principal component space, with clustering patterns strongly correlated to data sources, indicating significant batch effects introduced by technical factors. (D) Post‐correction PCA plots display a systematic integration trend of samples from both datasets in principal component space, with significantly reduced within‐group dispersion and convergent clustering patterns, confirming effective removal of systematic deviations induced by technical factors. (E) Heatmap of the DEG analysis on the integrated and corrected dataset, showing the expression patterns of samples from the GSE143754 and GSE194331 datasets; red denotes high expression and blue denotes low expression. The heatmap allows DEGs to be identified visually and samples with similar expression profiles to be clustered. (F) Volcano plot of the DEG analysis on the integrated and corrected dataset, showing the distribution of logFC and –log10 (adj.p.val) for each gene. Using |logFC| ≥ 0.5 and adj.p.val ≤ 0.05 as thresholds, it distinguishes significantly upregulated, downregulated, and non‐differential genes.

Subsequently, DEG analysis was conducted on the integrated and corrected dataset. The results were visualized using heatmaps (Figure 3E), which displayed the expression patterns of DEGs across samples, and volcano plots (Figure 3F), which illustrated the magnitude of fold change and statistical significance for each gene. A total of 2304 significantly differentially expressed pancreatitis‐associated genes were ultimately identified, based on the criteria of an adjusted p‐value < 0.05 and an absolute log2FC > 0.5. Among these, 1024 genes were found to be upregulated, and 1354 genes were downregulated in pancreatitis samples compared to normal controls. The complete list of these pancreatitis‐associated DEGs, including gene names, log2FC values, p‐values, adjusted p‐values, and other relevant information, is detailed in Data S4.

3.4. Screening of AZA‐AP Core Targets and Construction of the PPI Network

By performing a Venn analysis to find the intersection between the AZA potential target gene set (436 genes, Section 3.2) and the pancreatitis‐related DEGs set (2304 genes, Section 3.3), this study identified 74 common genes (Figure 4A). These intersection genes were preliminarily recognized as potential core targets through which AZA might exert its effects on the pathological process of AP.

FIGURE 4.


(A) Venn diagram showing 74 potential core targets of AZA acting on AP, derived from the intersection of 436 AZA potential targets (screened by integrating the TargetNet, SwissTargetPrediction, and SEA databases) and 2304 pancreatitis DEGs in the corrected GSE143754 and GSE194331 transcriptomic data. (B) PPI network constructed using the STRING database with a confidence threshold ≥ 0.4 and removal of unconnected isolated nodes. Nodes represent proteins and edges denote interactions, showing the functional associations between AZA‐related targets and AP. (C) PPI network visualized in Cytoscape 3.10.3, with targets sorted and visually mapped by Degree centrality (i.e., the number of directly connected edges). Node size and color intensity increase with Degree, so darker and larger nodes indicate proteins with more interactions. (D) GO functional enrichment analysis of the 68 AZA‐AP core target genes. (E) KEGG pathway enrichment analysis of the 68 AZA‐AP core target genes. Color bar: “‐log10(p value)” indicates the significance of enrichment; darker colors (red) denote greater significance. Size bar: “Count” is the number of genes enriched in the pathway; larger points indicate more enriched genes.

These 74 core target genes were then imported into the STRING database (version 11.5) for PPI analysis to explore their functional interactions. After applying a medium confidence threshold for interactions (score > 0.4) and removing unconnected, isolated nodes from the network, 68 of these core target genes were ultimately included in the construction of the PPI network (Figure 4B). This network visually represents the potential functional associations between AZA‐related targets and AP‐relevant genes.

The constructed PPI network was subsequently visualized using Cytoscape 3.10.3 software. To characterize the topological importance of individual nodes within this network, the target proteins were sorted and visually mapped based on their Degree centrality (i.e., the number of direct PPIs). In the visualization (Figure 4C), the size and color intensity of the nodes were made proportional to their Degree values, with darker colors and larger circles indicating a greater number of interactions with other proteins, highlighting central hubs within the network.
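As a rough illustration of the filtering and ranking steps described above, the following Python sketch keeps edges at or above a confidence cutoff, drops proteins left without any edge, and ranks the remainder by Degree. The protein labels, scores, and the function name `rank_by_degree` are hypothetical; the actual analysis was done in STRING and Cytoscape.

```python
from collections import defaultdict


def rank_by_degree(edges, min_score=0.4):
    """Return (protein, degree) pairs sorted by descending degree.

    edges: iterable of (protein_a, protein_b, combined_score) tuples.
    Proteins with no edge passing the cutoff never enter the result,
    which mirrors removing isolated nodes from the network.
    """
    neighbors = defaultdict(set)
    for a, b, score in edges:
        if a != b and score >= min_score:
            neighbors[a].add(b)   # sets prevent double counting repeated pairs
            neighbors[b].add(a)
    degree = {protein: len(partners) for protein, partners in neighbors.items()}
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Invented toy network with placeholder protein names
toy_edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.50),
    ("P1", "P4", 0.70),
    ("P2", "P3", 0.45),
    ("P4", "P5", 0.20),  # below the cutoff, so P5 becomes isolated
]
toy_ranking = rank_by_degree(toy_edges)
```

In this toy network P1 is the hub with three partners, and P5 is excluded because its only interaction falls below the cutoff.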

3.5. GO/KEGG Enrichment Analysis

Targeting the 68 AZA‐AP core target genes identified from the PPI network (Section 3.4), systematic GO functional enrichment analysis and KEGG pathway enrichment analysis were conducted using the DAVID database (version 6.8) to understand their collective biological significance.

GO functional enrichment analysis revealed that these core target genes were significantly enriched in several key BPs, CCs, and MFs. Notably, prominent BPs included lipid metabolic processes, inflammatory responses, regulation of apoptotic processes, and cell migration (Figure 4D). These findings suggest that AZA might influence AP by modulating these fundamental cellular activities.

KEGG pathway enrichment analysis further delineated the signaling pathways in which these core target genes are significantly involved. The analysis revealed significant enrichment in pathways closely related to the known pathophysiology of AP (Figure 4E). Major enriched pathway categories included lipid‐related signaling pathways (lipid and atherosclerosis, sphingolipid signaling pathway); inflammatory signaling pathways (NF‐κB signaling pathway, Toll‐like receptor signaling pathway [27], MAPK signaling pathway [28], PI3K‐Akt signaling pathway [29]); and energy metabolism pathways (PPAR signaling pathway [30]).

Collectively, these enrichment results suggest that the potential molecular mechanisms by which AZA influences AP may involve a complex interplay of dysregulating lipid metabolic pathways, activating key inflammatory signaling cascades, and disrupting cellular energy metabolic homeostasis. These perturbations, acting synergistically, could ultimately contribute to the occurrence and progression of pancreatic inflammation through processes such as promoting pancreatic cell apoptosis and aberrant cell migration.

3.6. Development and Validation of an AZA‐AP Predictive Model Using Machine Learning

To identify robust diagnostic gene markers associated with AZA‐induced AP, two machine learning algorithms (LASSO regression and SVM‐RFE) were applied to the integrated, batch‐corrected transcriptomic dataset. LASSO regression, tuned by 10‐fold cross‐validation, yielded a nine‐gene signature (Figure 5A,B) with strong discriminative ability in the training set (AUC = 0.942; Figure 5C) and consistent performance in the internal validation sets (AUC = 0.907 in GSE143754; AUC = 0.969 in GSE194331; Figure 5D,E), suggesting good generalizability. Independently, SVM‐RFE identified ten key DEGs (Figure 5F), and intersecting the two gene sets produced seven high‐confidence biomarkers: CES1, CTSK, JAK1, NR3C2, PLIN5, WEE1, and RORA (Figure 5G). These genes were subsequently used for downstream MR analyses.

FIGURE 5.


(A) Ten‐fold cross‐validation for tuning the LASSO regression, plotting binomial deviance against log(λ). A vertical line marks the optimal λ selected by cross‐validation, corresponding to the minimum binomial deviance. (B) LASSO coefficient profile plot. LASSO regression performs high‐dimensional feature selection through L1 regularization, shrinking the coefficients of irrelevant features to zero to identify key predictors. The plot traces each feature coefficient as the regularization parameter λ changes. (C) Merged dataset: performance of the 9‐gene LASSO signature in the training set (AUC = 0.942). (D, E) Performance of the 9‐gene LASSO signature in the internal validation sets GSE143754 (D; AUC = 0.907) and GSE194331 (E; AUC = 0.969). (F) Merged dataset: identification of 10 AZA‐AP‐associated key DEGs by the SVM‐RFE algorithm. (G) Venn diagram of the intersection between the LASSO and SVM‐RFE results, revealing 7 high‐confidence AZA‐AP key genes. (H) Calibration curve analysis. The predictive model built from these 7 key genes shows good agreement between the predicted probability of AP and the observed risk. (I–O) ROC curves for each of the 7 key genes individually, showing their ability to distinguish AP from normal samples.

Model calibration indicated strong agreement between predicted and observed AP risk (Figure 5H), and ROC analyses further confirmed that each of the seven genes individually achieved an AUC > 0.7 (Figure 5I–O), supporting their diagnostic relevance. The high accuracy of predictive models does not directly equate to their biological translational potential. Their core value lies in assisting the prioritization of candidate genes for subsequent experimental validation. Therefore, predictive models need not achieve perfect performance (e.g., AUC > 0.9) to possess translational value; even moderate predictive performance (e.g., AUC > 0.7) can effectively prioritize genes for functional validation, biomarker development, or therapeutic target identification. Collectively, the results establish CES1, CTSK, JAK1, NR3C2, PLIN5, WEE1, and RORA as credible AZA‐AP biomarkers with favorable predictive performance.
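The single‐gene AUC values discussed above can be read as the probability that a randomly chosen AP sample ranks above a randomly chosen control on that gene's expression. A minimal Python sketch of this rank‐based definition follows; the expression values are invented and the function name `auc` is hypothetical, so this is an illustration rather than the authors' ROC workflow.

```python
def auc(case_values, control_values):
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5      # ties count as half a win
    return wins / (len(case_values) * len(control_values))


def discriminative_auc(case_values, control_values):
    """AUC irrespective of direction, for genes that are lower in cases."""
    value = auc(case_values, control_values)
    return max(value, 1.0 - value)


# Invented expression values for one gene
toy_cases = [2.1, 2.5, 3.0, 1.8]
toy_controls = [1.0, 1.9, 2.0, 1.5]
toy_auc = auc(toy_cases, toy_controls)  # 14 of 16 case-control pairs favor the case
```

For a gene that is downregulated in AP, the raw value falls below 0.5, which is why the direction‐agnostic helper reports the larger of the value and its complement.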

3.7. MR Analysis

Using the genetic IVs selected for the seven key genes (Data S5) and AP GWAS data from FinnGen, we performed two‐sample MR analyses to assess the causal effects of gene expression levels on AP risk. IVW estimates revealed significant protective associations for two genes: higher genetically predicted CES1 expression was linked to reduced AP risk (OR = 0.950; 95% CI: 0.909–0.993; p = 0.022), and increased CTSK expression showed a similar protective effect (OR = 0.977; 95% CI: 0.960–0.994; p = 0.008) (Figure 6).
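The odds ratios and confidence intervals quoted above follow from the IVW beta and its standard error through OR = exp(beta) and exp(beta ± 1.96 × SE). The short Python sketch below applies that conversion to the rounded beta and SE values listed for CES1 and CTSK in Table 1; because those inputs are rounded to three decimals, the output agrees with the published values only to about the third decimal place. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate and its SE to (OR, lower, upper)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Rounded IVW estimates as reported in Table 1
ces1_or, ces1_low, ces1_high = odds_ratio_ci(-0.051, 0.022)
ctsk_or, ctsk_low, ctsk_high = odds_ratio_ci(-0.023, 0.009)
```

Both upper bounds stay below 1, which is what makes the two IVW estimates nominally significant at the 0.05 level.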

FIGURE 6.


Forest plot illustrating the MR analysis results for the association between CES1/CTSK gene expression and AP risk. Horizontal lines denote 95% confidence intervals (CIs) for odds ratios (ORs), and the vertical dashed line marks the null effect (OR = 1). p < 0.05 denotes a statistically significant association.

Sensitivity analyses supported the robustness of these findings (Table 1). For CES1, no evidence of heterogeneity (Cochran's Q p = 0.982) or directional pleiotropy (MR‐Egger intercept p = 0.209) was observed, and MR‐PRESSO detected no outliers. For CTSK, heterogeneity was absent (Q p = 0.990); although the MR‐Egger intercept suggested potential pleiotropy (p = 0.025), MR‐PRESSO found no significant outliers, and both the weighted median and weighted mode methods consistently indicated protective effects. Leave‐one‐out analyses showed that no individual SNP disproportionately influenced the results.
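For readers unfamiliar with the quantities behind these checks, the following Python sketch shows one common way to compute a fixed‐effect IVW estimate from per‐SNP Wald ratios, together with Cochran's Q as a heterogeneity statistic. It assumes first‐order weights (the inverse variance of each ratio, ignoring uncertainty in the SNP‐exposure effect); the SNP effects are invented, the function name `ivw_with_q` is hypothetical, and the sketch is not the software used in the study.

```python
import math


def ivw_with_q(snps):
    """Fixed-effect IVW estimate and Cochran's Q.

    snps: list of (beta_exposure, beta_outcome, se_outcome) per variant.
    Returns (beta_ivw, se_ivw, q_statistic, degrees_of_freedom).
    """
    ratios, weights = [], []
    for beta_x, beta_y, se_y in snps:
        ratios.append(beta_y / beta_x)          # Wald ratio for this SNP
        weights.append((beta_x / se_y) ** 2)    # 1 / variance of the ratio
    total_weight = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_ivw = math.sqrt(1.0 / total_weight)
    # Q sums the weighted squared deviations of each ratio from the pooled estimate
    q_stat = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return beta_ivw, se_ivw, q_stat, len(snps) - 1


# Invented SNP effects whose Wald ratios all equal -0.05 (no heterogeneity)
toy_snps = [
    (0.10, -0.0050, 0.002),
    (0.20, -0.0100, 0.002),
    (0.15, -0.0075, 0.003),
]
toy_beta, toy_se, toy_q, toy_df = ivw_with_q(toy_snps)
```

Q is compared with a chi‐square distribution on the returned degrees of freedom; a large Q p‐value, as reported for CES1 and CTSK, means the per‐SNP ratios are mutually consistent.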

TABLE 1.

Key results of the MR analysis and sensitivity analyses. JAK1 had only one genetic IV meeting the screening threshold (p < 5 × 10⁻⁸ and independent after LD clumping), so the single‐SNP Wald ratio method was used to estimate its causal effect.

| Exposure | Method | SNP | eQTLGen Beta | SE | p | OR | LCI95 | UCI95 | Q_pval | MR_presso | Pleiotropy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CES1 | MR Egger | 51 | −0.096 | 0.041 | 0.025 | 0.909 | 0.838 | 0.986 | 0.987 | 0.986 | 0.209 |
| | Weighted median | 51 | −0.036 | 0.032 | 0.268 | 0.965 | 0.906 | 1.028 | | | |
| | Inverse variance weighted | 51 | −0.051 | 0.022 | 0.022 | 0.950 | 0.909 | 0.993 | 0.982 | | |
| | Simple mode | 51 | −0.017 | 0.051 | 0.745 | 0.983 | 0.889 | 1.088 | | | |
| | Weighted mode | 51 | −0.059 | 0.032 | 0.073 | 0.943 | 0.885 | 1.004 | | | |
| CTSK | MR Egger | 166 | 0.021 | 0.021 | 0.332 | 1.021 | 0.979 | 1.065 | 0.995 | 0.988 | 0.025 |
| | Weighted median | 166 | −0.029 | 0.014 | 0.043 | 0.971 | 0.944 | 0.999 | | | |
| | Inverse variance weighted | 166 | −0.023 | 0.009 | 0.008 | 0.977 | 0.960 | 0.994 | 0.990 | | |
| | Simple mode | 166 | −0.057 | 0.031 | 0.071 | 0.945 | 0.888 | 1.004 | | | |
| | Weighted mode | 166 | −0.063 | 0.022 | 0.005 | 0.939 | 0.899 | 0.981 | | | |
| WEE1 | MR Egger | 42 | 0.004 | 0.083 | 0.959 | 1.004 | 0.853 | 1.182 | 0.998 | 0.997 | 0.665 |
| | Weighted median | 42 | −0.026 | 0.051 | 0.608 | 0.974 | 0.882 | 1.076 | | | |
| | Inverse variance weighted | 42 | −0.028 | 0.036 | 0.438 | 0.972 | 0.905 | 1.044 | 0.998 | | |
| | Simple mode | 42 | −0.021 | 0.086 | 0.804 | 0.979 | 0.828 | 1.158 | | | |
| | Weighted mode | 42 | −0.027 | 0.061 | 0.657 | 0.973 | 0.864 | 1.096 | | | |
| RORA | MR Egger | 8 | −0.523 | 0.301 | 0.133 | 0.593 | 0.329 | 1.070 | 0.610 | 0.655 | 0.338 |
| | Weighted median | 8 | −0.267 | 0.164 | 0.104 | 0.765 | 0.555 | 1.056 | | | |
| | Inverse variance weighted | 8 | −0.237 | 0.123 | 0.055 | 0.789 | 0.620 | 1.005 | 0.589 | | |
| | Simple mode | 8 | −0.277 | 0.234 | 0.275 | 0.758 | 0.479 | 1.199 | | | |
| | Weighted mode | 8 | −0.281 | 0.206 | 0.215 | 0.755 | 0.504 | 1.131 | | | |
| PLIN5 | MR Egger | 5 | −0.043 | 0.371 | 0.915 | 0.958 | 0.463 | 1.982 | 0.615 | 0.690 | 0.645 |
| | Weighted median | 5 | 0.138 | 0.184 | 0.451 | 1.148 | 0.801 | 1.646 | | | |
| | Inverse variance weighted | 5 | 0.129 | 0.154 | 0.400 | 1.138 | 0.842 | 1.538 | 0.725 | | |
| | Simple mode | 5 | 0.280 | 0.277 | 0.370 | 1.323 | 0.769 | 2.278 | | | |
| | Weighted mode | 5 | 0.232 | 0.234 | 0.378 | 1.261 | 0.797 | 1.994 | | | |
| NR3C2 | MR Egger | 3 | −0.139 | 0.865 | 0.899 | 0.871 | 0.160 | 4.740 | 0.841 | NA | 0.914 |
| | Weighted median | 3 | −0.042 | 0.199 | 0.835 | 0.959 | 0.649 | 1.418 | | | |
| | Inverse variance weighted | 3 | −0.024 | 0.182 | 0.895 | 0.976 | 0.684 | 1.394 | 0.971 | | |
| | Simple mode | 3 | −0.042 | 0.242 | 0.879 | 0.959 | 0.596 | 1.543 | | | |
| | Weighted mode | 3 | −0.046 | 0.234 | 0.864 | 0.956 | 0.604 | 1.512 | | | |
| JAK1 | Wald ratio | 1 | 0.077 | 0.312 | 0.806 | 1.080 | 0.586 | 1.989 | NA | NA | NA |

Abbreviations: Beta, beta coefficients; CES1, carboxylesterase 1; CI 95, confidence intervals; CTSK, cathepsin K; JAK1, Janus kinase 1; NR3C2, nuclear receptor subfamily 3 group C member 2; OR, odds ratios; PLIN5, perilipin 5; RORA, retinoic acid receptor‐related orphan receptor alpha; SE, standard error; SNP, single nucleotide polymorphism; WEE1, WEE1 G2 checkpoint kinase.

Collectively, these MR findings provide genetic evidence that elevated expression of CES1 and CTSK exerts a causal protective effect against AP. Integrated with pathway enrichment results implicating lipid metabolism, inflammation, and energy homeostasis, these genes may modulate pancreatic injury and inflammatory responses, offering mechanistic plausibility for their protective roles. MR thus strengthens causal inference by reducing confounding and reverse causation relative to observational designs.

3.8. Molecular Docking Analysis

To explore whether AZA directly interacts with the protein products of CES1 and CTSK—the two genes identified as causally protective against AP in the MR analysis—molecular docking was conducted using CB‐Dock2. Docking simulations showed strong predicted affinities between AZA and both proteins, with ΔG_bind values below −5.0 kcal/mol, indicating thermodynamically favorable and potentially biologically meaningful binding interactions.
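To put the −5.0 kcal/mol reference value in context, a binding free energy can be translated into an approximate dissociation constant through the standard relation ΔG = RT ln(Kd). The Python sketch below does this at 298.15 K. It assumes the docking score behaves like a true standard‐state binding free energy, which is only approximately the case for empirical scoring functions, and the function name `kd_from_delta_g` is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal_per_mol, temperature_k=298.15):
    """Approximate dissociation constant (mol/L) from a binding free energy."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


# Reference value used as the affinity cutoff in the text
kd_at_cutoff = kd_from_delta_g(-5.0)
# A more negative score for comparison (illustrative, not a reported result)
kd_stronger = kd_from_delta_g(-8.0)
```

Under these assumptions the −5.0 kcal/mol cutoff corresponds to a Kd on the order of 10⁻⁴ M, and each additional −1.36 kcal/mol or so tightens the predicted Kd by roughly tenfold, so the actual ΔG_bind values in Figure 7 determine how strong the predicted binding is.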

Structural visualization revealed distinct AZA binding conformations within predicted pockets of CES1 and CTSK (Figure 7), highlighting specific residues likely contributing to complex stabilization through non‐covalent interactions. These docking results are consistent with network toxicology predictions and MR findings, suggesting that AZA may directly bind CES1 and CTSK with high affinity, thereby influencing their biological functions. This structural evidence provides a plausible mechanistic link connecting genetic associations to AZA‐induced pancreatic toxicity and reinforces the hypothesized roles of CES1 and CTSK in the pathogenesis of AZA‐related AP.

FIGURE 7.


Molecular docking visualization of AZA with CES1 and CTSK proteins. This visualization demonstrates the binding conformations of AZA to CES1 and CTSK. The binding free energy values (ΔGbind) are quantitatively presented to support the mechanistic interpretation of AZA's pharmacological effects.

4. Discussion

This study utilized in silico toxicity prediction to demonstrate that azathioprine (AZA) exhibits elevated risks of immunotoxicity, mutagenicity, and carcinogenicity. The predicted immunotoxicity likely arises from AZA's immunosuppressive properties, which, upon prolonged use, may disrupt immune homeostasis and provoke non‐specific inflammatory responses. Furthermore, its mutagenic potential may readily induce stress and apoptotic responses in pancreatic tissue. These findings provide critical insights and a clear research direction for subsequent mechanistic investigations into the molecular basis of AZA‐associated adverse effects during pancreatitis treatment.

Additionally, machine learning algorithms were employed in this study for predictive modeling. However, it is important to note that PPI analysis and machine learning serve distinct objectives and methodological foundations. The primary goal of PPI analysis is to construct a protein interaction network from DEGs, thereby identifying hub genes—those occupying central positions in the disease‐associated biological network. These hubs are further subjected to GO and KEGG enrichment analyses to elucidate potential signaling pathways and molecular mechanisms. Thus, PPI analysis is predominantly mechanistic and pathway‐oriented. In contrast, machine learning focuses on selecting a subset of DEGs with optimal diagnostic and predictive performance, prioritizing discriminative accuracy over network topology. Furthermore, core nodes in PPI networks (high‐degree genes) are often well‐studied or functionally pleiotropic molecules; their high centrality does not necessarily translate into strong biomarker utility. Accordingly, machine learning identifies clinically actionable molecular markers, whereas PPI highlights regulatory hubs within biological networks. Therefore, partial overlap between the gene sets identified by these two approaches is desirable, while incomplete concordance is both expected and scientifically reasonable.

Carboxylesterase 1 (CES1) plays a key role in lipid homeostasis through triglyceride (TG) and diglyceride (DG) hydrolysis [31, 32]. CES1 is a downstream target of farnesoid X receptor (FXR), with FXR activation reducing hepatic TG in a CES1‐dependent manner [33, 34]. Overexpression of CES1 inhibits lipid droplet accumulation and promotes reverse cholesterol transport [35], whereas CES1 deficiency or inhibition predisposes to hyperlipidemia and non‐alcoholic fatty liver disease (NAFLD) [36]. The current findings indicate that AZA may disrupt pancreatic lipid metabolism by downregulating or inhibiting CES1 activity, leading to intracellular TG accumulation [37]. Abnormal lipid metabolites may activate the PI3K‐Akt pathway, upregulating pro‐inflammatory cytokines (TNF‐α, IL‐1β, IL‐6) [38]. Concurrently, elevated DG levels can activate protein kinase C (PKC), triggering the MAPK pathway, intensifying oxidative stress and cytokine release, ultimately contributing to AP onset [39].

CES1 also supports mitochondrial energy metabolism by hydrolyzing TGs and DGs to generate free fatty acids (FFAs), including PUFAs, which serve as substrates for β‐oxidation [40]. PUFAs activate PPARα/γ, upregulating fatty acid transport and oxidation genes (e.g., CD36, CPT1A) to enhance energy production [41]. AP involves mitochondrial dysfunction and high energy demands due to inflammation [42]. CES1‐mediated β‐oxidation may protect cells by sustaining ATP homeostasis. AZA or its metabolites may reduce CES1 activity, limiting FFA supply, impairing β‐oxidation, decreasing ATP, and sensitizing acinar cells to injury, promoting premature trypsinogen activation and pro‐inflammatory release—key AP drivers. CES1 has been shown to promote survival under metabolic stress via fatty acid oxidation [43].

CES1 also maintains ER protein homeostasis by interacting with GRP78/BiP, preventing its aberrant secretion and preserving ER integrity [44]. In AP, this interaction may weaken, allowing GRP78 release for protective effects [45]. CES1 may further exert immunoregulatory roles, potentially modulating NF‐κB to suppress pro‐inflammatory mediators or upregulate IL‐10 [46, 47]. Thus, CES1 protects against AP via a multifaceted network integrating metabolism, energy homeostasis, and inflammation, establishing it as a key endogenous protective factor.

Cathepsin K (CTSK), a lysosomal cysteine protease highly expressed in osteoclasts, mediates bone resorption by degrading type I collagen [48]. Beyond bone, CTSK participates in tissue repair by degrading damaged ECM and is expressed in macrophages and smooth muscle cells, contributing to ECM remodeling and inflammation [49]. In AP, characterized by pancreatic necrosis and multi‐organ risk, CTSK may be protective [50] by clearing necrotic debris, limiting inflammation spread, and reducing SIRS/MODS risk. It may also inhibit premature zymogen activation by degrading specific substrates [51]. During recovery, CTSK could prevent pathological fibrosis, preserving pancreatic function.

Emerging research shows that CTSK expression is upregulated by pro‐inflammatory cytokines and promotes M2 macrophage polarization via TLR4 signaling, thereby attenuating inflammation [52]. In AP, TLR4 activation by DAMPs drives pro‐inflammatory cytokine release [53], while CTSK‐induced M2 polarization suppresses NF‐κB and fosters tissue repair [54]. Given AZA's predicted binding to CTSK and its immunosuppressive effects, AZA or its metabolites may inhibit CTSK activity, impairing M2 polarization, causing NF‐κB overactivation, and worsening pancreatic inflammation. Concurrently, AZA's systemic lymphocyte suppression may disrupt local immune balance [1], synergistically exacerbating AP.

The molecular docking simulations in this study, demonstrating significant predicted binding affinities of AZA to both CES1 and CTSK proteins, corroborate the findings from network toxicology and MR analysis. This suggests that AZA may directly interact with and thereby disrupt the physiological functions of CES1 and CTSK, positioning these interactions as central mechanistic drivers in AZA‐induced pancreatic toxicity. The convergence of evidence from these diverse computational methodologies provides a robust foundation for the proposed mechanisms.

This study aligns with prior reports linking AZA to AP [3, 4], but its novelty lies in integrating network toxicology, machine learning, MR, and molecular docking to form a multidimensional validation framework. This approach enhances mechanistic insight by validating DEG patterns across datasets and identifying core targets, with MR providing causal evidence for CES1 and CTSK roles, bolstering scientific rigor. This strategy enhances result credibility and offers a new paradigm for drug toxicity research. For instance, pancreatic acinar cell models can be used for AZA exposure experiments to validate AZA‐induced downregulation or inhibition of CES1 and CTSK. Experimental endpoints could include CES1/CTSK expression levels, lipid accumulation, mitochondrial function, and inflammatory cytokine profiles. Additionally, gene‐modified animal models (e.g., CES1 or CTSK knockout or transgenic mice) could be used to assess whether regulating these genes alters susceptibility to AZA‐induced pancreatic injury.

Furthermore, this study identifies CES1 and CTSK as causal protective factors against azathioprine (AZA)‐induced acute pancreatitis (AP), which has substantial clinical translational value. For example, pre‐treatment screening for low‐expression or loss‐of‐function variants in CES1 and CTSK may help identify patients at high risk for AZA‐associated AP. CES1 copy number variations and CTSK promoter variants could be included in pharmacogenetic panels together with established thiopurine metabolism genes (e.g., TPMT, NUDT15) to optimize dosing and prevent adverse reactions. In addition, baseline or treatment‐induced changes in CES1 and CTSK expression—detected by blood transcriptomics or serum proteomics—may serve as early biomarkers of pancreatic stress. Integrating these biomarkers into clinical monitoring algorithms could provide early warning systems for impending AZA‐induced AP. Moreover, targeted modulation of CES1 and CTSK activity (e.g., using small‐molecule stabilizers or inducers) could serve as preventive co‐therapies to maintain pancreatic metabolic and immune homeostasis during AZA treatment. Collectively, translational strategies such as pharmacogenetic screening, risk biomarker development, and mechanism‐based therapeutic intervention exemplify precision medicine, bridging computational toxicogenomics and clinical practice.

Despite the valuable insights, this study has limitations. First, core target and pathway identification relies on public databases and computational algorithms, which are constrained by algorithmic assumptions, data biases, and source quality, potentially compromising result accuracy and comprehensiveness. Second, while network toxicology, MR, and molecular docking are robust for hypothesis generation, their predictions remain computational. Experimental validation—via in vitro assays, in vivo AZA‐induced AP models, and clinical studies—is essential to confirm the biological roles of CES1 and CTSK and their causal contributions to AZA‐associated AP pathogenesis.

In addition, despite systematically integrating multi‐dimensional approaches such as DEG analysis, drug target prediction, machine learning screening, causal inference, and molecular docking, this study still exhibits insufficient overlap between different omics levels. This “low overlap” phenomenon represents a common challenge in current systems toxicology and multi‐omics integration research, primarily stemming from systematic differences in data sources, algorithmic principles, and sample types. For instance, transcriptomic signals are susceptible to tissue specificity and inflammatory states, while predicted targets often rely on chemical structure or molecular similarity models, making biological correspondence between the two challenging. Furthermore, inconsistencies among algorithms can lead to significant variations in candidate gene rankings. Future research can enhance integration effectiveness through the following approaches: incorporating multi‐tissue, multi‐time‐point transcriptomic and proteomic data to improve cross‐omics consistency; employing computational frameworks like Bayesian or ensemble learning to systematically fuse omics signals; and integrating experimental validation to distinguish genuine causal targets from downstream effect signals, thereby strengthening the biological reliability of research conclusions.

5. Conclusion

This study integrated network toxicology, machine learning, MR, and molecular docking to elucidate molecular mechanisms potentially underlying AZA‐induced acute pancreatitis. The multi‐omics evidence converges on a central hypothesis: AZA may promote AP by reducing the expression or impairing the function of two protective proteins, CES1 and CTSK, thereby disrupting lipid metabolic homeostasis (lipid and atherosclerosis, sphingolipid signaling), activating pro‐inflammatory pathways (including NF‐κB, TLR4, PI3K–Akt, and MAPK), and perturbing energy‐metabolic networks such as the PPAR pathway. These coordinated disturbances may drive excessive pancreatic cell apoptosis, aberrant cell migration, and amplification of the inflammatory cascade.

MR analyses provided genetic support for a causal protective effect of higher CES1 and CTSK expression against AP, while molecular docking demonstrated a strong predicted AZA binding affinity to both proteins, supplying structural plausibility for direct functional interference.

Collectively, this work offers a robust mechanistic framework for understanding AZA‐induced pancreatic toxicity. The identification of CES1 and CTSK and their regulatory pathways establishes a molecular foundation for the development of early risk‐prediction biomarkers and precision therapeutic strategies aimed at mitigating or preventing AZA‐associated AP, thereby improving the safety profile of this widely used immunosuppressant.

Author Contributions

Zhijun Xie and Hang Lei wrote the manuscript; Zhijun Xie, Hang Lei, Pengcheng Zhang, Zhi Li, and Wenfu Tang designed the research; Zhijun Xie, Hang Lei, and Pengcheng Zhang performed the research; Zhijun Xie, Hang Lei, and Pengcheng Zhang analyzed the data.

Funding

This work was supported by the grants from National Natural Science Foundation of China (no. 82174264); Sichuan Administration of Traditional Chinese Medicine (no. 2024zd035); Southwest Medical University Technology Program (no. 2023ZYQJ03); the Sichuan Science and Technology Program (no. 2024YFFK0154).

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1: Potential toxicological risks of AZA.

Data S2: Molecular structure of AZA.

Data S3: The list of potential AZA targets.

Data S4: The complete list of these differentially expressed genes (DEGs) associated with pancreatitis.

Data S5: The genetic instrumental variables (IVs) selected for the 7 key genes.

Data S6: Table 1 of the article, presenting the key results of the MR analysis and sensitivity analyses.

PSP4-15-e70178-s001.zip (364.2KB, zip)

Contributor Information

Zhi Li, Email: lizhi-scholar@swmu.edu.cn.

Wenfu Tang, Email: tangwf@scu.edu.cn.

Data Availability Statement

Data are available from the corresponding author on reasonable request.

References

  • 1. Beaugerie L., Brousse N., Bouvier A. M., et al., “Lymphoproliferative Disorders in Patients Receiving Thiopurines for Inflammatory Bowel Disease: A Prospective Observational Cohort Study,” Lancet 374 (2009): 1617–1625.
  • 2. Aloi M. and Cucchiara S., “Acute Pancreatitis and Azathioprine in Paediatric Inflammatory Bowel Disease,” Lancet Child & Adolescent Health 3 (2019): 131–132.
  • 3. Wintzell V., Svanström H., Olén O., Melbye M., Ludvigsson J. F., and Pasternak B., “Association Between Use of Azathioprine and Risk of Acute Pancreatitis in Children With Inflammatory Bowel Disease: A Swedish‐Danish Nationwide Cohort Study,” Lancet Child & Adolescent Health 3 (2019): 158–165.
  • 4. Teich N., Mohl W., Bokemeyer B., et al., “Azathioprine‐Induced Acute Pancreatitis in Patients With Inflammatory Bowel Diseases—A Prospective Study on Incidence and Severity,” Journal of Crohn's & Colitis 10 (2015): 61–68.
  • 5. Moran G. W., Dubeau M. F., Kaplan G. G., et al., “Clinical Predictors of Thiopurine‐Related Adverse Events in Crohn's Disease,” World Journal of Gastroenterology 21 (2015): 7795–7804.
  • 6. Mederos M. A., Reber H. A., and Girgis M. D., “Acute Pancreatitis: A Review,” JAMA 325 (2021): 382–390.
  • 7. Frossard J.‐L., Steer M. L., and Pastor C. M., “Acute Pancreatitis,” Lancet 371 (2008): 143–152.
  • 8. Li D., Wang H., Qin C., et al., “Drug‐Induced Acute Pancreatitis: A Real‐World Pharmacovigilance Study Using the FDA Adverse Event Reporting System Database,” Clinical Pharmacology and Therapeutics 115 (2023): 535–544.
  • 9. Meczker Á., Hanák L., Párniczky A., et al., “Analysis of 1060 Cases of Drug‐Induced Acute Pancreatitis,” Gastroenterology 159 (2020): 1958–1961.e8.
  • 10. Lei H., Wu Y., Ma W., et al., “Network Toxicology and Molecular Docking Analysis of Tetracycline‐Induced Acute Pancreatitis: Unveiling Core Mechanisms and Targets,” Toxics 12 (2024): 929.
  • 11. Daina A., Michielin O., and Zoete V., “SwissTargetPrediction: Updated Data and New Features for Efficient Prediction of Protein Targets of Small Molecules,” Nucleic Acids Research 47 (2019): W357–W364.
  • 12. Hao J.‐Q., Ran B., Hu S. Y., et al., “Exploring the Link Between Di‐2‐Ethylhexyl Phthalate (DEHP) Exposure and Muscle Mass: A Systematic Investigation Utilizing NHANES Data Analysis, Network Toxicology and Molecular Docking Approaches,” Ecotoxicology and Environmental Safety 295 (2025): 118–132.
  • 13. Chu Z.‐Y. and Zi X.‐J., “Network Toxicology and Molecular Docking for the Toxicity Analysis of Food Contaminants: A Case of Aflatoxin B(1),” Food and Chemical Toxicology 188 (2024): 114.
  • 14. Li Y., Zhou T., Liu Z., et al., “Air Pollution and Prostate Cancer: Unraveling the Connection Through Network Toxicology and Machine Learning,” Ecotoxicology and Environmental Safety 292 (2025): 117.
  • 15. Davies N. M., Holmes M. V., and Davey Smith G., “Reading Mendelian Randomisation Studies: A Guide, Glossary, and Checklist for Clinicians,” BMJ 362 (2018): k601.
  • 16. Kim S., Chen J., Cheng T., et al., “PubChem in 2021: New Data Content and Improved Web Interfaces,” Nucleic Acids Research 49 (2020): D1388–D1395.
  • 17. Nowotka M. M., Gaulton A., Mendez D., Bento A. P., Hersey A., and Leach A., “Using ChEMBL Web Services for Building Applications and Data Processing Workflows Relevant to Drug Discovery,” Expert Opinion on Drug Discovery 12 (2017): 757–767.
  • 18. Li Y., Liu Z., Zhou T., et al., “Integrating Network Toxicology and Mendelian Randomization to Uncover the Role of AHR in Linking Air Pollution to Male Reproductive Health,” Reproductive Toxicology 135 (2025): 108.
  • 19. Szklarczyk D., Kirsch R., Koutrouli M., et al., “The STRING Database in 2023: Protein–Protein Association Networks and Functional Enrichment Analyses for Any Sequenced Genome of Interest,” Nucleic Acids Research 51 (2022): D638–D646.
  • 20. Shannon P., Markiel A., Ozier O., et al., “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks,” Genome Research 13 (2003): 2498–2504.
  • 21. Gene Ontology Consortium, “Gene Ontology Consortium: Going Forward,” Nucleic Acids Research 43 (2014): D1049–D1056.
  • 22. Kanehisa M., Furumichi M., Tanabe M., Sato Y., and Morishima K., “KEGG: New Perspectives on Genomes, Pathways, Diseases and Drugs,” Nucleic Acids Research 45 (2016): D353–D361.
  • 23. Huang S., Cai N., Pacheco P. P., Narrandes S., Wang Y., and Xu W., “Applications of Support Vector Machine (SVM) Learning in Cancer Genomics,” Cancer Genomics & Proteomics 15 (2017): 41–51.
  • 24. van der Wijst M. G. P., de Vries D. H., et al., “The Single‐Cell eQTLGen Consortium,” eLife 9 (2020): e52155.
  • 25. Võsa U., Claringbould A., Westra H. J., et al., “Large‐Scale Cis‐ and Trans‐eQTL Analyses Identify Thousands of Genetic Loci and Polygenic Scores That Regulate Blood Gene Expression,” Nature Genetics 53 (2021): 1300–1310.
  • 26. Xie Z., Chen Z., Jiang Y., et al., “Causal Relationships Between Epilepsy and the Microstructure of the White Matter: A Mendelian Randomization Study,” Medicine (Baltimore) 103 (2024): e40090.
  • 27. Ali B. M., Al‐Mokaddem A. K., Selim H. M. R. M., et al., “Pinocembrin's Protective Effect Against Acute Pancreatitis in a Rat Model: The Correlation Between TLR4/NF‐κB/NLRP3 and miR‐34a‐5p/SIRT1/Nrf2/HO‐1 Pathways,” Biomedicine & Pharmacotherapy 176 (2024): 116854. [DOI] [PubMed] [Google Scholar]
  • 28. Huang H., Wang M., Guo Z., et al., “Rutaecarpine Alleviates Acute Pancreatitis in Mice and AR42J Cells by Suppressing the MAPK and NF‐κB Signaling Pathways via Calcitonin Gene‐Related Peptide,” Phytotherapy Research 35 (2021): 6472–6485. [DOI] [PubMed] [Google Scholar]
  • 29. Li S., Dai Q., Zhang S. X., et al., “Ulinastatin Attenuates LPS‐Induced Inflammation in Mouse Macrophage RAW264.7 Cells by Inhibiting the JNK/NF‐κB Signaling Pathway and Activating the PI3K/Akt/Nrf2 Pathway,” Acta Pharmacologica Sinica 39 (2018): 1294–1304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Marimuthu M. K., Moorthy A., and Ramasamy T., “Diallyl Disulfide Attenuates STAT3 and NF‐κB Pathway Through PPAR‐γ Activation in Cerulein‐Induced Acute Pancreatitis and Associated Lung Injury in Mice,” Inflammation 45 (2022): 45–58. [DOI] [PubMed] [Google Scholar]
  • 31. Satoh T. and Hosokawa M., “The Mammalian Carboxylesterases: From Molecules to Functions,” Annual Review of Pharmacology and Toxicology 38 (1998): 257–288. [DOI] [PubMed] [Google Scholar]
  • 32. Choi Y.‐J., Nam Y. A., Hyun J. Y., et al., “Impaired Chaperone‐Mediated Autophagy Leads to Abnormal SORT1 (Sortilin 1) Turnover and CES1‐Dependent Triglyceride Hydrolysis,” Autophagy 21 (2024): 827–839. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Xu J., Li Y., Chen W. D., et al., “Hepatic Carboxylesterase 1 Is Essential for Both Normal and Farnesoid X Receptor‐Controlled Lipid Homeostasis,” Hepatology 59 (2013): 1761–1771. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Gai Z., Gui T., Alecu I., et al., “Farnesoid X Receptor Activation Induces the Degradation of Hepatotoxic 1‐Deoxysphingolipids in Non‐Alcoholic Fatty Liver Disease,” Liver International 40 (2019): 844–859. [DOI] [PubMed] [Google Scholar]
  • 35. Xu J., Xu Y., Xu Y., Yin L., and Zhang Y., “Global Inactivation of Carboxylesterase 1 (Ces1/Ces1g) Protects Against Atherosclerosis in Ldlr (−/−) Mice,” Scientific Reports 7 (2017): 17,845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Chen B. B., Yan J. H., Zheng J., et al., “Copy Number Variation in the CES1 Gene and the Risk of Non‐Alcoholic Fatty Liver in a Chinese Han Population,” Scientific Reports 11 (2021): 13984. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Nam H. J., Kim Y. E., Moon B.‐S., et al., “Azathioprine Antagonizes Aberrantly Elevated Lipid Metabolism and Induces Apoptosis in Glioblastoma,” iScience 24 (2021): 102,238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Fontana F., Giannitti G., Marchesi S., and Limonta P., “The PI3K/Akt Pathway and Glucose Metabolism: A Dangerous Liaison in Cancer,” International Journal of Biological Sciences 20 (2024): 3113–3125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Huang P., Han J., and Hui L., “MAPK Signaling in Inflammation‐Associated Cancer Development,” Protein & Cell 1 (2011): 218–226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Li G., Li X., Mahmud I., et al., “Interfering With Lipid Metabolism Through Targeting CES1 Sensitizes Hepatocellular Carcinoma for Chemotherapy,” JCI Insight 8 (2022): e163624. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Qiu Y., Gan M., Wang X., et al., “The Global Perspective on Peroxisome Proliferator‐Activated Receptor γ (PPARγ) in Ectopic Fat Deposition: A Review,” International Journal of Biological Macromolecules 253 (2023): 127,042. [DOI] [PubMed] [Google Scholar]
  • 42. Biczo G., Vegh E. T., Shalbueva N., et al., “Mitochondrial Dysfunction, Through Impaired Autophagy, Leads to Endoplasmic Reticulum Stress, Deregulated Lipid Metabolism, and Pancreatitis in Animal Models,” Gastroenterology 154 (2017): 689–703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Li C., Wang F., Cui L., Li S., Zhao J., and Liao L., “Association Between Abnormal Lipid Metabolism and Tumor,” Frontiers in Endocrinology 14 (2023): 1,134,154. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Yue C. C., Muller‐Greven J., Dailey P., Lozanski G., Anderson V., and Macintyre S., “Identification of a C‐Reactive Protein Binding Site in Two Hepatic Carboxylesterases Capable of Retaining C‐Reactive Protein Within the Endoplasmic Reticulum,” Journal of Biological Chemistry 271 (1996): 22,245–22,250. [DOI] [PubMed] [Google Scholar]
  • 45. Sproston N. R. and Ashworth J. J., “Role of C‐Reactive Protein at Sites of Inflammation and Infection,” Frontiers in Immunology 9 (2018): 754. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Szafran B. N., Borazjani A., Scheaffer H. L., et al., “Carboxylesterase 1d Inactivation Augments Lung Inflammation in Mice,” ACS Pharmacology & Translational Science 5 (2022): 919–931. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Capece D., Verzella D., Flati I., Arboretto P., Cornice J., and Franzoso G., “NF‐κB: Blending Metabolism, Immunity, and Inflammation,” Trends in Immunology 43 (2022): 757–775. [DOI] [PubMed] [Google Scholar]
  • 48. Zou N., Liu R., and Li C., “Cathepsin K(+) Non‐Osteoclast Cells in the Skeletal System: Function, Models, Identity, and Therapeutic Implications,” Frontiers in Cell and Developmental Biology 10 (2022): 818,462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Paracha M., Thakar A., Darling R. A., et al., “Role of Cathepsin K in the Expression of Mechanical Hypersensitivity Following Intra‐Plantar Inflammation,” Scientific Reports 12 (2022): 7108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Fabrik I., Bilkei‐Gorzo O., Öberg M., et al., “Lung Macrophages Utilize Unique Cathepsin K‐Dependent Phagosomal Machinery to Degrade Intracellular Collagen,” Life Science Alliance 6 (2023): e202201535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Talukdar R., Sareen A., Zhu H., et al., “Release of Cathepsin B in Cytosol Causes Cell Death in Acute Pancreatitis,” Gastroenterology 151 (2016): 747–758.e5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Li R., Zhou R., Wang H., et al., “Gut Microbiota‐Stimulated Cathepsin K Secretion Mediates TLR4‐Dependent M2 Macrophage Polarization and Promotes Tumor Metastasis in Colorectal Cancer,” Cell Death and Differentiation 26 (2019): 2447–2463. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Chen S.‐N., Tan Y., Xiao X. C., et al., “Deletion of TLR4 Attenuates Lipopolysaccharide‐Induced Acute Liver Injury by Inhibiting Inflammation and Apoptosis,” Acta Pharmacologica Sinica 42 (2021): 1610–1619. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Mijanović O., Jakovleva A., Branković A., et al., “Cathepsin K in Pathological Conditions and New Therapeutic and Diagnostic Perspectives,” International Journal of Molecular Sciences 23 (2022): 13762. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Data S1: Potential toxicological risks of AZA.

Data S2: Molecular structure of AZA.

Data S3: The list of potential AZA targets.

Data S4: The complete list of differentially expressed genes (DEGs) associated with pancreatitis.

Data S5: The genetic instrumental variables (IVs) selected for the 7 key genes.

Data S6: Table 1 of the article, presenting the key results of the MR analysis and sensitivity analyses.

PSP4-15-e70178-s001.zip (364.2KB, zip)

Data Availability Statement

Data are available from the corresponding author on reasonable request.


Articles from CPT: Pharmacometrics & Systems Pharmacology are provided here courtesy of Wiley
