Abstract
Alzheimer’s disease is a complex disorder of the nervous system. Diagnosing this disease is a costly process in which numerous laboratory tests and examinations are conducted. Most computational methods for Alzheimer’s disease prediction face low accuracy due to challenges such as a limited number of training samples, noisy/overlapping data, and variability in gene expression. This study presents a reliable computational approach for predicting Alzheimer’s disease through a new method of analyzing gene expression profiles from either brain tissue or blood samples. The proposed Petri net–based approach demonstrates superior diagnostic accuracy compared to existing methods across multiple gene expression datasets derived from both blood and brain tissue. The proposed method runs a Petri net model of the signaling pathways involved in complex nervous system disorders. In addition, the Petri net model provides step-by-step tracking of gene activation until the final diagnosis state is reached. An accurate understanding of the functions of the key genes of the signaling pathways involved in brain cell death will play a significant role in the early diagnosis of this complex disease and hopefully will lead to the identification of suitable preventive treatments or drug targets.
Keywords: Petri net model, Signaling pathways, Alzheimer’s disease, Gene expression
Subject terms: Computational biology and bioinformatics, Diseases, Neurology, Neuroscience
Introduction
Alzheimer’s disease (AD) is one of the most prevalent causes of dementia, accounting for 60–80% of all cases1,2. This progressive neurodegenerative disorder primarily affects older adults, with symptoms typically appearing after the age of 653. Currently, there are no effective treatments that specifically target AD4; however, clinical practice commonly employs a combination of drug therapy, attentive nursing care, and nonpharmacological interventions to alleviate symptoms and slow disease progression5. The average life expectancy following diagnosis ranges from 3 to 12 years, depending on factors such as age and overall health6. Diagnosing AD is particularly challenging in its early stages, when symptoms are often subtle and easily overlooked7,8. Achieving an accurate and timely diagnosis is essential, as it can improve disease management and help reduce the associated healthcare costs.
Traditionally, the diagnosis of AD has relied on brain magnetic resonance imaging (MRI) and neuropsychological assessments9,10. While these methods are valuable, they face limitations, especially in detecting the disease at an early stage11. The emergence of omics data has enabled the development of innovative prediction methods for AD, overcoming some challenges related to brain sample accessibility10,12. Among these, microarray technology, a subset of omics, allows researchers to identify AD-related genes and evaluate their expression levels13,14. Gene expression analysis, particularly via microarrays, has deepened our understanding of the molecular pathways and gene networks implicated in Alzheimer’s disease15. Such advancements may improve diagnostic accuracy and support earlier detection, ultimately leading to better patient outcomes and more effective disease management. These strategies are essential not only for lowering healthcare costs but also for improving the quality of life of those affected by this devastating condition16,17. Recent guidelines emphasize the importance of a proactive diagnostic approach, particularly in the identification of preclinical AD and mild cognitive impairment (MCI) before the onset of full dementia symptoms18.
In a 2022 review, Hajjo et al.19 confirmed that Petri nets have emerged as a powerful tool for modeling complex biological systems, including signaling pathways implicated in AD. In their 2018 study on neuronal pathway behavior, Ashraf et al.20 demonstrated that Petri net modeling of AD signaling pathways offers a promising approach for early prediction and intervention in this complex neurodegenerative disorder. Liu et al.21, in their 2023 work on a protocol for bio-model engineering, and Kodamullil et al.22, in their 2015 study on computable cause‒and‒effect models, have shown that Petri nets enable the analysis of biological systems across multiple stages. This facilitates the identification of abnormalities and atypical states that may indicate disease onset. Additionally, Heiner et al.23, in their 2004 study focused on model validation of biological pathways, advocated the use of Petri nets for stepwise qualitative and quantitative analysis for behavior prediction. In addition, Koch et al.24, in their 2010 study on systems biology modeling, and Rybarczyk et al.25, in their 2024 investigation of macrophage dynamics in atherosclerosis, demonstrated that Petri nets are especially useful in biological contexts where quantitative data are limited. They highlight the ability of these models to effectively capture the structural dynamics of biological systems while integrating quantitative aspects in a constrained manner. Moreover, Puniya et al.26, in their 2024 work on computational modeling of biological systems, and Mansoori et al.27, in their 2019 study of cancer-related signaling pathways, support the increasing adoption of this technique as a solution to the limitations of traditional stochastic approaches. It offers a more robust framework for understanding the complex interactions within biological networks. 
Studies by Koch et al.28 in 2023 on signal transduction networks, Baldan et al.29 in 2010 on metabolic pathways, and Trinh et al.30 in 2023 on Petri nets encoding Boolean networks further demonstrated the utility of Petri net models in enhancing our ability to analyze signaling pathway dynamics and in contributing to improved diagnostic and therapeutic strategies.
Petri nets have also been utilized in numerous recent studies to model signaling pathways27,31–33 and investigate the roles of various key genes in the progression of complex diseases25,34,35. In 2021, Mansoori et al.31 proposed a Petri net modeling approach to rank signaling pathways associated with different complex diseases. This methodology enables a systematic analysis of biological systems, offering insights into the dynamics of signaling networks and their contributions to disease progression. Other studies36–39 have similarly applied Petri net modeling to explore the molecular interactions among genes involved in the development or treatment of various disorders. In 2016, Cherdá et al.38 demonstrated that mutations in enzymes or metabolites within biological systems are associated with numerous mental disorders, including Alzheimer’s disease, Parkinson’s disease, and schizophrenia. They used an extended form of Petri nets, called hybrid functional Petri nets (HFPNs), to model a complex system related to AD. In a subsequent 2018 study, Cherdá et al.39 applied HFPNs to model the methionine metabolic pathway and reported that abnormalities in this pathway can lead to several serious conditions, such as AD, cardiovascular disease, and cancer.
In several other works36,40–44, machine learning and deep learning techniques have been employed as the primary means for disease prediction. These approaches typically begin by analyzing gene expression samples collected from either tissue or blood to assess the feasibility of disease prediction. From these, selected subsets of samples are used to achieve the desired predictive accuracy. Finally, key genes involved in disease diagnosis are identified. While gene expression analysis has revealed functional patterns relevant to disease prediction, most datasets are tissue-based, which poses practical challenges in clinical settings. Only a few studies have utilized blood-based gene expression data for AD prediction.
For example, in 2013, Lunnon et al.40 demonstrated that blood gene expression data can be used for AD classification. They applied a t-test and a random forest (RF) model to identify genes capable of distinguishing between AD patients and healthy individuals. In 2015, Voyle et al.36 employed recursive feature elimination (RFE) alongside RF on blood-derived gene expression data and achieved comparable diagnostic performance when both gene-level and pathway-level features were used. In 2017, Li H. et al.41 introduced a novel method, Ref-REO, to identify differentially expressed genes (DEGs) from blood samples for AD diagnosis. In 2018, Li X. et al.42 conducted the first large-scale analysis of gene expression in blood samples from AD patients. Their goal was to identify DEGs in blood and compare them with those in brain tissue, uncovering shared genes and validating critical signaling pathways involved in AD. They used least absolute shrinkage and selection operator (LASSO) regression and three classifiers, namely support vector machine (SVM), logistic ridge regression (LR), and RF, to predict disease status from blood data. In 2020, Lee et al.43 used deep learning models, such as variational autoencoders and deep neural networks, in conjunction with traditional classifiers (DEGs, SVM, LR, L1-LR, RF) to predict AD from blood gene expression data. In 2022, Ji et al.44 extracted DEG lists from male and female blood samples and constructed a protein–protein interaction (PPI) network to identify sex-specific hub genes. Using these features with an SVM classifier, they reported that sex is a significant factor in distinguishing individuals with AD.
More recently, several studies have focused on discovering blood-based biomarkers for AD diagnosis, with promising results. For example, a 2022 study developed a diagnostic model based on autophagy-related genes, emphasizing the importance of cellular degradation processes in AD pathogenesis45. Other research in 2024 has linked biomarkers to energy metabolism and immune system activity, reinforcing the systemic nature of AD46. Additionally, in another study in 2022, sex-specific differences in blood biomarkers were explored, demonstrating the influence of sex on AD diagnostics47. Machine learning methods, such as RF, LASSO regression, and artificial neural networks, have played a key role in refining these models and enhancing classification accuracy44,48. A 2022 study also underscored the relevance of metabolic pathways and immune cell infiltration in AD, suggesting potential therapeutic targets in addition to diagnostic improvements49.
Moreover, a recent 2023 study introduced a deep learning framework called One2MFusion, which demonstrated the potential of multimodal data fusion in diagnosing neurodegenerative disorders. This model integrates blood gene expression profiles with their corresponding 2D image representations to improve AD detection50. In 2023, Abdelwahab et al.51 explored deep learning for AD prediction via microarray gene expression data. They applied gene selection techniques such as singular value decomposition (SVD) and principal component analysis (PCA) in combination with neural networks.
Recent research has further highlighted the importance of understanding the molecular mechanisms underlying AD, particularly the roles of key proteins and their associated signaling pathways in neuronal cell death. For example, research employing stochastic Petri nets has successfully modeled the aggregation of amyloid beta and its detrimental effects on neuronal health, revealing critical cytotoxic events that contribute to the early onset and progression of AD20. However, these models often lack integration with real-world biological data, which limits their applicability in clinical contexts52.
In parallel with these developments, several recent studies have demonstrated the power of advanced machine learning frameworks in bioinformatics for biomarker discovery and disease association prediction. For instance, graph-based and matrix factorization approaches have been successfully employed to infer miRNA– and lncRNA–disease relationships with high predictive accuracy, leveraging deep learning and network representation learning to model complex biological interactions53–56. Although these studies do not focus on blood-based biomarkers or Alzheimer’s disease specifically, they illustrate how integrating heterogeneous biological data with AI-driven models can accelerate the identification of potential disease biomarkers and provide a methodological foundation relevant for data-driven disease prediction in systems biology.
Moreover, while machine learning techniques have shown considerable promise in analyzing neuroimaging data, they typically depend on large, labeled datasets that are expensive and time-consuming to generate57,58. Additionally, traditional machine learning models are generally restricted to processing a single, fixed-length longitudinal dataset at a time, which hinders their ability to detect complex patterns and interrelationships across multiple data sources and heterogeneous risk factors59.
Proposed approach
This study aims to address the existing challenges by integrating a Petri net–based computational model with a novel strategy for employing gene expression profiles to run the Petri net model. Methodologically, our approach transforms the signaling pathway associated with Alzheimer’s Disease (AD) into a rule-based dynamic model, enabling the step-by-step simulation of gene activation sequences for each individual sample. This explicit simulation framework allows us to trace and interpret the chain of biological events that leads to the final diagnostic state, thereby enhancing transparency and reproducibility.
Another key technical contribution of our work is the development of a unified diagnostic framework that operates consistently on both blood and brain tissue samples. The same Petri net structure and inference mechanism can process heterogeneous sample types without requiring pathway reconstruction or model re-training, which makes the method highly generalizable and cost-effective for real-world applications. Using gene expression profiles to obtain initial input tokens, the Petri net model is executed to determine whether each sample represents a healthy or diseased state.
To the best of our knowledge, no prior study has proposed a deterministic, interpretable Petri net framework with high accuracy for AD diagnosis using both blood- and brain-derived gene expression data. We believe that the combination of rule-based pathway modeling, deterministic event tracing, and unified applicability across sample types constitutes a substantial methodological and technical advance. This modeling strategy not only improves diagnostic reliability and accuracy but also establishes a generalizable and interpretable computational framework for analyzing other complex disease pathways.
The rest of the paper is organized as follows. Section 2 briefly introduces the input datasets, the signaling pathway materials, and the modeling method. Section 3 describes the proposed method in detail. Section 4 outlines the hyperparameter tuning process and the final configuration settings. Section 5 reports the experimental results across multiple benchmark datasets and compares our approach with current state-of-the-art methods. Section 6 presents the ablation study conducted to evaluate the contribution of each component within the proposed framework. Section 7 provides our discussion and future work, and Sect. 8 offers our concluding remarks and summarizes the key findings of the study.
Materials and methods
Sample datasets
The healthy and diseased samples used for Petri net model tokenization were downloaded from the NCBI/GEO website. Initially, two different datasets with the identifiers GSE63060 and GSE63061 were used. These datasets are related to healthy individuals and patients in Europe. The GSE63060 dataset has 329 samples, including 145 subjects with AD, 104 healthy subjects, and 80 subjects with mild cognitive impairment (these counts follow Table 5). The GSE63061 dataset has 382 samples, including 139 subjects with AD, 134 healthy subjects, and 109 subjects with mild cognitive impairment. Additionally, multiple other benchmark datasets (GSE97760, GSE48350, GSE5281, GSE109887 and GSE37263) were used for a more thorough evaluation of the presented method on different types of brain tissue and blood samples. The information related to these datasets is given in Table 5.
Table 5.
The results on different types of datasets (blood or brain tissue).
| # | Dataset | Type | Patient | Healthy | TP | TN | Acc (%) | Pre (%) | Rec (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GSE63060 | blood | 145 | 104 | 145 | 100 | 98.4 | 97.32 | 100 | 98.64 |
| 2 | GSE63061 | blood | 139 | 134 | 137 | 134 | 99.27 | 100 | 98.57 | 99.28 |
| 3 | GSE97760 | blood | 9 | 10 | 9 | 10 | 100 | 100 | 100 | 100 |
| 4 | GSE48350 | tissue | 80 | 173 | 78 | 173 | 99.21 | 100 | 97.5 | 98.74 |
| 5 | GSE5281 | tissue | 87 | 74 | 87 | 73 | 99.38 | 98.87 | 100 | 99.43 |
| 6 | GSE109887 | tissue | 46 | 32 | 46 | 31 | 98.72 | 97.88 | 100 | 98.93 |
| 7 | GSE37263 | tissue | 8 | 8 | 8 | 7 | 93.75 | 88.89 | 100 | 94.12 |
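The metric columns of Table 5 can be recomputed from the raw counts. The following is a minimal sketch, assuming the standard definitions of accuracy, precision, recall, and F1, with false negatives and false positives derived as Patient − TP and Healthy − TN; the function name `diagnostic_metrics` is hypothetical and not part of the original tool.

```python
def diagnostic_metrics(n_patient, n_healthy, tp, tn):
    """Return (accuracy, precision, recall, F1) in percent from raw counts."""
    fn = n_patient - tp  # patients predicted as healthy
    fp = n_healthy - tn  # healthy subjects predicted as diseased
    accuracy = (tp + tn) / (n_patient + n_healthy)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return tuple(round(100 * v, 2) for v in (accuracy, precision, recall, f1))


# Counts taken from the GSE37263 row of Table 5.
print(diagnostic_metrics(n_patient=8, n_healthy=8, tp=8, tn=7))
# -> (93.75, 88.89, 100.0, 94.12)
```

Applied to the other rows, the same function gives values that agree with the table to within rounding in the last digit.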
Signaling pathway data
The KEGG60–62 signaling pathway data related to AD are used to construct the Petri net model. The corresponding KGML file is available on the KEGG website under the pathway identifier hsa05010. In the preprocessing stage of the signaling pathways, the names of the genes involved in the AD pathway and all the activation and inhibition relationships between these genes are extracted from the KGML file. Chemical molecules in the signaling pathway (e.g., calcium ions, amyloid-β, tau proteins) were excluded, as these elements are not represented in gene expression datasets and therefore cannot be directly modeled using transcriptomic data. While these molecules play central roles in AD pathology, particularly in aggregation, inflammation, and synaptic dysfunction, the model focuses solely on gene-level interactions available in microarray profiles. This exclusion simplifies the biological representation and may limit the completeness of disease mechanism modeling. The nodes with the same name that appear in different parts of the pathway are merged into one node. The gene IDs used in the GEO and KEGG databases are different and need to be mapped. Some gene IDs in GEO are missing in KEGG and are omitted. Some gene IDs in KEGG correspond to multiple gene IDs in GEO, in which case only the data for the first gene ID in GEO are retained. Finally, only the genes in the AD signaling pathway are retained, and all other genes are removed from the expression data.
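The preprocessing steps described above (keeping gene entries, dropping chemical compounds, merging same-named nodes, and collecting activation/inhibition relations) can be sketched with Python's standard XML parser. This is an illustrative sketch, not the authors' code: the KGML fragment is hand-written, the gene names `GENE_A` and `GENE_B` and their IDs are placeholders, and the function name `parse_kgml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written KGML-style fragment with placeholder genes (illustration only).
KGML = """<pathway name="path:hsa05010" org="hsa" number="05010">
  <entry id="1" name="hsa:0001" type="gene"><graphics name="GENE_A, ALIAS_A"/></entry>
  <entry id="2" name="hsa:0002" type="gene"><graphics name="GENE_B"/></entry>
  <entry id="3" name="hsa:0001" type="gene"><graphics name="GENE_A, ALIAS_A"/></entry>
  <entry id="4" name="cpd:C00076" type="compound"><graphics name="C00076"/></entry>
  <relation entry1="2" entry2="1" type="PPrel"><subtype name="activation" value="--&gt;"/></relation>
  <relation entry1="3" entry2="2" type="PPrel"><subtype name="inhibition" value="--|"/></relation>
  <relation entry1="1" entry2="4" type="PCrel"><subtype name="activation" value="--&gt;"/></relation>
</pathway>"""


def parse_kgml(text):
    """Return (genes, edges); edges are (source, target, relation) triples."""
    root = ET.fromstring(text)
    id_to_gene = {}
    for entry in root.iter("entry"):
        if entry.get("type") != "gene":
            continue  # compounds and other non-gene nodes are excluded
        graphics = entry.find("graphics")
        # Use the first listed symbol as the node name.
        id_to_gene[entry.get("id")] = graphics.get("name").split(",")[0].strip()
    # Entries sharing a name collapse into one node because names are set members.
    genes = sorted(set(id_to_gene.values()))
    edges = set()
    for rel in root.iter("relation"):
        src = id_to_gene.get(rel.get("entry1"))
        dst = id_to_gene.get(rel.get("entry2"))
        if src is None or dst is None:
            continue  # relation touches an excluded (non-gene) node
        for sub in rel.iter("subtype"):
            if sub.get("name") in ("activation", "inhibition"):
                edges.add((src, dst, sub.get("name")))
    return genes, sorted(edges)


genes, edges = parse_kgml(KGML)
print(genes)
print(edges)
```

Mapping between GEO and KEGG gene identifiers is not shown here, since it depends on platform annotation files that the text does not specify.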
Petri net modeling tool
The Petri net model serves as a state machine that enables analysis of the transition sequence from one biological state to another. In the initial state, some places can hold one or more tokens. Tokens can move from one place to another at each transition. The Petri net model is defined as a bipartite, directed weighted graph with two types of nodes, namely, places and transitions, linked via directed weighted arcs (Fig. 2). A transition can be triggered only if all input places have the minimum number of tokens (weight) indicated on the corresponding arcs63–65.
Fig. 2.

A sample transition in the Petri net model.
In a biological Petri net model, places represent biological molecules, such as genes or proteins, and transitions represent biochemical reactions, such as activations (or inhibitions). Arcs can only connect places to transitions and vice versa. Places are shown as circles, whereas transitions are depicted as rectangles66. The Petri net enables us to model signaling pathways and examine step-by-step how the sequence of activations or inhibitions can occur among genes (DEGs), leading a person from an initial healthy state to a final patient state67 (see Fig. 4).
Fig. 4.

(a) A portion of the Alzheimer signaling pathway in KEGG, (b) the Petri net model for (a).
Proposed method
Figure 1 shows the global structure of the proposed method. The structure comprises three main components (colored in the figure), each of which is described in detail in the following subsections:
Fig. 1.
The global structure of the proposed method, including three main components (colored).
Petri net building tool.
Cluster estimator tool.
DEG* finding tool.
The first tool’s goal is to generate a Petri net model for the related signaling pathway.
The second tool makes a preliminary estimation of the healthy/patient status of any selected gene expression profile (and adds it to one of the two estimated clusters of healthy/patient samples in the training phase).
The third tool identifies those genes whose expression is significantly different between the two clusters of healthy and patient samples. We refer to these genes, which exhibit unusually distinct expression values in a given test sample compared to both healthy and diseased training profiles, as DEG* (differentially expressed genes specific to the test sample).
Petri net tool
Petri nets are a well-established formalism originally introduced in computer science for modeling concurrent and distributed systems. They provide both a graphical and a mathematical framework for describing dynamic behaviors involving synchronization, causality, and concurrency. Because of their intuitive visual representation and rigorous theoretical foundation, Petri nets have been widely applied in diverse fields ranging from engineering to biology, where complex interactions among components must be understood in a dynamic context. In biological contexts, Petri nets have proven particularly effective for modeling signaling pathways and molecular interactions63,64.
A Petri net (PN) is a bipartite directed graph formally defined as a 4-tuple:
$$PN = (P, T, F, W)$$
where:
$P$ is the set of places, each representing a gene in the pathway.
$T$ is the set of transitions, representing biological interactions such as activation or inhibition.
$F \subseteq (P \times T) \cup (T \times P)$ is the set of directed arcs that connect places and transitions.
$W: F \rightarrow \mathbb{N}^{+}$ defines the weight of each arc, indicating the number of tokens transferred during a firing event.
Places can contain dynamic objects, called tokens; each place $p \in P$ holds a number of tokens $M(p)$ representing its current activity level or expression state.
A transition $t \in T$ is able to fire if all its input places contain at least as many tokens as required by their corresponding arc weights. When an enabled transition fires, tokens are removed from its input places and added to its output places according to the arc weights defined in $W$. This process models the propagation of signals through the pathway over time65.
Figure 2 illustrates a simple example in which a transition $t$ connects two genes, $p_1$ and $p_2$. The arc from $p_1$ to $t$ represents the dependency of $t$ on gene $p_1$, and the arc from $t$ to $p_2$ represents the activation of gene $p_2$.
Here, $M(p_1)$ denotes the number of tokens currently present in place $p_1$, and $w_1$ is the threshold (weight) required in $p_1$ to enable transition $t$. When $t$ fires, $w_1$ tokens are removed from $p_1$, and $w_2$ tokens are added to $p_2$.
For each transition, specific triggering rules define whether a connected place acts as an activator or inhibitor68. For example, the rule $M(p_1) \geq w_1$ specifies that the transition $t$ can fire only when the number of tokens in $p_1$ meets or exceeds the required threshold.
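The enabling and firing rule for a single transition can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions: it does not use the SNAKES library, it covers only threshold-style (activator) input arcs, and the names `Transition`, `is_enabled`, `fire`, `g1`, and `g2` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: dict = field(default_factory=dict)   # input place -> arc weight
    outputs: dict = field(default_factory=dict)  # output place -> arc weight


def is_enabled(marking, t):
    """A transition is enabled when every input place holds at least the arc weight."""
    return all(marking.get(p, 0) >= w for p, w in t.inputs.items())


def fire(marking, t):
    """Return the new marking after firing t; the input marking is left unchanged."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new_marking = dict(marking)
    for p, w in t.inputs.items():
        new_marking[p] = new_marking.get(p, 0) - w  # consume tokens
    for p, w in t.outputs.items():
        new_marking[p] = new_marking.get(p, 0) + w  # produce tokens
    return new_marking


# One transition from gene g1 to gene g2, both arcs with weight 1.
t = Transition("t", inputs={"g1": 1}, outputs={"g2": 1})
print(fire({"g1": 2, "g2": 0}, t))
# -> {'g1': 1, 'g2': 1}
```

How inhibitory relations are encoded in the triggering rules is not specified in the text, so they are deliberately left out of this sketch.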
In this study, we implemented a Petri net tool that automatically constructs a Petri net model from KGML files to simulate the dynamic behavior of signaling pathways. The Petri net tool was primarily implemented in Python using extensive custom code for parsing KGML files, extracting genes and transitions, and constructing the network structure, while the SNAKES library (https://github.com/fpom/snakes) was employed to define, simulate, and visualize the Petri net elements. The overall workflow of constructing a Petri net from a KGML pathway file is illustrated in Fig. 3, showing the step-by-step process from parsing nodes and relations to building transitions and the final network structure.
Fig. 3.
Constructing a Petri net workflow.
The constructed model enables us to trace, step-by-step, how differential gene activations or inhibitions may transition a biological system from an initial healthy state to a final disease state67 (see Fig. 4). Figure 4 presents a portion of the Alzheimer’s disease signaling pathway and its corresponding Petri net model representation. Once the enabling conditions of a transition are satisfied, the transition fires and tokens flow through the network. The model is first tokenized using gene expression values from a selected healthy or diseased sample to initialize the tokens in each place. After tokenization, the Petri net is executed to simulate the signaling dynamics and determine the final state of the sample.
Model execution is performed in multiple steps (or iterations). At each step, all enabled transitions are fired simultaneously to move the model forward from the current state to a new state. This approach ensures that all possible system states are considered without requiring separate executions of the Petri net for each individual transition. The iterations continue until no enabled transitions remain.
At the end of the processing for a selected test sample(i), if the final place has at least one token, sample(i) is diagnosed as diseased. Otherwise, the sample is diagnosed as healthy. In our model, the final place corresponds to one or more downstream nodes in the KEGG signaling pathway (hsa05010) that represent pro-apoptotic or neurodegenerative states. Therefore, the presence of a token in this place reflects the result of pathological gene activations associated with AD progression.
The workflow for running the Petri net on sample gene expression is illustrated in Fig. 5, showing token initialization, iterative transition firing, and final sample classification steps.
Fig. 5.
Running a Petri net workflow.
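The execution and diagnosis procedure described in this section (fire all enabled transitions at each step, stop when none is enabled, then inspect the final place) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `run_petri_net`, the place names, and the `max_steps` guard are hypothetical, and the handling of conflicts (token counts floored at zero when several enabled transitions share an input place) is an assumption, since the text does not describe it.

```python
def run_petri_net(marking, transitions, final_place, max_steps=1000):
    """Run the net until no transition is enabled; return (label, trace).

    transitions maps a transition name to (inputs, outputs), where each is a
    dict from place name to arc weight.
    """
    marking = dict(marking)
    trace = []  # list of transition names fired at each step
    for _ in range(max_steps):  # guard against cyclic pathways (assumption)
        enabled = [
            name for name, (ins, _outs) in transitions.items()
            if all(marking.get(p, 0) >= w for p, w in ins.items())
        ]
        if not enabled:
            break
        # Fire all enabled transitions against the same starting marking.
        delta = {}
        for name in enabled:
            ins, outs = transitions[name]
            for p, w in ins.items():
                delta[p] = delta.get(p, 0) - w
            for p, w in outs.items():
                delta[p] = delta.get(p, 0) + w
        for p, d in delta.items():
            marking[p] = max(0, marking.get(p, 0) + d)  # floor at zero (assumption)
        trace.append(sorted(enabled))
    label = "diseased" if marking.get(final_place, 0) >= 1 else "healthy"
    return label, trace


# Toy chain g1 -> g2 -> final, with a single initial token on g1.
toy = {
    "t1": ({"g1": 1}, {"g2": 1}),
    "t2": ({"g2": 1}, {"final": 1}),
}
print(run_petri_net({"g1": 1}, toy, final_place="final"))
print(run_petri_net({}, toy, final_place="final"))
```

The returned trace records which transitions fired at each step, which corresponds to the step-by-step tracking of gene activation that the method emphasizes.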
Cluster estimator tool
In the proposed method, before running the Petri net model for a selected test sample(i), we perform a preliminary estimation of its association with either the healthy or diseased cluster. Unlike classical clustering algorithms such as K-means or KNN, this module does not rely on a predefined clustering model. Instead, it applies a custom similarity-based procedure inspired by the idea of neighborhood comparison.
This estimation is based on the gene expression profiles of the training samples and the list of differentially expressed genes (DEGs) ($n$ genes) obtained using the Limma package64 for the Alzheimer’s disease (AD) signaling pathway. The overall structure of this module is illustrated in Fig. 6.
Fig. 6.
Cluster estimator tool.
Specifically, for each test sample(i), the $n$ DEGs are identified as indicator genes. For each indicator gene, the δ genes with the same gene ID that exhibit the closest expression values to the current gene are retrieved from the healthy and diseased training samples. Each indicator gene is then tagged as healthy or diseased based on the majority label of its most similar training genes. After processing all indicator genes, a list of $n$ tagged genes is obtained for the test sample. The final predicted label of the test sample is determined by a majority vote over these tags.
During the training phase, each sample(i) is then assigned to either the healthy or diseased cluster based on its predicted label. This approach functions similarly to KNN in the sense that it leverages local similarity information, but it is specifically designed and customized for the biological context of gene expression and signaling pathway analysis.
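The two-level voting scheme of the cluster estimator (a per-gene vote among the δ nearest training values, followed by a vote across indicator genes) might look like the sketch below. It assumes profiles are stored as dictionaries from gene ID to expression value and that distance is the absolute difference in expression; the function name `estimate_cluster` and the toy data are hypothetical, and ties are broken by first occurrence, which the text does not specify.

```python
from collections import Counter


def estimate_cluster(test_profile, train_profiles, train_labels,
                     indicator_genes, delta=3):
    """Return (predicted_label, per_gene_tags) for one test sample."""
    tags = []
    for gene in indicator_genes:
        x = test_profile[gene]
        # Rank training samples by closeness of this gene's expression value.
        ranked = sorted(
            range(len(train_profiles)),
            key=lambda k: abs(train_profiles[k][gene] - x),
        )
        votes = Counter(train_labels[k] for k in ranked[:delta])
        tags.append(votes.most_common(1)[0][0])  # majority label for this gene
    # Final label: majority vote over the per-gene tags.
    return Counter(tags).most_common(1)[0][0], tags


# Toy training data with two indicator genes (values are made up).
train = [
    {"g1": 1.0, "g2": 5.0}, {"g1": 1.2, "g2": 5.1}, {"g1": 0.9, "g2": 4.8},
    {"g1": 3.0, "g2": 2.0}, {"g1": 3.1, "g2": 2.2}, {"g1": 2.9, "g2": 1.9},
]
labels = ["healthy"] * 3 + ["diseased"] * 3
print(estimate_cluster({"g1": 3.05, "g2": 2.1}, train, labels, ["g1", "g2"]))
```

With an odd δ and two classes, the per-gene vote cannot tie; the final vote can still tie when the number of indicator genes is even.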
DEG* finding tool
As mentioned above, our objective is to predict the final healthy/diseased tag of a selected test sample (i) via a Petri net model to study the sequence of gene activation (or inhibition) in the Alzheimer signaling pathway. To achieve this goal, we propose a novel method to identify DEG* genes within the selected test sample (i) for the initial tokenization of the model.
The general definition of DEGs includes those genes whose gene expression values significantly differ between two distinct sets of healthy and diseased samples.
However, since the aim of our method is to run the Petri net model for a single test sample(i) with reliable precision, we propose identifying only those genes that show more significant differences in their gene expression values within the selected test sample(i) compared with a set of training samples. We refer to those genes as DEG*.
Figure 7 illustrates the structure of the DEG* finding tool. The input to this tool consists of a test sample(i) that has already been classified in a healthy or diseased cluster by the cluster estimator tool. Two distinct evaluation methods are then employed on the basis of the cluster of the test sample (i):
Fig. 7.
DEG* Finding tool.
For each gene(j) among the m genes in test sample(i), the expression values of the α genes with the closest values to the expression value of gene(j) in the training samples of the same cluster as sample(i) are selected. Then, the expression values of the β genes with the closest values to the expression value of gene(j) are selected among the training samples of the opposite cluster.
In the next step, the median of the gene expression values is calculated for both sets, and the 90% confidence interval of the median is determined. Subsequently, it is determined whether the expression value of gene(j) falls within the range of the two confidence intervals. If the expression value of gene(j) is inside the confidence interval of patient samples and outside the confidence interval of healthy samples, then gene(j) is qualified as DEG* for sample(i). During the initial tokenization of the model, a token must be placed for each DEG* identified within the test sample(i). The Petri net model is then executed for sample(i) according to the method explained in Sect. 3.1.
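The DEG* test for a single gene can be sketched as below. Several details are assumptions rather than the authors' stated method: the confidence interval of the median is computed with a distribution-free order-statistic rule using a normal approximation for the ranks (the text does not say how the interval is obtained), the α and β values in the example are arbitrary, and the names `median_ci`, `nearest`, and `is_deg_star` are hypothetical.

```python
import math

# Two-sided standard normal quantiles for the confidence levels tested in the paper.
Z = {0.90: 1.645, 0.95: 1.960, 0.98: 2.326}


def median_ci(values, level=0.90):
    """Order-statistic confidence interval for the median (normal approximation)."""
    xs = sorted(values)
    n = len(xs)
    half = Z[level] * math.sqrt(n) / 2
    lower_rank = max(1, math.floor(n / 2 - half))      # 1-based rank
    upper_rank = min(n, math.ceil(1 + n / 2 + half))   # 1-based rank
    return xs[lower_rank - 1], xs[upper_rank - 1]


def nearest(values, x, k):
    """The k values closest to x."""
    return sorted(values, key=lambda v: abs(v - x))[:k]


def is_deg_star(x, healthy_values, patient_values, sample_cluster,
                alpha, beta, level=0.90):
    """True if x lies inside the patient interval and outside the healthy one."""
    if sample_cluster == "diseased":
        patient_ci = median_ci(nearest(patient_values, x, alpha), level)
        healthy_ci = median_ci(nearest(healthy_values, x, beta), level)
    else:
        healthy_ci = median_ci(nearest(healthy_values, x, alpha), level)
        patient_ci = median_ci(nearest(patient_values, x, beta), level)
    inside_patient = patient_ci[0] <= x <= patient_ci[1]
    inside_healthy = healthy_ci[0] <= x <= healthy_ci[1]
    return inside_patient and not inside_healthy


# Toy expression values for one gene across training samples (made up).
healthy = list(range(10, 20))
patient = list(range(30, 40))
print(is_deg_star(34, healthy, patient, "diseased", alpha=5, beta=5))
print(is_deg_star(15, healthy, patient, "healthy", alpha=5, beta=5))
```

Genes for which `is_deg_star` returns true would then receive a token in the initial marking of the Petri net.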
To ensure that the selection of DEG* genes for each test sample is robust to noise and sample variability, we evaluated the reproducibility of DEG* identification using confidence intervals around the medians of gene expression values. The median is inherently more robust than the mean, as it is less sensitive to outliers and fluctuations in the training data. By comparing the expression of each gene in the test sample to the 90% confidence interval of the corresponding cluster in the training samples, we mitigate the effect of sample variability and ensure reliable DEG* identification. We further verified robustness by testing multiple confidence interval thresholds (90%, 95%, and 98%) on datasets GSE63060 and GSE63061, which resulted in only minor changes in the final disease detection, confirming the stability of DEG* selection (Fig. 8). Note that, owing to the small range of variation among values, the Y-axes in Figs. 8 and 10 are truncated to highlight differences.
Fig. 8.
Performance analysis of the proposed method for different confidence interval values on (a) GSE63060, (b) GSE63061. The Y-axis is truncated to the 94–100% range to emphasize small but meaningful differences.
Fig. 10.
Performance analysis of the proposed method for different values of α and β on (a) GSE63060, (b) GSE63061. The Y-axis is truncated to the 88–100% range to emphasize small but meaningful differences.
Hyperparameter tuning and final settings
To optimize the performance of our proposed method, we conducted a comprehensive hyperparameter tuning process for three key parameters: the confidence interval, the δ parameter used in the Cluster Estimator tool, and the α and β parameters used in DEG* identification.
Confidence interval (CF)
To determine the optimal value of the confidence interval (CF) parameter, we evaluated several commonly used confidence levels (90%, 95%, and 98%) on both datasets (GSE63060 and GSE63061). For each sample in the benchmark datasets, three distinct initialization stages were performed corresponding to these confidence interval values. In each stage, the Petri net was executed using the tokens generated from the respective initialization process, and the state predicted by the Petri net model (healthy or diseased) was recorded. Consequently, each sample was processed three times, and model performance was computed for all confidence interval settings.
The obtained results, shown in Fig. 8, present the model’s accuracy, precision, and recall across the tested values. The horizontal axis represents the tested confidence intervals, while the vertical axis shows the performance metrics.
As can be seen, although the performance variations are relatively small, the 90% confidence interval yielded superior accuracy, precision, and recall on both datasets. Specifically, for GSE63060, the 90% and 98% intervals produced similar and better results than the 95% setting, while for GSE63061, the 90% interval achieved the best performance across all three metrics. Accordingly, CF = 90% was selected as the optimal value and used in all subsequent experiments.
To emphasize the subtle yet meaningful differences among the tested values, the Y-axis in Fig. 8 is truncated from 94% to 100%. This zoomed-in view highlights the influence of the confidence interval parameter on model performance and supports the selection of the optimal value.
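The comparison of a test sample against a confidence interval around the cluster median can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not state how the interval is computed, so a percentile bootstrap is assumed here, and the function names `median_ci` and `flag_outlying_genes`, the number of resamples, and the fixed seed are all hypothetical.

```python
import numpy as np

def median_ci(values, level=0.90, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for the median (assumed estimator)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples with replacement and take the median of each one.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_medians = np.median(values[idx], axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_medians, [tail, 1.0 - tail])
    return float(lo), float(hi)

def flag_outlying_genes(test_profile, train_matrix, level=0.90):
    """Mark genes whose test value lies outside the cluster's median interval.

    train_matrix has one row per training sample of a cluster and one column
    per gene; test_profile holds the test sample's value for each gene.
    """
    flags = []
    for g in range(train_matrix.shape[1]):
        lo, hi = median_ci(train_matrix[:, g], level=level)
        flags.append(not (lo <= test_profile[g] <= hi))
    return np.array(flags)
```

With a fixed set of resamples, a higher confidence level can only widen the interval, so fewer genes are flagged at 98% than at 90%; this is one way to read the small performance differences between the tested settings.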
Cluster estimator parameter (δ)
The δ parameter determines how many neighboring training samples, i.e., those with the most similar gene expression values, are used for the initial estimation. To identify its optimal value, we systematically evaluated nine δ values ranging from 2 to 40 using two benchmark datasets (GSE63060 and GSE63061). For each configuration, the Cluster Estimator module predicted the health status of samples based on their closest training instances, and the resulting accuracy, precision, and recall were measured (Fig. 9). In this figure, the horizontal axis represents the tested δ values, while the vertical axis displays the corresponding evaluation metrics. The experimental results indicate that a neighborhood size of δ = 3 achieved the best overall performance across both datasets, yielding the highest accuracy, precision, and recall for GSE63061, and the best accuracy and recall for GSE63060. Although performance differences among larger δ values were relatively minor, δ = 3 consistently produced the most stable and reliable predictions. Consequently, δ = 3 was selected as the final value for subsequent experiments.
Fig. 9.
Performance of cluster estimator tool for different δ values on (a) GSE63060, (b) GSE63061.
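The role of δ can be illustrated with a small nearest-neighbor sketch. It assumes Euclidean distance over the expression vector and a simple majority vote, neither of which is specified in the text; `estimate_status` is a hypothetical name and not part of the Cluster Estimator tool.

```python
import numpy as np

def estimate_status(test_profile, train_profiles, train_labels, delta=3):
    """Majority vote among the delta closest training profiles.

    Euclidean distance and majority voting are assumptions made for this
    sketch; the original tool may use a different similarity measure.
    """
    train_profiles = np.asarray(train_profiles, dtype=float)
    test_profile = np.asarray(test_profile, dtype=float)
    dists = np.linalg.norm(train_profiles - test_profile, axis=1)
    nearest = np.argsort(dists, kind="stable")[:delta]
    votes = [train_labels[i] for i in nearest]
    # With two classes and an odd delta, the vote cannot end in a tie.
    return max(set(votes), key=votes.count)
```

An odd value such as δ = 3 avoids ties in a two-class vote, which is one practical reason small odd neighborhoods are convenient in this kind of estimator.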
DEG* identification parameters (α, β)
Parameters α and β determine the number of healthy and patient samples used when constructing the Petri net model for DEG* identification. To automatically determine suitable values for these parameters, we evaluated seven combinations of α and β ranging from 5 to 60 on both benchmark datasets (GSE63060 and GSE63061). For each parameter pair, all samples in the dataset were processed by the Petri net, and the resulting accuracy, precision, and recall were recorded (Fig. 10). The horizontal axis of the figure represents the tested (α, β) pairs, while the plotted points correspond to performance metrics under each setting.
As illustrated, the best overall results for both datasets were obtained with α = 20 and β = 40, which consistently achieved the highest values across all three evaluation metrics. Parameter combinations with smaller values (e.g., α = β = 10) resulted in noticeably lower performance. Although the overall variation across tested configurations was moderate (less than 5% for GSE63060 and 10% for GSE63061 across accuracy, precision, and recall), the performance stabilized for larger α and β values. To emphasize these subtle yet meaningful trends, the Y-axis in Fig. 10 is truncated. Consequently, α = 20 and β = 40 were selected as the optimal settings and used in all subsequent experiments.
All hyperparameters were tuned via grid search on the benchmark datasets, and the final values were selected based on overall model performance. Table 1 summarizes the parameters, their descriptions, tested ranges, and the optimal values used in the final configuration.
Table 1.
Optimized hyperparameters and search ranges for model configuration.
| Parameter | Description | Search Range / Tested Values | Optimal Value |
|---|---|---|---|
| Confidence interval (CF) | Confidence level used in DEG* identification | 90%, 95%, 98% | CF = 90% |
| δ | Number of nearest neighbors for Cluster Estimator | 1–5 | δ = 3 |
| α | Number of same-class samples in Petri net | 10–30 | α = 20 |
| β | Number of opposite-class samples in Petri net | 20–60 | β = 40 |
Experiments and results
In this section, we provide a comprehensive evaluation on the microarray datasets GSE63060 and GSE63061, which were downloaded from the GEO database. We present the preprocessing steps involved, including the mapping of GEO IDs to unique KEGG IDs and the subsequent focus on genes within the signaling pathway. We also describe the identification of DEGs via the Limma package in R; these DEGs are then used for the initial clustering of samples.
We evaluate the performance of our model with the standard criteria generally used in related works. We assess the accuracy, precision, and recall of our diagnosis method, considering the classification of samples into healthy and diseased clusters. These metrics provide a clear understanding of the effectiveness of our approach in accurately diagnosing samples on the basis of gene expression patterns. We compared our results with those of previous studies that explored microarray gene expression datasets. Our experimental results show that our method predicts the healthy or diseased status of test samples with high accuracy. Analysis of DEGs in signaling pathways provides significant insight into disease-related signaling pathways. Our model’s performance relative to that of other studies supports the reliability of our proposed method.
Data preprocessing
The GSE63060 and GSE63061 microarray datasets were downloaded from the GEO database. These datasets encompass gene expression information for approximately 30,000 genes. After mapping the GEO IDs to unique KEGG IDs (Entrez IDs) and filtering the genes to include only those present in the signaling pathway, the dataset sizes were reduced to 110 genes for GSE63060 and 102 genes for GSE63061. Consequently, only the genes within the signaling pathway were considered for further analysis.
Differentially expressed genes (DEGs) were identified for both datasets via the Limma package in R. These DEGs were used to predict the status of the samples. The GSE63060 dataset revealed 35 DEGs out of the 110 genes, whereas the GSE63061 dataset revealed 22 DEGs out of the 102 genes. The violin plots illustrating DEGs in healthy versus patient samples for both the GSE63060 and GSE63061 datasets are presented in Fig. 11.
The selection of the delta (δ) parameter involved testing various values, with a final choice of δ = 3. This means that the three training samples with the closest gene expression values are used to predict whether individuals are healthy or diseased. Figure 9 illustrates the results of the tests conducted to determine the optimal value for the δ parameter.
To identify DEG* genes, we constructed and executed a Petri net using all genes within the signaling pathway. For the identification of these DEG* genes, the alpha (α) and beta (β) parameters were set to 20 and 40, respectively, given that the training dataset contains more than 40 samples. In other words, for a healthy sample, the 20 healthy training samples and 40 patient training samples with the closest values are used, whereas for a patient sample, the 20 patient training samples and 40 healthy training samples with the closest values are employed to identify DEG* genes. Various tests were conducted to select appropriate values for the alpha and beta parameters. The results of these experiments on datasets GSE63060 and GSE63061 are shown in Fig. 10.
Fig. 11.
Violin plots of DEGs in healthy vs patient samples for (a) GSE63060, (b) GSE63061.
Model evaluation criteria
To assess the performance of the developed model, four fundamental criteria are employed: accuracy, precision, recall, and F1 score. The formulas and detailed explanations for each criterion are provided in Fig. 12.
Fig. 12.
Evaluation criteria and their formulas.
Accuracy (ACC) is used to measure the model’s ability to correctly diagnose the healthy or diseased status of samples. It quantifies the proportion of samples for which the model accurately determines their true status, providing an overall measure of classification correctness69.
Precision (PR) measures the proportion of predicted positive cases (patients) that are truly diseased, whereas specificity assesses the model’s ability to correctly identify healthy individuals. This signifies how correctly the model can pinpoint individuals affected by the specific condition under investigation70.
Recall (RE), or sensitivity, characterizes the model’s ability to identify true positive patient samples among all samples that are actually diseased. This highlights the model’s effectiveness in capturing individuals who are affected by the condition of interest71.
The F1 score (F1) is the harmonic mean of precision and recall, providing a balanced measure that considers both false positives and false negatives. It is especially useful when the class distribution is imbalanced, as it combines the model’s ability to correctly identify positive cases (precision) and its ability to capture all relevant positive cases (recall) into a single metric72.
These four criteria enable us to evaluate how well the Petri net model distinguishes healthy individuals from patients.
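As a quick check of how the four criteria relate to confusion-matrix counts, the standard definitions can be written as a short function. This is a generic sketch; `classification_metrics` is a hypothetical helper, and patients are treated as the positive class.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    Patients are the positive class: tp/fn count diseased samples, tn/fp
    count healthy samples. Zero denominators are mapped to 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Counts taken from Row 4 of Table 2 (139 patients, 134 healthy, TP = 137,
# TN = 134). FN and FP are inferred here as class size minus TP and TN,
# which is an assumption about how the table was built.
row4 = classification_metrics(tp=137, tn=134, fp=134 - 134, fn=139 - 137)
```

Under that assumption, the call gives an accuracy of about 99.27%, a precision of 100%, a recall of about 98.56%, and an F1 score of about 99.28%, close to the values reported for that row.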
Comparative analysis with other existing solutions
As mentioned in the introduction, several studies have used machine learning approaches to distinguish healthy from patient profiles. Tables 2 and 3 show the results of experiments in which gene expression profiles from blood samples were used to predict healthy and patient status. These benchmarks focus on the GSE63060 and GSE63061 datasets. In the following, we compare the benchmark results of our proposed method (Table 2) with those reported in other studies (Table 3).
Table 2.
| n | Study | Train | Test | patient | healthy | MCI | N/A | TP | TN | Method | Acc (%) | Pre (%) | Rec (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Petri net | GSE63060 | GSE63060 | 145 | 104 | - | 0 | 145 | 100 | Pathway + DEGs + Petri net | 98.4 | 97.32 | 100 | 98.64 |
| 2 | Petri net | GSE63060 | GSE63060 | - | 104 | 80 | 0 | 79 | 102 | Pathway + DEGs + Petri net | 98.37 | 97.54 | 98.75 | 98.14 |
| 3 | Petri net | GSE63060 | GSE63060 | 145 | 104 | 80 | 0 | 225 | 81 | Pathway + DEGs + Petri net | 93.1 | 90.73 | 99.12 | 95.1 |
| 4 | Petri net | GSE63061 | GSE63061 | 139 | 134 | - | 0 | 137 | 134 | Pathway + DEGs + Petri net | 99.27 | 100 | 98.57 | 99.28 |
| 5 | Petri net | GSE63061 | GSE63061 | - | 134 | 109 | 0 | 107 | 133 | Pathway + DEGs + Petri net | 98.36 | 99.07 | 97.25 | 98.15 |
| 6 | Petri net | GSE63061 | GSE63061 | 139 | 134 | 109 | 0 | 247 | 104 | Pathway + DEGs + Petri net | 91.88 | 89.17 | 99.6 | 94.1 |
Table 3.
| n | Study | Train | Test | patient | healthy | MCI | N/A | Method | Acc (%) | Pre (%) | Rec (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Li X. et al.42 | GSE63061 | GSE63060 | 143 | 104 | - | 2 | Vote(Full4set) | 81.1 | 87.2 | 76.5 |
| 2 | Li X. et al.42 | GSE63061 | GSE63060 | 143 | 104 | - | 2 | Vote(Full6set) | 78.3 | 82.1 | 75.5 |
| 3 | Li X. et al.42 | GSE63061 | GSE63060 | - | 104 | 77 | 2 | Vote(Full4set) | 76.2 | 78.8 | 72.7 |
| 4 | Voyle et al.33 | GSE63060 | GSE63061 | 118 | 118 | - | 37 | RFE + RF | 65 | 61 | 70 |
| 5 | Li X. et al.42 | GSE63060 | GSE63061 | 102 | 78 | - | 93 | Vote(Full4set) | 78.1 | 78.8 | 77.6 |
| 6 | Li X. et al.42 | GSE63060 | GSE63061 | 102 | 78 | - | 93 | Vote(Full6set) | 79.8 | 77.9 | 76.5 |
| 7 | Li X. et al.42 | GSE63060 | GSE63061 | - | 78 | 65 | 100 | Vote(Full6set) | 79 | 82.1 | 75.4 |
Table 2 shows the results obtained in the experiments to evaluate our method with the datasets GSE63060 and GSE63061. In these experiments, each profile in a dataset was selected in turn and treated as an unseen test sample, while the remaining profiles served as training data. In other words, we employed leave-one-out cross-validation, iterating over all the samples in each dataset.
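The leave-one-out procedure described above can be sketched in a few lines. The loop is generic; `leave_one_out` and the stand-in predictor `nearest_label` are hypothetical names, and the predictor is only a placeholder for the full estimator and Petri net pipeline.

```python
def leave_one_out(profiles, labels, predict):
    """Hold out each profile once and predict it from all remaining profiles.

    predict(test_profile, train_profiles, train_labels) returns a label.
    """
    predictions = []
    for i in range(len(profiles)):
        train_x = profiles[:i] + profiles[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(predict(profiles[i], train_x, train_y))
    return predictions

def nearest_label(test_profile, train_profiles, train_labels):
    """Placeholder predictor: label of the single closest training profile."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, test_profile))
    best = min(range(len(train_profiles)),
               key=lambda i: sq_dist(train_profiles[i]))
    return train_labels[best]

# Toy data with one made-up expression value per profile.
toy_profiles = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_labels = ["healthy"] * 3 + ["patient"] * 3
toy_predictions = leave_one_out(toy_profiles, toy_labels, nearest_label)
```

Because every sample is predicted exactly once from a model that never saw it, the confusion-matrix counts accumulated over all iterations cover the whole dataset.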
In Table 3, we present the results reported in previous studies that used healthy and diseased samples. They achieved accuracies ranging from 78.3% to 81.1% on the GSE63060 dataset (Table 3, Rows 1–2), whereas our method achieved an accuracy of 98.4% on the same dataset (Table 2, Row 1).
On the GSE63061 dataset, previous studies reported accuracies ranging from 65% to 79.8% (Table 3, Rows 4–6), whereas our method achieved an accuracy of 99.27% (Table 2, Row 4).
In further investigations, a study explored the use of individuals with mild cognitive impairment (MCI) as patients. This study reported accuracies of 76.2% on the GSE63060 dataset (Table 3, Row 3) and 79% on the GSE63061 dataset (Table 3, Row 7). For the same experiments, we achieved an accuracy of 98.37% on the GSE63060 dataset (Table 2, Row 2) and 98.36% on the GSE63061 dataset (Table 2, Row 5). Furthermore, we established a third scenario encompassing all samples, in which healthy individuals were contrasted against the combined group of individuals with disease and MCI. With the Petri net model, we achieved an accuracy of 93.1% on the GSE63060 dataset (Table 2, Row 3) and 91.88% on the GSE63061 dataset (Table 2, Row 6). The Petri net model’s accuracy decreased in this scenario because of the presence of MCI individuals who exhibited certain marginal AD symptoms.
It is important to note that the current version of our model operates on a binary classification framework, distinguishing only between ‘healthy’ and ‘diseased’ profiles. However, Alzheimer’s disease progression is a continuous process, and individuals with MCI often represent an intermediate state with ambiguous molecular signatures. Given suitable supporting data, future work could extend the model to multi-class or probabilistic classification to better capture these transitional disease stages.
It is worth noting that, in most of the compared studies, several samples were excluded from analysis due to noise, outlier behavior, or uncertain biological interpretation. In contrast, our proposed method incorporates an internal mechanism to handle noise directly within each dataset. Consequently, no samples were omitted in our experiments, ensuring a comprehensive evaluation of all available data.
In addition, numerous recent studies have investigated the GSE63060 and GSE63061 gene expression datasets using various machine learning–based models to classify healthy and diseased samples. These studies often report their performance in terms of the area under the receiver operating characteristic curve (AUC-ROC). A summary of several representative works is provided in Table 4, where the reported values range from 0.65 to 0.887. Although AUC-ROC values offer useful insights into the ranking ability of probabilistic classifiers, they are not directly comparable to the evaluation criteria employed in our proposed method, which fundamentally differs in both methodological design and output interpretation.
Table 4.
| n | Study | Year | Validation dataset | AUC |
|---|---|---|---|---|
| 1 | Wang H. et al.46 | 2024 | GSE63060 | 0.887 |
| 2 | Wang H. et al.46 | 2024 | GSE63061 | 0.789 |
| 3 | Yu W. et al.48 | 2021 | GSE63060 | 0.837 |
| 4 | Yu W. et al.48 | 2021 | GSE63061 | 0.802 |
| 5 | Ji W. et al.44 | 2022 | GSE63060 | 0.65 |
| 6 | Ji W. et al.44 | 2022 | GSE63061 | 0.81 |
| 7 | Qin Q. et al.45 | 2022 | GSE63061 | 0.836 |
| 8 | Qin Q. et al.45 | 2022 | GSE63061 | 0.731 |
Generalization ability: results on other benchmark datasets
To evaluate the generalization ability of the proposed model, we further tested it on several independent benchmark datasets. While GSE63060 and GSE63061 were used as the main datasets in our study, additional datasets (GSE97760, GSE48350, GSE5281, GSE109887, and GSE37263) were included to examine the model’s performance across different sample sources and experimental conditions. Detailed information and results for all datasets are summarized in Table 5.
These datasets encompass both blood and brain tissue samples, thereby providing a comprehensive assessment of the model’s robustness and transferability across distinct biological contexts.
In these experiments, the parameter δ was fixed at 3 to ensure optimal performance. The parameters α and β were adjusted according to the dataset size as follows:
For datasets with more than 40 samples: α = 20, β = 40.
For datasets with more than 20 and at most 40 samples: α = 10, β = 20.
For smaller datasets: α and β were set equal to the total number of healthy and diseased samples, respectively.
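The size-dependent rule above can be written as a small helper. This sketch assumes that "samples" refers to the total number of profiles in the dataset and that the thresholds are strict, as worded; `choose_alpha_beta` is a hypothetical name.

```python
def choose_alpha_beta(n_samples, n_healthy, n_diseased):
    """Pick (alpha, beta) from the dataset size, following the rule in the text.

    Assumes n_samples is the total number of profiles in the dataset.
    """
    if n_samples > 40:
        return 20, 40
    if n_samples > 20:
        return 10, 20
    # Small datasets: use the full healthy and diseased groups, respectively.
    return n_healthy, n_diseased
```

For example, a dataset with 273 profiles falls under the first rule and yields α = 20 and β = 40, while a dataset with 12 profiles (5 healthy, 7 diseased) yields α = 5 and β = 7.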
Across all benchmark datasets, the proposed model consistently achieved high accuracy, precision, recall, and F1 scores, confirming its strong generalization capability and adaptability to diverse data sources, including both blood-based and brain-tissue datasets.
Furthermore, to explore the broader applicability of the proposed framework beyond Alzheimer’s disease, preliminary tests have been conducted on several cancer-related and other neurodegenerative disease datasets. Early results suggest that the Petri net–based approach also performs effectively in these domains. Detailed analyses and results of these additional experiments will be presented in future studies.
Ablation study
To evaluate the contribution of each component in the proposed framework, we conducted an ablation study in which one module was selectively removed at a time. The overall framework consists of two primary modules:
Estimator Module: Responsible for generating an initial prediction of whether each sample belongs to the healthy or patient group. It also identifies the sample-specific DEG* genes required for the tokenization of the Petri net model.
Petri Net Module: Represents the Alzheimer’s signaling pathway and determines the final diagnostic outcome based on the activation of disease-related terminal places.
When the Petri net module was excluded, the estimator module alone achieved generally high classification accuracy. However, several samples were misclassified due to noise or subtle variations in gene expression. After reintroducing the Petri net module, many of these misclassifications were corrected. Notably, the overall accuracy did not decrease in any of the evaluated datasets. The Petri net module either confirmed the estimator’s prediction or improved it by resolving uncertain or borderline cases, leading to decisions that were more biologically consistent.
Conversely, using the Petri net module in isolation was not feasible, as it requires the initial predicted labels from the estimator to define representative healthy and patient groups used during DEG* identification. Therefore, the estimator module is indispensable for establishing the initial conditions, while the Petri net module refines and validates these predictions.
This analysis demonstrates that both modules contribute synergistically to the high diagnostic performance of the proposed system: the estimator module provides an initial classification based on expression similarity, and the Petri net model ensures biological plausibility, robustness, and the reliable correction of ambiguous predictions.
Discussion
The proposed Petri net–based framework predicts healthy versus Alzheimer’s disease (AD) patient profiles from gene expression datasets. Evaluated on the GSE63060 and GSE63061 datasets, the model achieved high classification accuracy across all samples while providing mechanistic insights into gene activation sequences underlying AD progression. Unlike previous machine learning approaches that report probabilistic metrics such as AUC-ROC, our framework delivers deterministic, biologically interpretable classifications, allowing transparent, rule-based tracking of molecular processes.
A major strength of the proposed method lies in its biological transparency and interpretability. Our approach employs a Petri net-based framework that performs deterministic binary classification, using interpretable, rule-based transitions that reflect the biological processes underpinning disease progression. As such, accuracy, precision, recall, and F1 score were employed to assess model performance, since these metrics better reflect classification quality when predictions are categorical rather than probabilistic.
In contrast, AUC-ROC evaluates how well a model ranks samples by likelihood but does not reflect actual classification decisions. This key methodological distinction limits the feasibility of direct performance comparisons with probabilistic models. Unlike probabilistic classifiers, the Petri net framework allows for transparent tracking of gene activation sequences, elucidating the ordered molecular events that may contribute to neurodegenerative processes in Alzheimer’s disease.
In future work, integrating confidence estimates or hybrid probabilistic elements could make the model compatible with ranking-based metrics such as AUC-ROC. Nonetheless, the current formulation prioritizes interpretability, simplicity, and clinical relevance, which positions our model as a compelling alternative for biologically informed, data-driven decision-making in the early detection of Alzheimer’s disease.
As a limitation, while the Petri net model captures the directionality and presence of gene-gene interactions, it does not incorporate kinetic rates, feedback loops, or non-linear dynamics typical of biological systems. This abstraction limits the model’s ability to mimic exact biological behavior.
Future work
Although the Petri net inherently provides interpretability through its rule-based structure, future research will focus on integrating explainable AI (XAI) techniques to further enhance model transparency and biological credibility. Applying XAI-based analyses such as feature attribution and pathway-level contribution mapping will allow us to quantify the influence of individual genes and signaling interactions on disease prediction. This integration will strengthen the biological interpretability of the model and provide deeper insights into the molecular mechanisms underlying Alzheimer’s disease progression. Furthermore, future studies could explore the integration of graph-based deep learning models such as those applied in recent biomarker discovery frameworks (e.g., matrix factorization and graph neural network–based methods) to enhance the predictive capability of Petri net–driven analyses and uncover novel disease–gene associations.
Conclusion
In this work, our main objective was to predict the healthy and diseased status of gene expression profiles by modeling signaling pathways. We achieved this objective through several key components. First, we developed a tool to build the target Petri net model correctly, which plays a crucial role in our approach. This tool converts any selected source signaling pathway data from KEGG into the corresponding Petri net model. This automated process ensures efficient and reliable construction of the target Petri net model.
The built model represents all the interactions and dynamics of the signaling pathway related to the selected disease. Executing the constructed Petri net model enables binary classification of gene expression profiles based on their correspondence with disease-related signaling dynamics.
To enhance the results produced by the Petri net model, we have also conceived a tool for preprocessing the input datasets and making a preliminary estimation of healthy or diseased clusters. A comparison of the gene expression values of the DEGs in the preliminary clusters allowed the identification of DEG* genes whose gene expression values significantly differed.
In recent years, novel blood-based biomarkers of AD have been developed73. Different tissues, such as blood cells and brain cells, express distinct sets of genes on the basis of their specific functions and requirements74,75. Blood biomarkers often reflect systemic changes rather than the specific pathological processes occurring in the brain76.
Although signaling pathways in blood samples and brain tissue exhibit variations in terms of the genes involved and gene‒gene interactions, our developed model demonstrates satisfactory performance when applied to both datasets. This feature is particularly advantageous, as blood sampling is generally a simpler, less risky, and more cost-effective procedure for many diseases. Despite the limited number of samples in most training datasets, our proposed approach achieves three objectives: (a) high accuracy, (b) effectively handling sample variations across different classes, and (c) avoiding biased results.
Finally, our study introduces a novel and versatile approach for predicting disease status via a Petri net model of signaling pathways. On the basis of this approach, we gain comprehensive insights into (a) the key gene activation sequence, (b) the disease progression state, and (c) the role of each gene in pathway outcomes. This method holds promise as a valuable approach for analyzing and comprehending complex disease mechanisms in terms of gene‒gene interaction networks and for facilitating diagnosis and treatment strategies in bioinformatics and other related fields, such as drug design.
Furthermore, we have evaluated the proposed method on various KEGG datasets covering other complex diseases, such as Parkinson’s disease and a variety of cancers. The results of these experiments will be reported in future work.
Acknowledgements
The authors would like to thank Dr. Fatemeh Mansoori for her encouragement and for her initial technical support in starting this research.
Author contributions
H.E. and F.A. contributed equally to developing the tools needed for this research and wrote this article under the supervision of M.R. and K.K. All authors read and approved the final manuscript.
Data availability
All raw data used in this study are publicly available from two sources: 1. The KEGG signaling pathway for Alzheimer’s disease (AD), which was used to construct the Petri net model, was obtained from the KEGG database (https://www.genome.jp/dbget-bin/www_bget?pathway+hsa05010). 2. Gene expression datasets used to evaluate the proposed method were retrieved from the NCBI Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/) under the following accession numbers: GSE63060, GSE63061, GSE97760, GSE48350, GSE5281, GSE109887, and GSE37263. Intermediate data generated during this study are available upon reasonable request from the corresponding author.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Hananeh Ebrahimian and Fatemeh Asadzadeh.
Contributor Information
Maseud Rahgozar, Email: rahgozar@ut.ac.ir.
Kaveh Kavousi, Email: kkavousi@ut.ac.ir.
References
- 1.Juganavar, A., Joshi, A. & Shegekar, T. Navigating Early Alzheimer’s Diagnosis: A Comprehensive Review of Diagnostic Innovations. Cureus 15, (2023). [DOI] [PMC free article] [PubMed]
- 2.Shukla, R. & Singh, T. R. AlzGenPred - CatBoost-based gene classifier for predicting alzheimer’s disease using high-throughput sequencing data. Sci. Rep.14, 30294 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Anwal, L. A comprehensive review on alzheimer’s disease. World J. Pharm. Pharm. Sci.10, 1170 (2021). [Google Scholar]
- 4.Passeri, E. et al. Alzheimer’s disease: treatment strategies and their limitations. Int. J. Mol. Sci.23, 13954 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Xu, F. et al. Diagnostic implications of ubiquitination-related gene signatures in alzheimer’s disease. Sci. Rep.14, 10728 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Schaffert, J. et al. Predictors of life expectancy in Autopsy-Confirmed alzheimer’s Disease1. J. Alzheimer’s Dis.86, 271–281 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Alzheimer’s disease facts and figures. Alzheimer’s Dement. 15, 321–387 (2019). (2019).
- 8.Schaffert, J. et al. ZNF384: A potential therapeutic target for psoriasis and alzheimer’s disease through inflammation and metabolism. Front Immunol13, (2022). [DOI] [PMC free article] [PubMed]
- 9.Li, W., Zhao, Y., Chen, X., Xiao, Y. & Qin, Y. Detecting alzheimer’s disease on small dataset: a knowledge transfer perspective. IEEE J. Biomed. Heal Inf.23, 1234–1242 (2018). [DOI] [PubMed] [Google Scholar]
- 10.N Bryan, R. Machine learning applied to alzheimer disease. Radiology281, 665–668 (2016). at. [DOI] [PubMed] [Google Scholar]
- 11.AlMansoori, M. E., Jemimah, S., Abuhantash, F. & AlShehhi, A. Predicting early alzheimer’s with blood biomarkers and clinical features. Sci. Rep.14, 6039 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Teja, M. S., Thanuja, K., Deep, N. M., Reddy, P. R. & Reddy, O. L. K. Prediction and analysis of alzheimer’s disease using deep learning algorithms. Int. J. Comput. Learn. \& Intell.2, 48–57 (2023). [Google Scholar]
- 13.Othman, M. S., Kumaran, S. R. & Yusuf, L. M. Gene selection using hybrid Multi-Objective cuckoo search algorithm with evolutionary operators for cancer microarray data. IEEE Access.8, 186348–186361 (2020). [Google Scholar]
- 14.Ahmed, S. T., Kadhim, Q. K. & Alsultani, H. S. M. Abd almahdy, W. S. Applying the MCMSI for online educational systems using the Two-Factor authentication. Int. J. Interact. Mob. Technol.15, 162 (2021). [Google Scholar]
- 15.Alenizi, A. S. & Al-Karawi, K. A. Internet of Things (IoT) Adoption: Challenges and Barriers. In: Proceedings of Seventh International Congress on Information and Communication Technology: ICICT 2022, London, Volume 3 217–229 (2023). 10.1007/978-981-19-2394-4_20
- 16.Geldmacher, D. S. & Kerwin, D. R. Practical diagnosis and management of dementia due to alzheimer’s disease in the primary care setting. Prim. Care Companion CNS Disord. 10.4088/PCC.12r01474 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Weller, J. & Budson, A. Current understanding of alzheimer’s disease diagnosis and treatment. F1000Research7, 1161 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Porsteinsson, A. P., Isaacson, R. S., Knox, S., Sabbagh, M. N. & Rubino, I. Diagnosis of early alzheimer’s disease: clinical practice in 2021. J. Prev. Alzheimer’s Dis.8, 371–386 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Hajjo, R., Sabbah, D. A., Abusara, O. H. & Al Bawab A. Q. A review of the recent advances in alzheimer’s disease research and the utilization of network biology approaches for prioritizing diagnostics and therapeutics. Diagnostics12, 2975 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Ashraf, J., Ahmad, J., Ali, A. & Ul-Haq, Z. Analyzing the behavior of neuronal pathways in alzheimer’s disease using petri net modeling approach. Front Neuroinform12, (2018). [DOI] [PMC free article] [PubMed]
- 21.Liu, F., Heiner, M. & Gilbert, D. Protocol for biomodel engineering of unilevel to multilevel biological models using colored petri Nets. STAR. Protoc.4, 102651 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Kodamullil, A. T., Younesi, E., Naz, M., Bagewadi, S. & Hofmann-Apitius, M. Computable cause‐and‐effect models of healthy and alzheimer’s disease States and their mechanistic differential analysis. Alzheimer’s Dement.11, 1329–1339 (2015). [DOI] [PubMed] [Google Scholar]
- 23.Heiner, M., Koch, I. & Will, J. Model validation of biological pathways using petri nets—demonstrated for apoptosis. Biosystems75, 15–28 (2004). [DOI] [PubMed] [Google Scholar]
- 24.Koch, I., Reisig, W. & Schreiber, F. Modeling in Systems Biology: The Petri Net Approach Vol. 16 (Springer Science & Business Media, 2010).
- 25.Rybarczyk, A., Formanowicz, D. & Formanowicz, P. The role of macrophage dynamics in atherosclerosis analyzed using a Petri net-based model. Appl. Sci. 14, 3219 (2024).
- 26.Puniya, B. L., Verma, M., Damiani, C., Bakr, S. & Dräger, A. Perspectives on computational modeling of biological systems and the significance of the SysMod community. Bioinform. Adv. 4 (2024).
- 27.Mansoori, F., Rahgozar, M. & Kavousi, K. FoPA: identifying perturbed signaling pathways in clinical conditions using formal methods. BMC Bioinform. 20, 92 (2019).
- 28.Koch, I. & Büttner, B. Computational modeling of signal transduction networks without kinetic parameters: Petri net approaches. Am. J. Physiol. Cell Physiol. 324, C1126–C1140 (2023).
- 29.Baldan, P., Cocco, N., Marin, A. & Simeoni, M. Petri nets for modelling metabolic pathways: a survey. Nat. Comput. 9, 955–989 (2010).
- 30.Trinh, V. G., Benhamou, B. & Soliman, S. Trap spaces of Boolean networks are conflict-free siphons of their Petri net encoding. Theor. Comput. Sci. 971, 114073 (2023).
- 31.Mansoori, F., Rahgozar, M. & Kavousi, K. A pathway analysis approach using Petri net. IEEE J. Biomed. Health Inform. 25, 874–880 (2021).
- 32.Gutowska, K. et al. Petri nets and ODEs as complementary methods for comprehensive analysis on an example of the ATM–p53–NF-κB signaling pathways. Sci. Rep. 12, 1135 (2022).
- 33.Le Moigne, T., Mahulea, C. & Faraut, G. Optimizing clinical pathways: a probabilistic time Petri net approach. IFAC-PapersOnLine 58, 96–101 (2024).
- 34.Wang, L. & Liu, Z. P. Detecting diagnostic biomarkers of Alzheimer’s disease by integrating gene expression data in six brain regions. Front. Genet. 10 (2019). 10.3389/fgene.2019.00157
- 35.Rybarczyk, A., Formanowicz, D. & Formanowicz, P. Key therapeutic targets to treat hyperglycemia-induced atherosclerosis analyzed using a Petri net-based model. Metabolites 13 (2023).
- 36.Voyle, N. et al. A pathway based classification method for analyzing gene expression for Alzheimer’s disease diagnosis. J. Alzheimer’s Dis. 49, 659–669 (2015).
- 37.Pernice, S. et al. A computational approach based on the colored Petri net formalism for studying multiple sclerosis. BMC Bioinform. 20, 623 (2019).
- 38.Cherdal, S. & Mouline, S. Petri nets for modelling and analysing a complex system related to Alzheimer’s disease. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing 309–312 (2016). 10.1145/2851613.2851939
- 39.Cherdal, S. & Mouline, S. Modelling and simulation of biochemical processes using Petri nets. Processes 6, 97 (2018).
- 40.Lunnon, K. et al. A blood gene expression marker of early Alzheimer’s disease. J. Alzheimer’s Dis. 33, 737–753 (2013).
- 41.Li, H. et al. Identification of molecular alterations in leukocytes from gene expression profiles of peripheral whole blood of Alzheimer’s disease. Sci. Rep. 7, 14027 (2017).
- 42.Li, X. et al. Systematic analysis and biomarker study for Alzheimer’s disease. Sci. Rep. 8, 17394 (2018).
- 43.Lee, T. & Lee, H. Prediction of Alzheimer’s disease using blood gene expression data. Sci. Rep. 10, 3485 (2020).
- 44.Ji, W., An, K., Wang, C. & Wang, S. Bioinformatics analysis of diagnostic biomarkers for Alzheimer’s disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159, 38 (2022).
- 45.Qin, Q. et al. A diagnostic model for Alzheimer’s disease based on blood levels of autophagy-related genes. Front. Aging Neurosci. 14, 1–11 (2022).
- 46.Wang, H. et al. Identification of blood biomarkers related to energy metabolism and construction of diagnostic prediction model based on three independent Alzheimer’s disease cohorts. J. Alzheimer’s Dis. 100, 1261–1287 (2024).
- 47.He, Y., Cong, L., He, Q., Feng, N. & Wu, Y. Development and validation of immune-based biomarkers and deep learning models for Alzheimer’s disease. Front. Genet. 13, 1–15 (2022).
- 48.Yu, W., Yu, W., Yang, Y. & Lü, Y. Exploring the key genes and identification of potential diagnosis biomarkers in Alzheimer’s disease using bioinformatics analysis. Front. Aging Neurosci. 13, 1–15 (2021).
- 49.Li, J. et al. Identification of diagnostic genes for both Alzheimer’s disease and metabolic syndrome by the machine learning algorithm. Front. Immunol. 13, 1–13 (2022).
- 50.Akkaya, U. M. & Kalkan, H. A new approach for multimodal usage of gene expression and its image representation for the detection of Alzheimer’s disease. Biomolecules 13, 1563 (2023).
- 51.Abdelwahab, M. M., Al-Karawi, K. A. & Semary, H. E. Deep learning-based prediction of Alzheimer’s disease using microarray gene expression data. Biomedicines 11, 3304 (2023).
- 52.Nenning, K. H. & Langs, G. Machine learning in neuroimaging: from research to clinical practice. Die Radiol. 62, 1–10 (2022).
- 53.Kim, K. & Ha, J. GMFLDA: improved prediction of lncRNA-disease association via graph convolutional network. IEEE Access 1 (2025).
- 54.Ha, J. & Park, S. NCMD: Node2vec-based neural collaborative filtering for predicting miRNA-disease association. IEEE/ACM Trans. Comput. Biol. Bioinform. 20, 1257–1268 (2023).
- 55.Ha, J. Graph convolutional network with neural collaborative filtering for predicting miRNA-disease association. Biomedicines 13, 136 (2025). 10.3390/biomedicines13010136
- 56.Ha, J. SMAP: similarity-based matrix factorization framework for inferring miRNA-disease association. Knowledge-Based Syst. 263, 110295 (2023).
- 57.Dinsdale, N. K. et al. Challenges for machine learning in clinical translation of big data imaging studies. Neuron 110, 3866–3881 (2022).
- 58.Qayyum Patel, R. A. & Mihailescu, R. C. Reducing labeling costs in Alzheimer’s disease diagnosis: a study of semi-supervised and active learning with 3D medical imaging. In: International Conference on Modeling, Simulation & Intelligent Computing (MoSICom) 264–269 (IEEE, 2023). 10.1109/MoSICom59118.2023.10458754
- 59.Chang, C. Y., Slowiejko, D. & Win, N. Prediction and clustering of Alzheimer’s disease by race and sex: a multi-head deep-learning approach to analyze irregular and heterogeneous data. Sci. Rep. 14, 26668 (2024).
- 60.Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
- 61.Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
- 62.Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 53, D672–D677 (2025).
- 63.Gehlot, V. From Petri nets to colored Petri nets: a tutorial introduction to nets based formalism for modeling and simulation. In: Winter Simulation Conference (WSC) 1519–1533 (2019).
- 64.Gehlot, V. & Nigro, C. An introduction to systems modeling and simulation with colored Petri nets. In: Proceedings of the 2010 Winter Simulation Conference 104–118 (IEEE, 2010). 10.1109/WSC.2010.5679170
- 65.Tai, K. Y. et al. Smart fall prediction for elderly care using iPhone and Apple watch. Wirel. Pers. Commun. 114, 347–365 (2020).
- 66.Hardy, S. & Robillard, P. N. Petri net-based method for the analysis of the dynamics of signal propagation in signaling pathways. Bioinformatics 24, 209–217 (2008).
- 67.Pernice, S. et al. Exploiting stochastic Petri net formalism to capture the relapsing remitting multiple sclerosis variability under daclizumab administration. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2168–2175 (IEEE, 2019). 10.1109/BIBM47256.2019.8983368
- 68.Ghomri, L. & Alla, H. Modeling and analysis using hybrid Petri nets. 1, 141–153 (2007).
- 69.Xia, W., Zhang, R., Zhang, X. & Usman, M. A novel method for diagnosing Alzheimer’s disease using deep pyramid CNN based on EEG signals. Heliyon 9, e14858 (2023).
- 70.Uddin, K. M. M., Alam, M. J., Uddin, J. E. A., Aryal, S. & M. A. & A novel approach utilizing machine learning for the early diagnosis of Alzheimer’s disease. Biomed. Mater. Devices 1, 882–898 (2023).
- 71.Arya, A. D. et al. A systematic review on machine learning and deep learning techniques in the effective diagnosis of Alzheimer’s disease. Brain Inform. 10, 17 (2023).
- 72.Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 21, 6 (2020).
- 73.Garcia-Escobar, G. et al. Blood biomarkers of Alzheimer’s disease and cognition: a literature review. Biomolecules 14, 93 (2024).
- 74.Oh, S. L. et al. Alzheimer’s disease blood biomarkers associated with neuroinflammation as therapeutic targets for early personalized intervention. Front. Digit. Health 4, 875895 (2022).
- 75.Cheng, Y. W. et al. Application of blood-based biomarkers of Alzheimer’s disease in clinical practice: recommendations from Taiwan Dementia Society. J. Formos. Med. Assoc. 10.1016/j.jfma.2024.01.018 (2024).
- 76.Varesi, A. et al. Blood-based biomarkers for Alzheimer’s disease diagnosis and progression: an overview. Cells 11, 1–42 (2022).
Data Availability Statement
All raw data used in this study are publicly available from two sources:
1. The KEGG signaling pathway for Alzheimer’s disease (AD), which was used to construct the Petri net model, was obtained from the KEGG database: https://www.genome.jp/dbget-bin/www_bget?pathway+hsa05010
2. Gene expression datasets used to evaluate the proposed method were retrieved from the NCBI Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/) under the following accession numbers: GSE63060, GSE63061, GSE97760, GSE48350, GSE5281, GSE109887, and GSE37263.

Intermediate data generated during this study are available upon reasonable request from the corresponding author.












