BioData Mining. 2026 Jan 27;19:13. doi: 10.1186/s13040-026-00523-7

Explainable AI-driven graph-based neural networks for mucopolysaccharidoses diagnosis

Ruba Fadul 1, Natnael Tumzghi 1, Mohamed Seghier 1, Fatma Al-Jasmi 2,3, Aamna AlShehhi 1
PMCID: PMC12874758  PMID: 41593722

Abstract

Background

Mucopolysaccharidoses (MPS) are a group of rare lysosomal storage disorders caused by enzyme deficiencies leading to the accumulation of glycosaminoglycans (GAGs). Their clinical heterogeneity and low prevalence contribute to delayed and often missed diagnoses. Early detection is critical for improving treatment outcomes. This study investigates the utility of Graph Neural Networks (GNNs) for diagnosing MPS using real-world electronic health records (EHRs).

Methods

Diagnostic features were extracted for 106 subjects (37 MPS and 69 controls) from the SEHA health system in Abu Dhabi. Four GNN architectures were trained and evaluated across seven feature selection strategies using nested stratified cross-validation. Model interpretability was assessed using Shapley Additive exPlanations (SHAP) to rank diagnostic features and PGExplainer to provide patient-level graph-topology explanations.

Results

The Graph Convolutional Network (GCN) combined with chi-square feature selection achieved the highest performance, with an AUC of 0.97 [95% CI: 0.93–1.00], sensitivity/specificity of 0.97/0.94, PPV/NPV of 0.90/0.98, F1-score of 0.93, and accuracy of 0.95. SHAP analysis highlighted clinically coherent diagnostic features aligned with the domain expert-driven features, while PGExplainer identified compact relational subgraphs that characterized MPS cases, complementing the global feature-level interpretation.

Conclusion

The proposed graph-based diagnostic framework demonstrates strong potential for early MPS diagnosis using EHR data typically available in the clinical setting. By integrating high-performing predictive modeling with both global and patient-level interpretability, this approach provides a promising foundation for developing clinically meaningful and data-driven screening tools for rare diseases, particularly in the context of limited data availability.

Supplementary information

The online version contains supplementary material available at 10.1186/s13040-026-00523-7.

Keywords: MPS disease, Early diagnosis, Graph neural networks

Background

Rare diseases (RDs) are complex and progressive conditions with low prevalence; they are often genetic and frequently manifest in childhood [1]. Although more than 7,000 rare diseases have been identified worldwide, fewer than 500 have approved treatments [2]. Their rarity and complexity pose major challenges for healthcare systems, such as inaccurate and delayed diagnoses [3]. The heterogeneity of these diseases highlights the importance of developing innovative diagnostic tools, treatments, and management protocols.

Artificial Intelligence (AI) techniques are increasingly utilized in data-driven healthcare research across various applications [4], including effective disease diagnosis [5], genomic medicine [6], therapeutic drug monitoring [7], and precision medicine [8]. In particular, AI has demonstrated the ability to extract meaningful patterns from electronic health records (EHRs), which provide comprehensive patient information, thereby improving disease prediction [9]. Despite the promise of AI with EHRs, challenges remain for rare diseases, including the limited availability of annotated datasets and the lack of interpretable models trusted by clinicians and patients [10]. While deep learning (DL) has helped overcome some of these obstacles, significant research gaps persist in rare disease diagnosis [11].

Researchers have applied DL methods to EHRs to extract latent patient representations from raw clinical data. These approaches enable models to capture the complex relationships and interactions among EHR features, transforming them into robust feature representations for downstream clinical applications [11, 12]. Recently, graph-based approaches have emerged as an extension of DL models for EHRs, as they are well suited to capturing the relationships among patients, diagnoses, and symptoms. Although Zong et al. [13] did not focus on rare diseases, they showcased a multimodal approach that combines genetic reports with EHR data, building a patient–phenotypic–genetic network to predict primary cancers. They used Health Level Seven (HL7), Fast Healthcare Interoperability Resources (FHIR), and the Resource Description Framework (RDF) to build a network representation of the data, and then applied the Node2vec graph embedding algorithm to extract key features for classifying primary cancers and predicting cancers of unknown origin. Although Node2vec captured graph structure efficiently, it produced fixed embeddings that were not adapted to the downstream task during training [13]. In contrast, Graph Neural Network (GNN) models integrate representation learning directly into the model, yielding more robust, context-aware embeddings for complex tasks such as disease prediction from EHR data [11]. A notable example is the work of Lu and Uddin, who introduced a GNN-based framework for chronic disease prediction [14]. They constructed a weighted patient network from a patient–disease bipartite graph and then employed GNNs to build the predictive models. Their results showed that Graph Attention Networks (GAT) improved prediction for both cardiovascular and chronic pulmonary diseases.

Applying GNN-based predictive models to EHR data for rare diseases is especially challenging: data imbalance, very low prevalence, and the resulting scarcity of case samples are compounded by the limited literature in this domain [11]. DL methods have been applied to address these issues; for instance, Alsentzer et al. introduced the SHEPHERD framework, a DL approach tailored for diagnosing rare genetic conditions [2]. They trained a GNN on simulated patient data and integrated knowledge of phenotypes, genes, and diseases. SHEPHERD was designed to support causal gene discovery and to retrieve "patients-like-me" for improved diagnostic insights, and it was further evaluated on cohorts from the Undiagnosed Diseases Network and MyGene2. The system ranked the correct gene first in 40% of patients spanning 16 disease areas, compared to a non-guided baseline [2]. However, SHEPHERD is a knowledge-graph-based, multi-disease system trained mainly on simulated patients, not real-world EHRs.

Similarly, Sun et al. constructed an innovative GNN-based model with a dual-graph framework comprising a medical concept graph and a patient record graph to integrate external medical knowledge [15]. Their model utilized neighborhood aggregation methods such as Graph Attention Networks and Graph Isomorphism Networks to learn meaningful symptom–disease relationships for accurate prediction of common and rare diseases. Several reviews highlight the same research gaps. For example, Lee et al. found that DL research in rare diseases is dominated by imaging modalities, with only 8 of 332 studies using EHR data, none of which applied GNNs [1]. Recent reviews likewise emphasize that most AI applications in rare diseases still rely on imaging and genomic data, with only limited use of real-world EHRs. For example, Germain et al. identify natural language processing (NLP)-based EHR screening as one of three major AI-based methods that substantially improved the diagnosis of Fabry disease, alongside facial analysis and multi-omics approaches [16]. AI applications in Mucopolysaccharidoses (MPS) diagnosis are extremely rare; however, Kadali et al. applied a supervised machine learning algorithm, CART, to classify MPS subtypes using glycosaminoglycan (GAG) biomarker profiles [17]. Their model established diagnostic cutoff thresholds and achieved over 95% accuracy, demonstrating its potential to support clinical diagnosis [17]. Overall, these studies outline a progression from traditional EHR analysis to deep and graph-based representation learning in healthcare, and more recently in rare diseases. Despite these promising advances, no study has evaluated a GNN on real-world EHRs to diagnose a single rare disease such as MPS. This methodological progression frames our work, which leverages GNNs to address the complexities of MPS and bridges the gap in applying GNNs to the diagnosis of a specific rare disease using real-world EHR data.

This study focuses on the early diagnosis of rare diseases, particularly MPS, using graph-based neural networks. MPS are a group of rare, inherited metabolic disorders characterized by the progressive accumulation of glycosaminoglycans (GAGs) due to deficiencies of the lysosomal enzymes responsible for degrading GAGs [18]. MPS has seven defined subtypes in humans, namely MPS I, II, III, IV, VI, VII, and IX, varying in severity and prevalence depending on the deficient enzyme [19, 20]. Each subtype presents a spectrum of clinical manifestations; many symptoms (particularly in MPS I and II) overlap, whereas certain subtypes, such as MPS III and MPS VII, exhibit distinct characteristics, namely severe neurological problems and hydrops fetalis, respectively [18]. The diagnosis of MPS and its subtypes typically involves a combination of urinary and blood GAG tests, enzyme assays, and genetic testing [18]. The prevalence of MPS varies globally, influenced by geographic region and ethnic background. The birth prevalence of MPS in the UAE is 5.5 per 100,000 [21], while Saudi Arabia reported the highest birth prevalence, 16.9 per 100,000 live births, attributed in part to consanguineous marriage [22]. A study conducted between 1983 and 2008 in Saudi Arabia reported that out of 163,130 live births, 28 individuals were diagnosed with MPS [23]. Globally, Portugal and the Netherlands reported birth prevalences of 4.8 and 4.5 per 100,000, respectively [24, 25].

In this work, we explore the potential of graph-based neural networks for early MPS diagnosis using real-world EHR data provided by the Abu Dhabi Health Services network (SEHA). We constructed a patient graph in which each node represents a patient and edges encode relationships based on shared diagnostic features extracted from medical records, allowing the model to capture complex relationships among medical diagnoses and improve prediction accuracy. Our proposed framework used a nested cross-validation scheme to train, optimize, and evaluate four GNN variants, namely Graph Convolutional Networks (GCNs), Graph Attention Networks, Graph SAmpling and AggreGatE (GraphSAGE), and Cheb Convolutional Networks (CCNs), across feature sets selected through both domain expert-driven selection and automated feature selection techniques such as chi-square, mutual information, recursive feature elimination, and lasso regression. Further, we conducted feature importance analysis for the top-performing model to provide valuable insights into the significant diagnostic features and their clinical relevance for efficient MPS detection. Furthermore, we investigated the feasibility of a fully automated AI-based screening tool for MPS by comparing the key features of the top GNN model, obtained with automatically selected features, against an expert-driven feature set. Our findings show the effectiveness of GNNs in distinguishing MPS patients from controls and highlight their potential in advancing rare disease diagnosis.

The main contributions of this study can be summarized as follows:

  • Developing and evaluating a graph-based diagnostic framework for early MPS diagnosis using the unique SEHA EHR dataset.

  • Demonstrating the feasibility of GNNs as a non-invasive, EHR-driven screening tool for MPS diagnosis by modeling complex comorbidity patterns and relational dependencies between patients.

  • Systematically assessing four GNN architectures across multiple feature sets generated through automated feature selection methods and domain expert knowledge.

  • Identifying and clinically validating key diagnostic features of the top-performing model through KernelSHAP-based global attribution and comparing them with domain expert-selected features to highlight clinically meaningful symptom patterns associated with MPS.

  • Introducing patient-level graph-topology interpretability using PGExplainer, revealing the critical relational subgraphs that drive individual predictions and providing a complementary, mechanism-oriented understanding of MPS classification decisions.

Methods

Dataset acquisition and characteristics

The dataset used in this study comprises 106 patients aged 2 to 19 years, recorded between 2004 and 2022 within SEHA, the largest and most comprehensive healthcare network in the UAE. The cohort includes two groups: 37 MPS-positive patients and 69 controls (i.e., MPS disease-free). The outcome variable was defined as a binary indicator, with 1 representing MPS-positive cases and 0 representing controls. The covariates, representing each patient's historical diagnostic features, were derived from ICD-coded medical records and encoded as binary indicators denoting the presence or absence of each symptom. No additional transformation, embedding, or numerical encoding was applied beyond this binary representation.

After preprocessing, 1,186 diagnostic features were extracted, spanning physical and growth development, hearing and vision issues, respiratory and cardiac conditions, and oral and speech manifestations, with an average sparsity of 96.4% across the cohort. Because all features were binary and contained no missing values, no scaling or imputation was required. To avoid temporal leakage, only diagnostic features recorded before each patient’s MPS diagnosis date were retained; any features documented on or after the diagnosis date were excluded. The MPS cohort included patients diagnosed with any of the seven MPS subtypes. The cohort selection process is illustrated in Fig. 1.

Fig. 1.

Fig. 1

Cohort selection flow diagram outlining patients’ inclusion criteria

Framework architecture design

The overall workflow of the MPS diagnostic pipeline using GNN-based models with EHR data is presented in Fig. 2. After extracting the diagnostic features from each subject's medical records in the SEHA dataset, we employed a supervised binary graph node classification approach to tune and validate the different GNN models using nested cross-validation with the Optuna hyperparameter optimization (HPO) framework. First, we split the dataset into training, validation, and testing subsets using a nested stratified k-fold cross-validation scheme. Second, we selected the most informative features using different feature selection methods. We then constructed graphs whose adjacency matrices were derived from pairwise distances between patients' diagnostic features. Finally, the GNN models were fitted to the data, their performance was evaluated, and the outputs of the best-performing model were interpreted.

Fig. 2.

Fig. 2

Overview of the graph-based diagnostic pipeline for MPS classification

Feature selection

After preparing and cleaning the dataset, we applied various feature selection techniques on the training folds to select the most informative features. We also evaluated the GNN models on all diagnostic features without applying any feature selection technique.

Feature selection (FS) techniques improve the performance of predictive frameworks by eliminating irrelevant and redundant features from the input data, thereby identifying the most relevant features to the class output that contribute significantly to the classification task. This process reduces data noise and decreases the computational cost involved in building the model  [26]. In this study, we applied multiple feature selection techniques on the dataset, including chi-square (CHI2), mutual information (MI), recursive feature elimination (RFE), lasso regression, genetic algorithm (GA), bat algorithm (BA), and domain-specific knowledge (DSK) methods, to select the most relevant features which were utilized later to construct the graph from the data.

Chi-square method

The chi-square test is a univariate feature selection method that evaluates the independence between each feature and the target variable, by assessing how the observed frequency distribution of the data deviates from the expected distribution  [27]. Features with higher CHI2 values are considered more relevant. In our implementation, we utilized the SelectKBest algorithm to keep the top k features with the highest chi-square scores.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]  (1)

where O is the observed frequency and E is the expected frequency.
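As a minimal sketch of this step (synthetic binary features standing in for the SEHA data; the choice of `k = 20` comes from the paper's search space in Table 1), the SelectKBest-based chi-square selection could look like:

```python
# Illustrative sketch: chi-square feature selection on binary symptom indicators.
# Data are random stand-ins, not the SEHA cohort.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(106, 200)).astype(float)  # binary symptom indicators
y = rng.integers(0, 2, size=106)                        # 1 = MPS, 0 = control

selector = SelectKBest(score_func=chi2, k=20)           # keep top-20 chi-square scores
X_sel = selector.fit_transform(X, y)
print(X_sel.shape)  # (106, 20)
```

Because all features are non-negative binary indicators, the chi-square statistic can be applied directly without scaling.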

Mutual information method

Mutual Information measures the amount of information one variable contains about another, quantifying the dependency between variables. This approach captures non-linear relationships between features and the target variable  [28]. Features that have greater MI scores convey more information. We computed the MI scores for all features and selected those exceeding a specific threshold to ensure the inclusion of features with significant predictive power. Mutual Information between two random variables X and Y is defined as:

\[ I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \]  (2)

where p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal distributions.
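A minimal sketch of threshold-based MI selection, assuming synthetic binary features and a hypothetical cutoff of 0.01 (the paper does not report its exact threshold):

```python
# Illustrative sketch: mutual-information feature selection on binary data.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(106, 50))   # binary symptom indicators (synthetic)
y = rng.integers(0, 2, size=106)         # binary MPS label (synthetic)

# MI score per feature; discrete_features=True suits binary indicators
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
threshold = 0.01                          # hypothetical cutoff
keep = np.flatnonzero(mi > threshold)     # indices of retained features
```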

Lasso regression method

Lasso regression is an embedded feature selection technique that imposes an L1 penalty on the regression coefficients, shrinking some coefficients to zero and thereby effectively selecting a subset of features [29]. We applied Lasso regression to our medical diagnosis data and retained the features with non-zero coefficients as the most significant. The Lasso regression objective function is defined as:

\[ \min_{\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \]  (3)

where n is the number of samples, p the number of features, xi the i-th feature vector, yi the target value, β the regression coefficients, and λ the regularization parameter. The L1 penalty encourages sparsity by driving some coefficients exactly to zero, thereby performing feature selection.
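A hedged sketch of Lasso-based selection on synthetic binary data; the `alpha` and coefficient threshold are taken from the search ranges in Table 1, not from the fitted models:

```python
# Illustrative sketch: Lasso feature selection with a coefficient threshold.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(106, 100)).astype(float)
# Synthetic target driven by the first three features, so Lasso has signal to find
y = (X[:, 0] + X[:, 1] + X[:, 2] + rng.normal(0, 0.1, 106) > 1.5).astype(float)

lasso = Lasso(alpha=0.01)   # alpha within the paper's 0.003-0.01 range
lasso.fit(X, y)
# keep features whose |coefficient| exceeds the threshold (0.03-0.1 range in Table 1)
selected = np.flatnonzero(np.abs(lasso.coef_) > 0.03)
```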

Recursive feature elimination method

RFE is a wrapper-based feature selection method that recursively removes the least significant features based on model performance. Starting with all features, the model is trained and the importance of each feature is assessed; the least important features are eliminated iteratively until the optimal subset is identified [30]. In our study, we employed a Logistic Regression classifier as the estimator within the RFE framework to determine the most relevant diagnostic features, as Logistic Regression provides a stable, low-variance estimator for small datasets, reducing overfitting risk while maintaining robust feature ranking. The RFE procedure is summarized in the pseudo-code below:

[Figure: RFE pseudo-code]
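An equivalent sketch using scikit-learn's RFE with a Logistic Regression estimator (synthetic data; `n_features_to_select=15` is one value from the paper's search space):

```python
# Illustrative sketch: recursive feature elimination with Logistic Regression.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(106, 60)).astype(float)  # binary features (synthetic)
y = rng.integers(0, 2, size=106)

# Logistic Regression as the low-variance base estimator, mirroring the paper's choice
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=15)
rfe.fit(X, y)
selected = np.flatnonzero(rfe.support_)   # indices of the 15 retained features
```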

Genetic algorithm

GA solves complex optimization problems by mimicking biological evolution, including mechanisms such as inheritance and mutation, through iterative modification of a population of candidate solutions [31]. The process begins by defining an objective function and initializing a population; each individual is assigned a fitness score, and the solutions are iteratively refined until the performance criterion is satisfied [31]. The probability of selecting individual i (with fitness f(x_i)) for reproduction is given by:

\[ p_i = \frac{f(x_i)}{\sum_{j=1}^{N} f(x_j)} \]  (4)

where N is the population size, and p_i is the probability that x_i is chosen as a parent.

Bat algorithm

BA is a metaheuristic optimization technique inspired by the echolocation of micro-bats; it uses a population-based approach in which each bat represents a potential solution whose position and velocity are iteratively updated within the search space [32]. The algorithm incorporates frequency tuning, amplitude modulation, and pulse emission to balance global exploration of the solution space with local exploitation of promising areas [32]. Each bat i has a position x_i, a velocity v_i, and a frequency f_i [32]. At iteration t, the velocity and position updates are given by:

\[ v_i^{t} = v_i^{t-1} + \left( x_i^{t-1} - x_* \right) f_i, \qquad x_i^{t} = x_i^{t-1} + v_i^{t} \]  (5)

where x_* is the best solution found up to iteration t − 1, and f_i is the bat's frequency at iteration t.

Domain-specific knowledge method

In addition to the automated feature selection methods, we incorporated domain-specific knowledge to identify clinically relevant diagnostic features. A total of 71 features were selected in consultation with the medical expert and co-author (FA), a board-certified biochemical geneticist with expertise in MPS and rare metabolic disorders. These features were derived from expert clinical judgment and were not subjected to a multi-expert consensus process, reflecting the single-center nature of the dataset and the rarity of the condition. Incorporating this expert-curated feature set ensures that the model captures clinically significant indicators that may not be detected through purely data-driven methods.

Graph construction

We constructed undirected and unweighted patient graphs using the k-Nearest Neighbors (kNN) graph construction technique. Nodes represent patients, and edges denote similarity relationships based on diagnostic features. Since the diagnostic input features used for graph construction were binary, no additional scaling or normalization was required before distance computation. The distance metric and the number of neighbors k used for kNN graph construction were included in the hyperparameter search space and jointly optimized within the nested CV framework. For each trial, the distance between every pair of patients was computed in feature space, and the k nearest neighbors of each patient were identified to generate the adjacency matrix [33]. This procedure was repeated independently for each fold inside the nested CV pipeline to avoid information leakage. Labels 0 and 1 were assigned to nodes representing control and MPS patients, respectively.

To make the graph construction procedure more intuitive, Fig. 3 provides a small illustrative example of a patient similarity graph constructed from 7 hypothetical patients with k = 3 nearest-neighbor connections based on symptom-derived distances. Each node represents a patient (blue = control, red = MPS), and edges connect each patient to its k = 3 most similar neighbors in feature space. This example visualizes how symptom-based similarity relationships form the adjacency structure used by the GNN models.

Fig. 3.

Fig. 3

Illustrative example demonstrating kNN graph construction with k = 3 using 7 hypothetical patients
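The toy construction in Fig. 3 can be reproduced with scikit-learn's `kneighbors_graph` (7 random binary "patients"; the max-based symmetrization is our assumption for turning the directed kNN output into an undirected graph):

```python
# Illustrative sketch: kNN patient graph with k = 3 on binary symptom vectors.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(7, 12)).astype(float)   # 7 hypothetical patients

# Directed kNN connectivity graph (each node points to its 3 nearest neighbors)
A = kneighbors_graph(X, n_neighbors=3, metric="euclidean",
                     mode="connectivity").toarray()
A_und = np.maximum(A, A.T)     # symmetrize: keep an edge if either direction exists
np.fill_diagonal(A_und, 0)     # undirected, no self-loops
```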

GNN models

To explore the node classification performance of GNN models in the diagnosis of MPS, we implemented four GNN-based classifiers: Graph Convolutional Networks, Graph Attention Networks, GraphSAGE, and Cheb Convolutional Networks. Each model adopts a unique approach to aggregating and propagating information through the graph structure, offering different node embedding representations.

GCNs

GCNs utilize spectral graph convolutions, where node features are aggregated using normalized adjacency matrices to capture local neighborhood information  [34]. This method effectively smooths features across neighboring nodes, enhancing classification accuracy while maintaining computational efficiency.

The GCN layer operation applies a transformation to node embeddings and propagates information across neighboring nodes. The equation for a single GCN layer is:

\[ H^{(l+1)} = \sigma \left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right) \]  (6)

where H^{(l)} is the node feature matrix at layer l, \tilde{A} = A + I is the adjacency matrix with added self-loops, \tilde{D} is the degree matrix of \tilde{A}, W^{(l)} is the learnable weight matrix, and \sigma is a non-linear activation function [34]. This equation describes how each layer updates node representations by aggregating and transforming features.
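Eq. (6) can be illustrated with a small NumPy sketch on a toy 3-node graph (random features and weights, purely illustrative, not the trained model):

```python
# Illustrative sketch: one GCN propagation step, sigma(D~^-1/2 A~ D~^-1/2 H W).
import numpy as np

def gcn_layer(H, A, W, activation=np.tanh):
    """Apply a single GCN layer to node features H on adjacency A."""
    A_tilde = A + np.eye(A.shape[0])             # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # symmetric normalization
    return activation(A_hat @ H @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy 3-node graph
H = rng.normal(size=(3, 4))                                   # node features
W = rng.normal(size=(4, 2))                                   # learnable weights
H_next = gcn_layer(H, A, W)
print(H_next.shape)  # (3, 2)
```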

GATs

GATs introduce an attention mechanism to weigh the importance of neighboring nodes by assigning different importance weights to neighboring nodes during feature aggregation  [35]. The attention coefficient between nodes i and j is computed as:

\[ \alpha_{ij} = \frac{\exp \left( \mathrm{LeakyReLU} \left( a^{\top} \left[ W h_i \,\|\, W h_j \right] \right) \right)}{\sum_{k \in \mathcal{N}(i)} \exp \left( \mathrm{LeakyReLU} \left( a^{\top} \left[ W h_i \,\|\, W h_k \right] \right) \right)} \]  (7)

where h_i and h_j are the feature vectors of nodes i and j, W is a learnable weight matrix, a is the learnable attention vector, \| represents vector concatenation, \mathcal{N}(i) denotes the set of neighbors of node i, and the LeakyReLU function introduces non-linearity to the attention mechanism.

The final node embedding is obtained by aggregating the attention-weighted features:

\[ h_i' = \sigma \left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \, W h_j \right) \]  (8)

This mechanism allows GATs to dynamically focus on the most relevant neighbors while reducing the influence of less informative nodes  [35].
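Eq. (7) for a single node's neighborhood can be sketched in NumPy (random toy inputs; `gat_attention` is an illustrative helper, not the paper's implementation):

```python
# Illustrative sketch: GAT attention coefficients for one node's neighborhood.
import numpy as np

def gat_attention(h, W, a, i, neighbors):
    """Return alpha_ij for node i over its neighbors, per the GAT formulation."""
    def leaky_relu(x, slope=0.2):
        return np.where(x > 0, x, slope * x)
    Wh = h @ W                                   # transform all node features
    logits = np.array(
        [leaky_relu(a @ np.concatenate([Wh[i], Wh[j]])) for j in neighbors]
    )
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 5))   # 4 nodes with 5 input features each
W = rng.normal(size=(5, 3))   # shared weight matrix
a = rng.normal(size=6)        # attention vector over [W h_i || W h_j]
alpha = gat_attention(h, W, a, i=0, neighbors=[1, 2, 3])
```

The coefficients form a probability distribution over the neighborhood, so they sum to one.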

GraphSAGE

GraphSAGE employs an inductive learning framework that generalizes to unseen nodes through a sampling-and-aggregation approach: it samples a fixed number of neighbors for each node and aggregates their information to generate node embeddings [36]. The embedding update step is given by:

\[ h_i^{(l+1)} = \sigma \left( W^{(l)} \cdot \mathrm{CONCAT} \left( h_i^{(l)},\; \mathrm{AGG} \left( \left\{ h_j^{(l)} : j \in \mathcal{N}(i) \right\} \right) \right) \right) \]  (9)

where h_i^{(l)} is the embedding of node i at layer l, W^{(l)} is a learnable weight matrix, AGG is a function that aggregates neighbor features (e.g., mean or max pooling), and \sigma is a non-linear activation function. The GraphSAGE framework scales to large datasets and supports efficient node embeddings even in dynamic graphs [36].

CCNs

CCNs, based on Chebyshev polynomials, extend the spectral convolution framework by approximating graph filters through recursive polynomial expansions  [37]. The convolution operation is defined as:

\[ g_{\theta} \star x = \sum_{k=0}^{K} \theta_k \, T_k(\tilde{L}) \, x \]  (10)

where K is the polynomial order, \theta_k are the learnable Chebyshev coefficients, and T_k are the Chebyshev polynomials, computed recursively as:

\[ T_k(x) = 2x \, T_{k-1}(x) - T_{k-2}(x), \qquad T_0(x) = 1, \quad T_1(x) = x \]  (11)

The rescaled Laplacian \tilde{L} is defined as:

\[ \tilde{L} = \frac{2L}{\lambda_{\max}} - I \]  (12)

where L is the graph Laplacian matrix, and \lambda_{\max} is the largest eigenvalue of L.

By employing Chebyshev polynomials, the model captures multi-hop neighborhood information efficiently, reducing computational overhead compared to standard spectral graph convolutions  [37].
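Eqs. (11) and (12) can be illustrated with a short NumPy sketch that builds the rescaled Laplacian of a toy path graph and evaluates the Chebyshev recursion (illustrative only):

```python
# Illustrative sketch: Chebyshev polynomials of the rescaled graph Laplacian.
import numpy as np

def cheb_polynomials(L, lam_max, K):
    """Return [T_0, ..., T_K] evaluated at L~ = 2L/lam_max - I."""
    n = L.shape[0]
    L_tilde = 2.0 * L / lam_max - np.eye(n)
    T = [np.eye(n), L_tilde]                        # T_0 = I, T_1 = L~
    for k in range(2, K + 1):
        T.append(2.0 * L_tilde @ T[k - 1] - T[k - 2])   # recursion of Eq. (11)
    return T

# Toy path graph on 3 nodes; combinatorial Laplacian L = D - A
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam_max = np.max(np.linalg.eigvalsh(L))
T = cheb_polynomials(L, lam_max, K=3)
```

Stacking these matrices with learnable coefficients θ_k gives the filter of Eq. (10).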

Each model was built by sequentially stacking a GNN layer, an activation function to introduce non-linearity, a PairNorm [38] normalization layer to stabilize training by mitigating exploding/vanishing gradients, and a dropout layer to prevent overfitting. This sequence was repeated across multiple layers before generating the final node embeddings. The number of layers and the remaining hyperparameters were optimized independently for each model within the same search space to achieve the best performance. This approach facilitates efficient learning of node embeddings from the graph data and ensures a consistent performance comparison.

To further evaluate the effectiveness of GNN models, we compared them to traditional ML methods, including Decision Trees (DT), Support Vector Classifier (SVC), and eXtreme Gradient Boosting (XGBoost). These ML models also underwent hyperparameter optimization to ensure a fair comparison. This comparative analysis aimed to demonstrate the advantages of GNN models in handling graph-structured data for MPS disease prediction.

Nested cross-validation

To evaluate the graph-based diagnostic models and jointly optimize the hyperparameters of the feature selection methods, graph construction, and GNN architectures, we employed a nested cross-validation (CV) framework  [39]. This design provides an unbiased estimate of generalization performance while reducing the risk of overfitting and optimistic bias.

In this framework, the outer cross-validation loop provides an unbiased estimate of the generalization performance, while the inner cross-validation loop is dedicated to feature selection and hyperparameter optimization. We employed a stratified five-fold CV in the outer loop to split the dataset into training and testing subsets. During each outer iteration, nodes corresponding to the held-out fold were masked and reserved exclusively for testing, whereas the remaining nodes were used for model training and hyperparameter optimization. The final reported performance represents the average of the evaluation metrics obtained by bootstrapping all the test samples.

FS was integrated within the nested structure to avoid data leakage. In the inner loop, FS, graph construction, and model hyperparameters were jointly optimized using Optuna, with FS applied strictly to the inner-training folds and never to validation or test data. After the best configuration was selected, the same FS method and corresponding hyperparameters were reapplied to the outer-training fold to derive the final feature subset and construct the graph for that fold. The GNN was then trained on this outer-training set and evaluated on the unseen outer-test fold, which remained fully isolated throughout the entire procedure.
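A hedged sketch of this nested scheme using scikit-learn stand-ins (a SelectKBest + Logistic Regression pipeline in place of the paper's FS + GNN, and grid search in place of Optuna; data are synthetic):

```python
# Illustrative sketch of nested CV: the inner loop tunes FS and model
# hyperparameters, the outer loop estimates generalization performance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=106, n_features=50, random_state=0)
X = (X > 0).astype(float)   # binarize to mimic symptom indicators

# Pipeline ensures FS is refit inside each training fold (no leakage)
pipe = Pipeline([("fs", SelectKBest(chi2)),
                 ("clf", LogisticRegression(max_iter=1000))])
param_grid = {"fs__k": [10, 20, 30], "clf__C": [0.1, 1.0, 10.0]}

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=inner, scoring="roc_auc")
scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")  # one AUC per outer fold
```

Wrapping the tuned search object in `cross_val_score` keeps each outer-test fold fully isolated from feature selection and hyperparameter tuning, mirroring the leakage-avoidance design described above.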

Hyperparameter optimization

Hyperparameter optimization was carried out in the inner CV loop using Optuna  [40], which efficiently explores the joint search space of FS methods, graph construction settings, and GNN model parameters. Fifty optimization trials were performed per outer fold, with the Tree-structured Parzen Estimator (TPE) sampler  [41] proposing candidate hyperparameter sets. The objective function maximized the mean AUC across inner validation folds.

During each inner iteration, FS was applied only to the inner-training subset to identify the most informative diagnostic features for that fold. A graph was then constructed using the selected features and the candidate (k and distance metric) hyperparameters, and the model was trained and evaluated on the corresponding validation fold. The best configuration was chosen based on the average validation AUC.

After identifying the optimal setup, FS was reapplied to the full outer-training fold, the graph was reconstructed accordingly, and the GNN was trained and evaluated on the masked outer-test fold. This hierarchical workflow guarantees that no information from the validation or test folds influenced feature selection or hyperparameter tuning, ensuring an unbiased estimation of model generalization.

The joint hyperparameter search spaces for FS methods, graph construction, and GNN architectures are summarized in Tables 1, 2, and 3. FS optimization included the number of selected features for CHI2 and MI, the alpha and threshold parameters for Lasso, and the number of retained features for RFE. The metaheuristic methods optimized parameters such as population size, mutation probability, loudness, and pulse rate. Graph construction tuning involved the number of neighbors and the distance metric. GNN hyperparameters included layer depth, hidden dimensions, activation function, dropout, optimizer, and learning rate, together with model-specific settings such as the number of attention heads for GAT and the polynomial order K for CCN. Classical models, including DT, SVC, and XGBoost, had their respective hyperparameters optimized under the same framework. All models were trained for 200 epochs using binary cross-entropy loss. A fixed number of training epochs was adopted to ensure stable convergence, as implementing early stopping on this small dataset would require holding out an additional validation split, further reducing the effective training set and leading to unstable estimates. Overfitting was instead controlled through dropout regularization, the nested CV framework, and evaluation on an independent outer-test fold. All hyperparameters were optimized simultaneously within the same Optuna search process.

Table 1.

Hyperparameters search space for the feature selection methods

Method | Parameter | Search space
CHI2 | No. of selected features | {10, 15, 20, 30, 50}
MI | No. of selected features | {10, 15, 20, 30, 50}
Lasso | Alpha | 0.003 to 0.01, step 0.001
Lasso | Threshold | 0.03 to 0.1, step 0.01
RFE | No. of features | {10, 15, 20, 30, 50}
GA | Population size | {20, 30, 40, 50}
GA | Mutation probability | 0.02 to 0.1
BA | Population size | {20, 30, 40, 50}
BA | Loudness | 0.5 to 0.9
BA | Pulse rate | 0.6 to 0.99

Table 2.

Hyperparameters search space for the graph construction techniques

Parameter | Search space
No. of neighbors | {3, 5, 7, 9}
Distance metric | {cosine, euclidean, jaccard}
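The graph construction step tuned in Table 2 amounts to a k-nearest-neighbour graph over the selected features. A minimal sketch (synthetic data; the k and metric values are one illustrative choice from the search space):

```python
# Patient-similarity graph via k-nearest neighbours on selected features.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X_sel = rng.random((106, 20))   # synthetic stand-in for the selected features

adj = kneighbors_graph(X_sel, n_neighbors=5, metric="cosine",
                       mode="connectivity", include_self=False)
adj = adj.maximum(adj.T)        # symmetrise so edges are undirected
rows, cols = adj.nonzero()
edge_index = np.vstack([rows, cols])  # COO edge list, as used by GNN libraries
```

Symmetrising with the element-wise maximum keeps an edge whenever either patient appears among the other's k nearest neighbours.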

Table 3.

Hyperparameters search space for the MPS classification models

Model | Parameter | Search space
GCN | Layers dimensions | {(8), (16), (32), (64), (8, 8), (16, 8), (32, 16), (64, 32)}
GCN | Activation function | {ReLU, ELU, Tanh}
GCN | Dropout rate | {0.1, 0.2, 0.3, 0.4, 0.5}
GCN | Optimizer | {Adam, RMSprop}
GCN | Learning rate | {0.001, 0.005, 0.01, 0.05, 0.1}
GAT | Layers dimensions | {(8), (16), (32), (64), (8, 8), (16, 8), (32, 16), (64, 32)}
GAT | No. of heads | {1, 2, 3, 4, 5}
GAT | Activation function | {ReLU, ELU, Tanh}
GAT | Dropout rate | {0.1, 0.2, 0.3, 0.4, 0.5}
GAT | Optimizer | {Adam, RMSprop}
GAT | Learning rate | {0.001, 0.005, 0.01, 0.05, 0.1}
GraphSAGE | Layers dimensions | {(8), (16), (32), (64), (8, 8), (16, 8), (32, 16), (64, 32)}
GraphSAGE | Activation function | {ReLU, ELU, Tanh}
GraphSAGE | Dropout rate | {0.1, 0.2, 0.3, 0.4, 0.5}
GraphSAGE | Optimizer | {Adam, RMSprop}
GraphSAGE | Learning rate | {0.001, 0.005, 0.01, 0.05, 0.1}
CCN | Layers dimensions | {(8), (16), (32), (64), (8, 8), (16, 8), (32, 16), (64, 32)}
CCN | K | {1, 2, 3, 4, 5}
CCN | Activation function | {ReLU, ELU, Tanh}
CCN | Dropout rate | {0.1, 0.2, 0.3, 0.4, 0.5}
CCN | Optimizer | {Adam, RMSprop}
CCN | Learning rate | {0.001, 0.005, 0.01, 0.05, 0.1}
DT | Criterion | {gini, entropy}
DT | Maximum depth | {None, 3, 5, 7, 9}
DT | Maximum features | {None, sqrt, log2}
DT | Minimum samples split | {0.5, 2, 4, 8}
DT | Minimum samples leaf | {0.25, 1, 2, 4}
SVC | Regularization parameter | {0.1, 1, 10, 100}
SVC | Kernel | {rbf, linear, sigmoid}
SVC | Gamma | {scale, auto}
XGBoost | Learning rate | {0.001, 0.01, 0.1, 1}
XGBoost | Maximum depth | {3, 5, 7, 9}
XGBoost | Subsample ratio | {0.5, 0.6, 0.7, 0.8, 0.9, 1}
XGBoost | Colsample by tree ratio | {0.5, 0.6, 0.7, 0.8, 0.9, 1}
XGBoost | No. of estimators | {10, 50, 100, 200}


For completeness, we report the optimal hyperparameter configuration selected by Optuna in each outer fold of the best-performing GNN pipeline (Supplementary Table S1), and we summarize the selection frequency of these hyperparameter values across folds in Table 4.

Table 4.

Selection frequency of hyperparameter values across outer folds of the best-performing GNN pipeline

Component | Parameter | Value | Selection frequency
CHI2 method | k | 50 | 1.0
Graph construction | No. of neighbors | 9 | 0.6
Graph construction | No. of neighbors | 5 | 0.2
Graph construction | No. of neighbors | 7 | 0.2
Graph construction | Distance metric | euclidean | 0.8
Graph construction | Distance metric | jaccard | 0.2
GCN model | Layers dimensions | [8] | 0.6
GCN model | Layers dimensions | [8, 16] | 0.2
GCN model | Layers dimensions | [8, 8] | 0.2
GCN model | Activation function | ELU | 0.8
GCN model | Activation function | ReLU | 0.2
GCN model | Dropout rate | 0.3 | 0.6
GCN model | Dropout rate | 0.2 | 0.4
GCN model | Optimizer | Adam | 0.6
GCN model | Optimizer | RMSprop | 0.4
GCN model | Learning rate | 0.005 | 0.6
GCN model | Learning rate | 0.001 | 0.2
GCN model | Learning rate | 0.05 | 0.2

Models performance evaluation

The node classification performance of the proposed GNN models, combined with different feature selection techniques, was evaluated under multiple hyperparameter settings using a nested cross-validation framework. The Area Under the ROC Curve (AUC) was employed as the optimization metric in the inner loop, where the hyperparameter configuration yielding the highest mean AUC across validation folds was selected. For each outer fold, a decision threshold was computed using Youden’s J index, assuming equal misclassification costs, and applied to the model outputs on the held-out outer test set to obtain binary class labels.
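The threshold-selection step can be sketched directly with scikit-learn's ROC utilities; the labels and scores below are illustrative, not from the paper.

```python
# Youden's J threshold selection: pick the ROC threshold maximising
# J = sensitivity + specificity - 1 = TPR - FPR, assuming equal costs.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.2, 0.3, 0.35, 0.8, 0.7, 0.9, 0.4, 0.6, 0.05])

fpr, tpr, thresholds = roc_curve(y_true, scores)
j = tpr - fpr                              # Youden's J at each threshold
best_threshold = thresholds[np.argmax(j)]
y_pred = (scores >= best_threshold).astype(int)
```

In the paper's pipeline the threshold is computed per outer fold and then applied to the held-out outer test set.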

To assess the stability and uncertainty of the performance estimates, we performed patient-level bootstrapping on the aggregated predictions from all outer folds. For each bootstrap resample, evaluation metrics were recomputed to provide variance estimates that reduce sensitivity to outliers and give more reliable performance summaries  [42]. We report sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), F1-score, and accuracy as mean values with corresponding 95% confidence intervals (CI) derived from 1000 bootstrap replicates. In addition, we visualized ROC curves using the average AUC and its 95% confidence interval over the 1000 bootstraps.
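A patient-level bootstrap of the aggregated predictions can be sketched as follows; the labels and scores are synthetic stand-ins for the pooled outer-fold outputs.

```python
# Bootstrap 95% CI for AUC: resample patients with replacement 1000 times
# and recompute the metric on each resample.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 106)
scores = np.clip(y_true * 0.3 + rng.random(106) * 0.7, 0, 1)  # noisy scores

boot_aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))  # resample patients
    if len(np.unique(y_true[idx])) < 2:
        continue                        # AUC is undefined without both classes
    boot_aucs.append(roc_auc_score(y_true[idx], scores[idx]))
mean_auc = float(np.mean(boot_aucs))
ci_low, ci_high = np.percentile(boot_aucs, [2.5, 97.5])
```

The same resampling indices would be reused to recompute sensitivity, specificity, PPV, NPV, F1-score, and accuracy for their respective intervals.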

The ROC analysis is quantified using AUC, which measures the ability of the model to discriminate between positive and negative cases. AUC can be expressed as the probability that a randomly chosen positive sample is ranked higher than a randomly chosen negative sample:

$$\mathrm{AUC} = P\big(f(x^{+}) > f(x^{-})\big) \qquad (13)$$

where $f$ is the model score, $x^{+}$ is a positive instance, and $x^{-}$ is a negative instance.
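The ranking interpretation in Eq. (13) can be checked numerically: the fraction of correctly ordered positive/negative pairs equals the trapezoidal AUC. The scores below are illustrative.

```python
# Verify Eq. (13): pairwise ranking probability vs. sklearn's AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([1, 1, 1, 0, 0, 0, 0])
f = np.array([0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1])  # model scores

pos, neg = f[y == 1], f[y == 0]
pairs = [(p, n) for p in pos for n in neg]
# ties count half, per the usual Mann-Whitney convention
rank_prob = np.mean([1.0 if p > n else 0.5 if p == n else 0.0
                     for p, n in pairs])
```

Here 11 of the 12 positive/negative pairs are ordered correctly, so both quantities equal 11/12.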

SE (also known as True Positive Rate or recall) quantifies the proportion of actual positives that are correctly identified  [43]. The formula is given by:

$$\mathrm{SE} = \frac{TP}{TP + FN} \qquad (14)$$

where TP and FN refer to true positive and false negative, respectively.

SP (also known as True Negative Rate) quantifies the proportion of actual negatives that are correctly identified  [43]. The formula is given by:

$$\mathrm{SP} = \frac{TN}{TN + FP} \qquad (15)$$

where TN and FP refer to true negatives and false positives, respectively.

PPV represents the proportion of correct positive predictions:

$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (16)$$

NPV represents the proportion of negative predictions that are correct:

$$\mathrm{NPV} = \frac{TN}{TN + FN} \qquad (17)$$

F1-score is described as a harmonic mean of precision and recall, using the following equation:

$$\mathrm{F1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (18)$$

Accuracy measures the proportion of correctly predicted samples and can be summarized using the following equation:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (19)$$
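Eqs. (14)-(19) follow directly from the confusion-matrix counts; the counts below are hypothetical values for a 106-patient cohort, chosen only to exercise the formulas.

```python
# Confusion-matrix metrics, Eqs. (14)-(19). Counts are illustrative.
TP, FN, TN, FP = 36, 1, 65, 4

se  = TP / (TP + FN)                     # sensitivity / recall, Eq. (14)
sp  = TN / (TN + FP)                     # specificity, Eq. (15)
ppv = TP / (TP + FP)                     # precision, Eq. (16)
npv = TN / (TN + FN)                     # Eq. (17)
f1  = 2 * ppv * se / (ppv + se)          # harmonic mean, Eq. (18)
acc = (TP + TN) / (TP + TN + FP + FN)    # Eq. (19)
```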

To further illustrate the representative relationships learned by the proposed graph-based diagnostic framework, we visualized the node embeddings in a latent feature space. The embeddings were extracted from the final graph convolution layer of the model, before the prediction layer, and projected into a two-dimensional space for visualization. Each node corresponds to a patient and is color-coded by class label (control or MPS). The visualization captures both training and test nodes before and after training, revealing structured node arrangements that reflect the model's ability to differentiate between classes based on learned relationships. Since the output dimension of the last convolutional layer ranges between 8 and 32 (Table 3), the high-dimensional embeddings were reduced to two dimensions using t-distributed Stochastic Neighbor Embedding (t-SNE) for intuitive interpretation. The 2D t-SNE scatter plots were generated with perplexity = 15, 500 iterations, and random state = 0. Additionally, Silhouette scores were computed for the visualized embeddings as a quantitative measure of class-separation quality.
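The projection and separation-scoring step can be sketched as follows. The embeddings are synthetic stand-ins for the final-layer GNN activations; the paper used 500 t-SNE iterations, while this sketch keeps the library default for version portability.

```python
# Project embeddings to 2-D with t-SNE and score class separation with the
# Silhouette coefficient (higher = more coherent, better-separated clusters).
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], [69, 37])               # controls vs MPS
emb = rng.normal(size=(106, 16)) + labels[:, None] * 3.0  # separable classes

xy = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(emb)
sil = silhouette_score(xy, labels)
```

Computing `sil` on both pre- and post-training embeddings, as in the paper, quantifies how much training improved class separation.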

Interpretability methods

The complexity of graph-structured data and the nonlinear propagation of information across nodes can obscure the decision-making process of GNNs. To enhance interpretability, we designed a dual-component framework that (i) provides a global ranking of diagnostic features contributing to graph construction using Shapley Additive exPlanations (SHAP), and (ii) generates patient-level graph-topology explanations using the PGExplainer method.

Feature-level global ranking attribution

To quantify the contribution of individual diagnostic features to the constructed graph and the subsequent predictions, we employed the model-agnostic KernelSHAP approach  [44]. This method treats the feature-to-graph mapping as the input pathway to the trained GNN. For each evaluation, a model function reconstructs the patient-similarity graph from candidate feature matrices and returns the model's logit outputs, thereby isolating the effect of feature perturbations on graph connectivity. KernelSHAP then estimates Shapley values of the symptom features using a background set consisting of all training nodes. Each explanation used the default number of KernelSHAP samples per instance with default kernel weighting, l1 regularization, and fixed random seeds for reproducibility.

Within each outer fold of the nested cross-validation, mean absolute Shapley values ($\overline{|\phi|}$) were computed per feature over the test nodes, and global importance was calculated as the average across outer folds. After obtaining the feature importance rankings, features common across folds were aggregated and visualized in descending order to highlight consistent predictors of MPS diagnosis. This global attribution analysis identifies the most influential diagnostic features driving graph construction and supports the clinical interpretability of the model outputs.
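The quantity KernelSHAP approximates can be illustrated with a toy model small enough for exact enumeration. This is not the paper's feature-to-graph-to-GNN pipeline: the model here is a 3-feature linear scorer with a zero baseline, chosen because its exact Shapley values are known to equal the weights.

```python
# Exact Shapley values by enumerating all feature orderings (toy model).
# KernelSHAP estimates this same quantity by weighted sampling when
# enumeration is infeasible.
from itertools import permutations
import math
import numpy as np

w = np.array([2.0, -1.0, 0.5])          # toy model: f(x) = w . x
x = np.array([1.0, 1.0, 1.0])           # instance to explain
baseline = np.zeros(3)                  # background value for absent features

def f(mask):
    # evaluate the model with absent features replaced by the baseline
    z = np.where(mask, x, baseline)
    return float(w @ z)

n = len(x)
phi = np.zeros(n)
for order in permutations(range(n)):    # average marginal contributions
    mask = np.zeros(n, dtype=bool)
    for j in order:
        before = f(mask)
        mask[j] = True
        phi[j] += f(mask) - before
phi /= math.factorial(n)
```

For a linear model with a zero baseline, phi equals w * x, and the values sum to the gap between the full prediction and the baseline prediction (the efficiency property).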

Graph-topology patient-level local explanations

To complement the global feature-level analysis, we employed PGExplainer  [45] to interpret the model’s decision-making on the graph itself. For a target patient (node), PGExplainer learns a sparse edge mask over the node’s h-hop neighborhood that maximizes the fidelity of the model’s prediction, thus revealing the most critical patient-to-patient connections influencing the classification outcome. The explainer was trained within each outer fold using the training nodes and subsequently applied to the corresponding test nodes. Representative control and MPS cases were visualized across each of the outer folds to qualitatively demonstrate the discovered subgraphs and clinically coherent neighbor relationships.
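The output side of this procedure can be sketched in simplified form: given per-edge importance scores (the mask PGExplainer learns from the trained GNN), keep the top-k edges within the target node's h-hop neighborhood. The edge list and scores below are synthetic.

```python
# Simplified explanatory-subgraph extraction from a learned edge mask.
import numpy as np

edges = np.array([[0, 1], [1, 2], [2, 3], [0, 4], [4, 5], [5, 6], [1, 4]])
mask = np.array([0.9, 0.2, 0.1, 0.8, 0.3, 0.05, 0.7])  # synthetic edge scores
target, hops, k = 0, 2, 3

# collect nodes within `hops` of the target by repeated edge expansion
reach = {target}
for _ in range(hops):
    reach |= {b for a, b in edges if a in reach}
    reach |= {a for a, b in edges if b in reach}

# restrict to edges inside the neighbourhood, then keep the k highest-scored
local = [i for i, (a, b) in enumerate(edges) if a in reach and b in reach]
top = sorted(local, key=lambda i: mask[i], reverse=True)[:k]
subgraph = edges[top]
```

For the MPS cases described above, this step yields the compact, densely connected subgraphs; for controls, the retained edges are more diffuse.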

Further implementation details, including software versions and libraries used, are provided in the Supplementary Material.

Results

Performance evaluation

Table 5 summarizes the performance of all graph-based diagnostic models across the different feature selection methods. Using AUC as the primary discrimination measure, the GCN model demonstrated the strongest and most consistent performance, achieving AUCs between 0.88 and 0.97 depending on the feature subset. The best results were obtained with the CHI2 feature set (AUC = 0.97, SE/SP = 0.97/0.94, PPV/NPV = 0.90/0.98, F1-score = 0.93, accuracy = 0.95), followed by Lasso (AUC = 0.96) and RFE (AUC = 0.95). Feature subsets generated by GA, BA, and DSK also produced strong performance, with AUCs in the 0.93–0.94 range.

Table 5.

Bootstrapped evaluation of GNN and classical machine learning models across feature selection methods, with performance metrics averaged over 1000 resamples

Model | Metric | All features | CHI2 | MI | Lasso | RFE | GA | BA | DSK
GCN | AUC | 0.96 | 0.97 ± 0.02 | 0.88 | 0.96 | 0.95 | 0.93 | 0.93 | 0.94
GCN | SE | 0.97 | 0.97 ± 0.03 | 0.84 | 0.97 | 0.97 | 0.95 | 0.92 | 0.89
GCN | SP | 0.93 | 0.94 ± 0.03 | 0.91 | 0.91 | 0.91 | 0.87 | 0.93 | 0.96
GCN | PPV | 0.88 | 0.90 ± 0.05 | 0.84 | 0.86 | 0.86 | 0.80 | 0.87 | 0.92
GCN | NPV | 0.99 | 0.98 ± 0.01 | 0.91 | 0.98 | 0.98 | 0.97 | 0.96 | 0.94
GCN | F1-score | 0.92 | 0.93 ± 0.03 | 0.84 | 0.91 | 0.91 | 0.86 | 0.89 | 0.90
GCN | Accuracy | 0.94 | 0.95 ± 0.02 | 0.89 | 0.93 | 0.94 | 0.90 | 0.92 | 0.94
GAT | AUC | 0.90 | 0.91 | 0.86 | 0.90 | 0.96 ± 0.02 | 0.86 | 0.86 | 0.90
GAT | SE | 0.92 | 0.97 | 0.84 | 0.87 | 0.95 ± 0.04 | 0.92 | 0.92 | 0.81
GAT | SP | 0.85 | 0.93 | 0.86 | 0.93 | 0.94 ± 0.03 | 0.81 | 0.78 | 0.94
GAT | PPV | 0.77 | 0.88 | 0.76 | 0.86 | 0.90 ± 0.05 | 0.72 | 0.69 | 0.88
GAT | NPV | 0.95 | 0.98 | 0.91 | 0.93 | 0.97 ± 0.02 | 0.95 | 0.95 | 0.90
GAT | F1-score | 0.84 | 0.92 | 0.79 | 0.86 | 0.92 ± 0.03 | 0.81 | 0.79 | 0.84
GAT | Accuracy | 0.88 | 0.94 | 0.85 | 0.91 | 0.94 ± 0.02 | 0.85 | 0.83 | 0.90
SAGE | AUC | 0.82 | 0.90 ± 0.04 | 0.84 | 0.82 | 0.84 | 0.73 | 0.83 | 0.72
SAGE | SE | 0.87 | 0.92 ± 0.04 | 0.81 | 0.89 | 0.97 | 0.68 | 0.89 | 0.65
SAGE | SP | 0.91 | 0.90 ± 0.04 | 0.90 | 0.83 | 0.88 | 0.90 | 0.85 | 0.91
SAGE | PPV | 0.84 | 0.83 ± 0.06 | 0.82 | 0.74 | 0.82 | 0.78 | 0.77 | 0.80
SAGE | NPV | 0.93 | 0.95 ± 0.03 | 0.90 | 0.94 | 0.98 | 0.84 | 0.94 | 0.83
SAGE | F1-score | 0.85 | 0.87 ± 0.04 | 0.81 | 0.81 | 0.89 | 0.72 | 0.82 | 0.71
SAGE | Accuracy | 0.90 | 0.90 ± 0.03 | 0.87 | 0.85 | 0.91 | 0.82 | 0.87 | 0.82
CCN | AUC | 0.85 | 0.91 | 0.93 ± 0.02 | 0.93 | 0.90 | 0.71 | 0.80 | 0.90
CCN | SE | 0.65 | 0.94 | 0.97 ± 0.03 | 0.92 | 0.92 | 0.76 | 0.78 | 0.92
CCN | SP | 0.99 | 0.90 | 0.87 ± 0.04 | 0.88 | 0.84 | 0.75 | 0.84 | 0.87
CCN | PPV | 0.96 | 0.83 | 0.80 ± 0.06 | 0.81 | 0.76 | 0.62 | 0.73 | 0.79
CCN | NPV | 0.84 | 0.97 | 0.98 ± 0.02 | 0.95 | 0.95 | 0.85 | 0.88 | 0.95
CCN | F1-score | 0.77 | 0.88 | 0.88 ± 0.04 | 0.86 | 0.83 | 0.68 | 0.75 | 0.85
CCN | Accuracy | 0.87 | 0.91 | 0.91 ± 0.03 | 0.90 | 0.87 | 0.75 | 0.82 | 0.88
DT | AUC | 0.80 | 0.82 | 0.88 ± 0.04 | 0.87 | 0.86 | 0.87 | 0.88 | 0.86
DT | SE | 0.78 | 0.92 | 0.94 ± 0.04 | 0.92 | 0.84 | 0.84 | 0.89 | 0.97
DT | SP | 0.81 | 0.78 | 0.80 ± 0.05 | 0.77 | 0.84 | 0.86 | 0.79 | 0.76
DT | PPV | 0.69 | 0.69 | 0.71 ± 0.06 | 0.68 | 0.74 | 0.76 | 0.70 | 0.68
DT | NPV | 0.87 | 0.95 | 0.96 ± 0.03 | 0.95 | 0.91 | 0.91 | 0.93 | 0.98
DT | F1-score | 0.73 | 0.79 | 0.81 ± 0.05 | 0.78 | 0.78 | 0.79 | 0.78 | 0.80
DT | Accuracy | 0.80 | 0.83 | 0.85 ± 0.03 | 0.82 | 0.84 | 0.85 | 0.83 | 0.83
SVC | AUC | 0.91 ± 0.03 | 0.87 | 0.85 | 0.88 | 0.87 | 0.86 | 0.88 | 0.89
SVC | SE | 0.94 ± 0.04 | 0.89 | 0.86 | 0.92 | 0.92 | 0.92 | 0.97 | 0.95
SVC | SP | 0.87 ± 0.04 | 0.84 | 0.83 | 0.84 | 0.83 | 0.80 | 0.78 | 0.82
SVC | PPV | 0.79 ± 0.06 | 0.75 | 0.73 | 0.76 | 0.74 | 0.71 | 0.71 | 0.74
SVC | NPV | 0.97 ± 0.02 | 0.94 | 0.92 | 0.95 | 0.95 | 0.95 | 0.98 | 0.97
SVC | F1-score | 0.86 ± 0.04 | 0.81 | 0.79 | 0.83 | 0.82 | 0.80 | 0.82 | 0.83
SVC | Accuracy | 0.90 ± 0.03 | 0.86 | 0.84 | 0.87 | 0.86 | 0.84 | 0.85 | 0.87
XGBoost | AUC | 0.89 | 0.88 | 0.88 | 0.88 | 0.89 | 0.89 | 0.91 | 0.90 ± 0.03
XGBoost | SE | 0.97 | 0.95 | 0.95 | 0.97 | 0.95 | 0.92 | 0.89 | 0.95 ± 0.04
XGBoost | SP | 0.84 | 0.88 | 0.87 | 0.83 | 0.87 | 0.85 | 0.93 | 0.90 ± 0.04
XGBoost | PPV | 0.76 | 0.81 | 0.80 | 0.75 | 0.80 | 0.77 | 0.87 | 0.83 ± 0.06
XGBoost | NPV | 0.98 | 0.97 | 0.97 | 0.98 | 0.97 | 0.95 | 0.94 | 0.97 ± 0.02
XGBoost | F1-score | 0.85 | 0.87 | 0.86 | 0.85 | 0.86 | 0.84 | 0.88 | 0.89 ± 0.04
XGBoost | Accuracy | 0.89 | 0.91 | 0.90 | 0.88 | 0.90 | 0.88 | 0.92 | 0.92 ± 0.03

Note: This table presents the performance of the four GNN models (GCN, GAT, SAGE, and CCN) and three classical machine learning models (DT, SVC, and XGBoost). Each model was evaluated with multiple feature selection techniques: CHI2, MI, RFE, Lasso, GA, BA, and DSK. The reported metrics are means over 1000 bootstrap resamples for AUC, SE, SP, PPV, NPV, F1-score, and accuracy; the standard deviation (±) is provided only for the best feature selection method of each model, thereby marking its best-performing feature set

The GAT model achieved the highest performance with the RFE feature subset (AUC = 0.96), comparable to GCN trained on the full feature set. With all features, CHI2, Lasso, and DSK methods, GAT achieved moderate AUCs (0.90–0.91) but showed greater variability across PPV, NPV, and F1-scores. The GraphSAGE model exhibited the least effective discrimination overall, with performance dependent on the selected features; the CHI2 feature subset produced the best AUC of 0.90, while GA and DSK techniques led to notably reduced performance. Alternatively, the CCN model performed optimally with MI and Lasso (AUC = 0.93 for both), showing better alignment with these automated feature subsets than with CHI2. Across all traditional machine learning baselines, including DT, SVC, and XGBoost, performance consistently lagged behind the top-performing GNN pipelines, highlighting the benefit of modeling graph-structured patient relationships.

Overall, CHI2 emerged as the most effective feature selection method, yielding the top-performing GCN–CHI2 pipeline and showing strong synergy with graph-based architectures. MI and Lasso were reliable alternatives across multiple models, while RFE was uniquely advantageous for the GAT model. In contrast, the metaheuristic methods (GA and BA) and DSK produced more variable and generally lower results. The 95% confidence intervals for all metrics are reported in Supplementary Table S4 to provide a more robust estimate of metric variability and should be considered when interpreting the stability of each model–feature selection combination.

Figure 4 shows the detailed ROC curves with 95% confidence intervals for all GNN models across the feature selection methods. Overall, the GCN and GAT models exhibited the strongest and most stable discrimination, with consistently higher ROC curves than the SAGE and CCN models. The GCN model achieved its best performance with CHI2 (AUC = 0.97; 95% CI: 0.93–1.00) and maintained high AUCs across the Lasso, RFE, GA, BA, and DSK feature sets, as presented in Fig. 4a. The GAT model performed comparably well, particularly with RFE (AUC = 0.96; 95% CI: 0.91–0.99), while the CHI2 and Lasso methods also provided stable results around 0.90–0.91, as shown in Fig. 4b. In contrast, the SAGE and CCN models showed more variable and generally lower ROC curves, with performance drops under GA and BA but improvements with the CHI2, MI, and Lasso FS methods. These trends highlight the clear advantage of the GCN and GAT models and emphasize the importance of feature selection choice, particularly CHI2 for GCN and RFE for GAT, in driving diagnostic performance.

Fig. 4.

Fig. 4

ROC curves for the different GNN models in combination with feature selection methods showing the AUC along with the 95% CI. (a) GCN model. (b) GAT model. (c) SAGE model. (d) CCN model

We further compared the performance of the graph-based models using ROC–AUC curves and F1-score box plots based on the best feature selection method identified for each GNN model, as presented in Fig. 5. The ROC–AUC curves with 95% confidence intervals (Fig. 5a) clearly demonstrate the superior discriminative performance of the GCN model relative to the other architectures. Similarly, the F1-score distributions shown in Fig. 5b indicate that the GCN model consistently outperforms GAT, while both SAGE and CCN exhibit notably weaker performance.

Fig. 5.

Fig. 5

Comparison of the performance of GNN models using the best-performing feature selection method identified for each model. (a) ROC curves. (b) F1 score boxplots

To provide a deeper understanding of the classification performance of the GCN–CHI2 model, we also report the fold-wise normalized confusion matrices in Fig. 6. Across the five outer folds, the model demonstrated strong and consistent performance. Perfect classification was obtained in Folds 02 and 03, and perfect MPS sensitivity was maintained in Fold 04. In Folds 01 and 05, the model correctly identified the majority of MPS cases, with only minor misclassification of control patients. Collectively, these results highlight the robustness of the GCN–CHI2 model as the top-performing pipeline.

Fig. 6.

Fig. 6

Normalized confusion matrices illustrating the fold-wise performance of the best-performing GNN model across five outer cross-validation folds ((a)–(e)). (a) Fold 01. (b) Fold 02. (c) Fold 03. (d) Fold 04. (e) Fold 05

The t-SNE visualizations of the node embeddings, shown in Fig. 7 for all five folds, provide insight into the learned representations of the top-performing GCN model with CHI2-selected features. Prior to training, nodes are scattered without any clear pattern, reflecting the absence of meaningful class separation in the initial high-dimensional feature space. After training, however, the t-SNE projections reveal a more organized arrangement, with MPS and control nodes forming distinguishable clusters. This indicates that the model has successfully learned feature representations that capture diagnostically relevant relationships between patients. Since t-SNE provides only qualitative evidence, we complemented these visualizations with Silhouette scores computed for both the pre- and post-training embeddings in each fold. As shown in Fig. 7, the Silhouette scores consistently increased after training, reflecting more coherent intra-class grouping and improved separation between MPS and control nodes across all folds.

Fig. 7.

Fig. 7

t-SNE visualization of node embeddings before and after training for the top-performing GCN-CHI2 model across the five outer folds ((a) - (e))

Interpretability methods

We interpret the graph-based model by analyzing the importance of the diagnostic features used to construct the graph edges. Evaluating each feature’s contribution to the model’s predictions allows us to identify the most influential clinical indicators and better understand the decision-making process of the GNN. In our analysis, we examined the features that were consistently selected across all outer folds of the nested cross-validation in the top-performing pipeline, namely GCN with CHI2 feature selection, representing the most dominant and stable predictors. Figure 8 presents the top 20 commonly selected features from the original set of 1,186 diagnostic variables. The top-ranked features included “accretions on teeth,” “dental caries,” “malocclusion,” “chronic gingivitis,” and “acute gingivitis,” all of which reflect well-documented dental manifestations in MPS patients  [46]. The model also highlighted musculoskeletal and developmental symptoms such as “myalgia,” “congenital anomalies of skull and face bones,” “expressive language disorder,” and “other developmental speech disorder,” consistent with primary skeletal and developmental involvement in MPS  [18, 47, 48]. In addition, several respiratory-related conditions, including “acute pharyngitis,” “contact with and (suspected) exposure to other viral communicable disease,” “bacterial infection,” “nasal congestion,” “sinusitis,” “acute bronchitis,” and “pain in throat,” were identified as key discriminative features, reflecting the high prevalence of recurrent airway infections in MPS  [49–52]. Other clinically relevant features, such as “iron deficiency anemia” and “xerosis cutis,” also contributed meaningfully to the model’s predictions  [53, 54]. The top SHAP-identified features were cross-referenced with established literature and clinical expertise, and each was validated as a primary or secondary symptom associated with various MPS subtypes, as summarized in Table 6.

Fig. 8.

Fig. 8

Top 20 dominant diagnostic features consistently selected across outer folds and ranked by SHAP importance for the GCN–CHI2 model

Table 6.

Clinical relevance of the top 20 SHAP-identified diagnostic features and their association with MPS symptomatology

Feature | Feature type | Comments
Accretions on teeth | Secondary | Plaque and calculus buildup from oral structural changes [46].
Myalgia | Secondary | Secondary to skeletal deformities and muscle compression [47].
Iron deficiency anemia | Secondary | Arises from GAG burden, GI dysfunction, and inflammation [53].
Malocclusion | Secondary | All 7 MPS subtypes present with malocclusion as a result of skeletal dysplasia and consequent jaw abnormalities [46].
Dental caries on smooth surface penetrating into dentin | Secondary | Increased caries risk due to oral motor dysfunction, enamel defects, and difficulty with oral hygiene [46].
Acute pharyngitis due to other specified organisms | Primary | Reported in multiple MPS subtypes [49, 52].
Contact with and (suspected) exposure to other viral communicable disease | Secondary | MPS patients are immunocompromised, so viral infections cause severe presentations and are associated with a worse prognosis [50, 51].
Chronic gingivitis, plaque induced | Secondary | Periodontal disease from GAG deposition and plaque [46].
Congenital anomalies of skull and face bones | Primary | Facial coarsening, prominent eyebrows, and frontal bossing [48].
Dental caries on pit and fissure surface penetrating into dentin | Secondary | Increased caries risk due to oral motor dysfunction, enamel defects, and difficulty with oral hygiene [46].
Acute gingivitis, plaque induced | Secondary | Periodontal disease from GAG deposition and plaque [46].
Xerosis cutis | Secondary | May reflect nonspecific cutaneous involvement in MPS, where dermal GAG accumulation alters skin structure and causes dryness [54].
Dental caries extending into pulp | Secondary | Increased caries risk due to oral motor dysfunction, enamel defects, and difficulty with oral hygiene [46].
Other developmental speech disorder | Primary | Linked to cognitive and behavioral issues in some subtypes [18].
Bacterial infection | Secondary | May relate to the high incidence of ENT infections in MPS patients [49, 52].
Expressive language disorder | Primary | Linked to cognitive and behavioral issues in some subtypes [18].
Acute pharyngitis | Primary | Reported in multiple MPS subtypes [49].
Nasal congestion | Primary | Small nasal passages and increased secretions [49].
Acute bronchitis | Secondary | May occur due to underlying airway abnormalities that predispose to bronchial infection [52].
Pain in throat | Secondary | Common ENT issues due to malformation and GAG buildup [49, 52].

We further assessed the consistency of the top-ranked features across outer folds to evaluate their robustness. Features that appeared in all folds were considered stable and more likely to represent genuine disease-related signals, whereas features present in only a subset of folds were interpreted as variable, potentially reflecting sampling fluctuations associated with the limited cohort size. A summary of the stable and variable features is presented in Supplementary Table S5, where we report the top 20 features ranked by their mean SHAP importance for clarity. A complete list of all stable and variable features across folds is provided in Additional Data File 1.

In addition to the global feature-level insights obtained with KernelSHAP, the patient-level graph-topology explanations generated with PGExplainer revealed coherent, compact subgraphs that supported each prediction. Across all five outer folds, PGExplainer consistently identified sparse neighborhoods of influential patient-to-patient connections that differed between correctly classified control and MPS cases. As illustrated in Fig. 9, each fold includes representative explanations for one control and one MPS patient, showing a clear qualitative distinction between the two classes. Control subjects typically exhibited more diffuse explanatory neighborhoods, suggesting that their predictions rely on broader but less tightly interconnected patient relationships. In contrast, MPS patients consistently presented compact, densely connected subgraphs, indicating that the model leverages localized and structurally cohesive relational patterns when identifying disease cases. This recurrent topological contrast across all folds complements the global diagnostic feature rankings derived from KernelSHAP and provides a mechanistic view of how patient-to-patient similarity patterns contribute to the model’s decision-making process.

Fig. 9.

Fig. 9

Graph-level explanations generated by the best-performing model across the five outer folds ((a) – (e)) for randomly selected control and MPS nodes. (a) Fold 01. (b) Fold 02. (c) Fold 03. (d) Fold 04. (e) Fold 05

Discussion

MPS is a rare inherited metabolic disorder caused by the deficiency of lysosomal enzymes essential for the degradation of GAGs, leading to cellular GAG accumulation and subsequent systemic symptoms [55]. Like other rare diseases, MPS has a low worldwide prevalence that varies considerably across populations and ethnic groups  [56]. Although this low prevalence poses significant challenges for early diagnosis, historical EHRs can serve as a valuable resource for identifying diagnostic patterns, enabling the development of a data-driven screening tool.

Innovations in GNNs, together with the growing availability of EHR data, have advanced research on the early diagnosis of challenging diseases, particularly rare diseases. Nevertheless, the application of AI to MPS remains limited, with reported studies relying on biomarker, enzyme, or GAG data and traditional machine learning algorithms  [17]. While prior approaches achieved high accuracy and interpretability, they were constrained by small sample sizes, simple machine learning models, and limited generalizability beyond the studied cohorts. In contrast, our work expands the diagnostic scope beyond isolated biomarkers by leveraging EHR-derived data and modeling complex patient relationships and comorbidity structures. Specifically, we explored how GNNs combined with different feature engineering strategies can provide a more robust and accurate framework for detecting MPS patients. We implemented a thorough detection pipeline that pairs four GNN models with seven automated and domain expert-driven feature selection techniques to discriminate MPS cases from controls, using historical medical records covering a wide range of symptomatic manifestations provided by the SEHA healthcare network for patients in the UAE. Moreover, we optimized and validated the proposed predictive diagnostic approach with a nested cross-validation scheme, conducting hyperparameter optimization for each model and evaluating the optimized models on unbiased, previously unseen test data.

For MPS diagnosis from SEHA electronic medical records using GNNs, the GCN model with the chi-square-selected feature set achieved superior performance in distinguishing control from MPS patients, with an AUC of 0.97 [95% CI: 0.93–1.00] and an F1-score of 0.93, demonstrating its robust capability to handle graph-structured EHR data for early MPS diagnosis. The GAT model also performed strongly, particularly with the original feature set and with features selected by RFE. The SAGE model, in contrast, consistently underperformed across the feature selection methods, which may reflect its limitations on small datasets such as ours; scaling the dataset with more patients and clinical features would likely improve its performance. The CCN model outperformed SAGE but remained below the GCN and GAT models. Regarding the medical-diagnosis feature sets, using all features yielded good overall performance with the GCN, GAT, and SAGE models, while the feature selection techniques provided valuable insight into which features matter most for predicting MPS cases: the best method depended on the architecture, with CHI2 performing best for GCN and SAGE, RFE for GAT, and Lasso for CCN. Additionally, the GNN models, particularly GCN and GAT, achieved improved performance on graphs constructed from features selected with domain expert knowledge.

By analyzing the feature importance of the constructed graph’s edges, we identified key factors such as dental manifestation and respiratory-associated issues as pivot factors for characterizing the MPS disease. Moreover, the identified biomarkers by the best graph-based pipeline, GCN-CHI2, have been validated by experts in the domain, confirming their clinical relevance to the MPS disease. These patterns emphasize that osteomuscular clinical conditions, dental symptoms, respiratory, and weight factors are critical for diagnosing MPS patients. Our GNN model identified “accretions on teeth” as an important feature of MPS, which, in addition to other dental manifestations, is a common secondary clinical feature. Dental accretions are described as a high buildup of plaque and calculus and might be related to the fact that MPS patients have an increased risk of caries and periodontal disease due to oral deformities as a result of GAG accumulation, to leads to facial dysmorphia  [46]. Our model also highlighted additional dental manifestations as key factors associated with MPS disease, including “malocclusion”, “dental caries”, “chronic and acute gingivitis”, and “accretions on teeth”, which is aligned with the literature indicating that MPS patients exhibit varied dental implications  [46]. The study also highlights osteomuscular features such as “myalgia” and “congenital anomalies of skull and face bones” with the latter being a primary developmental feature observed on all MPS patients and “myalgia” being a secondary symptom resulting from progressive skeletal and joint abnormalities  [48, 53]. MPS patients often experience severe anemia, particularly in MPS Plus Syndrome, which can be seen as one of the top features identified by our model [57]. 
Moreover, the high incidence of infectious and respiratory complications in patients with MPS I suggests a significant prevalence of bacterial infections, potentially linked to immune system defects: GAG accumulation may promote microbial growth and impair the immune response, increasing susceptibility to infections [49, 52]. Respiratory manifestations such as “acute pharyngitis”, “nasal congestion”, “sinusitis”, “acute bronchitis”, and “pain in throat” were also identified as core components for MPS diagnosis. This aligns with studies such as [49] and [52], which demonstrate otorhinolaryngological manifestations in MPS patients due to underlying airway abnormalities that predispose subjects to infections and infection-related respiratory problems. Another key set of features captured by the model comprises cognitive and behavioral features such as “expressive language disorder” and “other developmental speech disorder”, which are primary clinical presentations in most MPS subtypes [18]. Rather than establishing a direct linear relationship between individual symptoms and the disease, GNNs capture complex interactions among multiple key features, including both core and co-occurring diagnostic indicators. These insights are valuable for MPS detection and intervention strategies.
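The SHAP attributions discussed above are grounded in the Shapley value from cooperative game theory: each feature's importance is its average marginal contribution across all feature orderings. The self-contained sketch below computes exact Shapley values for a toy risk function with an interaction term; the function and feature names are illustrative assumptions, not the study's trained model.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all subsets (feasible only for a handful of features)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for subset in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value_fn(set(subset) | {f}) - value_fn(set(subset)))
        phi[f] = total
    return phi

# Toy additive risk score with one interaction term (illustrative only).
def risk(present):
    score = 0.0
    if "dental_accretions" in present:
        score += 2.0
    if "nasal_congestion" in present:
        score += 1.0
    if {"dental_accretions", "nasal_congestion"} <= present:
        score += 0.5  # co-occurring symptoms reinforce each other
    return score

phi = shapley_values(risk, ["dental_accretions", "nasal_congestion"])
print(phi)  # → {'dental_accretions': 2.25, 'nasal_congestion': 1.25}
```

Note the efficiency property: the attributions sum to the full model output (3.5), and the 0.5 interaction bonus is split equally between the two interacting features. Practical SHAP libraries approximate this computation for models with many features.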

Beyond global feature attribution, the graph-topology explanations generated with PGExplainer offered an additional interpretability layer by revealing how patient-to-patient connectivity patterns influenced individual predictions. Across all five outer folds, MPS patients consistently exhibited compact and densely connected explanatory subgraphs, indicating that the model relied on tightly clustered relational neighborhoods when assigning an MPS label. In contrast, control subjects were characterized by more diffuse and weakly structured explanatory neighborhoods, suggesting less dependence on localized relational patterns. These topological distinctions reinforce the SHAP-derived feature insights and demonstrate that the model utilizes both symptom-level indicators and coherent relational structures within the patient graph. Such complementary interpretability enhances trust in the diagnostic framework and provides a mechanistic understanding of how patient similarity patterns contribute to rare disease detection.
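One simple way to make the compact-versus-diffuse contrast concrete is to measure the edge density of the explanatory subgraph induced by a learned edge mask. The sketch below is illustrative only (the toy masks and the 0.5 threshold are assumptions), not the PGExplainer implementation:

```python
def explanation_density(edge_weights, threshold=0.5):
    """Edge density of the explanatory subgraph induced by an edge mask:
    kept edges divided by the maximum possible edges among the kept nodes.

    edge_weights: dict mapping undirected (u, v) edges to mask weights in [0, 1].
    """
    kept = [e for e, w in edge_weights.items() if w >= threshold]
    if not kept:
        return 0.0
    nodes = {u for e in kept for u in e}
    max_edges = len(nodes) * (len(nodes) - 1) / 2
    return len(kept) / max_edges

# Toy masks: an MPS-like case with a tight triangle of similar patients,
# and a control-like case with scattered, weak connections.
mps_mask = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7, (0, 5): 0.1}
control_mask = {(3, 4): 0.6, (4, 7): 0.2, (7, 9): 0.55, (3, 9): 0.1}

print(explanation_density(mps_mask))      # → 1.0 (3 edges on a 3-node clique)
print(explanation_density(control_mask))  # → 0.333… (2 edges over 4 nodes)
```

Under this toy metric, a compact explanatory neighborhood scores near 1.0 while a diffuse one scores much lower, mirroring the qualitative distinction reported for MPS versus control subjects.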

The key diagnostic features identified by the top-performing GNN model and those provided by the domain expert exhibit both overlaps and distinctions, offering valuable insight into the automated model’s capability to capture relevant medical patterns for MPS diagnosis. Both feature sets highlight “accretions on teeth”, “dental caries”, “acute gingivitis”, “chronic gingivitis”, “acute pharyngitis”, “acute bronchitis”, and “nasal congestion”, suggesting that dental health and respiratory issues are significant in MPS detection. However, the domain expert’s set includes more systemic MPS traits such as “hearing loss”, “developmental disorders”, “macrocephaly”, “abnormal gait”, and “short stature”, whereas our model emphasizes primary osteomuscular and neurological features such as “myalgia” and “congenital anomalies of skull and face bones”, and broader secondary features such as “iron deficiency anemia”. Notably, both feature sets include “expressive language disorder”, underscoring its diagnostic importance. Overall, our best-performing GCN model with the CHI2 automated feature selection method effectively identified key indicators along with secondary clinical presentations of MPS, supporting more accurate and efficient diagnosis.

Despite the promising findings, several limitations should be acknowledged. First, the dataset was relatively small and derived from a single healthcare provider, which may restrict the generalizability of the model and increase the risk of overfitting. Although the nested cross-validation framework mitigates optimistic bias by providing an unbiased performance estimate, a limited sample size inherently introduces variability in both model performance and feature importance rankings. As a result, the stability of identified diagnostic features cannot be guaranteed without validation in larger, multi-center cohorts. Furthermore, because MPS is an ultra-rare disease, all subtypes were combined to achieve a sufficient sample size. While necessary, this aggregation may mask subtype-specific phenotypic differences and limit the model’s ability to capture distinct clinical signatures associated with each subtype. In addition, although GNNs offer strong representational power for modeling patient similarity structures, their interpretability remains less straightforward than traditional machine-learning models, and some uncertainty persists due to wide confidence intervals caused by the small number of positive cases. Overall, these limitations highlight the need for future studies incorporating larger, multi-institutional EHR datasets, external validation across diverse populations, and additional clinical modalities to improve robustness, interpretability, and translational readiness.
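The nested stratified cross-validation protocol referenced above can be sketched as follows. The five outer folds mirror the paper's setup; the three inner tuning folds and the splitting helper are illustrative assumptions, not the study's exact code.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, preserving class proportions in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def nested_cv(labels, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits). Hyperparameters are
    tuned on inner_splits only, so outer_test stays unseen until scoring."""
    for test in stratified_folds(labels, outer_k):
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        inner_labels = [labels[i] for i in train]
        inner = []
        for val in stratified_folds(inner_labels, inner_k, seed=1):
            val_set = set(val)
            inner.append(([train[j] for j in range(len(train)) if j not in val_set],
                          [train[j] for j in val]))
        yield train, test, inner

# Toy label vector mimicking the cohort's imbalance (37 cases vs 69 controls).
labels = [1] * 37 + [0] * 69
for fold, (train, test, inner) in enumerate(nested_cv(labels)):
    assert not set(train) & set(test)  # outer test never leaks into tuning
    print(fold, len(train), len(test), len(inner))
```

Because model selection happens entirely inside each outer-training partition, the outer-fold scores give the near-unbiased performance estimate that the limitations paragraph relies on; with only 37 positives, however, the per-fold estimates remain high-variance.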

Conclusion

In this study, we introduced a novel graph-based diagnostic framework for the early diagnosis of MPS using patients’ historical medical diagnoses extracted from real-world EHR data. By evaluating multiple GNN architectures across a diverse set of feature subsets derived from both automated feature selection methods and domain expert knowledge, we demonstrated that the combination of the chi-square feature selection method and the GCN model achieved the most accurate and consistent performance under nested cross-validation. The model effectively identified clinically meaningful diagnostic features, aligning with established manifestations of MPS and highlighting the capacity of GNN-driven feature selection to capture subtle yet informative disease patterns. Beyond global feature attribution, the integration of explainable AI components, namely SHAP for symptom-level importance and PGExplainer for patient-level graph-topology insights, provided complementary interpretability by revealing both which diagnostic features influence predictions and how patient-to-patient relationships shape model decisions. These findings underscore the potential of graph-based models to serve as efficient and cost-effective screening tools for rare diseases by leveraging routinely collected, non-invasive EHR data. Nevertheless, the study is limited by the small sample size and single-institution dataset, which resulted in wide confidence intervals and may restrict the generalizability of the findings. Larger, multi-center cohorts are needed for external validation, along with richer multimodal clinical data. Future research should explore scaling the framework to subtype-specific MPS classification and integrating additional data modalities to enhance robustness and clinical utility.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Author contributions

RF conceptualized the study, performed data curation, formal analysis, and developed the methodology. She also validated the results, created visualizations, and was the main author in writing and editing the manuscript. NT contributed to writing and editing the manuscript. MS contributed to the result validation and was a co-author in editing the manuscript. FA acquired funding, managed the project, validated results, and was a co-author in editing the manuscript. AA conceptualized the study, performed data curation and formal analysis, acquired funding, managed and supervised the project, validated results, and was a senior author in editing the manuscript.

Funding

This research was supported by Khalifa University and United Arab Emirates University [grant number KU-UAEU-2023–068]. The funding sources had no role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Data availability

The data that support the findings of this study are available from the Department of Health (DOH), Abu Dhabi, UAE (medical.research@doh.gov.ae), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the DOH.

Code availability

The analysis code used in this study involves data processing modules and scripts that are tightly coupled with the SEHA clinical data and, therefore, cannot be made publicly available due to institutional data governance and privacy restrictions. However, all methodological steps, pseudo-code descriptions, model configurations, and hyperparameter search spaces have been fully detailed within the manuscript to facilitate reproducibility of the pipeline. The code may be shared with academic researchers upon reasonable request and subject to approval under Khalifa University and SEHA data governance policies.

Declarations

Ethics approval and consent to participate

This study was approved by the Institutional Review Board of the Department of Health, Abu Dhabi, UAE (Approval No. IRB DOH/CMDC/2021/406). The board waived the requirement for individual informed consent. All investigators accessed only anonymized patient data. The study was performed in compliance with relevant laws and regulations governing research in the Emirate of Abu Dhabi, UAE.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Lee J, Liu C, Kim J, Chen Z, Sun Y, Rogers JR, et al. Deep learning for rare disease: a scoping review. medRxiv. 2022. 10.1101/2022.06.29.22277046.
  • 2. Alsentzer E, Li MM, Kobren SN, Network UD, Kohane IS, Zitnik M. Deep learning for diagnosing patients with rare genetic diseases. medRxiv. 2022. 10.1101/2022.12.07.22283238.
  • 3. Phillips C, Parkinson A, Namsrai T, Chalmers A, Dews C, Gregory D, et al. Time to diagnosis for a rare disease: managing medical uncertainty. A qualitative study. Orphanet J Rare Dis. 2024;19(1):297. 10.1186/s13023-024-03319-2.
  • 4. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23(1):689. 10.1186/s12909-023-04698-z.
  • 5. Raghunath S, Pfeifer JM, Ulloa-Cerna AE, Nemani A, Carbonati T, Jing L, et al. Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation-related stroke. Circulation. 2021;143(13):1287–98. 10.1161/CIRCULATIONAHA.120.047829.
  • 6. Wang H, Avillach P. Diagnostic classification and prognostic prediction using common genetic variants in autism spectrum disorder: genotype-based deep learning. JMIR Med Inf. 2021;9(4):24754. 10.2196/24754. [Retracted]
  • 7. Sjövall F, Lanckohr C, Bracht H. What’s new in therapeutic drug monitoring of antimicrobials? Intensive Care Med. 2023;49(7):857–59. 10.1007/s00134-023-07060-5.
  • 8. Johnson KB, Wei W-Q, Weeraratne D, Frisse ME, Misulis K, Rhee K, et al. Precision medicine, AI, and the future of personalized health care. Clin Transl Sci. 2021;14(1):86–93. 10.1111/cts.12884.
  • 9. John TJL-S, Kanwar O, Abidi E, Nekidy WE, Piechowski-Jozwiak B. Towards artificial intelligence-based disease prediction algorithms that comprehensively leverage and continuously learn from real-world clinical tabular data systems. PLoS Digit Health. 2024;3(9):0000589. 10.1371/journal.pdig.0000589.
  • 10. Visibelli A, Roncaglia B, Spiga O, Santucci A. The impact of artificial intelligence in the Odyssey of rare diseases. Biomedicines. 2023;11(3):887. 10.3390/biomedicines11030887.
  • 11. Oss Boll H, Amirahmadi A, Ghazani MM, Morais WOD, Freitas EPD, Soliman A, et al. Graph neural networks for clinical risk prediction based on electronic health records: a survey. J Biomed Inf. 2024;151:104616. 10.1016/j.jbi.2024.104616.
  • 12. Weng W-H, Szolovits P. Representation learning for electronic health records. arXiv:1909.09248. 2019. 10.48550/arXiv.1909.09248.
  • 13. Zong N, Ngo V, Stone DJ, Wen A, Zhao Y, Yu Y, et al. Leveraging genetic reports and electronic health records for the prediction of primary cancers: algorithm development and validation study. JMIR Med Inf. 2021;9(5):23586. 10.2196/23586.
  • 14. Lu H, Uddin S. A weighted patient network-based framework for predicting chronic diseases using graph neural networks. Sci Rep. 2021;11:22607. 10.1038/s41598-021-01964-2.
  • 15. Sun Z, Yin H, Chen H, Chen T, Cui L, Yang F. Disease prediction via graph neural networks. IEEE J Biomed Health Inf. 2021;25:818–26. 10.1109/JBHI.2020.3004143.
  • 16. Germain DP, et al. Applying artificial intelligence to rare diseases: a literature review highlighting lessons from Fabry disease. Orphanet J Rare Dis. 2025;20(1):61. 10.1186/s13023-025-03655-x.
  • 17. Kadali S, Naushad SM, Devi ARR, Bodiga VL. Biochemical, machine learning and molecular approaches for the differential diagnosis of mucopolysaccharidoses. Mol Cellular Biochem. 2019;458:27–37. 10.1007/s11010-019-03527-6.
  • 18. Zhou J, Lin J, Leung WT, Wang L. A basic understanding of mucopolysaccharidosis: incidence, clinical features, diagnosis, and management. Intractable Rare Dis Res. 2020;9:1–9. 10.5582/irdr.2020.01011.
  • 19. Kobayashi H. Recent trends in mucopolysaccharidosis research. J Hum Genet. 2019;64(2):127–37. 10.1038/s10038-018-0534-8.
  • 20. M K, Chakraborty A, Cr*, Atre A, Shivakumar P. Mucopolysaccharidoses: an overview and new treatment modalities. Int J Clin Biochem Res. 2023;10(2):101–09. 10.18231/j.ijcbr.2023.016.
  • 21. Al-Jasmi FA, Tawfig N, Berniah A, Ali BR, Taleb M, Hertecant JL, et al. Prevalence and novel mutations of lysosomal storage disorders in United Arab Emirates. In: Zschocke J, Gibson KM, Brown G, Morava E, Peters V, editors. JIMD reports. Vol. 10. Berlin, Heidelberg: Springer; 2013. p. 1–9.
  • 22. Celik B, Tomatsu SC, Tomatsu S, Khan SA. Epidemiology of mucopolysaccharidoses update. Diagnostics. 2021;11(2):273. 10.3390/diagnostics11020273.
  • 23. Moammar H, Cheriyan G, Mathew R, Al QSN. Incidence and patterns of inborn errors of metabolism in the Eastern Province of Saudi Arabia, 1983–2008. Ann Saudi Med. 2010;30(4):271–77. 10.4103/0256-4947.65254.
  • 24. Poorthuis BJHM, Wevers RA, Kleijer WJ, Groener JEM, Jong JGN, Weely S, et al. The frequency of lysosomal storage diseases in the Netherlands. Hum Genet. 1999;105(1):151–56. 10.1007/s004399900075.
  • 25. Pinto R, Caseiro C, Lemos M, Lopes L, Fontes A, Ribeiro H, et al. Prevalence of lysosomal storage diseases in Portugal. Eur J Hum Genet. 2004;12(2):87–92. 10.1038/sj.ejhg.5201044.
  • 26. Deng X, Li Y, Weng J, Zhang J. Feature selection for text classification: a review. Multimedia Tools Appl. 2019;78:3797–816. 10.1007/s11042-018-6083-5.
  • 27. Abdo A, Mostafa R, Abdel-Hamid L. An optimized hybrid approach for feature selection based on chi-square and particle swarm optimization algorithms. Data. 2024;9(2):20. 10.3390/data9020020.
  • 28. Vergara JR, Estévez PA. A review of feature selection methods based on mutual information. Neural Comput Appl. 2014;24(1):175–86. 10.1007/s00521-013-1368-0.
  • 29. Ai C. A method for cancer genomics feature selection based on LASSO-RFE. Iran J Sci Technol Trans A Sci. 2022;46(3):731–38. 10.1007/s40995-022-01292-8.
  • 30. Awad M, Fraihat S. Recursive feature elimination with cross-validation with decision tree: feature selection method for machine learning-based intrusion detection systems. J Sens Actuator Networks. 2023;12(5):67. 10.3390/jsan12050067.
  • 31. Elsevier. Genetic algorithm – an overview. ScienceDirect Topics. https://www.sciencedirect.com/topics/engineering/genetic-algorithm.
  • 32. Elsevier. Bat algorithm – an overview. ScienceDirect Topics. https://www.sciencedirect.com/topics/computer-science/bat-algorithm.
  • 33. Hastie T, Tibshirani R, Friedman J. Prototype methods and nearest-neighbors. In: Hastie T, Tibshirani R, Friedman J, editors. The elements of statistical learning: data mining, inference, and prediction. New York, NY: Springer; 2009. p. 459–83.
  • 34. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv. 2017. 10.48550/arXiv.1609.02907.
  • 35. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. arXiv. 2018. 10.48550/arXiv.1710.10903.
  • 36. Hamilton WL, Ying R, Leskovec J. Inductive representation learning on large graphs. arXiv. 2018. 10.48550/arXiv.1706.02216.
  • 37. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. arXiv. 2017. 10.48550/arXiv.1606.09375.
  • 38. Zhao L, Akoglu L. PairNorm: tackling oversmoothing in GNNs. arXiv. 2020. 10.48550/arXiv.1909.12223.
  • 39. Wainer J, Cawley G. Nested cross-validation when selecting classifiers is overzealous for most practical applications. Expert Syst Appl. 2021;182:115222. 10.1016/j.eswa.2021.115222.
  • 40. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD ’19. New York, NY, USA: Association for Computing Machinery; 2019. p. 2623–31. 10.1145/3292500.3330701.
  • 41. Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. In: Proceedings of the 24th International Conference on Neural Information Processing Systems. NIPS’11. Red Hook, NY, USA: Curran Associates Inc.; 2011. p. 2546–54.
  • 42. Wood M. Bootstrapped confidence intervals as an approach to statistical inference. Organizational Res Methods. 2005;8(4):454–70. 10.1177/1094428105280059.
  • 43. Altman DG, Bland JM. Diagnostic tests 1: sensitivity and specificity. BMJ. 1994;308(6943):1552. 10.1136/bmj.308.6943.1552.
  • 44. Sundararajan M, Najmi A. The many Shapley values for model explanation. In: Proceedings of the 37th International Conference on Machine Learning. ICML’20, vol. 119. JMLR.org; 2020. p. 9269–78.
  • 45. Luo D, Cheng W, Xu D, Yu W, Zong B, Chen H, et al. Parameterized explainer for graph neural network. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20. Red Hook, NY, USA: Curran Associates Inc.; 2020. p. 19620–31.
  • 46. Hirst L, Mubeen S, Abou-Ameira G, Chakrapani A. Mucopolysaccharidosis (MPS): review of the literature and case series of five pediatric dental patients. Clin Case Rep. 2021;9(3):1704–10. 10.1002/ccr3.3885.
  • 47. Stepien KM, Bentley A, Chen C, Dhemech MW, Gee E, Orton P, et al. Non-cardiac manifestations in adult patients with mucopolysaccharidosis. Front Cardiovasc Med. 2022;9:839391. 10.3389/fcvm.2022.839391.
  • 48. D’Souza A, Ryan E, Sidransky E. Facial features of lysosomal storage disorders. Expert Rev Endocrinol Metab. 2022;17(6):467–74. 10.1080/17446651.2022.2144229.
  • 49. Waśniewska-Włodarczyk A, Pepaś R, Rosiak O, Konopka W. Otorhinolaryngological problems in mucopolysaccharidoses: a review of common symptoms in a rare disease. Brain Sci. 2024;14(11):1085. 10.3390/brainsci14111085.
  • 50. Mandolfo O, Colzani M, Costanzo F, et al. Innate immunity in mucopolysaccharide diseases. Mol Genet Metab. 2022;135:93–112. 10.1016/j.ymgme.2022.01.004.
  • 51. Kilavuz S, et al. Real-world patient data on immunity and COVID-19 status of patients with mucopolysaccharidosis, Gaucher and Pompe diseases from Turkey. Mol Genet Metab Rep. 2022;33:100867. 10.1016/j.ymgmr.2022.100867.
  • 52. Berger KI, Fagondes SC, Giugliani R, Hardy KA, Lee KS, McArdle C, et al. Respiratory and sleep disorders in mucopolysaccharidosis. J Inherit Metab Dis. 2013;36(2):201–10. 10.1007/s10545-012-9555-1.
  • 53. Kaciulyte G, Yorulmaz G, Sharma R, Jones SA, Wynn R, Church H, et al. Iron metabolism and hematological abnormalities in adult patients affected with mucopolysaccharidoses. Mol Genet Metab Rep. 2025. 10.1016/j.ymgmr.2025.101243.
  • 54. Tran K, Lam JM. Cutaneous manifestations of the mucopolysaccharidoses. Paediatric Respir Rev. 2016;20:25–30. 10.1016/j.prrv.2016.02.002.
  • 55. Muenzer J. Overview of the mucopolysaccharidoses. Rheumatology. 2011;50(suppl 5):4–12. 10.1093/rheumatology/ker394.
  • 56. Borges P, Pasqualim G, Giugliani R, Vairo F, Matte U. Estimated prevalence of mucopolysaccharidoses from population-based exomes and genomes. Orphanet J Rare Dis. 2020;15(1):324. 10.1186/s13023-020-01608-0.
  • 57. Sofronova V, Iwata R, Moriya T, Loskutova K, Gurinova E, Chernova M, et al. Hematopoietic disorders, renal impairment and growth in mucopolysaccharidosis-plus syndrome. Int J Mol Sci. 2022;23(10):5851. 10.3390/ijms23105851.
