Scientific Reports. 2025 Oct 7;15:34835. doi: 10.1038/s41598-025-21407-6

Multi-modal deep learning framework for early detection of Parkinson’s disease using neurological and physiological data for high-fidelity diagnosis

Ayan Sar 1,#, Pranav Singh Puri 1,#, Huma Naz 3,#, Sumit Aich 1,#, Tanupriya Choudhury 1,4,✉,#, Lubna Abdelkhreim Gabralla 2,#
PMCID: PMC12504576  PMID: 41057513

Abstract

Parkinson’s disease (PD) is a progressive neurodegenerative disorder that remains challenging to diagnose in its early stages due to its heterogeneous symptom presentation and overlapping clinical features. Consequently, there is no consensus on how to effectively detect early-stage PD and classify motor symptom severity. The proposed research therefore introduces MultiParkNet, an avant-garde multi-modal deep learning framework for early-stage PD detection that synthesizes diverse neurological and physiological data sources. The proposed system integrates audio speech patterns, motor-skill drawing characteristics, neuroimaging data, and cardiovascular signals with dedicated neural architectures for robust feature extraction and fusion. The probabilistic classification approach enhances disease identification with high fidelity and enables early detection. The model demonstrated exceptional performance, with an average training accuracy of 99.67%, validation accuracy of 98.15%, and test accuracy of 96.74% across cross-validation experiments. This novel architecture significantly improves diagnostic precision, offering a transformative, AI-driven approach to Parkinson’s disease assessment with potential clinical implications.

Keywords: Deep Learning, Early Diagnosis, Multi-Modal Fusion, Neurodegenerative Disease, Parkinson’s Detection

Subject terms: Preclinical research, Computer science

Introduction

Parkinson’s disease (PD) is one of the most prevalent neurodegenerative disorders, affecting millions of individuals around the globe every year1,2. The global prevalence of PD has increased substantially, more than doubling between 1990 and 2015. In 2016, an estimated 6.1 million individuals were affected worldwide3, and by 2021, this number had increased to 11.77 million, highlighting the growing impact of the disease4. PD is characterized by progressive deterioration of motor functions, primarily attributed to the degeneration of dopaminergic neurons in the substantia nigra region of the brain5,6. Beyond motor impairments such as tremors7, bradykinesia8 and rigidity9, PD also manifests through non-motor symptoms, including cognitive decline10, speech abnormalities11 and autonomic dysfunction12. Despite its significant global impact, early-stage diagnosis remains challenging due to the absence of definitive biomarkers13–15 and the reliance on subjective clinical assessments16,17.

Early detection is further hindered by the subtlety and variability of initial symptoms, which often overlap with other neurological disorders, making clinical evaluation prone to misinterpretation and delayed diagnosis. Existing early PD detection studies have explored modalities such as neuroimaging (e.g., MRI, DAT-SPECT), speech and voice analysis, handwriting/motor task assessment, olfactory testing, and electrophysiological signals; however, each approach has its own limitations in the prodromal phase ranging from low sensitivity and high false-positive rates to high costs, invasiveness, and limited accessibility. These challenges highlight the need for diagnostic frameworks that are multimodal, non-invasive, clinically scalable, and capable of capturing heterogeneous disease manifestations at an early stage.

Traditional diagnostic approaches, such as clinical evaluations, neuroimaging and symptom-based assessments, are often inadequate for detecting the disease in its earliest stages, leading to delays in intervention and disease management18,19. These findings underscore the urgent need for advanced computational methodologies capable of enhancing the accuracy20,21 and reliability of PD diagnosis22–24.

The emergence of artificial intelligence (AI) and deep learning has revolutionized medical diagnostics by enabling automated analysis of complex biomedical data25. Recent advancements in deep learning architectures, including Convolutional Neural Networks (CNNs)26–28, Long Short-Term Memory (LSTM) networks29,30, and Transformer-based models, have demonstrated remarkable potential in pattern recognition and feature extraction across diversified medical domains31,32. However, most existing AI-driven approaches for PD detection focus on single-modal analysis, such as speech-based assessments33, neuroimaging studies34,35, or motor function evaluations36,37. While these methods offer promising results, they rely on a single data modality, limiting their generalizability and diagnostic robustness. The integration of multimodal data sources, spanning neurological, physiological, and behavioral biomarkers, could provide a more complete representation of PD progression, facilitating early and precise detection38–40.

The proposed research introduces MultiParkNet: a multi-modal deep learning framework that synthesises heterogeneous data sources, including audio speech patterns, motor-skill drawing characteristics, neuroimaging data and cardiovascular signals, to construct a high-fidelity diagnostic model (Fig. 1). This integrative approach enables the extraction of intricate patterns across different physiological and neurological domains, enhancing the granularity of disease characterisation. The system is built upon specialised deep learning architectures for each modality: a CNN-LSTM model for speech processing, a dual-branch CNN for motor skill evaluation, 3D CNNs for neuroimaging analysis, and dilated convolutional neural networks for cardiovascular signal interpretation. The feature fusion mechanism employs multi-head attention with dynamic inter-modal weight allocation to capture non-linear interactions between diverse biomarkers and obtain an optimal feature representation. Further, the probabilistic nature of the proposed classification framework allows confidence-weighted decision-making, mitigating the uncertainty inherent in medical diagnostics. To enhance the generalizability of the model and its clinical applicability, several advanced preprocessing techniques were included, such as dimensional normalisation, signal-to-noise ratio optimisation, and feature space regularisation for each data source. The integration of data augmentation strategies further strengthened robustness, addressing the challenges of data heterogeneity and limited sample availability in medical research. The novelty of the framework lies in the following:

  1. Unlike conventional approaches of single-modal analysis, this framework synthesised audio speech patterns, motor skill drawings, neuroimaging data and cardiovascular signals, enabling more comprehensive and accurate early-stage diagnosis of disease.

  2. The model incorporated a multi-head attention mechanism with dynamic inter-modal weight allocation for adaptive feature integration across diverse data modalities to enhance diagnostic precision.

  3. The confidence-weighted probabilistic model makes decision-making more reliable by incorporating uncertainty estimation via Monte Carlo Dropout (MC-Dropout) during inference. In this approach, dropout is applied at test time to perform multiple stochastic forward passes through the network, enabling approximation of the posterior predictive distribution. The variance or entropy of these predictions is interpreted as an uncertainty measure, guiding the model in identifying low-confidence or ambiguous cases. This strategy helps reduce false positives and ensures safer diagnostic outcomes, which is particularly important in early-stage PD detection where decision sensitivity is critical.
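The MC-Dropout procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the two-layer network and its random weights are stand-ins for the trained classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network weights (stand-ins for a trained classifier head).
W1 = rng.normal(size=(16, 32))
W2 = rng.normal(size=(32, 2))

def forward(x, p_drop=0.5):
    """One stochastic forward pass with dropout kept active at test time."""
    h = np.maximum(x @ W1, 0.0)                  # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop          # MC-Dropout: dropout stays ON
    h = h * mask / (1.0 - p_drop)                # inverted-dropout scaling
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # softmax probabilities

def mc_dropout_predict(x, n_passes=100):
    """Mean prediction plus entropy of the averaged predictive distribution."""
    probs = np.stack([forward(x) for _ in range(n_passes)])
    mean_prob = probs.mean(axis=0)
    entropy = -(mean_prob * np.log(mean_prob + 1e-12)).sum(axis=-1)
    return mean_prob, entropy

x = rng.normal(size=(1, 16))
mean_prob, entropy = mc_dropout_predict(x)
```

High entropy flags an ambiguous case that could be routed to a clinician rather than auto-classified.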

The proposed system holds significant clinical implications by facilitating personalized disease monitoring and early therapeutic interventions, thus reducing diagnostic ambiguity. The remainder of the paper is structured as follows. The section Material and Methods details the materials and methods used in the proposed study. The section Results shows the results obtained from the study and the usage of the proposed framework. The section Discussion presents an extensive review of related work in the diagnosis of PD and discusses the contributions of this work. The section Conclusion concludes the paper, highlighting its limitations, which are further discussed in the section Future Scope.

Fig. 1.

Fig. 1

Multi-Modal Deep Learning Framework for Early-Stage PD Detection. The proposed system integrates diverse data modalities, including audio speech patterns, motor skill drawings, neuroimaging (DATSCAN, MRI), and cardiovascular signals. Each modality undergoes specialised preprocessing before being processed through dedicated deep-learning models. The extracted features are fused using a multi-head attention mechanism, followed by probabilistic disease classification to enhance early detection accuracy.

Materials and methods

Modality-wise dataset description

The modality-wise data distribution is given in Table 1. The dataset has eight modalities: Audio 1, Audio 2, DaTSCAN images, Drawing Task 1, Drawing Task 2, MRI protocol 1, MRI protocol 2, and ECG images. In total, 1,802 samples were collected, 523 from Parkinson’s disease (PD) subjects and 1,279 from healthy controls. These totals count files rather than unique individuals, since an individual can contribute samples to several modalities.

Table 1.

Modality-wise dataset distribution showing PD vs healthy samples.

Modality Healthy Samples PD Samples Total Samples
Audio 1 21 16 37
Audio 2 21 15 36
DaTSCAN 30 30 60
Drawing Task 1 36 36 72
Drawing Task 2 15 15 30
MRI 1 610 221 831
MRI 2 30 30 60
ECG Images 516 160 676
Total 1279 523 1802

The data records come from various publicly available sources and do not form one coherent cohort; the DaTSCAN, EEG, and ECG data were therefore most likely not obtained from the same participants. To reduce potential label inconsistency and the training bias introduced by combining modalities across individuals, we kept the original labels of each dataset as determined by clinical diagnosis or accompanying documentation. During training, fusion was carried out only at the feature level, with each modality handled by its own subnetwork. This architecture lets the model learn modality-specific patterns without requiring one-to-one alignment between samples. In addition, stratified sampling within each modality ensured balanced representation of PD and healthy controls in the training, validation, and testing splits.
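The per-modality stratified splitting can be sketched as follows. The 80/10/10 ratio and the Drawing Task 2 example (15 healthy + 15 PD files, per Table 1) are illustrative assumptions; the paper does not state the exact hold-out ratios here.

```python
import random
from collections import defaultdict

def stratified_split(samples, train=0.8, val=0.1, seed=42):
    """Split one modality's file list while preserving the PD/healthy ratio.
    `samples` is a list of (file_id, label) pairs; labels are kept exactly as
    given in each source dataset (no cross-dataset harmonization)."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in by_label.items():
        rng.shuffle(items)
        n = len(items)
        n_tr, n_va = int(n * train), int(n * val)
        splits["train"] += items[:n_tr]
        splits["val"] += items[n_tr:n_tr + n_va]
        splits["test"] += items[n_tr + n_va:]
    return splits

# e.g. Drawing Task 2: 15 healthy + 15 PD files
data = [(f"h{i}", "healthy") for i in range(15)] + [(f"p{i}", "pd") for i in range(15)]
splits = stratified_split(data)
```

Because shuffling happens within each label group, every split retains the original class balance of that modality.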

Clinical deployment pathway

Whilst real-world deployment remained outside the scope of this study, a pragmatic clinical deployment roadmap was drawn up to guide future translation of the proposed framework. The model can be implemented in a hybrid cloud-edge system in which lightweight processes such as feature extraction and noise filtering are executed on mobile devices or hospital workstations (edge), while heavier inference workloads are offloaded to cloud resources. This division maintains scalability, fast response, and efficient utilization of resources in high-resource hospitals as well as small clinics.

Integration with hospital IT systems is a prerequisite for deployment. To that end, the framework is compatible with mainstream Electronic Medical Record (EMR) systems via the HL7/FHIR interoperability standards, so it can be incorporated into the workflow of a healthcare facility with minimal friction. In addition, calibration and harmonization plans are required to manage data obtained from various devices. Steps such as MRI field-strength normalization and DaTSCAN vendor harmonization will limit inconsistencies in imaging data, while quality-control pipelines for speech, handwriting, and ECG data will minimize cross-device variation. Together, these mechanisms make the framework more robust and generalizable across clinical sites.

To evaluate feasibility, a preliminary breakdown of deployment costs is provided. Edge preprocessing devices are estimated at $200–300 per device, while cloud GPU rental is estimated at $50–100 per month, depending on workload. Storage and system maintenance add roughly $20–30 per month. These estimates give a realistic picture of the cost of practical implementation.

Dataset demographics & bias considerations

While Table 1 summarizes the modality-wise distribution of healthy and PD samples, it is important to note that the datasets were compiled from publicly available sources, and detailed demographic metadata (e.g., age, gender, ethnicity, regional origin) was not consistently available. As a result, the present study cannot explicitly quantify demographic diversity across modalities or cohorts. This limitation introduces the possibility of population bias, as underrepresentation of certain demographic groups (for example, specific ethnicities or geographic regions) could affect the generalizability of the results.

We acknowledge this limitation explicitly as an important source of potential variance. We also highlight that, while our framework is designed to learn modality-specific biomarkers and mitigate dataset-specific biases through cross-validation and regularization strategies, future clinical validation on demographically diverse populations remains a critical step.

Dataset used

The proposed framework, MultiParkNet, leveraged multiple heterogeneous datasets to capture diversified physiological and neurological biomarkers associated with the disease. The multi-modal approach enhanced diagnostic accuracy by integrating speech, handwriting, neuroimaging, and electrophysiological data. These datasets were preprocessed and analyzed using specialised deep learning architectures to extract and interpret disease-specific features effectively. The datasets utilized in this research are described below.

The Mobile Device Voice Recordings at King’s College London (MDVR-KCL) dataset captures various speech impairments such as reduced vocal intensity, irregular pitch modulation, and increased jitter and shimmer values. The recordings include sustained phonation, sentence reading, and spontaneous speech, allowing a comprehensive analysis of vocal deterioration. These speech samples can serve as essential acoustic biomarkers, facilitating the early detection of PD (see Table 2 for details of the dataset and Fig. 2 for the Mel spectrograms of the samples).

Table 2.

MDVR-KCL dataset overview.

Sample ID Subject Type Phonation Task Duration (s) Mean Pitch (Hz) Jitter (%) Shimmer (%) Harmonics-to-Noise Ratio (HNR)
S01 PD Sustained “A” 5.2 185.4 0.78 3.12 18.5
S02 Control Sentence Reading 6.1 198.7 0.41 1.87 22.3
S03 PD Spontaneous Speech 7.8 172.9 1.12 4.56 16.7
S04 Control Sustained “E” 6.3 201.1 0.39 2.01 23.5
S05 PD Spontaneous Speech 8.2 169.3 1.34 5.02 15.9
S06 Control Sentence Reading 6.7 192.5 0.45 2.12 21.7

Fig. 2.

Fig. 2

Visualization of the Mel spectrograms generated for the samples listed in Table 2 from the MDVR-KCL dataset.

The Parkinson’s Drawing dataset41 consists of spiral and wave drawings captured from PD patients and healthy individuals using digital tablets. These drawings are used to assess motor impairments, particularly hand tremors and bradykinesia, which manifest as distortions, irregular line consistency, and pressure variations. The dataset included multiple samples per participant, providing a reliable basis for distinguishing Parkinsonian movement patterns from normal handwriting characteristics. Some samples from the data are shown in Fig. 3.

Fig. 3.

Fig. 3

Visualization of the samples from Parkinson Drawing dataset showing two types of drawing: spiral and wave of healthy and PD patients.

Furthermore, the NTUA Parkinson Dataset42 is utilized, containing Dopamine Transporter (DAT) scans that visualize dopamine activity in the brain’s striatum. This dataset is considered crucial in identifying the extent of dopamine depletion, a PD hallmark. Each scan is labelled based on clinical diagnosis and includes PD and non-PD cases. The dataset offers high-resolution neuroimaging data that highlights variations in dopamine transporter density. The dataset also contains structural Magnetic Resonance Imaging (MRI) scans that help detect neurodegenerative changes in brain regions associated with PD. These scans provide a volumetric representation of the brain, enabling the identification of structural atrophy and white matter degeneration. The dataset consists of T1-weighted MRI images labelled with disease severity levels, allowing a more nuanced analysis of disease progression. DaTSCAN and MRI samples with their respective categories are shown in Fig. 4.

Fig. 4.

Fig. 4

Visualization of the data samples of DaTSCANs and MRI scans from the NTUA Parkinson Dataset. From left to right: (a) normal patient MRI, (b) PD patient MRI, (c) normal patient DaTSCAN, and (d) PD patient DaTSCAN.

The EEG motor movement dataset43 consists of electrophysiological recordings that capture neural activity patterns associated with motor functions. This dataset includes EEG signals from PD patients and healthy controls during motor imagery and movement tasks. This helps analyze cortical activity disruptions, slow-wave oscillations, and abnormalities in beta-band frequencies indicative of Parkinson’s disease. The dataset comprises multi-channel recordings with time-series data corresponding to different brain regions. A sample EEG visualised using PhysioNet is shown in Fig. 5.

Fig. 5.

Fig. 5

Visualization of the EEG data sample from the PTB database used in this study.

Modality balance analysis

A key challenge in our multi-source fusion approach is the imbalanced sample size across modalities. For example, the MRI-1 dataset includes 831 samples, far more than the 30 samples of the Drawing Task-2 modality. Such disparities risk over-fitting to the high-sample modalities at the expense of the low-sample ones. This imbalance can bias the learning mechanism, often leaving underrepresented modalities underutilized and degrading the generalization performance of the model.

To address this, we analyzed modality-wise learning behavior. We examined performance metrics (accuracy, recall, and precision) per modality and assessed how the modalities with fewer samples contributed to the overall decision-making process. Results indicated that high-sample modalities (e.g., MRI and speech data) yielded near-saturated feature representations, whereas low-sample modalities (e.g., certain drawing tasks and gait signals) received smaller normalized weights in the fusion layers. To counteract these effects, we adopted two optimization schemes:

Weighted loss functions: Assigning a greater loss weight to underrepresented modalities gave their contributions more influence during training, balancing gradient flow across modalities.

Targeted data augmentation: Augmentation methods (rotation, scaling, synthetic perturbations of trajectories) were used to artificially enlarge the data for small-sample modalities such as Drawing Task-2. The added variability helped avoid over-fitting and made the model more robust.

Overall, this analysis shows that modality imbalance in multi-source learning must be tackled directly. Adding weighted losses and augmentation techniques improved not only sensitivity to low-sample modalities but also the cross-modal diagnostic task as a whole.
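The weighted-loss scheme above can be illustrated with a small NumPy sketch. The inverse-frequency weighting shown here is one plausible formula, assumed for illustration; the paper does not specify the exact weights it used. The class counts come from Table 1.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Class-weighted negative log-likelihood: underrepresented classes get a
    larger weight so their gradients are not swamped by the majority class."""
    eps = 1e-12
    w = class_weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float((w * nll).sum() / w.sum())

# Inverse-frequency weights for the ~2.4:1 healthy-to-PD ratio in Table 1
counts = np.array([1279, 523])            # [healthy, PD]
weights = counts.sum() / (2.0 * counts)   # larger weight on the PD class

probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.8, 0.2]])
labels = np.array([0, 1, 1])              # last sample: a misclassified PD case
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, a misclassified PD sample contributes more to the loss than the same error on a healthy sample, steering the optimizer toward higher sensitivity.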

Class imbalance handling

Although the overall ratio of healthy to PD patient samples is approximately 2.4:1, several explicit measures were implemented to mitigate the potential adverse effects of class imbalance on the model’s predictions and diagnostic sensitivity:

  1. Fusion mechanism and attention weights: The multimodal fusion block incorporated dynamic intermodal weighting, automatically adjusting emphasis toward features from underrepresented modalities. This further helped prevent bias toward the healthy class, especially in ambiguous cases spanning multiple data types.

  2. Weighted loss functions: During model training, a greater loss weight was assigned to underrepresented classes and modalities with fewer samples (e.g., PD cases, drawing modalities, and ECG PD images). This ensured that the optimization process penalized misclassification of PD samples more strongly, thereby maintaining high sensitivity to the minority class rather than defaulting predictions to the majority (healthy) class.

  3. Cross-validation & performance metrics: A stratified 10-fold cross-validation approach was used, preserving the class distribution in each training and testing fold. Notably, the model achieved high recall (97.94%) and F1-score (98.04%) across folds, as shown in Table 3. Consistently strong recall, the metric most sensitive to underdiagnosis of PD, demonstrates that the model’s ability to detect PD cases remains unaffected by class imbalance. Furthermore, the confusion matrices in Fig. 14 show minimal false negatives, reinforcing the model’s robustness against such imbalance.

  4. Targeted data augmentation: Augmentation schemes such as rotation, scaling, synthetic perturbations, and noise injection were specifically applied to low-sample modalities (notably in drawing and ECG data for PD cases). This increased the effective representation of PD instances in the training set and enhanced the model’s exposure to diverse pathological patterns, directly countering imbalance effects.

  5. Evaluation on minority class: An ablation study was conducted, omitting individual modalities, verifying that removal of a modality or a class-imbalance countermeasure led to a corresponding drop in recall and accuracy. This experiment illustrates the efficacy of the adopted strategies in upholding minority class detection performance.
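The recall and F1 figures quoted above follow directly from a confusion matrix. A minimal sketch, with hypothetical fold counts chosen only to mirror the "few false negatives" behavior (these are not the paper's actual numbers):

```python
import numpy as np

def binary_metrics(cm):
    """Recall, precision, and F1 for the PD (positive) class from a 2x2
    confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    recall = tp / (tp + fn)          # sensitivity: most affected by missed PD
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Hypothetical fold with few false negatives
cm = np.array([[120, 3],
               [2, 55]])
recall, precision, f1 = binary_metrics(cm)
```

Because recall depends only on the bottom row (FN, TP), it directly measures how many true PD cases slip through, which is why it is the headline metric for class-imbalance robustness here.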

Table 3.

Comparative analysis of deep learning models for Parkinson’s disease detection.

Model Name Accuracy (%) Precision (%) Recall (%) F1 Score (%) AUC-ROC PR-AUC Interpretability
MultiParkNet 96.74 98.15 97.94 98.04 0.986 High GradCAM & Attention
ResNet50 94.85 93.90 94.10 94.00 0.958 Medium
VGG19 93.62 92.10 93.20 92.64 0.960 Medium
InceptionV3 94.33 94.00 93.80 93.90 0.951 Medium
EfficientNetB0 94.95 95.10 94.00 94.55 0.960 Medium Moderate
DenseNet121 95.14 95.70 94.50 95.10 0.956 Medium CAM-based
MobileNetV2 93.80 93.20 92.40 92.80 0.950 Medium
XceptionNet 95.20 94.50 95.10 94.80 0.956 Medium
CNN (Scratch) 91.00 89.30 90.40 89.84 0.954 Low Fully transparent
LSTM 90.00 94.85 85.98 90.20 0.937 Medium

Fig. 14.

Fig. 14

Confusion matrix plots of the different deep learning models tested on the dataset, along with MultiParkNet. These plots give a clear picture of each model’s false positives.

Handling label heterogeneity across datasets

The datasets used in this study (MDVR-KCL speech, NTUA neuroimaging, and Parkinson’s Drawing datasets) originate from distinct clinical cohorts with potentially different diagnostic criteria. Such heterogeneity introduces the possibility of label noise, as the same patient phenotype may be categorized differently depending on institution-specific protocols.

To minimize this effect, we employed the following strategies:

  1. Modality-independent learning: Each modality was processed by dedicated sub-networks, preventing cross-contamination of diagnostic criteria between datasets. The MDVR-KCL speech data, NTUA neuroimaging data, and Parkinson’s Drawing data maintained their original diagnostic labels without harmonization attempts that could introduce artifacts.

  2. Feature-level fusion: Fusion was restricted to the representation level, avoiding direct patient-level alignment across datasets that could introduce label artifacts.

  3. Biologically-grounded features: Our architectures were designed to extract objective PD-specific markers (e.g., dopamine transporter density in DaTSCAN, tremor oscillations in speech, handwriting micrographia), which are consistent across diagnostic standards.

  4. Cross-dataset validation: We employed stratified 10-fold cross-validation across datasets. The framework consistently achieved 98.15% ± 1.24% accuracy, demonstrating that performance generalizes beyond dataset-specific diagnostic conventions.

Theoretical justification: The biological basis of our approach is crucial here. PD’s pathophysiology, the progressive loss of dopaminergic neurons in the substantia nigra, produces consistent peripheral manifestations (tremor, bradykinesia, micrographia, vocal changes) regardless of the diagnostic criteria used to initially label patients. Our modality-specific feature extractors target these objective biomarkers rather than subjective clinical assessments.

Addressing the theoretical performance limit: While label noise may theoretically limit performance, our 96.74% test accuracy suggests we are capturing genuine disease signals rather than fitting to diagnostic artifacts. The consistency across modalities and the biological plausibility of extracted features (visualized through GradCAM) provide evidence that performance reflects true pathophysiological detection rather than dataset overfitting.

Methodology

The designed multi-modal Parkinson’s disease detection system uses a deep learning architecture tailored to each modality (see Fig. 6), namely speech, handwriting, DAT scans, MRI scans, and EEG signals, to increase diagnostic accuracy. The pipeline begins with rigorous preprocessing steps specific to each dataset, aimed at removing noise, standardizing features, and augmenting data for robustness. The core model is a hybrid ensemble deep learning framework that processes each modality with a separate deep neural network optimized for feature extraction. The speech data is modeled with a Convolutional Neural Network–Bidirectional Long Short-Term Memory (CNN-BiLSTM) network, where the convolutional layers act as spectral feature extractors and the BiLSTM layers capture temporal dependencies. Handwriting images are processed with deep spatial features from a ResNet50 backbone, with an additional attention mechanism to highlight fine motor inconsistencies. A dual-stream CNN takes DAT and MRI scans, directing them through ResNet50 and EfficientNetB7 to learn structural and intensity-based patterns respectively, which are then combined via cross-modal attention fusion to increase diagnostic precision. A CNN-LSTM hybrid model analyses the EEG signals, applying 1D convolutional filters to extract temporally localised patterns while the LSTMs model long-term dependencies. The extracted feature vectors from all modalities are concatenated and fed into a fully connected fusion network with self-attention layers, which learns inter-modal correlations in order to exploit correlated patterns across multiple physiological markers. Finally, the diagnosis is produced by a multi-head classifier with softmax activation, predicting Parkinson’s presence or severity level.
An end-to-end optimisation strategy with weighted loss functions then trains the entire framework, taking the imbalances of real-world datasets into account. We perform transfer learning and fine-tuning on MRI and DAT scans, and improve the separability of features in speech and EEG data using contrastive learning. With a cyclical learning rate scheduler, the AdamW optimiser, dropout, and batch normalisation, the model is trained towards the best possible convergence while mitigating overfitting.
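The self-attention fusion over modality features can be illustrated with a minimal NumPy sketch. The random projection matrices stand in for learned parameters, and the four 32-dimensional modality embeddings are illustrative assumptions, not the framework's actual dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)

def multi_head_attention(x, n_heads=4):
    """Minimal single-layer multi-head self-attention over a sequence of
    per-modality feature vectors (one row per modality). The projection
    weights are random stand-ins for learned parameters."""
    n_tokens, d_model = x.shape
    d_head = d_model // n_heads
    out = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_head))
                      for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(d_head)
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn = e / e.sum(axis=-1, keepdims=True)   # each row sums to 1
        out.append(attn @ v)
    return np.concatenate(out, axis=-1)            # heads re-assembled

# Four modality embeddings (e.g. speech, drawing, imaging, EEG), 32-d each
modality_feats = rng.normal(size=(4, 32))
fused = multi_head_attention(modality_feats)
fused_vector = fused.mean(axis=0)                  # pooled joint representation
```

Each modality's output row is a weighted mix of all modalities, which is how the attention weights realize the dynamic inter-modal weighting described above.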

Fig. 6.

Fig. 6

Methodology of the proposed framework: MultiParkNet.

Dataset preprocessing

The datasets underwent extensive preprocessing, tailored to their respective modalities, to ensure optimal model performance and reliable PD detection. This step removed noise, standardised the inputs, and extracted essential features, thereby enhancing classification accuracy. The preprocessing pipeline for each dataset is described below, with a diagrammatic representation in Fig. 7.

Fig. 7.

Fig. 7

Overview of preprocessing pipelines for multimodal data. This figure illustrates the comprehensive preprocessing steps applied to five distinct data modalities: Speech, Handwriting (Motor), DAT Scan, MRI Image, and EEG. Each row represents a specific modality and showcases the sequential flow of preprocessing techniques employed to enhance data quality and extract relevant features.

For the MDVR-KCL dataset, the raw speech recordings contained background noise, inconsistent sampling rates, and silent intervals, which were addressed before feature extraction. The audio files were converted to a uniform sampling rate (16 kHz) for consistency. Amplitude levels were then normalised to remove variations due to differences in recording equipment. Thereafter, spectral subtraction and Wiener filtering were applied to remove background noise. Voice activity detection (VAD) was used to isolate speech segments and discard silent regions. Next, Mel-frequency cepstral coefficients (MFCCs) were computed to capture spectral characteristics of speech. Pitch, jitter, and shimmer values were extracted to assess voice tremors indicative of PD. Chromagrams and spectral contrast features were also generated to enhance the detection of vocal disorders. Finally, time stretching and pitch shifting were performed to simulate variations in speech patterns as part of data augmentation, and noise injection was applied to make the model robust against real-world variations.
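The jitter and shimmer measures mentioned above can be sketched with the common local-perturbation definition (mean absolute cycle-to-cycle difference relative to the mean); the paper's exact extraction settings are not specified, and the cycle measurements below are synthetic.

```python
import numpy as np

def jitter_shimmer(periods, amplitudes):
    """Local jitter (%) and shimmer (%): mean absolute difference between
    consecutive pitch periods / peak amplitudes, relative to their means."""
    periods = np.asarray(periods, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods) * 100
    shimmer = np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes) * 100
    return jitter, shimmer

# Synthetic cycle-to-cycle measurements: PD voices show larger fluctuations
steady = jitter_shimmer([5.0, 5.01, 4.99, 5.0], [1.0, 1.01, 0.99, 1.0])
tremor = jitter_shimmer([5.0, 5.4, 4.7, 5.3], [1.0, 1.3, 0.8, 1.2])
```

Larger jitter/shimmer values in the second voice reflect the vocal instability that these features are meant to capture in PD speech.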

The handwriting dataset consisted of scanned images of spiral and wave drawings, which had to be refined to highlight PD-related movement inconsistencies. The images were resized to a uniform resolution, and pixel values were normalised between 0 and 1 for stable model training. Next, Canny edge detection was applied to extract the main drawing contours for contour detection and stroke analysis. The Hough transform was used to assess line curvature, capturing signs of tremor. Variation in stroke width along with the pressure distribution was computed using pixel intensity analysis. Gaussian filtering was then used to remove background noise, and the images were converted to binary masks for better contrast and easier feature extraction. The Gray-Level Co-occurrence Matrix (GLCM) extracted spatial texture features, while Fourier transforms analysed frequency-based stroke variations.
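The GLCM step can be sketched directly in NumPy. This is a toy single-offset version with a hypothetical 4-level quantized patch; production pipelines typically use a library implementation with multiple offsets and angles.

```python
import numpy as np

def glcm(image, levels=4, dx=1, dy=0):
    """Gray-Level Co-occurrence Matrix for one pixel offset (dx, dy),
    normalized to a joint probability table, plus the contrast feature."""
    h, w = image.shape
    m = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            m[image[y, x], image[y + dy, x + dx]] += 1
    p = m / m.sum()
    i, j = np.indices((levels, levels))
    contrast = float(((i - j) ** 2 * p).sum())  # high for abrupt level changes
    return p, contrast

# Tiny quantized patch standing in for a binarized/quantized stroke image
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
p, contrast = glcm(patch)
```

The contrast statistic rises when neighbouring pixels differ sharply, which is one way texture features can register the irregular strokes of tremor-affected drawings.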

Next, for the NTUA dataset, the DAT scans provided critical insights into dopamine transporter availability, which required extensive preprocessing for accurate interpretation. The scan dimensions were standardised to a fixed input size for deep learning. All scans were aligned using affine transformation. Non-local means (NLM) filtering was applied to reduce scan noise, and Contrast-Limited Adaptive Histogram Equalization (CLAHE) was used to enhance image contrast. Next, the K-Means clustering algorithm was applied to segment the relevant regions of the brain, and thresholding techniques were applied to isolate dopamine transporter areas. Dopamine transporter intensity variations were also computed for assessing PD-related deficiencies. Texture descriptors, including Gabor filters, were used to analyse the fine-grained structure of the affected regions.
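The K-Means segmentation idea can be sketched in one dimension over voxel intensities. This is an illustrative stand-in, not the authors' code: quantile initialization and the synthetic bimodal intensities are assumptions chosen to make the example deterministic.

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20):
    """Tiny k-means over voxel intensities, initialized at spread quantiles;
    a stand-in for segmenting high-uptake regions from background."""
    centers = np.quantile(values, np.linspace(0.25, 0.75, k))
    for _ in range(iters):
        # Assign each voxel to its nearest center, then recompute centers
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    return centers, labels

# Synthetic bimodal intensities: dim background vs bright striatal uptake
vox = np.concatenate([np.full(50, 20.0), np.full(50, 200.0)])
centers, labels = kmeans_1d(vox)
bright_mask = labels == np.argmax(centers)   # candidate transporter-rich voxels
```

Thresholding the cluster with the brighter centroid then yields the dopamine-transporter mask described in the text.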

MRI scans help assess structural brain changes, but they require extensive standardisation for effective deep-learning classification. The Brain Extraction Tool (BET) removed the non-brain tissues, and bias field correction was applied to eliminate intensity inhomogeneities. The voxel intensity distributions were normalised to a fixed range of 0 to 255, and the pixel intensities were standardised across scans using z-score normalisation. Fuzzy C-means clustering was used to separate brain tissues. Regions of interest (ROIs) relevant to PD, including the substantia nigra and basal ganglia, were identified. Then, volumetric features of the affected brain regions were computed, and wavelet decomposition was used for multi-resolution feature analysis.
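The two normalisation steps amount to the following; the synthetic scan shape and intensity statistics are illustrative only:

```python
import numpy as np

def zscore(volume):
    """Standardize voxel intensities to zero mean and unit variance."""
    return (volume - volume.mean()) / volume.std()

def rescale_255(volume):
    """Min-max rescale intensities to the fixed 0-255 range."""
    lo, hi = volume.min(), volume.max()
    return (volume - lo) / (hi - lo) * 255.0

# Toy MRI volume: 8 slices of 64x64 voxels with arbitrary intensities.
scan = np.random.default_rng(0).normal(120, 30, size=(8, 64, 64))
z = zscore(scan)        # zero mean, unit variance
r = rescale_255(scan)   # bounded in [0, 255]
```

Applying z-scoring per scan, after the 0-255 rescaling, keeps intensity distributions comparable across acquisition protocols.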

The EEG signals were highly noise-sensitive, requiring advanced preprocessing to ensure reliable feature extraction. Bandpass filtering (0.5-45 Hz) was applied to remove unwanted low- and high-frequency noise. Next, independent component analysis (ICA) was used to eliminate eye-blink and muscle artefacts. The EEG recordings were divided into fixed-length epochs of 2-second windows. The amplitude values were normalised using z-score transformation. The power spectral density (PSD) was also computed to analyse the signal frequency content, and wavelet coefficients were extracted using the discrete wavelet transform (DWT). Event-related potentials (ERPs) were used to identify task-related changes in brain activity.
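A sketch of the bandpass filtering and 2-second epoching described above, using SciPy's Butterworth filter; the 250 Hz sampling rate and filter order are assumptions, not values stated in the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, lo=0.5, hi=45.0, order=4):
    """Zero-phase Butterworth bandpass, matching the 0.5-45 Hz band used above."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def epoch(signal, fs, win_sec=2.0):
    """Split a 1-D recording into fixed-length, non-overlapping epochs."""
    n = int(fs * win_sec)
    n_epochs = len(signal) // n
    return signal[: n_epochs * n].reshape(n_epochs, n)

fs = 250  # a common EEG sampling rate (assumption)
t = np.arange(10 * fs) / fs
# 10 Hz alpha-band tone buried under a large 0.1 Hz drift.
raw = np.sin(2 * np.pi * 10 * t) + 2 * np.sin(2 * np.pi * 0.1 * t)
clean = bandpass(raw, fs)   # drift below 0.5 Hz is strongly attenuated
epochs = epoch(clean, fs)   # 2-second windows for downstream features
```

Each epoch would then be z-scored and passed to the PSD and DWT feature extractors.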

Theoretical and biological basis for multi-source fusion

The use of multi-source data in the proposed MultiParkNet system is supported by the theoretical assumption that the various modalities correspond to different, but overlapping, dimensions of Parkinson's disease pathology. Neuroimaging measures based on MRI and DaTscan give in-depth knowledge of the structural and functional abnormalities of the brain, whereas speech, handwriting, gait, and physiological signals reflect the peripheral expressions of motor and cognitive decline. Combining the modalities augments feature diversity, improves model generalization, and reduces overfitting compared to using any single modality separately. Although the collected datasets may vary across patients or acquisition centers, standardized preprocessing, such as normalization, resolution matching, denoising, and data augmentation, maps the extracted features of the different datasets to comparable representation spaces. Previous multimodal diagnostic research has shown that such integration can attain considerable improvements in predictive accuracy without any loss of feature interpretability.

Biologically, PD produces systematic changes that are mirrored consistently across the various forms of data. The dysfunction of the basal ganglia caused by the progressive loss of dopaminergic neurons in the substantia nigra impairs both central and peripheral nervous functions, producing symptoms that include tremors, bradykinesia, rigidity, dysfluency in speech, and micrographia in handwriting. Disease-specific markers, including hypoactivity on DaTscan images, voice with reduced pitch variation, and micro-movements in handwriting, have proven robust across patient demographics when measured under standardized conditions. MultiParkNet can therefore combine heterogeneous data sources while maintaining the biological signatures pertinent to the diagnosis of PD, based on the cross-modal consistency of these putative biomarkers.

To ensure the homogeneity of the combined datasets, matched preprocessing pipelines were utilized on each modality to scale features, harmonize temporal characteristics, and reduce noise. Statistical distribution analyses were performed to verify that inter-dataset variability did not dominate the disease-specific patterns. This grounds the combination of multi-source data within a single predictive framework both theoretically and biologically, and supports reliability and clarity in clinical integration for the detection of PD.

Model architecture

The proposed framework consisted of multiple modality-specific sub-models to extract high-dimensional feature representations from diverse biomedical data sources. The extracted features were fused using an attention-based feature integration mechanism, followed by a fully connected classification network.

The speech analysis model (see Fig. 8) aimed to extract meaningful vocal features from speech signals and exploit temporal dependencies for distinguishing PD speech patterns from healthy speech. This was achieved using a combination of a CNN for feature extraction and a bidirectional LSTM for sequence modeling. The raw signal was first converted into a Mel spectrogram, a time-frequency representation that captured the speech characteristics. For a discrete-time speech signal s(t), the Short-Time Fourier Transform (STFT) was applied to obtain the frequency representation using Eq. 1.

S(f, t) = \sum_{n=0}^{N-1} s(n) \, w(n - t) \, e^{-j 2 \pi f n / N} \quad (1)

In the above equation, S(f, t) was the spectrogram value at frequency f and time t, s(n) was the discrete-time speech signal, and w(n - t) was the Hamming window function centered at t. N is the FFT window size, and e^{-j 2\pi f n / N} represented the Fourier basis function. The power spectrum of the signal was obtained as in Eq. 2.

P(f, t) = |S(f, t)|^2 \quad (2)

This power spectrum was then mapped to the Mel scale using triangular filters H_m(f), simulating human auditory perception, as in Eq. 3.

M(m, t) = \sum_{f} H_m(f) \, P(f, t) \quad (3)

where M(m, t) was the Mel spectrogram with m frequency bins. The Mel spectrogram was passed through multiple convolutional layers to extract local patterns from the speech features. Each convolutional layer applied multiple learnable filters to detect frequency- and time-based dependencies.
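Eqs. 1-3 can be reproduced end to end with a windowed FFT followed by a triangular Mel filterbank; the FFT size, hop length, and number of Mel bins below are illustrative choices, not the paper's settings:

```python
import numpy as np

def power_spectrogram(s, n_fft=512, hop=160):
    """Eqs. (1)-(2): Hamming-windowed FFT frames, then squared magnitude."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(s) - n_fft) // hop
    frames = np.stack([s[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft//2 + 1)

def mel_filterbank(n_mels, n_fft, fs):
    """Eq. (3): triangular filters H_m(f) spaced evenly on the Mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

fs = 16000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)        # one second of a 440 Hz tone
P = power_spectrogram(s)               # P(f, t), one row per frame
M = P @ mel_filterbank(40, 512, fs).T  # Mel spectrogram M(m, t)
```

The resulting M would be the input image fed to the convolutional layers of the speech sub-model.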

Fig. 8.


Architectural diagram of the proposed model for Speech Data (MDVR-KCL).

Here, R is the pooling window size, which ensures that only the most salient features are preserved. Next, a BiLSTM network was applied to capture long-term temporal dependencies in speech. Unlike standard LSTMs, this learned both past and future contextual information, improving robustness on sequential speech data. Each LSTM cell consisted of three gates (input, forget, and output) that regulated the flow of information. First, the forget gate determined how much past information should be retained using Eq. 4.

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \quad (4)

The memory cell was updated using Eq. 5.

C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_C [h_{t-1}, x_t] + b_C), \quad i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \quad (5)

Lastly, the output gate determines which information should be propagated forward using Eqs. 6 and 7.

o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \quad (6)
h_t = o_t \odot \tanh(C_t) \quad (7)

A dropout layer was applied to prevent overfitting. Finally, a sigmoid activation function produced the probability of PD as P(y) = \sigma(z) = 1 / (1 + e^{-z}), where P(y) \in (0, 1) was the predicted probability of the patient having PD. The binary cross-entropy loss function optimized the network using Eq. 8.

\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] \quad (8)

Where y_i is the ground truth label, \hat{y}_i is the predicted probability, and N is the total number of samples.
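Eq. 8 is straightforward to compute directly; the toy labels and probabilities below are illustrative:

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Eq. (8): mean negative log-likelihood for binary labels."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1, 0])           # ground truth labels
y_hat = np.array([0.9, 0.1, 0.8, 0.3])  # predicted PD probabilities
loss = binary_cross_entropy(y, y_hat)
```

The loss shrinks toward zero as the predicted probabilities approach the true labels, which is what the optimizer exploits during training.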

The handwriting analysis module (see Fig. 9) was designed to extract fine-grained spatial and structural features from the handwritten images using ResNet50. An attention mechanism was integrated to improve feature discrimination, ensuring that the network focused on the crucial regions affected by PD, such as tremors, irregular spacing, and character distortions.

Fig. 9.


Architectural diagram of the proposed model for Handwriting analysis.

The model takes two inputs:

I_{\text{spiral}}, \; I_{\text{wave}} \in \mathbb{R}^{H \times W \times C}

where H and W represent the spatial dimensions of the input, and C denotes the number of channels (e.g., grayscale or RGB).

Here, the max pooling layer is a 2×2 max pooling operation that compresses the spatial size of the feature maps while retaining essential information. The operation takes the maximum value within each 2×2 window, thus enhancing translation invariance and decreasing computational cost. This step, given in Eq. 9, also makes the network less sensitive to small input variations.

P(i, j) = \max_{(m, n) \in R_{i,j}} F(m, n) \quad (9)

This operation reduces the size of the feature map to H/2 × W/2.

The Global Average Pooling (GAP) layer converts each feature map into a single value by averaging over all spatial locations. This compression yields a 64-dimensional vector of the essential details. The advantage of GAP is its ability to reduce the number of parameters, thus minimizing overfitting compared to fully connected dense layers, using Eq. 10.

g_k = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_k(i, j) \quad (10)

where k indexes the feature maps and F_k(i, j) denotes the activation at spatial position (i, j).
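Eq. 10 reduces each feature map to its spatial mean; a minimal sketch with a toy feature tensor:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Eq. (10): average each H x W map down to one scalar per channel."""
    # feature_maps: (H, W, K) -> output: (K,)
    return feature_maps.mean(axis=(0, 1))

fmap = np.zeros((6, 6, 3))
fmap[:, :, 0] = 1.0   # constant map -> average 1.0
fmap[0, 0, 1] = 36.0  # single spike of 36 over 36 cells -> average 1.0
g = global_average_pool(fmap)
```

Note how GAP is insensitive to where activation mass sits inside the map: a uniform map and a single concentrated spike of equal total yield the same pooled value.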

The image analysis module (see Fig. 10) was designed to process DATSCAN and magnetic resonance images to enhance predictive performance for medical image analysis. The architecture consists of two sub-models: (1) the DATSCAN sub-model, which processes DATSCAN images using a 3D convolutional neural network (CNN), and (2) the MRI sub-model, which incorporates both 2D and 3D MRI images using a dual-branch CNN. The extracted features from both models are fused to create a comprehensive feature representation.

Fig. 10.


Architectural diagram of the proposed model for Image analysis for DATSCAN and MRI data.

The DATSCAN sub-model is a 3D CNN designed to process volumetric input data. The model follows a sequence of convolutional, pooling, and global averaging layers. Given an input DATSCAN image X_D, the feature extraction process is performed as follows:

A 3D convolutional layer with 16 filters of kernel size Inline graphic and ReLU activation is applied as shown in Eq. 11:

F_1 = \text{ReLU}(W_1 * X_D + b_1) \quad (11)

where Inline graphic and Inline graphic denote filter weights and biases.

Subsequently, max pooling with a Inline graphic window reduces spatial dimensions, as in Eq. 12:

P_1 = \text{MaxPool3D}(F_1) \quad (12)

A second 3D convolutional layer with 32 filters and ReLU activation refines feature extraction through Eq. 13:

F_2 = \text{ReLU}(W_2 * P_1 + b_2) \quad (13)

Global average pooling is applied to obtain a 64-dimensional feature vector in Eq. 14.

g_D = \text{GAP}(F_2) \in \mathbb{R}^{64} \quad (14)

Now, the 3D MRI branch processes volumetric MRI data X_M. The processing pipeline includes:

A 3D convolutional layer with 16 filters as in Eq. 15.

F_1^{M} = \text{ReLU}(W_1^{M} * X_M + b_1^{M}) \quad (15)

Max pooling with a Inline graphic window as shown in Eq. 16.

P_1^{M} = \text{MaxPool3D}(F_1^{M}) \quad (16)

A second 3D convolutional layer with 32 filters in Eq. 17.

F_2^{M} = \text{ReLU}(W_2^{M} * P_1^{M} + b_2^{M}) \quad (17)

Global average pooling to obtain a 32-dimensional feature vector as in Eq. 18.

g_M = \text{GAP}(F_2^{M}) \in \mathbb{R}^{32} \quad (18)

The final output of the model is computed as in Eq. 19.

F_{\text{fused}} = [\, g_D ; \; g_M ; \; g_{2D} \,] \quad (19)

where F_{\text{fused}} represents the fused feature vector, which is further processed for classification or regression tasks.

The proposed architecture efficiently integrates 2D and 3D medical imaging modalities using a CNN-based pipeline. The DATSCAN, 2D MRI, and 3D MRI submodels extract meaningful features combined through feature fusion. The final 128-dimensional feature vector is utilized for downstream medical analysis tasks.

The cardiovascular signal sub-model (see Fig. 11) is intended to identify useful features of ECG signals. This model combines convolutional neural networks (CNNs) for feature extraction with long short-term memory (LSTM) networks to learn sequential temporal dependencies in the ECG data. This combination of CNN and LSTM enables the model to identify both spatial and temporal patterns, making it suitable for cardiovascular signal analysis.

Fig. 11.


Architectural diagram of the proposed model for cardiovascular signal.

The ECG sub-model consists of four main components. The first is a stack of convolutional layers, which extract local patterns from the ECG signal, such as peaks, slopes, and other structural features. The convolution operation is defined as in Eq. 20.

F^{(l)} = f\left(W^{(l)} * F^{(l-1)} + b^{(l)}\right) \quad (20)

Where F^{(l-1)} represents the input feature map from the previous layer, W^{(l)} is the convolutional kernel, b^{(l)} is the bias term, and f is the activation function (ReLU in this case). The model applies two convolutional layers with dilated filters, which expand the receptive field and allow the network to capture broader temporal dependencies.

These layers reduce the spatial dimensionality of the feature maps by selecting the most prominent features from local regions. The max-pooling operation is shown in Eq. 21.

p_i = \max_{r \in R} f_{i + r} \quad (21)

Where R represents the pooling window. Max-pooling helps reduce computational complexity while retaining important spatial features.

The model is trained using the categorical cross-entropy loss function, which is defined as in Eq. 22.

\mathcal{L} = -\sum_{c=1}^{C} y_c \log \hat{y}_c \quad (22)

where C is the number of classes, y_c is the actual label, and \hat{y}_c is the predicted probability. The Adam optimizer is employed for parameter updates as shown in Eq. 23.

\theta_{t+1} = \theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \quad (23)

where \eta is the learning rate, and \hat{m}_t and \hat{v}_t represent the bias-corrected first and second moment estimates, respectively. This adaptive learning rate mechanism improves convergence and mitigates vanishing gradients.
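One Adam update (Eq. 23), including the bias-corrected moments, can be written out explicitly; the toy quadratic objective and learning rate below are illustrative, not the training configuration used in the paper:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Eq. 23) with bias-corrected moment estimates."""
    m = b1 * m + (1 - b1) * grad        # first moment (running mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (running mean of squares)
    m_hat = m / (1 - b1 ** t)           # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 starting from theta = 1 (gradient is 2 * theta).
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

Because the effective step is scaled by the ratio of moment estimates, the update size stays roughly bounded by the learning rate even when raw gradients vary widely in magnitude.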

The ECG sub-model successfully integrates CNNs for spatial feature extraction and LSTMs for sequence modeling, giving a strong framework for cardiovascular signal analysis. The combination of convolutional and recurrent architecture enables precise ECG classification, making it effective for real-time heart health monitoring and predicting cardiovascular disease.

After extracting the high-dimensional features from each modality-specific deep learning model, a robust fusion strategy combined the different biomedical information. This final step was important to exploit the complementary features of the multimodal data and to increase the accuracy of Parkinson's disease (PD) classification.

In the multi-modal feature fusion stage (see Fig. 12), the feature vectors produced by the speech (CNN-BiLSTM), handwriting (ResNet50 with attention), DaTscan (3D CNN with EfficientNetB7), MRI (hybrid dual-branch CNN), and EEG (CNN-LSTM) sub-models were concatenated into a joint representation. This concatenated vector preserved both the spatial and temporal characteristics of all modalities, keeping the modality-specific semantics intact.

Fig. 12.


Architectural diagram of the proposed model for Multimodal data fusion.

To further enhance the fusion of the modalities, a dynamic inter-modal weighting methodology was explored that learned the individual contribution of each modality. Trainable scalars \alpha_m were applied to each modality feature vector f_m so that more informative modalities received higher weights. The final fused representation was computed as in Eq. 24.

F_{\text{fused}} = \sum_{m=1}^{M} \alpha_m f_m \quad (24)

This approach made it possible for the model to automatically emphasize the most distinguishing characteristics for PD classification.
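A minimal sketch of this dynamic weighting; the softmax normalisation of the trainable scalars and the equal 4-dimensional feature vectors are assumptions made for illustration, not details confirmed by the paper:

```python
import numpy as np

def weighted_fusion(features, alphas):
    """Weight each modality feature vector by a normalized trainable scalar.
    features: dict of modality name -> 1-D feature vector (equal length here)."""
    names = sorted(features)
    w = np.exp(alphas - alphas.max())  # softmax normalisation (assumption)
    w = w / w.sum()                    # weights sum to 1
    fused = sum(w[i] * features[n] for i, n in enumerate(names))
    return fused, w

feats = {
    "speech":      np.ones(4),
    "handwriting": np.ones(4),
    "eeg":         np.ones(4),
}
# A larger scalar for the first (alphabetically: "eeg") modality.
fused, w = weighted_fusion(feats, alphas=np.array([2.0, 0.0, 0.0]))
```

In training, the alphas would be learned jointly with the rest of the network, so gradients push weight toward the modalities whose features most reduce the loss.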

Experimental setup

The experimental configuration of the proposed MultiParkNet framework was set up to effectively process the multimodal data sources and optimize the deep learning architectures for screening early-stage PD. The dataset included audio speech recordings, motor skill-based drawing patterns, neuroimaging scans, and cardiovascular signals acquired from public and clinical databases. Data preprocessing consisted of noise suppression for the speech recordings, kinematic feature extraction from the handwriting dynamics, waveform filtering of the cardiovascular traces, and intensity standardization of the neuroimaging data.

The experimental arrangement employed a multi-stream deep learning method in which convolutional neural networks (CNNs) processed the neuroimaging data, recurrent neural networks (RNNs) modeled temporal dependencies in the speech and cardiovascular signals, and graph neural networks (GNNs) captured structural correlations in the motor patterns. A probabilistic classification module combined the extracted features through attention-driven fusion. The model was trained on a high-performance GPU-enabled computing cluster using the Adam optimizer with a 0.0001 initial learning rate and a batch size of 32. Ten-fold cross-validation validated the reliability of the model, and performance was assessed using accuracy, precision, recall, and F1-score. The experimental setup yields an AI-driven, clinically deployable PD detection system with more precise diagnostics.

The proposed MultiParkNet framework was trained on a computing infrastructure designed to ensure that the model processes large multimodal data efficiently and that computation remains stable. The system was powered by an NVIDIA A100 GPU with 80 GB VRAM for fast tensor operations and accelerated deep learning computation. Moreover, the system included 512 GB of system RAM and an Intel Xeon-P processor for efficient execution of large-scale neural network training tasks. The system ran a 64-bit Ubuntu 22.04 operating system, a well-suited platform for AI-intensive applications, ensuring the best possible performance and smooth execution of deep learning workflows. With fast hardware and a well-configured software environment, the MultiParkNet framework efficiently handled the intricate feature extraction, fusion, and classification procedures. This configuration was important for obtaining high accuracy and robustness in the cross-validation experiments.

Evaluation metrics

The evaluation criteria for the proposed MultiParkNet framework comprehensively evaluate the model’s performance on early-stage PD diagnosis.

Accuracy determines the percentage of all correct predictions across both the healthy and PD classes. Loss is calculated by means of binary cross-entropy, which quantifies the model's prediction error and guides optimization. The F1 score is the harmonic mean of precision and recall, weighing both equally, and is computed as BinaryF1Score. The ROC curve plots the true positive rate (TPR) versus the false positive rate (FPR) across different classification thresholds. AUC (Area Under the Curve) is the area under the ROC curve, which measures the model's ability to differentiate between classes. Precision is the share of correctly predicted positive instances, whereas recall (sensitivity) is the percentage of actual positives correctly identified. The precision-recall curve displays the precision-recall trade-off, with PR-AUC summarizing the area under it. Average precision is the weighted average of precision over all thresholds. The confusion matrix is a table used to summarize the prediction results. Grad-CAM and attention surface maps show the input areas that impact predictions. Cross-validation metrics report the variation in accuracy across the k folds. Class weights handle class imbalance to achieve balanced learning.
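The threshold-based metrics above can be computed directly from the binary confusion-matrix counts; the toy labels below are illustrative:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1])  # one miss, one false alarm
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
```

ROC-AUC and PR-AUC additionally require the predicted probabilities, since they sweep over all classification thresholds rather than using a single one.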

Results

The performance of MultiParkNet was thoroughly assessed using multiple metrics to ensure the robustness and reliability of its results. The model achieved an average training accuracy of 99.67%, demonstrating its ability to learn complex patterns across diverse data modalities. In cross-validation, the validation accuracy reached 98.15% (±1.24%), indicating stability and generalizability. On the test set, the framework achieved 96.74% (±3.70%) accuracy, showing good real-world relevance (see Fig. 13).

Fig. 13.


Results for the MultiParkNet. The left side shows the ROC-AUC curve for different deep learning models and the right side curve shows the training accuracy curve for different models tested.

In addition, precision, recall, and F1 score indicated the model's classification strength, with an AUC-ROC of 0.986 confirming its high capacity to discriminate between Parkinson's and normal subjects. The precision-recall AUC was also high, signaling good reliability on imbalanced data sets. Grad-CAM visualizations and attention surface maps provided important insights, indicating the disease-specific areas involved in the predictions.

Cross-validation metrics such as average loss trends and standard deviations validated the model's reliability. The confusion matrix analysis in Fig. 14 revealed few misclassifications, again confirming effectiveness. The proposed framework improves early PD detection, offering a robust AI-driven clinical diagnostic solution (see Fig. 15). Future studies will investigate real-world deployment, federated training, and patient-specific predictive models to enhance clinical impact and scalability.

Fig. 15.


This displays a comparative visualization of original input data (audio spectrogram, wave, MRI scan, DATSCAN, EEG) alongside their corresponding GradCAM explainability maps and attention surface maps, highlighting regions of importance identified by MultiParkNet for each data modality. The prediction of the proposed framework is also shown below based on all the modalities.

The proposed MultiParkNet outperforms all other state-of-the-art deep learning models considered for predicting Parkinson's disease across multiple evaluation metrics (see Table 3). With an accuracy of 96.74%, an F1-score of 98.04%, and an AUC-ROC of 0.986, MultiParkNet surpasses models such as ResNet50, EfficientNetB0, DenseNet121, and XceptionNet. Although traditional models such as ResNet50 and VGG19 achieve reliable accuracies of around 94-95%, their absence of integrated attention mechanisms and lack of interpretability limit their diagnostic robustness. In contrast, MultiParkNet uses a hybrid ensemble of deep neural networks with attention layers and Grad-CAM visualization for each modality, providing high interpretability and clinical relevance. Incorporating contrastive learning, self-attention fusion, and optimization strategies such as cyclical learning rates and AdamW further sharpens generalization and class separability across multimodal inputs. Whereas models trained from scratch show inferior performance and interpretability, MultiParkNet presents itself as a trustworthy and interpretable AI diagnostic tool for Parkinson's disease.

Ablation study

We conducted an extensive ablation study (see Table 4) over each modality, architectural enhancement, and optimization strategy to evaluate their contributions to the overall performance of MultiParkNet. First, single modalities, including speech, handwriting, MRI, DAT, and EEG, were evaluated as individual baselines. The relatively good performance (93.45% accuracy) of the DAT+MRI combination shows that the neuroimaging data have the strongest diagnostic relevance among them. Adding speech and handwriting data increased the overall accuracy to 94.72%, showing that motor and vocal biomarkers have a substantial impact on early-stage detection. Accuracy improved further when EEG was included. In the dual-stream imaging path, replacing backbone networks such as ResNet50 with EfficientNetB7 improved precision and recall because of EfficientNetB7's feature extraction efficiency. The use of attention mechanisms led to an increase of 2.1% in F1 score as well as improved interpretability. Feature integration was further refined with cross-modal attention fusion, especially in the multi-input scenario, where the AUC-ROC increased to 0.986. We further incorporated transfer learning for low-sample modalities such as EEG and MRI to improve feature separability. Finally, optimization techniques such as cyclical learning rate scheduling and the AdamW optimizer were used to converge faster and avoid overfitting, while dropout and batch normalization stabilized training across folds.

Table 4.

Ablation study: Component-wise and modality-wise impact on performance.

Experiment Configuration Accuracy (%) F1 Score (%) Precision (%) Recall (%) AUC-ROC
MultiParkNet (All Modalities + Attention + Fusion) 96.74 98.04 98.15 97.94 0.986
Without Speech Modality 92.45 93.40 93.10 93.85 0.954
Without Handwriting Modality 93.80 94.30 94.10 94.50 0.961
Without MRI + DAT (Only Non-Imaging Modalities) 89.62 90.10 90.65 89.45 0.932
Without EEG Modality 91.74 92.00 92.50 91.45 0.947
Without Attention Mechanism 90.92 91.30 91.70 90.90 0.938
Without Contrastive Learning 91.65 92.20 92.40 91.75 0.944
Without Cross-Modal Fusion Network 88.94 89.40 89.70 89.10 0.926
Using Single-Modality Only (Speech Only) 85.25 85.90 85.30 86.50 0.901
Using Single-Modality Only (MRI Only) 87.30 87.70 88.20 87.50 0.913
Without Transfer Learning on MRI + DAT 89.15 89.80 90.10 89.00 0.930
Without Self-Attention Layers in Fusion Block 90.42 91.10 91.00 90.75 0.936
Without Cyclical Learning Rate Scheduler 91.60 91.80 92.00 91.40 0.940
Replacing AdamW with SGD 90.75 91.00 91.20 90.85 0.938
Without Batch Normalization 89.80 90.50 90.20 90.70 0.929
Without Dropout Regularization 88.94 89.70 89.10 90.40 0.921
With Monte Carlo Dropout for Uncertainty Estimation 94.25 95.84 95.90 95.80 0.961

In addition to the above, we evaluated the contribution of uncertainty estimation using Monte Carlo Dropout during inference, a technique often employed to estimate model confidence in high-risk domains like healthcare44. This modification led to a slight performance drop (accuracy: 96.25%, AUC-ROC: 0.981) compared to the complete model (96.74%, AUC-ROC: 0.986), but provided meaningful uncertainty estimates that can assist clinicians in identifying borderline or ambiguous cases. The trade-off indicates the potential value of incorporating predictive confidence in future clinical deployment scenarios. All of these design choices were confirmed to have a synergistic impact on performance: the final model with all modules combined achieved the highest accuracy of 96.74% and demonstrated good generalization across folds as a deeply integrated diagnostic system.

Comparative study

To evaluate the effectiveness of our proposed MultiParkNet framework, we compare it to several state-of-the-art PD detection models across different modalities (see Table 5). Compared to the other approaches, MultiParkNet attains the top results in overall accuracy (96.74%), precision (98.15%), recall (97.94%), and F1-score (98.04%). By comparison, the 3D CNN applied to MRI data by Chakraborty et al. achieved a lower accuracy of 95.29% with lower precision and F1-score. Prior work applied VGG16 and Xception models to DaTscan data and obtained competitive results, with Xception reaching a much higher recall of 98.84%; nevertheless, these models lacked interpretability enhancements and multi-modal integration. A high precision of 98.71% was obtained with a kernel SVM, but its generalizability to different data types was poor. On the PPMI dataset, ensemble and classical machine learning models such as Random Forest and SVM (AUC-ROC: 0.9888) presented strong performance in some cases; however, they lacked interpretability and did not consider real-time scenarios. MultiParkNet not only demonstrated better diagnostic performance but also integrated interpretation through Grad-CAM and attention layers, making the model transparent. Furthermore, it offers robustness across the datasets, and its multi-modal architecture makes it well suited for clinical deployment, marking a new benchmark in comprehensive PD detection systems.

Table 5.

Comparative analysis of Parkinson’s disease detection models.

Model Name & Reference Acc. (%) Prec. (%) Recall (%) F1 (%) AUC-ROC PR-AUC Interp. Real-World
MultiParkNet (Proposed) 96.74 98.15 97.94 98.04 0.986 High Inline graphic Inline graphic
3D CNN (MRI)45 95.29 92.70 94.30 93.60 0.980 Inline graphic
VGG16 (DaTscan)46 95.34 96.51 96.51 96.51 Inline graphic
Xception (DaTscan)46 95.34 94.44 98.84 96.59 Inline graphic
MCNN47 96.00 96.05 96.76 96.00 Inline graphic
KSVM + FS48 95.89 98.71 96.88 97.62
Random Forest48 93.12 95.15 93.15 94.13 Inline graphic
SVM (PPMI)49 96.40 97.03 0.9788 Inline graphic
Ensemble CNN (DaTscan)46 95.34 Inline graphic
Explainable AI50 91.11 89.84 92.50 91.13 0.9125 Inline graphic Inline graphic
ResNet18 (CFT-PET + ADC)51 97.0 93.0 95.0 92.0 0.96

Despite its superior performance, the current MultiParkNet architecture is computationally intensive, comprising approximately 100M parameters and requiring around 400 MB of memory, with inference times ranging between 15–30 seconds on high-end GPUs. Such requirements impose limitations on deployment in resource-constrained environments and incur notable cloud costs (estimated at $50–$100 per month). To address these challenges, we propose model compression strategies, including knowledge distillation and pruning, to achieve an 80–90% reduction in parameters while retaining 90–92% accuracy. Additionally, a hybrid cloud–edge deployment framework is suggested, wherein lightweight modalities (e.g., speech and handwriting) are processed on mobile devices, and full multimodal analysis is delegated to cloud servers for complex or uncertain cases. This design ensures both scalability and accessibility while maintaining high diagnostic accuracy, thereby bridging the gap between research performance and real-world clinical applicability.

Discussions

Relevance to early detection

The proposed model is explicitly designed with an emphasis on the early detection of Parkinson’s Disease by focusing on subtle, early-stage neuromotor and physiological biomarkers that often precede clinical diagnosis. Unlike conventional PD classification models that rely on overt motor symptoms typically observed in advanced stages, our framework integrates fine-grained features such as micro-tremors, minor gait instability, and subtle variations in physiological signals (e.g., heart rate variability and muscle rigidity) captured through high-resolution sensor modalities. These early indicators are often underrepresented or missed in standard diagnostic pipelines. By training the model to recognize these nuanced patterns through multimodal fusion and deep feature extraction, MultiParkNet enables differentiation between early-stage PD and non-PD cases, thereby contributing meaningfully to timely intervention and disease management.

Computational efficiency and real-time feasibility

Traditional diagnostic methods often relied on clinical assessments and neurological examinations, which could be subjective and unable to detect the disease in its early stages. Integrating multi-modal data, including images, speech, handwriting, gait, and physiological signals, emerged as a promising approach for enhancing diagnostic accuracy52-54. Multi-modal deep learning frameworks combine data from multiple sources to capture complementary information, which could improve the robustness and accuracy of PD diagnosis. For instance, the fusion of imaging data, such as MRI scans, with clinical features like patient symptoms and medical history had been shown to outperform single-modal approaches55. Similarly, the integration of speech and handwriting data had been effectively utilised for the identification of subtle patterns associated with PD53,56. Substantial work has been done in this field, specifically focused on MRI, nuclear MRI, pattern recognition, and signal processing, as shown in Fig. 16.

Fig. 16. Visualization of works based on keywords across a total of 2562 documents from Scopus. The visualization was done using VOSviewer (open-source).

Neuroimaging data

Neuroimaging modalities, particularly MRI and SPECT images, provide valuable insights into structural and functional changes in the brain associated with PD. Studies have demonstrated that 3D CNNs can effectively analyse multi-modal MRI data, including T1-weighted and Quantitative Susceptibility Mapping (QSM) images, to identify biomarkers such as the substantia nigra and thalamus that are critical for PD diagnosis57. Transfer learning-based CNNs have also been explored, leveraging pre-trained models for feature extraction from MRI scans and achieving high accuracy in differentiating PD patients from healthy controls55.

Speech and handwritten data

Speech and handwriting dynamics are non-invasive and easily accessible modalities that have been extensively studied for PD detection. Handwritten features, such as stroke dynamics and tremors, have been analysed using CNN-LSTM architectures that capture spatial and temporal patterns53. Similarly, speech features, including voice recordings and prosody, have been used to identify subtle deviations indicative of PD56,58. The fusion of speech and handwriting data has enhanced diagnostic accuracy, with studies reporting accuracy rates of up to 94.6% using hybrid architectures like CASENet53. These findings underscore the potential of behavioural data in early PD detection, especially in conjunction with other modalities.

Gait and physiological signals

Gait disturbances are among the most common symptoms of PD, and wearable sensors have been increasingly used to capture gait patterns for early detection. A pioneering study using a CNN-GRU-GNN architecture demonstrated exceptional performance in classifying PD patients based on gait cycle data, achieving accuracy, precision, recall, and F1-score values of 99.51%, 99.57%, 99.71%, and 99.64%, respectively59. This approach exploited the spatial and temporal dynamics of gait data, highlighting the importance of multi-modal sensor fusion in capturing complex gait patterns. Physiological signals, such as EEG, ECG, PPG, and respiratory data, have also been explored for PD diagnosis. A study using multi-modal Support Vector Machines (SVM) achieved an accuracy of 96.03% by integrating these signals, demonstrating the potential of physiological data in identifying PD biomarkers60. The integration of EEG and magnetic resonance data further enhanced diagnostic precision, with a LightGBM model achieving an accuracy of 97.17%61.

Fusion techniques and model architectures

The success of multi-modal deep learning frameworks in PD diagnosis hinges on effective fusion techniques that combine complementary information from diverse data sources. Various fusion strategies have been explored, including early fusion, late fusion, and hybrid approaches. Early fusion concatenates features from different modalities before feeding them into a classification model, while late fusion combines the outputs of modality-specific models62,63. Hybrid fusion approaches, which integrate the feature and decision levels, are particularly effective in PD diagnosis. For instance, a study that combined CNN and LSTM models for feature extraction, followed by a voting classifier for decision fusion, achieved an accuracy of 99.95% on handwriting and motion datasets62,63. Similarly, attention mechanisms and result-ensemble modules have been shown to enhance the robustness of multi-modal models, particularly in smartphone-based applications64.
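The early- and late-fusion strategies described above can be sketched in a few lines. The following is a minimal, hypothetical illustration: the function names and probability values are ours for demonstration, not the actual MultiParkNet implementation.

```python
def early_fusion(features_by_modality):
    """Early fusion: concatenate per-modality feature vectors into one
    joint vector before it enters a single classifier."""
    fused = []
    for feats in features_by_modality:
        fused.extend(feats)
    return fused

def late_fusion(probs_by_modality, weights=None):
    """Late fusion (soft voting): average the class-probability vectors
    produced by modality-specific models, optionally weighted."""
    n = len(probs_by_modality)
    weights = weights or [1.0 / n] * n
    n_classes = len(probs_by_modality[0])
    return [sum(w * p[c] for w, p in zip(weights, probs_by_modality))
            for c in range(n_classes)]

# Hypothetical per-modality outputs: [P(healthy), P(PD)]
speech, imaging, gait = [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]
print(early_fusion([[0.1, 0.5], [0.3]]))         # joint feature vector
print([round(p, 3) for p in late_fusion([speech, imaging, gait])])
```

A hybrid scheme simply applies both: early-fused features feed one model, whose output is then late-fused with the modality-specific predictions.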

Contributions

To increase the precision and robustness of Parkinson’s disease (PD) diagnosis, we propose a multimodal deep learning framework that merges neuroimaging, speech, handwriting, gait, and physiological signals. Unlike traditional single-modal approaches, our model effectively combines the complementary information of different modalities, using CNN-LSTM networks for speech and handwriting and a 3D CNN for neuroimaging. To achieve optimal feature representation, we apply hybrid fusion strategies, including early, late, and attention-based fusion, to extract comprehensive biomarkers. We further introduce transfer learning and domain adaptation methods to increase generalizability across datasets. Built on state-of-the-art deep learning techniques (e.g., CASENet and LightGBM), our framework outperforms classic learning approaches in identifying very subtle patterns associated with PD. We ensure high diagnostic precision by integrating advanced feature selection, ensemble learning, and uncertainty quantification methods, outperforming existing approaches. Beyond applying AI to diagnostic pathology in PD, this research establishes the basis for real-time clinical applications that support early detection and personalized treatment.
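The uncertainty quantification mentioned above can be sketched with Monte Carlo dropout, the approach of Gal & Ghahramani cited as ref. 44. The toy one-layer model, weights, and inputs below are hypothetical stand-ins, not the actual MultiParkNet network.

```python
import math
import random

def mc_dropout_predict(weights, x, p_drop=0.2, n_samples=200, seed=0):
    """Monte Carlo dropout: keep dropout active at inference and average
    many stochastic forward passes; the spread of the outputs estimates
    the model's predictive uncertainty (Gal & Ghahramani, ref. 44)."""
    rng = random.Random(seed)
    outs = []
    for _ in range(n_samples):
        # Toy one-layer model: Bernoulli-drop each input, rescale survivors.
        kept = [xi * (0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop))
                for xi in x]
        z = sum(w * k for w, k in zip(weights, kept))
        outs.append(1.0 / (1.0 + math.exp(-z)))   # sigmoid -> P(PD)
    mean = sum(outs) / n_samples
    std = (sum((o - mean) ** 2 for o in outs) / n_samples) ** 0.5
    return mean, std

# Hypothetical fused-feature input; a large std flags a low-confidence
# case that should be deferred to a clinician rather than auto-classified.
mean_p, std_p = mc_dropout_predict([0.5, -0.3, 0.8], [1.0, 2.0, 0.5])
print(f"P(PD) ~ {mean_p:.2f} +/- {std_p:.2f}")
```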

Conclusions

The proposed work introduces MultiParkNet, a multi-modal deep learning model for early-stage identification of Parkinson’s disease (PD). Using a variety of neurological and physiological biomarkers, including speech patterns, limb movement, medical imaging, nerve conduction information, and cardiovascular signals, the model reached 99.67% training accuracy, 98.15% (±1.24) validation accuracy, and 96.74% (±3.70) test accuracy in cross-validation experiments. These findings demonstrate high sensitivity in differentiating PD patients from healthy individuals and in grading motor symptom severity.

Explainable AI (XAI) methods increase the interpretability of the framework, fostering clinical trust. Combining several biomarkers provides a composite view of disease development, enabling more targeted treatment approaches.

We acknowledge that assimilating and aligning multiple data modalities in real-world clinical practice presents practical difficulties because of variations in equipment, workload limitations, and patient follow-up. To address this, deployments may prioritize the most discriminative modalities, use wearable sensors to capture data continuously, and adopt modular designs so the system continues to work when some inputs are unavailable. Standardized acquisition protocols will also improve feasibility and consistency across healthcare facilities.
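The modular-design idea, keeping the system working when some inputs are unavailable, can be illustrated with a late-fusion step that simply averages over whichever modalities produced an output. The function and values below are a hypothetical sketch, not the deployed system.

```python
def robust_late_fusion(probs_by_modality):
    """Fuse only the modalities that actually produced an output
    (None marks an unavailable input), so the pipeline degrades
    gracefully instead of failing when a modality is missing."""
    available = [p for p in probs_by_modality.values() if p is not None]
    if not available:
        raise ValueError("no modality available")
    n_classes = len(available[0])
    return [sum(p[c] for p in available) / len(available)
            for c in range(n_classes)]

# Hypothetical visit where imaging is unavailable: fuse speech + gait only.
scores = robust_late_fusion({"speech": [0.3, 0.7],
                             "imaging": None,
                             "gait": [0.1, 0.9]})
print([round(p, 3) for p in scores])  # [0.2, 0.8]
```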

Future work will aim at clinical validation in practice, privacy-preserving federated learning, and deployment of lightweight inference on edge hardware and IoT-enabled devices to support remote monitoring. Expanding the dataset to more diverse populations and extending it longitudinally is also likely to improve model generalization.

Overall, MultiParkNet provides a scalable, explainable, and clinically adaptable framework that holds great promise for advancing early identification and monitoring of PD, as well as personalized treatment planning.

Future scope

The proposed MultiParkNet framework proved to be highly effective for the early diagnosis of PD by integrating diverse multi-modal data sources and deep learning architectures. However, several directions remain for future development and research to further increase diagnostic accuracy, model specificity, and clinical applicability.

Expansion of multi-modal data sources could incorporate further inputs such as gait analysis, handwriting samples, and facial expression recognition. Wearable sensors monitoring motion, muscle tremors, and additional physiological signals may enable more data-driven assessment. Integrating heterogeneous modalities would increase the system’s ability to distinguish PD from other neurological conditions. Moreover, signals such as speech articulation, sleep pattern analysis, and olfactory dysfunction could also serve as biomarkers. Incorporating such multi-modal information into MultiParkNet would give a more complete description of disease progression, benefiting early detection and allowing personalized treatment plans based on an individual’s symptoms and course of progression.

Data privacy is critical for AI-driven health applications, and federated learning addresses it directly. Federated learning allows models to be trained across institutions without exchanging raw patient data, upholding HIPAA and GDPR compliance. This decentralized approach permits cooperation among research institutions while keeping data protected; it can also reduce bias by incorporating diverse patient populations. Future development of MultiParkNet could use federated learning to achieve better generalizability and privacy, enabling AI-driven PD screening across institutions worldwide without compromising patient privacy.
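The core server-side step of federated learning can be sketched with federated averaging (FedAvg): each site trains locally and only parameter vectors, weighted by sample counts, reach the server. The hospital names, parameter vectors, and cohort sizes below are hypothetical.

```python
def fedavg(client_params, client_sizes):
    """Federated averaging: combine locally trained parameter vectors
    weighted by each site's sample count; raw patient data never
    leaves the institution."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)]

# Hypothetical parameter vectors from two hospitals (100 vs. 300 patients).
hospital_a = [0.2, 0.5]
hospital_b = [0.4, 0.1]
global_params = fedavg([hospital_a, hospital_b], [100, 300])
print([round(p, 2) for p in global_params])  # [0.35, 0.2]
```

In a full system this averaging step would repeat over many communication rounds, with the aggregated parameters broadcast back to each site for further local training.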

Real-world clinical validation must be conducted with real patient cohorts to establish clinical utility. Large clinical trials spanning broad demographics, a range of diseases, and multi-center cohorts can verify the robustness of the model. Working with hospitals and neurological research centers would allow thorough trials in real clinical settings. Furthermore, assessing model performance on diverse imaging modalities, sensor data, and language variations (for speech data) will enhance generalizability across regions. Real-world deployment will also expose practical issues such as data noise, missing values, outliers, and inconsistencies, allowing researchers to strengthen model robustness before widespread clinical use. As an immediate next step, small-scale pilot validations (10–20 patients) in collaboration with hospitals will be critical before large trials. These pilots would evaluate the framework under real-world conditions, including confounding diseases, heterogeneous devices, and environmental noise. They would also provide valuable insight into the causes of the false positives and false negatives observed in this study, serving as a proxy for potential clinical pitfalls. Such pilot testing will help refine calibration strategies, optimize decision thresholds, and establish clinical feasibility before wider adoption.

Parkinson’s disease evolves uniquely from one individual to the next, motivating personalized disease progression prediction. Future extensions of MultiParkNet might include longitudinal assessment of patients’ symptoms, using time-series forecasting methods such as LSTMs or transformers to predict the disease’s course. By continuously tracking a patient’s changing symptoms and biomarker levels, the model could provide early alerts for worsening conditions so that healthcare professionals can tailor treatments. This strategy would facilitate preventive interventions, such as drug titrations or therapy adjustments, improving patient care. Personalized models may also support clinical trial design, allowing drugs to be identified and delivered faster based on specific PD progression patterns.
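The longitudinal forecasting idea can be illustrated with a deliberately simple stand-in for the LSTM/transformer models mentioned above: exponential smoothing of a per-patient symptom series. The function, smoothing factor, and score values are hypothetical.

```python
def ewma_forecast(series, alpha=0.5):
    """Exponentially weighted smoothing of a symptom-score series: the
    smoothed level after the last visit serves as the next-visit forecast.
    alpha controls how strongly recent visits dominate older ones."""
    level = float(series[0])
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical motor-score trajectory over five clinic visits; a forecast
# above a clinician-set threshold could trigger an early alert.
scores = [20, 22, 21, 25, 27]
print(ewma_forecast(scores))  # 25.0
```

A sequence model trained on many such trajectories would replace this heuristic, but the interface, past scores in, next-visit estimate out, is the same.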

In summary, MultiParkNet offers a transformational, AI-driven approach to PD detection. Future research should target model interpretability, scalability, and real-world applicability, advancing early PD diagnosis and improving patient outcomes.

Acknowledgements

The authors would like to thank each other and the University of Petroleum and Energy Studies for providing GPU access for model training.

Author contributions

Ayan Sar was responsible for paper preparation, idea formulation, designing diagrams, and coding. Pranav Singh Puri contributed to coding, model creation, model training, model fine-tuning, and architecture development. Huma Naz provided guidance throughout the project, contributed to paper preparation, and conducted the final review. Sumit Aich assisted with paper preparation. Tanupriya Choudhury provided guidance and conducted the final review of the paper, while Lubna Abdelkhreim Gabralla offered guidance and was instrumental in securing funding for the project.

Funding

This research work was supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R178), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data availability

The data used in this research are compiled and available at https://www.kaggle.com/datasets/asthamishra96/parkinson-multi-model-dataset-2-0. The primary data sources are mentioned in the manuscript with proper links and at the above Kaggle link.

Code availability

The code for this manuscript is hosted on GitHub and will be made available upon request to the corresponding author.

Declarations

Competing interests

The authors declare no competing interests.

Ethical approval

No human or animal subjects were involved in any procedure of this research.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ayan Sar, Pranav Singh Puri, Huma Naz, Sumit Aich, Tanupriya Choudhury, and Lubna Abdelkhreim Gabralla have contributed equally to this work.

References

  • 1.Heidari, N., Heidari, P., Salari, N., Akbari, H. & Mohammadi, M. A systematic review on risk factors and protective factors related to Parkinson’s disease. Tehran Univ. Med. J.79(12), 925–933 (2022). [Google Scholar]
  • 2.Ternák, G., Németh, M., Rozanovic, M., Márovics, G. & Bogár, L. Antibiotic consumption patterns in European countries are associated with the prevalence of Parkinson’s disease; the possible augmenting role of the narrow-spectrum penicillin. Antibiotics11, 1145. 10.3390/antibiotics11091145 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Chou, K. L. et al. Quality improvement in neurology: 2020 Parkinson disease quality measurement set update. Neurology97, 239–245. 10.1212/wnl.0000000000012198 (2021). [DOI] [PubMed] [Google Scholar]
  • 4.Luo, Y. et al. Global, regional, national epidemiology and trends of Parkinson’s disease from 1990 to 2021: findings from the global burden of disease study 2021. Front. Aging Neurosci.16, 1498756. 10.3389/fnagi.2024.1498756 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Lamba, D. A. Patient-Specific Stem Cells (CRC Press, 2017). [Google Scholar]
  • 6.Tomas-Camardiel, M., Herrera, A., Venero, J. L., Cano, J. & Machado, A. Inflammatory process as a determinant factor for the degeneration of substantia nigra dopaminergic neurons: Possible relevance to the etiology of Parkinson’s disease. Current Med. Chem.-Central Nerv. Syst. Agents4, 223–233. 10.2174/1568015043356913 (2004).
  • 7.Rizvi, S. Z. H., Palimar, V., Gupta, C. & Andrade, L. S. Spectrum of non-motor symptoms in Parkinson’s disease: a review. Ann. Clin. Exp. Neurol.18, 72–80. 10.17816/acen.1001 (2024). [Google Scholar]
  • 8.Carrarini, C. et al. A stage-based approach to therapy in Parkinson’s disease. Biomolecules9, 388. 10.3390/biom9080388 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Sheta, R., Bérard, M., Musiol, D., Martínez-Drudis, L. & Oueslati, A. Behavioral analysis of motor and non-motor impairment in rodent models of Parkinson’s disease. Front. Aging Neurosci.10.3389/fnagi.2024.1464706 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Hou, J.-G.G. & Lai, E. C. Non-motor symptoms of Parkinson’s disease. Int. J. Gerontol.1, 53–64. 10.1016/s1873-9598(08)70024-3 (2007). [Google Scholar]
  • 11.Löhle, M., Storch, A. & Reichmann, H. Beyond tremor and rigidity: non-motor features of Parkinson’s disease. J. Neural Transm.116, 1483–1492. 10.1007/s00702-009-0274-1 (2009). [DOI] [PubMed] [Google Scholar]
  • 12.Yaliman, A. & Sen, E. Parkinson’s disease and rehabilitation. Turk. J. Phys. Med. Rehabil.10.4274/tftr.57.07 (2011). [Google Scholar]
  • 13.Gupta, S., Venkatesh, A., Ray, S. & Srivastava, S. Challenges and prospects for biomarker research: A current perspective from the developing world. Biochim. Biophys. Acta (BBA) Proteins Proteom.1844, 899–908. 10.1016/j.bbapap.2013.12.020 (2014). [DOI] [PubMed] [Google Scholar]
  • 14.Skevaki, C. et al. Immune biomarkers in the spectrum of childhood noncommunicable diseases. J. Allergy Clin. Immunol.137, 1302–1316. 10.1016/j.jaci.2016.03.012 (2016). [DOI] [PubMed] [Google Scholar]
  • 15.Zoccali, C. et al. Biomarkers in clinical epidemiology studies. Clin. Kidney J.10.1093/ckj/sfae130 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Taguchi, Y.-H. & Murakami, Y. Universal disease biomarker: can a fixed set of blood microRNAs diagnose multiple diseases?. BMC Res. Notes10.1186/1756-0500-7-581 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Wang, C., Hwang, W. & Song, X. Biomarker data with measurement error in medical research: A literature review. WIREs Comput. Stat.10.1002/wics.1641 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Mei, J., Desrosiers, C. & Frasnelli, J. Machine learning for the diagnosis of Parkinson’s disease: A review of literature. Front. Aging Neurosci.10.3389/fnagi.2021.633752 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Asmae, O., Saleh, S., Abdelhadi, R. & Bachir, B. Enhancing parkinson’s disease diagnosis: A stacking ensemble approach leveraging machine learning techniques. In 2024 4th International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), 1–7. 10.1109/iraset60544.2024.10549375 (IEEE, 2024).
  • 20.Shetty, M., Shetty, S. B., Jambha, H. V. G. & Hrithvika. Application of machine learning and data analytics in detection of parkinsons disease. In 2024 Second International Conference on Data Science and Information System (ICDSIS), 1–5. 10.1109/icdsis61070.2024.10594328 (IEEE, 2024).
  • 21.Om Prakash, P. G., Reddy, B. N. S.K. & Lohith, S. S. M. Machine learning-based prediction of parkinson’s disease: A comparative analysis of algorithms. In 2023 3rd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 1011–1018. 10.1109/icimia60377.2023.10426030 (IEEE, 2023).
  • 22.Ayus, I. & Azad, C. Recent advances in multimodal machine learning for parkinson’s disease diagnosis: A comprehensive review. In 2023 2nd International Conference on Ambient Intelligence in Health Care (ICAIHC), 1–7. 10.1109/icaihc59020.2023.10431435 (IEEE, 2023).
  • 23.Gaba, S. & Kaur, H. Machine learning techniques for parkinson’s disease prediction and progression: A comprehensive review. In 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE), 430–436. 10.1109/ic3se62002.2024.10593626 (IEEE, 2024).
  • 24.Khanom, F., Biswas, S., Uddin, M. S. & Mostafiz, R. Xemlpd: an explainable ensemble machine learning approach for Parkinson disease diagnosis with optimized features. Int. J. Speech Technol.27, 1055–1083. 10.1007/s10772-024-10152-2 (2024). [Google Scholar]
  • 25.Dash, S. K., Sethy, P. K., Das, A., Jena, S. & Nanthaamornphong, A. Advancements in deep learning for automated diagnosis of ophthalmic diseases: A comprehensive review. IEEE Access12, 171221–171240. 10.1109/access.2024.3496565 (2024). [Google Scholar]
  • 26.Geroski, T. & Filipović, N. Artificial Intelligence Empowering Medical Image Processing 179–208 (Springer Nature Switzerland, 2024). [Google Scholar]
  • 27.Xie, Q. et al. Deep learning for image analysis: Personalizing medicine closer to the point of care. Crit. Rev. Clin. Lab. Sci.56, 61–73. 10.1080/10408363.2018.1536111 (2019). [DOI] [PubMed] [Google Scholar]
  • 28.Sneha, Y. et al. Advancements in brain tumor detection using machine learning applications from mri image analysis. In 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 809–814. 10.1109/i-smac58438.2023.10290231 (IEEE, 2023).
  • 29.Dias, R. & Torkamani, A. Artificial intelligence in clinical and genomic diagnostics. Genome Med.10.1186/s13073-019-0689-8 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Swamy, S. R. & Nandini Prasad, K. S. Revolutionizing healthcare intelligence multisensory data fusion with cutting-edge machine learning and deep learning for patients’ cognitive knowledge. In 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS), 1–7. 10.1109/ickecs61492.2024.10616464 (IEEE, 2024).
  • 31.Daneshjou, R., Kidzinski, L., Afanasiev, O. & Chen, J. H. Session intro: Artificial intelligence for enhancing clinical medicine. In Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, vol. 25, 1 (2020).
  • 32.Govindaraj, M. et al. Revolutionizing Healthcare: The Transformative Impact of Artificial Intelligence, 54–78 (IGI Global, 2024). [Google Scholar]
  • 33.Gelderen, L. & Tejedor-García, C. Innovative speech-based deep learning approaches for Parkinson’s disease classification: A systematic review. Appl. Sci.14, 7873. 10.3390/app14177873 (2024). [Google Scholar]
  • 34.Aversano, L. et al. An Explainable Approach for Early Parkinson Disease Detection Using Deep Learning, 326–339 (Springer Nature Switzerland, 2023). [Google Scholar]
  • 35.Khanna, K., Gambhir, S. & Gambhir, M. Comparative analysis of machine learning techniques for Parkinson’s detection: A review. Multimed. Tools Appl.82, 45205–45231. 10.1007/s11042-023-15414-w (2023). [Google Scholar]
  • 36.Ianculescu, M., Petean, C., Sandulescu, V., Alexandru, A. & Vasilevschi, A.-M. Early detection of Parkinson’s disease using ai techniques and image analysis. Diagnostics14, 2615. 10.3390/diagnostics14232615 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Zhang, J., Lee, Y., Chung, T.-M. & Park, H. Development of a Handwriting Drawings Assessment System for Early Parkinson’s Disease Identification with Deep Learning Methods 484–499 (Springer Nature Singapore, 2023). [Google Scholar]
  • 38.Palsapure, P. N., Bhavana, B. G., Jagadish, M. & Ravikumar, K. T. Detecting early signs of parkinson’s disease: A machine learning-based approach for diagnostic assistance. In 2024 First International Conference on Software, Systems and Information Technology (SSITCON), 1–8. 10.1109/ssitcon62437.2024.10796148 (IEEE, 2024).
  • 39.Bi, X.-A. et al. The exploration of Parkinson’s disease: a multi-modal data analysis of resting functional magnetic resonance imaging and gene data. Brain Imaging Behav.15, 1986–1996. 10.1007/s11682-020-00392-6 (2020). [DOI] [PubMed] [Google Scholar]
  • 40.Lunardini, F. et al. Multi-modal Technology-Based Assessment of Parkinson’s Disease: Technological Platform of the AI4HA Project 675–679 (Springer Nature Switzerland, 2024). [Google Scholar]
  • 41.Zham, P., Kumar, D. K., Dabnichki, P., Poosapadi Arjunan, S. & Raghav, S. Distinguishing different stages of Parkinson’s disease using composite index of speed and pen-pressure of sketching a spiral. Front. Neurol.10.3389/fneur.2017.00435 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Tagaris, A., Kollias, D., Stafylopatis, A., Tagaris, G. & Kollias, S. Machine learning for neurodegenerative disorder diagnosis-survey of practices and launch of benchmark dataset. Int. J. Artif. Intell. Tools27, 1850011 (2018). [Google Scholar]
  • 43.Bousseljot, R., Kreiseler, D. & Schnabel, A. Nutzung der ekg-signaldatenbank cardiodat der ptb über das internet. Biomed. Tech./Biomed. Eng.10.1515/bmte.1995.40.s1.317 (2009). [Google Scholar]
  • 44.Gal, Y. & Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. International Conference on Machine Learning (ICML) (2016).
  • 45.Chakraborty, S., Aich, S. & Kim, H.-C. Detection of Parkinson’s disease from 3t t1 weighted MRI scans using 3d convolutional neural network. Diagnostics10, 402. 10.3390/diagnostics10060402 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Kurmi, A. et al. An ensemble of cnn models for Parkinson’s disease detection using datscan images. Diagnostics12, 1173. 10.3390/diagnostics12051173 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Kalaiyarasi, I., Amudha, P. & Sivakumari, S. Parkinson’s disease detection using deep learning technique. Int. J. Res. Appl. Sci. Eng. Technol.11, 1789–1796. 10.22214/ijraset.2023.51916 (2023). [Google Scholar]
  • 48.Srinivasan, S. et al. Detection of Parkinson disease using multiclass machine learning approach. Sci. Rep.10.1038/s41598-024-64004-9 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Prashanth, R., Roy, S. D., Mandal, P. K. & Ghosh, S. High-accuracy detection of early Parkinson’s disease through multimodal features and machine learning. Int. J. Med. Inform.90, 13–21. 10.1016/j.ijmedinf.2016.03.001 (2016). [DOI] [PubMed] [Google Scholar]
  • 50.Shen, M., Mortezaagha, P. & Rahgozar, A. Explainable artificial intelligence to diagnose early Parkinson’s disease via voice analysis. Sci. Rep.10.1038/s41598-025-96575-6 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Chang, Y., Liu, J., Sun, S., Chen, T. & Wang, R. Deep learning for Parkinson’s disease classification using multimodal and multi-sequences pet/mr images. EJNMMI Res.10.1186/s13550-025-01245-3 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Dentamaro, V., Impedovo, D., Musti, L., Pirlo, G. & Taurisano, P. Enhancing early Parkinson’s disease detection through multimodal deep learning and explainable ai: insights from the ppmi database. Sci. Rep.10.1038/s41598-024-70165-4 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Gayathri, N., Rakesh Kumar, S., Reddy, U. J., Reddy, M. R. & Ravikanth, G. Early Parkinson’s Disease Diagnosis Using Multi-Modal CASENet CNN-LSTM 248–264 (IGI Global, 2024). [Google Scholar]
  • 54.Li, L., Dai, F., He, S., Yu, H. & Liu, H. Automatic Diagnosis of Parkinson’s Disease Based on Deep Learning Models and Multimodal Data 179–200 (IGI Global, 2024). [Google Scholar]
  • 55.Zhu, S. Early diagnosis of parkinson’s disease by analyzing magnetic resonance imaging brain scans and patient characteristic. In 2022 10th International Conference on Bioinformatics and Computational Biology (ICBCB). 10.1109/ICBCB55259.2022.9802132 (2022)
  • 56.Vásquez-Correa, J. C. et al. Multimodal assessment of Parkinson’s disease: A deep learning approach. IEEE J. Biomed. Health Inform.23, 1618–1630. 10.1109/JBHI.2018.2866873 (2019). [DOI] [PubMed] [Google Scholar]
  • 57.Ji, K. et al. Using multi-modal mri data for parkinson’s disease diagnosis based on 3d convolutional neural network. In 2023 International Conference on Sensing, Measurement & Data Analytics in the era of Artificial Intelligence (ICSMD), 1–5. 10.1109/ICSMD60522.2023.10490877 (2023).
  • 58.Sree, B. N., Lakshmi, M. R., Sree, B. S., Nandini, B. & Shravani, H. Utilizing multiple modalities for Parkinson’s detection. Int. J. Adv. Sci. Comput. Appl.10.47679/ijasca.v4i2.82 (2024). [Google Scholar]
  • 59.Rashnu, A. & Salimi-Badr, A. Integrative deep learning framework for parkinson’s disease early detection using gait cycle data measured by wearable sensors: A cnn-gru-gnn approach. arxiv:2404.15335 (2024).
  • 60.Guo, G. et al. Diagnosing Parkinson’s Disease Using Multimodal Physiological Signals 125–136 (Springer Singapore, 2021). [Google Scholar]
  • 61.Alrawis, M., Al-Ahmadi, S. & Mohammad, F. Bridging modalities: A multimodal machine learning approach for Parkinson’s disease diagnosis using EEG and MRI data. Appl. Sci.14, 3883. 10.3390/app14093883 (2024). [Google Scholar]
  • 62.Aljohani, A. Late feature fusion using neural network with voting classifier for Parkinson’s disease detection. BMC Med. Inform. Decis. Mak.10.1186/s12911-024-02683-0 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Aljohani, A. Late fused multi-modal neural network with voting classifier for Parkinson’s disease detection. Res. Sq.6, 1–9. 10.21203/rs.3.rs-3997112/v1 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.He, T., Chen, J. & Chen, Y. Smartphone-based detection of early Parkinson’s disease with tapping records and a multimodal-multiscale ensemble network. IEEE Sens. J.24, 33207–33216. 10.1109/JSEN.2024.3452092 (2024). [Google Scholar]



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group