Abstract
Machine learning (ML) has emerged as a vital tool for the diagnosis of Parkinson’s Disease (PD). This study presents a comprehensive review of the applications of ML for computer-aided diagnosis (CAD) of PD, covering articles published from 2010 to 2024. The risk of bias is assessed using the PROBAST checklist, and case studies are provided. This review includes 117 articles across six categories: neuroimaging data (20.5%), voice data (40.2%), handwriting data (12.0%), gait data (14.5%), EEG data (8.5%), and other data (4.3%). According to the PROBAST checklist, only 28 articles (23.9%) have a low risk of bias. A benchmark case study is conducted for five different data modalities. We also discuss the current limitations and future directions of applying ML to the diagnosis of PD. This review narrows the gap between the Artificial Intelligence (AI) community and PD medical professionals and provides helpful information for future research.
Keywords: Parkinson’s disease (PD), Machine learning (ML), Deep learning (DL), Computer-aided diagnosis (CAD), Case study
Introduction
Parkinson’s disease (PD) is the second most common progressive neurodegenerative disorder after Alzheimer’s disease (AD) and is characterised by numerous motor and non-motor features (Jankovic 2008). Its incidence increases with age, especially beyond 60 years. PD is diagnosed based on the patient’s medical history and clinical criteria, and there is no definitive laboratory test for PD diagnosis (Jankovic 2008). It is challenging for medical specialists to correctly differentiate PD from other pathologies when patients’ signs and symptoms overlap with other Parkinsonian syndromes (Trifonova et al. 2020). Hence, it is important to assess whether computer-aided diagnosis (CAD) can support medical specialists in diagnosing PD. Artificial Intelligence (AI) has proved helpful in healthcare, where it has been utilised for disease detection, diagnosis, treatment, and prognosis evaluation (Jiang et al. 2017). In the past 15 years, many AI methods have been applied in the field of CAD of PD. In particular, deep learning (DL) has become more attractive in the last decade than conventional machine learning (ML), as it can discover and learn more hidden patterns from healthcare data (LeCun et al. 2015). For example, ML- and DL-based methods have been applied as computer-assisted techniques to the diagnosis of brain diseases using neuroimaging data (Li et al. 2014), including the diagnosis of AD (Liu et al. 2014) and PD. Moreover, PD diagnosis using ML involves high data complexity due to the variety of data modalities, such as neuroimaging, gait, voice, and handwriting (Cano 2013). These datasets are often high-dimensional and may contain noise, making preprocessing and analysis more challenging (Khan et al. 2023, 2024; Perumal et al. 2025).
To comprehensively examine the research progress over the past 15 years and provide meaningful guidance on the application of ML in the medical domain, we conduct a systematic review of ML-based computer-aided diagnosis for PD. Unlike previous review papers (Zhang 2022; Sigcha et al. 2023; Wang et al. 2024), which either focused on a limited number of data modalities or lacked practical benchmarking efforts, our work introduces case studies that directly address these gaps. To ensure methodological rigor, we select five of the most commonly used data modalities and use a public dataset for each. Additionally, to support transparency and reproducibility, we have released the code implementations of all benchmark case studies via GitHub: https://github.com/yiming95/PD_ML_benchmark.
Search strategy
We perform this systematic review of the literature on PD diagnosis using ML techniques following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al. 2009). Four electronic databases, (1) IEEE Xplore, (2) Association for Computing Machinery (ACM), (3) Springer, and (4) Science Direct, are searched for relevant publications from 2010 to 2024. Google Scholar and PubMed are also searched within these dates for potentially relevant studies. We use several keywords as search queries, including “Parkinson’s disease”, “PD”, “Diagnosis”, “Diagnostics”, “Computer-Aided Diagnosis”, “Deep learning”, “Machine learning”, and “Artificial Intelligence”. The PRISMA flowchart is shown in Fig. 1.
Fig. 1.
PRISMA flow chart. The study selection process shows the number of studies identified, screened, assessed, and included in this systematic review
This review aims to identify publications on PD diagnosis using ML; all included articles focus on this topic. In addition, only publications in English are included. Publications focusing on the treatment or prognostic evaluation of PD, or those using only image-analysis or signal-analysis methods, are excluded. Review papers and non-peer-reviewed papers are also excluded. Publications are first screened for eligibility by title and abstract; potentially eligible studies are then assessed in full text. We then analyze and extract data from the screened articles. Data extracted from the full-text articles include: (1) author, (2) publication year, (3) objective, (4) data modality, (5) dataset, (6) number of subjects, (7) ML algorithms applied, (8) validation, and (9) evaluation metrics. The results section analyzes five different data modalities from the main public datasets used by ML researchers: neuroimaging data, voice data, handwriting data, gait data, and electroencephalogram (EEG) data. A meta-analysis is not performed due to the heterogeneity of the included studies.
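Most of the extracted evaluation metrics (accuracy, sensitivity, specificity) are simple functions of a binary confusion matrix, with PD as the positive class. As a reference for how these reported figures relate to a model’s predictions, the following is a minimal sketch with made-up labels; the helper name and data are illustrative only:

```python
# Sketch: computing the evaluation metrics most often extracted in this
# review (accuracy, sensitivity, specificity) from binary predictions.
# Convention assumed here: 1 = PD, 0 = healthy control (HC).

def diagnostic_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # recall on the PD class
        "specificity": tn / (tn + fp),  # recall on the HC class
    }

# Illustrative predictions for 8 subjects (4 PD, 4 HC)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
m = diagnostic_metrics(y_true, y_pred)  # all three metrics equal 0.75 here
```

Reporting all three together matters because, on the imbalanced cohorts common in Table 1, a high accuracy can coexist with a very low specificity.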
Contributions
This interdisciplinary systematic review quantifies and analyzes the last 15 years’ publications on the diagnosis of PD using ML techniques. By conducting a benchmark case study on five commonly used modalities, namely MRI, gait, voice, EEG, and handwriting, we identify key issues in this field: reported results are often hard to reproduce and lack interpretability analysis. Furthermore, this systematic review aims to summarize the current trends in how ML techniques are applied in the early diagnosis of PD. It also aims to identify the current limitations and challenges of applying ML in the diagnosis of PD and to propose a few promising future directions. Compared to previous works, this article encompasses the broadest range of literature, from 2010 to 2024, and includes the largest number of modalities. Additionally, no prior work has conducted a detailed case study experiment to test the reproducibility of results across multiple modalities. The contributions of this paper can be summarized as follows:
We conduct a systematic review on ML-based CAD for PD applications published from 2010 to 2024. Specifically, we analyze the data modalities, dataset, ML algorithm, and model performance for each study.
We conduct a comprehensive case study on five data modalities.
The paper also discusses the current limitations and future directions of applying ML in PD diagnosis.
The rest of the paper is organized as follows. Section 2 summarizes the ML-based PD diagnosis applications and introduces the datasets and evaluation metrics. Section 3 shows the results of the risk of bias assessment. Section 4 shows the details of the case study. Section 5 provides the discussion, including a summary of the findings, current challenges, and future research directions. Section 6 summarizes the paper.
Applications of ML-based PD diagnosis
Based on our search and study selection process, we first identified 12424 articles from IEEE Xplore, ACM, Springer, and Science Direct, with additional articles from Google Scholar and PubMed. After removing duplicates, 8908 articles were screened for eligibility. After screening titles and abstracts, we excluded 8407 articles, leaving 501 articles for full-text examination. Finally, 117 articles are included for data extraction. The general procedure pipeline for PD diagnosis using ML is shown in Fig. 2. Table 1 summarizes all the included studies.
Fig. 2.
Pipeline for the general ML-based computer-assisted PD diagnosis
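Most of the surveyed studies instantiate the pipeline of Fig. 2 in the same way: preprocessing, feature extraction/selection, a classifier, and cross-validated evaluation. The following is a minimal scikit-learn sketch with synthetic data standing in for any modality; it is not the benchmark code from our repository, and the feature count and RBF-SVM choice are illustrative assumptions (the RBF-SVM being one of the most frequent classifiers in Table 1):

```python
# Sketch of the general ML-based CAD pipeline in Fig. 2:
# preprocessing -> feature selection -> classification -> cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 22))        # 100 subjects x 22 synthetic features
y = rng.integers(0, 2, size=100)      # synthetic labels: 1 = PD, 0 = HC

pipe = Pipeline([
    ("scale", StandardScaler()),                # preprocessing
    ("select", SelectKBest(f_classif, k=10)),   # feature selection
    ("clf", SVC(kernel="rbf")),                 # RBF-SVM classifier
])

# 10-fold cross-validation, the most common scheme in Table 1
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
```

Wrapping every step in a `Pipeline` ensures that scaling and feature selection are fitted on each training fold only, avoiding the information leakage that inflates some reported results.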
Table 1.
Summary of studies on PD classification using different modalities
| Author | Year | Objective | Data modality | Dataset | Subjects | ML Algorithm | Validation | Evaluation metrics |
|---|---|---|---|---|---|---|---|---|
| Neuroimaging | ||||||||
| Prashanth et al. (2014) | 2014 | Classification (PD vs. HC) | Neuroimaging: DaTSCAN SPECT | PPMI | 548 subjects: 369 PD + 179 HC | RBF-SVM | 10-fold cross-validation | Accuracy: 96.14%, Sensitivity: 96.55%, Specificity: 95.03% |
| Salvatore et al. (2014) | 2014 | Classification (PD vs. HC) | Neuroimaging: MRI | Collected from participants | 84 subjects: 56 PD + 28 HC | SVM | Leave-One-Out (LOO) validation | Accuracy: 92.2%, Sensitivity: 94.4%, Specificity: 91.3% |
| Rana et al. (2015) | 2015 | Classification (PD vs. HC) | Neuroimaging: MRI | Collected from participants | 60 subjects: 30 PD + 30 HC | SVM | leave-one-out cross-validation (LOOCV) | Accuracy: 86.67%, Sensitivity: 90.00%, Specificity: 83.33% |
| Oliveira and Castelo-Branco (2015) | 2015 | Classification (PD vs. HC) | Neuroimaging: FP-CIT SPECT | PPMI | 654 subjects: 445 PD + 209 HC | SVM | LOOCV | Accuracy: 97.68%, Sensitivity: 97.75%, Specificity: 98.09% |
| Zhang and Kagen (2017) | 2017 | Classification (PD vs. HC) | Neuroimaging: DaTSCAN SPECT | PPMI | Not specified | ANN | 10-fold cross-validation | Accuracy: 93.8%, Sensitivity: 97.4%, Specificity: 82.2% |
| Peng et al. (2017) | 2017 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 172 subjects: 69 PD + 103 HC | RBF-SVM | 10-fold cross-validation | Accuracy: 85.8%, Sensitivity: 87.6%, Specificity: 87.8% |
| Sivaranjini and Sujatha (2020) | 2020 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 182 subjects: 100 PD + 82 HC | AlexNet | Train and Test split (80–20%) | Accuracy: 88.90%, Sensitivity: 89.30%, Specificity: 88.40% |
| West et al. (2019) | 2019 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 445 subjects: 299 PD + 146 HC | 3D CNN | Not specified | Accuracy: 75%, Sensitivity: 76%, Specificity: 74%, Precision: 74% |
| Dai et al. (2019) | 2019 | Classification (PD vs. HC) | Neuroimaging: PET | PPMI, ADNI, HCP | Not specified | U-Net | 10-fold cross-validation | Accuracy (U-Net): 84.17%, Accuracy (CNN): 76.19% |
| Zhang et al. (2019) | 2019 | Classification (Prodromal PD vs. Confirmed PD vs. HC) | Neuroimaging: MRI | PPMI | 578 subjects: 49 Prodromal PD + 366 Confirmed PD + 163 HC | Deep neural network with Broad Views (DBV) | Train and Test split (80–20%) | Accuracy: 76.27% |
| Chakraborty et al. (2020) | 2020 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 406 subjects: 203 PD + 203 HC | 3D CNN | 5-fold cross-validation | Accuracy: 95.29%, F1 score: 93.6%, Specificity: 94.3%, Precision: 92.7%, Recall: 94.3%, ROC-AUC: 98% |
| Kaur et al. (2021) | 2021 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | Not specified | AlexNet | Train, Validation and Test split (60–20–20%) | Accuracy: 89.23%, Sensitivity: 90.27%, Specificity: 89.03%, ROC-AUC: 97.23% |
| Vyas et al. (2022) | 2022 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 318 subjects: 236 PD + 82 HC | 3D CNN | Train and Test split validation (70–30%) | 3D CNN Accuracy: 88.9% 3D CNN AUC: 86.0% |
| Ya et al. (2022) | 2022 | Classification (PD vs. NC) | Neuroimaging: MRI | Collected from participants, PPMI | Collected from participants 116 subjects: 60 PD + 56 NC; PPMI 140 subjects: 69 PD + 71 NC | Regression models | 5-fold cross-validation | Cerebellar model AUC: 64.6%, Subcortical model AUC: 63.2%, Cortical model AUC: 69.0%, Combined model AUC: 75.6% |
| Erdaş and Sümer (2022) | 2022 | Classification (PD vs. NC) | Neuroimaging: MRI | Combined from multiple datasets (Badea et al. 2017) | 83 subjects: 47 PD + 36 NC | 2D CNN | 10-fold cross-validation | Accuracy: 90.36%, ROC-AUC: 90.51%, F1 score: 90.25%, Sensitivity: 90.52%, Precision: 90.08% |
| Huang et al. (2023) | 2023 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 194 subjects: 97 PD + 97 HC | multi-task node cluster based graph structure learning framework (MNC-Net) | 10-fold cross-validation | Accuracy: 95.5%, F1 score: 95.49%, Precision: 97.00%, Recall: 94.42% |
| Xu et al. (2023) | 2023 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 117 subjects: 84 PD + 34 HC | DNN | 5-fold cross-validation | Accuracy: 96.4% |
| Camacho et al. (2023) | 2023 | Classification (PD vs. HC) | Neuroimaging: MRI | PPMI | 2041 subjects: 1024 PD + 1017 HC | CNN with Log-Jacobian model | Train, Validation, and Test split (85–5–10%) | Accuracy: 79.3%, Precision: 80.2%, Specificity: 81.3%, Sensitivity: 77.7% |
| Priyadharshini et al. (2024) | 2024 | Classification (PD vs. HC) | Neuroimaging: 3D MRI | PPMI | 500 subjects: 180 PD + 160 prodromal PD + 160 HC | Gradient Boosting (GB), with SHAP, LIME, SHAPASH for XAI | 5-fold cross-validation | Accuracy: 96.8% Precision: 97% Recall: 94.2% Specificity: 96.6% F1 score: 94.6% |
| Talai et al. (2021) | 2021 | Classification (PD vs. PSP vs. HC) | Neuroimaging: T1, T2, DTI MRI | PPMI | 103 subjects: 45 PD + 20 PSP-RS + 38 HC | SVM+MLP | LOOCV | Accuracy: 95.1% |
| Prasuhn et al. (2020) | 2020 | Classification (PD vs. HC) | Neuroimaging: Diffusion Tensor Imaging (DTI) | PPMI | 232 subjects: 162 PD + 70 HC | SVM (bSVM) | 10-fold cross-validation | Balanced Accuracy: 58.1%, ROC-AUC: 52.0%, Sensitivity: 56%, Specificity: 41% |
| Chen et al. (2023) | 2023 | Classification (PD-MCI vs. PD-NC) | Neuroimaging: DTI (FA, MD, AD, RD, LDH) | Collected from participants | 117 subjects: 52 PD-NC + 68 PD-MCI | XGBoost | 10-fold cross-validation | Accuracy: 91.67%, Sensitivity: 92.86%, Specificity: 90.00%, AUC: 94.00% |
| Tsai et al. (2023) | 2023 | Classification (PD vs. PSP vs. MSA vs. HC) | Neuroimaging: DTI (whole-brain features) | Collected from participants | 625 subjects: 286 PD + 69 PSP + 51 MSA + 219 HC | SVM, Discriminant Function Analysis | 5-fold cross-validation | Accuracy: 83.0%, Sensitivity: 84.8%, Specificity: 78.3%, F1 Score: 86.7% |
| Zhao et al. (2022) | 2022 | Classification (PD vs. HC) | Neuroimaging: DTI (Fractional Anisotropy, MD) | Collected from participants | 532 subjects: 305 PD + 227 HC | 3D CNN | 10-fold cross-validation, independent test set | AUC: 94.1% |
| Voice | ||||||||
| Sakar and Kursun (2010) | 2010 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | SVM | LOOCV | Accuracy: 81.53% (LOO validation), Accuracy: 92.75% (bootstrap resampling validation) |
| Bhattacharya and Bhatia (2010) | 2010 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Linear-SVM | Cross-validation | Accuracy: 65.22% |
| Guo et al. (2010) | 2010 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Minimum distance classifier (MDC) | 10-fold cross-validation | Accuracy: 93.12% |
| Åström and Koker (2011) | 2011 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Parallel network system (9 FNN) | train and test split (60–40%) | Accuracy: 91.2% ± 1.6% |
| Ramani and Sivagami (2011) | 2011 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Fisher Filter + RF | Not specified | Accuracy: 100% |
| Yadav et al. (2012) | 2012 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | SVM | 10-fold cross-validation | Accuracy: 76%, Sensitivity: 97%, Specificity: 13% |
| Tsanas et al. (2012) | 2012 | Classification (PD vs. HC) | Voice dataset | NCVS | 43 subjects: 33 PD + 10 HC | RELIEF + SVM | 10-fold cross-validation | Accuracy: 98.6% |
| Mandal and Sairam (2014) | 2014 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | LR | 10-fold cross-validation | Accuracy: 100%, Sensitivity: 98.3%, Specificity: 99.6% |
| Hazan et al. (2012) | 2012 | Classification (PD vs. HC) | Voice dataset | Collected from participants | American Dataset: 52 subjects: 38 PD + 14 HC German Dataset: 98 subjects: 68 PD + 30 HC | SVM | Cross-validation | American Accuracy: 96%, German Accuracy: 85% |
| Gharehchopogh and Mohammadi (2013) | 2013 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | MLP | train and test split (70–30%) | Accuracy: 93.22% |
| Rustempasic and Can (2013) | 2013 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | MLP | Not specified | Accuracy: 81.33% |
| Sharma and Giri (2014) | 2014 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | RBF-SVM | Train and Test split (80–20%) | Accuracy: 85.29%, Sensitivity: 100%, Specificity: 37.5% |
| Olanrewaju et al. (2014) | 2014 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | MLFFN + K-Means | Train and Test split (50–50%) | Accuracy: 80%, Sensitivity: 63.6%, Specificity: 83.3% |
| Peker et al. (2015) | 2015 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | CVANN | 10-fold cross-validation | Accuracy: 98.12%, Sensitivity: 99.24%, Specificity: 98.96% |
| Gök (2015) | 2015 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Linear SVM + KNN | 10-fold cross-validation | Accuracy: 98.46% |
| Chen et al. (2016) | 2016 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | mRMR – KELM | 10-fold cross-validation | Accuracy: 95.97% |
| Avci and Dogantekin (2016) | 2016 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | GA-WK-ELM | 3-fold cross-validation | Highest Accuracy: 96.81% |
| Dinesh and He (2017) | 2017 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Boosted Decision Tree | 10-fold cross-validation | Highest Accuracy: 95% |
| Caliskan et al. (2017) | 2017 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | DNN | 10-fold cross-validation | Accuracy: 86.095%, Sensitivity: 58.27%, Specificity: 95.387% |
| Parisi et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | UCI Machine Learning repository | 40 subjects: 20 PD + 20 HC | MLP-LSVM | 20-fold cross-validation | Accuracy: 100%, Sensitivity: 100%, Specificity: 100% |
| Wroge et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | mPower dataset | N/A | SVM | 10-fold cross-validation | Accuracy: 85%, Precision: 84%, Recall: 71% |
| Lahmiri et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | Private dataset | 195 subjects: 147 PD + 48 HC | SVM | 10-fold cross-validation | Accuracy: 92%, Sensitivity: 95%, Specificity: 91% |
| Haq et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | DNN | Train and Test split (70–30%) | Accuracy: 98%, Sensitivity: 95%, Specificity: 99% |
| Ali et al. (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | UCI Machine Learning repository | 40 subjects: 20 PD + 20 HC | LDA-NN-GA | leave-one-subject-out (LOSO) validation | Training Accuracy: 80%, Testing Accuracy: 82.14% |
| Mostafa et al. (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | RF | 10-fold cross-validation | Accuracy: 99.49%, Precision: 95.5%, Recall: 95.5% |
| Lahmiri and Shmuel (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | Private dataset | 43 subjects: 33 PD + 10 HC | Wilcoxon statistic + SVM | 10-fold cross-validation | Accuracy: 92.21%, Sensitivity: 99.63%, Specificity: 82.79% |
| Haq et al. (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | L1-norm SVM feature selection + SVM | 10-fold cross-validation | Accuracy: 99%, Sensitivity: 100%, Specificity: 99% |
| Senturk (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | SVM | Not specified | Accuracy: 93.84% |
| Karan et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | UCI Machine Learning repository + PC-GITA | UCI: 45 subjects: 25 PD + 20 HC PC-GITA: 45 subjects: 25 PD + 20 HC | SVM | 10-fold cross-validation | UCI accuracy: 100%, PC-GITA accuracy: 96% |
| Soumaya et al. (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | Collected from participants | 34 subjects: 20 PD + 14 HC | GA + SVM | 10-fold cross-validation | Best accuracy: 91.18% |
| Karaman et al. (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | mPower dataset | N/A subjects | DenseNet-161 | Not specified | Accuracy: 89.75% Specificity: 91.50% Sensitivity: 88.40% |
| Quan et al. (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | Collected from participants | 45 subjects: 30 PD + 15 HC | Bidirectional LSTM+CNN | 10-fold cross-validation | Accuracy: 75.56%, F-score: 80.70%, Specificity: 76.67%, Sensitivity: 85.19%, MCC: 0.4811 |
| Zahid et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | PC-GITA dataset | 100 subjects: 50 PD + 50 HC | AlexNet | 5-fold cross-validation | Accuracy (RF): 99%, Accuracy (MLP): 99.7% |
| Rizvi et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | PSD dataset | 40 subjects: 20 PD + 20 HC | LSTM + DNN | Not specified | Accuracy: 99.03%, Sensitivity: 99%, Specificity: 99%, Precision: 99% |
| Abayomi-Alli et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | Bidirectional LSTM | 5-fold cross-validation | Accuracy: 82.86% |
| Gunduz (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | UCI Machine Learning repository | 252 subjects: 188 PD + 64 HC | 2D CNN | Leave-one-person-out cross-validation | Accuracy (Triple feature sets): 83.3% F-score (Triple feature sets): 89.4% MCC (Triple feature sets): 0.521 |
| Nagasubramanian and Sankayya (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | Parkinson telemonitoring dataset + multi-variate sound record dataset | 102 subjects: 82 PD + 20 HC | DWVDA | Not specified | Accuracy (ADNN): 98.96% Specificity (ADNN): 98.82% Recall (ADNN): 98.89% Precision (ADNN): 98.90% MAE(ADNN): 1.04 |
| Fang et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | Collected from participants | 68 subjects: 34 PD + 34 HC | CNN + LSTM | LOSO validation | ACC (Talking): 94.0% ACC (DDK): 83.5% ACC (Reading): 91.1% |
| Ali et al. (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | Combined from two public datasets | 228 subjects: 108 PD + 120 HC | Ensemble learning-based framework | LOSO | Accuracy: 100% |
| Hireš et al. (2022) | 2022 | Classification (PD vs. HC) | Voice dataset | PC-GITA dataset | 100 subjects: 50 PD + 50 HC | 2D CNN | 10-fold cross-validation | Accuracy: 99%, AUC: 99.6%, Sensitivity: 86.2%, Specificity: 93.3% |
| Rana et al. (2022) | 2022 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 195 subjects: 147 PD + 48 HC | ANN | train and test split (80–20%) | Accuracy (SVM): 87.2%, Accuracy (NB): 74.1%, Accuracy (ANN): 96.7%, Accuracy (KNN): 87.2% |
| Madruga et al. (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | Collected from participants | 60 subjects: 30 PD + 30 HC | Passive aggressive classifier | Cross-validation | Accuracy (position 1): 70.1%, Accuracy (position 2): 71.8%, Accuracy (position 3): 72.9%, Accuracy (position 4): 73.1% |
| Govindu and Palwe (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | RF | train and test split (75–25%) | Accuracy: 91.8%, Precision: 95.0%, Recall: 86.0% |
| Celik and Başaran (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | PD Dataset and PDO Dataset | PD Dataset: 252 subjects (188 PD + 64 HC) PDO Dataset: 31 subjects (23 PD + 8 HC) | SkipCon Net + RF | Not specified | Accuracy: 99.1%, Precision: 99.0%, Recall: 99.0%, Specificity: 98%, Specificity: 98.77% |
| Khaskhoussy and Ayed (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | UCI Machine Learning repository | 40 subjects: 20 PD + 20 HC | Polynomial kernel SVM | 5-fold cross-validation | Accuracy: 97.6%, Precision: 94%, Sensitivity: 96%, Specificity: 93%, F-Score: 94% |
| Dheer et al. (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | Oxford Parkinson’s Disease dataset | 31 subjects: 23 PD + 8 HC | KNN | train and test split (75–25%) | Accuracy: 95.9% |
| Akila and Nayahi (2024) | 2024 | Classification (PD vs. HC) | Voice dataset | UCI Machine Learning repository | 252 subjects: 188 PD + 64 HC | MASS-PCNN (Multi-agent Salp Swarm Algorithm) | 5-fold cross-validation | Accuracy: 95.1%, Precision: 97.8%, Recall: 94.7%, F1 score: 99.1% |
| Handwriting | ||||||||
| Drotár et al. (2014) | 2014 | Classification (PD vs. HC) | Handwriting dataset | Collected from participants | 75 subjects: 37 PD + 38 HC | SVM | 10-fold cross-validation | Accuracy: 95.29%, F1 score: 93.6%, Specificity: 94.3%, Precision: 92.7%, Recall: 94.3%, ROC-AUC: 98% |
| Drotár et al. (2015) | 2015 | Classification (PD vs. HC) | Handwriting dataset | Collected from participants | 75 subjects: 37 PD + 38 HC | RBF-SVM | 10-fold cross-validation | Accuracy: 88.1% |
| Pereira et al. (2015) | 2015 | Classification (PD vs. HC) | Handwriting dataset | Collected from participants | 55 subjects: 37 PD + 18 HC | NB | 10-fold cross-validation | Accuracy: 88.13%, Sensitivity: 89.74%, Specificity: 91.89% |
| Ribeiro et al. (2019) | 2019 | Classification (PD vs. HC) | Handwriting dataset | HandPD dataset | 35 subjects: 14 PD + 21 HC | Gated Recurrent Units + Attention | train and test split (75–25%) | Accuracy: 78.9% |
| Razzak et al. (2020) | 2020 | Classification (PD vs. HC) | Handwriting dataset | PaHaW, NewHandPD dataset, Parkinson’s Drawing Dataset | 233 subjects: 142 PD + 91 HC | 2D CNN (AlexNet, GoogleNet, VGGNet, ResNet) | 10-fold cross-validation | Accuracy: 89.48% |
| Kamran et al. (2021) | 2021 | Classification (PD vs. HC) | Handwriting dataset | HandPD, NewHandPD, PaHaw, Parkinson’s Drawing Dataset | 233 subjects: 142 PD + 91 HC Parkinson’s Drawing Dataset: NA | 2D CNN | 5-fold cross-validation | Accuracy: 98.04% |
| Gil-Martín et al. (2019) | 2019 | Classification (PD vs. HC) | Handwriting dataset | Spiral Drawing dataset | 77 subjects: 62 PD + 15 HC | 2D CNN | subject-wise 5-fold cross-validation | Accuracy: 96.5%, F1 score: 97.7%, AUC: 99.2% |
| Diaz et al. (2021) | 2021 | Classification (PD vs. HC) | Handwriting dataset | PaHaW, NewHandPD datasets | 75 subjects: 37 PD + 38 HC | BiGRUs + CNN | 10-fold cross-validation | Accuracy: 94.44%, AUC: 98.25%, Specificity: 98.0%, Sensitivity: 90.0% |
| Taleb et al. (2019) | 2019 | Classification (PD vs. HC) | Handwriting dataset | PDMulti MC dataset | 42 subjects: 21 PD + 21 HC | CNN + CNN-BLSTM | 3-fold cross-validation | Accuracy: 83.33%, Sensitivity: 71.43%, Specificity: 95.24% |
| Varalakshmi et al. (2022) | 2022 | Classification (PD vs. HC) | Handwriting dataset | Kaggle spiral data | 51 subjects: 50 healthy + 1 PD | A hybrid of RESNET-50 and SVM | train and test split (70–30%) | Accuracy: 98.45%, Sensitivity: 99%, Specificity: 99% |
| Li et al. (2022) | 2022 | Classification (PD vs. HC) | Handwriting dataset | Collected from participants | 86 subjects: 43 PD + 43 HC | CNN (CC-Net) | Cross-validation | Accuracy: 89.3%, Precision: 99.2%, Recall: 93.1%, F1 Score: 92.5%, Matthews correlation coefficient (MCC): 73.3% |
| Zhao and Li (2023) | 2023 | Classification (PD vs. HC) | Handwriting dataset | NewHandPD | 66 subjects: 31 PD + 35 HC | CNN and bidirectional gated recurrent unit (BiGRU) | train and test split (80–20%) | Accuracy (meander): 92.91%, Accuracy (circle): 85.71%, Accuracy (spiral): 90.55% |
| Abdullah et al. (2023) | 2023 | Classification (PD vs. HC) | Handwriting dataset | NewHandPD | 66 subjects: 31 PD + 35 HC | ResNet5 + VGG19 + Inception V3 + kNN | train and test split (80–20%) | Accuracy: 95.29%, AUC: 90%, Recall: 86%, Precision: 99% |
| Wang et al. (2024) | 2024 | Classification (PD vs. HC) | Handwriting dataset | DraWritePD, PaHaW datasets | 75 subjects: 37 PD + 38 HC | LSTM-CNN | 5-fold cross-validation | Accuracy: 96.2%, Sensitivity: 94.5%, Specificity: 97.3%, PaHaW Accuracy: 90.7% |
| Gait | ||||||||
| Tahir and Manap (2012) | 2012 | Classification (PD vs. HC) | Gait dataset | Collected from participants | 32 subjects: 12 PD + 20 HC | SVM | 10-fold cross-validation | Accuracy: 100%, Sensitivity: 100%, Specificity: 100% |
| Wahid et al. (2015) | 2015 | Classification (PD vs. HC) | Gait dataset | Collected from participants | 49 subjects: 23 PD + 26 HC | RF | 10-fold cross-validation | Accuracy: 92.6% |
| Shetty and Rao (2016) | 2016 | Classification (PD vs. HD vs. ALS) | Gait dataset | Physionet dataset | 48 subjects: 15 PD + 20 HD + 13 ALS | SVM | train and test split (50–50%) | Accuracy: 83.33%, Sensitivity: 85.71%, Specificity: 75% |
| Abdulhay et al. (2018) | 2018 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | Medium Gaussian SVM | Not specified | Accuracy: 94.8% |
| Rehman et al. (2019) | 2019 | Classification (PD vs. HC) | Gait dataset | Collected from participants | 303 subjects: 119 PD + 184 HC | RF | 10-fold cross-validation | Accuracy: 97%, Sensitivity: 100%, Specificity: 94% |
| Balaji et al. (2021) | 2021 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | LSTM | train and test split (80–20%) | Accuracy: 98.6% |
| Xia et al. (2019) | 2019 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | CNN, Attention-enhanced LSTM | 5-fold cross-validation | Accuracy: 99.07% Sensitivity: 99.10% Specificity: 99.01% |
| El Maachi et al. (2020) | 2020 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | DNN | 10-fold cross-validation | Accuracy: 98.7% |
| Aversano et al. (2020) | 2020 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | DNN | 10-fold cross-validation | Accuracy: 99.37% |
| Liu et al. (2021) | 2021 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | CNN with Bi-LSTM | Train and Test split (70–30%) | Accuracy: 99.22%, Sensitivity: 100%, Specificity: 98.04% |
| Nguyen et al. (2022) | 2022 | Classification (PD vs. HC) | Gait dataset | Physionet | 166 subjects: 93 PD + 73 HC | Transformer | 10-fold cross-validation | Accuracy: 95.2%, Sensitivity: 98.1%, Specificity: 86.8% |
| Trabassi et al. (2022) | 2022 | Classification (PD vs. HC) | Gait dataset | Collected from participants | 161 subjects: 81 PD + 80 HC | SVM | 10-fold cross-validation, Train and Test split (80–20%) | Accuracy: 81%, AUC: 80%, F1 score: 80%, Precision: 80%, Recall: 80% |
| Li and Li (2022) | 2022 | Classification (PD vs. HC) | Gait dataset | Two public datasets | 306 subjects: 214 PD + 92 HC | SVM | Train and Test split (80–20%) | Accuracy: 68%, False positive rate: 98%, Precision: 69%, Recall: 98% |
| Aşuroğlu and Oğul (2022) | 2022 | Classification (PD vs. HC), Regression (UPDRS value) | Gait dataset | Physionet | 166 subjects: 93 PD + 73 HC | CNN + RF | 10-fold cross-validation | Accuracy: 99.5% Sensitivity: 98.7% Specificity: 99.1% Correlation Coefficient: 0.897 Mean Absolute Error: 3.009 Root Mean Square Error: 4.556 |
| Ma et al. (2023) | 2023 | Classification (PD vs. HC) | Gait dataset | Physionet | 166 subjects: 93 PD + 73 HC | CNN+XGBoost | Train and Test split | Accuracy: 98.4% |
| Vinora et al. (2023) | 2023 | Classification (PD vs. HC) | Gait dataset | UCI Machine Learning repository | 85 subjects: 70 PD + 15 HC | SVM | Not specified | Recall: 100%, Precision: 50%, F1 score: 67% |
| Sharma et al. (2023) | 2023 | Classification (PD vs. HC) | Gait dataset | Physionet dataset | 166 subjects: 93 PD + 73 HC | CNN+SVM | 10-fold cross-validation | Accuracy: 95.2% |
| EEG | ||||||||
| Lee et al. (2019) | 2019 | Classification (PD vs. HC) | EEG | Collected from participants | 406 subjects: 203 PD + 203 HC | 3D CNN | Train and Test split (80–20%) | Accuracy: 95.29% F1 score: 93.6% Specificity: 94.3% Precision: 92.7% Recall: 94.3% ROC-AUC: 98% |
| Oh et al. (2020) | 2020 | Classification (PD vs. HC) | EEG | Collected from participants | 41 subjects: 20 PD + 21 HC | CNN + LSTM | 10-fold cross-validation | Accuracy: 96.9%, Recall: 93.4%, Precision: 100% |
| Anjum et al. (2020) | 2020 | Classification (PD vs. HC) | EEG | Collected from participants | Participants from New Mexico 54 subjects: 27 PD + 27 HC Participants from Iowa 28 subjects: 14 PD + 14 HC | Linear predictive coding | 10-fold cross-validation | Accuracy: 85.3%, AUC: 93.3%, Sensitivity: 87.9%, Specificity: 82.7% |
| Shaban (2021) | 2021 | Classification (PD vs. HC) | EEG | UC San Diego Public Dataset | 31 subjects: 16 PD + 15 HC | ANN | Train and Test split (80–20%) | Accuracy: 98%, Sensitivity: 97%, Specificity: 100% |
| Loh et al. (2021) | 2021 | Classification (PD vs. HC) | EEG | UC San Diego Public Dataset | 31 subjects: 16 PD + 15 HC | 2D-CNN | 10-fold cross-validation | Accuracy: 99.46% |
| Motin et al. (2022) | 2022 | Classification (PD vs. HC) | EEG | UC San Diego Public Dataset | 31 subjects: 16 PD + 15 HC | Polynomial SVM | Train and Test split | Accuracy: 87.1%, Sensitivity: 93.3%, Specificity: 81.25% |
| Chawla et al. (2023) | 2023 | Classification (PD vs. HC) | EEG | Combined from two public datasets | Dataset-1 40 subjects: 20 PD + 20 HC Dataset-2 31 subjects: 16 PD + 15 HC | flexible analytic wavelet transform (FAWT) + KNN | 10-fold cross-validation | Dataset-1 Accuracy: 99% AUC: 99.1% Sensitivity: 99.12% Specificity: 99.45% Dataset-2 Accuracy: 95.85% AUC: 95.9% Sensitivity: 96.14% Specificity: 95.88% |
| Coelho et al. (2023) | 2023 | Classification (PD vs. HC) | EEG | Public PRED+C repository | 50 subjects: 25 PD + 25 HC | SVM | 5-fold cross-validation | Accuracy: 89.56% |
| Nour et al. (2023) | 2023 | Classification (PD vs. HC) | EEG | UC San Diego Public Dataset | 31 subjects: 16 PD + 15 HC | Dynamic Classifier Selection (DCS) in Modified Local Accuracy (MLA) | 5-fold cross-validation | Accuracy: 99.3%, Precision: 99.31%, Recall: 99.31% |
| Zhao et al. (2024) | 2024 | Classification (PD vs. HC) | EEG | Collected from participants | 100 subjects: 52 PD + 48 HC | GSP-GCNs (Graph Signal Processing-Graph Convolutional Networks) | 5-fold cross-validation | Accuracy: 90.2%, AUC: 89.1%, Sensitivity: 84.0%, Specificity: 88.4% |
| Other Data | ||||||||
| Bhandari et al. (2023) | 2023 | Classification (PD vs. HC) | Gene dataset | Five open-source peripheral blood microarray gene expression datasets on PD from GEO | 742 subjects: 406 PD + 336 HC | Logistic Regression | 10-fold cross-validation | Accuracy: 77.7%, Precision: 77.6%, Recall: 77.82% |
| Wang et al. (2023) | 2023 | Classification (PD vs. HC) | Urine biomarkers | Collected from participants | 215 subjects: 104 PD + 111 HC | XGBoost | Not specified | Accuracy: 96.5%, AUC: 99.2% |
| Junaid et al. (2023) | 2023 | Classification (PD vs. HC) | Patient visits | PPMI | 541 subjects: 324 PD + 217 HC | Light gradient boosting machines (LGBM) | 10-fold cross-validation | Accuracy: 90.73%, Precision: 83.27%, Recall: 89.53% |
| Igene et al. (2023) | 2023 | Classification (PD vs. HC) | Movement data | Collected from participants | 34 subjects: 17 PD + 17 HC | SVM | 10-fold cross-validation | Accuracy: 94.4% |
| Varghese et al. (2024) | 2024 | Classification (PD vs. HC) | Smartwatch data, Questionnaire data | PADS (PD Smartwatch) dataset | 469 subjects: 276 PD + 114 DD + 79 HC | Classifier stacking (SVM, NN, CatBoost, Xception- Time) | Nested 5-fold cross-validation | Accuracy: 91.16%, Precision: 96.98%, Recall: 92.40%, F1 score: 94.62% |
HC: healthy control, NC: normal control, UPDRS: Unified PD Rating Scale, CNN: convolutional neural network, RNN: recurrent neural network, MLP: multilayer perceptron, DT: decision tree, SVM: support vector machine, ANN: artificial neural network, RF: random forest, LR: linear regression, NB: Naïve Bayes
Neuroimaging data
Neuroimaging is a branch of medical imaging that applies radiological and other techniques to image the nervous system (Rastogi et al. 2025; Kujur et al. 2022; Alhussen et al. 2025). With the increasing availability of large-scale neuroimaging datasets and advances in ML and DL, neuroimaging has played an important role in the early detection, classification, computer-aided diagnosis, and monitoring of various neurological disorders (Goceri 2024, 2025; Nakach et al. 2024). Many studies have applied ML techniques to neuroimaging data for the early diagnosis of PD. This review includes 24 articles using neuroimaging data. Among them, 12.5% of the articles (3/24) used SPECT data, 20.8% (5/24) used DTI data, 62.5% (15/24) used MRI data, and 4.2% (1/24) used Positron Emission Tomography (PET) imaging data. In terms of ML models, the most commonly used are Support Vector Machine (SVM), Convolutional Neural Network (CNN), and 3D CNN. Data-preprocessing techniques are often applied to reduce image noise, and data augmentation techniques, such as Generative Adversarial Networks (GANs), may be used to increase the number of samples. For validation, various methods are used, including 10-fold cross-validation; train, validation, and test split validation; and train and test split validation. 45.8% (11/24) of the articles reported an accuracy of over 90%. There are also some problems with applying neuroimaging data to PD diagnosis. For example, some comparisons between previous studies are unfair because they used different experimental datasets or the same dataset with different subjects. Moreover, some studies only applied train and test split validation, which is unsuitable when the dataset is small. Fig 3 presents the distribution of traditional ML and DL approaches employed in neuroimaging-based studies.
Since most neuroimaging data are stored as medical images (Fig 4), DL is more widely applied to neuroimaging datasets than traditional ML.
Fig. 3.

Distribution of traditional ML method and DL method in neuroimaging data. Blue represents DL and green represents traditional ML
Fig. 4.
The samples of neuroimaging data
Voice data
Analysis of voice or speech characteristics could contribute to PD diagnosis and detection, especially as recent research has shown that voice impairment is one of the most common symptoms in PD patients (Karan et al. 2020). In PD diagnosis based on voice data, 59.6% of the articles (28/47) used the dataset collected at the University of Oxford (Tsanas et al. 2012). However, this dataset is very small (only 31 participants), and its class distribution is unbalanced (23 PD patients and 8 healthy controls). These disadvantages weaken the generalisation of models trained on it. For model evaluation, 55.3% of the articles (26/47) used cross-validation, with 10-fold cross-validation being the most common method (18/47). Unfortunately, 14.9% of the articles (7/47) did not describe their evaluation method in detail. In addition, there was no uniform standard for splitting datasets: speech samples from the same subject often appeared in both the training and testing sets, which led to overly optimistic performance results.
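A subject-level split avoids this leakage by keeping all recordings from one speaker in the same fold. A minimal sketch in plain Python (the subject IDs and fold count are illustrative, not taken from any cited study):

```python
from collections import defaultdict

def subject_level_folds(sample_subject_ids, n_folds=10):
    """Assign samples to folds so that all recordings from one
    subject land in the same fold, preventing speaker leakage."""
    subjects = sorted(set(sample_subject_ids))
    # Round-robin assignment of whole subjects to folds.
    subject_fold = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = defaultdict(list)
    for idx, subj in enumerate(sample_subject_ids):
        folds[subject_fold[subj]].append(idx)
    return [folds[k] for k in sorted(folds)]

# Four subjects with two recordings each, split into two folds:
# no subject's recordings are ever split across train and test.
ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
folds = subject_level_folds(ids, n_folds=2)
```

With a sample-level split, two recordings of the same sentence by the same speaker can land on opposite sides of the split, inflating reported accuracy; grouping by subject removes that shortcut.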
Overall, voice data is the most widely used data modality, but its real-world applicability is limited by differences in language and accent and by uncontrollable ambient sounds. A model may perform well on one specific dataset but poorly on another. Many articles simply quoted performance figures from other studies rather than undertaking their own evaluation. Fig 5 presents the distribution of traditional ML and DL approaches employed in voice-based studies.
Fig. 5.

Distribution of traditional ML method and DL method in voice data. Blue represents DL and green represents traditional ML
Handwriting data
Handwriting requires fine motor control and specific neuromuscular coordination. Handwriting abnormalities are a common early motor symptom of PD and are therefore of potential diagnostic value. The number of participants in studies of handwriting-based PD diagnosis is relatively small: 14.3% (2/14) of the articles used a study population of more than 200, while 85.7% (12/14) included fewer than 200 participants. SVM, CNN, and RNN were the most commonly used ML models. Regarding validation, 71.4% (10/14) of the articles applied k-fold cross-validation, and only 28.6% (4/14) used train and test split validation. 57.1% (8/14) of the articles reported a diagnostic accuracy of over 90%, and 92.9% (13/14) reported an accuracy of over 80%. Fig 6 presents the distribution of traditional ML and DL approaches employed in handwriting-based studies. Since most handwriting datasets are stored as images (Fig 7), DL is more widely applied to handwriting datasets than traditional ML.
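As an illustration of the kind of kinematic measurements such studies extract (the two features below are illustrative choices, not the feature set of any specific cited paper), the following sketch computes simple statistics from an evenly sampled pen trajectory:

```python
import math

def handwriting_features(xs, ys, dt=0.01):
    """Simple kinematic features from a pen trajectory sampled at
    interval dt: mean pen speed, and the number of sign changes in
    horizontal velocity (a crude proxy for tremor-like oscillation)."""
    # Finite-difference velocities along the trajectory.
    vx = [(b - a) / dt for a, b in zip(xs, xs[1:])]
    vy = [(b - a) / dt for a, b in zip(ys, ys[1:])]
    speeds = [math.hypot(a, b) for a, b in zip(vx, vy)]
    mean_speed = sum(speeds) / len(speeds)
    sign_changes = sum(1 for a, b in zip(vx, vx[1:]) if a * b < 0)
    return {"mean_speed": mean_speed, "vx_sign_changes": sign_changes}

# A smooth stroke: constant rightward motion, no tremor-like reversals.
feats = handwriting_features([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], dt=1.0)
```

In practice such hand-crafted features feed traditional ML classifiers, while DL approaches learn features directly from the scanned drawing images.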
Fig. 6.

Distribution of traditional ML method and DL method in handwriting data. Blue represents DL and green represents traditional ML
Fig. 7.
The samples of handwriting data
Gait data
Gait disorder is one of the most incapacitating motor symptoms in PD and a challenge for the medical specialist to evaluate. In PD diagnosis based on gait data, 64.7% of the articles (11/17) used the dataset from Physionet. This dataset contains 166 subjects (93 PD patients and 73 healthy controls (HC)). 58.8% of the articles (10/17) used cross-validation, where 10-fold cross-validation was the most common method (9/17).
Gait recordings need to be segmented according to the gait cycle; otherwise, some samples may fall in the overlapping region of the probability density functions of the two classes. Moreover, extracting features of the left and right gait separately may yield better performance. Gait-based models also generalise well, since walking posture is similar across people from different countries. Fig 8 presents the distribution of traditional ML and DL approaches employed in gait-based studies.
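Per-foot feature extraction can be sketched as follows; the summary statistics and asymmetry index are illustrative choices, not the feature set of any particular cited study:

```python
def per_foot_features(left_vgrf, right_vgrf):
    """Summary statistics computed separately for each foot's
    vertical ground reaction force (VGRF) signal, plus a simple
    left/right asymmetry index."""
    def stats(sig):
        mean = sum(sig) / len(sig)
        std = (sum((v - mean) ** 2 for v in sig) / len(sig)) ** 0.5
        return mean, std

    l_mean, l_std = stats(left_vgrf)
    r_mean, r_std = stats(right_vgrf)
    # Relative difference in mean loading between the two feet.
    asymmetry = abs(l_mean - r_mean) / max(l_mean, r_mean)
    return {
        "left_mean": l_mean, "left_std": l_std,
        "right_mean": r_mean, "right_std": r_std,
        "asymmetry": asymmetry,
    }

# Toy example: the left foot loads twice as much as the right.
feats = per_foot_features([10.0, 10.0, 10.0], [5.0, 5.0, 5.0])
```

Keeping the two feet as separate feature groups lets a classifier exploit left/right asymmetry, which pooled features would average away.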
Fig. 8.

Distribution of traditional ML method and DL method in gait data. Blue represents DL and green represents traditional ML
EEG data
EEG involves recording brain signals from the scalp’s surface. As PD is associated with brain abnormalities, EEG signals can be applied to assist PD diagnosis. Ten articles are included in this review. 60.0% (6/10) of the articles included fewer than 50 participants, and 30.0% (3/10) used CNN-based models. 40.0% (4/10) applied 10-fold cross-validation, and 30.0% (3/10) applied train and test split validation. Fig 9 presents the distribution of traditional ML and DL approaches employed in EEG-based studies.
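Many EEG studies reduce raw signals to spectral features before classification. A minimal NumPy sketch of one such feature, band power, assuming a 512 Hz sampling rate as in the UC San Diego dataset (the feature choice is illustrative, not a specific cited pipeline):

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean power of `signal` in the [f_lo, f_hi] Hz band,
    estimated from the FFT periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(psd[mask].mean())

# One second of a pure 10 Hz sine at 512 Hz: its power falls in the
# alpha band (8-13 Hz), with essentially none in the beta band (13-30 Hz).
fs = 512
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 10 * t)
alpha = band_power(sig, fs, 8, 13)
beta = band_power(sig, fs, 13, 30)
```

Band powers computed per channel form a compact feature vector for traditional classifiers, whereas CNN-based studies typically operate on the raw or spectrogram-transformed signal.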
Fig. 9.

Distribution of traditional ML method and DL method in EEG data. Blue represents DL and green represents traditional ML
Other data
Besides these data modalities, this review also includes five studies that used other data modalities, such as gene expression and urine biomarkers. Of these, 60% (3/5) used 10-fold cross-validation. Fig 10 presents the distribution of traditional ML and DL approaches employed in studies based on other data. Because most of these new datasets were recorded as numerical indicators or textual descriptions, and many of the teams that collected them had limited computer science backgrounds, traditional ML methods were chosen over DL approaches.
Fig. 10.

Distribution of traditional ML method and DL method in other data. Blue represents DL and green represents traditional ML
Datasets
We briefly summarize the five commonly used public datasets for ML-based PD diagnosis.
PPMI The Parkinson’s Progression Markers Initiative (PPMI) dataset is sponsored by the Michael J. Fox Foundation (MJFF). It is a dataset used for PD diagnosis with the neuroimaging data modality. The study contains imaging, clinical, and biological data on PD patients and the HC group, and it is designed to define and discover biomarkers of PD progression.
PC-GITA PC-GITA dataset, also called the new Spanish speech corpus dataset, is the first dataset that provides speech recordings in Spanish (Orozco-Arroyave et al. 2014). It is a dataset used for PD diagnosis with voice data modality. This dataset contains speech recordings of 50 PD patients and 50 HC subjects, where all subjects are native Spanish speakers. The speech recordings were collected following a designed protocol, and the corpus dataset includes several tasks such as sustained phonations of the vowels and diadochokinetic evaluation.
HandPD The HandPD dataset is used for PD diagnosis with handwriting data and contains 55 subjects: 37 PD patients and 18 HC subjects. Each subject was asked to complete a handwriting clinical exam, such as drawing spirals and circles (Pereira et al. 2015). As some subjects did not complete all of the exam tasks, the entire dataset comprises 373 images.
PaHaW Parkinson’s Disease Handwriting (PaHaW) dataset consists of 75 subjects with 37 PD patients and 38 HC subjects (Drotár et al. 2016). It is a dataset used for PD diagnosis with handwriting data. The tasks include drawing an Archimedean spiral, repetitively writing orthographically simple syllables and words, and writing a sentence.
Physionet The PhysioNet repository (the Research Resource for Complex Physiologic Signals) was established in 1999 and is supported by the National Institutes of Health (NIH) (Goldberger et al. 2000). It is a widely used repository of biomedical data and contains datasets that can be used for PD diagnosis with the gait data modality. The repository enables researchers to share and reuse clinical research resources, reducing barriers to data access.
Clinical applicability
The clinical applicability of various diagnostic modalities for PD hinges on their practicality in real-world settings. Although neuroimaging techniques (DaTSCAN, SPECT) are useful in clinical diagnosis, they face limitations due to their high costs and the need for specialised equipment and trained personnel. These barriers make them less feasible for low-resource settings or routine screening. For EEG data, the subtlety of PD-related signal changes and the influence of various confounders, such as patient movement and electrical interference, complicate the interpretation of EEG results. Additionally, the absence of standardized protocols for EEG recording and analysis in the context of PD further complicates its widespread adoption in clinical practice. Conversely, voice, handwriting, and gait analyses offer a more accessible alternative, as they require minimal specialized equipment and can be performed remotely.
However, the clinical applicability of these modalities is contingent upon the standardization of data collection and the development of robust algorithms that can reliably interpret variations in patient data due to external factors such as background noise or emotional state. The adoption of voice and handwriting tools in clinical practice also depends on their integration into existing healthcare systems and workflows. For these tools to be widely accepted, they must demonstrate not only reliability and accuracy but also cost-effectiveness compared to more established diagnostic methods. PD poses a significant burden on both governments and patients’ families. As PD currently lacks a gold standard for diagnosis, ML tools are intended to serve as assistive tools, and their cost-effectiveness is crucial. Compared to MRI-based methods and EEG-based methods, voice, handwriting, and gait-based methods are more affordable and accessible. The integration with existing electronic health record (EHR) systems is also critical to ensure that AI models can be seamlessly embedded into current clinical workflows. To improve diagnostic precision and treatment planning, a database for PD patients with EHR should be established, which should contain a wide range of PD patient examination data, allowing for more personalised treatment. Lastly, for all diagnostic tools, including neuroimaging, voice, and handwriting analysis, there needs to be a clear regulatory pathway for their validation and approval. Establishing comprehensive guidelines that address privacy concerns, data security, and the ethical use of AI in clinical settings will be crucial for their broader adoption.
Evaluation metrics
The evaluation metrics utilised in ML classification tasks are Accuracy, Precision, Sensitivity (Recall), Specificity, Area Under the Curve (AUC), Matthews Correlation Coefficient (MCC), and F1 score. For an actual positive class, if the prediction is positive, the result is a True Positive (TP); otherwise, it is a False Negative (FN). For an actual negative class, if the prediction is positive, the result is a False Positive (FP); otherwise, it is a True Negative (TN).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Sensitivity (Recall)} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{4}$$

$$\text{AUC} = \int_0^1 \text{TPR}\; d(\text{FPR}) \tag{5}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{6}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7}$$
Risk of bias
The risk of bias is assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (Wolff et al. 2019). PROBAST is designed to evaluate the risk of bias in diagnostic model studies. In this review, the risk of bias of all included studies is assessed independently and then cross-checked by the authors. The results of the risk of bias assessment are shown in Table 2. Most studies have a high or unclear risk of bias; 28 studies have a low risk of bias (Ya et al. 2022; Huang et al. 2023; Xu et al. 2023; Camacho et al. 2023; Peker et al. 2015; Chen et al. 2016; Parisi et al. 2018; Ali et al. 2019; Haq et al. 2019; Li et al. 2022; Zhao and Li 2023; Abdullah et al. 2023; Balaji et al. 2021; Xia et al. 2019; Oh et al. 2020; Trabassi et al. 2022; Anjum et al. 2020; Chawla et al. 2023; Coelho et al. 2023; Khaskhoussy and Ayed 2023; Nour et al. 2023; Junaid et al. 2023; Zhao et al. 2024; Priyadharshini et al. 2024; Wang et al. 2024; Akila and Nayahi 2024; Hireš et al. 2022; Tsai et al. 2023).
Table 2.
Risk of bias assessment of the included studies according to the PROBAST checklist. “+” indicates a low risk of bias, “-” indicates a high risk of bias, and “?” means an unclear risk of bias
| # | Study | Participants | Predictors | Outcome | Analysis | Risk of bias |
|---|---|---|---|---|---|---|
| 1 | West et al. (2019) | + | + | − | ? | − |
| 2 | Dai et al. (2019) | ? | + | − | + | − |
| 3 | Zhang et al. (2019) | + | + | − | ? | − |
| 4 | Chakraborty et al. (2020) | + | + | ? | + | ? |
| 5 | Kaur et al. (2021) | + | + | ? | + | ? |
| 6 | Vyas et al. (2022) | + | + | ? | + | ? |
| 7 | Quan et al. (2021) | + | + | ? | ? | ? |
| 8 | Zahid et al. (2020) | + | + | ? | + | ? |
| 9 | Rizvi et al. (2020) | + | + | − | − | − |
| 10 | Abayomi-Alli et al. (2020) | + | + | ? | − | − |
| 11 | Gunduz (2019) | + | + | ? | + | ? |
| 12 | Nagasubramanian and Sankayya (2021) | + | + | ? | − | − |
| 13 | Fang et al. (2020) | + | + | ? | ? | ? |
| 14 | Ribeiro et al. (2019) | + | + | ? | ? | ? |
| 15 | Razzak et al. (2020) | + | + | ? | ? | ? |
| 16 | Kamran et al. (2021) | + | + | ? | ? | ? |
| 17 | Gil-Martín et al. (2019) | + | + | ? | ? | ? |
| 18 | Diaz et al. (2021) | + | + | ? | ? | ? |
| 19 | Taleb et al. (2019) | + | + | ? | ? | ? |
| 20 | Xia et al. (2019) | + | + | + | + | + |
| 21 | El Maachi et al. (2020) | + | + | ? | ? | ? |
| 22 | Aversano et al. (2020) | + | + | ? | − | − |
| 23 | Liu et al. (2021) | + | + | ? | + | ? |
| 24 | Lee et al. (2019) | + | + | ? | ? | ? |
| 25 | Oh et al. (2020) | + | + | + | + | + |
| 26 | Shaban (2021) | + | + | + | ? | ? |
| 27 | Loh et al. (2021) | + | + | − | ? | − |
| 28 | Prashanth et al. (2014) | + | + | ? | ? | ? |
| 29 | Salvatore et al. (2014) | + | + | ? | ? | ? |
| 30 | Rana et al. (2015) | + | + | ? | ? | ? |
| 31 | Oliveira and Castelo-Branco (2015) | + | + | ? | ? | ? |
| 32 | Zhang and Kagen (2017) | + | + | ? | ? | ? |
| 33 | Peng et al. (2017) | + | + | ? | + | ? |
| 34 | Sivaranjini and Sujatha (2020) | + | + | + | − | − |
| 35 | Sakar and Kursun (2010) | + | + | ? | + | ? |
| 36 | Bhattacharya and Bhatia (2010) | + | ? | − | − | − |
| 37 | Guo et al. (2010) | + | + | + | ? | ? |
| 38 | Åström and Koker (2011) | + | + | ? | + | ? |
| 39 | Ramani and Sivagami (2011) | + | + | ? | ? | ? |
| 40 | Yadav et al. (2012) | − | + | ? | + | − |
| 41 | Tsanas et al. (2012) | + | + | ? | + | ? |
| 42 | Mandal and Sairam (2014) | + | + | ? | + | ? |
| 43 | Hazan et al. (2012) | − | + | ? | − | − |
| 44 | Gharehchopogh and Mohammadi (2013) | + | + | ? | ? | ? |
| 45 | Rustempasic and Can (2013) | + | + | ? | − | − |
| 46 | Sharma and Giri (2014) | + | + | + | − | − |
| 47 | Olanrewaju et al. (2014) | + | + | ? | ? | ? |
| 48 | Peker et al. (2015) | + | + | + | + | + |
| 49 | Gök (2015) | + | + | + | ? | ? |
| 50 | Chen et al. (2016) | + | + | + | + | + |
| 51 | Avci and Dogantekin (2016) | ? | + | ? | − | − |
| 52 | Dinesh and He (2017) | + | ? | ? | − | − |
| 53 | Caliskan et al. (2017) | + | + | + | − | − |
| 54 | Parisi et al. (2018) | + | + | + | + | + |
| 55 | Wroge et al. (2018) | ? | + | ? | ? | ? |
| 56 | Lahmiri et al. (2018) | + | + | ? | + | ? |
| 57 | Haq et al. (2018) | + | + | + | − | − |
| 58 | Ali et al. (2019) | + | + | + | + | + |
| 59 | Mostafa et al. (2019) | + | + | ? | ? | ? |
| 60 | Lahmiri and Shmuel (2019) | + | + | ? | + | ? |
| 61 | Haq et al. (2019) | + | + | + | + | + |
| 62 | Senturk (2020) | + | + | ? | − | − |
| 63 | Karan et al. (2020) | + | + | ? | + | ? |
| 64 | Soumaya et al. (2021) | − | + | + | ? | − |
| 65 | Karaman et al. (2021) | + | + | − | + | − |
| 66 | Drotár et al. (2014) | + | + | − | + | − |
| 67 | Drotár et al. (2015) | + | + | − | − | − |
| 68 | Pereira et al. (2015) | + | ? | − | − | − |
| 69 | Tahir and Manap (2012) | + | + | − | + | − |
| 70 | Wahid et al. (2015) | + | + | ? | + | ? |
| 71 | Shetty and Rao (2016) | + | + | − | ? | − |
| 72 | Abdulhay et al. (2018) | + | + | − | ? | − |
| 73 | Rehman et al. (2019) | + | + | ? | + | ? |
| 74 | Balaji et al. (2021) | + | + | + | + | + |
| 75 | Ya et al. (2022) | + | + | + | + | + |
| 76 | Erdaş and Sümer (2022) | − | ? | + | + | − |
| 77 | Huang et al. (2023) | + | + | + | + | + |
| 78 | Ali et al. (2023) | − | − | + | + | − |
| 79 | Hireš et al. (2022) | + | + | + | + | + |
| 80 | Rana et al. (2022) | − | + | + | − | − |
| 81 | Madruga et al. (2023) | − | ? | + | − | − |
| 82 | Varalakshmi et al. (2022) | − | + | + | − | − |
| 83 | Li et al. (2022) | + | + | + | + | + |
| 84 | Zhao and Li (2023) | + | + | + | + | + |
| 85 | Abdullah et al. (2023) | + | + | + | + | + |
| 86 | Nguyen et al. (2022) | + | + | + | − | − |
| 87 | Trabassi et al. (2022) | + | + | + | + | + |
| 88 | Li and Li (2022) | − | + | + | − | − |
| 89 | Aşuroğlu and Oğul (2022) | + | + | + | − | − |
| 90 | Ma et al. (2023) | − | − | + | − | − |
| 91 | Anjum et al. (2020) | + | + | + | + | + |
| 92 | Motin et al. (2022) | − | − | + | − | − |
| 93 | Chawla et al. (2023) | + | + | + | + | + |
| 94 | Coelho et al. (2023) | + | + | + | + | + |
| 95 | Xu et al. (2023) | + | + | + | + | + |
| 96 | Camacho et al. (2023) | + | + | + | + | + |
| 97 | Govindu and Palwe (2023) | + | + | − | − | − |
| 98 | Celik and Başaran (2023) | + | + | − | − | − |
| 99 | Khaskhoussy and Ayed (2023) | + | + | + | + | + |
| 100 | Dheer et al. (2023) | − | − | − | − | − |
| 101 | Vinora et al. (2023) | − | + | − | − | − |
| 102 | Sharma et al. (2023) | − | + | + | + | − |
| 103 | Nour et al. (2023) | + | + | + | + | + |
| 104 | Bhandari et al. (2023) | − | + | + | − | − |
| 105 | Wang et al. (2023) | + | + | − | − | − |
| 106 | Junaid et al. (2023) | + | + | + | + | + |
| 107 | Igene et al. (2023) | − | + | − | − | − |
| 108 | Varghese et al. (2024) | + | + | + | ? | ? |
| 109 | Zhao et al. (2024) | + | + | + | + | + |
| 110 | Priyadharshini et al. (2024) | + | + | + | + | + |
| 111 | Wang et al. (2024) | + | + | + | + | + |
| 112 | Akila and Nayahi (2024) | + | + | + | + | + |
| 113 | Talai et al. (2021) | + | + | ? | ? | ? |
| 114 | Prasuhn et al. (2020) | + | + | − | ? | − |
| 115 | Chen et al. (2023) | ? | + | ? | + | ? |
| 116 | Tsai et al. (2023) | + | + | + | + | + |
| 117 | Zhao et al. (2022) | + | + | ? | − | − |
We follow the standard PROBAST framework, which evaluates studies across four domains: participants, predictors, outcome, and analysis. Many included studies used small datasets, which limits the generalizability of their findings. Moreover, several studies had methodological flaws, including data leakage, insufficient sample sizes, and unrealistic validation protocols. These issues contribute to a high risk of bias, particularly in the Participants and Analysis domains. For example, Sivaranjini and Sujatha (2020) reported only experimental results, without a detailed analysis of the dataset or of the methodology; in addition, the study used MRI data with an image-level train-test split rather than subject-level cross-validation, which increased the likelihood of data leakage. As a result, the study was assessed as having a high risk of bias in the Analysis domain, and its overall risk of bias was deemed high. Fig. 11 shows the PROBAST evaluation results as a heatmap.
Fig. 11.
Risk of bias PROBAST assessment summary
While certain data modalities are indeed associated with a higher risk of bias, they nonetheless demonstrate substantial potential for ML-based PD diagnostics. In particular, EEG and gait signals stand out due to their biological plausibility, accessibility, and practical advantages in clinical settings.
EEG offers high temporal resolution and captures neurophysiological activity directly linked to both motor dysfunction and cognitive impairment, two hallmark features of PD. Likewise, gait analysis reflects core motor symptoms such as bradykinesia, rigidity, and postural instability, making it a valuable modality for both diagnosis and monitoring of disease progression. Importantly, these modalities align well with clinicians’ existing understanding of PD pathophysiology and assessment practices, which may facilitate greater acceptance and integration into clinical workflows.
Case studies
In this paper, we conducted five case studies, one each for MRI, gait, voice, EEG, and handwriting data. We repeated the experiments to reproduce the results reported in the corresponding papers (Table 3). In our reproduction experiments, we adopted a unified evaluation framework with the following metrics: Accuracy, Specificity, Sensitivity, Precision, Recall, F1 score, AUC (Area Under the ROC Curve), Rank Graduation Accuracy (RGA) (Giudici and Raffinetti 2025), Lorenz Zonoid (Calzarossa et al. 2025), and Rank Graduation Robustness (RGR) (Babaei et al. 2025).
Table 3.
Case studies results
| Data modality | Paper Report Result | Reproduction Result | Explainability | Robust/Security |
|---|---|---|---|---|
| Voice | Accuracy: 100% | Accuracy: 100% | Lorenz Zonoid: Cannot be calculated because AUC cannot be calculated | RGR: 99.93% |
| Specificity: 0.00% | ||||
| Sensitivity: 100.00% | ||||
| Precision: 100.00% | ||||
| Recall: 100.00% | ||||
| F1 score: 100.00% | ||||
| AUC: Cannot be calculated because there is only one class in the test set | ||||
| RGA: 100.00% | ||||
| Gait | Accuracy: 95.2% | Accuracy: 87.12% | Lorenz Zonoid: 69.59% | RGR: 99.84% |
| Specificity: 86.8% | Specificity: 68.24% | |||
| Sensitivity: 98.1% | Sensitivity: 94.03% | |||
| Precision: 88.99% | ||||
| Recall: 94.03% | ||||
| F1 score: 91.44% | ||||
| AUC: 84.80% | ||||
| RGA: 84.80% | ||||
| EEG | Accuracy: 98% | Accuracy: 62.40% | Lorenz Zonoid: 35.60% | RGR: 99.84% |
| Specificity: 100% | Specificity: 62.68% | |||
| Sensitivity: 97% | Sensitivity: 62.10% | |||
| Precision: 62.00% | ||||
| Recall: 62.10% | ||||
| F1 score: 62.05% | ||||
| AUC: 67.80% | ||||
| RGA: 67.80% | ||||
| MRI | Accuracy: 90.36% | Accuracy: 56.67% | Lorenz Zonoid: -5.02% | RGR: 84.31% |
| Sensitivity: 90.52% | Specificity: 16.67% | |||
| Precision: 90.08% | Sensitivity: 86.00% | |||
| F1 score: 90.25% | Precision: 52.92% | |||
| AUC: 90.51% | Recall: 86.00% | |||
| F1 score: 64.37% | ||||
| AUC: 47.49% | ||||
| RGA: 47.49% | ||||
| Handwriting | SP_50_50: | SP_50_50: | SP_50_50: | SP_50_50: |
| Accuracy: 85.38% (Std: 2.37%) | Accuracy: 85.49% (Std: 2.23%) | Lorenz Zonoid: 87.10% (Std: 2.26%) | RGR: 100.00% (Std: 0.00%) | |
| Precision: 85.5% (Std: 3.1%) | Specificity: 86.43% (Std: 2.24%) | |||
| Recall: 83.4% (Std: 5.4%) | Sensitivity: 84.44% (Std: 5.49%) | |||
| F1 score: 84.3% (Std: 2.9%) | Precision: 84.91% (Std: 1.86%) | |||
| Recall: 84.44% (Std: 5.49%) | ||||
| F1 score: 84.56% (Std: 2.83%) | ||||
| AUC: 93.55% (Std: 1.13%) | ||||
| RGA: 93.55% (Std: 1.13%) | ||||
| SP_75_25: | SP_75_25: | SP_75_25: | SP_75_25: | |
| Accuracy: 89.48% (Std: 3.67%) | Accuracy: 84.03% (Std: 2.67%) | Lorenz Zonoid: 88.04% (Std: 3.11%) | RGR: 100.00% (Std: 0.00%) | |
| Precision: 84.8% (Std: 4.7%) | Specificity: 88.00% (Std: 7.86%) | |||
| Recall: 95.5% (Std: 4.8%) | Sensitivity: 79.69% (Std: 7.30%) | |||
| F1 score: 89.7% (Std: 3.5%) | Precision: 86.79% (Std: 6.60%) | |||
| Recall: 79.69% (Std: 7.30%) | ||||
| F1 score: 82.61% (Std: 2.92%) | ||||
| AUC: 94.02% (Std: 1.55%) | ||||
| RGA: 94.02% (Std: 1.55%) | ||||
| MEA_50_50: | MEA_50_50: | MEA_50_50: | MEA_50_50: | |
| Accuracy: 89.29% (Std: 3.75%) | Accuracy: 82.03% (Std: 1.92%) | Lorenz Zonoid: 80.62% (Std: 2.29%) | RGR: 100.00% (Std: 0.00%) | |
| Precision: 85.0% (Std: 4.5%) | Specificity: 83.71% (Std: 5.00%) | |||
| Recall: 77.9% (Std: 7.9%) | Sensitivity: 80.16% (Std: 5.32%) | |||
| F1 score: 81.0% (Std: 5.0%) | Precision: 81.96% (Std: 4.05%) | |||
| Recall: 80.16% (Std: 5.32%) | ||||
| F1 score: 80.81% (Std: 2.27%) | ||||
| AUC: 90.31% (Std: 1.14%) | ||||
| RGA: 90.31% (Std: 1.14%) | ||||
| MEA_75_25: | MEA_75_25: | MEA_75_25: | MEA_75_25: | |
| Accuracy: 92.24% (Std: 2.65%) | Accuracy: 79.40% (Std: 3.52%) | Lorenz Zonoid: 72.46% (Std: 6.75%) | RGR: 100.00% (Std: 0.00%) | |
| Precision: 95.2% (Std: 2.5%) | Specificity: 90.57% (Std: 3.63%) | |||
| Recall: 88.3% (Std: 4.9%) | Sensitivity: 67.19% (Std: 8.76%) | |||
| F1 score: 92.4% (Std: 3.1%) | Precision: 86.95% (Std: 3.57%) | |||
| Recall: 67.19% (Std: 8.76%) | ||||
| F1 score: 75.41% (Std: 5.38%) | ||||
| AUC: 86.23% (Std: 3.38%) | ||||
| RGA: 86.23% (Std: 3.38%) |
Std represents the standard deviations. SP_50_50 and SP_75_25 represent experiments using the Spiral Dataset, with 50%/50% and 75%/25% splits for training and testing, respectively. MEA_50_50 and MEA_75_25 represent experiments using the Meander Dataset with the same respective training/testing splits
Case study 1: voice
Parkinson Speech Dataset: The dataset was collected by Sakar et al. at the Department of Neurology, Cerrahpasa Faculty of Medicine, Istanbul University. The dataset is divided into two parts: training and testing. The training part includes data from 20 PD patients and 20 healthy subjects; the PD patients are aged between 43 and 77 years, and the healthy subjects between 45 and 83 years. From each subject, 26 samples were recorded. The testing part contains data from 28 subjects (all PD patients) aged between 39 and 79 years, with 6 samples recorded per subject.
Data Preprocessing: Linear discriminant analysis (LDA) is used to reduce the data dimension. It transforms the original feature vectors into a lower-dimensional space in which class separability is maximised.
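As an illustrative sketch (not the authors' exact pipeline; the data here are random stand-ins for the acoustic feature vectors), LDA-based dimensionality reduction with scikit-learn might look like:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy stand-in for 26-dimensional voice feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 26))          # 40 samples, 26 acoustic features
y = np.array([0] * 20 + [1] * 20)      # 0 = healthy, 1 = PD (illustrative labels)

# With 2 classes, LDA projects onto at most n_classes - 1 = 1 dimension,
# chosen to maximise between-class relative to within-class scatter.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (40, 1)
```

Because the projection is supervised (it uses the class labels), it must be fitted on training data only to avoid leakage into the test set.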
Result: Leave-one-subject-out (LOSO) cross-validation is used to evaluate the model’s performance. The source code is available at https://github.com/LiaqatAli007/Automated-Detection-of-Parkinson-s-Disease-Based-on-Multiple-Types-of-Sustained-Phonations-using-Lin. The test set contains only one class (“PD”). The paper reported that the model achieves 100% accuracy, and our reproduction matches this reported result.
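LOSO validation holds out all recordings of one subject per fold. A minimal sketch with scikit-learn's `LeaveOneGroupOut` (hypothetical data; the real study has 26 recordings per subject):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Illustrative data: 6 subjects, 4 recordings each.
rng = np.random.default_rng(1)
X = rng.normal(size=(24, 5))
y = np.repeat([0, 1, 0, 1, 0, 1], 4)     # one label per subject, repeated
groups = np.repeat(np.arange(6), 4)      # subject ID for each recording

logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups):
    clf = SVC().fit(X[train_idx], y[train_idx])
    # All recordings of the held-out subject form the test fold, so no
    # subject appears in both training and testing.
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(len(scores))  # one score per subject: 6
```

This subject-wise scheme avoids the optimistic bias of sample-wise splits, where recordings of the same person can land on both sides of the split.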
Case study 2: gait
Physionet Dataset: The dataset combines three studies (Yogev et al. 2005; Frenkel-Toledo et al. 2005; Hausdorff et al. 2007) and is distributed through PhysioNet (Goldberger et al. 2000). It includes 93 PD patients (mean age: 66.3 years; 63% men) and 73 healthy controls (mean age: 66.3 years; 55% men). Each subject wore 8 sensors under each foot, measuring the vertical ground reaction force (in newtons) over a 2-minute walk. The output of each sensor is digitised and recorded at 100 samples per second, and two additional signals give the sum of the 8 sensor outputs for each foot.
Data Preprocessing: Each 1D signal is divided into smaller segments with a length of 100 time steps and a 50-time-step (50%) overlap.
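A minimal sketch of this sliding-window segmentation (a step of 50 samples, i.e. 50% overlap between consecutive windows, is assumed here):

```python
import numpy as np

def segment(signal: np.ndarray, window: int = 100, step: int = 50) -> np.ndarray:
    """Split a 1D signal into overlapping windows (step=50 gives 50% overlap)."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

# A 2-minute VGRF trace sampled at 100 Hz has 12,000 samples.
x = np.arange(12_000, dtype=float)
patches = segment(x)
print(patches.shape)  # (239, 100)
```

Segmentation both augments the number of training examples and gives the network fixed-length inputs.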
Result: 10-fold cross-validation is used to evaluate the model’s performance. The two groups (PD and HC) are each divided into 10 folds at the subject level and combined so that each fold contains roughly 70% Parkinson and 30% control subjects. The source code is provided at https://github.com/DucMinhDimitriNguyen/Transformers-for-1D-signals-in-Parkinson-s-disease-detection-from-gait. The paper reported 98.1% sensitivity, 86.8% specificity, and 95.2% accuracy. However, our reproduction could not reach the reported performance, achieving a sensitivity of 94.03%, specificity of 68.24%, and accuracy of 87.12%.
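Subject-level folding of this kind can be sketched with scikit-learn's `GroupKFold` (illustrative data, not the authors' exact splitting code):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Illustrative data: 20 subjects, 30 gait segments per subject.
rng = np.random.default_rng(2)
subjects = np.repeat(np.arange(20), 30)
X = rng.normal(size=(600, 8))
y = np.repeat(rng.integers(0, 2, size=20), 30)   # one label per subject

gkf = GroupKFold(n_splits=10)
for train_idx, test_idx in gkf.split(X, y, groups=subjects):
    # No subject's segments are split across train and test,
    # avoiding identity leakage between the two sets.
    assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
print("subject-level folds OK")
```

Splitting at the segment level instead would let windows from the same subject appear in both sets, inflating the apparent performance.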
Case study 3: EEG
Dataset: The dataset was collected by the Aron lab at the University of California, San Diego, and subsequently further analyzed by the Swann lab at the University of Oregon. It includes 16 PD patients (8 females; mean age: 62.6±8.3 years) and 15 HC (9 females; mean age: 63.5±9.6 years). The data were captured using 40 electrodes at a sampling rate of 512 Hz.
Data Preprocessing: Data from three selected channels are used; each recording (up to 2 minutes in length) is segmented into patches of 512 time samples.
Result: The dataset is divided into three parts: 64% for training, 16% for validation, and 20% for testing. No source code is provided, only the model structure. The paper reported an accuracy of 98.00%, sensitivity of 97.00%, and specificity of 100.00%. However, our reproduction achieves only 62.40% accuracy, 62.10% sensitivity, and 62.68% specificity. The discrepancy may be because the authors used a pre-trained model.
Case study 4: handwriting
NewHandPD Dataset: The dataset was collected by the Botucatu Medical School, São Paulo State University. It contains 12 exams (4 related to spirals, 4 to meanders, 2 to circled movements, and 2 to left- and right-handed diadochokinesis). There are 31 PD patients (10 females; mean age: 57.83±7.85 years) and 35 HC (17 females; mean age: 44.05±14.88 years) included in the dataset.
Data Preprocessing: The 5th and 90th percentiles were set as lower and upper bounds, and values outside these bounds were replaced by the boundary values to mitigate outlier effects. The data were then normalised to zero mean and unit standard deviation.
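This percentile clipping followed by z-score normalisation can be sketched in a few lines of numpy (illustrative function, not the authors' code):

```python
import numpy as np

def clip_and_standardise(x: np.ndarray) -> np.ndarray:
    """Winsorise at the 5th/90th percentiles, then z-score normalise."""
    lo, hi = np.percentile(x, [5, 90])
    x = np.clip(x, lo, hi)               # outliers replaced by the boundary values
    return (x - x.mean()) / x.std()      # zero mean, unit standard deviation

rng = np.random.default_rng(3)
x = rng.normal(size=1_000)
z = clip_and_standardise(x)
# z now has (approximately) zero mean and unit standard deviation
```

In a train/test setting, the percentiles, mean, and standard deviation should be estimated on the training data only and then applied to the test data.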
Result: The dataset is divided into three parts: 60% for training, 15% for validation, and 25% for testing. The source code is provided at https://github.com/lzfelix/bag-of-samplings. The paper reported an accuracy of 89.48%±3.7%, precision of 84.8%±4.7%, recall of 95.5%±4.8%, and F1 score of 89.7%±3.5% on the Spiral dataset, and an accuracy of 92.24%±2.65%, precision of 95.2%±2.5%, recall of 88.3%±4.9%, and F1 score of 92.4%±3.1% on the Meander dataset. However, our reproduction achieves only 84.03%±2.67% accuracy, 86.79%±6.60% precision, 79.69%±7.30% recall, and 82.61%±2.92% F1 score on the Spiral dataset, and 79.40%±3.52% accuracy, 86.95%±3.57% precision, 67.19%±8.76% recall, and 75.41%±5.38% F1 score on the Meander dataset.
Case study 5: MRI
Dataset: The dataset was created by Badea et al. (2017), which combined the T1 MRI images from two datasets collected by Neurocon and Taowu. There are 83 subjects included in the dataset, with 43 from Neurocon (27 PD patients and 16 controls) and 40 from Taowu (20 PD patients and 20 controls).
Data Preprocessing: Median slices from the axial, coronal, and sagittal planes of 3D MR images were extracted and resized to 224x224 pixels. The three median slices are combined into a single three-channel image to maintain spatial integrity across different planes.
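The median-slice stacking can be illustrated as follows (a random array stands in for a real T1 volume; the 224x224 resizing step, e.g. with PIL or OpenCV, is omitted for brevity):

```python
import numpy as np

# Toy 3D volume standing in for a T1 MRI (real volumes are larger).
vol = np.random.default_rng(4).normal(size=(64, 64, 64))

# Median slice along each anatomical plane (axial, coronal, sagittal).
ax = vol[vol.shape[0] // 2, :, :]
co = vol[:, vol.shape[1] // 2, :]
sa = vol[:, :, vol.shape[2] // 2]

# Stack the three planes as the channels of a single image, so one
# 2D input carries information from all three viewing planes.
img = np.stack([ax, co, sa], axis=-1)
print(img.shape)  # (64, 64, 3)
```

The three-channel layout also lets standard 2D CNN backbones pretrained on RGB images be applied without architectural changes.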
Result: 10-fold cross-validation is used to evaluate the model’s performance. The source code is not provided, but we reproduced the experiment based on the described model architecture. The paper reported an accuracy of 90.36%, precision of 90.08%, sensitivity of 90.52%, AUC of 90.51%, and F1 score of 90.25%. However, our reproduction obtained only an accuracy of 56.67%, precision of 52.92%, sensitivity of 86.00%, AUC of 47.49%, and F1 score of 64.37%.
Reproduction results
We have summarized the case study results, including both the original papers’ reported results and our reproduction results. The code of our reproduction can be accessed via: https://github.com/yiming95/PD_ML_benchmark. According to the reproduction, the results of 3 out of 5 papers could not be replicated. Most of the reviewed papers do not provide source code (papers providing code per modality — MRI: 2, voice: 1, handwriting: 1, gait: 2, EEG: 0, other: 1). The lack of open-source code hinders the understanding and improvement of existing methods. Additionally, even the studies that provide code often omit parts of the code, the data preprocessing steps, or specific hyperparameter values. These issues have caused many reproduction experiments to fail to match the original findings.
More specifically, for the voice data modality, we successfully reproduced the reported 100% accuracy. For the EEG data modality, the original paper reported an accuracy of 98%, whereas our reproduction reached 62.23%. Since the authors did not release their source code, we re-implemented the model architecture based on the descriptions provided; the discrepancy may stem from implementation details missing from the original paper, such as the possible use of pre-trained model initialization or specific training techniques that were not disclosed. For the gait data modality, the results differ slightly. A possible reason is variation in hyperparameter tuning strategies: the original authors may not have provided the full set of hyperparameters for their model, leading to slight inconsistencies in the reproduced results. For the handwriting data modality, although the authors provided the code, our reproduced results show minor discrepancies. A likely explanation is the use of random data splitting, which can produce inconsistent datasets for model training; since the reported results can be matched under a particular dataset split, we consider this result reproducible. For the MRI data modality, the original authors did not release their source code, and key implementation details were also missing from the paper, which could have significantly influenced performance.
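One mitigation for the split-related discrepancies noted above is to fix the random seed of the data split and report it. A minimal illustration with scikit-learn's `train_test_split` on hypothetical data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# Fixing random_state makes the split deterministic: two calls with the
# same seed return identical partitions, so the experiment can be replicated.
a = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
b = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
assert all(np.array_equal(p, q) for p, q in zip(a, b))
print("splits identical")
```

Publishing the seed (or, better, the exact index lists of the splits) alongside the code removes one common source of irreproducibility.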
When researchers and developers struggle to validate and reproduce previous results, the credibility and transparency of scientific research suffer. Moreover, most of the models lack explainability, which can make health professionals hesitant to trust and adopt these AI tools. Without understanding how an AI system reached its diagnosis, there is a risk of misdiagnosis: if the system’s lack of interpretability leads to errors, doctors may find it difficult to identify and correct the issue, which could result in the wrong treatment for patients, severely affecting their health and quality of life. The code availability, data accessibility, and explainability of all included studies are shown in Table 4.
Table 4.
Summary of the code availability, data accessibility and explainability for the reviewed papers
| Author | Year | Objective | Data Modality | Source Code Provided | Data Accessibility | Explainability |
|---|---|---|---|---|---|---|
| Neuroimaging | ||||||
| Prashanth et al. (2014) | 2014 | Classification (PD vs. HC) | Neuroimaging: DaTSCAN SPECT | NO | https://www.ppmi-info.org/data | NO |
| Salvatore et al. (2014) | 2014 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | NO | NO |
| Rana et al. (2015) | 2015 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | NO | NO |
| Oliveira and Castelo-Branco (2015) | 2016 | Classification (PD vs. HC) | Neuroimaging: FP-CIT SPECT | NO | https://www.ppmi-info.org/data | NO |
| Zhang and Kagen (2017) | 2017 | Classification (PD vs. HC) | Neuroimaging: DaTSCAN SPECT | NO | https://www.ppmi-info.org/data | NO |
| Peng et al. (2017) | 2017 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| Sivaranjini and Sujatha (2020) | 2020 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| West et al. (2019) | 2019 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| Dai et al. (2019) | 2019 | Classification (PD vs. HC) | Neuroimaging: PET | NO | https://www.ppmi-info.org/data; https://adni.loni.usc.edu/; https://db.humanconnectome.org/app/template/Login.vm | NO |
| Zhang et al. (2019) | 2019 | Classification (Prodromal PD vs. Confirmed PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| Chakraborty et al. (2020) | 2020 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| Kaur et al. (2021) | 2021 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| Vyas et al. (2022) | 2022 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | NO |
| Ya et al. (2022) | 2022 | Classification (PD vs. NC) | Neuroimaging: MRI | NO | NO | NO |
| Erdaş and Sümer (2022) | 2022 | Classification (PD vs. NC) | Neuroimaging: MRI | NO | https://fcon_1000.projects.nitrc.org/indi/retro/parkinsons.html | NO |
| Huang et al. (2023) | 2023 | Classification (PD vs. HC) | Neuroimaging: MRI | https://gitee.com/yxfamy/mnc-net_master.git (currently returns HTTP 403; inaccessible) | YES | |
| Xu et al. (2023) | 2023 | Classification (PD vs. HC) | Neuroimaging: MRI | https://github.com/ymlasu/A-Bio-marker-using-Topological-Machine-Learning-of-rs-fMRI (Only part of the code is provided) | https://www.ppmi-info.org/data | NO |
| Camacho et al. (2023) | 2023 | Classification (PD vs. HC) | Neuroimaging: MRI | NO | https://www.ppmi-info.org/data | YES |
| Priyadharshini et al. (2024) | 2024 | Classification (PD vs. HC) | Neuroimaging: 3D MRI | NO | YES | |
| Talai et al. (2021) | 2021 | Classification (PD vs. PSP vs. HC) | Neuroimaging: T1, T2, DTI MRI | NO | https://www.ppmi-info.org/data | NO |
| Prasuhn et al. (2020) | 2020 | Classification (PD vs. HC) | Neuroimaging: Diffusion Tensor Imaging (DTI) | NO | https://www.ppmi-info.org/data | NO |
| Chen et al. (2023) | 2023 | Classification (PD-MCI vs. PD-NC) | Neuroimaging: DTI (FA, MD, AD, RD, LDH) | NO | contact corresponding author for access | YES |
| Tsai et al. (2023) | 2023 | Classification (PD vs. PSP vs. MSA vs. HC) | Neuroimaging: DTI (whole-brain features) | NO | NO | NO |
| Zhao et al. (2022) | 2022 | Classification (PD vs. HC) | Neuroimaging: DTI (Fractional Anisotropy, MD) | NO | NO | NO |
| Voice | ||||||
| Sakar and Kursun (2010) | 2010 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Bhattacharya and Bhatia (2010) | 2010 | Classification (PD vs. HC) | Voice dataset | https://www.csie.ntu.edu.tw/~cjlin/libsvm/ (Only part of the code is provided) | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Guo et al. (2010) | 2010 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Åström and Koker (2011) | 2011 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Ramani and Sivagami (2011) | 2011 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Yadav et al. (2012) | 2012 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Tsanas et al. (2012) | 2012 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Mandal and Sairam (2014) | 2014 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Hazan et al. (2012) | 2012 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Gharehchopogh and Mohammadi (2013) | 2013 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Rustempasic and Can (2013) | 2013 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Sharma and Giri (2014) | 2014 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Olanrewaju et al. (2014) | 2014 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Peker et al. (2015) | 2015 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Gök (2015) | 2015 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Chen et al. (2016) | 2016 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Avci and Dogantekin (2016) | 2016 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Dinesh and He (2017) | 2017 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Caliskan et al. (2017) | 2017 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Parisi et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings | NO |
| Wroge et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Lahmiri et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Haq et al. (2018) | 2018 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Ali et al. (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | https://github.com/LiaqatAli007/Automated-Detection-of-Parkinson-s-Disease-Based-on-Multiple-Types-of-Sustained-Phonations-using-Lin | https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings | NO |
| Mostafa et al. (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Lahmiri and Shmuel (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Haq et al. (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Senturk (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Karan et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Soumaya et al. (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Karaman et al. (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Quan et al. (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Zahid et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Rizvi et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings | NO |
| Abayomi-Alli et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Gunduz (2019) | 2019 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/470/parkinson+s+disease+classification | NO |
| Nagasubramanian and Sankayya (2021) | 2021 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Fang et al. (2020) | 2020 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Ali et al. (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Hireš et al. (2022) | 2022 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Rana et al. (2022) | 2022 | Classification (PD vs. HC) | Voice dataset | NO | Available on request | NO |
| Madruga et al. (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | NO | NO | NO |
| Govindu and Palwe (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Celik and Başaran (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons;https://archive.ics.uci.edu/dataset/470/parkinson+s+disease+classification | NO |
| Khaskhoussy and Ayed (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings | NO |
| Dheer et al. (2023) | 2023 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/174/parkinsons | NO |
| Akila and Nayahi (2024) | 2024 | Classification (PD vs. HC) | Voice dataset | NO | https://archive.ics.uci.edu/dataset/470/parkinson+s+disease+classification | NO |
| Handwriting | ||||||
| Drotár et al. (2014) | 2014 | Classification (PD vs. HC) | Handwriting dataset | NO | NO | NO |
| Drotár et al. (2015) | 2015 | Classification (PD vs. HC) | Handwriting dataset | NO | NO | NO |
| Pereira et al. (2015) | 2015 | Classification (PD vs. HC) | Handwriting dataset | NO | NO | NO |
| Ribeiro et al. (2019) | 2019 | Classification (PD vs. HC) | Handwriting dataset | https://github.com/lzfelix/bag-of-samplings | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ | NO |
| Razzak et al. (2020) | 2020 | Classification (PD vs. HC) | Handwriting dataset | NO | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/;https://www.kaggle.com/datasets/kmader/parkinsons-drawings; https://bdalab.utko.fekt.vut.cz/ | NO |
| Kamran et al. (2021) | 2021 | Classification (PD vs. HC) | Handwriting dataset | NO | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/;https://www.kaggle.com/datasets/kmader/parkinsons-drawings; https://bdalab.utko.fekt.vut.cz/ | NO |
| Gil-Martín et al. (2019) | 2019 | Classification (PD vs. HC) | Handwriting dataset | NO | https://archive.ics.uci.edu/dataset/395/parkinson+disease+spiral+drawings+using+digitized+graphics+tablet | NO |
| Diaz et al. (2021) | 2021 | Classification (PD vs. HC) | Handwriting dataset | NO | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ | NO |
| Taleb et al. (2019) | 2019 | Classification (PD vs. HC) | Handwriting dataset | NO | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ | NO |
| Varalakshmi et al. (2022) | 2022 | Classification (PD vs. HC) | Handwriting dataset | NO | https://www.kaggle.com/datasets/kmader/parkinsons-drawings | NO |
| Li et al. (2022) | 2022 | Classification (PD vs. HC) | Handwriting dataset | NO | NO | NO |
| Zhao and Li (2023) | 2023 | Classification (PD vs. HC) | Handwriting dataset | NO | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ | NO |
| Abdullah et al. (2023) | 2023 | Classification (PD vs. HC) | Handwriting dataset | NO | https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ | NO |
| Wang et al. (2024) | 2024 | Classification (PD vs. HC) | Handwriting dataset | NO | NO | NO |
| Gait | ||||||
| Tahir and Manap (2012) | 2012 | Classification (PD vs. HC) | Gait dataset | NO | NO | NO |
| Wahid et al. (2015) | 2015 | Classification (PD vs. HC) | Gait dataset | NO | NO | NO |
| Shetty and Rao (2016) | 2016 | Classification (PD vs. HD vs. ALS) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Abdulhay et al. (2018) | 2018 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Rehman et al. (2019) | 2019 | Classification (PD vs. HC) | Gait dataset | NO | NO | NO |
| Balaji et al. (2021) | 2021 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Xia et al. (2019) | 2019 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| El Maachi et al. (2020) | 2020 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Aversano et al. (2020) | 2020 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Liu et al. (2021) | 2021 | Classification (PD vs. HC) | Gait dataset | Submit an application to the author | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Nguyen et al. (2022) | 2022 | Classification (PD vs. HC) | Gait dataset | https://github.com/DucMinhDimitriNguyen | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Trabassi et al. (2022) | 2022 | Classification (PD vs. HC) | Gait dataset | NO | Request from the corresponding author | NO |
| Li and Li (2022) | 2022 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Aşuroğlu and Oğul (2022) | 2022 | Classification (PD vs. HC), Regression (UPDRS value) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Ma et al. (2023) | 2023 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| Vinora et al. (2023) | 2023 | Classification (PD vs. HC) | Gait dataset | NO | NO | NO |
| Sharma et al. (2023) | 2023 | Classification (PD vs. HC) | Gait dataset | NO | https://physionet.org/content/gaitpdb/1.0.0/ | NO |
| EEG | ||||||
| Lee et al. (2019) | 2019 | Classification (PD vs. HC) | EEG | NO | NO | NO |
| Oh et al. (2020) | 2020 | Classification (PD vs. HC) | EEG | NO | NO | NO |
| Anjum et al. (2020) | 2020 | Classification (PD vs. HC) | EEG | NO | http://narayanan.lab.uiowa.edu/;http://predict.cs.unm.edu/ | NO |
| Shaban (2021) | 2021 | Classification (PD vs. HC) | EEG | NO | https://openneuro.org/datasets/ds002778/versions/1.0.5 | NO |
| Loh et al. (2021) | 2021 | Classification (PD vs. HC) | EEG | NO | https://openneuro.org/datasets/ds002778/versions/1.0.5 | NO |
| Motin et al. (2022) | 2022 | Classification (PD vs. HC) | EEG | NO | https://openneuro.org/datasets/ds002778/versions/1.0.5 | YES |
| Chawla et al. (2023) | 2023 | Classification (PD vs. HC) | EEG | NO | NO | NO |
| Coelho et al. (2023) | 2023 | Classification (PD vs. HC) | EEG | NO | http://predict.cs.unm.edu/ | NO |
| Nour et al. (2023) | 2023 | Classification (PD vs. HC) | EEG | NO | https://openneuro.org/datasets/ds002778/versions/1.0.5 | NO |
| Zhao et al. (2024) | 2024 | Classification (PD vs. HC) | EEG | Request from the corresponding author | NO | NO |
| Other Data | ||||||
| Bhandari et al. (2023) | 2023 | Classification (PD vs. HC) | Gene dataset | https://github.com/nikitabhandari-dl/Parkinson-s-disease-diagnosis (currently returns HTTP 404; inaccessible) | https://ngdc.cncb.ac.cn/ | YES |
| Wang et al. (2023) | 2023 | Classification (PD vs. HC) | Urine biomarkers | NO | NO | NO |
| Junaid et al. (2023) | 2023 | Classification (PD vs. HC) | Patient visits | NO | https://www.ppmi-info.org/ | YES |
| Igene et al. (2023) | 2023 | Classification (PD vs. HC) | Movement data | NO | https://doi.org/10.21227/g2g8-1503 | NO |
| Varghese et al. (2024) | 2024 | Classification (PD vs. HC) | Smartwatch data, Questionnaire data | https://imigitlab.uni-muenster.de/published/pads-project | https://uni-muenster.sciebo.de/s/q69vUfRc9vgBoWX | NO |
Discussion
Summary of findings
ML-based PD diagnosis is a rapidly growing and changing field of research. This systematic review includes 117 articles about PD diagnosis using ML from 2010 to 2024. We analyze them and divide them into six categories based on the data modality used: (1) Neuroimaging, (2) Voice, (3) Handwriting, (4) Gait, (5) EEG, and (6) Other data. Fig. 12 shows the publication trends over the last 15 years (2010-2024). Compared with other modalities, neuroimaging, especially DaTSCAN SPECT, is the most established modality for PD diagnosis in clinical practice, whereas conventional MRI contributes little to routine diagnosis. However, neuroimaging can be expensive. Voice recordings, handwriting, and gait data are non-invasive, cost-effective, and easy to collect, so these modalities may also be used for PD diagnosis. Their main disadvantage is the lack of uniform standards for data collection, which may lead to inaccurate diagnosis. In clinical practice, EEG is not used for the diagnosis of PD; however, a few studies have applied EEG to PD diagnosis, and the validity of this modality needs further investigation by researchers in this field.
Fig. 12.

The development and changes of data in different modalities
We have also summarized the changes in the application of traditional ML and DL to PD diagnosis over the past 15 years. Fig. 13 illustrates the temporal evolution of traditional ML and DL techniques in PD classification research over five-year intervals from 2010 to 2024. In the early years (2010-2014), traditional ML methods such as SVM and Random Forest dominated the field, while DL approaches were rarely used. From 2015 to 2019, DL gained momentum and nearly caught up with traditional ML methods. A major shift occurred between 2020 and 2024, when the number of studies employing DL significantly surpassed those using traditional ML, establishing DL as the mainstream approach. This trend reflects the increasing availability of large-scale datasets, advances in computational resources, and the strong performance of deep neural networks on complex biomedical classification tasks.
Fig. 13.

Temporal evolution of the application of traditional ML and DL techniques in PD classification research over five-year intervals from 2010 to 2024
Across the 117 studies reviewed in our systematic review, the main issue is that model performance is hard to compare across modalities; for example, the clinical value of an ML-based PD diagnosis differs between neuroimaging and voice recordings. Another issue is that many authors provide too few implementation details: some articles do not report hyperparameters clearly, which makes reproducing the experiments difficult. In addition, some articles use only accuracy as the evaluation metric, which is insufficient. Accuracy can be misleading when the data is imbalanced, i.e., when one class has significantly more samples than the others. Also, different misclassification errors can have different costs in real-world applications: in a medical diagnosis task, a false negative (a patient predicted as not having a disease when they do) can have far more severe consequences than a false positive (a patient predicted as having a disease when they do not). Therefore, additional evaluation metrics such as specificity and sensitivity should be reported.
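The pitfall can be made concrete with a toy calculation: on an imbalanced cohort, a degenerate classifier that always predicts "healthy" scores high accuracy while detecting no patients at all.

```python
import numpy as np

# Imbalanced toy labels: 90 healthy controls (0), 10 PD patients (1).
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)        # a model that always predicts "healthy"

accuracy = (y_pred == y_true).mean()                 # fraction correct overall
sensitivity = (y_pred[y_true == 1] == 1).mean()      # true positive rate
specificity = (y_pred[y_true == 0] == 0).mean()      # true negative rate
print(accuracy, sensitivity, specificity)  # 0.9 0.0 1.0
```

Despite 90% accuracy, the model misses every PD patient (sensitivity 0), which is exactly the failure mode that accuracy alone conceals.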
Limitations of current studies
Dataset size
This review has identified several limitations of existing studies that have applied ML to PD diagnosis. Firstly, the number of participants with PD is often relatively small; for example, the total number of subjects may be less than 50 (Sakar and Kursun 2010; Bhattacharya and Bhatia 2010; Guo et al. 2010; Åström and Koker 2011; Ramani and Sivagami 2011; Yadav et al. 2012; Mandal and Sairam 2014; Gharehchopogh and Mohammadi 2013; Rustempasic and Can 2013; Sharma and Giri 2014; Olanrewaju et al. 2014; Peker et al. 2015; Gök 2015; Chen et al. 2016; Avci and Dogantekin 2016; Dinesh and He 2017; Caliskan et al. 2017; Parisi et al. 2018; Haq et al. 2018; Ali et al. 2019; Mostafa et al. 2019; Lahmiri and Shmuel 2019; Haq et al. 2019; Senturk 2020; Soumaya et al. 2021; Quan et al. 2021; Rizvi et al. 2020; Abayomi-Alli et al. 2020; Govindu and Palwe 2023; Khaskhoussy and Ayed 2023; Dheer et al. 2023; Ribeiro et al. 2019; Taleb et al. 2019; Tahir and Manap 2012; Wahid et al. 2015; Shetty and Rao 2016; Oh et al. 2020; Shaban 2021; Loh et al. 2021; Motin et al. 2022; Chawla et al. 2023; Igene et al. 2023). Only eight of the included articles have more than 500 subjects (Prashanth et al. 2014; Oliveira and Castelo-Branco 2015; Zhang et al. 2019; Camacho et al. 2023; Priyadharshini et al. 2024; Tsai et al. 2023; Zhao et al. 2022; Bhandari et al. 2023). The small data size may limit the performance of the ML models.
Black box nature of ML models
Another challenge is the black-box nature of ML models, which limits the clinical application of ML in PD diagnosis. Traditional ML algorithms such as SVM, and DL models such as CNNs and RNNs, are all examples of black-box models: they contain a large number of parameters, making it difficult to interpret how they arrive at their decisions. This makes it challenging to understand why a particular diagnosis is made, and the lack of transparency can be a significant barrier to the adoption of ML in clinical environments. PD diagnosis is a safety-critical medical task, where diagnostic accuracy is essential for the patient’s treatment and management. Therefore, ML should not only be used as a decision-support tool; the models must also be interpretable to medical experts and patients. Interpretable ML models allow doctors and patients to understand the reasoning behind the model’s decision-making process, increasing their trust in its accuracy and reliability. They provide insights into which input features have the greatest impact on the diagnosis, the relationship between the input features and the output, and how the model arrives at its final decision.
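One widely available model-agnostic tool for such feature-level insight (not specific to any of the reviewed studies) is permutation importance: shuffling an informative feature degrades performance, while shuffling noise does not. A sketch with scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy data in which only feature 0 carries the class signal.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Permuting a feature and measuring the accuracy drop estimates how much
# the model's predictions actually depend on that feature.
result = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
print(int(np.argmax(result.importances_mean)))  # feature 0 dominates
```

Reporting such importance scores alongside the diagnosis gives clinicians a concrete handle on what drives the model's output.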
No standardization of validation
This review has identified a lack of standardization of validation. The included studies used different validation methods, including k-fold cross-validation and hold-out validation, which makes comparisons between studies difficult. More specifically, if a study claims to outperform the state of the art (SOTA), the proposed methodology should at least replicate the other SOTA methods on the same dataset, with the same experimental setup and the exact same validation mechanism. Otherwise, the claim is unconvincing, as dataset bias and the validation mechanism, rather than the ML algorithm design, may be responsible for the better performance.
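A simple way to enforce an identical validation mechanism when comparing methods is to reuse one fixed cross-validation object for every model, as in this scikit-learn sketch (hypothetical data and models):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)

# One fixed, seeded CV object means every model sees identical folds,
# so score differences reflect the models rather than the splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for model in (SVC(), RandomForestClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=cv)
    print(type(model).__name__, len(scores))
```

Publishing the CV object's configuration (splitter type, number of folds, seed) would make cross-study comparisons far more meaningful.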
Lack of medical experts’ participation
Most studies follow a typical sequence. First, different modality data are collected and processed from PD and healthy control participants. Next, clinical experts manually annotate the dataset. Finally, the ML model is trained to classify patients and healthy controls. Thus, clinicians only contribute to the data label annotation, which limits the performance of the ML model building. ML scientists and medical experts should collaborate at all stages to provide feedback on the model performance and give valuable suggestions on model selection and explanation.
Bias risk and trustworthiness of ML-based PD diagnosis
Despite the growing body of ML research on PD diagnosis, only 28 of the 117 reviewed studies were assessed as having an overall low risk of bias in our PROBAST evaluation. Common issues include small sample sizes, lack of external validation, unclear blinding procedures, and potential data leakage during feature selection. These limitations significantly impact the reliability and generalizability of ML models. A model that performs well within a single cohort may still fail when applied to external or real-world clinical settings. Thus, confidence in ML-based diagnostic tools depends not only on predictive performance but also on methodological rigor and transparency. A high risk of bias compromises both reproducibility and the level of clinical trust necessary for real-world deployment.
No standardized ML approaches
Our systematic review reveals that there is currently no standardized ML approach for the diagnosis of PD. One of the key obstacles to achieving generalizable and reproducible ML models is the lack of standardization across publicly available datasets. This issue significantly hinders fair model comparison, reproducibility, and clinical translation. First, there is considerable heterogeneity in data acquisition protocols. Different datasets are collected using varying configurations; for example, EEG sampling rates may differ (e.g., 128 Hz vs. 1024 Hz), MRI scans may be acquired using different field strengths (e.g., 1.5T vs. 3T), and voice recordings may be captured under inconsistent environmental conditions. These discrepancies lead to variations in signal quality and frequency content, which directly affect feature extraction and model performance. Second, substantial variability exists in patient cohorts and diagnostic labelling. Datasets differ in inclusion criteria (e.g., drug-naïve vs. medicated patients), disease stage distributions, age ranges, and definitions of control groups. Furthermore, diagnostic labels are often assigned based on different clinical criteria, such as the MDS-UPDRS, Hoehn and Yahr staging, or clinician judgment, leading to label inconsistency and reduced comparability. Third, inconsistencies in preprocessing and feature engineering pipelines further complicate model standardization. Many studies employ custom workflows, such as filtering, artifact removal, and dimensionality reduction, that are often poorly documented and difficult to reproduce. In some cases, parameter tuning may even occur on the test set, introducing additional bias into performance evaluation. Finally, differences in data modalities and formats add to the complexity. Multimodal datasets often vary in terms of synchronization and alignment between modalities. 
Some datasets provide only raw signals, while others include derived features or lack essential metadata, making it challenging to develop standardized multimodal fusion methods.
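As a minimal illustration of one harmonization step, the sketch below (a toy under stated assumptions, not a recommended pipeline) resamples uniformly sampled signals to a common rate by linear interpolation; a real pipeline would also low-pass filter before downsampling to avoid aliasing.

```python
def resample(signal, src_rate, dst_rate):
    """Resample a uniformly sampled 1-D signal from src_rate to dst_rate
    by linear interpolation between neighbouring samples."""
    duration = (len(signal) - 1) / src_rate
    n_out = int(duration * dst_rate) + 1
    out = []
    for i in range(n_out):
        t = i / dst_rate                 # time of the output sample
        pos = t * src_rate               # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

# Demo: a 1-second ramp recorded at 8 Hz, harmonised down to 4 Hz.
harmonised = resample([float(i) for i in range(9)], src_rate=8, dst_rate=4)
```

The same function can be applied to every recording in a multi-site collection so that all downstream feature extraction sees one common sampling rate.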
Future research directions
Explainable artificial intelligence (XAI)
XAI aims to provide human-understandable explanations that help users follow a black-box model’s decision process (Zhang et al. 2022). The XAI approach has the potential to generate improved models and verified predictions. Moreover, an XAI system can help clinicians and researchers understand the reasoning behind an AI system’s decision and identify potential biases or limitations in the model. This can improve the accuracy and reliability of PD diagnosis, with important implications for patient outcomes.
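A simple, fully model-agnostic XAI technique is permutation feature importance: shuffle one input feature at a time and measure the resulting drop in accuracy. The sketch below is illustrative only; the toy model and data are hypothetical.

```python
import random

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: shuffle one feature column at a time and
    measure the mean drop in accuracy. A bigger drop means the model relies
    more heavily on that feature."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Demo: predictions depend only on feature 0; feature 1 is irrelevant,
# so its importance should be exactly zero.
toy_model = lambda x: 1 if x[0] > 0 else 0
X_demo = [[1.0, 9.0], [-1.0, 9.0], [1.0, -9.0], [-1.0, -9.0]] * 3
y_demo = [1, 0, 1, 0] * 3
importances = permutation_importance(toy_model, X_demo, y_demo)
```

Because the procedure only calls the model's predict function, it applies unchanged to an SVM, a CNN, or any future architecture.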
Data augmentation
Data augmentation is a method for generating synthetic data. As the datasets used for ML-based PD diagnosis are relatively small, data augmentation is a feasible approach to increase dataset size and improve the performance and generalisation of ML models. Different data modalities require different augmentation methods. Generative Adversarial Networks (GANs) are a promising approach that has mostly been applied to image data (Yi et al. 2019). GANs can create diverse and realistic synthetic data that capture the underlying data distribution, reducing overfitting in ML models by increasing the diversity of the training data. In the future, using GANs to generate voice, neuroimaging, handwriting, gait, and EEG data for PD diagnosis is also achievable.
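For one-dimensional signals such as voice or gait time series, even simple label-preserving transformations can serve as augmentation before resorting to GANs. The sketch below is illustrative; the shift, scaling, and noise parameters are arbitrary assumptions, not values taken from any reviewed study.

```python
import random

def augment_signal(signal, rng, noise_std=0.01, max_shift=5, scale_range=(0.9, 1.1)):
    """One augmented copy of a 1-D signal: random circular time shift,
    amplitude scaling, and additive Gaussian noise."""
    shift = rng.randint(-max_shift, max_shift)
    shifted = signal[-shift:] + signal[:-shift] if shift else list(signal)
    scale = rng.uniform(*scale_range)
    return [scale * v + rng.gauss(0.0, noise_std) for v in shifted]

def augment_dataset(signals, labels, copies_per_sample=3, seed=0):
    """Label-preserving augmentation: each synthetic copy keeps the
    label of the signal it was derived from."""
    rng = random.Random(seed)
    aug_X, aug_y = list(map(list, signals)), list(labels)
    for sig, lab in zip(signals, labels):
        for _ in range(copies_per_sample):
            aug_X.append(augment_signal(sig, rng))
            aug_y.append(lab)
    return aug_X, aug_y

# Demo: one 20-sample signal expanded into four (original + 3 copies).
X_aug, y_aug = augment_dataset([[float(i) for i in range(20)]], [1],
                               copies_per_sample=3)
```

The transformation set must be chosen per modality: a circular time shift is plausible for a periodic gait signal but would be inappropriate for, say, a static handwriting image.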
Transfer learning
The datasets currently used to diagnose PD are generally too small for data-hungry ML models; transfer learning could therefore be an effective approach to improve training efficiency and speed. When working with a small dataset, there is a higher risk of overfitting, where the model becomes too specialized to the training data and performs poorly on unseen data. To address this issue, transfer learning leverages a pre-trained model that has learned features from a large dataset and transfers that knowledge to the smaller target dataset. Additionally, transfer learning can save valuable time and computational resources by reducing the amount of training required for a new model. Instead of training a model from scratch, transfer learning enables the fine-tuning of an existing model on a small dataset, which is a more efficient and quicker process (Kaur et al. 2021).
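The core idea, freezing a pre-trained feature extractor and fine-tuning only a small head on the target data, can be sketched as follows. Here `frozen_extractor` is a hypothetical stand-in for a network pre-trained on a large corpus (e.g., a CNN backbone); only the linear head is ever updated.

```python
def frozen_extractor(x):
    """Stand-in for a frozen pre-trained network: a fixed, non-trainable
    feature transform. In practice this would be a deep backbone whose
    weights were learned on a large source dataset."""
    return [x[0] + x[1], x[0] - x[1], x[0] * x[1]]

def train_head(X, y, lr=0.1, epochs=50):
    """Fine-tune only a small linear head (perceptron updates) on the
    extracted features; the extractor's 'weights' are never touched."""
    feats = [frozen_extractor(x) for x in X]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, t in zip(feats, y):
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
            err = t - pred
            if err:
                w = [wi + lr * err * fi for wi, fi in zip(w, f)]
                b += lr * err

    def predict(x):
        f = frozen_extractor(x)
        return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
    return predict

# Demo: a tiny target dataset, separable in the frozen feature space.
X_demo = [[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0], [0.0, -2.0]]
y_demo = [1, 1, 0, 0]
predict = train_head(X_demo, y_demo)
```

Because only the small head is trained, far fewer labelled PD samples are needed than training the whole network from scratch would require.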
Federated learning
ML models often require large amounts of user data. However, collecting such data for PD poses challenges, since data are collected by individual hospitals and organisations and data sharing may be hindered. Federated learning presents a potential solution for developing models that identify PD biomarkers and patterns using data from various sources, such as medical records, clinical studies, and wearable devices. With federated learning, different parties collaborate to create a shared model without sharing their data. This approach also facilitates the use of large datasets without centralising the data, which is essential when working with sensitive patient information. Instead, data remain on local devices, and the model is trained by aggregating model updates across multiple devices without transferring data. Federated learning thus protects patient privacy while enabling the development of accurate models (Rieke et al. 2020).
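The aggregation step can be sketched as federated averaging (FedAvg): each client trains locally on its private data and sends only model weights, which the server averages weighted by local dataset size. The linear model and toy client data below are illustrative assumptions, not a production protocol.

```python
def local_sgd(w, data, lr=0.05, epochs=20):
    """One client's local training: plain SGD on a linear model y ~ w.x,
    using only that client's private (x, target) pairs."""
    w = list(w)
    for _ in range(epochs):
        for x, t in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def federated_averaging(clients, rounds=10, dim=2):
    """FedAvg: per round, every client trains locally and sends only its
    weights; the server averages them weighted by local dataset size.
    Raw patient data never leaves the client."""
    w_global = [0.0] * dim
    total = sum(len(c) for c in clients)
    for _ in range(rounds):
        updates = [local_sgd(w_global, c) for c in clients]
        w_global = [sum(len(c) * u[j] for c, u in zip(clients, updates)) / total
                    for j in range(dim)]
    return w_global

# Demo: two "hospitals" whose private data share the true weights [2, -1].
clients = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
    [([1.0, 1.0], 1.0), ([1.0, -1.0], 3.0)],
]
w_global = federated_averaging(clients, rounds=10, dim=2)
```

Both clients converge to the shared solution even though neither ever sees the other's records; only weight vectors cross institutional boundaries.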
Multi-modality
The multi-modality approach is a promising direction, as it can integrate multi-view information and perform better than a single modality (Makarious et al. 2022). Single-modality learning is prone to overfitting, especially when data samples are limited, which is often the case with PD datasets that are small and prone to noise. By incorporating additional modalities, such as genetic analysis, neuroimaging, or EEG, the model can compensate for the lack of data, enhancing its ability to learn from different types of information and thereby improving diagnostic accuracy. Furthermore, the clinical manifestations of PD vary across patients, and a single modality may fail to capture these differences comprehensively. Multi-modal data can help the model generalise better across different patient groups. Additionally, multi-modal models can remain robust even when one modality’s data is missing or of poor quality, still making predictions without depending on any single modality. However, the acquisition of diverse data presents challenges, particularly in data availability, quality, and integration. Developing datasets specifically designed for multi-modal research remains a significant hurdle, and the standardization of data collection protocols across modalities is necessary to ensure consistency. In the future, integrating diverse modalities such as genetic data, blood samples, neuroimaging, voice, handwriting, gait analysis, and EEG into a unified ML framework could significantly improve PD diagnosis, leading to earlier and more accurate diagnoses, better patient stratification, and personalized treatments, ultimately enhancing patient outcomes.
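One simple fusion strategy consistent with this robustness argument is late fusion: each modality has its own classifier, and their predicted probabilities are averaged, skipping any modality whose data is missing. A minimal sketch (the modality names and probabilities below are hypothetical):

```python
def late_fusion(modality_probs):
    """Late fusion: average the PD probability from each per-modality
    classifier, skipping modalities whose data is missing (None).
    Returns (fused probability, predicted label at a 0.5 threshold)."""
    available = [p for p in modality_probs.values() if p is not None]
    if not available:
        raise ValueError("no modality available for this subject")
    fused = sum(available) / len(available)
    return fused, int(fused >= 0.5)

# Demo: EEG is missing for this subject, but voice and gait still
# yield a confident fused prediction.
fused_pd, label_pd = late_fusion({"voice": 0.9, "gait": 0.7, "eeg": None})
fused_hc, label_hc = late_fusion({"voice": 0.2, "gait": None, "eeg": 0.4})
```

More sophisticated schemes (learned weights, intermediate-feature fusion) follow the same pattern but require the synchronized, well-documented multi-modal datasets discussed above.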
Open source culture and standard protocols
To promote the development of ML in the diagnosis of PD, researchers should proactively disclose the full source code and experimental details used in their studies. This includes all necessary steps of data preprocessing, model evaluation, hyperparameter tuning, and pre-trained models. This ensures that other researchers can accurately reproduce and validate experimental results. Additionally, researchers should create standardized data collection and evaluation protocols, allowing all methods to be assessed and compared on a fair and uniform basis. At the same time, academic journals should implement stricter peer-review processes, particularly focusing on the reproducibility of the submitted works. Reviewers need to be specifically trained to ensure they can thoroughly assess whether the provided materials are sufficient to replicate the study results. By taking these measures, the transparency and reliability of research can be enhanced, facilitating scientific progress and technological innovation in the field.
Ethical concerns
Ethics are important in applying ML and DL to PD diagnosis. First, data privacy and security are major issues, especially in the medical field, where patient health data contains sensitive information. Unauthorised collection and use of data may lead to privacy breaches and even malicious exploitation. Secondly, fairness is a critical concern. If training data is biased, the model may produce inaccurate diagnostic results for certain groups (e.g., specific ages, genders, or ethnicities), exacerbating health inequalities. Moreover, DL models are often seen as “black boxes”, lacking transparency in their decision-making processes. Medical professionals may be hesitant to trust and adopt AI systems if they cannot understand how diagnoses are made. Finally, as AI becomes more integrated into healthcare, the issue of accountability becomes increasingly complex. If an AI system makes a wrong diagnosis leading to harm, who should take responsibility? The developers, the healthcare institution, or the AI itself? These issues need to be carefully addressed within an ethical framework. Future research should focus on developing methods to enhance the interpretability and transparency of AI systems, establishing guidelines for data privacy and security, and creating clear accountability structures. Collaborative efforts between AI researchers, healthcare professionals, and ethicists are essential to ensure that these technologies are implemented responsibly and fairly, mitigating potential risks and improving patient outcomes.
More complete model evaluation
In future research, it is important not to rely solely on traditional evaluation metrics such as accuracy, AUC, and precision. While these metrics are undoubtedly valuable, they may not fully capture model performance, particularly in tasks involving ordinal or continuous outcomes. To address this limitation, we advocate for the complementary use of more agnostic and unified evaluation measures, such as the Rank Graduation Accuracy (RGA) proposed by Giudici and Raffinetti (2025). RGA is applicable across binary, ordinal, and continuous predictive settings, offering a more generalizable and consistent framework for comparing models under diverse data conditions and outcome types. Incorporating such metrics alongside traditional ones could significantly improve the fairness, robustness, and clinical relevance of performance evaluation in ML-based disease diagnosis.
Beyond predictive performance, model interpretability is another critical yet often underemphasized component of diagnostic model evaluation. Traditional metrics such as accuracy and AUC provide insights into how well a model performs, but offer little information about why it makes certain predictions. In medical applications, particularly in the diagnosis of complex neurodegenerative diseases such as PD, understanding the rationale behind model decisions is essential for building clinical trust, ensuring transparency, and facilitating adoption in practice. Despite its importance, explainability remains underexplored in many published studies. As shown in Table 4, only a limited number of works incorporate explainability techniques, and among those that do, there is little consistency in the methods used. Furthermore, the lack of open-source implementations prevents systematic comparison across models. To address these gaps, we encourage future research to integrate interpretability as a core component of model development and validation. In particular, model-agnostic explainability methods, which can be applied regardless of the underlying algorithm, should be prioritized, as they enable fairer and more standardized comparisons (Calzarossa et al. 2025). The adoption of such frameworks may also facilitate the identification of clinically relevant biomarkers, thereby strengthening the link between computational models and real-world medical applications.
In addition to performance and interpretability, robustness and security represent two further dimensions that are essential for the safe deployment of diagnostic models but are frequently overlooked. Robustness refers to a model’s ability to maintain stable performance when faced with noise, missing data, or domain shifts, all of which are common in real-world clinical settings. Security, by contrast, concerns the model’s resistance to adversarial examples or malicious attacks that could compromise its output. These aspects are rarely evaluated in existing studies, often due to a lack of reproducibility and the absence of standardized assessment tools. The SAFE AI framework (Babaei et al. 2025), for example, introduces the Rank Graduation Box as a structured, model-agnostic approach to evaluating robustness and security. We therefore recommend that future research explicitly incorporate robustness and security testing into the model evaluation process. Doing so will be crucial for developing trustworthy and clinically deployable AI systems, particularly in high-stakes domains such as healthcare.
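Pending standardized tools, a basic robustness check is straightforward to script: evaluate the same model under increasing input noise and inspect how accuracy degrades. The sketch below is illustrative only (it does not implement the Rank Graduation Box); the toy model and margin are hypothetical.

```python
import random

def robustness_curve(model, X, y, noise_levels, n_repeats=5, seed=0):
    """Evaluate one fixed model under increasing additive Gaussian input
    noise. A robust model's accuracy should degrade gracefully along the
    curve rather than collapse at the first perturbation."""
    rng = random.Random(seed)
    curve = []
    for std in noise_levels:
        accs = []
        for _ in range(n_repeats):
            X_noisy = [[v + rng.gauss(0.0, std) for v in x] for x in X]
            acc = sum(model(x) == t for x, t in zip(X_noisy, y)) / len(y)
            accs.append(acc)
        curve.append(sum(accs) / n_repeats)
    return curve

# Demo: a classifier with a wide decision margin is unaffected by
# noise that is small relative to that margin.
margin_model = lambda x: int(x[0] > 0.0)
X_demo = [[5.0], [-5.0]] * 10
y_demo = [1, 0] * 10
curve = robustness_curve(margin_model, X_demo, y_demo, noise_levels=[0.0, 0.1])
```

Reporting such a curve alongside clean-data accuracy gives reviewers a direct view of how the model might behave on noisier clinical recordings.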
Bias mitigation
To improve the reliability and clinical applicability of ML models for PD diagnosis, future research must systematically address the sources of bias identified by tools such as PROBAST. This includes implementing rigorous dataset selection with transparent inclusion and exclusion criteria, clearly reporting participant selection logic, and accounting for demographic diversity such as age, disease stage, and comorbidities. Feature selection should be strictly separated from model evaluation to prevent information leakage, an issue commonly caused by selecting features on the entire dataset prior to train-test splitting. Employing nested cross-validation can help mitigate this risk. External validation using independent datasets from different geographic, demographic, or temporal contexts remains essential for demonstrating model generalizability, yet is still underutilized. Moreover, we encourage researchers to explicitly report how each PROBAST domain is addressed, either in the methods section or supplementary materials, to enhance transparency and facilitate cross-study comparisons. Finally, close collaboration with clinical experts is crucial to identifying potential sources of bias in preprocessing and label interpretation, reducing cognitive bias, and ensuring clinical relevance. Incorporating these practices can significantly improve the transparency, robustness, and translational potential of ML-based diagnostic tools for PD.
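The leakage issue can be made concrete: feature selection must be re-fit inside each training fold, never on the full dataset before splitting. A minimal sketch follows (the mean-gap selector and threshold classifier are hypothetical stand-ins for any selection and modeling step; the data are assumed evenly divisible into folds):

```python
def select_top_feature(X, y):
    """Pick the single feature whose class means differ most -- a stand-in
    for any feature-selection procedure."""
    best_j, best_gap = 0, -1.0
    for j in range(len(X[0])):
        m1 = [x[j] for x, t in zip(X, y) if t == 1]
        m0 = [x[j] for x, t in zip(X, y) if t == 0]
        gap = abs(sum(m1) / len(m1) - sum(m0) / len(m0))
        if gap > best_gap:
            best_j, best_gap = j, gap
    return best_j

def leakage_free_cv(X, y, k=4):
    """Cross-validation where feature selection is re-run INSIDE each
    training fold, so the held-out fold never influences which feature
    is chosen or where the threshold sits."""
    n = len(y)
    fold_size = n // k
    correct = 0
    for f in range(k):
        test = list(range(f * fold_size, (f + 1) * fold_size))
        train = [i for i in range(n) if i not in set(test)]
        j = select_top_feature([X[i] for i in train], [y[i] for i in train])
        # Threshold classifier on the selected feature, fit on the training fold only.
        thresh = sum(X[i][j] for i in train) / len(train)
        for i in test:
            correct += int(X[i][j] > thresh) == y[i]
    return correct / (fold_size * k)

# Demo: feature 0 separates the classes; feature 1 is constant noise.
y_demo = [0, 1] * 4
X_demo = [[10.0 if t else 0.0, 3.0] for t in y_demo]
cv_acc = leakage_free_cv(X_demo, y_demo, k=4)
```

Wrapping an additional outer loop around hyperparameter choices in the same way yields full nested cross-validation.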
Conclusions
This paper reviews current trends in applying ML technologies to PD diagnosis. In this review, studies are categorised by the data modalities used in the experiments, including neuroimaging, voice, handwriting, gait, and EEG. ML has shown great potential to assist PD diagnosis, and research findings also show that it can be used as a decision-support tool to help doctors screen, detect, and diagnose PD effectively. Research on applying ML to PD diagnosis still faces many limitations and challenges. We have discussed these issues and proposed several future directions, including the use of explainable AI for model interpretability, data augmentation techniques to generate synthetic data, transfer learning to leverage pre-trained models, federated learning to protect data privacy, and multi-modality approaches to integrate diverse information from different modalities. A more comprehensive model evaluation, going beyond traditional metrics such as accuracy and AUC, is essential for ensuring robust, fair, and clinically relevant results. Bias mitigation strategies should also be incorporated to tackle issues such as dataset imbalance, underrepresentation of subgroups, and algorithmic bias. The case studies on five data modalities show that some research papers in this field may face reproducibility issues. Open-source code and reproducible results are essential, and this should be emphasized. Additionally, an ethical framework should be established to ensure these technologies are implemented responsibly and fairly. This comprehensive review aims to reduce the gap between AI experts and medical professionals and to help future researchers design ML-based PD diagnosis applications.
Acknowledgements
This research is supported by Ningbo Science and Technology Innovation 2025 Major Project 2022Z126; A.H. is awarded the Clinical Academic Research Partnership Grant by the UK Research and Innovation (Grant MR/T005580/1) and has received funding from the National Institutes of Health/NIA, USA (Grant reference NIH1R56AG074467-01).
Appendix
We have included a meta-analysis for the voice data modality, covering effect sizes and relevant statistical trends. For both sensitivity and specificity, the meta-analysis was performed using the meta package in R. The forest plots (Figs. 14 and 15) display the variability in sensitivity and specificity across the voice-modality studies.
Fig. 14.
Forest plot for sensitivity across voice modality studies
Fig. 15.
Forest plot for specificity across voice modality studies
Author contributions
J.Z., Y.Z. and Y.W. conceived and designed the study. J.Z. and Y.Z. independently screened and reviewed all included articles. J.Z., Y.Z. and Y.W. drafted the manuscript (Y.Z. contributed the abstract, introduction, methods, results, discussion and conclusion sections, J.Z. contributed to the results and discussion sections. Y.Z. significantly contributed to the figures and tables. Y.W. contributed to the methods, discussion and conclusion sections). Y.W., A.H. and B.W. secured the funding. Y.W., A.H. and T.D. supervised the project. Y.W., A.H., B.W., T.D., W.F. and W.X. contributed significant amendments to the final manuscript.
Data availability
The datasets used in this study are publicly available. The Voice dataset can be accessed at https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple%20+types+of+sound+recordings, the Gait dataset can be accessed at https://physionet.org/content/gaitpdb/1.0.0/, the EEG dataset can be accessed at https://openneuro.org/datasets/ds002778/versions/1.0.5, the Handwriting dataset can be accessed at https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/, and the MRI dataset can be accessed at https://fcon_1000.projects.nitrc.org/indi/retro/parkinsons.html.
Declarations
Conflict of interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
J. Zhang and Y. Zhang have contributed equally to this work.
References
- Abdullah SM, Abbas T, Bashir MH, Khaja IA, Ahmad M, Soliman NF, El-Shafai W (2023) Deep transfer learning based parkinson’s disease detection using optimized feature selection. IEEE Access 11:3511–3524
- Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, Abayomi-Alli A (2020) Bilstm with data augmentation using interpolation methods to improve early detection of parkinson disease. In: 2020 15th Conference on Computer Science and Information Systems (FedCSIS), pp. 371–380. IEEE
- Abdulhay E, Arunkumar N, Narasimhan K, Vellaiappan E, Venkatraman V (2018) Gait and tremor investigation using machine learning techniques for the diagnosis of parkinson disease. Futur Gener Comput Syst 83:366–373
- Aversano L, Bernardi ML, Cimitile M, Pecori R (2020) Early detection of parkinson disease using deep neural networks on gait dynamics. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE
- Ali L, Chakraborty C, He Z, Cao W, Imrana Y, Rodrigues JJ (2023) A novel sample and feature dependent ensemble approach for parkinson’s disease detection. Neural Comput Appl 35(22):15997–16010
- Avci D, Dogantekin A (2016) An expert diagnosis system for parkinson disease based on genetic algorithm-wavelet kernel-extreme learning machine. Parkinson’s Disease 2016(1):5264743
- Anjum MF, Dasgupta S, Mudumbai R, Singh A, Cavanagh JF, Narayanan NS (2020) Linear predictive coding distinguishes spectral eeg features of parkinson’s disease. Parkinsonism & Related Disorders 79:79–85
- Alhussen A, Haq MA, Khan AA, Mahendran RK, Kadry S (2025) Xai-racapsnet: Relevance aware capsule network-based breast cancer detection using mammography images via explainability o-net roi segmentation. Expert Syst Appl 261:125461
- Åström F, Koker R (2011) A parallel neural network approach to prediction of parkinson’s disease. Expert Syst Appl 38(10):12470–12474
- Akila B, Nayahi JJV (2024) Parkinson classification neural network with mass algorithm for processing speech signals. Neural Comput Appl 36(17):10165–10181
- Aşuroğlu T, Oğul H (2022) A deep learning approach for parkinson’s disease severity assessment. Heal Technol 12(5):943–953
- Ali L, Zhu C, Zhang Z, Liu Y (2019) Automated detection of parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE Journal of Translational Engineering in Health and Medicine 7:1–10
- Bhattacharya I, Bhatia MPS (2010) Svm classification to distinguish parkinson disease patients. In: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, pp. 1–6
- Balaji E, Brindha D, Elumalai VK, Vikrama R (2021) Automatic and non-invasive parkinson’s disease diagnosis and severity rating using lstm network. Appl Soft Comput 108:107463
- Babaei G, Giudici P, Raffinetti E (2025) A rank graduation box for safe ai. Expert Syst Appl 259:125239
- Badea L, Onu M, Wu T, Roceanu A, Bajenaru O (2017) Exploring the reproducibility of functional connectivity alterations in parkinson’s disease. PLoS ONE 12(11):0188196
- Bhandari N, Walambe R, Kotecha K, Kaliya M (2023) Integrative gene expression analysis for the diagnosis of parkinson’s disease using machine learning and explainable ai. Comput Biol Med 163:107140
- Chakraborty S, Aich S, Kim H-C (2020) Detection of parkinson’s disease from 3t t1 weighted mri scans using 3d convolutional neural network. Diagnostics 10(6):402
- Cano J-R (2013) Analysis of data complexity measures for classification. Expert Syst Appl 40(12):4820–4831
- Celik G, Başaran E (2023) Proposing a new approach based on convolutional neural networks and random forest for the diagnosis of parkinson’s disease from speech signals. Appl Acoust 211:109476
- Caliskan A, Badem H, Basturk A, Yuksel M (2017) Diagnosis of the parkinson disease by using deep neural network classifier. IU-Journal of Electrical & Electronics Engineering 17(2):3311–3318
- Calzarossa MC, Giudici P, Zieni R (2025) An assessment framework for explainable ai with applications to cybersecurity. Artif Intell Rev 58(5):150
- Coelho BFO, Massaranduba ABR, Santos Souza CA, Viana GG, Brys I, Ramos RP (2023) Parkinson’s disease effective biomarkers based on hjorth features improved by machine learning. Expert Syst Appl 212:118772
- Chawla P, Rana SB, Kaur H, Singh K, Yuvaraj R, Murugappan M (2023) A decision support system for automated diagnosis of parkinson’s disease from eeg using fawt and entropy features. Biomed Signal Process Control 79:104116
- Chen H-L, Wang G, Ma C, Cai Z-N, Liu W-B, Wang S-J (2016) An efficient hybrid kernel extreme learning machine approach for early diagnosis of parkinson’s disease. Neurocomputing 184:131–144
- Camacho M, Wilms M, Mouches P, Almgren H, Souza R, Camicioli R, Ismail Z, Monchi O, Forkert ND (2023) Explainable classification of parkinson’s disease using deep learning trained on a large multi-center database of t1-weighted mri datasets. NeuroImage Clinical 38:103405
- Chen B, Xu M, Yu H, He J, Li Y, Song D, Fan GG (2023) Detection of mild cognitive impairment in parkinson’s disease using gradient boosting decision tree models based on multilevel dti indices. J Transl Med 21(1):310
- Dinesh A, He J (2017) Using machine learning to diagnose parkinson’s disease from voice recordings. In: 2017 IEEE MIT Undergraduate Research Technology Conference (URTC), pp. 1–4. IEEE
- Drotár P, Mekyska J, Rektorová I, Masarová L, Smékal Z, Faundez-Zanuy M (2014) Decision support framework for parkinson’s disease based on novel handwriting markers. IEEE Trans Neural Syst Rehabil Eng 23(3):508–516
- Drotár P, Mekyska J, Rektorová I, Masarová L, Smékal Z, Faundez-Zanuy M (2016) Evaluation of handwriting kinematics and pressure for differential diagnosis of parkinson’s disease. Artif Intell Med 67:39–46
- Drotár P, Mekyska J, Smékal Z, Rektorová I, Masarová L, Faundez-Zanuy M (2015) Contribution of different handwriting modalities to differential diagnosis of parkinson’s disease. In: 2015 IEEE International Symposium on Medical Measurements and Applications (MeMeA) Proceedings, pp. 344–348. IEEE
- Diaz M, Moetesum M, Siddiqi I, Vessio G (2021) Sequence-based dynamic handwriting analysis for parkinson’s disease detection with one-dimensional convolutions and bigrus. Expert Syst Appl 168:114405
- Dheer S, Poddar M, Pandey A, Kalaivani S (2023) Parkinson’s disease detection using acoustic features from speech recordings. In: 2023 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), pp. 1–4. IEEE
- Dai Y, Tang Z, Wang Y et al (2019) Data driven intelligent diagnostics for parkinson’s disease. IEEE Access 7:106941–106950
- El Maachi I, Bilodeau G-A, Bouachir W (2020) Deep 1d-convnet for accurate parkinson disease detection and severity prediction from gait. Expert Syst Appl 143:113075
- Erdaş ÇB, Sümer E (2022) A deep learning method to detect parkinson’s disease from mri slices. SN Computer Science 3(2):120
- Fang H, Gong C, Zhang C, Sui Y, Li L (2020) Parkinsonian chinese speech analysis towards automatic classification of parkinson’s disease. In: Machine Learning for Health, pp. 114–125. PMLR
- Frenkel-Toledo S, Giladi N, Peretz C, Herman T, Gruendlinger L, Hausdorff JM (2005) Effect of gait speed on gait rhythmicity in parkinson’s disease: variability of stride time and swing time respond differently. J Neuroeng Rehabil 2:1–7
- Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation 101(23):215–220
- Guo P-F, Bhattacharya P, Kharma N (2010) Advances in detecting parkinson’s disease. In: Medical Biometrics: Second International Conference, ICMB 2010, Hong Kong, China, June 28-30, 2010. Proceedings 2, pp. 306–314. Springer
- Gharehchopogh FS, Mohammadi P (2013) A case study of parkinson’s disease diagnosis using artificial neural networks. International Journal of Computer Applications 73(19):1–6
- Gil-Martín M, Montero JM, San-Segundo R (2019) Parkinson’s disease detection from drawing movements using convolutional neural networks. Electronics 8(8):907
- Goceri E (2024) Vision transformer based classification of gliomas from histopathological images. Expert Syst Appl 241:122672
- Goceri E (2025) An efficient network with cnn and transformer blocks for glioma grading and brain tumor classification from mris. Expert Syst Appl 268:126290
- Gök M (2015) An ensemble of k-nearest neighbours algorithm for detection of parkinson’s disease. Int J Syst Sci 46(6):1108–1112
- Govindu A, Palwe S (2023) Early detection of parkinson’s disease using machine learning. Procedia Computer Science 218:249–261
- Giudici P, Raffinetti E (2025) Rga: a unified measure of predictive accuracy. Adv Data Anal Classif 19(1):67–93
- Gunduz H (2019) Deep learning-based parkinson’s disease classification using vocal feature sets. IEEE Access 7:115540–115551
- Hireš M, Gazda M, Drotár P, Pah ND, Motin MA, Kumar DK (2022) Convolutional neural network ensemble for parkinson’s disease detection from voice recordings. Comput Biol Med 141:105021
- Hazan H, Hilu D, Manevitz L, Ramig LO, Sapir S (2012) Early diagnosis of parkinson’s disease via machine learning on speech data. In: 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, pp. 1–4. IEEE
- Hausdorff JM, Lowenthal J, Herman T, Gruendlinger L, Peretz C, Giladi N (2007) Rhythmic auditory stimulation modulates gait variability in parkinson’s disease. Eur J Neurosci 26(8):2369–2375
- Haq AU, Li J, Memon MH, Khan J, Din SU, Ahad I, Sun R, Lai Z (2018) Comparative analysis of the classification performance of machine learning classifiers and deep neural network classifier for prediction of parkinson disease. In: 2018 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), pp. 101–106. IEEE
- Haq AU, Li JP, Memon MH, Malik A, Ahmad T, Ali A, Nazir S, Ahad I, Shahid M et al (2019) Feature selection based on l1-norm support vector machine and effective recognition system for parkinson’s disease using voice recordings. IEEE Access 7:37718–37734
- Huang L, Ye X, Yang M, Pan L, Zheng S (2023) Mnc-net: Multi-task graph structure learning based on node clustering for early parkinson’s disease diagnosis. Comput Biol Med 152:106308
- Igene L, Alim A, Imtiaz MH, Schuckers S (2023) A machine learning model for early prediction of parkinson’s disease from wearable sensors. In: 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0734–0737. IEEE
- Junaid M, Ali S, Eid F, El-Sappagh S, Abuhmed T (2023) Explainable machine learning models based on multimodal time-series data for the early detection of parkinson’s disease. Comput Methods Programs Biomed 234:107495
- Jankovic J (2008) Parkinson’s disease: clinical features and diagnosis. Journal of Neurology, Neurosurgery & Psychiatry 79(4):368–376
- Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y (2017) Artificial intelligence in healthcare: past, present and future. Stroke and Vascular Neurology 2(4)
- Khaskhoussy R, Ayed YB (2023) Improving parkinson’s disease recognition through voice analysis using deep learning. Pattern Recogn Lett 168:64–70 [Google Scholar]
- Kaur S, Aggarwal H, Rani R (2021) Diagnosis of parkinson’s disease using deep cnn with transfer learning and data augmentation. Multimedia Tools and Applications 80(7):10113–10139 [Google Scholar]
- Karaman O, Çakın H, Alhudhaif A, Polat K (2021) Robust automated parkinson disease detection based on voice signals with transfer learning. Expert Syst Appl 178:115013 [Google Scholar]
- Khan AA, Mahendran RK, Perumal K, Faheem M (2024) Dual-3dm 3 ad: mixed transformer based semantic segmentation and triplet pre-processing for early multi-class alzheimer’s diagnosis. IEEE Trans Neural Syst Rehabil Eng 32:696–707 [DOI] [PubMed] [Google Scholar]
- Khan AA, Madendran RK, Thirunavukkarasu U, Faheem M (2023) D2pam: Epileptic seizures prediction using adversarial deep dual patch attention mechanism. CAAI Transactions on Intelligence Technology 8(3):755–769 [Google Scholar]
- Kamran I, Naz S, Razzak I, Imran M (2021) Handwriting dynamics assessment using deep neural network for early identification of parkinson’s disease. Futur Gener Comput Syst 117:234–244 [Google Scholar]
- Kujur A, Raza Z, Khan AA, Wechtaisong C (2022) Data complexity based evaluation of the model dependence of brain mri images for classification of brain tumor and alzheimer’s disease. IEEE Access 10:112117–112133 [Google Scholar]
- Karan B, Sahu SS, Mahto K (2020) Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybernetics and Biomedical Engineering 40(1):249–264 [Google Scholar]
- LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
- Lahmiri S, Dawson DA, Shmuel A (2018) Performance of machine learning methods in diagnosing parkinson’s disease based on dysphonia measures. Biomed Eng Lett 8(1):29–39
- Lee S, Hussein R, McKeown MJ (2019) A deep convolutional-recurrent neural network architecture for parkinson’s disease eeg classification. In: 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1–4. IEEE
- Li A, Li C (2022) Detecting parkinson’s disease through gait measures using machine learning. Diagnostics 12(10):2404
- Liu S, Liu S, Cai W, Pujol S, Kikinis R, Feng D (2014) Early diagnosis of alzheimer’s disease with deep learning. In: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 1015–1018. IEEE
- Liu X, Li W, Liu Z, Du F, Zou Q (2021) A dual-branch model for diagnosis of parkinson’s disease based on the independent and joint features of the left and right gait. Applied Intelligence, 1–12
- Loh HW, Ooi CP, Palmer E, Barua PD, Dogan S, Tuncer T, Baygin M, Acharya UR (2021) Gaborpdnet: Gabor transformation and deep neural network for parkinson’s disease detection using eeg signals. Electronics 10(14):1740
- Lahmiri S, Shmuel A (2019) Detection of parkinson’s disease based on voice patterns ranking and optimized support vector machine. Biomed Signal Process Control 49:427–433
- Li Z, Yang J, Wang Y, Cai M, Liu X, Lu K (2022) Early diagnosis of parkinson’s disease using continuous convolution network: Handwriting recognition based on off-line hand drawing without template. J Biomed Inform 130:104085
- Li R, Zhang W, Suk H-I, Wang L, Li J, Shen D, Ji S (2014) Deep learning based imaging data completion for improved brain disease diagnosis. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part III 17, pp. 305–312. Springer
- Ma Y-W, Chen J-L, Chen Y-J, Lai Y-H (2023) Explainable deep learning architecture for early diagnosis of parkinson’s disease. Soft Comput 27(5):2729–2738
- Madruga M, Campos-Roca Y, Pérez CJ (2023) Addressing smartphone mismatch in parkinson’s disease detection aid systems based on speech. Biomed Signal Process Control 80:104281
- Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 151(4):264–269
- Makarious MB, Leonard HL, Vitale D, Iwaki H, Sargent L, Dadu A, Violich I, Hutchins E, Saffo D, Bandres-Ciga S et al (2022) Multi-modality machine learning predicting parkinson’s disease. npj Parkinson’s Disease 8(1):35
- Motin MA, Mahmud M, Brown DJ (2022) Detecting parkinson’s disease from electroencephalogram signals: an explainable machine learning approach. In: 2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–6. IEEE
- Mostafa SA, Mustapha A, Mohammed MA, Hamed RI, Arunkumar N, Abd Ghani MK, Jaber MM, Khaleefah SH (2019) Examining multiple feature evaluation and classification methods for improving the diagnosis of parkinson’s disease. Cogn Syst Res 54:90–99
- Mandal I, Sairam N (2014) New machine-learning algorithms for prediction of parkinson’s disease. Int J Syst Sci 45(3):647–666
- Nakach F-Z, Idri A, Goceri E (2024) A comprehensive investigation of multimodal deep learning fusion strategies for breast cancer classification. Artif Intell Rev 57(12):327
- Nguyen DMD, Miah M, Bilodeau G-A, Bouachir W (2022) Transformers for 1d signals in parkinson’s disease detection from gait. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 5089–5095. IEEE
- Nagasubramanian G, Sankayya M (2021) Multi-variate vocal data analysis for detection of parkinson disease using deep learning. Neural Comput Appl 33(10):4849–4864
- Nour M, Senturk U, Polat K (2023) Diagnosis and classification of parkinson’s disease using ensemble learning and 1d-pdcovnn. Comput Biol Med 161:107031
- Orozco-Arroyave JR, Arias-Londoño JD, Vargas-Bonilla JF, Gonzalez-Rátiva MC, Nöth E (2014) New spanish speech corpus database for the analysis of people suffering from parkinson’s disease. In: LREC, pp. 342–347
- Oliveira FP, Castelo-Branco M (2015) Computer-aided diagnosis of parkinson’s disease based on [123i] fp-cit spect binding potential images, using the voxels-as-features approach and support vector machines. J Neural Eng 12(2):026008
- Oh SL, Hagiwara Y, Raghavendra U, Yuvaraj R, Arunkumar N, Murugappan M, Acharya UR (2020) A deep learning approach for parkinson’s disease diagnosis from eeg signals. Neural Comput Appl 32:10927–10933
- Olanrewaju RF, Sahari NS, Musa AA, Hakiem N (2014) Application of neural networks in early detection and diagnosis of parkinson’s disease. In: 2014 International Conference on Cyber and IT Service Management (CITSM), pp. 78–82. IEEE
- Prasuhn J, Heldmann M, Münte TF, Brüggemann N (2020) A machine learning-based classification approach on parkinson’s disease diffusion tensor imaging datasets. Neurological Research and Practice 2:1–5
- Perumal K, Mahendran RK, Ahmad Khan A, Kadry S (2025) Tri-m2mt: Multi-modalities based effective acute bilirubin encephalopathy diagnosis through multi-transformer using neonatal magnetic resonance imaging. CAAI Transactions on Intelligence Technology
- Pereira CR, Pereira DR, Da Silva FA, Hook C, Weber SA, Pereira LA, Papa JP (2015) A step towards the automated diagnosis of parkinson’s disease: Analyzing handwriting movements. In: 2015 IEEE 28th International Symposium on Computer-based Medical Systems, pp. 171–176. IEEE
- Parisi L, RaviChandran N, Manaog ML (2018) Feature-driven machine learning to improve early diagnosis of parkinson’s disease. Expert Syst Appl 110:182–190
- Prashanth R, Roy SD, Mandal PK, Ghosh S (2014) Automatic classification and prediction models for early parkinson’s disease diagnosis from spect imaging. Expert Syst Appl 41(7):3333–3342
- Priyadharshini S, Ramkumar K, Vairavasundaram S, Narasimhan K, Venkatesh S, Amirtharajan R, Kotecha K (2024) A comprehensive framework for parkinson’s disease diagnosis using explainable artificial intelligence empowered machine learning techniques. Alex Eng J 107:568–582
- Peker M, Şen B, Delen D (2015) Computer-aided diagnosis of parkinson’s disease using complex-valued neural networks and mrmr feature selection algorithm. Journal of Healthcare Engineering 6(3):281–302
- Peng B, Wang S, Zhou Z, Liu Y, Tong B, Zhang T, Dai Y (2017) A multilevel-roi-features-based machine learning method for detection of morphometric biomarkers in parkinson’s disease. Neurosci Lett 651:88–94
- Quan C, Ren K, Luo Z (2021) A deep learning based method for parkinson’s disease detection using dynamic features of speech. IEEE Access 9:10239–10252
- Ribeiro LC, Afonso LC, Papa JP (2019) Bag of samplings for computer-assisted parkinson’s disease diagnosis based on recurrent neural networks. Comput Biol Med 115:103477
- Rustempasic I, Can M (2013) Diagnosis of parkinson’s disease using principal component analysis and boosting committee machines. Southeast Europe Journal of Soft Computing 2(1)
- Rehman RZU, Del Din S, Guan Y, Yarnall AJ, Shi JQ, Rochester L (2019) Selecting clinically relevant gait characteristics for classification of early parkinson’s disease: a comprehensive machine learning approach. Sci Rep 9(1):17269
- Rana A, Dumka A, Singh R, Rashid M, Ahmad N, Panda MK (2022) An efficient machine learning approach for diagnosing parkinson’s disease by utilizing voice features. Electronics 11(22):3782
- Rieke N, Hancox J, Li W, Milletari F, Roth HR, Albarqouni S, Bakas S, Galtier MN, Landman BA, Maier-Hein K et al (2020) The future of digital health with federated learning. npj Digital Medicine 3(1):119
- Rastogi D, Johri P, Donelli M, Kadry S, Khan AA, Espa G, Feraco P, Kim J (2025) Deep learning-integrated mri brain tumor analysis: feature extraction, segmentation, and survival prediction using replicator and volumetric networks. Sci Rep 15(1):1437
- Rana B, Juneja A, Saxena M, Gudwani S, Kumaran SS, Agrawal R, Behari M (2015) Regions-of-interest based automated diagnosis of parkinson’s disease using t1-weighted mri. Expert Syst Appl 42(9):4506–4516
- Razzak I, Kamran I, Naz S (2020) Deep analysis of handwritten notes for early diagnosis of neurological disorders. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–6. IEEE
- Rizvi DR, Nissar I, Masood S, Ahmed M, Ahmad F (2020) An lstm based deep learning model for voice-based detection of parkinson’s disease. Int. J. Adv. Sci. Technol 29(8)
- Ramani RG, Sivagami G (2011) Parkinson disease classification using data mining algorithms. International Journal of Computer Applications 32(9):17–22
- Sigcha L, Borzì L, Amato F, Rechichi I, Ramos-Romero C, Cárdenas A, Gascó L, Olmo G (2023) Deep learning and wearable sensors for the diagnosis and monitoring of parkinson’s disease: A systematic review. Expert Syst Appl 229:120541
- Salvatore C, Cerasa A, Castiglioni I, Gallivanone F, Augimeri A, Lopez M, Arabia G, Morelli M, Gilardi M, Quattrone A (2014) Machine learning on brain mri data for differential diagnosis of parkinson’s disease and progressive supranuclear palsy. J Neurosci Methods 222:230–237
- Senturk ZK (2020) Early diagnosis of parkinson’s disease using machine learning algorithms. Med Hypotheses 138:109603
- Sharma A, Giri RN (2014) Automatic recognition of parkinson’s disease via artificial neural network and support vector machine. International Journal of Innovative Technology and Exploring Engineering (IJITEE) 4(3):2278–3075
- Shaban M (2021) Automated screening of parkinson’s disease using deep learning based electroencephalography. In: 2021 10th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 158–161. IEEE
- Sharma NP, Junaid I, Ari S (2023) Early diagnosis of parkinson’s disease and severity assessment based on gait using 1d-cnn. In: 2023 2nd International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), pp. 1–6. IEEE
- Sakar CO, Kursun O (2010) Telediagnosis of parkinson’s disease using measurements of dysphonia. Journal of Medical Systems 34:591–599
- Shetty S, Rao Y (2016) Svm based machine learning approach to identify parkinson’s disease using gait analysis. In: 2016 International Conference on Inventive Computation Technologies (ICICT), vol. 2, pp. 1–5. IEEE
- Sivaranjini S, Sujatha C (2020) Deep learning based diagnosis of parkinson’s disease using convolutional neural network. Multimedia Tools and Applications 79(21):15467–15479
- Soumaya Z, Taoufiq BD, Benayad N, Yunus K, Abdelkrim A (2021) The detection of parkinson disease using the genetic algorithm and svm classifier. Appl Acoust 171:107528
- Tsai C-C, Chen Y-L, Lu C-S, Cheng J-S, Weng Y-H, Lin S-H, Wu Y-M, Wang J-J (2023) Diffusion tensor imaging for the differential diagnosis of parkinsonism by machine learning. Biomedical Journal 46(3):100541
- Taleb C, Khachab M, Mokbel C, Likforman-Sulem L (2019) Visual representation of online handwriting time series for deep learning parkinson’s disease detection. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 6, pp. 25–30. IEEE
- Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO (2012) Novel speech signal processing algorithms for high-accuracy classification of parkinson’s disease. IEEE Trans Biomed Eng 59(5):1264–1271
- Tahir NM, Manap HH (2012) Parkinson disease gait classification based on machine learning approach. J Appl Sci (Faisalabad) 12(2):180–185
- Trifonova OP, Maslov DL, Balashova EE, Urazgildeeva GR, Abaimov DA, Fedotova EY, Poleschuk VV, Illarioshkin SN, Lokhov PG (2020) Parkinson’s disease: available clinical and promising omics tests for diagnostics, disease risk assessment, and pharmacotherapy personalization. Diagnostics 10(5):339
- Talai AS, Sedlacik J, Boelmans K, Forkert ND (2021) Utility of multi-modal mri for differentiating of parkinson’s disease and progressive supranuclear palsy using machine learning. Front Neurol 12:648548
- Trabassi D, Serrao M, Varrecchia T, Ranavolo A, Coppola G, De Icco R, Tassorelli C, Castiglia SF (2022) Machine learning approach to support the detection of parkinson’s disease in imu-based gait analysis. Sensors 22(10):3700
- Vinora A, Ajitha E, Sivakarthi G, et al (2023) Detecting parkinson’s disease using machine learning. In: 2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF), pp. 1–6. IEEE
- Varghese J, Brenner A, Fujarski M, Alen CM, Plagwitz L, Warnecke T (2024) Machine learning in the parkinson’s disease smartwatch (pads) dataset. npj Parkinson’s Disease 10(1):9
- Varalakshmi P, Priya BT, Rithiga BA, Bhuvaneaswari R, Sundar RSJ (2022) Diagnosis of parkinson’s disease from hand drawing utilizing hybrid models. Parkinsonism & Related Disorders 105:24–31
- Vyas T, Yadav R, Solanki C, Darji R, Desai S, Tanwar S (2022) Deep learning-based scheme to diagnose parkinson’s disease. Expert Syst 39(3):12739
- Wahid F, Begg RK, Hass CJ, Halgamuge S, Ackland DC (2015) Classification of parkinson’s disease gait using spatial-temporal gait features. IEEE J Biomed Health Inform 19(6):1794–1802
- Wang X, Huang J, Chatzakou M, Medijainen K, Toomela A, Nõmm S, Ruzhansky M (2024) Lstm-cnn: An efficient diagnostic network for parkinson’s disease utilizing dynamic handwriting analysis. Comput Methods Programs Biomed 247:108066
- Wang X, Hao X, Yan J, Xu J, Hu D, Ji F, Zeng T, Wang F, Wang B, Fang J et al (2023) Urine biomarkers discovery by metabolomics and machine learning for parkinson’s disease diagnoses. Chin Chem Lett 34(10):108230
- Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S, PROBAST Group (2019) Probast: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 170(1):51–58
- Wroge TJ, Özkanca Y, Demiroglu C, Si D, Atkins DC, Ghomi RH (2018) Parkinson’s disease diagnosis using machine learning and voice. In: 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pp. 1–7. IEEE
- West C, Soltaninejad S, Cheng I (2019) Assessing the capability of deep-learning models in parkinson’s disease diagnosis. In: International Conference on Smart Multimedia, pp. 237–247. Springer
- Wang J, Xue L, Jiang J, Liu F, Wu P, Lu J, Zhang H, Bao W, Xu Q, Ju Z et al (2024) Diagnostic performance of artificial intelligence-assisted pet imaging for parkinson’s disease: A systematic review and meta-analysis. npj Digital Medicine 7(1):17
- Xia Y, Yao Z, Ye Q, Cheng N (2019) A dual-modal attention-enhanced deep learning network for quantification of parkinson’s disease characteristics. IEEE Trans Neural Syst Rehabil Eng 28(1):42–51
- Xu N, Zhou Y, Patel A, Zhang N, Liu Y (2023) Parkinson’s disease diagnosis beyond clinical features: a bio-marker using topological machine learning of resting-state functional magnetic resonance imaging. Neuroscience 509:43–50
- Yogev G, Giladi N, Peretz C, Springer S, Simon ES, Hausdorff JM (2005) Dual tasking, gait rhythmicity, and parkinson’s disease: which aspects of gait are attention demanding? Eur J Neurosci 22(5):1248–1256
- Ya Y, Ji L, Jia Y, Zou N, Jiang Z, Yin H, Mao C, Luo W, Wang E, Fan G (2022) Machine learning models for diagnosis of parkinson’s disease using multiple structural magnetic resonance imaging features. Frontiers in Aging Neuroscience 14:808520
- Yadav G, Kumar Y, Sahoo G (2012) Predication of parkinson’s disease using data mining methods: A comparative analysis of tree, statistical and support vector machine classifiers. In: 2012 National Conference on Computing and Communication Systems, pp. 1–8. IEEE
- Yi X, Walia E, Babyn P (2019) Generative adversarial network in medical imaging: A review. Med Image Anal 58:101552
- Zhao S, Dai G, Li J, Zhu X, Huang X, Li Y, Tan M, Wang L, Fang P, Chen X et al (2024) An interpretable model based on graph learning for diagnosis of parkinson’s disease with voice-related eeg. npj Digital Medicine 7(1):3
- Zhang J (2022) Mining imaging and clinical data with machine learning approaches for the diagnosis and early detection of parkinson’s disease. npj Parkinson’s Disease 8(1):13
- Zhang YC, Kagen AC (2017) Machine learning interface for medical image analysis. J Digit Imaging 30:615–621
- Zhao A, Li J (2023) A significantly enhanced neural network for handwriting assessment in parkinson’s disease detection. Multimedia Tools and Applications 82(25):38297–38317
- Zahid L, Maqsood M, Durrani MY, Bakhtyar M, Baber J, Jamal H, Mehmood I, Song O-Y (2020) A spectrogram-based deep feature assisted computer-aided diagnostic system for parkinson’s disease. IEEE Access 8:35482–35495
- Zhao H, Tsai C-C, Zhou M, Liu Y, Chen Y-L, Huang F, Lin Y-C, Wang J-J (2022) Deep learning based diagnosis of parkinson’s disease using diffusion magnetic resonance imaging. Brain Imaging Behav 16(4):1749–1760
- Zhang Y, Weng Y, Lund J (2022) Applications of explainable artificial intelligence in diagnosis and surgery. Diagnostics 12(2):237
- Zhang X, Yang Y, Wang H, Ning S, Wang H (2019) Deep neural networks with broad views for parkinson’s disease screening. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1018–1022. IEEE
Data Availability Statement
The datasets used in this study are publicly available:
- Voice dataset: https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings
- Gait dataset: https://physionet.org/content/gaitpdb/1.0.0/
- EEG dataset: https://openneuro.org/datasets/ds002778/versions/1.0.5
- Handwriting dataset: https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/
- MRI dataset: https://fcon_1000.projects.nitrc.org/indi/retro/parkinsons.html
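As an illustration of working with these resources, the sketch below parses a record in the style of the PhysioNet gait database. It is a minimal example only: it assumes the documented gaitpdb layout of whitespace-separated columns (time in seconds, 16 vertical ground-reaction-force sensor readings, then the total force under the left and right foot); the function names and the column index used here are our own choices, not part of any dataset API.

```python
def load_gait_record(text):
    """Parse a whitespace-separated gait record into rows of floats."""
    rows = []
    for line in text.strip().splitlines():
        rows.append([float(value) for value in line.split()])
    return rows

def mean_column(rows, idx):
    """Average one column across all samples, e.g. total left-foot force."""
    return sum(row[idx] for row in rows) / len(rows)

# Tiny synthetic record (3 samples, 19 columns) standing in for a
# downloaded gaitpdb file; real records are sampled at 100 Hz.
sample = "\n".join(
    " ".join(str(float(c)) for c in [t] + [0.0] * 16 + [400.0 + t, 390.0])
    for t in range(3)
)
rows = load_gait_record(sample)
print(len(rows), len(rows[0]))  # 3 19
print(mean_column(rows, 17))    # 401.0 (mean total left-foot force, N)
```

In practice one would read the downloaded `.txt` record with `open(...).read()` and feed it to `load_gait_record`; summary statistics like the one above are typical of the hand-crafted gait features used by the reviewed studies.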