2025 Aug 29;58(11):357. doi: 10.1007/s10462-025-11347-y

Applications of machine learning for computer-aided diagnosis of Parkinson’s disease: progress and benchmark case study

Juntao Zhang 1,#, Yiming Zhang 1,#, Ying Weng 1,2, Akram A Hosseini 3,5, Boding Wang 4, Tom Dening 5, Weinyu Fan 4, Weizhong Xiao 6
PMCID: PMC12397128  PMID: 40895221

Abstract

Machine learning (ML) has emerged as a vital tool for the diagnosis of Parkinson’s Disease (PD). This study presents a comprehensive review of the applications of ML for computer-aided diagnosis (CAD) of PD, covering articles published from 2010 to 2024. The risk of bias is assessed using the PROBAST checklist, and case studies are provided. This review includes 117 articles across six categories: neuroimaging data (20.5%); voice data (40.2%); handwriting data (12.0%); gait data (14.5%); EEG data (8.5%); and other data (4.3%). According to the PROBAST checklist, only 28 articles (23.9%) have a low risk of bias. A benchmark case study is conducted for five different data modalities. We also discuss current limitations and future directions of applying ML to the diagnosis of PD. This review narrows the gap between Artificial Intelligence (AI) and PD medical professionals and provides helpful information for future research.

Keywords: Parkinson’s disease (PD), Machine learning (ML), Deep learning (DL), Computer-aided diagnosis (CAD), Case study

Introduction

Parkinson’s disease (PD) is the second most common progressive neurodegenerative disorder after Alzheimer’s disease (AD) and is characterised by numerous motor and non-motor features (Jankovic 2008). Its incidence increases with age, especially beyond 60 years. PD is diagnosed based on the patient’s medical history and clinical criteria; there is no definitive laboratory test for PD diagnosis (Jankovic 2008). It is challenging for medical specialists to correctly differentiate PD from other pathologies when patients’ signs and symptoms overlap with other Parkinsonian syndromes (Trifonova et al. 2020). Hence, it is important to assess whether computer-aided diagnosis (CAD) can help medical specialists diagnose PD. Artificial Intelligence (AI) has proved helpful in healthcare, where it has been utilised for disease detection, diagnosis, treatment, and prognosis evaluation (Jiang et al. 2017). In the past 15 years, many AI methods have been applied to the CAD of PD. In particular, deep learning (DL) has become more attractive in the last decade than conventional machine learning (ML), as it can discover and learn more hidden patterns from healthcare data (LeCun et al. 2015). For example, ML- and DL-based methods have been applied as computer-assisted techniques for the diagnosis of brain diseases using neuroimaging data (Li et al. 2014), including the diagnosis of AD (Liu et al. 2014) and PD. Moreover, PD diagnosis using ML involves high data complexity due to the variety of data modalities, such as neuroimaging, gait, voice, and handwriting (Cano 2013). These datasets are often high-dimensional and may contain noise, making preprocessing and analysis more challenging (Khan et al. 2023, 2024; Perumal et al. 2025).
To comprehensively examine the research progress over the past 15 years and provide meaningful guidance on the application of ML in the medical domain, we conduct a systematic review of ML-based computer-aided diagnosis for PD. Unlike previous review papers (Zhang 2022; Sigcha et al. 2023; Wang et al. 2024), which either focused on a limited number of data modalities or lacked practical benchmarking efforts, our work introduces case studies that directly address these gaps. To ensure methodological rigor, we select five of the most commonly used data modalities and use a public dataset. Additionally, to support transparency and reproducibility, we have released the code implementations of all benchmark case studies via GitHub: https://github.com/yiming95/PD_ML_benchmark.

Search strategy

We perform this systematic review of the literature on PD diagnosis using ML techniques following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al. 2009). Four electronic databases, (1) IEEE Xplore, (2) Association for Computing Machinery (ACM), (3) Springer, and (4) Science Direct, are searched for relevant publications from 2010 to 2024. Google Scholar and PubMed are also searched over the same period for potentially relevant studies. We use several keywords as search queries, including “Parkinson’s disease”, “PD”, “Diagnosis”, “Diagnostics”, “Computer-Aided Diagnosis”, “Deep learning”, “Machine learning”, and “Artificial Intelligence”. The PRISMA flowchart is shown in Fig. 1.
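As an illustration, the keyword list above can be combined into a single boolean query string. This is a sketch only: each database has its own advanced-search syntax, and these are not the authors’ exact search strings.

```python
# Illustrative boolean query builder for the review's search keywords.
# Real database syntax (IEEE Xplore, ACM, Springer, Science Direct) differs.
disease_terms = ['"Parkinson\'s disease"', '"PD"']
task_terms = ['"Diagnosis"', '"Diagnostics"', '"Computer-Aided Diagnosis"']
method_terms = ['"Deep learning"', '"Machine learning"', '"Artificial Intelligence"']

def build_query() -> str:
    """Join each keyword group with OR, then combine groups with AND."""
    groups = (disease_terms, task_terms, method_terms)
    return " AND ".join("(" + " OR ".join(g) + ")" for g in groups)

query = build_query()
print(query)
```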

Fig. 1.

Fig. 1

PRISMA flow chart. The study selection process shows the number of records identified, screened, assessed, and included in this systematic review

The review aims to identify publications on PD diagnosis using ML; all included articles focus on this topic, and only publications in English are included. Publications focusing on the treatment or prognostic evaluation of PD, or those only using image-analysis or signal-analysis methods, are excluded, as are review papers and non-peer-reviewed papers. Publications are first screened for eligibility by Title and Abstract; potentially eligible studies are then assessed in full text. We then analyze and extract data from the screened articles. Data extracted from the full-text articles include (1) Author, (2) Published year, (3) Objective, (4) Data modality, (5) Dataset, (6) Number of subjects, (7) ML algorithms applied, (8) Validation, and (9) Evaluation metrics. The results section analyzes five different data modalities from the main public datasets used by ML researchers: neuroimaging data, voice data, handwriting data, gait data, and electroencephalogram (EEG) data. A meta-analysis is not performed due to the heterogeneity of the included studies.
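The nine extraction fields can be pictured as a typed record per screened article. The following sketch is our illustration, not the authors’ actual tooling; the example values are taken from the first row of Table 1 (Prashanth et al. 2014).

```python
from dataclasses import dataclass, field

# Hypothetical record mirroring the nine data-extraction fields listed above.
@dataclass
class ExtractedStudy:
    author: str
    year: int
    objective: str             # e.g. "Classification (PD vs. HC)"
    data_modality: str         # neuroimaging, voice, handwriting, gait, EEG, other
    dataset: str
    n_subjects: int
    ml_algorithms: list[str]
    validation: str            # e.g. "10-fold cross-validation"
    metrics: dict[str, float] = field(default_factory=dict)

# Populated with the first row of Table 1.
record = ExtractedStudy(
    author="Prashanth et al.", year=2014,
    objective="Classification (PD vs. HC)",
    data_modality="Neuroimaging: DaTSCAN SPECT",
    dataset="PPMI", n_subjects=548, ml_algorithms=["RBF-SVM"],
    validation="10-fold cross-validation",
    metrics={"Accuracy": 96.14, "Sensitivity": 96.55, "Specificity": 95.03},
)
```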

Contributions

This interdisciplinary systematic review quantifies and analyzes the last 15 years’ publications on the diagnosis of PD using ML techniques. By conducting a benchmark case study on five commonly used modalities (MRI, gait, voice, EEG, and handwriting), we identify two recurring issues in this field: reported results are hard to reproduce, and interpretability analysis is often lacking. Furthermore, this systematic review summarizes the current trends in how ML techniques are applied to the early diagnosis of PD. It also identifies the current limitations and challenges of applying ML to the diagnosis of PD and proposes a few promising future directions. Compared to previous works, this article encompasses the broadest range of literature, from 2010 to 2024, and includes the largest number of modalities. Additionally, no prior work has conducted a detailed case-study experiment to test the reproducibility of results across multiple modalities. The contributions of this paper can be summarized as follows:

  • We conduct a systematic review on ML-based CAD for PD applications published from 2010 to 2024. Specifically, we analyze the data modalities, dataset, ML algorithm, and model performance for each study.

  • We conduct a comprehensive case study on five data modalities.

  • The paper also discusses the current limitations and future directions of applying ML in PD diagnosis.

The rest of the paper is organized as follows. Section 2 summarizes the ML-based PD diagnosis applications and introduces the datasets and evaluation metrics. Section 3 shows the results of the risk of bias assessment. Section 4 shows the details of the case study. Section 5 provides the discussion, including a summary of the findings, current challenges, and future research directions. Section 6 summarizes the paper.

Applications of ML-based PD diagnosis

Based on our search and study selection process, we first identified 12,424 articles from IEEE Xplore, ACM, Springer, and Science Direct. Additional articles were also included from Google Scholar and PubMed. After removing duplicates, 8,908 articles were screened for eligibility. After screening each article’s Title and Abstract, we excluded 8,407 articles, leaving 501 articles for full-text examination. Finally, we included 117 articles for data extraction. A general pipeline for PD diagnosis using ML is shown in Fig. 2. Table 1 shows all the included studies.
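The selection counts above follow simple PRISMA arithmetic and can be sanity-checked directly:

```python
# Sanity check of the PRISMA flow counts reported in this section.
identified = 12424            # IEEE Xplore + ACM + Springer + Science Direct
screened = 8908               # after duplicate removal (plus Google Scholar/PubMed additions)
excluded_on_abstract = 8407   # excluded by Title/Abstract screening
included = 117                # retained after full-text examination

full_text = screened - excluded_on_abstract   # articles assessed in full text
assert full_text == 501
assert included <= full_text <= screened <= identified
```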

Fig. 2.

Fig. 2

Pipeline for the general ML-based computer-assisted PD diagnosis
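A minimal sketch of this generic pipeline (preprocessing, classifier, k-fold evaluation), using synthetic features in place of real extracted voice, gait, or neuroimaging features, and the SVM with 10-fold cross-validation combination that recurs throughout Table 1. The dataset shape and hyperparameters here are illustrative assumptions, not any study’s actual setup.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 195 samples x 22 features, binary labels
# (1 = PD, 0 = healthy control). Real studies use extracted features.
rng = np.random.default_rng(0)
X = rng.normal(size=(195, 22))
y = rng.integers(0, 2, size=195)

# Standardize features, then an RBF-kernel SVM, as in several Table 1 studies.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With random labels the accuracy hovers around chance; the point is the pipeline shape, not the score.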

Table 1.

Summary of studies on PD classification using different modalities

Author Year Objective Data modality Dataset Subjects ML Algorithm Validation Evaluation metrics
Neuroimaging
Prashanth et al. (2014) 2014 Classification (PD vs. HC) Neuroimaging: DaTSCAN SPECT PPMI 548 subjects: 369 PD + 179 HC RBF-SVM 10-fold cross-validation Accuracy: 96.14%, Sensitivity: 96.55%, Specificity: 95.03%
Salvatore et al. (2014) 2014 Classification (PD vs. HC) Neuroimaging: MRI Collected from participants 84 subjects: 56 PD + 28 HC SVM Leave-One-Out (LOO) validation Accuracy: 92.2%, Sensitivity: 94.4%, Specificity: 91.3%
Rana et al. (2015) 2015 Classification (PD vs. HC) Neuroimaging: MRI Collected from participants 60 subjects: 30 PD + 30 HC SVM leave-one-out cross-validation (LOOCV) Accuracy: 86.67%, Sensitivity: 90.00%, Specificity: 83.33%
Oliveira and Castelo–Branco (2015) 2016 Classification (PD vs. HC) Neuroimaging: FP-CIT SPECT PPMI 654 subjects: 445 PD + 209 HC SVM LOOCV Accuracy: 97.68%, Sensitivity: 97.75%, Specificity: 98.09%
Zhang and Kagen (2017) 2017 Classification (PD vs. HC) Neuroimaging: DaTSCAN SPECT PPMI Not specified ANN 10-fold cross-validation Accuracy: 93.8%, Sensitivity: 97.4%, Specificity: 82.2%
Peng et al. (2017) 2017 Classification (PD vs. HC) Neuroimaging: MRI PPMI 172 subjects: 69 PD + 103 HC RBF-SVM 10-fold cross-validation Accuracy: 85.8%, Sensitivity: 87.6%, Specificity: 87.8%
Sivaranjini and Sujatha (2020) 2020 Classification (PD vs. HC) Neuroimaging: MRI PPMI 182 subjects: 100 PD + 82 HC AlexNet Train and Test split (80–20%) Accuracy: 88.90%, Sensitivity: 89.30%, Specificity: 88.40%
West et al. (2019) 2019 Classification (PD vs. HC) Neuroimaging: MRI PPMI 445 subjects: 299 PD + 146 HC 3D CNN Not specified Accuracy: 75%, Sensitivity: 76%, Specificity: 74%, Precision: 74%
Dai et al. (2019) 2019 Classification (PD vs. HC) Neuroimaging: PET PPMI, ANDI, HCP Not specified U-Net 10-fold cross-validation Accuracy (U-Net): 84.17%, Accuracy (CNN): 76.19%
Zhang et al. (2019) 2019 Classification (Prodromal PD vs. Confirmed PD vs. HC) Neuroimaging: MRI PPMI 578 subjects: 49 Prodromal PD + 366 Confirmed PD + 163 HC Deep neural network with Broad Views (DBV) Train and Test split (80–20%) Accuracy: 76.27%
Chakraborty et al. (2020) 2020 Classification (PD vs. HC) Neuroimaging: MRI PPMI 406 subjects: 203 PD + 203 HC 3D CNN 5-fold cross-validation Accuracy: 95.29%, F1 score: 93.6%, Specificity: 94.3%, Precision: 92.7%, Recall: 94.3%, ROC-AUC: 98%
Kaur et al. (2021) 2021 Classification (PD vs. HC) Neuroimaging: MRI PPMI Not specified AlexNet Train, Validation and Test split (60–20–20%) Accuracy: 89.23%, Sensitivity: 90.27%, Specificity: 89.03%, ROC-AUC: 97.23%
Vyas et al. (2022) 2022 Classification (PD vs. HC) Neuroimaging: MRI PPMI 318 subjects: 236 PD + 82 HC 3D CNN Train and Test split validation (70–30%) 3D CNN Accuracy: 88.9% 3D CNN AUC: 86.0%
Ya et al. (2022) 2022 Classification (PD vs. NC) Neuroimaging: MRI Collected from participants, PPMI Collected from participants: 116 subjects: 60 PD + 56 NC; PPMI: 140 subjects: 69 PD + 71 NC Regression models 5-fold cross-validation Cerebellar model AUC: 64.6%, Subcortical model AUC: 63.2%, Cortical model AUC: 69.0%, Combined model AUC: 75.6%
Erdaş and Sümer (2022) 2022 Classification (PD vs. NC) Neuroimaging: MRI Combined from multiple datasets (Badea et al. 2017) 83 subjects: 47 PD + 36 NC 2D CNN 10-fold cross-validation Accuracy: 90.36%, ROC-AUC: 90.51%, F1 score: 90.25%, Sensitivity: 90.52%, Precision: 90.08%
Huang et al. (2023) 2023 Classification (PD vs. HC) Neuroimaging: MRI PPMI 194 subjects: 97 PD + 97 HC multi-task node cluster based graph structure learning framework (MNC-Net) 10-fold cross-validation Accuracy: 95.5%, F1 score: 95.49%, Precision: 97.00%, Recall: 94.42%
Xu et al. (2023) 2023 Classification (PD vs. HC) Neuroimaging: MRI PPMI 117 subjects: 84 PD + 34 HC DNN 5-fold cross-validation Accuracy: 96.4%
Camacho et al. (2023) 2023 Classification (PD vs. HC) Neuroimaging: MRI PPMI 2041 subjects: 1024 PD + 1017 HC CNN with Log-Jacobian model Train, Validation, and Test split (85–5–10%) Accuracy: 79.3%, Precision: 80.2%, Specificity: 81.3%, Sensitivity:77.7%
Priyadharshini et al. (2024) 2024 Classification (PD vs. HC) Neuroimaging: 3D MRI PPMI 500 subjects: 180 PD + 160 prodromal PD + 160 HC Gradient Boosting (GB), with SHAP, LIME, SHAPASH for XAI 5-fold cross-validation Accuracy: 96.8% Precision: 97% Recall: 94.2% Specificity: 96.6% F1 score: 94.6%
Talai et al. (2021) 2021 Classification (PD vs. PSP vs. HC) Neuroimaging: T1, T2, DTI MRI PPMI 103 subjects: 45 PD + 20 PSP-RS + 38 HC SVM+MLP LOOCV Accuracy: 95.1%
Prasuhn et al. (2020) 2020 Classification (PD vs. HC) Neuroimaging: Diffusion Tensor Imaging (DTI) PPMI 232 subjects: 162 PD + 70 HC SVM (bSVM) 10-fold cross-validation Balanced Accuracy: 58.1% ROC-AUC: 52.0% Sensitivity: 56% Specificity:41%
Chen et al. (2023) 2023 Classification (PD-MCI vs. PD-NC) Neuroimaging: DTI (FA, MD, AD, RD, LDH) Collected from participants 117 subjects: 52 PD-NC + 68 PD-MCI XGBoost 10-fold cross-validation Accuracy: 91.67%, Sensitivity: 92.86%, Specificity: 90.00%, AUC: 94.00%
Tsai et al. (2023) 2023 Classification (PD vs. PSP vs. MSA vs. HC) Neuroimaging: DTI (whole-brain features) Collected from participants 625 subjects: 286 PD + 69 PSP + 51 MSA + 219 HC SVM, Discriminant Function Analysis 5-fold cross-validation Accuracy: 83.0%, Sensitivity: 84.8%, Specificity: 78.3%, F1 Score: 86.7%
Zhao et al. (2022) 2022 Classification (PD vs. HC) Neuroimaging: DTI (Fractional Anisotropy, MD) Collected from participants 532 subjects: 305 PD + 227 HC 3D CNN 10-fold cross-validation, independent test set AUC: 94.1%
Voice
Sakar and Kursun (2010) 2010 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC SVM LOOCV Accuracy: 81.53% (LOO validation), Accuracy: 92.75% (bootstrap resampling validation)
Bhattacharya and Bhatia (2010) 2010 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Linear-SVM Cross-validation Accuracy: 65.22%
Guo et al. (2010) 2010 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Minimum distance classifier (MDC) 10-fold cross-validation Accuracy: 93.12%
Åström and Koker (2011) 2011 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Parallel network system (9 FNN) train and test split (60–40%) Accuracy: 91.2% ± 1.6%
Ramani and Sivagami (2011) 2011 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Fisher Filter + RF Not specified Accuracy: 100%
Yadav et al. (2012) 2012 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC SVM 10-fold cross-validation Accuracy: 76%, Sensitivity: 97%, Specificity: 13%
Tsanas et al. (2012) 2012 Classification (PD vs. HC) Voice dataset NCVS 43 subjects: 33 PD + 10 HC RELIEF + SVM 10-fold cross-validation Accuracy: 98.6%
Mandal and Sairam (2014) 2014 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC LR 10-fold cross-validation Accuracy: 100%, Sensitivity: 98.3%, Specificity: 99.6%
Hazan et al. (2012) 2012 Classification (PD vs. HC) Voice dataset Collected from participants American Dataset: 52 subjects: 38 PD + 14 HC German Dataset: 98 subjects: 68 PD + 30 HC SVM Cross-validation American Accuracy: 96%, German Accuracy: 85%
Gharehchopogh and Mohammadi (2013) 2013 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC MLP train and test split (70–30%) Accuracy: 93.22%
Rustempasic and Can (2013) 2013 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC MLP Not specified Accuracy: 81.33%
Sharma and Giri (2014) 2014 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC RBF-SVM Train and Test split (80–20%) Accuracy: 85.29%, Sensitivity: 100%, Specificity: 37.5%
Olanrewaju et al. (2014) 2014 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC MLFFN + K-Means Train and Test split (50–50%) Accuracy: 80%, Sensitivity: 63.6%, Specificity: 83.3%
Peker et al. (2015) 2015 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC CVANN 10-fold cross-validation Accuracy: 98.12%, Sensitivity: 99.24%, Specificity: 98.96%
Gök (2015) 2015 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Linear SVM + KNN 10-fold cross-validation Accuracy: 98.46%
Chen et al. (2016) 2016 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC mRMR – KELM 10-fold cross-validation Accuracy: 95.97%
Avci and Dogantekin (2016) 2016 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC GA-WK-ELM 3-fold cross-validation Highest Accuracy: 96.81%
Dinesh and He (2017) 2017 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Boosted Decision Tree 10-fold cross-validation Highest Accuracy: 95%
Caliskan et al. (2017) 2017 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC DNN 10-fold cross-validation Accuracy: 86.095%, Sensitivity: 58.27%, Specificity: 95.387%
Parisi et al. (2018) 2018 Classification (PD vs. HC) Voice dataset UCI Machine Learning repository 40 subjects: 20 PD + 20 HC MLP-LSVM 20-fold cross-validation Accuracy: 100%, Sensitivity: 100%, Specificity: 100%
Wroge et al. (2018) 2018 Classification (PD vs. HC) Voice dataset mPower dataset N/A SVM 10-fold cross-validation Accuracy: 85%, Precision: 84%, Recall: 71%
Lahmiri et al. (2018) 2018 Classification (PD vs. HC) Voice dataset Private dataset 195 subjects: 147 PD + 48 HC SVM 10-fold cross-validation Accuracy: 92%, Sensitivity: 95%, Specificity: 91%
Haq et al. (2018) 2018 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC DNN Train and Test split (70–30%) Accuracy: 98%, Sensitivity: 95%, Specificity: 99%
Ali et al. (2019) 2019 Classification (PD vs. HC) Voice dataset UCI Machine Learning repository 40 subjects: 20 PD + 20 HC LDA-NN-GA leave-one-subject-out (LOSO) validation Training Accuracy: 80%, Testing Accuracy: 82.14%
Mostafa et al. (2019) 2019 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC RF 10-fold cross-validation Accuracy: 99.49%, Precision: 95.5%, Recall: 95.5%
Lahmiri and Shmuel (2019) 2019 Classification (PD vs. HC) Voice dataset Private dataset 43 subjects: 33 PD + 10 HC Wilcoxon statistic + SVM 10-fold cross-validation Accuracy: 92.21%, Sensitivity: 99.63%, Specificity: 82.79%
Haq et al. (2019) 2019 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC L1-norm SVM feature selection + SVM 10-fold cross-validation Accuracy: 99%, Sensitivity: 100%, Specificity: 99%
Senturk (2020) 2020 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC SVM Not specified Accuracy: 93.84%
Karan et al. (2020) 2020 Classification (PD vs. HC) Voice dataset UCI Machine Learning repository + PC-GITA UCI: 45 subjects: 25 PD + 20 HC PC-GITA: 45 subjects: 25 PD + 20 HC SVM 10-fold cross-validation UCI accuracy: 100%, PC-GITA accuracy: 96%
Soumaya et al. (2021) 2021 Classification (PD vs. HC) Voice dataset Collected from participants 34 subjects: 20 PD + 14 HC GA + SVM 10-fold cross-validation Best accuracy: 91.18%
Karaman et al. (2021) 2021 Classification (PD vs. HC) Voice dataset mPower dataset N/A subjects DenseNet-161 Not specified Accuracy: 89.75% Specificity: 91.50% Sensitivity: 88.40%
Quan et al. (2021) 2021 Classification (PD vs. HC) Voice dataset Collected from participants 45 subjects: 30 PD + 15 HC Bidirectional LSTM + CNN 10-fold cross-validation Accuracy: 75.56%, F-score: 80.70%, Specificity: 76.67%, Sensitivity: 85.19%, MCC: 0.4811
Zahid et al. (2020) 2020 Classification (PD vs. HC) Voice dataset PC-GITA dataset 100 subjects: 50 PD + 50 HC AlexNet 5-fold cross-validation Accuracy (RF): 99%, Accuracy (MLP): 99.7%
Rizvi et al. (2020) 2020 Classification (PD vs. HC) Voice dataset PSD dataset 40 subjects: 20 PD + 20 HC LSTM + DNN Not specified Accuracy: 99.03%, Sensitivity: 99%, Specificity: 99%, Precision: 99%
Abayomi-Alli et al. (2020) 2020 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC Bidirectional LSTM 5-fold cross-validation Accuracy: 82.86%
Gunduz (2019) 2019 Classification (PD vs. HC) Voice dataset UCI Machine Learning repository 252 subjects: 188 PD + 64 HC 2D CNN Leave-one-person-out cross-validation Accuracy (Triple feature sets): 83.3% F-score (Triple feature sets): 89.4% MCC (Triple feature sets): 0.521
Nagasubramanian and Sankayya (2021) 2021 Classification (PD vs. HC) Voice dataset Parkinson telemonitoring dataset + multi-variate sound record dataset 102 subjects: 82 PD + 20 HC DWVDA Not specified Accuracy (ADNN): 98.96% Specificity (ADNN): 98.82% Recall (ADNN): 98.89% Precision (ADNN): 98.90% MAE(ADNN): 1.04
Fang et al. (2020) 2020 Classification (PD vs. HC) Voice dataset Collected from participants 68 subjects: 34 PD + 34 HC CNN + LSTM LOSO validation ACC (Talking): 94.0% ACC (DDK): 83.5% ACC (Reading): 91.1%
Ali et al. (2023) 2023 Classification (PD vs. HC) Voice dataset Combined from two public datasets 228 subjects: 108 PD + 120 HC Ensemble learning-based framework LOSO Accuracy: 100%
Hireš et al. (2022) 2022 Classification (PD vs. HC) Voice dataset PC-GITA dataset 100 subjects: 50 PD + 50 HC 2D CNN 10-fold cross-validation Accuracy: 99%, AUC: 99.6%, Sensitivity: 86.2%, Specificity: 93.3%
Rana et al. (2022) 2022 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 195 subjects: 147 PD + 48 HC ANN train and test split (80–20%) Accuracy (SVM): 87.2%, Accuracy (NB): 74.1%, Accuracy (ANN): 96.7%, Accuracy (KNN): 87.2%
Madruga et al. (2023) 2023 Classification (PD vs. HC) Voice dataset Collected from participants 60 subjects: 30 PD + 30 HC Passive aggressive classifier Cross-validation Accuracy (position 1): 70.1%, Accuracy (position 2): 71.8%, Accuracy (position 3): 72.9%, Accuracy (position 4): 73.1%
Govindu and Palwe (2023) 2023 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC RF train and test split (75–25%) Accuracy: 91.8%, Precision: 95.0%, Recall: 86.0%
Celik and Başaran (2023) 2023 Classification (PD vs. HC) Voice dataset PD Dataset and PDO Dataset PD Dataset: 252 subjects (188 PD + 64 HC) PDO Dataset: 31 subjects (23 PD+ 8 HC) SkipCon Net + RF Not specified Accuracy: 99.1%, Precision: 99.0%, Recall: 99.0%, Specificity: 98%, Specificity:98.77%
Khaskhoussy and Ayed (2023) 2023 Classification (PD vs. HC) Voice dataset UCI Machine Learning repository 40 subjects: 20 PD + 20 HC Polynomial kernel SVM 5-fold cross-validation Accuracy: 97.6%, Precision: 94%, Sensitivity: 96%, Specificity: 93%, F-Score: 94%
Dheer et al. (2023) 2023 Classification (PD vs. HC) Voice dataset Oxford Parkinson’s Disease dataset 31 subjects: 23 PD + 8 HC KNN train and test split (75–25%) Accuracy: 95.9%
Akila and Nayahi (2024) 2024 Classification (PD vs. HC) Voice dataset UCI Machine Learning repository 252 subjects: 188 PD + 64 HC MASS-PCNN (Multi-agent Salp Swarm Algorithm) 5-fold cross-validation Accuracy: 95.1%, Precision: 97.8%, Recall: 94.7%, F1 score: 99.1%
Handwriting
Drotár et al. (2014) 2014 Classification (PD vs. HC) Handwriting dataset Collected from participants 75 subjects: 37 PD + 38 HC SVM 10-fold cross-validation Accuracy: 95.29%, F1 score: 93.6%, Specificity: 94.3%, Precision: 92.7%, Recall: 94.3%, ROC-AUC: 98%
Drotár et al. (2015) 2015 Classification (PD vs. HC) Handwriting dataset Collected from participants 75 subjects: 37 PD + 38 HC RBF-SVM 10-fold cross-validation Accuracy: 88.1%
Pereira et al. (2015) 2015 Classification (PD vs. HC) Handwriting dataset Collected from participants 55 subjects: 37 PD + 18 HC NB 10-fold cross-validation Accuracy: 88.13%, Sensitivity: 89.74%, Specificity: 91.89%
Ribeiro et al. (2019) 2019 Classification (PD vs. HC) Handwriting dataset HandPD dataset 35 subjects: 14 PD + 21 HC Gated Recurrent Units + Attention train and test split (75–25%) Accuracy: 78.9%
Razzak et al. (2020) 2020 Classification (PD vs. HC) Handwriting dataset PaHaW, NewHandPD dataset, Parkinson’s Drawing Dataset 233 subjects: 142 PD + 91 HC 2D CNN (AlexNet, GoogleNet, VGGNet, ResNet) 10-fold cross-validation Accuracy: 89.48%
Kamran et al. (2021) 2021 Classification (PD vs. HC) Handwriting dataset HandPD, NewHandPD, PaHaw, Parkinson’s Drawing Dataset 233 subjects: 142 PD + 91 HC Parkinson’s Drawing Dataset: NA 2D CNN 5-fold cross-validation Accuracy: 98.04%
Gil-Martín et al. (2019) 2019 Classification (PD vs. HC) Handwriting dataset Spiral Drawing dataset 77 subjects: 62 PD + 15 HC 2D CNN subject-wise 5-fold cross-validation Accuracy: 96.5%, F1 score: 97.7%, AUC: 99.2%
Diaz et al. (2021) 2021 Classification (PD vs. HC) Handwriting dataset PaHaW, NewHandPD dataset 75 subjects: 37 PD + 38 HC BiGRUs + CNN 10-fold cross-validation Accuracy: 94.44%, AUC: 98.25%, Specificity: 98.0%, Sensitivity: 90.0%
Taleb et al. (2019) 2019 Classification (PD vs. HC) Handwriting dataset PDMulti MC dataset 42 subjects: 21 PD + 21 HC CNN + CNN-BLSTM 3-fold cross-validation Accuracy: 83.33%, Sensitivity: 71.43%, Specificity: 95.24%
Varalakshmi et al. (2022) 2022 Classification (PD vs. HC) Handwriting dataset Kaggle spiral data 51 subjects: 50 healthy + 1 PD A hybrid of RESNET-50 and SVM train and test split (70–30%) Accuracy: 98.45%, Sensitivity: 99%, Specificity: 99%
Li et al. (2022) 2022 Classification (PD vs. HC) Handwriting dataset Collected from participants 86 subjects: 43 PD + 43 HC CNN (CC-Net) Cross-validation Accuracy: 89.3%, Precision: 99.2%, Recall: 93.1%, F1 Score: 92.5%, Matthews correlation coefficient (MCC): 73.3%
Zhao and Li (2023) 2023 Classification (PD vs. HC) Handwriting dataset NewHandPD 66 subjects: 31 PD + 35 HC CNN and bidirectional gated recurrent unit (BiGRU) train and test split (80–20%) Accuracy (meander): 92.91%, Accuracy (circle): 85.71%, Accuracy (spiral): 90.55%
Abdullah et al. (2023) 2023 Classification (PD vs. HC) Handwriting dataset NewHandPD 66 subjects: 31 PD + 35 HC ResNet50 + VGG19 + Inception V3 + kNN train and test split (80–20%) Accuracy: 95.29%, AUC: 90%, Recall: 86%, Precision: 99%
Wang et al. (2024) 2024 Classification (PD vs. HC) Handwriting dataset DraWritePD, PaHaW datasets 75 subjects: 37 PD + 38 HC LSTM-CNN 5-fold cross-validation Accuracy: 96.2%, Sensitivity: 94.5%, Specificity: 97.3%, PaHaW Accuracy: 90.7%
Gait
Tahir and Manap (2012) 2012 Classification (PD vs. HC) Gait dataset Collected from participants 32 subjects: 12 PD + 20 HC SVM 10-fold cross-validation Accuracy: 100%, Sensitivity: 100%, Specificity: 100%
Wahid et al. (2015) 2015 Classification (PD vs. HC) Gait dataset Collected from participants 49 subjects: 23 PD + 26 HC RF 10-fold cross-validation Accuracy: 92.6%
Shetty and Rao (2016) 2016 Classification (PD vs. HD vs. ALS) Gait dataset Physionet dataset 48 subjects: 15 PD + 20 HD + 13 ALS SVM train and test split (50–50%) Accuracy: 83.33%, Sensitivity: 85.71%, Specificity: 75%
Abdulhay et al. (2018) 2018 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC Medium Gaussian SVM Not specified Accuracy: 94.8%
Rehman et al. (2019) 2019 Classification (PD vs. HC) Gait dataset Collected from participants 303 subjects: 119 PD + 184 HC RF 10-fold cross-validation Accuracy: 97%, Sensitivity: 100%, Specificity: 94%
Balaji et al. (2021) 2021 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC LSTM train and test split (80–20%) Accuracy: 98.6%
Xia et al. (2019) 2019 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC CNN, Attention-enhanced LSTM 5-fold cross-validation Accuracy: 99.07% Sensitivity: 99.10% Specificity: 99.01%
El Maachi et al. (2020) 2020 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC DNN 10-fold cross-validation Accuracy: 98.7%
Aversano et al. (2020) 2020 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC DNN 10-fold cross-validation Accuracy: 99.37%
Liu et al. (2021) 2021 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC CNN with Bi-LSTM Train and Test split (70–30%) Accuracy: 99.22%, Sensitivity: 100%, Specificity: 98.04%
Nguyen et al. (2022) 2022 Classification (PD vs. HC) Gait dataset Physionet 166 subjects: 93 PD + 73 HC Transformer 10-fold cross-validation Accuracy: 95.2%, Sensitivity: 98.1%, Specificity: 86.8%
Trabassi et al. (2022) 2022 Classification (PD vs. HC) Gait dataset Collected from participants 161 subjects: 81 PD + 80 HC SVM 10-fold cross-validation, Train and Test split (80–20%) Accuracy: 81% AUC: 80% F1 score:80% Precision: 80% Recall: 80%
Li and Li (2022) 2022 Classification (PD vs. HC) Gait dataset Two public datasets 306 subjects: 214 PD + 92 HC SVM Train and Test split (80–20%) Accuracy: 68% False positive rate: 98% Precision: 69% Recall: 98%
Aşuroğlu and Oğul (2022) 2022 Classification (PD vs. HC), Regression (UPDRS value) Gait dataset Physionet 166 subjects: 93 PD + 73 HC CNN + RF 10-fold cross-validation Accuracy: 99.5% Sensitivity: 98.7% Specificity: 99.1% Correlation Coefficient: 0.897 Mean Absolute Error: 3.009 Root Mean Square Error: 4.556
Ma et al. (2023) 2023 Classification (PD vs. HC) Gait dataset Physionet 166 subjects: 93 PD + 73 HC CNN+XGBoost Train and Test split Accuracy: 98.4%
Vinora et al. (2023) 2023 Classification (PD vs. HC) Gait dataset UCI Machine Learning repository 85 subjects: 70 PD + 15 HC SVM Not specified Recall: 100%, Precision: 50%, F1 score: 67%
Sharma et al. (2023) 2023 Classification (PD vs. HC) Gait dataset Physionet dataset 166 subjects: 93 PD + 73 HC CNN+SVM 10-fold cross-validation Accuracy: 95.2%
EEG
Lee et al. (2019) 2019 Classification (PD vs. HC) EEG Collected from participants 406 subjects: 203 PD + 203 HC 3D CNN Train and Test split (80–20%) Accuracy: 95.29% F1 score: 93.6% Specificity: 94.3% Precision: 92.7% Recall: 94.3% ROC-AUC: 98%
Oh et al. (2020) 2020 Classification (PD vs. HC) EEG Collected from participants 41 subjects: 20 PD + 21 HC CNN + LSTM 10-fold cross-validation Accuracy: 96.9%, Recall: 93.4%, Precision: 100%
Anjum et al. (2020) 2020 Classification (PD vs. HC) EEG Collected from participants Participants from New Mexico 54 subjects: 27 PD + 27 HC Participants from Iowa 28 subjects: 14 PD + 14 HC Linear predictive coding 10-fold cross-validation Accuracy: 85.3%, AUC: 93.3%, Sensitivity: 87.9%, Specificity: 82.7%
Shaban (2021) 2021 Classification (PD vs. HC) EEG UC San Diego Public Dataset 31 subjects: 16 PD + 15 HC ANN Train and Test split (80–20%) Accuracy: 98%, Sensitivity: 97%, Specificity: 100%
Loh et al. (2021) 2021 Classification (PD vs. HC) EEG UC San Diego Public Dataset 31 subjects: 16 PD + 15 HC 2D-CNN 10-fold cross-validation Accuracy: 99.46%
Motin et al. (2022) 2022 Classification (PD vs. HC) EEG UC San Diego Public Dataset 31 subjects: 16 PD + 15 HC Polynomial SVM Train and Test split Accuracy: 87.1%, Sensitivity: 93.3%, Specificity: 81.25%
Chawla et al. (2023) 2023 Classification (PD vs. HC) EEG Combined from two public datasets Dataset-1 40 subjects: 20 PD + 20 HC Dataset-2 31 subjects: 16 PD + 15 HC flexible analytic wavelet transform (FAWT) + KNN 10-fold cross-validation Dataset-1 Accuracy: 99% AUC: 99.1% Sensitivity: 99.12% Specificity: 99.45% Dataset-2 Accuracy: 95.85% AUC: 95.9% Sensitivity: 96.14% Specificity: 95.88%
Coelho et al. (2023) 2023 Classification (PD vs. HC) EEG Public PRED+C repository 50 subjects: 25 PD + 25 HC SVM 5-fold cross-validation Accuracy: 89.56%
Nour et al. (2023) 2023 Classification (PD vs. HC) EEG UC San Diego Public Dataset 31 subjects: 16 PD + 15 HC Dynamic Classifier Selection (DCS) in Modified Local Accuracy (MLA) 5-fold cross-validation Accuracy: 99.3%, Precision: 99.31%, Recall: 99.31%
Zhao et al. (2024) 2024 Classification (PD vs. HC) EEG Collected from participants 100 subjects: 52 PD + 48 HC GSP-GCNs (Graph Signal Processing-Graph Convolutional Networks) 5-fold cross-validation Accuracy: 90.2%, AUC: 89.1%, Sensitivity: 84.0%, Specificity: 88.4%
Other Data
Bhandari et al. (2023) 2023 Classification (PD vs. HC) Gene dataset Five open-source peripheral blood microarray gene expression datasets on PD from GEO 742 subjects: 406 PD + 336 HC Logistic Regression 10-fold cross-validation Accuracy: 77.7%, Precision: 77.6%, Recall: 77.82%
Wang et al. (2023) 2023 Classification (PD vs. HC) Urine biomarkers Collected from participants 215 subjects: 104 PD + 111 HC XGBoost Not specified Accuracy: 96.5%, AUC: 99.2%
Junaid et al. (2023) 2023 Classification (PD vs. HC) Patient visits PPMI 215 subjects: 324 PD + 217 HC Light gradient boosting machines (LGBM) 10-fold cross-validation Accuracy: 90.73%, Precision: 83.27%, Recall: 89.53%
Igene et al. (2023) 2023 Classification (PD vs. HC) Movement data Collected from participants 34 subjects: 17 PD + 17 HC SVM 10-fold cross-validation Accuracy: 94.4%
Varghese et al. (2024) 2024 Classification (PD vs. HC) Smartwatch data, Questionnaire data PADS (PD Smartwatch) dataset 469 subjects: 276 PD + 114 DD + 79 HC Classifier stacking (SVM, NN, CatBoost, Xception- Time) Nested 5-fold cross-validation Accuracy: 91.16%, Precision: 96.98%, Recall: 92.40%, F1 score: 94.62%

HC: health control, NC: normal control, UPDRS: Unified PD Rating Scale, CNN: convolutional neural network, RNN: recurrent neural network, MLP: multilayer perceptron, DT: decision tree, SVM: support vector machine, ANN: Artificial neural network, RF: random forest, LR: linear regression, NB: Naïve Bayes

Neuroimaging data

Neuroimaging is a branch of medical imaging that applies radiological and related techniques to image the nervous system (Rastogi et al. 2025; Kujur et al. 2022; Alhussen et al. 2025). With the increasing availability of large-scale neuroimaging datasets and advances in ML and DL, neuroimaging has played an important role in the early detection, classification, computer-aided diagnosis, and monitoring of various neurological disorders (Goceri 2024, 2025; Nakach et al. 2024). Many studies have applied ML techniques to neuroimaging for the early diagnosis of PD. In this review, we include 24 articles using neuroimaging data. Among them, 12.5% (3/24) used SPECT data, 20.8% (5/24) used DTI data, 62.5% (15/24) used MRI data, and 4.2% (1/24) used Positron Emission Tomography (PET) imaging data. The most commonly used ML models are the Support Vector Machine (SVM), the Convolutional Neural Network (CNN), and the 3D CNN. In addition, data-preprocessing techniques are used to minimise image noise, and data augmentation techniques, such as the Generative Adversarial Network (GAN), may be used to increase the number of samples. For validation, various methods are used, including 10-fold cross-validation; train, validation, and test split validation; and train and test split validation. 45.8% (11/24) of the articles reported an accuracy of over 90%. There are also problems with applying neuroimaging data to PD diagnosis. For example, some comparisons between studies are unfair because they used different datasets, or the same dataset with different subjects. Moreover, some studies applied only the train and test split validation, which is unsuitable when the dataset is small. Fig 3 presents the distribution of traditional ML and DL approaches employed in neuroimaging-based studies.
Since most neuroimaging data are stored as medical images (Fig 4), DL is applied more widely to neuroimaging datasets than traditional ML.

Fig. 3.


Distribution of traditional ML and DL methods in neuroimaging data. Blue represents DL and green represents traditional ML

Fig. 4.


Samples of neuroimaging data

Voice data

Analysis of voice or speech characteristics could contribute to PD diagnosis and detection, especially as recent research has shown that voice impairment is the commonest underlying symptom in many PD patients (Karan et al. 2020). In PD diagnosis based on voice data, 57.1% of the articles (28/47) used the dataset collected from the University of Oxford (Tsanas et al. 2012). However, this dataset is small (only 31 participants) and imbalanced (23 PD patients and 8 healthy controls), which gives models trained on it weak generalisation. For model evaluation, 55.3% of the articles (26/47) used cross-validation, with 10-fold cross-validation being the most common method (18/47). Unfortunately, 14.9% of the articles (7/47) did not describe their evaluation method in detail. In addition, there was no uniform standard for splitting datasets: the same speech samples often appeared in both the training set and the testing set, which led to overly optimistic performance results.
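The speaker-level leakage described above can be avoided by grouping recordings by subject before splitting. The sketch below uses scikit-learn's GroupKFold on synthetic stand-in data; the subject count, recordings per subject, and feature count are illustrative assumptions, not the Oxford dataset's actual layout:

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in: 31 subjects x 6 recordings, 22 acoustic features each.
n_subjects, recs_per_subject, n_features = 31, 6, 22
X = rng.normal(size=(n_subjects * recs_per_subject, n_features))
y = np.repeat(rng.integers(0, 2, size=n_subjects), recs_per_subject)  # one label per subject
groups = np.repeat(np.arange(n_subjects), recs_per_subject)           # subject ID per recording

# GroupKFold keeps all recordings of a subject in the same fold,
# so no speaker appears in both the training and the test set.
cv = GroupKFold(n_splits=5)
for train_idx, test_idx in cv.split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    clf = SVC().fit(X[train_idx], y[train_idx])
    clf.score(X[test_idx], y[test_idx])
```

With an ordinary (sample-level) KFold, the same speaker's recordings would routinely appear on both sides of the split, inflating the reported accuracy.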

Overall, voice data is the most widely used data modality, but it has limited real-world applicability because of differences in languages and accents and uncontrollable ambient sounds. A model may perform well on one specific dataset but poorly on another. Many articles simply quoted performance figures from other studies rather than undertaking their own evaluation. Fig 5 presents the distribution of traditional ML and DL approaches employed in voice-based studies.

Fig. 5.


Distribution of traditional ML and DL methods in voice data. Blue represents DL and green represents traditional ML

Handwriting data

Handwriting requires motor control and specific neuromuscular coordination. Handwriting abnormalities are a common early motor symptom of PD and are therefore of potential value for diagnosis. The number of participants included in studies of handwriting-based PD diagnosis is relatively small: 14.3% (2/14) of the articles used a study population of more than 200, while 85.7% (12/14) included fewer than 200 participants. SVM, CNN, and RNN were the most commonly used ML models. Regarding validation, 71.4% (10/14) of the articles applied k-fold cross-validation, and only 28.6% (4/14) used the train and test split validation mechanism. 57.1% (8/14) of the articles reported a diagnostic accuracy of over 90%, and 92.9% (13/14) reported an accuracy of over 80%. Fig 6 presents the distribution of traditional ML and DL approaches employed in handwriting-based studies. Since most handwriting datasets are saved in the form of pictures (Fig 7), the application of DL in handwriting datasets is more widespread than that of traditional ML.

Fig. 6.


Distribution of traditional ML and DL methods in handwriting data. Blue represents DL and green represents traditional ML

Fig. 7.


Samples of handwriting data

Gait data

Gait disorder is one of the most incapacitating motor symptoms in PD and a challenge for the medical specialist to evaluate. In PD diagnosis based on gait data, 64.7% of the articles (11/17) used the dataset from Physionet. This dataset contains 166 subjects (93 PD patients and 73 healthy controls (HC)). 58.8% of the articles (10/17) used cross-validation, where 10-fold cross-validation was the most common method (9/17).

Gait recordings need to be segmented according to the gait cycle; otherwise, some data samples may fall in the overlapping region of the probability density functions of the two classes. Moreover, extracting the features of the left and right gait separately may yield better performance. Gait-based models tend to generalise well, since walking posture is broadly similar for people from different countries. Fig 8 presents the distribution of traditional ML and DL approaches employed in gait-based studies.

Fig. 8.


Distribution of traditional ML and DL methods in gait data. Blue represents DL and green represents traditional ML

EEG data

EEG involves recording brain signals from the scalp's surface. As PD is related to brain abnormalities, EEG signals can be applied to assist in PD diagnosis. Ten articles are included in this review. 60.0% (6/10) of the articles included fewer than 50 participants, and 30.0% (3/10) used CNN-based models. 40.0% (4/10) of the articles applied 10-fold cross-validation, and 30.0% (3/10) applied train and test split validation. Fig 9 presents the distribution of traditional ML and DL approaches employed in EEG-based studies.

Fig. 9.


Distribution of traditional ML and DL methods in EEG data. Blue represents DL and green represents traditional ML

Other data

Besides these data modalities, this review also includes five studies that used other data modalities, such as gene and urine biomarkers. Of these, 60% (3/5) used 10-fold cross-validation. Fig 10 presents the distribution of traditional ML and DL approaches employed in studies based on other data. Because most of these datasets were recorded as numerical indicators or textual descriptions, and the teams that collected them often had limited computer-science backgrounds, traditional ML methods were preferred over DL approaches.

Fig. 10.


Distribution of traditional ML and DL methods in other data. Blue represents DL and green represents traditional ML

Datasets

We briefly summarize the five commonly used public datasets for ML-based PD diagnosis.

PPMI The Parkinson’s Progression Markers Initiative (PPMI) dataset is sponsored by the Michael J. Fox Foundation (MJFF). It is used for PD diagnosis with the neuroimaging data modality. The study contains imaging, clinical, and biological data on PD patients and the HC group, and is designed to define and discover biomarkers of PD progression.

PC-GITA PC-GITA dataset, also called the new Spanish speech corpus dataset, is the first dataset that provides speech recordings in Spanish (Orozco-Arroyave et al. 2014). It is a dataset used for PD diagnosis with voice data modality. This dataset contains speech recordings of 50 PD patients and 50 HC subjects, where all subjects are native Spanish speakers. The speech recordings were collected following a designed protocol, and the corpus dataset includes several tasks such as sustained phonations of the vowels and diadochokinetic evaluation.

HandPD The HandPD dataset is used for PD diagnosis with handwriting data and contains 55 subjects (37 PD patients and 18 HC subjects). Each subject was asked to complete a handwriting clinical exam, such as drawing spirals and circles (Pereira et al. 2015). As some subjects did not complete all of the exam tasks, the entire dataset comprises 373 images.

PaHaW Parkinson’s Disease Handwriting (PaHaW) dataset consists of 75 subjects with 37 PD patients and 38 HC subjects (Drotár et al. 2016). It is a dataset used for PD diagnosis with handwriting data. The tasks include drawing an Archimedean spiral, repetitively writing orthographically simple syllables and words, and writing a sentence.

Physionet Physionet repository, the title of the Research Resource for Complex Physiologic Signals, was established in 1999 and is supported by the National Institutes of Health (NIH) (Goldberger et al. 2000). It is a widely used repository of biomedical data and contains datasets that can be used for PD diagnosis with gait data modality. This repository enables researchers to share and reuse clinical research resources and reduce barriers to data access.

Clinical applicability

The clinical applicability of various diagnostic modalities for PD hinges on their practicality in real-world settings. Although neuroimaging techniques (DaTSCAN, SPECT) are useful in clinical diagnosis, they face limitations due to their high costs and the need for specialised equipment and trained personnel. These barriers make them less feasible to implement in low-resource settings or for routine screening. For EEG data, the subtlety of PD-related signal changes and the influence of various confounders, such as patient movement and electrical interference, complicate the interpretation of EEG results. Additionally, the absence of standardized protocols for EEG recording and analysis in the context of PD further complicates its widespread adoption in clinical practice. Conversely, voice, handwriting, and gait analyses offer a more accessible alternative, as they require minimal specialized equipment and can be performed remotely.

However, the clinical applicability of these modalities is contingent upon the standardization of data collection and the development of robust algorithms that can reliably interpret variations in patient data due to external factors such as background noise or emotional state. The adoption of voice and handwriting tools in clinical practice also depends on their integration into existing healthcare systems and workflows. For these tools to be widely accepted, they must demonstrate not only reliability and accuracy but also cost-effectiveness compared to more established diagnostic methods. PD poses a significant burden on both governments and patients’ families. As PD currently lacks a gold standard for diagnosis, ML tools are intended to serve as assistive tools, and their cost-effectiveness is crucial. Compared to MRI-based methods and EEG-based methods, voice, handwriting, and gait-based methods are more affordable and accessible. The integration with existing electronic health record (EHR) systems is also critical to ensure that AI models can be seamlessly embedded into current clinical workflows. To improve diagnostic precision and treatment planning, a database for PD patients with EHR should be established, which should contain a wide range of PD patient examination data, allowing for more personalised treatment. Lastly, for all diagnostic tools, including neuroimaging, voice, and handwriting analysis, there needs to be a clear regulatory pathway for their validation and approval. Establishing comprehensive guidelines that address privacy concerns, data security, and the ethical use of AI in clinical settings will be crucial for their broader adoption.

Evaluation metrics

The evaluation metrics commonly used in ML classification tasks are Accuracy, Precision, Sensitivity (Recall), Specificity, Area Under Curve (AUC), Matthews Correlation Coefficient (MCC), and F1 score. For an actual positive class, if the result is a predicted positive class, it is a True Positive (TP); otherwise, it is a False Negative (FN). For an actual negative class, if the result is a predicted positive class, it is a False Positive (FP); otherwise, it is a True Negative (TN).

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (1)

$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (2)

$\mathrm{Sensitivity\ (Recall)} = \dfrac{TP}{TP + FN}$ (3)

$\mathrm{Specificity} = \dfrac{TN}{TN + FP}$ (4)

$\mathrm{AUC} = \displaystyle\int_0^1 \mathrm{TPR}\, d(\mathrm{FPR})$ (5)

$\mathrm{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ (6)

$\mathrm{F1} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (7)
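These threshold metrics can be computed directly from confusion-matrix counts. The sketch below uses illustrative counts (90/3/70/3), which are not taken from any reviewed study:

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
        ),
    }

# Illustrative example: 93 positives and 73 negatives,
# with 90 and 70 of them classified correctly.
m = classification_metrics(tp=90, fp=3, tn=70, fn=3)
```

AUC, by contrast, is threshold-free and must be computed from the ranked prediction scores rather than from a single confusion matrix.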

Risk of bias

The risk of bias is assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST) (Wolff et al. 2019). PROBAST is designed to evaluate the risk of bias in diagnostic model studies. In this review, the risk of bias in all included studies was assessed independently and then cross-checked by the authors. The results of the risk of bias assessment are shown in Table 2. Most of the studies are at high or unclear risk of bias, and 28 studies are at low risk of bias (Ya et al. 2022; Huang et al. 2023; Xu et al. 2023; Camacho et al. 2023; Peker et al. 2015; Chen et al. 2016; Parisi et al. 2018; Ali et al. 2019; Haq et al. 2019; Li et al. 2022; Zhao and Li 2023; Abdullah et al. 2023; Balaji et al. 2021; Xia et al. 2019; Oh et al. 2020; Trabassi et al. 2022; Anjum et al. 2020; Chawla et al. 2023; Coelho et al. 2023; Khaskhoussy and Ayed 2023; Nour et al. 2023; Junaid et al. 2023; Zhao et al. 2024; Priyadharshini et al. 2024; Wang et al. 2024; Akila and Nayahi 2024; Hireš et al. 2022; Tsai et al. 2023).

Table 2.

Risk of bias assessment of the included studies according to the PROBAST checklist. “+” indicates a low risk of bias, “-” indicates a high risk of bias, and “?” means an unclear risk of bias

# Study Participants Predictors Outcome Analysis Risk of bias
1 West et al. (2019) + + ?
2 Dai et al. (2019) ? + +
3 Zhang et al. (2019) + + ?
4 Chakraborty et al. (2020) + + ? + ?
5 Kaur et al. (2021) + + ? + ?
6 Vyas et al. (2022) + + ? + ?
7 Quan et al. (2021) + + ? ? ?
8 Zahid et al. (2020) + + ? + ?
9 Rizvi et al. (2020) + +
10 Abayomi-Alli et al. (2020) + + ?
11 Gunduz (2019) + + ? + ?
12 Nagasubramanian and Sankayya (2021) + + ?
13 Fang et al. (2020) + + ? ? ?
14 Ribeiro et al. (2019) + + ? ? ?
15 Razzak et al. (2020) + + ? ? ?
16 Kamran et al. (2021) + + ? ? ?
17 Gil-Martín et al. (2019) + + ? ? ?
18 Diaz et al. (2021) + + ? ? ?
19 Taleb et al. (2019) + + ? ? ?
20 Xia et al. (2019) + + + + +
21 El Maachi et al. (2020) + + ? ? ?
22 Aversano et al. (2020) + + ?
23 Liu et al. (2021) + + ? + ?
24 Lee et al. (2019) + + ? ? ?
25 Oh et al. (2020) + + + + +
26 Shaban (2021) + + + ? ?
27 Loh et al. (2021) + + ?
28 Prashanth et al. (2014) + + ? ? ?
29 Salvatore et al. (2014) + + ? ? ?
30 Rana et al. (2015) + + ? ? ?
31 Oliveira and Castelo-Branco (2015) + + ? ? ?
32 Zhang and Kagen (2017) + + ? ? ?
33 Peng et al. (2017) + + ? + ?
34 Sivaranjini and Sujatha (2020) + + +
35 Sakar and Kursun (2010) + + ? + ?
36 Bhattacharya and Bhatia (2010) + ?
37 Guo et al. (2010) + + + ? ?
38 Åström and Koker (2011) + + ? + ?
39 Ramani and Sivagami (2011) + + ? ? ?
40 Yadav et al. (2012) + ? +
41 Tsanas et al. (2012) + + ? + ?
42 Mandal and Sairam (2014) + + ? + ?
43 Hazan et al. (2012) + ?
44 Gharehchopogh and Mohammadi (2013) + + ? ? ?
45 Rustempasic and Can (2013) + + ?
46 Sharma and Giri (2014) + + +
47 Olanrewaju et al. (2014) + + ? ? ?
48 Peker et al. (2015) + + + + +
49 Gök (2015) + + + ? ?
50 Chen et al. (2016) + + + + +
51 Avci and Dogantekin (2016) ? + ?
52 Dinesh and He (2017) + ? ?
53 Caliskan et al. (2017) + + +
54 Parisi et al. (2018) + + + + +
55 Wroge et al. (2018) ? + ? ? ?
56 Lahmiri et al. (2018) + + ? + ?
57 Haq et al. (2018) + + +
58 Ali et al. (2019) + + + + +
59 Mostafa et al. (2019) + + ? ? ?
60 Lahmiri and Shmuel (2019) + + ? + ?
61 Haq et al. (2019) + + + + +
62 Senturk (2020) + + ?
63 Karan et al. (2020) + + ? + ?
64 Soumaya et al. (2021) + + ?
65 Karaman et al. (2021) + + +
66 Drotár et al. (2014) + + +
67 Drotár et al. (2015) + +
68 Pereira et al. (2015) + ?
69 Tahir and Manap (2012) + + +
70 Wahid et al. (2015) + + ? + ?
71 Shetty and Rao (2016) + + ?
72 Abdulhay et al. (2018) + + ?
73 Rehman et al. (2019) + + ? + ?
74 Balaji et al. (2021) + + + + +
75 Ya et al. (2022) + + + + +
76 Erdaş and Sümer (2022) ? + +
77 Huang et al. (2023) + + + + +
78 Ali et al. (2023) + +
79 Hireš et al. (2022) + + + + +
80 Rana et al. (2022) + +
81 Madruga et al. (2023) ? +
82 Varalakshmi et al. (2022) + +
83 Li et al. (2022) + + + + +
84 Zhao and Li (2023) + + + + +
85 Abdullah et al. (2023) + + + + +
86 Nguyen et al. (2022) + + +
87 Trabassi et al. (2022) + + + + +
88 Li and Li (2022) + +
89 Aşuroğlu and Oğul (2022) + + +
90 Ma et al. (2023) +
91 Anjum et al. (2020) + + + + +
92 Motin et al. (2022) +
93 Chawla et al. (2023) + + + + +
94 Coelho et al. (2023) + + + + +
95 Xu et al. (2023) + + + + +
96 Camacho et al. (2023) + + + + +
97 Govindu and Palwe (2023) + +
98 Celik and Başaran (2023) + +
99 Khaskhoussy and Ayed (2023) + + + + +
100 Dheer et al. (2023)
101 Vinora et al. (2023) +
102 Sharma et al. (2023) + + +
103 Nour et al. (2023) + + + + +
104 Bhandari et al. (2023) + +
105 Wang et al. (2023) + +
106 Junaid et al. (2023) + + + + +
107 Igene et al. (2023) +
108 Varghese et al. (2024) + + + ? ?
109 Zhao et al. (2024) + + + + +
110 Priyadharshini et al. (2024) + + + + +
111 Wang et al. (2024) + + + + +
112 Akila and Nayahi (2024) + + + + +
113 Talai et al. (2021) + + ? ? ?
114 Prasuhn et al. (2020) + + ?
115 Chen et al. (2023) ? + ? + ?
116 Tsai et al. (2023) + + + + +
117 Zhao et al. (2022) + + ?

We follow the standard PROBAST framework, which evaluates studies across four domains: participants, predictors, outcome, and analysis. We have found that many included studies used small datasets, which limit the generalizability of their findings. Moreover, several studies had methodological flaws, including data leakage, insufficient sample sizes, and unrealistic validation protocols. These issues contribute to a high risk of bias, particularly in the Participants and Analysis domains of the PROBAST framework. For example, in the study (Sivaranjini and Sujatha 2020), a high risk of bias was identified in the Analysis domain. This was due to the study only reporting the experimental results without providing a detailed analysis of the dataset or methodological details. Moreover, the study used MRI data and applied an image-level train-test split rather than subject-level cross-validation, which increased the likelihood of data leakage. As a result, the study was assessed as having a high risk of bias in the “Analysis” domain, and the overall risk of bias was deemed high. Fig. 11 shows the PROBAST evaluation results in a heatmap.

Fig. 11.


Risk of bias PROBAST assessment summary
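The image-level versus subject-level splitting issue flagged in the Analysis domain can be demonstrated in a few lines. The sketch below contrasts a naive slice-level split with a subject-grouped split using scikit-learn; the subject and slice counts are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(42)

# Hypothetical setup: 83 subjects contributing 10 MRI slices each.
n_subjects, slices_per_subject = 83, 10
subject_of_slice = np.repeat(np.arange(n_subjects), slices_per_subject)
X = rng.normal(size=(len(subject_of_slice), 64))  # placeholder slice features

# Image-level split: slices of the same subject land on both sides (leakage).
tr, te = train_test_split(np.arange(len(X)), test_size=0.2, random_state=0)
leaky = not set(subject_of_slice[tr]).isdisjoint(subject_of_slice[te])

# Subject-level split: every subject's slices stay together.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
tr2, te2 = next(gss.split(X, groups=subject_of_slice))
clean = set(subject_of_slice[tr2]).isdisjoint(subject_of_slice[te2])
```

Here `leaky` is True for the naive split, while the grouped split keeps the train and test subjects disjoint, so test performance reflects generalisation to unseen patients.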

While certain data modalities are indeed associated with a higher risk of bias, they nonetheless demonstrate substantial potential for ML-based PD diagnostics. In particular, EEG and gait signals stand out due to their biological plausibility, accessibility, and practical advantages in clinical settings.

EEG offers high temporal resolution and captures neurophysiological activity directly linked to both motor dysfunction and cognitive impairment, two hallmark features of PD. Likewise, gait analysis reflects core motor symptoms such as bradykinesia, rigidity, and postural instability, making it a valuable modality for both diagnosis and monitoring of disease progression. Importantly, these modalities align well with clinicians’ existing understanding of PD pathophysiology and assessment practices, which may facilitate greater acceptance and integration into clinical workflows.

Case studies

In this paper, we have conducted five case studies (one each for MRI, gait, voice, EEG, and handwriting). We repeated the experiments to reproduce the results reported in these papers (Table 3). In our reproduction experiments, we adopted a unified evaluation framework with the following metrics: Accuracy, Specificity, Sensitivity, Precision, Recall, F1 score, AUC (Area Under the ROC Curve), RGA (Rank Graduation Accuracy) (Giudici and Raffinetti 2025), Lorenz Zonoid (Calzarossa et al. 2025), and RGR (Rank Graduation Robustness) (Babaei et al. 2025).

Table 3.

Case studies results

Data modality Paper Report Result Reproduction Result Explainability Robust/Security
Voice Accuracy: 100% Accuracy: 100% Lorenz Zonoid: Cannot be calculated because AUC cannot be calculated RGR: 99.93%
Specificity: 0.00%
Sensitivity: 100.00%
Precision: 100.00%
Recall: 100.00%
F1 score: 100.00%
AUC: Cannot be calculated because there is only one class in the test set
RGA: 100.00%
Gait Accuracy: 95.2% Accuracy: 87.12% Lorenz Zonoid: 69.59% RGR: 99.84%
Specificity: 86.8% Specificity: 68.24%
Sensitivity: 98.1% Sensitivity: 94.03%
Precision: 88.99%
Recall: 94.03%
F1 score: 91.44%
AUC: 84.80%
RGA: 84.80%
EEG Accuracy: 98% Accuracy: 62.40% Lorenz Zonoid: 35.60% RGR: 99.84%
Specificity: 100% Specificity: 62.68%
Sensitivity: 97% Sensitivity: 62.10%
Precision: 62.00%
Recall: 62.10%
F1 score: 62.05%
AUC: 67.80%
RGA: 67.80%
MRI Accuracy: 90.36% Accuracy: 56.67% Lorenz Zonoid: -5.02% RGR: 84.31%
Sensitivity: 90.52% Specificity: 16.67%
Precision: 90.08% Sensitivity: 86.00%
F1 score: 90.25% Precision: 52.92%
AUC: 90.51% Recall: 86.00%
F1 score: 64.37%
AUC: 47.49%
RGA: 47.49%
Handwriting SP_50_50: SP_50_50: SP_50_50: SP_50_50:
Accuracy: 85.38% (Std: 2.37%) Accuracy: 85.49% (Std: 2.23%) Lorenz Zonoid: 87.10% (Std: 2.26%) RGR: 100.00% (Std: 0.00%)
Precision: 85.5% (Std: 3.1%) Specificity: 86.43% (Std: 2.24%)
Recall: 83.4% (Std: 5.4%) Sensitivity: 84.44% (Std: 5.49%)
F1 score: 84.3% (Std: 2.9%) Precision: 84.91% (Std: 1.86%)
Recall: 84.44% (Std: 5.49%)
F1 score: 84.56% (Std: 2.83%)
AUC: 93.55% (Std: 1.13%)
RGA: 93.55% (Std: 1.13%)
SP_75_25: SP_75_25: SP_75_25: SP_75_25:
Accuracy: 89.48% (Std: 3.67%) Accuracy: 84.03% (Std: 2.67%) Lorenz Zonoid: 88.04% (Std: 3.11%) RGR: 100.00% (Std: 0.00%)
Precision: 84.8% (Std: 4.7%) Specificity: 88.00% (Std: 7.86%)
Recall: 95.5% (Std: 4.8%) Sensitivity: 79.69% (Std: 7.30%)
F1 score: 89.7% (Std: 3.5%) Precision: 86.79% (Std: 6.60%)
Recall: 79.69% (Std: 7.30%)
F1 score: 82.61% (Std: 2.92%)
AUC: 94.02% (Std: 1.55%)
RGA: 94.02% (Std: 1.55%)
MEA_50_50: MEA_50_50: MEA_50_50: MEA_50_50:
Accuracy: 89.29% (Std: 3.75%) Accuracy: 82.03% (Std: 1.92%) Lorenz Zonoid: 80.62% (Std: 2.29%) RGR: 100.00% (Std: 0.00%)
Precision: 85.0% (Std: 4.5%) Specificity: 83.71% (Std: 5.00%)
Recall: 77.9% (Std: 7.9%) Sensitivity: 80.16% (Std: 5.32%)
F1 score: 81.0% (Std: 5.0%) Precision: 81.96% (Std: 4.05%)
Recall: 80.16% (Std: 5.32%)
F1 score: 80.81% (Std: 2.27%)
AUC: 90.31% (Std: 1.14%)
RGA: 90.31% (Std: 1.14%)
MEA_75_25: MEA_75_25: MEA_75_25: MEA_75_25:
Accuracy: 92.24% (Std: 2.65%) Accuracy: 79.40% (Std: 3.52%) Lorenz Zonoid: 72.46% (Std: 6.75%) RGR: 100.00% (Std: 0.00%)
Precision: 95.2% (Std: 2.5%) Specificity: 90.57% (Std: 3.63%)
Recall: 88.3% (Std: 4.9%) Sensitivity: 67.19% (Std: 8.76%)
F1 score: 92.4% (Std: 3.1%) Precision: 86.95% (Std: 3.57%)
Recall: 67.19% (Std: 8.76%)
F1 score: 75.41% (Std: 5.38%)
AUC: 86.23% (Std: 3.38%)
RGA: 86.23% (Std: 3.38%)

Std represents the standard deviations. SP_50_50 and SP_75_25 represent experiments using the Spiral Dataset, with 50%/50% and 75%/25% splits for training and testing, respectively. MEA_50_50 and MEA_75_25 represent experiments using the Meander Dataset with the same respective training/testing splits

Case study 1: voice

Parkinson Speech Dataset: The dataset was collected by Sakar et al. at the Department of Neurology, Cerrahpasa Faculty of Medicine, Istanbul University. It is divided into two parts: training and testing. The training part includes data from 20 PD patients and 20 healthy subjects; the PD patients are aged between 43 and 77 years, and the healthy subjects between 45 and 83 years. From each subject, 26 samples were recorded. The testing part contains data from 28 subjects (all PD patients) aged between 39 and 79 years; for each subject, 6 samples were recorded.

Data Preprocessing: Linear discriminant analysis (LDA) is used to reduce the data dimensionality. It projects the original feature vectors into a reduced space in which class separability is maximised.
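A minimal sketch of this reduction step with scikit-learn, using random stand-in features (the sample and feature counts are illustrative, not the dataset's actual layout). For a binary PD/HC problem, LDA can produce at most C−1 = 1 discriminant component:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Stand-in for 200 recordings with 26 acoustic features each (assumed shapes).
X = rng.normal(size=(200, 26))
y = rng.integers(0, 2, size=200)  # 0 = HC, 1 = PD

# With two classes, LDA projects each 26-dimensional feature vector
# onto a single discriminant axis that maximises class separability.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)
```

Note that LDA is supervised: to avoid leakage, it should be fitted on the training fold only and then applied to the test fold.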

Result: Leave-one-subject-out (LOSO) validation is used to evaluate the model’s performance. The source code is provided at https://github.com/LiaqatAli007/Automated−Detection-of-Parkinson-s-Disease-Based-on-Multiple-Types-of-Sustained-Phonations-using-Lin. There is only one class, “PD”, in the test set, so 100% accuracy is achievable by always predicting PD, and specificity and AUC are uninformative. The paper reported that the model achieves 100% accuracy, and our reproduction experiment matches this result.

Case study 2: gait

Physionet Dataset: The dataset was collected across several studies (Yogev et al. 2005; Frenkel-Toledo et al. 2005; Goldberger et al. 2000; Hausdorff et al. 2007). 93 Parkinson’s patients (mean age: 66.3 years; 63% men) and 73 healthy controls (mean age: 66.3 years; 55% men) are included. For each subject, 8 sensors on each foot measure the vertical ground reaction force (in Newtons) over a 2-minute walk. The output of each sensor is digitised and recorded at 100 samples per second. Two extra signals record the sum of the 8 sensor outputs for each foot.

Data Preprocessing: Each 1D signal is divided into smaller segments with a length of 100 time steps and 50
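The segmentation step can be sketched as a sliding window over each sensor signal. The sentence above is truncated in the source, so the 50-sample stride (50% overlap) used below is an assumption rather than the paper's stated value:

```python
import numpy as np

def segment_signal(signal, window=100, stride=50):
    """Split a 1D signal into fixed-length windows.

    The 50-sample stride (50% overlap) is an assumption; the original
    paper's exact stride is not stated in the text above.
    """
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

# A 2-minute recording at 100 samples per second gives 12000 samples per sensor.
vgrf = np.random.default_rng(0).normal(size=12000)  # placeholder VGRF signal
segments = segment_signal(vgrf)  # (12000 - 100) // 50 + 1 = 239 windows
```

Windowing turns each long recording into many fixed-size training examples, which is what CNN- and transformer-based gait models consume.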

Result: 10-fold cross-validation is used to evaluate the model’s performance. There are two groups, PD and HC; each is divided into 10 folds at the subject level and combined to form folds with 70% Parkinson and 30% control. The source code is provided at https://github.com/DucMinhDimitriNguyen/Transformers-for-1D-signals-in-Parkinson-s-disease-detection-from-gait. The results reported by the paper are 98.1% sensitivity, 86.8% specificity, and 95.2% accuracy. However, our reproduction could not achieve the reported performance: it achieved a sensitivity of 94.03%, specificity of 68.24%, and accuracy of 87.12%.

Case study 3: EEG

Dataset: The dataset was collected by the Aron lab at the University of California, San Diego, and subsequently further analyzed by the Swann lab at the University of Oregon. There are 16 PD patients (8 females; mean age: 62.6±8.3) and 15 HC (9 females; mean age: 63.5±9.6) included in the dataset. The data was captured using 40 electrodes with a sampling rate of 512Hz.

Data Preprocessing: Data from three selected EEG channels are used; the recordings are trimmed and segmented into patches of 512 time samples.

Result: The dataset is divided into three parts: 64% for training, 16% for validation, and 20% for testing. No source code is provided, only the model structure. The paper reported an accuracy of 98.00%, sensitivity of 97.00%, and specificity of 100.00%. However, our reproduction achieves only 62.40% accuracy, 62.10% sensitivity, and 62.68% specificity. One possible reason is that the authors used a pre-trained model.

Case study 4: handwriting

NewHandPD Dataset: The dataset was collected by the Botucatu Medical School, São Paulo State University. It contains 12 exams (4 of them related to spirals, 4 related to meanders, 2 circled movements, and left and right-handed diadochokinesis). There are 31 PD patients (10 females; mean age: 57.83±7.85) and 35 HC (17 females; mean age: 44.05±14.88) included in the dataset.

Data Preprocessing: The 5th and 90th percentiles were set as the lower and upper bounds, and values outside these bounds were replaced by the boundary values to mitigate outlier effects. Each feature was then normalised to zero mean and unit standard deviation.
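This clipping-and-standardising step can be sketched as follows; the feature-matrix shape is an illustrative assumption:

```python
import numpy as np

def preprocess(features):
    """Clip each feature to its 5th-90th percentile range, then z-score it."""
    lo = np.percentile(features, 5, axis=0)
    hi = np.percentile(features, 90, axis=0)
    clipped = np.clip(features, lo, hi)  # outliers replaced by boundary values
    return (clipped - clipped.mean(axis=0)) / clipped.std(axis=0)

# Stand-in matrix: e.g. 66 subjects x 12 handwriting-exam features (assumed shape).
X = np.random.default_rng(0).normal(size=(66, 12))
Z = preprocess(X)
```

As with LDA, the percentile bounds and normalisation statistics should be estimated on the training split only and reused on the test split to avoid leakage.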

Result: The dataset is divided into three parts: 60% for training, 15% for validation, and 25% for testing. The source code is provided at https://github.com/lzfelix/bag-of-samplings. The paper reported an accuracy of 89.48%±3.7%, precision of 84.8%±4.7%, recall of 95.5%±4.8%, and F1 score of 89.7%±3.5% on the Spiral dataset, and an accuracy of 92.24%±2.65%, precision of 95.2%±2.5%, recall of 88.3%±4.9%, and F1 score of 92.4%±3.1% on the Meander dataset. Our reproduction achieves 84.03%±2.67% accuracy, 86.79%±6.60% precision, 79.69%±7.30% recall, and 82.61%±2.92% F1 score on the Spiral dataset, and 79.40%±3.52% accuracy, 86.95%±3.57% precision, 67.19%±8.76% recall, and 75.41%±5.38% F1 score on the Meander dataset.

Case study 5: MRI

Dataset: The dataset was created by Badea et al. (2017), which combined the T1 MRI images from two datasets collected by Neurocon and Taowu. There are 83 subjects included in the dataset, with 43 from Neurocon (27 PD patients and 16 controls) and 40 from Taowu (20 PD patients and 20 controls).

Data Preprocessing: Median slices from the axial, coronal, and sagittal planes of 3D MR images were extracted and resized to 224x224 pixels. The three median slices are combined into a single three-channel image to maintain spatial integrity across different planes.
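The slice-extraction step can be sketched as follows (a simplified illustration with our own function names; the paper additionally resizes each slice to 224x224 before stacking, which is omitted here):

```python
import numpy as np

def median_slices_to_rgb(volume):
    """Stack the median axial, coronal and sagittal slices of a 3-D MR
    volume into one three-channel image (resizing omitted for brevity)."""
    x, y, z = volume.shape
    axial    = volume[:, :, z // 2]
    coronal  = volume[:, y // 2, :]
    sagittal = volume[x // 2, :, :]
    # Assume a cubic volume so the three slices already share one shape;
    # in the paper each slice is resized to 224x224 before stacking.
    return np.stack([axial, coronal, sagittal], axis=-1)

vol = np.random.rand(64, 64, 64)           # toy stand-in for a T1 volume
img = median_slices_to_rgb(vol)            # three-channel image
```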

Result: 10-fold cross-validation is used to evaluate the model’s performance. The source code is not provided, so we reproduced the experiment based on the reported model architecture. The paper reported an accuracy of 90.36%, precision of 90.08%, sensitivity of 90.52%, AUC of 90.51%, and F1 score of 90.25%. However, our reproduction obtained only an accuracy of 56.67%, precision of 52.92%, sensitivity of 86.00%, AUC of 47.49%, and F1 score of 64.37%.

Reproduction results

We have summarized the case study results, including both the original papers’ reported results and our reproduction results. The code of our reproduction can be accessed via: https://github.com/yiming95/PD_ML_benchmark. According to the reproduction, the reported results of 3 out of 5 papers could not be replicated. Most of the reviewed papers do not provide source code (papers providing code: MRI: 2; voice: 1; handwriting: 1; gait: 2; EEG: 0; others: 1). The lack of open-source code negatively impacts the understanding and improvement of existing methods. Additionally, even among the studies that provide code, many fail to include complete code, data preprocessing steps, or specific hyperparameter values. These issues have led to many experiments failing to match the original findings.

More specifically, for the voice data modality, we successfully reproduced the reported results exactly. For the EEG data modality, the original paper reported an accuracy of 98%, whereas our reproduction achieved 62.23%. Since the authors did not release their source code, we re-implemented the model architecture based on the descriptions provided; the discrepancy may stem from implementation details missing from the original paper, such as pre-trained model initialisation or specific training techniques that were not disclosed. For the gait data modality, the results differ slightly. A possible reason is variation in hyperparameter tuning strategies, as the original authors may not have provided the full set of hyperparameters for their model, leading to slight inconsistencies in the reproduced results. For the handwriting data modality, although the authors provided the code, our reproduced results show minor discrepancies. A likely explanation is the use of random data splitting, which can produce inconsistent training sets; the reported results can be matched under a particular dataset split, so we consider this result reproducible. For the MRI data modality, the original authors did not release their source code, and key implementation details were also missing from the paper, which could have significantly influenced performance.

As researchers and developers struggle to validate and reproduce previous results, the credibility and transparency of scientific research suffer. Moreover, most of the models lack explainability, which can make health professionals hesitant to trust and adopt these AI tools. Without understanding how the AI system reached its diagnosis, there is a risk of misdiagnosis. If the AI system’s lack of interpretability leads to errors, doctors may find it difficult to identify and correct the issue, which could result in the wrong treatment for patients, severely affecting their health and quality of life. The code availability, data accessibility, and explainability of all included studies are shown in Table 4.

Table 4.

Summary of the code availability, data accessibility and explainability for the reviewed papers

Author Year Objective Data Modality Source Code Provided Data Accessibility Explainability
Neuroimaging
Prashanth et al. (2014) 2014 Classification (PD vs. HC) Neuroimaging: DaTSCAN SPECT NO https://www.ppmi-info.org/data NO
Salvatore et al. (2014) 2014 Classification (PD vs. HC) Neuroimaging: MRI NO NO NO
Rana et al. (2015) 2015 Classification (PD vs. HC) Neuroimaging: MRI NO NO NO
Oliveira and Castelo-Branco (2015) 2016 Classification (PD vs. HC) Neuroimaging: FP-CIT SPECT NO https://www.ppmi-info.org/data NO
Zhang and Kagen (2017) 2017 Classification (PD vs. HC) Neuroimaging: DaTSCAN SPECT NO https://www.ppmi-info.org/data NO
Peng et al. (2017) 2017 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
Sivaranjini and Sujatha (2020) 2020 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
West et al. (2019) 2019 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
Dai et al. (2019) 2019 Classification (PD vs. HC) Neuroimaging: PET NO https://www.ppmi-info.org/data; https://adni.loni.usc.edu/; https://db.humanconnectome.org/app/template/Login.vm NO
Zhang et al. (2019) 2019 Classification (Prodromal PD vs. Confirmed PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
Chakraborty et al. (2020) 2020 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
Kaur et al. (2021) 2021 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
Vyas et al. (2022) 2022 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data NO
Ya et al. (2022) 2022 Classification (PD vs. NC) Neuroimaging: MRI NO NO NO
Erdaş and Sümer (2022) 2022 Classification (PD vs. NC) Neuroimaging: MRI NO https://fcon_1000.projects.nitrc.org/indi/retro/parkinsons.html NO
Huang et al. (2023) 2023 Classification (PD vs. HC) Neuroimaging: MRI https://gitee.com/yxfamy/mnc-net_master.git (Currently 403 cannot access) YES
Xu et al. (2023) 2023 Classification (PD vs. HC) Neuroimaging: MRI https://github.com/ymlasu/A-Bio-marker-using-Topological-Machine-Learning-of-rs-fMRI (Only part of the code is provided) https://www.ppmi-info.org/data NO
Camacho et al. (2023) 2023 Classification (PD vs. HC) Neuroimaging: MRI NO https://www.ppmi-info.org/data YES
Priyadharshini et al. (2024) 2024 Classification (PD vs. HC) Neuroimaging: 3D MRI NO YES
Talai et al. (2021) 2021 Classification (PD vs. PSP vs. HC) Neuroimaging: T1, T2, DTI MRI NO https://www.ppmi-info.org/data NO
Prasuhn et al. (2020) 2020 Classification (PD vs. HC) Neuroimaging: Diffusion Tensor Imaging (DTI) NO https://www.ppmi-info.org/data NO
Chen et al. (2023) 2023 Classification (PD-MCI vs. PD-NC) Neuroimaging: DTI (FA, MD, AD, RD, LDH) NO contact corresponding author for access YES
Tsai et al. (2023) 2023 Classification (PD vs. PSP vs. MSA vs. HC) Neuroimaging: DTI (whole-brain features) NO NO NO
Zhao et al. (2022) 2022 Classification (PD vs. HC) Neuroimaging: DTI (Fractional Anisotropy, MD) NO NO NO
Voice
Sakar and Kursun (2010) 2010 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Bhattacharya and Bhatia (2010) 2010 Classification (PD vs. HC) Voice dataset https://www.csie.ntu.edu.tw/~cjlin/libsvm/ (Only part of the code is provided) https://archive.ics.uci.edu/dataset/174/parkinsons NO
Guo et al. (2010) 2010 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Åström and Koker (2011) 2011 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Ramani and Sivagami (2011) 2011 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Yadav et al. (2012) 2012 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Tsanas et al. (2012) 2012 Classification (PD vs. HC) Voice dataset NO NO NO
Mandal and Sairam (2014) 2014 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Hazan et al. (2012) 2012 Classification (PD vs. HC) Voice dataset NO NO NO
Gharehchopogh and Mohammadi (2013) 2013 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Rustempasic and Can (2013) 2013 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Sharma and Giri (2014) 2014 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Olanrewaju et al. (2014) 2014 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Peker et al. (2015) 2015 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Gök (2015) 2015 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Chen et al. (2016) 2016 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Avci and Dogantekin (2016) 2016 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Dinesh and He (2017) 2017 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Caliskan et al. (2017) 2017 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Parisi et al. (2018) 2018 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings NO
Wroge et al. (2018) 2018 Classification (PD vs. HC) Voice dataset NO NO NO
Lahmiri et al. (2018) 2018 Classification (PD vs. HC) Voice dataset NO NO NO
Haq et al. (2018) 2018 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Ali et al. (2019) 2019 Classification (PD vs. HC) Voice dataset https://github.com/LiaqatAli007/Automated-Detection-of-Parkinson-s-Disease-Based-on-Multiple-Types-of-Sustained-Phonations-using-Lin https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings NO
Mostafa et al. (2019) 2019 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Lahmiri and Shmuel (2019) 2019 Classification (PD vs. HC) Voice dataset NO NO NO
Haq et al. (2019) 2019 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Senturk (2020) 2020 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Karan et al. (2020) 2020 Classification (PD vs. HC) Voice dataset NO NO NO
Soumaya et al. (2021) 2021 Classification (PD vs. HC) Voice dataset NO NO NO
Karaman et al. (2021) 2021 Classification (PD vs. HC) Voice dataset NO NO NO
Quan et al. (2021) 2021 Classification (PD vs. HC) Voice dataset NO NO NO
Zahid et al. (2020) 2020 Classification (PD vs. HC) Voice dataset NO NO NO
Rizvi et al. (2020) 2020 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings NO
Abayomi-Alli et al. (2020) 2020 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Gunduz (2019) 2019 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/470/parkinson+s+disease+classification NO
Nagasubramanian and Sankayya (2021) 2021 Classification (PD vs. HC) Voice dataset NO NO NO
Fang et al. (2020) 2020 Classification (PD vs. HC) Voice dataset NO NO NO
Ali et al. (2023) 2023 Classification (PD vs. HC) Voice dataset NO NO NO
Hireš et al. (2022) 2022 Classification (PD vs. HC) Voice dataset NO NO NO
Rana et al. (2022) 2022 Classification (PD vs. HC) Voice dataset NO Available on request NO
Madruga et al. (2023) 2023 Classification (PD vs. HC) Voice dataset NO NO NO
Govindu and Palwe (2023) 2023 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Celik and Başaran (2023) 2023 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons;https://archive.ics.uci.edu/dataset/470/parkinson+s+disease+classification NO
Khaskhoussy and Ayed (2023) 2023 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings NO
Dheer et al. (2023) 2023 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/174/parkinsons NO
Akila and Nayahi (2024) 2024 Classification (PD vs. HC) Voice dataset NO https://archive.ics.uci.edu/dataset/470/parkinson+s+disease+classification NO
Handwriting
Drotár et al. (2014) 2014 Classification (PD vs. HC) Handwriting dataset NO NO NO
Drotár et al. (2015) 2015 Classification (PD vs. HC) Handwriting dataset NO NO NO
Pereira et al. (2015) 2015 Classification (PD vs. HC) Handwriting dataset NO NO NO
Ribeiro et al. (2019) 2019 Classification (PD vs. HC) Handwriting dataset https://github.com/lzfelix/bag-of-samplings https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ NO
Razzak et al. (2020) 2020 Classification (PD vs. HC) Handwriting dataset NO https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/;https://www.kaggle.com/datasets/kmader/parkinsons-drawings; https://bdalab.utko.fekt.vut.cz/ NO
Kamran et al. (2021) 2021 Classification (PD vs. HC) Handwriting dataset NO https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/;https://www.kaggle.com/datasets/kmader/parkinsons-drawings; https://bdalab.utko.fekt.vut.cz/ NO
Gil-Martín et al. (2019) 2019 Classification (PD vs. HC) Handwriting dataset NO https://archive.ics.uci.edu/dataset/395/parkinson+disease+spiral+drawings+using+digitized+graphics+tablet NO
Diaz et al. (2021) 2021 Classification (PD vs. HC) Handwriting dataset NO https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ NO
Taleb et al. (2019) 2019 Classification (PD vs. HC) Handwriting dataset NO https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ NO
Varalakshmi et al. (2022) 2022 Classification (PD vs. HC) Handwriting dataset NO https://www.kaggle.com/datasets/kmader/parkinsons-drawings NO
Li et al. (2022) 2022 Classification (PD vs. HC) Handwriting dataset NO NO NO
Zhao and Li (2023) 2023 Classification (PD vs. HC) Handwriting dataset NO https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ NO
Abdullah et al. (2023) 2023 Classification (PD vs. HC) Handwriting dataset NO https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/ NO
Wang et al. (2024) 2024 Classification (PD vs. HC) Handwriting dataset NO NO NO
Gait
Tahir and Manap (2012) 2012 Classification (PD vs. HC) Gait dataset NO NO NO
Wahid et al. (2015) 2015 Classification (PD vs. HC) Gait dataset NO NO NO
Shetty and Rao (2016) 2016 Classification (PD vs. HD vs. ALS) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Abdulhay et al. (2018) 2018 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Rehman et al. (2019) 2019 Classification (PD vs. HC) Gait dataset NO NO NO
Balaji et al. (2021) 2021 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Xia et al. (2019) 2019 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
El Maachi et al. (2020) 2020 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Aversano et al. (2020) 2020 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Liu et al. (2021) 2021 Classification (PD vs. HC) Gait dataset Submit an application to the author https://physionet.org/content/gaitpdb/1.0.0/ NO
Nguyen et al. (2022) 2022 Classification (PD vs. HC) Gait dataset https://github.com/DucMinhDimitriNguyen https://physionet.org/content/gaitpdb/1.0.0/ NO
Trabassi et al. (2022) 2022 Classification (PD vs. HC) Gait dataset NO Request from the corresponding author NO
Li and Li (2022) 2022 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Aşuroğlu and Oğul (2022) 2022 Classification (PD vs. HC), Regression (UPDRS value) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Ma et al. (2023) 2023 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
Vinora et al. (2023) 2023 Classification (PD vs. HC) Gait dataset NO NO NO
Sharma et al. (2023) 2023 Classification (PD vs. HC) Gait dataset NO https://physionet.org/content/gaitpdb/1.0.0/ NO
EEG
Lee et al. (2019) 2019 Classification (PD vs. HC) EEG NO NO NO
Oh et al. (2020) 2020 Classification (PD vs. HC) EEG NO NO NO
Anjum et al. (2020) 2020 Classification (PD vs. HC) EEG NO http://narayanan.lab.uiowa.edu/;http://predict.cs.unm.edu/ NO
Shaban (2021) 2021 Classification (PD vs. HC) EEG NO https://openneuro.org/datasets/ds002778/versions/1.0.5 NO
Loh et al. (2021) 2021 Classification (PD vs. HC) EEG NO https://openneuro.org/datasets/ds002778/versions/1.0.5 NO
Motin et al. (2022) 2022 Classification (PD vs. HC) EEG NO https://openneuro.org/datasets/ds002778/versions/1.0.5 YES
Chawla et al. (2023) 2023 Classification (PD vs. HC) EEG NO NO NO
Coelho et al. (2023) 2023 Classification (PD vs. HC) EEG NO http://predict.cs.unm.edu/ NO
Nour et al. (2023) 2023 Classification (PD vs. HC) EEG NO https://openneuro.org/datasets/ds002778/versions/1.0.5 NO
Zhao et al. (2024) 2024 Classification (PD vs. HC) EEG Request from the corresponding author NO NO
Other Data
Bhandari et al. (2023) 2023 Classification (PD vs. HC) Gene dataset https://github.com/nikitabhandari-dl/Parkinson-s-disease-diagnosis (Currently 404 cannot access) https://ngdc.cncb.ac.cn/ YES
Wang et al. (2023) 2023 Classification (PD vs. HC) Urine biomarkers NO NO NO
Junaid et al. (2023) 2023 Classification (PD vs. HC) Patient visits NO https://www.ppmi-info.org/ YES
Igene et al. (2023) 2023 Classification (PD vs. HC) Movement data NO https://doi.org/10.21227/g2g8-1503 NO
Varghese et al. (2024) 2024 Classification (PD vs. HC) Smartwatch data, Questionnaire data https://imigitlab.uni-muenster.de/published/pads-project https://uni-muenster.sciebo.de/s/q69vUfRc9vgBoWX NO

Discussions

Summary of findings

ML-based PD diagnosis is a rapidly growing and changing field of research. This systematic review includes 117 articles about PD diagnosis using ML from 2010 to 2024. We analyze and divide them into six categories based on the data modality used in the study: (1) Neuroimaging, (2) Voice, (3) Handwriting, (4) Gait, (5) EEG, and (6) Other data. Fig. 12 shows the publication trends over the last 15 years (2010-2024). Compared with other modalities, neuroimaging, especially DaTSCAN SPECT, is the most established modality for PD diagnosis in clinical practice, whereas MRI currently contributes little to clinical PD diagnosis. However, neuroimaging can be expensive. Voice recording, handwriting, and gait data are non-invasive, cost-effective, and easily collected; hence, these data may be used for PD diagnosis. The main disadvantage of these modalities is the lack of uniform standards for data collection, which may lead to inaccurate diagnosis. In clinical practice, EEG is not used for the diagnosis of PD; however, a few studies have used EEG to diagnose PD, and the validity of this modality needs to be further investigated by researchers in this field.

Fig. 12.

Fig. 12

The development and changes of data in different modalities

We have also summarized the changes in the application of traditional ML and DL in PD diagnosis over the past 15 years. Fig. 13 illustrates the temporal evolution of the application of traditional ML and DL techniques in PD classification research over five-year intervals from 2010 to 2024. During the early years from 2010 to 2014, traditional ML methods such as SVM and Random Forest dominated the field, while DL approaches were rarely used. During the period from 2015 to 2019, DL gained momentum and nearly caught up with traditional ML methods. A major shift occurred in the period from 2020 to 2024, when the number of studies employing DL significantly surpassed those using traditional ML, establishing DL as the mainstream approach. This trend reflects the increasing availability of large-scale datasets, advancements in computational resources, and the superior performance of deep neural networks in complex biomedical classification tasks.

Fig. 13.

Fig. 13

Temporal evolution of the application of traditional ML and DL techniques in PD classification research over five-year intervals from 2010 to 2024

Across the 117 studies reviewed in our systematic review, the main issue is that comparing model performance across different modalities is difficult. For example, the clinical value of an ML-based PD diagnosis differs between neuroimaging and voice recordings. Another issue is that many authors did not provide sufficient implementation details. For instance, some articles did not report hyperparameters clearly, which makes the experiments difficult to reproduce. In addition, some articles used only accuracy as the evaluation metric, which is insufficient. Accuracy can be misleading when the data are imbalanced, i.e., when there are significantly more samples of one class than the others. Also, different misclassification errors can have different costs in real-world applications: in a medical diagnosis task, a false negative (a patient predicted as not having a disease when they do) can have far more severe consequences than a false positive (a patient predicted as having a disease when they don’t). Therefore, additional evaluation metrics such as sensitivity and specificity should be reported.
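The pitfall of reporting accuracy alone on imbalanced data can be illustrated with a short sketch (toy numbers and our own helper function):

```python
import numpy as np

def metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall of the positive/PD class) and
    specificity (recall of the negative/HC class) from binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)

# 90 HC vs. 10 PD: a model that always predicts "HC" scores 90% accuracy
# yet misses every patient (sensitivity = 0).
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.zeros(100, dtype=int)
acc, sens, spec = metrics(y_true, y_pred)  # 0.90, 0.0, 1.0
```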

Limitations of current studies

Dataset size

This review has identified several limitations of existing studies that have applied ML to PD diagnosis. Firstly, the number of participants with PD in each database is often relatively small; for example, the total number of subjects may be less than 50 (Sakar and Kursun 2010; Bhattacharya and Bhatia 2010; Guo et al. 2010; Åström and Koker 2011; Ramani and Sivagami 2011; Yadav et al. 2012; Mandal and Sairam 2014; Gharehchopogh and Mohammadi 2013; Rustempasic and Can 2013; Sharma and Giri 2014; Olanrewaju et al. 2014; Peker et al. 2015; Gök 2015; Chen et al. 2016; Avci and Dogantekin 2016; Dinesh and He 2017; Caliskan et al. 2017; Parisi et al. 2018; Haq et al. 2018; Ali et al. 2019; Mostafa et al. 2019; Lahmiri and Shmuel 2019; Haq et al. 2019; Senturk 2020; Soumaya et al. 2021; Quan et al. 2021; Rizvi et al. 2020; Abayomi-Alli et al. 2020; Govindu and Palwe 2023; Khaskhoussy and Ayed 2023; Dheer et al. 2023; Ribeiro et al. 2019; Taleb et al. 2019; Tahir and Manap 2012; Wahid et al. 2015; Shetty and Rao 2016; Oh et al. 2020; Shaban 2021; Loh et al. 2021; Motin et al. 2022; Chawla et al. 2023; Igene et al. 2023). Only eight of the included articles include over 500 subjects (Prashanth et al. 2014; Oliveira and Castelo-Branco 2015; Zhang et al. 2019; Camacho et al. 2023; Priyadharshini et al. 2024; Tsai et al. 2023; Zhao et al. 2022; Bhandari et al. 2023). The small data size may limit the performance of the ML models.

Black box nature of ML models

Another challenge is the black-box nature of ML models, which limits the clinical application of ML in PD diagnosis. ML algorithms such as SVM, and DL models such as CNN and RNN, are all examples of black-box models. These models contain a large number of parameters, making it difficult to interpret how they arrive at their decisions. This makes it challenging to understand why a particular diagnosis is being made, and this lack of transparency can be a significant barrier to the adoption of ML in clinical environments. The diagnosis of PD is a safety-critical medical task, where diagnostic accuracy is essential for the patient’s treatment and management. Therefore, ML should not only be used as a decision-support tool; the models themselves must be interpretable to medical experts and patients. Interpretable ML models allow doctors and patients to understand the reasoning behind the model’s decision-making process, thereby increasing their trust in its accuracy and reliability: they provide insights into which input features have the most significant impact on the diagnosis, the relationship between the input features and the output, and how the model arrives at its final decision.

No standardization of validation

This review has identified a lack of standardization of validation. Included studies used different validation methods, including k-fold cross-validation and hold-out validation, which makes comparisons between studies difficult. More specifically, if a study claims to outperform the state-of-the-art (SOTA), the proposed methodology should at least replicate the other SOTA methods on the same dataset, with the same experimental setup and the exact same validation mechanism. Otherwise, the claim is unconvincing, as the better performance may stem from dataset bias or the validation mechanism rather than from the ML algorithm design.
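How strongly a performance estimate depends on the validation scheme can be shown with a toy sketch (a deliberately simple nearest-centroid classifier on synthetic data; all names and numbers are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_acc(X_tr, y_tr, X_te, y_te):
    """Accuracy of a minimal nearest-centroid classifier."""
    c0, c1 = X_tr[y_tr == 0].mean(0), X_tr[y_tr == 1].mean(0)
    pred = (np.linalg.norm(X_te - c1, axis=1) <
            np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

# Toy two-class data (e.g. PD vs. HC feature vectors).
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(1, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

# Hold-out: the estimate hinges on one particular 75/25 split...
idx = rng.permutation(100)
tr, te = idx[:75], idx[75:]
holdout = nearest_centroid_acc(X[tr], y[tr], X[te], y[te])

# ...whereas 5-fold cross-validation averages over rotating test folds.
folds = np.array_split(rng.permutation(100), 5)
cv = np.mean([nearest_centroid_acc(np.delete(X, f, 0), np.delete(y, f),
                                   X[f], y[f]) for f in folds])
```

The two numbers generally differ, which is why comparing a hold-out score from one paper against a cross-validated score from another is not a fair comparison.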

Lack of medical experts’ participation

Most studies follow a typical sequence. First, different modality data are collected and processed from PD and healthy control participants. Next, clinical experts manually annotate the dataset. Finally, the ML model is trained to classify patients and healthy controls. Thus, clinicians only contribute to the data label annotation, which limits the performance of the ML model building. ML scientists and medical experts should collaborate at all stages to provide feedback on the model performance and give valuable suggestions on model selection and explanation.

Bias risk and trustworthiness of ML-based PD diagnosis

Despite the growing body of ML research on PD diagnosis, only 28 of the 117 reviewed studies were assessed as having an overall low risk of bias in our PROBAST evaluation. Common issues include small sample sizes, lack of external validation, unclear blinding procedures, and potential data leakage during feature selection. These limitations significantly impact the reliability and generalizability of ML models. A model that performs well within a single cohort may still fail when applied to external or real-world clinical settings. Thus, confidence in ML-based diagnostic tools depends not only on predictive performance but also on methodological rigor and transparency. A high risk of bias compromises both reproducibility and the level of clinical trust necessary for real-world deployment.

No standardized ML approaches

Our systematic review reveals that there is currently no standardized ML approach for the diagnosis of PD. One of the key obstacles to achieving generalizable and reproducible ML models is the lack of standardization across publicly available datasets. This issue significantly hinders fair model comparison, reproducibility, and clinical translation. First, there is considerable heterogeneity in data acquisition protocols. Different datasets are collected using varying configurations; for example, EEG sampling rates may differ (e.g., 128 Hz vs. 1024 Hz), MRI scans may be acquired using different field strengths (e.g., 1.5T vs. 3T), and voice recordings may be captured under inconsistent environmental conditions. These discrepancies lead to variations in signal quality and frequency content, which directly affect feature extraction and model performance. Second, substantial variability exists in patient cohorts and diagnostic labelling. Datasets differ in inclusion criteria (e.g., drug-naïve vs. medicated patients), disease stage distributions, age ranges, and definitions of control groups. Furthermore, diagnostic labels are often assigned based on different clinical criteria, such as the MDS-UPDRS, Hoehn and Yahr staging, or clinician judgment, leading to label inconsistency and reduced comparability. Third, inconsistencies in preprocessing and feature engineering pipelines further complicate model standardization. Many studies employ custom workflows, such as filtering, artifact removal, and dimensionality reduction, that are often poorly documented and difficult to reproduce. In some cases, parameter tuning may even occur on the test set, introducing additional bias into performance evaluation. Finally, differences in data modalities and formats add to the complexity. Multimodal datasets often vary in terms of synchronization and alignment between modalities. 
Some datasets provide only raw signals, while others include derived features or lack essential metadata, making it challenging to develop standardized multimodal fusion methods.

Future research directions

Explainable artificial intelligence (XAI)

XAI aims to provide human-understandable explanations of a black-box model’s decision process (Zhang et al. 2022). The XAI approach has the potential to generate improved models and verified predictions. Moreover, an XAI system can help clinicians and researchers understand the reasoning behind an AI system’s decision and identify potential biases or limitations in the model. This can improve the accuracy and reliability of PD diagnosis, with important implications for patient outcomes.
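As a concrete illustration, permutation importance is one widely used model-agnostic XAI technique. The sketch below (toy data and model of our own choosing, assuming scikit-learn is available) ranks input features by how much shuffling each one degrades the classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy stand-in for a PD/HC classifier on tabular (e.g. voice) features.
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Permutation importance: how much does the score drop when one
# feature's values are shuffled? Larger drops = more influential.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

A clinician can then inspect whether the highest-ranked features correspond to clinically plausible measurements, rather than artefacts of the dataset.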

Data augmentation

Data augmentation is a method for generating synthetic data. As the datasets used for ML-based PD diagnosis are relatively small, data augmentation is a feasible approach to increase dataset size and further improve the performance and generalisation of ML models. Different data modalities require different augmentation methods. Generative Adversarial Networks (GANs) are a promising method that has mostly been applied to generating image data (Yi et al. 2019). GANs can create diverse and realistic synthetic data that capture the underlying data distribution, which reduces overfitting in ML models by increasing the diversity of the training data. In the future, using GANs to generate voice, neuroimaging, handwriting, gait, and EEG data for PD diagnosis is also achievable.
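Training a GAN is beyond a short sketch, but simpler augmentations illustrate the idea for 1-D signals such as gait or EEG segments (hypothetical parameter values; noise jitter and time shifts are common choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(signal, n_copies=3, noise_std=0.05, max_shift=10):
    """Create synthetic variants of a 1-D signal (e.g. a gait or EEG
    segment) via random time shifts plus Gaussian noise jitter."""
    out = []
    for _ in range(n_copies):
        shifted = np.roll(signal, rng.integers(-max_shift, max_shift + 1))
        out.append(shifted + rng.normal(0, noise_std, size=signal.shape))
    return np.stack(out)

x = np.sin(np.linspace(0, 4 * np.pi, 512))  # one 512-sample segment
batch = augment(x)                          # three synthetic variants
```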

Transfer learning

The datasets currently used to diagnose PD are often too small for ML; transfer learning could therefore be an effective approach to improve generalisation as well as training efficiency and speed. When working with a small dataset, there is a higher risk of overfitting, where the model becomes too specialised to the training data and performs poorly on unseen data. To address this issue, transfer learning leverages a pre-trained model that has learned features from a large dataset and transfers that knowledge to the smaller dataset. Additionally, transfer learning can save valuable time and computational resources by reducing the amount of training required for a new model: instead of training from scratch, an existing model is fine-tuned on the small dataset, which is a more efficient and quicker process (Kaur et al. 2021).
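The idea can be sketched conceptually (a deliberately simplified stand-in, assuming scikit-learn: a PCA fitted on a large "source" dataset plays the role of a frozen pre-trained backbone, and only a small classifier head is trained on the target data):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# "Source" task: a large dataset used to learn a feature representation.
X_src, _ = make_classification(n_samples=2000, n_features=50, random_state=0)
encoder = PCA(n_components=10).fit(X_src)   # stands in for a frozen backbone

# "Target" task: a small PD-sized dataset reuses the frozen representation,
# and only a lightweight classifier head is trained on it.
X_tgt, y_tgt = make_classification(n_samples=60, n_features=50, random_state=1)
head = LogisticRegression(max_iter=1000).fit(encoder.transform(X_tgt), y_tgt)
```

In practice the backbone would be a deep network pre-trained on a large corpus (e.g. natural images for MRI slices), with its weights frozen or fine-tuned at a low learning rate.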

Federated learning

ML models often require large amounts of user data. However, collecting data for PD poses challenges, since data are gathered by individual hospitals and organisations and sharing them may be hindered. Federated learning presents a potential solution for developing models that identify PD biomarkers and patterns using data from various sources, such as medical records, clinical studies, and wearable devices. With federated learning, different parties can collaborate to create a shared model without sharing their data. This approach also facilitates the use of large datasets without centralising the data, which is essential when working with sensitive patient information. Instead, data remain on local devices, and the model is trained by aggregating information across multiple devices without transferring data. Federated learning thus protects patient privacy while enabling the development of accurate models (Rieke et al. 2020).
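The core aggregation step of federated averaging (FedAvg) can be sketched as follows (a minimal illustration with our own names and toy numbers; real systems add secure aggregation and many communication rounds):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """One FedAvg round: average client model parameters, weighted by
    each client's (e.g. hospital's) local dataset size. Only parameter
    vectors are exchanged; raw patient data never leave the clients."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hospitals each hold a locally trained weight vector.
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes   = [100, 100, 200]
global_w = fed_avg(weights, sizes)   # -> array([3.5, 4.5])
```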

Multi-modality

The multi-modality approach is a promising direction, as it can integrate information from multiple views and perform better than a single modality (Makarious et al. 2022). Single-modality learning is prone to overfitting, especially when the data samples are limited, which is often the case with PD datasets that are small and prone to noise. By incorporating additional modalities, such as genetic analysis, neuroimaging, or EEG, the model can compensate for the lack of data, enhancing its ability to learn from different types of information and thereby improving diagnostic accuracy. Furthermore, the clinical manifestations of PD vary across patients, and a single modality may fail to capture these differences comprehensively. Multi-modal data can help the model generalise better across different patient groups. Additionally, multi-modal models can remain robust even when one modality is missing or of poor quality, making predictions that are not overly dependent on any single modality. However, the acquisition of diverse data presents challenges, particularly in data availability, quality, and integration. Developing datasets specifically designed for multi-modal research remains a significant hurdle, and the standardization of data collection protocols across modalities is necessary to ensure consistency. In the future, integrating diverse modalities such as genetic data, blood samples, neuroimaging, voice, handwriting, gait analysis, and EEG into a unified ML framework could significantly improve PD diagnosis, leading to earlier and more accurate diagnoses, better patient stratification, and personalized treatments, ultimately enhancing patient outcomes.
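The simplest fusion strategy, late fusion, averages per-modality predicted probabilities. The sketch below uses two synthetic "modalities" that observe the same latent disease signal through independent noise (stand-ins for, e.g., voice and gait features; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic(X, y, lr=0.3, epochs=300):
    """Plain gradient-descent logistic regression for one modality."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# Two modalities = latent signal + independent noise + a nuisance feature.
n = 200
latent = rng.normal(size=n)
voice = np.c_[latent + rng.normal(size=n), rng.normal(size=n)]
gait = np.c_[latent + rng.normal(size=n), rng.normal(size=n)]
y = (latent > 0).astype(float)

w_voice, w_gait = train_logistic(voice, y), train_logistic(gait, y)

# Late fusion: average the per-modality probabilities before thresholding.
p_fused = 0.5 * sigmoid(voice @ w_voice) + 0.5 * sigmoid(gait @ w_gait)

acc_voice = ((sigmoid(voice @ w_voice) > 0.5) == (y > 0.5)).mean()
acc_fused = ((p_fused > 0.5) == (y > 0.5)).mean()
print(acc_voice, acc_fused)
```

Averaging also degrades gracefully: if one modality is absent, the fused prediction falls back to the remaining modality's probability.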

Open source culture and standard protocols

To promote the development of ML in the diagnosis of PD, researchers should proactively disclose the full source code and experimental details used in their studies. This includes all necessary steps of data preprocessing, model evaluation, hyperparameter tuning, and any pre-trained models, ensuring that other researchers can accurately reproduce and validate experimental results. Additionally, researchers should create standardized data collection and evaluation protocols, allowing all methods to be assessed and compared on a fair and uniform basis. At the same time, academic journals should implement stricter peer-review processes, particularly focusing on the reproducibility of the submitted works. Reviewers need to be specifically trained to ensure they can thoroughly assess whether the provided materials are sufficient to replicate the study results. By taking these measures, the transparency and reliability of research can be enhanced, facilitating scientific progress and technological innovation in the field.

Ethical concerns

Ethics are important in applying ML and DL to PD diagnosis. First, data privacy and security are major issues, especially in the medical field, where patient health data contains sensitive information. Unauthorised collection and use of data may lead to privacy breaches and even malicious exploitation. Secondly, fairness is a critical concern. If training data is biased, the model may produce inaccurate diagnostic results for certain groups (e.g., specific ages, genders, or ethnicities), exacerbating health inequalities. Moreover, DL models are often seen as “black boxes”, lacking transparency in their decision-making processes. Medical professionals may be hesitant to trust and adopt AI systems if they cannot understand how diagnoses are made. Finally, as AI becomes more integrated into healthcare, the issue of accountability becomes increasingly complex. If an AI system makes a wrong diagnosis leading to harm, who should take responsibility? The developers, the healthcare institution, or the AI itself? These issues need to be carefully addressed within an ethical framework. Future research should focus on developing methods to enhance the interpretability and transparency of AI systems, establishing guidelines for data privacy and security, and creating clear accountability structures. Collaborative efforts between AI researchers, healthcare professionals, and ethicists are essential to ensure that these technologies are implemented responsibly and fairly, mitigating potential risks and improving patient outcomes.

More complete model evaluation

In future research, it is important not to rely solely on traditional evaluation metrics such as accuracy, AUC, and precision. While these metrics are undoubtedly valuable, they may not fully capture model performance, particularly in tasks involving ordinal or continuous outcomes. To address this limitation, we advocate for the complementary use of more agnostic and unified evaluation measures, such as the Rank Graduation Accuracy (RGA) proposed by Giudici and Raffinetti (2025). RGA is applicable across binary, ordinal, and continuous predictive settings, offering a more generalizable and consistent framework for comparing models under diverse data conditions and outcome types. Incorporating such metrics alongside traditional ones could significantly improve the fairness, robustness, and clinical relevance of performance evaluation in ML-based disease diagnosis.
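The appeal of such measures is that they operate on ranks rather than raw scores, so the same formula applies to binary, ordinal, and continuous targets. RGA itself follows the definition in Giudici and Raffinetti (2025); purely to illustrate the rank-based idea, the sketch below computes Spearman's rank correlation (a different, classical statistic, not the RGA formula) with tie-aware ranks:

```python
import numpy as np

def rankdata(a):
    """Ranks starting at 1; tied values share their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    for v in np.unique(a):          # average ranks within each tie group
        m = a == v
        ranks[m] = ranks[m].mean()
    return ranks

def spearman(y_true, y_pred):
    """Rank correlation: works unchanged for continuous or binary outcomes."""
    r1, r2 = rankdata(y_true), rankdata(y_pred)
    r1 -= r1.mean(); r2 -= r2.mean()
    return (r1 @ r2) / np.sqrt((r1 @ r1) * (r2 @ r2))

# Continuous severity scores vs predictions (hypothetical values).
s_cont = spearman([1.0, 3.0, 2.0, 5.0, 4.0], [1.2, 2.9, 2.4, 4.8, 4.1])
print(s_cont)  # 1.0: prediction ranks match outcome ranks exactly

# The identical function applied to binary labels vs predicted probabilities.
s_bin = spearman([0, 0, 1, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.9])
print(s_bin)
```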

Beyond predictive performance, model interpretability is another critical yet often underemphasized component of diagnostic model evaluation. Traditional metrics such as accuracy and AUC provide insights into how well a model performs, but offer little information about why it makes certain predictions. In medical applications, particularly in the diagnosis of complex neurodegenerative diseases such as PD, understanding the rationale behind model decisions is essential for building clinical trust, ensuring transparency, and facilitating adoption in practice. Despite its importance, explainability remains underexplored in many published studies. As shown in Table 4, only a limited number of works incorporate explainability techniques, and among those that do, there is little consistency in the methods used. Furthermore, the lack of open-source implementations prevents systematic comparison across models. To address these gaps, we encourage future research to integrate interpretability as a core component of model development and validation. In particular, model-agnostic explainability methods, which can be applied regardless of the underlying algorithm, should be prioritized, as they enable fairer and more standardized comparisons (Calzarossa et al. 2025). The adoption of such frameworks may also facilitate the identification of clinically relevant biomarkers, thereby strengthening the link between computational models and real-world medical applications.
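Model-agnostic explainability can be quite lightweight. Permutation importance, for instance, treats the model purely as a prediction function and measures the accuracy lost when one feature's link to the label is broken. A minimal sketch on a toy classifier (all data and the `predict` function are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

def permutation_importance(predict, X, y, n_repeats=10):
    """Model-agnostic importance: mean accuracy drop when one feature
    column is shuffled, breaking its association with the label."""
    base = (predict(X) == y).mean()
    drops = []
    for j in range(X.shape[1]):
        d = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            d.append(base - (predict(Xp) == y).mean())
        drops.append(np.mean(d))
    return np.array(drops)

# Toy "model": feature 0 dominant, feature 1 weak, feature 2 unused.
w_true = np.array([2.0, 0.5, 0.0])
X = rng.normal(size=(300, 3))
y = (X @ w_true > 0).astype(int)
predict = lambda X: (X @ w_true > 0).astype(int)

imp = permutation_importance(predict, X, y)
print(imp)  # largest drop for feature 0; exactly zero for unused feature 2
```

Because it only needs a `predict` callable, the same routine can compare a CNN, an SVM, and a gradient-boosted tree on identical footing.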

In addition to performance and interpretability, robustness and security represent two further dimensions that are essential for the safe deployment of diagnostic models but are frequently overlooked. Robustness refers to a model’s ability to maintain stable performance when faced with noise, missing data, or domain shifts — all of which are common in real-world clinical settings. Security, by contrast, concerns the model’s resistance to adversarial examples or malicious attacks that could compromise its output. These aspects are rarely evaluated in existing studies, often due to a lack of reproducibility and the absence of standardized assessment tools. The SAFE AI framework (Babaei et al. 2025), for example, introduces the Rank Graduation Box as a structured, model-agnostic approach to evaluating robustness and security. We therefore recommend that future research explicitly incorporate robustness and security testing into the model evaluation process. Doing so will be crucial for developing trustworthy and clinically deployable AI systems, particularly in high-stakes domains such as healthcare.
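A basic robustness probe need not wait for a full framework such as the Rank Graduation Box: one can simply re-score a fixed model under increasing corruption of the test inputs. The sketch below (toy linear model and synthetic data, not a validated clinical pipeline) checks both additive noise and mean-imputed missingness:

```python
import numpy as np

rng = np.random.default_rng(5)

def accuracy(w, X, y):
    return (((X @ w) > 0).astype(int) == y).mean()

# A fixed, already-fitted linear model and a clean held-out test set.
w = np.array([1.5, -1.0, 0.8, 0.0])
X = rng.normal(size=(500, 4))
y = ((X @ w) > 0).astype(int)

report = {"clean": accuracy(w, X, y)}

# Probe 1: additive Gaussian noise (e.g. sensor degradation).
for sigma in (0.1, 0.5, 1.0):
    Xn = X + rng.normal(scale=sigma, size=X.shape)
    report[f"noise sigma={sigma}"] = accuracy(w, Xn, y)

# Probe 2: 20% of entries missing, filled with the feature mean.
mask = rng.random(X.shape) < 0.2
Xm = X.copy()
Xm[mask] = np.broadcast_to(X.mean(axis=0), X.shape)[mask]
report["20% missing (mean-imputed)"] = accuracy(w, Xm, y)

for name, acc in report.items():
    print(f"{name}: {acc:.3f}")
```

Reporting such a degradation curve alongside clean-test accuracy gives reviewers a direct view of how brittle a diagnostic model is.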

Bias mitigation

To improve the reliability and clinical applicability of ML models for PD diagnosis, future research must systematically address the sources of bias identified by tools such as PROBAST. This includes implementing rigorous dataset selection with transparent inclusion and exclusion criteria, clearly reporting participant selection logic, and accounting for demographic diversity such as age, disease stage, and comorbidities. Feature selection should be strictly separated from model evaluation to prevent information leakage, an issue commonly caused by selecting features on the entire dataset prior to train-test splitting. Employing nested cross-validation can help mitigate this risk. External validation using independent datasets from different geographic, demographic, or temporal contexts remains essential for demonstrating model generalizability, yet is still underutilized. Moreover, we encourage researchers to explicitly report how each PROBAST domain is addressed, either in the methods section or supplementary materials, to enhance transparency and facilitate cross-study comparisons. Finally, close collaboration with clinical experts is crucial to identifying potential sources of bias in preprocessing and label interpretation, reducing cognitive bias, and ensuring clinical relevance. Incorporating these practices can significantly improve the transparency, robustness, and translational potential of ML-based diagnostic tools for PD.
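The leakage risk described above is easy to reproduce: on pure-noise data, selecting features on the full dataset before cross-validation yields optimistic accuracy, while re-selecting features inside each training fold (as nested cross-validation does) returns the honest, near-chance estimate. A minimal sketch with a nearest-centroid classifier (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(6)

def select_top_k(X, y, k=10):
    """Rank features by absolute correlation with the label."""
    corr = np.abs(np.corrcoef(X.T, y)[:-1, -1])
    return np.argsort(corr)[-k:]

def centroid_classify(Xtr, ytr, Xte):
    """Nearest-centroid classifier on the selected features."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return ((((Xte - c1) ** 2).sum(1)) < (((Xte - c0) ** 2).sum(1))).astype(int)

# Pure-noise data: no feature is truly predictive; chance accuracy is 0.5.
n, p = 50, 500
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)
folds = np.array_split(rng.permutation(n), 5)

feat_leaky = select_top_k(X, y)     # WRONG: selected before splitting
acc_leaky, acc_proper = [], []
for f in folds:
    tr = np.setdiff1d(np.arange(n), f)
    acc_leaky.append((centroid_classify(X[tr][:, feat_leaky], y[tr],
                                        X[f][:, feat_leaky]) == y[f]).mean())
    feat = select_top_k(X[tr], y[tr])   # RIGHT: re-selected per fold
    acc_proper.append((centroid_classify(X[tr][:, feat], y[tr],
                                         X[f][:, feat]) == y[f]).mean())

print(np.mean(acc_leaky), np.mean(acc_proper))
```

The inflated "leaky" estimate on data with no signal is exactly the failure mode PROBAST's analysis domain is designed to catch.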

Conclusions

This paper reviews current trends in applying ML technologies to PD diagnosis. In this review, studies are categorised by the data modalities used in the experiments, including neuroimaging, voice, handwriting, gait, and EEG. ML has shown great potential to assist PD diagnosis, and research findings also show that it can serve as a decision-support tool to help doctors screen, detect, and diagnose PD effectively. Research on applying ML to PD diagnosis still faces many limitations and challenges. We have discussed these issues and proposed several future directions, including the use of explainable AI for model interpretability, data augmentation techniques to generate synthetic data, transfer learning to leverage pre-trained models, federated learning to protect data privacy, and multi-modality approaches to integrate diverse information from different modalities. Moreover, a more comprehensive model evaluation, going beyond traditional metrics such as accuracy and AUC, is essential for ensuring robust, fair, and clinically relevant results. Bias mitigation strategies should also be incorporated to tackle issues such as dataset imbalance, underrepresentation of subgroups, and algorithmic bias. The case studies on five data modalities show that some research papers in this field may face reproducibility issues; open-source code and reproduced results are therefore essential and should be emphasized. Additionally, an ethical framework should be established to ensure these technologies are implemented responsibly and fairly. This comprehensive review aims to reduce the gap between AI experts and medical professionals and to help future researchers design ML-based PD diagnosis applications.

Acknowledgements

This research is supported by Ningbo Science and Technology Innovation 2025 Major Project 2022Z126. A.H. is awarded the Clinical Academic Research Partnership Grant by UK Research and Innovation (Grant MR/T005580/1) and has received funding from the National Institutes of Health/NIA, USA (Grant reference NIH1R56AG074467-01).

Appendix

We have included a meta-analysis for the voice data modality, which encompasses the effect sizes and relevant statistical trends. For both sensitivity and specificity, we perform the meta-analysis using the meta library in the R language. The forest plots are shown below (Figs. 14 and 15) and display the variability in sensitivity and specificity across multiple studies of the voice modality.
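The fixed-effect inverse-variance pooling underlying such forest plots can be sketched in a few lines. The per-study counts below are hypothetical placeholders, not the values behind the figures, and only the logit-transformed fixed-effect step is shown (R's meta library additionally fits random-effects models and heterogeneity statistics):

```python
import numpy as np

# Hypothetical per-study counts: true positives and false negatives.
tp = np.array([45, 80, 30, 120])
fn = np.array([5, 12, 8, 15])

sens = tp / (tp + fn)                    # per-study sensitivity
logit = np.log(sens / (1 - sens))
var = 1 / tp + 1 / fn                    # variance of the logit proportion
w = 1 / var                              # inverse-variance weights

# Fixed-effect pooled estimate with a 95% confidence interval,
# computed on the logit scale and transformed back.
pooled_logit = (w * logit).sum() / w.sum()
pooled_sens = 1 / (1 + np.exp(-pooled_logit))
ci = pooled_logit + np.array([-1.96, 1.96]) / np.sqrt(w.sum())
ci_sens = 1 / (1 + np.exp(-ci))

print(round(pooled_sens, 3), np.round(ci_sens, 3))
```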

Fig. 14 Forest plot for sensitivity across voice modality studies

Fig. 15 Forest plot for specificity across voice modality studies

Author contributions

J.Z., Y.Z. and Y.W. conceived and designed the study. J.Z. and Y.Z. independently screened and reviewed all included articles. J.Z., Y.Z. and Y.W. drafted the manuscript (Y.Z. contributed the abstract, introduction, methods, results, discussion and conclusion sections, J.Z. contributed to the results and discussion sections. Y.Z. significantly contributed to the figures and tables. Y.W. contributed to the methods, discussion and conclusion sections). Y.W., A.H. and B.W. secured the funding. Y.W., A.H. and T.D. supervised the project. Y.W., A.H., B.W., T.D., W.F. and W.X. contributed significant amendments to the final manuscript.

Data availability

The datasets used in this study are publicly available. The Voice dataset can be accessed at https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple%20+types+of+sound+recordings, the Gait dataset can be accessed at https://physionet.org/content/gaitpdb/1.0.0/, the EEG dataset can be accessed at https://openneuro.org/datasets/ds002778/versions/1.0.5, the Handwriting dataset can be accessed at https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/, and the MRI dataset can be accessed at https://fcon_1000.projects.nitrc.org/indi/retro/parkinsons.html.

Declarations

Conflict of interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

J. Zhang and Y. Zhang have contributed equally to this work.

References

  1. Abdullah SM, Abbas T, Bashir MH, Khaja IA, Ahmad M, Soliman NF, El-Shafai W (2023) Deep transfer learning based parkinson’s disease detection using optimized feature selection. IEEE Access 11:3511–3524 [Google Scholar]
  2. Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, Abayomi-Alli A (2020) Bilstm with data augmentation using interpolation methods to improve early detection of parkinson disease. In: 2020 15th Conference on Computer Science and Information Systems (FedCSIS), pp. 371–380. IEEE
  3. Abdulhay E, Arunkumar N, Narasimhan K, Vellaiappan E, Venkatraman V (2018) Gait and tremor investigation using machine learning techniques for the diagnosis of parkinson disease. Futur Gener Comput Syst 83:366–373 [Google Scholar]
  4. Aversano L, Bernardi ML, Cimitile M, Pecori R (2020) Early detection of parkinson disease using deep neural networks on gait dynamics. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE
  5. Ali L, Chakraborty C, He Z, Cao W, Imrana Y, Rodrigues JJ (2023) A novel sample and feature dependent ensemble approach for parkinson’s disease detection. Neural Comput Appl 35(22):15997–16010 [Google Scholar]
  6. Avci D, Dogantekin A (2016) An expert diagnosis system for parkinson disease based on genetic algorithm-wavelet kernel-extreme learning machine. Parkinson’s disease 2016(1):5264743 [Google Scholar]
  7. Anjum MF, Dasgupta S, Mudumbai R, Singh A, Cavanagh JF, Narayanan NS (2020) Linear predictive coding distinguishes spectral eeg features of parkinson’s disease. Parkinsonism & related disorders 79:79–85 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Alhussen A, Haq MA, Khan AA, Mahendran RK, Kadry S (2025) Xai-racapsnet: Relevance aware capsule network-based breast cancer detection using mammography images via explainability o-net roi segmentation. Expert Syst Appl 261:125461 [Google Scholar]
  9. Åström F, Koker R (2011) A parallel neural network approach to prediction of parkinson’s disease. Expert Syst Appl 38(10):12470–12474 [Google Scholar]
  10. Akila B, Nayahi JJV (2024) Parkinson classification neural network with mass algorithm for processing speech signals. Neural Comput Appl 36(17):10165–10181 [Google Scholar]
  11. Aşuroğlu T, Oğul H (2022) A deep learning approach for parkinson’s disease severity assessment. Heal Technol 12(5):943–953 [Google Scholar]
  12. Ali L, Zhu C, Zhang Z, Liu Y (2019) Automated detection of parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE journal of translational engineering in health and medicine 7:1–10 [Google Scholar]
  13. Bhattacharya I, Bhatia MPS (2010) Svm classification to distinguish parkinson disease patients. In: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, pp. 1–6
  14. Balaji E, Brindha D, Elumalai VK, Vikrama R (2021) Automatic and non-invasive parkinson’s disease diagnosis and severity rating using lstm network. Appl Soft Comput 108:107463 [Google Scholar]
  15. Babaei G, Giudici P, Raffinetti E (2025) A rank graduation box for safe ai. Expert Syst Appl 259:125239 [Google Scholar]
  16. Badea L, Onu M, Wu T, Roceanu A, Bajenaru O (2017) Exploring the reproducibility of functional connectivity alterations in parkinson’s disease. PLoS ONE 12(11):0188196 [Google Scholar]
  17. Bhandari N, Walambe R, Kotecha K, Kaliya M (2023) Integrative gene expression analysis for the diagnosis of parkinson’s disease using machine learning and explainable ai. Comput Biol Med 163:107140 [DOI] [PubMed] [Google Scholar]
  18. Chakraborty S, Aich S, Kim H-C (2020) Detection of parkinson’s disease from 3t t1 weighted mri scans using 3d convolutional neural network. Diagnostics 10(6):402 [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Cano J-R (2013) Analysis of data complexity measures for classification. Expert Syst Appl 40(12):4820–4831 [Google Scholar]
  20. Celik G, Başaran E (2023) Proposing a new approach based on convolutional neural networks and random forest for the diagnosis of parkinson’s disease from speech signals. Appl Acoust 211:109476 [Google Scholar]
  21. Caliskan A, Badem H, Basturk A, Yuksel M (2017) Diagnosis of the parkinson disease by using deep neural network classifier. IU-Journal of Electrical & Electronics Engineering 17(2):3311–3318 [Google Scholar]
  22. Calzarossa MC, Giudici P, Zieni R (2025) An assessment framework for explainable ai with applications to cybersecurity. Artif Intell Rev 58(5):150 [Google Scholar]
  23. Coelho BFO, Massaranduba ABR, Santos Souza CA, Viana GG, Brys I, Ramos RP (2023) Parkinson’s disease effective biomarkers based on hjorth features improved by machine learning. Expert Syst Appl 212:118772 [Google Scholar]
  24. Chawla P, Rana SB, Kaur H, Singh K, Yuvaraj R, Murugappan M (2023) A decision support system for automated diagnosis of parkinson’s disease from eeg using fawt and entropy features. Biomed Signal Process Control 79:104116 [Google Scholar]
  25. Chen H-L, Wang G, Ma C, Cai Z-N, Liu W-B, Wang S-J (2016) An efficient hybrid kernel extreme learning machine approach for early diagnosis of parkinson’s disease. Neurocomputing 184:131–144 [Google Scholar]
  26. Camacho M, Wilms M, Mouches P, Almgren H, Souza R, Camicioli R, Ismail Z, Monchi O, Forkert ND (2023) Explainable classification of parkinson’s disease using deep learning trained on a large multi-center database of t1-weighted mri datasets. NeuroImage Clinical 38:103405 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Chen B, Xu M, Yu H, He J, Li Y, Song D, Fan GG (2023) Detection of mild cognitive impairment in parkinson’s disease using gradient boosting decision tree models based on multilevel dti indices. J Transl Med 21(1):310 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Dinesh A, He J (2017) Using machine learning to diagnose parkinson’s disease from voice recordings. In: 2017 IEEE MIT Undergraduate Research Technology Conference (URTC), pp. 1–4. IEEE
  29. Drotár P, Mekyska J, Rektorová I, Masarová L, Smékal Z, Faundez-Zanuy M (2014) Decision support framework for parkinson’s disease based on novel handwriting markers. IEEE Trans Neural Syst Rehabil Eng 23(3):508–516 [DOI] [PubMed] [Google Scholar]
  30. Drotár P, Mekyska J, Rektorová I, Masarová L, Smékal Z, Faundez-Zanuy M (2016) Evaluation of handwriting kinematics and pressure for differential diagnosis of parkinson’s disease. Artif Intell Med 67:39–46 [DOI] [PubMed] [Google Scholar]
  31. Drotár P, Mekyska J, Smékal Z, Rektorová I, Masarová L, Faundez-Zanuy M (2015) Contribution of different handwriting modalities to differential diagnosis of parkinson’s disease. In: 2015 IEEE International Symposium on Medical Measurements and Applications (MeMeA) Proceedings, pp. 344–348. IEEE
  32. Diaz M, Moetesum M, Siddiqi I, Vessio G (2021) Sequence-based dynamic handwriting analysis for parkinson’s disease detection with one-dimensional convolutions and bigrus. Expert Syst Appl 168:114405 [Google Scholar]
  33. Dheer S, Poddar M, Pandey A, Kalaivani S (2023) Parkinson’s disease detection using acoustic features from speech recordings. In: 2023 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), pp. 1–4. IEEE
  34. Dai Y, Tang Z, Wang Y et al (2019) Data driven intelligent diagnostics for parkinson’s disease. Ieee access 7:106941–106950 [Google Scholar]
  35. El Maachi I, Bilodeau G-A, Bouachir W (2020) Deep 1d-convnet for accurate parkinson disease detection and severity prediction from gait. Expert Syst Appl 143:113075 [Google Scholar]
  36. Erdaş ÇB, Sümer E (2022) A deep learning method to detect parkinson’s disease from mri slices. SN Computer Science 3(2):120 [Google Scholar]
  37. Fang H, Gong C, Zhang C, Sui Y, Li L (2020) Parkinsonian chinese speech analysis towards automatic classification of parkinson’s disease. In: Machine Learning for Health, pp. 114–125. PMLR
  38. Frenkel-Toledo S, Giladi N, Peretz C, Herman T, Gruendlinger L, Hausdorff JM (2005) Effect of gait speed on gait rhythmicity in parkinson’s disease: variability of stride time and swing time respond differently. J Neuroeng Rehabil 2:1–7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. circulation 101(23):215–220 [Google Scholar]
  40. Guo P-F, Bhattacharya P, Kharma N (2010) Advances in detecting parkinson’s disease. In: Medical Biometrics: Second International Conference, ICMB 2010, Hong Kong, China, June 28-30, 2010. Proceedings 2, pp. 306–314. Springer
  41. Gharehchopogh FS, Mohammadi P (2013) A case study of parkinson’s disease diagnosis using artificial neural networks. International Journal of Computer Applications 73(19):1–6 [Google Scholar]
  42. Gil-Martín M, Montero JM, San-Segundo R (2019) Parkinson’s disease detection from drawing movements using convolutional neural networks. Electronics 8(8):907 [Google Scholar]
  43. Goceri E (2024) Vision transformer based classification of gliomas from histopathological images. Expert Syst Appl 241:122672 [Google Scholar]
  44. Goceri E (2025) An efficient network with cnn and transformer blocks for glioma grading and brain tumor classification from mris. Expert Syst Appl 268:126290 [Google Scholar]
  45. Gök M (2015) An ensemble of k-nearest neighbours algorithm for detection of parkinson’s disease. Int J Syst Sci 46(6):1108–1112 [Google Scholar]
  46. Govindu A, Palwe S (2023) Early detection of parkinson’s disease using machine learning. Procedia Computer Science 218:249–261 [Google Scholar]
  47. Giudici P, Raffinetti E (2025) Rga: a unified measure of predictive accuracy. Adv Data Anal Classif 19(1):67–93 [Google Scholar]
  48. Gunduz H (2019) Deep learning-based parkinson’s disease classification using vocal feature sets. Ieee access 7:115540–115551 [Google Scholar]
  49. Hireš M, Gazda M, Drotár P, Pah ND, Motin MA, Kumar DK (2022) Convolutional neural network ensemble for parkinson’s disease detection from voice recordings. Comput Biol Med 141:105021 [DOI] [PubMed] [Google Scholar]
  50. Hazan H, Hilu D, Manevitz L, Ramig LO, Sapir S (2012) Early diagnosis of parkinson’s disease via machine learning on speech data. In: 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, pp. 1–4. IEEE
  51. Hausdorff JM, Lowenthal J, Herman T, Gruendlinger L, Peretz C, Giladi N (2007) Rhythmic auditory stimulation modulates gait variability in parkinson’s disease. Eur J Neurosci 26(8):2369–2375 [DOI] [PubMed] [Google Scholar]
  52. Haq AU, Li J, Memon MH, Khan J, Din SU, Ahad I, Sun R, Lai Z (2018) Comparative analysis of the classification performance of machine learning classifiers and deep neural network classifier for prediction of parkinson disease. In: 2018 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), pp. 101–106. IEEE
  53. Haq AU, Li JP, Memon MH, Malik A, Ahmad T, Ali A, Nazir S, Ahad I, Shahid M et al (2019) Feature selection based on l1-norm support vector machine and effective recognition system for parkinson’s disease using voice recordings. IEEE access 7:37718–37734 [Google Scholar]
  54. Huang L, Ye X, Yang M, Pan L, Zheng S (2023) Mnc-net: Multi-task graph structure learning based on node clustering for early parkinson’s disease diagnosis. Comput Biol Med 152:106308 [DOI] [PubMed] [Google Scholar]
  55. Igene L, Alim A, Imtiaz MH, Schuckers S (2023) A machine learning model for early prediction of parkinson’s disease from wearable sensors. In: 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0734–0737. IEEE
  56. Junaid M, Ali S, Eid F, El-Sappagh S, Abuhmed T (2023) Explainable machine learning models based on multimodal time-series data for the early detection of parkinson’s disease. Comput Methods Programs Biomed 234:107495 [DOI] [PubMed] [Google Scholar]
  57. Jankovic J (2008) Parkinson’s disease: clinical features and diagnosis. Journal of neurology neurosurgery & psychiatry 79(4):368–376 [DOI] [PubMed] [Google Scholar]
  58. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y (2017) Artificial intelligence in healthcare: past, present and future. Stroke and vascular neurology, 2(4)
  59. Khaskhoussy R, Ayed YB (2023) Improving parkinson’s disease recognition through voice analysis using deep learning. Pattern Recogn Lett 168:64–70 [Google Scholar]
  60. Kaur S, Aggarwal H, Rani R (2021) Diagnosis of parkinson’s disease using deep cnn with transfer learning and data augmentation. Multimedia Tools and Applications 80(7):10113–10139 [Google Scholar]
  61. Karaman O, Çakın H, Alhudhaif A, Polat K (2021) Robust automated parkinson disease detection based on voice signals with transfer learning. Expert Syst Appl 178:115013 [Google Scholar]
  62. Khan AA, Mahendran RK, Perumal K, Faheem M (2024) Dual-3dm 3 ad: mixed transformer based semantic segmentation and triplet pre-processing for early multi-class alzheimer’s diagnosis. IEEE Trans Neural Syst Rehabil Eng 32:696–707 [DOI] [PubMed] [Google Scholar]
  63. Khan AA, Madendran RK, Thirunavukkarasu U, Faheem M (2023) D2pam: Epileptic seizures prediction using adversarial deep dual patch attention mechanism. CAAI Transactions on Intelligence Technology 8(3):755–769 [Google Scholar]
  64. Kamran I, Naz S, Razzak I, Imran M (2021) Handwriting dynamics assessment using deep neural network for early identification of parkinson’s disease. Futur Gener Comput Syst 117:234–244 [Google Scholar]
  65. Kujur A, Raza Z, Khan AA, Wechtaisong C (2022) Data complexity based evaluation of the model dependence of brain mri images for classification of brain tumor and alzheimer’s disease. IEEE Access 10:112117–112133 [Google Scholar]
  66. Karan B, Sahu SS, Mahto K (2020) Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybernetics and Biomedical Engineering 40(1):249–264 [Google Scholar]
  67. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. nature 521(7553):436–444 [DOI] [PubMed] [Google Scholar]
  68. Lahmiri S, Dawson DA, Shmuel A (2018) Performance of machine learning methods in diagnosing parkinson’s disease based on dysphonia measures. Biomed Eng Lett 8(1):29–39 [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Lee S, Hussein R, McKeown MJ (2019) A deep convolutional-recurrent neural network architecture for parkinson’s disease eeg classification. In: 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1–4. IEEE
  70. Li A, Li C (2022) Detecting parkinson’s disease through gait measures using machine learning. Diagnostics 12(10):2404 [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Liu S, Liu S, Cai W, Pujol S, Kikinis R, Feng D (2014) Early diagnosis of alzheimer’s disease with deep learning. In: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 1015–1018. IEEE
  72. Liu X, Li W, Liu Z, Du F, Zou Q (2021) A dual-branch model for diagnosis of parkinson’s disease based on the independent and joint features of the left and right gait. Applied Intelligence, 1–12
  73. Loh HW, Ooi CP, Palmer E, Barua PD, Dogan S, Tuncer T, Baygin M, Acharya UR (2021) Gaborpdnet: Gabor transformation and deep neural network for parkinson’s disease detection using eeg signals. Electronics 10(14):1740 [Google Scholar]
  74. Lahmiri S, Shmuel A (2019) Detection of parkinson’s disease based on voice patterns ranking and optimized support vector machine. Biomed Signal Process Control 49:427–433 [Google Scholar]
  75. Li Z, Yang J, Wang Y, Cai M, Liu X, Lu K (2022) Early diagnosis of parkinson’s disease using continuous convolution network: Handwriting recognition based on off-line hand drawing without template. J Biomed Inform 130:104085 [DOI] [PubMed] [Google Scholar]
  76. Li R, Zhang W, Suk H-I, Wang L, Li J, Shen D, Ji S (2014) Deep learning based imaging data completion for improved brain disease diagnosis. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part III 17, pp. 305–312. Springer
  77. Ma Y-W, Chen J-L, Chen Y-J, Lai Y-H (2023) Explainable deep learning architecture for early diagnosis of parkinson’s disease. Soft Comput 27(5):2729–2738
  78. Madruga M, Campos-Roca Y, Pérez CJ (2023) Addressing smartphone mismatch in parkinson’s disease detection aid systems based on speech. Biomed Signal Process Control 80:104281
  79. Moher D, Liberati A, Tetzlaff J, Altman DG, the PRISMA Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Annals of internal medicine 151(4):264–269
  80. Makarious MB, Leonard HL, Vitale D, Iwaki H, Sargent L, Dadu A, Violich I, Hutchins E, Saffo D, Bandres-Ciga S et al (2022) Multi-modality machine learning predicting parkinson’s disease. npj Parkinson’s Disease 8(1):35
  81. Motin MA, Mahmud M, Brown DJ (2022) Detecting parkinson’s disease from electroencephalogram signals: an explainable machine learning approach. In: 2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–6. IEEE
  82. Mostafa SA, Mustapha A, Mohammed MA, Hamed RI, Arunkumar N, Abd Ghani MK, Jaber MM, Khaleefah SH (2019) Examining multiple feature evaluation and classification methods for improving the diagnosis of parkinson’s disease. Cogn Syst Res 54:90–99
  83. Mandal I, Sairam N (2014) New machine-learning algorithms for prediction of parkinson’s disease. Int J Syst Sci 45(3):647–666
  84. Nakach F-Z, Idri A, Goceri E (2024) A comprehensive investigation of multimodal deep learning fusion strategies for breast cancer classification. Artif Intell Rev 57(12):327
  85. Nguyen DMD, Miah M, Bilodeau G-A, Bouachir W (2022) Transformers for 1d signals in parkinson’s disease detection from gait. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 5089–5095. IEEE
  86. Nagasubramanian G, Sankayya M (2021) Multi-variate vocal data analysis for detection of parkinson disease using deep learning. Neural Comput Appl 33(10):4849–4864
  87. Nour M, Senturk U, Polat K (2023) Diagnosis and classification of parkinson’s disease using ensemble learning and 1d-pdcovnn. Comput Biol Med 161:107031
  88. Orozco-Arroyave JR, Arias-Londoño JD, Vargas-Bonilla JF, Gonzalez-Rátiva MC, Nöth E (2014) New spanish speech corpus database for the analysis of people suffering from parkinson’s disease. In: LREC, pp. 342–347
  89. Oliveira FP, Castelo-Branco M (2015) Computer-aided diagnosis of parkinson’s disease based on [123i] fp-cit spect binding potential images, using the voxels-as-features approach and support vector machines. J Neural Eng 12(2):026008
  90. Oh SL, Hagiwara Y, Raghavendra U, Yuvaraj R, Arunkumar N, Murugappan M, Acharya UR (2020) A deep learning approach for parkinson’s disease diagnosis from eeg signals. Neural Comput Appl 32:10927–10933
  91. Olanrewaju RF, Sahari NS, Musa AA, Hakiem N (2014) Application of neural networks in early detection and diagnosis of parkinson’s disease. In: 2014 International Conference on Cyber and IT Service Management (CITSM), pp. 78–82. IEEE
  92. Prasuhn J, Heldmann M, Münte TF, Brüggemann N (2020) A machine learning-based classification approach on parkinson’s disease diffusion tensor imaging datasets. Neurological research and practice 2:1–5
  93. Perumal K, Mahendran RK, Ahmad Khan A, Kadry S (2025) Tri-m2mt: Multi-modalities based effective acute bilirubin encephalopathy diagnosis through multi-transformer using neonatal magnetic resonance imaging. CAAI Transactions on Intelligence Technology
  94. Pereira CR, Pereira DR, Da Silva FA, Hook C, Weber SA, Pereira LA, Papa JP (2015) A step towards the automated diagnosis of parkinson’s disease: Analyzing handwriting movements. In: 2015 IEEE 28th International Symposium on Computer-based Medical Systems, pp. 171–176. IEEE
  95. Parisi L, RaviChandran N, Manaog ML (2018) Feature-driven machine learning to improve early diagnosis of parkinson’s disease. Expert Syst Appl 110:182–190
  96. Prashanth R, Roy SD, Mandal PK, Ghosh S (2014) Automatic classification and prediction models for early parkinson’s disease diagnosis from spect imaging. Expert Syst Appl 41(7):3333–3342
  97. Priyadharshini S, Ramkumar K, Vairavasundaram S, Narasimhan K, Venkatesh S, Amirtharajan R, Kotecha K (2024) A comprehensive framework for parkinson’s disease diagnosis using explainable artificial intelligence empowered machine learning techniques. Alex Eng J 107:568–582
  98. Peker M, Şen B, Delen D (2015) Computer-aided diagnosis of parkinson’s disease using complex-valued neural networks and mrmr feature selection algorithm. Journal of healthcare engineering 6(3):281–302
  99. Peng B, Wang S, Zhou Z, Liu Y, Tong B, Zhang T, Dai Y (2017) A multilevel-roi-features-based machine learning method for detection of morphometric biomarkers in parkinson’s disease. Neurosci Lett 651:88–94
  100. Quan C, Ren K, Luo Z (2021) A deep learning based method for parkinson’s disease detection using dynamic features of speech. IEEE Access 9:10239–10252
  101. Ribeiro LC, Afonso LC, Papa JP (2019) Bag of samplings for computer-assisted parkinson’s disease diagnosis based on recurrent neural networks. Comput Biol Med 115:103477
  102. Rustempasic I, Can M (2013) Diagnosis of parkinson’s disease using principal component analysis and boosting committee machines. Southeast Europe Journal of Soft Computing 2(1)
  103. Rehman RZU, Del Din S, Guan Y, Yarnall AJ, Shi JQ, Rochester L (2019) Selecting clinically relevant gait characteristics for classification of early parkinson’s disease: a comprehensive machine learning approach. Sci Rep 9(1):17269
  104. Rana A, Dumka A, Singh R, Rashid M, Ahmad N, Panda MK (2022) An efficient machine learning approach for diagnosing parkinson’s disease by utilizing voice features. Electronics 11(22):3782
  105. Rieke N, Hancox J, Li W, Milletari F, Roth HR, Albarqouni S, Bakas S, Galtier MN, Landman BA, Maier-Hein K et al (2020) The future of digital health with federated learning. NPJ Digital Medicine 3(1):119
  106. Rastogi D, Johri P, Donelli M, Kadry S, Khan AA, Espa G, Feraco P, Kim J (2025) Deep learning-integrated mri brain tumor analysis: feature extraction, segmentation, and survival prediction using replicator and volumetric networks. Sci Rep 15(1):1437
  107. Rana B, Juneja A, Saxena M, Gudwani S, Kumaran SS, Agrawal R, Behari M (2015) Regions-of-interest based automated diagnosis of parkinson’s disease using t1-weighted mri. Expert Syst Appl 42(9):4506–4516
  108. Razzak I, Kamran I, Naz S (2020) Deep analysis of handwritten notes for early diagnosis of neurological disorders. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–6. IEEE
  109. Rizvi DR, Nissar I, Masood S, Ahmed M, Ahmad F (2020) An lstm based deep learning model for voice-based detection of parkinson’s disease. Int J Adv Sci Technol 29(8)
  110. Ramani RG, Sivagami G (2011) Parkinson disease classification using data mining algorithms. International Journal of Computer Applications 32(9):17–22
  111. Sigcha L, Borzì L, Amato F, Rechichi I, Ramos-Romero C, Cárdenas A, Gascó L, Olmo G (2023) Deep learning and wearable sensors for the diagnosis and monitoring of parkinson’s disease: A systematic review. Expert Syst Appl 229:120541
  112. Salvatore C, Cerasa A, Castiglioni I, Gallivanone F, Augimeri A, Lopez M, Arabia G, Morelli M, Gilardi M, Quattrone A (2014) Machine learning on brain mri data for differential diagnosis of parkinson’s disease and progressive supranuclear palsy. J Neurosci Methods 222:230–237
  113. Senturk ZK (2020) Early diagnosis of parkinson’s disease using machine learning algorithms. Med Hypotheses 138:109603
  114. Sharma A, Giri RN (2014) Automatic recognition of parkinson’s disease via artificial neural network and support vector machine. International Journal of Innovative Technology and Exploring Engineering (IJITEE) 4(3):2278–3075
  115. Shaban M (2021) Automated screening of parkinson’s disease using deep learning based electroencephalography. In: 2021 10th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 158–161. IEEE
  116. Sharma NP, Junaid I, Ari S (2023) Early diagnosis of parkinson’s disease and severity assessment based on gait using 1d-cnn. In: 2023 2nd International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), pp. 1–6. IEEE
  117. Sakar CO, Kursun O (2010) Telediagnosis of parkinson’s disease using measurements of dysphonia. Journal of Medical Systems 34:591–599
  118. Shetty S, Rao Y (2016) Svm based machine learning approach to identify parkinson’s disease using gait analysis. In: 2016 International Conference on Inventive Computation Technologies (ICICT), vol. 2, pp. 1–5. IEEE
  119. Sivaranjini S, Sujatha C (2020) Deep learning based diagnosis of parkinson’s disease using convolutional neural network. Multimedia Tools and Applications 79(21):15467–15479
  120. Soumaya Z, Taoufiq BD, Benayad N, Yunus K, Abdelkrim A (2021) The detection of parkinson disease using the genetic algorithm and svm classifier. Appl Acoust 171:107528
  121. Tsai C-C, Chen Y-L, Lu C-S, Cheng J-S, Weng Y-H, Lin S-H, Wu Y-M, Wang J-J (2023) Diffusion tensor imaging for the differential diagnosis of parkinsonism by machine learning. Biomedical Journal 46(3):100541
  122. Taleb C, Khachab M, Mokbel C, Likforman-Sulem L (2019) Visual representation of online handwriting time series for deep learning parkinson’s disease detection. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 6, pp. 25–30. IEEE
  123. Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO (2012) Novel speech signal processing algorithms for high-accuracy classification of parkinson’s disease. IEEE Trans Biomed Eng 59(5):1264–1271
  124. Tahir NM, Manap HH (2012) Parkinson disease gait classification based on machine learning approach. J Appl Sci (Faisalabad) 12(2):180–185
  125. Trifonova OP, Maslov DL, Balashova EE, Urazgildeeva GR, Abaimov DA, Fedotova EY, Poleschuk VV, Illarioshkin SN, Lokhov PG (2020) Parkinson’s disease: available clinical and promising omics tests for diagnostics, disease risk assessment, and pharmacotherapy personalization. Diagnostics 10(5):339
  126. Talai AS, Sedlacik J, Boelmans K, Forkert ND (2021) Utility of multi-modal mri for differentiating of parkinson’s disease and progressive supranuclear palsy using machine learning. Front Neurol 12:648548
  127. Trabassi D, Serrao M, Varrecchia T, Ranavolo A, Coppola G, De Icco R, Tassorelli C, Castiglia SF (2022) Machine learning approach to support the detection of parkinson’s disease in imu-based gait analysis. Sensors 22(10):3700
  128. Vinora A, Ajitha E, Sivakarthi G, et al (2023) Detecting parkinson’s disease using machine learning. In: 2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF), pp. 1–6. IEEE
  129. Varghese J, Brenner A, Fujarski M, Alen CM, Plagwitz L, Warnecke T (2024) Machine learning in the parkinson’s disease smartwatch (pads) dataset. npj Parkinson’s Disease 10(1):9
  130. Varalakshmi P, Priya BT, Rithiga BA, Bhuvaneaswari R, Sundar RSJ (2022) Diagnosis of parkinson’s disease from hand drawing utilizing hybrid models. Parkinsonism & Related Disorders 105:24–31
  131. Vyas T, Yadav R, Solanki C, Darji R, Desai S, Tanwar S (2022) Deep learning-based scheme to diagnose parkinson’s disease. Expert Syst 39(3):12739
  132. Wahid F, Begg RK, Hass CJ, Halgamuge S, Ackland DC (2015) Classification of parkinson’s disease gait using spatial-temporal gait features. IEEE J Biomed Health Inform 19(6):1794–1802
  133. Wang X, Huang J, Chatzakou M, Medijainen K, Toomela A, Nõmm S, Ruzhansky M (2024) Lstm-cnn: An efficient diagnostic network for parkinson’s disease utilizing dynamic handwriting analysis. Comput Methods Programs Biomed 247:108066
  134. Wang X, Hao X, Yan J, Xu J, Hu D, Ji F, Zeng T, Wang F, Wang B, Fang J et al (2023) Urine biomarkers discovery by metabolomics and machine learning for parkinson’s disease diagnoses. Chin Chem Lett 34(10):108230
  135. Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S, the PROBAST Group (2019) PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Annals of internal medicine 170(1):51–58
  136. Wroge TJ, Özkanca Y, Demiroglu C, Si D, Atkins DC, Ghomi RH (2018) Parkinson’s disease diagnosis using machine learning and voice. In: 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pp. 1–7. IEEE
  137. West C, Soltaninejad S, Cheng I (2019) Assessing the capability of deep-learning models in parkinson’s disease diagnosis. In: International Conference on Smart Multimedia, pp. 237–247. Springer
  138. Wang J, Xue L, Jiang J, Liu F, Wu P, Lu J, Zhang H, Bao W, Xu Q, Ju Z et al (2024) Diagnostic performance of artificial intelligence-assisted pet imaging for parkinson’s disease: A systematic review and meta-analysis. NPJ Digital Medicine 7(1):17
  139. Xia Y, Yao Z, Ye Q, Cheng N (2019) A dual-modal attention-enhanced deep learning network for quantification of parkinson’s disease characteristics. IEEE Trans Neural Syst Rehabil Eng 28(1):42–51
  140. Xu N, Zhou Y, Patel A, Zhang N, Liu Y (2023) Parkinson’s disease diagnosis beyond clinical features: a bio-marker using topological machine learning of resting-state functional magnetic resonance imaging. Neuroscience 509:43–50
  141. Yogev G, Giladi N, Peretz C, Springer S, Simon ES, Hausdorff JM (2005) Dual tasking, gait rhythmicity, and parkinson’s disease: which aspects of gait are attention demanding? Eur J Neurosci 22(5):1248–1256
  142. Ya Y, Ji L, Jia Y, Zou N, Jiang Z, Yin H, Mao C, Luo W, Wang E, Fan G (2022) Machine learning models for diagnosis of parkinson’s disease using multiple structural magnetic resonance imaging features. Frontiers in Aging Neuroscience 14:808520
  143. Yadav G, Kumar Y, Sahoo G (2012) Predication of parkinson’s disease using data mining methods: A comparative analysis of tree, statistical and support vector machine classifiers. In: 2012 National Conference on Computing and Communication Systems, pp. 1–8. IEEE
  144. Yi X, Walia E, Babyn P (2019) Generative adversarial network in medical imaging: A review. Med Image Anal 58:101552
  145. Zhao S, Dai G, Li J, Zhu X, Huang X, Li Y, Tan M, Wang L, Fang P, Chen X et al (2024) An interpretable model based on graph learning for diagnosis of parkinson’s disease with voice-related eeg. NPJ Digital Medicine 7(1):3
  146. Zhang J (2022) Mining imaging and clinical data with machine learning approaches for the diagnosis and early detection of parkinson’s disease. npj Parkinson’s Disease 8(1):13
  147. Zhang YC, Kagen AC (2017) Machine learning interface for medical image analysis. J Digit Imaging 30:615–621
  148. Zhao A, Li J (2023) A significantly enhanced neural network for handwriting assessment in parkinson’s disease detection. Multimedia Tools and Applications 82(25):38297–38317
  149. Zahid L, Maqsood M, Durrani MY, Bakhtyar M, Baber J, Jamal H, Mehmood I, Song O-Y (2020) A spectrogram-based deep feature assisted computer-aided diagnostic system for parkinson’s disease. IEEE Access 8:35482–35495
  150. Zhao H, Tsai C-C, Zhou M, Liu Y, Chen Y-L, Huang F, Lin Y-C, Wang J-J (2022) Deep learning based diagnosis of parkinson’s disease using diffusion magnetic resonance imaging. Brain Imaging Behav 16(4):1749–1760
  151. Zhang Y, Weng Y, Lund J (2022) Applications of explainable artificial intelligence in diagnosis and surgery. Diagnostics 12(2):237
  152. Zhang X, Yang Y, Wang H, Ning S, Wang H (2019) Deep neural networks with broad views for parkinson’s disease screening. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1018–1022. IEEE

Associated Data


Data Availability Statement

The datasets used in this study are publicly available. The voice dataset can be accessed at https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings, the gait dataset at https://physionet.org/content/gaitpdb/1.0.0/, the EEG dataset at https://openneuro.org/datasets/ds002778/versions/1.0.5, the handwriting dataset at https://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/, and the MRI dataset at https://fcon_1000.projects.nitrc.org/indi/retro/parkinsons.html.
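For readers reproducing the benchmark case studies, the records in the PhysioNet gait database are plain whitespace-separated text files. The sketch below parses one such record into Python lists; the 19-column layout (time stamp, 16 vertical ground-reaction-force sensor channels, and two per-foot totals) is our reading of the dataset description and should be verified against the downloaded files. The embedded sample is synthetic, for illustration only.

```python
# Minimal sketch for loading a record from the PhysioNet "gaitpdb" dataset
# (https://physionet.org/content/gaitpdb/1.0.0/). Assumed layout per line:
# column 1 = elapsed time (s), columns 2-17 = vertical ground-reaction-force
# (VGRF) readings from 8 sensors under each foot, columns 18-19 = per-foot
# totals. Verify this against the dataset's own documentation.

def parse_gait_record(text):
    """Parse a gaitpdb-style record into (times, force_rows)."""
    times, rows = [], []
    for line in text.strip().splitlines():
        cols = [float(c) for c in line.split()]
        times.append(cols[0])   # elapsed time in seconds
        rows.append(cols[1:])   # 16 sensor channels + 2 totals
    return times, rows

# Synthetic two-sample record (NOT real patient data), 100 Hz spacing.
sample = ("0.00\t" + "\t".join(["10.0"] * 18) + "\n"
          + "0.01\t" + "\t".join(["12.0"] * 18))
times, rows = parse_gait_record(sample)
print(len(times), len(rows[0]))  # prints: 2 18
```

From here, the per-sample force vectors can be windowed and fed to any of the gait classifiers surveyed above; the parsing step is independent of the downstream model.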


Articles from Artificial Intelligence Review are provided here courtesy of Springer
