Digital Health. 2025 Jun 6;11:20552076251342878. doi: 10.1177/20552076251342878

Advanced comparative analysis of machine learning algorithms for early Parkinson's disease detection using vocal biomarkers

Ajay Kumar 1, Jay Parkash Singh 1, Priyanka Paygude 2, Rachan Daimary 3, Sandeep Prasad 4
PMCID: PMC12144366  PMID: 40487885

Abstract

Objective

To investigate the potential of voice analysis—specifically sustained vowel phonation—as a non-invasive, cost-effective diagnostic method for early detection of Parkinson's disease (PD) using machine learning techniques.

Methods

A publicly available dataset from the University of California, Irvine (UCI) repository, comprising 252 voice recordings (188 from PD patients and 64 from healthy individuals), was analyzed. Machine learning classifiers, including k-nearest neighbors (KNN), AdaBoost, and artificial neural networks (ANNs), were trained and tested on the dataset. Model evaluation was conducted using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve. Kernel density estimation was applied to visualize and interpret classifier performance.

Results

Among the classifiers, KNN demonstrated the best performance with an accuracy of 98.52% and a mean accuracy of 97.33%. ANN and AdaBoost achieved mean accuracies of 93.15% and 91.77%, respectively. All models performed well across standard evaluation metrics, indicating strong discriminative ability for detecting PD from voice data.

Conclusion

The study confirms the feasibility of using sustained vowel phonation and machine learning for early PD diagnosis. The KNN classifier, in particular, shows excellent diagnostic accuracy. These findings support the integration of voice-based machine learning tools into clinical workflows, potentially enhancing early detection and management of PD.

Keywords: Parkinson's disease, machine learning, k-nearest neighbors, kernel density estimation, artificial neural networks

Introduction

Parkinson's disease (PD) is a chronic and progressive disorder that impairs the central nervous system. Specifically, it results in the degeneration of neurons in the substantia nigra, a critical area of the midbrain.1–3 These neurons are essential for the synthesis of dopamine, a neurotransmitter crucial for regulating motor functions. The deficiency of dopamine disrupts the body's motor control, leading to symptoms such as bradykinesia, muscle rigidity, poor posture, and tremors.3–7 Additionally, advanced stages of PD commonly present with cognitive and psychological issues, including anxiety, depression, and dementia. 8 The precise etiology of PD remains unknown; however, genetic predispositions and environmental toxins are suspected contributors.7–9 Age is a significant risk factor, with ∼1% of the global population over 60 years old affected by the condition.10,11 Gender also plays a role, as men are twice as likely to develop PD compared to women. 12 Early diagnosis is crucial, as it allows for the implementation of medication and therapeutic interventions that can alleviate symptoms and extend life expectancy. 13 Typically, individuals with PD live between 10 and 20 years following the onset of symptoms. 14

Patients with PD often exhibit diminished vocal intensity and monotony, coupled with a lack of facial expression.3,15 Their speech patterns are characterized by brief bursts of sound and hesitations before initiating speech, making dysphonia a useful diagnostic measure for PD.4,16 The dataset, sourced from Cerrahpaşa Faculty of Medicine, includes recordings from 188 PD patients (107 men and 81 women) aged 33–87 years (mean age 65.1 ± 10.9), and a control group of 64 healthy individuals (23 men and 41 women) aged 41–82 years (mean age 61.1 ± 8.9). Each participant's sustained phonation of the vowel /a/ was recorded three times using a microphone set at a sampling rate of 44.1 kHz.17,18

This study systematically examines the performance and effectiveness of several machine learning classifiers for binary classification, focusing on PD diagnosis. The classifiers analyzed include k-nearest neighbors (KNN), support vector machines (SVMs) with linear and radial basis function (RBF) kernels, AdaBoost, logistic regression (LR), and artificial neural networks (ANNs).5,15,19–21 Additionally, we designed a classifier based on kernel density estimation (KDE) for comparison. Each model's accuracy was determined by averaging results over multiple iterations, incorporating additional metrics like the area under the receiver operating characteristic curve (AUROC) to ensure a comprehensive performance evaluation. KNN emerged as the top-performing model, showcasing superior accuracy in PD classification compared to other algorithms. We meticulously compared these results with established models, highlighting KNN's efficiency in this particular application. Furthermore, the KDE classifier, representing a novel approach to density estimation, added valuable insights into enhancing classification reliability. We explored the strengths and limitations of each model, with an emphasis on practical applications in healthcare diagnostics. The study underscores the significance of accurate classification in medical contexts, offering a robust framework for future research in machine learning (ML)-driven disease diagnosis. By identifying areas for potential improvements and introducing innovative techniques like KDE, this research advances the field, especially in its application to PD prediction.

Literature review

A substantial body of research has explored the utilization of machine learning algorithms for the diagnosis of PD, with key findings consolidated in Table 1. These investigations reveal significant variability in the diagnostic performance of different classifiers, thereby underscoring the pivotal importance of model selection, hyperparameter optimization, and evaluation methodology. The observed disparities in accuracy, sensitivity, and specificity across models reflect the nuanced challenges inherent in designing reliable ML-driven diagnostic systems for PD.

Table 1.

Summary of machine learning models and their performance for Parkinson's disease classification.

Sl. no Model used Achievements Accuracy (%) Authors
1 KNN Achieved significant accuracy for Parkinson's disease diagnosis 79.31 Asmae et al. 19
2 LR Employed four different classifiers, with LR being the best performer 88.6 Das 20
3 SVM Pioneered machine learning-based dysphonia detection for Parkinson's disease 91.4 Little et al. 16
4 KNN Reported the highest accuracy in the field for Parkinson's disease classification 98.2 Shirvan et al.22,23
5 ANN, SVM (linear), and SVM (RBF) Trained models with varying accuracies 86.47, 85, 87.5 Bjørklund et al. 10
6 AdaBoost Implemented an AdaBoost classifier, achieving a notable result 94 Anisha and Arulanand 24
7 LR and SVM Achieved consistent accuracies using both logistic regression and SVM 88 Xiong and Lu 25
8 LR and SVM Attained high accuracy using logistic regression and SVM on the same dataset 90, 88 Masud et al. 26
9 LR, KNN, and SVM Applied binary classifiers with various accuracies on the dataset 85, 85, 86, 84 Sakar et al. 18
10 SVM Achieved high accuracy using an SVM classifier 91.2 Gunduz 27
11 LR and KNN Reported high accuracies for both LR and KNN classifiers 91.6, 95.45 Ahmed et al. 28
12 KNN, SVM (RBF), and SVM (linear) Obtained varying accuracies using different classifiers 72.81, 88.21, 82.9 Pahuja and Nagabhushan 29
13 KNN Implemented a KNN classifier with strong accuracy 95.9 Solana-Lavalle and Rosas-Romero 30
14 LR and KNN Achieved solid results using logistic regression and KNN 86.2, 85.6 Mohammed et al. 31
15 KNN and SVM Achieved remarkable accuracy using both KNN and SVM classifiers 97.6, 90 Amato et al. 32
16 LR, KNN, AdaBoost, and SVM (linear and RBF) Reported varying accuracies using a range of classifiers 84.4, 90.6, 87.5, 87.5 Wrobel 33
17 SVM and KNN Attained modest accuracy using SVM and KNN classifiers 76.8, 73.8 Ozturk and Unal 34
18 KNN Trained a KNN model with high accuracy 97.3 Cordella et al. 35
19 LR and KNN Achieved solid accuracies using simple LR and KNN classifiers 80.03, 90.2 Sharanyaa et al. 36

KNN: k-nearest neighbors; ANN: artificial neural network; LR: logistic regression; SVM: support vector machine; RBF: radial basis function.

Notwithstanding the progress achieved, the current literature exhibits a noticeable scarcity of systematic comparative analyses involving multiple ML classifiers applied to standardized, real-world datasets, such as the UCI voice dataset. Furthermore, many existing studies do not incorporate advanced probabilistic evaluation frameworks, such as KDE, which can offer deeper insights into the distributional characteristics and decision boundaries of the classifiers. This gap highlights an unmet need for the development of rigorously validated, non-invasive, and cost-effective diagnostic solutions that leverage machine learning to enhance the accuracy, scalability, and accessibility of early PD detection, thereby augmenting conventional clinical diagnostic approaches.

Method

Dataset

The dataset used in this study is sourced from the University of California, Irvine (UCI) ML Repository, specifically designed for PD diagnosis based on vocal measurements (https://archive.ics.uci.edu/). The dataset comprises 252 instances, including 188 samples from individuals diagnosed with PD and 64 samples from healthy controls. 16 The participants range in age, gender, and disease severity, providing a diverse representation of the affected population. The feature set comprises various phonetic measurements derived from the sustained phonation of the vowel /a/, recorded three times per participant using a 44.1 kHz sampling rate. These features capture dysphonia characteristics, including jitter, shimmer, and harmonic-to-noise ratio, which are common in PD patients. 19 This dataset is crucial for ML-based diagnostic research, offering a well-balanced set of features suitable for early PD detection. Its binary classification task, distinguishing PD-positive from PD-negative cases, has been instrumental in developing and evaluating algorithms such as KNN, SVM, and LR, advancing non-invasive diagnostics.31,32,37,38

As noted above, the dataset was curated for PD diagnosis through vocal analysis: the audio samples were collected and subsequently converted into a tabular format using the wavelet transform (WT). 18 This transformation was essential for capturing the intricate patterns in the audio data that are indicative of PD. The principal feature groups are as follows:

  • Jitter (% and Abs): Measures of frequency variation in the voice, reflecting irregularities in pitch that are commonly observed in PD patients.

  • Shimmer (% and dB): Indicators of amplitude variation, capturing the instability in vocal intensity, which is another hallmark of dysphonia in PD.

  • Harmonic-to-noise ratio (HNR): A metric that quantifies the degree of noise in the voice signal, where a lower HNR is typically associated with vocal disorders like PD.

  • Fundamental frequency (F0): The base rate of vibration of the vocal folds, which can be altered in Parkinson's patients, leading to a monotone speech pattern.

  • WT coefficients: These coefficients represent the transformed data from the audio samples, capturing both time and frequency information, which is crucial for identifying subtle changes in voice that may indicate the presence of PD.

These features serve as the input variables for the ML classifiers employed in the study, as shown in Table 2. By leveraging these parameters, the models aim to accurately distinguish between individuals with PD and healthy controls, thereby contributing to the development of more effective diagnostic tools.

Table 2.

Feature set and corresponding number of features.

Feature set Measure Number of features
Baseline features Jitter variants 5
Shimmer variants 6
Fundamental frequency parameters 5
Harmonicity parameters 2
Recurrence period density entropy (RPDE) 1
Detrended fluctuation analysis (DFA) 1
Pitch period entropy (PPE) 1
Time frequency features Intensity parameters 3
Formant frequencies 4
Bandwidth 4
MFCCs Mel frequency cepstral coefficients (MFCCs) 84
Wavelet transform features Wavelet transform (WT) features related to F0 182
Vocal fold features Glottis quotient (GQ) 3
Glottal to noise excitation (GNE) 6
Vocal fold excitation ratio (VFER) 7
Empirical mode decomposition 6
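The UCI table already ships these measurements in tabular form, but for intuition, the sketch below shows how the core dysphonia measures (F0, jitter, shimmer, HNR) could be computed from a raw sustained-vowel recording using the Praat phonetics engine via the parselmouth Python library. This is an illustrative recipe under our own assumptions, not the dataset authors' extraction pipeline; the file name is a placeholder and the 75–500 Hz pitch range is a conventional default.

```python
# Illustrative dysphonia-feature extraction with Praat via parselmouth.
# "voice.wav" is a hypothetical sustained /a/ recording at 44.1 kHz.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("voice.wav")

# Fundamental frequency (F0): time step, pitch floor, pitch ceiling (Hz)
pitch = call(snd, "To Pitch", 0.0, 75, 500)
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")

# Jitter and shimmer are computed from the glottal point process
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(point_process, "Get jitter (local)",
                    0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)

# Harmonic-to-noise ratio (HNR), averaged over the recording
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"F0={mean_f0:.1f} Hz, jitter={jitter_local:.4f}, "
      f"shimmer={shimmer_local:.4f}, HNR={hnr:.1f} dB")
```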

Optimization of dataset

The preprocessed data showed a significant bias toward the positive class, with a ratio of roughly 75:25. To address this imbalance, we oversampled the minority class using a resampling approach akin to bootstrapping, creating a balanced dataset with equal representation of both classes, which is essential for the robust and accurate performance of machine learning models. We also addressed feature scaling by applying normalization to standardize feature ranges. These steps are critical for improving model accuracy, efficiency, and fairness, ensuring more robust predictive performance.

Feature scaling: In machine learning, feature vectors often have different units and ranges, which can cause uneven contributions to the model. To mitigate this, we applied normalization, scaling the features to a range of [0, 1], as given by the following equations:

$x_{\text{norm}} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$  (1)

For normally distributed data, we applied z-score standardization:

$z = \dfrac{x - \mu}{\sigma}$  (2)

where $x$ is the feature value, $\mu$ is the mean, and $\sigma$ is the standard deviation. This ensures that all features are dimensionless, avoiding bias in the learning process.

Dataset balancing: To address the class imbalance, we adjusted the class distribution using techniques like oversampling and undersampling, ensuring equal representation of both classes and improving model performance during optimization.
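As a concrete illustration, the short sketch below applies min-max normalization (equation (1)) and bootstrap-style oversampling of the minority class with pandas and scikit-learn; the file name and the "class" column label are assumptions, since the paper does not list them.

```python
# Minimal preprocessing sketch: scale features to [0, 1] and balance
# classes by resampling the minority class with replacement.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import resample

df = pd.read_csv("pd_speech_features.csv")        # hypothetical file name
features = df.drop(columns=["class"])             # "class": 1 = PD, 0 = healthy

# Min-max normalization of every feature column, as in equation (1)
scaled = pd.DataFrame(MinMaxScaler().fit_transform(features),
                      columns=features.columns)
scaled["class"] = df["class"].values

# Bootstrap-style oversampling of the minority (healthy) class
majority = scaled[scaled["class"] == 1]
minority = scaled[scaled["class"] == 0]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)
print(balanced["class"].value_counts())
```

In practice, resampling is usually applied only to the training split so that duplicated minority samples cannot leak into the test set.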

Models and their learning approaches

The study employs various ML frameworks to enhance the understanding of PD detection through regression and classification techniques. These models were implemented using popular software frameworks and libraries, including scikit-learn for LR, KNN, SVM, AdaBoost, and KDE, and TensorFlow/Keras for ANN. The code used to implement these models is publicly available via the following GitHub repository: https://github.com/ajay3789/Early-Parkinson-s-Disease-Detection-Using-Vocal-Biomarkers/tree/main. Figure 1 illustrates the architecture and methodology of the various models employed in this study. Each of these learning approaches offers unique strengths and contributes distinct advantages to the acquisition and application of knowledge. By leveraging these diverse methodologies, the study aims to deepen insights into PD, ultimately aiding in the development of more accurate diagnostic tools and effective treatment strategies.

Figure 1. Machine vision system adopted for the prediction of Parkinson's disease.

LR is a statistical method used for binary classification problems. It predicts the probability of a binary outcome; the formulations of all models considered in this study are given in equations (3) to (11), beginning with the LR hypothesis:

$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^{T} x}}$  (3)

where $h_\theta(x)$ is the predicted probability that the output is 1 (PD), $\theta$ is the vector of parameters (weights) to be learned, and $x$ is the input feature vector.

The model predicts 1 if $h_\theta(x) \geq 0.5$, and 0 otherwise. The cost function for LR is defined as the negative log-likelihood:

$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$  (4)

where $m$ is the number of training samples and $y^{(i)}$ is the true label for sample $i$.
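A minimal sketch of this classifier in scikit-learn, which the authors report using, is shown below; the file name, 80/20 split ratio, and random seed are assumptions rather than the paper's exact protocol. Subsequent sketches in this section reuse this train–test split.

```python
# Logistic regression on the scaled UCI features (eqs. (3)-(4)).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("pd_speech_features.csv")        # hypothetical file name
X = MinMaxScaler().fit_transform(df.drop(columns=["class"]))
y = df["class"].values

# Hold out 20% of samples for testing (assumed split ratio)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

lr = LogisticRegression(max_iter=1000)   # fits theta by minimizing eq. (4)
lr.fit(X_tr, y_tr)
print("LR test accuracy:", lr.score(X_te, y_te))
```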

The KNN algorithm classifies a data point according to the majority class among its $k$ nearest neighbors. The Euclidean distance between two points $x_i$ and $x_j$ in $n$-dimensional space is calculated by the following equation:

$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} \left(x_{ik} - x_{jk}\right)^2}$  (5)

Once distances are computed for all neighbors, the algorithm chooses the $k$ nearest points and assigns the majority class as the prediction.
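Continuing the split from the LR sketch above, a KNN classifier with the Euclidean metric of equation (5) can be fitted as follows; $k = 5$ is an assumed neighborhood size, as the paper does not report it.

```python
# KNN: majority vote over the k nearest points by Euclidean distance.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_tr, y_tr)
print("KNN test accuracy:", knn.score(X_te, y_te))
```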

SVM is used for classification by finding the hyperplane that best separates the data into classes. The equation of the hyperplane is given by the following equation:

$W^{T} x + b = 0$  (6)

where $W$ is the weight vector, $x$ is the feature vector, and $b$ is the bias term. SVM attempts to maximize the margin between the hyperplane and the closest data points from either class, called support vectors. The optimization problem is given by the following equation:

$\min_{w,b} \ \dfrac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y^{(i)}\left(w^{T} x^{(i)} + b\right) \geq 1, \ \forall i$  (7)

For non-linearly separable data, SVM uses the kernel trick, where a kernel function $K(x_i, x_j)$ transforms the data into a higher-dimensional space.
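A sketch of both SVM variants used in the study follows, continuing the same split; the regularization parameter C and the RBF width gamma are left at scikit-learn defaults because the paper does not specify them.

```python
# Linear- and RBF-kernel SVMs (eqs. (6)-(7) plus the kernel trick).
from sklearn.svm import SVC

for kernel in ("linear", "rbf"):
    # probability=True enables predict_proba for later AUROC computation
    svm = SVC(kernel=kernel, probability=True, random_state=0)
    svm.fit(X_tr, y_tr)
    print(f"SVM ({kernel}) test accuracy:", svm.score(X_te, y_te))
```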

AdaBoost is an ensemble learning algorithm designed for binary classification; it combines the outputs of weak learners into a weighted sum, emphasizing more accurate classifiers. The final prediction is given by the following equation:

$F(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$  (8)

where $T$ is the number of weak classifiers, $h_t(x)$ is the prediction of the weak classifier at iteration $t$, and $\alpha_t$ is the weight assigned to classifier $h_t$, based on its accuracy. The weak classifiers are trained sequentially, and the weights of misclassified samples are increased so that the next classifier focuses on them.
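The sketch below mirrors the configuration described in the Results section, an AdaBoost ensemble over depth-1 decision stumps, again continuing the earlier split; the number of boosting rounds and the learning rate are assumptions.

```python
# AdaBoost over decision stumps (weighted sum of weak learners, eq. (8)).
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Depth-1 stumps as weak learners, per the paper; this parameter is
# named base_estimator in scikit-learn versions before 1.2
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, learning_rate=1.0,
                         random_state=0)
ada.fit(X_tr, y_tr)
print("AdaBoost test accuracy:", ada.score(X_te, y_te))
```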

KDE is a non-parametric way to estimate the probability density function of a random variable. The KDE for a sample $x_1, x_2, \ldots, x_n$ is defined by the following equation:

$\hat{f}(x) = \dfrac{1}{nh} \sum_{i=1}^{n} K\left(\dfrac{x - x_i}{h}\right)$  (9)

where $K$ is the kernel function, commonly the Gaussian kernel, $h$ is the bandwidth (smoothing parameter), and $x_i$ are the sample points.

The kernel function $K$ controls the shape of the density estimate, and the bandwidth $h$ controls its smoothness.
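The paper does not spell out how the KDE classifier is constructed (the exact implementation is in the authors' repository), but one standard construction, sketched below under that assumption, fits a Gaussian KDE per class and assigns each test point to the class with the larger class-conditional log density plus log prior; the bandwidth $h = 0.5$ is an arbitrary choice, continuing the earlier split.

```python
# KDE-based classifier: per-class density estimates via eq. (9),
# combined with class priors by Bayes' rule on the log scale.
import numpy as np
from sklearn.neighbors import KernelDensity

classes = np.unique(y_tr)
kdes, log_priors = [], []
for c in classes:
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
    kde.fit(X_tr[y_tr == c])          # class-conditional density f_c(x)
    kdes.append(kde)
    log_priors.append(np.log(np.mean(y_tr == c)))

# Predict argmax_c [log f_c(x) + log P(c)] for each test sample
log_post = np.column_stack([kde.score_samples(X_te) + lp
                            for kde, lp in zip(kdes, log_priors)])
y_pred = classes[np.argmax(log_post, axis=1)]
print("KDE test accuracy:", float(np.mean(y_pred == y_te)))
```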

ANNs are composed of layers of interconnected nodes (neurons), where each neuron performs a weighted sum of the inputs, applies an activation function, and passes the result to the next layer. For a single neuron in a layer, the output is given by the following equation:

$a_j^{(l)} = f\left(\sum_{i=1}^{n^{(l-1)}} w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}\right)$  (10)

where $a_j^{(l)}$ is the activation of neuron $j$ in layer $l$, $w_{ij}^{(l)}$ is the weight connecting neuron $i$ in layer $l-1$ to neuron $j$ in layer $l$, $b_j^{(l)}$ is the bias for neuron $j$ in layer $l$, and $f$ is the activation function, typically a sigmoid, ReLU, or tanh.

The network is trained by minimizing a loss function, commonly the cross-entropy loss for classification, given by the following equation:

$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(\hat{y}^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$  (11)

where $\hat{y}^{(i)}$ is the predicted output and $y^{(i)}$ is the true label.
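The sketch below wires up the architecture reported in the Results section (dense layers of 12, 8, and 1 units with tanh, tanh, and sigmoid activations, batch size 5, 50–100 epochs) in TensorFlow/Keras, which the authors report using; the Adam optimizer and the validation fraction are assumptions, and the sketch again continues the earlier split.

```python
# Small feed-forward ANN trained on the cross-entropy loss of eq. (11).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_tr.shape[1],)),
    tf.keras.layers.Dense(12, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(PD)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_tr, y_tr, batch_size=5, epochs=50,
          validation_split=0.1, verbose=0)
loss, acc = model.evaluate(X_te, y_te, verbose=0)
print("ANN test accuracy:", acc)
```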

Evaluation metrics for comparing models

After implementing the classifiers on the dataset, we evaluated their performance using a range of standard evaluation metrics like true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as shown in equations (12) to (16). These metrics provide a comprehensive way to assess the effectiveness of each model in predicting outcomes. By analyzing various factors such as accuracy, precision, recall, F1 score, and the AUROC, we aim to gain deeper insights into how well each classifier performs, particularly in terms of its ability to correctly identify cases of PD. This thorough comparison enables us to determine the strengths and limitations of each model, facilitating a more informed decision on the most suitable algorithm for this dataset.

Accuracy measures the correctness of the predicted values compared to the actual values. It is defined by the following equation:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$  (12)

Average Accuracy is calculated after shuffling the dataset, splitting it, and performing the train-test cycle multiple times. It is defined by the following equation:

$\text{Average Accuracy} = \dfrac{\sum_{i=1}^{5} \text{Accuracy}_i}{5}$  (13)

Precision measures the exactness of the predictions, representing how many predicted positive cases are actually positive. It is defined by the following equation:

$\text{Precision} = \dfrac{TP}{TP + FP}$  (14)

Recall (also known as sensitivity) measures the completeness of the predictions, representing the probability that a relevant item is selected. It is defined by the following equation:

$\text{Recall} = \dfrac{TP}{TP + FN}$  (15)

F1 score is the harmonic mean of precision and recall, providing a balance between the two metrics. It is defined by the following equation:

$F_1\ \text{score} = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$  (16)

AUROC illustrates the performance of a classification model at all threshold levels. The ROC curve plots two parameters: (a) the TP rate and (b) the FP rate. A model with an AUC of 0.5 or lower has no predictive ability, as its performance is equivalent to random guessing.
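A sketch of this evaluation protocol follows: five shuffled train–test cycles whose accuracies are averaged as in equation (13), with per-run precision, recall, F1, and AUROC. It is shown for KNN, but any classifier exposing predict_proba can be substituted; the split ratio is an assumption, and X and y come from the LR sketch above.

```python
# Five shuffled train-test cycles with the metrics of eqs. (12)-(16).
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

accs = []
for seed in range(5):                     # five shuffled splits
    X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_a, y_a)
    y_hat = clf.predict(X_b)
    scores = clf.predict_proba(X_b)[:, 1]  # P(PD) for the ROC curve
    accs.append(accuracy_score(y_b, y_hat))
    print(f"run {seed}: acc={accs[-1]:.4f} "
          f"prec={precision_score(y_b, y_hat):.2f} "
          f"rec={recall_score(y_b, y_hat):.2f} "
          f"f1={f1_score(y_b, y_hat):.2f} "
          f"auroc={roc_auc_score(y_b, scores):.4f}")
print("average accuracy:", sum(accs) / len(accs))  # eq. (13)
```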

Also, the relationship between model loss and training epochs is a critical diagnostic for ANN models applied to PD datasets. An ideal loss curve converging to a global minimum indicates an efficient reduction in error during training. Similarly, an accuracy curve converging to a global maximum reflects the network's improved predictive capability as it optimizes over more epochs.

Results and discussions

Models performances

The model frameworks for PD prediction were applied through the study of supervised learning techniques, specifically regression and classification models. The models evaluated include LR, KNN, SVM, AdaBoost, KDE, and ANN. Each learning algorithm presents unique advantages in extracting patterns and enhancing predictive accuracy on the dataset. These diverse approaches facilitate a comprehensive understanding of model behavior and strengths in handling PD prediction tasks. The results and analyses from the evaluation of these models are detailed in the following sections.

In LR, the sigmoid function is applied to the computed log odds, transforming them into the probability that a binary variable belongs to one of two classes. Using this fundamental statistical model, we achieved an average accuracy of 89.73%, with a peak accuracy of 91.45% on the given dataset. Table 3 demonstrates the model's effectiveness in binary classification tasks, particularly in distinguishing between the two outcome classes with a high degree of precision.

Table 3.

Evaluation metrics for logistic regression (LR) experiments.

Evaluation metric Exp. 1 Exp. 2 Exp. 3
Avg. Accuracy 89.73%
Accuracy 90.10% 88.88% 91.45%
Precision 0.90 0.89 0.91
Recall 0.90 0.89 0.91
F1 score 0.90 0.88 0.91

Also, the LR results were visualized through plots, as shown in Figure 2, yielding an AUROC of 0.945 in the best of five experimental trials. This high AUROC score signifies the model's robust capability in distinguishing between classes, underscoring its effectiveness in predicting PD with low rates of both FP and FN.

Figure 2. Area under the receiver operating characteristic curve (AUROC) for logistic regression.

KNN classifier is a supervised learning algorithm that makes classifications based on the proximity of data points in a feature space. The algorithm assigns a data point to the class most common among its nearest neighbors. Using the KNN classifier, we achieved an average accuracy of 97.33%, with a maximum accuracy of 98.52%, indicating strong performance in classifying the dataset based on neighborhood proximity, as shown in Table 4.

Table 4.

Evaluation metrics for k-nearest neighbors (KNN) experiments.

Evaluation metric Exp. 1 Exp. 2 Exp. 3
Avg. Accuracy 97.33%
Accuracy 98.52% 98.05% 97.57%
Precision 0.98 0.98 0.98
Recall 0.98 0.98 0.97
F1 score 0.97 0.97 0.97

The results were visualized through plots, as illustrated in Figure 3. For the KNN classifier, we achieved a notable AUROC of 0.9734 in the most successful of five experimental trials. This high AUROC score underscores the model's exceptional performance in effectively distinguishing between classes within the dataset.

Figure 3. Area under the receiver operating characteristic curve (AUROC) for k-nearest neighbors.

SVMs are robust, non-parametric, supervised learning models that define an optimal hyperplane in high-dimensional feature spaces for binary classification. In this study, utilizing an SVM with a linear kernel, we obtained an average accuracy of 87.68%, as shown in Table 5.

Table 5.

Evaluation metrics for the linear kernel support vector machine (SVM) classifier.

Evaluation metric Exp. 1 Exp. 2 Exp. 3
Avg. Accuracy 87.68%
Accuracy 89.49% 88.53% 88.16%
Precision 0.88 0.88 0.87
Recall 0.88 0.85 0.86
F1 score 0.88 0.86 0.86

Additionally, the application of an SVM with an RBF kernel resulted in a peak accuracy of 88.42%, as shown in Table 6.

Table 6.

Evaluation metrics for RBF kernel SVM classifier.

Evaluation metric Exp. 1 Exp. 2 Exp. 3
Avg. Accuracy 85.57%
Accuracy 88.42% 87.50% 86.18%
Precision 0.88 0.87 0.86
Recall 0.88 0.87 0.85
F1 score 0.87 0.86 0.86

SVM: support vector machine; RBF: radial basis function.

Also, the AUROC values were 0.9151 for the linear kernel and 0.8647 for the RBF kernel, underscoring the models' proficiency in class separation and their predictive performance. These findings suggest that both kernel types exhibit substantial efficacy, with the linear kernel offering marginally superior performance in the classification of PD.

AdaBoost is an ensemble learning algorithm designed for binary classification; it also combines the outputs of weak learners into a weighted sum, emphasizing more accurate classifiers. In the proposed study, we employed AdaBoost with a DecisionTreeClassifier base estimator, constrained to a maximum depth of 1, to control complexity and minimize overfitting. This model yielded an average accuracy of 91.56%, peaking at 94.08%, showcasing the efficacy of boosting in improving classification accuracy, as shown in Table 7.

Table 7.

Evaluation metrics for the AdaBoost classifier.

Evaluation metric Exp. 1 Exp. 2 Exp. 3
Avg. Accuracy 91.56%
Accuracy 94.08% 92.11% 91.45%
Precision 0.94 0.92 0.91
Recall 0.89 0.87 0.86
F1 score 0.91 0.89 0.88

The AdaBoost classifier yielded an AUC of 0.9752 in the best experiment, surpassing the KNN model's AUC, as shown in Figure 4. Although KNN achieved higher average accuracy, AUC offers a more comprehensive metric for binary classification. By this criterion, therefore, AdaBoost proves to be the more reliable predictor.

Figure 4. Area under the receiver operating characteristic curve (AUROC) for the AdaBoost classifier.

The KDE method is a non-parametric approach for estimating probability densities. This technique can also be employed to construct binary classifiers. The KDE method works by approximating the distribution of a training set through a linear combination of kernel functions centered on the observed data points. For our study, we implemented a KDE-based classifier on the dataset. This classifier achieved an average accuracy of 86.58% and a maximum accuracy of 87.83%. These results demonstrate the effectiveness of KDE in capturing the underlying data distribution and making accurate predictions, as shown in Table 8.

Table 8.

Evaluation metrics for the kernel density estimation (KDE) classifier.

Evaluation metric Exp. 1 Exp. 2 Exp. 3
Avg. Accuracy 86.58%
Accuracy 87.83% 86.88% 86.84%
Precision 0.90 0.89 0.89
Recall 0.88 0.87 0.87
F1 score 0.88 0.88 0.88

Also, the AUROC value for the KDE classifier was 0.882, as shown in Figure 5, underscoring the model's proficiency in class separation and its predictive performance.

Figure 5. Area under the receiver operating characteristic curve (AUROC) for the kernel density estimation (KDE) classifier.

ANN is a type of neural network inspired by the structure and functioning of the human brain, consisting of interconnected artificial neurons. For our study, we designed an ANN with three dense layers comprising 12, 8, and 1 unit, respectively. The “tanh” activation function was applied to the first two hidden layers, while the “sigmoid” activation function was used in the final output layer. Through hyperparameter tuning, we determined that a batch size of 5 and training the network for 50–100 epochs produced the most effective results, optimizing the model's performance on our dataset as shown in Figure 6.

Figure 6. (a) Accuracy for various combinations of batch size and number of training epochs; (b) the artificial neural network's (ANN's) accuracy versus training epoch; and (c) the ANN's loss versus training epoch.

We trained an ANN five times, each with different data orderings and varied train-validation-test splits to ensure robustness and generalizability. Utilizing this ANN classifier, we achieved an average accuracy of 93.16% and a maximum accuracy of 95.39% on the dataset, as shown in Table 9. These results highlight the model's strong performance and its ability to consistently deliver accurate predictions across different data configurations.

Table 9.

Accuracies for the artificial neural network (ANN) classifier.

Exp no. Training Acc (%) Test Acc (%)
1 95.87 89.47
2 96.69 95.39
3 90.08 92.11
4 98.35 94.08
5 94.21 94.74

Comparative analysis of models performance

This study systematically assessed the performance of several supervised machine learning models for PD prediction, including LR, KNN, and SVM with linear and RBF kernels, AdaBoost, KDE, and ANN. Evaluation metrics such as maximum and average accuracy, AUROC, precision, recall, and F1 score were employed to analyze each model's efficacy. KNN demonstrated the highest classification accuracy, achieving a maximum of 98.52% and an average accuracy of 97.33%, supported by a strong AUROC of 0.9730, highlighting its exceptional discriminative power. AdaBoost followed closely, attaining a peak accuracy of 94.08% and the highest AUROC of 0.9752, underscoring its robustness and reliability in binary classification. LR yielded consistent performance with 91.45% maximum accuracy and a balanced AUROC of 0.9420. SVM, using linear and RBF kernels, provided moderate results, with the linear variant (87.68%) outperforming the RBF (85.57%). KDE offered reasonable classification with 86.58% accuracy, effectively modelling data distribution. ANN achieved an average accuracy of 93.16% and displayed strong generalization across different experimental setups. Overall, while KNN excelled in accuracy, AdaBoost and ANN exhibited well-rounded and dependable performance across multiple evaluation parameters, as shown in Table 10.

Table 10.

Performance comparison of various models/classifiers.

Model/classifier Maximum accuracy Avg. Accuracy AUC Precision Recall F1 score
LR 91.45% 89.73% 0.9420 0.90 0.90 0.90
KNN 98.52% 97.33% 0.9730 0.98 0.98 0.97
SVM (linear kernel) 89.49% 87.68% 0.9151 0.88 0.88 0.88
SVM (RBF kernel) 88.42% 85.57% 0.8647 0.88 0.88 0.87
Adaboost 94.08% 91.56% 0.9752 0.94 0.89 0.91
KDE 87.83% 86.58% 0.8820 0.90 0.88 0.88
ANN 95.39% 93.16%

LR: logistic regression; KNN: k-nearest neighbors; SVM: support vector machine; KDE: kernel density estimation; ANN: artificial neural network; AUC: area under the curve.

Conclusion

In this study, we evaluated six classification methods to distinguish between Parkinson's-positive and Parkinson's-negative patients. A comprehensive array of performance metrics, including precision, accuracy, recall, average accuracy, AUROC, and F1 score, was employed to rigorously assess and compare each model's effectiveness. Key findings reveal significant insights and advancements in classification performance, particularly with the KNN and AdaBoost models. KNN achieved an average accuracy of 97.33% and a maximum accuracy of 98.52%, demonstrating substantial improvement over previous models, with superior precision, recall, and F1 score, making it a highly reliable choice. AdaBoost also performed exceptionally well, achieving an AUROC value of 0.9752, indicating its strength in balancing TP and FP rates, which is essential for binary classification. Although KNN outperformed in accuracy, AdaBoost's high AUROC suggests its overall superiority in scenarios where AUROC is critical. Additionally, we introduced a KDE-based classifier, achieving an average accuracy of 86.58% and a maximum of 87.83%. This marks the first known application of KDE for classification on this dataset, demonstrating its potential as a valuable tool for similar tasks.

Future research should focus on optimizing hyperparameters and incorporating additional features to improve the accuracy and generalizability of KNN and AdaBoost. For KNN, exploring alternative distance metrics and weighting schemes, and for AdaBoost, refining boosting rounds and learning rates, may yield better results. For KDE, advanced smoothing techniques and alternative kernel functions could enhance performance. Combining these classifiers into an ensemble framework could harness their individual strengths and improve overall classification accuracy and robustness. Extending this research to broader datasets and real-world applications will further validate these models' utility and effectiveness.

Acknowledgements

The authors wish to express their sincere gratitude to the Department of Computer Science and Engineering, Manipal University Jaipur, Rajasthan, India, for generously providing the essential computing resources that enabled this research. We also acknowledge the invaluable contribution of the University of California, Irvine (UCI) Machine Learning Repository, specifically designed for Parkinson's disease (PD) diagnosis based on vocal measurements dataset (https://archive.ics.uci.edu/) in this study.

Footnotes

Author contributions: Ajay Kumar: conceptualization, methodology, and writing–original draft. Jay Parkash Singh: data analysis, and writing–review and editing. Priyanka Paygude: data analysis, and writing–review and editing. Rachan Daimary: data analysis, and writing–review and editing. Sandeep Prasad: data analysis, and writing–review and editing. All authors read and approved the final article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement: The code used for model development and evaluation, along with relevant instructions, is publicly accessible via the following GitHub repository: https://github.com/ajay3789/Early-Parkinson-s-Disease-Detection-Using-Vocal-Biomarkers/tree/main.

References

  • 1.Dauer W, Przedborski S. Parkinson’s disease: mechanisms and models. Neuron 2003; 39: 889–909. [DOI] [PubMed] [Google Scholar]
  • 2.Thomas B, Beal MF. Parkinson’s disease. Hum Mol Genet 2007; 16: R183–R194. [DOI] [PubMed] [Google Scholar]
  • 3.Alexander GE. Biology of Parkinson’s disease: pathogenesis and pathophysiology of a multisystem neurodegenerative disorder. Dialogues Clin Neurosci 2004; 6: 259–280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Przedborski S. Pathogenesis of nigral cell death in Parkinson’s disease. Parkinsonism Relat Disord 2005; 11: S3–S7. [DOI] [PubMed] [Google Scholar]
  • 5.Kumar A. Dynamic COVID-19 endurance indicator system for scientific decisions using ensemble learning approach with rapid data processing. In: International Conference on Computation of Artificial Intelligence & Machine Learning; 2024. pp. 10–28. DOI: 10.1007/978-3-031-71484-92. [DOI] [Google Scholar]
  • 6.Hassan A, Benarroch EE. Heterogeneity of the midbrain dopamine system: implications for Parkinson's disease. Neurology 2015; 85: 1795–1805. [DOI] [PubMed] [Google Scholar]
  • 7.Khan AU, Akram M, Daniyal M, et al. Awareness and current knowledge of Parkinson’s disease: a neurodegenerative disorder. Int J Neurosci 2019; 129: 55–93. [DOI] [PubMed] [Google Scholar]
  • 8.Davie CA. A review of Parkinson’s disease. Br Med Bull 2008; 86: 109–127. [DOI] [PubMed] [Google Scholar]
  • 9.Kumar A, Gill AS, Singh JP, et al. A comprehensive and comparative examination of machine learning techniques for diabetes Mellitus prediction. In: 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT); 2024 Jun 24. pp. 1–5. IEEE. [Google Scholar]
  • 10.Bjørklund G, Dadar M, Anderson G, et al. Preventive treatments to slow substantia nigra damage and Parkinson’s disease progression: a critical perspective review. Pharmacol Res 2020; 161: 105065. [DOI] [PubMed] [Google Scholar]
  • 11.Lai BCL, Tsui JKC. Epidemiology of Parkinson’s disease. BC Med J 2001; 43: 133–137. [Google Scholar]
  • 12.Cerri S, Mus L, Blandini F. Parkinson’s disease in women and men: what’s the difference? J Parkinsons Dis 2019; 9: 501–515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Schmitt DP, Long AE, McPhearson A, et al. Personality and gender differences in global perspective. Int J Psychol 2017; 52: 45–56. [DOI] [PubMed] [Google Scholar]
  • 14.Michael J. Fox Foundation for Parkinson's Research. Parkinson's disease causes [Internet]. 2018 [cited 2025 Apr 6]. Available from: https://www.michaeljfox.org/understandingparkinsons/living-with-pd.html.
  • 15.Gnerre M, Malaspina E, Di Tella S, et al. Vocal emotional expression in Parkinson’s disease: roles of sex and emotions. Societies 2023; 13: 57. [Google Scholar]
  • 16.Little M, McSharry P, Hunter E, et al. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nat Prec 2008; 1: 1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Çetinkaya Tezer D, Tutuncu M, Akalin MA, et al. Myoclonus and tremor in chronic inflammatory demyelinating polyneuropathy: a multichannel electromyography analysis. Acta Neurol Belg 2022; 122: 1289–1296. [DOI] [PubMed] [Google Scholar]
  • 18.Sakar C, Serbes G, Gunduz A, et al. Parkinson’s disease classification [dataset]. UCI Mach Learn Repository 2018. DOI: 10.24432/C5MS4X [DOI] [Google Scholar]
  • 19.Asmae O, Abdelhadi R, Bouchaib C, et al. Parkinson’s disease identification using KNN and ANN algorithms based on voice disorder. In: 2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET); 2020 April 1–6. DOI: 10.1109/IRASET48871.2020.9092228 [DOI] [Google Scholar]
  • 20.Das R. A comparison of multiple classification methods for diagnosis of Parkinson’s disease. Expert Syst Appl 2010; 37: 1568–1572. [Google Scholar]
  • 21.Kumar A, Gupta R. Futuristic study of criminal facial recognition: an open-source face image dataset. Science Talks 2023; 6: 100229. [Google Scholar]
  • 22.Shirvan RA, Tahami E. Voice analysis for detecting Parkinson’s disease using genetic algorithm and KNN classification method. In: 2011 18th Iranian Conference of Biomedical Engineering (ICBME). IEEE; 2011. [Google Scholar]
  • 23.Berus L, Klancnik S, Brezocnik M, et al. Classifying Parkinson's disease based on acoustic measures using artificial neural networks. Sensors 2018; 19: 16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Anisha CD, Arulanand N. Early prediction of Parkinson’s disease (PD) using ensemble classifiers. In: 2020 International Conference on Innovative Trends in Information Technology (ICITIIT). IEEE; 2020. [Google Scholar]
  • 25.Xiong Y, Lu Y. Deep feature extraction from the vocal vectors using sparse autoencoders for Parkinson’s classification. IEEE Access 2020; 8: 27821–27830. [Google Scholar]
  • 26.Masud M, Singh P, Gaba GS, et al. CROWD: crow search and deep learning-based feature extractor for classification of Parkinson’s disease. ACM Trans Internet Technol 2021; 21: 1–18. [Google Scholar]
  • 27.Gunduz H. An efficient dimensionality reduction method using filter-based feature selection and variational autoencoders on Parkinson’s disease classification. Biomed Signal Process Control 2021; 66:102452. [Google Scholar]
  • 28.Ahmed I, Aljahdali S, Khan MS, et al. Classification of Parkinson's disease based on patient's voice signal using machine learning. Intell Autom Soft Comput 2022; 32: 05. [Google Scholar]
  • 29.Pahuja G, Nagabhushan TN. A comparative study of existing machine learning approaches for Parkinson’s disease detection. IETE J Res 2021; 67: 4–14. [Google Scholar]
  • 30.Solana-Lavalle G, Rosas-Romero R. Analysis of voice as an assisting tool for detection of Parkinson’s disease and its subsequent clinical interpretation. Biomed Signal Process Control 2021; 66: 102415. [Google Scholar]
  • 31.Mohammed MA, Elhoseny M, Abdulkareem KH, et al. A multi-agent feature selection and hybrid classification model for Parkinson’s disease diagnosis. ACM Trans Multimed Comput Commun Appl 2021; 17: 1–22. [Google Scholar]
  • 32.Amato F, Borzì L, Olmo G, et al. An algorithm for Parkinson's disease speech classification based on isolated words analysis. Health Inf Sci Syst 2021; 9: 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wrobel K. Diagnosing Parkinson’s disease by means of ensemble classification of patients’ voice samples. Procedia Comput Sci 2021; 192: 3905–3914. [Google Scholar]
  • 34.Ozturk S, Unal Y. A two-stage whale optimization method for classification of Parkinson’s disease voice recordings. Int J Intell Syst Appl Eng 2020; 8: 84–93. [Google Scholar]
  • 35.Cordella F, Paffi A, Pallotti A. Classification-based screening of Parkinson’s disease patients through voice signal. In: 2021 IEEE International Symposium on Medical Measurements and Applications (MeMeA); 2021. pp.1–6. [Google Scholar]
  • 36.Sharanyaa S, Renjith PN, Ramesh K. Classification of Parkinson's disease using speech attributes with parametric and nonparametric machine learning techniques. In: 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS); 2020. pp.437–442. DOI: 10.1109/ICISS49785.2020.9316078. [Google Scholar]
  • 37.Kumar A, Kumar V, Singh JP, et al. Deep learning-based gland segmentation for enhanced analysis of colon histology images. In: International Conference on Paradigms of Communication, Computing and Data Analytics; 2024 Apr 20. pp. 285–292. DOI: 10.1007/978-981-97-8669-522. [DOI] [Google Scholar]
  • 38.Kumar A, Panwar AS. Human mental state detection using modified convolutional neural network with leaky rectified linear unit. In: 2024 IEEE Region 10 Symposium (TENSYMP); 2024 Sep 27. pp. 1–6. IEEE. DOI: 10.1109/TENSYMP61132.2024.10752185. [DOI] [Google Scholar]
