2018 May 16;8:7663. doi: 10.1038/s41598-018-25999-0

Figure 1.


Illustration of (a) feature vector extraction from a given keystroke dynamics variable of a typing session and (b) the classification pipeline for each subject, based on hold time (HT), normalised flight time (NFT) and normalised pressure (NP) information. (a) Given a keystroke dynamics variable sequence $a_n$, $a \in \{HT, NFT, NP\}$: (1) the sequence is split into subsequences $a_n^i$ using non-overlapping 15-second time windows; (2) for each subsequence, the first- to fourth-order statistical moments (mean $\mu_i$, standard deviation $\sigma_i$, skewness $S_i$ and kurtosis $K_i$) of its elements are computed; (3) the probability density function (PDF) $f_i(x)$ of each subsequence is estimated through kernel density estimation (KDE), and the matrix of sample covariance $C(i, j)$ between the PDFs of all subsequences is calculated. The feature vector $v_a$ representing each typing session is formed by the mean and standard deviation (std, $\sigma$) of the moments extracted in (2), taken across time windows (subsequences), together with the mean, std and sum of absolute values of the upper triangle $C_U(i, j)$ of the covariance matrix calculated in (3). (b) The proposed two-stage multi-model pipeline for classifying subjects as PD patients or healthy controls. (1st stage) The feature vector sets $\{v_a\}$ of a given subject, with each vector representing a typing session, serve as input to three trained models $M_a$, each dedicated to one keystroke dynamics variable, $a \in \{NFT, HT, NP\}$. The models $M_a$ yield three prediction probabilities $P_a$, which are then grouped into new feature vectors $v_P$. (2nd stage) The feature vector set $\{v_P\}$ serves as input to a Logistic Regression classifier $C_{LR}$ that outputs the final classification probabilities $\{P_f\}$, denoting whether each typing session belongs to a PD patient or a healthy control. Finally, the mean of the prediction probabilities $P_f$ is used to categorise the subject as a PD patient or a healthy control.
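The feature extraction in panel (a) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian kernel, the fixed `bandwidth`, the shared evaluation grid of `grid_size` points, and the choice to take the upper triangle strictly above the diagonal are all hypothetical, since the caption does not specify them.

```python
import math
import statistics


def split_windows(timestamps, values, window_s=15.0):
    """Group values into non-overlapping windows of window_s seconds."""
    start = timestamps[0]
    windows = {}
    for t, v in zip(timestamps, values):
        windows.setdefault(int((t - start) // window_s), []).append(v)
    return [windows[k] for k in sorted(windows)]


def moments(x):
    """Mean, population std, kurtosis and skewness of one window."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    if sigma == 0.0:
        return mu, 0.0, 0.0, 0.0
    z = [(v - mu) / sigma for v in x]
    kurt = statistics.fmean([u ** 4 for u in z])
    skew = statistics.fmean([u ** 3 for u in z])
    return mu, sigma, kurt, skew


def kde_on_grid(x, grid, bandwidth):
    """Gaussian KDE of x evaluated on grid (kernel and bandwidth assumed)."""
    norm = 1.0 / (len(x) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in x)
        for g in grid
    ]


def sample_cov(p, q):
    """Sample covariance between two PDFs evaluated on the same grid."""
    mp, mq = statistics.fmean(p), statistics.fmean(q)
    return sum((a - mp) * (b - mq) for a, b in zip(p, q)) / (len(p) - 1)


def session_features(timestamps, values, window_s=15.0,
                     bandwidth=0.02, grid_size=50):
    """Build an 11-element feature vector for one variable of one session."""
    windows = split_windows(timestamps, values, window_s)
    per_window = [moments(w) for w in windows]

    # Mean and std of each moment across windows: 4 moments x 2 = 8 features.
    feats = []
    for k in range(4):
        column = [m[k] for m in per_window]
        feats += [statistics.fmean(column), statistics.pstdev(column)]

    # Evaluate every window's PDF on one shared grid (an assumption).
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + step * k for k in range(grid_size)]
    pdfs = [kde_on_grid(w, grid, bandwidth) for w in windows]

    # Upper triangle of C(i, j), taken strictly above the diagonal (assumed).
    upper = [
        sample_cov(pdfs[i], pdfs[j])
        for i in range(len(pdfs))
        for j in range(i + 1, len(pdfs))
    ]
    if upper:
        feats += [statistics.fmean(upper), statistics.pstdev(upper),
                  sum(abs(c) for c in upper)]
    else:
        feats += [0.0, 0.0, 0.0]  # a single window has no off-diagonal terms
    return feats
```

Calling `session_features(timestamps, hold_times)` on one session returns 11 numbers for that variable: eight from the windowed moments and three from the covariance of the windowed PDFs. Repeating the call for NFT and NP would give the three per-variable vectors that feed the first-stage models in panel (b).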