Author manuscript; available in PMC: 2025 Jul 8.
Published in final edited form as: Neuroinformatics. 2022 Aug 24;21(1):115–141. doi: 10.1007/s12021-022-09599-y

Fig. 5.

Hierarchical clustering on principal components of neuropsychological (T) scores for subject group identification using Ward’s D2 criterion (Murtagh & Legendre, 2014). PCA is applied to the subject cognitive matrix $P \in \mathbb{R}^{K \times L}$ to remove highly correlated continuous variables. Next, we apply hierarchical clustering with Ward’s D2 method to the distance matrix $D$ and select the clusters based on the height of the hierarchical tree. The distance matrix $D \in \mathbb{R}^{K \times K}$ is computed from the PCs using a dissimilarity measure such as the distance correlation (Székely, Rizzo, & Bakirov, 2007). The initial number of clusters NCk is assessed according to the compactness metrics (Halkidi, Batistakis, & Vazirgiannis, 2002a, 2002b), and the cluster stability is evaluated using the Jaccard similarity index (Jaccard, 1912) via a nonparametric bootstrap with n = 1000 repetitions (see detailed protocol in Supplementary Methods Section 3.1). We select significant clusters based on the approximately unbiased probability p-values (Efron, Halloran, & Holmes, 1996), as shown in Fig. 6a. We obtain the final clustering solution by applying the K-means algorithm to the hierarchical clustering output.
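
The steps in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the function names (`make_demo_scores`, `ward_labels`, `hcpc_sketch`, `bootstrap_jaccard`), the synthetic data, the number of retained components, and the fixed number of clusters are all hypothetical. Euclidean distance between PC scores is used as a plain stand-in for the distance-correlation dissimilarity named in the caption, SciPy's `"ward"` linkage is assumed to correspond to the Ward D2 criterion, and the bootstrap uses far fewer repetitions than the n = 1000 reported. The compactness metrics and the approximately unbiased p-values are not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def make_demo_scores(seed=0, per_group=30, n_tests=6):
    """Hypothetical K x L matrix of T-like scores with three separated groups."""
    rng = np.random.default_rng(seed)
    centers = (30.0, 50.0, 70.0)
    scores = np.vstack([rng.normal(c, 3.0, size=(per_group, n_tests)) for c in centers])
    truth = np.repeat(np.arange(len(centers)), per_group)
    return scores, truth


def ward_labels(pcs, n_clusters):
    """Ward linkage on pairwise distances, cut into at most n_clusters groups."""
    # Assumption: Euclidean distance stands in for the distance correlation.
    tree = linkage(pdist(pcs, metric="euclidean"), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust") - 1


def hcpc_sketch(scores, n_components=3, n_clusters=3, seed=0):
    """PCA, then Ward clustering, then K-means started from the Ward centroids."""
    z = StandardScaler().fit_transform(scores)
    pcs = PCA(n_components=n_components, random_state=seed).fit_transform(z)
    hc = ward_labels(pcs, n_clusters)
    # Consolidation step: K-means initialised at the hierarchical cluster means.
    centroids = np.vstack([pcs[hc == c].mean(axis=0) for c in np.unique(hc)])
    km = KMeans(n_clusters=len(centroids), init=centroids, n_init=1).fit(pcs)
    return pcs, hc, km.labels_


def bootstrap_jaccard(pcs, labels, n_clusters, n_boot=100, seed=0):
    """Mean best-match Jaccard index per cluster over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    k = pcs.shape[0]
    ids = np.unique(labels)
    total = np.zeros(len(ids))
    for _ in range(n_boot):
        # Simplification: keep each resampled subject once (duplicates dropped).
        idx = np.unique(rng.integers(0, k, size=k))
        boot = ward_labels(pcs[idx], n_clusters)
        for j, c in enumerate(ids):
            orig = set(idx[labels[idx] == c].tolist())
            best = 0.0
            for b in np.unique(boot):
                new = set(idx[boot == b].tolist())
                union = len(orig | new)
                if union:
                    best = max(best, len(orig & new) / union)
            total[j] += best
    return total / n_boot


if __name__ == "__main__":
    scores, truth = make_demo_scores()
    pcs, hc, km = hcpc_sketch(scores)
    print("cluster sizes (K-means):", np.bincount(km))
    print("bootstrap Jaccard:", bootstrap_jaccard(pcs, hc, n_clusters=3))
```

Starting K-means from the hierarchical centroids, rather than from random seeds, is one common way to realize a "K-means on the hierarchical output" step: it keeps the tree-derived partition as the starting point and only reassigns subjects that sit closer to another cluster's centre.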