Author manuscript; available in PMC: 2025 Jul 8.

Published in final edited form as: Neuroinformatics. 2022 Aug 24;21(1):115–141. doi: 10.1007/s12021-022-09599-y

Fig. 5. Hierarchical clustering on principal components of neuropsychological (T) scores for subject group identification using Ward’s D2 criterion (Murtagh & Legendre, 2014). PCA is first applied to the subject cognitive matrix $P \in \mathbb{R}^{K \times L}$ to remove redundancy among highly correlated continuous variables. Next, the distance matrix $D \in \mathbb{R}^{K \times K}$ is computed from the PCs using a dissimilarity measure such as the distance correlation (Székely, Rizzo, & Bakirov, 2007). We then apply hierarchical clustering with Ward’s D2 method to $D$ and select clusters based on the height of the hierarchical tree. The initial number of clusters $N_{C_{k}}$ is assessed using compactness metrics (Halkidi, Batistakis, & Vazirgiannis, 2002a, 2002b), and cluster stability is evaluated using the Jaccard similarity index (Jaccard, 1912) via a nonparametric bootstrap with $n = 1000$ repetitions (see detailed protocol in Supplementary Methods Section 3.1). We select significant clusters based on the approximately unbiased probability $p$-values (Efron, Halloran, & Holmes, 1996), as shown in Fig. 6a. We obtain the final clustering solution by applying the $K$-means algorithm to the hierarchical clustering output.
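The main steps of the caption (PCA, distance matrix, Ward linkage, tree cut, $K$-means consolidation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hcpc`, the fixed `n_components` and `n_clusters`, and the synthetic data are hypothetical, Euclidean distance stands in for the distance correlation, and SciPy's `ward` linkage is assumed to play the role of the Ward D2 criterion. The compactness metrics, the bootstrap Jaccard stability check, and the approximately unbiased $p$-values are not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist


def hcpc(P, n_components=2, n_clusters=3):
    """Hypothetical helper: PCA -> Ward linkage -> tree cut -> k-means."""
    # PCA via SVD of the column-centered matrix P (K subjects x L scores).
    centered = P - P.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]

    # Pairwise distances between subjects in PC space (condensed form of D).
    # Assumption: Euclidean distance replaces the distance correlation.
    D = pdist(scores, metric="euclidean")

    # Ward linkage on D, then cut the tree into n_clusters groups.
    tree = linkage(D, method="ward")
    hc_labels = fcluster(tree, t=n_clusters, criterion="maxclust") - 1

    # Consolidation: k-means started from the hierarchical cluster centroids.
    init = np.vstack(
        [scores[hc_labels == c].mean(axis=0) for c in range(n_clusters)]
    )
    _, km_labels = kmeans2(scores, init, minit="matrix")
    return scores, hc_labels, km_labels


# Synthetic example: 3 well-separated groups of 20 subjects, 5 scores each.
rng = np.random.default_rng(0)
P_demo = np.vstack(
    [rng.normal(loc=c, scale=1.0, size=(20, 5)) for c in (0.0, 10.0, 20.0)]
)
scores_demo, hc_demo, km_demo = hcpc(P_demo)
print(np.bincount(km_demo))  # cluster sizes after k-means consolidation
```

Starting $K$-means from the hierarchical centroids keeps the partition close to the tree cut while letting borderline subjects move to the nearest centroid, which is one common reading of "applying $K$-means to the hierarchical clustering output".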