Table 1.
| CV fold | Training γ | Number of selected features d | Training k | Training SW | Training similarity index (%) | Testing SW | Testing similarity index (%) |
|---|---|---|---|---|---|---|---|
| CV Fold 1 | 325 | 7 | 3 | 0.15 | 70.71 | 0.15 | 69.10 |
| CV Fold 2 | 331 | 7 | 3 | 0.14 | 67.36 | 0.14 | 79.80 |
| CV Fold 3 | 336 | 6 | 3 | 0.18 | 71.30 | 0.18 | 69.08 |
| CV Fold 4 | 340 | 6 | 4 | 0.14 | 96.69 | 0.14 | 69.92 |
| CV Fold 5 | 333 | 6 | 3 | 0.16 | 66.20 | 0.16 | 76.26 |
For each of the m imputed data sets of baseline features, observations were split into five cross-validation (CV) folds; the observations in each fold were used in turn as the test set, with the remaining observations forming the training set. The table reports parameter values for each CV fold, averaged across the m imputed data sets. Training observations were used to train the generalized low-rank model (GLRM) wrapper, which selects the L1-regularization hyperparameter γ (Column 1) and, consequently, the number of non-zero-weighted features d (Column 2). The training and test sets, restricted to the d selected features, were each clustered with the partitioning around medoids (PAM) algorithm, and the number of clusters k that maximized the average silhouette width (SW) was recorded together with the corresponding SW (training, Columns 3 and 4; testing, Column 6). To assess how well pairs of observations were clustered together in each training or test set relative to the full set of observations, the pairwise similarity index was calculated for every training and test set (training, Column 5; testing, Column 7).
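A minimal sketch of the per-fold clustering evaluation described above is given below, under several stated assumptions: the GLRM-based feature selection (γ, d) is omitted and the matrix `X` is assumed to already contain the d selected features for one imputed data set; `KMedoids` from scikit-learn-extra stands in for R's PAM implementation; and the pairwise similarity index is interpreted here as the proportion of observation pairs on which the subset clustering and the full-data clustering agree about co-membership (a Rand-index-style measure), which may differ from the exact definition used in the source.

```python
# Sketch only: assumes X already holds the d GLRM-selected features for one
# imputed data set; KMedoids approximates PAM; the similarity index here is a
# Rand-index-style pairwise agreement between subset and full-data clusterings.
import numpy as np
from itertools import combinations
from sklearn.model_selection import KFold
from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids


def best_k_by_silhouette(X, k_range=range(2, 9), random_state=0):
    """Cluster X with k-medoids for each candidate k and return the k, labels,
    and average silhouette width (SW) of the solution with the largest SW."""
    best = (None, None, -1.0)
    for k in k_range:
        labels = KMedoids(n_clusters=k, random_state=random_state).fit_predict(X)
        sw = silhouette_score(X, labels)
        if sw > best[2]:
            best = (k, labels, sw)
    return best  # (k, labels, average silhouette width)


def pairwise_similarity_index(subset_labels, full_labels, subset_idx):
    """Percentage of observation pairs within the subset that the subset
    clustering and the full-data clustering treat consistently, i.e. both
    place the pair in the same cluster or both place it in different clusters."""
    agree, total = 0, 0
    for i, j in combinations(range(len(subset_idx)), 2):
        same_subset = subset_labels[i] == subset_labels[j]
        same_full = full_labels[subset_idx[i]] == full_labels[subset_idx[j]]
        agree += int(same_subset == same_full)
        total += 1
    return 100.0 * agree / total


# Example usage on synthetic data (placeholder for one imputed data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))                    # 150 observations, d = 6 features
_, full_labels, _ = best_k_by_silhouette(X)      # clustering of the full set

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X), start=1):
    k_tr, lab_tr, sw_tr = best_k_by_silhouette(X[train_idx])
    k_te, lab_te, sw_te = best_k_by_silhouette(X[test_idx])
    sim_tr = pairwise_similarity_index(lab_tr, full_labels, train_idx)
    sim_te = pairwise_similarity_index(lab_te, full_labels, test_idx)
    print(f"Fold {fold}: k={k_tr}, train SW={sw_tr:.2f} ({sim_tr:.1f}%), "
          f"test SW={sw_te:.2f} ({sim_te:.1f}%)")
```

In the reported analysis this procedure would be repeated for each of the m imputed data sets and the per-fold values averaged, as in the table above.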