Table 2.
Factors Contributing to Improved Performance of Pathway-Based Prediction Models
Data set | Method | R² | p-value |
---|---|---|---|
Wang et al. (2005) | Pathway-based SPCA | 0.304 | 1.33 × 10⁻⁷ |
Wang et al. (2005) | (1) No gene screening in GO | 0.122 | 5.79 × 10⁻² |
Wang et al. (2005) | (2) No supergene screening | 0.261 | 2.15 × 10⁻⁶ |
Wang et al. (2005) | (3) No K-means clusters | 0.300 | 1.66 × 10⁻⁷ |
Miller et al. (2005) | Pathway-based SPCA | 0.16 | 2.97 × 10⁻³ |
Miller et al. (2005) | (1) No gene screening in GO | 0.111 | 2.42 × 10⁻² |
Miller et al. (2005) | (2) No supergene screening | 0.158 | 3.12 × 10⁻³ |
Miller et al. (2005) | (3) No K-means clusters | 0.160 | 3.01 × 10⁻³ |
For each of the contributing factors (1)–(3), the pathway-based SPCA model was re-run with the step corresponding to that factor removed and all other steps unchanged. In (1), the ordinary PCA score, rather than the first Supervised PCA score, was used as the supergene when refitting the model; in (2), all supergenes were used to estimate the first principal component score in the final step of the Supervised PCA model; in (3), the genes not assigned to any gene category were removed. The results show that within-category feature selection is the most critical step for the superior prediction performance of pathway-based models: without it, R² fell from 0.304 to 0.122 on the Wang et al. (2005) data set and from 0.16 to 0.111 on the Miller et al. (2005) data set. In contrast, selection of supergenes affected prediction performance only slightly (R² of 0.261 and 0.158, respectively), and dropping genes not assigned to a definite gene category had little impact (R² of 0.300 and 0.160).
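The contrast in factor (1), between a Supervised PCA score and an ordinary PCA score computed within one gene category, can be illustrated with a minimal sketch on simulated data. This is not the model fitted for Table 2: the continuous outcome, the Pearson-correlation screen, the threshold of 0.5, and the function names `first_pc_score`, `supervised_pc_score`, and `demo` are all assumptions made for illustration only.

```python
import numpy as np


def first_pc_score(X):
    """First principal component score of X (rows = samples, columns = genes)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix: first left singular vector scaled by
    # the first singular value gives the PC1 score for each sample.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * S[0]


def supervised_pc_score(X, y, threshold):
    """PC1 score using only genes whose |Pearson correlation| with y exceeds threshold.

    Assumption: a correlation screen stands in for whatever outcome-specific
    statistic a real analysis would use.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.abs(corr) > threshold
    if not keep.any():
        raise ValueError("no gene passes the screening threshold")
    return first_pc_score(X[:, keep]), keep


def r_squared(score, y):
    """Squared Pearson correlation between a score and the outcome."""
    return np.corrcoef(score, y)[0, 1] ** 2


def demo(seed=0, n=100):
    """Simulate one gene category: 5 outcome-related genes, 25 genes driven
    by an unrelated factor, and 20 pure-noise genes (all hypothetical)."""
    rng = np.random.default_rng(seed)
    z_signal = rng.normal(size=n)      # latent factor related to the outcome
    z_nuisance = rng.normal(size=n)    # larger, outcome-unrelated factor
    X = np.hstack([
        z_signal[:, None] + 0.5 * rng.normal(size=(n, 5)),
        z_nuisance[:, None] + 0.5 * rng.normal(size=(n, 25)),
        rng.normal(size=(n, 20)),
    ])
    y = z_signal + 0.5 * rng.normal(size=n)

    plain = first_pc_score(X)                         # no gene screening
    supervised, keep = supervised_pc_score(X, y, 0.5)  # with gene screening
    return r_squared(plain, y), r_squared(supervised, y), keep


if __name__ == "__main__":
    r2_plain, r2_supervised, keep = demo()
    print(f"R2 without screening: {r2_plain:.3f}")
    print(f"R2 with screening:    {r2_supervised:.3f}")
    print(f"genes retained:       {int(keep.sum())}")
```

In this simulated setting the unscreened first principal component is dominated by the larger block of outcome-unrelated genes, so its score explains little of the outcome, whereas screening first restricts the component to the outcome-related genes. That is the same qualitative pattern as row (1) of the table, although the magnitudes here depend entirely on the simulation settings.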