Skip to main content
. 2020 Feb 28;21:76. doi: 10.1186/s12859-020-3423-z

Fig. 1.

Fig. 1

Workflow for pathway-level models. In this study, TCGA was used as the source of gene expression data and MSigDB as the source of pathway definitions. The first step of the workflow converts the gene-level expression data matrix into pathway-level variables via the unsupervised single sample gene set method GSVA. After obtaining a pathway-level data matrix, nested cross validation was used to train and evaluate a Lasso-penalized Cox proportional hazards model. Cross validation was employed both for the training vs. test split and within each training fold for selection of the Lasso penalty parameter. With the selected pathways and estimated parameters, we performed prediction on the test data subset by applying the Cox proportional hazards regression model that had been identified in the training data subset