TABLE 1.
Basic information on the datasets included in this study.
| Dataset ID | Number of samples | Brief description of the dataset | Number of deaths | Number of relapses |
| Discovery set | 1042 | | | |
| GSE19188 | 40 | A genome-wide gene expression analysis on early-stage NSCLC | 24 | – |
| GSE30219 | 85 | Identification of a group of metastatic-prone tumors in lung cancer according to “Off-context” gene expression defined by the authors | 45 | 27 (83) |
| GSE31210 | 226 | Gene expression analysis on pathological stage I–II lung adenocarcinomas | 35 | 64 (226) |
| GSE31546 | 16 | Development of an EGFR mutation gene expression signature to predict response and clinical outcome, and identification of genes associated with the EGFR-dependent phenotype | 2 | – |
| GSE37745 | 106 | Biomarker discovery in NSCLC | 77 | – |
| GSE50081 | 127 | Validation of a histology-independent prognostic gene signature for early-stage NSCLC, including stage IA patients | 51 | 37 (124) |
| GSE68465 | 442 | Gene expression-based survival prediction in LUAD | 236 | 178 (178) |
| Validation set | 535 | | | |
| TCGA-LUAD | 535 | The LUAD cohort of TCGA, a landmark cancer genomics program that molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types | 187 | – |