Table 3.
Step-by-step output, given as the number of qualifying genes at each stage of the LSTNR analytical pipeline, for RNAseq data deposited in TCGA from four realizations of patient-derived breast cancer transcriptomes across four molecular subtypes (courtesy of Li and Bushel, 2016; doi:10.1186/s12864-016-2584-7).
| Criteria | Breast Cancer Molecular Subtypes (TCGA) (N = 200) | ||||
|---|---|---|---|---|---|
| | Realization 1 (N = 50) | Realization 2 (N = 50) | Realization 3 (N = 50) | Realization 4 (N = 50) | All Specimens (N = 200) |
| Genes with uniquely aligned reads | 20,532 | ||||
| Distribution of gene-wise RPM means | P(y) ~ Weibull3P(y; α, β, γ); y = RPM | ||||
| | α = 25.4 RPM | α = 24.1 RPM | α = 25.0 RPM | α = 21.9 RPM | α = 22.2 RPM |
| | β = 0.53 | β = 0.53 | β = 0.54 | β = 0.49 | β = 0.49 |
| | γ = 9.9 × 10⁻³ RPM | γ = 6.6 × 10⁻³ RPM | γ = 1.1 × 10⁻² RPM | γ = 1.6 × 10⁻³ RPM | γ = 1.5 × 10⁻³ RPM |
| Independent filtering: Genes with average y > α | 8,005 | 8,110 | 8,083 | 8,562 | 8,538 |
| Linearized normalizing transformant: GLM Linear Predictor | (y − γ)⁻¹ | ||||
| Transformant two-way ANOVA: resolved genes across groups with respect to gene-wise mean | 4,295 | 381 | 638 | 2,281 | 2,851 |
| Resolution-Weighed ANOVA: Significant Genes (SGs) with FDR adj. p < 0.05 based on differences in resolution-weighed RPM log-fold changes (Log2FC) relative to baseline condition | 4,465 | 5,086 | 4,537 | 4,618 | 6,193 |
| | Altogether: 7,749 | | | | Final Overlap: 1,509 |
| | Intersection: 1,617 | | | | |
| Differential expression: DEGs = subset of SGs that exhibit both: | 3,736 | 3,377 | 3,497 | 3,617 | 6,093 |
| | Altogether: 6,407 | | | | Final Overlap: 908 |
| | Intersection: 976 | | | | |
| Reproducibility: LSTNRs = subset of SGs that exhibit both: | 1,370 | 1,102 | 1,130 | 1,210 | 1,511 |
| | Altogether: 2,193 | | | | Final Overlap: 337 |
| | Intersection: 368 | | | | |
| Expectable DEGs: DEGREEs = Ensembl-annotated DEGs with a reproducible expectation estimate (i.e., DEGs that are also LSTNRs) and official Entrez symbol | Intersection: 366 | | | | 1,511 |
| | Final Overlap: 336 | | | | |
| Transcriptional profiling: Profiler DEGREEs = top DEGREEs ranked by retrospective statistical power with monotonically decreasing within-gene effect sizes ΔLog2FC | 200 Profiler DEGREEs (consensus) | ||||
| Diagnostic targets: Biomarkers = minimal subset of Profiler DEGREEs with predictive discriminant power based on sequential partition tree analysis (ROC scores > 0.9 per phenotype) | CBX7, ESR1, FOXC1, and FOXM1 | ||||
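
The independent-filtering step and the overlap counts reported in Table 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's own code. It assumes the common three-parameter Weibull parametrization (α as scale, the second parameter as shape, γ as threshold). It also assumes that "Altogether" is the union of the four realizations, "Intersection" is the set of genes shared by all four, and "Final Overlap" is that intersection restricted to genes also found when all specimens are pooled. The function names and toy gene labels are hypothetical.

```python
import math


def weibull3p_survival(y, alpha, beta, gamma):
    """P(Y > y) for a three-parameter Weibull (assumed parametrization:
    alpha = scale, beta = shape, gamma = threshold)."""
    if y <= gamma:
        return 1.0
    z = (y - gamma) / alpha
    return math.exp(-(z ** beta))


def independent_filter(mean_rpm, alpha):
    """Keep genes whose average RPM exceeds the fitted scale alpha."""
    return {gene for gene, y in mean_rpm.items() if y > alpha}


def overlap_summary(per_realization, all_specimens):
    """Count genes under the assumed meaning of the table's labels."""
    altogether = set().union(*per_realization)        # found in any realization
    intersection = set.intersection(*per_realization)  # found in every realization
    final_overlap = intersection & all_specimens       # also found in pooled data
    return {
        "altogether": len(altogether),
        "intersection": len(intersection),
        "final_overlap": len(final_overlap),
    }


# Toy example with hypothetical gene labels (not data from the table).
toy_means = {"geneA": 30.0, "geneB": 1.2, "geneC": 26.0}
kept = independent_filter(toy_means, alpha=25.4)

toy_realizations = [
    {"geneA", "geneB", "geneC"},
    {"geneB", "geneC", "geneD"},
    {"geneB", "geneC"},
    {"geneB", "geneC", "geneE"},
]
toy_pooled = {"geneB", "geneX"}
summary = overlap_summary(toy_realizations, toy_pooled)
print(sorted(kept), summary)
```

Under the assumed parametrization, the fitted probability of exceeding γ + α is exp(−1), about 0.37, whatever the shape parameter. That is of the same order as the share of the 20,532 genes retained by the independent filter in each column.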