Table 3.
Top-ranking features (p-value < 0.05) and their importance scores from the RF model trained on the genes-only dataset (left). Side-paired differential expression (fold change) analysis of TPM values for the same features (right). The Wilcoxon rank-sum test was used to calculate p-values, and FDR was computed with the Benjamini & Hochberg procedure.
| Model Feature Importance Metrics | | | | | Differential Expression | | | |
|---|---|---|---|---|---|---|---|---|
| Rank | Ensembl ID_Gene ID | Importance Score | Log Importance Score | p-value | Log2 FC | p-value | FDR | Greater Expr. Side |
| 1 | ENSG00000159182_PRAC1 | 0.12 | -2.16 | 0 | -2.86 | 5.08E-21 | 3.10E-19 | Left |
| 2 | ENSG00000159184_HOXB13 | 0.07 | -2.7 | 0 | -1.78 | 1.78E-11 | 2.18E-10 | Left |
| 3 | ENSG00000144451_SPAG16 | 0.05 | -3.04 | 0 | -0.64 | 9.00E-09 | 4.99E-08 | Left |
| 4 | ENSG00000198353_HOXC4 | 0.05 | -3.08 | 0 | 1.88 | 2.11E-15 | 6.43E-14 | Right |
| 5 | ENSG00000184719_RNLS | 0.03 | -3.36 | 0 | -0.89 | 2.85E-10 | 2.48E-09 | Left |
| 6 | ENSG00000145649_GZMA | 0.03 | -3.42 | 0 | 1.54 | 4.75E-07 | 1.93E-06 | Right |
| 7 | ENSG00000197757_HOXC6 | 0.02 | -3.84 | 0 | 1.28 | 5.02E-12 | 1.02E-10 | Right |
| 8 | ENSG00000162409_PRKAA2 | 0.02 | -3.87 | 0 | -1.2 | 3.90E-10 | 2.97E-09 | Left |
| 9 | ENSG00000037965_HOXC8 | 0.02 | -3.87 | 0 | 1.32 | 7.90E-12 | 1.20E-10 | Right |
| 10 | ENSG00000147457_CHMP7 | 0.02 | -3.92 | 0 | 0.35 | 8.44E-06 | 2.71E-05 | Right |
| 11 | ENSG00000165548_TMEM63C | 0.02 | -3.99 | 0 | -0.83 | 5.31E-05 | 0.000147 | Left |
| 12 | ENSG00000203880_PCMTD2 | 0.02 | -4.01 | 0 | -0.43 | 5.16E-08 | 2.62E-07 | Left |
| 13 | ENSG00000119397_CNTRL | 0.02 | -4.08 | 0.01 | 0.27 | 1.84E-06 | 7.00E-06 | Right |
| 14 | ENSG00000103485_QPRT | 0.02 | -4.09 | 0.01 | -1.01 | 2.20E-10 | 2.24E-09 | Left |
| 15 | ENSG00000170677_SOCS6 | 0.02 | -4.13 | 0.03 | 0.39 | 4.14E-06 | 1.40E-05 | Right |
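
The differential expression columns can be illustrated with a minimal sketch. It is not the pipeline used to produce Table 3. The function names, the mean-based fold change, the pseudocount of 1, and the normal approximation for the rank-sum test are all assumptions made for illustration. The sign convention (negative Log2 FC means higher expression on the left side) is inferred from the table itself.

```python
import math


def log2_fold_change(left_tpm, right_tpm, pseudocount=1.0):
    """log2(mean right / mean left). The pseudocount is an assumed guard against zeros."""
    mean_left = sum(left_tpm) / len(left_tpm)
    mean_right = sum(right_tpm) / len(right_tpm)
    return math.log2((mean_right + pseudocount) / (mean_left + pseudocount))


def greater_expression_side(log2_fc):
    """Negative values map to 'Left', matching the sign pattern seen in Table 3."""
    return "Right" if log2_fc > 0 else "Left"


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from (0 = x, 1 = y).
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy TPM values for one hypothetical gene, not data from the study.
left, right = [3.0, 3.0], [1.0, 1.0]
fc = log2_fold_change(left, right)
side = greater_expression_side(fc)
```

The Benjamini & Hochberg adjustment depends on the total number of tests, so it would normally be applied to the p-values of every gene tested. Applying it only to the 15 rows shown here would not be expected to reproduce the FDR column.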