Table 2.
Signature | Original gene number | Model | Intended application | Discovery population | Discovery HIV status | Discovery setting | Discovery approach | Discovery tuberculosis cases | Discovery controls | Discovery total |
---|---|---|---|---|---|---|---|---|---|---|
Anderson39.LTBI<sup>26</sup> | 42 | Disease risk score | Tuberculosis vs LTBI | Children | Positive or negative | South Africa, Malawi | Elastic net using genome-wide data | 87 | 43 | 130 |
Anderson39.OD<sup>26</sup> | 51 | Disease risk score | Tuberculosis vs OD | Children | Positive or negative | South Africa, Malawi | Elastic net using genome-wide data | 87 | 134 | 221 |
BATF2<sup>27</sup> | 1 | NA | Tuberculosis vs HC (acute vs convalescent) | Adults | Negative | UK | SVM using genome-wide data | 46 | 31 | 77 |
Duffy10<sup>16</sup> | 10 | SVM (linear kernel) | Tuberculosis vs LTBI and OD | Adults | Positive or negative | South Africa | Multinomial random forest using genome-wide data | 93 | 207 | 300 |
Gjoen8<sup>28</sup> | 7 | LASSO regression | Tuberculosis vs HC and OD | Children | Negative | India | LASSO using 198 pre-selected genes | 47 | 36 | 83 |
Gliddon3<sup>29</sup> | 3 | (FCGR1A + C1QB) − (ZNF296) | Tuberculosis vs LTBI | Adults | Positive or negative | South Africa, Malawi | FS-PLS using genome-wide data | NS | NS | 285 |
Gliddon4<sup>29</sup> | 4 | (GBP6 + PRDM1) − (TMCC1 + ARG1) | Tuberculosis vs OD | Adults | Positive or negative | South Africa, Malawi | FS-PLS using genome-wide data | NS | NS | 293 |
Huang11<sup>30</sup> | 13 | SVM (linear kernel) | Tuberculosis vs HC and OD | Adults | Negative | UK | Common genes from elastic net, L1/2 and LASSO models, using genome-wide data | 16 | 79 | 95 |
Kaforou25<sup>31</sup> | 27 | Disease risk score | Tuberculosis vs LTBI | Adults | Positive or negative | South Africa, Malawi | Elastic net using genome-wide data | NS | NS | 285 |
Kaforou39<sup>31</sup> | 44 | Disease risk score | Tuberculosis vs OD | Adults | Positive or negative | South Africa, Malawi | Elastic net using genome-wide data | NS | NS | 293 |
Kaforou45<sup>31</sup> | 53 | Disease risk score | Tuberculosis vs LTBI and OD | Adults | Positive or negative | South Africa, Malawi | Elastic net using genome-wide data | NS | NS | NS |
Maertzdorf4<sup>32</sup> | 4 | Random forest | Tuberculosis vs HC | Adults | Negative | India | Random forest using 360 selected target genes | 113 | 76 | 189 |
NPC2<sup>33</sup> | 1 | NA | Tuberculosis vs HC and LTBI | Adults | NS | Brazil | Differential expression using genome-wide data | 6 | 28 | 34 |
Penn-Nicholson6<sup>17</sup> | 6 | Difference of means | Incipient tuberculosis vs HC | Adolescents | Negative | South Africa | SVM-based gene pair models using genome-wide data | 46 | 107 | 153 |
Qian17<sup>34</sup> | 17 | Sum of standardised expression | Tuberculosis vs HC and OD | Adults | Negative | UK | Differential expression of Nrf2-mediated genes | 16 | 69 | 85 |
Rajan5<sup>35</sup> | 5 | Unsigned sums | Tuberculosis vs HC (screening among PLHIV) | Adults | Positive | Uganda | Differential expression using genome-wide data | NS | NS | 80 (1:2 cases:controls) |
Roe3<sup>13</sup> | 3 | SVM (linear kernel) | Incipient tuberculosis vs HC | Adults | Negative | UK | Stability selection using genome-wide data | 46 | 31 | 77 |
Roe4<sup>27</sup> | 4 | SVM (linear kernel) | Tuberculosis vs OD | Adults | Negative | UK | SVM using genome-wide data | 23 | 35 | 58 |
Roe5<sup>27</sup> | 5 | SVM (linear kernel) | Tuberculosis vs HC and OD | Adults | Negative | UK | SVM using genome-wide data | 23 | 50 | 73 |
Singhania20<sup>36</sup> | 20 | Modified disease risk score | Tuberculosis vs HC and OD | Adults | Negative | UK, South Africa | Random forest using modular approach | NS | NS | NS |
Suliman2<sup>37</sup> | 2 | ANKRD22 − OSBPL10 | Incipient tuberculosis vs HC | Adults | Negative | The Gambia, South Africa | Pair ratios algorithm using genome-wide data | 79 | 328 | 407 |
Suliman4<sup>37</sup> | 4 | (GAS6 + SEPT4) − (CD1C + BLK) | Incipient tuberculosis vs HC | Adults | Negative | The Gambia, South Africa, Ethiopia | Pair ratios algorithm using genome-wide data | 45 | 141 | 186 |
Sweeney3<sup>38</sup> | 3 | (GBP5 + DUSP3)/2 − KLF2 | Tuberculosis vs LTBI and OD | Adults | Positive or negative | Meta-analysis of South Africa, Malawi, UK, France, USA | Significance thresholding and forward search in genome-wide data | 296 | 727 | 1023 |
Walter46<sup>39</sup> | 51 | SVM (linear kernel) | Tuberculosis vs LTBI | Adults | Negative | USA | SVM using genome-wide data | 24 | 24 | 48 |
Walter32<sup>39</sup> | 47 | SVM (linear kernel) | Tuberculosis vs OD | Adults | Negative | USA | SVM using genome-wide data | 24 | 24 | 48 |
Walter101<sup>39</sup> | 119 | SVM (linear kernel) | Tuberculosis vs LTBI and OD | Adults | Negative | USA | SVM using genome-wide data | 24 | 48 | 72 |
Zak16<sup>40</sup> | 16 | SVM (linear kernel) | Incipient tuberculosis vs HC | Adolescents | Negative | South Africa | SVM-based gene pair models using genome-wide data | 37 | 77 | 114 |
Signatures were identified by systematic literature review and included for analysis. Each signature is named after the first author of the corresponding publication, suffixed with the number of constituent genes that are present in the current RNAseq dataset. Both Anderson signatures had the same final number of genes, so the comparator control group was also appended to their names. Details on how models were recreated are in appendix 1 (pp 2–4). LTBI=latent tuberculosis infection. OD=other diseases. NA=not applicable. HC=healthy controls. SVM=support vector machine. LASSO=least absolute shrinkage and selection operator. FS-PLS=forward selection-partial least squares. NS=not specified. Nrf2=nuclear factor, erythroid 2-like 2. PLHIV=people living with HIV.
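
Several entries in the Model column are plain arithmetic on gene expression values (Gliddon3, Gliddon4, Suliman2, Suliman4, and Sweeney3). The sketch below shows how such scores could be computed for one sample. It is an illustration only, not the authors' implementation (that is described in appendix 1): the function names, the `SIGNATURES` registry, and the example expression values are hypothetical, and it assumes the input values are already normalised on a log scale, which the table itself does not state.

```python
from typing import Callable, Dict, Mapping

Expression = Mapping[str, float]  # gene symbol -> expression value

# ASSUMPTION: expression values are already normalised and on a log scale;
# the table does not specify the scale or any preprocessing.

def gliddon3(expr: Expression) -> float:
    # (FCGR1A + C1QB) - (ZNF296)
    return (expr["FCGR1A"] + expr["C1QB"]) - expr["ZNF296"]

def gliddon4(expr: Expression) -> float:
    # (GBP6 + PRDM1) - (TMCC1 + ARG1)
    return (expr["GBP6"] + expr["PRDM1"]) - (expr["TMCC1"] + expr["ARG1"])

def suliman2(expr: Expression) -> float:
    # ANKRD22 - OSBPL10
    return expr["ANKRD22"] - expr["OSBPL10"]

def suliman4(expr: Expression) -> float:
    # (GAS6 + SEPT4) - (CD1C + BLK)
    return (expr["GAS6"] + expr["SEPT4"]) - (expr["CD1C"] + expr["BLK"])

def sweeney3(expr: Expression) -> float:
    # (GBP5 + DUSP3)/2 - KLF2
    return (expr["GBP5"] + expr["DUSP3"]) / 2 - expr["KLF2"]

# Hypothetical registry mapping signature names to scoring functions.
SIGNATURES: Dict[str, Callable[[Expression], float]] = {
    "Gliddon3": gliddon3,
    "Gliddon4": gliddon4,
    "Suliman2": suliman2,
    "Suliman4": suliman4,
    "Sweeney3": sweeney3,
}

def score_sample(expr: Expression) -> Dict[str, float]:
    """Compute every arithmetic signature score for a single sample."""
    return {name: fn(expr) for name, fn in SIGNATURES.items()}

# Made-up expression values for one sample, for illustration only.
example_sample = {
    "FCGR1A": 8.0, "C1QB": 6.0, "ZNF296": 3.0,
    "GBP6": 7.0, "PRDM1": 5.0, "TMCC1": 6.0, "ARG1": 4.0,
    "ANKRD22": 5.5, "OSBPL10": 4.0,
    "GAS6": 3.0, "SEPT4": 2.0, "CD1C": 6.0, "BLK": 5.0,
    "GBP5": 9.0, "DUSP3": 7.0, "KLF2": 6.0,
}

if __name__ == "__main__":
    for name, score in score_sample(example_sample).items():
        print(f"{name}: {score:.2f}")
```

A missing gene raises `KeyError` in this sketch rather than being silently skipped. The SVM, random forest, LASSO, and disease risk score models in the table need fitted parameters or additional definitions and are not covered here.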