2024 Jan 3;10(1):eadi0282. doi: 10.1126/sciadv.adi0282

Table 2. Performance across datasets.

Average ROC-AUC score, sensitivity, and specificity, with SD across five folds, for two variations of the classifier that differ in training data (coughs from all devices versus coughs from the smartphone). Each variation is evaluated on three test sets. T1: a subject-balanced passive cough dataset (balanced for gender and number of subjects), used for fivefold training and testing of the classifier. T2: an expanded T1 consisting of all non-TB subjects and the TB cough data not included in training the fivefold classifier. T3: a voluntary cough dataset consisting of coughs from TB and non-TB subjects. Table S2 reports results aggregated across folds using bootstrapping.

| Model training parameters | Test set | ROC-AUC score (average of 5 folds ± SD across folds) | Sensitivity (average of 5 folds ± SD across folds, threshold = 0.5) | Specificity (average of 5 folds ± SD across folds, threshold = 0.5) | Sensitivity at 70% specificity (average of 5 folds ± SD across folds) | Combined ROC-AUC score of 5 folds (average after combining results from all five folds; DeLong’s CI) |
|---|---|---|---|---|---|---|
| TBscreen. Device: All; Scalogram: 10 Hz to 4 kHz; Sampling rate: 44.1 kHz | T1: Subject balanced CV | 0.79 ± 0.06 | 0.70 ± 0.11 | 0.71 ± 0.10 | 0.72 ± 0.10 | 0.80 (0.79–0.80) |
| | T2: Expanded T1, unbalanced set | 0.82 ± 0.03 | 0.74 ± 0.02 | 0.72 ± 0.10 | 0.76 ± 0.04 | 0.80 (0.79–0.80) |
| | T3: Voluntary cough, unbalanced set | 0.64 ± 0.05 | 0.34 ± 0.13 | 0.81 ± 0.12 | 0.47 ± 0.06 | 0.64 (0.62–0.66) |
| TBscreen trained/evaluated on coughs from smartphone. Device: Smartphone; Scalogram: 10 Hz to 4 kHz; Sampling rate: 44.1 kHz | T1 subset: Subject balanced CV | 0.83 ± 0.11 | 0.76 ± 0.12 | 0.74 ± 0.10 | 0.76 ± 0.20 | 0.85 (0.84–0.85) |
| | T2 subset: Expanded T1, unbalanced set | 0.86 ± 0.03 | 0.80 ± 0.03 | 0.74 ± 0.10 | 0.83 ± 0.05 | 0.86 (0.85–0.87) |
| | T3 subset: Voluntary cough, unbalanced set | 0.61 ± 0.14 | 0.16 ± 0.11 | 0.95 ± 0.05 | 0.51 ± 0.18 | 0.66 (0.62–0.66) |
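
The column definitions in Table 2 (ROC-AUC, sensitivity and specificity at a threshold of 0.5, sensitivity at 70% specificity, and mean ± SD across folds) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that a score at or above the threshold is called TB-positive, that SD is the sample standard deviation, and that "sensitivity at 70% specificity" means the best sensitivity among thresholds whose specificity is at least 0.70. The function names, the toy labels and scores, and the per-fold values are all hypothetical and are not taken from the study. DeLong's CI is not covered here.

```python
from statistics import mean, stdev


def confusion_rates(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity); scores >= threshold count as positive (assumed convention)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(labels, scores, target=0.70):
    """Best sensitivity over candidate thresholds whose specificity is at least `target`."""
    best = 0.0
    # Candidate thresholds: every observed score, plus +inf (nothing called positive).
    for t in sorted(set(scores)) + [float("inf")]:
        sens, spec = confusion_rates(labels, scores, threshold=t)
        if spec >= target:
            best = max(best, sens)
    return best


def mean_sd(per_fold_values):
    """Mean and sample SD across folds (sample SD is an assumption)."""
    return mean(per_fold_values), stdev(per_fold_values)


# Hypothetical toy data for one fold: 1 = TB, 0 = non-TB. Not study data.
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2,
              0.7, 0.55, 0.45, 0.35, 0.3, 0.25, 0.15, 0.1, 0.05, 0.02]

sens, spec = confusion_rates(toy_labels, toy_scores, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"ROC-AUC={roc_auc(toy_labels, toy_scores):.2f}")
print(f"sensitivity at 70% specificity={sensitivity_at_specificity(toy_labels, toy_scores):.2f}")

# Hypothetical per-fold ROC-AUC values, summarized in the table's "mean ± SD" form.
fold_mean, fold_sd = mean_sd([0.70, 0.80, 0.90])
print(f"{fold_mean:.2f} ± {fold_sd:.2f}")
```

The two sensitivity columns can differ for the same model because they use different operating points: one fixes the score threshold at 0.5, while the other moves the threshold until specificity reaches the 70% target. This is why the T3 rows can show low sensitivity at threshold 0.5 alongside higher sensitivity at 70% specificity.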