2023 Nov 28;24:718. doi: 10.1186/s12864-023-09788-2

Table 2.

The performance of various models for discriminating clustered strains from non-clustered strains in the lineage2 cohort

Parameters	Training set (n = 3595, 2081 clustered strains, 1514 non-clustered strains)		Test set (n = 1541, 918 clustered strains, 623 non-clustered strains)	
	Random Forest	Gradient Boosted Classification Tree	Random Forest	Gradient Boosted Classification Tree
Kappa	0.641	0.613	0.454	0.442
AUC (95% CI)	0.908 (0.899, 0.917)	0.877 (0.866, 0.888)	0.791 (0.771, 0.811)	0.778 (0.757, 0.799)
Sensitivity (95% CI)	0.873 (0.862, 0.884)	0.836 (0.824, 0.848)	0.786 (0.766, 0.806)	0.807 (0.787, 0.827)
Specificity (95% CI)	0.762 (0.748, 0.776)	0.779 (0.765, 0.793)	0.666 (0.642, 0.690)	0.628 (0.604, 0.652)
PPV (95% CI)	0.837 (0.825, 0.849)	0.845 (0.833, 0.857)	0.771 (0.750, 0.792)	0.741 (0.719, 0.763)
NPV (95% CI)	0.811 (0.798, 0.824)	0.767 (0.753, 0.781)	0.686 (0.663, 0.709)	0.712 (0.689, 0.735)
PLR (95% CI)	4.437 (4.415, 4.459)	3.625 (3.597, 3.653)	2.451 (2.402, 2.50)	2.571 (2.528, 2.614)
NLR (95% CI)	0.225 (0.15, 0.30)	0.276 (0.198, 0.354)	0.408 (0.313, 0.503)	0.389 (0.301, 0.477)
Accuracy (95% CI)	0.827 (0.815, 0.839)	0.813 (0.8, 0.826)	0.737 (0.715, 0.759)	0.730 (0.708, 0.752)

AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; PLR, positive likelihood ratio; NLR, negative likelihood ratio; CI, confidence interval
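
For readers who want to see how the threshold-based parameters in the table relate to one another, the following is a minimal sketch of their standard textbook definitions, computed from a 2 x 2 confusion matrix with "clustered" as the positive class. The source does not state which software or formulas the authors used, so this is not their procedure. The function name `binary_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the study. AUC is omitted because it requires per-strain predicted scores rather than counts.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: positive-class (here: clustered) strains predicted correctly/incorrectly.
    tn/fp: negative-class (non-clustered) strains predicted correctly/incorrectly.
    No guard against zero denominators; every cell is assumed non-zero.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    # Likelihood ratios combine sensitivity and specificity.
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "plr": plr,
        "nlr": nlr,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not data from the study).
example = binary_metrics(tp=80, fn=20, fp=30, tn=70)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, PLR and NLR are fully determined by sensitivity and specificity, and kappa depends on both accuracy and the class balance of the set being evaluated.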