Table 1. Sensitivity and specificity of COMET when typing known sequences at varying levels of noise.
(a1) PURE sensitivities | Noise | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Fragment size | n | 0.0% | 2.0% | 4.0% | 6.0% | 8.0% | 10.0% | 12.0% | 14.0% | 16.0% | 18.0% | 20.0% |
100 | 494 256 | 98.4% | 96.5% | 92.6% | 87.2% | 80.8% | 72.9% | 63.1% | 53.6% | 44.2% | 38.3% | 32.3% |
200 | 242 235 | 99.9% | 99.8% | 99.0% | 97.5% | 94.9% | 89.5% | 82.1% | 71.0% | 61.3% | 52.9% | 44.1% |
400 | 128 604 | 100.0% | 100.0% | 100.0% | 99.9% | 98.9% | 97.6% | 93.1% | 87.0% | 78.1% | 68.8% | 57.2% |
600 | 87 402 | 100.0% | 100.0% | 100.0% | 100.0% | 99.9% | 98.9% | 96.3% | 91.9% | 86.2% | 75.2% | 63.5% |
800 | 65 352 | 100.0% | 100.0% | 100.0% | 100.0% | 99.8% | 99.5% | 97.1% | 95.4% | 89.5% | 78.4% | 66.6% |
1200 | 43 008 | 100.0% | 100.0% | 100.0% | 100.0% | 99.5% | 100.0% | 99.1% | 97.4% | 92.9% | 83.4% | 70.3% |
1600 | 29 841 | 100.0% | 100.0% | 100.0% | 100.0% | 99.5% | 100.0% | 99.2% | 97.8% | 93.7% | 84.8% | 71.0% |
(a2) PURE specificities | Noise | |||||||||||
100 | 494 256 | 93.8% | 93.8% | 93.7% | 93.6% | 93.5% | 93.5% | 93.5% | 93.5% | 93.5% | 93.7% | 93.8% |
200 | 242 235 | 96.9% | 96.1% | 95.2% | 94.6% | 94.2% | 93.9% | 93.7% | 93.6% | 93.6% | 93.7% | 93.9% |
400 | 128 604 | 98.6% | 97.9% | 97.0% | 96.1% | 95.4% | 94.7% | 94.3% | 94.0% | 93.8% | 93.8% | 94.0% |
600 | 87 402 | 99.3% | 98.8% | 98.1% | 97.2% | 96.3% | 95.5% | 94.8% | 94.3% | 94.1% | 94.0% | 94.2% |
800 | 65 352 | 99.5% | 99.2% | 98.6% | 97.9% | 97.0% | 96.0% | 95.3% | 94.7% | 94.3% | 94.1% | 94.3% |
1200 | 43 008 | 99.8% | 99.6% | 99.3% | 98.8% | 98.1% | 97.0% | 96.2% | 95.4% | 94.8% | 94.4% | 94.5% |
1600 | 29 841 | 99.9% | 99.7% | 99.5% | 99.2% | 98.7% | 97.8% | 96.8% | 96.0% | 95.2% | 94.6% | 94.7% |
(b1) CRF sensitivities | Noise | |||||||||||
100 | 494 256 | 91.0% | 85.4% | 77.2% | 66.7% | 56.1% | 46.1% | 37.0% | 28.7% | 22.1% | 16.6% | 12.6% |
200 | 242 235 | 97.6% | 95.6% | 90.7% | 83.7% | 75.3% | 64.7% | 52.2% | 41.6% | 31.2% | 21.9% | 15.8% |
400 | 128 604 | 99.5% | 99.1% | 97.3% | 93.4% | 88.2% | 78.3% | 66.6% | 54.4% | 41.3% | 28.9% | 19.7% |
600 | 87 402 | 99.7% | 99.7% | 98.9% | 96.4% | 92.6% | 84.3% | 74.4% | 61.4% | 46.9% | 32.6% | 21.3% |
800 | 65 352 | 99.9% | 99.8% | 99.5% | 97.0% | 94.5% | 87.6% | 78.3% | 64.8% | 48.7% | 34.2% | 22.8% |
1200 | 43 008 | 99.8% | 99.9% | 100.0% | 97.3% | 96.1% | 91.2% | 82.9% | 69.0% | 52.8% | 36.9% | 23.0% |
1600 | 29 841 | 100.0% | 100.0% | 100.0% | 97.7% | 97.8% | 91.9% | 84.7% | 71.7% | 53.8% | 38.1% | 25.4% |
(b2) CRF specificities | Noise | |||||||||||
100 | 494 256 | 99.7% | 99.7% | 99.6% | 99.6% | 99.5% | 99.4% | 99.4% | 99.4% | 99.3% | 99.3% | 99.3% |
200 | 242 235 | 99.9% | 99.9% | 99.8% | 99.8% | 99.7% | 99.7% | 99.7% | 99.6% | 99.6% | 99.6% | 99.6% |
400 | 128 604 | 99.9% | 99.9% | 99.9% | 99.9% | 99.9% | 99.8% | 99.8% | 99.8% | 99.7% | 99.7% | 99.7% |
600 | 87 402 | 100.0% | 100.0% | 99.9% | 99.9% | 99.9% | 99.9% | 99.8% | 99.8% | 99.8% | 99.8% | 99.8% |
800 | 65 352 | 100.0% | 100.0% | 99.9% | 99.9% | 99.9% | 99.9% | 99.9% | 99.9% | 99.8% | 99.8% | 99.8% |
1200 | 43 008 | 100.0% | 100.0% | 100.0% | 99.9% | 99.9% | 99.9% | 99.9% | 99.9% | 99.8% | 99.8% | 99.8% |
1600 | 29 841 | 100.0% | 100.0% | 100.0% | 100.0% | 99.9% | 99.9% | 99.9% | 99.9% | 99.9% | 99.8% | 99.8% |
A synthetic data set was generated from reference sequences in the LANL HIV database by randomly introducing mutations throughout the genome (‘noise’). The sensitivity and specificity of COMET were calculated for varying degrees of noise (0–20%) introduced into PURE subtypes (A–J; Tables a1 and a2) or CRFs (CRF01_AE-CRF49_cpx; Tables b1 and b2). Sequences of different lengths were submitted to COMET.
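
To make the procedure in the legend concrete, the following is a minimal sketch of how noise might be introduced and how per-subtype sensitivity and specificity might be computed. It is not the authors' code. The mutation model (each A/C/G/T position is replaced by a different base with probability equal to the noise level) and the one-vs-rest definitions of sensitivity and specificity are assumptions; the function names `add_noise` and `sensitivity_specificity` are hypothetical.

```python
import random

BASES = "ACGT"


def add_noise(seq, rate, rng):
    """Return a copy of seq in which each A/C/G/T position is substituted
    by a different base with probability `rate` (assumed mutation model)."""
    out = []
    for base in seq:
        if base in BASES and rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def sensitivity_specificity(true_labels, predicted_labels, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype
    (assumed definitions: TP / (TP + FN) and TN / (TN + FP))."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == subtype and p == subtype for t, p in pairs)
    fn = sum(t == subtype and p != subtype for t, p in pairs)
    tn = sum(t != subtype and p != subtype for t, p in pairs)
    fp = sum(t != subtype and p == subtype for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy usage: a 100-base fragment at 10% noise, and four made-up subtype calls.
rng = random.Random(0)
noisy = add_noise("ACGT" * 25, 0.10, rng)

truth = ["A", "A", "B", "C"]
calls = ["A", "B", "B", "C"]
print(sensitivity_specificity(truth, calls, "A"))  # (0.5, 1.0)
```

In this toy example one of the two true subtype A sequences is miscalled as B, so sensitivity for A is 0.5, while no non-A sequence is called A, so specificity is 1.0. The labels are illustrative only and are unrelated to the values in Table 1.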