Table 2.

Dataset | Training/testing cases | Diagnostic indicators (King-Lu) | Diagnostic indicators (Open-source random forest) | Diagnostic indicators (Open-source tariff method) | Diagnostic indicators (InterVA-4) |
---|---|---|---|---|---|
China | 1100 / 400 | 48 | 48 | 48 | N/A |
Institute for Health Metrics and Evaluation | 1100 / 400 | 96 | 96 | 96 | N/A |
Million Death Study | 1100 / 400 | 89 | 89 | 89 | N/A |
Million Death Study | 1100 / 1100 | 89 | 89 | 89 | N/A |
Million Death Study | 6100 / 6100<sup>a</sup> | 89 | 89 | 89 | 245 |
Agincourt | 1100 / 400 | 104 | 104 | 104 | 245<sup>b</sup> |
Agincourt | 1100 / 1100 | 104 | 104 | 104 | 245 |
Agincourt | 2900 / 2900 | 104 | 104 | 104 | 245 |
Matlab | 1100 / 400 | 224 | 224 | 224 | 245 |
Matlab | 1100 / 1100 | 224 | 224 | 224 | 245 |
Matlab | 1600 / 1600 | 224 | 224 | 224 | 245 |

Only the numbers of test cases apply to the InterVA-4 analyses, because this method does not require any training cases. InterVA-4 requires the input of 245 diagnostic indicators; however, because many of these were not available in the given datasets, the number of usable variables was lower than 245.

<sup>a</sup> The MDS dataset used for InterVA-4 contained 552 cases, for which we extracted additional InterVA-4 indicators from the narratives.

<sup>b</sup> Each CCVA method ran 30 resamples for each training/testing split within each dataset, except InterVA-4, which used the following numbers of resamples: 1 for MDS data; 8, 7, and 6 for Agincourt data splits of 400, 1100, and 2900 test cases, respectively; and 10, 10, and 10 for Matlab data splits of 400, 1100, and 1600 test cases, respectively.