Table 2.
Performance of model-based reasoning methods using Doc2Vec text representations for syndrome pattern diagnosis of lung disease on the test and external data sets.
| Model | Data set | Accuracy, mean (95% CI) | Precision, mean (95% CI) | Recall, mean (95% CI) | F1 score, mean (95% CI) |
|---|---|---|---|---|---|
| Doc2Vec + RF^a | Test | 0.8320 (0.8198-0.8442) | 0.8457 (0.8345-0.8567) | 0.8320 (0.8198-0.8442) | 0.8337 (0.8217-0.8458) |
| | External | 0.8190 (0.8090-0.8310) | 0.8506 (0.8366-0.8610) | 0.8190 (0.8110-0.8323) | 0.8267 (0.8147-0.8397) |
| Doc2Vec + XGBoost^b | Test | 0.7584 (0.7444-0.7724) | 0.7682 (0.7602-0.7812) | 0.7584 (0.7504-0.7704) | 0.7589 (0.7499-0.7719) |
| | External | 0.7270 (0.7190-0.7400) | 0.7735 (0.7645-0.7835) | 0.7270 (0.7130-0.7390) | 0.7391 (0.7261-0.7501) |
| Doc2Vec + KNN^c | Test | 0.8527 (0.8407-0.8637) | 0.8588 (0.8488-0.8668) | 0.8527 (0.8407-0.8627) | 0.8535 (0.8425-0.8665) |
| | External | 0.8202 (0.8092-0.8282) | 0.8246 (0.8116-0.8326) | 0.8220 (0.8090-0.8331) | 0.8215 (0.8105-0.8295) |
| Doc2Vec + SVM^d | Test | 0.6748 (0.6628-0.6848) | 0.7424 (0.7334-0.7504) | 0.6748 (0.6668-0.6858) | 0.7577 (0.7467-0.7667) |
| | External | 0.5820 (0.5700-0.5950) | 0.5743 (0.5663-0.5883) | 0.5920 (0.5830-0.6033) | 0.5288 (0.5168-0.5388) |
| Doc2Vec + MLP^e | Test | 0.8840 (0.8730-0.8970) | 0.8876 (0.8776-0.8976) | 0.8840 (0.8710-0.8932) | 0.8843 (0.8753-0.8973) |
| | External | 0.8760 (0.8620-0.8890) | 0.8897 (0.8757-0.9027) | 0.8760 (0.8630-0.8851) | 0.8791 (0.8701-0.8921) |
^a RF: random forest.
^b XGBoost: extreme gradient boosting.
^c KNN: K nearest neighbor.
^d SVM: support vector machine.
^e MLP: multilayer perceptron.
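The table does not state how the multiclass precision, recall, and F1 scores were averaged or how the 95% CIs were obtained. The sketch below is only an illustration under two assumptions. First, it assumes support-weighted averaging across syndrome classes, which is consistent with recall equaling accuracy in most rows, because weighted recall reduces algebraically to accuracy. Second, it assumes a percentile bootstrap over test cases for the intervals. The function names `weighted_metrics` and `bootstrap_ci` are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging scheme)."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true cases per class
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    prec = rec = f1 = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)
        true_c = support[c]
        p_c = tp / pred_c if pred_c else 0.0
        r_c = tp / true_c if true_c else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = true_c / n  # class weight = share of true cases; sum of w * r_c equals accuracy
        prec += w * p_c
        rec += w * r_c
        f1 += w * f_c
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap mean and (1 - alpha) CI for one metric (assumed CI method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        stats.append(weighted_metrics([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])[metric])
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(stats) / n_boot, lo, hi


# Hypothetical toy labels for three syndrome patterns A, B, C.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
m = weighted_metrics(y_true, y_pred)
mean_acc, lo_acc, hi_acc = bootstrap_ci(y_true, y_pred, "accuracy")
```

On the toy labels, accuracy and weighted recall are both 4/6, weighted precision is 13/18, and weighted F1 is 59/90. If the study used macro averaging or a different interval method instead, the computed values would differ.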