Table 4:
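Model performance using Frequency Tokenization Encoding, trained on an 80/20 train/test split of a 450,000-sample dataset (360,000 training, 90,000 testing). The Top 10% and Bottom 90% columns group test samples by how frequent their target LOINC code is in the testing dataset: codes in the top 10% by frequency versus the remaining 90%. n is the number of test samples in each group (74,866 + 15,134 = 90,000).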
| Model | Accuracy (%) | F1 Score (Weighted) | Precision (Weighted) | Top 10% LOINC Weighted Precision (n = 74,866) | Bottom 90% LOINC Weighted Precision (n = 15,134) |
|---|---|---|---|---|---|
| Logistic Regression | 87.3 | 0.859 | 0.864 | 0.896 | 0.6854 |
| Random Forest | 94.5 | 0.943 | 0.943 | 0.957 | 0.874 |
| KNN | 88.95 | 0.877 | 0.878 | 0.911 | 0.714 |
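The table does not say exactly how the weighted and frequency-group precisions were computed. The sketch below shows one plausible reading, under stated assumptions. It assumes scikit-learn-style weighting, where per-class precision is averaged with weights equal to each class's support in the true labels. It also assumes that "Top 10%" means the most frequent 10% of distinct target LOINC codes in the test set, and that each group's score is computed only on test rows whose true code falls in that group. The helper names (`weighted_precision`, `top_codes_by_frequency`, `subgroup_weighted_precision`) are hypothetical, and the labels are toy values rather than study data.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Support-weighted mean of per-class precision.

    Assumption: weights are each class's share of y_true (the convention
    scikit-learn uses for average="weighted"); a class that is never
    predicted is given precision 0.
    """
    support = Counter(y_true)      # true occurrences per code
    predicted = Counter(y_pred)    # predicted occurrences per code
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for code, n in support.items():
        precision = correct[code] / predicted[code] if predicted[code] else 0.0
        score += precision * n / total
    return score


def top_codes_by_frequency(y_true, top_fraction=0.10):
    """Hypothetical helper: the most frequent `top_fraction` of distinct codes in y_true."""
    ranked = [code for code, _ in Counter(y_true).most_common()]
    k = max(1, round(len(ranked) * top_fraction))  # keep at least one code
    return set(ranked[:k])


def subgroup_weighted_precision(y_true, y_pred, codes):
    """Weighted precision over test rows whose true code is in `codes`.

    Returns (score, number_of_rows). Assumption: rows are filtered by the
    true label only; predictions outside the group are still counted.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in codes]
    if not pairs:
        return 0.0, 0
    sub_true, sub_pred = zip(*pairs)
    return weighted_precision(sub_true, sub_pred), len(pairs)


if __name__ == "__main__":
    # Toy labels standing in for LOINC codes; not data from the study.
    y_true = ["A", "A", "A", "B", "C"]
    y_pred = ["A", "A", "B", "B", "A"]
    top = top_codes_by_frequency(y_true)
    bottom = set(y_true) - top
    print(weighted_precision(y_true, y_pred))                   # 0.5
    print(subgroup_weighted_precision(y_true, y_pred, top))     # (1.0, 3)
    print(subgroup_weighted_precision(y_true, y_pred, bottom))  # (0.5, 2)
```

Under this reading, the row counts of the two groups partition the test set, matching the table's n values (74,866 + 15,134 = 90,000).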