Table 4.
Classifier Performances for the Model Generalizability Analysis Experiments
Model | Precision/PPV | Recall/sensitivity | F1 score | 95% CI |
---|---|---|---|---|
Training on AHS and testing on PHS | | | | |
RoBERTa | 0.96 | 0.83 | 0.89 | 0.87–0.91 |
SVM | 0.99 | 0.79 | 0.88 | 0.86–0.90 |
Training on PHS and testing on AHS | | | | |
RoBERTa | 0.84 | 0.83 | 0.83 | 0.79–0.87 |
SVM | 0.74 | 0.95 | 0.83 | 0.79–0.87 |
The metrics are precision, recall, and F1 score for the positive class. For RoBERTa, the sliding window setting described in the article was used. The 95% CIs were computed via bootstrap resampling. AHS indicates adult health care system; PHS, pediatric health care system; PPV, positive predictive value; RoBERTa, a robustly optimized transformer-based model for language understanding; and SVM, support vector machine.
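As an illustration of how positive-class metrics and a bootstrap interval of this kind can be obtained, the following is a minimal sketch, not the authors' implementation. It assumes binary labels (1 = positive class), a percentile bootstrap over test examples, 1000 resamples, and that the interval is reported for the F1 score; none of these details are stated in the table. The function names `f1_from_precision_recall`, `positive_class_metrics`, and `bootstrap_f1_ci` are hypothetical.

```python
import random


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def positive_class_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class, label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from_precision_recall(precision, recall)


def bootstrap_f1_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 (assumed method and resample count)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample test examples with replacement, keeping label/prediction pairs.
        idx = [rng.randrange(n) for _ in range(n)]
        _, _, f1 = positive_class_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        scores.append(f1)
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0] * 20
    y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0] * 20
    print(positive_class_metrics(y_true, y_pred))
    print(bootstrap_f1_ci(y_true, y_pred))
```

The helper `f1_from_precision_recall` can also be used to check the table's internal consistency: for example, precision 0.96 and recall 0.83 give an F1 score that rounds to 0.89, matching the RoBERTa row for training on AHS and testing on PHS.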