. 2022 Aug 17;149:105969. doi: 10.1016/j.compbiomed.2022.105969

Fig. 4.

Comparison of classification metrics for different machine learning methods. Metrics are computed on test samples collected from December 26, 2021 through April 10, 2022, for models trained on samples from July 17 through December 25, 2021. The top row shows the accuracy of the Mild/Severe classification (left) and the balanced F1-score (right), which is the harmonic mean of precision (true positives divided by all positive predictions) and recall (the true positive rate, i.e., sensitivity). The middle row shows the precision for the Mild and Severe class predictions separately, and the bottom row shows the recall. Metrics are shown for models trained both with and without country metadata as a feature, as indicated by the axis labels below, except for GPBoost, which accounts for country metadata by using it as the grouping variable for random effects. All models otherwise use age, gender, and each sequence position as features. Error bars show the standard deviation across three runs with different random number seeds; in some cases they are not visible. Statistics for GPBoost are computed from the mean of the response. GPBoost, and LightGBM and XGBoost with country included as a feature, consistently outperform the other methods.
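
As a rough illustration of the metrics named in the caption, the sketch below computes accuracy, per-class precision and recall, and the F1-score as their harmonic mean for a two-class Mild/Severe problem. The label lists and the three per-seed accuracy values are made-up toy inputs, not data from the study, and the function name `binary_metrics` is hypothetical. The caption does not say whether the error bars use the sample or the population standard deviation, nor how the F1-score is combined across the two classes; the sketch assumes the sample standard deviation and reports F1 for one chosen class.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    # Precision: true positives divided by all positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: true positive rate (sensitivity).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (hypothetical, for illustration only).
y_true = ["Mild", "Mild", "Mild", "Severe", "Severe", "Severe", "Mild", "Severe"]
y_pred = ["Mild", "Severe", "Mild", "Severe", "Mild", "Severe", "Mild", "Severe"]

severe = binary_metrics(y_true, y_pred, positive="Severe")
mild = binary_metrics(y_true, y_pred, positive="Mild")

# Hypothetical accuracies from three runs with different random seeds.
run_accuracies = [0.74, 0.75, 0.76]
acc_mean = mean(run_accuracies)
acc_err = stdev(run_accuracies)  # sample standard deviation (an assumption)

print(severe)
print(mild)
print(acc_mean, acc_err)
```

Precision and recall are computed once per class because, as in the middle and bottom rows of the figure, a model can perform differently on Mild and on Severe predictions even when overall accuracy is the same.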