2024 Aug 1;7:927. doi: 10.1038/s42003-024-06629-0

Table 2.

Description of supervised classification models trained and tested in this study

| Model name [a] | Classification [b] | Data type [c] | Algorithm [d] | No. of viruses [e] | No. of obs., yes/no (total) [f] | Important features [g] |
| --- | --- | --- | --- | --- | --- | --- |
| L1 | Lethality | Standard | gbm | 125 | 102/615 (717) | wt_loss, MBAA, AUC_6 |
| L1M | Lethality | Combined | gbm | 119* | 102/615 (717) | wt_loss, MBAA, HA-160T |
| LM | Lethality | Molecular | gbm | 119* | 102/615 (717) | HA-214V, HA-160T, HA-496R |
| M1 | Morbidity | Standard | Stacked | 125 | 176/539 (715) | AUC_6, temp_5, MBAA |
| M1M | Morbidity | Combined | gbm | 119* | 176/539 (715) | AUC_6, temp_5, MBAA |
| MM | Morbidity | Molecular | gbm | 119* | 176/539 (715) | HA-227S, PB2-271T, HA-228G |
| T1 | Transmission | Standard | rf | 96 | 213/262 (475) | AUC_6, slope1,3, HA-H5 |
| T1M | Transmission | Combined | rf | 94* | 213/262 (475) | PB2-627E, HA-21S, HA-138A |
| TM | Transmission | Molecular | rf | 94* | 213/262 (475) | PB2-627E, HA-138A, HA-21S |
| L1-H1N1 | Lethality | Standard | gbm | 2 | 3/85 (88) | NA |
| L1-sim | Lethality | Standard | gbm | 18^ | 900/2000 (2900) | NA |
| LM-pub | Lethality | Molecular | gbm | 78 | 47/388 (425) | NA |
| M1-H1N1 | Morbidity | Standard | Stacked | 2 | 21/67 (88) | NA |
| M1-sim | Morbidity | Standard | Stacked | 18^ | 847/2053 (2900) | NA |
| TM-pub | Transmission | Molecular | rf | 33 | 96/100 (196) | NA |

[a] Abbreviation of each model evaluated in this study. All models were trained with internally generated data and tested with internally generated data (no qualifier), data published from a ferret transmission standardization exercise (H1N1), data simulated from internally generated data (sim), or data aggregated from previously published literature (pub) (see Methods). All features of each model are listed in Supplementary Data 1 and Supplemental Fig. 1.

[b] Classification was lethality (whether or not the ferret survived the 14-day post-inoculation (p.i.) observation period), morbidity (whether or not the ferret lost >14.5% of its preinoculation body weight over the 14-day p.i. observation period), or transmission (whether or not the ferret was likely to transmit virus to a contact animal ≥50% of the time in an RDT setting).

[c] Data type represents the scope of input data used to train or validate each model: standard (inclusive of in vivo-generated data and selected viral molecular information), molecular (inclusive of viral molecular inputs with no in vivo-generated data), or combined (all in vivo and molecular inputs).

[d] ML algorithm governing the model: gradient boosting (gbm), random forest (rf), or an ensemble of multiple models (Stacked, see Methods).

[e] Number of unique wild-type IAVs in the source dataset for testing or training (*, not including dummy variable viruses; ^, proxy number of viruses based on unique combinations of HA, RBS, and PA).

[f] Per-ferret yes/no observations, based on the classification model tested.

[g] Top three ranked features for each model trained and tested with internally generated data. NA, models tested with external datasets.
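
The classification rules and the "yes/no (total)" column can be made concrete with a short Python sketch. It is illustrative only and not the authors' code. The thresholds (>14.5% weight loss, ≥50% transmission) come from the classification footnote. All names (`FerretRecord`, `survived_14_days`, `max_weight_loss_pct`, `contacts_infected`, `contacts_total`, `parse_observations`) are hypothetical. Two further assumptions are made: that a lethality "yes" marks a ferret that did not survive, and that transmission frequency can be expressed as infected contacts over total contacts.

```python
import re
from dataclasses import dataclass

# Threshold taken from the classification footnote; all names are hypothetical.
MORBIDITY_WEIGHT_LOSS_PCT = 14.5  # "yes" only when loss is strictly greater


@dataclass
class FerretRecord:
    survived_14_days: bool      # survived the 14-day p.i. observation period
    max_weight_loss_pct: float  # peak % loss relative to preinoculation weight


def lethality_label(rec: FerretRecord) -> bool:
    # Assumption: "yes" marks a ferret that did not survive the period.
    return not rec.survived_14_days


def morbidity_label(rec: FerretRecord) -> bool:
    return rec.max_weight_loss_pct > MORBIDITY_WEIGHT_LOSS_PCT


def transmission_label(contacts_infected: int, contacts_total: int) -> bool:
    # Assumption: frequency is infected contacts / total contacts.
    if contacts_total <= 0:
        raise ValueError("contacts_total must be positive")
    # Integer form of contacts_infected / contacts_total >= 0.5.
    return 2 * contacts_infected >= contacts_total


# Matches cells written as "yes/no (total)", e.g. "102/615 (717)".
_OBS_CELL = re.compile(r"(\d+)\s*/\s*(\d+)\s*\((\d+)\)")


def parse_observations(cell: str) -> dict:
    """Split a 'yes/no (total)' cell and report the share of 'yes' labels."""
    match = _OBS_CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized observation cell: {cell!r}")
    yes, no, total = (int(group) for group in match.groups())
    if yes + no == 0:
        raise ValueError("cell contains no observations")
    return {
        "yes": yes,
        "no": no,
        "total": total,
        "yes_fraction": yes / (yes + no),
        "sums_match": yes + no == total,
    }
```

For example, `parse_observations("102/615 (717)")` gives a `yes_fraction` of about 0.14 for the lethality models, whereas `parse_observations("213/262 (475)")` gives about 0.45 for the transmission models, so the lethality classes are considerably less balanced.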