Table 2.
Description of supervised classification models trained and tested in this study
| Model name<sup>a</sup> | Classification<sup>b</sup> | Data type<sup>c</sup> | Algorithm<sup>d</sup> | # Viruses<sup>e</sup> | # Obs yes/no (total)<sup>f</sup> | Important features<sup>g</sup> |
|---|---|---|---|---|---|---|
| L1 | Lethality | Standard | gbm | 125 | 102/615 (717) | wt_loss, MBAA, AUC_6 |
| L1M | Lethality | Combined | gbm | 119* | 102/615 (717) | wt_loss, MBAA, HA-160T |
| LM | Lethality | Molecular | gbm | 119* | 102/615 (717) | HA-214V, HA-160T, HA-496R |
| M1 | Morbidity | Standard | Stacked | 125 | 176/539 (715) | AUC_6, temp_5, MBAA |
| M1M | Morbidity | Combined | gbm | 119* | 176/539 (715) | AUC_6, temp_5, MBAA |
| MM | Morbidity | Molecular | gbm | 119* | 176/539 (715) | HA-227S, PB2-271T, HA-228G |
| T1 | Transmission | Standard | rf | 96 | 213/262 (475) | AUC_6, slope1,3, HA-H5 |
| T1M | Transmission | Combined | rf | 94* | 213/262 (475) | PB2-627E, HA-21S, HA-138A |
| TM | Transmission | Molecular | rf | 94* | 213/262 (475) | PB2-627E, HA-138A, HA-21S |
| L1-H1N1 | Lethality | Standard | gbm | 2 | 3/85 (88) | NA |
| L1-sim | Lethality | Standard | gbm | 18^ | 900/2000 (2900) | NA |
| LM-pub | Lethality | Molecular | gbm | 78 | 47/388 (425) | NA |
| M1-H1N1 | Morbidity | Standard | Stacked | 2 | 21/67 (88) | NA |
| M1-sim | Morbidity | Standard | Stacked | 18^ | 847/2053 (2900) | NA |
| TM-pub | Transmission | Molecular | rf | 33 | 96/100 (196) | NA |
<sup>a</sup>Abbreviation of each model evaluated in this study. All models were trained with internally generated data and tested with either internally generated data (no qualifier), data published from a ferret transmission standardization exercise (H1N1), data simulated from internally generated data (sim), or data aggregated from previously published literature (pub) (see Methods). All features of each model are stated in Supplementary Data 1 and Supplemental Fig. 1.
<sup>b</sup>Classification was lethality (ferret surviving the 14-day p.i. observation period or not), morbidity (ferret lost >14.5% of preinoculation body weight over the 14-day p.i. observation period or not), or transmission (ferret likely to transmit virus ≥50% of the time to a contact animal in an RDT setting or not).
<sup>c</sup>Data type represents the scope of input data used to train or validate each model: standard (inclusive of in vivo-generated data and selected viral molecular information), molecular (inclusive of viral molecular inputs with no in vivo-generated data), or combined (all in vivo and molecular inputs).
<sup>d</sup>ML algorithm governing the model: gradient boosting (gbm), random forest (rf), or ensemble of multiple models (Stacked, see Methods).
<sup>e</sup>Number of unique wild-type IAVs in the source dataset for testing or training (*, not including dummy variable viruses; ^, proxy number of viruses based on unique combinations of HA, RBS, and PA).
<sup>f</sup>Per-ferret yes/no observations based on the classification model tested.
<sup>g</sup>Top three ranked features for each model trained and tested with internally generated data. NA, models tested with external datasets.
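
The three binary outcomes defined in footnote b can be illustrated with a minimal Python sketch. It is not the study's code: the record fields (`survived_observation_period`, `max_weight_loss_pct`), the function names, and the reading of "≥50% of the time" as a simple fraction of exposed contact animals that became infected are all assumptions made for illustration, as is the convention that "yes" for lethality means the ferret did not survive.

```python
from dataclasses import dataclass

# Thresholds taken from footnote b of the table.
MORBIDITY_WEIGHT_LOSS_PCT = 14.5   # "yes" if loss is strictly greater than this
TRANSMISSION_MIN_FRACTION = 0.5    # "yes" if transmission fraction is at least this


@dataclass
class FerretObservation:
    """Hypothetical per-ferret record; field names are illustrative only."""
    survived_observation_period: bool  # survived the 14-day p.i. period
    max_weight_loss_pct: float         # peak loss relative to preinoculation weight


def lethality_label(obs: FerretObservation) -> bool:
    # Assumption: "yes" means the ferret did not survive the observation period.
    return not obs.survived_observation_period


def morbidity_label(obs: FerretObservation) -> bool:
    # Strict inequality, matching ">14.5%" in footnote b.
    return obs.max_weight_loss_pct > MORBIDITY_WEIGHT_LOSS_PCT


def transmission_label(n_contacts_infected: int, n_contacts_exposed: int) -> bool:
    # Assumption: "≥50% of the time" is read as infected / exposed contacts.
    if n_contacts_exposed <= 0:
        raise ValueError("n_contacts_exposed must be positive")
    if not 0 <= n_contacts_infected <= n_contacts_exposed:
        raise ValueError("n_contacts_infected must be between 0 and n_contacts_exposed")
    return n_contacts_infected / n_contacts_exposed >= TRANSMISSION_MIN_FRACTION


if __name__ == "__main__":
    ferret = FerretObservation(survived_observation_period=True, max_weight_loss_pct=16.2)
    print(lethality_label(ferret), morbidity_label(ferret), transmission_label(2, 3))
```

Note the asymmetry in the thresholds: morbidity uses a strict inequality, whereas transmission includes the boundary value, so a loss of exactly 14.5% is labeled "no" while a transmission fraction of exactly 50% is labeled "yes".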