2019 Feb 22;14(2):e0212665. doi: 10.1371/journal.pone.0212665

Fig 1. Experimental workflow.

Data were obtained from the CHOP NICU Sepsis Registry (NSR). Domain expert review was used to identify an initial feature set. Continuous data were normalized, and missing values were filled by mean imputation. Models were trained and evaluated with nested k-fold cross-validation, in which the complete dataset is divided into k stratified bins (folds) of approximately equal size (k = 10 in our study). The curved arrows indicate loops over the data folds. The outer loop runs over all k folds; in each iteration, one fold is reserved for testing and the remaining k-1 folds are passed to the inner loop, which performs standard k-fold cross-validation to automatically select features and model tuning parameters. Mutual information between individual features and the target class was used for automated feature selection. The model is then trained on the k-1 training folds and evaluated on the held-out fold. This process is repeated k times so that each data fold is used exactly once for evaluation.
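
The fold structure described in the caption can be sketched in a few lines of standard-library Python. This is only an illustration of how nested, stratified k-fold splits might be generated, not the study's implementation: the function names `stratified_folds` and `nested_cv_splits`, the round-robin assignment, and the seeds are all assumptions made for the example. Normalization, imputation, mutual-information feature selection, and model fitting are left as comments.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split positions 0..len(labels)-1 into k folds with similar class mix.

    Hypothetical helper: indices of each class are shuffled and dealt
    round-robin, continuing where the previous class stopped so that
    fold sizes stay approximately equal.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


def nested_cv_splits(labels, k=10, seed=0):
    """Yield (test_idx, train_idx, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) index lists drawn
    only from train_idx, so the held-out test fold never influences
    feature selection or parameter tuning.
    """
    outer = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(outer):
        # Outer loop: fold i is reserved for testing.
        train_idx = [idx for j, fold in enumerate(outer) if j != i for idx in fold]

        # Inner loop: standard k-fold cross-validation on the k-1 training folds.
        train_labels = [labels[idx] for idx in train_idx]
        inner = stratified_folds(train_labels, k, seed + i + 1)
        inner_splits = []
        for m, val_pos in enumerate(inner):
            inner_val = [train_idx[p] for p in val_pos]
            inner_train = [
                train_idx[p] for n, fold in enumerate(inner) if n != m for p in fold
            ]
            inner_splits.append((inner_train, inner_val))

        # Here one would: (1) use inner_splits to choose features and tuning
        # parameters, (2) refit on train_idx, (3) evaluate on test_idx.
        yield test_idx, train_idx, inner_splits


if __name__ == "__main__":
    # Toy labels: 80 negatives and 20 positives (made-up data).
    toy_labels = [0] * 80 + [1] * 20
    for test_idx, train_idx, inner_splits in nested_cv_splits(toy_labels, k=10):
        print(len(test_idx), len(train_idx), len(inner_splits))
```

In practice the same structure is usually built from library components rather than by hand; scikit-learn, for example, provides `StratifiedKFold` for the splits and `GridSearchCV` for the inner tuning loop. To avoid leaking information from the test fold, preprocessing statistics such as normalization parameters and imputation means are conventionally computed on the training portion of each split only.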