Fig. 2.
Schematic overview of the machine learning pipeline. Features were generated from seven sources and four different gold standards (GS), ranging from low (GS-1-out-of-12-HDF) through moderate (GS-2-out-of-12-HDF) and elevated (GS-3-out-of-12-HDF) to high (GS-4-out-of-12-HDF) stringency. These gold standards were used to train and validate four different classifiers. Predictions from the four classifiers were linearly combined and ranked, yielding a combined (aggregated) classifier. The trained classifiers were validated by cross-validation, the most important feature was determined, and a list of predicted HDF was produced.
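The aggregation step in the caption (a linear combination of the four classifiers' predictions followed by ranking) can be illustrated with a minimal sketch. The caption does not state the weights, so equal weights are assumed here; the function name `combine_and_rank`, the gene labels, and the score values are hypothetical and serve only as illustration.

```python
def combine_and_rank(scores, weights=None):
    """Linearly combine per-classifier scores and rank candidates.

    scores  : list of rows, one row per candidate gene, one column per classifier
    weights : one weight per classifier (ASSUMPTION: equal weights if omitted;
              the weights actually used are not given in the caption)
    """
    n_classifiers = len(scores[0])
    if weights is None:
        weights = [1.0 / n_classifiers] * n_classifiers
    # Weighted sum across classifiers for every candidate.
    combined = [sum(w * s for w, s in zip(weights, row)) for row in scores]
    # Indices of candidates sorted from highest to lowest combined score.
    order = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return combined, order


# Hypothetical prediction scores from four classifiers for three candidates.
genes = ["gene_A", "gene_B", "gene_C"]
scores = [
    [0.9, 0.8, 0.7, 0.6],
    [0.2, 0.1, 0.4, 0.3],
    [0.5, 0.9, 0.6, 0.8],
]

combined, order = combine_and_rank(scores)
ranked_genes = [genes[i] for i in order]
print(ranked_genes)
```

With unequal weights (for example, weights derived from each classifier's cross-validation performance), the same function applies by passing a `weights` list of length four.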