. 2019 Sep 10;20:465. doi: 10.1186/s12859-019-3010-3

Fig. 5.

Quantification of regressor performance. a Mean squared error (MSE) as a function of the training fraction. The training fraction is given as a percentage of the total number of samples (N = 1132); the remaining samples form the testing fraction. For each regressor (color code), the mean (solid lines) and the standard deviation (shaded regions) were computed from 100 repetitions, each with a different random split into training and testing fractions. The regressors are described in the main text. b Average receiver operating characteristic (ROC) graphs. For each regressor (same color code), an average was computed from 100 ROC graphs, each obtained from a random split of the samples into two equal halves for training and testing. The points mark the position of highest accuracy (magnified in the inset). The dashed diagonal line indicates the ROC graph of a random classifier. c Accuracy and false positive rate of the classification with the Extra Trees regressor. The expected classification threshold at a rating of 4.5 (vertical line), which was defined in the manual rating process, lies close to the maximum of the classification accuracy. d Visualization of the Extra Trees performance as a function of the training set size. The samples were randomly split into a testing fraction of 200 samples and a training fraction. Of the training fraction, 33%, 67%, or 100% were used to train the Extra Trees regressor, which was then applied to the testing fraction; the resulting ratings were rounded to integer values. The area of each circle represents the number of samples rated by the Extra Trees regressor, normalized to the number of curves per manual rating. Colors represent the manual rating. The MSE and the ROC classification accuracy (threshold at 4.5) are shown in the bottom right corner of each plot. The gray-shaded line indicates a slope of one.
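The repeated random-split protocol described for panel a (train on a given fraction, test on the rest, report mean and standard deviation of the MSE over 100 splits) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the Extra Trees and other regressors are replaced by a hypothetical baseline (`fit_mean`) that always predicts the mean training rating.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# A predictor maps one sample's features to a rating; a fit function
# builds a predictor from training data (hypothetical interface).
Predictor = Callable[[Sequence[float]], float]
FitFn = Callable[[List[Sequence[float]], List[float]], Predictor]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_split_mse(
    X: List[Sequence[float]],
    y: List[float],
    fit: FitFn,
    train_fraction: float,
    repetitions: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of the test MSE over random splits.

    Each repetition shuffles the sample indices, trains on the first
    `train_fraction` of them and tests on all remaining samples.
    """
    rng = random.Random(seed)
    n = len(y)
    n_train = round(train_fraction * n)
    if not 0 < n_train < n:
        raise ValueError("training and testing fractions must both be non-empty")
    if repetitions < 2:
        raise ValueError("need at least two repetitions for a standard deviation")
    errors = []
    for _ in range(repetitions):
        idx = list(range(n))
        rng.shuffle(idx)  # a new random split per repetition
        train, test = idx[:n_train], idx[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in test]
        errors.append(mse([y[i] for i in test], preds))
    return statistics.mean(errors), statistics.stdev(errors)


def fit_mean(X_train: List[Sequence[float]], y_train: List[float]) -> Predictor:
    """Hypothetical baseline regressor: always predicts the mean training rating."""
    m = statistics.mean(y_train)
    return lambda x: m
```

Calling `repeated_split_mse` once per training fraction (for example 10%, 20%, ..., 90%) yields one point and one error band per fraction, which is the kind of curve shown in panel a; any real regressor can be plugged in through the `fit` argument.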
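Panels b and c treat the regressor as a binary classifier: the manual rating threshold of 4.5 defines the true class, and sweeping a threshold over the predicted ratings traces out an ROC graph whose point of highest accuracy is marked. A minimal sketch of that computation for a single split follows. It assumes that ratings above 4.5 form the positive class and that a sample is predicted positive when its predicted rating is at least the swept threshold; both conventions and all names are assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def roc_points(
    manual: Sequence[float],
    predicted: Sequence[float],
    class_threshold: float = 4.5,
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, FPR, TPR, accuracy) for each candidate threshold.

    Assumption: a sample is a true positive case if its manual rating
    exceeds `class_threshold`, and it is predicted positive if its
    predicted rating is >= the swept threshold t.
    """
    labels = [m > class_threshold for m in manual]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes to compute an ROC graph")
    # +inf gives the (0, 0) corner; the smallest prediction gives (1, 1).
    thresholds = [float("inf")] + sorted(set(predicted), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for lab, p in zip(labels, predicted) if lab and p >= t)
        fp = sum(1 for lab, p in zip(labels, predicted) if not lab and p >= t)
        tn = n_neg - fp
        points.append((t, fp / n_neg, tp / n_pos, (tp + tn) / len(labels)))
    return points


def best_accuracy(
    points: List[Tuple[float, float, float, float]]
) -> Tuple[float, float, float, float]:
    """Point of highest accuracy; the first such point wins on ties."""
    return max(points, key=lambda pt: pt[3])
```

Plotting the FPR and accuracy entries against the threshold gives curves of the kind shown in panel c. The averaging of 100 ROC graphs in panel b is deliberately left out, because the caption does not specify how the graphs were averaged.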
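The circle sizes in panel d are counts of rounded predicted ratings, normalized to the number of curves per manual rating. One way to compute those normalized counts is sketched below. The function name is hypothetical, and the rounding rule (round half up) is an assumption, since the caption only states that ratings were rounded to integers.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence


def normalized_rating_counts(
    manual: Sequence[int], predicted: Sequence[float]
) -> Dict[int, Dict[int, float]]:
    """Fraction of curves per manual rating that received each rounded rating.

    Returns {manual_rating: {rounded_predicted_rating: fraction}}; the
    fractions for each manual rating sum to 1.
    """
    per_manual: Dict[int, Counter] = defaultdict(Counter)
    for m, p in zip(manual, predicted):
        # Assumed rounding: half up via floor(p + 0.5). Python's built-in
        # round() would instead round halves to the nearest even integer.
        per_manual[m][math.floor(p + 0.5)] += 1
    result = {}
    for m, counts in per_manual.items():
        total = sum(counts.values())
        result[m] = {r: c / total for r, c in counts.items()}
    return result
```

Each entry can be drawn as a circle at (manual rating, predicted rating) whose area is proportional to the fraction; with matplotlib's `scatter`, the `s` argument is already a marker area, so the fractions can be scaled linearly.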