2018 May 1;19:56. doi: 10.1186/s13059-018-1432-2

Fig. 2.


Performance of three alternative regression methods for inferring E–P models. a Performance of ordinary least squares (OLS), generalized linear model with negative binomial distribution (GLM.NB), and zero-inflated negative binomial (ZINB) regression using the binary test. A point (x, y) on a plot indicates that a fraction x of the models had −log10[q-value] < y, with q-values computed by the Wilcoxon rank-sum test. OLS yields a higher fraction of validated models at any q-value cutoff. b Same as panel a, but using the activity level validation test, with p values computed by the Spearman correlation test. Here too, OLS yields a higher fraction of validated models than the other methods. c Number of promoters whose OLS models passed (at q < 0.1) each of the tests (or none). d Distribution of the number of positive samples (samples in which the promoter is active, i.e., has RPKM ≥ 1) for promoters in each category. e Comparison between the R² values with and without cross-validation (CV). Each dot is a promoter model. Blue dots denote models with R² ≥ 0.5 and R²_CV ≥ 0.25. Red dots denote models with R² > 0.5 and R²_CV < 0.25, corresponding to over-fitted models with low predictive power on novel samples. f A promoter whose model, computed without CV, has a very high R² (left plot) but yields a low R²_CV when CV is applied (right plot). This example demonstrates the sensitivity of R² (and Pearson correlation) to outliers. ρs, Spearman correlation; Q-value, FDR-corrected p value
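
The outlier effect described for panels e and f can be reproduced on toy data. The sketch below is illustrative only and is not the authors' pipeline: it assumes a single predictor, leave-one-out cross-validation, and a cross-validated R² defined as 1 − SS_res/SS_tot on the held-out predictions, none of which the caption specifies. The data values and all function names are made up for the example; only the 0.5 and 0.25 thresholds come from the caption.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares; returns [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(beta, x):
    return beta[0] + beta[1] * x

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def loo_cv_r_squared(x, y):
    """R^2 computed from leave-one-out predictions (assumed CV scheme)."""
    y_hat = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out sample i
        beta = fit_ols(x[keep], y[keep])
        y_hat[i] = predict(beta, x[i])
    return r_squared(y, y_hat)

# Eight samples with no linear relationship, plus one extreme outlier.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 50], dtype=float)
y = np.array([1, -1, -1, 1, 1, -1, -1, 1, 50], dtype=float)

r2 = r_squared(y, predict(fit_ols(x, y), x))   # in-sample fit
r2_cv = loo_cv_r_squared(x, y)                 # held-out predictions

# Thresholds taken from the caption's description of the red dots.
overfitted = (r2 > 0.5) and (r2_cv < 0.25)
print(f"R2 = {r2:.3f}, R2_CV = {r2_cv:.3f}, over-fitted = {overfitted}")
```

Here the single outlier alone drives the in-sample fit, so R² is high, but when that sample is held out the remaining points cannot predict it and the cross-validated value falls below the 0.25 cutoff.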