Figure 4.
Predictive accuracy of the regression models. The coefficient of determination (R²) and Spearman's rank correlation coefficient are shown for the linear and nonlinear regression models, with and without shuffling of the RC matrix. NMF and regression analysis based on the shuffled RC matrix yield lower predictive accuracy. This suggests that NMF-based discovery of protein complexes, which relies on the collective binding of multiple TFs at CREs, explains gene expression variation better than models built on randomized TF binding data. One hundred runs of shuffling–NMF–regression were performed, and the average R² and correlation coefficient are plotted.
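The shuffled-control evaluation summarized in this caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The NMF and regression steps are not reproduced: a hypothetical callback `fit_predict` stands in for them. The shuffling scheme (permuting all entries of the matrix) and scoring predictions against the same expression values used for fitting are both assumptions, since the caption does not specify either. All function names are illustrative.

```python
import random
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def shuffle_matrix(matrix, rng):
    """Assumed scheme: permute all entries while keeping the matrix shape."""
    flat = [v for row in matrix for v in row]
    rng.shuffle(flat)
    n_cols = len(matrix[0])
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


def shuffled_baseline(matrix, y, fit_predict, n_runs=100, seed=0):
    """Average R^2 and Spearman correlation over repeated shuffled runs.

    fit_predict(matrix, y) is a hypothetical stand-in for the
    NMF-plus-regression step; it must return one prediction per entry of y.
    """
    rng = random.Random(seed)
    r2_values, rho_values = [], []
    for _ in range(n_runs):
        y_pred = fit_predict(shuffle_matrix(matrix, rng), y)
        r2_values.append(r_squared(y, y_pred))
        rho_values.append(spearman(y, y_pred))
    return mean(r2_values), mean(rho_values)
```

Running the same `fit_predict` once on the unshuffled matrix and comparing its scores with the averages returned by `shuffled_baseline` gives the kind of contrast the figure reports.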