Figure 3.
F-measure and ROC curves for the clique (C) and hierarchical (H) clustering pipelines at different p-value cut-offs. Red series: hierarchical clustering with the Rfam data set by Will et al. (2007). Green series: clique clustering pipeline with the Rfam data set. Blue series: clique clustering pipeline with the Rfam_LowID data set. (a) F-measure of the clustering performance on the different data sets. The peak performances of the three series are 64.8%, 74.9% and 86.4%, respectively (denoted by broken lines). Note that the cut-off used by Will et al. (2007) is a recall rate, for which the corresponding p-value cut-off is difficult to estimate; therefore, only the peak performance is presented for that series. (b) ROC curves of the clique and hierarchical clustering pipelines for the different data sets. The term ‘before cluster’ refers to the performance before clique extraction (only score normalisation has been applied). The term ‘after cluster’ refers to the performance after clique extraction (both score normalisation and clique extraction have been applied). At the point of best overall performance (corresponding FPR of 8 × 10⁻³), score normalisation contributes ~70% of the performance gain, while clique extraction contributes the remaining ~30%.
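For readers less familiar with the two metrics plotted here, the following is a minimal sketch of how an F-measure and a single ROC point (FPR, TPR) could be derived at one p-value cut-off. It assumes a pairwise formulation, in which each sequence pair is predicted to belong to the same cluster when its p-value is at or below the cut-off; the caption does not state the exact definition used, so this formulation, the function names and the toy data are illustrative assumptions only.

```python
from typing import Sequence, Tuple


def confusion_counts(
    p_values: Sequence[float], is_true_pair: Sequence[bool], cutoff: float
) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN, predicting 'same cluster' when p <= cutoff (assumed rule)."""
    tp = fp = fn = tn = 0
    for p, truth in zip(p_values, is_true_pair):
        predicted = p <= cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_point(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (FPR, TPR), i.e. one point on a ROC curve."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


# Toy data (hypothetical): p-values for five sequence pairs and their true labels.
toy_p = [0.001, 0.02, 0.03, 0.2, 0.5]
toy_truth = [True, True, False, False, True]

tp, fp, fn, tn = confusion_counts(toy_p, toy_truth, cutoff=0.05)
print("F-measure:", f_measure(tp, fp, fn))
print("(FPR, TPR):", roc_point(tp, fp, fn, tn))
```

Sweeping the cut-off over a range of p-values and recording each result would trace out curves of the kind shown in panels (a) and (b).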