2017 Jun 9;13(6):932. doi: 10.15252/msb.20167490

Figure 2. Integration of the three large‐scale protein complex datasets substantially improves both precision and recall of known human protein interactions.

  1. Precision–recall curves calculated on a leave‐out set of protein interactions from literature‐curated complexes, for different combinations of predictive protein interaction features. The integration of all three datasets outperforms all other networks. Performance also improves substantially when the weighted matrix model features are included (compare no MatrixModel, blue, with integrated, orange).
  2. Performance of parameter optimization for the MCL and Newman two‐stage clustering procedures. Each data point represents one parameter set, evaluated by the similarity of the resulting clusters to both the training and test sets of complexes using the F‐Grand measure (see Materials and Methods). Final parameter sets were selected using only the F‐Grand measure on the training set.
  3. Precision–recall curves evaluating protein interactions on the leave‐out set before (integrated) and after (hu.MAP) the clustering procedure. The improved performance after clustering suggests that the clustering procedure successfully removed false‐positive interactions.
  4. Distribution of protein interactions in the final protein interaction network by input evidence. Note that the weighted matrix model produces many high‐confidence interactions. The “Multiple” category also contains predominantly high‐confidence interactions, supporting the integration of multiple datasets as a way to mitigate false positives.
  5. Protein interactions from our complex map substantially overlap with other protein interaction datasets across a variety of experimental types.
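Panels 1 and 3 compare networks by precision–recall on a leave‐out set of interactions. As an illustration only, the sketch below shows one common way to compute such a curve from scored protein pairs, assuming each interaction is an unordered pair with a single confidence score and that the leave‐out set supplies labeled positive and negative pairs. The function name `precision_recall_curve` and the pair representation are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]  # an undirected protein pair, e.g. frozenset({"P1", "P2"})


def precision_recall_curve(
    scores: Dict[Pair, float],
    positives: Set[Pair],
    negatives: Set[Pair],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) points, from the highest score down.

    Only scored pairs that appear in the labeled leave-out set are counted;
    recall is measured against every leave-out positive, including positives
    that received no score at all.
    """
    labeled = [
        (score, pair in positives)
        for pair, score in scores.items()
        if pair in positives or pair in negatives
    ]
    labeled.sort(key=lambda item: item[0], reverse=True)

    curve = []
    tp = fp = 0
    i = 0
    while i < len(labeled):
        threshold = labeled[i][0]
        # Admit every pair tied at this score before emitting a point.
        while i < len(labeled) and labeled[i][0] == threshold:
            if labeled[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / len(positives) if positives else 0.0
        curve.append((threshold, tp / (tp + fp), recall))
    return curve
```

Evaluating the same leave‐out set on the network before and after clustering, as in panel 3, would simply mean calling this function twice with the two score tables.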
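Panel 2 sweeps clustering parameters, including those of MCL. For readers unfamiliar with MCL, here is a minimal, dependency‐free sketch of textbook Markov clustering (expansion fixed at 2, tunable inflation) on a small weighted adjacency matrix. It is an assumption‐laden illustration: it does not reproduce the two‐stage MCL/Newman procedure or the F‐Grand scoring described in Materials and Methods, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Rescale each column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)] for r in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def mcl(adjacency: Sequence[Sequence[float]], inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Textbook Markov clustering on a symmetric, non-negative weight matrix."""
    n = len(adjacency)
    # Add self-loops, then make each column a probability distribution.
    m = _normalize_columns([[float(adjacency[r][c]) + (1.0 if r == c else 0.0)
                             for c in range(n)] for r in range(n)])
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two steps of random-walk flow
        inflated = [[x ** inflation for x in row] for row in expanded]  # inflation
        new = _normalize_columns(inflated)
        delta = max(abs(new[r][c] - m[r][c]) for r in range(n) for c in range(n))
        m = new
        if delta < tol:
            break
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters: List[List[int]] = []
    for row in m:
        members = [c for c, x in enumerate(row) if x > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

A sweep in the spirit of panel 2 would call `mcl` for each candidate `inflation`, score the resulting clusters against training complexes, and keep the best value, leaving the test complexes untouched for evaluation.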
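Panel 4 groups the final interactions by the evidence that supports them, with interactions backed by more than one source labeled “Multiple”. A small sketch of that bookkeeping follows, assuming each interaction carries a set of source labels and a confidence score. The labels `dataset_1` and `dataset_2`, the score scale, and the `high_threshold` cutoff are hypothetical placeholders, since the legend does not say how confidence bins were defined.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set

Pair = FrozenSet[str]  # an undirected protein pair


def evidence_category(sources: Set[str]) -> str:
    """Single-source interactions keep their source label; others become 'Multiple'."""
    if not sources:
        raise ValueError("interaction has no supporting evidence")
    return next(iter(sources)) if len(sources) == 1 else "Multiple"


def tally_by_evidence(
    evidence: Dict[Pair, Set[str]],
    scores: Dict[Pair, float],
    high_threshold: float,
) -> Counter:
    """Count interactions per (evidence category, confidence bin)."""
    counts: Counter = Counter()
    for pair, sources in evidence.items():
        # Hypothetical binning: at or above the cutoff counts as high confidence.
        bin_label = "high" if scores[pair] >= high_threshold else "low"
        counts[(evidence_category(sources), bin_label)] += 1
    return counts
```

For example, a pair supported only by `MatrixModel` with a score of 0.9 and a cutoff of 0.5 is counted under `("MatrixModel", "high")`.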