Figure 2. Validation of B-SIFT on protein mutation datasets.
A. Distribution of B-SIFT scores for SWISS-PROT mutagenesis data. Density plots show the distributions of B-SIFT scores for mutations in the SWISS-PROT mutagenesis dataset classified as deleterious (red curve), neutral (black), and activating (blue). The legend gives the number of mutations in each functional category. B. Mutation composition of SWISS-PROT mutagenesis data. Each bar shows, among the mutations that meet the given B-SIFT cutoff, the percentage classified as activating (blue), neutral (green), or deleterious (red). Values in parentheses give the total number of mutations that meet each B-SIFT score threshold. C. Fold enrichment of activating mutations with increasing score cutoffs. As the B-SIFT score cutoff increases, the percentage of activating mutations among those with B-SIFT scores greater than or equal to the cutoff also increases (red line). A B-SIFT cutoff of −1 represents the complete dataset, and each successive point is the fold enrichment over this baseline. For comparison, the green line shows the same analysis using increasing SIFT cutoffs starting from 0. Although a high SIFT score alone also enriches for activating mutations, B-SIFT significantly improves the enrichment.
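The fold-enrichment calculation described for panel C can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `fold_enrichment`, the label strings, and the toy scores are all hypothetical. It assumes the baseline is the fraction of activating mutations in the complete dataset, which is how the caption describes the point at a cutoff of −1.

```python
def fold_enrichment(scores, labels, cutoffs, target="activating"):
    """Return {cutoff: fold enrichment of `target` among scores >= cutoff}.

    Fold enrichment is the fraction of `target` mutations at or above the
    cutoff, divided by the fraction of `target` in the whole dataset.
    A cutoff with no surviving mutations maps to None.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("dataset is empty")

    # Baseline: fraction of target mutations in the complete dataset.
    baseline = sum(1 for lab in labels if lab == target) / len(labels)
    if baseline == 0:
        raise ValueError("no mutations with the target label")

    result = {}
    for cutoff in cutoffs:
        # Keep only mutations whose score is greater than or equal to the cutoff.
        kept = [lab for s, lab in zip(scores, labels) if s >= cutoff]
        if not kept:
            result[cutoff] = None
            continue
        fraction = sum(1 for lab in kept if lab == target) / len(kept)
        result[cutoff] = fraction / baseline
    return result


# Toy data, invented for illustration only (not from the SWISS-PROT dataset).
toy_scores = [-0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.5, 0.7]
toy_labels = ["deleterious", "deleterious", "neutral", "deleterious",
              "neutral", "activating", "neutral", "activating"]

enrichment = fold_enrichment(toy_scores, toy_labels, cutoffs=[-1, 0, 0.5])
for cutoff, value in enrichment.items():
    print(cutoff, value)
```

Calling the same function with SIFT scores and cutoffs starting from 0 would give the comparison curve; the lowest cutoff always yields an enrichment of 1 because it retains the whole dataset.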