Table 1. Evaluation of feature subsets (ROC-AUC, 10-fold cross-validation with a linear support vector machine classifier) for discriminating CPDs from SNPs and disease-causing mutations from common SNPs (DM versus SNPs).
Feature subset | CPD-AUC (%) | DM-AUC (%) | Performance reduction for CPD (CPD-AUC − DM-AUC, percentage points) |
---|---|---|---|
Genomic MSA | 74.0 | 94.6 | −20.6 |
Protein MSA (homologous) | 60.4 | 80.7 | −20.2 |
Local protein structure | 56.5 | 68.5 | −12.0 |
Regional protein composition | 55.3 | 63.5 | −8.2 |
Exonic features | 64.7 | 71.0 | −6.4 |
Annotated functional sites | 50.3 | 55.3 | −5.0 |
Amino-acid features | 64.5 | 69.1 | −4.6 |
Random value (control) | 50.0 | 48.7 | +1.3 |
Feature subsets are ranked by their performance reduction on the CPD set relative to the DM set, from largest to smallest.
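The evaluation protocol named in the caption (ROC-AUC under 10-fold cross-validation with a linear SVM, then the CPD-AUC minus DM-AUC difference) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline. The CPD, DM and SNP feature sets are not reproduced here, so the hypothetical `make_dataset` helper generates Gaussian toy data. The linear SVM is trained with Pegasos (stochastic sub-gradient descent on the hinge loss) as a stand-in for whichever SVM implementation was actually used. The regularization strength `lam`, the number of epochs, and averaging per-fold AUCs (rather than pooling out-of-fold scores) are all assumptions.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties = 1/2)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds that keep the class ratio roughly equal."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def decision(w, x):
    """Linear decision value w . [x, 1]; the trailing 1 acts as a bias feature."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def train_linear_svm(X, y, rng, lam=0.01, epochs=20):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels 0/1 are mapped to -1/+1; the bias is regularized too (a simplification)."""
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            yi = 1.0 if y[i] == 1 else -1.0
            margin = yi * decision(w, X[i])
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, X[i] + [1.0])]
    return w


def cv_auc(X, y, k=10, seed=0):
    """Mean ROC-AUC over k stratified folds (averaging per-fold AUCs is an assumption)."""
    rng = random.Random(seed)
    aucs = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx], rng)
        scores = [decision(w, X[i]) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)


def make_dataset(n_per_class, shift, n_features=2, seed=0):
    """Hypothetical toy data: class 1 is a Gaussian cloud shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mu in ((1, shift), (0, 0.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    # Hypothetical stand-ins: a hard set (cf. CPDs vs SNPs) and an easy set (cf. DM vs SNPs).
    hard_X, hard_y = make_dataset(50, shift=0.5, seed=2)
    easy_X, easy_y = make_dataset(50, shift=3.0, seed=1)
    hard_auc, easy_auc = cv_auc(hard_X, hard_y), cv_auc(easy_X, easy_y)
    # Same convention as the table's last column: AUC difference in percentage points.
    print(f"hard {100 * hard_auc:.1f}%  easy {100 * easy_auc:.1f}%  "
          f"reduction {100 * (hard_auc - easy_auc):+.1f}")
```

As in the table's last column, a negative difference means the feature subset separates the harder set less well than the easier one. Stratified folds are used so that every held-out fold contains both classes, which ROC-AUC requires; this assumes each class has at least `k` samples.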