Table 2.
Training and Validation Sets Used by Current Prediction Methods
| Method | Training set (as published): Pathogenic | Training set (as published): Benign | Test set (as published): Pathogenic | Test set (as published): Benign | Non‐overlapping multi‐method benchmark set: Pathogenic | Non‐overlapping multi‐method benchmark set: Benign |
|---|---|---|---|---|---|---|
| In‐frame | ||||||
| PROVEAN | Uniprot | Uniprot | HGMD 2011 | 1000G P1 | HGMD 2014.4 | 1000G P3 AA |
| DDIG‐in | HGMD 2012 | 1000G P1 | Uniprot | Uniprot | HGMD 2014.4 | 1000G P3 AA |
| SIFT‐indel | HGMD 2010 | Interspecies | Uniprot | Uniprot | HGMD 2014.4 | 1000G P3 AA |
| CADD | Simulated | Fixed Polymorphisms | ClinVar | ESP6500 | HGMD 2014.4 | 1000G P3 AA |
| VEST‐indel | HGMD 2014.3 | ESP6500 AA | ClinVar | Interspecies | HGMD 2014.4 | 1000G P3 AA |
| Frameshift | ||||||
| PROVEAN | N/A | N/A | N/A | N/A | N/A | N/A |
| DDIG‐in | HGMD 2012 | 1000G P1 | HGMD 2012 | Interspecies | HGMD 2014.4 | 1000G P3 AA |
| SIFT‐indel | HGMD 2010 | Interspecies | N/A | N/A | HGMD 2014.4 | 1000G P3 AA |
| CADD | Simulated | Fixed Polymorphisms | ClinVar | ESP6500 | HGMD 2014.4 | 1000G P3 AA |
| VEST‐indel | HGMD 2014.3 | ESP6500 AA | ClinVar | Interspecies | HGMD 2014.4 | 1000G P3 AA |
1000G P1 and 1000G P3 are variants from 1000 Genomes Phase 1 and Phase 3, respectively. Interspecies benign variants were derived from pairwise genome alignments of human with cow, dog, horse, chimp, rhesus macaque, and rat. Uniprot variants were obtained from the UniProtKB/Swiss‐Prot “Human Polymorphisms and Disease Mutations” dataset (Release 2011_09) and annotated as deleterious, neutral, or unknown based on keywords in the provided Uniprot descriptions. AA, African or African American ancestry; N/A, not applicable.
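
The "non‐overlapping" property of the benchmark columns can be pictured as a set difference. The sketch below assumes that "non‐overlapping" means a benchmark variant must not appear in any method's published training set, and that variants are keyed by chromosome, position, reference allele, and alternate allele. The function name and the toy variants are hypothetical illustrations, not data or code from the study.

```python
from typing import Iterable, List, Set, Tuple

# Assumed variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def non_overlapping_benchmark(
    candidates: Iterable[Variant],
    training_sets: Iterable[Set[Variant]],
) -> List[Variant]:
    """Keep candidates that occur in none of the training sets.

    Input order is preserved and duplicate candidates are dropped.
    """
    # Pool every variant that any method was trained on.
    excluded: Set[Variant] = set().union(*training_sets)
    kept: List[Variant] = []
    seen: Set[Variant] = set()
    for variant in candidates:
        if variant in excluded or variant in seen:
            continue
        seen.add(variant)
        kept.append(variant)
    return kept


# Toy, made-up indels purely for illustration.
training_a = {("1", 100, "ATG", "A")}
training_b = {("2", 200, "C", "CAGT")}
candidates = [
    ("1", 100, "ATG", "A"),   # present in training_a, removed
    ("3", 300, "G", "GTT"),   # unseen by every method, kept
    ("2", 200, "C", "CAGT"),  # present in training_b, removed
    ("3", 300, "G", "GTT"),   # duplicate candidate, dropped
]
benchmark = non_overlapping_benchmark(candidates, [training_a, training_b])
print(benchmark)
```

Pooling all training sets before filtering means the same benchmark can be scored by every method without any of them having seen its variants during training.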