PLoS One. 2018 May 3;13(5):e0196849. doi: 10.1371/journal.pone.0196849

Table 1. Summary of variations in the ENTPRISE-X training and testing data sets.

Data set: Training set
Usage:
1. For training a model in future applications.
2. For feature reduction.
3. For large-scale ten-fold cross-validation tests on nonsense mutations in comparison to DDIG-in, and on frameshift mutations in comparison to DDIG-in and SIFT-indel (see Table 3).

Frameshift                                Nonsense
Pathogenic        Neutral                 Pathogenic        Neutral
ClinVar: 6,513    ESP6500: 1,604          ClinVar: 5,023    ESP6500: 181
                  1000 GP: 366                              1000 GP: 3,171
Total numbers (sum of each column)
6,513             1,970                   5,023             3,251

Independent testing sets (not used in training)

Data set: VEST-indel set
Usage: For testing on frameshift variations in comparison to the VEST-indel and DDIG-in methods (see Table 2).

Frameshift                                Nonsense
Pathogenic        Neutral                 Pathogenic        Neutral
ClinVar: 82       Inter-species: 1,025

Data set: ExAC set
Usage: For a large-scale false positive rate test on frameshift and nonsense variations in comparison to the VEST-indel and DDIG-in methods.

Frameshift                                Nonsense
ExAC: 56,917                              ExAC: 45,131
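
The false positive rate test on the ExAC set can be illustrated with a minimal sketch. It assumes that every ExAC variant is treated as neutral, so any variant a predictor calls pathogenic counts as a false positive. The set sizes come from Table 1; the function name `false_positive_rate` and the counts in `hypothetical_calls` are illustrative assumptions, not results or code from the paper.

```python
# Sizes of the ExAC test sets, taken from Table 1.
EXAC_SET_SIZES = {"frameshift": 56_917, "nonsense": 45_131}


def false_positive_rate(n_called_pathogenic: int, n_variants: int) -> float:
    """Fraction of presumed-neutral variants that a predictor calls pathogenic."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    if not 0 <= n_called_pathogenic <= n_variants:
        raise ValueError("n_called_pathogenic must lie between 0 and n_variants")
    return n_called_pathogenic / n_variants


# Hypothetical numbers of pathogenic calls, for illustration only
# (assumed values, not results reported in the paper).
hypothetical_calls = {"frameshift": 5_000, "nonsense": 4_000}

for variant_type, n_variants in EXAC_SET_SIZES.items():
    fpr = false_positive_rate(hypothetical_calls[variant_type], n_variants)
    print(f"{variant_type}: FPR = {fpr:.3f}")
```

Under this assumption, a lower rate means fewer presumed-neutral population variants are flagged as pathogenic, which is the basis for comparing methods on this set.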