2018 May 3;13(5):e0196849. doi: 10.1371/journal.pone.0196849

Table 1. Summary of variations in the ENTPRISE-X training and testing data sets.

Data Set				Usage
Training set				1. Training the model for future applications. 2. Feature reduction. 3. Large-scale ten-fold cross-validation tests on nonsense mutations, compared with DDIG-in, and on frameshift mutations, compared with DDIG-in and SIFT-indel (see Table 3).
Frameshift		Nonsense
Pathogenic	Neutral	Pathogenic	Neutral
ClinVar: 6,513	ESP6500: 1,604	ClinVar: 5,023	ESP6500: 181
	1000 GP: 366		1000 GP: 3,171
Total numbers (sum of each column)
6,513	1,970	5,023	3,251
Independent testing sets (not used in training)				Usage
VEST-indel set				For testing on frameshift variations, compared with the VEST-indel and DDIG-in methods (see Table 2).
Frameshift		Nonsense
Pathogenic	Neutral	Pathogenic	Neutral
ClinVar: 82	Inter-species: 1,025	─	─
ExAC set				For large-scale false positive rate tests on frameshift and nonsense variations, compared with the VEST-indel and DDIG-in methods.
─	ExAC: 56,917	─	ExAC: 45,131
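
The ExAC row lists only neutral variants, so the false positive rate test named in its usage note comes down to dividing the number of variants a predictor labels pathogenic by the column total. A minimal Python sketch of that arithmetic, assuming every ExAC variant counts as a true negative; the function name and the `hypothetical_calls` value are illustrative placeholders, not figures reported in the paper:

```python
# Neutral-variant counts taken from the ExAC row of Table 1.
EXAC_NEUTRAL = {"frameshift": 56_917, "nonsense": 45_131}


def false_positive_rate(n_called_pathogenic: int, n_neutral: int) -> float:
    """Fraction of neutral variants that a predictor labels pathogenic."""
    if n_neutral <= 0:
        raise ValueError("n_neutral must be positive")
    if not 0 <= n_called_pathogenic <= n_neutral:
        raise ValueError("n_called_pathogenic must lie in [0, n_neutral]")
    return n_called_pathogenic / n_neutral


# Hypothetical number of false calls, for illustration only (not a reported result).
hypothetical_calls = 5_000
for kind, total in EXAC_NEUTRAL.items():
    rate = false_positive_rate(hypothetical_calls, total)
    print(f"{kind}: FPR = {rate:.4f}")
```

Because the two ExAC columns differ in size, the same number of false calls gives a different rate for each variation type, so rates should be compared per column.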