2020 Dec 15;22(4):bbaa312. doi: 10.1093/bib/bbaa312

Table 2.

A summary of the training and independent test datasets used in the existing methods

| Datasetsᵃ | Training: number of ACPs | Training: number of non-ACPs | Independent test: number of ACPs | Independent test: number of non-ACPs | CD-HIT threshold | Dataset availability |
|---|---|---|---|---|---|---|
| Tyagi2013 [15] | 225 | 225 | 50 | 50 | | Yes |
| Hajisharifi2014 [2] | 138 | 206 | No | No | 0.9 | Yes |
| Vijayakumar2015 [16] | 217 | 3979 | 40 | 40 | 1.0 | Yes |
| Chen2016 [17] | 138 | 206 | 150 | 150 | 0.9 | Yes |
| Li2016 [18] | 138 | 206 | 150 | 150 | 0.9 | Yes |
| Akbar2017 [13] | 138 | 206 | No | No | 0.9 | Yes |
| Manavalan2017ᵇ [12] | 187 | 398 | 422 + 126 | 422 + 205 | 0.9 | No |
| Xu2018 [19] | 138 | 206 | No | No | 0.9 | Yes |
| Wei2018 [20] | 250 | 250 | 82 | 82 | 0.9 | Yes |
| Kabir2018 [21] | 138 | 206 | 150 | 150 | 0.9 | Yes |
| Schaduangrat2019 [22] | 138 | 205 | No | No | 0.9 | Yes |
| Boopathi2019 [23] | 266 | 266 | 157 | 157 | 0.8 | Yes |
| Yi2019 [14] | 376 | 364 | 129 | 111 | 0.9 | Yes |
| Rao2019 [11] | 250 | 250 | 82 | 2628 | 0.8 | Yes |
| Agrawal2020 [24] | 689 | 689 | 172 | 172 | - | Yes |
| Basith2020ᶜ [36] | - | - | 246 | 1733 | 0.9, 0.6 | Yes |

ᵃ Datasets are named by the surname of the first author followed by the publication year of the corresponding paper.

ᵇ Manavalan2017 contains two independent test datasets: one with 422 ACPs and 422 non-ACPs, and the other with 126 ACPs and 205 non-ACPs.

ᶜ In Basith2020, positive and negative samples sharing >90% and >60% sequence identity, respectively, with the training datasets used in existing methods were removed.