Table 2.
Datasets^a | Training: number of ACPs | Training: number of non-ACPs | Independent test: number of ACPs | Independent test: number of non-ACPs | CD-HIT threshold | Dataset availability |
---|---|---|---|---|---|---|
Tyagi2013 [15] | 225 | 225 | 50 | 50 | – | Yes |
Hajisharifi2014 [2] | 138 | 206 | No | No | 0.9 | Yes |
Vijayakumar2015 [16] | 217 | 3979 | 40 | 40 | 1.0 | Yes |
Chen2016 [17] | 138 | 206 | 150 | 150 | 0.9 | Yes |
Li2016 [18] | 138 | 206 | 150 | 150 | 0.9 | Yes |
Akbar2017 [13] | 138 | 206 | No | No | 0.9 | Yes |
Manavalan2017^b [12] | 187 | 398 | 422 + 126 | 422 + 205 | 0.9 | No |
Xu2018 [19] | 138 | 206 | No | No | 0.9 | Yes |
Wei2018 [20] | 250 | 250 | 82 | 82 | 0.9 | Yes |
Kabir2018 [21] | 138 | 206 | 150 | 150 | 0.9 | Yes |
Schaduangrat2019 [22] | 138 | 205 | No | No | 0.9 | Yes |
Boopathi2019 [23] | 266 | 266 | 157 | 157 | 0.8 | Yes |
Yi2019 [14] | 376 | 364 | 129 | 111 | 0.9 | Yes |
Rao2019 [11] | 250 | 250 | 82 | 2628 | 0.8 | Yes |
Agrawal2020 [24] | 689 | 689 | 172 | 172 | – | Yes |
Basith2020^c [36] | – | – | 246 | 1733 | 0.9, 0.6 | Yes |
^a Datasets are named using the last name of the first author plus the publication year of the corresponding literature.
^b Manavalan2017 contains two independent test datasets: one with 422 ACPs and 422 non-ACPs, and the other with 126 ACPs and 126 non-ACPs.
^c In Basith2020, positive and negative samples with >90% and >60% sequence identity, respectively, to the training datasets used in existing methods were removed.
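
The redundancy removal described for Basith2020 can be illustrated with a minimal sketch. It assumes a crude similarity proxy built on Python's `difflib.SequenceMatcher`, which is not the identity measure computed by CD-HIT; the function names `approx_identity` and `remove_similar` and the toy peptides are hypothetical and are not taken from any dataset in Table 2.

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1].

    Assumption: this stands in for sequence identity only for illustration;
    CD-HIT uses its own alignment-based definition.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_similar(candidates, training, threshold):
    """Keep candidates whose similarity to every training sequence is <= threshold."""
    return [
        seq
        for seq in candidates
        if all(approx_identity(seq, ref) <= threshold for ref in training)
    ]


# Hypothetical toy peptides, used only to exercise the filter.
training = ["FLPIIAKLLSGLL", "GLFDIVKKVVGAL"]
positives = ["FLPIIAKLLSGLL", "KWKLFKKIEKVGQ"]
negatives = ["GLFDIVKKVVGAF", "MSTNPKPQRKTKR"]

# Separate thresholds for positive and negative samples, mirroring 0.9 and 0.6.
kept_pos = remove_similar(positives, training, threshold=0.9)
kept_neg = remove_similar(negatives, training, threshold=0.6)
```

In practice the thresholds in the table would be applied with CD-HIT itself; the sketch only shows the shape of the filtering step, with a stricter cutoff for negative samples than for positive ones.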