2020 Dec 15;22(4):bbaa312. doi: 10.1093/bib/bbaa312

Table 2. Summary of the training and independent test datasets used by existing ACP prediction methods

Dataset^a	Training ACPs	Training non-ACPs	Independent test ACPs	Independent test non-ACPs	CD-HIT threshold	Dataset availability
Tyagi2013 [15]	225	225	50	50	–	Yes
Hajisharifi2014 [2]	138	206	No	No	0.9	Yes
Vijayakumar2015 [16]	217	3979	40	40	1.0	Yes
Chen2016 [17]	138	206	150	150	0.9	Yes
Li2016 [18]	138	206	150	150	0.9	Yes
Akbar2017 [13]	138	206	No	No	0.9	Yes
Manavalan2017^b [12]	187	398	422 + 126	422 + 205	0.9	No
Xu2018 [19]	138	206	No	No	0.9	Yes
Wei2018 [20]	250	250	82	82	0.9	Yes
Kabir2018 [21]	138	206	150	150	0.9	Yes
Schaduangrat2019 [22]	138	205	No	No	0.9	Yes
Boopathi2019 [23]	266	266	157	157	0.8	Yes
Yi2019 [14]	376	364	129	111	0.9	Yes
Rao2019 [11]	250	250	82	2628	0.8	Yes
Agrawal2020 [24]	689	689	172	172	–	Yes
Basith2020^c [36]	–	–	246	1733	0.9, 0.6	Yes

^a Datasets are named by the surname of the first author followed by the publication year of the corresponding paper.

^b Manavalan2017 contains two independent test datasets: one with 422 ACPs and 422 non-ACPs, and the other with 126 ACPs and 126 non-ACPs.

^c In Basith2020, positive and negative samples sharing >90% and >60% sequence identity, respectively, with the training datasets of existing methods were removed.
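The redundancy removal in footnote c (and the CD-HIT thresholds in the table) amounts to discarding any test sequence whose identity to some training sequence exceeds a cutoff. The sketch below illustrates that idea under stated assumptions; it is not the procedure used in Basith2020 or in CD-HIT itself. CD-HIT uses fast word-filtering heuristics, whereas this sketch runs an exact Needleman-Wunsch global alignment. Identity is taken as identical aligned residues divided by the length of the shorter sequence, mirroring CD-HIT's default global-identity definition. The scoring values (match 1, mismatch -1, gap -2), the function names `nw_identity` and `remove_redundant`, and the toy sequences are all hypothetical.

```python
from typing import List, Sequence


def nw_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Hypothetical helper: global (Needleman-Wunsch) alignment of a and b,
    returning identical aligned residues / length of the shorter sequence.
    Scoring parameters are illustrative assumptions, not CD-HIT's."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align residue pair
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back one optimal alignment, counting identical aligned residues.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def remove_redundant(test: Sequence[str], train: Sequence[str], threshold: float) -> List[str]:
    """Hypothetical filter: keep a test sequence only if its identity to every
    training sequence is at most `threshold` (e.g. 0.9 for a >90% cutoff)."""
    return [t for t in test if all(nw_identity(t, r) <= threshold for r in train)]


if __name__ == "__main__":
    # Toy sequences for illustration only; not taken from any dataset in Table 2.
    train = ["FLPLIGRVLSGIL"]
    test = ["FLPLIGKVLSGIL", "K" * 13]
    print(nw_identity(test[0], train[0]))       # 12 identical / 13 residues
    print(remove_redundant(test, train, 0.9))   # near-duplicate is dropped
```

The single-substitution variant shares 12 of 13 residues (about 92% identity), so a 0.9 cutoff removes it, while the unrelated poly-lysine sequence is kept. The exact alignment is quadratic per sequence pair, which is why tools such as CD-HIT rely on heuristics for large datasets.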