. 2024 Oct 17;10:e2270. doi: 10.7717/peerj-cs.2270

Table 1. The description of the BugHunter dataset before and after preprocessing.

Project in BugHunter	Total instances before preprocessing	Faulty instances before preprocessing	Non-faulty instances before preprocessing	Faulty ratio before preprocessing (%)	Imbalance ratio before preprocessing (non-faulty/faulty)	Software metrics after preprocessing	Total instances after preprocessing	Faulty instances after preprocessing	Non-faulty instances after preprocessing
ceylon-ide-eclipse	2,087	508	1,579	24.34	3.11	58	2,972	1,393	1,579
BroadleafCommerce	4,709	1,025	3,684	21.77	3.59	61	6,824	3,140	3,684
hazelcast	32,973	12,093	20,880	36.68	1.73	61	39,923	19,043	20,880
elasticsearch	35,862	11,950	23,912	33.32	2.00	62	45,497	21,585	23,912
MapDB	1,456	480	976	32.97	2.03	59	1,842	866	976
netty	11,171	2,434	8,737	21.79	3.59	59	16,207	7,470	8,737
orientdb	9,445	2,589	6,856	27.41	2.65	61	12,911	6,055	6,856
neo4j	7,030	1,841	5,189	26.19	2.82	59	9,704	4,515	5,189
titan	785	168	617	21.40	3.67	61	1,147	530	617
mcMMO	1,184	411	773	34.71	1.88	55	1,493	720	773
Android-Universal-Image-Loader	325	103	222	31.69	2.16	51	415	193	222
antlr4	840	102	738	12.14	7.24	56	1,350	612	738
junit	462	87	375	18.83	4.31	57	695	320	375
mct	105	25	80	23.81	3.20	53	143	63	80
oryx	810	77	733	9.51	9.52	55	1,350	617	733
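The table does not define its two ratio columns. The formulas below are inferred from the counts, and they reproduce every printed value: faulty ratio = 100 × faulty / total, and imbalance ratio = non-faulty / faulty, both computed on the counts before preprocessing. The following minimal sketch in Python recomputes both columns from the raw counts and flags any row where they disagree. The helper names `faulty_ratio_pct`, `imbalance_ratio`, and `inconsistent_rows` are hypothetical and do not come from the paper. The sketch also reports how many faulty instances each project gained during preprocessing. The table shows that the non-faulty counts are unchanged, but the preprocessing method itself is not described here, so the script makes no assumption about it.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One row of Table 1: raw counts plus the two ratios as printed."""
    project: str
    faulty_before: int
    non_faulty_before: int
    faulty_after: int
    non_faulty_after: int
    faulty_pct: float  # "Faulty ratio (%)" column as printed
    imb_ratio: float   # "Imb. ratio" column as printed


TABLE_1 = [
    Row("ceylon-ide-eclipse", 508, 1579, 1393, 1579, 24.34, 3.11),
    Row("BroadleafCommerce", 1025, 3684, 3140, 3684, 21.77, 3.59),
    Row("hazelcast", 12093, 20880, 19043, 20880, 36.68, 1.73),
    Row("elasticsearch", 11950, 23912, 21585, 23912, 33.32, 2.00),
    Row("MapDB", 480, 976, 866, 976, 32.97, 2.03),
    Row("netty", 2434, 8737, 7470, 8737, 21.79, 3.59),
    Row("orientdb", 2589, 6856, 6055, 6856, 27.41, 2.65),
    Row("neo4j", 1841, 5189, 4515, 5189, 26.19, 2.82),
    Row("titan", 168, 617, 530, 617, 21.40, 3.67),
    Row("mcMMO", 411, 773, 720, 773, 34.71, 1.88),
    Row("Android-Universal-Image-Loader", 103, 222, 193, 222, 31.69, 2.16),
    Row("antlr4", 102, 738, 612, 738, 12.14, 7.24),
    Row("junit", 87, 375, 320, 375, 18.83, 4.31),
    Row("mct", 25, 80, 63, 80, 23.81, 3.20),
    Row("oryx", 77, 733, 617, 733, 9.51, 9.52),
]


def faulty_ratio_pct(faulty: int, non_faulty: int) -> float:
    """Share of faulty instances in percent, rounded to 2 decimals (inferred formula)."""
    return round(100 * faulty / (faulty + non_faulty), 2)


def imbalance_ratio(faulty: int, non_faulty: int) -> float:
    """Non-faulty instances per faulty instance, rounded to 2 decimals (inferred formula)."""
    return round(non_faulty / faulty, 2)


def inconsistent_rows(rows: list[Row]) -> list[str]:
    """Return the projects whose printed ratios do not match their 'before' counts."""
    bad = []
    for r in rows:
        pct = faulty_ratio_pct(r.faulty_before, r.non_faulty_before)
        imb = imbalance_ratio(r.faulty_before, r.non_faulty_before)
        if pct != r.faulty_pct or imb != r.imb_ratio:
            bad.append(r.project)
    return bad


if __name__ == "__main__":
    print("rows inconsistent with the inferred formulas:", inconsistent_rows(TABLE_1))
    for r in TABLE_1:
        # Faulty instances added by preprocessing and the resulting faulty share.
        added = r.faulty_after - r.faulty_before
        share_after = faulty_ratio_pct(r.faulty_after, r.non_faulty_after)
        print(f"{r.project}: +{added} faulty instances, {share_after}% faulty after preprocessing")
```

An empty list from `inconsistent_rows` confirms that the printed ratio columns follow from the counts. The after-preprocessing shares show that the classes end up closer to balance, though not exactly 50/50.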