2022 Oct 22;53(11):14249–14268. doi: 10.1007/s10489-022-04221-9

Table 2.

Details of the experimental datasets used

Dataset	Type of data	#Classes	Exceeding ratio	Vocabulary size	#Training samples	#Testing samples
DATASET-1	BINARY	2	0.471	16449	6911	1821
DATASET-2	BINARY	2	0.495	39515	15000	5000
DATASET-3	MULTICLASS	8	0.297	13737	4937	2175
DATASET-4	MULTICLASS	3	0.563	16230	10000	3000
DATASET-5	MULTICLASS	5	0.553	24742	6000	1500

Note: Exceeding ratio = (number of samples whose length is greater than the average length) / (number of all training samples)
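
The definition in the note can be illustrated with a minimal sketch. The function name `exceeding_ratio` and the sample lengths are hypothetical, and two points are assumptions because the table does not state them: that length is counted in tokens, and that the average is taken over the same training samples that form the denominator.

```python
from typing import Sequence


def exceeding_ratio(lengths: Sequence[int]) -> float:
    """Fraction of samples whose length is strictly greater than the mean length.

    `lengths` holds one length per training sample. The unit (tokens here)
    is an assumption; the table does not specify it.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    mean_length = sum(lengths) / len(lengths)
    # Count samples strictly longer than the average length.
    exceeding = sum(1 for n in lengths if n > mean_length)
    return exceeding / len(lengths)


# Hypothetical token counts for five training samples (not from the paper).
sample_lengths = [3, 5, 8, 12, 40]
ratio = exceeding_ratio(sample_lengths)
print(f"Exceeding ratio: {ratio:.3f}")
```

With these illustrative lengths the mean is 13.6, so only one of the five samples exceeds it. A ratio well below 0.5, as for DATASET-3, would suggest that a few long samples pull the average upward.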