2023 Aug 17;9:e1544. doi: 10.7717/peerj-cs.1544

Table 1. Summary statistics of the three datasets: the number of training, validation, and test samples, the vocabulary size, the number of categories, the average text length, and the number of nodes and edges in each dataset's graph.

Dataset	Num. of samples (training/validation/test/vocabulary)	Num. of categories	Average length	Num. of nodes	Num. of edges (million; PMI* of words)
IFLYTEK	12,133/2,599/0/250,862	119	120	265,594	26.096
ChnSentiCorp	9,600/1,200/1,200/58,932	2	109	70,932	6.126
Toutiao-S	15,000/2,500/2,500/36,246	5	25	56,246	1.761
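
The node counts in Table 1 can be cross-checked against the sample and vocabulary counts. The sketch below assumes that each graph has one node per document (training, validation, and test) plus one node per vocabulary word; this layout is inferred from the numbers, not stated in the table. The names `TABLE_1`, `node_count`, and `check_table` are hypothetical helpers introduced only for illustration.

```python
# Rows copied from Table 1:
# (training, validation, test, vocabulary, reported number of nodes)
TABLE_1 = {
    "IFLYTEK": (12_133, 2_599, 0, 250_862, 265_594),
    "ChnSentiCorp": (9_600, 1_200, 1_200, 58_932, 70_932),
    "Toutiao-S": (15_000, 2_500, 2_500, 36_246, 56_246),
}


def node_count(train, val, test, vocab):
    """Assumed layout: one node per document plus one node per vocabulary word."""
    return train + val + test + vocab


def check_table(rows):
    """Return, per dataset, whether the assumed sum matches the reported node count."""
    return {
        name: node_count(*row[:4]) == row[4]
        for name, row in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_table(TABLE_1).items():
        print(name, "consistent" if ok else "mismatch")
```

Under this assumption all three rows are consistent, for example 12,133 + 2,599 + 0 + 250,862 = 265,594 for IFLYTEK.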