2023 Aug 17;9:e1544. doi: 10.7717/peerj-cs.1544

Table 1. Summary statistics of the three datasets: the number of training, validation, and test samples, the vocabulary size, the number of categories, the average length, the number of nodes, and the number of edges of each dataset.

| Dataset | Num of samples (Training/Validation/Test/Vocabulary) | Num of categories | Average length | Num of nodes | Num of edges (million), PMI* of words |
|---|---|---|---|---|---|
| IFLYTEK | 12,133/2,599/0/250,862 | 119 | 120 | 265,594 | 26.096 |
| ChnSentiCorp | 9,600/1,200/1,200/58,932 | 2 | 109 | 70,932 | 6.126 |
| Toutiao-S | 15,000/2,500/2,500/36,246 | 5 | 25 | 56,246 | 1.761 |
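
The node counts in Table 1 can be cross-checked against the sample counts. In every row, the reported number of nodes equals the sum of the training, validation, and test samples plus the vocabulary size. That would be consistent with a graph holding one node per document and one per word, though this reading is an inference from the numbers rather than something the table states. The sketch below only verifies the arithmetic; the names `TABLE_1`, `node_count`, and `check_table` are hypothetical helpers introduced for illustration.

```python
# Rows copied from Table 1:
# (training, validation, test, vocabulary, reported number of nodes)
TABLE_1 = {
    "IFLYTEK": (12_133, 2_599, 0, 250_862, 265_594),
    "ChnSentiCorp": (9_600, 1_200, 1_200, 58_932, 70_932),
    "Toutiao-S": (15_000, 2_500, 2_500, 36_246, 56_246),
}


def node_count(training, validation, test, vocabulary):
    """Assumed relation: one node per sample plus one node per vocabulary word."""
    return training + validation + test + vocabulary


def check_table(table):
    """Return, per dataset, whether the assumed relation matches the reported node count."""
    return {
        name: node_count(*row[:4]) == row[4]
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, matches in check_table(TABLE_1).items():
        print(f"{name}: computed nodes match reported nodes = {matches}")
```

The check covers only the node column. The edge counts depend on how the PMI-based word edges were built, which the table does not specify, so they are not recomputed here.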