. 2023 Mar 1;23(5):2708. doi: 10.3390/s23052708

Table 10.

The distance between topics for the unlemmatized and lemmatized corpora.

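| No. of Tokens | Vocabulary | Inference Time (in seconds) | Unlemmatized: Hellinger | Unlemmatized: Jaccard | Lemmatized: Hellinger | Lemmatized: Jaccard |
|---|---|---|---|---|---|---|
| 604,389 | 85,463 | 33.14 | 0.476 | 0.970 | 0.491 | 0.993 |
| 561,648 | 42,722 | 29.63 | 0.495 | 0.968 | 0.546 | 0.998 |
| 531,870 | 27,533 | 26.77 | 0.481 | 0.982 | 0.520 | 0.996 |
| 512,085 | 21,238 | 22.15 | 0.489 | 0.982 | 0.517 | 0.999 |
| 496,373 | 17,310 | 18.92 | 0.495 | 0.982 | 0.528 | 1.000 |
| 483,108 | 14,657 | 16.55 | 0.492 | 0.983 | 0.526 | 0.999 |

The four right-hand columns are distance measurements between topics, reported with the Hellinger and Jaccard distances for each corpus.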
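The table reports Hellinger and Jaccard distances between topics but does not define them here. The sketch below shows one common way to compute both, under stated assumptions that are not taken from the source: each topic is a word-to-probability mapping, the Jaccard distance is computed on the sets of each topic's top-n words (`n` is a hypothetical parameter), and a single score per corpus is the mean over all topic pairs (the source does not state how distances are aggregated). The helper names `top_words` and `mean_pairwise` are illustrative only.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p and q map word -> probability; a word missing from one mapping is
    treated as probability 0. The result lies in [0, 1].
    """
    vocab = set(p) | set(q)
    s = sum((math.sqrt(p.get(w, 0.0)) - math.sqrt(q.get(w, 0.0))) ** 2 for w in vocab)
    return math.sqrt(s / 2.0)


def top_words(dist, n):
    """Hypothetical helper: the set of the n most probable words in a topic."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return {w for w, _ in ranked[:n]}


def jaccard_distance(a, b):
    """Jaccard distance 1 - |A & B| / |A | B| between two word sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise(topics, metric):
    """Assumed aggregation: mean of metric over all unordered topic pairs."""
    pairs = list(combinations(topics, 2))
    if not pairs:
        raise ValueError("need at least two topics")
    return sum(metric(x, y) for x, y in pairs) / len(pairs)


# Toy topics (made up for illustration, not data from the paper).
t1 = {"sensor": 0.5, "data": 0.3, "network": 0.2}
t2 = {"sensor": 0.1, "model": 0.6, "topic": 0.3}
t3 = {"data": 0.4, "model": 0.4, "word": 0.2}
topics = [t1, t2, t3]

mean_hellinger = mean_pairwise(topics, hellinger)
mean_jaccard = mean_pairwise(
    topics, lambda x, y: jaccard_distance(top_words(x, 2), top_words(y, 2))
)
print(f"mean Hellinger: {mean_hellinger:.3f}, mean Jaccard: {mean_jaccard:.3f}")
```

Both measures range from 0 (identical) to 1 (no overlap), so under these definitions larger values indicate more clearly separated topics; a Jaccard distance near 1, as in the table, means topics share almost none of their top words.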