2019 May 23;14(5):e0216922. doi: 10.1371/journal.pone.0216922

Table 2. Topic model evaluation (NPMI and MTA) for different training conditions and different numbers of topics (K).

The two “MT” settings use a full machine translation system, while the “word replacement” approach approximates machine translation by simply replacing the words with entries in a bilingual dictionary.
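As a rough illustration of the word replacement idea described above, the following minimal sketch maps each token through a bilingual dictionary. The function name `word_replace`, the toy Spanish-to-English dictionary, and the choice to leave out-of-dictionary tokens unchanged are all assumptions made for illustration; the source does not specify the language pair, the dictionary, or how missing entries were handled.

```python
def word_replace(tokens, bilingual_dict):
    """Approximate translation by per-token dictionary lookup.

    Tokens without a dictionary entry are kept unchanged (an assumption,
    not something stated in the source).
    """
    return [bilingual_dict.get(tok.lower(), tok) for tok in tokens]


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"casa": "house", "grande": "big", "la": "the"}

tweet = "la casa es grande".split()
print(word_replace(tweet, toy_dict))  # ['the', 'house', 'es', 'big']
```

Unlike a full machine translation system, a lookup of this kind ignores word order, morphology, and context, which is why the caption calls it an approximation.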

| Training data | K | NPMI Mean | NPMI SD | NPMI Max | MTA Mean | MTA SD | MTA Max |
|---|---|---|---|---|---|---|---|
| MT (all tweets) | 10 | .185 | .063 | .291 | 3.15 | 1.85 | 6.00 |
| MT (translations only) | 10 | .101 | .058 | .202 | 7.45 | 1.72 | 9.00 |
| Word replacement | 10 | .130 | .075 | .289 | 3.30 | 2.07 | 6.50 |
| MT (all tweets) | 25 | .112 | .082 | .295 | 3.30 | 2.44 | 7.50 |
| MT (translations only) | 25 | .111 | .062 | .247 | 7.44 | 1.41 | 9.50 |
| Word replacement | 25 | .126 | .086 | .327 | 3.28 | 2.13 | 7.50 |
| MT (all tweets) | 50 | .096 | .097 | .342 | 3.22 | 2.05 | 8.50 |
| MT (translations only) | 50 | .098 | .070 | .306 | 7.37 | 1.42 | 10.00 |
| Word replacement | 50 | .126 | .085 | .361 | 3.39 | 2.12 | 8.50 |
| MT (all tweets) | 75 | .150 | .086 | .424 | 2.77 | 1.89 | 9.00 |
| MT (translations only) | 75 | .089 | .076 | .346 | 7.12 | 1.51 | 10.00 |
| Word replacement | 75 | .124 | .085 | .363 | 3.61 | 2.08 | 9.00 |
| MT (all tweets) | 100 | .127 | .082 | .404 | 2.26 | 1.58 | 7.00 |
| MT (translations only) | 100 | .078 | .068 | .327 | 6.58 | 1.47 | 10.00 |
| Word replacement | 100 | .113 | .086 | .384 | 3.49 | 1.94 | 9.00 |
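For readers unfamiliar with the NPMI column, the sketch below shows one common way to compute normalized pointwise mutual information for a topic from document-level co-occurrence counts. It uses the standard definition, NPMI(a, b) = log(P(a, b) / (P(a) P(b))) / (-log P(a, b)), averaged over all pairs of a topic's top words. The function names, the representation of documents as sets of tokens, and the handling of edge cases are assumptions for illustration and may differ from the estimator and reference corpus used to produce the table.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, documents):
    """NPMI of two words, with probabilities estimated as document frequencies.

    `documents` is a list of sets of tokens (an assumed representation).
    """
    n = len(documents)
    p_a = sum(word_a in d for d in documents) / n
    p_b = sum(word_b in d for d in documents) / n
    p_ab = sum(word_a in d and word_b in d for d in documents) / n
    if p_ab == 0:
        return -1.0  # limiting value when the words never co-occur
    if p_ab == 1:
        return 1.0  # both words in every document; avoids dividing by -log(1) = 0
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)


def topic_npmi(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words (needs at least two words)."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


# Toy corpus, for illustration only.
docs = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
print(npmi("a", "b", docs))  # words that always co-occur score at the top of the range
print(npmi("a", "c", docs))  # words that never co-occur score -1.0
```

Values range from -1 (never co-occurring) through 0 (independent) to 1 (always co-occurring), so higher NPMI in the table indicates topics whose top words tend to appear together.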