Table 1.
| Related work | Topic modeling method | Evaluation method | Outcome |
|---|---|---|---|
| Chakkarwar and Tamane (2020) | Latent Dirichlet allocation (LDA) with bag of words (BoW) | Visual overview of extracted topics | - Aimed to discover current trends, topics, or patterns in research documents to give an overview of different research trends.<br>- The results show that LDA is an effective topic modeling method for capturing the context of a document collection. |
| Ray et al. (2019) | Latent semantic indexing (LSI)<br>LDA<br>Non-negative matrix factorization (NMF) | Perplexity<br>Topic coherence | - Aimed to introduce topic modeling methods and tools to the Hindi language.<br>- Discussed many techniques and tools used for topic modeling.<br>- The coherence result of the NMF model was slightly better than that of the LDA model.<br>- The perplexity of the LDA model on the Hindi dataset was better than that of the other evaluated topic modeling methods. |
| Xu et al. (2019) | LDA | Perplexity | - Aimed to help Chinese movie creators understand the psychological needs of movie viewers and provide suggestions to improve the quality of Chinese movies.<br>- Used a word cloud as a visual display of high-frequency keywords in a text, which gives a basic understanding of the core ideas of the text data.<br>- The LDA model provides topics that deliver a good analysis of the Douban online reviews.<br>- Used perplexity to determine the best number of extracted topics; as a result, the number of topics was set to 20. |
| Alghamdi and Alfalqi (2015) | Latent semantic analysis (LSA)<br>Probabilistic latent semantic analysis (PLSA)<br>LDA<br>Correlated topic model (CTM) | | - Reviewed many topic modeling methods in terms of characteristics, limitations, and theoretical background.<br>- Reviewed many topic modeling application areas and evaluation methods. |
| Chen et al. (2017) | NMF<br>Principal component analysis (PCA)<br>LDA<br>K-Competitive Autoencoder for Text (KATE) | t-Distributed stochastic neighbor embedding (t-SNE) dimensionality-reduction method | - Aimed to compare and evaluate many topic modeling approaches in analyzing a large set of US Securities and Exchange Commission (SEC) filings made by US public banks.<br>- Both the NMF and LDA methods provide very good document representations, while KATE delivered more meaningful document representations and higher-accuracy topics.<br>- LDA provided the best result regarding the classification of topic representations. |
| Mazarura and de Waal (2016) | LDA<br>Dirichlet multinomial mixture model (GSDMM) | Topic stability<br>Topic coherence | - Tested many numbers of topics (10, 20, 30, 40, 50, and 100).<br>- Topic coherence decreases for both LDA and GSDMM as the number of topics increases on long text, indicating an overall decline in the quality of the topics uncovered by both models.<br>- LDA's coherence values are slightly better than GSDMM's.<br>- GSDMM is more stable than LDA.<br>- GSDMM is a viable option for short text, as it displays the potential to produce better results than LDA. |
| Sisodia et al. (2020) | BoW<br>Term frequency–inverse document frequency (TF-IDF)<br>Naive Bayes<br>Support vector machine (SVM)<br>Decision trees<br>Nu-support vector classification (Nu-SVC) | Accuracy<br>Precision<br>Recall<br>F-measure | - The Nu-SVC classifier outperforms all the other individual classifiers.<br>- The random forest classifier outperforms all the other ensemble classifiers.<br>- The SVM classifier outperforms all the other individual classifiers.<br>- The random forest classifier outperforms the remaining classifiers.<br>- Only two datasets were considered; datasets of different sizes need to be studied for better results. |
| Shi et al. (2017) | Vector space model (VSM)<br>LSI<br>PLSA<br>LDA | | - Reviewed all of the following methods: VSM, LSI, PLSA, and LDA.<br>- Reviewed the essential concepts of topic modeling using a bag-of-words approach.<br>- Discussed the basic ideas of topic modeling, including the bag-of-words approach, model training, and output.<br>- Discussed topic modeling applications, features, limitations, and tools such as Gensim, the Stanford Topic Modeling Toolbox, Machine Learning for Language Toolkit (MALLET), and BigARTM. |
| Nugroho et al. (2020) | LDA<br>NMF<br>Task-driven NMF<br>Plink-LDA<br>Non-negative matrix inter-joint factorization (NMijF) | Purity<br>Normalized mutual information (NMI)<br>Pairwise F-measure | - Focused on reviewing the approaches and discussing the features that are exploited to deal with the extreme sparsity and dynamics of the online social network (OSN) environment.<br>- Ran the algorithms over both datasets 30 times and reported the average value of each evaluation metric for comparison.<br>- Most methods can achieve a high purity value.<br>- NMF and NMijF have the best performance among the compared methods.<br>- Pairwise F-measure results were good and similar across all methods.<br>- NMijF provides the best results according to all the evaluation metrics.<br>- Both LDA and NMF exploit only the simple content of social media posts rather than the main features (content, social interactions, and temporal information). |
| Ahmed Taloba et al. (2018) | PCA model<br>Standard SVM<br>J-48 decision tree<br>KNN methods | Precision<br>Accuracy<br>Sensitivity<br>F-measure | - Aimed to compare the performance of these methods before and after using PCA.<br>- Random forest (RF) gives acceptable and higher accuracy compared to the rest of the classifiers.<br>- The RF algorithm gives higher performance, and its performance is improved after using PCA. |
| Chen et al. (2019) | LDA<br>NMF<br>Knowledge-guided NMF (KGNMF) | Pointwise mutual information (PMI) score<br>Human judgments | - Tested many numbers of topics (20, 40, 60, 80, and 100).<br>- NMF has overwhelming advantages over LDA.<br>- The KGNMF model performs better than NMF and LDA.<br>- NMF provides better topics than LDA with topic numbers ranging from 20 to 100. |
| Anantharaman et al. (2019) | LDA<br>LSA<br>NMF | Precision<br>Recall<br>F-measure<br>Accuracy<br>Cohen's kappa score<br>Matthews correlation coefficient<br>Time taken | - Evaluated all topic modeling algorithms with both BoW and TF-IDF representations.<br>- Used the Naïve Bayes classifier for the 20-newsgroup dataset and the random forest classifier for the BBC news and PubMed datasets.<br>- On the 20-newsgroup dataset, LDA with BoW outperforms the other topic modeling algorithms.<br>- The LDA model does not perform well with TF-IDF compared to BoW.<br>- LDA takes much more time than the LSA and NMF models. |
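
Several of the studies summarized in Table 1 pair LDA or NMF with perplexity and topic coherence to choose and judge the number of topics. The sketch below illustrates that common pipeline with Gensim, one of the toolkits cited by Shi et al. (2017); it is a minimal illustration, not the setup of any surveyed paper, and the toy documents, parameter values, and the u_mass coherence variant are assumptions made here.

```python
# Minimal sketch of the LDA + perplexity + topic coherence workflow.
# Documents, num_topics, and other parameters are illustrative only,
# not taken from any of the studies in Table 1.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy pre-tokenized documents standing in for a real corpus (BoW input).
docs = [
    ["movie", "review", "viewer", "plot", "actor"],
    ["bank", "filing", "risk", "market", "report"],
    ["topic", "model", "coherence", "perplexity", "corpus"],
]

dictionary = Dictionary(docs)                    # token -> integer id mapping
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=0)

# Perplexity: Gensim returns the per-word log-likelihood bound (base 2),
# so perplexity = 2 ** (-bound); lower is better.
perplexity = 2 ** (-lda.log_perplexity(corpus))

# Topic coherence (u_mass variant, computed directly from the corpus);
# higher is better. The c_v variant used in some studies works the same
# way but takes texts=docs instead of corpus=corpus.
coherence = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                           coherence="u_mass").get_coherence()

print(f"perplexity: {perplexity:.2f}, u_mass coherence: {coherence:.2f}")
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```

In practice, the loop over candidate topic counts (e.g., the 10 to 100 range tested by Mazarura and de Waal, 2016) simply repeats this fit-and-score step for each value and keeps the count with the best coherence or perplexity.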