2021 Mar 11;7:e389. doi: 10.7717/peerj-cs.389

Table 1. Critical analysis of literature studies.

Ref.	Category	Methodology	Datasets	Strengths	Weaknesses
Hong & Zhen (2012), Turney (2003), Khodaei, Shahabi & Li (2012) and Lee et al. (2013)	Statistical methods	Documents are preprocessed and statistical methods are applied to calculate word frequencies; keywords are then selected by ranking words on these frequencies. TF-IDF and the Frequent Pattern method are the most common.	Newspaper data; Twitter benchmark datasets	Feasible for smaller datasets. Performs best on simply structured datasets where only frequency matters.	Not feasible for large or complexly structured datasets.
Zhang & Tang (2013), Jain & Gupta (2018), Schluter (2014) and Beliga, Meštrović & Martinčić-Ipšić (2014)	Machine learning	Document sets are cleaned by preprocessing and then processed with machine learning methods that learn word semantics from a training dataset. The Quality Phrase Mining approach is the state of the art and the most common for this purpose.	Essay collections; Twitter datasets; Collections of web data	Handles large and complexly structured datasets.	Needs a well-developed training dataset.
Lahiri, Choudhury & Caragea (2014), Coppola et al. (2019), Zhang et al. (2016), Zhou et al. (2019), Abilhoa & De Castro (2014), Chang, Huang & Lin (2015), Rousseau & Vazirgiannis (2015) and Liu, Chen & Song (2002)	Graph-based	Documents are preprocessed and the feature set is converted into a graph of nodes linked by edges, to which graph-based methods are then applied. HITS, PageRank, CoreRank, and centrality measures are the most common and state-of-the-art approaches.	News datasets from the web; Accident datasets extracted from the web	Supports larger documents. Works on node and edge connectivity, which suits any type of dataset. Does not require a training dataset.	Limited to graph-based methods.
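The frequency-based ranking described for the statistical methods can be illustrated with a minimal TF-IDF keyword ranker. This is a sketch under stated assumptions, not the procedure of any cited study. The regex tokenizer, the small stopword list, the term frequency normalised by document length, the unsmoothed idf = log(N / df), and the function name `tfidf_keywords` are all hypothetical illustrative choices.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "was"}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs, dropping stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(docs, top_k=3):
    """Return the top_k TF-IDF-ranked words for each document in docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    results = []
    for toks in tokenized:
        if not toks:
            results.append([])
            continue
        tf = Counter(toks)
        total = len(toks)
        # tf is normalised by document length; idf = log(N / df) is unsmoothed.
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([w for w, _ in ranked[:top_k]])
    return results


docs = [
    "The flood hit the city. Flood!",
    "The storm hit the coast.",
    "The city council met.",
]
print(tfidf_keywords(docs, top_k=2))
```

Words that occur in every document receive an idf of zero, so they never rank highly. This is why TF-IDF favours document-specific terms. It also shows why the approach suits simply structured data where frequency is the main signal.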
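The graph-based pipeline (build a graph from preprocessed text, then apply a ranking method such as PageRank) can be sketched as follows. This sketch rests on several assumptions. Nodes are non-stopword tokens, and an undirected edge joins words that appear next to each other. The window size, stopword list, damping factor of 0.85, and fixed iteration count are illustrative choices. The function names `cooccurrence_graph`, `pagerank`, and `top_keywords` are hypothetical and are not taken from the cited studies.

```python
import re
from collections import defaultdict

# Hypothetical stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "was"}


def cooccurrence_graph(text, window=2):
    """Undirected graph linking words that co-occur within `window` tokens.

    Punctuation is discarded, so edges may cross sentence boundaries.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    graph = defaultdict(set)
    for i, w in enumerate(words):
        # With window=2 this links each word only to its immediate successor.
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected adjacency-set graph."""
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Each neighbour u shares its rank equally among its own edges.
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def top_keywords(text, k=2):
    """Rank words by PageRank score and return the k highest."""
    rank = pagerank(cooccurrence_graph(text))
    ranked = sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


print(top_keywords("Flood hit city. Flood warning, city flood.", k=2))
```

Every node added by `cooccurrence_graph` has at least one edge, so the sketch needs no special handling for dangling nodes. It also needs no training data, matching the strength listed for this category. Swapping PageRank for HITS, CoreRank, or a centrality measure would change only the ranking step.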