2021 Mar 11;7:e389. doi: 10.7717/peerj-cs.389

Table 1. Critical analysis of literature studies.

| Ref. | Category | Methodology | Datasets | Strengths | Weaknesses |
|---|---|---|---|---|---|
| Hong & Zhen (2012), Turney (2003), Khodaei, Shahabi & Li (2012) and Lee et al. (2013) | Statistical methods | Documents are preprocessed and statistical methods are applied to calculate word frequencies; the words are then ranked by frequency and the keywords are selected from the ranking. TF-IDF and the Frequent Pattern method are the most common. | Newspaper data; Twitter benchmark datasets | Feasible for smaller datasets. Best performer on simply structured datasets where only frequency matters. | Not feasible for large, complex structured datasets. |
| Zhang & Tang (2013), Jain & Gupta (2018), Schluter (2014) and Beliga, Meštrović & Martinčić-Ipšić (2014) | Machine learning | Document sets are cleaned by preprocessing and then processed with machine learning methods that learn word semantics from the training dataset. The Quality Phrase Mining approach is state-of-the-art and the most common for this purpose. | Essay collections; Twitter datasets; collections of web data | Does not break down on large, complex structured datasets. | Needs a well-developed training dataset. |
| Lahiri, Choudhury & Caragea (2014), Coppola et al. (2019), Zhang et al. (2016), Zhou et al. (2019), Abilhoa & De Castro (2014), Chang, Huang & Lin (2015), Rousseau & Vazirgiannis (2015) and Liu, Chen & Song (2002) | Graph-based | Documents are preprocessed and the feature set is converted into a graph of nodes linked by edges; graph-based methods are then applied to it. HITS, PageRank, CoreRank, and centrality measures are the most common and state-of-the-art approaches. | News datasets from the web; accident datasets extracted from the web | Supports larger documents. Works on node and edge connectivity, which supports any type of dataset. Does not require a training dataset. | Limited to graph-based methods. |
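
To make the contrast between the statistical and graph-based rows concrete, the following is a minimal sketch, not the implementation of any cited study. It assumes a crude regex tokenizer, a plain TF-IDF score with a natural logarithm, and PageRank run on an undirected graph that links adjacent words. The function names (`tokenize`, `tfidf_keywords`, `pagerank_keywords`) and all parameter defaults are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a deliberately crude preprocessing step)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Statistical route: rank the words of one document by TF-IDF over the collection."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def pagerank_keywords(text, damping=0.85, iterations=50, top_k=3):
    """Graph route: link adjacent words, then rank the nodes with PageRank."""
    tokens = tokenize(text)
    neighbours = defaultdict(set)
    for word, nxt in zip(tokens, tokens[1:]):
        if word != nxt:
            neighbours[word].add(nxt)
            neighbours[nxt].add(word)
    nodes = sorted(neighbours)
    if not nodes:
        return []
    rank = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iterations):
        # Each node collects an equal share of rank from every neighbour.
        rank = {
            w: (1 - damping) / len(nodes)
            + damping * sum(rank[v] / len(neighbours[v]) for v in neighbours[w])
            for w in nodes
        }
    return sorted(rank, key=lambda w: (-rank[w], w))[:top_k]


if __name__ == "__main__":
    docs = [
        "graph ranking of graph nodes",
        "frequency ranking of words",
        "training data of essays",
    ]
    print(tfidf_keywords(docs, 0, top_k=2))
    print(pagerank_keywords("hub a hub b hub c", top_k=1))
```

The TF-IDF function needs the whole collection to compute document frequencies, whereas the PageRank function works on a single text and needs no training data, which mirrors the strengths listed for the graph-based category.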