Table 1. Critical analysis of literature studies.
| Ref. | Category | Methodology | Datasets | Strengths | Weaknesses |
|---|---|---|---|---|---|
| Hong & Zhen (2012), Turney (2003), Khodaei, Shahabi & Li (2012) and Lee et al. (2013) | Statistical methods | Documents are preprocessed, word frequencies are computed with statistical methods, and keywords are selected by frequency-based ranking of the words. TF-IDF and the Frequent Pattern method are the most common. | Newspaper data; Twitter benchmark datasets | Feasible for smaller datasets. Performs best on simply structured datasets where only frequency matters. | Not feasible for large datasets with complex structure. |
| Zhang & Tang (2013), Jain & Gupta (2018), Schluter (2014) and Beliga, Meštrović & Martinčić-Ipšić (2014) | Machine learning | Document sets are cleaned by preprocessing and then processed with machine learning methods that learn word semantics from the training dataset. The Quality Phrase Mining approach is the state of the art and the most common for this purpose. | Essay collections; Twitter datasets; collections of web data | Copes with large datasets with complex structure. | Needs a well-developed training dataset. |
| Lahiri, Choudhury & Caragea (2014), Coppola et al. (2019), Zhang et al. (2016), Zhou et al. (2019), Abilhoa & De Castro (2014), Chang, Huang & Lin (2015), Rousseau & Vazirgiannis (2015) and Liu, Chen & Song (2002) | Graph-based | Documents are preprocessed and the feature set is converted into a graph of nodes and edges, to which graph-based methods are then applied. HITS, PageRank, CoreRank, and centrality measures are the most common and state-of-the-art approaches. | News datasets from the web; accident datasets extracted from the web | Supports larger documents. Relies on node and edge connectivity, which supports any type of dataset. Does not require a training dataset. | Limited to graph-based methods. |
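
The statistical category in Table 1 can be illustrated with a minimal TF-IDF keyword ranking sketch. This is not the implementation used in any of the cited studies: the function name `tfidf_keywords`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the `top_k` parameter are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k words of each document, ranked by TF-IDF.

    Hypothetical helper for illustration only: it lowercases and splits
    on whitespace instead of applying real preprocessing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Relative term frequency times unsmoothed inverse document frequency.
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
keywords = tfidf_keywords(docs, top_k=2)
```

A word such as "the", which occurs in every document, receives a score of zero and falls to the bottom of each ranking. The sketch also shows why the table lists this category as suited to data "where only frequency matters": the score uses counts alone and ignores word order and semantics.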