2025 Oct 23;8:1496580. doi: 10.3389/frai.2025.1496580

Table 8.

A synthesis of the 10 most-cited articles in our dataset.

| Title | Method/model | Data used | Algorithms/techniques | Main findings |
| --- | --- | --- | --- | --- |
| Toward Detection of Phishing Websites on Client-Side Using Machine Learning Based Approach (Jain and Gupta, 2018b) | ML on multiple datasets | PhishTank, OpenPhish, Alexa, payment gateways, banks | RF, SVM, Neural Nets, Logistic Regression, Naive Bayes | Improved accuracy using client-side data extraction |
| Detection of Phishing Websites Using an Efficient Feature-Based Machine Learning Framework (Rao and Pais, 2019) | Feature extraction from URL + source code + 3rd parties | Diverse datasets | 8 ML algorithms | Better than CANTINA/CANTINA+, detects zero-day phishing |
| Phishing Website Detection Based on Multidimensional Features Driven by Deep Learning (Yang et al., 2019) | CNN for phishing detection | ~2M URLs (1,021,758 phishing + 989,021 legitimate) | CNN | High performance and fast processing speed |
| Machine Learning Based Phishing Detection from URLs (Sahingoz et al., 2019) | Custom dataset + NLP | 73,575 URLs (36,400 legitimate, 37,175 phishing) | DT, AdaBoost, K-star, kNN, RF, SMO, Naive Bayes | Scalable, real-time, detects new phishing attempts |
| PhishStorm: Detecting Phishing with Streaming Analytics (Marchal et al., 2014) | PhishStorm – real-time detection | PhishTank, DMOZ: URLs + search engine queries | Classical ML on URL components | 94.91% accuracy, 1.44% false positives (FP) |
| A Machine Learning Based Approach for Phishing Detection Using Hyperlinks Information (Jain and Gupta, 2019) | HTML hyperlinks analysis | PhishTank, OpenPhish, Alexa: hyperlinks from source code | Logistic Regression + 12 hyperlink features | Achieved 98.4% accuracy, language-independent |
| A New Hybrid Ensemble Feature Selection Framework for Machine Learning-Based Phishing Detection System (Chiew et al., 2019) | HEFS + CDF-g for optimal feature selection | Multiple sources | Ensemble framework | Improves accuracy through optimal feature selection |
| A Stacking Model Using URL and HTML Features for Phishing Webpage Detection (Li et al., 2019) | Stacking model on URL + HTML features | PhishTank (2k webpages) + Alexa (49,947 webpages) | Combined SVM, NN, DT, RF | High accuracy, stacking outperforms individual models |
| CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites (Xiang et al., 2011) | Extraction of 15 high-level webpage characteristics from URLs, HTML DOM, 3rd party services, search engines | Diverse Web resources | SVM, Logistic Regression, Bayesian Network, J48, Random Forest, AdaBoost | Good TP/FP rate, competitive solution |
| A Comprehensive Survey of AI-Enabled Phishing Attacks Detection Techniques (Basit et al., 2021) | Review on phishing | Diverse datasets | RF, SVM, kNN | ML and DL reach up to 99% accuracy, much better than heuristic and data mining approaches |
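
Three rows of the table quote explicit class counts, so the dataset totals and phishing shares can be checked with simple arithmetic. The sketch below is only an illustration and not part of any cited study: the names `DATASETS`, `class_balance`, and `summarize` are hypothetical, and the Li et al. phishing count is given in the table only as "2k", so the value 2,000 is an assumed stand-in.

```python
from typing import Dict, Tuple

# Dataset sizes as quoted in Table 8, stored as (phishing, legitimate).
# Assumption: Li et al. report "2k" phishing pages; 2_000 is a stand-in value.
DATASETS: Dict[str, Tuple[int, int]] = {
    "Yang et al., 2019": (1_021_758, 989_021),
    "Sahingoz et al., 2019": (37_175, 36_400),
    "Li et al., 2019": (2_000, 49_947),
}


def class_balance(phishing: int, legitimate: int) -> Tuple[int, float]:
    """Return (total samples, phishing share in percent)."""
    total = phishing + legitimate
    return total, 100.0 * phishing / total


def summarize(datasets: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Compute the total and phishing share for every listed dataset."""
    return {name: class_balance(p, l) for name, (p, l) in datasets.items()}


if __name__ == "__main__":
    for name, (total, share) in summarize(DATASETS).items():
        print(f"{name}: {total:,} samples, {share:.1f}% phishing")
```

The computed totals for the first two datasets are 2,010,779 and 73,575, which agree with the "~2M" and "73,575" figures in the table. Under the assumed 2,000 phishing pages, the Li et al. dataset is far less balanced than the other two, which is worth keeping in mind when comparing accuracy figures across rows.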