2025 Oct 23;8:1496580. doi: 10.3389/frai.2025.1496580

Table 8.

A synthesis of the 10 most cited articles in our dataset.

Title	Method/model	Used data	Algorithms/techniques	Main findings
Toward Detection of Phishing Websites on Client-Side Using Machine Learning Based Approach (Jain and Gupta, 2018b)	ML on multiple datasets	PhishTank, OpenPhish, Alexa, payment gateways, banks	RF, SVM, neural networks, Logistic Regression, Naive Bayes	Improved accuracy using client-side data extraction
Detection of Phishing Websites Using an Efficient Feature-Based Machine Learning Framework (Rao and Pais, 2019)	Feature extraction from URL, source code, and third-party services	Diverse datasets	8 ML algorithms	Outperforms CANTINA and CANTINA+; detects zero-day phishing
Phishing Website Detection Based on Multidimensional Features Driven by Deep Learning (Yang et al., 2019)	CNN for phishing detection	~2M URLs (1,021,758 phishing + 989,021 legitimate)	CNN	High performance and fast processing speed
Machine Learning Based Phishing Detection from URLs (Sahingoz et al., 2019)	Custom dataset + NLP	73,575 URLs (36,400 legitimate, 37,175 phishing)	DT, AdaBoost, K-star, kNN, RF, SMO, Naive Bayes	Scalable, real-time, detects new phishing attempts
PhishStorm: Detecting Phishing with Streaming Analytics (Marchal et al., 2014)	PhishStorm – real-time detection	PhishTank, DMOZ: URLs + search engine queries	Classical ML on URL components	94.91% accuracy, 1.44% false positive (FP) rate
A Machine Learning Based Approach for Phishing Detection Using Hyperlinks Information (Jain and Gupta, 2019)	HTML hyperlinks analysis	PhishTank, OpenPhish, Alexa: Hyperlinks from source code	Logistic Regression + 12 hyperlink features	Achieved 98.4% accuracy, language-independent
A New Hybrid Ensemble Feature Selection Framework for Machine Learning-Based Phishing Detection System (Chiew et al., 2019)	HEFS + CDF-g for optimal feature selection	Multiple sources	Ensemble framework	Improves accuracy through optimal feature selection
A Stacking Model Using URL and HTML Features for Phishing Webpage Detection (Li et al., 2019)	Stacking model on URL + HTML features	PhishTank (2k webpages) + Alexa (49,947 webpages)	Combined SVM, NN, DT, RF	High accuracy; stacking outperforms individual models
CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites (Xiang et al., 2011)	Extraction of 15 high-level webpage characteristics from URLs, HTML DOM, third-party services, and search engines	Diverse Web resources	SVM, Logistic Regression, Bayesian Network, J48, Random Forest, AdaBoost	Good TP/FP rate, competitive solution
A Comprehensive Survey of AI-Enabled Phishing Attacks Detection Techniques (Basit et al., 2021)	Review on phishing	Diverse datasets	RF, SVM, kNN	ML and DL reach up to 99% accuracy, outperforming heuristic and data mining approaches
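
Several findings in the table are reported as accuracy, true positive (TP) rate, or false positive (FP) rate. As a reading aid, the sketch below shows how these metrics are conventionally derived from a binary confusion matrix. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary phishing classifier (hypothetical helper)."""
    tp: int  # phishing pages correctly flagged as phishing
    fp: int  # legitimate pages wrongly flagged as phishing
    tn: int  # legitimate pages correctly passed
    fn: int  # phishing pages wrongly passed as legitimate


def accuracy(c: ConfusionCounts) -> float:
    """Share of all pages that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no samples")
    return (c.tp + c.tn) / total


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of legitimate pages that were wrongly flagged."""
    negatives = c.fp + c.tn
    if negatives == 0:
        raise ValueError("no legitimate samples")
    return c.fp / negatives


def true_positive_rate(c: ConfusionCounts) -> float:
    """Share of phishing pages that were caught (recall)."""
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("no phishing samples")
    return c.tp / positives


# Illustrative counts only: 100 phishing and 100 legitimate pages.
counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy = {accuracy(counts):.3f}")
print(f"FP rate  = {false_positive_rate(counts):.3f}")
print(f"TP rate  = {true_positive_rate(counts):.3f}")
```

Because accuracy depends on the class balance of the evaluation set, figures reported on datasets with very different phishing-to-legitimate ratios are not directly comparable; the TP and FP rates are less sensitive to that ratio.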