2025 Aug 8;11:e3066. doi: 10.7717/peerj-cs.3066

Table 4. Comparison of advantages and disadvantages of traditional machine learning-based streaming data anomaly detection algorithms.

Model	Advantage	Disadvantage	References
Statistical model	Can model data and infer relationships between variables	Requires a priori assumptions, validation of model reliability, and selection of suitable data processing methods	Hunt & Willett (2018), Tao & Michailidis (2019), Yu, Jibin & Jiang (2016)
Distance model	Can mine data in depth	High requirements for data preprocessing and for the choice of distance measure, sensitive to noise	Zhu et al. (2020), Ma, Aminian & Kirby (2019), Miao et al. (2018)
Clustering model	Broad applicability, robust interpretability	Not suitable for high-dimensional or large-scale streaming data, sensitive to initial values, high requirements for preprocessing	Lee & Lee (2022), Raut et al. (2023)
Density model	Simple to implement, quickly reveals potential structures, robust to noise	Suffers from the curse of dimensionality in high-dimensional data, computationally intensive for large-scale data	Liu et al. (2020), Zhang, Zhao & Li (2019)
Isolation model	Capable of modeling data distribution, suitable for complex data distributions	Performance may decrease with high-dimensional data	Liu, Ting & Zhou (2008)
Frequent item mining	Effective at identifying outliers and anomalies in low-density areas, no need for labeled data, supports unsupervised learning	Potential for false positives due to noise and outliers in the dataset	Cai et al. (2020a), Hao et al. (2019), Cai et al. (2020b)
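
To make the trade-off in the statistical-model row concrete, the following is a minimal sketch of a sliding-window z-score detector for a univariate stream. It is not taken from any of the cited works; the function name `rolling_zscore_anomalies` and the parameters `window` and `threshold` are hypothetical, chosen only for illustration. The sketch assumes the recent values are roughly Gaussian and stationary, which is exactly the kind of a priori assumption the table lists as a disadvantage.

```python
from collections import deque
from math import sqrt


def rolling_zscore_anomalies(stream, window=20, threshold=3.0):
    """Yield (index, value) for points far from the recent window's mean.

    Assumption (illustrative only): values inside the window are roughly
    Gaussian, so a z-score above `threshold` is treated as anomalous.
    """
    recent = deque(maxlen=window)  # reference window of non-flagged values
    for i, x in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((v - mean) ** 2 for v in recent) / window
            std = sqrt(var)
            if std > 0 and abs(x - mean) / std > threshold:
                yield i, x
                continue  # keep flagged points out of the reference window
        recent.append(x)
```

Each point costs O(window) time and the detector keeps only `window` values in memory, which suits streaming use. If the stream drifts or is heavy-tailed, the Gaussian assumption breaks down and the threshold would need revalidation.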