Abstract
This article addresses the detection of outliers in rule-based knowledge bases containing data on COVID-19 cases. The authors proceed from the automatic generation of a rule-based knowledge base from source data, through clustering the rules to optimize inference, to detecting unusual rules so that an optimal structure of rule groups can be obtained. The paper presents a two-phase procedure. In the first phase, we look for the optimal structure of rule clusters while outlier rules are still present in the knowledge base. In the second phase, we detect outlier rules using the LOF (Local Outlier Factor) algorithm. We then eliminate the unusual rules from the knowledge base and check whether the selected cluster quality measures respond positively to their removal, which would indicate that the rules were rightly considered outliers. The experiments confirmed the effectiveness of the LOF algorithm and of the selected cluster quality measures in detecting atypical rules. Detecting such rules can support knowledge engineers or domain experts in knowledge mining and help improve the completeness of the knowledge base, which is usually the basis of a decision support system.
Keywords: rules, knowledge base, outliers, LOF, quality indices, clustering
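The two-phase procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how rules are encoded, which clustering algorithm is used, or which quality indices are selected, so the sketch assumes that each rule is already a numeric feature vector, that clustering is Ward-linkage agglomerative clustering, and that the silhouette coefficient stands in for the cluster quality measures. The scikit-learn calls, the helper names `cluster_quality` and `two_phase`, and the synthetic data are all choices made for illustration only.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.neighbors import LocalOutlierFactor


def cluster_quality(X, n_clusters):
    """Cluster X with Ward linkage and return the silhouette score.

    Assumption: silhouette stands in for the paper's quality indices.
    """
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    labels = model.fit_predict(X)
    return silhouette_score(X, labels)


def two_phase(X, n_clusters=3, n_neighbors=10, contamination=0.05):
    """Hypothetical helper: quality before and after removing LOF outliers."""
    # Phase 1: cluster quality while the outlier rules are still present.
    before = cluster_quality(X, n_clusters)

    # Phase 2: flag outliers with LOF; fit_predict returns -1 for outliers.
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    keep = lof.fit_predict(X) == 1

    # Remove the flagged rows and measure cluster quality again.
    after = cluster_quality(X[keep], n_clusters)
    return before, after, np.flatnonzero(~keep)


# Synthetic stand-in for encoded rules (made-up data, not from the paper):
# three tight groups of 30 points plus four isolated points at rows 90..93.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
groups = [c + rng.normal(scale=0.5, size=(30, 2)) for c in centers]
atypical = np.array([[5.0, 5.0], [5.0, -6.0], [-6.0, 5.0], [12.0, 12.0]])
X = np.vstack(groups + [atypical])

before, after, removed = two_phase(X, contamination=4 / len(X))
print("silhouette before removal:", round(before, 3))
print("silhouette after removal: ", round(after, 3))
print("rows flagged by LOF:      ", removed.tolist())
```

A higher score after removal is the kind of positive response of a quality measure that the abstract treats as evidence that the removed rules were rightly considered outliers. In practice the `contamination` value would have to be chosen or tuned, since the share of atypical rules is not known in advance.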
