. 2022 Jan 19;12:1000. doi: 10.1038/s41598-022-04835-6

Table 10.

Description of the machine learning models used.

Model	Description
RF	RF is a tree-based ensemble learning model that predicts accurately by combining multiple weak learners. It uses the bagging method to train several decision trees on different bootstrap samples. A bootstrap sample is drawn from the training data with replacement and has the same size as the training set²⁶
LR	Logistic regression is generally used for classification problems. It is a regression model based on probability and a predictive analysis algorithm. It is most often used to interpret binary outcomes, in which one or more variables work together to generate a result. Using the logistic (sigmoid) function, it establishes a relationship between one or more independent variables and an estimated probability²⁷
SVC	Classification aims to divide a data set into categories based on a set of criteria so that the data are organized more meaningfully. SVC is a classification method based on support vectors. Its goal is to fit the supplied data and return a “best fit” hyperplane that separates, or categorizes, the data. Once the hyperplane has been obtained, features can be fed to the classifier to obtain the predicted class. This makes the algorithm particularly suitable for our purposes, though it can be used in a variety of contexts²⁸,²⁹
KNN	KNN is a basic machine learning model used for regression and classification. A data point is assigned to the class of its closest neighbors, with closeness determined by a distance measure. The KNN model gives promising results in this experiment when the value of k is equal to five (k = 5). This means it looks at the five closest neighbors and chooses a class based on the majority or the closest distance³⁰
NB	Based on Bayes' theorem, the supervised learning algorithm called Naive Bayes is used to solve classification problems. Training an NB classifier requires only a limited number of data points and is therefore fast and scalable. It is a probabilistic classifier: it predicts on the basis of the probability that an object belongs to a class. The NB classifier assumes that each feature is independent of the others and that they do not overlap, such that each feature contributes independently to the probability that a sample belongs to a given class. The NB classifier is easy to use and quick to compute, and it works well on massive datasets of high dimensionality³¹
ETC	The ETC works in a similar way to the random forest, except for the process of building the trees in the forest. The ETC uses the original training sample to build each decision tree. At each node, a random sample of k features is considered, and the best feature to split the data is chosen using the Gini index. These random feature samples produce several de-correlated decision trees. Decision tree algorithms work well with both categorical and numerical data³²
DT	A DT is a tree-like structure used to build models. A decision tree is commonly used in medical processing because it is simple and fast to execute. There are three types of nodes in the decision tree: (1) root node (the main node; the roles of the other nodes depend on it); (2) interior node (it handles various types of attributes); (3) leaf node (also called the end node; it is the final node, which represents the result of each test)³³
ADA	ADA is typically used in combination with other algorithms to improve their accuracy. It focuses on boosting weak learners into strong learners. Each AdaBoost tree is built based on the error rate of the previously constructed tree³⁴
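
As an illustration of the KNN entry in the table above, the following is a minimal sketch of majority-vote classification with k = 5. It is not the authors' implementation: the function name `knn_predict`, the Euclidean distance metric, and the toy data are assumptions made only for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point.
    # Euclidean distance is an assumption; the table only mentions "distance".
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data (not from the paper): two clusters in 2-D.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
train_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.0, 0.9), k=5)
print(prediction)  # prints "a": four of the five nearest neighbors are "a"
```

With k = 5, a single distant neighbor from the other cluster is outvoted by the four nearby points, which is the majority behavior the table describes.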