Table 1. Classic and ensemble ML candidate models for predicting student outcomes.
| Category | ML Model | Description |
|---|---|---|
| Classic | Logistic Regression (LR) | LR is a statistical method that describes data in terms of the relationship between a dependent variable and one or more independent variables. It is one of the most widely used methods for classification problems [43]. |
| | K-Nearest Neighbor (KNN) | KNN is a supervised learning algorithm mainly used for classification problems. The algorithm labels a new sample according to the K closest patterns in the training dataset, i.e., the patterns at the minimum distance from it. |
| | Support Vector Machines (SVM) | SVMs are a set of supervised learning algorithms used for classification problems. SVMs are non-parametric algorithms and rely on kernel functions whose computational complexity does not depend on the dimensions of the input space. The SVM classifier is based on determining a hyperplane that lies in a transformed input space and divides the classes [45,46]. |
| | Support Vector Regression (SVR) | SVR is a supervised learning algorithm used for regression problems. Decision boundaries are determined to predict the continuous output. SVR is trained using a symmetric loss function that penalizes over- and under-estimates equally [47]. |
| Ensemble | Voting | Voting predicts the output class based on the maximum number of votes received from the ML models [14]. |
| | Bootstrap Aggregation (Bagging) | Bagging is based on decision tree classifiers. This technique uses bootstrap sampling with replacement to generate subsets of the training data. These subsets are then used to develop weak, homogeneous models independently. The weak models are trained in parallel, and voting over their predictions yields a more accurate model [15,48]. |
| | Random Forest (RF) | RF is a robust bagging method based on developing multiple decision tree models. The method randomizes sampling in two respects: the training data and the variables considered at each split. Multiple decision trees are trained on randomly selected training subsets to alleviate the over-fitting problem, and the final prediction is obtained by majority voting over the trees' results. The resulting trees are therefore less correlated, and the final model is more reliable [16]. |
| | Boosting | Boosting is an ensemble method that builds a strong model by iteratively training weak models. Unlike bagging, the weak models are not generated independently but are built sequentially on samples from the training dataset, so the accuracy of the decision model improves by learning from previous mistakes [49,50]. |
| | Adaptive Boosting (ADA) | ADA is a tree-based boosting method that focuses on samples that are difficult to classify. The method assigns higher weights to misclassified samples, and these weights are adjusted sequentially during retraining. The final classification is obtained by combining all weak models, with the more accurate ones receiving more weight and having more impact on the final result. |
| | Extreme Gradient Boosting (XGB) | XGB is a tree-based boosting method where random sample subsets are selected to generate new models, and successive models reduce the previous models' errors. Regularization for penalizing complex models, tree pruning, and parallel learning are utilized to alleviate overfitting and reduce time complexity [18,51]. |
| | Stacked Generalization (Stacking) | Stacking is an ensemble ML model that generally consists of heterogeneous models. It obtains the final prediction by combining several robust models and aggregating their results. At the first level, stacking is composed of several base models, while at the second level a meta-model is developed that takes the outputs of the base models as input. The variety of models at the first level can therefore lead to higher performance of the final model [19]. |
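To illustrate the classic candidates in Table 1, the following sketch instantiates LR, KNN, SVM, and SVR. The use of scikit-learn, the synthetic feature matrix, and the pass/fail and final-grade targets are assumptions made for demonstration only; they do not represent the study's actual data or pipeline.

```python
# Illustrative sketch only, assuming scikit-learn; the synthetic features,
# pass/fail label, and final-grade target are placeholders, not the study data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                 # hypothetical student features
y_pass = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical pass/fail label
y_grade = 50 + 10 * X[:, 0] + rng.normal(scale=2, size=500)  # hypothetical grade

X_tr, X_te, yp_tr, yp_te, yg_tr, yg_te = train_test_split(
    X, y_pass, y_grade, test_size=0.25, random_state=0
)

# Classic classifiers from Table 1; feature scaling mainly matters for KNN/SVM.
classifiers = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, yp_tr)
    print(f"{name} accuracy: {clf.score(X_te, yp_te):.3f}")

# SVR targets the continuous outcome with an epsilon-insensitive loss that
# penalizes over- and under-estimates symmetrically.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=0.5))
svr.fit(X_tr, yg_tr)
print(f"SVR R^2 on the grade target: {svr.score(X_te, yg_te):.3f}")
```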
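The ensemble candidates in Table 1 can be sketched in the same hedged way. The snippet below assumes scikit-learn for voting, bagging, random forest, AdaBoost, and stacking, and the separate xgboost package for XGB; the data are again synthetic placeholders rather than the study's dataset.

```python
# Illustrative sketch only, assuming scikit-learn and the xgboost package;
# the synthetic data stand in for hypothetical student features and outcomes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (
    VotingClassifier,
    BaggingClassifier,
    RandomForestClassifier,
    AdaBoostClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))               # hypothetical engagement features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hypothetical pass/fail label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Heterogeneous base models, reused for both voting and stacking.
base_estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(kernel="rbf")),
]

ensembles = {
    # Hard voting: the class receiving the most votes from the base models wins.
    "Voting": VotingClassifier(estimators=base_estimators, voting="hard"),
    # Bagging: homogeneous weak models (decision trees by default) trained in
    # parallel on bootstrap samples drawn with replacement.
    "Bagging": BaggingClassifier(n_estimators=100, random_state=0),
    # Random forest: bagging plus random feature subsets at each split.
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # AdaBoost: sequential weak trees that up-weight misclassified samples.
    "ADA": AdaBoostClassifier(n_estimators=100, random_state=0),
    # XGBoost: regularized gradient boosting with tree pruning and parallelism.
    "XGB": XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1),
    # Stacking: base-model predictions become inputs to a second-level meta-model.
    "Stacking": StackingClassifier(
        estimators=base_estimators, final_estimator=LogisticRegression()
    ),
}

for name, model in ensembles.items():
    model.fit(X_tr, y_tr)
    print(f"{name} accuracy: {model.score(X_te, y_te):.3f}")
```

The dictionary layout mirrors the table: voting and stacking combine heterogeneous base models, while bagging, random forest, AdaBoost, and XGBoost grow homogeneous tree ensembles, either in parallel (bagging, RF) or sequentially (boosting).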