Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile

2021 Apr 20;23(4):485. doi: 10.3390/e23040485

Algorithm 1: Methodology proposed to predict student retention/dropout in HE institutions similar to the Chilean case.

1: Collect and input data on the PSU scores (or equivalent, depending on the country), the CP index, economic quintiles, and first- and second-year scores.
2: Standardize attribute names in the input data.
3: Clean the input data by removing missing, noisy, and duplicate instances.
4: Eliminate data features that do not add value according to [16].
5: Create a data subset using the PSU scores, CP index, and economic quintiles.
6: Extract the data of students with more than three years of academic follow-up.
7: Define the class as a binary attribute with two values: active and dropout.
8: Arrange the data for the global model from the full data set, considering dropout at any level.
9: Assemble the data for the first-level model from the full data set, considering dropout in the first year only.
10: Prepare the data for the second-level model from the first-level data, keeping the students retained at the first level and adding the first-year scores.
11: Organize the data for the third-level model from the second-level data, keeping the students retained at the second level and adding the second-year scores.
12: Apply ML algorithms to the global model with the data of Step 8, and analyze the results.
13: Apply ML algorithms to the first-level model with the data of Step 9, and state the results.
14: Apply ML algorithms to the second-level model with the data of Step 10, and obtain the results.
15: Apply ML algorithms to the third-level model with the data of Step 11, and report the results.
16: Evaluate the performance of the ML algorithms, and propose the best one.
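The data-assembly steps of Algorithm 1 (Steps 5 to 11) and the final selection (Step 16) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The `Student` record, all of its field names, and the functions `build_level_datasets` and `best_algorithm` are hypothetical. The sketch also assumes that the input is already cleaned (Steps 2 to 4), that "follow-up" is stored as a number of years per student, that the second-level label means dropout in the second year, and that the third-level label means dropout in the third year or later. The source does not state these label definitions explicitly. Model fitting (Steps 12 to 15) is left out; any classifier can consume the returned feature rows and labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Dataset = Tuple[List[List[float]], List[int]]  # (feature rows, 0/1 labels)


@dataclass
class Student:
    # All field names are hypothetical; the source names only attribute groups.
    psu_score: float                     # PSU score (or equivalent test)
    cp_index: float                      # CP index
    quintile: int                        # economic quintile
    years_followed: int                  # years of academic follow-up (assumed encoding)
    dropout_year: Optional[int] = None   # None means the student is still active
    year1_score: Optional[float] = None  # first-year score
    year2_score: Optional[float] = None  # second-year score


def build_level_datasets(students: List[Student]) -> Dict[str, Dataset]:
    """Assemble the global and per-level data sets (Steps 5 to 11)."""
    # Step 6: keep students with more than three years of follow-up.
    cohort = [s for s in students if s.years_followed > 3]

    def base(s: Student) -> List[float]:
        # Step 5: subset made of PSU score, CP index, and economic quintile.
        return [s.psu_score, s.cp_index, float(s.quintile)]

    # Steps 7 and 8: binary class, dropout at any level.
    global_ds = ([base(s) for s in cohort],
                 [int(s.dropout_year is not None) for s in cohort])

    # Step 9: same students, dropout in the first year only.
    level1 = ([base(s) for s in cohort],
              [int(s.dropout_year == 1) for s in cohort])

    # Step 10: students retained after year one, plus first-year scores.
    # Assumed label: dropout in the second year.
    kept1 = [s for s in cohort if s.dropout_year != 1]
    level2 = ([base(s) + [s.year1_score] for s in kept1],
              [int(s.dropout_year == 2) for s in kept1])

    # Step 11: students retained after year two, plus second-year scores.
    # Assumed label: dropout in the third year or later.
    kept2 = [s for s in kept1 if s.dropout_year != 2]
    level3 = ([base(s) + [s.year1_score, s.year2_score] for s in kept2],
              [int(s.dropout_year is not None) for s in kept2])

    return {"global": global_ds, "level1": level1,
            "level2": level2, "level3": level3}


def best_algorithm(scores: Dict[str, float]) -> str:
    """Step 16: return the algorithm name with the highest score.

    The metric is whatever the caller computed; higher is assumed to be better.
    """
    return max(scores, key=scores.get)
```

Each level reuses the students retained at the previous level, so the data sets shrink while the feature vectors grow by one score per level. This mirrors how Steps 10 and 11 derive each data set from the one before it.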