Algorithm 1: Methodology proposed to predict student retention/dropout in HE institutions similar to the Chilean case.
1: Collect and input data on PSU scores (or the equivalent, depending on the country), the CP index, economic quintiles, and first- and second-year scores.
2: Standardize attribute names in the input data.
3: Clean the input data by removing missing, noisy, and duplicate instances.
4: Eliminate data features that do not add value, according to [16].
5: Create a data subset using the PSU scores, CP index, and economic quintiles.
6: Extract the data of students with more than three years of academic follow-up.
7: Define the class label as a binary value: active or dropout.
8: Arrange data for the global model from the full data set, considering dropout at any level.
9: Assemble data for the first-level model from the full data set, considering dropout in the first year only.
10: Prepare data for the second-level model from the first-level data, using the students retained at the first level plus their first-year scores.
11: Organize data for the third-level model from the second-level data, using the students retained at the second level plus their second-year scores.
12: Apply ML algorithms to the global model with the data of Step 8, and analyze the results.
13: Apply ML algorithms to the first-level model with the data of Step 9, and report the results.
14: Apply ML algorithms to the second-level model with the data of Step 10, and report the results.
15: Apply ML algorithms to the third-level model with the data of Step 11, and report the results.
16: Compare the performance of the ML algorithms, and propose the best one.
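
The data arrangement in Steps 6 to 11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Student` record, its field names, the helper functions, and the sample values are all hypothetical. The labeling rule for each level (dropout during the year that matches the level) is also an assumption, since the algorithm only states that each level is built from the students retained at the previous one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Student:
    """Hypothetical record; field names are illustrative, not from the source."""
    psu: float                      # admission test score (Step 1)
    cp_index: float                 # CP index (Step 1)
    quintile: int                   # economic quintile, 1 to 5 (Step 1)
    year1_score: Optional[float]    # None if the student left before earning it
    year2_score: Optional[float]
    followup_years: int             # years of academic follow-up available
    dropout_year: Optional[int]     # None = still active; 1, 2, 3 = year of dropout


Row = Tuple[List[Optional[float]], int]  # (feature vector, class label)


def base_features(s: Student) -> List[Optional[float]]:
    # Step 5: subset with PSU score, CP index, and economic quintile.
    return [s.psu, s.cp_index, float(s.quintile)]


def build_datasets(students: List[Student]) -> Dict[str, List[Row]]:
    # Step 6: keep students with more than three years of follow-up.
    cohort = [s for s in students if s.followup_years > 3]

    # Steps 7-8: global model, label 1 = dropout at any level, 0 = active.
    global_ds = [(base_features(s), int(s.dropout_year is not None)) for s in cohort]

    # Step 9: first-level model, label 1 = dropout in the first year only.
    level1 = [(base_features(s), int(s.dropout_year == 1)) for s in cohort]

    # Step 10: students retained after year 1, plus first-year scores.
    # Assumption: label 1 = dropout during the second year.
    retained1 = [s for s in cohort if s.dropout_year != 1]
    level2 = [(base_features(s) + [s.year1_score], int(s.dropout_year == 2))
              for s in retained1]

    # Step 11: students retained after year 2, plus second-year scores.
    # Assumption: label 1 = dropout during the third year.
    retained2 = [s for s in retained1 if s.dropout_year != 2]
    level3 = [(base_features(s) + [s.year1_score, s.year2_score],
               int(s.dropout_year == 3))
              for s in retained2]

    return {"global": global_ds, "level1": level1, "level2": level2, "level3": level3}


# Made-up sample values, for illustration only.
students = [
    Student(600, 0.8, 3, 5.5, 5.8, 4, None),   # active
    Student(520, 0.5, 1, None, None, 4, 1),    # dropped out in year 1
    Student(580, 0.7, 2, 4.2, None, 5, 2),     # dropped out in year 2
    Student(610, 0.9, 5, 5.0, 4.1, 4, 3),      # dropped out in year 3
    Student(640, 0.6, 4, 6.0, 6.1, 2, None),   # excluded: follow-up too short
]

datasets = build_datasets(students)
for name, rows in datasets.items():
    print(name, len(rows), [label for _, label in rows])
```

Each level sees fewer students and more features than the one before it, which is what lets the later models use first- and second-year scores without leaking them into the earlier predictions. Steps 12 to 16 would then train and compare the chosen ML algorithms on each of the four data sets separately.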