2021 Apr 20;23(4):485. doi: 10.3390/e23040485
Algorithm 1: Methodology proposed to predict student retention/dropout in HE institutions similar to the Chilean case.
  • 1:

    Collect and input data about the PSU scores (or equivalent depending on the country), CP index, economic quintiles, and first/second year scores.

  • 2:

    Standardize attribute names in the input data.

  • 3:

    Clean the input data by removing missing, noisy, and duplicate instances.

  • 4:

    Eliminate data features that do not add value according to [16].

  • 5:

    Create a data subset using the PSU scores, CP index, and economic quintiles.

  • 6:

    Extract data of students with more than three years of academic follow-up.

  • 7:

    Define the class of each instance as one of two values: active or dropout.

  • 8:

    Arrange data for the global model from the full data set considering dropout at any level.

  • 9:

    Assemble data for the first level model from the full data set considering dropout in the first year only.

  • 10:

    Prepare data for the second-level model from the first-level data, keeping the students retained at the first level and adding the first-year scores.

  • 11:

    Organize data for the third-level model from the second-level data, keeping the students retained at the second level and adding the second-year scores.

  • 12:

    Apply ML algorithms to the global model with the data of Step 8, and analyze the results.

  • 13:

    Employ ML algorithms for the first-level model with the data of Step 9, and state the results.

  • 14:

    Use ML algorithms for the second-level model with the data of Step 10, and obtain the results.

  • 15:

    Utilize ML algorithms for the third-level model with the data of Step 11, and report the results.

  • 16:

    Compare the performance of the ML algorithms across the four models, and propose the best one.
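
The data preparation in Steps 5 to 11 can be illustrated with a minimal sketch. It is not the authors' implementation. The field names (`psu`, `cp_index`, `quintile`, `year1_score`, `year2_score`, `dropout_year`, `follow_up_years`) and the toy records are hypothetical. The algorithm does not state the class definition of the third-level model, so labeling it as dropout in the third year is an assumption here.

```python
# Hypothetical field names; the real data set may use different attributes.
BASE = ("psu", "cp_index", "quintile")  # Step 5: entry-level features


def build_level(students, features, dropout_year, already_left=()):
    """Build (X, y) for one level.

    Students who dropped out in a year listed in `already_left` are removed,
    so only students retained at the previous level remain.
    The label is 1 for dropout in `dropout_year`, otherwise 0.
    """
    X, y = [], []
    for s in students:
        if s["dropout_year"] in already_left:
            continue
        X.append([s[f] for f in features])
        y.append(1 if s["dropout_year"] == dropout_year else 0)
    return X, y


def build_datasets(students):
    # Step 6: keep students with more than three years of follow-up.
    cohort = [s for s in students if s["follow_up_years"] > 3]
    # Steps 7-8: global model, dropout at any level counts as class 1.
    global_X = [[s[f] for f in BASE] for s in cohort]
    global_y = [0 if s["dropout_year"] is None else 1 for s in cohort]
    return {
        "global": (global_X, global_y),
        # Step 9: dropout in the first year only.
        "level1": build_level(cohort, BASE, 1),
        # Step 10: first-level retained students plus first-year scores.
        "level2": build_level(cohort, BASE + ("year1_score",), 2,
                              already_left=(1,)),
        # Step 11: second-level retained students plus second-year scores.
        # Assumption: the label is dropout in the third year.
        "level3": build_level(cohort, BASE + ("year1_score", "year2_score"), 3,
                              already_left=(1, 2)),
    }


# Toy records, invented purely for illustration (dropout_year None = active).
students = [
    {"psu": 650, "cp_index": 0.8, "quintile": 3, "year1_score": 5.5,
     "year2_score": 5.8, "dropout_year": None, "follow_up_years": 4},
    {"psu": 520, "cp_index": 0.4, "quintile": 1, "year1_score": None,
     "year2_score": None, "dropout_year": 1, "follow_up_years": 4},
    {"psu": 580, "cp_index": 0.6, "quintile": 2, "year1_score": 4.1,
     "year2_score": None, "dropout_year": 2, "follow_up_years": 5},
    {"psu": 600, "cp_index": 0.7, "quintile": 4, "year1_score": 4.8,
     "year2_score": 4.5, "dropout_year": 3, "follow_up_years": 4},
    {"psu": 700, "cp_index": 0.9, "quintile": 5, "year1_score": 6.0,
     "year2_score": 6.1, "dropout_year": None, "follow_up_years": 2},
]

datasets = build_datasets(students)
for name, (X, y) in datasets.items():
    print(name, len(X), "instances,", sum(y), "dropouts")
```

Each level drops the students who already left and widens the feature vector with the scores that exist by then, so the second- and third-level data sets are smaller than the first-level one but carry more information per student. Steps 12 to 16 would then fit the same set of classifiers on each `(X, y)` pair and compare their performance.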