Box 1.
1. Cumulative model (CM)
It predicts the cumulative probability of an observation being at or below a given level of the outcome. It assumes that ratings originate from the categorization of a latent continuous variable.
We varied the structure of the CM by modifying the following hyperparameters:
a. Parallel curves or not. With parallel curves, predictors have the same coefficients across the different levels of the outcome. With non-parallel curves, predictors are allowed to have different coefficients at different levels.
b. Link functions. Five link functions were tried to transform the cumulative probability (p) to a continuous unbounded scale that can be modeled using ordinary least squares regression. They were ;;; and

2. Penalized regression model [27, 38, 39]
It fits a CM that is penalized for having too many variables in the model. Imposing a penalty shrinks the coefficient values; thus, the less contributive predictors have coefficients close to or equal to zero.
We varied the structure of the penalized regression model by modifying the following hyperparameters:
a. Penalty term (. We set if the penalty was applied to the sum of squared coefficients (ridge penalized regression), and if the penalty was applied to the sum of absolute coefficients (LASSO penalized regression).
b. Criterion used to select the magnitude of the penalty: AIC or BIC.
c. Link functions. Four link functions were used: ;;; and

3. Ordinal CART
CART [18] produces a tree to predict both continuous and nominal outcomes. It is built by splitting and pruning. With splitting, the data are partitioned into smaller subsets so as to minimize impurity in the new subsets, as measured by the Gini index. Splitting continues until the final subsets are homogeneous; however, these may consist of only a few similar data points. At this stage, the model predicts the estimation data perfectly but might not predict a new data point well (overfitting).
To avoid this, the tree is pruned back to the point with the lowest cross-validated overall misclassification.
We used a modified version of CART, in which a score is assigned to each of the ordered categories of the outcome [22]. This allows a cost of misclassification to be assigned: the larger the distance between the actual and predicted levels, the higher the weight given to the misclassification.
We varied the structure of the produced tree by modifying the following hyperparameters:
a. Cost of misclassification in the generalized Gini index, calculated in absolute or quadratic terms.
b. Complexity parameter (CP), the minimum improvement needed to split at each node. If a split does not yield at least that much benefit (the value of CP), the split does not take place. We tried 20 randomly selected values for CP.
c. The cross-validated overall misclassification (used to determine pruning), measured using either the misclassification error rate (all misclassifications were given the same weight) or the misclassification cost rate (different weights were given to different misclassifications).

4. Ordinal forests (OF)
Random forest (RF) [17] is a flexible machine-learning algorithm for predicting continuous and nominal outcomes. It builds multiple decision trees and merges them to produce an accurate and stable prediction. For every tree, it selects a random subset of participants and predictors.
We used a modified version of RF [22, 32]. It translates ordinal levels into scores, but instead of using a fixed score set, it optimizes them. It tries different score sets and builds a small forest for each set to estimate its expected predictive performance. The optimum score set (the one that achieved the best predictive performance in the small forests) is used to build the final OF.
We varied the structure of the OF by modifying the following hyperparameters:
a. Number of score sets tried before approximating the optimal score set: 50, 100, or 150 sets.
b. Number of trees in the smaller forests: 50, 100, or 150 trees.
c. Number of trees in the final OF built with the optimized score set: 200, 400, or 600 trees.
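
To illustrate the two pruning criteria listed under Ordinal CART (item 3c), the following minimal Python sketch contrasts the misclassification error rate with a distance-weighted misclassification cost rate, in both absolute and quadratic form. It is not the implementation used in the study. The function names, the `power` parameter, the toy data, and the coding of outcome levels as consecutive integer scores are all assumptions made for this example.

```python
def misclassification_error_rate(actual, predicted):
    """Share of wrong predictions; every error has the same weight."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


def misclassification_cost_rate(actual, predicted, power=1):
    """Mean of |actual - predicted| ** power over all observations.

    power=1 gives the absolute cost, power=2 the quadratic cost.
    Levels are assumed to be coded as integer scores (hypothetical choice).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs(a - p) ** power for a, p in zip(actual, predicted))
    return total / len(actual)


# Toy example: five observations on a 5-level ordinal outcome.
actual = [1, 2, 3, 4, 5]
predicted = [1, 3, 3, 2, 5]  # one error of distance 1, one of distance 2

print(misclassification_error_rate(actual, predicted))          # 0.4
print(misclassification_cost_rate(actual, predicted, power=1))  # 0.6
print(misclassification_cost_rate(actual, predicted, power=2))  # 1.0
```

The error rate treats both wrong predictions identically, whereas the cost rate penalizes the prediction that is two levels off more heavily, and the quadratic form more heavily still.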