Table 3.
Selected tuning parameters for prediction models
| Sampling framework: training/test split | Sampling framework: cross-validation split | Sampling framework: model estimation | Logistic regression with LASSO: λ† | Logistic regression with LASSO: # non-zero coefficients | Random forest: minimum terminal node size | Random forest: # predictors sampled at each split‡ |
|---|---|---|---|---|---|---|
| Visit | Visit | Observed cluster analysis | 2.5 × 10⁻⁶ | 250 | 500 | 34 |
| Visit | Person | Observed cluster analysis | 4.5 × 10⁻⁵ | 86 | 5,000 | 8 |
| Person | Person | Observed cluster analysis | 4.5 × 10⁻⁵ | 105 | 5,000 | 17 |
| Person | Person | Within cluster resampling | 5.5 × 10⁻⁵ | 64§ | 1,000 | 17 |

† λ controls the degree of shrinkage for variable selection. A larger value of λ corresponds to more shrinkage and fewer non-zero coefficients; a smaller value corresponds to less shrinkage and more non-zero coefficients.

‡ The default recommendation for the number of predictors randomly sampled for consideration at each split is the square root of the total number of predictors, equal to 17 for our dataset. We also examined twice this default (34 predictors) and half of the default (8 predictors).

§ Average number of non-zero coefficients across logistic regression with LASSO models fit on 20 within cluster resampled datasets.
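
The two relationships described in the table notes can be illustrated with a minimal sketch. It is not the code used in the study. The first part uses the closed-form LASSO solution for a linear model with orthonormal predictors (soft-thresholding) as a stand-in for the logistic LASSO, only to show that a larger λ leaves fewer non-zero coefficients. The second part rebuilds the grid of candidate values for the number of predictors sampled at each split. The coefficient values, the function names, and the total of 300 predictors are hypothetical; the source reports only that the square root equals 17.

```python
import math


def soft_threshold(z, lam):
    """LASSO estimate of one coefficient under an orthonormal design.

    z is the unpenalized estimate; lam is the penalty (lambda).
    """
    return math.copysign(max(abs(z) - lam, 0.0), z)


def count_nonzero(coefs, lam):
    """Number of coefficients that survive shrinkage at penalty lam."""
    return sum(1 for z in coefs if soft_threshold(z, lam) != 0.0)


def mtry_grid(n_predictors):
    """Half, default, and twice the default number of predictors per split.

    The default is the floor of the square root of the predictor count.
    """
    default = math.isqrt(n_predictors)
    return [default // 2, default, 2 * default]


# Hypothetical unpenalized coefficients (not taken from the study).
ols_coefs = [0.9, -0.4, 0.05, -0.01, 0.002]

# Larger lambda -> more shrinkage -> fewer non-zero coefficients.
for lam in (0.001, 0.03, 0.5):
    print(f"lambda={lam}: {count_nonzero(ols_coefs, lam)} non-zero")

# Hypothetical predictor count whose square root floors to 17.
print(mtry_grid(300))  # [8, 17, 34]
```

With the assumed predictor count, the grid reproduces the three values examined in the table (8, 17, and 34). Integer division is used for the halved value because half of 17 is reported as 8.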