Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: Biom J. 2021 May 24;63(7):1375–1388. doi: 10.1002/bimj.202000199

Table 3.

Selected tuning parameters for prediction models

| Sampling framework: training/test split | Sampling framework: cross-validation split | Sampling framework: model estimation | Logistic regression with LASSO: λ | Logistic regression with LASSO: # non-zero coefficients | Random forest: minimum terminal node size | Random forest: # predictors sampled at each split |
|---|---|---|---|---|---|---|
| Visit | Visit | Observed cluster analysis | 2.5 × 10⁻⁶ | 250 | 500 | 34 |
| Visit | Person | Observed cluster analysis | 4.5 × 10⁻⁵ | 86 | 5,000 | 8 |
| Person | Person | Observed cluster analysis | 4.5 × 10⁻⁵ | 105 | 5,000 | 17 |
| Person | Person | Within-cluster resampling | 5.5 × 10⁻⁵ | 64§ | 1,000 | 17 |

λ is the LASSO penalty parameter; it controls the degree of shrinkage and therefore variable selection. A larger λ applies more shrinkage and sets more coefficients exactly to zero; a smaller λ applies less shrinkage and leaves more non-zero coefficients.
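The effect of λ on the number of non-zero coefficients can be illustrated with a small simulation. The sketch below is a minimal L1-penalized logistic regression fit by proximal gradient descent (ISTA), not the software used for Table 3. It assumes the objective (1/n) × negative log-likelihood + λ × Σ|βⱼ| with standardized predictors and an unpenalized intercept; because penalty scaling differs between packages, its λ values are not directly comparable to those in the table. The function names and simulated data are hypothetical. Under this objective, every coefficient is exactly zero once λ reaches λ_max = maxⱼ |xⱼᵀ(y − ȳ)| / n, and smaller values of λ typically leave more coefficients non-zero.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * |z|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lambda_max(X, y):
    """Smallest lambda at which all penalized coefficients are exactly zero
    (for the assumed objective used in lasso_logistic)."""
    Xs = standardize(X)
    return np.max(np.abs(Xs.T @ (y - y.mean()))) / len(y)


def lasso_logistic(X, y, lam, n_iter=5000):
    """Hypothetical L1-penalized logistic regression via ISTA.

    Assumed objective: (1/n) * negative log-likelihood + lam * sum(|beta_j|),
    with standardized predictors and an unpenalized intercept.
    Returns (intercept, coefficients).
    """
    n = len(y)
    Xt = np.column_stack([np.ones(n), standardize(X)])
    # Fixed step 1/L, where L = ||Xt||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(Xt, 2) ** 2
    w = np.zeros(Xt.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xt @ w)))
        grad = Xt.T @ (prob - y) / n
        w = w - step * grad
        # Shrink only the slopes; the intercept w[0] is not penalized.
        w[1:] = soft_threshold(w[1:], step * lam)
    return w[0], w[1:]


# Hypothetical simulated data: 3 informative and 7 noise predictors.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_beta = np.array([1.5, -1.0, 0.8] + [0.0] * (p - 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

lam_max = lambda_max(X, y)
for lam in [0.001, 0.25 * lam_max, 1.01 * lam_max]:
    _, beta = lasso_logistic(X, y, lam)
    print(f"lambda={lam:.4f}  non-zero coefficients={np.count_nonzero(beta)}")
```

In practice λ is chosen by cross-validation over a grid of values below λ_max, which is how a single tuned value per sampling framework, as in the table, would typically arise.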

The commonly recommended default for the number of predictors randomly sampled as split candidates at each split is the square root of the total number of predictors, which equals 17 for our dataset. We also examined twice this default (34 predictors) and half of it (8 predictors, rounded down).
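The arithmetic behind the three candidate values can be written out explicitly. This is a minimal sketch assuming the default is the square root of the predictor count rounded down and that half the default is also rounded down (consistent with 8 for a default of 17). The function name `mtry_grid` is hypothetical, and since the exact predictor count is not given here, the example uses the range of counts that yield a default of 17.

```python
import math


def mtry_grid(n_predictors):
    """Hypothetical helper: candidate numbers of predictors sampled at each split.

    Assumes default = floor(sqrt(n_predictors)); also returns twice the
    default and half the default (rounded down).
    """
    default = math.isqrt(n_predictors)  # integer square root, rounded down
    return default, 2 * default, default // 2


# Any predictor count from 289 (17**2) to 323 (18**2 - 1) gives a default of 17.
print(mtry_grid(289))  # (17, 34, 8)
print(mtry_grid(323))  # (17, 34, 8)
```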

§ Average number of non-zero coefficients across the logistic regression with LASSO models fit to 20 within-cluster resampled datasets.
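Within-cluster resampling, as used for the last row of the table, can be sketched as follows. The sketch assumes the usual scheme in which each resampled dataset keeps one randomly chosen row (visit) from every cluster (person), and a statistic is averaged over the resampled datasets. All names are hypothetical; in the setting of Table 3, `statistic` would stand in for a (hypothetical) function that fits the LASSO model to the selected rows and returns its number of non-zero coefficients, and `n_resamples=20` matches the 20 datasets in the footnote. The toy usage passes `len` instead, only to show the mechanics.

```python
import random
from collections import defaultdict
from statistics import mean


def group_rows(cluster_ids):
    """List the row indices (e.g., visits) belonging to each cluster (e.g., person)."""
    groups = defaultdict(list)
    for row, cid in enumerate(cluster_ids):
        groups[cid].append(row)
    return list(groups.values())


def within_cluster_resample(groups, rng):
    """One within-cluster resample: draw one row at random from every cluster."""
    return sorted(rng.choice(rows) for rows in groups)


def wcr_average(cluster_ids, statistic, n_resamples=20, seed=0):
    """Average statistic(row_indices) over n_resamples within-cluster resampled datasets."""
    groups = group_rows(cluster_ids)
    rng = random.Random(seed)
    return mean(statistic(within_cluster_resample(groups, rng)) for _ in range(n_resamples))


# Toy example: six visits from three persons.
person_ids = ["A", "A", "A", "B", "C", "C"]
print(within_cluster_resample(group_rows(person_ids), random.Random(1)))
# Every resampled dataset has exactly one visit per person, so len() is always 3.
print(wcr_average(person_ids, len))  # 3
```

Because each resampled dataset contains one visit per person, the observations within a dataset are independent across persons, which is why models can be fit to each resample with standard methods and their summaries averaged.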