2022 Oct 3;17(10):e0271240. doi: 10.1371/journal.pone.0271240

Table 1. Summary of the simulation design following the ADEMP structure.

Aims (section 2.2)
  • To compare variable selection methods using different tuning parameters (CV, AIC and BIC) or initial estimates in terms of model selection and prediction.

  • To assess the usefulness of post-estimation shrinkage in the prediction of classical variable selection methods and compare the results with penalized methods.

  • To compare the amount of shrinkage of regression coefficients of penalized and post-estimation shrinkage methods.

  • To assess the performance of the different methods when there are relatively many noise variables, when the sample size is larger, when correlation is relatively high and when $R^2$ approaches one.

Data generating mechanism (section 2.3)
Training/development dataset
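  • $X \sim N_p(0, \Sigma)$, where $p = 15$ and $\Sigma \in \mathbb{R}^{p \times p}$; $\Sigma_{ij}$ is the correlation coefficient between covariates $x_i$ and $x_j$

  • $Y = X\beta + \epsilon$, where $\beta \in \{\beta_A, \beta_B, \beta_C, \beta_D\}$ and $\epsilon \sim N(0, \sigma^2 I_n)$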


True regression coefficients (β) for 15 covariates
$\beta_A$: 1.5, 0, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, -0.5, 0, 0 (from [20])
$\beta_B$: 1.5, 0, 0.5, 0, 0.5, 0, 0.25, 0, 0.25, 0, 0.25, 0, -0.25, 0, 0 (modified $\beta_A$)
$\beta_C$: 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0 (from [21])
$\beta_D$: 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 (from [21])
Correlation structure (C)
C1: Taken from [20]; low collinearity
C2: Autoregressive structure with $\Sigma_{ij} = 0.3^{|i-j|}$; low collinearity
C3: Autoregressive structure with $\Sigma_{ij} = 0.8^{|i-j|}$; moderate collinearity
C4: Adapted from a real study (body fat data); high collinearity
$R^2$ and sample size (n)
$R^2$ = {0.20, 0.30, 0.50, 0.71}; n = {100, 400}
Number of scenarios (full factorial design) and simulation runs
$\beta$ × C × $R^2$ × n = 4 × 4 × 4 × 2 = 128 scenarios
N = 2,000 simulation repetitions per scenario
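The full factorial design above can be enumerated directly. The sketch below is illustrative only: the string labels stand in for the coefficient vectors and correlation structures listed in the table, and all variable names are assumptions, not the authors' code.

```python
from itertools import product

# Factor levels copied from the table; the string labels are placeholders.
betas = ["beta_A", "beta_B", "beta_C", "beta_D"]
correlations = ["C1", "C2", "C3", "C4"]
r2_values = [0.20, 0.30, 0.50, 0.71]
sample_sizes = [100, 400]

# Every combination of the four factors is one scenario.
scenarios = list(product(betas, correlations, r2_values, sample_sizes))

n_repetitions = 2000  # simulation repetitions per scenario (from the table)
n_training_datasets = len(scenarios) * n_repetitions

print(len(scenarios))       # 128
print(n_training_datasets)  # 256000
```

With 2,000 repetitions in each of the 128 scenarios, the main design implies 256,000 simulated training datasets.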
Test dataset
  • New simulations with the same design as the training dataset ($n_{test}$ = 100,000)


Additional analysis
Additional analyses will be conducted with $\beta_A$, C1, n = {400, 800} and a subset of $R^2$ = {0.30, 0.50, 0.71, 0.8, 0.9}
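A minimal sketch of the data generating mechanism for the autoregressive structures (C2 and C3) is given below, using only the Python standard library. Two points are assumptions, not statements from the table: the residual variance is derived from the target population $R^2$ via $\sigma^2 = \beta^T \Sigma \beta \, (1 - R^2) / R^2$, and the covariates are drawn with an AR(1) recursion, which yields $\Sigma_{ij} = \rho^{|i-j|}$. The function names are hypothetical. C1 and C4 are not reproduced because their matrices are not listed here.

```python
import math
import random

def ar1_correlation(p, rho):
    """Correlation matrix with entries rho^|i-j| (structures C2 and C3)."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def signal_variance(beta, sigma):
    """Variance of the linear predictor X beta, i.e. beta' Sigma beta."""
    p = len(beta)
    return sum(beta[i] * sigma[i][j] * beta[j]
               for i in range(p) for j in range(p))

def noise_sd_for_r2(beta, sigma, r2):
    """Residual SD that gives the target population R^2 (assumed formula)."""
    return math.sqrt(signal_variance(beta, sigma) * (1.0 - r2) / r2)

def simulate(n, beta, rho, r2, rng):
    """Draw n rows of X ~ N_p(0, Sigma) and Y = X beta + eps."""
    p = len(beta)
    sd = noise_sd_for_r2(beta, ar1_correlation(p, rho), r2)
    scale = math.sqrt(1.0 - rho ** 2)
    X, y = [], []
    for _ in range(n):
        # AR(1) recursion keeps unit variances and corr(x_i, x_j) = rho^|i-j|.
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sd))
    return X, y

# beta_A from the table; scenario: C2 (rho = 0.3), R^2 = 0.50, n = 100.
BETA_A = [1.5, 0, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, -0.5, 0, 0]
X_train, y_train = simulate(100, BETA_A, 0.3, 0.50, random.Random(1))
```

A test dataset under the same scenario would come from the same call with `n` set to 100,000 and a different seed.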
Estimand/target of analysis (section 2.4)
  • Selection status of each covariate and identification of the true model

  • Shrinkage factors for each regression estimate

  • Model prediction errors

Methods (section 2.5)
A. Variable selection methods
Method       | Tuning parameters      | Initial estimates
Lasso        | 10-fold CV, AIC & BIC  | N/A
Garrote      | 10-fold CV, AIC & BIC  | OLS, ridge and lasso
Alasso*      | 10-fold CV, AIC & BIC  | OLS, ridge and lasso
Rlasso*      | 10-fold CV, AIC & BIC  | N/A
Best subset  | 10-fold CV, AIC & BIC  | N/A
BE*          | 10-fold CV, AIC & BIC  | N/A
B. Post-estimation shrinkage methods:
(i) Global [10], (ii) parameterwise [9] and (iii) Breiman’s method [5]
Estimation method: (i) leave-one-out CV and (ii) 10-fold CV
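To make the idea of a cross-validated global shrinkage factor concrete, the sketch below uses one common formulation: compute the linear predictor for each observation from a model fitted without that observation's fold, then regress the outcome on these cross-validated predictions and take the slope as the shrinkage factor. Whether this matches the cited method [10] in every detail is an assumption. The helper names (`solve`, `ols`, `global_shrinkage_factor`) are hypothetical, and the full model is refitted in each fold with no variable selection step.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]

def global_shrinkage_factor(X, y, n_folds=10, seed=0):
    """Slope of y on the cross-validated linear predictor."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    eta = [0.0] * n
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        coef = ols([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(test, predict(coef, [X[i] for i in test])):
            eta[i] = pred
    return ols([[e] for e in eta], y)[1]

# Noise-free toy data: out-of-fold predictions are exact, so the factor is 1.
X_demo = [[float(i)] for i in range(20)]
y_demo = [2.0 + 3.0 * row[0] for row in X_demo]
c_demo = global_shrinkage_factor(X_demo, y_demo, n_folds=10)
```

Setting `n_folds` equal to the number of observations gives the leave-one-out variant. In this formulation the shrunken slopes are the original estimates multiplied by the common factor, with the intercept re-estimated afterwards.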
Performance measures (section 2.6)
  • Inclusion and exclusion of variables: FNR and FPR (subsection 2.6.1)

  • Classification of models: probabilities (subsection 2.6.1)

  • Prediction accuracy: model error, ME (subsection 2.6.2)

  • Variability of ME within and between scenarios (section 5 in S1 File)

*Alasso, Rlasso and BE denote adaptive lasso, relaxed lasso and backward elimination, respectively; FNR and FPR denote false negative rate and false positive rate, respectively.
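The performance measures can be illustrated for a single fitted model. The definitions in this sketch are assumptions, since the authoritative ones are in subsections 2.6.1 and 2.6.2: FNR is taken as the share of true signal variables that were not selected, FPR as the share of noise variables that were selected, and model error as $(\hat\beta - \beta)^T \Sigma (\hat\beta - \beta)$. The estimate `beta_hat` is invented for illustration and is not a result from the study.

```python
def selection_rates(beta_true, beta_hat):
    """Return (FNR, FPR) for one fitted model."""
    is_signal = [b != 0 for b in beta_true]
    is_selected = [b != 0 for b in beta_hat]
    false_neg = sum(t and not s for t, s in zip(is_signal, is_selected))
    false_pos = sum(s and not t for t, s in zip(is_signal, is_selected))
    n_signal = sum(is_signal)
    n_noise = len(beta_true) - n_signal
    fnr = false_neg / n_signal if n_signal else 0.0
    fpr = false_pos / n_noise if n_noise else 0.0
    return fnr, fpr

def model_error(beta_true, beta_hat, sigma):
    """Quadratic form (beta_hat - beta)' Sigma (beta_hat - beta)."""
    d = [h - t for h, t in zip(beta_hat, beta_true)]
    p = len(d)
    return sum(d[i] * sigma[i][j] * d[j] for i in range(p) for j in range(p))

# beta_A from the table: 7 signal variables and 8 noise variables.
beta_true = [1.5, 0, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, -0.5, 0, 0]
# Hypothetical estimate: misses variable 13 and wrongly includes variable 2.
beta_hat = [1.4, 0.1, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0, 0, 0]

# Identity matrix used as a stand-in for Sigma (uncorrelated covariates).
identity = [[1.0 if i == j else 0.0 for j in range(15)] for i in range(15)]

fnr, fpr = selection_rates(beta_true, beta_hat)
me = model_error(beta_true, beta_hat, identity)
print(fnr, fpr, round(me, 2))
```

Here one of the seven signal variables is missed (FNR = 1/7) and one of the eight noise variables is included (FPR = 1/8). Averaging these quantities over the simulation repetitions of a scenario would give scenario-level rates.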