Table 1. Summary of the simulation design following the ADEMP structure.
Aims (section 2.2)
Data generating mechanism (section 2.3)
Training/development dataset
True regression coefficients (β) for 15 covariates:
βA: 1.5, 0, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, -0.5, 0, 0 (from [20])
βB: 1.5, 0, 0.5, 0, 0.5, 0, 0.25, 0, 0.25, 0, 0.25, 0, -0.25, 0, 0 (modified βA)
βC: 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0 (from [21])
βD: 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 (from [21])
Correlation structure (C):
C1: taken from [20]; low collinearity
C2: autoregressive structure with $\Sigma_{ij} = 0.3^{|i-j|}$; low collinearity
C3: autoregressive structure with $\Sigma_{ij} = 0.8^{|i-j|}$; moderate collinearity
C4: adapted from a real study (body fat data); high collinearity
R² and sample size (n): R² = {0.20, 0.30, 0.50, 0.71}; n = {100, 400}
Number of scenarios (full factorial design) and simulation runs: β × C × R² × n = 4 × 4 × 4 × 2 = 128 scenarios; N = 2,000 simulation repetitions per scenario
Test dataset
Additional analysis: to be conducted with βA, C1, n = {400, 800} and R² values {0.30, 0.50, 0.71, 0.8, 0.9}
Estimand/target of analysis (section 2.4)
Methods (section 2.5)
A. Variable selection methods
Method | Tuning parameters | Initial estimates
Lasso | 10-fold CV, AIC & BIC | N/A
Garrote | 10-fold CV, AIC & BIC | OLS, ridge and lasso
Alasso* | 10-fold CV, AIC & BIC | OLS, ridge and lasso
Rlasso* | 10-fold CV, AIC & BIC | N/A
Best subset | 10-fold CV, AIC & BIC | N/A
BE* | 10-fold CV, AIC & BIC | N/A
B. Post-estimation shrinkage methods: (i) global [10], (ii) parameterwise [9] and (iii) Breiman’s method [5]
Estimation method: (i) leave-one-out CV and (ii) 10-fold CV
Performance measures (section 2.6)
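*Alasso, Rlasso and BE denote adaptive lasso, relaxed lasso and backward elimination, respectively; FNR and FPR denote the false negative and false positive rates. CV, cross-validation; AIC, Akaike information criterion; BIC, Bayesian information criterion; OLS, ordinary least squares; N/A, not applicable.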
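The training-data mechanism in Table 1 can be sketched in Python. This is a minimal illustration under stated assumptions and is not the study's code. Covariates are drawn as multivariate normal with unit variances. Errors are normal and there is no intercept. The error variance is set so that the population R² = β'Σβ / (β'Σβ + σ²) equals the target. Only the autoregressive structures C2 and C3 are constructed, because the table does not give the C1 and C4 matrices. The names `BETAS`, `ar1_corr`, `noise_sd_for_r2`, `simulate` and `GRID` are hypothetical.

```python
import itertools

import numpy as np

# Coefficient vectors for the 15 covariates, copied from Table 1.
BETAS = {
    "A": [1.5, 0, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, -0.5, 0, 0],
    "B": [1.5, 0, 0.5, 0, 0.5, 0, 0.25, 0, 0.25, 0, 0.25, 0, -0.25, 0, 0],
    "C": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "D": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}


def ar1_corr(p, rho):
    """AR(1) correlation matrix, Sigma_ij = rho ** |i - j| (C2: rho = 0.3, C3: rho = 0.8)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def noise_sd_for_r2(beta, sigma, r2):
    """Error SD giving population R^2 = b'Sb / (b'Sb + sd^2) equal to r2 (assumed calibration)."""
    beta = np.asarray(beta, dtype=float)
    signal_var = beta @ sigma @ beta
    return np.sqrt(signal_var * (1.0 - r2) / r2)


def simulate(beta, sigma, r2, n, rng):
    """One dataset: X ~ N(0, Sigma), y = X beta + e, e ~ N(0, sd^2) (assumed: normal errors, no intercept)."""
    beta = np.asarray(beta, dtype=float)
    X = rng.multivariate_normal(np.zeros(beta.size), sigma, size=n)
    y = X @ beta + rng.normal(0.0, noise_sd_for_r2(beta, sigma, r2), size=n)
    return X, y


# Full factorial design: 4 beta x 4 correlation structures x 4 R^2 x 2 n = 128 scenarios.
GRID = list(itertools.product("ABCD", ["C1", "C2", "C3", "C4"], [0.20, 0.30, 0.50, 0.71], [100, 400]))

if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    X, y = simulate(BETAS["A"], ar1_corr(15, 0.3), r2=0.50, n=400, rng=rng)
    print(len(GRID), X.shape, y.shape)
```

Each of the N = 2,000 repetitions for a scenario would call `simulate` again with the same generator, so successive datasets are independent draws.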
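Several methods in part A of Table 1 need initial estimates and a tuning criterion. As one illustration, the sketch below fits an adaptive lasso with OLS initial estimates. Rescaling each column by |β̂_init,j|^γ turns a plain lasso into the adaptive lasso, which is then solved by cyclic coordinate descent. The penalty λ is chosen by BIC = n log(RSS/n) + df log(n), with df taken as the number of nonzero coefficients. The helper names, γ = 1, the λ grid and the use of centring in place of an intercept are all assumptions of this sketch. In practice an established package would normally be used.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1 (X, y assumed centred)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # residual y - X b for b = 0
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column keeps its coefficient at 0
            r += X[:, j] * b[j]  # partial residual without covariate j
            rho = X[:, j] @ r / n
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-thresholding
            max_step = max(max_step, abs(new - b[j]))
            b[j] = new
            r -= X[:, j] * b[j]
        if max_step < tol:
            break
    return b


def adaptive_lasso_bic(X, y, gamma=1.0, n_lambda=50):
    """Adaptive lasso with OLS initial estimates, lambda chosen by BIC = n log(RSS/n) + df log(n)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
    scale = np.abs(beta_init) ** gamma  # inverse of the adaptive weights 1 / |b_init|^gamma
    Xs = X * scale  # a plain lasso on the rescaled design is the adaptive lasso
    lam_max = np.max(np.abs(Xs.T @ y)) / n  # smallest lambda giving the all-zero fit
    best_bic, best_beta = np.inf, np.zeros(p)
    for lam in np.geomspace(lam_max, lam_max * 1e-4, n_lambda):
        beta = lasso_cd(Xs, y, lam) * scale  # map back to the original covariate scale
        rss = np.sum((y - X @ beta) ** 2)
        df = np.count_nonzero(beta)
        bic = n * np.log(rss / n) + df * np.log(n)
        if bic < best_bic:
            best_bic, best_beta = bic, beta
    return best_beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 6))
    y = X @ np.array([2.0, 0.0, 1.5, 0.0, 0.0, 1.0]) + rng.normal(size=400)
    print(np.flatnonzero(adaptive_lasso_bic(X, y)))
```

To use ridge or lasso initial estimates instead of OLS, only the line that computes `beta_init` changes. Replacing `df * np.log(n)` with `2 * df` gives the AIC variant.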
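Part B of Table 1 estimates post-estimation shrinkage factors by cross-validation. One common formulation, assumed here, treats these factors as calibration slopes. The global factor is the slope from regressing y on the cross-validated linear predictor Σ_j x_ij β̂_j^(-i). The parameterwise factors are the coefficients from regressing y on the cross-validated partial predictors x_ij β̂_j^(-i). In both cases, β̂^(-i) is fitted without the fold that contains observation i. The sketch assumes an OLS model with intercept, fitted on an already chosen set of covariates, and re-estimates the intercept after shrinking. Breiman's method is not covered, and all function names are hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS with intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return coef[0], coef[1:]


def cv_partial_predictors(X, y, k, rng):
    """Z[i, j] = x_ij * beta_j^(-i), where beta^(-i) is fitted without the fold containing i."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    Z = np.empty_like(X, dtype=float)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        _, slopes = ols(X[train], y[train])
        Z[test] = X[test] * slopes
    return Z


def shrinkage_factors(X, y, k=10, seed=0):
    """Global factor c and parameterwise factors c_j from calibration regressions on CV predictors."""
    rng = np.random.default_rng(seed)
    Z = cv_partial_predictors(X, y, k, rng)
    _, c_global = ols(Z.sum(axis=1, keepdims=True), y)  # y ~ a + c * sum_j Z_ij
    _, c_param = ols(Z, y)  # y ~ a + sum_j c_j * Z_ij
    return c_global[0], c_param


def shrunken_fit(X, y, c):
    """Multiply full-data slopes by factor(s) c, then re-estimate the intercept (assumed)."""
    _, slopes = ols(X, y)
    slopes = slopes * c
    intercept = y.mean() - X.mean(axis=0) @ slopes
    return intercept, slopes
```

Calling `shrinkage_factors(X, y, k=len(y))` gives the leave-one-out version, and `k=10` gives the 10-fold version. With leave-one-out every fold holds a single observation, so only the 10-fold result depends on `seed`.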
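The footnote refers to false negative and false positive rates. Under their usual definitions, which are assumed here, FNR is the proportion of truly nonzero coefficients that a method fails to select. FPR is the proportion of truly zero coefficients that it selects. Below is a minimal sketch for a single fitted model, using βA from Table 1 and a hypothetical selection result.

```python
import numpy as np


def selection_error_rates(beta_true, beta_hat):
    """Return (FNR, FPR) for one fitted model, treating nonzero estimates as selected."""
    truth = np.asarray(beta_true) != 0
    selected = np.asarray(beta_hat) != 0
    fnr = np.sum(truth & ~selected) / np.sum(truth)  # missed true predictors
    fpr = np.sum(~truth & selected) / np.sum(~truth)  # selected noise variables
    return fnr, fpr


beta_a = [1.5, 0, 1, 0, 1, 0, 0.5, 0, 0.5, 0, 0.5, 0, -0.5, 0, 0]

# Hypothetical fit: misses x13 (true value -0.5) and wrongly keeps x2 (true value 0).
beta_hat = list(beta_a)
beta_hat[12] = 0.0
beta_hat[1] = 0.3
fnr, fpr = selection_error_rates(beta_a, beta_hat)
```

βA has 7 nonzero and 8 zero coefficients, so this example gives FNR = 1/7 and FPR = 1/8. One natural summary averages these per-dataset values over the N = 2,000 repetitions of a scenario, although this table does not show how the study aggregates them.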