Author manuscript; available in PMC: 2015 Feb 10.
Published in final edited form as: Stat Med. 2013 Aug 11;33(3):401–421. doi: 10.1002/sim.5937

Table II.

Comparison of all six methods with respect to the average number of true and noise variables selected under all case-prevalence levels, all scenarios for data configuration, and all missing data patterns.

| Scenario | Missing pattern | GLM | t-test | CART | Random Forest | LASSO | Elastic Net | Imputed LASSO |
|---|---|---|---|---|---|---|---|---|
| Prevalence = 0.5 | | | | | | | | |
| C1. Independent | Complete | 9.5 (3.7) | 9 | 4.1 (2.8) | 7.3 | 9.3 (4) | 9.3 (4.8) | |
| | M1 | 7.3 (6.6) | 8.9 | 3.9 (2.8) | 7.4 | 7 (7.8) | 7.2 (8.3) | 9.1 (4.2) |
| | M2 | 7.2 (7) | 8.7 | 3.7 (2.6) | 6 | 7.6 (6.8) | 7.8 (8.2) | 8.8 (4.4) |
| C2. Correlated (True, Noise) | Complete | 6.4 (3.8) | 6.4 | 3.6 (4) | 6 | 9 (6.7) | 9.2 (8.2) | |
| | M1 | 4.3 (7.3) | 6.3 | 3.5 (4) | 5.9 | 7.3 (8) | 7.7 (9.4) | 8.7 (7.1) |
| | M2 | 4 (6.8) | 6.3 | 3.3 (3.6) | 3.8 | 7.1 (9.3) | 7.5 (10.8) | 8.4 (6.6) |
| C3. Correlated (True, True) | Complete | 2.6 (3.8) | 10 | 3.3 (2.8) | 10 | 6.2 (0.9) | 8.4 (0.7) | |
| | M1 | 2.3 (7.5) | 10 | 3.3 (2.8) | 10 | 4.8 (1.4) | 7.1 (1.2) | 6.3 (0.8) |
| | M2 | 2.4 (8.1) | 10 | 3.3 (2.8) | 8.5 | 4.8 (1.3) | 7.2 (1.2) | 3.3 (0.4) |
| C4. Omitted interactions | Complete | 9.6 (3.6) | 8.8 | 3.7 (2.5) | 7.3 | 8.9 (4.5) | 8.9 (4.8) | |
| | M1 | 7.3 (6.8) | 8.7 | 3.5 (2.5) | 7.2 | 6.6 (7.1) | 6.9 (7.6) | 8.7 (4.7) |
| | M2 | 7.3 (6.8) | 8.5 | 3.3 (2.4) | 3.8 | 7.2 (8.4) | 7.4 (9.3) | 8.2 (5.1) |
| C5. Unobserved true predictors | Complete | 7.1 (3.5) | 7.7 | 2.9 (3.1) | 5.2 | 7.2 (7) | 7.4 (7.7) | |
| | M1 | 4.3 (6) | 7.5 | 2.7 (3.5) | 5 | 5.1 (9.6) | 5.4 (10.8) | 7 (7.5) |
| | M2 | 4.5 (5.5) | 7.3 | 2.5 (3) | 3.4 | 4.4 (8.9) | 4.7 (10.3) | 6.8 (8.2) |
| C6. Complex | Complete | 2.9 (4.1) | 5.9 | 3.4 (4.2) | 5.9 | 6.5 (5.2) | 7.7 (6.5) | |
| | M1 | 2.2 (7) | 5.8 | 3.3 (3.8) | 5.9 | 4.9 (6.6) | 5.9 (8.1) | 6.6 (5.2) |
| | M2 | 2.2 (7) | 5.6 | 3.1 (3.8) | 3.9 | 4.5 (6) | 5.6 (7.2) | 6.3 (6.2) |
| Prevalence = 0.3 | | | | | | | | |
| C1. Independent | Complete | 9.2 (4) | 8.6 | 1.2 (0.6) | 6.9 | 8.9 (5.6) | 9 (8.1) | |
| | M1 | 5.8 (8) | 8.6 | 0.9 (1.4) | 6.9 | 5 (4) | 5.5 (5.5) | 8.7 (5.5) |
| | M2 | 6 (7.5) | 8.3 | 0.8 (0.4) | 5.8 | 3.4 (3.5) | 3.8 (4.8) | 8.1 (6.2) |
| C2. Correlated (True, Noise) | Complete | 5.9 (3.9) | 6.2 | 0.9 (0.7) | 5.7 | 8.2 (7.5) | 8.5 (10.1) | |
| | M1 | 3.8 (7.5) | 6.2 | 0.9 (0.8) | 5.8 | 3.4 (4.2) | 4 (6.2) | 7.8 (7.6) |
| | M2 | 4 (7.8) | 6 | 0.7 (0.6) | 4.2 | 3.4 (4.9) | 3.9 (6.6) | 7.9 (8) |
| C3. Correlated (True, True) | Complete | 4.3 (4.9) | 10 | 3.9 (3) | 10 | 8.3 (1.1) | 9.7 (1.1) | |
| | M1 | 0.6 (2.6) | 10 | 3.9 (2.8) | 10 | 6.4 (1.6) | 8.9 (1.9) | 8.4 (1.1) |
| | M2 | 0.8 (3.4) | 10 | 3.7 (2.6) | 9 | 6.6 (1.8) | 9 (2.5) | 6.8 (1) |
| C4. Omitted interactions | Complete | 9.7 (3.9) | 9 | 1.6 (0.8) | 7.3 | 9.5 (5.8) | 9.7 (8.2) | |
| | M1 | 6.2 (7.5) | 8.8 | 1.3 (0.7) | 7.2 | 6.1 (5.3) | 6.3 (7) | 9.4 (5.9) |
| | M2 | 6.7 (8) | 8.8 | 1.2 (0.6) | 4.9 | 7.4 (5.8) | 7.8 (8.3) | 9.2 (6) |
| C5. Unobserved true predictors | Complete | 5.1 (3.8) | 6.1 | 0.1 (0.1) | 3.8 | 0.5 (0.6) | 0.4 (0.6) | |
| | M1 | 3.1 (6.5) | 6 | 0.1 (0.1) | 3.6 | 0.3 (0.6) | 0.3 (0.6) | 0.4 (0.6) |
| | M2 | 3.7 (6.2) | 5.7 | 0.1 (0.1) | 2.9 | 0.3 (0.6) | 0.3 (0.8) | 0.6 (0.8) |
| C6. Complex | Complete | 3.6 (4.3) | 5.9 | 3.6 (3.5) | 6.1 | 8.1 (6.5) | 9.2 (9.2) | |
| | M1 | 2.4 (7.9) | 5.9 | 3.3 (3.4) | 6.2 | 5.8 (7) | 7.5 (9.9) | 8.1 (6.5) |
| | M2 | 2 (7.2) | 5.6 | 3.3 (3.8) | 4.3 | 5.6 (6.4) | 7.3 (9.7) | 7.8 (7.2) |

The average numbers of selected noise terms are shown in parentheses.

Bold cells correspond to the methods that selected the most true predictors under each condition.
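The bold formatting does not survive in a plain-text copy of the table, but the rule in the note above can be re-applied to any row. The following is a minimal sketch, not the authors' procedure: the function name `best_methods` is hypothetical, and reporting every tied method is an assumption, since the note does not say how ties were handled.

```python
from typing import Dict, List


def best_methods(true_counts: Dict[str, float]) -> List[str]:
    """Return every method tied for the highest average number of true variables."""
    top = max(true_counts.values())
    return [method for method, value in true_counts.items() if value == top]


# Average numbers of true variables copied from the table:
# prevalence 0.5, scenario C1, complete data.
c1_complete = {
    "GLM": 9.5,
    "t-test": 9.0,
    "CART": 4.1,
    "Random Forest": 7.3,
    "LASSO": 9.3,
    "Elastic Net": 9.3,
}

print(best_methods(c1_complete))  # ['GLM']
```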

The two-sample t-test and Random Forest rank all variables in each simulation, and we selected the top 10 of them. Thus, for these two methods the average number of noise variables selected is 10 minus the average number of true variables, and it is not shown.
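The omitted noise counts for these two methods can be recovered with the subtraction described in the note above. The sketch below is illustrative only: the helper name `implied_noise` is hypothetical, and rounding to one decimal place is an assumption made to match the precision of the table.

```python
TOP_K = 10  # number of top-ranked variables kept per simulation (from the table note)


def implied_noise(avg_true: float, top_k: int = TOP_K) -> float:
    """Average number of noise variables implied by a fixed top-k selection."""
    if not 0 <= avg_true <= top_k:
        raise ValueError("avg_true must lie between 0 and top_k")
    # Round to one decimal (assumed) to match the table's precision.
    return round(top_k - avg_true, 1)


# Average numbers of true variables copied from the table:
# prevalence 0.5, scenario C1, complete data.
rows = {"t-test": 9.0, "Random Forest": 7.3}

for method, avg_true in rows.items():
    print(method, implied_noise(avg_true))
# t-test 1.0
# Random Forest 2.7
```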

GLM, generalized linear model; CART, Classification and Regression Tree; LASSO, Least Absolute Shrinkage and Selection Operator.