2018 Feb 8;42(3):233–249. doi: 10.1002/gepi.22112

Table 1. Details of the ECLUST algorithm

Step | Description, Software^a and Reference
1a)
  • i)

    Calculate the topological overlap matrix (TOM) separately for observations with E=0 and E=1 using WGCNA::TOMsimilarityFromExpr (Langfelder & Horvath, 2008)

  • ii)

    Compute the Euclidean distance matrix of $|\mathrm{TOM}_{E=1} - \mathrm{TOM}_{E=0}|$ using stats::dist

  • iii)

    Run the dynamicTreeCut algorithm (Langfelder et al., 2008; Langfelder, P., Zhang, B., & with contributions from Steve Horvath, 2016) on the distance matrix to determine the number of clusters and cluster membership using dynamicTreeCut::cutreeDynamic with minClusterSize = 50

1b)
  • i)

    Compute the first principal component (PC) or the average of each cluster using stats::prcomp or base::mean

  • ii)

    Penalized regression model: create a design matrix of the derived cluster representatives and their interactions with E using stats::model.matrix

  • iii)

    MARS model: create a design matrix of the derived cluster representatives and E

2)
  • i)

    For linear models, run penalized regression on the design matrix from Step 1b using glmnet::cv.glmnet (Friedman et al., 2010). The elastic net mixing parameter alpha = 1 corresponds to the lasso; for the elastic net we used alpha = 0.5 in our simulations. The tuning parameter lambda is selected by minimizing the 10‐fold cross‐validated mean squared error (MSE).

  • ii)

    For nonlinear effects, run MARS on the design matrix from Step 1b using earth::earth (Milborrow. Derived from mda:mars by T. Hastie and R. Tibshirani., 2011) with pruning method pmethod = “backward” and maximum number of model terms nk = 1000. The degree (1 or 2) is chosen by 10‐fold cross‐validation (CV), and within each fold the number of terms in the model is the one that minimizes the generalized cross‐validation (GCV) error.

^a All functions are implemented in R (R Core Team, 2016). Functions are referred to as package_name::package_function. Default settings are used for all functions unless indicated otherwise.
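Steps 1a(i) and 1a(ii) of Table 1 can be illustrated with a short NumPy sketch. It is not WGCNA::TOMsimilarityFromExpr or stats::dist. It assumes an unsigned network built from Pearson correlations, a soft-thresholding power of 6, and the textbook unsigned TOM formula, so its values may differ from WGCNA's defaults. The names `unsigned_tom` and `exposure_distance` are hypothetical. The clustering in Step 1a(iii) is left to dynamicTreeCut::cutreeDynamic and is not reproduced here.

```python
import numpy as np

def unsigned_tom(X, power=6):
    """Unsigned topological overlap matrix (TOM) between the columns of X.

    X is an (n_samples, n_genes) array; power is the soft-thresholding
    exponent (6 is an assumption for this sketch).
    """
    A = np.abs(np.corrcoef(X, rowvar=False)) ** power  # adjacency a_ij = |cor_ij|^power
    np.fill_diagonal(A, 0.0)                           # drop self-connections
    k = A.sum(axis=1)                                  # connectivity k_i
    shared = A @ A                                     # l_ij = sum over u != i, j of a_iu * a_uj
    tom = (shared + A) / (np.minimum.outer(k, k) + 1.0 - A)
    np.fill_diagonal(tom, 1.0)
    return tom

def exposure_distance(X, E, power=6):
    """Euclidean distances between the rows of |TOM_{E=1} - TOM_{E=0}|."""
    diff = np.abs(unsigned_tom(X[E == 1], power) - unsigned_tom(X[E == 0], power))
    # pairwise distances between rows, analogous to stats::dist(diff) in R
    return np.sqrt(((diff[:, None, :] - diff[None, :, :]) ** 2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 12))   # 80 samples, 12 hypothetical genes
E = np.repeat([0, 1], 40)       # binary exposure indicator
D = exposure_distance(X, E)
print(D.shape)                  # (12, 12)
```

The genes are the columns of X, so the output is a gene-by-gene distance matrix. Genes whose co-expression pattern changes most between the two exposure groups end up far apart in it.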
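Step 1b(i) and the penalized-regression design matrix of Step 1b(ii) can be sketched the same way. This is a NumPy stand-in, not stats::prcomp or stats::model.matrix. It assumes cluster labels have already been produced by Step 1a. It computes PC1 scores with centering and no scaling (the prcomp defaults). It also uses a hypothetical column order (representatives, then E, then each representative × E) with no intercept column. The function names are hypothetical.

```python
import numpy as np

def cluster_representatives(X, labels, method="pc"):
    """One column per cluster: first-PC scores (centered, unscaled) or the row mean."""
    reps = []
    for c in np.unique(labels):
        Xc = X[:, labels == c]                 # genes belonging to cluster c
        if method == "mean":
            reps.append(Xc.mean(axis=1))       # per-sample average over the cluster
        else:
            Xc = Xc - Xc.mean(axis=0)          # center each gene
            U, s, _ = np.linalg.svd(Xc, full_matrices=False)
            reps.append(U[:, 0] * s[0])        # PC1 scores; the sign is arbitrary
    return np.column_stack(reps)

def interaction_design(R, E):
    """Columns: cluster representatives, E, and each representative times E."""
    E = np.asarray(E, dtype=float).reshape(-1, 1)
    return np.hstack([R, E, R * E])

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6))
labels = np.array([0, 0, 0, 1, 1, 1])   # two hypothetical clusters of three genes
E = rng.integers(0, 2, size=50)
R = cluster_representatives(X, labels)
design = interaction_design(R, E)
print(design.shape)                     # (50, 5)
```

For the MARS variant in Step 1b(iii), the interaction columns would simply be dropped, leaving `np.hstack([R, E])`, because MARS can form interactions itself.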
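The penalized regression with a cross-validated lambda in Step 2(i) can be approximated by a small coordinate-descent sketch. It is not glmnet. It minimizes the glmnet-style objective $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right]$ on standardized predictors. It also rests on several assumptions:

- a grid of 30 lambda values from $\lambda_{\max}$ down to $0.01\,\lambda_{\max}$;
- no zero-variance columns;
- squared errors pooled over folds, rather than cv.glmnet's exact bookkeeping.

All function names are hypothetical.

```python
import numpy as np

def enet_fit(Z, y, lam, alpha, beta=None, max_iter=1000, tol=1e-8):
    """Coordinate descent for the elastic net. Z has columns with mean 0 and
    mean square 1 and y is centered, so no intercept is fitted."""
    n, p = Z.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    r = y - Z @ beta                                  # current residual
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            z = Z[:, j] @ r / n + beta[j]             # partial-residual correlation
            new = np.sign(z) * max(abs(z) - lam * alpha, 0.0) / (1.0 + lam * (1.0 - alpha))
            step = new - beta[j]
            if step != 0.0:
                r -= Z[:, j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta

def path_predict(Xtr, ytr, Xte, lambdas, alpha):
    """Fit along a decreasing lambda path with warm starts; predict on Xte."""
    mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0)
    Z, Zte, ybar = (Xtr - mu) / sd, (Xte - mu) / sd, ytr.mean()
    beta, preds = None, []
    for lam in lambdas:
        beta = enet_fit(Z, ytr - ybar, lam, alpha, beta)
        preds.append(ybar + Zte @ beta)
    return np.array(preds)                            # shape (n_lambda, n_test)

def cv_enet(X, y, alpha=1.0, n_lambda=30, n_folds=10, seed=0):
    """Pick lambda by minimizing the pooled K-fold cross-validated MSE."""
    n = len(y)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # smallest lambda at which every coefficient is zero on the full data
    lam_max = np.max(np.abs(Z.T @ (y - y.mean()))) / (n * alpha)
    lambdas = lam_max * np.logspace(0, -2, n_lambda)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    sse = np.zeros(n_lambda)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        preds = path_predict(X[tr], y[tr], X[te], lambdas, alpha)
        sse += ((preds - y[te]) ** 2).sum(axis=1)
    cv_mse = sse / n
    best = lambdas[np.argmin(cv_mse)]
    beta = enet_fit(Z, y - y.mean(), best, alpha)     # refit on all data (standardized scale)
    return best, beta, cv_mse

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=120)
lam, beta, cv_mse = cv_enet(X, y, alpha=1.0)   # alpha = 0.5 gives the elastic net setting
```

In the ECLUST setting, X would be the design matrix from Step 1b(ii). Coefficients are reported on the standardized scale, which glmnet by default converts back to the original scale.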
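The GCV criterion used to select the number of MARS terms in Step 2(ii) can be illustrated with a small sketch. As we understand the earth package documentation, $\mathrm{GCV} = \mathrm{RSS} / \left(N(1 - \mathrm{enp}/N)^2\right)$ with $\mathrm{enp} = n_{\text{terms}} + \text{penalty}\,(n_{\text{terms}} - 1)/2$. There, penalty defaults to 2 when degree = 1 and 3 when degree = 2. We assume $n_{\text{terms}}$ counts the intercept. This is not earth::earth. The single knot at t = 0 is fixed by hand instead of being found by the forward pass, and `hinge_pair` and `earth_gcv` are hypothetical names.

```python
import numpy as np

def hinge_pair(x, t):
    """Mirrored MARS hinge functions max(0, x - t) and max(0, t - x)."""
    return np.maximum(0.0, x - t), np.maximum(0.0, t - x)

def earth_gcv(rss, n, n_terms, penalty):
    """GCV with an effective number of parameters that charges `penalty`
    per knot; n_terms is assumed to include the intercept."""
    enp = n_terms + penalty * (n_terms - 1) / 2.0
    return rss / (n * (1.0 - enp / n) ** 2)

# a single hand-picked knot at t = 0, fitted by least squares
x = np.linspace(-2.0, 2.0, 101)
y = np.maximum(0.0, x) + 0.1 * np.sin(5 * x)     # kinked signal plus a smooth wiggle
h_plus, h_minus = hinge_pair(x, 0.0)
B = np.column_stack([np.ones_like(x), h_plus, h_minus])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
rss = float(np.sum((y - B @ coef) ** 2))
gcv = earth_gcv(rss, n=len(y), n_terms=B.shape[1], penalty=2)   # penalty 2 for degree = 1
```

Because enp grows with every added term, a pruned model with more terms has to reduce RSS enough to offset the larger denominator penalty. This is how the backward pass trades fit against complexity.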