1. Fit a random forest to the EFFECT1 sample using the observed outcomes. The observed outcomes are no longer used after this step. |
2. Apply the random forest fit in Step 1 to both the EFFECT1 and EFFECT2 samples. Obtain a predicted probability of the outcome for each subject in the EFFECT1 and EFFECT2 samples using the fitted model. |
3. Generate a binary outcome for each subject in the EFFECT1 and EFFECT2 samples using a Bernoulli random variable with subject-specific probability equal to the predicted probability obtained in Step 2. These are the simulated outcomes that will be used in all subsequent steps. |
4. Apply a given analysis method (e.g. unpenalized logistic regression) by fitting that model to the EFFECT1 sample with the simulated outcomes generated in Step 3. |
5. Apply the fitted model from Step 4 to the EFFECT2 sample. |
6. For each subject in the EFFECT2 sample, obtain a predicted probability of the outcome based on the fitted analysis model that was applied to the EFFECT2 sample in Step 5. |
7. Use the eight performance metrics to compare the predicted probability of the outcome obtained in Step 6 with the simulated binary outcome generated in Step 3. |
8. Repeat Steps 3 to 7 a total of 1000 times. Summarize the performance metrics across the 1000 simulation replicates. |
9. Repeat Steps 3 to 8 for each of the six analysis methods (three regression methods: the lasso, ridge regression, and unpenalized logistic regression; three tree-based methods: random forest, bagged classification trees, and boosted trees). |
10. Repeat Steps 1 to 9 with the five other data-generating processes (bagged classification trees, boosted trees, the lasso, ridge regression, and unpenalized logistic regression). |
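
The ten steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes NumPy and scikit-learn, and it replaces the EFFECT1 and EFFECT2 samples with small synthetic arrays (`X1`, `y1`, `X2`). It uses only two data-generating processes and two analysis methods, reports AUC and the Brier score as stand-ins for the eight performance metrics, and runs 5 replicates rather than 1000. The function name `run_scenario` and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score


def make_forest():
    # Hypothetical hyperparameters, chosen only to keep the sketch fast.
    return RandomForestClassifier(n_estimators=50, min_samples_leaf=20, random_state=0)


def make_logistic():
    # A very large C makes the default L2 penalty negligible, which
    # approximates unpenalized logistic regression (an assumption of this sketch).
    return LogisticRegression(C=1e6, max_iter=1000)


def run_scenario(dgp, methods, X1, y1, X2, n_reps=1000, seed=0):
    """Steps 1 to 9 for a single data-generating process (DGP)."""
    rng = np.random.default_rng(seed)

    # Step 1: fit the DGP model to sample 1 using the observed outcomes.
    dgp.fit(X1, y1)

    # Step 2: predicted probabilities for both samples define the "truth".
    p1 = dgp.predict_proba(X1)[:, 1]
    p2 = dgp.predict_proba(X2)[:, 1]

    summary = {}
    for name, model in methods.items():            # Step 9: each analysis method
        replicates = []
        for _ in range(n_reps):                    # Step 8: simulation replicates
            # Step 3: simulate Bernoulli outcomes in both samples.
            y1_sim = rng.binomial(1, p1)
            y2_sim = rng.binomial(1, p2)

            # Step 4: fit the analysis model to sample 1 with simulated outcomes.
            model.fit(X1, y1_sim)

            # Steps 5 and 6: predicted probabilities in sample 2.
            p2_hat = model.predict_proba(X2)[:, 1]

            # Step 7: compare predictions with the simulated outcomes.
            replicates.append((roc_auc_score(y2_sim, p2_hat),
                               brier_score_loss(y2_sim, p2_hat)))

        # Step 8 (continued): average each metric over the replicates.
        summary[name] = np.mean(replicates, axis=0)
    return summary


# Synthetic stand-ins for the two samples (not the EFFECT data).
data_rng = np.random.default_rng(42)
X1 = data_rng.normal(size=(500, 4))
X2 = data_rng.normal(size=(400, 4))
true_logit = X1 @ np.array([1.0, -0.5, 0.25, 0.0])
y1 = data_rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Step 10: repeat the whole procedure for each data-generating process.
dgps = {"random_forest": make_forest(), "logistic": make_logistic()}
results = {}
for dgp_name, dgp_model in dgps.items():
    methods = {"random_forest": make_forest(), "logistic": make_logistic()}
    results[dgp_name] = run_scenario(dgp_model, methods, X1, y1, X2, n_reps=5)

for dgp_name, summary in results.items():
    for method_name, (auc, brier) in summary.items():
        print(f"DGP={dgp_name:14s} method={method_name:14s} "
              f"AUC={auc:.3f} Brier={brier:.3f}")
```

Because the simulated outcomes are drawn from the fitted data-generating model, each analysis method can be judged against a known outcome-generating mechanism. In this sketch the same simulated datasets are not shared across analysis methods; sharing them would require storing the draws from Step 3 before looping over methods.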