Table 1. List of data sets used in simulations and analyses of real phenotypes.
Target population | Trait | European training data | Target population training data | Target population validation data | Validation procedure (primary) | Validation procedure (secondary) |
---|---|---|---|---|---|---|
Latino | 2:1 simulations | WTCCC2 (N=15,622) | SIGMA (N=7,393*) | SIGMA (N=8,214**) | 10-fold cross-validation | NA |
Latino | 1:1 simulations | WTCCC2 (N=7,393) | SIGMA (N=7,393*) | SIGMA (N=8,214**) | 10-fold cross-validation | NA |
Latino | T2D | DIAGRAM (Neff=40,101) | SIGMA (Neff=7,363*) | SIGMA (Neff=8,181**) | 10-fold cross-validation | 10×9-fold cross-validation |
Latino | T2D | UK Biobank (Neff=19,842) | SIGMA (Neff=7,363*) | SIGMA (Neff=8,181**) | 10-fold cross-validation | NA |
South Asian | T2D | DIAGRAM (Neff=40,101) | SAT2D (Neff=16,065) | UK Biobank (Neff=919) | In-sample fit | 10-fold cross-validation |
African | Height | UK Biobank (N=113,660) | N'Diaye et al. (N=20,427) | UK Biobank (N=1,745) | In-sample fit | 10-fold cross-validation |
We list the training and validation data sets and the validation procedures used in simulations (rows 1-2), in predicting T2D in Latinos (rows 3-4), in predicting T2D in South Asians (row 5), and in predicting height in Africans (row 6). N denotes sample size (continuous traits); Neff denotes effective sample size, Neff = 4/(1/Ncase + 1/Ncontrol) (dichotomous traits).
*Sample size in each training fold.

**Sample size in the union of validation folds.
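The two quantities behind the legend and footnotes can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that in k-fold cross-validation each training fold contains all samples except the held-out 1/k (for dichotomous traits, it further assumes folds preserve the case/control ratio so that Neff scales the same way). Under these assumptions, the SIGMA figures in the table are consistent: 9/10 of 8,214 rounds to 7,393, and 9/10 of 8,181 rounds to 7,363.

```python
def effective_sample_size(n_case: float, n_control: float) -> float:
    """Neff = 4 / (1/Ncase + 1/Ncontrol), as defined in the table legend.

    For a balanced design (Ncase == Ncontrol) this equals the total sample size.
    """
    return 4.0 / (1.0 / n_case + 1.0 / n_control)


def training_fold_size(n_union: float, k: int = 10) -> float:
    """Hypothetical helper: samples used for training in each fold of k-fold
    cross-validation, assuming each fold holds out exactly 1/k of the data."""
    return n_union * (k - 1) / k


# Balanced vs. unbalanced case/control designs.
print(round(effective_sample_size(5000, 5000)))  # balanced: equals total N
print(round(effective_sample_size(1000, 9000)))  # unbalanced: well below total N

# SIGMA: union of validation folds -> size of each training fold.
print(round(training_fold_size(8214)))  # simulations (N)
print(round(training_fold_size(8181)))  # T2D (Neff)
```

The harmonic-mean form of Neff penalizes case/control imbalance: 1,000 cases and 9,000 controls yield an effective sample size of 3,600 rather than 10,000.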