Author manuscript; available in PMC: 2018 Dec 1.
Published in final edited form as: Genet Epidemiol. 2017 Nov 7;41(8):811–823. doi: 10.1002/gepi.22083

Table 1. List of data sets used in simulations and analyses of real phenotypes.

| Target population | Trait | European training | Target population training | Target population validation | Validation procedure (primary) | Validation procedure (secondary) |
|---|---|---|---|---|---|---|
| Latino | 2:1 simulations | WTCCC2 (N=15,622) | SIGMA (N=7,393*) | SIGMA (N=8,214**) | 10-fold cross-validation | NA |
| Latino | 1:1 simulations | WTCCC2 (N=7,393) | SIGMA (N=7,393*) | SIGMA (N=8,214**) | 10-fold cross-validation | NA |
| Latino | T2D | DIAGRAM (Neff=40,101) | SIGMA (Neff=7,363*) | SIGMA (Neff=8,181**) | 10-fold cross-validation | 10×9-fold cross-validation |
| Latino | T2D | UK Biobank (Neff=19,842) | SIGMA (Neff=7,363*) | SIGMA (Neff=8,181**) | 10-fold cross-validation | NA |
| South Asian | T2D | DIAGRAM (Neff=40,101) | SAT2D (Neff=16,065) | UK Biobank (Neff=919) | In-sample fit | 10-fold cross-validation |
| African | Height | UK Biobank (N=113,660) | N'Diaye et al. (N=20,427) | UK Biobank (N=1,745) | In-sample fit | 10-fold cross-validation |

We list the training and validation data sets and the validation procedures used in simulations (rows 1-2), for predicting T2D in Latinos (rows 3-4), for predicting T2D in South Asians (row 5) and for predicting height in Africans (row 6). N denotes sample size (continuous traits); Neff denotes effective sample size, Neff = 4/(1/Ncase + 1/Ncontrol) (dichotomous traits).
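As a small illustration of the Neff definition in the caption, the sketch below evaluates 4/(1/Ncase + 1/Ncontrol) directly. The function name `effective_sample_size` and the case/control counts in the usage lines are hypothetical, chosen only to show the formula's behavior; they are not taken from any of the data sets in the table. For a balanced design (Ncase = Ncontrol) Neff equals the total sample size, and case/control imbalance lowers it.

```python
def effective_sample_size(n_case: int, n_control: int) -> float:
    """Effective sample size of a case-control sample: 4 / (1/N_case + 1/N_control).

    Hypothetical helper illustrating the formula in the table caption.
    """
    if n_case <= 0 or n_control <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_case + 1.0 / n_control)


# Illustrative (made-up) counts, not from the study data.
# Balanced design: Neff equals the total sample size.
print(round(effective_sample_size(5000, 5000)))  # 10000
# Imbalanced design with the same 10,000 samples: Neff is much smaller.
print(round(effective_sample_size(1000, 9000)))  # 3600
```

Because Neff is a harmonic-mean-style quantity, it is dominated by the smaller of the two groups, which is why it is the relevant size measure for the dichotomous-trait rows.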

* Sample size in each training fold.

** Sample size in the union of validation folds.
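The two footnotes relate through the 10-fold split: each training fold holds about 9/10 of the samples, while the validation folds together cover all of them. The sketch below assumes a plain shuffled round-robin 10-fold partition (the function name `kfold_indices` and the fixed seed are hypothetical, and the authors' actual fold assignment may differ, e.g. if it was stratified). Applied to N=8,214, it gives training folds of 7,392 or 7,393 samples and a validation union of 8,214, in line with the SIGMA entries in the table.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for a shuffled k-fold split.

    Hypothetical helper: a simple round-robin partition, not the authors' procedure.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        # Training set = all folds except the held-out validation fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


n = 8214  # size of the union of validation folds (** footnote)
splits = list(kfold_indices(n))
print(sorted({len(train) for train, _ in splits}))  # [7392, 7393]
union = set().union(*(set(val) for _, val in splits))
print(len(union))  # 8214
```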