Table 1.
Real phenotypes and GWAS data sources analyzed in this study
Phenotype | Description | GWAS source | GWAS (training) sample size | Validation sample size | Test sample size |
---|---|---|---|---|---|
HEIGHT | standing height | UKB | 242,213 | 26,913 | 67,282 |
HDL | high-density lipoprotein | UKB | 211,856 | 23,540 | 58,849 |
BMI | body mass index | UKB | 241,959 | 26,885 | 67,211 |
FVC | forced vital capacity | UKB | 221,249 | 24,584 | 61,459 |
FEV1 | forced expiratory volume in 1 s | UKB | 221,265 | 24,586 | 61,463 |
HC | hip circumference | UKB | 242,311 | 26,924 | 67,309 |
WC | waist circumference | UKB | 242,340 | 26,927 | 67,317 |
LDL | low-density lipoprotein | UKB | 230,995 | 25,667 | 64,166 |
BW | birth weight | UKB | 138,300 | 15,367 | 38,417 |
T2D | type 2 diabetes | UKB | 235,937 | 26,216 | 65,538 |
RA | rheumatoid arthritis | UKB | 186,239 | 20,694 | 51,734 |
ASTHMA | asthma | UKB | 229,031 | 25,448 | 63,620 |
LangoAllen2010_HEIGHT | standing height | Lango Allen et al. [74] | 131,547 | 26,913 | 67,282 |
Speliotes2010_BMI | body mass index | Speliotes et al. [75] | 122,033 | 26,885 | 67,211 |
GLGC2021_HDL | high-density lipoprotein cholesterol | Graham et al. [76] | 888,227 | 23,540 | 58,849 |
GLGC2021_LDL | low-density lipoprotein cholesterol | Graham et al. [76] | 842,660 | 25,667 | 64,166 |
Teslovich2010_HDL | high-density lipoprotein cholesterol | Teslovich et al. [77] | 97,749 | 23,540 | 58,849 |
Teslovich2010_LDL | low-density lipoprotein cholesterol | Teslovich et al. [77] | 93,354 | 25,667 | 64,166 |
SpiroMeta2019_FVC | forced vital capacity | Shrine et al. [78] | 79,005 | 24,584 | 61,459 |
SpiroMeta2019_FEV1 | forced expiratory volume in 1 s | Shrine et al. [78] | 79,005 | 24,586 | 61,463 |
Morris2012_T2D | type 2 diabetes | Morris et al. [79] | 60,786 | 26,216 | 65,538 |
Scott2017_T2D | type 2 diabetes | Scott et al. [80] | 159,208 | 26,216 | 65,538 |
Okada2014_RA | rheumatoid arthritis | Okada et al. [81] | 37,681 | 20,694 | 51,734 |
Demenais2018_ASTHMA | asthma | Demenais et al. [82] | 142,486 | 25,448 | 63,620 |
For each phenotype code, we provide the phenotype description, the GWAS data source or cohort (UKB or an external study), and the sample sizes of the training (GWAS), validation, and test sets. The size of each subset may vary slightly across the five folds. For the external summary statistics, we prefixed each phenotype code with either the consortium name or the first author's surname, followed by the year in which the GWAS was published (e.g., GLGC2021_HDL, Teslovich2010_HDL). For analyses with the external GWAS summary statistics, the validation and test sets are drawn from the UK Biobank, so their sizes match those of the corresponding UKB phenotypes.
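The UKB subset sizes appear consistent with a split in which each fold holds out about one fifth of all samples as the test set and about one tenth of the remaining samples as the validation set (e.g., for HEIGHT, 67,282 / 336,408 ≈ 0.200 and 26,913 / 269,126 ≈ 0.100). This is our reading of the numbers, not a rule stated in the table. The minimal Python sketch below recomputes these two ratios for a few UKB rows copied from Table 1; the names `UKB_SPLITS` and `split_fractions` are illustrative only.

```python
"""Recompute the ratios between Table 1's per-fold subset sizes.

Assumption (inferred from the numbers, not stated in the table): each fold
uses ~1/5 of the UKB samples as the test set and ~1/10 of the remaining
samples as the validation set.
"""

# (GWAS/training, validation, test) sizes copied from Table 1 (UKB rows).
UKB_SPLITS = {
    "HEIGHT": (242_213, 26_913, 67_282),
    "HDL": (211_856, 23_540, 58_849),
    "BMI": (241_959, 26_885, 67_211),
    "BW": (138_300, 15_367, 38_417),
    "T2D": (235_937, 26_216, 65_538),
}


def split_fractions(train: int, val: int, test: int) -> tuple[float, float]:
    """Return (test share of all samples, validation share of non-test samples)."""
    total = train + val + test
    return test / total, val / (train + val)


if __name__ == "__main__":
    for code, sizes in UKB_SPLITS.items():
        test_frac, val_frac = split_fractions(*sizes)
        print(f"{code}: total={sum(sizes):,} test={test_frac:.3f} val={val_frac:.3f}")
```

Each listed phenotype prints a test share of about 0.200 and a validation share of about 0.100; the small deviations are what one would expect from rounding subset sizes to whole individuals.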