Table 2.
Summary of the datasets used.
Datasets and features | Frequency | |
---|---|---|
Observation phase | National inpatient/outpatient set of HIRA<sup>a</sup> | |
Total patients | 2,182,356 | |
Selected patients<sup>b</sup> | 763,892 | |
No. of men in the selected set | 341,788 (44.7%) | |
No. of women in the selected set | 420,104 (55.2%) | |
No. of diagnoses | 13,459,583 | |
No. of unique diagnosis codes (ICD-10)<sup>c</sup> | 5859 (5- to 3-letter level) | |
Unique high-level diagnosis codes (3-letter)<sup>d</sup> | 1151 | |
Mean age at diagnosis | 54.19 (±23.26) | |
Outcome of diagnoses | ||
Deceased<sup>e</sup> | 43,545 (5.7%) | |
Alive<sup>f</sup> | 1,252,114 | |
UCSF Medical Center<sup>g</sup> (2012.01 to 2017.01) | ||
Total patients | 6,852,000 | |
Total cases of diagnoses | 44,545,038 | |
No. of unique diagnosis codes (ICD-10-CM) | 29,893 | |
Unique diagnosis codes with 3 letters | 1831 | |
Mean age at diagnosis | 48.67 (±23.41) | |
Outcome of diagnoses | ||
Deceased | 2961 | |
Not deceased | 143,996 | |
Pending | 21,248,546 | |
Discovery phase | VARIMED (VARiant Informing MEDicine) | |
No. of reviewed publications | 10,331 | |
No. of SNPs (dbSNP IDs) | 130,426 (129,890) | |
No. of traits (disease/non-disease traits)<sup>h</sup> | 4223 (1489/2374) | |
No. of associations between SNPs and traits | 135,410 | |
NBB (Netherlands Brain Bank) | ||
Total patients (AD/non-AD)<sup>i</sup> | 50 (AD 43/non-AD 7) | |
Hippocampal formation (HF) samples | 50 | |
Blood (BL) samples | 50 | |
Mean age at death (AD/non-AD deceased) | 83.5±8/71.4±12.6 | |
Sex (AD deceased) | Male = 14; Female = 29 | |
Sex (non-AD deceased) | Male = 3; Female = 4 | |
Braak staging (AD/non-AD deceased) | 5.02±1.1/0.71±0.48 | |
UK Biobank | ||
No. of total participants | 502,543 | |
Participants with whole-exome sequencing (WES)<sup>j</sup> | 49,960 | |
<sup>a</sup>The Health Insurance Review and Assessment Service of Korea (HIRA). We used non-longitudinal, randomly sampled inpatient/outpatient sets built annually from 2009 to 2011 (www.hira.or.kr).
<sup>b</sup>To minimize the re-enrollment of patients from the 2009 and 2010 sets into the 2011 set, we selected only deceased patients from the 2009 and 2010 sets. In addition, we used diagnosis codes to exclude records of non-disease-related diagnoses, including injuries, poisoning, and childbirth.
<sup>c</sup>International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10).
<sup>d</sup>ICD-10 codes are hierarchical, with a 5-letter level for a disease with familial history and a 3-letter level for general disease classification. In this study, we used diagnosis codes transformed to the 3-letter level.
<sup>e</sup>Detected outcomes in health insurance reviews (HIRA).
<sup>f</sup>Other non-deceased outcomes included ongoing patients, transferred, sent back, others, and discharged while alive.
<sup>g</sup>Deidentified electronic medical records (EMRs) from the University of California, San Francisco (UCSF) Medical Center (a tertiary-care university hospital).
<sup>h</sup>Counted based on MeSH terms (Medical Subject Headings, the National Library of Medicine’s controlled vocabulary) for traits including eye color and diseases such as asthma.
<sup>i</sup>All collected samples were of Western European ancestry.
<sup>j</sup>We analyzed participants with WES data to validate the phenotypic effect of the germline variant of interest.
Bold characters emphasize the numbers of the table.
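
The transformation of diagnosis codes to the 3-letter level described in the table notes can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `to_three_letter`, the sample codes in `records`, and the handling of the optional decimal point are all assumptions made for illustration.

```python
from collections import Counter


def to_three_letter(code: str) -> str:
    """Collapse an ICD-10 code to its 3-character category, e.g. 'G30.1' -> 'G30'.

    Assumption: codes may arrive with or without the decimal point and in
    mixed case, so both are normalized before truncation.
    """
    cleaned = code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"code too short to truncate: {code!r}")
    return cleaned[:3]


# Hypothetical diagnosis records, not taken from HIRA or UCSF data.
records = ["G30.1", "G309", "e11.9", "E11", "I10"]

# Map every record to its 3-letter category and count occurrences.
categories = [to_three_letter(c) for c in records]
counts = Counter(categories)

# The number of unique 3-letter categories corresponds to the
# "unique diagnosis codes with 3 letters" style of row in the table.
unique_categories = sorted(counts)
```

Counting unique values before and after truncation gives the two kinds of code counts the table reports for each dataset: all unique codes, and unique 3-letter categories.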