2022 Sep 16;12:389. doi: 10.1038/s41398-022-02144-0

Table 2.

Summary of the datasets used.

Datasets and features Frequency
Observation phase National inpatient/outpatient set of HIRAa
 Total patients 2,182,356
 Selected patientsb 763,892
 No. of men in the selected set 341,788 (44.7%)
 No. of women in the selected set 420,104 (55.2%)
 No. of diagnoses 13,459,583
 No. of unique diagnosis codes (ICD-10)c 5859 (5- to 3-letter level)
 Unique diagnosis high-level (3 letter)d 1151
 Mean age at diagnosis 54.19 (±23.26)
 Outcome of diagnoses
 Deceasede 43,545 (5.7%)
 Alivef 1,252,114
UCSF Medical Centerg (2012.01 to 2017.01)
 Total patients 6,852,000
 Total cases of diagnoses 44,545,038
 No. of unique diagnosis codes (ICD-10-CM) 29,893
 Unique diagnosis codes with 3 letters 1831
 Mean age at diagnosis 48.67 (±23.41)
 Outcome of diagnoses
 Deceased 2961
 Not deceased 143,996
 Pending 21,248,546
Discovery phase VARIMED (VARiant Informing MEDicine)
 No. of reviewed publications 10,331
 No. of SNPs (dbSNP IDs) 130,426 (129,890)
 No. of traits (disease/non-disease traits)h 4223 (1489/2374)
 No. of associations between SNPs and traits 135,410
NBB (Netherlands Brain Bank)
 Total patients (AD/non-AD)i 50 (AD 43/non-AD 7)
 Hippocampal formation (HF) samples 50
 Blood (BL) samples 50
 Mean deceased age (AD/non-AD deceased) 83.5±8/71.4±12.6
 Sex (AD deceased) Male = 14; Female = 29
 Sex (non-AD deceased) Male = 3; Female = 4
 Braak staging (AD/non-AD deceased) 5.02±1.1/0.71±0.48
UK Biobank
 No. of total participants 502,543
 Participants with whole-exome seq (WES)j 49,960

aThe Health Insurance Review and Assessment Service of Korea (HIRA). We utilized non-longitudinal sets consisting of randomly sampled in/outpatient sets built annually from 2009 to 2011 (www.hira.or.kr).

bTo minimize the re-enrollment of patients into the 2011 set from the 2009 and 2010 sets, we selected only deceased patients from the 2009 and 2010 sets. In addition, we excluded records of non-disease-related diagnoses, including injuries, poisoning, and childbirth, using diagnosis codes.

cInternational Statistical Classification of Disease and Related Health Problems 10th Revision (ICD-10).

dICD-10 codes are hierarchical, with a 5-letter level for a disease with familial history and a 3-letter level for general disease classification. In this study, we used diagnosis codes transformed to the 3-letter level.

eDetected outcomes in health insurance reviews (HIRA).

fOther non-deceased outcomes included ongoing patients, transferred, sent back, others, and discharged while alive.

gDeidentified electronic medical records (EMRs) from the University of California, San Francisco (UCSF) Medical Center (a tertiary-care university hospital).

hCounted based on MeSH terms (Medical Subject Headings, the National Library of Medicine’s controlled vocabulary) for traits including eye color and diseases such as asthma.

iAll collected samples were of Western European ancestry.

jWe analyzed participants with WES data to validate the phenotypic effect of the germline variant of interest.

Bold characters emphasize the numbers of the table.
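
The selection rule in footnote b and the code transformation in footnote d can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `Record` fields, the function names, and the choice of ICD-10 chapter letters to exclude (O for pregnancy and childbirth, S and T for injury and poisoning) are assumptions made for illustration; the source says only that such records were excluded using diagnosis codes.

```python
from dataclasses import dataclass

# Assumption: chapters treated as "non-disease-related" in this sketch.
# O = pregnancy, childbirth and the puerperium; S, T = injury and poisoning.
EXCLUDED_CHAPTER_LETTERS = {"O", "S", "T"}


@dataclass(frozen=True)
class Record:
    # Hypothetical field names; the real HIRA schema is not given in the source.
    patient_id: str
    year: int
    code: str
    deceased: bool


def to_three_char(code: str) -> str:
    """Truncate an ICD-10 code such as 'E11.9' to its 3-character category 'E11'."""
    return code.replace(".", "").strip().upper()[:3]


def select_records(records):
    """Apply the two rules described in the footnotes.

    1. From the 2009 and 2010 sets, keep only records of deceased patients;
       records from any other year (for example 2011) are kept regardless.
    2. Drop records whose code falls in an excluded chapter, and store the
       remaining codes at the 3-character level.
    """
    selected = []
    for r in records:
        if r.year in (2009, 2010) and not r.deceased:
            continue
        category = to_three_char(r.code)
        if not category or category[0] in EXCLUDED_CHAPTER_LETTERS:
            continue
        selected.append(Record(r.patient_id, r.year, category, r.deceased))
    return selected
```

Filtering on the first letter of the code is a deliberate simplification: it keeps the exclusion step readable, but a real analysis would use the exact code ranges chosen by the authors.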