Journal of Biochemistry. 2021 May 13;170(3):399–410. doi: 10.1093/jb/mvab060

Japonica Array NEO with increased genome-wide coverage and abundant disease risk SNPs

Mika Sakurai-Yageta 1,2,#, Kazuki Kumada 1,2,#, Chinatsu Gocho 1, Satoshi Makino 1, Akira Uruno 1,3, Shu Tadaka 1, Ikuko N Motoike 1,4, Masae Kimura 1, Shin Ito 1,2, Akihito Otsuki 1,3, Akira Narita 1, Hisaaki Kudo 1, Yuichi Aoki 1,4, Inaho Danjoh 1, Jun Yasuda 1,2, Hiroshi Kawame 1, Naoko Minegishi 1,2, Seizo Koshiba 1,2, Nobuo Fuse 1,2,3, Gen Tamiya 1,2,3, Masayuki Yamamoto 1,2,3, Kengo Kinoshita 1,2,4
PMCID: PMC8510329  PMID: 34131746

Abstract

Ethnic-specific SNP arrays are becoming more important to increase the power of genome-wide association studies in diverse populations. In the Tohoku Medical Megabank Project, we have been developing a series of Japonica Arrays (JPA) for genotyping participants based on reference panels constructed from whole-genome sequence data of the Japanese population. Here, we designed a novel version of the SNP array for the Japanese population, called Japonica Array NEO (JPA NEO), comprising a total of 666,883 markers. Among them, 654,246 tag SNPs of autosomes and the X chromosome were selected from an expanded reference panel of 3,552 Japanese, 3.5KJPNv2, using pairwise r2 of linkage disequilibrium measures. Additionally, 28,298 markers were included for the evaluation of previously identified disease risk markers from the literature and databases, and those present in the Japanese population were extracted using the reference panel. Through genotyping 286 Japanese samples, we found that the imputation quality r2 and INFO score in the minor allele frequency bin >2.5–5% were >0.9 and >0.8, respectively, and >12 million markers were imputed with an INFO score >0.8. From these results, JPA NEO is a promising tool for genotyping the Japanese population with genome-wide coverage, contributing to the development of genetic risk scores.

Keywords: disease risk alleles, ethnic-specific SNP array, genome-wide coverage, genotype imputation, Tohoku Medical Megabank Project



The Tohoku Medical Megabank (TMM) Project was launched as part of reconstruction efforts following the Great East Japan Earthquake on March 11, 2011, and aims to establish a next-generation medical system for precision medicine and personalized healthcare (1). To accomplish the purpose, we have been conducting prospective genome cohort studies in connection with the establishment of an integrated biobank. Between 2013 and 2017, the Tohoku Medical Megabank Organization (ToMMo) and the Iwate Tohoku Medical Megabank Organization recruited 157,602 participants and conducted a baseline assessment, including the collection of biospecimens in Miyagi and Iwate Prefectures. The study population comprised two cohorts: the TMM Community-Based Cohort Study (TMM CommCohort Study) cohort, consisting of 84,073 adults (2), and the TMM Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study) cohort, consisting of 73,529 pregnant women and their family members (3).

We have performed genome/omics analyses within the TMM project and established an integrated biobank that includes biospecimens, health and clinical information and genome/omics data to develop a research infrastructure for genomic medicine (4). Taking advantage of the two abovementioned cohorts, we planned a strategy for genomic analysis as follows: development of a whole-genome reference panel using the TMM CommCohort, large-scale genotyping and genotype imputation of both cohorts, and collection of accurate haplotype information from the TMM BirThree Cohort. Based on this strategy, we first established an allele frequency panel called 1KJPN, which includes the whole-genome sequencing (WGS) data of 1,070 participants (5). The reference panel was sequentially expanded to the latest version, 3.5KJPNv2, which consists of 3,342 and 210 samples from the participants of the TMM project and other cohorts in western Japan, respectively (6). Based on the updated reference panel, we have developed and refined custom single-nucleotide polymorphism (SNP) arrays for genotyping all 157,602 participants, as described below.

The International HapMap Project and the 1000 Genomes Project have shown that human genomes comprise regions of extended linkage disequilibrium (LD) and limited haplotype diversity, depending on the population (7, 8), and that SNPs within such regions can be inferred from the genotypes of a smaller number of SNPs. Carlson et al. described algorithms for selecting the set of SNPs for genotyping (referred to as tag SNPs) based on r2, a widely used pairwise LD measure, using reference genome sequences from different populations (9). Using tag SNPs, untyped sites can be complemented by genotype imputation using a reference genome to increase the number of SNPs that can be used for further association studies (10, 11). In large-scale multiethnic studies, four kinds of ethnic-specific SNP arrays were first designed for European, East Asian, African American and Latino populations with simulations of genotype imputation (12, 13). Similarly, biobanks and/or cohort projects developed ethnic-specific SNP arrays, such as the UK Biobank Axiom Array (14), the Axiom-NL Array based on GoNL reference data in the Netherlands (15) and the Axiom Array for the Finnish population of the FinnGen project. In East Asia, ethnic-specific custom arrays were also developed by the Taiwan Biobank as the TWB Array, based on the Axiom Genome-Wide CHB 1 Array (16); by the Korean biobank as Axiom KoreanChip, based on 2,576 WGS data (17) and by the China Kadoorie Biobank as the Axiom China Kadoorie Biobank Array (18).

Most of these large-scale projects adopted the Axiom system because of its flexibility in array manufacturing, the highly automated assay process and the robust sample tracking with a 96-array layout. Concurrent with the trend towards developing ethnic arrays, we also selected the Axiom system to design a Japanese-specific SNP array [the Japonica Array (JPA)]. We designed the first version of the Japonica Array (JPAv1) in 2014 (19). JPAv1 contains tag SNPs selected by means of a statistical measure called ‘mutual information’ with minor allele frequency (MAF) ≥0.5% to cover rare variants from a reference panel comprising 1,070 Japanese genomes, and a number of characterized SNPs from the genome-wide association studies (GWAS) catalog plus some other databases. In 2017, we updated JPAv1 and developed the second version (JPAv2) by increasing nontag SNPs, such as those in the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions and on the Y chromosome, and by replacing markers that were not working in JPAv1. Genotyping of TMM participants was conducted primarily with JPAv2 until 2018.

To enhance direct genotyping of previously identified disease risk variants and to obtain maximum genomic coverage with the expanded whole-genome reference panel including nearly 4,000 Japanese individuals (6), we aimed to design a novel and substantially revised version of the JPA, which we call Japonica Array NEO (JPA NEO). In this paper, we describe how we have improved upon JPAv2 to create JPA NEO. Of the various improvements, one salient point is that we have changed the selection algorithm for tag SNPs from using the mutual information criteria to the global standard of using r2 of the LD measure, aiming to improve imputation accuracy and to standardize the data for use in meta-analyses conducted anywhere in the world. We also report the progress of genotyping TMM participants by using all three versions of the JPA.

Materials and Methods

Tag SNP selection for JPA NEO

A target set was constructed using founders in the repository of our new genome reference panel consisting of genomes from 3,820 Japanese participants (6). For the X chromosome, only females in the above panel (2,066 Japanese individuals) were used. Tag SNPs were selected by the standard greedy pairwise algorithm based on the pairwise r2 of LD statistics (9, 12, 13). Briefly, starting with a set of target sites with an MAF higher than a specific threshold, the one site with the maximum number of other sites exceeding the r2 threshold was selected. Then, this maximally informative site and all other associated sites were grouped as a bin of tag SNPs and removed from the target set. These steps were iterated until the total number of tag SNPs matched that of JPAv2. When multiple tag SNPs were selected in the same step, we prioritized them according to the following three criteria: (1) the maximum score of annotation by ANNOVAR (20) (exonic or splicing = 6, ncRNA = 5, 5′-UTR or 3′-UTR = 4, intronic = 3, upstream or downstream = 2, intergenic = 1 and no annotation = 0); (2) alternative and reference alleles that do not form an A-T or G-C pair and (3) yielding the maximum variance of base-pair positions.
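The greedy pairwise procedure described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the dictionary-based panel layout and the use of the squared correlation of 0/1/2 genotype dosages as the r2 statistic are assumptions made for illustration, and ties are broken by site ID rather than by the three annotation-based criteria.

```python
from itertools import combinations


def allele_frequency(genotypes):
    """Alternate-allele frequency from 0/1/2 genotype dosages."""
    return sum(genotypes) / (2 * len(genotypes))


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(panel, maf_min=0.01, r2_min=0.8, max_tags=None):
    """Return a list of (tag, bin_members) chosen by greedy pairwise binning.

    panel: dict mapping a site ID to its list of 0/1/2 genotypes
    (a hypothetical in-memory layout, not a real file format).
    """
    # Step 1: keep target sites whose MAF reaches the threshold.
    targets = {}
    for site, genotypes in panel.items():
        af = allele_frequency(genotypes)
        if min(af, 1 - af) >= maf_min:
            targets[site] = genotypes

    # Record which pairs of target sites reach the r2 threshold.
    partners = {site: set() for site in targets}
    for a, b in combinations(targets, 2):
        if r_squared(targets[a], targets[b]) >= r2_min:
            partners[a].add(b)
            partners[b].add(a)

    remaining = set(targets)
    bins = []
    while remaining and (max_tags is None or len(bins) < max_tags):
        # Step 2: pick the site linked to the most remaining sites.
        # Ties go to the smallest site ID (an assumption of this sketch).
        tag = max(sorted(remaining), key=lambda s: len(partners[s] & remaining))
        members = (partners[tag] & remaining) | {tag}
        bins.append((tag, sorted(members)))
        # Step 3: remove the whole bin from the target set and iterate.
        remaining -= members
    return bins
```

Calling `greedy_tag_snps(panel, maf_min=0.01, r2_min=0.8)` returns one tag per LD bin; the optional `max_tags` argument mimics stopping once a target number of tag SNPs is reached. The all-pairs comparison is quadratic in the number of sites, so a practical implementation would restrict comparisons to a window of nearby sites.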

Selection of disease-related markers for JPA NEO

Disease-related markers were selected primarily from published lists of disease-related genes and GWAS results of Japanese populations, with expert advice. In addition, markers in the NHGRI GWAS catalog (21) and the UK Biobank Array (14) were also selected. From the latter, we extracted markers present in the Japanese population by referring to the 3.5KJPNv2 panel.
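As a rough illustration of extracting markers present in the Japanese population, the following Python sketch keeps only candidate markers whose alternate allele is observed in a reference panel. The marker key `(chrom, pos, ref, alt)` and the `panel_allele_counts` dictionary are hypothetical structures chosen for this example and do not describe the format of 3.5KJPNv2.

```python
def markers_present_in_panel(candidates, panel_allele_counts):
    """Keep candidate markers whose alternate allele is seen in the panel.

    candidates: iterable of (chrom, pos, ref, alt) tuples.
    panel_allele_counts: dict mapping the same tuples to the alternate
    allele count in the reference panel (hypothetical layout).
    """
    return [m for m in candidates if panel_allele_counts.get(m, 0) > 0]
```

Markers absent from the dictionary, or listed with a count of zero, are treated as not present in the population and dropped.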

Development of JPA NEO

The list of tag SNPs and disease-related markers was combined with those of Y chromosome and mitochondrial markers. Based on the combined list, the array was produced using the Axiom myDesign service (Thermo Fisher Scientific, Inc.). Multiple probes were designed for markers that were not included in the Axiom™ validated probe sets. Then, control markers were added, and the total number of markers was adjusted to the maximum number for the Axiom 96-array layout. The full marker list and detailed list of disease-related SNPs are available at the jMorp website (https://jmorp.megabank.tohoku.ac.jp/downloads/#jpa).

DNA samples

Isolation and quality control (QC) of genomic DNA from blood and saliva samples in the TMM biobank were performed as described previously (22). Genomic DNA samples isolated from the blood of TMM participants, but not those included in the reference panel, were used to evaluate JPA NEO. The study was approved by the Research Ethics Committee of ToMMo, Tohoku University.

Genotyping with JPAs

A genotype assay was performed according to the manufacturer’s protocol (i.e. Axiom™ 2.0 Genotyping Assay User Manual for 8-plate workflow). Briefly, target DNA was enzymatically amplified and fragmented, and after confirmation of concentration and fragment length by NanoDrop (Thermo Fisher Scientific) and TapeStation System (Agilent Technologies), hybridization, ligation and scanning were processed by a semiautomated machine, GeneTitan™ Multi-Channel Instrument (Thermo Fisher Scientific). These processes were conducted using liquid-handling robots (Nimbus™, Hamilton; Biomek FXP, Beckman Coulter) and managed by a laboratory information management system (LIMS, LabVantage Solutions). For the QC, the dish quality control (DQC), sample QC call rate and plate pass rate were analysed using control markers for the Axiom platform (around 19,000) according to the Axiom™ Genotyping Solution Data Analysis Guide using Axiom™ Power Tools (APT, version 1.16.1). Genotyping data satisfying the criteria were used for the following QC analyses.

QC analysis and genotype imputation

Genotyping data were further analysed for SNP QC, sample QC and plate QC using all markers per plate, in accordance with the abovementioned analysis guide. After filtering out variant sites with low call rates, low MAF, or showing substantial deviation from Hardy–Weinberg equilibrium, SHAPEIT2 (23) and IMPUTE2 (24) were used to conduct prephasing and genotype imputation, respectively. The imputation accuracy was evaluated using the squared correlation, r2, with leave-one-out SNP masking methods (12, 13, 25). Briefly, genotype imputation was performed by masking an input SNP and the imputed SNP was compared with the masked one to obtain r2, after which the average r2 in each MAF bin was calculated. Another metric, the information measure (INFO score) given by IMPUTE2, was used to analyse the imputation quality for each marker, where the value 0–1 indicated the uncertainty about the imputed genotype (11).
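The averaging step of the leave-one-out evaluation can be sketched in Python as follows. The sketch assumes that masking and imputation have already been run elsewhere, so that each masked SNP is available as a tuple of its MAF, its true genotypes and its imputed dosages; the function names, the tuple layout and the bin boundaries are illustrative assumptions rather than the pipeline used here.

```python
def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true and imputed genotypes."""
    n = len(true_genotypes)
    mx = sum(true_genotypes) / n
    my = sum(imputed_dosages) / n
    sxx = sum((a - mx) ** 2 for a in true_genotypes)
    syy = sum((b - my) ** 2 for b in imputed_dosages)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation on one side: treat as uninformative
    sxy = sum((a - mx) * (b - my)
              for a, b in zip(true_genotypes, imputed_dosages))
    return sxy * sxy / (sxx * syy)


def mean_r2_by_maf_bin(masked_sites, bin_edges):
    """Average r2 of masked SNPs within each MAF bin.

    masked_sites: iterable of (maf, true_genotypes, imputed_dosages).
    bin_edges: ascending MAF boundaries; bin i covers
    (bin_edges[i], bin_edges[i + 1]].
    Returns one mean per bin, or None for an empty bin.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for maf, truth, dosage in masked_sites:
        for i in range(n_bins):
            if bin_edges[i] < maf <= bin_edges[i + 1]:
                sums[i] += squared_correlation(truth, dosage)
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]
```

With `bin_edges = [0.0, 0.025, 0.05, 0.5]`, for example, the second element of the result is the mean r2 for the MAF bin >2.5–5%.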

Results

Tag SNP selection for improved genome-wide coverage

In JPA NEO, our updated version of the JPA, we used the maximum number of markers on a single array of the Axiom 96-array layout, and the total of nearly 670,000 markers was divided into about 650,000 tag SNPs and tens of thousands of disease-related markers. The selection process of JPA NEO is essentially the same as that of previous versions of the JPA. However, we have selected these markers by using the latest version of our genome reference panel, which contains the genomes of 3,552 Japanese individuals (3.5KJPNv2) (6), about three times as many as the panel used for the previous versions (JPAv1 and JPAv2). Of note, while the previous two versions of the JPA used mutual information for tag SNP selection (19), in JPA NEO we decided to change the method for selecting tag SNPs to one based on the standard protocol using pairwise r2 (9). This has the advantage of allowing us to harmonize our data with those of other studies. We believe that it is of great importance to perform meta-analyses with other large-scale GWAS utilizing the same concept. A comparison of the design of JPA NEO with those of JPAv1 and JPAv2 is summarized in Table I.

Table I.

Overview of JPA design

| | JPAv1 (19) | JPAv2 | JPA NEO |
|---|---|---|---|
| Release date | December 01, 2014 | October 27, 2017 | September 17, 2019 |
| Tag SNPs: whole-genome reference panel of the Japanese population | 1KJPN (5) (n = 1,070) | 1KJPN (5) (n = 1,070) | 3.5KJPNv2 (6) (n = 3,552) [a] |
| Tag SNPs: method for selection | Mutual information (19) | Mutual information (19) | r2 of LD measures (9); r2 ≥ 0.8 |
| Tag SNPs: MAF threshold | ≥0.5% | ≥0.5% | ≥1% |
| Disease-related markers | NHGRI GWAS catalog (21) | NHGRI GWAS catalog (21) | Published GWAS data primarily of the Japanese population, published lists of disease genes, NHGRI GWAS catalog (21) and the UK Biobank Array (14) [b] |

[a] Including samples from outside the Tohoku region.

[b] Markers present in the Japanese population extracted by 3.5KJPNv2.

To optimize the selection of tag SNPs, we first selected tag SNPs from chromosome 10 of the 3.5KJPNv2 reference panel by using the greedy pairwise algorithm (9) with different combinations of MAF thresholds (≥0.005, ≥0.01 or ≥0.05) and thresholds of the pairwise r2 of LD measures (r2 ≥ 0.5 or ≥0.8). Two metrics were used to evaluate tag SNP performance: (1) genomic coverage, which is the proportion of untyped sites with at least one tag SNP with r2 greater than a given threshold and (2) the number of variants obtained by genotype imputation above the threshold of a given INFO score, which is an index of imputation accuracy. When tag SNPs were selected by pairwise r2 ≥ 0.8 and MAF ≥ 0.01, the genomic coverage with r2 ≥ 0.8 and the number of imputed variants from the 2KJPN reference panel (2,049 Japanese genomes) with INFO > 0.9 were better than or comparable to those of JPAv2 and Infinium Omni2.5-8 (Fig. 1). Based on these results, we decided to select tag SNPs with pairwise r2 of LD measures ≥0.8 and MAF ≥ 0.01 from the target set of autosomes and the X chromosome. For the design of JPA NEO, a substantial number (>1,000) of sex-chromosome SNPs in the two pseudoautosomal regions were newly selected, whereas only about 10 SNPs in these regions were available in JPAv1 and JPAv2.
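The genomic coverage metric defined above can be written down as a short Python sketch. As before, the dictionary-based panel, the function names and the use of the squared correlation of 0/1/2 dosages as r2 are assumptions for illustration; the sketch treats the threshold as inclusive (r2 ≥ threshold), matching the way thresholds are quoted for Fig. 1.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def genomic_coverage(panel, tag_ids, r2_min=0.8):
    """Proportion of untyped sites captured by at least one tag SNP.

    panel: dict mapping site ID to 0/1/2 genotypes (hypothetical layout).
    tag_ids: IDs of the sites placed on the array.
    """
    tag_set = set(tag_ids)
    tags = [panel[t] for t in tag_set]
    untyped = [g for site, g in panel.items() if site not in tag_set]
    if not untyped:
        return 1.0  # nothing left to cover
    covered = sum(
        1 for g in untyped
        if any(r_squared(g, t) >= r2_min for t in tags)
    )
    return covered / len(untyped)
```

Evaluating the same tag set at `r2_min` values of 0.2, 0.5 and 0.8 gives the three coverage levels compared in Fig. 1A.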

Fig. 1.

Evaluation of the performance of tag SNPs selected under different conditions. Tag SNPs were selected by different thresholds of pairwise r2 (0.5 or 0.8) and MAF (0.005, 0.01 or 0.05) from the target set of chromosome 10. The markers on JPAv2 or Infinium Omni2.5-8 (Omni 2.5) in the same chromosome were used as controls. (A) Genomic coverage was analysed with an r2 threshold of 0.2, 0.5 or 0.8. (B) The number of variants imputed with the 2KJPN haplotype reference panel at an INFO score threshold of 0.5 or 0.9.

We also selected Y chromosomal markers for the Y haplogroup classification of the International Society of Genetic Genealogy (26) and from those in JPAv1 and JPAv2, which were selected using preexisting Axiom arrays for Asian populations. Mitochondrial markers were extracted mainly from 3.5KJPNv2 by removing those with MAF < 0.5% as well as those with multiple alleles. Most markers corresponding to the HLA and KIR regions were taken over from those adopted for JPAv1 and JPAv2.

Selection of disease-related markers based on published evidence

For the selection of disease-related markers, we picked ∼9,000 markers present in the Japanese population, mainly from among published lists of disease genes and GWAS-identified risk variants. The former includes known and candidate functional variants on gene lists from the American College of Medical Genetics and Genomics (27) and 1,866 pharmacogenomics markers in 38 genes, 18 of which were obtained from drug guidelines published by the Clinical Pharmacogenetics Implementation Consortium as of April 2020 (28). The latter includes published risk variants for various complex diseases identified by GWAS of the Japanese population and a meta-analysis of East Asian populations. Representative examples are shown in Supplementary Table S1, which includes 99 markers (96 genes) of type 2 diabetes (29), 100 markers (94 genes) of lipid metabolism, 45 markers (35 genes) of obesity, as well as 12 markers (7 genes) and 33 markers (24 genes) of late-onset Alzheimer’s disease identified by GWAS of the Japanese population and meta-analyses of European populations, respectively.

Moreover, ∼13,000 and 12,000 markers were selected from the NHGRI GWAS catalog (21) and UK Biobank Array (14), respectively. We used reference panel 3.5KJPNv2 to extract the markers present in the Japanese population. The novel Axiom SNP array specific to the Japanese population was developed as JPA NEO.

JPA NEO has genome-wide coverage and contains disease risk SNPs

The developed JPA NEO contains a total of 666,883 markers; the number of markers in each category is shown in Table II in comparison with JPAv1/JPAv2. In JPA NEO, tag SNPs from autosomes and the X chromosome account for ∼98% of the markers (>650,000 SNPs). In addition, nearly 8,500 markers from the Y chromosome (779 markers), mitochondria (409 markers) and HLA and KIR regions (6,757 and 532 markers, respectively) were included to realize genome-wide coverage and genotyping of specific functional variants.

Table II.

Number of markers for each category of JPAv1, JPAv2 and JPA NEO

Number of markers [a]

| Category | JPAv1 | JPAv2 | JPA NEO |
|---|---|---|---|
| Tag SNPs including 22 autosomes and X chromosome | 638,269 | 632,186 | 654,246 |
| Y chromosome | 275 | 606 | 779 |
| Mitochondria | 70 | 104 | 409 |
| HLA | 3,906 | 6,914 | 6,757 |
| KIR | 412 | 1,014 | 532 |
| Disease-related markers: from the literature | | | 9,366 |
| Disease-related markers: from databases | 10,798 | 11,171 | 22,451 [b] |
| Total | 659,253 | 659,328 | 666,883 |

[a] Including markers present in multiple categories.

[b] Extracted markers present in Japanese population by 3.5KJPNv2.
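The proportions quoted for JPA NEO can be checked directly from the counts in Table II. The short Python calculation below uses only those published numbers; the variable names are arbitrary.

```python
# Marker counts for JPA NEO copied from Table II.
jpa_neo = {
    "tag_snps": 654_246,
    "y_chromosome": 779,
    "mitochondria": 409,
    "hla": 6_757,
    "kir": 532,
    "disease_literature": 9_366,
    "disease_databases": 22_451,
}
total_markers = 666_883  # total number of markers on the array

# Share of tag SNPs among all markers, in percent.
tag_share = 100 * jpa_neo["tag_snps"] / total_markers

# Y chromosome, mitochondrial, HLA and KIR markers combined.
special = sum(jpa_neo[k] for k in ("y_chromosome", "mitochondria", "hla", "kir"))

# Categories overlap, so their sum exceeds the array total.
category_sum = sum(jpa_neo.values())
overlap_excess = category_sum - total_markers

print(f"tag SNP share: {tag_share:.1f}%")
print(f"Y + mitochondria + HLA + KIR: {special:,}")
print(f"sum of categories: {category_sum:,} (excess over total: {overlap_excess:,})")
```

The category sum is larger than the array total because, as the table footnote states, some markers belong to more than one category.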

Although there is some overlap with the above SNPs, a total of 28,298 disease-related SNPs in 12 disease categories and pharmacogenomics are included as well (Table III). These SNPs include risk alleles for complex diseases, including dementia, depression and autism spectrum disorder among psycho-neurologic diseases (5,556 markers), type 2 diabetes and hyperlipidaemia among metabolic diseases (2,948 markers) and asthma and atopic dermatitis among immunological diseases (6,426 markers). In addition, variants related to physical traits (height, blood protein levels, etc.), expression quantitative trait locus, and so on are categorized as ‘others’.

Table III.

Summary of disease-related markers in JPA NEO

| Disease category | Number of markers [a] |
|---|---|
| Psycho-neurologic diseases | 5,556 |
| Cardiovascular diseases | 1,108 |
| Respiratory diseases | 2,000 |
| Metabolic diseases | 2,948 |
| Immunological diseases | 6,426 |
| Ophthalmological diseases | 441 |
| Mitochondrial diseases | 61 |
| Urological diseases | 367 |
| Gastrointestinal diseases | 452 |
| Gynecological diseases | 270 |
| Cancer | 986 |
| Inherited diseases | 943 |
| Pharmacogenomics | 1,866 |
| Other [b] | 10,345 |
| Total [c] | 28,298 |

[a] Some markers are included in multiple categories.

[b] Physical traits, expression quantitative trait locus, etc.

[c] Without overlap.

Of note, 3,472 markers (0.52%) in JPA NEO had MAF < 1% as confirmed by 3.5KJPNv2 (Supplementary Table S2). This is due to the adoption of some disease-related markers regardless of their MAF in 3.5KJPNv2. We have compiled the full list of disease-related markers with keywords and disease categories as Supplementary Table S3, which can be downloaded from the jMorp website (30).

High imputation performance of JPA NEO

We modified the tag SNPs for JPA NEO from the previous versions with the aim of improving the imputation coverage of the microarray. To verify this point, we analysed the performance of JPA NEO in comparison with that of JPAv2. To this end, the same 286 samples, which were not included in the 3.5KJPNv2 reference panel, were genotyped using both JPA NEO and JPAv2. We found that the median call rates of JPAv2 and JPA NEO for all markers per sample were >99.6% and 99.8%, respectively (Supplementary Table S4), indicating that the call rate of JPA NEO is slightly better than that of JPAv2.

More than 99% of markers were polymorphic in both JPAv2 and JPA NEO, as we intended (Table IV). Some microarrays are designed to cover a wide range of ethnicities, which is in contrast to the aim and scope of our Japanese-specific arrays. We hypothesized that the former type of microarrays may have lower performance compared with ethnic-specific ones. To address this point, we compared the performance of JPAv2 and the Infinium Asian Screening Array (ASA), which covers a wide range of Asian populations, including Japan, by using the genomes of 191 Japanese in the TMM cohorts. We found that >17% of markers were monomorphic in the ASA array, while >99% worked as polymorphic markers in JPAv2 (Table IV) with a median call rate of >99% for both arrays (Supplementary Table S4). This observation supports our contention that ethnic-specific microarrays are critical for analysing each ethnic population.

Table IV.

Numbers of polymorphic markers according to small-scale genotyping of Japanese individuals

| Sample set [a] | Array | Numbers of markers analysed | Numbers of polymorphic markers (%) |
|---|---|---|---|
| n = 286 | JPAv2 [b] | 643,417 | 639,137 (99.33) |
| n = 286 | JPA NEO [b] | 659,745 | 656,747 (99.55) |
| n = 191 | JPAv2 [b] | 652,920 | 647,995 (99.25) |
| n = 191 | ASA [c] | 645,991 | 532,647 (82.45) |

[a] Used the same DNA sample set for each comparative analysis.

[b] Used the markers within the recommended probe set list created during the QC analysis.

[c] Removed tri-allelic markers and markers for missing and overlapping positions.
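How a polymorphic-marker count such as those in Table IV might be derived from genotype calls is sketched below in Python. The 0/1/2 coding with `None` for no-calls, the function names and the rule that a marker is polymorphic when its observed MAF is above zero are assumptions for illustration, not a description of the analysis software used in this study.

```python
def minor_allele_frequency(calls):
    """MAF from genotype calls coded 0/1/2; None marks a no-call."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None  # marker could not be evaluated
    af = sum(observed) / (2 * len(observed))
    return min(af, 1 - af)


def polymorphic_summary(markers):
    """Return (markers analysed, polymorphic markers, percent polymorphic).

    markers: dict mapping marker ID to a list of calls, one per sample.
    """
    mafs = [minor_allele_frequency(calls) for calls in markers.values()]
    analysed = [m for m in mafs if m is not None]
    polymorphic = sum(1 for m in analysed if m > 0)
    percent = 100 * polymorphic / len(analysed) if analysed else 0.0
    return len(analysed), polymorphic, round(percent, 2)
```

The percentages in Table IV follow from the same ratio; for example, `round(100 * 656_747 / 659_745, 2)` gives 99.55 for JPA NEO.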

When we closely inspected the MAF distribution of JPA NEO in comparison with that of JPAv2, we noticed that JPAv2 contained relatively few markers with MAF of 15–25% compared with JPA NEO (Fig. 2). We suspect that this is due to the method used for selecting tag SNPs in JPAv2. Our new selection method has substantially improved the marker distribution in this MAF range.

Fig. 2.

MAF distributions from small-scale genotyping. The MAF distributions of (A) JPA NEO and (B) JPAv2 were obtained by genotyping 286 individuals and analysing 659,745 and 643,417 markers, respectively. The number of markers present in each MAF bin (0.01 interval) is shown.
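A binning of markers into 0.01-wide MAF intervals, as plotted in Fig. 2, can be sketched in Python as follows. The function name and the convention that each bin is closed on the left and open on the right are assumptions of this example.

```python
def maf_histogram(mafs, width=0.01, max_maf=0.5):
    """Count markers per MAF bin.

    Bin i covers [i * width, (i + 1) * width); a MAF equal to
    max_maf is placed in the last bin.
    """
    n_bins = round(max_maf / width)
    counts = [0] * n_bins
    for maf in mafs:
        # The small offset guards against floating-point results such as
        # 0.29 / 0.01 evaluating to just under 29.
        index = min(int(maf / width + 1e-9), n_bins - 1)
        counts[index] += 1
    return counts
```

With the default settings, `sum(counts[15:25])` gives the number of markers with MAF from 15% up to, but not including, 25%, the range in which the two arrays differ most.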

We performed genotype imputation of autosomes by using the haplotype reference panel of 3.5KJPNv2 and evaluated the imputation accuracy according to two metrics, imputation quality r2 and INFO score. The mean r2 and INFO score were >0.9 and >0.8, respectively, in the MAF bin >2.5–5% for both arrays (Fig. 3), indicating reliable imputation accuracy for both JPAv2 and JPA NEO. Importantly, however, we also noticed a marked decrease in mean r2 in the region over MAF 20% in JPAv2. Although the precise reason for this decrease remains to be clarified, the decrease is no longer observed in JPA NEO.

Fig. 3.

Imputation accuracy of JPA NEO compared with that of JPAv2. Imputation accuracy was measured by (A) the coefficient of determination, r2, and (B) the INFO score. Genotyping was performed for 286 individuals using both arrays, and genotype imputation was performed using the 3.5KJPNv2 haplotype reference panel. The mean values in each MAF bin are shown.

As shown in Table V, slightly but clearly more imputed markers with INFO > 0.8 were obtained from genotyping data by JPA NEO than by JPAv2, especially those with MAF < 1% (1.08-fold). We found that a total of >12 million markers were imputed in the small-scale analysis of each of the two arrays. These results indicate that while both JPA NEO and JPAv2 provide sufficient power for genotyping the Japanese population and subsequent genotype imputation, JPA NEO shows better imputation performance without any bias throughout MAF bins. Thus, we conclude that JPA NEO is the most reliable imputation array ever developed for the Japanese population.

Table V.

Number of imputed markers (INFO score > 0.8)

| Array (n = 286) | MAF < 1% | 1% ≤ MAF ≤ 5% | 5% < MAF |
|---|---|---|---|
| JPAv2 [a] | 3,605,463 | 2,237,278 | 6,583,308 |
| JPA NEO [a] | 3,919,446 | 2,316,257 | 6,639,510 |

[a] Used the same DNA sample set for genotyping.
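The totals and fold differences implied by Table V can be reproduced with a few lines of Python. Only the published counts are used; the dictionary keys are labels chosen for this example.

```python
# Imputed markers with INFO score > 0.8, copied from Table V.
imputed = {
    "JPAv2": {"rare": 3_605_463, "low": 2_237_278, "common": 6_583_308},
    "JPA NEO": {"rare": 3_919_446, "low": 2_316_257, "common": 6_639_510},
}

# Total imputed markers per array across the three MAF classes.
totals = {array: sum(bins.values()) for array, bins in imputed.items()}

# Ratio of JPA NEO to JPAv2 within each MAF class.
fold = {
    maf_class: imputed["JPA NEO"][maf_class] / imputed["JPAv2"][maf_class]
    for maf_class in imputed["JPAv2"]
}

for array, total in totals.items():
    print(f"{array}: {total:,} imputed markers")
for maf_class, ratio in fold.items():
    print(f"{maf_class}: {ratio:.3f}-fold")
```

Both totals exceed 12 million, and the ratio is largest for the MAF < 1% class (a little under 1.09), with progressively smaller gains in the two higher MAF classes.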

Large-scale genotyping by JPAs in the TMM project

To establish a solid research infrastructure for genomic medicine in Japan, the TMM project aimed to generate as much genotype data as possible from its 150,000 participants. To this end, we have been genotyping TMM cohort participants using the JPA since 2014. To complete such large-scale genotyping efficiently, we established an elaborate three-group system from sample selection to genotyping, which connects to the data qualification.

We prepared our own special workflow for the ToMMo analysis, which ensures efficient and reliable sample processing and supports high-throughput measurement (Fig. 4). The first step is preparing the target sample lists containing the thousands of participants corresponding to a specific purpose, such as the TMM CommCohort participants with respiratory function data. The selection of participants and availability of DNA samples or biospecimens are supported by LIMS at the TMM biobank (22). This step is conducted by the Center for Genome Platform Projects. The second step is extracting and dispensing the DNA into 96-well plates. To divide samples into individual plates in a well-ordered and formulated manner, the correspondence between sample identifier (ID) and well position is specified by creating the plate map before dispensing the DNA samples. This step is conducted by the Group of Biobank. The final step is transporting the DNA plates and plate maps to the genotyping facility attached to the TMM Biobank, which is operated using LIMS by the Group of Microarray-based Genotyping Analysis. For security control, different sample IDs were used for sample collection, storage and analysis (31).

Fig. 4.

Workflow for large-scale genotyping in the TMM project. Based on the plate maps created from the target sample lists, the DNA plates were prepared and transported to the genotyping facility.
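The idea of fixing the correspondence between sample ID and well position before dispensing can be illustrated with a small Python sketch. The column-wise fill order (A01, B01, ..., H01, A02, ...), the function name and the absence of any reserved control wells are assumptions made for this example; the actual plate layout used in the TMM project is not specified here.

```python
import string


def build_plate_maps(sample_ids, rows=8, cols=12):
    """Assign sample IDs to wells of consecutive 96-well plates.

    Returns a list of dicts, one per plate, mapping a well label such
    as "A01" to a sample ID. Wells are filled column by column.
    """
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample IDs must be unique")
    wells = [
        f"{string.ascii_uppercase[r]}{c:02d}"
        for c in range(1, cols + 1)
        for r in range(rows)
    ]
    plate_maps = []
    for start in range(0, len(sample_ids), len(wells)):
        chunk = sample_ids[start:start + len(wells)]
        plate_maps.append(dict(zip(wells, chunk)))
    return plate_maps
```

Creating the maps before any liquid handling means that each DNA plate can later be checked against its map when it arrives at the genotyping facility.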

Capitalizing on this workflow, in May 2020, we obtained JPA data of ∼130,000 participants who met the criteria for QC analysis using control markers. The dataset comprises ∼2,000 JPAv1, 101,000 JPAv2 and 27,000 JPA NEO data (Table VI). We have already analysed >63,000 samples from the TMM CommCohort by using JPAv2, whereas the TMM BirThree Cohort samples were analysed by either JPAv2 or JPA NEO. Considering further association analyses, we are in the process of designing a rigid protocol that would allow each family unit to be analysed by the same JPA.

Table VI.

Progress of genotyping TMM samples by JPA as of May 2020

Passed DQC and QC call rate/total [b] (% pass rate)

| Array | Blood samples | Saliva samples |
|---|---|---|
| JPAv1 [a] | 2,350/— | None |
| JPAv2 | 100,005/100,166 (99.84%) | 949/1,056 [c] (89.87%) |
| JPA NEO | 25,982/26,016 (99.87%) | 768/768 (100%) |

[a] Genotyping conducted by Toshiba Inc.

[b] Including samples analysed by multiple JPAs.

[c] Including one failed plate below the criteria of mean QC call rate of passed samples.
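The pass rates in Table VI and the approximate totals quoted in the text can be recomputed from the published counts, as in the following Python snippet. The variable names are arbitrary, and JPAv1 is included only through its number of passed samples because no total is given for it.

```python
# (passed, total) sample counts copied from Table VI.
counts = {
    ("JPAv2", "blood"): (100_005, 100_166),
    ("JPAv2", "saliva"): (949, 1_056),
    ("JPA NEO", "blood"): (25_982, 26_016),
    ("JPA NEO", "saliva"): (768, 768),
}
jpav1_passed = 2_350  # total not reported for JPAv1

# Pass rate in percent, rounded to two decimals as in the table.
pass_rate = {
    key: round(100 * passed / total, 2)
    for key, (passed, total) in counts.items()
}

# Passed samples per array and overall.
passed_by_array = {"JPAv1": jpav1_passed}
for (array, _source), (passed, _total) in counts.items():
    passed_by_array[array] = passed_by_array.get(array, 0) + passed
total_passed = sum(passed_by_array.values())

for key, rate in pass_rate.items():
    print(key, f"{rate}%")
print(passed_by_array, total_passed)
```

The per-array sums (about 2,000, 101,000 and 27,000) and the overall sum (about 130,000) agree with the figures given in the text. Because samples analysed by multiple JPAs are counted more than once, the number of distinct participants may be somewhat lower.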

We have been using DNA samples obtained primarily from peripheral or cord blood. When samples from these sources were not available, mostly for the children of TMM BirThree Cohort participants, DNA from saliva samples was used and analysed separately from blood-derived DNA. In our operation, the QC pass rate has been >99% for blood samples using both JPAv2 and JPA NEO. In contrast, that of saliva samples was ∼90% using JPAv2, likely due to the presence of lower-quality samples. We believe that with this accomplishment, JPA NEO now has enough control data of the resident population to be an important and useful array for the entire Japanese population.

Discussion

The TMM project is one of the first large-scale prospective genome cohort studies in Japan and aims to realize precision medicine and personalized healthcare. To construct genome research infrastructure, we had to consider a cost-effective and high-throughput strategy for the acquisition of genomic data of >150,000 participants. Based on previous studies on genomic variants in diverse populations (32), we recognized that commercial arrays for global or even Asian populations were not sufficient for our purpose. Therefore, we decided to develop a custom ethnic-specific SNP array, the JPA, to maximize the acquisition of polymorphic markers in the Japanese population and provide genomic coverage with reliable genotype imputation accuracy while reducing cost.

In the TMM project, the whole-genome reference panel was expanded from 1KJPN to 2KJPN and 3.5KJPN. The latest version, 3.5KJPNv2, not only contains an increased number of single-nucleotide variants but also includes those from the X chromosome and mitochondria (6). JPA NEO was designed by reselecting the tag SNPs of autosomes and the X chromosome from this panel. The haplotype reference panel for genotype imputation was also updated from 2KJPN to 3.5KJPNv2. This update to the imputation panel yielded an increase of >5 million imputed variants in the preliminary analysis of 335 samples using JPAv2 data compared with those obtained by genotype imputation with 2KJPN in the same sample analysis (data not shown). Thus, 3.5KJPNv2 is more effective than previous reference panels in providing genome-wide coverage in terms of both tag SNP selection and genotype imputation.

The genotype imputation performance of JPA NEO was evaluated in comparison with that of JPAv2 by performing a small-scale analysis. In the genotyping data obtained by both arrays, monomorphic markers were scarcely observed and a large number of variants were imputed with a high imputation quality r2 and INFO score. Of note, JPA NEO showed better statistics than JPAv2 without any bias, suggesting that JPA NEO is the best-ever SNP array developed for the Japanese population. The compatibility of markers in JPAv2 and JPA NEO is ∼40% (data not shown). Therefore, it seems important to develop a method for utilizing the genotyping data obtained by different JPA array platforms, which we plan to provide as a user guideline when the full data of all 150,000 TMM participants are released.

JPA NEO incorporates nearly 30,000 disease-related variants previously reported in the literature and stored in databases, to allow for the evaluation of known functional risk alleles in the Japanese population. Because some SNPs with MAF < 1% were included, their SNP cluster plots and the concordance with genotypes obtained by WGS analysis must be carefully assessed. However, qualified disease risk variants can be used for association studies along with phenotype data.

The JPAs have been used to perform large-scale genotyping of TMM samples. Although we have not encountered any issues in the plate QC assessments conducted so far, we plan to carefully implement batch-based as well as statistical genetic QC analyses to assess whether sample selection bias causes a plate effect. In the UK Biobank, a sample picking algorithm was used for genotyping experiments to prevent clustering of participants on the same plate by time or date of collection, collection centre, geography or participant phenotypes (33). In contrast, we did not intentionally randomize sample picking; we selected samples according to the aim of each analysis. For example, TMM CommCohort samples with respiratory function data for GWAS were selected and analysed on the same plates. Therefore, individual plates are expected to contain samples collected from the same periods, regions and families.
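
For readers unfamiliar with randomized sample picking, the idea can be sketched as a simple shuffle before plates are filled, followed by a tabulation of plate composition. This is not the UK Biobank algorithm, which is not reproduced here; the plate size of 96, the seed and all names are assumptions for illustration.

```python
import random
from collections import Counter


def assign_plates(samples, plate_size=96, seed=0):
    """Shuffle sample IDs, then fill plates in order, so that plate
    membership does not follow collection order."""
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i:i + plate_size]
            for i in range(0, len(shuffled), plate_size)]


def plate_composition(plates, centre_of):
    """Count samples per collection centre on each plate to spot clustering.

    centre_of maps sample ID -> collection centre (or any other attribute).
    """
    return [Counter(centre_of[s] for s in plate) for plate in plates]
```

The same `plate_composition` tabulation can be applied to non-randomized plates, such as those filled according to the aim of an analysis, to quantify how strongly an attribute is confounded with plate before testing for a plate effect.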

Among the ∼130,000 genotyping data sets of TMM participants that we have processed so far, the proportion of samples satisfying the criteria for sample DQC and QC call rate is high, especially for blood samples (>99.8%). The pass rate of saliva samples in JPAv2 was somewhat lower (>89.8%) than that of blood samples. Saliva has also been reported to yield a lower pass rate in other large-scale genotyping projects, for instance, 93.8% in the Genetic Epidemiology Research on Adult Health and Aging cohort (34). This may be due to the lower quality of saliva-derived DNA, which is sometimes observed as degradation on electrophoresis and likely arises from problems during sample collection by participants, such as when they mix the sample with the Oragene preservative solution. We share the direct and imputed genotyping data with the research community upon completion of the QC analyses and genotype imputation. More than 54,000 JPA data sets had already been released as of June 2020, together with associated data such as biochemical examinations and questionnaires. The full genotype data of the TMM project are expected to be released soon.
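
A pass rate of the kind reported above can be tabulated per sample source as in the following sketch. The record fields and the two threshold values are placeholders chosen for illustration; the criteria actually applied in this study are those given in the Methods.

```python
from collections import Counter

DQC_MIN = 0.82        # assumed DQC threshold, for illustration only
CALL_RATE_MIN = 0.97  # assumed QC call rate threshold, for illustration only


def pass_rates(samples, dqc_min=DQC_MIN, call_rate_min=CALL_RATE_MIN):
    """Return sample source -> fraction of samples passing both criteria.

    samples is an iterable of dicts with the hypothetical keys
    'source' (e.g. 'blood' or 'saliva'), 'dqc' and 'call_rate'.
    """
    total, passed = Counter(), Counter()
    for s in samples:
        total[s["source"]] += 1
        if s["dqc"] >= dqc_min and s["call_rate"] >= call_rate_min:
            passed[s["source"]] += 1
    return {src: passed[src] / total[src] for src in total}
```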

Data obtained with genotype imputation arrays have been successfully utilized for GWAS. Summary statistics of large-scale GWAS are valuable for the development of genetic risk scores, such as the polygenic risk score (PRS) (35). PRSs will be used to identify groups of individuals for therapeutic intervention, for the initiation and interpretation of disease screens and for life planning (36). So far, the number and scale of GWAS in the European population greatly exceed those in non-European populations (37). However, the applicability of PRSs based on European cohorts to other populations is limited by biases originating from genomic diversity among populations, for instance, differences in LD structure around causal variants. Further studies are required to develop PRSs for the Japanese population and to evaluate their clinical utility when used together with conventional clinical risk scores, as pointed out recently (38, 39).
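
In its simplest form, a PRS is a weighted sum of effect-allele dosages, with weights taken from GWAS summary statistics. The following is a minimal sketch of that calculation only; variant selection, LD adjustment and standardization are omitted, and the function name and variant IDs are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: variant ID -> effect-allele dosage (0..2, genotyped or imputed)
    weights: variant ID -> per-allele effect size from GWAS summary statistics
    Variants present in only one of the two inputs are ignored.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)
```

Because the weights are estimated in a particular population, the same function applied with European-derived weights to Japanese genotypes can give a less predictive score, which is the limitation noted above.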

We believe that our future efforts should be focussed on acquiring genotype data from all participants of the TMM cohorts and on conducting GWAS to develop and evaluate genetic risk scores, including PRSs, optimized for the Japanese population. Genomic data obtained by the TMM project will serve as an excellent control for GWAS executed using other biobanks/cohorts in Japan, and will also be exploited for GWAS of associated phenotypes and omics data from the TMM project. We also believe that the JPA should continue to be updated. For the next version, we are planning to design a medical checkup array with a minimal set of tag SNPs that nevertheless contains abundant risk SNPs. These efforts will also contribute to further identifying genetic determinants of diseases in individuals of East Asian ancestry (32).

In conclusion, we designed a new version of the JPA, JPA NEO, to improve both genome-wide coverage and genotyping of disease risk variants. Disease risk variants were selected from the literature and filtered using our reference panel to extract those expected to be present in the Japanese population. Experimental verification of JPA NEO showed better imputation performance than the previous version, without any bias across a wide range of MAF and with an increased number of imputed variants. Large-scale genotyping of TMM samples using JPA NEO is now underway. JPA NEO will provide highly accurate, efficient and cost-effective genotyping for the Japanese population. Combining the JPA data of TMM participants with those of other Japanese biobanks/cohorts will help to better understand the genetic risks of complex diseases, which in turn can be applied to disease risk prediction and prevention and, consequently, to personalized healthcare.

Author Contributions

K.Ku., C.G., S.M., A.U., S.T., I.N.M., A.O., A.N. and Y.A. performed the computational analyses. M.S.-Y., K.Ku., M.K., S.I., A.O. and H.Ku. prepared the samples and conducted the genotyping experiments. M.S.-Y., M.Y. and K.Ki. wrote the manuscript with assistance from the other authors. M.S.-Y., I.D., J.Y., H.Ka., N.M., S.K., N.F., G.T., M.Y. and K.Ki. conceived and supervised the project. All authors read and approved the final manuscript.

Supplementary Data

Supplementary Data are available at JB Online.

Supplementary Material

mvab060_Supplementary_Data

Acknowledgements

We would like to thank all participants of the TMM Study. We would also like to acknowledge everyone who assisted with the study, especially Sachiyo Sugimoto, Nanako Sugawara and Satoshi Souma for technical assistance, and Hiroaki Hashizume for fruitful discussion. The full list of ToMMo members is available at https://www.megabank.tohoku.ac.jp/english/a201201/. We would also like to thank the Iwate Medical University Iwate Tohoku Medical Megabank Organization for collaboration.

Funding

This work was supported by the following programs of the Japan Agency for Medical Research and Development (AMED) and the Ministry of Education, Culture, Sports, Science and Technology (MEXT): the Tohoku Medical Megabank Project [JP20km0105001 and JP20km0105002], the Advanced Genome Research and Bioinformatics Study to Facilitate Medical Innovation [GRIFIN; JP20km0405203] and the Facilitation of R&D Platform for the AMED Genome Medicine Support [JP20km0405001] of the Platform Program for Promotion of Genome Medicine (P3GM). This work was partially supported by the Center of Innovation (COI) Program [JPMJCE1303] from the MEXT and the Japan Science and Technology Agency (JST). The funding bodies played no role in the design of the study; in the collection, analysis and interpretation of data; or in writing the manuscript.

Conflict of Interest

The authors declare no conflicts of interest.

Abbreviations

ASA, Asian Screening Array
DQC, dish quality control
GWAS, genome-wide association study
HLA, human leukocyte antigen
JPA, Japonica Array
KIR, killer cell immunoglobulin-like receptor
LD, linkage disequilibrium
LIMS, laboratory information management system
MAF, minor allele frequency
PRS, polygenic risk score
QC, quality control
SNP, single-nucleotide polymorphism
TMM, Tohoku Medical Megabank
ToMMo, Tohoku Medical Megabank Organization
WGS, whole-genome sequencing

References

  • 1. Kuriyama S., Yaegashi N., Nagami F., Arai T., Kawaguchi Y., Osumi N., Sakaida M., Suzuki Y., Nakayama K., Hashizume H., Tamiya G., Kawame H., Suzuki K., Hozawa A., Nakaya N., Kikuya M., Metoki H., Tsuji I., Fuse N., Kiyomoto H., Sugawara J., Tsuboi A., Egawa S., Ito K., Chida K., Ishii T., Tomita H., Taki Y., Minegishi N., Ishii N., Yasuda J., Igarashi K., Shimizu R., Nagasaki M., Koshiba S., Kinoshita K., Ogishima S., Takai-Igarashi T., Tominaga T., Tanabe O., Ohuchi N., Shimosegawa T., Kure S., Tanaka H., Ito S., Hitomi J., Tanno K., Nakamura M., Ogasawara K., Kobayashi S., Sakata K., Satoh M., Shimizu A., Sasaki M., Endo R., Sobue K., Tohoku Medical Megabank Project Study Group T., Yamamoto M. (2016) The Tohoku Medical Megabank Project: design and Mission. J. Epidemiol. 26, 493–511 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Hozawa A., Tanno K., Nakaya N., Nakamura T., Tsuchiya N., Hirata T., Narita A., Kogure M., Nochioka K., Sasaki R., Takanashi N., Otsuka K., Sakata K., Kuriyama S., Kikuya M., Tanabe O., Sugawara J., Suzuki K., Suzuki Y., Kodama E.N., Fuse N., Kiyomoto H., Tomita H., Uruno A., Hamanaka Y., Metoki H., Ishikuro M., Obara T., Kobayashi T., Kitatani K., Takai-Igarashi T., Ogishima S., Satoh M., Ohmomo H., Tsuboi A., Egawa S., Ishii T., Ito K., Ito S., Taki Y., Minegishi N., Ishii N., Nagasaki M., Igarashi K., Koshiba S., Shimizu R., Tamiya G., Nakayama K., Motohashi H., Yasuda J., Shimizu A., Hachiya T., Shiwa Y., Tominaga T., Tanaka H., Oyama K., Tanaka R., Kawame H., Fukushima A., Ishigaki Y., Tokutomi T., Osumi N., Kobayashi T., Nagami F., Hashizume H., Arai T., Kawaguchi Y., Higuchi S., Sakaida M., Endo R., Nishizuka S., Tsuji I., Hitomi J., Nakamura M., Ogasawara K., Yaegashi N., Kinoshita K., Kure S., Sakai A., Kobayashi S., Sobue K., Sasaki M., Yamamoto M. (2020) Study profile of the Tohoku Medical Megabank Community-based Cohort Study. J. Epidemiol. 31, 65–76 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Kuriyama S., Metoki H., Kikuya M., Obara T., Ishikuro M., Yamanaka C., Nagai M., Matsubara H., Kobayashi T., Sugawara J., Tamiya G., Hozawa A., Nakaya N., Tsuchiya N., Nakamura T., Narita A., Kogure M., Hirata T., Tsuji I., Nagami F., Fuse N., Arai T., Kawaguchi Y., Higuchi S., Sakaida M., Suzuki Y., Osumi N., Nakayama K., Ito K., Egawa S., Chida K., Kodama E., Kiyomoto H., Ishii T., Tsuboi A., Tomita H., Taki Y., Kawame H., Suzuki K., Ishii N., Ogishima S., Mizuno S., Takai-Igarashi T., Minegishi N., Yasuda J., Igarashi K., Shimizu R., Nagasaki M., Tanabe O., Koshiba S., Hashizume H., Motohashi H., Tominaga T., Ito S., Tanno K., Sakata K., Shimizu A., Hitomi J., Sasaki M., Kinoshita K., Tanaka H., Kobayashi T., Tohoku Medical Megabank Project Study G., Kure S., Yaegashi N., Yamamoto M. (2020) Cohort Profile: Tohoku Medical Megabank Project Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study): rationale, progress and perspective. Int. J. Epidemiol. 49, 18–19m [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Fuse N., Sakurai-Yageta M., Katsuoka F., Danjoh I., Shimizu R., Tamiya G., Nagami F., Kawame H., Higuchi S., Kinoshita K., Kure S., Yamamoto M. (2019) Establishment of integrated biobank for precision medicine and personalized healthcare: the Tohoku Medical Megabank Project. JMA J. 2, 113–122 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Nagasaki M., Yasuda J., Katsuoka F., Nariai N., Kojima K., Kawai Y., Yamaguchi-Kabata Y., Yokozawa J., Danjoh I., Saito S., Sato Y., Mimori T., Tsuda K., Saito R., Pan X., Nishikawa S., Ito S., Kuroki Y., Tanabe O., Fuse N., Kuriyama S., Kiyomoto H., Hozawa A., Minegishi N., Douglas Engel J., Kinoshita K., Kure S., Yaegashi N., To M.J.R.P.P., Yamamoto M., and ToMMo Japanese Reference Panel Project (2015) Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat. Commun. 6, 8018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Tadaka S., Katsuoka F., Ueki M., Kojima K., Makino S., Saito S., Otsuki A., Gocho C., Sakurai-Yageta M., Danjoh I., Motoike I.N., Yamaguchi-Kabata Y., Shirota M., Koshiba S., Nagasaki M., Minegishi N., Hozawa A., Kuriyama S., Shimizu A., Yasuda J., Fuse N., Tohoku Medical Megabank Project Study G., Tamiya G., Yamamoto M., Kinoshita K. (2019) 3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome. Hum. Genome Var. 6, 28. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437, 1299–1320 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. 1000 Genomes Project Consortium, Abecasis G.R., Altshuler D., Auton A., Brooks L.D., Durbin R.M., Gibbs R.A., Hurles M.E., McVean G.A. (2010) A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Carlson C.S., Eberle M.A., Rieder M.J., Yi Q., Kruglyak L., Nickerson D.A. (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74, 106–120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Marchini J., Howie B., Myers S., McVean G., Donnelly P. (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 [DOI] [PubMed] [Google Scholar]
  • 11. Marchini J., Howie B. (2010) Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 [DOI] [PubMed] [Google Scholar]
  • 12. Hoffmann T.J., Kvale M.N., Hesselson S.E., Zhan Y., Aquino C., Cao Y., Cawley S., Chung E., Connell S., Eshragh J., Ewing M., Gollub J., Henderson M., Hubbell E., Iribarren C., Kaufman J., Lao R.Z., Lu Y., Ludwig D., Mathauda G.K., McGuire W., Mei G., Miles S., Purdy M.M., Quesenberry C., Ranatunga D., Rowell S., Sadler M., Shapero M.H., Shen L., Shenoy T.R., Smethurst D., Van den Eeden S.K., Walter L., Wan E., Wearley R., Webster T., Wen C.C., Weng L., Whitmer R.A., Williams A., Wong S.C., Zau C., Finn A., Schaefer C., Kwok P.Y., Risch N. (2011) Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array. Genomics 98, 79–89 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Hoffmann T.J., Zhan Y., Kvale M.N., Hesselson S.E., Gollub J., Iribarren C., Lu Y., Mei G., Purdy M.M., Quesenberry C., Rowell S., Shapero M.H., Smethurst D., Somkin C.P., Van den Eeden S.K., Walter L., Webster T., Whitmer R.A., Finn A., Schaefer C., Kwok P.Y., Risch N. (2011) Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race/ethnicity using imputation and a novel hybrid SNP selection algorithm. Genomics 98, 422–430 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O'Connell J., Cortes A., Welsh S., Young A., Effingham M., McVean G., Leslie S., Allen N., Donnelly P., Marchini J. (2018) The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Ehli E.A., Abdellaoui A., Fedko I.O., Grieser C., Nohzadeh-Malakshah S., Willemsen G., de Geus E.J., Boomsma D.I., Davies G.E., Hottenga J.J. (2017) A method to customize population-specific arrays for genome-wide association testing. Eur. J. Hum. Genet. 25, 267–270 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Chen C.H., Yang J.H., Chiang C.W.K., Hsiung C.N., Wu P.E., Chang L.C., Chu H.W., Chang J., Song I.W., Yang S.L., Chen Y.T., Liu F.T., Shen C.Y. (2016) Population structure of Han Chinese in the modern Taiwanese population based on 10,000 participants in the Taiwan Biobank project. Hum. Mol. Genet. 25, 5321–5331 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Moon S., Kim Y.J., Han S., Hwang M.Y., Shin D.M., Park M.Y., Lu Y., Yoon K., Jang H.M., Kim Y.K., Park T.J., Song D.S., Park J.K., Lee J.E., Kim B.J. (2019) The Korea Biobank Array: design and Identification of Coding Variants Associated with Blood Biochemical Traits. Sci. Rep. 9, 1382. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Dai J., Lv J., Zhu M., Wang Y., Qin N., Ma H., He Y.Q., Zhang R., Tan W., Fan J., Wang T., Zheng H., Sun Q., Wang L., Huang M., Ge Z., Yu C., Guo Y., Wang T.M., Wang J., Xu L., Wu W., Chen L., Bian Z., Walters R., Millwood I.Y., Li X.Z., Wang X., Hung R.J., Christiani D.C., Chen H., Wang M., Wang C., Jiang Y., Chen K., Chen Z., Jin G., Wu T., Lin D., Hu Z., Amos C.I., Wu C., Wei Q., Jia W.H., Li L., Shen H. (2019) Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir. Med. 7, 881–891 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Kawai Y., Mimori T., Kojima K., Nariai N., Danjoh I., Saito R., Yasuda J., Yamamoto M., Nagasaki M. (2015) Japonica Array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals. J. Hum. Genet. 60, 581–587 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Wang K., Li M., Hakonarson H. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. MacArthur J., Bowler E., Cerezo M., Gil L., Hall P., Hastings E., Junkins H., McMahon A., Milano A., Morales J., Pendlington Z.M., Welter D., Burdett T., Hindorff L., Flicek P., Cunningham F., Parkinson H. (2017) The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Minegishi N., Nishijima I., Nobukuni T., Kudo H., Ishida N., Terakawa T., Kumada K., Yamashita R., Katsuoka F., Ogishima S., Suzuki K., Sasaki M., Satoh M., Tohoku Medical Megabank Project Study G., Yamamoto M. (2019) Biobank establishment and sample management in the Tohoku Medical Megabank Project. Tohoku J. Exp. Med. 248, 45–55 [DOI] [PubMed] [Google Scholar]
  • 23. Delaneau O., Marchini J., Zagury J.F. (2011) A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 [DOI] [PubMed] [Google Scholar]
  • 24. Howie B.N., Donnelly P., Marchini J. (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Wojcik G.L., Fuchsberger C., Taliun D., Welch R., Martin A.R., Shringarpure S., Carlson C.S., Abecasis G., Kang H.M., Boehnke M., Bustamante C.D., Gignoux C.R., Kenny E.E. (2018) Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. G3 (Bethesda) 8, 3255–3267 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Jobling M.A., Tyler-Smith C. (2003) The human Y chromosome: an evolutionary marker comes of age. Nat. Rev. Genet. 4, 598–612 [DOI] [PubMed] [Google Scholar]
  • 27. Kalia S.S., Adelman K., Bale S.J., Chung W.K., Eng C., Evans J.P., Herman G.E., Hufnagel S.B., Klein T.E., Korf B.R., McKelvey K.D., Ormond K.E., Richards C.S., Vlangos C.N., Watson M., Martin C.L., Miller D.T. (2017) Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 249–255 [DOI] [PubMed] [Google Scholar]
  • 28. Caudle K.E., Dunnenberger H.M., Freimuth R.R., Peterson J.F., Burlison J.D., Whirl-Carrillo M., Scott S.A., Rehm H.L., Williams M.S., Klein T.E., Relling M.V., Hoffman J.M. (2017) Standardizing terms for clinical pharmacogenetic test results: consensus terms from the Clinical Pharmacogenetics Implementation Consortium (CPIC). Genet. Med. 19, 215–223 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Imamura M., Takahashi A., Yamauchi T., Hara K., Yasuda K., Grarup N., Zhao W., Wang X., Huerta-Chagoya A., Hu C., Moon S., Long J., Kwak S.H., Rasheed A., Saxena R., Ma R.C., Okada Y., Iwata M., Hosoe J., Shojima N., Iwasaki M., Fujita H., Suzuki K., Danesh J., Jorgensen T., Jorgensen M.E., Witte D.R., Brandslund I., Christensen C., Hansen T., Mercader J.M., Flannick J., Moreno-Macias H., Burtt N.P., Zhang R., Kim Y.J., Zheng W., Singh J.R., Tam C.H., Hirose H., Maegawa H., Ito C., Kaku K., Watada H., Tanaka Y., Tobe K., Kawamori R., Kubo M., Cho Y.S., Chan J.C., Sanghera D., Frossard P., Park K.S., Shu X.O., Kim B.J., Florez J.C., Tusie-Luna T., Jia W., Tai E.S., Pedersen O., Saleheen D., Maeda S., Kadowaki T. (2016) Genome-wide association studies in the Japanese population identify seven novel loci for type 2 diabetes. Nat. Commun. 7, 10531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Tadaka S., Saigusa D., Motoike I.N., Inoue J., Aoki Y., Shirota M., Koshiba S., Yamamoto M., Kinoshita K. (2018) jMorp: Japanese Multi Omics Reference Panel. Nucleic Acids Res. 46, D551–D557 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Takai-Igarashi T., Kinoshita K., Nagasaki M., Ogishima S., Nakamura N., Nagase S., Nagaie S., Saito T., Nagami F., Minegishi N., Suzuki Y., Suzuki K., Hashizume H., Kuriyama S., Hozawa A., Yaegashi N., Kure S., Tamiya G., Kawaguchi Y., Tanaka H., Yamamoto M. (2017) Security controls in an integrated Biobank to protect privacy in data sharing: rationale and study design. BMC Med. Inform. Decis. Mak. 17, 100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. 1000 Genomes Project Consortium, Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. (2015) A global reference for human genetic variation. Nature 526, 68–74 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Welsh S., Peakman T., Sheard S., Almond R. (2017) Comparison of DNA quantification methodology used in the DNA extraction protocol for the UK Biobank cohort. BMC Genomics 18, 26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Kvale M.N., Hesselson S., Hoffmann T.J., Cao Y., Chan D., Connell S., Croen L.A., Dispensa B.P., Eshragh J., Finn A., Gollub J., Iribarren C., Jorgenson E., Kushi L.H., Lao R., Lu Y., Ludwig D., Mathauda G.K., McGuire W.B., Mei G., Miles S., Mittman M., Patil M., Quesenberry C.P. Jr., Ranatunga D., Rowell S., Sadler M., Sakoda L.C., Shapero M., Shen L., Shenoy T., Smethurst D., Somkin C.P., Van Den Eeden S.K., Walter L., Wan E., Webster T., Whitmer R.A., Wong S., Zau C., Zhan Y., Schaefer C., Kwok P.Y., Risch N. (2015) Genotyping informatics and quality control for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort. Genetics 200, 1051–1060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Khera A.V., Chaffin M., Aragam K.G., Haas M.E., Roselli C., Choi S.H., Natarajan P., Lander E.S., Lubitz S.A., Ellinor P.T., Kathiresan S. (2018) Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Torkamani A., Wineinger N.E., Topol E.J. (2018) The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 [DOI] [PubMed] [Google Scholar]
  • 37. Gurdasani D., Barroso I., Zeggini E., Sandhu M.S. (2019) Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 20, 520–535 [DOI] [PubMed] [Google Scholar]
  • 38. Mosley J.D., Gupta D.K., Tan J., Yao J., Wells Q.S., Shaffer C.M., Kundu S., Robinson-Cohen C., Psaty B.M., Rich S.S., Post W.S., Guo X., Rotter J.I., Roden D.M., Gerszten R.E., Wang T.J. (2020) Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA 323, 627–635 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Elliott J., Bodinier B., Bond T.A., Chadeau-Hyam M., Evangelou E., Moons K.G.M., Dehghan A., Muller D.C., Elliott P., Tzoulaki I. (2020) Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA 323, 636–645 [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Journal of Biochemistry are provided here courtesy of Oxford University Press
