Abstract
Background
Over 160 disease phenotypes have been mapped to the major histocompatibility complex (MHC) region on chromosome 6 by genome-wide association study (GWAS), suggesting that the MHC region as a whole may be involved in the etiology of many phenotypes, including unstudied diseases. The phenome-wide association study (PheWAS), a powerful and complementary approach to GWAS, has demonstrated its ability to discover and rediscover genetic associations. The objective of this study is to comprehensively investigate the MHC region by PheWAS to identify new phenotypes mapped to this genetically important region.
Methods
In the current study, we systematically explored the MHC region using PheWAS to associate 2692 MHC-linked variants (minor allele frequency ≥ 0.01) with 6221 phenotypes in a cohort of 7481 subjects from the Marshfield Clinic Personalized Medicine Research Project.
Results
Findings showed that expected associations previously identified by GWAS could be identified by PheWAS (e.g. psoriasis, ankylosing spondylitis, type I diabetes, and celiac disease) with some having strong cross-phenotype associations potentially driven by pleiotropic effects. Importantly, novel associations with 8 diseases not previously assessed by GWAS (e.g., lichen planus) were also identified and replicated in an independent population. Many of these associated diseases appear to be immune-related disorders. Further assessment of these diseases in 16,484 Marshfield Clinic twins suggests that some of these diseases, including lichen planus, may have genetic etiologies.
Conclusions
These results demonstrate that the PheWAS approach is a powerful and novel method to discover SNP-disease associations and is ideal for characterizing cross-phenotype associations. They further emphasize the importance of the MHC region in human health and disease.
Keywords: phenome-wide association study (PheWAS), genome-wide association study (GWAS), major histocompatibility complex, lichen planus, precision medicine
INTRODUCTION
Over the last decade, more than a thousand unique phenotypes have been associated with thousands of loci by genome-wide association study (GWAS).1 Interestingly, according to the National Human Genome Research Institute (NHGRI) GWAS Catalog, 2.5% of GWAS SNPs, some with potential pleiotropic properties, and 13.5% of phenotypes can be mapped to a 4 Mb region on chromosome 6p, encompassing the major histocompatibility complex (MHC) gene cluster. For example, age-related macular degeneration, drug-induced liver injury, and schizophrenia, along with many inflammatory and autoimmune conditions, such as ankylosing spondylitis, psoriasis, and type I diabetes, are mapped to the MHC region.1 In addition to these GWASs, the ImmunoChip Consortium has fine mapped approximately 20 autoimmune diseases to the MHC region,2 while others have studied this region by candidate gene association studies, gene expression studies, and protein structural variant analyses.3-5
The MHC region is characterized by human leukocyte antigen (HLA) class I and class II gene clusters. HLA genes encode proteins that modulate both innate and adaptive immune response. HLA class I proteins present epitopes from inside the cell (e.g., viruses) to identify cells targeted for cytotoxic T-cell digestion. Class I MHC molecules are presented as transmembrane glycoproteins that consist of two polypeptide chains, α and β2-microglobulin.6 Class II proteins present foreign antigens from outside the cell to stimulate helper T- and B-cells to activate the complement and antibody systems.7,8 Class II MHC molecules consist of α and β chains that are encoded by HLA-DP, -DQ, or -DR.9 The MHC region is one of the most polymorphic regions in the human genome. Strong genetic associations between the HLA variants and autoimmune disease have been established for many years. For example, HLA-B27 has been known to be the major susceptibility gene for ankylosing spondylitis, a complex disease that is characterized by inflammation and ankylosis. Variants in this gene are present in over 90% of ankylosing spondylitis patients. In another example, the major T1D susceptibility locus maps to the class II loci HLA-DRB1 and HLA-DQB1. Variants in this region may account for 30-50% of genetic T1D risk.10 In addition to HLA genes, numerous other genes in the MHC region, such as TNF, MICA, MICB, and MOG, also have apparent immunologic roles.11-13 The potential importance of this genetic locus in disease etiology is also highlighted by the significant proportion of conditions that are genetically mapped to these genes despite occupying a small proportion (only 0.1%) of the human genome.1 As such, the phenome-wide association study (PheWAS) design may be a powerful method to evaluate genetic variants in this important genetic region.
The PheWAS approach reverses the paradigm of GWAS by using a genotype-to-phenotype strategy to identify diseases that are associated with an individual genetic variant. A commonality of most PheWASs is the use of an electronic health record (EHR) to define a phenome, often relying on standardized International Classification of Diseases, version 9 (ICD9) codes to define disease status.14 A challenge with PheWAS is that some phenotypes may be individually rare. Regardless, the PheWAS approach has demonstrated its capacity to rediscover important genetic associations identified previously by GWAS, has the capacity to identify novel associations, and is ideal when characterizing cross-phenotype associations.15-20 This may be particularly relevant for HLA variants located in the human MHC region on chromosome 6. For example, the first proof-of-concept PheWAS focused on the HLA DRB1*1501 variant previously associated with multiple sclerosis (MS). This PheWAS was not only able to demonstrate the importance of this variant in MS, but was also able to identify a novel association with erythematosus conditions,15 an association that was subsequently confirmed in an independent PheWAS.17
With many phenotypes already mapped to the MHC region by GWAS, and with many more phenotypes yet to be studied by GWAS, we hypothesized that this region may contain additional genetic associations and that the PheWAS technique may be leveraged to identify such associations. To address this hypothesis, we conducted a comprehensive PheWAS of 2692 genetic variants across the MHC region spanning 4 Mbs on chromosome 6. PheWAS results not only confirmed many expected associations, but also identified many novel associations with immunologic diseases not yet assessed by GWAS.
MATERIALS AND METHODS
Ethics Statement
This study was approved by the Marshfield Clinic Institutional Review Board (approval number HEB10112). Written and informed consent was acquired for all participants.
Patient Population
Genotyped samples have been described elsewhere,21 and have been applied previously to PheWAS.14,22,23 Briefly, all genotyped individuals were self-identified white/non-Hispanic Marshfield Clinic patients recruited into the Personalized Medicine Research Project (PMRP). PMRP represents a homogenous population with 77% of participants claiming German ancestry. In this PheWAS, 7481 patients were used for discovery and 3887 patients were used for independent validation. Discovery set participants were all over age 40 (mean 59 years), have on average over 30 years of EHR data, and have been genotyped by Illumina HumanCoreExome BeadChip (San Diego, CA) as described below. Validation set participants were all over age 50 (mean 74 years) and had comparable years of EHR data.
In addition to PMRP, a cohort of 16,484 Marshfield Clinic twins, the Marshfield Clinic Twin Cohort (MCTC), was evaluated for disease concordance. Diseases studied in this population included 8 phenotypes associated with the MHC locus in replication studies. Information used to identify twins included shared last name, date of birth, home address, healthcare/billing account, and/or clinical documentation suggesting they were a twin. Individuals in MCTC are on average 30 years of age with 8.8 years of EHR data. Although zygosity information is unavailable in MCTC, we developed a method that assesses disease concordance rates in twins to study potential genetic diseases as described previously.22 Significance was measured by determining if a disease co-occurred in pairs of twins more frequently than by chance given the disease frequency in the cohort. Like the PheWAS, phenotypes were defined by ICD9 coding.
Genotyping
DNA samples from 7481 patients in the discovery set were genotyped by Illumina HumanCoreExome BeadChip. The Exome Chip consists of 569,645 variants across the genome. For the purpose of the current study, we analyzed SNPs from the MHC region that spans chr6: 29091311-33821793 (hg19). After filtering out poor quality and rare SNPs, 2692 variants remained with MAF ≥ 0.01, including 142 previously defined (2014.10.30) “GWAS significant” SNPs (p ≤ 5.0E-08).1 For validation, 281 SNPs genotyped on the Illumina Human660W-Quad BeadChip were used.
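The region and frequency filter described above can be illustrated with a minimal sketch. The `Variant` class, the function names, and the two `toy_` entries are hypothetical and exist only for illustration; the window coordinates and the MAF cutoff are taken from the text, and the positions and control MAFs of the two named SNPs are taken from Table 1. Quality filtering is not shown.

```python
from dataclasses import dataclass

# MHC window used in this study (hg19 coordinates on chromosome 6).
MHC_CHROM = "6"
MHC_START = 29_091_311
MHC_END = 33_821_793
MAF_MIN = 0.01  # variants below this minor allele frequency are dropped


@dataclass(frozen=True)
class Variant:
    rsid: str
    chrom: str
    pos: int    # hg19 base-pair position
    maf: float  # minor allele frequency in the cohort


def in_mhc_window(v: Variant) -> bool:
    """True when the variant lies inside the chr6 MHC window."""
    return v.chrom == MHC_CHROM and MHC_START <= v.pos <= MHC_END


def keep_variant(v: Variant) -> bool:
    """Apply the region filter and the MAF >= 0.01 filter."""
    return in_mhc_window(v) and v.maf >= MAF_MIN


# Toy input: the two "toy_" records are hypothetical.
variants = [
    Variant("rs1794275", "6", 32_671_248, 0.17),
    Variant("rs9264942", "6", 31_274_380, 0.35),
    Variant("toy_rare", "6", 30_000_000, 0.004),    # inside window, too rare
    Variant("toy_outside", "6", 34_500_000, 0.20),  # common, outside window
]
kept = [v.rsid for v in variants if keep_variant(v)]
print(kept)  # ['rs1794275', 'rs9264942']
```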
ICD9 Code and Phenome
The phenome was defined by patient EHR data as described previously.17,23,24 Briefly, ICD9 codes were used to define cases and controls at varying levels of phenotypic resolution using a roll-up strategy (e.g., ICD9 720.89 → 720.8* → 720.*). Patients coded for any one specific code (e.g., ICD9 720.89) became “cases” for that code, whereas those not coded for the specific code or related codes (e.g., ICD9 720.*) became “controls.” For common ICD9 codes (≥300 individuals), cases were defined by those coded two or more times (“rule-of-two”); those coded only once were not considered a case or a control. For less frequent ICD9 codes (<300 individuals), all individuals coded for that ICD9 code were designated as a case. As requested by Marshfield Clinic’s Institutional Review Board, case status was not defined for rare ICD9 codes (<9 individuals) to protect patient privacy as PMRP participants originate from a very specific region in Central Wisconsin. A total of 6221 phenotypes/ICD9 codes defined the phenome for this PheWAS.
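The case and control rules above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' pipeline: the function names and the toy records are hypothetical, "related codes" is assumed to mean any code in the same three-digit category, and each list entry is assumed to be one coding event.

```python
from collections import Counter

COMMON_MIN = 300  # codes carried by >= 300 individuals use the rule-of-two
RARE_MIN = 9      # codes carried by < 9 individuals are not analyzed


def rollup(code):
    """Return a code and its parents, e.g. 720.89 -> 720.89, 720.8, 720."""
    levels = [code]
    if "." in code:
        root, decimals = code.split(".", 1)
        for n in range(len(decimals) - 1, 0, -1):
            levels.append(f"{root}.{decimals[:n]}")
        levels.append(root)
    return levels


def code_counts(records):
    """Per patient, count coding events for every rolled-up code."""
    counts = {}
    for patient, codes in records.items():
        tally = Counter()
        for code in codes:
            tally.update(rollup(code))
        counts[patient] = tally
    return counts


def classify(counts, phenotype):
    """Label each patient 'case', 'control', or 'excluded' for one phenotype.

    Returns None when fewer than RARE_MIN individuals carry the code.
    """
    category = phenotype.split(".")[0]  # e.g. 720 for 720.89
    carriers = sum(1 for tally in counts.values() if tally[phenotype] > 0)
    if carriers < RARE_MIN:
        return None
    min_hits = 2 if carriers >= COMMON_MIN else 1
    status = {}
    for patient, tally in counts.items():
        if tally[phenotype] >= min_hits:
            status[patient] = "case"
        elif tally[category] > 0:
            status[patient] = "excluded"  # related code, or coded only once
        else:
            status[patient] = "control"
    return status


# Hypothetical toy records: ten patients coded 720.89, one coded 720.0,
# and five coded only for an unrelated category.
records = {f"p{i}": ["720.89"] for i in range(10)}
records["q0"] = ["720.0"]
records.update({f"c{i}": ["401.9", "401.9"] for i in range(5)})
status = classify(code_counts(records), "720.8")
print(Counter(status.values()))
```

In the toy data, the ten patients coded 720.89 roll up into cases for 720.8, the patient coded only 720.0 is neither case nor control, and the remaining five are controls.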
Statistics
Because some diseases represented rare phenotypes and some SNPs had low MAFs (<0.05), a Fisher’s exact test for allelic association was calculated with Plink v1.9 (http://pngu.mgh.harvard.edu/purcell/plink/).25 PheWAS was performed on 2692 SNPs including 142 “GWAS significant” SNPs. A suggestive p-value cutoff in the discovery set (p ≤ 5E-05) was used to identify associations to be assessed for independent replication. We further applied logistic regression analysis using sex and years of EHR data as potential covariates for all 6221 phenotypes (S1 and S2 Table). The rs1794275 genotype was also included as a potential covariate when further studying lichen planus. FDR was calculated.26 A meta-analysis was conducted using Fisher’s method in R i386 3.1.0 (http://www.R-project.org/)27 for SNP-disease pairs where phenotypic and genotypic data were available in the independent replication set. This analysis considered the direction of effect (i.e., ORs) in the two datasets. HLA classical haplotypes were imputed using the HIBAG R package.28,29 Associations between classic HLA haplotypes and lichen planus were analyzed using Fisher’s exact test that compared each haplotype to all other haplotypes.
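As an illustration of an allelic Fisher’s exact test, the following standard-library sketch computes a two-sided p-value and an odds ratio from a 2x2 table of minor and major allele counts. It is not Plink's implementation. The function names are hypothetical, the confidence interval uses the Woolf log-scale method as an assumption (the source does not state how intervals were derived), and the allele counts are reconstructed from the rounded MAFs reported for rs1794275 and lichen planus, so the output will only approximate the published values.

```python
from math import exp, lgamma, log, sqrt


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are cases and controls; columns are minor and major alleles.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of x minor alleles among cases.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    log_obs = log_prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= log_obs + 1e-7:  # tables no more likely than the observed one
            total += exp(lp)
    return min(1.0, total)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) 95% confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def allele_counts(n_subjects, maf):
    """Approximate (minor, major) allele counts from a rounded MAF."""
    total = 2 * n_subjects
    minor = round(maf * total)
    return minor, total - minor


# rs1794275 and lichen planus: 97 cases (MAF 0.34), 7321 controls (MAF 0.17).
case_minor, case_major = allele_counts(97, 0.34)
ctrl_minor, ctrl_major = allele_counts(7321, 0.17)
p_value = fisher_exact_two_sided(case_minor, case_major, ctrl_minor, ctrl_major)
or_est, ci_low, ci_high = odds_ratio_ci(case_minor, case_major, ctrl_minor, ctrl_major)
print(f"p = {p_value:.1e}, OR = {or_est:.1f} ({ci_low:.1f}-{ci_high:.1f})")
```

Working in log space avoids overflow in the binomial coefficients when the control group contributes many thousands of alleles.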
To characterize cross-phenotype associations in the MHC region, a Trait-based Association Test (TATES) was conducted for 2692 SNPs, including 142 “GWAS significant” SNPs. TATES analyses were conducted on ICD9 codes that define 12 broad disease categories and those that mapped to inflammatory phenotypes defined by PheCodes.30 For individual phenotypes, TATES combines the p-values obtained in a single marker test to arrive at a global p-value while correcting for observed correlational structure between phenotypes.20,31
RESULTS
PheWAS for “GWAS significant” SNPs
Our initial PheWAS focused on identifying novel associations for GWAS SNPs that mapped to the MHC region [chr6:29091311-33821793 (hg19)]. According to the NHGRI GWAS Catalog, 259 SNPs had “GWAS significant” associations (p ≤ 5.0E-08);1 142 of these SNPs were genotyped by Illumina HumanCoreExome BeadChip. Of these 142 SNPs, 9 were associated with 10 ICD9 codes representing 4 general phenotypes that passed a conservative Bonferroni correction (p ≤ 5.7E-08; assuming experimentwise α = 0.05, 142 SNPs, and 6221 phenotypes) in our discovery set (Table 1). Three of the 9 SNPs agreed with previous GWAS findings. For those that did not agree, most of the phenotypes were not adequately captured in the PheWAS either because of rarity or lack of specific ICD9 codes that described the expected phenotype. Rediscovered associations included those with celiac disease, psoriasis, and ankylosing spondylitis. For example, we replicated an association between rs2187668 and celiac disease (p = 1.9E-08, false discovery rate (FDR) = 0.0072).32,33 Rs3131296 and rs2071278, previously shown to be associated with schizophrenia34 and levels of complement factors C3 and C4,35 respectively, and in partial linkage disequilibrium (LD) with rs2187668 in the PMRP sample (r2 = 0.53 and 0.45, respectively), were also associated with celiac disease. Similar scenarios of multiple SNPs in LD sharing common associations were observed for other phenotypes (Table 1). In addition to these rediscoveries, we also identified an association between rs1794275 and lichen planus (p = 1.8E-08, FDR = 0.0071). Lichen planus is an inflammatory condition that can affect skin and mucous membranes and has not been studied by GWAS. This SNP was previously shown to be associated by GWAS with IgA nephropathy in an East Asian population.36 Furthermore, this SNP has been associated with multiple sclerosis, primary biliary cirrhosis, rheumatoid arthritis, and type I diabetes by the ImmunoChip Consortium.2
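The Bonferroni thresholds quoted in the Results follow directly from dividing α by the number of tests. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the counts of SNPs and phenotypes are those stated in the text.

```python
def bonferroni_threshold(alpha, n_snps, n_phenotypes=1):
    """Per-test p-value cutoff for n_snps * n_phenotypes tests."""
    return alpha / (n_snps * n_phenotypes)


thresholds = {
    "142 GWAS SNPs x 6221 phenotypes": bonferroni_threshold(0.05, 142, 6221),
    "2692 MHC SNPs x 6221 phenotypes": bonferroni_threshold(0.05, 2692, 6221),
    "2692 MHC SNPs, single phenotype": bonferroni_threshold(0.05, 2692),
}
for label, value in thresholds.items():
    print(f"{label}: p <= {value:.1e}")
# 142 GWAS SNPs x 6221 phenotypes: p <= 5.7e-08
# 2692 MHC SNPs x 6221 phenotypes: p <= 3.0e-09
# 2692 MHC SNPs, single phenotype: p <= 1.9e-05
```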
Table 1.
Significant PheWAS Associations (p ≤ 5.7E-08) with “GWAS Significant” SNPs.
| SNP | Position (bp, hg19) | Gene | Cases (MAF) | Controls (MAF) | OR (95% CI) | Fisher p-value | FDR | ICD9 Code | PheWAS Phenotype | GWAS Phenotype1 | Reported OR ranges1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs9264942 | 31274380 | HLA-C, HLA-B | 327 (0.46) | 6744 (0.35) | 1.6 (1.4-1.9) | 5.8E-09 | 0.0026 | 696 | psoriasis and similar disorders | Crohn's disease, HIV-1 control | 1.2-2.9, 5.3 |
| | | | 253 (0.48) | 7034 (0.35) | 1.7 (1.4-2.0) | 5.5E-09 | 0.0026 | 696.1 | other psoriasis | | |
| rs10484554 | 31274555 | HLA-C, HLA-B | 327 (0.25) | 6744 (0.14) | 2.0 (1.7-2.4) | 9.2E-13 | 1.5E-6 | 696 | psoriasis and similar disorders | psoriasis, AIDS progression | 2.8-4.7, NR |
| | | | 42 (0.40) | 7376 (0.14) | 4.2 (2.7-6.4) | 1.9E-09 | 0.001 | 696.0 | psoriatic arthropathy | | |
| | | | 253 (0.28) | 7034 (0.14) | 2.4 (1.9-2.9) | 2.1E-15 | 5.7E-9 | 696.1 | other psoriasis | | |
| rs4349859 | 31365787 | HLA-B, MICA | 35 (0.39) | 7383 (0.04) | 14.5 (8.9-23.6) | 1.5E-19 | 2.5E-12 | 720.0 | ankylosing spondylitis | ankylosing spondylitis | 40.8 (ref. 37) |
| | | | 180 (0.16) | 7238 (0.04) | 4.2 (3.2-5.7) | 8.7E-17 | 7.3E-10 | 720 | ankylosing spondylitis and other inflammatory spondylopathies | | |
| | | | 26 (0.40) | 7392 (0.04) | 15.6 (8.9-27.3) | 4.1E-16 | 1.7E-09 | 720.8 | other inflammatory spondylopathies | | |
| | | | 26 (0.40) | 7392 (0.04) | 15.6 (8.9-27.3) | 4.1E-16 | 1.7E-09 | 720.89 | other inflammatory spondylopathies | | |
| | | | 30 (0.37) | 7388 (0.04) | 13.3 (7.8-22.7) | 9.7E-16 | 3.2E-09 | 720.9 | unspecified inflammatory spondylopathy | | |
| rs4418214 | 31391401 | MICA, HCP5 | 253 (0.16) | 7034 (0.08) | 2.3 (1.8-2.9) | 8.0E-10 | 5.6E-04 | 696.1 | other psoriasis | HIV-1 susceptibility, HIV-1 control | 1.5, 4.4 |
| | | | 35 (0.41) | 7383 (0.08) | 8.2 (5.1-13.2) | 1.5E-14 | 3.6E-08 | 720.0 | ankylosing spondylitis | | |
| | | | 180 (0.19) | 7238 (0.08) | 2.6 (2.0-3.5) | 1.8E-10 | 1.5E-04 | 720 | ankylosing spondylitis and other inflammatory spondylopathies | | |
| | | | 26 (0.42) | 7392 (0.08) | 8.3 (4.6-14.7) | 1.2E-10 | 1.1E-04 | 720.8 | other inflammatory spondylopathies | | |
| | | | 26 (0.42) | 7392 (0.08) | 8.3 (4.6-14.7) | 1.2E-10 | 1.1E-04 | 720.89 | other inflammatory spondylopathies | | |
| | | | 30 (0.40) | 7388 (0.08) | 7.7 (4.6-13.0) | 5.8E-12 | 8.1E-06 | 720.9 | unspecified inflammatory spondylopathy | | |
| rs9368699 | 31802541 | C6orf48, SNORD48 | 253 (0.09) | 7034 (0.04) | 2.7 (1.9-3.6) | 3.3E-08 | 0.011 | 696.1 | other psoriasis | HIV-1 control | NR |
| rs2071278 | 32165444 | NOTCH4 | 47 (0.40) | 7371 (0.16) | 3.5 (2.3-5.2) | 1.8E-08 | 0.0071 | 579.0 | celiac disease | complement factor C3 and C4 levels | 0.1 |
| rs3131296 | 32172993 | NOTCH4 | 47 (0.38) | 7371 (0.14) | 3.8 (2.5-5.8) | 3.3E-09 | 0.0016 | 579.0 | celiac disease | schizophrenia | 1.2 |
| rs2187668 | 32605884 | HLA-DQA1 | 47 (0.34) | 7371 (0.12) | 3.8 (2.4-5.8) | 1.9E-08 | 0.0072 | 579.0 | celiac disease | celiac disease, systemic lupus erythematosus, nephropathy (idiopathic membranous), immunoglobulin A | 6.2-7.0, 2.2, 4.3, 2.5 |
| rs1794275 | 32671248 | HLA-DQB1, HLA-DQA2 | 97 (0.34) | 7321 (0.17) | 2.5 (1.8-3.4) | 1.8E-08 | 0.0071 | 697.0 | lichen planus | IgA nephropathy | 1.3 |
PheWAS, phenome-wide association study; GWAS, genome-wide association study; SNP, single nucleotide polymorphism; bp, base pair; MAF, minor allele frequency; OR, odds ratio; CI, confidence interval; ICD9, International Classification of Disease version 9; FDR, false discovery rate; NR, not reported.
Reported OR ranges are from the GWAS Catalog1 unless otherwise specified.
PheWAS across entire MHC region
To more comprehensively assess the MHC locus, we conducted PheWAS for all HumanCoreExome SNPs mapped to this region [minor allele frequency (MAF) ≥ 0.01, n = 2692 SNPs]. This analysis rediscovered statistically significant associations for psoriasis, ankylosing spondylitis, and type I diabetes (p ≤ 3.0E-09; assuming experimentwise α = 0.05, 2692 SNPs, and 6221 phenotypes). Using a suggestive p-value threshold (p ≤ 5.0E-05) in the discovery set, there were 1464 associations consisting of 895 SNPs and 425 phenotypes. Since many of these associations have FDR > 0.05, we expect some false positives (S1 Table). Among these 1464 associations, 470 SNP-disease pairs, consisting of 281 SNPs and 214 phenotypes, could be assessed in the independent validation set with available data. Of the 470 pairs, 64 SNP-disease pairs had suggestive evidence in the validation set (p ≤ 0.05). Of these 64, 58 SNP-disease pairs (91%), consisting of 44 SNPs and 23 phenotypes, demonstrated similar directions of effect in both datasets, indicating potential enrichment for true associations (S2 Table). Among the 23 disease phenotypes, 8 diseases have not been characterized by previous GWAS. These 8 diseases were associated with 16 SNPs (Table 2). Under this scenario, lichen planus again showed associations with MHC SNPs, including the GWAS SNP described previously (rs1794275). As also mentioned previously and further described below, lichen planus is an immune-related condition. Based on clinical descriptions, many other conditions may also have immunologic etiologies (Table 2).
Table 2.
Top genetic associations with MHC SNPs not previously characterized by GWAS.
| Phenotype | ICD9 code | SNP | Position (bp, hg19) | Discovery cases (MAF) | Discovery controls (MAF) | Discovery Fisher p-value | Discovery OR (95% CI) | Discovery FDR | HLA haplotype (p-value#) | Validation cases (MAF) | Validation controls (MAF) | Validation Fisher p-value | Validation OR (95% CI) | Meta-analysis p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unspecified histoplasmosis retinitis | 115.92 | rs3093983 | 31496925 | 12 (0.54) | 7406 (0.18) | 5.0E-05 | 5.3 (2.4-11.8) | 0.57 | HLA-DQB1*06:02 (6.7E-4) | 8 (0.56) | 3879 (0.19) | 0.0005 | 5.7 (2.1-15.2) | 1.2E-07 |
| | | rs3093978 | 31498497 | 12 (0.54) | 7406 (0.18) | 5.0E-05 | 5.3 (2.4-11.8) | 0.57 | | 8 (0.56) | 3879 (0.19) | 0.0005 | 5.6 (2.1-15.2) | 1.2E-07 |
| Hemangioma of intra-abdominal structures | 228.04 | rs3131003 | 31093482 | 31 (0.68) | 7387 (0.42) | 3.7E-05 | 2.9 (1.7-5.0) | 0.52 | HLA-C*12:03 (6.8E-4) | 25 (0.62) | 3862 (0.43) | 0.0078 | 2.2 (1.2-3.9) | 1.2E-06 |
| | | rs2523619 | 31318144 | 31 (0.45) | 7387 (0.20) | 5.1E-06 | 3.3 (2.0-5.5) | 0.23 | | 25 (0.36) | 3862 (0.21) | 0.011 | 2.1 (1.2-3.8) | 2.7E-07 |
| Pneumonia due to Staphylococcus | 482.4 | rs3129234 | 33111347 | 15 (0.57) | 7403 (0.23) | 4.3E-05 | 4.4 (2.1-9.0) | 0.54 | HLA-DPB1*03:01 (5.3E-4) | 6 (0.46) | 3881 (0.24) | 0.012 | 2.7 (1.2-6.0) | 2.1E-06 |
| | | rs3129214 | 33117258 | 15 (0.57) | 7403 (0.23) | 4.4E-05 | 4.4 (2.1-9.0) | 0.54 | | 6 (0.46) | 3881 (0.24) | 0.022 | 2.7 (1.2-6.0) | 3.9E-06 |
| | | rs756440 | 33122331 | 15 (0.57) | 7403 (0.23) | 4.3E-05 | 4.4 (2.1-9.0) | 0.54 | | 6 (0.46) | 3881 (0.24) | 0.011 | 2.7 (1.2-6.0) | 2.1E-06 |
| Lichen planus | 697.0 | rs12529049 | 32357715 | 97 (0.26) | 7321 (0.15) | 4.6E-05 | 2.0 (1.5-2.8) | 0.55 | HLA-DQB1*05:01 (8.0E-08) | 81 (0.22) | 3806 (0.14) | 0.0032 | 1.8 (1.2-2.6) | 6.5E-07 |
| | | rs4248166 | 32366421 | 97 (0.30) | 7321 (0.18) | 1.4E-05 | 2.0 (1.5-2.8) | 0.38 | | 81 (0.25) | 3806 (0.17) | 0.013 | 1.6 (1.1-2.3) | 8.3E-07 |
| | | rs13192471 | 32671103 | 97 (0.29) | 7321 (0.14) | 8.4E-08 | 2.5 (1.8-3.4) | 0.023 | | 81 (0.23) | 3806 (0.14) | 0.0038 | 1.8 (1.2-2.6) | 1.9E-09 |
| | | rs1794275 | 32671248 | 97 (0.34) | 7321 (0.17) | 1.8E-08 | 2.5 (1.8-3.4) | 0.0071 | | 81 (0.23) | 3806 (0.17) | 0.031 | 1.5 (1.0-2.2) | 3.3E-09 |
| | | rs2857106 | 32787570 | 97 (0.33) | 7321 (0.20) | 1.4E-05 | 2.0 (1.5-2.7) | 0.38 | | 81 (0.28) | 3806 (0.19) | 0.0051 | 1.7 (1.2-2.4) | 3.4E-07 |
| Dyshidrosis | 705.81 | rs2844697 | 30932309 | 374 (0.43) | 7044 (0.35) | 1.2E-05 | 1.4 (1.2-1.6) | 0.34 | HLA-B*35:01 (0.0030) | 146 (0.41) | 3741 (0.35) | 0.042 | 1.3 (1.0-1.6) | 2.1E-06 |
| Other and unspecified nonspecific immunological findings | 795.79 | rs3094165 | 29833541 | 215 (0.23) | 7012 (0.32) | 2.6E-05 | 0.6 (0.5-0.8) | 0.47 | HLA-DRB1*01:03 (0.0012) | 127 (0.26) | 3668 (0.32) | 0.044 | 0.7 (0.6-1.0) | 4.5E-06 |
| Infraspinatus (muscle) (tendon) sprain | 840.3 | rs13198118 | 30770732 | 10 (0.55) | 7408 (0.16) | 4.2E-05 | 6.4 (2.6-15.5) | 0.53 | HLA-DRB1*08:10 (0.0030) | 7 (0.36) | 3880 (0.16) | 0.034 | 3.0 (1.0-9.0) | 5.7E-06 |
| Contusion of wrist | 923.21 | rs9264942 | 31274380 | 264 (0.44) | 7154 (0.35) | 1.9E-05 | 1.5 (1.2-1.8) | 0.44 | HLA-C*03:03 (1.6E-4) | 98 (0.46) | 3789 (0.36) | 0.0076 | 1.5 (1.1-2.0) | 6.6E-07 |
MHC, major histocompatibility complex; SNP, single nucleotide polymorphism; GWAS, genome-wide association study; ICD9, International Classification of Disease version 9; bp, base pair; MAF, minor allele frequency; OR, odds ratio; CI, confidence interval; FDR, false discovery rate.
# Reported are the haplotypes with the minimum p-values.
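The meta-analysis column in Table 2 combines the discovery and validation p-values with Fisher’s method. The sketch below is a hedged illustration rather than the authors' R code. It assumes that "direction of effect" was handled by converting each two-sided p-value to a one-sided value (halving it when both odds ratios point the same way), which is an assumption not stated in the source. Under that assumption the results come out close to the tabulated values for the lichen planus SNPs. Function names are hypothetical.

```python
from math import exp, factorial, log


def fisher_combined(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has a
    closed form, so no statistics library is required.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # test statistic divided by 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))


def meta_p(p_disc, or_disc, p_val, or_val):
    """Combine two two-sided p-values while respecting effect direction.

    ASSUMPTION: one-sided p-values are used, taken in the direction of the
    discovery odds ratio.
    """
    same_direction = (or_disc > 1) == (or_val > 1)
    one_sided_disc = p_disc / 2
    one_sided_val = p_val / 2 if same_direction else 1 - p_val / 2
    return fisher_combined([one_sided_disc, one_sided_val])


# Lichen planus rows of Table 2 (p-values and ORs as reported).
print(f"rs13192471: {meta_p(8.4e-08, 2.5, 0.0038, 1.8):.1e}")  # 1.9e-09
print(f"rs1794275:  {meta_p(1.8e-08, 2.5, 0.031, 1.5):.1e}")   # 3.3e-09
```

A validation signal in the opposite direction is penalized under this scheme because its one-sided p-value approaches 1.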
To determine if any of the 8 diseases may have underlying genetic etiologies, 16,484 twins in the Marshfield Clinic Twin Cohort (MCTC) were assessed.22 The analysis considered the disease concordance rates in families of twins relative to the disease frequency in the population. In general, most diseases were rare in the twin cohort given that MCTC represents younger patients (average 30 years of age) with fewer years of EHR data (average 8.8 years) compared to patients in PMRP (average >59 years of age and >30 years of EHR data). Even with this limitation, three diseases had suggestive evidence (p < 0.05) of co-segregation in families of twins, suggesting that these conditions may be driven in part by genetics. With 46 affected twins and 1 family of disease-concordant twins, lichen planus approached significance (p = 0.062) (S3 Table). The most significant phenotype was dyshidrosis, a skin condition that results in small fluid-filled blisters often affecting the hands and can be associated with atopic dermatitis and other allergic conditions.38 In MCTC, 133 twins were affected with dyshidrosis, including seven families of disease-concordant twins (p = 1.6E-6).
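One way to read the concordance test is as a binomial tail probability: if disease status were independent within a pair, the chance that both twins are affected equals the squared cohort frequency. The sketch below rests on assumptions that are not confirmed by the source: every individual is treated as belonging to exactly one pair (16,484 / 2 = 8,242 pairs), and a "family of disease concordant twins" is treated as one concordant pair. The published method is described in reference 22. Under these assumptions the results come out close to the p-values reported above. The function name is hypothetical.

```python
from math import comb


def concordance_p(n_twins, n_affected, n_concordant_pairs):
    """P(at least the observed number of concordant pairs) under independence."""
    n_pairs = n_twins // 2           # ASSUMPTION: all individuals are paired
    freq = n_affected / n_twins      # disease frequency in the cohort
    p_pair = freq * freq             # chance both twins are affected
    # Sum the binomial probabilities below the observed count, then subtract.
    below = sum(
        comb(n_pairs, k) * p_pair ** k * (1 - p_pair) ** (n_pairs - k)
        for k in range(n_concordant_pairs)
    )
    return 1 - below


lichen_planus = concordance_p(16_484, 46, 1)
dyshidrosis = concordance_p(16_484, 133, 7)
print(f"lichen planus: p = {lichen_planus:.3f}")  # 0.062
print(f"dyshidrosis:   p = {dyshidrosis:.1e}")    # 1.6e-06
```

As the text notes, excess concordance of this kind cannot separate genetic effects from environment shared by twins.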
Lichen planus associations in the MHC region
Lichen planus is known to be T-cell mediated and can be caused by MHC-linked graft-versus-host disease from allogenic bone marrow transplantation.39 Since lichen planus was the most significant phenotype identified in the current study, but not previously studied by GWAS, we conducted follow-up analyses. To further assess lichen planus associations in the MHC region, logistic regression was conducted in the discovery set. In addition to the association with the top “GWAS significant” SNP rs1794275, lichen planus also showed associations with 5 other SNPs across the MHC region (p ≤ 1.9E-5; assuming experimentwise α = 0.05, 2692 SNPs) (Fig 1A). All 6 SNPs were common variants with similar odds ratios (ORs; 2.0-2.5) for the minor alleles. Four of the six associations were confirmed in the validation set (p < 0.05) (Table 3). However, LD analysis indicated that these SNPs were in partial or strong LD with the most significant SNP (rs1794275), indicating that these SNP associations were likely due to LD. Indeed, significance of these associations was diminished when the effect of rs1794275 was adjusted by logistic regression (Fig 1B). For follow-up and to provide potential functional insights, we assessed associations between classical HLA haplotypes and lichen planus. Haplotype HLA-DQB1*05:01 had the strongest association with lichen planus (p = 8.0E-08). Similar haplotype analyses were conducted for the 7 other novel phenotypes (Table 2 and S4 Table).
Figure 1. Manhattan plot for lichen planus across the MHC region.
(A) Fisher’s exact analysis and (B) logistic regression analysis. In Figure 1B, black data points represent adjustment for gender and years of EHR data. Grey data points represent adjustment for gender, years of EHR data, and rs1794275 genotype.
Table 3.
Top SNP associations with lichen planus.
| SNP | Position (bp, hg19) | Function | Gene | Amino acid change | Variation | Discovery case MAF (N=97) | Discovery control MAF (N=7321) | Discovery p-value | Discovery OR (95% CI) | Discovery FDR | Validation case MAF (N=81) | Validation control MAF (N=3806) | Validation p-value | Validation OR (95% CI) | Meta-analysis p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs6930777 | 32351566 | intergenic | C6orf10, HCG23 | - | C/T | 0.23 | 0.12 | 6.8E-06 | 2.3 (1.6-3.2) | 0.28 | - | - | - | - | - |
| rs4248166 | 32366421 | intronic | BTNL2 | - | T/C | 0.30 | 0.18 | 1.4E-05 | 2.0 (1.5-2.8) | 0.38 | 0.25 | 0.17 | 0.013 | 1.6 (1.1-2.3) | 8.3E-07 |
| rs1049056 | 32634369 | exonic | HLA-DQB1 | A6S | C/A | 0.32 | 0.16 | 1.8E-07 | 2.4 (1.8-3.2) | 0.03 | - | - | - | - | - |
| rs13192471 | 32671103 | intergenic | HLA-DQB1, HLA-DQA2 | - | T/C | 0.29 | 0.14 | 8.4E-08 | 2.5 (1.8-3.4) | 0.02 | 0.23 | 0.14 | 0.0038 | 1.8 (1.2-2.6) | 1.9E-09 |
| rs1794275 | 32671248 | intergenic | HLA-DQB1, HLA-DQA2 | - | G/A | 0.34 | 0.17 | 1.8E-08 | 2.5 (1.8-3.4) | 0.007 | 0.23 | 0.17 | 0.031 | 1.5 (1.0-2.2) | 3.3E-09 |
| rs2857106 | 32787570 | intergenic | HLA-DOB, TAP2 | - | T/C | 0.33 | 0.20 | 1.4E-05 | 2.0 (1.5-2.7) | 0.38 | 0.28 | 0.19 | 0.0051 | 1.7 (1.2-2.4) | 3.4E-07 |
SNP, single nucleotide polymorphism; bp, base pair; MAF, minor allele frequency; N: number of cases or controls; OR, odds ratio; CI, confidence interval; FDR, false discovery rate.
a p-values calculated using Fisher’s exact test.
Cross-phenotype association analysis of the MHC region
Because many disease-associated variants have been mapped to the MHC region by GWAS and now PheWAS, we characterized cross-phenotype associations for 2692 HumanCoreExome BeadChip MHC SNPs. Focus was on 358 ICD9 codes that define inflammatory phenotypes.30 Five SNPs were statistically significant for cross-phenotype associations (p ≤ 1.9E-05; assuming experimentwise α = 0.05, 2692 SNPs), including rs4349859, rs4418214, rs9391846, rs12175489, and rs2844505 (Fig 2). The top SNP with the most significant cross-phenotype associations was the GWAS SNP rs4349859 (p = 2.3E-14). Rs4349859 was most strongly associated with ankylosing spondylitis, as described previously (ICD9 720; p = 8.7E-17; Table 1), along with multiple arthritic phenotypes including rheumatoid arthritis (ICD9 714; p = 0.004), unspecified polyarthropathy or polyarthritis of ankle and foot (ICD9 716.57; p = 0.005), and localized osteoarthrosis of the pelvic region and thigh (ICD9 715.35; p = 0.008) (S5 Table). These results are not surprising given the observed genetic overlap between ankylosing spondylitis and rheumatoid arthritis.40 Lastly, rs4349859 was also associated with iridocyclitis (ICD9 364.02; p = 8.0E-06) (S5 Table), an inflammatory condition of the iris and ciliary body that can affect those diagnosed with ankylosing spondylitis.41 These significant results generated in a single experiment expand on observations of cross-phenotype associations for the MHC region described by many GWASs.
Figure 2. Results of cross-phenotype association analysis of the MHC region.
Grey data points represent “GWAS significant” SNPs and black data points represent non-GWAS significant SNPs across the MHC region.
DISCUSSION
Our study is the first comprehensive PheWAS to examine SNPs mapped to the MHC region. As described previously, the MHC region may be involved in the etiology of many diseases, including unstudied conditions. Diseases such as autoimmune, inflammatory, and malignant diseases are significantly more common among individuals carrying particular HLA alleles.10 Many of the genes in the MHC region are hypermutable which is fundamental for their function. This is particularly relevant for HLA genes involved in the induction and regulation of the immune responses. As such, the PheWAS may serve as a powerful tool for studying genetic variants in this important genetic locus. Our study confirmed significant associations between SNPs and disease phenotypes identified in previous GWASs,1 including ankylosing spondylitis, psoriasis, celiac disease, and type I diabetes; some with strong cross-phenotype associations that may be indicative of pleiotropy. Most importantly, we further demonstrated PheWAS’s capacity to discover suggestive SNP-disease associations for 8 diseases that have not been studied by GWAS. Multiple diseases, including lichen planus, are likely immune-related diseases, reflecting the importance of the MHC region in human immune genetics (S2 Table). Results from MCTC provide further support that these conditions may be influenced by genetic variation, although we cannot rule out shared environmental effects (S3 Table). Additional genetic studies focused on these diseases may be needed.
Lichen planus was the most significant disease associated with MHC SNPs not previously studied by GWAS.1 Our study revealed six SNPs, including one GWAS significant SNP (rs1794275), associated with lichen planus across a 400 kb region, including HLA-DQB1 (Fig 1, Table 3). These 6 SNPs have also been reported to be associated with multiple sclerosis, type I diabetes, and other immune diseases.2 Interestingly, there is evidence that individuals with multiple sclerosis and type I diabetes are at increased risk for lichen planus.42,43 Since many of these variants are in partial LD, we suspect that these SNPs probably tag for the same functional variant or haplotype. Haplotype analysis demonstrated HLA-DQB1*05:01 was strongly associated with lichen planus. Notably, HLA-DQB1*05:01:01 has been implicated in lichen planus previously by a candidate gene association study.44 However, other candidate gene studies have implicated other HLA haplotypes in lichen planus.45,46 Since individuals with lichen planus are at increased risk for carcinoma,47 in-depth future studies of the genetic etiology of lichen planus may be warranted.
Like GWAS, PheWAS is challenged by multiple comparison testing and frequently applies a Bonferroni adjustment.14 This study also included variants with MAF ≥0.01, including coding variants, as potential candidates. Variants with MAF <0.05 may have higher effect sizes compared to common variants, but if not, they may increase the burden to identify statistically significant associations. The ability to identify statistical significance is further limited by the inherent nature of the phenotype(s) being studied (e.g., heritability, polygenicity, case/control specificity, and sample size). In a PheWAS strategy, thousands of phenotypes can be studied simultaneously, but some individual diseases may be rare, have weak genetic etiologies, or be sex specific. Together, these challenges will influence power to detect associations. We attempted to account for these difficulties by applying conservative Fisher’s exact tests and Bonferroni adjustments when interpreting results. It is expected that larger cohorts linked to extensive phenotypic data, such as the anticipated “Precision Medicine Initiative” of over one million individuals,48 will be ideal for PheWAS to better assess the importance of thousands of phenotypes including rare and sex-specific diseases.
Unique to PheWAS is the potential correlative structure in the phenotypic data. In an ICD9-based PheWAS, similar ICD9 codes may be correlated. Furthermore, correlations exist across codes.14 Under this circumstance, a stringent Bonferroni correction may be overly conservative. Regardless, we did identify multiple associations that surpassed the conservative multiple testing threshold, but we suspect additional novel associations remain to be elucidated by follow-up PheWASs and disease-specific studies.
The MHC region brings unique challenges in genetic study not limited to PheWAS. Specifically, this locus includes extreme sequence diversity, structural variants, high gene density with considerable homologies, and substantial LD spanning Mbs of sequence.49 Due to these inherent qualities, characterizing the candidate genes and their presumed functional variants can be a significant challenge. These limitations also restrict SNP genotyping array design with many genotyped SNPs unable to fully address these complexities. It should be noted that other SNP arrays, including Illumina ImmunoChip,2 provide higher density genotyping for this region compared to the Exome Chip platform. In the future, new long range sequencing technologies,50 in combination with the PheWAS approach, may prove necessary to comprehensively study this biologically and clinically important region.
In conclusion, our results extend what multiple GWASs have shown: the MHC locus is an important region in human health and disease. Furthermore, this study builds on the growing evidence that the PheWAS technique may be a highly efficient method to detect novel SNP-disease associations and may be ideal when characterizing cross-phenotype associations (Fig 2). With patient cohorts expanding into the millions and linked to extensive phenotypic data, it will become feasible to conduct a “GWAS-by-PheWAS” in a single experiment where all diseases are studied at the GWAS level and all variants are studied at the PheWAS level. Such strategies may lead to significant advancements in precision medicine.
Supplementary Material
S1 Table. Data for 1464 SNP-disease associations consisting of 895 SNPs and 425 phenotypes in the discovery set (p-value ≤ 5.0E-05).
Included are association results and functional annotation. Abbreviations: bp, base pair; MAF, minor allele frequency; OR, odds ratio; CI, confidence interval.
S2 Table. Data for 470 significant discovery set SNP-disease pairs with available genotypic and phenotypic data in the independent replication set. Included are association results and functional annotation. GWAS-identified diseases (GWAS_Diseases) are defined as either Y, N, or AMBIGOUS. Y indicates that this disease has been studied by previous GWAS. N indicates that this disease has not been studied by previous GWAS. AMBIGOUS indicates that this disease has not been studied by previous GWAS, but some other disease that might be related to this disease has been studied by GWAS and therefore, the relationship between the studied GWAS disease and the phenotype that the ICD9 code stands for is not clear. Additional abbreviations: bp, base pair; MAF, minor allele frequency; OR, odds ratio; CI, confidence interval; and meta-analysis p-value, meta-analysis p-value calculated using Fisher’s method.
S3 Table. Co-occurrence of 8 novel disease phenotypes in the MCTC.
S4 Table. Complete HLA haplotypes analyses of 8 novel disease phenotypes.
S5 Table. SNPs with statistically significant cross-phenotype associations. Included are pairwise associations for those SNPs and ICD9 codes that bin within musculoskeletal, digestive, endocrine-metabolic, dermatologic, or inflammatory disease categories.
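The S2 Table caption refers to a meta-analysis p-value calculated using Fisher's method. As a minimal sketch of that technique (not the authors' code; the function name and the example p-values are hypothetical), the following Python combines independent p-values, such as one from a discovery set and one from an independent replication set.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom the
    upper-tail probability has the closed form used below.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_stat = -sum(log(p) for p in p_values)  # equals chi-square / 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))


# Hypothetical discovery and replication p-values for one SNP-disease pair.
combined = fisher_combined_p([1.0e-5, 3.0e-3])
print(f"combined p = {combined:.3e}")
```

The closed form avoids any dependency on a statistics library; it is valid only because the chi-square degrees of freedom are always even in Fisher's method, and it assumes the combined tests are independent.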
ACKNOWLEDGEMENTS
The authors would like to thank Rachel Stankowski for her assistance in editing this manuscript and Dr. Joshua Denny for providing the most recent version of their PheWAS R-package.
FUNDING
This work was supported by the generous donors of the Marshfield Clinic, National Institutes of Health National Center for Advancing Translational Sciences grant number UL1TR000427, National Human Genome Research Institute grant number 1U01HG006389, National Institute of General Medical Sciences grant number 1R01GM114128, and National Library of Medicine grant number 1K22LM011938.
Footnotes
COMPETING INTERESTS
None declared.
CONTRIBUTORS
SJH and JL designed the study and SJH oversaw all aspects of the study. JL performed laboratory experiments. JL, JGM, BAH, and ZY generated and analyzed association data. CG, LR, and CC provided clinical interpretations. SSK, XZ, TM and KT provided expertise in haplotype analysis. MHB and SJH provided material support. JL and SJH wrote the manuscript with input from all other authors.
REFERENCES
- 1.Hindorff LAMJ, Morales J, Junkins HA, Hall PN, Klemm AK, Manolio TA. A Catalog of Published Genome-Wide Association Studies. Available at: www.genome.gov/gwastudies.
- 2.ImmunoBase. Available at: https://www.immunobase.org/page/Welcome/display.
- 3.Parkes M, Cortes A, van Heel DA, Brown MA. Genetic insights into common pathways and complex relationships among immune-mediated diseases. Nat Rev Genet. 2013;14:661–73. doi: 10.1038/nrg3502.
- 4.Hanna S, Etzioni A. MHC class I and II deficiencies. J Allergy Clin Immunol. 2014;134:269–75. doi: 10.1016/j.jaci.2014.06.001.
- 5.Jones EY, Fugger L, Strominger JL, Siebold C. MHC class II proteins and disease: a structural perspective. Nat Rev Immunol. 2006;6:271–82. doi: 10.1038/nri1805.
- 6.Halenius A, Gerke C, Hengel H. Classical and non-classical MHC I molecule manipulation by human cytomegalovirus: so many targets–but how many arrows in the quiver? Cell Mol Immunol. 2015;12:139–53. doi: 10.1038/cmi.2014.105.
- 7.Grifoni A, Montesano C, Colizzi V, Amicosante M. Key role of human leukocyte antigen in modulating human immunodeficiency virus progression: an overview of the possible applications. World J Virol. 2015;4:124–33. doi: 10.5501/wjv.v4.i2.124.
- 8.Trivedi VB, Dave AP, Dave JM, Patel BC. Human leukocyte antigen and its role in transplantation biology. Transplant Proc. 2007;39:688–93. doi: 10.1016/j.transproceed.2007.01.066.
- 9.Turner D. The human leucocyte antigen (HLA) system. Vox Sang. 2004;87:87–90. doi: 10.1111/j.1741-6892.2004.00438.x.
- 10.Mosaad YM. Clinical role of human leukocyte antigen in health and disease. Scand J Immunol. 2015;82:283–306. doi: 10.1111/sji.12329.
- 11.Silke J, Rickard JA, Gerlic M. The diverse role of RIP kinases in necroptosis and inflammation. Nat Immunol. 2015;16:689–97. doi: 10.1038/ni.3206.
- 12.Lanier LL. NKG2D receptor and its ligands in host defense. Cancer Immunol Res. 2015;3:575–82. doi: 10.1158/2326-6066.CIR-15-0098.
- 13.Reindl M, Di Pauli F, Rostasy K, Berger T. The spectrum of MOG autoantibody-associated demyelinating diseases. Nat Rev Neurol. 2013;9:455–61. doi: 10.1038/nrneurol.2013.118.
- 14.Hebbring SJ. The challenges, advantages and future of phenome-wide association studies. Immunology. 2014;141:157–65. doi: 10.1111/imm.12195.
- 15.Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205–10. doi: 10.1093/bioinformatics/btq126.
- 16.Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L, Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM, Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA, Masys DR, Roden DM, de Andrade M. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet. 2011;89:529–42. doi: 10.1016/j.ajhg.2011.09.008.
- 17.Hebbring SJ, Schrodi SJ, Ye Z, Zhou Z, Page D, Brilliant MH. A PheWAS approach in studying HLA-DRB1*1501. Genes Immun. 2013;14:187–91. doi: 10.1038/gene.2013.2.
- 18.Pendergrass SA, Brown-Gentry K, Dudek S, Frase A, Torstenson ES, Goodloe R, Ambite JL, Avery CL, Buyske S, Bůžková P, Deelman E, Fesinmeyer MD, Haiman CA, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Kooperberg C, Le Marchand L, Lin Y, Matise TC, Monroe KR, Moreland L, Park SL, Reiner A, Wallace R, Wilkens LR, Crawford DC, Ritchie MD. Phenome-wide association study (PheWAS) for detection of pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network. PLoS Genet. 2013;9:e1003087. doi: 10.1371/journal.pgen.1003087.
- 19.Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) QRS Group. Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013;127:1377–85. doi: 10.1161/CIRCULATIONAHA.112.000604.
- 20.Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14:483–95. doi: 10.1038/nrg3461.
- 21.McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, Li R, Masys DR, Ritchie MD, Roden DM, Struewing JP, Wolf WA, eMERGE Team The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4:13. doi: 10.1186/1755-8794-4-13.
- 22.Mayer J, Kitchner T, Ye Z, Zhou Z, He M, Schrodi SJ, Hebbring SJ. Use of an electronic medical record to create the marshfield clinic twin/multiple birth cohort. Genet Epidemiol. 2014;38:692–8. doi: 10.1002/gepi.21855.
- 23.Ye Z, Mayer J, Ivacic L, Zhou Z, He M, Schrodi SJ, Page D, Brilliant MH, Hebbring SJ. Phenome-wide association studies (PheWASs) for functional variants. Eur J Hum Genet. 2015;23:523–9. doi: 10.1038/ejhg.2014.123.
- 24.Hebbring SJ, Rastegar-Mojarad M, Ye Z, Mayer J, Jacobson C, Lin S. Application of clinical text data for phenome-wide association studies (PheWASs). Bioinformatics. 2015;31:1981–7. doi: 10.1093/bioinformatics/btv076.
- 25.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. doi: 10.1086/519795.
- 26.Benjamini YH, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995;57:289–300.
- 27.R Development Core Team . A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2011. R Foundation for Statistical Computing. ISBN: 3-900051-07-0. Available at: http://www.R-project.org/
- 28.Khor SS, Yang W, Kawashima M, Kamitsuji S, Zheng X, Nishida N, Sawai H, Toyoda H, Miyagawa T, Honda M, Kamatani N, Tokunaga K. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references. Pharmacogenomics J. 2015;15:530–7. doi: 10.1038/tpj.2015.4.
- 29.Zheng X, Shen J, Cox C, Wakefield JC, Ehm MG, Nelson MR, Weir BS. HIBAG--HLA genotype imputation with attribute bagging. Pharmacogenomics J. 2014;14:192–200. doi: 10.1038/tpj.2013.18.
- 30.Carroll RJ, Bastarache L, Denny JC. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014;30:2375–6. doi: 10.1093/bioinformatics/btu197.
- 31.van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9 doi: 10.1371/journal.pgen.1003235.
- 32.Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Adány R, Aromaa A, Bardella MT, van den Berg LH, Bockett NA, de la Concha EG, Dema B, Fehrmann RS, Fernández-Arquero M, Fiatal S, Grandone E, Green PM, Groen HJ, Gwilliam R, Houwen RH, Hunt SE, Kaukinen K, Kelleher D, Korponay-Szabo I, Kurppa K, MacMathuna P, Mäki M, Mazzilli MC, McCann OT, Mearin ML, Mein CA, Mirza MM, Mistry V, Mora B, Morley KI, Mulder CJ, Murray JA, Núñez C, Oosterom E, Ophoff RA, Polanco I, Peltonen NL, Platteel M, Rubak A, Salomaa V, Schweizer JJ, Sperandeo MP, Tack GJ, Turner G, Veldink JH, Verbeek WH, Weersma RK, Wolters VM, Urcelay E, Cukrowski B, Greco L, Neuhausen SL, McManus R, Barisani D, Deloukas P, Barrett JC, Saavalainen P, Wijmenga C, van Heel DA. Multiple common variants for celiac disease influencing immune gene expression. Nat Genet. 2010;42:295–302. doi: 10.1038/ng.543.
- 33.van Heel DA, Franke L, Hunt KA, Gwilliam R, Zhernakova A, Inouye M, Wapenaar MC, Barnardo MC, Bethel G, Holmes GK, Feighery C, Jewell D, Kelleher D, Kumar P, Travis S, Walters JR, Sanders DS, Howdle P, Swift J, Playford RJ, McLaren WM, Mearin ML, Mulder CL, McManus R, McGinnis R, Cardon LR, Deloukas P, Wijmenga C. A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21. Nat Genet. 2007;39:827–9. doi: 10.1038/ng2058.
- 34.Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietiläinen OP, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Børglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Böttcher Y, Olesen J, Breuer R, Möller HJ, Giegling I, Rasmussen HB, Timm S, Matheisen M, Bitter I, Réthelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TW, Thorsteindottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, Genetic Risk and Outcome in Psychosis (GROUP) Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jönsson EG, Terenius L, Agartz I, Petersson H, Nöthen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, Clair D, Goldstein DB, Stefansson K, Collier DA. Common variants conferring risk of schizophrenia. Nature. 2009;460:744–7. doi: 10.1038/nature08186.
- 35.Yang X, Sun J, Gao Y, Tan A, Zhang H, Hu Y, Feng J, Qin X, Tao S, Chen Z, Kim St, Peng T, Liao M, Lin X, Zhang Z, Tang M, Li L, Mo L, Liang Z, Shi D, Huang Z, Huang X, Liu M, Liu Q, Zhang S, Trent JM, Zheng SL, Xu J, Mo Z. Genome-wide association study for serum complement C3 and C4 levels in healthy Chinese subjects. PLoS Genet. 2012;8:e1002916. doi: 10.1371/journal.pgen.1002916.
- 36.Yu XQ, Li M, Zhang H, Low HQ, Wei X, Wang JQ, Sun LD, Sim KS, Li Y, Foo JN, Wang W, Li ZJ, Yin XY, Tang XQ, Fan L, Chen J, Li RS, Wan JX, Liu ZS, Lou TQ, Zhu L, Huang XJ, Zhang XJ, Liu ZH, Liu JJ. A genome-wide association study in Han Chinese identifies multiple susceptibility loci for IgA nephropathy. Nat Genet. 2012;44:178–82. doi: 10.1038/ng.1047.
- 37.Evans DM, Spencer CC, Pointon JJ, Su Z, Harvey D, Kochan G, Oppermann U, Dilthey A, Pirinen M, Stone MA, Appleton L, Moutsianas L, Leslie S, Wordsworth T, Kenna TJ, Karaderi T, Thomas GP, Ward MM, Weisman MH, Farrar C, Bradbury LA, Danoy P, Inman RD, Maksymowych W, Gladman D, Rahman P, Spondyloarthritis Research Consortium of Canada (SPARCC) Morgan A, Marzo-Ortega H, Bowness P, Gaddney K, Gaston JS, Smith M, Bruges-Armas J, Couto AR, Sorrentino R, Paladini F, Ferreira MA, Xu H, Liu Y, Jiang L, López-Larrea C, Díaz-Peña R, López-Vázquez A, Zayats T, Band G, Bellenquez C, Blackburn H, Blackwell JM, Bramon E, Bumpstead SK, Casas JP, Corvin A, Craddock N, Deloukas P, Dronov S, Duncanson A, Edkins S, Freeman C, Gillman M, Gray E, Gwilliam R, Hammond N, Hunt SE, Jankowski J, Jayakumar A, Langford C, Liddle J, Markus HS, Mathew CG, McCann OT, McCarthy MI, Palmer CNN, Peltonen L, Plomin R, Potter SC, Rautanen A, Ravindrarajah R, Ricketts M, Samani N, Sawcer SJ, Strange A, Trembath RC, Viswanathan AC, Waller M, Weston P, Whittaker P, Widaa S, Wood NW, McVean G, Reveille JD, Wordsworth BP, Brown MA, Donnelly P, Australo-Anglo-American Spondyloarthritis Consortium (TASC) Wellcome Trust Case Control Consortium 2 (WTCCC2) Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet. 2011;43:761–7. doi: 10.1038/ng.873.
- 38.Lofgren SM, Warshaw EM. Dyshidrosis: epidemiology, clinical characteristics, and therapy. Dermatitis. 2006;17:165–81. doi: 10.2310/6620.2006.05021.
- 39.Nicolatou-Galitis O, Kitra V, Van Vliet-Constantinidou C, Peristeri J, Goussetis E, Petropoulos D, Grafakos S. The oral manifestations of chronic graft-versus-host disease (cGVHD) in paediatric allogeneic bone marrow transplant recipients. J Oral Pathol Med. 2001;30:148–53. doi: 10.1034/j.1600-0714.2001.300304.x.
- 40.Jawaheer D, Seldin MF, Amos CI, Chen WV, Shigeta R, Monteiro J, Kern M, Criswell LA, Albani S, Nelson JL, Clegg DO, Pope R, Schroeder HW, Jr, Bridges SL, Jr, Pisetsky DS, Ward R, Kastner DL, Wilder RL, Pincus T, Callahan LF, Flemming D, Wener MH, Gregersen PK. A genomewide screen in multiplex rheumatoid arthritis families suggests genetic overlap with other autoimmune diseases. Am J Hum Genet. 2001;68:927–36. doi: 10.1086/319518.
- 41.Chung YM, Yeh TS, Liu JH. Clinical manifestations of HLA-B27-positive acute anterior uveitis in Chinese. Zhonghua Yi Xue Za Zhi (Taipei) 1989;43:97–104.
- 42.Sepic J, Ristic S, Perkovic O, Brinar V, Lipozencic J, Crnic-Martinovic M, Starcevic Cizmarevic N, Janko Labinac D, Kapovic M, Peterlin B. A case of lichen ruber planus in a patient with familial multiple sclerosis. J Int Med Res. 2010;38:1856–60. doi: 10.1177/147323001003800533.
- 43.Mohsin SF, Ahmed SA, Fawwad A, Basit A. Prevalence of oral mucosal alterations in type 2 diabetes mellitus patients attending a diabetic center. Pak J Med Sci. 2014;30:716–9.
- 44.Setterfield JF, Neill S, Shirlaw PJ, Theron J, Vaughan R, Escudier M, Challacombe SJ, Black MM. The vulvovaginal gingival syndrome: a severe subgroup of lichen planus with characteristic clinical features and a novel association with the class II HLA DQB1*0201 allele. J Am Acad Dermatol. 2006;55:98–113. doi: 10.1016/j.jaad.2005.12.006.
- 45.Pavlovsky L, Israeli M, Sagy E, Berg AL, David M, Shemer A, Klein T, Hodak E. Lichen planopilaris is associated with HLA DRB1*11 and DQB1*03 alleles. Acta Derm Venereol. 2015;95:177–80. doi: 10.2340/00015555-1884.
- 46.Gao XH, Barnardo MC, Winsey S, Ahmad T, Cook J, Agudelo JD, Zhai N, Powell JJ, Fuggle SV, Wojnarowska F. The association between HLA DR, DQ antigens, and vulval lichen sclerosus in the UK: HLA DRB1*12 and its associated DRB1*12/DQB1*0301/04/09/010 haplotype confers susceptibility to vulval lichen sclerosus, and HLA DRB1*0301/04 and its associated DRB1*0301/04/DQB1*0201/02/03 haplotype protects from vulval lichen sclerosus. J Invest Dermatol. 2005;125:895–9. doi: 10.1111/j.0022-202X.2005.23905.x.
- 47.Gandolfo S, Richiardi L, Carrozzo M, Broccoletti R, Carbone M, Pagano M, Vestita C, Rosso S, Merletti F. Risk of oral squamous cell carcinoma in 402 patients with oral lichen planus: a follow-up study in an Italian population. Oral Oncol. 2004;40:77–83. doi: 10.1016/s1368-8375(03)00139-8.
- 48.Collins FS, Varmus H. A new initiative on precision medicine. New Engl J Med. 2015;372:793–5. doi: 10.1056/NEJMp1500523.
- 49.Allcock RJ, Atrazhev AM, Beck S, de Jong PJ, Elliott JF, Forbes S, Halls K, Horton R, Osoegawa K, Rogers J, Sawcer S, Todd JA, Trowsdale J, Wang Y, Williams S. The MHC haplotype project: a resource for HLA-linked association studies. Tissue Antigens. 2002;59:520–1. doi: 10.1034/j.1399-0039.2002.590609.x.
- 50.Ammar R, Paton TA, Torti D, Shlien A, Bader GD. Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes. F1000Res. 2015;4:17. doi: 10.12688/f1000research.6037.1.