ABSTRACT
For complex traits, most associated single nucleotide variants (SNVs) discovered to date have a small effect, and detection of association is only possible with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta‐analysis has emerged as the method of choice to combine results from multiple studies. Many meta‐analysis methods are available for single SNV analyses. As new approaches allow the capture of low frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta‐analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two‐stage meta‐analysis approach to combine haplotype analysis results. In the first stage, each cohort estimates haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. For the second stage, we use a multivariate generalized least squares meta‐analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype‐specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that we control the type‐I error rate, and that our approach is more powerful than inverse variance weighted meta‐analysis of single SNV analysis when haplotype effects are present. We replicate a published haplotype association between a fasting glucose‐associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium, and we provide more precise haplotype effect estimates.
Keywords: meta‐analysis, haplotype association tests, family samples, linear mixed effects model
Introduction
In recent years, genome‐wide association studies (GWAS) have identified multiple common variants associated with disease and disease‐related traits. In a typical GWAS, association between a trait and genetic variants is tested one variant at a time, and variants with weak association routinely fail to be detected, especially in small cohorts. Therefore, meta‐analysis is often used by large consortia to increase statistical power [Dupuis et al., 2010, Scott et al., 2012, Stram, 1996, Zeggini et al., 2008] to detect variants with a moderate to weak association with the trait of interest. Even with large meta‐analyses, variants identified to date only explain a small proportion of the total heritability. In order to identify the source of the unexplained heritability, emerging approaches have attempted to account for multiple variants at once when evaluating association with a trait. Such approaches include penalized regression methods [Li et al., 2011, Wu et al., 2009], pathway analysis [Holden et al., 2008], gene‐based tests such as burden [Madsen and Browning, 2009] and SKAT [Wu et al., 2010], and haplotype analysis [Liu et al., 2008, Schaid et al., 2002, Tregouet et al., 2004]. The power of these approaches can be enhanced by increasing sample size or combining multiple studies. Methods for meta‐analysis of gene‐based tests are well established and widely used [Hu et al., 2013, Liu et al., 2014], but there are no widely used methods for the meta‐analysis of haplotype association tests.
In this article, we propose a meta‐analysis approach to combine haplotype association results from multiple studies. In the first step of our method, each study provides regression estimates and covariance matrices of haplotype effects, with adjustment for familial correlation to accommodate familial samples or cryptic relatedness. In our second step, cohort‐specific haplotype effect estimates are pooled using a multivariate generalized least square meta‐analysis approach. A global association test and evaluation of the effect of each haplotype can be obtained within our framework. We perform a simulation study to evaluate our approach, comparing results with more traditional meta‐analysis of single‐variant association tests and gene‐based tests. Finally, we replicate a published haplotype association between a fasting glucose‐associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium and are able to provide more precise haplotype effect estimates than the prior report involving haplotype estimates from a single cohort [Mahajan et al., 2015]. Code implementing the novel approach, along with a tutorial, is available at http://sites.bu.edu/fhspl/publications/metahaplo.
Methods
Haplotype Association Test at Cohort Level
Our approach is based on Zaykin et al.'s [2002] haplotype analysis method for unrelated samples. We incorporate random effects to account for family structure, making the approach applicable to family‐based cohorts, unrelated samples, or a mix of the two. We assume that a total of n subjects from a study are sequenced in a region with q SNVs and as a result, K haplotypes are observed. We assume a general linear (mixed‐effect) model, written as:
$$Y = X\gamma + \sum_{m=1}^{K} h_m \beta_m + b + \varepsilon, \tag{1}$$
where $Y$ is an $n \times 1$ quantitative trait vector, $X$ is an $n \times p$ matrix of covariates (without intercept) including, for example, age, sex, and associated genetic principal components controlling for potential population stratification, $\gamma$ is a $p \times 1$ coefficient vector for the p adjustment variables, each $n \times 1$ vector $h_m$ ($m = 1, \ldots, K$) is the expected haplotype dosage, $b$ is an $n \times 1$ random effect vector that accounts for the relatedness within families, and $\varepsilon$ is an $n \times 1$ vector of the random error terms. When haplotype m of the jth ($j = 1, \ldots, n$) subject is observed, $h_{mj}$, the jth entry in $h_m$, is either 0, 1, or 2, that is, the number of copies of haplotype m the jth subject carries. Otherwise, expected haplotype dosages are inferred from $G_j$, the genotype vector of the jth subject, using statistical algorithms such as the expectation‐maximization (EM) algorithm [Dempster et al., 1977]. For the jth subject, the sum of the K haplotype dosages is always equal to 2. The random effect vector is assumed to follow a normal distribution $b \sim N(0, \sigma_a^2 \Phi)$, where $\sigma_a^2$ is the additive variance and $\Phi$ is the relationship matrix (with entries equal to twice the kinship coefficient for related pairs and 0 for unrelated pairs) derived from pedigree structure or genome‐wide information; in unrelated samples, the matrix $\Phi$ reduces to $I$, the identity matrix. Finally, we assume the vector of error terms ε follows a normal distribution $N(0, \sigma_e^2 I)$, where $\sigma_e^2$ is the variance of the error term.
Let $D = (X, h_1, \ldots, h_K)$ denote the overall design matrix of size $n \times (p + K)$, and define the overall variance matrix as $\Omega = \sigma_a^2 \Phi + \sigma_e^2 I$. The parameters $\gamma$ and $\beta_m$ ($m = 1, \ldots, K$) are estimated as $(\hat\gamma', \hat\beta_1, \ldots, \hat\beta_K)' = (D'\hat\Omega^{-1}D)^{-1} D'\hat\Omega^{-1} Y$, where $\hat\Omega$ is evaluated at the maximum likelihood estimates $\hat\sigma_a^2$ and $\hat\sigma_e^2$, which can be obtained using the lmekin function in R's coxme package [Therneau, 2012]. The estimated variance of the effect estimates is $(D'\hat\Omega^{-1}D)^{-1}$. The method reduces to an ordinary linear regression when applied to unrelated samples.
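To illustrate what an expected haplotype dosage is when phase is ambiguous, the following minimal Python sketch averages haplotype counts over the possible haplotype pairs of one subject. The function name `expected_dosages` and the posterior probabilities are hypothetical and chosen only for illustration; in practice the posteriors would come from an EM‐based phasing step.

```python
from collections import defaultdict


def expected_dosages(pair_posteriors):
    """Expected haplotype dosages for one subject.

    pair_posteriors maps a haplotype pair (h1, h2) to its posterior
    probability given the subject's unphased genotypes.
    """
    total = sum(pair_posteriors.values())
    dosages = defaultdict(float)
    for (h1, h2), prob in pair_posteriors.items():
        weight = prob / total  # renormalise in case of rounding
        dosages[h1] += weight
        dosages[h2] += weight  # a homozygous pair contributes 2 * weight
    return dict(dosages)


# Hypothetical subject heterozygous at the first and last SNV, so two
# haplotype pairs are compatible with the same unphased genotypes.
posteriors = {("CCAC", "TCAG"): 0.98, ("CCAG", "TCAC"): 0.02}
d = expected_dosages(posteriors)
```

Whatever the posterior probabilities are, the dosages of one subject add up to 2, which is why the regression model needs no separate intercept.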
Meta‐Analysis
We assume a total of N cohorts participate in the meta‐analysis and the i‐th ($i = 1, \ldots, N$) cohort provides the estimates $\hat\beta_i$ and the covariance matrix $\hat\Sigma_i$ of the haplotype effects for $K_i$ haplotypes, and a total of $K$ haplotypes are observed in at least one cohort. We propose a multivariate meta‐analysis approach [Becker and Wu, 2007] based on generalized weighted least squares to combine the length‐$K_i$ haplotype effect estimates from each cohort, denoted by $\hat\beta_i$ for studies $i = 1, \ldots, N$, into a single effect estimate vector of length $K$. The generalized weighted least squares approach is formulated as:
$$\hat\beta_c = W\beta + e, \tag{2}$$
where $\hat\beta_i$ ($i = 1, \ldots, N$) is the haplotype coefficient vector for cohort i;
$\hat\beta_c = (\hat\beta_1', \ldots, \hat\beta_N')'$ is the stacked haplotype coefficient vector from $\hat\beta_i$ ($i = 1, \ldots, N$);
$\beta$ is the $K \times 1$ coefficient vector of the haplotype effects;
$W = (W_1', \ldots, W_N')'$ is a design matrix stacked from the N cohorts, where $W_i$ ($i = 1, \ldots, N$) is a $K_i \times K$ matrix, with zeros and a single one in each row indicating which haplotype effect is observed by cohort i;
$e$ is the error term, which is assumed to have a multivariate normal distribution with a mean of 0 and a block‐diagonal covariance matrix $\Sigma = \mathrm{diag}(\Sigma_1, \ldots, \Sigma_N)$.
Note that in the meta‐analysis stage, cohort haplotypes are reordered to match the order assigned to the haplotypes observed in at least one cohort, and the design matrix $W$ reflects this reordering. Furthermore, because $\Sigma$ is unknown, in our method, we substitute the sample estimate $\hat\Sigma$; hence the weighted least squares estimator of β is $\hat\beta = (W'\hat\Sigma^{-1}W)^{-1} W'\hat\Sigma^{-1}\hat\beta_c$ and $\widehat{\mathrm{Var}}(\hat\beta) = (W'\hat\Sigma^{-1}W)^{-1}$.
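The second stage can be sketched in a few lines of dependency‐free Python. This is an illustrative sketch, not the released implementation: the helper names (`indicator`, `meta_gls`, and the small matrix routines) and all numbers are hypothetical. Each cohort's indicator matrix maps its own haplotype order onto the union of haplotypes, the cohort covariance matrices are placed on a block diagonal, and the generalized least squares estimator is then evaluated directly.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def block_diag(blocks):
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[off + i][off + j] = float(v)
        off += len(b)
    return out


def indicator(cohort_haps, all_haps):
    """One row per cohort haplotype, with a single 1 in the matching column."""
    return [[1.0 if h == g else 0.0 for g in all_haps] for h in cohort_haps]


def meta_gls(estimates, covariances, cohort_haps, all_haps):
    W = [row for haps in cohort_haps for row in indicator(haps, all_haps)]
    stacked = [[v] for est in estimates for v in est]
    s_inv = inverse(block_diag(covariances))
    wt_sinv = matmul(transpose(W), s_inv)
    cov_beta = inverse(matmul(wt_sinv, W))
    beta = matmul(cov_beta, matmul(wt_sinv, stacked))
    return [r[0] for r in beta], cov_beta


# Hypothetical inputs: cohort 2 reports haplotypes in a different order
# and observes one haplotype that cohort 1 does not.
all_haps = ["CCAC", "TCAG", "TCAC"]
cohort_haps = [["CCAC", "TCAG"], ["TCAG", "CCAC", "TCAC"]]
estimates = [[5.0, 4.9], [4.8, 5.2, 4.5]]
covariances = [[[0.01, 0.0], [0.0, 0.04]],
               [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.09]]]
beta, cov = meta_gls(estimates, covariances, cohort_haps, all_haps)
```

The covariance matrices in this example are diagonal only so that the result can be checked by hand: in that special case the estimator reduces to an inverse‐variance weighted average per haplotype. Real cohort covariance matrices have nonzero off‐diagonal entries, and the same code applies unchanged.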
Hypothesis Testing
The global null hypothesis of no association of any haplotype with the trait is expressed as
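$$H_0: \beta_1 = \beta_2 = \cdots = \beta_K. \tag{3}$$
Because the model has no intercept and each subject's haplotype dosages sum to 2, equal haplotype effects correspond to no association. To construct a test statistic to test for haplotype association, we reparameterize it into the equivalent null hypothesis, where $\beta_1$ is chosen from commonly observed haplotypes:
$$H_0: \beta_2 - \beta_1 = \beta_3 - \beta_1 = \cdots = \beta_K - \beta_1 = 0. \tag{4}$$
The null hypothesis can be tested using a Wald test statistic of the form
$$T = \hat\delta' \, \hat\Sigma_\delta^{-1} \, \hat\delta, \tag{5}$$
where $\hat\delta = (\hat\beta_2 - \hat\beta_1, \ldots, \hat\beta_K - \hat\beta_1)'$ is estimated from $\hat\beta$ and $\hat\Sigma_\delta$ is the covariance matrix of $\hat\delta$, with a dimension of $(K-1) \times (K-1)$ and the $(j,k)$th element having the form $\mathrm{Cov}(\hat\beta_{j+1}, \hat\beta_{k+1}) - \mathrm{Cov}(\hat\beta_{j+1}, \hat\beta_1) - \mathrm{Cov}(\hat\beta_{k+1}, \hat\beta_1) + \mathrm{Var}(\hat\beta_1)$. Under the null hypothesis, the Wald test statistic follows a $\chi^2_{K-1}$ distribution asymptotically.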
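The contrasts and their covariance matrix can also be written with a single contrast matrix. The block below is a sketch in notation introduced here for illustration ($C$ for the contrast matrix, $\hat V$ for the estimated covariance matrix of the pooled haplotype effects $\hat\beta$, and $\hat\delta$ for the vector of differences from the reference haplotype); it restates the Wald statistic rather than adding a new result.

```latex
\hat\delta = C\hat\beta,
\qquad
C = \begin{pmatrix} -\mathbf{1}_{K-1} & I_{K-1} \end{pmatrix},
\qquad
\widehat{\mathrm{Cov}}(\hat\delta) = C\,\hat V\,C',
\qquad
T = \hat\delta'\,\bigl(C\,\hat V\,C'\bigr)^{-1}\hat\delta .

% Example with K = 3 haplotypes:
C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix},
\qquad
\hat\delta = \begin{pmatrix} \hat\beta_2 - \hat\beta_1 \\ \hat\beta_3 - \hat\beta_1 \end{pmatrix}.
```

Expanding the $(j,k)$th entry of $C\hat V C'$ gives the four covariance terms of the element‐wise form, and $T$ is compared with a chi‐square distribution with $K-1$ degrees of freedom.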
Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium
The CHARGE consortium comprises multiple studies with the common goal of identifying genes and loci associated with cardiovascular‐related traits. Seven CHARGE cohorts contributed to a meta‐analysis evaluating the association between genetic variants and fasting glucose in 25,305 nondiabetic participants (Table 1). Fasting glucose levels in millimole per liter were analyzed in participants free of type‐2 diabetes. Type‐2 diabetes was defined by cohorts referring to at least one of the following criteria: a physician diagnosis of type‐2 diabetes, on the antidiabetic treatment of type‐2 diabetes, fasting plasma glucose ⩾7 mmol/l, random plasma glucose ⩾11.1 mmol/l, or hemoglobin A1C . Study‐specific sample exclusions were detailed in [Wessel et al., 2015].
Table 1.
CHARGE cohorts
| Cohort | Sample size |
|---|---|
| Generation Scotland: Scottish Family Health Study^a (GS) | 7,678 |
| Framingham Heart Study^a (FHS) | 6,561 |
| Cardiovascular Health Study (CHS) | 3,525 |
| Family Heart Study^a (FamHS) | 3,393 |
| Multi‐Ethnic Study of Atherosclerosis (MESA) | 2,507 |
| FENLAND (FLD) | 1,341 |
| European Prospective Investigation into Cancer and Nutrition, Potsdam (EPIC‐Potsdam) | 300 |
| Total | 25,305 |
^a Family‐based cohort.
Genotypes were obtained from the Illumina HumanExome BeadChip [Grove et al., 2013], a genotyping array containing 247,870 variants discovered through exome sequencing in ∼ 12,000 individuals, in which ∼ 75% of the variants are low‐frequency variants (Minor Allele Frequency (MAF) ). The main content of the chip comprises protein‐altering variants (nonsynonymous coding, splice‐site, and stop gain or loss codons) seen at least three times in a study and in at least two studies providing information to the chip design. We selected four G6PC2 variants previously studied for their haplotype association with fasting glucose [Mahajan et al., 2015].
Simulation Studies
To evaluate the validity and power of our approach, we perform a simulation study varying the number of cohorts included in the meta‐analysis (5 or 10), and the type of samples (unrelated, family‐based, mix of the two). We also vary the sample size from 400 up to 1,600 subjects per cohort. See Table 2 for a description of the various study designs investigated in type‐I error rate and power.
Table 2.
Study designs for type‐I error rate evaluation
| Study design | No. of cohorts | Sample sizes | Type‐I error rate (G6PC2) | Type‐I error rate (JAZF1) |
|---|---|---|---|---|
| 1 | 5 | 250 NF2 (× 5) | 0.010 | 0.010 |
| 2 | 5 | 250 NFv (× 5) | 0.010 | 0.012 |
| 3 | 5 | 100 NF2, 175 NF2, 400 U, 700 U, 1000 U | 0.013 | 0.010 |
| 4 | 5 | 100 NFv, 175 NFv, 400 U, 700 U, 1000 U | 0.011 | 0.011 |
| 5 | 5 | 100 NFv, 175 NFv, 250 NFv, 325 NFv, 400 NFv | 0.011 | 0.012 |
| 6 | 10 | 250 NF2 (× 5); 1000 U (× 5) | 0.010 | 0.011 |
| 7 | 10 | 400 U, 700 U, 1000 U, 1300 U, 1600 U | 0.008 | 0.012 |
| 8 | 5 | 100 NF2, 175 NF2, 250 NF2, 325 NF2, 400 NF2 | 0.012 | 0.011 |
| 9 | 5 | 250 NF2, 125 NF2 (× 2), 375 NF2 (× 2); 1000 U, 500 U (× 2), 1500 U (× 2) | 0.011 | 0.011 |
| 10 | 10 | 250 NFv (× 7), 1000 U (× 3) | 0.012 | 0.011 |
NF2, nuclear family with 2 offspring; NFv, nuclear family with the number of offspring randomly selected to be between 1 and 4; U, unrelated subjects.
Simulated trait values depend on sex, age, and, for the power evaluation only, haplotypes or genetic variants. The sexes of the founders (one mother and one father per family) are fixed, whereas sex is randomly assigned to offspring with equal probability. The ages of unrelated individuals and of the first offspring in a family are generated from a uniform distribution over the range 30 to 50. Additional offspring's ages are set to be within 5 years of the first offspring with at least a 1 year gap (no twins), using a uniform distribution. For family samples, the age of the mother is restricted to be 20–45 years older than her offspring, and the father's age to be within 5 years of the mother's age, with the restriction that he be at least 20 years older than the oldest offspring.
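The age constraints above can be sketched as follows in Python. This is one possible reading of the rules, not the simulation code used for the paper: the function name `simulate_family_ages` is hypothetical, sibling ages are drawn by rejection sampling, and the parents' ages are assumed to be uniform over the ranges that the stated restrictions allow.

```python
import random


def simulate_family_ages(n_offspring, rng):
    """Return (mother_age, father_age, offspring_ages) for one nuclear family."""
    first = rng.uniform(30, 50)
    ages = [first]
    while len(ages) < n_offspring:
        # Within 5 years of the first offspring, at least 1 year from every sibling.
        candidate = rng.uniform(first - 5, first + 5)
        if all(abs(candidate - a) >= 1 for a in ages):
            ages.append(candidate)
    oldest, youngest = max(ages), min(ages)
    # Mother is 20 to 45 years older than each of her offspring.
    mother = rng.uniform(oldest + 20, youngest + 45)
    # Father is within 5 years of the mother and at least 20 years
    # older than the oldest offspring.
    father = rng.uniform(max(mother - 5, oldest + 20), mother + 5)
    return mother, father, ages


rng = random.Random(2016)
mother_age, father_age, offspring_ages = simulate_family_ages(3, rng)
```

Because siblings lie within a 10‐year window, the admissible range for the mother's age is never empty, so no rejection step is needed for the parents.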
We select the known T2D‐associated genes G6PC2 (chromosome 2; Tables 3 and 4) and JAZF1 (chromosome 7; Tables 5 and 6) to generate the reference panel haplotypes. We use the observed haplotypes and frequencies estimated by the EM algorithm from 6,561 participants from the Framingham Heart Study. For example, in JAZF1 no single haplotype has a frequency greater than 25% and eight haplotypes have a frequency greater than 1% (Table 6).
Table 3.
G6PC2 variants
| Name | Chr | MapInfo | dbSNPID | Minor | Major | FHS MAF |
|---|---|---|---|---|---|---|
| exm‐rs560887 | 2 | 169763148 | rs560887 | A | G | 0.293 |
| exm239664 | 2 | 169763262 | rs138726309 | T | C | 0.0036 |
| exm239667 | 2 | 169764141 | rs2232323 | C | A | 0.0078 |
| exm239672 | 2 | 169764176 | rs492594 | C | G | 0.4553 |
Table 4.
G6PC2 haplotype frequencies
| rs560887 | rs138726309 | rs2232323 | rs492594 | FHS frequency |
|---|---|---|---|---|
| C | C | A | C | 0.46 |
| T | C | A | G | 0.29 |
| C | C | A | G | 0.24 |
| T | C | C | G | 0.006 |
| C | T | A | C | <0.001 |
| T | C | A | C | <0.001 |
| C | T | A | G | <0.001 |
| C | C | C | G | <0.001 |
Table 5.
JAZF1 variants (chromosome 7)
| Name | Position | dbSNPID | Minor | Major | MAF |
|---|---|---|---|---|---|
| exm‐rs10486567 | 27976563 | rs10486567 | A | G | 0.2415 |
| exm2270592 | 28039797 | rs38523 | C | T | 0.3683 |
| exm‐rs864745 | 28180556 | rs864745 | G | A | 0.4965 |
| exm‐rs1635852 | 28189411 | rs1635852 | C | T | 0.4973 |
| exm‐rs849134 | 28196222 | rs849134 | G | A | 0.4917 |
Table 6.
JAZF1 haplotype frequencies
| Haplotype | rs10486567 | rs38523 | rs864745 | rs1635852 | rs849134 | Frequency |
|---|---|---|---|---|---|---|
| 1 | G | T | A | T | A | 0.2327 |
| 2 | G | T | G | C | G | 0.2295 |
| 3 | G | C | G | C | G | 0.1608 |
| 4 | G | C | A | T | A | 0.1295 |
| 5 | A | T | A | T | A | 0.0866 |
| 6 | A | T | G | C | G | 0.0793 |
| 7 | A | C | A | T | A | 0.0434 |
| 8 | A | C | G | C | G | 0.0259 |
| 9 | A | T | G | T | A | 0.0029 |
| 10 | A | T | A | C | A | 0.0029 |
| 11 | A | C | A | C | A | 0.0023 |
| 12 | G | T | A | C | A | 0.0019 |
| 13 | G | T | G | T | A | 0.0017 |
| 14 | G | C | G | T | A | 0.0005 |
Genotypes are simulated by randomly assigning a pair of haplotypes to founders, and by dropping randomly selected haplotypes to offspring assuming no recombination within haplotypes. Although phasing information is available in our simulation setting, we do not use the phase information when implementing our approach because such information is not typically available in real datasets. We use the EM algorithm to infer expected haplotype dosage conditional on genotypes via R package haplo.stats [Sinnwell and Schaid, 2013].
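The gene‐dropping step can be sketched as follows in Python. The function names are hypothetical, the haplotype frequencies are the four most frequent G6PC2 haplotypes from Table 4 (renormalized implicitly by `random.choices`), and the unphased genotype is represented simply as the sorted pair of alleles at each SNV.

```python
import random

# Four most frequent G6PC2 haplotypes and FHS frequencies from Table 4.
HAPLOTYPES = ["CCAC", "TCAG", "CCAG", "TCCG"]
FREQS = [0.46, 0.29, 0.24, 0.006]


def draw_founder(haplotypes, freqs, rng):
    """Assign a random pair of haplotypes to a founder."""
    return tuple(rng.choices(haplotypes, weights=freqs, k=2))


def drop_to_offspring(mother, father, rng):
    """Each parent transmits one intact haplotype (no recombination)."""
    return (rng.choice(mother), rng.choice(father))


def genotype(hap_pair):
    """Unphased genotype: the sorted allele pair at each SNV."""
    return tuple("".join(sorted(alleles)) for alleles in zip(*hap_pair))


rng = random.Random(42)
mother = draw_founder(HAPLOTYPES, FREQS, rng)
father = draw_founder(HAPLOTYPES, FREQS, rng)
child = drop_to_offspring(mother, father, rng)
child_genotype = genotype(child)
```

Discarding phase is what makes the downstream EM step necessary: for example, the pairs CCAC/TCAG and CCAG/TCAC produce identical unphased genotypes.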
When estimating haplotype effects at the cohort‐level, rare haplotypes (frequency) are collapsed to stabilize the computation and to avoid potential singularities due to high LD among SNVs.
Type‐I Error Rate
For evaluating the type‐I error rate of our new approach, a trait unassociated with the haplotypes is simulated using a multivariate normal distribution with mean (sex is set to 1 for males and to 2 for females) and a covariance matrix , with . Age and sex explained about 10% and 5% of the trait variance, respectively, resulting in a trait with moderate heritability ().
Cohort‐specific analyses are performed by first estimating haplotypes using the EM algorithm implemented in the R package haplo.stats, followed by regression analysis using haplotype dosages and covariates as independent variables. Cohort results are then meta‐analyzed using the novel approach previously described, and the global association test P values are recorded. Ten thousand simulations are performed to assess the type‐I error rate in all scenarios at the nominal threshold (Table 2).
Power Evaluation
The power of our novel haplotype meta‐analysis approach is evaluated in a total of 16 scenarios (phenotype datasets) divided into four study designs (study design 1–4 from Table 2), with varying haplotype or SNV effects. For each scenario, we first compute the meta‐analysis haplotype global test statistic, and then compare to meta‐analysis of both single variant tests and gene‐based tests. For single variant tests, we compute the meta‐analysis test statistic using inverse‐variance weighted method that has been shown to be the most powerful when the effect size is constant across cohorts [Zhou et al., 2011]. We then select the SNP with the minimum meta‐analysis P‐value ( for G6PC2; for JAZF1) and adjust the meta‐analysis P‐value for multiple testing using a Bonferroni correction for the effective number of independent variants [Gao et al., 2008]. We denote the result for the best SNP in the single variant analysis by “min P”. For gene‐based tests, we choose SKAT and Burden test with Wu weights and perform the analysis using R package seqMeta [Voorman et al., 2014]. We use to evaluate the power of all four approaches.
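The single‐variant comparison (inverse‐variance weighted meta‐analysis followed by a Bonferroni‐type adjustment of the smallest P value) can be sketched as follows in Python. The function names and input numbers are hypothetical, the P value uses a two‐sided normal approximation, and the effective number of independent variants is taken as a given input rather than computed.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNV."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, p


def min_p(snv_results, n_effective):
    """Pick the SNV with the smallest P value and adjust it for n_effective tests."""
    best = min(snv_results, key=lambda name: snv_results[name][2])
    adjusted = min(1.0, snv_results[best][2] * n_effective)
    return best, adjusted


# Hypothetical per-cohort estimates for two SNVs in five cohorts.
results = {
    "snv_a": ivw_meta([0.10, 0.12, 0.08, 0.11, 0.09], [0.05, 0.06, 0.05, 0.04, 0.07]),
    "snv_b": ivw_meta([0.01, -0.02, 0.00, 0.03, -0.01], [0.05, 0.06, 0.05, 0.04, 0.07]),
}
best_snv, adjusted_p = min_p(results, n_effective=2)
```

The adjusted minimum P value is the quantity labeled "min P" in the comparison.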
For each scenario, the phenotype is simulated using a multivariate normal distribution with mean and a covariance matrix , with , but unlike the type‐I error scenarios, the value of depends on genotypes/haplotypes in addition to the covariates of age and sex. We investigate four genetic effect scenarios: one or two causal genetic variants, or one or two causal haplotypes. For the causal variant scenario, where gj () is a vector containing the number of minor alleles (0, 1, or 2) carried by individuals in the sample, and is the effect of variant j, set to , where is the minor allele frequency of variant j and is the proportion of variance explained by this specific variant (haplotype). When only one causal variant is included in the model, and is multiplied by . For the causal haplotype models, , where is a vector containing the number (conditional dosage) of haplotype j carried by individuals in the sample, and is the effect of haplotype j, set to , where is mean haplotype dosage of haplotype j and . When only one causal haplotype is included in the model, and is multiplied by . For the JAZF1 gene, we select two haplotypes, GTATA (the most frequent haplotype) and GCGCG (the third most frequent haplotype), to have an effect on the phenotype while all other haplotypes have no effect on the phenotype. For models with single variant effects, we select rs849134 and rs38523 to have nonzero effect on the trait, while all other genetic variants have no effect. For the G6PC2 gene, we select CCAC and TCAG, the two most frequent haplotypes to have an effect on the phenotype. For models with single variant effects, we select rs560887 and rs2232323 to have nonzero effect on the trait.
A thousand simulations with five independent cohorts are performed to compare the power of our approach to the single variant method adjusted for multiple testing and gene‐based methods.
Results
Meta‐Analysis of Four Coding Variants on G6PC2 Region
G6PC2 is a locus known to affect fasting glucose levels. Among the 17 exonic variants on the exome chip, 15 are rare variants (MAF<1%) and two are common variants (rs560887 with MAF = 25.4%; rs492594 with MAF = 43.7%). Previous GWAS have identified the A allele of rs560887, one of the two common variants, as associated with lower fasting glucose levels [Bouatia‐Naji et al., 2008; Dupuis et al., 2010]. A recent large‐scale exome‐chip analysis indicated that the 15 rare variants also had a joint effect on fasting glucose [Wessel et al., 2015]. We apply our approach to study the association between fasting glucose and the haplotype structure of four coding variants (rs560887, rs138726309, rs2232323, and rs492594), using CHARGE exome‐chip data. We perform a meta‐analysis of seven studies, comprising three family‐based and four population‐based cohorts with up to 25,305 nondiabetic European participants, to better understand how the overall haplotype structure and each individual haplotype affect fasting glucose levels. We replicate a previously reported haplotype analysis of the same four coding variants in the G6PC2 region [Mahajan et al., 2015], but with higher precision (Table 7). Our effect size estimates are consistent with the previously published estimates in both direction and magnitude. However, the prior results were based on a single population‐based cohort with 4,442 participants, whereas our analysis is based on seven cohorts with 25,305 participants.
Among the five haplotypes shared by all seven studies, one copy of the most significant haplotype, TCAG, decreases fasting glucose levels by 0.074 (95% confidence interval (CI): 0.063, 0.085) mmol/l, on average; one copy of the second most significant haplotype, CCAG, increases the average fasting glucose levels by 0.039 (95% CI: 0.028, 0.050) mmol/l; and one copy of the third most significant haplotype, TCCG, decreases fasting glucose levels by an average of 0.12 (95% CI: 0.065, 0.18) mmol/l. Most haplotype effect estimates reported in Mahajan et al. [2015] fall within our 95% CI, with the exception of the estimate for TCCG (a decrease of 0.205 mmol/l in Mahajan et al. [2015]), which falls just outside our reported CI.
Table 7.
Single haplotype association tests using four SNVs in the G6PC2 region
| rs560887 | rs138726309 | rs2232323 | rs492594 | β (SE) | P‐value | Frequency | β (SE), Mahajan et al.^a |
|---|---|---|---|---|---|---|---|
| C | C | A | C | | | 0.4394 | |
| T | C | A | G | −0.073 (0.0055) | | 0.2671 | −0.065 (0.011) |
| C | C | A | G | 0.039 (0.0056) | | 0.2645 | 0.034 (0.012) |
| T | C | C | G | −0.12 (0.029) | | 0.0065 | −0.205 (0.057) |
| C | T | A | C | −0.022 (0.056) | 0.70 | 0.0021 | −0.202 (0.077) |
| T | C | A | C | −0.031 (0.020) | 0.12 | 0.0195 | NA |
The haplotypes are observed in all cohorts, except for the last one, which is observed only in FHS, CHS, GS, and FamHS.
^a Effect estimates and standard errors reported by Mahajan et al. [2015].
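The reported confidence intervals can be checked from the effect estimates and standard errors in Table 7 with the usual normal approximation, estimate ± 1.96 × SE. The short Python sketch below (the helper name `wald_ci` is ours) does this for CCAG and TCCG, working with magnitudes as in the text.

```python
def wald_ci(beta, se, z=1.96):
    """Normal-approximation 95% confidence interval for an effect estimate."""
    return beta - z * se, beta + z * se


# CCAG: estimate 0.039, SE 0.0056 (Table 7).
ccag_low, ccag_high = wald_ci(0.039, 0.0056)

# TCCG: magnitude of the estimate 0.12, SE 0.029 (Table 7).
tccg_low, tccg_high = wald_ci(0.12, 0.029)

# Magnitude of the TCCG estimate reported by Mahajan et al. [2015].
mahajan_tccg = 0.205
outside = not (tccg_low <= mahajan_tccg <= tccg_high)
```

For CCAG this reproduces the interval quoted in the text to three decimals. For TCCG the interval computed from the rounded table entries differs slightly in the last digit from the one quoted, but the published estimate of 0.205 lies above its upper limit either way.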
Simulations
Ten scenarios with increasing diversity in the study designs of the cohorts included in the meta‐analysis are simulated to evaluate the type‐I error rate of our approach. The type‐I error rate is well controlled in all scenarios investigated (Table 2).
In the simulations to evaluate power, our approach is shown to be almost as powerful as the single SNV approach when SNVs are influencing the trait, but much more powerful to detect true haplotype effects. For example, in the family‐based design scenarios, our approach is 40% more powerful than single SNV analyses when two haplotypes have a nonzero effect on the phenotype (Figures 1 and 2). A similar pattern is observed for designs with a mix of unrelated and related samples. The gain in power is smaller when a single haplotype is influencing the trait, but present for all scenarios evaluated. When compared to the gene‐based tests, our approach is uniformly more powerful in all scenarios across all study designs (Figures 1 and 2) because of the Wu (default) weighting scheme, which downweights common variants.
Figure 1.

Power of the haplotype meta‐analysis approach compared to gene‐based methods and single SNV meta‐analysis (min P) adjusted for multiple testing in the G6PC2 region, evaluated at in four study designs. Description of the four study designs used in the simulation can be found in Table 2 (study design 1–4). The labels on the x axes denote that 1 (SNV) or 2 (2SNVs) SNVs are influencing the phenotypes, or 1 (1HAP) or 2 (2HAPs) haplotypes have an effect on the phenotypes.
Figure 2.

Power of the haplotype meta‐analysis approach compared to gene‐based methods and single SNV meta‐analysis (min P) adjusted for multiple testing in the JAZF1 region, evaluated at in four study designs. Description of the four study designs used in the simulation can be found in Table 2 (study design 1–4). The labels on the x axes denote that 1 (SNV) or 2 (2SNVs) SNVs are influencing the phenotypes, or 1 (1HAP) or 2 (2HAPs) haplotypes have an effect on the phenotypes.
Discussion
We have proposed a general meta‐analysis approach to combine the haplotype association results from multiple cohorts. Our approach imposes no restrictions on the haplotypes observed across cohorts. Instead, our approach can incorporate information from haplotypes observed in a single cohort in addition to haplotypes observed in multiple cohorts. In the first stage of our approach, haplotype association analysis is performed at the cohort level. Information about the haplotype structure, frequencies, effect estimates, and covariance of effect estimates is collected, and meta‐analyzed in the second stage using a generalized weighted least square approach. The association between a trait and any single or multiple haplotypes can be easily evaluated within our framework.
We evaluated the type‐I error rate in a variety of scenarios with different cohort designs that included a mix of unrelated and family samples. Type‐I error rate was controlled in all scenarios investigated. We also compared the power of our approach with single variant tests corrected for multiple testing (min P approach), and demonstrated that our approach had comparable power when variants, not haplotypes, influenced the trait, but was more powerful in the presence of true haplotype effects. Our haplotype approach also provided more evidence for association compared to gene‐based tests applied with the default weighting scheme, as exemplified in a recent large‐scale exome‐chip project [Wessel et al., 2015] applied to the G6PC2 region comprising 15 rare variants (MAF<1%). Our simulations also illustrated that the haplotype effect size estimates obtained from meta‐analysis were unbiased, even when family‐based cohorts were included.
While our approach cannot serve as the only tool for the discovery of associated variants and regions, it is a complementary tool to single‐variant and gene‐based tests. Mahajan et al. [2015] demonstrated the usefulness of haplotype analysis in their investigation of the effect of G6PC2 variants on fasting glucose. In 4,442 nondiabetic subjects from the Oxford Biobank, the G allele from the coding variant rs492594 appears to significantly decrease fasting glucose levels. However, when conditioning on the variant with the largest effect on fasting glucose (rs560887), the effect estimate of the G allele from rs492594 is reversed, and the G allele appears to increase fasting glucose, an apparent paradox. Looking at the haplotype estimates resolves the paradox: the rs492594 G allele is most frequently observed on the same haplotype as the glucose‐lowering allele (T) from the strongest associated variant (rs560887), giving the impression that the G allele also decreases fasting glucose. Our analysis supports this conclusion, and refines the effect estimates provided by Mahajan et al. [2015] by increasing the number of samples used to obtain effect estimates via meta‐analysis, providing more precise estimates, as reflected in the smaller standard errors.
Our approach has some limitations. The variants included in the haplotype analysis must be genotyped or imputed in all cohorts. In other words, all cohorts must include the same set of variants in their analysis. Moreover, when using imputed genotypes, best‐guess genotypes must be used because the approach does not currently handle genotypes in the form of dosage. The EM algorithm currently employed for inferring haplotypes works best for a moderate number of variants (< 15), and very rare haplotypes (frequency) are recommended to be collapsed to ensure computation stability. Despite these limitations, our approach has the potential to shed some light on the relationship between traits and multiple associated SNVs in a region.
Acknowledgments
Generation Scotland: Generation Scotland received core funding from the Chief Scientist Office of the Scottish Government Health Directorate CZD/16/6 and the Scottish Funding Council HR03006. Genotyping of the GS:SFHS samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland, and was funded by the UK's Medical Research Council. Ethics approval for the study was given by the NHS Tayside committee on research ethics (reference 05/S1401/89). We are grateful to all the families who took part, the general practitioners and the Scottish School of Primary Care for their help in recruiting them, and the whole Generation Scotland team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, healthcare assistants and nurses.
FamHS: Family Heart Study was supported by NIH grants RO1‐HL‐087700 and RO1‐HL‐088215 (M.A.P., PI) from NHLBI, and RO1‐DK‐8925601 and RO1‐DK‐075681 (I.B.B., PI) from NIDDK.
MESA: MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts N01‐HC‐95159, N01‐HC‐95160, N01‐HC‐95161, N01‐HC‐95162, N01‐HC‐95163, N01‐HC‐95164, N01‐HC‐95165, N01‐HC‐95166, N01‐HC‐95167, N01‐HC‐95168, N01‐HC‐95169, UL1‐TR‐001079, and UL1‐TR‐000040. Funding for SHARe genotyping was provided by NHLBI contract N02‐HL‐64278. Funding for MESA Family was provided by grants R01‐HL‐071051, R01‐HL‐071205, R01‐HL‐071250, R01‐HL‐071251, R01‐HL‐071252, R01‐HL‐071258, R01‐HL‐071259, and UL1‐RR‐025005. The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR000124, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center.
FHS: Framingham Heart Study: Genotyping, quality control, and calling of the Illumina HumanExome BeadChip in the Framingham Heart Study was supported by funding from the National Heart, Lung and Blood Institute, Division of Intramural Research (Daniel Levy and Christopher J. O'Donnell, Principal Investigators). A portion of this research was conducted using the Linux Clusters for Genetic Analysis (LinGA) computing resources at Boston University Medical Campus. Also supported by National Institute for Diabetes and Digestive and Kidney Diseases (NIDDK) R01 DK078616, NIDDK K24 DK080140, and American Diabetes Association Mentor‐Based Postdoctoral Fellowship Award #7‐09‐MN‐32, all to Dr. Meigs.
FENLAND: The Fenland Study is funded by the Medical Research Council (MC_U106179471) and Wellcome Trust. We are grateful to all the volunteers for their time and help, and to the General Practitioners and practice staff for assistance with recruitment. We thank the Fenland Study Investigators, Fenland Study Co‐ordination team, and the Epidemiology Field, Data and Laboratory teams.
EPIC‐Potsdam: We thank all EPIC‐Potsdam participants for their invaluable contribution to the study. The study was supported in part by a grant from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.). The recruitment phase of the EPIC‐Potsdam study was supported by the Federal Ministry of Science, Germany (01 EA 9401) and the European Union (SOC 95201408 05 F02). The follow‐up of the EPIC‐Potsdam study was supported by German Cancer Aid (70‐2488‐Ha I) and the European Community (SOC 98200769 05 F02). Furthermore, we thank Dr. Manuela Bergmann who was responsible for the methodological and organizational work of data collections of exposures and outcomes and Wolfgang Fleischhauer for his medical expertise that was employed in case ascertainment and contacts with the physicians and Ellen Kohlsdorf for data management.
CHS: This CHS research was supported by NHLBI contracts HHSN268201200036C, HHSN268200800007C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, and N01HC85086, and by NHLBI grants HL080295, HL087652, HL103612, and HL068986, with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through AG023629 from the National Institute on Aging (NIA). A full list of CHS investigators and institutions can be found at http://www.chs-nhlbi.org/pi.htm. The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR000124, and the National Institute of Diabetes and Digestive and Kidney Diseases Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The copyright line for this article was changed on 5th May 2016 after original online publication.
References
- Becker BJ, Wu M‐J. 2007. The synthesis of regression slopes in meta‐analysis. Stat Sci 22: 414–429.
- Bouatia‐Naji N, Rocheleau G, Van Lommel L, Lemaire K, Schuit F, Cavalcanti‐Proença C, Marchand M, Hartikainen A‐L, Sovio U, De Graeve F, and others. 2008. A polymorphism within the G6PC2 gene is associated with fasting plasma glucose levels. Science 320(5879):1085–1088.
- Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B (Methodological) 39: 1–38.
- Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia‐Naji N, Gloyn AL, and others. 2010. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42(2):105–116.
- Gao X, Starmer J, Martin ER. 2008. A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol 32(4):361–369.
- Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, Hansen M, Borecki IB, Cupples LA, Fornage M, and others. 2013. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One 8(7):e68095.
- Holden M, Deng S, Wojnowski L, Kulle B. 2008. GSEA‐SNP: applying gene set enrichment analysis to SNP data from genome‐wide association studies. Bioinformatics 24(23):2784–2785.
- Hu Y‐J, Berndt SI, Gustafsson S, Ganna A, Hirschhorn J, North KE, Ingelsson E, Lin D‐Y. 2013. Meta‐analysis of gene‐level associations for rare variants based on single‐variant statistics. Am J Hum Genet 93(2):236–248.
- Li J, Das K, Fu G, Li R, Wu R. 2011. The Bayesian lasso for genome‐wide association studies. Bioinformatics 27(4):516–523.
- Liu D, Peloso GM, Zhan X, Holmen OL, Zawistowski M, Feng S, Nikpay M, Auer PL, Goel A, Zhang H, and others. 2014. Meta‐analysis of gene‐level tests for rare variant association. Nat Genet 46(2):200–204.
- Liu N, Zhang K, Zhao H. 2008. Haplotype‐association analysis. Adv Genet 60: 335–405.
- Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5(2):e1000384.
- Mahajan A, Sim X, Ng HJ, Manning A, Rivas MA, Highland HM, Locke AE, Grarup N, Im HK, Cingolani P, and others. 2015. Identification and functional characterization of G6PC2 coding variants influencing glycemic traits define an effector transcript at the G6PC2‐ABCB11 locus. PLoS Genet 11(1):e1004876.
- Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. 2002. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70(2):425–434.
- Scott RA, Lagou V, Welch RP, Wheeler E, Montasser ME, Luan J, Mägi R, Strawbridge RJ, Rehnberg E, Gustafsson S, and others. 2012. Large‐scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet 44(9):991–1005.
- Sinnwell JP, Schaid DJ. 2013. haplo.stats: Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous. R package version 1.6.8.
- Stram DO. 1996. Meta‐analysis of published data using a linear mixed‐effects model. Biometrics 52: 536–544.
- Therneau T. 2012. coxme: Mixed Effects Cox Models. R package version 2.2‐3.
- Tregouet D, Escolano S, Tiret L, Mallet A, Golmard J. 2004. A new algorithm for haplotype‐based association analysis: the stochastic‐EM algorithm. Ann Hum Genet 68(2):165–177.
- Voorman A, Brody J, Chen H, Lumley T. 2014. seqMeta: Meta‐Analysis of Region‐Based Tests of Rare DNA Variants. R package version 1.5.
- Wessel J, Chu AY, Willems SM, Wang S, Yaghootkar H, Brody JA, Dauriz M, Hivert M‐F, Raghavan S, Lipovich L, and others. 2015. Low‐frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility. Nat Commun 6:5897.
- Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X. 2010. Powerful SNP‐set analysis for case‐control genome‐wide association studies. Am J Hum Genet 86(6):929–942.
- Wu TT, Chen YF, Hastie T, Sobel E, Lange K. 2009. Genome‐wide association analysis by lasso penalized logistic regression. Bioinformatics 25(6):714–721.
- Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG. 2002. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered 53(2):79–91.
- Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PI, Abecasis GR, Almgren P, Andersen G, and others. 2008. Meta‐analysis of genome‐wide association data and large‐scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40(5):638–645.
- Zhou B, Shi J, Whittemore AS. 2011. Optimal methods for meta‐analysis of genome‐wide association studies. Genet Epidemiol 35(7):581–591.
