American Journal of Human Genetics. 2015 Aug 6;97(2):260–271. doi: 10.1016/j.ajhg.2015.06.007

Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies

Gleb Kichaev 1, Bogdan Pasaniuc 1,2,3
PMCID: PMC4573268  PMID: 26189819

Abstract

Localization of causal variants underlying known risk loci is one of the main research challenges following genome-wide association studies. Risk loci are typically dissected through fine-mapping experiments in trans-ethnic cohorts to leverage the variability in local genetic structure across populations. More recent work has shown that genomic functional annotations (i.e., the localization of tissue-specific regulatory marks) can be integrated to increase fine-mapping performance within single-population studies. Here, we introduce methods that integrate the strength of association between genotype and phenotype, the variability in genetic backgrounds across populations, and the genomic map of tissue-specific functional elements to increase trans-ethnic fine-mapping accuracy. Through extensive simulations and empirical data, we demonstrate that our approach increases fine-mapping resolution over existing methods. We analyzed empirical data from a large-scale trans-ethnic rheumatoid arthritis (RA) study and showed that the functional genetic architecture of RA is consistent across European and Asian ancestries. In these data, our proposed methods reduced the average size of the 90% credible set from 29 variants per locus for standard non-integrative approaches to 22 variants.

Introduction

Genome-wide association studies (GWASs) have reproducibly identified thousands of risk loci associated with complex traits and diseases.1–7 Unfortunately, the index variants reported in these studies are typically not biologically causal but rather correlated with the underlying causal variant through linkage disequilibrium (LD).8 Fine-mapping experiments identify the causal variants responsible for a GWAS signal in two steps: first, by gathering dense genetic information through targeted sequencing or dense imputation, and second, by statistically prioritizing variants that can subsequently be validated in functional studies.3,6,9,10

Divergent population histories due to various demographic forces such as bottlenecks and expansions have produced unique genomic landscapes across ethnicities.11,12 Such differences in LD patterns and variant frequencies across populations can increase the power of statistical fine mapping if they are properly modeled.3,13–18 Intuitively, if a locus contains a causal variant, the neighborhood of LD partners linked to this variant will be distinct in different populations. Thus, aggregating the strength of association across multiple populations might accentuate the signal from the true causal variant(s) while dampening the noise from correlated variants.

A common approach to combining information across multiple studies is inverse-variance fixed-effects meta-analysis,19 which assumes that effect sizes of causal variants are similar across studies or populations. This assumption can be relaxed by a random-effects strategy, although this has been observed to usually decrease statistical power.20 A recent, more robust Bayesian meta-analysis framework15 was proposed to reason over trans-ethnic studies with potential allelic heterogeneity. Both the fixed-effects meta-analysis statistics and the Bayes factors supplied by the latter approach can be readily converted into posterior probabilities of association (PPAs) for the construction of fine-mapping credible sets.3,21 However, these credible sets are commonly built under the assumption that a locus harbors at most a single causal variant,3,22–24 an assumption that might be invalid at many risk loci,17,25,26 leading to miscalibrated credible sets.27,28 Although it might conceptually be possible to create credible sets on the basis of independent signals identified through conditional analysis, this strategy requires an ad hoc redefinition of the fine-mapping region. Furthermore, multiple causal variants in LD can create synthetic associations at neighboring sites that are potentially stronger than the association at the true causal variants. An iterative conditioning strategy would necessarily select these synthetic SNPs first, thereby dissipating the signal from the true causal variants.27

In addition to the strength of association between genotype and phenotype, an orthogonal source of information lies within a variant’s functional genomic context. Projects such as ENCODE29 and ROADMAP30 have provided a rich atlas of functional information, and numerous groups have reproducibly demonstrated that disease-associated variants are systematically enriched within chromatin marks that delineate active regulatory regions in phenotypically relevant cell types.31–35 Whereas functional genomic data are often used as a post hoc validation of association findings,4,10,36 a number of principled approaches have been proposed to jointly integrate functional and association data.28,35,37,38 In addition to increasing the accuracy of fine mapping, these integrative approaches also provide insights into the genetic architecture of the trait by identifying relevant tissue-specific functional marks without making any prior assumptions. However, to the best of our knowledge, functional integrative approaches have not been extended to trans-ethnic fine mapping, and a rigorous assessment of trans-ethnic fine mapping in the presence of multiple causal variants is currently lacking. Although in principle the single-population frameworks that allow for multiple causal variants27,28 can operate directly on trans-ethnic meta-analysis statistics, they require ad hoc averaging of trans-ethnic LD and do not properly account for heterogeneity by ancestry at causal variants.

In this work, we propose a statistical framework that integrates three sources of information to triangulate causal variants in fine-mapping studies: (1) the strength of association between genotype and phenotype, (2) differential genomic background across ethnic groups, and (3) tissue-specific functional genomic annotations (Figure 1). Different allele frequencies (or sample sizes) across populations induce differential standardized effect sizes at all the variants in a region, even in the absence of allelic effect-size heterogeneity by ancestry. We model this induced heterogeneity across populations through a multivariate normal (MVN) framework wherein the population-specific sets of association statistics are realizations from population-specific MVN distributions. As in the single-population case,27,28 this allows us to consider multiple causal variants at any risk locus. We integrate functional genomic data by using an empirical Bayes approach,28 which provides a means of selecting the functional annotations most relevant to the trait of interest. Most importantly, our proposed approach requires only summary association data for each population, thereby avoiding the many restrictions that can accompany analysis of individual-level genotype data.

Figure 1.


Example of a Fine-Mapping Locus in Three Different Populations

In population 1 (left), the causal variants are present, but strong regional LD makes it difficult to distinguish them from tagging SNPs. In population 2 (middle), the causal variants both have a very low frequency and/or are monomorphic, resulting in no observable association between the SNPs and the trait. In population 3 (right), the causal variants are common and have few tagging SNPs. Our framework jointly models population-specific LD structure and integrates functional genomic data to prioritize causal variants.

Through extensive simulations, we show that our trans-ethnic framework significantly improves fine-mapping resolution over conventional meta-analysis strategies and demonstrate that considering multiple causal variants in multi-ethnic cohorts yields large gains in fine-mapping efficiency. We showcase our framework by reanalyzing empirical summary data from a large trans-ethnic rheumatoid arthritis (RA [OMIM: 180300]) GWAS.4 We first demonstrate that the functional architecture of RA is consistent across ethnicities and that immune-related functional classes are strongly enriched with causal variants. We then fine map the RA GWAS loci by using functional data and show that our method greatly outperforms current state-of-the-art methodologies and uncovers a number of plausible functional variants.

Material and Methods

Multi-population Fine-Mapping Framework

Without loss of generality (given that similar results can be derived for case-control traits), let y be a quantitative phenotype such that $y_i = g_i\beta + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma_e^2)$, and $g_i$ denotes a multi-SNP genotype containing {0,1,2} counts of the reference allele at M SNPs for an individual i. The β vector represents allelic effects, where the jth entry is non-zero only if SNP j is causal. Given genotype ($G_p$) and phenotype ($Y_p$) data over $N_p$ individuals from population p, a standard approach to measuring association strength at SNP j is through the Wald statistic

$$z_{pj} = \frac{\hat{\beta}_{pj}}{\mathrm{SE}(\hat{\beta}_{pj})} = \mathrm{Cov}(G_{pj}, Y_p)\sqrt{\frac{N_p}{\mathrm{Var}(G_{pj})\,\sigma_e^2}},$$

which asymptotically follows a normal distribution:

$$z_{pj} \sim N\!\left(\frac{\beta_{pj}\sqrt{\mathrm{Var}(G_{pj})}}{\sigma_e}\sqrt{N_p},\; 1\right).$$

We denote the non-centrality parameter (NCP) of the normal distribution as

$$\lambda_{pj} = \frac{\beta_{pj}\sqrt{\mathrm{Var}(G_{pj})}}{\sigma_e}\sqrt{N_p}.$$

Under the null hypothesis that SNP j is not causal (or does not tag a causal variant; see below), $\beta_{pj} = 0$ and thus $\lambda_{pj} = 0$. If the SNP is causal, then $\beta_{pj} \neq 0$, yielding a non-zero $\lambda_{pj}$ that governs the power to detect this variant in an association study (i.e., to reject the null at a given significance level). Importantly, even when the allelic effects at the causal variants are similar across populations (i.e., $\beta_{pj} = \beta_{p'j}$), different allele frequencies and sample sizes induce population-specific NCPs, yielding larger NCPs at more common SNPs and/or in larger studies. This leads to the well-known result that causal variants are more readily detectable in populations in which they are more frequent.
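To make the dependence of the NCP on allele frequency and sample size concrete, here is a minimal Python sketch that evaluates $\lambda_{pj}$. It assumes Hardy-Weinberg equilibrium, so that $\mathrm{Var}(G_{pj}) = 2f(1-f)$ for allele frequency $f$, and uses $\sigma_e = 1$ by default; the helper name `ncp` and all numerical values are hypothetical.

```python
import math

def ncp(beta: float, freq: float, n: int, sigma_e: float = 1.0) -> float:
    """NCP lambda = beta * sqrt(Var(G)) * sqrt(N) / sigma_e (hypothetical helper).

    Assumes Hardy-Weinberg equilibrium, so Var(G) = 2 f (1 - f)
    for a SNP with allele frequency f.
    """
    var_g = 2.0 * freq * (1.0 - freq)
    return beta * math.sqrt(var_g) * math.sqrt(n) / sigma_e

# Same allelic effect (beta = 0.05) in three hypothetical settings
common_large = ncp(0.05, freq=0.30, n=5000)  # common SNP, larger study
rare_large = ncp(0.05, freq=0.05, n=5000)    # rarer SNP, same study size
common_small = ncp(0.05, freq=0.30, n=1000)  # common SNP, smaller study
print(round(common_large, 2), round(rare_large, 2), round(common_small, 2))
```

With an identical allelic effect, the common SNP in the larger study has the largest NCP, which is the mechanism behind the population-specific detectability described above.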

Pervasive LD at fine-scale resolution induces correlations between tag SNPs and causal SNPs, thus creating an indirect association between tag SNPs and traits.13 More specifically, the LD-induced NCP at a SNP j ($\Lambda_{pj}$) can be approximated as a linear combination of the NCPs at the causal SNPs with LD-adjusted weights13,27,28,39 as

$$\Lambda_{pj} = \sum_{c} r_{pj,c}\,\lambda_{pc}, \qquad \text{(Equation 1)}$$

where the sum is taken across all causal SNPs c, and $r_{pj,c}$ is the Pearson correlation coefficient between SNPs j and c in population p. We expand Equation 1 to include all SNPs in the locus by incorporating the indicator variable $C_{pk}$, which is set to 1 if SNP k is causal in population p and 0 otherwise:

$$\Lambda_{pj} = \sum_{k=1}^{M} r_{pj,k}\,\lambda_{pk}\,C_{pk}. \qquad \text{(Equation 2)}$$

In vector notation,

$$\Lambda_p = \Sigma_p(\lambda_p \circ C_p), \qquad \text{(Equation 3)}$$

where $\Sigma_p$ is the LD matrix of Pearson correlations among the M SNPs, $C_p$ is a binary vector indicating which SNPs are causal, and $\circ$ denotes element-wise multiplication of two vectors. We can now write the probability of the data (i.e., the observed standardized effect sizes, Z scores) given the causal variants ($C_p$) in population p under an MVN assumption:

$$Z_p \mid \lambda_p, C_p \sim N(\Lambda_p, \Sigma_p). \qquad \text{(Equation 4)}$$

This allows us to define the total likelihood of the data by marginalizing across all sets of causal SNPs ($\mathcal{C}$) as

$$L(Z_1, Z_2, \ldots, Z_P; \lambda_1, \lambda_2, \ldots, \lambda_P) = \prod_{p} \sum_{C_p \in \mathcal{C}} P(Z_p \mid \lambda_p, C_p)\,P(C_p), \qquad \text{(Equation 5)}$$

which we simplify under the assumption that the causal set is identical across populations:

$$L(Z_1, Z_2, \ldots, Z_P; \lambda_1, \lambda_2, \ldots, \lambda_P) = \sum_{C \in \mathcal{C}} \prod_{p} P(Z_p \mid \lambda_p, C)\,P(C). \qquad \text{(Equation 6)}$$

Here, $P(Z_p \mid \lambda_p, C_p)$ is the MVN probability density function (see Equation 4), and $P(C)$ is the probability of a given causal set. Note that Equation 6 assumes that the causal set is identical across populations but allows effect sizes at causal SNPs to differ across populations.
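A minimal Python sketch of Equations 3, 4, and 6 for a single locus, assuming NumPy is available. It enumerates causal sets with one or two causal SNPs, builds the LD-induced mean $\Sigma_p(\lambda_p \circ C)$ separately for each population, and sums the product of MVN densities over the shared causal sets on the log scale. The flat prior over causal sets, the fixed NCP of 4.0, and the toy LD matrices and Z scores are illustrative assumptions, not values from the study.

```python
import itertools
import numpy as np

def mvn_logpdf(z, mean, cov):
    """Log density of N(mean, cov) evaluated at z (Equation 4)."""
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + quad)

def causal_sets(m, max_causal):
    """Yield every binary causal vector with 1..max_causal causal SNPs."""
    for k in range(1, max_causal + 1):
        for idx in itertools.combinations(range(m), k):
            c = np.zeros(m)
            c[list(idx)] = 1.0
            yield c

def log_likelihood(zs, lds, lams, max_causal=2):
    """Equation 6 with a flat P(C), up to an additive constant:
    log sum_C prod_p P(Z_p | lambda_p, C)."""
    terms = []
    for c in causal_sets(len(zs[0]), max_causal):
        total = 0.0
        for z, ld, lam in zip(zs, lds, lams):
            mean = ld @ (lam * c)             # Equation 3: Sigma_p (lambda_p o C)
            total += mvn_logpdf(z, mean, ld)  # shared C, population-specific LD
        terms.append(total)
    return np.logaddexp.reduce(np.array(terms))

# Hypothetical 3-SNP locus observed in two populations with different LD
ld_eur = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]])
ld_asn = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]])
z_eur = np.array([4.8, 4.5, 1.2])
z_asn = np.array([3.9, 1.4, 0.6])
lam = np.full(3, 4.0)  # assumed NCP for each SNP if it were causal
print(log_likelihood([z_eur, z_asn], [ld_eur, ld_asn], [lam, lam]))
```

Because the causal vector is shared while the LD matrices differ, a set that explains the European Z scores only through tagging is penalized when it fails to explain the Asian Z scores.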

Integration of Functional-Annotation Data

We assume that each variant can potentially have several phenotypically relevant genomic functional annotations (e.g., transcription factor binding site), which can be encoded as binary variable Ajk for variant j and annotation k or as a continuous value (e.g., a probabilistic membership of variants in different functional classes). We integrate the functional information through the probability of the causal set C as follows,

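$$P(C; \gamma) = \prod_{j} \left(\frac{1}{1 + \exp(-\gamma^{T} A_j)}\right)^{C_j} \left(\frac{1}{1 + \exp(\gamma^{T} A_j)}\right)^{1 - C_j}, \qquad \text{(Equation 7)}$$

where γ is a vector containing the prior log-odds ratio of causality for every functional annotation. We extend the likelihood to incorporate functional data as

$$L(Z_1, Z_2, \ldots, Z_P; \lambda_1, \lambda_2, \ldots, \lambda_P, \gamma) = \sum_{C \in \mathcal{C}} \prod_{p} P(Z_p \mid \lambda_p, C)\,P(C; \gamma), \qquad \text{(Equation 8)}$$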

which we can further simplify by assuming that data at different loci are independent:

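$$L(Z_1, Z_2, \ldots, Z_P; \lambda_1, \lambda_2, \ldots, \lambda_P, \gamma) = \prod_{l} \sum_{C_l \in \mathcal{C}_l} \prod_{p} P(Z_{l,p} \mid \lambda_{l,p}, C_l)\,P(C_l; \gamma). \qquad \text{(Equation 9)}$$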

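Finally, to obtain posterior probabilities that each SNP is causal, we use Bayes' theorem to compute the joint posterior for each causal set,

$$P(C_l \mid Z_{l,1}, \ldots, Z_{l,P}; \lambda_{l,1}, \ldots, \lambda_{l,P}, \gamma) = \frac{\prod_{p} P(Z_{l,p} \mid \lambda_{l,p}, C_l)\,P(C_l; \gamma)}{\sum_{C'_l \in \mathcal{C}_l} \prod_{p} P(Z_{l,p} \mid \lambda_{l,p}, C'_l)\,P(C'_l; \gamma)}, \qquad \text{(Equation 10)}$$

and subsequently marginalize across all $C_l = (C_{1l}, C_{2l}, \ldots, C_{Nl})$ such that $C_{jl} = 1$:

$$P(C_{jl} = 1 \mid Z_{l,1}, \ldots, Z_{l,P}; \lambda_{l,1}, \ldots, \lambda_{l,P}, \gamma) = \sum_{C_l \in \mathcal{C}_l : C_{jl} = 1} P(C_l \mid Z_{l,1}, \ldots, Z_{l,P}; \lambda_{l,1}, \ldots, \lambda_{l,P}, \gamma). \qquad \text{(Equation 11)}$$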
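The functional prior (Equation 7) and the posterior steps (Equations 10 and 11) can be sketched in Python as follows, assuming NumPy. The per-set log-likelihoods $\sum_p \log P(Z_{l,p} \mid \lambda_{l,p}, C_l)$ are taken as given (computed elsewhere from Equation 4), and the annotation matrix, γ values, and log-likelihoods in the usage lines are hypothetical. Using an all-ones first annotation column as an intercept that sets the baseline prior is our assumption.

```python
import numpy as np

def log_prior(c, annotations, gamma):
    """Equation 7: log P(C; gamma) under an independent logistic prior per SNP.

    annotations: (M, K) matrix A; gamma: (K,) log-odds per annotation.
    """
    eta = annotations @ gamma             # gamma^T A_j for every SNP j
    log_p1 = -np.logaddexp(0.0, -eta)     # log 1 / (1 + exp(-eta))
    log_p0 = -np.logaddexp(0.0, eta)      # log 1 / (1 + exp(eta))
    return float(np.sum(c * log_p1 + (1.0 - c) * log_p0))

def posterior_inclusion(sets, log_lik, annotations, gamma):
    """Equations 10 and 11: normalize over causal sets, then marginalize per SNP.

    sets: (S, M) binary causal vectors; log_lik: (S,) log-likelihood per set.
    """
    log_post = np.array([ll + log_prior(c, annotations, gamma)
                         for c, ll in zip(sets, log_lik)])
    log_post -= np.logaddexp.reduce(log_post)  # Equation 10 denominator
    return np.exp(log_post) @ sets             # Equation 11: sum over sets with C_j = 1

sets = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
log_lik = np.array([10.0, 9.5, 2.0, 10.2])           # hypothetical per-set values
A = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])   # intercept + one annotation
pip_flat = posterior_inclusion(sets, log_lik, A, np.array([-3.0, 0.0]))
pip_annot = posterior_inclusion(sets, log_lik, A, np.array([-3.0, 2.0]))
print(pip_flat, pip_annot)
```

Raising the log-odds of the annotation that covers SNP 0 shifts posterior mass toward the causal sets containing SNP 0, which is how functional data re-weight the association evidence.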

Model Fitting

Because either the sample or the reference panel is finite, the LD matrix can in practice be ill-conditioned. We apply Tikhonov regularization40 to all LD matrices to ensure their invertibility and thereby preserve the non-degeneracy and numerical stability of the MVN approximation. Furthermore, because we ensure that every Σ is positive definite, a Cholesky decomposition exists for each LD matrix and its inverse. Let $L_p = \mathrm{Chol}(\Sigma_p)^{-1}$; it follows that $\tilde{Z}_p = L_p Z_p \sim N(L_p\Lambda_p, I)$. In practice, we operate in the transformed Z-score space ($\tilde{Z}_p$) because doing so improves numerical stability and reduces the computational burden by removing a large, repetitive matrix multiplication when computing the MVN density.
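Below is a short Python sketch (NumPy assumed) of the regularize-then-whiten step. With $\Sigma_p = R R^{T}$ and $L_p = R^{-1}$, the product $L_p\Sigma_p$ equals $R^{T}$, so each candidate causal set needs only one matrix-vector product in the whitened space. The ridge value of 0.1 and the nearly collinear 2-SNP example are illustrative assumptions; the text does not specify the regularization strength.

```python
import numpy as np

def regularize(ld, ridge=0.1):
    """Tikhonov regularization: Sigma + ridge * I (ridge value is illustrative)."""
    return ld + ridge * np.eye(ld.shape[0])

def whitened_loglik(z_tilde, chol, lam, c):
    """log N(z; Sigma (lam o C), Sigma) evaluated in whitened space.

    With Sigma = R R^T (R = chol) and L = R^{-1}:
      L z ~ N(L Sigma (lam o C), I) and L Sigma = R^T,
    so each causal set needs only one matrix-vector product.
    """
    mean_tilde = chol.T @ (lam * c)
    diff = z_tilde - mean_tilde
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # log det Sigma
    return -0.5 * (len(diff) * np.log(2.0 * np.pi) + log_det + diff @ diff)

ld = regularize(np.array([[1.0, 0.99], [0.99, 1.0]]))  # nearly singular before the ridge
chol = np.linalg.cholesky(ld)                          # lower-triangular R
z = np.array([5.0, 4.9])
z_tilde = np.linalg.solve(chol, z)                     # L z, computed once per locus
print(whitened_loglik(z_tilde, chol, np.array([5.0, 5.0]), np.array([1.0, 0.0])))
```

The whitened Z scores and the log determinant are computed once per locus and population and then reused for every candidate causal set.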

We fit the parameters of the model to the data across all loci by using a variant of the expectation-maximization algorithm over the functional annotations (γ) and approximate the NCPs by using a simple function of the observed Z scores (see Appendix A). Because enumerating all possible causal sets is combinatorially intractable, in practice we typically restrict the number of causal variants per locus to two or three.
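The combinatorial growth that motivates capping the number of causal variants can be checked with a few lines of Python. This sketch counts configurations with one to k causal SNPs among M candidates; whether the empty configuration is also enumerated is not stated here, so excluding it is our assumption.

```python
from math import comb

def n_causal_sets(m: int, max_causal: int) -> int:
    """Number of causal configurations with 1..max_causal causal SNPs among m SNPs."""
    return sum(comb(m, k) for k in range(1, max_causal + 1))

for k in (1, 2, 3):
    print(k, n_causal_sets(100, k))   # a hypothetical 100-SNP locus
print("all non-empty sets:", 2 ** 100 - 1)
```

For a hypothetical 100-SNP locus, allowing up to two causal variants gives 5,050 configurations and up to three gives 166,750, compared with roughly $2^{100}$ for unrestricted enumeration.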

Simulation Data

We benchmarked our proposed framework with simulations starting from real genotype data. Using the NHGRI catalog of GWAS variants on chromosome 1,1 we centered 25-kb windows on the lead SNP and used HAPGEN241 and 1000 Genomes12 to simulate individuals of Asian (n = 286), African (n = 246), and European (n = 379) ancestry. SNPs that were polymorphic with a minor allele frequency ≥ 0.01 in at least one population were retained for analysis. For each simulation, we randomly chose 50 loci and simulated causal variants by drawing causal status according to the logistic prior model described above. Unless otherwise noted, we used the annotations (coding, UTR, promoter, DNase hypersensitivity site [DHS], intronic, and intergenic) and functional enrichments (13.8×, 8.4×, 2.8×, 5.1×, and 0.1×) observed in Gusev et al.35 for the simulations below. We simulated phenotypes under a linear model such that the phenotype of individual i of population p was drawn as $Y_{i,p} = \sum_{j=1}^{N_c} \beta_j g_{j,i,p} + \epsilon_{i,p}$, where $N_c$ is the total number of causal variants, $\beta_j$ is the effect size of the jth causal SNP, and $g_{j,i,p}$ is the number of copies of risk allele j carried by individual i of population p. Following recent works, we simulated similar heritability across populations.42 The population-specific error term, $\epsilon_{i,p}$, was drawn from $N(0, \sigma_{e,p}^2)$, where $\sigma_{e,p}^2 = (\sigma_{g,p}^2 - h_g^2\sigma_{g,p}^2)/h_g^2$, $\sigma_{g,p}^2 = \beta^{T}\mathrm{Cov}(X_p)\beta$, and $\mathrm{Cov}(X_p)$ is the population-specific covariance of the genotypes (LD). The effect size $\beta_j$ was set to be inversely proportional to the average SD of the population allele frequencies; this is roughly equivalent to assuming that each causal SNP explains an equal proportion of the phenotypic variance.43
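The noise-variance rule for the simulated phenotypes can be sketched in Python (NumPy assumed). For simplicity, this sketch simulates a single population with independent Binomial(2, f) genotypes rather than HAPGEN2 haplotypes, and the allele frequencies are hypothetical; effect sizes are set to $1/\sqrt{2f(1-f)}$ so that each causal SNP explains roughly equal variance.

```python
import numpy as np

def simulate_phenotype(genotypes, beta, h2, rng):
    """Draw Y = G beta + eps with Var(eps) chosen so that a fraction h2 of Var(Y) is genetic.

    sigma_g^2 = beta^T Cov(G) beta and sigma_e^2 = (sigma_g^2 - h2 * sigma_g^2) / h2.
    """
    cov = np.atleast_2d(np.cov(genotypes, rowvar=False))  # genotype covariance (LD)
    sigma_g2 = beta @ cov @ beta
    sigma_e2 = (sigma_g2 - h2 * sigma_g2) / h2
    noise = rng.normal(0.0, np.sqrt(sigma_e2), size=genotypes.shape[0])
    return genotypes @ beta + noise

rng = np.random.default_rng(0)
freqs = np.array([0.1, 0.3, 0.5])                     # hypothetical causal-allele frequencies
g = rng.binomial(2, freqs, size=(20000, 3)).astype(float)
beta = 1.0 / np.sqrt(2.0 * freqs * (1.0 - freqs))     # inverse genotype SD under HWE
y = simulate_phenotype(g, beta, h2=0.25, rng=rng)
print(np.var(g @ beta, ddof=1) / np.var(y, ddof=1))   # realized h2, close to 0.25
```

Because the noise variance is derived from the realized genetic variance, the ratio of genetic to phenotypic variance should land close to the target $h_g^2$ regardless of the allele frequencies chosen.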

Existing Methods

We compared our proposed methods with other well-established probabilistic methods for fine mapping. First, we investigated MANTRA, a Bayesian trans-ethnic meta-analysis technique proposed by Morris.15 We obtained the software implementation from the author and ran it with the default settings; we provided the fixation index (FST) between the three populations as determined in Nelis et al.44 as the prior for the Bayesian partition model. The output of MANTRA is a Bayes factor, which we subsequently converted to

$$\mathrm{PPA}_i = \frac{\mathrm{BF}_i}{\sum_k \mathrm{BF}_k}$$

as previously recommended.3,22,45 Similarly, we calculated posterior probabilities that SNPs are causal strictly on the basis of the inverse-variance fixed-effects19 meta-analysis by using the PAINTOR (Probabilistic Annotation Integrator)28 and CAVIARBF46 frameworks. We note that the CAVIARBF and PAINTOR models require LD as input, which we calculated as the average of the population-specific LD weighted by the sample size of each population. We assessed accuracy by rank ordering SNPs across all fine-mapping loci according to the output of each method and then determined the proportion of identified causal variants as more SNPs were selected. We typically report the median number of SNPs one would need to validate in order to resolve 90% of the causal variants as our main metric of accuracy.
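The post-processing steps described above (converting Bayes factors to PPAs, inverse-variance fixed-effects meta-analysis, and sample-size-weighted LD averaging) are sketched below in Python with NumPy. The function names are hypothetical, the sketch assumes natural-log Bayes factors as input, and it does not reproduce the actual interfaces of MANTRA, PAINTOR, or CAVIARBF.

```python
import numpy as np

def ppa_from_log_bf(log_bfs):
    """PPA_i = BF_i / sum_k BF_k, computed from natural-log Bayes factors."""
    log_bfs = np.asarray(log_bfs, dtype=float)
    w = np.exp(log_bfs - log_bfs.max())   # subtract the max to avoid overflow
    return w / w.sum()

def ivw_fixed_effects(betas, ses):
    """Inverse-variance fixed-effects meta-analysis for one SNP.

    Returns (beta_meta, se_meta, z_meta).
    """
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    beta_meta = np.sum(w * np.asarray(betas, dtype=float)) / np.sum(w)
    se_meta = np.sqrt(1.0 / np.sum(w))
    return beta_meta, se_meta, beta_meta / se_meta

def weighted_ld(lds, sample_sizes):
    """Average of population-specific LD matrices weighted by sample size."""
    n = np.asarray(sample_sizes, dtype=float)
    return np.tensordot(n / n.sum(), np.stack(lds), axes=1)

print(ppa_from_log_bf(np.log([8.0, 2.0])))              # approximately [0.8, 0.2]
print(ivw_fixed_effects([0.10, 0.20], [0.05, 0.05]))
print(weighted_ld([np.eye(2), np.ones((2, 2))], [3, 1]))
```

The sample-size-weighted average yields a single LD matrix for the meta-analyzed Z scores; this averaging is the step that a population-specific model avoids.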

RA Multi-ethnic Fine-Mapping Dataset

We downloaded summary statistics from a large trans-ethnic RA GWAS consisting of over 100,000 individuals (∼68,000 of European ancestry and ∼36,000 of Asian ancestry).4 We used the reported genome-wide-significant loci, excluding human leukocyte antigen regions, and centered 100-kb windows around the top SNP, yielding a total of 89 fine-mapping loci. For each of these regions, we estimated LD by using the European and Asian individuals from 1000 Genomes.12 We integrated 482 publicly available functional annotations comprising 406 DHSs spanning numerous cell types and tissues,31,47 the seven genomic segmentations of the eight primary ENCODE cell lines,48 Fantom5 enhancer and transcription start site regions,49 immune cell enhancers,10 genic elements derived from GenCode,50 and overall methylation and acetylation marks from ENCODE.29 The construction of a phenotypically specific fine-mapping model requires two phases. First, we ran the model marginally on each annotation and subsequently rank ordered all the annotations according to likelihood-ratio statistics.28,37 Second, we selected the top annotations that were minimally correlated with one another (usually no more than five) to enter a final model to estimate posterior probabilities that variants are causal.
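The two-phase annotation selection can be sketched as a greedy procedure in Python (NumPy assumed). The text specifies ranking by likelihood-ratio statistics and keeping the top, minimally correlated annotations (usually no more than five); the correlation cutoff of 0.2, the greedy order, and the toy annotation matrix are our assumptions.

```python
import numpy as np

def select_annotations(lrt_stats, annotations, max_keep=5, max_abs_corr=0.2):
    """Greedy sketch of the two-phase selection (cutoff value is hypothetical).

    lrt_stats: (K,) likelihood-ratio statistic from fitting each annotation alone.
    annotations: (M, K) SNP-by-annotation matrix.
    """
    order = np.argsort(lrt_stats)[::-1]               # phase 1: rank by LRT, descending
    corr = np.corrcoef(annotations, rowvar=False)
    chosen = []
    for k in order:                                   # phase 2: skip correlated annotations
        if all(abs(corr[k, j]) <= max_abs_corr for j in chosen):
            chosen.append(int(k))
        if len(chosen) == max_keep:
            break
    return chosen

# Columns 0 and 1 are identical; column 2 is uncorrelated with both
A = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
print(select_annotations(np.array([10.0, 9.0, 5.0]), A))
```

Annotation 1 is skipped because it duplicates the higher-ranked annotation 0, so the final model receives two non-redundant annotations.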

Results

Joint Modeling of Association Statistics across Populations Increases Fine-Mapping Performance

We used simulations to investigate the benefit of jointly modeling population-specific association statistics versus standard meta-analysis approaches. We simulated fine-mapping datasets of 10,000 individuals equally divided among European, Asian, and African ancestries with total heritability $h_g^2 = 0.25$ across 50 loci with a genetic architecture similar to that in Gusev et al.35 The loci were simulated such that, in expectation, each locus harbored a single causal variant with allelic effects shared across populations (see Material and Methods). This yielded an average of 15 loci with a single causal variant and 13 loci with multiple causal variants per simulation. In general, we find that trans-ethnic fine-mapping strategies that assume a single causal variant are less efficient than those that allow for multiple causal variants (Table 1). For example, MANTRA meta-analysis requires 1.9 and 96.8 SNPs per locus in order to identify 50% and 90% of the causal variants, respectively, whereas methods that allow multiple causal variants but do not incorporate functional data require 1.2 and 7.0 SNPs per locus to identify 50% and 90% of the causal variants, respectively.46 Existing integrative fine-mapping methods that leverage functional data28 applied to fixed-effects meta-analysis statistics require 1.0 and 5.6 SNPs per locus to find 50% and 90% of the causal variants, respectively. In contrast, our proposed framework resolves causal variants with the greatest efficiency (Figure 2), requiring only 0.9 and 5.2 SNPs per locus to find 50% and 90% of the causal variants, respectively (paired t test, p < 0.001). We attribute this improvement to the fact that our approach models population-specific LD patterns while allowing for multiple causal variants in the presence of functional information.

Table 1.

Our Trans-ethnic Integrative Framework Is Superior to Conventional Meta-analysis Strategies and Current State-of-the-Art Methodologies

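| Heterogeneity Level | Identified Proportion of Causal Variants | Fixed-Effects Meta-analysis (single causal variant per locus) | MANTRA15 (single causal variant per locus) | Fixed-Effects CAVIARBF46 (multiple causal variants per locus) | Fixed-Effects PAINTOR28ᵃ (multiple causal variants per locus) | Trans-ethnic PAINTORᵃ (multiple causal variants per locus) |
|---|---|---|---|---|---|---|
| None | 0.50 | 1.9 | 2.0 | 1.2 | 1.0 | 0.9 |
| None | 0.75 | 29.8 | 30.3 | 2.9 | 2.1 | 1.9 |
| None | 0.90 | 96.8 | 96.8 | 7.0 | 5.6 | 5.2 |
| Weak | 0.50 | 1.9 | 2.0 | 1.1 | 0.9 | 0.9 |
| Weak | 0.75 | 62.3 | 62.7 | 2.9 | 2.0 | 1.8 |
| Weak | 0.90 | 118.1 | 118.6 | 6.8 | 4.9 | 4.1 |
| Strong | 0.50 | 29.0 | 11.1 | 12.6 | 9.6 | 2.3 |
| Strong | 0.75 | 105.0 | 92.7 | 68.6 | 58.4 | 19.7 |
| Strong | 0.90 | 143.9 | 139.8 | 134.4 | 121.3 | 56.5 |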

We simulated 1,000 multi-ethnic fine-mapping datasets under various levels of allelic heterogeneity across populations. For the first two levels of heterogeneity (“none” and “weak”), we invoked the standard infinitesimal assumption on allelic effects either globally or at the population level by setting effect sizes ($\beta_{c,p}$) at the causal SNPs inversely proportional to either the mean allele-frequency SD or the population-specific allele-frequency SD. To simulate strong heterogeneity across ancestries, we drew effect sizes from a standard normal distribution for each population independently and added enough Gaussian noise to maintain $h_g^2 = 0.25$. Displayed here is the median number of SNPs selected per locus for identifying a specified proportion of the causal variants.

ᵃ Methods that also integrate functional data.
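The accuracy metric reported in Table 1 can be sketched in Python. For one simulated dataset, SNPs from all loci are pooled and ranked by a method's score, and the function returns the number of SNPs selected per locus once the requested fraction of causal variants has been captured. The ceiling rule for the target count and the toy scores are our assumptions; the table entries are medians of this quantity across simulations, computed here with `statistics.median`.

```python
import math
import statistics

def snps_per_locus_needed(scores, is_causal, n_loci, fraction=0.9):
    """Rank pooled SNPs by score (descending) and return SNPs selected per locus
    once `fraction` of the causal SNPs have been found (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # round() guards against floating-point noise in fraction * count
    target = math.ceil(round(fraction * sum(is_causal), 9))
    if target == 0:
        return 0.0
    found = 0
    for rank, i in enumerate(order, start=1):
        found += is_causal[i]
        if found >= target:
            return rank / n_loci
    return len(scores) / n_loci

scores = [0.9, 0.1, 0.8, 0.05, 0.3, 0.2]   # toy posterior probabilities, 2 loci pooled
is_causal = [1, 0, 0, 0, 1, 0]
per_dataset = [snps_per_locus_needed(scores, is_causal, n_loci=2)]
print(statistics.median(per_dataset))
```

In this toy dataset, the second causal SNP sits at rank 3, so capturing 90% of the causal variants costs 3 SNPs over 2 loci, or 1.5 SNPs per locus.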

Figure 2.


Trans-ethnic PAINTOR Is Most Efficient in Identifying Causal Variants

The distributions of the number of SNPs required for follow-up identification of 90% of the causal variants across 1,000 simulations are displayed as boxplots. The different panels represent increasing levels of effect-size heterogeneity by ancestry: none (left), weak (middle), and strong (right). The widths of the notches in each boxplot roughly correspond to 95% confidence intervals for the median number of SNPs required for resolving 90% of the causal variants. For the sake of clarity, we have cut the y axis to emphasize the significant difference in performance across all three methods.

Recent studies have shown that GWAS findings generally replicate across populations,42,51 suggesting that the underlying causal variants are shared. However, it is generally unknown whether these variants contribute to disease risk uniformly across populations. We therefore assessed fine-mapping performance when the causal variants have either weak or strong heterogeneity by ancestry. In addition to fine-mapping datasets in which causal effects were similar across populations (no heterogeneity), we simulated allelic effects inversely proportional to the population-specific allele-frequency SD (weak heterogeneity) and normally distributed allelic effects drawn independently for each ancestry (strong heterogeneity). Our framework significantly outperformed fixed-effects meta-analysis followed by probability estimation with existing methods. For example, in the case of weak heterogeneity, our approach required 4.1 as opposed to 4.9 SNPs per locus (19.5% improvement); in the presence of strong heterogeneity, our approach dramatically outperformed existing meta-analysis strategies by reducing the number of SNPs required for identifying 90% of the causal variants from 121.3 to 56.5 (a 2.1-fold improvement) (Table 1; Figure 2). The increase in performance is likely due to the fact that our framework makes no assumptions about the population-specific allelic effects at causal SNPs, because we allow the empirically observed Z scores in each population to dictate the effect size. This allows for arbitrary levels of effect-size heterogeneity by population, whereas fixed-effects meta-analysis assumes similar effect sizes across populations.

Performance of Trans-ethnic Fine Mapping

The benefit of trans-ethnic fine mapping has been thoroughly documented both in simulations and in empirical data.3,13,15 However, previous works have assumed a single causal variant at a risk locus, an assumption that is often invalid in practice. Here, we assessed trans-ethnic fine mapping in the presence of multiple causal variants at a risk locus while integrating functional-annotation data. Consistent with previous works,13 we found that for the same sample size, multi-ethnic cohorts attained greater accuracy than single-population studies. Moreover, the advantage of trans-ethnic over single-population fine mapping was considerably larger when multiple causal variants were allowed: we observed a 3- to 4-fold increase in median resolution for methods that model multiple causal variants but only a 1.4- to 1.6-fold gain for methods that assume a single causal variant (see Table 2). We attribute this to the much smaller number of causal sets (as a proportion of all possible sets) that are compatible with the observed association statistics. Diversity in LD patterns across populations additionally penalizes sets of variants that do not contain the true causal variants, because such sets are unlikely to explain the observed data. Consequently, multi-ethnic cohorts not only have proportionally more distinct LD patterns than single-population cohorts (thereby placing larger penalties on incorrect causal sets) but can also borrow power from populations in which the causal variants are more frequent.

Table 2.

Modeling Multiple Causal Variants in Multi-ethnic Cohorts Yields Larger Relative Gains in Fine-Mapping Efficiency

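| Ethnic Group | Single Causal Variant (−) | Single Causal Variant (+) | Multiple Causal Variants (−) | Multiple Causal Variants (+) |
|---|---|---|---|---|
| Asians | 136.9 | 134.4 | 89.3 | 36.2 |
| Europeans | 135.0 | 130.9 | 82.9 | 33.5 |
| Africans | 104.0 | 95.0 | 34.4 | 14.7 |
| Trans-ethnic | 72.6 | 58.4 | 8.5 | 4.9 |
| Relative gain | 1.4 | 1.6 | 4.0 | 3.0 |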

We simulated fine-mapping datasets with various ethnic compositions and allelic effects shared across populations. Displayed here are four fine-mapping strategies that consider either single or multiple causal variants at each risk locus and either have (+) or do not have (−) access to functional data across different ethnic study designs. The bottom row represents the relative gain in the median 90% causal-variant resolution of trans-ethnic cohorts over the next best-performing group.
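As a quick arithmetic check, the bottom row of Table 2 can be recomputed from the other rows by dividing the best (smallest) single-population value in each column by the trans-ethnic value. This Python snippet uses only the numbers printed in the table; the function name is hypothetical.

```python
# Median 90% causal-variant resolution from Table 2, in column order:
# single causal (-), single causal (+), multiple causal (-), multiple causal (+)
single_population = {
    "Asians": [136.9, 134.4, 89.3, 36.2],
    "Europeans": [135.0, 130.9, 82.9, 33.5],
    "Africans": [104.0, 95.0, 34.4, 14.7],
}
trans_ethnic = [72.6, 58.4, 8.5, 4.9]

def relative_gain(single, multi):
    """Next best (smallest) single-population value divided by the trans-ethnic value."""
    gains = []
    for col, te in enumerate(multi):
        best_single = min(values[col] for values in single.values())
        gains.append(round(best_single / te, 1))
    return gains

print(relative_gain(single_population, trans_ethnic))  # prints [1.4, 1.6, 4.0, 3.0]
```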

Genetic-Trait Architecture Affects Fine-Mapping Performance

Functional information has been shown to improve fine-mapping resolution in single populations,10,28,37,38 and we investigated the potential gains in a trans-ethnic setting. We simulated two disease architectures by using five functional annotations, in which causal variants either localize predominantly within a single broad functional class, as observed by Gusev et al.35 (A1), or have a smaller, more diffuse localization within functionally specific cell types28 (A2). For each architecture, we fit six trans-ethnic integrative models, each successive model incorporating one additional functional annotation into the joint framework. Not surprisingly, when the true genetic architecture of a trait at fine-mapping regions has a strong enrichment of causal variants within a common functional class (i.e., DHSs35), these functional annotations are the most informative for fine mapping (see Figure 3). In contrast, a more diffuse localization of causal variants requires multiple annotations to maximize the utility of functional data. For example, for architecture A1, adding the DHS annotation yielded a 70% increase in fine-mapping resolution, whereas architecture A2 required all five annotations to improve resolution by 18% (see Figure 3).

Figure 3.


The Underlying Functional Architecture of a Trait Affects Fine-Mapping Performance

We simulated two classes of disease architectures: A1 (solid line) and A2 (dashed line). Architecture A1 was based on the functional enrichment observed in Gusev et al.35 and had a strong enrichment within a single DHS class. Architecture A2 was simulated with a more diffuse enrichment in various cell types and classes and was based on what we empirically observed in the RA dataset. Displayed on top of each point is the percentage of SNPs falling within that annotation and its corresponding enrichment.

Integrative Fine Mapping in a Multi-ethnic RA Dataset

We next investigated whether the simulation results extend to empirical data from a trans-ethnic RA GWAS of more than 100,000 individuals4 (see Material and Methods). Because the functional genetic architecture of RA across populations is unknown, we first quantified whether the enrichment of causal variants in various functional annotations is consistent across ancestries. Reassuringly, we observed a strong correspondence in functional enrichment at the fine-mapping loci across all 482 functional categories we investigated (r = 0.597; Figure 4). This supports the assumption that a single functional prior can be applied uniformly across populations in trans-ethnic fine mapping.

Figure 4.


Functional Enrichment Is Consistent across Europeans and Asians

We compared the enrichment across 482 functional annotations at 89 RA-associated loci in Europeans (n ≈ 68,000) and Asians (n ≈ 36,000) separately. Each point represents the estimated enrichment of an annotation in both European and Asian populations.

Next, we estimated trans-ethnic enrichment for each of the 482 annotations independently to allow the model to discern the most functionally relevant cell types and classes. The enrichment likelihood ratios supplied by this procedure provide a natural way to prioritize functional annotations for fine mapping.28 We consistently found a strong and significant enrichment of causal variants within active regulatory regions of immune-related cell types (rank permutation p < 0.001; see Figure 5), which is largely in line with RA etiology. The final trans-ethnic integrative model included annotations of DHS regions specific to three cell types (skin keratinocytes, T helper 2 cells, and B lymphocytes), immune enhancer regions described in Farh et al.,10 and GENCODE-defined exon regions. We found that simply applying existing multi-causal frameworks27,28 to the trans-ethnic meta-analysis statistics yielded wider 90% credible sets (approximately 28.5 SNPs per locus, as opposed to 24.0 SNPs per locus for our proposed framework), demonstrating the benefit of modeling population-level LD. Furthermore, integrating functional data reduced the size of the credible set to 21.7 SNPs per locus (see Table 3), showing that leveraging functional annotations refines the trans-ethnic fine-mapping signal.

Figure 5.


Trans-ethnic Functional Enrichment at RA GWAS Loci Indicates Immune-Related Regulatory Architecture

Here, we compare the enrichment of causal variants within 42 DHSs of immune-related cell types (B cells, T cells, natural killer cells, keratinocytes, monocytes, and thymic cells) with the enrichment of causal variants in 354 DHS annotations of other cell types. The widths of the notches in each boxplot roughly correspond to 95% confidence intervals for the median enrichment.

Table 3.

Integrative Approaches that Model Population-Level LD Yield the Smallest Credible Sets in Empirical Data

Association Statistics | Average No. of SNPs without Annotations | Average No. of SNPs with Annotations
Asians | 35.2 | 31.9
Europeans | 32.0 | 28.7
Fixed-effects meta-analysis | 28.5 | 25.0
Trans-ethnic | 24.0 | 21.7

Displayed here is the average number of SNPs per locus in the 90% credible sets for single- and multi-population fine mapping of RA-associated loci. To compute credible sets, we first ranked the SNPs across all 89 loci by posterior probability and then counted the number of top-ranked SNPs needed to capture 90% of the total posterior probability mass. Consistent with simulation findings, integrating multiple populations with functional data improved fine-mapping resolution.
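The pooled credible-set computation described in the table note can be sketched as follows. This is a minimal illustration under two assumptions not spelled out in the text: SNPs are ranked by posterior probability in descending order, and the per-locus average is the pooled count divided by the number of loci. The function name and inputs are hypothetical.

```python
import numpy as np


def pooled_credible_set_size(posteriors_by_locus, level=0.9):
    """Average number of SNPs per locus in a credible set pooled over loci.

    posteriors_by_locus : list of 1-D arrays of per-SNP posterior
        probabilities, one array per locus
    level : fraction of the total posterior mass to capture
    """
    pooled = np.concatenate(
        [np.asarray(p, dtype=float) for p in posteriors_by_locus]
    )
    ordered = np.sort(pooled)[::-1]          # highest posterior first
    cumulative = np.cumsum(ordered)
    target = level * ordered.sum()           # e.g. 90% of the total mass
    # First position where the running sum reaches the target, as a count.
    n_snps = int(np.searchsorted(cumulative, target) + 1)
    return n_snps / len(posteriors_by_locus)
```

For two toy loci with posteriors [0.7, 0.2, 0.1] and [0.5, 0.3, 0.2], the pooled mass is 2.0, five top-ranked SNPs are needed to reach 1.8, and the per-locus average is 2.5. Pooling lets loci with a single dominant SNP "donate" mass, so the average can differ from averaging per-locus credible sets.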

Next, we explored the plausible causality of the SNPs that attained a high posterior probability under our framework (Table 4). For example, rs968567, which lies within the promoter region of FADS2 (OMIM: 606149) and has been functionally validated to alter transcription factor binding and downstream gene expression,52 achieved a trans-ethnic posterior probability of only 0.29. However, this variant fell within all five functional annotations that our framework deemed important for this trait and, after re-weighting by the annotation-informed prior, achieved a posterior probability of 0.84. Trans-ethnic association can also be highly beneficial on its own. For example, rs12693993, a variant within the coding region of CD28 (OMIM: 186760), a gene important for T cell development, proliferation, and cytokine production, achieved posterior probabilities for causality of only 0.34 and 0.02 in Europeans and Asians, respectively. However, upon integration of trans-ethnic association with functional data, it achieved a posterior probability for causality of 0.85. These two SNPs, among others, illustrate the benefit of our proposed methodologies.

Table 4.

Integrating Trans-ethnic Association Strength with Functional Data Promotes a Number of SNPs to Attain a High Posterior Probability for Causality

rsID | Chromosomal Position | European Association (Z Score) | Asian Association (Z Score) | Posterior Probability without Annotations | Posterior Probability with Annotations | Functional Annotations
rs2476601 | chr1: 114,377,568 | −26.04 | NA | 1.00 | 1.00 | coding exons, skin keratinocyte DHSs
rs7731626 | chr5: 55,444,683 | −9.84 | NA | 1.00 | 1.00 | GM12865 DHSs, Th2 DHSs, immune enhancers
NA | chr1: 2,523,878 | −5.22 | −4.18 | 1.00 | 1.00 | immune enhancers
rs1893592 | chr21: 43,855,067 | −5.73 | −4.01 | 1.00 | 1.00 | coding exons, immune enhancers
NA | chr19: 10,771,941 | −6.13 | NA | 1.00 | 1.00 | immune enhancers
rs72767222 | chr5: 55,440,788 | 5.11 | NA | 0.99 | 0.99 | skin keratinocyte DHSs, immune enhancers
rs12715125 | chr3: 27,763,427 | 5.58 | NA | 0.95 | 0.99 | coding exons, GM12865 DHSs, Th2 DHSs, skin keratinocyte DHSs, immune enhancers
rs71508903 | chr10: 63,779,871 | 7.26 | 5.88 | 0.76 | 0.93 | GM12865 DHSs, skin keratinocyte DHSs, immune enhancers
rs12693993 | chr2: 204,595,597 | −2.74 | −1.76 | 0.68 | 0.88 | Th2 DHSs, skin keratinocyte DHSs, immune enhancers
rs968567 | chr11: 61,595,564 | −4.95 | NA | 0.29 | 0.85 | coding exons, GM12865 DHSs, Th2 DHSs, skin keratinocyte DHSs, immune enhancers
rs909685 | chr22: 39,747,671 | 6.29 | 4.62 | 0.65 | 0.84 | Th2 DHSs, skin keratinocyte DHSs, immune enhancers
rs657075 | chr5: 131,430,118 | 2.54 | 4.46 | 0.73 | 0.82 | skin keratinocyte DHSs, immune enhancers

We applied our framework across all 89 RA GWAS loci with relevant functional data. This table lists SNPs achieving a trans-ethnic posterior probability greater than 0.8. Abbreviations are as follows: NA, not applicable; Th2, T helper 2 cell.

Discussion

In this work, we introduced a fine-mapping framework that combines several sources of evidence to prioritize functional SNPs and demonstrated its efficacy in real and simulated datasets. As fine-mapping data become increasingly multi-ethnic3,4 and functional data become larger and more refined,30 we believe that our proposed methodology will become increasingly relevant. Because it operates exclusively on summary data, our approach reduces the need to share individual-level data, restrictions on which often prohibit large-scale analyses. In addition, a key advantage of our methodology is that, within an empirical Bayes framework, it provides an unbiased, data-driven view of which functional genomic data are most relevant to the trait. Rather than relying on careful manual selection of functional elements when conducting fine mapping,10,36 we allow the data to dictate the functional relevance of each annotation. As the catalog of functional data expands to encompass more diverse cell types and genomic signatures, a principled strategy for parsing these annotations is paramount.

We note that although our model does not assume a priori that allelic heterogeneity exists by ancestry,15 it is, by construction, capable of handling trans-ethnic heterogeneity, whether that heterogeneity reflects a true difference in per-allele effects or simply genetic drift that produced distinct allele frequencies at the causal SNPs. We found that as the level of heterogeneity across populations increases, our framework increasingly outperforms competing strategies. Although extreme heterogeneity might be unlikely, gene-environment interactions in complex traits can manifest as distinct allelic effects across populations.53

We conclude with several limitations of our proposed framework. Its efficacy is intimately connected to the underlying functional architecture of the trait being examined. If the correct functional annotation is unavailable or causal variants are distributed roughly uniformly across the functional-annotation categories, our method will most likely underperform fine-mapping strategies that either do not estimate functional-enrichment parameters27,46 or pre-specify the correct enrichment parameters from external analyses.35 However, mounting evidence suggests that causal variants for most complex traits co-localize with epigenetic marks10,31,35,37 that are now available for the vast majority of human cell types.54 Finally, performance could be further improved through a Bayesian treatment of the non-centrality parameters within our framework,46 which we leave as a direction for future work.

Acknowledgments

We would like to thank Alkes Price, Hillary Finucane, Robert Brown, Huwenbo Shi, and Nicholas Mancuso for helpful discussion and feedback on this work. This research was supported by NIH grants R01-HG006399, R01-GM53275, and R21-CA182821.

Published: July 16, 2015

Appendix A: Optimization Procedure

We optimize the parameters of our model by using expectation maximization (EM). First, we take the expectation of the complete-data log-likelihood with respect to the posterior distribution of causal sets and simplify it to obtain a function, $Q$, that is readily optimized via standard techniques. Let $Z_{l,\cdot}$ represent all $P$ vectors of association statistics $(Z_{l,1}, Z_{l,2}, \ldots, Z_{l,P})$ at locus $l$, and let $\lambda_{l,\cdot}$ be the corresponding vectors of non-centrality parameters,

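$$
\begin{aligned}
Q(\gamma,\lambda \mid \gamma^{(t)},\lambda)
&= \sum_{l}\sum_{C_l} P(C_l \mid Z_{l,\cdot}, \lambda_{l,\cdot}, \gamma^{(t)}) \ln P(Z_{l,\cdot}, C_l; \lambda_{l,\cdot}, \gamma) \\
&= \sum_{l}\sum_{C_l} P(C_l \mid Z_{l,\cdot}, \lambda_{l,\cdot}, \gamma^{(t)}) \Big( \ln P(C_l; \gamma) + \sum_{p} \ln P(Z_{l,p} \mid C_l, \lambda_{l,p}) \Big) \\
&= \sum_{l}\sum_{C_l} P(C_l \mid Z_{l,\cdot}, \lambda_{l,\cdot}, \gamma^{(t)}) \ln P(C_l; \gamma)
 + \sum_{l}\sum_{C_l} P(C_l \mid Z_{l,\cdot}, \lambda_{l,\cdot}, \gamma^{(t)}) \sum_{p} \ln P(Z_{l,p} \mid C_l, \lambda_{l,p}) \\
&= Q(\gamma \mid \gamma^{(t)}) + Q(\lambda_p \mid \lambda_p),
\end{aligned}
$$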

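thereby decoupling the prior from the likelihood. Because the prior factorizes across SNPs, with $P(c_{jl}=1;\gamma) = 1/\big(1+\exp(-\gamma^{T} A_{jl})\big)$ for the annotation vector $A_{jl}$ of SNP $j$ at locus $l$, we simplify $Q(\gamma \mid \gamma^{(t)})$ to obtain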

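$$
\begin{aligned}
Q(\gamma \mid \gamma^{(t)}, \lambda)
&= \sum_{l}\sum_{j}\sum_{c_{jl}\in\{0,1\}} P(c_{jl} \mid Z_{l,\cdot}; \gamma^{(t)}, \lambda_{l,\cdot}) \ln P(c_{jl}; \gamma) \\
&= -\sum_{l}\sum_{j} P(c_{jl}=1 \mid Z_{l,\cdot}; \gamma^{(t)}, \lambda_{l,\cdot}) \ln\!\big(1+\exp(-\gamma^{T} A_{jl})\big)
 - \sum_{l}\sum_{j} P(c_{jl}=0 \mid Z_{l,\cdot}; \gamma^{(t)}, \lambda_{l,\cdot}) \ln\!\big(1+\exp(\gamma^{T} A_{jl})\big),
\end{aligned}
$$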

which is a concave function whose gradient is simply

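$$
\frac{\partial Q(\gamma \mid \gamma^{(t)}, \lambda)}{\partial \gamma}
= \sum_{j}\sum_{l} P(c_{jl}=1 \mid Z_{l,\cdot}; \gamma^{(t)}, \lambda_{l,\cdot}) \frac{1}{1+\exp(\gamma^{T} A_{jl})} A_{jl}
- \sum_{j}\sum_{l} P(c_{jl}=0 \mid Z_{l,\cdot}; \gamma^{(t)}, \lambda_{l,\cdot}) \frac{1}{1+\exp(-\gamma^{T} A_{jl})} A_{jl}.
$$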
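The M-step update for $\gamma$ can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the per-SNP posteriors $P(c_{jl}=1 \mid Z_{l,\cdot})$ have already been computed in the E-step and stacks the SNPs of all loci into one annotation matrix `A` (the names `q_gamma`, `grad_q_gamma`, and `m_step` are hypothetical). The paper optimizes $Q$ with the L-BFGS implementation in NLopt; here SciPy's `L-BFGS-B` method of `scipy.optimize.minimize` is swapped in, and because it minimizes, the negated objective and gradient are passed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def q_gamma(gamma, A, post1):
    """Expected log-prior Q(gamma | gamma_t, lambda).

    A     : (n_snps, n_annot) annotation matrix; rows are A_jl stacked over
            all loci (include a column of ones for an intercept)
    post1 : (n_snps,) E-step posteriors P(c_jl = 1 | Z)
    """
    eta = A @ gamma
    # ln P(c=1; gamma) = -ln(1 + e^{-eta}),  ln P(c=0; gamma) = -ln(1 + e^{eta});
    # logaddexp(0, x) computes ln(1 + e^x) without overflow.
    return (-(post1 * np.logaddexp(0.0, -eta)).sum()
            - ((1.0 - post1) * np.logaddexp(0.0, eta)).sum())


def grad_q_gamma(gamma, A, post1):
    """Gradient: sum_jl [P(c=1)/(1 + e^{eta}) - P(c=0)/(1 + e^{-eta})] A_jl."""
    eta = A @ gamma
    # expit(-eta) = 1/(1 + e^{eta}),  expit(eta) = 1/(1 + e^{-eta})
    weights = post1 * expit(-eta) - (1.0 - post1) * expit(eta)
    return A.T @ weights


def m_step(A, post1, gamma_init):
    """Maximize the concave Q by minimizing -Q with L-BFGS (no Hessian needed)."""
    result = minimize(
        lambda g: -q_gamma(g, A, post1),
        gamma_init,
        jac=lambda g: -grad_q_gamma(g, A, post1),
        method="L-BFGS-B",
    )
    return result.x
```

The gradient weights simplify to `post1 - expit(eta)`, so the stationary point matches the E-step posteriors with the logistic prior, as in a logistic regression on soft labels; the explicit two-term form is kept here to mirror the equation.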

To avoid potential numerical instability from inverting a Hessian matrix, as standard Newton-Raphson would require, we maximize $Q$ by using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm implemented in the NLopt library. Finally, as previously mentioned, the non-centrality parameter for SNP $j$ at locus $l$ in population $p$, $\lambda_{p,lj}$, is set simply as

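$$
\lambda_{p,lj} = f(Z_{p,lj}) =
\begin{cases}
\min(-3.7,\ Z_{p,lj}) & \text{if } Z_{p,lj} < 0,\\
\max(3.7,\ Z_{p,lj}) & \text{if } Z_{p,lj} > 0,\\
0 & \text{if } Z_{p,lj} = 0 \ (\text{SNP } j \text{ is monomorphic in population } p),
\end{cases}
$$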

a strategy that was previously demonstrated to work well in practice.28 This iterative algorithm is repeated until the change in the log-likelihood is less than 0.01.
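The non-centrality rule and the E-step that supplies the posteriors for $Q$ can be sketched together for a single locus. This sketch goes beyond what this appendix states and should be read as an illustration under assumptions: the rule is read as $\operatorname{sign}(Z)\max(3.7, |Z|)$; the likelihood for population $p$ is assumed to be the multivariate normal $Z_p \mid C \sim N(\Sigma_p(\lambda_p \circ C), \Sigma_p)$ with $\Sigma_p$ that population's LD matrix; populations are combined by summing log-likelihoods, as in the decomposition of $Q$; and configurations are enumerated up to a hypothetical cap of two causal SNPs. The names `ncp` and `posterior_marginals` are hypothetical.

```python
import itertools

import numpy as np
from scipy.special import expit
from scipy.stats import multivariate_normal


def ncp(z, floor=3.7):
    """Non-centrality rule: sign(Z) * max(floor, |Z|), and 0 when Z == 0
    (SNP monomorphic in that population)."""
    z = np.asarray(z, dtype=float)
    return np.where(z < 0, np.minimum(-floor, z),
                    np.where(z > 0, np.maximum(floor, z), 0.0))


def posterior_marginals(zs, lds, A, gamma, max_causal=2):
    """E-step sketch for one locus: P(c_j = 1 | Z) for every SNP j.

    zs    : list of (m,) Z-score vectors, one per population
    lds   : list of (m, m) LD correlation matrices, one per population
    A     : (m, k) annotation matrix; gamma : (k,) enrichment parameters
    """
    m = A.shape[0]
    prior1 = expit(A @ gamma)                 # P(c_j = 1; gamma)
    lams = [ncp(z) for z in zs]               # lambda_p from the rule above
    configs, log_w = [], []
    for size in range(max_causal + 1):
        for causal in itertools.combinations(range(m), size):
            c = np.zeros(m)
            for j in causal:
                c[j] = 1.0
            # Factorized logistic prior ln P(C; gamma).
            lw = np.sum(np.where(c == 1.0, np.log(prior1), np.log1p(-prior1)))
            # Populations are independent given C: add their log-likelihoods.
            for z, ld, lam in zip(zs, lds, lams):
                lw += multivariate_normal.logpdf(z, mean=ld @ (lam * c), cov=ld)
            configs.append(c)
            log_w.append(lw)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())           # normalize in log space
    w /= w.sum()
    return np.array(configs).T @ w            # marginal P(c_j = 1 | Z)
```

In a full EM loop, these per-SNP marginals are the posterior weights $P(c_{jl}=1 \mid Z_{l,\cdot})$ that enter $Q(\gamma \mid \gamma^{(t)})$, and the E- and M-steps alternate until the log-likelihood changes by less than 0.01. Exhaustive enumeration grows combinatorially with the number of SNPs and the causal cap, which is why a small cap is used here.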

Web Resources

The URLs for data presented herein are as follows:

References

  • 1. Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff L., Parkinson H. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229.
  • 2. Willer C.J., Schmidt E.M., Sengupta S., Peloso G.M., Gustafsson S., Kanoni S., Ganna A., Chen J., Buchkovich M.L., Mora S., Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797.
  • 3. Mahajan A., Go M.J., Zhang W., Below J.E., Gaulton K.J., Ferreira T., Horikoshi M., Johnson A.D., Ng M.C., Prokopenko I., DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. Asian Genetic Epidemiology Network Type 2 Diabetes (AGEN-T2D) Consortium. South Asian Type 2 Diabetes (SAT2D) Consortium. Mexican American Type 2 Diabetes (MAT2D) Consortium. Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) Consortium. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet. 2014;46:234–244. doi: 10.1038/ng.2897.
  • 4. Okada Y., Wu D., Trynka G., Raj T., Terao C., Ikari K., Kochi Y., Ohmura K., Suzuki A., Yoshida S., RACI consortium. GARNET consortium. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014;506:376–381. doi: 10.1038/nature12873.
  • 5. Wood A.R., Esko T., Yang J., Vedantam S., Pers T.H., Gustafsson S., Chu A.Y., Estrada K., Luan J., Kutalik Z., Electronic Medical Records and Genomics (eMERGE) Consortium. MIGen Consortium. PAGE Consortium. LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097.
  • 6. Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., LifeLines Cohort Study. ADIPOGen Consortium. AGEN-BMI Working Group. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GLGC. ICBP. MAGIC Investigators. MuTHER Consortium. MIGen Consortium. PAGE Consortium. ReproGen Consortium. GENIE Consortium. International Endogene Consortium. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
  • 7. Shungin D., Winkler T.W., Croteau-Chonka D.C., Ferreira T., Locke A.E., Mägi R., Strawbridge R.J., Pers T.H., Fischer K., Justice A.E., ADIPOGen Consortium. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GEFOS Consortium. GENIE Consortium. GLGC. ICBP. International Endogene Consortium. LifeLines Cohort Study. MAGIC Investigators. MuTHER Consortium. PAGE Consortium. ReproGen Consortium. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–196. doi: 10.1038/nature14132.
  • 8. Visscher P.M., Brown M.A., McCarthy M.I., Yang J. Five years of GWAS discovery. Am. J. Hum. Genet. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029.
  • 9. Rivas M.A., Beaudoin M., Gardet A., Stevens C., Sharma Y., Zhang C.K., Boucher G., Ripke S., Ellinghaus D., Burtt N., National Institute of Diabetes and Digestive Kidney Diseases Inflammatory Bowel Disease Genetics Consortium (NIDDK IBDGC). United Kingdom Inflammatory Bowel Disease Genetics Consortium. International Inflammatory Bowel Disease Genetics Consortium. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat. Genet. 2011;43:1066–1073. doi: 10.1038/ng.952.
  • 10. Farh K.K.-H., Marson A., Zhu J., Kleinewietfeld M., Housley W.J., Beik S., Shoresh N., Whitton H., Ryan R.J., Shishkin A.A. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518:337–343. doi: 10.1038/nature13835.
  • 11. Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
  • 12. Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  • 13. Zaitlen N., Paşaniuc B., Gur T., Ziv E., Halperin E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 2010;86:23–33. doi: 10.1016/j.ajhg.2009.11.016.
  • 14. Ong R.T.-H., Wang X., Liu X., Teo Y.-Y. Efficiency of trans-ethnic genome-wide meta-analysis and fine-mapping. Eur. J. Hum. Genet. 2012;20:1300–1307. doi: 10.1038/ejhg.2012.88.
  • 15. Morris A.P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 2011;35:809–822. doi: 10.1002/gepi.20630.
  • 16. Teo Y.-Y., Ong R.T., Sim X., Tai E.S., Chia K.-S. Identifying candidate causal variants via trans-population fine-mapping. Genet. Epidemiol. 2010;34:653–664. doi: 10.1002/gepi.20522.
  • 17. Udler M.S., Meyer K.B., Pooley K.A., Karlins E., Struewing J.P., Zhang J., Doody D.R., MacArthur S., Tyrer J., Pharoah P.D., SEARCH Collaborators. FGFR2 variants and breast cancer risk: fine-scale mapping using African American studies and analysis of chromatin conformation. Hum. Mol. Genet. 2009;18:1692–1703. doi: 10.1093/hmg/ddp078.
  • 18. Stacey S.N., Sulem P., Zanon C., Gudjonsson S.A., Thorleifsson G., Helgason A., Jonasdottir A., Besenbacher S., Kostic J.P., Fackenthal J.D. Ancestry-shift refinement mapping of the C6orf97-ESR1 breast cancer susceptibility locus. PLoS Genet. 2010;6:e1001029. doi: 10.1371/journal.pgen.1001029.
  • 19. Evangelou E., Ioannidis J.P. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 2013;14:379–389. doi: 10.1038/nrg3472.
  • 20. Wang X., Chua H.-X., Chen P., Ong R.T.-H., Sim X., Zhang W., Takeuchi F., Liu X., Khor C.-C., Tay W.-T. Comparing methods for performing trans-ethnic meta-analysis of genome-wide association studies. Hum. Mol. Genet. 2013;22:2303–2311. doi: 10.1093/hmg/ddt064.
  • 21. Liu C.-T., Buchkovich M.L., Winkler T.W., Heid I.M., Borecki I.B., Fox C.S., Mohlke K.L., North K.E., Adrienne Cupples L., African Ancestry Anthropometry Genetics Consortium. GIANT Consortium. Multi-ethnic fine-mapping of 14 central adiposity loci. Hum. Mol. Genet. 2014;23:4738–4744. doi: 10.1093/hmg/ddu183.
  • 22. Maller J.B., McVean G., Byrnes J., Vukcevic D., Palin K., Su Z., Howson J.M., Auton A., Myers S., Morris A., Wellcome Trust Case Control Consortium. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 2012;44:1294–1301. doi: 10.1038/ng.2435.
  • 23. Beecham A.H., Patsopoulos N.A., Xifara D.K., Davis M.F., Kemppinen A., Cotsapas C., Shah T.S., Spencer C., Booth D., Goris A., International Multiple Sclerosis Genetics Consortium (IMSGC). Wellcome Trust Case Control Consortium 2 (WTCCC2). International IBD Genetics Consortium (IIBDGC). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 2013;45:1353–1360. doi: 10.1038/ng.2770.
  • 24. Onengut-Gumuscu S., Chen W.-M., Burren O., Cooper N.J., Quinlan A.R., Mychaleckyj J.C., Farber E., Bonnie J.K., Szpak M., Schofield E., Type 1 Diabetes Genetics Consortium. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 2015;47:381–386. doi: 10.1038/ng.3245.
  • 25. Meyer K.B., O’Reilly M., Michailidou K., Carlebur S., Edwards S.L., French J.D., Prathalingham R., Dennis J., Bolla M.K., Wang Q., GENICA Network. kConFab Investigators. Australian Ovarian Cancer Study Group. Fine-scale mapping of the FGFR2 breast cancer risk locus: putative functional variants differentially bind FOXA1 and E2F1. Am. J. Hum. Genet. 2013;93:1046–1060. doi: 10.1016/j.ajhg.2013.10.026.
  • 26. Trynka G., Hunt K.A., Bockett N.A., Romanos J., Mistry V., Szperl A., Bakker S.F., Bardella M.T., Bhaw-Rosun L., Castillejo G., Spanish Consortium on the Genetics of Coeliac Disease (CEGEC). PreventCD Study Group. Wellcome Trust Case Control Consortium (WTCCC). Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nat. Genet. 2011;43:1193–1201. doi: 10.1038/ng.998.
  • 27. Hormozdiari F., Kostem E., Kang E.Y., Pasaniuc B., Eskin E. Identifying causal variants at loci with multiple signals of association. Genetics. 2014;198:497–508. doi: 10.1534/genetics.114.167908.
  • 28. Kichaev G., Yang W.-Y., Lindstrom S., Hormozdiari F., Eskin E., Price A.L., Kraft P., Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10:e1004722. doi: 10.1371/journal.pgen.1004722.
  • 29. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  • 30. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
  • 31. Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794.
  • 32. Trynka G., Raychaudhuri S. Using chromatin marks to interpret and localize genetic associations to complex human traits and diseases. Curr. Opin. Genet. Dev. 2013;23:635–641. doi: 10.1016/j.gde.2013.10.009.
  • 33. Karczewski K.J., Dudley J.T., Kukurba K.R., Chen R., Butte A.J., Montgomery S.B., Snyder M. Systematic functional regulatory assessment of disease-associated variants. Proc. Natl. Acad. Sci. USA. 2013;110:9607–9612. doi: 10.1073/pnas.1219099110.
  • 34. Trynka G., Sandor C., Han B., Xu H., Stranger B.E., Liu X.S., Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 2013;45:124–130. doi: 10.1038/ng.2504.
  • 35. Gusev A., Lee S.H., Trynka G., Finucane H., Vilhjálmsson B.J., Xu H., Zang C., Ripke S., Bulik-Sullivan B., Stahl E., Schizophrenia Working Group of the Psychiatric Genomics Consortium. SWE-SCZ Consortium. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 2014;95:535–552. doi: 10.1016/j.ajhg.2014.10.004.
  • 36. Hazelett D.J., Rhie S.K., Gaddis M., Yan C., Lakeland D.L., Coetzee S.G., Henderson B.E., Noushmehr H., Cozen W., Kote-Jarai Z., Ellipse/GAME-ON consortium. Practical consortium. Comprehensive functional annotation of 77 prostate cancer risk loci. PLoS Genet. 2014;10:e1004102. doi: 10.1371/journal.pgen.1004102.
  • 37. Pickrell J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 2014;94:559–573. doi: 10.1016/j.ajhg.2014.03.004.
  • 38. Chung D., Yang C., Li C., Gelernter J., Zhao H. GPA: A statistical approach to prioritizing GWAS results by integrating pleiotropy information and annotation data. arXiv. 2014. arXiv:1401.4764. http://arxiv.org/abs/1401.4764v1.
  • 39. Pasaniuc B., Zaitlen N., Shi H., Bhatia G., Gusev A., Pickrell J., Hirschhorn J., Strachan D.P., Patterson N., Price A.L. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics. 2014;30:2906–2914. doi: 10.1093/bioinformatics/btu416.
  • 40. Tikhonov A.N., Arsenin V.Y. Solutions of Ill-Posed Problems. John Wiley & Sons; 1977.
  • 41. Su Z., Marchini J., Donnelly P. HAPGEN2: simulation of multiple disease SNPs. Bioinformatics. 2011;27:2304–2305. doi: 10.1093/bioinformatics/btr341.
  • 42. Coram M.A., Duan Q., Hoffmann T.J., Thornton T., Knowles J.W., Johnson N.A., Ochs-Balcom H.M., Donlon T.A., Martin L.W., Eaton C.B. Genome-wide characterization of shared and distinct genetic components that influence blood lipid levels in ethnically diverse human populations. Am. J. Hum. Genet. 2013;92:904–916. doi: 10.1016/j.ajhg.2013.04.025.
  • 43. Yang J., Manolio T.A., Pasquale L.R., Boerwinkle E., Caporaso N., Cunningham J.M., de Andrade M., Feenstra B., Feingold E., Hayes M.G. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 2011;43:519–525. doi: 10.1038/ng.823.
  • 44. Nelis M., Esko T., Mägi R., Zimprich F., Zimprich A., Toncheva D., Karachanak S., Piskácková T., Balascák I., Peltonen L. Genetic structure of Europeans: a view from the North-East. PLoS ONE. 2009;4:e5472. doi: 10.1371/journal.pone.0005472.
  • 45. Franceschini N., van Rooij F.J., Prins B.P., Feitosa M.F., Karakas M., Eckfeldt J.H., Folsom A.R., Kopp J., Vaez A., Andrews J.S., LifeLines Cohort Study. Discovery and fine mapping of serum protein loci through transethnic meta-analysis. Am. J. Hum. Genet. 2012;91:744–753. doi: 10.1016/j.ajhg.2012.08.021.
  • 46. Chen W., Larrabee B.R., Ovsyannikova I.G., Kennedy R.B., Haralambieva I.H., Poland G.A., Schaid D.J. Fine mapping causal variants with an approximate bayesian method using marginal test statistics. Genetics. 2015. doi: 10.1534/genetics.115.176107. Published online May 6, 2015.
  • 47. Thurman R.E., Rynes E., Humbert R., Vierstra J., Maurano M.T., Haugen E., Sheffield N.C., Stergachis A.B., Wang H., Vernot B. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232.
  • 48. Ernst J., Kheradpour P., Mikkelsen T.S., Shoresh N., Ward L.D., Epstein C.B., Zhang X., Wang L., Issner R., Coyne M. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906.
  • 49. Forrest A.R., Kawaji H., Rehli M., Baillie J.K., de Hoon M.J., Haberle V., Lassmann T., Kulakovskiy I.V., Lizio M., Itoh M., FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature. 2014;507:462–470. doi: 10.1038/nature13182.
  • 50. Cunningham F., Amode M.R., Barrell D., Beal K., Billis K., Brent S., Carvalho-Silva D., Clapham P., Coates G., Fitzgerald S. Ensembl 2015. Nucleic Acids Res. 2015;43:D662–D669. doi: 10.1093/nar/gku1010.
  • 51. Marigorta U.M., Navarro A. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genet. 2013;9:e1003566. doi: 10.1371/journal.pgen.1003566.
  • 52. Lattka E., Eggers S., Moeller G., Heim K., Weber M., Mehta D., Prokisch H., Illig T., Adamski J. A common FADS2 promoter polymorphism increases promoter activity and facilitates binding of transcription factor ELK1. J. Lipid Res. 2010;51:182–191. doi: 10.1194/jlr.M900289-JLR200.
  • 53. Malaria Genomic Epidemiology Network. Reappraisal of known malaria resistance loci in a large multicenter study. Nat. Genet. 2014;46:1197–1204. doi: 10.1038/ng.3107.
  • 54. Ernst J., Kellis M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. 2015;33:364–376. doi: 10.1038/nbt.3157.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
