Human Molecular Genetics. 2020 Aug 3;29(R1):R81–R88. doi: 10.1093/hmg/ddaa148

Fine-mapping genetic associations

Anna Hutchinson 1, Jennifer Asimit 2, Chris Wallace 3,4,5
PMCID: PMC7733401  PMID: 32744321

Abstract

Whilst thousands of genetic variants have been associated with human traits, identifying the subset of those variants that are causal requires a further ‘fine-mapping’ step. We review the basic fine-mapping approach, which is computationally fast and requires only summary data, but depends on an assumption of a single causal variant per associated region which is recognized as biologically unrealistic. We discuss different ways that the approach has been built upon to accommodate multiple causal variants in a region and to incorporate additional layers of functional annotation data. We further review methods for simultaneous fine-mapping of multiple datasets, either exploiting different linkage disequilibrium (LD) structures across ancestries or borrowing information between distinct but related traits. Finally, we look to the future and the opportunities that will be offered by increasingly accurate maps of causal variants for a multitude of human traits.

Introduction

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with a spectrum of diseases and related traits (1). However, often multiple variants are associated due to linkage disequilibrium (LD), and the variant with the strongest evidence for genetic association (e.g. smallest P-value) may not be the variant that directly impacts the phenotype. Additional ‘fine-mapping’ analyses are required to identify which variants are most likely to be causal (responsible for the association), to ultimately enable the functional studies which can help elucidate the underlying biology of human phenotypes.

All fine-mapping studies require that causal variants are available in the GWAS dataset (either directly genotyped or imputed), with large sample sizes to enable sufficient power to distinguish between associated variants in LD and good-quality data to avoid misleading results (2). Many methods have been developed which aim to improve the accuracy of fine-mapping by building on these requirements in different ways. Here, we first describe the basic methodology for fine-mapping, before discussing recent extensions, particularly with regard to allowing multiple causal variants in a region, and the potential increases in accuracy afforded by simultaneous fine-mapping in multiple ancestries, of multiple traits, or by including external data on chromatin state or function.

Single Causal Variant Fine-Mapping

The early statistical fine-mapping methods that were developed assumed any genetic region containing variants in LD with a GWAS association signal could contain at most one causal variant (3). Whilst the assumption was biologically unrealistic, the approaches developed set the framework for much future work and, unusually for GWAS at the time, adopted a Bayesian approach. This approach is naturally suited to the evaluation of multiple hypotheses, each corresponding to the possibility that a different variant could be causal and responsible for the whole pattern of association across a region.

Evidence for association at each SNP is summarized by a Bayes factor, which compares the marginal likelihood of the data at that SNP under different prior distributions for its effect on the phenotype, $\beta$. Single SNP Bayes factors can be calculated from individual genotypes (3) or approximated from summary data on the maximum likelihood estimate of $\beta$ and its standard error (4). The prior distributions compare an ‘associated’ hypothesis, $H_A$, to a ‘non-associated’ hypothesis, $H_0$. Under $H_0$, we assume $\beta = 0$. Under $H_A$, the commonly used prior is $\beta \sim N(0, W)$, where a common choice for the prior variance in a case–control setting is $W = 0.2^2$, which puts probability 0.02 on odds ratios either above 1.5 or below 0.67 (5). Assuming exactly one causal variant exists in the region, the posterior probability that each SNP $i = 1, \ldots, p$ is the causal variant can be calculated without any information on LD (3) and is proportional to its Bayes factor, $\mathrm{BF}_i$, written as

$$\mathrm{PP}_i = \frac{\mathrm{BF}_i}{\sum_{j=1}^{p} \mathrm{BF}_j}, \qquad i = 1, \ldots, p.$$
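As a minimal sketch of the calculation described above, the following computes an approximate Bayes factor for each SNP from its effect estimate and standard error, then normalizes the Bayes factors into posterior probabilities. It assumes a Gaussian prior N(0, W) on the effect and the closed form that follows from comparing the N(0, V + W) and N(0, V) densities of the estimate, where V is the squared standard error. The effect estimates, standard errors and the value W = 0.04 are illustrative assumptions, not values from any study.

```python
import math

def log_abf(beta_hat, se, prior_var):
    """Log approximate Bayes factor (H_A versus H_0) from summary data.

    Assumes beta_hat ~ N(beta, V) with V = se^2 and prior beta ~ N(0, W).
    """
    v = se ** 2                      # sampling variance V of the estimate
    r = prior_var / (prior_var + v)  # shrinkage factor W / (V + W)
    z = beta_hat / se                # Wald z-score
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def posterior_probs(log_bfs):
    """PP_i = BF_i / sum_j BF_j, computed in log space for stability."""
    m = max(log_bfs)
    weights = [math.exp(lbf - m) for lbf in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log odds ratios and standard errors for four SNPs in one region.
beta_hats = [0.12, 0.10, 0.02, -0.01]
ses = [0.02, 0.02, 0.02, 0.02]
W = 0.2 ** 2  # assumed prior variance

log_bfs = [log_abf(b, s, W) for b, s in zip(beta_hats, ses)]
pps = posterior_probs(log_bfs)
for i, pp in enumerate(pps):
    print(f"SNP {i}: log BF = {log_bfs[i]:.2f}, PP = {pp:.4f}")
```

Note that no LD information enters the calculation: under the single causal variant assumption, the per-SNP Bayes factors are sufficient.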

Generally no single variant is identified as overwhelmingly likely to be causal, so researchers prioritize credible sets of potentially causal variants, derived by sorting variants into descending order of posterior probability and adding variants to the set until the cumulative sum of posterior probabilities exceeds some threshold, α, to form a (100*α)% credible set. Whilst these sets are expected to contain the causal variant with probability α, it has been shown that coverage is often higher because only datasets with strong primary association signals are fine-mapped. Smaller ‘adjusted credible sets’ can therefore be derived to potentially improve the fine-mapping resolution (6).
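The credible set construction described above can be sketched in a few lines. The function name and the example posterior probabilities are hypothetical, and the sketch stops as soon as the cumulative sum reaches the threshold (treating "reaches" and "exceeds" alike); it does not implement the coverage adjustment of (6).

```python
def credible_set(pps, alpha=0.95):
    """Return indices of the smallest top-ranked set whose PPs sum to at least alpha."""
    # Sort SNP indices by descending posterior probability.
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pps[i]
        if cumulative >= alpha:
            break
    return chosen, cumulative

# Hypothetical posterior probabilities for four SNPs.
example_pps = [0.05, 0.50, 0.15, 0.30]
snps, covered = credible_set(example_pps, alpha=0.90)
print(snps, round(covered, 3))
```

The size of the returned set is a direct measure of fine-mapping resolution: fewer SNPs for the same threshold means fewer candidates for follow-up.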

The Gaussian prior N(0,W) above is typically used for estimating these posterior probabilities, but Walters and colleagues (7) advocated that a Laplace distribution provided a better fit to true causal effect estimates than a Gaussian when modelling genetic associations with breast cancer. However, the Bayes factor is reassuringly robust to differences in the form of the prior distribution, whilst changing the variance, W, has a larger but still modest effect, leading overall fine-mapping conclusions to be similar across a range of prior distributions (Fig. 1).

Figure 1.

The posterior probability for a given SNP in single causal variant fine-mapping is proportional to the Bayes factor (BF) for association at that SNP. We simulated data with estimated effect sizes $\hat{\beta} \sim N(\beta, V)$ for a range of V = 0.001, 0.01 and 0.1 and analysed each with different priors: either Gaussian or Laplace distributions with prior variance W = 0.01, 0.02 and 0.04. Different combinations of W and V produce very different marginal likelihood values under HA (left column). However, these differences are dominated by the very low marginal likelihood under H0 (centre column), such that the resultant log BF (right column) is very similar across priors.
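A rough way to explore this robustness numerically is to integrate the likelihood against each prior on a grid and compare the resulting log Bayes factors. The sketch below is not the code used for Fig. 1; it assumes a single hypothetical effect estimate of 0.2, a zero-centred Laplace prior parameterized by its variance, and a simple trapezoid rule.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def laplace_pdf(x, var):
    b = math.sqrt(var / 2.0)  # a Laplace distribution with scale b has variance 2 * b^2
    return math.exp(-abs(x) / b) / (2.0 * b)

def log_bf(beta_hat, v, prior_pdf, half_width=3.0, steps=20001):
    """Log BF: marginal likelihood under H_A (trapezoid rule) over likelihood under H_0."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        beta = -half_width + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * normal_pdf(beta_hat, beta, v) * prior_pdf(beta)
    marginal_ha = total * h
    return math.log(marginal_ha) - math.log(normal_pdf(beta_hat, 0.0, v))

beta_hat = 0.2  # hypothetical estimated log odds ratio
rows = []
for v in (0.001, 0.01, 0.1):
    for w in (0.01, 0.02, 0.04):
        gauss = log_bf(beta_hat, v, lambda b, w=w: normal_pdf(b, 0.0, w))
        lap = log_bf(beta_hat, v, lambda b, w=w: laplace_pdf(b, w))
        rows.append((v, w, gauss, lap))
        print(f"V={v:<6} W={w:<5} log BF Gaussian={gauss:8.3f}  Laplace={lap:8.3f}")
```

For the Gaussian prior the grid result can be checked against the closed-form approximate Bayes factor, which gives some reassurance that the grid is fine enough for the Laplace case as well.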

Multiple Causal Variant Fine-Mapping

Conditional forward stepwise regression—where, sequentially, the most strongly associated SNP is conditioned on to search for additional independent signals (8)—provided statistical evidence that GWAS risk loci could contain multiple causal variants. This evidence has prompted the development of Bayesian fine-mapping methods that jointly model multiple causal variants in GWAS risk loci whilst capturing the uncertainty in the causal associations (9) (Table 1).
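To illustrate one step of such a conditional analysis, the sketch below adjusts marginal z-scores for the lead SNP using only the LD correlations with that SNP. This is an approximation, not the method of (8): it assumes standardized genotypes, small effects, LD estimated from the study sample, and conditioning on a single SNP. A full stepwise procedure would repeat the step with a growing conditioning set, which requires inverting the LD matrix of the selected SNPs. All numbers are hypothetical.

```python
import math

def condition_on_lead(z, ld_with_lead, lead):
    """Approximate z-scores after conditioning on the lead SNP.

    z_j | lead is approximately (z_j - r_j * z_lead) / sqrt(1 - r_j^2),
    where r_j is the correlation between SNP j and the lead SNP.
    """
    out = []
    for j, (zj, r) in enumerate(zip(z, ld_with_lead)):
        if j == lead or abs(r) >= 0.999:
            out.append(0.0)  # collinear with the lead SNP: no separable signal
        else:
            out.append((zj - r * z[lead]) / math.sqrt(1.0 - r * r))
    return out

# Hypothetical marginal z-scores and LD (r) of each SNP with SNP 0.
z = [6.0, 5.5, 4.5, 0.5]
ld_with_snp0 = [1.0, 0.9, 0.1, 0.0]

lead = max(range(len(z)), key=lambda j: abs(z[j]))
z_cond = condition_on_lead(z, ld_with_snp0, lead)
print(lead, [round(v, 2) for v in z_cond])
```

In this toy example the second SNP loses almost all of its signal once the lead SNP is accounted for, whereas the third SNP, which is only weakly correlated with the lead, remains strongly associated and would be a candidate independent signal.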

Table 1.

A list of popular tools for multiple causal variant fine-mapping, including URLs, details on the strategy implemented to search the space of models, whether the tool can incorporate functional data into the analysis and whether using only summary data is supported

| Tool | URL | Search strategy | Is able to incorporate functional data | Requires only summary data |
|---|---|---|---|---|
| CAVIARBF | https://bitbucket.org/Wenan/caviarbf/ | Exhaustive | N | Y |
| PAINTOR | https://github.com/gkichaev/PAINTOR_V3.0 | Exhaustive or stochastic (MCMC) | Y | Y |
| FINEMAP | http://www.christianbenner.com/ | Stochastic (shotgun stochastic search) | N | Y |
| GUESSFM | https://github.com/chr1swallace/GUESSFM | Stochastic (GUESS) | N | N |
| JAM | https://github.com/pjnewcombe/R2BGLiMS | Exhaustive or stochastic (rj-MCMC) | N | Y |
| DAP | https://github.com/xqwen/dap | DAP algorithm | Y | N |
| DAP-G | https://github.com/xqwen/dap | DAP algorithm | Y | Y |
| SuSiE | https://github.com/stephenslab/susieR | Iterative Bayesian stepwise selection | N | Y |

For a locus containing p SNPs, there are a total of $2^p$ possible causal models to consider. Thus, whilst earlier methods such as CAVIAR (10) and its successor CAVIARBF (11) used exhaustive search to enumerate all $2^p$ possible causal configurations, other methods have built more complex but scalable alternative search strategies. GUESSFM (12) benefits from clustering SNPs into ‘tag sets’ to initially reduce the search space and uses the stochastic search algorithm GUESS that is specifically tailored to explore multimodal model spaces (13,14). In contrast to its competitors, which treat the response variable as Gaussian using linear regression, GUESSFM can also accommodate generalized linear models to support a wider range of distributions for the response variable; however, it currently requires individual genotypes, which may not be available due to privacy concerns.
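The scale of the model space is easy to underestimate. The short sketch below counts the number of causal configurations for p SNPs (2 to the power p when any subset may be causal), and, as an illustrative assumption, the much smaller count when at most a fixed number of SNPs are allowed to be causal. The cap and the function name are hypothetical and do not describe any particular tool's defaults.

```python
import math

def n_models(p, max_causal=None):
    """Number of causal configurations for p SNPs, optionally capping the model size."""
    k_max = p if max_causal is None else min(p, max_causal)
    # Sum over model sizes k of the number of ways to choose k causal SNPs.
    return sum(math.comb(p, k) for k in range(k_max + 1))

print(n_models(10))       # all subsets of 10 SNPs
print(n_models(1000, 3))  # at most 3 causal SNPs among 1000
print(len(str(n_models(1000))), "digits in the unrestricted count for 1000 SNPs")
```

Even with a cap of three causal variants, a 1000-SNP region yields over a hundred million models, which is why stochastic or deterministic search strategies are used in place of full enumeration.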

Using only summary statistics, FINEMAP (15) implements a shotgun stochastic search, evaluating many neighbouring models at each iteration to efficiently search the vast space of models, whilst the joint analysis of marginal summary statistics (JAM) algorithm (16) uses a formal reversible jump MCMC algorithm. When used to fine-map 84 prostate cancer susceptibility loci using summary data from 8 GWAS subcohorts (totalling 82 591 cases and 61 213 controls), JAM identified additional independent signals in 12 regions which would have likely been missed if using conventional single causal variant fine-mapping methods (17).

Developed for efficient integrative identification of multiple causal variants whilst incorporating genomic annotations, the deterministic approximation of posterior (DAP) algorithm (18) offers a compromise between slow exhaustive and fast but restrained stochastic search methods. The DAP-G software exploits the assumption that causal variants are typically sparse in a given locus, so that a subspace of plausible models is identified using a deterministic search strategy before being thoroughly explored to generate credible sets of putative causal variants for each independent signal (9).

Credible sets have proven to be an appealing way to probabilistically select which variants to prioritize for functional validation in single causal variant fine-mapping, and are now facilitating the identification of multiple causal variants in a region, with newer software versions of earlier methods now generating such credible sets directly (e.g. FINEMAP V1.3, V1.4). The ‘Sum of Single Effects’ (SuSiE) regression model and its associated model selection framework, iterative Bayesian stepwise selection (IBSS), offer a novel deterministic algorithm for computing approximate posterior distributions in multiple causal variant fine-mapping (19). Briefly, the overall effect vector is constructed as a sum of multiple single-effect vectors that each have one non-zero entry for a potential causal variant. Fitting these models directly yields credible sets of potential causal variants, which are constructed post hoc from the posterior in other approaches. SuSiE’s computations were faster than competing fine-mapping methods (4 times faster than DAP-G, 30 times faster than FINEMAP and 4000 times faster than CAVIAR, on average), and SuSiE credible sets were able to achieve higher power, smaller size and higher purity than DAP-G credible sets. Using full genotypes from 87 Yoruban individuals, SuSiE was used to fine-map SNPs that influence splicing (‘splice QTLs’) in 77 345 introns (see (20) for the quantification of alternative splicing). Whilst the vast majority of introns yielded a single 95% credible set, indicating that a single causal variant is responsible for splicing in the region, SuSiE also uncovered 156 additional signals (represented by additional credible sets at an intron) that would likely be missed in conventional single causal variant fine-mapping. Encouragingly, these signals were enriched in regulatory regions.
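The sum of single effects construction described above can be written compactly. The notation below is an assumption introduced here for illustration (y for the phenotype vector, X for the genotype matrix, L for the assumed maximum number of effects, and so on) and follows our reading of (19); readers should consult that reference for the exact specification.

```latex
\begin{aligned}
y &= X b + e, \qquad e \sim N_n(0, \sigma^2 I_n), \\
b &= \sum_{l=1}^{L} b_l, \qquad b_l = \gamma_l \, \beta_l, \\
\gamma_l &\sim \mathrm{Mult}(1, \pi), \qquad \beta_l \sim N(0, \sigma_{0l}^2).
\end{aligned}
```

Each $\gamma_l$ is a binary vector of length p with exactly one non-zero entry, so each $b_l$ captures one effect whose location is uncertain. Because the posterior for each $\gamma_l$ is a probability vector over the p SNPs, a credible set for effect l can be read off by the same sort-and-accumulate rule used in single causal variant fine-mapping.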

Fine-Mapping Utilizing Functional Annotations

The maturation of functional genomics assays has led to the growth of publicly available large-scale, multi-tissue functional datasets (e.g. ENCODE (21), NIH Roadmap Epigenomics Mapping Consortium (22), GTEx (23), BLUEPRINT (24)) which cover DNA methylation, chromatin modification, accessibility and 3D structure (reviewed in detail by (2)). As causal variants are typically enriched in various cell type and cell state-specific genomic features (25), these data should be useful for fine-mapping causal variants. The Bayesian fine-mapping framework offers an intuitive opportunity to incorporate external biological information through principled prior specifications (26), yet uncertainty remains about how best to incorporate this data into the analysis and how to balance the influence that both the external functional data and the association study data have on the resultant fine-mapping inferences.

One option is to use a two-step approach. PolyFun (27) estimates per-SNP heritabilities for groups of SNPs showing similar functional enrichment with the trait of interest. Prior probabilities of SNP causality are then specified as proportional to the average heritability for the SNP group. This approach decouples functional enrichment estimation and fine-mapping to provide researchers with prior probabilities which can then be used directly with the existing fine-mapping software of their choice (e.g. SuSiE and FINEMAP). Alternatively, Dadaev et al. (2018) (17) used estimates of SNP causality from JAM in a conditional quantile regression framework to incorporate functional annotations for fine-mapping prostate cancer susceptibility loci.
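The second step of such a two-step approach can be sketched as follows, under the single causal variant assumption: prior probabilities are set proportional to the per-SNP heritability of each SNP's functional group, and posterior probabilities are then proportional to prior times Bayes factor. The group labels, heritability values and Bayes factors are hypothetical, and the sketch does not reproduce how PolyFun estimates heritabilities.

```python
def priors_from_group_h2(groups, per_snp_h2):
    """Prior causal probabilities proportional to the per-SNP heritability of each SNP's group."""
    raw = [per_snp_h2[g] for g in groups]
    total = sum(raw)
    return [x / total for x in raw]

def posterior_probs(bfs, priors):
    """PP_i proportional to prior_i * BF_i, assuming exactly one causal variant."""
    weighted = [pi * bf for pi, bf in zip(priors, bfs)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical per-SNP heritability estimates for three functional groups.
per_snp_h2 = {"coding": 4e-6, "enhancer": 2e-6, "other": 5e-7}
groups = ["other", "enhancer", "other", "coding"]
bfs = [200.0, 200.0, 50.0, 20.0]  # hypothetical Bayes factors

flat = posterior_probs(bfs, [0.25] * 4)
informed = posterior_probs(bfs, priors_from_group_h2(groups, per_snp_h2))
print([round(x, 3) for x in flat])
print([round(x, 3) for x in informed])
```

With flat priors the first two SNPs are indistinguishable; with the functionally informed priors the enhancer SNP is favoured, which is the kind of tie-breaking between variants in high LD that functional data are expected to provide.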

Whilst computationally scalable to fine-map large numbers of genomic loci using vast amounts of functional data, the question of balancing information from the prior and GWAS data is most difficult in such two-step approaches.

Integrated functional fine-mapping approaches should be better able to balance the two. PAINTOR (26) allows multivariate binary functional annotation data to influence the fine-mapping by allowing the probability that a SNP is causal to vary, which is most useful for distinguishing multiple associated variants in high LD (Fig. 2). Alternatively, functional data can be used to inform causal SNP effect size priors. For example, Alenazi and colleagues (28) used a hierarchical form of the normal-gamma (NG) prior on causal SNP effect size to allow for differential shrinkage of effect sizes based on partially observed categorical functional genomic data (specifically, functional significance (FS) scores that measure deleterious effects of SNPs). Shrinkage is decreased for SNPs with high FS scores (more evidence of being deleterious) relative to SNPs with low FS scores (less evidence of being deleterious). (The effect size shrinkage for SNPs with unobserved FS scores falls somewhere between that for SNPs with high and SNPs with low FS scores.)

Figure 2.

The utility of incorporating functional annotation data for single causal variant fine-mapping. We simulated summary GWAS data for 8000 regions, each with a single causal variant (CV), and used PAINTOR to analyse sets of 100 regions varying the proportion of causal variants carrying a specific annotation, which is controlled to be present in 5% of all non-causal SNPs. As the proportion of causal variants with the annotation increases, (A) the mean posterior probability (PP) at the causal variant tends to increase, and (B) the size of the 95% credible set tends to decrease. The greatest gain in using relevant functional annotation data is for regions with medium-high LD, where the enrichment of the functional annotation allows a variant with the annotation to be picked from a set of variants showing similar levels of association. We distinguish between low, medium and high LD causal variants (Inline graphic), according to the number of other SNPs that they are in LD with (Inline graphic): 2 or fewer, 3–10 or more than 10, respectively.

Whilst it may seem conceptually attractive to integrate functional data with fine-mapping, there is no consensus about how this data is best incorporated into the fine-mapping procedure, and existing methods differ substantially in their methodology. The genomic region containing the CASP8 gene has been fine-mapped using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium at least six times (28–31) with little consensus reached on the likely causal variant(s) for breast cancer in the region. Only 1 of the top 20 SNPs selected by FINEMAP was also in the top 20 SNPs selected by Alenazi et al.’s NG method. Moreover, in univariate analyses of the region both with (30) and without (31) the incorporation of functional data, this SNP was not selected in the top 20 causal candidates. This example serves as a salutary reminder that fine-mapping results are method dependent and the current lack of a broad set of gold-standard true positives makes it difficult to evaluate the accuracy of different methods in the real world.

Multiancestry Meta-analysis for Fine-Mapping

Statistical fine-mapping requires large sample sizes to distinguish between associated variants in high LD. Because allele frequencies as well as LD patterns vary between populations which have been geographically separated, combining information across populations not only increases fine-mapping power through sample size but also increases resolution through exploiting these differences in LD.

The improvement from such multiancestry meta-analysis grows with the diversity of ancestries included (32). In particular, the lower LD found in African populations (who have more generations to their most recent common ancestors than European or Asian populations) aids this refinement. An early approach to multiancestry fine-mapping involved fixed-effects genome-wide association meta-analysis (GWAMA) (33) over all samples, assuming an equal allelic effect in all populations. The resulting Bayes factor at each SNP can then be transformed into PPs under the single causal variant assumption as above.
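As a minimal sketch of the fixed-effects step, the code below combines per-population effect estimates at one SNP by standard inverse-variance weighting and converts the pooled estimate into an approximate Bayes factor under a Gaussian prior. It is an illustration of the general technique rather than a description of the GWAMA software, and the estimates, standard errors and prior variance are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def log_abf(beta_hat, se, prior_var):
    """Log approximate Bayes factor assuming prior beta ~ N(0, prior_var)."""
    v = se ** 2
    r = prior_var / (prior_var + v)
    z = beta_hat / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)

# Hypothetical log odds ratios for one SNP in three ancestry groups.
betas = [0.10, 0.14, 0.08]
ses = [0.03, 0.05, 0.04]

pooled_beta, pooled_se = fixed_effects_meta(betas, ses)
print(round(pooled_beta, 4), round(pooled_se, 4), round(log_abf(pooled_beta, pooled_se, 0.04), 2))
```

Repeating this at every SNP in a region gives the per-SNP Bayes factors that are then normalized into posterior probabilities. The equal-effect assumption is what distinguishes this approach from methods that allow heterogeneity between populations.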

To allow both heterogeneity in allelic effects between distantly related populations and more homogeneous effects in populations that are more closely related, MANTRA (Meta-ANalysis of Transethnic Association studies) (34) employs a Bayesian partition model to cluster populations based on shared ancestry and population-specific allelic effects at each variant. MANTRA assumes a single causal variant and could be viewed as a hybrid meta-analysis with both fixed (within cluster) and random (between clusters) effects; the two extremes of a single cluster K = 1 and the same number of clusters as populations K=N coincide, respectively, with Bayesian implementations of fixed and random effects meta-analysis. As the marginal likelihood cannot be directly evaluated, an MCMC algorithm is used, yielding estimates of the Bayes factor for association at each SNP, as well as the approximate PP of heterogeneity in allelic effects between populations (i.e. proportion of MCMC outputs with K > 1). Simulation studies for a range of models of heterogeneity in allelic effects between diverse populations (i.e. a subset of populations have the same allelic effect and remainder are under the null model) show MANTRA improves resolution over fixed effects and give further evidence that diverse ancestries result in better localization of causal variants than single ancestries. The benefits of using MANTRA compared to a fixed-effects model are strongest when causal variants have opposing effects in different populations, though an improvement is also observed when a causal variant is not active in at least one of the populations.

An application of MANTRA to 10 type 2 diabetes susceptibility loci for a multiancestry meta-analysis of European, East Asian, South Asian and Mexican and Mexican American ancestry GWAS (26 488 cases and 83 964 controls) resulted in a reduction in the number of SNPs in 99% credible sets, compared to MANTRA for a meta-analysis of independent European ancestry GWAS only (12 171 cases and 56 862 controls) (35). In 9 out of 10 type 2 diabetes loci, multiancestry MANTRA resulted in a 99% credible set reduction of 1 (TCF7L2) to 17 (FTO) SNPs compared to the European ancestry MANTRA.

Both CAVIAR (36) and PAINTOR (37) have been extended to consider multiple populations with different LD structures, assuming the same causal variants between populations, but allowing for different effects at those variants. Evidence from studies that have examined effect sizes in different populations suggests that most variants are shared between populations, although exceptions do appear to exist (36,37). These approaches differ from MANTRA by allowing multiple variants at the expense of independently estimating allelic effects in each population, whereas MANTRA assumes more similar effects between more closely related populations and allows different effects between populations belonging to different clusters. There is therefore a trade-off between allowing for multiple causal variants at the expense of a more simplistic model and using more statistically sophisticated meta-analysis approaches at the expense of assuming a single causal variant per region. If causal effects prove to be similar between populations, the benefit of modelling multiple causal variants in these methods may, on average, outweigh the cost of their simplifying assumptions about effect size relationships. Nevertheless, many researchers favour the single causal variant methods in applied work, perhaps because concerns about between-population heterogeneity are not yet resolved.

Multiple-Outcome Fine-Mapping

Analysis of multiple outcomes increases power for discovery of associations and helps in understanding pleiotropy, in particular whether or not the same causal variants underlie associations with multiple outcomes in the same locus. There are several multi-trait methods for the identification of associations, applicable either to multiple phenotypes or tissue-specific gene expression measurements (e.g. (38)). Multivariate methods that simultaneously compare genetic variation among multiple traits have been shown to increase power to detect genetic associations (39,40). Likewise, joint fine-mapping of multiple outcomes has the potential to improve resolution over independent single-outcome fine-mapping.

There are very few methods for jointly fine-mapping multiple outcomes. Partly, this is due to computational intensiveness, as the number of joint models increases exponentially with the number of outcomes. One approach, fastPAINTOR (41), adopts a Bayesian framework and limits the model search space by imposing the restriction that each outcome has the same causal variants, allowing for different effect sizes. This approach uses GWAS summary statistics and requires that all quantitative traits are measured on each individual in the sample. The authors use simulations to demonstrate improved fine-mapping accuracy at pleiotropic risk loci, compared to single-trait methods and ‘Genetic analysis incorporating Pleiotropy and Annotation’ (GPA) (42). An applied example to four blood lipid phenotypes demonstrates the improvement in fine-mapping resolution when leveraging evidence across correlated traits using fastPAINTOR.

Multinomial fine-mapping (MFM) (43) is an efficient approach for fine-mapping multiple diseases in datasets that share controls. This approach is able to efficiently assess all possible joint models, without requiring shared causal variants, by expressing the approximate Bayes factor (ABF) of a joint model as a function of the individual disease ABFs, the model sizes and sample sizes for cases and controls. It makes use of single-disease fine-mapping results from a Bayesian stochastic search approach, GUESSFM (12), though could be adapted to results from other fine-mapping methods, provided that controls are either shared completely or not at all. Again, simulations demonstrate that MFM tends to improve precision, suggesting that MFM gives a cost-efficient alternative to gathering larger samples. In an analysis of multiple autoimmune diseases, the MFM result combining multiple diseases within a single population was often equivalent to that obtained from single-disease fine-mapping with multiple ancestries. Note that fastPAINTOR and MFM have been designed for specific situations: multiple quantitative traits measured on the same individuals and distinct groups of cases with multiple diseases compared to a single control group. This specialization reflects the computational challenges associated with multiple-outcome approaches.

Outlook and Future Challenges

Fine-mapping methods have now been developed that consider single or multiple causal variants, single or multiple traits, incorporating functional data and leveraging multiple ancestry data. They each have different advantages and challenges and rely on different assumptions reflecting assorted biological interpretations and the computational trade-offs required when working with high-dimensional GWAS data.

In contrast to methods for testing genetic associations, which take a mostly frequentist approach, fine-mapping methods are almost all Bayesian. This means applied geneticists must master both statistical domains to first identify and then fine-map associations. Perhaps the biggest challenge in applied fine-mapping is choosing between the available methods. Without a large set of known true positives and negatives, the consequences of the different assumptions made by each method cannot be compared against the same standard. Faced with such choices, other fields such as gene regulatory network inference (44) have adopted ensemble approaches, where multiple methods are applied to the same data and similar results across methods are interpreted as robust to methodological differences.

The cost of an ensemble approach in fine-mapping is likely to be longer lists of variants considered as potentially causal. Ultimately, statistical fine-mapping is only a tool to prioritize lists of potentially causal variants, and wet-lab protocols are required to pinpoint the true causal variant(s) from these lists. To that end, the lists need to be as small as possible whilst containing the true causal variant(s) with high probability. Recent developments in high-throughput wet-lab techniques such as massively parallel reporter assays (45) or CRISPR screens mean the functional effects of multiple variants can now be evaluated simultaneously. Thus, ensemble fine-mapping may be enabled by these higher-throughput assays.

An area we have not discussed is that most fine-mapping studies remain restricted to studying SNPs, although small and large structural variants are also known to have functional effects, such as the trinucleotide repeat in the gene HTT which causes Huntington’s disease (https://ghr.nlm.nih.gov/condition/huntington-disease; accessed Apr 29, 2020). Whilst some fine-mapping software can encompass structural variation (10,46), this is conditional on their availability within the GWAS dataset. Thus, improving their direct assay or imputation must be a priority, not only to enable identifying structural variation that is causal but also to exclude non-causal structural variation so that greater confidence can be placed in single nucleotides identified by fine-mapping.

It is encouraging that fine-mapping methods now encompass multiple causal variants as standard and that other extensions in fine-mapping are also beginning to be adopted in practice. It seems likely that the biggest gains in fine-mapping in the short term will be driven by wider adoption of these methods as well as by increasing sample sizes. Increasing sample sizes by considering multiple ancestries in GWAS is likely to be particularly fruitful as different national biobank studies such as UK Biobank (47), Biobank Japan (48) and FinnGen continue to mature. This will also push researchers towards methods that work directly with summary statistics, since fitting multiple multi-variant models to such large data with hundreds of thousands of individuals remains computationally challenging. A less discussed issue here is the importance of the matrices which estimate LD. These need to reflect the population the study has sampled and be estimated from an increasingly large sample as the samples under study grow (49).

By curating GWAS summary statistics, the GWAS catalog (https://www.ebi.ac.uk/gwas/) has enabled rapid lookup of associated variants across thousands of human traits. Analogous fine-mapping resources are also being created, such as CAUSALdb (http://mulinlab.org/causaldb) which contains fine-mapping results for over 2500 traits, generated through systematic fine-mapping under a single causal variant assumption using CAVIAR, FINEMAP and PAINTOR (50). As larger, multiple ancestry studies become available, which allow for multiple causal variants, we hope that such resources will accordingly serve increasingly accurate fine-mapping results. This in turn will enable the creation of more accurate polygenic risk scores, which can help to disentangle the causal pathways to disease, for example, through Mendelian randomization analysis. They will also assist considerably with the primary motivation of fine-mapping studies: efficient design of functional studies, leading to an understanding of the biological mechanisms which underlie disease risk and thus, ultimately, to interventions upon these mechanisms.

Availability of Code

See https://github.com/chr1swallace/fm-priors for code used to generate Fig. 1.

See https://github.com/annahutch/fm-functional for code used to generate Fig. 2.

Funding

Engineering and Physical Sciences Research Council (EP/R511870/1 to A.H.) and GlaxoSmithKline (GSK; to A.H.); Medical Research Council (MR/R021368/1 to J.A.); Wellcome Trust (WT107881 to C.W.); Medical Research Council (MC_UU_00002/4 to C.W.).

Conflict of Interest statement. A.H. is a GSK-sponsored iCASE student.

Contributor Information

Anna Hutchinson, MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge Institute of Public Health, Cambridge CB2 0SR, UK.

Jennifer Asimit, MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge Institute of Public Health, Cambridge CB2 0SR, UK.

Chris Wallace, MRC Biostatistics Unit, Cambridge Biomedical Campus, Cambridge Institute of Public Health, Cambridge CB2 0SR, UK; Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, CB2 0AW, UK; Department of Medicine, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 2QQ, UK.

References

  • 1. Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A. and Yang J. (2017) 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet., 101, 5–22.
  • 2. Spain S.L. and Barrett J.C. (2015) Strategies for fine-mapping complex traits. Hum. Mol. Genet., 24, R111–R119.
  • 3. Maller J.B., McVean G., Byrnes J., Vukcevic D., Palin K., Su Z., Howson J.M.M., Auton A., Myers S., Morris A. et al. (2012) Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet., 44, 1294–1301.
  • 4. Wakefield J. (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol., 33, 79–86.
  • 5. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.
  • 6. Hutchinson A., Watson H. and Wallace C. (2020) Improving the coverage of credible sets in Bayesian genetic fine-mapping. PLoS Comput. Biol., 16, e1007829.
  • 7. Walters K., Cox A. and Yaacob H. (2019) Using GWAS top hits to inform priors in Bayesian fine-mapping association studies. Genet. Epidemiol., 43, 675–689.
  • 8. Yang J., Ferreira T., Morris A.P., Medland S.E., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden P.A.F., Heath A.C., Martin N.G., Montgomery G.W. et al. (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet., 44, 369–375, S1-3.
  • 9. Lee Y., Luca F., Pique-Regi R. and Wen X. (2018) Bayesian multi-SNP genetic association analysis: control of FDR and use of summary statistics. bioRxiv, 316471.
  • 10. Hormozdiari F., Kostem E., Kang E.Y., Pasaniuc B. and Eskin E. (2014) Identifying causal variants at loci with multiple signals of association. Genetics, 198, 497–508.
  • 11. Chen W., Larrabee B.R., Ovsyannikova I.G., Kennedy R.B., Haralambieva I.H., Poland G.A. and Schaid D.J. (2015) Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics, 200, 719–736.
  • 12. Wallace C., Cutler A.J., Pontikos N., Pekalski M.L., Burren O.S., Cooper J.D., García A.R., Ferreira R.C., Guo H., Walker N.M. et al. (2015) Dissection of a complex disease susceptibility region using a Bayesian stochastic search approach to fine mapping. PLoS Genet., 11, e1005272.
  • 13. Bottolo L., Chadeau-Hyam M., Hastie D.I., Zeller T., Liquet B., Newcombe P., Yengo L., Wild P.S., Schillert A., Ziegler A. et al. (2013) GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm. PLoS Genet., 9, e1003657.
  • 14. Bottolo L. and Richardson S. (2010) Evolutionary stochastic search for Bayesian model exploration. Bayesian Anal., 5, 583–618.
  • 15. Benner C., Spencer C.C.A., Havulinna A.S., Salomaa V., Ripatti S. and Pirinen M. (2016) FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinforma. Oxf. Engl., 32, 1493–1501.
  • 16. Newcombe P.J., Conti D.V. and Richardson S. (2016) JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol., 40, 188–201.
  • 17. Dadaev T., Saunders E.J., Newcombe P.J., Anokian E., Leongamornlert D.A., Brook M.N., Cieza-Borrella C., Mijuskovic M., Wakerell S., Al Olama A.A. et al. (2018) Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants. Nat. Commun., 9, 1–19.
  • 18. Wen X., Lee Y., Luca F. and Pique-Regi R. (2016) Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am. J. Hum. Genet., 98, 1114–1129.
  • 19. Wang G., Sarkar A., Carbonetto P. and Stephens M. (2020) A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. Ser. B Stat. Methodol., doi: 10.1111/rssb.12388.
  • 20. Li Y.I., van de Geijn B., Raj A., Knowles D.A., Petti A.A., Golan D., Gilad Y. and Pritchard J.K. (2016) RNA splicing is a primary link between genetic variation and disease. Science, 352, 600–604.
  • 21. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
  • 22. Bernstein B.E., Stamatoyannopoulos J.A., Costello J.F., Ren B., Milosavljevic A., Meissner A., Kellis M., Marra M.A., Beaudet A.L., Ecker J.R. et al. (2010) The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol., 28, 1045–1048.
  • 23. GTEx Consortium (2013) The genotype-tissue expression (GTEx) project. Nat. Genet., 45, 580–585.
  • 24. Stunnenberg H.G., International Human Epigenome Consortium and Hirst M. (2016) The international human epigenome consortium: a blueprint for scientific collaboration and discovery. Cell, 167, 1145–1149.
  • 25. Iotchkova V., Ritchie G.R.S., Geihs M., Morganella S., Min J.L., Walter K., Timpson N.J., UK10K Consortium, Dunham I., Birney E. and Soranzo N. (2019) GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat. Genet., 51, 343–353.
  • 26. Kichaev G., Yang W.-Y., Lindstrom S., Hormozdiari F., Eskin E., Price A.L., Kraft P. and Pasaniuc B. (2014) Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet., 10, e1004722.
  • 27. Weissbrod O., Hormozdiari F., Benner C., Cui R., Ulirsch J.C., Gazal S., Schoech A.P., van de Geijn B., Reshef Y., Márquez-Luna C. et al. (2019) Functionally-informed fine-mapping and polygenic localization of complex trait heritability. bioRxiv, 807792.
  • 28. Alenazi A.A., Cox A., Juárez M., Lin W.-Y. and Walters K. (2019) Bayesian variable selection using partially observed categorical prior information in fine-mapping association studies. Genet. Epidemiol., 43, 690–703.
  • 29. Fachal L., Aschard H., Beesley J., Barnes D.R., Allen J., Kar S., Pooley K.A., Dennis J., Michailidou K., Turman C. et al. (2020) Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet., 52, 56–73.
  • 30. Spencer A.V., Cox A., Lin W., Easton D.F., Michailidou K. and Walters K. (2016) Incorporating functional genomic information in genetic association studies using an empirical Bayes approach. Genet. Epidemiol., 40, 176–187.
  • 31. Spencer A.V., Cox A., Lin W., Easton D.F., Michailidou K. and Walters K. (2015) Novel Bayes factors that capture expert uncertainty in prior density specification in genetic association studies. Genet. Epidemiol., 39, 239–248.
  • 32. Asimit J.L., Hatzikotoulas K., McCarthy M., Morris A.P. and Zeggini E. (2016) Trans-ethnic study design approaches for fine-mapping. Eur. J. Hum. Genet. EJHG, 24, 1330–1336.
  • 33. Mägi R. and Morris A.P. (2010) GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics, 11, 288.
  • 34. Morris A.P. (2011) Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol., 35, 809–822.
  • 35. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Asian Genetic Epidemiology Network Type 2 Diabetes (AGEN-T2D) Consortium, South Asian Type 2 Diabetes (SAT2D) Consortium, Mexican American Type 2 Diabetes (MAT2D) Consortium, Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) Consortium, Mahajan A., Go M.J., Zhang W., Below J.E., Gaulton K.J. et al. (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet., 46, 234–244.
  • 36. LaPierre N., Taraszka K., Huang H., He R., Hormozdiari F. and Eskin E. (2020) Identifying causal variants by fine mapping across multiple studies. bioRxiv, 10.1101/2020.01.15.908517.
  • 37. Kichaev G. and Pasaniuc B. (2015) Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet., 97, 260–271.
  • 38. Flutre T., Wen X., Pritchard J. and Stephens M. (2013) A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet., 9, e1003486.
  • 39. Galesloot T.E., van Steen K., Kiemeney L.A.L.M., Janss L.L. and Vermeulen S.H. (2014) A comparison of multivariate genome-wide association methods. PLoS One, 9, e95923.
  • 40. Turchin M.C. and Stephens M. (2019) Bayesian multivariate reanalysis of large genetic studies identifies many new associations. PLoS Genet., 15, e1008431.
  • 41. Kichaev G., Roytman M., Johnson R., Eskin E., Lindström S., Kraft P. and Pasaniuc B. (2017) Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinforma. Oxf. Engl., 33, 248–255.
  • 42. Chung D., Yang C., Li C., Gelernter J. and Zhao H. (2014) GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet., 10, e1004787.
  • 43. Asimit J.L., Rainbow D.B., Fortune M.D., Grinberg N.F., Wicker L.S. and Wallace C. (2019) Stochastic search and joint fine-mapping increases accuracy and identifies previously unreported associations in immune-mediated diseases. Nat. Commun., 10, 3216.
  • 44. Hill S.M., Heiser L.M., Cokelaer T., Unger M., Nesser N.K., Carlin D.E., Zhang Y., Sokolov A., Paull E.O., Wong C.K. et al. (2016) Inferring causal molecular networks: empirical assessment through a community-based effort. Nat. Methods, 13, 310–318.
  • 45. Bourges C., Groff A.F., Burren O.S., Gerhardinger C., Mattioli K., Hutchinson A., Hu T., Anand T., Epping M.W., Wallace C. et al. (2020) Resolving mechanisms of immune-mediated disease in primary CD4 T cells. EMBO Mol. Med., 12, e12112.
  • 46. Chiang C., Scott A.J., Davis J.R., Tsang E.K., Li X., Kim Y., Hadzic T., Damani F.N., Ganel L., GTEx Consortium et al. (2017) The impact of structural variation on human gene expression. Nat. Genet., 49, 692–699.
  • 47. Sudlow C., Gallacher J., Allen N., Beral V., Burton P., Danesh J., Downey P., Elliott P., Green J., Landray M. et al. (2015) UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med., 12, e1001779.
  • 48. Nagai A., Hirata M., Kamatani Y., Muto K., Matsuda K., Kiyohara Y., Ninomiya T., Tamakoshi A., Yamagata Z., Mushiroda T. et al. (2017) Overview of the BioBank Japan project: study design and profile. J. Epidemiol., 27, S2–S8.
  • 49. Benner C., Havulinna A.S., Järvelin M.-R., Salomaa V., Ripatti S. and Pirinen M. (2017) Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet., 101, 539–551.
  • 50. Wang J., Huang D., Zhou Y., Yao H., Liu H., Zhai S., Wu C., Zheng Z., Zhao K., Wang Z. et al. (2020) CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies. Nucleic Acids Res., 48, D807–D816.

Articles from Human Molecular Genetics are provided here courtesy of Oxford University Press
