Author manuscript; available in PMC: 2020 Dec 22.
Published in final edited form as: Nat Genet. 2020 Jun;52(6):550–552. doi: 10.1038/s41588-020-0622-5

Exploring and visualizing large-scale genetic associations using PheWeb

Sarah A Gagliano Taliun 1,2,#, Peter VandeHaar 1,2,#, Andrew P Boughton 1,2, Ryan P Welch 1,2, Daniel Taliun 1,2, Ellen M Schmidt 1,2, Wei Zhou 1,3, Jonas B Nielsen 4, Cristen J Willer 1,3,4, Seunggeun Lee 1,2, Lars G Fritsche 1,2, Michael Boehnke 1,2, Gonçalo R Abecasis 1,2,5
PMCID: PMC7754083  NIHMSID: NIHMS1639815  PMID: 32504056

To the Editor – Advances in genotyping and sequencing technologies, the growing availability of electronic health records for research use, and the emergence of population-scale cohorts are enabling large studies to collect copious amounts of both phenotype and genotype data. Studies can collect thousands of traits measured across hundreds of thousands of individuals, each assessed at millions of genetic variants. These resources enable genome- and phenome-wide association studies (GWAS, PheWAS) at increasing scales, and generate high-dimensional results that can provide insights into many aspects of human genetics and biology. However, navigating these association results can be challenging and cumbersome. To aid in generating and testing hypotheses about the mechanisms underlying complex traits, these results should be organized in an intuitive, easy-to-navigate manner. The current standards in the field are to use Manhattan [1] and LocusZoom [2] plots to review single-trait results and to use PheWAS [3,4] plots to summarize results across many traits. The ability for investigators to explore their own data, alternating between these two types of views, is an increasingly common feature of large-scale association analyses. To this end, we developed PheWeb, an easy-to-use, open-source, web-based tool for visualizing, navigating, and sharing GWAS and PheWAS results.

We have used PheWeb to explore association results for large datasets such as the UK Biobank [5] (http://pheweb.sph.umich.edu/SAIGE-UKB) and the Michigan Genomics Initiative (MGI, http://pheweb.sph.umich.edu/MGI-freeze2). The PheWeb instance populated with the UK Biobank summary statistics displays 28 million genetic markers assessed across 1,403 binary traits for 408,961 White British participants [6] (Supplementary Note). Others have used PheWeb to explore large sets of association results, such as the Oxford Brain Imaging Genetics Project (http://big.stats.ox.ac.uk) and the new computationally efficient association tool fastGWA (http://fastgwa.info/ukbimp).

PheWeb provides automated data processing and an interactive web interface for exploratory analysis. The data processing pipeline loads and harmonizes association summary statistics (recipes are provided for the output of many common tools; see Supplementary Note), organizes relationships between traits based on pairwise genetic correlations, and annotates the variants. The web interface provides intuitive visualizations at three levels of granularity: a genome-wide summary at the trait level, and regional (LocusZoom [2]) and phenome-wide summaries at the variant level (Supplementary Note, Supplementary Fig. 1). PheWeb links to relevant public databases (e.g. NHGRI-EBI GWAS Catalog [7] and ClinVar [8]) to provide further information on a particular variant. Association results can be queried by trait, variant, or gene. To facilitate collaboration, PheWeb visualizations can be shared through their URLs, and we are exploring the opportunity to enable collaborative annotation on each results page.
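The harmonization step described above can be pictured with a minimal sketch. It is not PheWeb's actual loader: the alias table, the canonical column names, and the `harmonize` function are all hypothetical, chosen only to show how differently named columns from different association tools might be mapped onto one schema.

```python
import csv
import io

# Hypothetical alias table (an assumption for illustration, not PheWeb's
# configuration): each canonical column lists header spellings it accepts.
COLUMN_ALIASES = {
    "chrom": {"chrom", "chr", "#chrom", "chromosome"},
    "pos": {"pos", "bp", "position"},
    "ref": {"ref", "reference"},
    "alt": {"alt", "alternate"},
    "pval": {"pval", "p", "p.value", "p_value"},
}


def harmonize(text, delimiter="\t"):
    """Parse delimited summary statistics into rows with canonical keys."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)

    # Map each input header onto its canonical name, ignoring case.
    mapping = {}
    for header in reader.fieldnames or []:
        key = header.strip().lower()
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases:
                mapping[header] = canonical

    missing = set(COLUMN_ALIASES) - set(mapping.values())
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    rows = []
    for record in reader:
        row = {mapping[h]: v for h, v in record.items() if h in mapping}
        # Normalize "chr5" and "5" to the same chromosome label.
        if row["chrom"].lower().startswith("chr"):
            row["chrom"] = row["chrom"][3:]
        row["pos"] = int(row["pos"])
        row["pval"] = float(row["pval"])
        rows.append(row)
    return rows


# Toy input with made-up position and alleles, for illustration only.
toy = "CHR\tBP\tREF\tALT\tP.value\nchr5\t12345\tA\tG\t9.9e-11\n"
rows = harmonize(toy)
```

Keeping the tool-specific knowledge in a single alias table means that supporting another tool's output amounts to adding header spellings rather than writing a new parser.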

PheWeb can help make meaningful discoveries. To demonstrate this potential, Figure 1 and Supplementary Figures 2–3 show different views of genetic association signals and key variants for bladder cancer in the UK Biobank association results in PheWeb. In the Manhattan plot (Fig. 1a), the strongest association on chromosome 5 is at rs4975616 (p-value=9.9×10⁻¹¹), which is located near the CLPTM1L gene. The regional view (Supplementary Figure 4) highlights that rs4975616 and several of its proxies are in the GWAS Catalog and associated with various cancers (e.g. lung cancer, pancreatic cancer, basal cell carcinoma), suggesting a broad role for the locus in cancer susceptibility. The PheWAS view (Supplementary Figure 5) further supports the association of the locus with a variety of cancers, including cancers of the skin and lung, and links to multiple PubMed entries supporting the potential role of rs4975616 in lung and other cancers [9]. Interestingly, for the other top loci (both on chromosome 8), the regional and PheWAS views (rs2976384 in JRK/PSCA: Fig. 1b,c; rs10094872 near MYC: Supplementary Figures 6–7) convey a distinct message: these loci are not associated with skin and lung cancers, but instead with gastric and urinary traits such as duodenal ulcer, urinary tract infection, and pancreatic cancer. The quantile-quantile plot (Supplementary Fig. 2) shows that the significant associations for bladder cancer are driven by common (MAF>2%) variants.
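To make the quantile-quantile observation concrete, the following sketch computes the points of a QQ plot separately for common and rare variants. It is an illustration under stated assumptions, not PheWeb's plotting code: the function names are hypothetical, the single 2% cutoff is borrowed from the MAF>2% figure quoted above (PheWeb's own binning may differ), and every p-value is assumed to be strictly greater than zero.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Under the null hypothesis p-values are uniform on (0, 1), so the
    k-th smallest of n p-values is expected near (k - 0.5) / n.
    """
    observed = sorted(pvalues)
    n = len(observed)
    return [
        (-math.log10((rank - 0.5) / n), -math.log10(p))
        for rank, p in enumerate(observed, start=1)
    ]


def qq_by_maf(variants, maf_cutoff=0.02):
    """Split (maf, pvalue) pairs at the cutoff and build one QQ curve each."""
    common = [p for maf, p in variants if maf > maf_cutoff]
    rare = [p for maf, p in variants if maf <= maf_cutoff]
    return {"common": qq_points(common), "rare": qq_points(rare)}


# Toy (MAF, p-value) pairs, made up for illustration.
curves = qq_by_maf([(0.30, 0.25), (0.10, 0.75), (0.01, 0.5)])
```

Points that rise above the diagonal (observed greater than expected) in only the common-variant curve would correspond to the pattern described in the text, where the significant signal comes from common variants.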

Figure 1.

Interactive views of genetic associations in the UK Biobank instance of PheWeb. (a) Manhattan plot view for the GWAS on bladder cancer, based on 2,427 cases and 404,796 controls in the White British subset and run using SAIGE (a generalized mixed model association test that accounts for case-control imbalance) with sex, birth year and four principal components as covariates. (b) Regional view (LocusZoom) for variant rs2976384 (purple diamond) with variants present in the NHGRI-EBI GWAS Catalog displayed. (c) Phenome-wide view (PheWAS) for variant rs2976384 showing phenome-wide significant associations with traits from the UK Biobank related to the urinary system or lower digestive tract. Traits are sorted and colored according to a meaningful set of biological categories (e.g. infectious diseases, neurological traits, metabolic traits). The direction of effect of the alternate allele on each trait is indicated by an upward-facing (positive effect) or downward-facing (negative effect) triangle.
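The encoding used in panel (c) can be sketched in a few lines. This is an illustration only: the record fields and the `phewas_markers` function are hypothetical, and the phenome-wide significance threshold is assumed here to be a Bonferroni correction (0.05 divided by the number of traits tested), which the text does not specify.

```python
def phewas_markers(results, alpha=0.05):
    """Assign a triangle direction and a significance flag to each trait.

    `results` is a list of dicts with 'trait', 'beta' and 'pval' keys
    (hypothetical field names). The threshold is an assumed Bonferroni
    correction across the traits supplied.
    """
    threshold = alpha / len(results)
    markers = []
    for r in results:
        markers.append({
            "trait": r["trait"],
            # Positive effect of the alternate allele points up, negative
            # points down; a zero effect is lumped with "down" for brevity.
            "shape": "triangle-up" if r["beta"] > 0 else "triangle-down",
            "significant": r["pval"] < threshold,
        })
    return markers


# Toy per-trait results for one variant; the numbers are made up.
toy_results = [
    {"trait": "trait A", "beta": 0.3, "pval": 0.001},
    {"trait": "trait B", "beta": -0.2, "pval": 0.04},
]
markers = phewas_markers(toy_results)
```

With two traits the assumed threshold is 0.025, so only the first toy trait is flagged. Applied to the 1,403 traits of the UK Biobank instance, the same assumption would give a threshold of roughly 3.6×10⁻⁵.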

To help make connections among traits, PheWeb optionally displays pairwise genetic correlations across traits (Supplementary Fig. 3). For example, bladder cancer shows the strongest genetic correlation with cancer of urinary organs (r=0.84, p-value=7.6×10⁻⁷ from cross-trait linkage disequilibrium (LD) score regression [10]) and a weaker correlation with tobacco use disorder (r=0.39, p-value=0.006), an observation that is consistent with the role of smoking as a major risk factor for bladder cancer [11].

In our view, describing the spectrum of traits at each locus helps identify loci that influence disease through similar mechanisms and helps expose connections between traits, whether expected or not. Further research, including fine-mapping and co-localization approaches as well as refinement of phenotype definitions for the traits, is needed to make conclusive statements on the role of these or any loci. To this end, PheWeb is designed to allow gradual updating of result sets as new analyses are completed and refined.

While we believe that making data explorations and large sets of results broadly accessible and intuitive is extremely valuable, these do not obviate the need for further analysis, experimentation and biological follow-up. PheWeb is only as useful as the data and results behind it, but we expect these results will be much more useful when they are accessible. When interpreting the visualizations, users must consider existing biases in the underlying GWAS data, such as, but not limited to, analyses restricted to individuals of particular ancestries and sub-optimal phenotype definitions.

We welcome user feedback and feature requests through the PheWeb GitHub repository (https://github.com/statgen/pheweb), which helps us enhance and tailor PheWeb to meet the needs of the research community. This repository includes a walk-through demonstration and easy-to-follow instructions for creating a PheWeb for one’s own data. The PheWeb code-base is not exclusive to displaying variant-trait associations, but can be used to display other types of genome-wide data, such as variant associations with gene expression (expression quantitative trait loci).

In summary, large-scale population cohorts can now include thousands of traits assessed across hundreds of thousands of individuals with genetic data. PheWeb supports organizing and sharing genetic association results, including one’s own data, at scale, facilitating hypothesis generation and helping investigators make important discoveries in human genetics.

Supplementary Material

Supplementary Information

Acknowledgements

The UK Biobank genetic association summary statistics were generated previously using the UK Biobank Resource through project ID number 24460. This research was supported by NIH grants HG009976 (M.B.) and HG007022 (G.R.A.).

Footnotes

Competing Interests

G.R.A. is an employee of Regeneron Pharmaceuticals; he owns stock and stock options for Regeneron Pharmaceuticals. The spouse of C.J.W. is employed at Regeneron Pharmaceuticals.

Data availability

PheWeb fully embodies the philosophy of data and results sharing to advance scientific discoveries. PheWeb allows users to download full sets of summary statistics or subsets corresponding to the strongest association results. The summary statistics that populate the UK Biobank instance of PheWeb can be downloaded at ftp://share.sph.umich.edu/UKBB_SAIGE_HRC/.

Code availability

PheWeb’s code-base is open-source and hosted on GitHub at https://github.com/statgen/pheweb. Our pipeline for computing pairwise genetic correlations using cross-trait LD Score regression [10] or SumHer [12] is scalable to large datasets, and is available at https://github.com/statgen/pheweb-rg-pipeline. Please see the Life Sciences Reporting Summary.
