GigaByte. 2023 Sep 1;2023:gigabyte89. doi: 10.46471/gigabyte.89

Trumpet plots: visualizing the relationship between allele frequency and effect size in genetic association studies

Lucia Corte 1,2, Lathan Liou 1, Paul F O’Reilly 1, Judit García-González 1,*
PMCID: PMC10498096  PMID: 37711278

Abstract

Recent advances in genome-wide association and sequencing studies have shown that the genetic architecture of complex traits and diseases involves a combination of rare and common genetic variants distributed throughout the genome. One way to better understand this architecture is to visualize genetic associations across a wide range of allele frequencies. However, there is currently no standardized or consistent graphical representation for effectively illustrating these results.

Here we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum. The proposed plots have a distinctive trumpet shape, with the majority of variants having high frequency and small effects, and a small number of variants having lower frequency and larger effects. To demonstrate the utility of trumpet plots in illustrating the relationship between the number of variants, their frequency, and the magnitude of their effects in shaping the genetic architecture of complex traits and diseases, we generated trumpet plots for more than one hundred traits in the UK Biobank. To facilitate their broader use, we developed an R package, ‘TrumpetPlots’ (available at the Comprehensive R Archive Network), and an R Shiny application, ‘Shiny Trumpets’ (available at https://juditgg.shinyapps.io/shinytrumpets/), that allow users to explore these results and submit their own data.

Statement of Need

Visualizations are powerful tools that have helped the field of genetics to better understand and communicate complex findings. By using visual aids like Manhattan plots and volcano plots, researchers can more easily pinpoint genetic variants identified through genome-wide association studies (GWAS). With the advancement of GWAS and sequencing studies, a mounting number of significant genetic variants – both common and rare – are being discovered. Combining these findings into single visualizations makes it possible to observe the relationship between effect size and allele frequency, providing a clearer picture of the genetic architecture of different traits and diseases. However, there is currently no consistent method for illustrating such results. In this paper, we propose a standardized approach for visualizing the effect size of risk variants across the allele frequency spectrum, generate plots for over a hundred traits in the UK Biobank, and provide to the field an R package and R Shiny application to allow users to explore their own results.

Background

Results visualization is an essential tool for interpreting complex data. By using visual representations such as graphs, charts, and plots, researchers can quickly identify patterns, trends, and outliers that may not be detected in tables of raw data. Visualizations help researchers gain a more intuitive and comprehensive understanding of the data, especially when dealing with large and complex data sets. Furthermore, visualizing results can facilitate communication of research findings to a broader audience, including non-experts [1]. Visual representations are often more accessible and engaging, making it easier for others to understand and appreciate the significance of the research conducted.

In the field of genetics, the use of visualizations has revolutionized the interpretation and communication of research findings. Over the past two decades, visualizations such as Manhattan plots [2], which display the results of GWAS; software packages like Haploview [3], which analyze and visualize linkage disequilibrium (LD) patterns of GWAS-associated loci; and volcano plots, which assess patterns of differential gene expression [4], have all played crucial roles in illustrating and sharing the key summaries of data that have advanced the field. These and other visualizations [5] have allowed the genetics field to more easily identify candidate causal variants, relevant genes, and potential outliers that may not be apparent in tables of GWAS summary statistics or differential expression results.

More recent advances in GWAS and sequencing studies [6–8] have resulted in the identification of an increasing number of significant genetic variants, including both common [8] and rare variants [5, 6]. Researchers are now starting to combine these findings into single visualizations to observe the relationship between effect size and allele frequency across the full range of significantly associated variants. Given that the number of risk-conferring variants, their frequency in a population, and their effect size can vary across different traits and diseases, using these plots can provide a better and instantaneous understanding of their relative genetic architecture. Recent studies on height [9], schizophrenia [10] and coronary artery disease [11, 12] have already included this full range as the main figure, highlighting the utility of this type of visualization. However, a formal and consistent method for illustrating these results has not yet emerged.

The aim of this work is to introduce an R package and R Shiny application to illustrate the distribution of risk variants across a wide range of allele frequencies. We term the resulting plots ‘trumpet plots’, due to their trumpet-like shape. To demonstrate their utility, we generated trumpet plots for over one hundred continuous traits available in the UK Biobank [13], illustrating the distribution of risk variants across an effect allele frequency range between 0.00001 and 1. These plots are available at https://juditgg.shinyapps.io/shinytrumpets/, and we illustrate a single trumpet plot combining the results of all of these in Figure 1.

Figure 1.

Distribution of allele frequencies and effect sizes for genetic associations across 129 continuous traits from the UK Biobank. Power curves for statistical power of 0.5 (blue), 0.7 (purple) and 0.9 (pink) were constructed for the median genome-wide association study sample size (N = 351,550). The colours of the dots represent the −log10 of the association P-value. The size of each dot is proportional to the effect size (represented on the y-axis). Information for each individual variant (variant name, effect size, gene name (if applicable), P-value, and trait name) is displayed when hovering a mouse cursor over each dot. Due to file size limits, the interactive version does not display variants with effect size lower than 0.05. Please see the supporting data [26] for a static version with all variants across the 129 traits.

We propose that trumpet plots are valuable representations of genetic associations across the full allele frequency spectrum that can help researchers to better understand the genetic architecture of traits and diseases, and potentially aid in study design and the prioritisation of investments to discover new variants that contribute to disease.

Methods

In the following sections, we will explain the various decisions we made when creating trumpet plots. These decisions include selecting the appropriate scale to represent allele frequencies, deciding whether to use the full GWAS summary statistics or independent GWAS variants, addressing issues related to the reporting of rare-variant association tests, determining whether to include power curves, and considering the effect size sign of the variants included. By carefully considering each of these factors, we aimed to create informative and visually appealing trumpet plots that illustrate the effect size of genetic associations across a wide range of allele frequencies.

Using the logarithmic vs linear scale to represent allele frequencies

In the representation of allele frequencies, the range of values can vary greatly between the smallest and largest frequency. When these associations are plotted on a linear scale, rare variants can be obscured or difficult to distinguish. To address this issue, we recommend using a logarithmic (log) scale – we use log base 10 – for allele frequencies in trumpet plots. Compared to a linear scale, the log scale uses increments that represent a relative increase or decrease, rather than a fixed-value increase or decrease. The log scale compresses the allele frequencies that are most common, which results in a more even distribution of values across the scale. This scale of visualization facilitates the identification of important patterns and trends.
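The effect of the log scale can be illustrated with a minimal Python sketch. The function name `log10_position` and the example frequencies are illustrative assumptions, not part of the TrumpetPlots package.

```python
import math

def log10_position(freq):
    """Return the x-axis position of an allele frequency on a log10 scale."""
    if not 0 < freq <= 1:
        raise ValueError("allele frequency must be in (0, 1]")
    return math.log10(freq)

# Example frequencies spanning rare to common variants (illustrative values).
freqs = [0.00001, 0.0001, 0.001, 0.01, 0.1, 0.5]
positions = [log10_position(f) for f in freqs]

# Distance between neighbouring frequencies on the log10 axis.
# Each tenfold step in frequency occupies the same distance, whereas on a
# linear axis the four rarest values would all fall within the first 1% of
# the axis and be visually indistinguishable.
gaps = [round(b - a, 6) for a, b in zip(positions, positions[1:])]
print(gaps)
```

On the linear scale the step from 0.00001 to 0.0001 covers 0.009% of the axis; on the log scale it covers the same distance as the step from 0.01 to 0.1.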

Identification of independent significant variants to enhance the interpretation of trumpet plots

Genetic association studies involve testing up to millions of genetic variants for their association with a particular trait or disease. However, many of these variants are correlated with each other due to their physical proximity on the genome, which is known as LD. This means that many variants nearby to causal variants often show significant associations with the trait under study, due only to their correlation rather than to any biological involvement with the trait.

Two methods that can be used to identify independent significant variants in a GWAS are clumping and conditional analysis [14, 15]. Clumping involves selecting a subset of independent significant variants by choosing a lead variant for each LD cluster and then discarding all other variants in that cluster. The lead variant is typically the one with the strongest association with the trait of interest. Conditional analysis, on the other hand, involves identifying independent significant variants after performing a joint analysis of multiple variants together. In this approach, the effect of one variant is conditioned on the effect of other variants; that is, the association between the trait of interest and one variant is evaluated after accounting for the effect of other variants. This can either be performed as a joint analysis (e.g., a regression), with multiple variants in one model, or else as an iterative process, where the lead variant and each other variant in the region are tested jointly, one by one. By considering the effects of multiple variants jointly in single models, conditional analysis helps to identify independent signals more accurately than clumping – which relies on only correlations between variants to indirectly infer independent signals.
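The clumping procedure described above can be sketched as a greedy selection. This is a simplified illustration under stated assumptions: the function name `clump`, the input layout, and the default thresholds are hypothetical, and the physical-distance window that clumping tools typically also apply is omitted. It is not the procedure used for the analyses in this paper, which relied on conditional analysis.

```python
def clump(pvalues, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy LD clumping (simplified sketch).

    pvalues: dict mapping variant ID -> association P-value.
    r2: dict mapping frozenset({a, b}) -> squared correlation between
        variants a and b; missing pairs are treated as uncorrelated.
    Returns the lead variants, ordered from strongest to weakest association.
    """
    # Keep only significant variants, strongest association first.
    candidates = sorted(
        (v for v, p in pvalues.items() if p < p_threshold),
        key=lambda v: pvalues[v],
    )
    leads = []
    removed = set()
    for variant in candidates:
        if variant in removed:
            continue  # already assigned to the cluster of a stronger lead
        leads.append(variant)
        # Discard every other candidate in LD with this lead variant.
        for other in candidates:
            if other != variant and r2.get(frozenset((variant, other)), 0.0) >= r2_threshold:
                removed.add(other)
    return leads

# Toy example: rs2 is in strong LD with rs1; rs3 is nearly independent.
example_p = {"rs1": 1e-20, "rs2": 1e-12, "rs3": 1e-9, "rs4": 0.01}
example_r2 = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs2", "rs3")): 0.05}
lead_variants = clump(example_p, example_r2)
```

Because the selection relies only on pairwise correlations, a secondary signal that is moderately correlated with a lead variant would be discarded here, which is the limitation that conditional analysis addresses.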

Since the correlation between variants can make it challenging to interpret trumpet plots, we recommend plotting only independent significant variants: variants that represent distinct genetic associations with the trait of interest.

Variant-level associations for low allele frequencies

While GWAS are a valuable tool for detecting common genetic variants associated with complex traits or diseases, they have limited power to identify associations with rare variants. To address this limitation, sequencing studies [16–18], such as whole-exome sequencing and whole-genome sequencing, have been used to detect variant-level associations with low allele frequencies.

To further enhance the statistical power of rare-variant associations, a commonly used strategy is to aggregate the rare variants detected into functional genetic units, such as genes, and perform collective variation analysis (e.g., gene burden tests [19–21]). However, we find that the reporting of rare-variant association analyses varies across studies. Some studies report results at the variant level, while others report only results of the functional unit in which rare variants were aggregated. This makes comparing results across independent studies, and determining the functional significance of specific variants, challenging.

We therefore encourage the reporting of results at the variant level, so that they can be included in visualizations of allele frequency in relation to effect size, such as trumpet plots, aiding study comparisons and variant interpretation. Although the analysis of genetic variants at low frequency is expected to improve with the availability of biobank-scale samples and the development of new methods to reduce biases in association tests, caution should still be exercised when inspecting associations with rare variants, as these can suffer from instability and low power, particularly in relation to binary traits.

Statistical power considerations

GWAS requires careful consideration of statistical power [22], which depends on various factors, including allele frequency and the effect size of variants – represented by the x- and y-axes of trumpet plots, respectively. Common variants – usually defined as having allele frequency greater than 1% – tend to have higher power in association studies because common causal variants are more likely to be present in the sample (either genotyped or imputed), and because their relatively balanced number of alleles is akin to having a larger sample size. Variants with larger effect sizes have higher power because their effects are further from the null hypothesis of zero effect.

We therefore recommend incorporating power curves into trumpet plots, since they visually represent the statistical power across the spectrum of allele frequency for a given sample size and effect size [23]. Moreover, power curves can aid in identifying parts of the association testing space in which the power to detect significant associations is low.
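A power curve of this kind can be sketched as follows. The paper does not spell out the formula here, so this is an assumption: a standard approximation for a quantitative trait with unit variance under an additive model, in which a 1-degree-of-freedom test has non-centrality parameter NCP ≈ N · 2f(1 − f) · β². The function names are hypothetical and are not the TrumpetPlots API; the sample size and power levels are those given in the Figure 1 caption.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_quantitative(beta, freq, n, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test.

    Assumes a standardized quantitative trait, an additive model and
    Hardy-Weinberg equilibrium, so that NCP ~= n * 2f(1-f) * beta^2.
    """
    ncp = n * 2.0 * freq * (1.0 - freq) * beta ** 2
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    root = ncp ** 0.5
    # A 1-df non-central chi-square is the square of a N(sqrt(NCP), 1)
    # variable, so the power is the mass in both rejection tails.
    return _STD_NORMAL.cdf(root - z_crit) + _STD_NORMAL.cdf(-root - z_crit)

def beta_for_power(target_power, freq, n, alpha=5e-8):
    """Smallest |beta| detectable with the target power, found by bisection."""
    lo, hi = 0.0, 1.0
    while power_quantitative(hi, freq, n, alpha) < target_power:
        hi *= 2.0  # widen the bracket until the target power is reached
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if power_quantitative(mid, freq, n, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi

# One curve per power level, evaluated on a grid of allele frequencies.
N_MEDIAN = 351550  # median GWAS sample size reported in the Figure 1 caption
freq_grid = [0.0001, 0.001, 0.01, 0.1, 0.5]
power_curves = {
    level: [beta_for_power(level, f, N_MEDIAN) for f in freq_grid]
    for level in (0.5, 0.7, 0.9)
}
```

Variants lying below a curve are those the study would be expected to detect with less than the corresponding power; the curves rise steeply as allele frequency falls, which gives the plot its trumpet outline.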

Two alternative approaches to illustrate the joint distribution of allele frequency and effect size

One approach to illustrate the relationship between allele frequency and effect size is to plot only positive effects: that is, the allele effect for each variant that increases the value of the phenotype. In this case, the effect sizes are always positive, and both the allele and sign of the association regression coefficients (betas or odds ratios) need to be flipped (to the other allele) if they are reported as negative, to ensure that the effect size is greater than zero. If this ‘flipping’ is required, then the allele frequency of the other allele should be reported, which will be 1 minus the original allele frequency. In this case, the allele frequencies of the plot range from 0 to 1.

The other approach, which we recommend, allows for both positive (risk allele in the context of disease phenotypes) and negative (protective allele in the context of disease phenotypes) effect sizes and always corresponds to the minor allele. In this case, the effect size of the allele can have either a positive or a negative value, and the allele frequencies of the plot range from 0 to 0.5.
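The two conventions can be sketched in Python as follows. The function names are hypothetical, and the effect is assumed to be a regression coefficient (beta); for an odds ratio, the flip would take the reciprocal rather than change the sign.

```python
def to_minor_allele(freq, beta):
    """Recommended convention: report frequency and effect of the minor allele.

    The resulting frequency lies in [0, 0.5]; the effect may be positive
    or negative.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    if freq > 0.5:
        # Switch to the other allele: frequency becomes 1 - freq and the
        # sign of the effect is reversed.
        return 1.0 - freq, -beta
    return freq, beta

def to_increasing_allele(freq, beta):
    """Alternative convention: report the allele that increases the phenotype.

    The resulting effect is always non-negative; the frequency can lie
    anywhere in [0, 1].
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    if beta < 0:
        return 1.0 - freq, -beta
    return freq, beta

# The same association expressed under each convention.
minor_freq, minor_beta = to_minor_allele(0.8, 0.12)
incr_freq, incr_beta = to_increasing_allele(0.3, -0.05)
```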

Practical example: Generating trumpet plots for 129 traits in the UK Biobank

We examined all continuous UK Biobank traits with available GWAS analyses performed by Benjamin Neale’s group [24], and searched to confirm whether rare-variant associations were available for the same trait (by UK Biobank Field ID) in the exome sequencing analysis performed by the Regeneron team [13].

Common variant associations were extracted from the Neale group’s GWAS summary statistics. For each GWAS, we extracted the independent variants using GCTA-COJO (--cojo-slct option), with a random subset of 4,000 unrelated individuals of European ancestry from the UK Biobank as an LD reference panel. We selected independent variants with minor allele frequency of >0.01 and association P-value < 5 × 10⁻⁸ within a 100-Kb window.
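The frequency and significance thresholds above amount to a simple row filter, sketched here in Python. The field names `freq` and `p` and the function name are assumptions for illustration; the sketch does not reproduce the conditional analysis itself, which was performed with GCTA.

```python
def select_common_significant(rows, maf_min=0.01, p_max=5e-8):
    """Keep variants with minor allele frequency > maf_min and P-value < p_max.

    rows: iterable of dicts with an effect allele frequency under "freq"
          and an association P-value under "p" (hypothetical field names).
    """
    selected = []
    for row in rows:
        # The minor allele frequency is the smaller of the two allele frequencies.
        maf = min(row["freq"], 1.0 - row["freq"])
        if maf > maf_min and row["p"] < p_max:
            selected.append(row)
    return selected

# Toy summary statistics (made-up values for illustration only).
toy_rows = [
    {"id": "var1", "freq": 0.25, "p": 3e-12},   # common and significant
    {"id": "var2", "freq": 0.004, "p": 1e-15},  # significant but rare
    {"id": "var3", "freq": 0.40, "p": 2e-5},    # common but not significant
    {"id": "var4", "freq": 0.97, "p": 4e-9},    # minor allele frequency 0.03
]
kept_ids = [row["id"] for row in select_common_significant(toy_rows)]
```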

Rare-variant association results were extracted from the supplementary data table 2 (SD2) of the Regeneron study [13]. This study reported results for burden tests (which typically aggregated variants and indels) and individual rare-variant–level tests. To ensure that the effect sizes reported in our analyses corresponded with individual rare variants, we extracted results for only ‘singleton variants’ with predicted loss of function – including stop-gain, frameshift, stop-lost, start-lost and essential splice variants – and deleterious missense variants.

We used our R package, TrumpetPlots [25] (RRID:SCR_023742), to create plots depicting the relationship between allele frequency (x-axis) and odds ratio (y-axis). Figure 1 illustrates the combination of results from 129 continuous traits, aggregated into a single trumpet plot. When all the traits are collectively represented, a change in the number of associations around allele frequency of 0.01 is observed. This is likely due to a combination of factors, including differences in genome coverage, quality control and statistical power of the two studies used. For allele frequencies of >0.01, association results were extracted from GWAS that used genotyping arrays to assess genotyped variants across the entire genome. In contrast, for allele frequencies of <0.01, association results were extracted from exome sequencing of coding variants only, which constitute a small fraction of the genome. These variants were further filtered to include only rare, singleton variants with predicted loss of function; while these variants may be expected to have relatively large effect sizes, the statistical power to detect significant associations for rare variants of small effect is substantially lower than that for common variants.

R Shiny application

We developed a user-friendly web application called Shiny Trumpets to visualize trumpet plots for our UK Biobank results, as well as any other genetic association results that can be uploaded by the user. With Shiny Trumpets, researchers with no knowledge of R programming can easily upload and visualize their own data sets.

If a user uploads their own results, the Shiny Trumpets application prompts them to upload the input data files and specify the sample size used for the study, such as the GWAS sample size. This information is used to perform power calculations for the visualization. Shiny Trumpets offers an intuitive interface for users to explore and download trumpet plots.

Discussion

Visual representations of genetics and genomics results, such as Manhattan plots [2], Q-Q plots [2], Haploview [3] or volcano plots [4], have been helpful in interpreting research findings and in identifying patterns, trends, and outliers that may not be easily apparent in tables of raw data. These visualizations have revolutionized the interpretation and communication of research findings relating to the identification of GWAS-associated loci, putatively causal genes, and potential outliers.

In this manuscript, we introduce a new R package and R Shiny application to illustrate the distribution of risk variant effect sizes across a wide range of allele frequencies, which we coin: ‘TrumpetPlots’. We illustrate the distribution of variant effect sizes across the allele frequency range (from 0.00001 to 1) for over 100 continuous traits available in the UK Biobank, and propose that these plots are valuable representations of genetic associations that can help researchers better understand the genetic architecture of traits and diseases and prioritize certain study designs (e.g., sequencing or GWAS) to discover new variants that contribute to disease.

Alternative combinations of the results from genetic association analyses can lead to various types of plots, each with distinct shapes that differ from a trumpet. For instance, Manhattan plots have gained popularity as a means of illustrating association results, and related variations like Miami plots [27] and Brisbane plots [9] have also been reported. In a previous study [28], the idea of adapting volcano plots (commonly used to represent differential gene expression analyses) was proposed for genetic association studies. Other metrics – such as the proportion of variance explained, or the population attributable risk of each variant – could be represented in relation to the risk allele frequency [29]. All these metrics have different properties and assumptions, which can influence their use and interpretation [29]. For example, illustrating effect sizes is particularly suitable for identifying genetic variants with strong effects regardless of how common a variant is in the population. This is especially important for the discovery and prioritization of candidate genes, and to gain biological insights of the traits or disease under study. However, it also highlights the presence of large-effect rare variants that, due to their low frequency, may have a small contribution to population-level disease risk.

One important consideration when interpreting the trumpet plots we constructed for the UK Biobank is that they represent only individuals of European ancestry. The relationship between effect size and allele frequency can be affected by population genetic differences [30, 31] and, as such, one interesting application of trumpet plots could be to compare the joint distribution of allele frequencies and effect sizes across different ancestries to identify similarities and differences for further investigation. Insights about the similarities and differences across populations, in the relationship between effect size and allele frequency, could have important implications for disease risk prediction and prevention strategies.

In conclusion, we emphasize the significance of data visualization in the genetics field and present a novel R package and R Shiny application for visualizing the relationship between allele frequency and effect size in association studies. We hope that the proposed ‘trumpet plots’ will provide a valuable representation of genetic associations and enhance the interpretation of association results across the allele frequency spectrum.

Availability of source code and requirements

Acknowledgements

We thank the participants of the UK Biobank and the scientists involved in the construction of this resource. We would like to thank Dr Shea Andrews for helpful discussions on several aspects of the project. We would also like to express our gratitude to the Center for Excellence in Youth Education (CEYE) program for their support and training, which enabled us to carry out this research. Without the invaluable assistance and dedication of CEYE staff, this project would not have been possible.

Funding Statement

This work was supported by a grant from the National Institutes of Health (R01MH122866) to PFO, by a 2022 NARSAD Young Investigator Grant (Number 30749) by the Brain & Behavior Research Foundation to JGG, and through the computational resources and staff expertise provided by Scientific Computing and the Data Ark (Data Commons) teams at the Icahn School of Medicine at Mount Sinai.

Data Availability

All data used in this manuscript is publicly available. Rare-variant associations are available in supplementary data table 2 of the original publication [13]; GWAS summary statistics are available on the website https://www.nealelab.is/uk-biobank/.

The code is freely available and open to others’ contributions at https://gitlab.com/JuditGG/freq_or_plots (UK Biobank analyses), https://gitlab.com/JuditGG/trumpetplots (R package with test data) and https://juditgg.shinyapps.io/shinytrumpets/ (R Shiny application).

To seek support or to report issues, users can visit https://gitlab.com/JuditGG/freq_or_plots/-/issues (for questions or issues related to the R Shiny application) and https://gitlab.com/JuditGG/trumpetplots/-/issues (for questions or issues related to the R package).

Snapshots of the code are also available from the GigaDB repository [26].

List of Abbreviations

GWAS: genome-wide association study; LD: linkage disequilibrium; log: logarithm or logarithmic.

Declarations

Ethics approval and consent to participate

The authors declare that ethical approval was not required for this type of research.

Competing Interests

The authors declare that they have no competing interests.

Authors’ contributions

LC: Data Curation, Formal Analysis, Investigation, Writing – Original Draft Preparation. LL: Software, Validation, Visualization. PFO: Conceptualization, Funding Acquisition, Formal Analysis, Supervision, Writing – Review & Editing. JGG: Conceptualization, Data Curation, Formal Analysis, Investigation, Software, Validation, Supervision, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing.

References

1. Iyegbe CO, O’Reilly PF. Genetic origins of schizophrenia find common ground. Nature, 2022; 604(7906): 433–435. doi: 10.1038/d41586-022-00773-5.
2. Turner SD. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. J. Open Source Softw., 2018; 3(25): 731. doi: 10.21105/joss.00731.
3. Barrett JC, Fry B, Maller J et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 2005; 21(2): 263–265. doi: 10.1093/bioinformatics/bth457.
4. Li W. Volcano plots in analyzing differential expression with mRNA microarrays. J. Bioinform. Comput. Biol., 2012; 10(06): 1231003. doi: 10.1142/S0219720012310038.
5. Boughton AP, Welch RP, Flickinger M et al. LocusZoom.js: interactive and embeddable visualization of genetic association study results. Bioinformatics, 2021; 37(18): 3017–3018. doi: 10.1093/bioinformatics/btab186.
6. Zhou W, Bi W, Zhao Z et al. SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat. Genet., 2022; 54(10): 1466–1469. doi: 10.1038/s41588-022-01178-w.
7. Mbatchou J, Barnard L, Backman J et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet., 2021; 53(7): 1097–1103. doi: 10.1038/s41588-021-00870-7.
8. Chang CC, Chow CC, Tellier LC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 2015; 4(7): 1–16. doi: 10.1186/s13742-015-0047-8.
9. Yengo L, Vedantam S, Marouli E et al. A saturated map of common genetic variants associated with human height. Nature, 2022; 610(7933): 704–712. doi: 10.1038/s41586-022-05275-y.
10. Trubetskoy V, Pardiñas AF, Qi T et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature, 2022; 604(7906): 502–508. doi: 10.1038/s41586-022-04434-5.
11. Aragam KG, Jiang T, Goel A et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat. Genet., 2022; 54(12): 1803–1815. doi: 10.1038/s41588-022-01233-6.
12. Koyama S, Ito K, Terao C et al. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nat. Genet., 2020; 52(11): 1169–1177. doi: 10.1038/s41588-020-0705-3.
13. Backman JD, Li AH, Marcketta A et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature, 2021; 599(7886): 628–634. doi: 10.1038/s41586-021-04103-z.
14. Yang J, Ferreira T, Morris AP et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet., 2012; 44(4): 369–375, S1–S3. doi: 10.1038/ng.2213.
15. Yang J, Lee SH, Goddard ME et al. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet., 2011; 88(1): 76–82. doi: 10.1016/j.ajhg.2010.11.011.
16. Leitsalu L, Haller T, Esko T et al. Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol., 2015; 44(4): 1137–1147. doi: 10.1093/ije/dyt268.
17. Gaziano JM, Concato J, Brophy M et al. Million veteran program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol., 2016; 70: 214–223. doi: 10.1016/j.jclinepi.2015.09.016.
18. Turnbull C, Scott RH, Thomas E et al. The 100,000 genomes project: bringing whole genome sequencing to the NHS. BMJ, 2018; 361: k1687. doi: 10.1136/bmj.k1687.
19. Lee S, Emond MJ, Bamshad MJ et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet., 2012; 91(2): 224–237. doi: 10.1016/j.ajhg.2012.06.007.
20. Feng S, Liu D, Zhan X et al. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics, 2014; 30(19): 2828–2829. doi: 10.1093/bioinformatics/btu367.
21. Neale BM, Rivas MA, Voight BF et al. Testing for an unusual distribution of rare variants. PLOS Genet., 2011; 7(3): e1001322. doi: 10.1371/journal.pgen.1001322.
22. Hong EP, Park JW. Sample size and statistical power calculation in genetic association studies. Genom. Inform., 2012; 10(2): 117–122. doi: 10.5808/GI.2012.10.2.117.
23. Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat. Rev. Genet., 2014; 15(5): 335–346. doi: 10.1038/nrg3706.
24. UK Biobank. Neale lab. http://www.nealelab.is/uk-biobank. Accessed August 25, 2023.
25. García-González J, Liou L. TrumpetPlots: Visualization of Genetic Association Studies. June 13, 2023; https://cran.r-project.org/web/packages/TrumpetPlots/index.html. Accessed August 25, 2023.
26. Corte L, Liou L, O’Reilly PF et al. Supporting data for “trumpet plots: visualizing the relationship between allele frequency and effect size in genetic association studies”. GigaScience Database, 2023; doi: 10.5524/102432.
27. White JD. juliedwhite/miamiplot: An R package for creating ggplot2 based miami plots. https://github.com/juliedwhite/miamiplot. Accessed June 29, 2023.
28. Li W, Freudenberg J, Suh YJ et al. Using volcano plots and regularized-chi statistics in genetic association studies. Comput. Biol. Chem., 2014; 48: 77–83. doi: 10.1016/j.compbiolchem.2013.02.003.
29. Witte JS, Visscher PM, Wray NR. The contribution of genetic variants to disease depends on the ruler. Nat. Rev. Genet., 2014; 15(11): 765–776. doi: 10.1038/nrg3786.
30. Shi H, Gazal S, Kanai M et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun., 2021; 12(1): 1098. doi: 10.1038/s41467-021-21286-1.
31. Martin AR, Gignoux CR, Walters RK et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet., 2017; 100(4): 635–649. doi: 10.1016/j.ajhg.2017.03.004.

GigaByte.

Assign Handling Editor

Editor: Scott Edmunds

Editor Assess MS

Editor: Hongfang Zhang

Curator Assess MS

Editor: Yannan Fan

Review MS

Editor: Clara Albiñana

Reviewer name and names of any other individuals who aided in the review: Clara Albiñana
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published manuscript. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is? Yes
Additional Comments
Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code? Yes
Additional Comments
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code? No
Additional Comments Although there are no explicit guidelines for contribution in the manuscript or on the website, hosting the project on GitLab makes it possible to contribute to the project and open issues.
Is the code executable? No
Additional Comments Unfortunately, I wasn't able to install the R package. I have opened an issue on the GitLab page so that it can hopefully be resolved.
Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined? Yes
Additional Comments It is very common for new R packages to just use devtools for installation.
Is the documentation provided clear and user friendly? Yes
Additional Comments The requirements for generating a trumpet plot just involve providing a set of GWAS summary statistics with specific column names, together with the GWAS sample size. This is very common for tools based on GWAS summary statistics. I think it is fine for the R package to require renaming the columns to fit the format, as one already needs to load the file into R. However, I find it inconvenient to have to re-save the summary statistics file with different column names for the Shiny app. Providing, e.g., column indexes alone would be much more user-friendly. In addition to the manuscript, I think a longer README file in the GitLab repository would be very beneficial for stand-alone usage of the R package.
Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required? No
Additional Comments I cannot answer this question until I can install the tool.
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level? Yes
Additional Comments
Have any claims of performance been sufficiently tested and compared to other commonly-used packages? Not applicable
Additional Comments There are no existing comparable tools.
Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)? Yes
Additional Comments
Are there (ideally real world) examples demonstrating use of the software? Yes
Additional Comments
Is automated testing used or are there manual steps described so that the functionality of the software can be verified? Yes
Additional Comments I can see there is a toy dataset included with the R package.
Any Additional Overall Comments to the Author I think the manuscript is very clear and good at making the point of the utility of the software. The proposed trumpet plots are very visually appealing and can be useful to characterise the genetic variation of diverse phenotypes. The novelty of the trumpet plots, as compared to previously proposed effect size vs. allele frequency plots, is the use of positive and negative effect sizes, making the plot look like a trumpet. I also appreciate the style decisions in the standard generated plots, with a visually appealing color scheme and design. On the use of the software, I have focused my testing on the R package, which I was not able to install. The Shiny app is very useful for visualising the existing, pre-computed trumpet plots, but I do not find it very useful for generating plots from user-uploaded summary statistics, for the reasons I mentioned above. Another comment on the Shiny app is that I appreciate the possibility to download the plots, but it would be very useful to include the name of the visualized phenotype as the plot title, for example, to avoid confusion when downloading multiple plots. I also found an incorrect sentence in the abstract, which I think should be reversed: "The proposed plots have a distinctive trumpet shape, with the majority of variants having low frequency and small effects, while a small number of variants have higher frequency and larger effects".
Recommendation Minor Revisions

Review MS

Editor: Wentian Li

Reviewer name and names of any other individuals who aided in the review: Wentian Li
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published manuscript. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is? Yes
Additional Comments
Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code? Yes
Additional Comments
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code? Yes
Additional Comments
Is the code executable? Yes
Additional Comments
Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined? Yes
Additional Comments
Is the documentation provided clear and user friendly? No
Additional Comments Many aspects of Fig.1 are not explained.
Additional Comments
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level? Yes
Additional Comments
Have any claims of performance been sufficiently tested and compared to other commonly-used packages? Not applicable
Additional Comments
Additional Comments
Are there (ideally real world) examples demonstrating use of the software? Yes
Additional Comments
Additional Comments
Any Additional Overall Comments to the Author Plots with allele frequency on the x axis and effect size (e.g., odds ratio) on the y axis are a very common display of the contribution of both common and rare alleles to genetic association. A schematic form of this plot appears on almost everybody's presentation slides when introducing this topic (for an example, see Science (23 Nov 2012), vol 338(6110), pp. 1016-1017). Considering how many people are already familiar with this type of plot, I feel that very little new is added in this paper: maybe only a new name ("trumpet") and/or the power lines. The other methodological contributions (log-x, one variant per LD, avoiding gene-level statistics) are rather straightforward. People without experience with "shiny" (R package) can still use ggplot2 or plot in R to get the same result. Generally speaking, I think the paper is weak, though OK as a program/package announcement.
Major comments:
* I think the trumpet shape (increase of "effect size" for rare variants) is probably a direct consequence of using the odds ratio as a measure of effect size. If the allele frequency in the normal population is p0 and that in the disease population is p1, then [p1/(1-p1)]/[p0/(1-p0)] ~ p1/p0 tends to be large for small p0, simply because the denominator is small. On the other hand, if population attributable risk (p0*(RR-1)/(1+p0*(RR-1))) is used as the y axis, I am uncertain what the shape of the plot would be.
* A risk allele has these pieces of information: 1. allele frequency, 2. effect size (e.g., odds ratio), 3. type-I error/p-value, 4. type-II error/power. The plot in this paper shows #1 vs #2, with #4 added as an extra. In another publication proposing a way to plot genetic association results (Comp Biol. and Chem. (2014), 48:77-83 doi: 10.1016/j.compbiolchem.2013.02.003), #2 is plotted against #3, with #1 added as an extra. I'm sure using other combinations could lead to other types of plots. The authors should discuss and compare these possibilities.
Minor comments: In Fig.1, the size of the dots, the brown vs cyan color, and the discontinuity of scatter dots around 0.01 are not explained.
Recommendation Major Revisions
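
The reviewer's first major comment can be explored numerically. The sketch below is an illustration only and is not part of the manuscript or the TrumpetPlots package. It uses the two formulas exactly as written in the review, and it assumes a hypothetical fixed absolute frequency difference of 0.001 between cases and controls, with the odds ratio standing in for the relative risk (RR), an approximation that is reasonable only for rare diseases. The function names and input values are made up for this example.

```python
def odds_ratio(p1, p0):
    """Allelic odds ratio from case (p1) and control (p0) allele frequencies."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def population_attributable_risk(p0, rr):
    """PAR as written in the review: p0*(RR-1) / (1 + p0*(RR-1))."""
    excess = p0 * (rr - 1)
    return excess / (1 + excess)


# Assumption: cases carry the allele 0.001 more often than controls,
# regardless of how common the allele is (hypothetical toy value).
DELTA = 0.001

rows = []
for p0 in (0.0005, 0.005, 0.05, 0.5):
    p1 = p0 + DELTA
    or_value = odds_ratio(p1, p0)
    # Assumption: the odds ratio is used in place of RR (rare-disease approximation).
    par = population_attributable_risk(p0, or_value)
    rows.append((p0, or_value, p1 / p0, par))
    print(f"p0={p0:<7} OR={or_value:.4f}  p1/p0={p1 / p0:.4f}  PAR={par:.5f}")
```

In this toy setting the odds ratio shrinks toward 1 as p0 grows and is close to p1/p0 when p0 is small, whereas the attributable risk changes far less across the same range. Different assumptions about how p1 relates to p0 would give different curves, so this does not settle what a real PAR-based plot would look like.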

Editor Decision

Editor: Hongfang Zhang

Assess Revision

Editor: Hongfang Zhang

Re-Review MS

Editor: Wentian Li

Comments on revised manuscript I have read the authors' response and I'm mostly satisfied. Only two minor comments:
* The Witte 2014 Nature Rev. Genet. article summarizes well the point I tried to make. I understand that rare variants should have a relatively higher effect from an evolutionary perspective, but since these are rare, their individual or even collective contribution to a disease in the population is still small. A casual reader may not realize this point, and I think it would be helpful to cite Witte's article.
* My minor comment on Fig.1 is still not addressed: there seem to be more points on the right side of the p=0.01 line than on the left side. Why this discontinuity? (The added text in the revision is about the color and size of the dots, not about this discontinuity.)

Editor Decision

Editor: Hongfang Zhang

Assess Revision

Editor: Hongfang Zhang

Final Data Preparation

Editor: Yannan Fan

Editor Decision

Editor: Hongfang Zhang

Accept

Editor: Scott Edmunds

Editor’s Assessment This work presents a new standardized graphical approach for visualizing genetic associations across a wide range of allele frequencies. These proposed TrumpetPlots have a distinctive trumpet shape, hence the proposed name. With the majority of variants having high frequency and small effects, and a small number of variants having lower frequency and larger effects, this view can help to provide new and valuable insights into the genetic basis of traits and diseases, and also help prioritize efforts to discover new risk variants. The tool is provided as a novel R package and R Shiny application. To demonstrate its use, the article illustrates the distribution of variant effect sizes across the allele frequency range for over 100 continuous traits available in the UK Biobank. After some problems in testing, the package is now available and easy to deploy via CRAN.

Export to Production

Editor: Scott Edmunds

Associated Data

    Data Availability Statement

    All data used in this manuscript is publicly available. Rare-variant associations are available in supplementary data table 2 of the original publication [13]; GWAS summary statistics are available on the website https://www.nealelab.is/uk-biobank/.

    The code is freely available and open to others’ contributions at https://gitlab.com/JuditGG/freq_or_plots (UK Biobank analyses), https://gitlab.com/JuditGG/trumpetplots (R package with test data) and https://juditgg.shinyapps.io/shinytrumpets/ (R Shiny application).

    To seek support or to report issues, users can visit https://gitlab.com/JuditGG/freq_or_plots/-/issues (for questions or issues related to the R Shiny application) and https://gitlab.com/JuditGG/trumpetplots/-/issues (for questions or issues related to the R package).

    Snapshots of the code are also available from the GigaDB repository [26].


    Articles from GigaByte are provided here courtesy of Gigascience Press