Abstract
Genome-wide association studies (GWAS) identify regions of the genome in which genetic variation is associated with the risk of complex diseases, such as diabetes, or the magnitude of traits, such as blood pressure. Determining which “effector genes” mediate the effects of GWAS associations is essential to using GWAS to understand disease mechanisms and develop new therapies. In recent years, GWAS authors have increasingly included effector-gene predictions as part of their study results. However, the research community has not yet converged on standards for generating or reporting these predictions. In this Perspective, we illustrate the diversity of the evidence types used to support effector-gene predictions and argue for future initiatives to increase their accessibility and usefulness.
Depending on one’s point of view,1 genome-wide association studies (GWAS) either produce a remarkable range of discoveries2 or implicate most of the genome – but few actual genes – in disease.3 This in part reflects a divide between geneticists, who use GWAS associations as a foundation for further research into disease,4 and non-geneticists, who frequently do not use GWAS results as part of their experimental work.5 To bridge this divide, genetic funders, industry,6 consortia,7 and investigators8,9,10 have begun to devote substantial resources to the “variant to function” (V2F) problem. V2F aims to determine the causal variant at a GWAS locus and the “effector gene” it impacts, as well as to understand the downstream molecular and cellular processes altered en route to the disease state. While V2F is not necessary for every application of GWAS (e.g., polygenic risk scores, Mendelian randomization), it is a critical step toward mechanistic understanding of disease.
Arguably the most important step within V2F is effector-gene identification, as genes and their products offer the most direct clues into biological mechanisms and are the targets of most therapies. Consequently, in recent years it has become increasingly common for GWAS to include lists of predicted effector genes as a major study outcome. These sets of genes are compiled by integrating genetic associations with genomic and epigenomic annotations, computational method results, and findings from the scientific literature. Such lists are an important avenue for increasing the biological utility of a GWAS, as they communicate the authors’ “best guesses” for effector genes and provide starting points for researchers to study potential disease mechanisms, propose novel therapeutic targets, and pursue experimental evidence to elevate a predicted effector gene to a validated effector gene.11,12,13
This trend is a double-edged sword, however, and has the potential to sow confusion rather than clarity. Specifically, by analyzing the related literature of the past 10 years, we find that there has been a rapid increase in published lists but little consistency in the evidence types used to support effector-gene predictions or in the format in which predictions are presented. As a result, where different lists have been published for a trait, they are often discordant. Furthermore, these predictions are poorly communicated to the scientists who could use them in downstream studies: such lists are most often reported in one of the many supplementary tables accompanying a GWAS publication and are not always readily visible or adequately annotated.
Based on these observations, we argue that it is now time to develop guidelines or standards for constructing and reporting these effector-gene lists. To that end, we first review the lists published in recent years, define organizing principles for the data and methods currently used to construct them, and propose areas in which new standards would increase their utility. Our aim is to motivate the genetics community to realize and act on the need for these standards so that effector-gene lists become as robust and interpretable as the GWAS association statistics upon which they are based.
History of effector-gene prediction
Determining the effector gene for a GWAS association is a challenging task.8,9,10 Linkage disequilibrium (LD) makes it difficult to localize an association to the underlying causal variant(s); most associations are outside of protein-coding regions14, and understanding the regulatory effects of causal variants requires a variety of assays in different cells and tissues.10 For these reasons, the earliest GWAS publications annotated each locus with the gene closest to the strongest-associated variant, even though it was understood that the nearest gene was not necessarily the effector.11
Over time, researchers began to apply more sophisticated approaches to prioritize genes near GWAS associations (Fig. 1). We define “gene prioritization” as the activity of aggregating multiple lines of evidence across GWAS-significant loci to rank all genes at each locus by each evidence type. Gene prioritization on its own does not determine the directional relationship between gene perturbation and a trait (i.e., whether gene activation or inhibition would be protective from disease), an important secondary question that requires additional data and approaches. Gene prioritization is, however, a necessary first step towards “effector-gene prediction,” which we define as integrating the combined weight of evidence to identify the gene most likely to be the effector at each locus (Fig. 1, Box 1). Some of the earliest high-profile systematic gene prioritization efforts were conducted by the GIANT consortium in 2010; one, for example, amassed evidence from the literature, pathway analyses, and other criteria to evaluate 95 genes found near 48 independent GWAS loci for waist-hip ratio.15
Fig. 1. Different approaches for connecting variants to genes, illustrated for a single GWAS locus.

Top: A stylized representation of an association plot for an example locus (a genomic region in which there are variants significantly associated with a GWAS trait), as would be produced by the commonly used tool LocusZoom (http://locuszoom.org/). Purple diamond: the most significantly associated (index) variant. Colored dots: other variants in the region; height on the plot indicates significance of association, and different colors represent linkage disequilibrium (LD) relationships to the index variant. Overlaid line graph: recombination rate within the locus. Genes in the region are indicated below the association plot, with boxes indicating exons and lines indicating introns. Early GWAS would annotate the GWAS index variant with the nearest gene (Gene 3) even though it was already recognized at the time that the nearest gene to an index variant is not always the effector gene at the locus. Bottom: Gene-prioritization efforts assess the weights of multiple types of evidence that support the likelihood that genes at GWAS loci are the effector genes. Some current efforts stop at the point of gene prioritization, while effector-gene prediction studies use the gene prioritization results to further create a single “predicted-effector-gene list” (a table or figure) that integrates and summarizes all evidence types to predict the set of effector genes for the trait (dotted arrow). In the example shown, Gene 2 has the largest number of lines of evidence, supporting the prediction that it is the effector at this locus.
Box 1: Effector gene terminology.
Two terms are currently in use in the literature to describe GWAS effector genes:
“Causal gene” or “causative gene” are most commonly used, according to PubMed searches. While they express the concept clearly and accurately, these terms could suggest certainty that the gene product has a deterministic role in causing disease. Their use has also been questioned on the grounds that it is the variants, rather than genes, that cause disease.
“Effector gene” has been used in the literature for several decades in the general sense of “a gene that carries out a role”, and, in the last decade, it has been used to convey the concept of a gene whose product is predicted to mediate the effect of a genetically associated variant on a disease or trait.70 The term effector transcript is used interchangeably. Because “effector gene” is clear and does not imply causality, we recommend its use over “causal gene” when referring to genes that are predicted to be effectors.
Several other terms express concepts that are related to but distinct from the concept of a GWAS effector gene:
“Candidate gene” most often refers to one of a subset of genes that have been chosen for investigation based on prior knowledge of their disease relevance. For example, genes might be included in a set of candidates if they participate in a biochemical pathway thought to be important for the disease, if they are located in a chromosomal region known from family linkage studies to be relevant to the disease, or if mutations in their mouse orthologs confer phenotypes similar to the complex disease under study.
“Prioritized gene” refers to a gene’s rank relative to other genes near the same GWAS signal in terms of its likelihood of being the effector gene at that locus. This term is potentially confusing because it can describe two different concepts, as it may refer to the most highly prioritized gene at a locus, or may denote any gene at a locus that has been assigned a priority relative to the other genes at the locus.
“Target gene” refers specifically to a gene whose regulation is affected by a sequence variant. This term emphasizes the role of a variant, rather than of a gene or gene product. It is also sometimes used with the different meaning of a gene whose product is the target of a drug.
“Direction of effect”, or more precisely the directional relationship between gene perturbation and disease risk, expresses whether increasing or decreasing gene activity exerts a specific effect on disease. Knowledge of direction of effect is important for developing therapeutic strategies, but this is challenging because it is difficult to determine the effect that noncoding variants have on gene activity.
We suggest two terms to describe different aspects of effector-gene prioritization:
“Gene prioritization” refers to the activity of ranking genes in a GWAS locus by the strength of each line of evidence under consideration. A gene may have strong evidence of one type and weak evidence of a different type (illustrated in Fig. 1).
“Effector-gene prediction” integrates the results of gene prioritization to identify a single gene (usually) at a locus that is most likely to be the effector, based on the combined weight of evidence (occasionally, two or more genes at a locus may have the same weight of evidence).
In recent years, lists of the most likely effector genes for GWAS traits, generated by gene prioritization and subsequent effector-gene prediction, have become a focal point of many studies. Just as the impact of GWAS has been maximized by standard approaches for data generation and imputation, quality control, and analysis and presentation, so too will the impact of these lists be maximized when the definitions of effector genes are consistent, the methods are robust and transparent, and the presentation of predictions is intuitive and accessible to a diverse group of consumers.
Surveying the landscape of gene prioritization
To determine the extent of gene prioritization and effector-gene prediction studies as a follow-up to GWAS, we surveyed GWAS publications in the NHGRI-EBI GWAS Catalog16 and found 169 publications between 2012 and 2022 that used systematic gene prioritization across 157 traits. While this set of publications likely omits some gene prioritization studies, we argue that it includes most published studies, since our selection is based on manual review of all abstracts and selected full texts within the comprehensive GWAS Catalog collection of publications (Supplementary Note). Supporting the hypothesis that gene prioritization after GWAS is becoming increasingly common, the frequency of these papers rose from 0.8% of papers uploaded to the GWAS Catalog in 2012 to 7.5% of those papers in 2022.
We analyzed this set of gene prioritization papers to determine whether there was any consistency in the evidence used to support prioritizations. Since methods for generating such evidence have been extensively reviewed,8,9,10,17,18 we do not describe them in detail here, but rather organize the evidence types into broad categories to analyze their use in gene prioritizations. We assign them to two main groups: “variant-centric” evidence that links predicted causal variants to genes, and “gene-centric” evidence that considers properties of genes independent of nearby associations (Table 1 and Fig. 2). These are approximately equivalent, respectively, to the “bottom-up” and “top-down” approaches to gene prioritization proposed by Stacey et al.19 In many cases, gene prioritization also incorporates the results of integrative computational pipelines. Ultimately, the data underlying these pipelines can also be grouped into the variant-centric and gene-centric evidence categories.
Table 1. Variant-centric and gene-centric evidence types observed in the set of analyzed gene prioritization papers.
“GWAS variant” refers to a variant or set of variants genetically associated with, and potentially causal for, the trait or disease of interest. Different studies may consider evidence only for the most strongly associated (index) variant, for the index variant and its LD proxies, or for a set of statistically fine-mapped variants.
| Approach | Evidence type | Explanation |
|---|---|---|
| Variant-centric | Location in or near gene | A GWAS variant may be likely to affect the promoter of the gene closest to it |
| | | A GWAS variant that causes a potentially deleterious sequence change to a gene is likely to affect that gene |
| | | A significant gene-level association with a disease, derived from combining multiple common-variant associations, suggests involvement of a gene in a disease |
| | Regulatory evidence: tissue-specific physical contact | A GWAS variant that physically contacts a gene’s promoter may affect an enhancer element for the gene |
| | Regulatory evidence: tissue-specific epigenomic annotations | A GWAS variant located within a regulatory element for a gene (defined by epigenomic annotations and/or by open chromatin) may alter the function of that element |
| | Regulatory evidence: tissue-specific molecular QTL | A GWAS variant coincident with or linked to a QTL affecting expression (eQTL), transcript splice forms (sQTL), protein levels (pQTL), DNA methylation (meQTL), or other molecular properties of a gene or its product can suggest the identity of the impacted gene |
| Gene-centric | “Guilt by association” | Modulation of gene expression, actual or predicted (e.g., by a transcriptome-wide association study69) under disease conditions or co-regulation with known disease-related genes can suggest a link to the disease |
| | | Membership of the gene or its product in sets of disease-related genes or proteins, such as biochemical or signaling pathways, can suggest a link to the disease |
| | | Physical interaction of a gene product with disease-relevant proteins can suggest involvement in disease mechanisms |
| | Perturbational evidence | If a gene is a known effector for a Mendelian disorder, it could potentially be an effector for a related complex disease |
| | | If perturbation of a gene or its ortholog in cell or animal models confers phenotypes related to a disease, this can suggest that the gene has a role in the disease |
| | | If a drug in use to treat a disease interacts with or affects a gene product, this can suggest a relationship between the gene and the disease |
| | | A significant gene-level association with a disease, derived from combining multiple rare-variant associations, can suggest a role for the gene in the disease |
| | Assertion in literature or resources | Assertions about a gene’s function, derived from direct experimentation or functional annotation, may suggest a link to a disease |
Fig. 2. Evidence types in current use for gene prioritization and effector-gene prediction, illustrated for the example locus shown in Fig. 1.

Top: Variant-centric evidence types may be collected for the index variant (purple diamond), variants in LD with the index variant (colored dots), or for fine-mapped variants in the locus. Bottom: Gene-centric evidence types are determined for all genes within a defined distance of the locus. See Table 1 and the main text for details on each method. Computational pipelines may incorporate both variant-centric and gene-centric evidence types (as indicated with the bracket). The computational pipelines used in the set of papers analyzed are listed in Supplementary Table 1.
Variant-centric evidence
Variant-centric approaches attempt to identify the causal variant underlying an observed GWAS association, the regulatory element (e.g., enhancer or promoter) and cell type that the variant impacts, and the gene whose regulation is altered by this change.14 Frequently, a first step is to perform statistical fine-mapping20,21 to predict which variants are likely to be causal, potentially using functional priors22 or trans-ethnic analyses23 to improve resolution. Once the set of putatively causal variants has been established, variant-centric approaches try to identify which genes may be affected by these variants.
One class of variant-centric evidence depends on the proximity of a GWAS variant to a gene (Table 1). In the <10% of cases24 where a causal variant alters a protein-coding sequence, a prediction that the change is deleterious (e.g., by tools such as Variant Effect Predictor25 and others) is usually considered to provide a clear link between the variant and that gene. When a causal variant lies in the noncoding genome, the simple assumption that it affects the promoter of the nearest gene may often be correct.19,26,27 Finally, a significant gene-level aggregate association of common variants in and near a gene, estimated using tools such as MAGMA (Multi-Marker Analysis of GenoMic Annotation28), is often used to support effector-gene predictions (see below).
Other variant-centric approaches seek to link noncoding variants to regulation of specific genes by their three-dimensional physical contact with promoters or enhancers,29,30,31 by their location within regions annotated as regulatory elements,32,33,34 or by their statistical associations with molecular properties of genes or their protein products at quantitative trait loci (QTL).35,36,37,38 All of these approaches implicitly assume that the variant impacts the effector gene in cis, rather than genes far away in trans – a reasonable assumption supported by evidence of primarily local regulation, though the true extent of trans effects remains uncertain. Since gene regulation can only provide evidence linking a gene to a disease when it is shown to occur in a disease-relevant tissue, the cellular context of variant-centric regulatory evidence is critical. Here, single-cell approaches provide much higher resolution than those using bulk tissue samples.32,39
Gene-centric evidence
A gene-centric approach to effector-gene prioritization begins with the set of genes that are inside a GWAS locus and considers evidence about each gene and its product, rather than about the causal variant in the locus. The major types of gene-centric evidence included in the gene prioritization papers we analyzed (Table 1) can be categorized as “guilt-by-association,” perturbational, and literature-based.
Guilt-by-association evidence is based on the idea that genes relevant to the same disease exhibit similarities to or interactions with each other: they may be co-regulated or differentially regulated in disease, or their products may physically interact or participate together in a biological pathway. Perturbational evidence arises from phenotypes conferred by impairment of a gene or its product – for example, Mendelian (monogenic) mutations or knock-out experiments in model organisms40 or human cell lines. Other sources of perturbational evidence include “burden tests” of rare coding variants identified from whole exome sequencing,41,42 or functional impairment of a gene product by a drug.
Any of the gene-centric evidence types may also be found in the published literature about individual genes and in curated online resources. Researchers performing gene prioritization often incorporate this literature in an ad hoc way, via mining of public databases, or via statistical text-mining methods such as GRAIL,43 whereas others might simply draw upon their expert knowledge accumulated through years of research (https://github.com/fauman/EGEA).
Computational pipelines
Computational pipelines weight and integrate various types of evidence to make effector-gene predictions. Results from the widely used pipelines DEPICT44 (Data-driven Expression Prioritized Integration for Complex Traits), FUMA (Functional Mapping and Annotation of genetic associations) SNP2GENE,45 PoPS (Polygenic Priority Score),26 and L2G (Locus to Gene)46 were often included in combination with additional independent evidence types in the set of gene prioritization papers that we analyzed (Supplementary Table 1; see below).
Broad inconsistency in evidence types and presentation
With so many evidence types available for effector-gene prioritization, choosing a methodology and presenting it clearly to readers can be challenging. To understand trends in how authors have approached this challenge, we identified commonalities and differences in evidence types and presentation formats in the set of gene prioritization papers analyzed (Supplementary Table 2). First, we considered the evidence types that were included. Most studies cited only a minority (3 or 4) of the evidence categories defined above (Fig. 3A). The most cited evidence categories are GWAS variant location in or near a gene, and molecular QTL (Fig. 3B). Rarely did studies use the same set of evidence categories, with 73 distinct sets occurring across the 169 papers and the most common set (variant location and QTL only) shared by only 10 papers. We did not identify any clear trends in the usage of different evidence categories over time, nor were we able to group studies into distinctive “types”, either manually or through statistical methods such as k-means clustering or non-negative matrix factorization.
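The tally of distinct evidence-category sets described above can be illustrated with a minimal sketch. The paper identifiers and category labels below are hypothetical placeholders rather than entries from Supplementary Table 2, and the code is only one simple way such a count might be made, not necessarily the procedure used for the analysis.

```python
from collections import Counter

# Hypothetical input: each paper mapped to the evidence categories it cited.
papers = {
    "paper1": {"variant_location", "molecular_qtl"},
    "paper2": {"molecular_qtl", "variant_location"},
    "paper3": {"variant_location", "molecular_qtl", "epigenomic_annotation"},
    "paper4": {"computational_pipeline"},
}

# frozenset makes each combination hashable and independent of listing order.
set_counts = Counter(frozenset(categories) for categories in papers.values())

n_distinct = len(set_counts)
most_common_set, n_papers = set_counts.most_common(1)[0]

print(n_distinct, sorted(most_common_set), n_papers)
```

In this toy input, two of the four papers share the same combination, so three distinct sets are counted.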
Fig. 3. Evidence usage and presentation formats in gene prioritizations and effector-gene predictions.

A. Number of evidence categories cited per paper in the 169 gene prioritization papers analyzed. The 8 evidence categories include the 4 variant-centric and 3 gene-centric categories (see Table 1), plus a “Computational pipeline” category. B. Number of papers that cited each category of variant-centric or gene-centric evidence or the results of integrative pipelines. Some papers cited multiple specific evidence types within a category (for example, both eQTL and pQTL evidence within the Molecular QTL evidence category). C. Distribution of the papers by inclusion of either a single table or figure (“Gene prioritization plus effector-gene prediction integrated into a single table or figure”) or representation of evidence in multiple separate tables (“Gene prioritization only”). D. Analysis of the 115 predicted-effector-gene lists in tabular format. Lists were compared by three criteria: 1) identification of each locus with the rsID or genomic coordinate of the sentinel variant (“Loci identified”); 2) inclusion of evidence for all genes in each GWAS locus (as opposed to including only the top gene) (“Show all genes”); and 3) use of a scoring system to indicate the weight of evidence per gene (“Assign scores”).
Second, we considered whether authors only prioritized genes or further consolidated the results to make effector-gene predictions (Fig. 3C). In 24% (41) of the studies, only gene prioritization data were included, with each evidence type presented separately throughout the paper and supplements. In the remaining 76% of papers (128), the authors followed up the gene prioritization with effector-gene prediction, integrating all results into a unified single component (table or figure). While 115 of the effector-gene-prediction papers presented results in tabular format (most often as supplementary data files), 13 papers showed the list and supporting evidence only in a graphical format or as an image of a table. Thus, only about two thirds of these gene prioritization efforts (115 of 169 studies) produced a re-usable output that is suitable for downstream analysis (Fig. 3C).
Even among the 115 publications that produced tabular lists of genes and evidence, we observed substantial heterogeneity in the content of the lists. In 10 papers we analyzed in more detail to investigate how authors chose which genes to consider at a locus, we found no consistency: some simply included all genes within a certain distance of the sentinel variant (ranging from 25 kb to 1 Mb), while others incorporated both LD and physical distance.
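To make concrete how much the choice of distance window can matter, the sketch below selects candidate genes that overlap a symmetric window around a sentinel variant. The gene names, coordinates, and the `genes_in_window` helper are hypothetical and purely illustrative; real studies may measure distance differently (for example, to the transcription start site) or combine distance with LD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # gene start coordinate (bp)
    end: int    # gene end coordinate (bp)


def genes_in_window(genes, chrom, sentinel_pos, window_bp):
    """Return symbols of genes overlapping sentinel_pos +/- window_bp."""
    lo = sentinel_pos - window_bp
    hi = sentinel_pos + window_bp
    return [
        g.symbol
        for g in genes
        if g.chrom == chrom and g.end >= lo and g.start <= hi
    ]


# Hypothetical locus with four genes on one chromosome.
genes = [
    Gene("GENE1", "1", 100_000, 150_000),
    Gene("GENE2", "1", 480_000, 520_000),
    Gene("GENE3", "1", 990_000, 1_010_000),
    Gene("GENE4", "1", 1_900_000, 1_950_000),
]
sentinel = 1_000_000

narrow = genes_in_window(genes, "1", sentinel, 25_000)     # 25 kb window
wide = genes_in_window(genes, "1", sentinel, 1_000_000)    # 1 Mb window
print(narrow, wide)
```

With these made-up coordinates, the 25 kb window yields a single candidate, whereas the 1 Mb window yields all four genes, so the same locus would enter prioritization with very different candidate sets.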
The content of the 115 tabular lists also differed widely in other respects, such as whether evidence was presented for each gene at each GWAS locus (70% of lists) or only for the single gene at each locus considered most likely to be the effector gene (30% of lists), whether the lists included a scoring system that summarized the weight of evidence for each gene (29% included a scoring system while 71% did not), and, finally, whether GWAS loci were identified by the Reference SNP Cluster IDs (rsIDs) or coordinates of their index variants (81% of the lists identified the index variant while 19% did not, introducing substantial difficulty in mapping the predictions to GWAS signals) (see Fig. 3D).
The 33 lists that did include a scoring or categorization system for the weight of evidence communicated the confidence of their predictions in differing ways. Most lists included a score that was simply the sum of the number of lines of evidence (seen in 24 of the 33 lists), while a few presented a continuous numerical score generated by applying an algorithm to the evidence (for example, the PoPS method26 or the “priority index” algorithm47). Numerical scores were generally applied over the whole list, such that genes in different loci had scores that could be compared to each other. A few lists classified the weight of evidence with text strings such as “low”, “moderate”, or “strong.” Only one list in our set48 used a substantially more complicated step-by-step process to evaluate the weight of evidence for genes, although it should be noted that heuristics have been applied to evidence in other predicted-effector-gene lists not considered here (https://hugeamp.org/research.html?pageid=mccarthy_t2d_247, https://hugeamp.org/research.html?pageid=T1D_UVa_168) and are also built into some integrative prioritization pipelines, such as GPScore49 and ProGeM.19
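The simplest scoring approach described above, a count of the lines of evidence per gene, can be sketched as follows, together with an optional mapping of counts to text categories. The evidence records, category names, and thresholds are illustrative assumptions and are not taken from any of the lists analyzed.

```python
from collections import defaultdict

# Hypothetical evidence records: (locus, gene, evidence category).
evidence = [
    ("locus1", "GENE1", "molecular_qtl"),
    ("locus1", "GENE2", "variant_location"),
    ("locus1", "GENE2", "molecular_qtl"),
    ("locus1", "GENE2", "perturbational"),
    ("locus1", "GENE3", "variant_location"),
]


def count_scores(records):
    """Score each (locus, gene) by its number of distinct evidence categories."""
    categories = defaultdict(set)
    for locus, gene, category in records:
        categories[(locus, gene)].add(category)
    return {key: len(cats) for key, cats in categories.items()}


def label(score, moderate_at=2, strong_at=3):
    """Map a count to a text label; the thresholds are arbitrary assumptions."""
    if score >= strong_at:
        return "strong"
    if score >= moderate_at:
        return "moderate"
    return "low"


scores = count_scores(evidence)
labels = {key: label(score) for key, score in scores.items()}
print(scores, labels)
```

A count of this kind weights every evidence category equally, which is one reason such scores are not directly comparable to the continuous scores produced by algorithmic methods.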
Empirically, therefore, the most common type of predicted-effector-gene list – to the extent that commonality can be found – uses 3 or 4 evidence types (most likely including the QTL and variant location criteria) and includes evidence for all genes at each identified locus, but without a quantitative scoring system (Fig. 3D). To date, it seems that the genetics research community has not organically converged upon standards for producing or presenting these lists.
Concordance of independently created lists
To enable the use of predicted-effector-gene lists as starting points for biological experiments or therapeutic development,50 it is desirable that multiple lists for the same trait converge to a reproducible set of genes, irrespective of differences in the underlying samples or evidence. Within our set of papers, two or more lists had been created for 19 traits. We found that most of these lists did not indicate the most likely effector gene at each locus, instead prioritizing multiple genes for each locus without a scoring system for the strength of evidence, so that the strongest predictions could not be identified and compared. Pairs of lists for four traits (Alzheimer’s disease,51,52 heart failure,53,54 stroke,55,56 and estimated glomerular filtration rate (eGFR)57,58) did identify the top gene per locus, allowing us to compare their predictions (Supplementary Note).
For each trait, we first identified loci that had been detected in both studies and then compared the highest-ranked gene at each shared locus. We found concordance (defined as identification of the same highest-ranking gene in both studies, or a match within multiple genes with the same ranking) in 50% of shared Alzheimer’s loci, 67% of shared heart failure loci, and 75% of shared stroke loci. Among the 552 genes that were predicted as effectors by the newer eGFR study,58 47% were top-ranked in the older study.57 We compared the evidence supporting several of the discordant predictions between lists, and found that, in general, the differences could be ascribed either to the use of differing evidence types or to differing input data within the same evidence type (e.g., eQTL specific to different tissues; see Supplementary Note).
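The concordance definition used here can be expressed as a short sketch. It assumes that loci from the two studies have already been harmonized to shared identifiers (a step that in practice may require manual work) and that each study supplies the set of top-ranked genes per locus, with ties represented as multiple genes. All locus and gene names are hypothetical.

```python
def concordance(list_a, list_b):
    """Compare top-ranked genes at loci present in both lists.

    Each list maps a locus identifier to the set of genes sharing the top rank.
    A shared locus is concordant if the two sets have at least one gene in common.
    """
    shared = sorted(set(list_a) & set(list_b))
    concordant = [locus for locus in shared if list_a[locus] & list_b[locus]]
    rate = len(concordant) / len(shared) if shared else None
    return shared, concordant, rate


# Hypothetical top-gene calls from two studies of the same trait.
study_a = {
    "locusA": {"GENE1"},
    "locusB": {"GENE4"},
    "locusC": {"GENE6", "GENE7"},  # two genes tied for the top rank
    "locusD": {"GENE9"},
}
study_b = {
    "locusA": {"GENE1"},
    "locusB": {"GENE5"},
    "locusC": {"GENE7"},
    "locusE": {"GENE10"},
}

shared, concordant, rate = concordance(study_a, study_b)
print(shared, concordant, rate)
```

Loci found in only one study (locusD and locusE in this toy example) do not enter the denominator, so the rate reflects agreement only where both studies made a prediction.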
While these comparisons indicate that predictions for shared loci across different studies are clearly more concordant than would be expected by chance, concordance rates of 50-75% are low from the perspective of producing “canonical” gene lists that can be trusted by researchers who perform downstream studies of these genes. In addition to discordant effector-gene predictions, discrepancies across lists only increase when their formats and methodologies are considered. For example, performing the comparisons required steps such as manually extracting information from graphics, comparing different tables to identify the genetic loci for prioritized genes, and mapping variant rsIDs to chromosomal coordinates. All of these discrepancies can make it challenging to draw meaningful conclusions from the lists, hindering their interoperability and reducing their potential for downstream integration and analysis.
Suggested best practices and standards
Our analyses illustrate the large heterogeneity in published lists of predicted effector genes, likely because there are currently no common methods or standards for producing or presenting them. However, they also demonstrate that effector-gene predictions are becoming more widespread and prominent aspects of GWAS publications. Ideally, standards and recommendations for effector-gene prediction would be developed by representatives of the genetics and genomics community who collectively understand the strengths and limitations of GWAS, the principles underlying gene prioritization methods, and the downstream uses of effector-gene predictions in experimental studies and therapeutic target prioritization. This group would balance feasibility with desirability and recognize the need to bring consistency to the field of effector-gene prioritization even as methods continue to evolve. The experiences of ClinGen59 and the GenCC60 for rare disease, ACMG for rare variant classification,61 and the Mendelian Randomization community62 offer past examples in analogous settings where consortia were formed to bring consistency and uniformity to previous gray areas of genetic reporting.
It will likely take several years to agree upon standards for evaluating and sharing predicted-effector-gene lists. To begin this process, in September 2024 we held a community workshop on standards and infrastructure for predicted-effector-gene lists that was attended by about 80 researchers spanning a wide range of career stages, institutions (i.e., universities, non-profit institutes, pharmaceutical companies, funding agencies), and geographic locations. A summary of the workshop is available online (https://kp4cd.org/2024_PEG_workshop) and a manuscript is in preparation describing its proceedings. An initial set of high-level recommendations, arising from the analysis reported here and proposed at the workshop, is presented in Box 2. These represent a starting point for definition and implementation of a more detailed reporting standard.
Box 2: General recommendations emerging from a community workshop on standards and infrastructure for predicted-effector-gene lists.
We advocate for greater recognition by the scientific community of the value of effector-gene predictions. Authors of GWAS manuscripts should include effector-gene predictions in order to increase the impact of their work, journals should recommend effector-gene predictions as a standard part of manuscript submissions, and peer reviewers should consider the inclusion of these predictions (and the extent to which they are presented in an interpretable format) as a strength of such studies.
These lists should at minimum include: the genes predicted as effectors and the trait for which they are predicted to be effectors, definitions of the GWAS loci for which predictions were made, some qualitative or quantitative measure of confidence for the predictions, and the data that were used to make the predictions. Uniform nomenclature and controlled vocabulary should be used wherever possible—for example, from the HUGO Gene Nomenclature Committee (https://www.genenames.org/), Experimental Factor Ontology (https://www.ebi.ac.uk/ols4/ontologies/efo), Evidence & Conclusion Ontology (https://www.ebi.ac.uk/ols4/ontologies/eco), or others.
Data should always be presented in a single file in machine-readable tabular format,71 in addition to any graphical or simplified tabular representation.
While the recommendations above could be implemented immediately, going forward a consortium or working group should be formed to 1) define standards for creating and reporting effector-gene predictions and 2) develop benchmarks for evaluating the quality of methodologies for producing these predictions, leading to a set of recommended guidelines that effector-list creators can follow.
An open-source catalog of these lists should be created that satisfies FAIR (Findable, Accessible, Interoperable, and Reusable72) criteria.
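As an illustration of the minimum content and machine-readable format recommended in Box 2, the following is a minimal sketch in Python, not a proposed standard. The column names, the `validate_peg_table` helper, and the example rows (placeholder gene symbols, a placeholder EFO identifier, and made-up loci) are all hypothetical and were chosen only to mirror the fields listed in the box.

```python
import csv
import io

# Hypothetical column names covering the minimum content recommended in Box 2.
REQUIRED_COLUMNS = [
    "gene_symbol",   # gene predicted as effector (ideally an HGNC-approved symbol)
    "trait_efo_id",  # trait, ideally as an Experimental Factor Ontology identifier
    "locus",         # definition of the GWAS locus (here: chr:start-end)
    "confidence",    # qualitative or quantitative measure of confidence
    "evidence",      # data used to make the prediction (semicolon-separated)
]


def validate_peg_table(text):
    """Return a list of problems found in a tab-separated effector-gene table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems


# Placeholder rows for illustration only; these are not real predictions.
EXAMPLE = (
    "gene_symbol\ttrait_efo_id\tlocus\tconfidence\tevidence\n"
    "GENE1\tEFO_0000000\tchr1:1000-2000\thigh\tcoding variant;eQTL colocalization\n"
    "GENE2\tEFO_0000000\tchr2:5000-9000\t\tnearest gene\n"
)

print(validate_peg_table(EXAMPLE))  # ['line 3: empty confidence']
```

A single flat file of this kind can be read by any spreadsheet or scripting tool, and a simple check like the one above would flag lists that omit a confidence measure or the supporting evidence before they are shared.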
Future directions
In the era before GWAS, the heterogeneity and subjectivity with which genes were studied for common diseases led to a replicability crisis63 that was ultimately corrected by rigorous statistical standards introduced by GWAS.64 Today, the content and format of predicted-effector-gene lists likewise vary widely across studies; they are inherently subjective and are difficult to interpret even for expert geneticists who did not conduct the original study – let alone for non-geneticists, who are the most likely consumers of these lists. Although such lists are currently found in only a minority of GWAS publications, they are disproportionately included in larger-scale studies published by high-impact journals (Supplementary Note), underscoring their importance. Given the increasing inclusion of effector-gene predictions with GWAS and their prioritization by funding agencies (https://grants.nih.gov/grants/guide/rfa-files/rfa-dk-19-012.html, https://grants.nih.gov/grants/guide/notice-files/NOT-MH-24-370.html) and industry, it is likely that effector-gene lists will become a commonplace or expected output of GWAS publications in the near future. The GWAS research community now has the opportunity to establish forward-looking guidelines and standards for predicted-effector-gene list production that minimize their heterogeneity and increase their transparency and interpretability.
Our analysis of the inputs to these lists illustrates the main reason for their heterogeneity: effector-gene prediction is an evolving science, with numerous evidence sources and methods available as inputs for predicting genes. Multiple efforts are underway to leverage this breadth of input data to more accurately predict effector genes,17 including pipelines such as cS2G (Combined SNP2Gene),65 Ei (Effector index),66 FLAMES (Fine-mapped Locus Assessment Model of Effector geneS),27 and CALDERA (CALling Disease-RelAted genes).67 Advances in machine learning and artificial intelligence may also soon revolutionize the entire field of effector-gene prediction.68 The availability of benchmarking sets of “gold standard” genes will be crucial for these efforts, both for training the models they use and for evaluating the quality of their output, and as more genes undergo detailed functional characterization, more of these sets will become available. We therefore do not intend to issue recommendations that direct researchers towards specific evidence sources or tools, since consensus does not yet exist regarding them – for example, there have been studies demonstrating the small amount of overlap between eQTL and GWAS associations,37 yet our survey indicates QTL are nonetheless one of the two data sources most frequently used for effector-gene prediction (Fig. 3B). Given all these considerations, effector-gene lists produced five years from now will undoubtedly look very different from the lists produced today.
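To make the role of “gold standard” benchmarking sets concrete, the sketch below shows one simple way a predicted-effector-gene list could be scored against such a set using precision and recall. This is an illustrative assumption rather than a method used by any of the pipelines named above; the function name and the placeholder gene symbols are hypothetical.

```python
def precision_recall(predicted, gold_standard):
    """Score a predicted gene list against a gold-standard gene set."""
    predicted = set(predicted)
    gold_standard = set(gold_standard)
    true_positives = predicted & gold_standard
    # Guard against empty inputs to avoid division by zero.
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(gold_standard) if gold_standard else 0.0
    return precision, recall


# Placeholder gene symbols for illustration only.
predicted_genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
gold_genes = ["GENE2", "GENE4", "GENE5"]

p, r = precision_recall(predicted_genes, gold_genes)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Because gold-standard sets are incomplete, a predicted gene that is absent from the set is not necessarily wrong, so scores of this kind are best read as comparative rather than absolute.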
However, because studies are already producing effector-gene lists, as discussed here, scientists will seek to use these lists in their research, particularly given the importance of genetic support in establishing the human disease relevance of genes.50 These scientists will find it difficult to navigate the current landscape of predicted-effector-gene lists and may reach erroneous conclusions or invest substantial time and resources in pursuing inappropriate genes. Therefore, the appropriate question to ask is not whether there is sufficient consensus today to issue rigorous standards for gene prioritization methods, but rather whether an organized community of geneticists can improve upon the status quo for reporting these activities and ensure that the outcomes are findable, accessible, and reusable in downstream analyses, including serving as benchmarking sets for future methods development. Specifically, a concrete roadmap and set of guidelines – however likely they are to be revised in the future – could, today, increase the utility and transparency of effector-gene lists and, in the coming years, ensure that effector-gene lists become more statistically rigorous, consistently created and presented, and interoperable between studies and across traits.
Concluding remarks
We conclude, therefore, with a “call to arms” – to scientists conducting GWAS, collecting data for translating variant to function, and developing methods for gene prioritization – to invest in a roadmap for advancing the field of predicted-effector-gene list construction and presentation. We envision a scenario where such lists accompany GWAS publications as frequently as Manhattan plots, and are submitted to catalogs similar to the GWAS Catalog in formats that can be easily integrated within downstream approaches, such as knowledge graphs or machine learning (https://datascience.nih.gov/sites/default/files/NIH-STRATEGIC-PLAN-FOR-DATA-SCIENCE-2023-2028-final-draft.pdf). This will enable researchers of any background to understand the lists and use them to advance our understanding of disease and development of new treatments. Predicted-effector-gene lists provide perhaps the most important step in translating GWAS to impact human health, but only if they can be interpreted and trusted by the broad community of scientists who will need to use them.
Acknowledgments
We thank our colleagues for helpful discussions. This work was supported by the awards 5U24HG011453, 2UM1DK105554, and 1U24HG012542-01 from the National Institutes of Health, and by European Molecular Biology Laboratory Core Funds.
Footnotes
Competing Interests
The authors declare no competing interests.
References
- 1. Tam V et al. Benefits and limitations of genome-wide association studies. Nat Rev Genet 20, 467–484 (2019).
- 2. Visscher PM et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet 101, 5–22 (2017).
- 3. Boyle EA, Li YI & Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186 (2017).
- 4. Claussnitzer M et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
- 5. Dornbos P et al. Evaluating human genetic support for hypothesized metabolic disease genes. Cell Metab 34, 661–666 (2022).
- 6. Ochoa D et al. The next-generation Open Targets Platform: reimagined, redesigned, rebuilt. Nucleic Acids Res 51, D1353–D1359 (2023).
- 7. Deciphering the impact of genomic variation on function. Nature 633, 47–57 (2024).
- 8. Gallagher MD & Chen-Plotkin AS. The Post-GWAS Era: From Association to Function. Am J Hum Genet 102, 717–730 (2018).
- 9. Claussnitzer M & Susztak K. Gaining insight into metabolic diseases from human genetic discoveries. Trends Genet 37, 1081–1094 (2021).
- 10. Cano-Gamez E & Trynka G. From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front Genet 11, 424 (2020).
- 11. Claussnitzer M et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med 373, 895–907 (2015).
- 12. Gupta RM et al. A Genetic Variant Associated with Five Vascular Diseases Is a Distal Regulator of Endothelin-1 Gene Expression. Cell 170, 522–533.e15 (2017).
- 13. Musunuru K et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–9 (2010).
- 14. Watanabe K et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51, 1339–1348 (2019).
- 15. Heid IM et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42, 949–60 (2010).
- 16. Cerezo M et al. The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity. Nucleic Acids Res (2024).
- 17. Qi T, Song L, Guo Y, Chen C & Yang J. From genetic associations to genes: methods, applications, and challenges. Trends Genet 40, 642–667 (2024).
- 18. Schipper M & Posthuma D. Demystifying non-coding GWAS variants: an overview of computational tools and methods. Hum Mol Genet 31, R73–R83 (2022).
- 19. Stacey D et al. ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci. Nucleic Acids Res 47, e3 (2019).
- 20. Maller JB et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat Genet 44, 1294–301 (2012).
- 21. Li B & Ritchie MD. From GWAS to Gene: Transcriptome-Wide Association Studies and Other Methods to Functionally Understand GWAS Discoveries. Front Genet 12, 713230 (2021).
- 22. Weissbrod O et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat Genet 52, 1355–1363 (2020).
- 23. Yuan K et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Nat Genet 56, 1841–1850 (2024).
- 24. Peña-Martínez EG & Rodríguez-Martínez JA. Decoding Non-coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases. Front Biosci (Schol Ed) 16, 4 (2024).
- 25. McLaren W et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122 (2016).
- 26. Weeks EM et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat Genet 55, 1267–1276 (2023).
- 27. Schipper M et al. Gene prioritization in GWAS loci using multimodal evidence. medRxiv, 2023.12.23.23300360 (2024).
- 28. de Leeuw CA, Mooij JM, Heskes T & Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol 11, e1004219 (2015).
- 29. Cairns J et al. CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol 17, 127 (2016).
- 30. Fulco CP et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet 51, 1664–1669 (2019).
- 31. Pliner HA et al. Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data. Mol Cell 71, 858–871.e8 (2018).
- 32. Gaulton KJ, Preissl S & Ren B. Interpreting non-coding disease-associated human variants using single-cell epigenomics. Nat Rev Genet 24, 516–534 (2023).
- 33. Moore JE et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
- 34. Boix CA, James BT, Park YP, Meuleman W & Kellis M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
- 35. Neumeyer S, Hemani G & Zeggini E. Strengthening Causal Inference for Complex Disease Using Molecular Quantitative Trait Loci. Trends Mol Med 26, 232–241 (2020).
- 36. Zhu Z et al. Leveraging molecular quantitative trait loci to comprehend complex diseases/traits from the omics perspective. Hum Genet 142, 1543–1560 (2023).
- 37. Connally NJ et al. The missing link between genetic association and regulatory function. eLife 11 (2022).
- 38. Mostafavi H, Spence JP, Naqvi S & Pritchard JK. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat Genet 55, 1866–1875 (2023).
- 39. Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG & Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 24, 535–549 (2023).
- 40. Groza T et al. The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Res 51, D1038–D1045 (2023).
- 41. Goldstein DB et al. Sequencing studies in human genetics: design and interpretation. Nat Rev Genet 14, 460–70 (2013).
- 42. Lee S, Wu MC & Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–75 (2012).
- 43. Raychaudhuri S et al. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet 5, e1000534 (2009).
- 44. Pers TH et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 6, 5890 (2015).
- 45. Watanabe K, Taskesen E, van Bochoven A & Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun 8, 1826 (2017).
- 46. Mountjoy E et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat Genet 53, 1527–1533 (2021).
- 47. Fang H et al. A genetics-led approach defines the drug target landscape of 30 immune-related traits. Nat Genet 51, 1082–1091 (2019).
- 48. Watanabe K et al. Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways. Nat Genet 54, 1125–1132 (2022).
- 49. Sarsani V et al. A cross-ancestry genome-wide meta-analysis, fine-mapping, and gene prioritization approach to characterize the genetic architecture of adiponectin. HGG Adv 5, 100252 (2024).
- 50. Minikel EV, Painter JL, Dong CC & Nelson MR. Refining the impact of genetic evidence on clinical success. Nature 629, 624–629 (2024).
- 51. Kunkle BW et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet 51, 414–430 (2019).
- 52. Schwartzentruber J et al. Genome-wide meta-analysis, fine-mapping and integrative prioritization implicate new Alzheimer’s disease risk genes. Nat Genet 53, 392–402 (2021).
- 53. Shah S et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat Commun 11, 163 (2020).
- 54. Levin MG et al. Genome-wide association and multi-trait analyses characterize the common genetic architecture of heart failure. Nat Commun 13, 6914 (2022).
- 55. Malik R et al. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat Genet 50, 524–537 (2018).
- 56. Mishra A et al. Stroke genetics informs drug discovery and risk prediction across ancestries. Nature 611, 115–123 (2022).
- 57. Stanzick KJ et al. Discovery and prioritization of variants and genes for kidney function in >1.2 million individuals. Nat Commun 12, 4350 (2021).
- 58. Liu H et al. Epigenomic and transcriptomic analyses define core cell types, genes and targetable mechanisms for kidney disease. Nat Genet 54, 950–962 (2022).
- 59. Strande NT et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet 100, 895–906 (2017).
- 60. DiStefano MT et al. The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources. Genet Med 24, 1732–1742 (2022).
- 61. Richards S et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17, 405–24 (2015).
- 62. Skrivankova VW et al. Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization: The STROBE-MR Statement. JAMA 326, 1614–1621 (2021).
- 63. Hirschhorn JN, Lohmueller K, Byrne E & Hirschhorn K. A comprehensive review of genetic association studies. Genet Med 4, 45–61 (2002).
- 64. Marigorta UM, Rodríguez JA, Gibson G & Navarro A. Replicability and Prediction: Lessons and Challenges from GWAS. Trends Genet 34, 504–517 (2018).
- 65. Gazal S et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat Genet 54, 827–836 (2022).
- 66. Forgetta V et al. An effector index to predict target genes at GWAS loci. Hum Genet 141, 1431–1447 (2022).
- 67. Schipper M, Ulirsch J, Posthuma D, Ripke S & Heilbron K. Simplifying causal gene identification in GWAS loci. medRxiv, 2024.07.26.24311057 (2025).
- 68. Long E, Wan P, Chen Q, Lu Z & Choi J. From function to translation: Decoding genetic susceptibility to human diseases via artificial intelligence. Cell Genom 3, 100320 (2023).
- 69. Gamazon ER et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091–8 (2015).
- 70. Mahajan A et al. Identification and functional characterization of G6PC2 coding variants influencing glycemic traits define an effector transcript at the G6PC2-ABCB11 locus. PLoS Genet 11, e1004876 (2015).
- 71. Hertz MI & McNeill AS. Eleven quick tips for properly handling tabular data. PLoS Comput Biol 20, e1012604 (2024).
- 72. Wilkinson MD et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).
