Briefings in Bioinformatics. 2026 Mar 8;27(2):bbag061. doi: 10.1093/bib/bbag061

Finding Significant Hits in Networks: a network-based tool for analyzing gene-level P-values to identify significant genes missed by standard methods

Sandeep Acharya 1, Vaha Akbary Moghaddam 2, Wooseok J Jung 3, Yu S Kang 4, Shu Liao 5, Michael A Province 6, Michael R Brent 7
PMCID: PMC12967332  PMID: 41795654

Abstract

Finding Significant Hits in Networks (FISHNET) uses prior biological knowledge, represented as gene interaction networks and gene function annotations, to identify genes that do not meet the genome-wide significance threshold but replicate, nonetheless. Its input is gene-level P-values from any source, including omicsWAS, aggregation of genome-wide association studies P-values, CRISPR screens, or differential expression analysis. It is based on the idea that genes whose P-values are low purely by chance are distributed randomly across networks and functions, so genes with suggestive P-values that cluster in densely connected subnetworks and share common functions are less likely to reflect chance and more likely to replicate. FISHNET combines network and function analysis with permutation-based P-value thresholds to identify a small set of exceptional genes that we call FISHNET genes. Applied to 11 cardiovascular risk traits, FISHNET identified 19 gene-trait relationships that missed genome-wide significance thresholds but, nonetheless, replicated in an independent cohort. The replication rate of FISHNET genes matched that of genes with lower P-values. FISHNET identified a novel association between RUNX1 expression and HDL that is supported by experimental evidence that RUNX1 promotes white fat browning, which increases HDL cholesterol levels. FISHNET also identified an association between LTB expression and BMI that is supported by experimental evidence that higher LTB expression increases BMI via activation of the LTβR pathway. Both associations failed genome-wide significance thresholds, highlighting FISHNET’s ability to uncover meaningful relationships missed by traditional methods. FISHNET software is freely available at https://brentlab.github.io/fishnet/.

Keywords: gene prioritization, relaxed significance thresholds, network-based analysis, novel gene discovery, replicable gene-trait associations

Introduction

The primary goal of hypothesis testing is to determine whether an observation made in a sample from a population is likely to be true in the entire population. Statistical hypothesis testing is typically used to minimize the risk of false positive findings at the cost of substantial risk of false negatives. The false-negative risk is magnified by multiple testing correction (typically Bonferroni [1, 2] or Benjamini-Hochberg [3, 4]), particularly when many tests are carried out, as is the case in genome-wide association studies (GWAS) [5, 6], transcriptome-wide association studies (TWAS) [7, 8], and other omics-WAS [9, 10]. Moreover, when effect sizes are small, as is typical in genetics, very large samples are needed to reach statistical significance. Thus, there is an urgent need for methods that can identify true, population-wide omics-trait relationships that do not meet Bonferroni or Benjamini-Hochberg significance thresholds. Such methods can be evaluated empirically by replication studies, in which additional samples are drawn from the population.

We propose Finding Significant Hits in Networks (FISHNET), a new method for finding exceptional gene-trait associations that replicate at a higher rate than other associations with the same P-values. FISHNET integrates results from gene-level summary statistics with prior biological knowledge represented as networks. It can use gene-level summary statistics from GWAS, TWAS with measured or predicted gene expression levels, proteome-wide association studies, RNA-Seq experiments, functional genetics screens, or any other source. It uses gene–gene interactions from co-expression networks, protein–protein interaction (PPI) networks, or other networks, together with gene function annotations from Gene Ontology (GO) [11]. We hypothesize that genes whose P-values are low by chance are distributed randomly across biological networks and functions. Thus, when genes with low P-values cluster in densely interconnected subnetworks (network modules) and share common functions, they are less likely to reflect sampling error and therefore more likely to replicate in new samples. FISHNET combines network module enrichment analysis [12] and GO over-representation analysis [13] with permutation-based significance thresholds to identify a small set of exceptional, trait-influencing genes that we call FISHNET genes.

FISHNET is specifically designed to identify gene-trait associations that replicate in independent samples at a higher rate than other associations with similar P-values. In addition to prioritizing genes that meet traditional significance thresholds, it identifies genes below these thresholds that replicate at a similar rate, enabling thresholds to be relaxed. Notable features include its permutation-based significance testing, its automated testing of model assumptions, and its use of GO biological process annotations in addition to networks. These annotations, which are manually curated and reflect a large body of literature [11], are complementary to networks, which are typically generated from high-throughput data. Another reason for supplementing networks with GO analysis is that network modules are identified computationally [14], based on network connectivity, so there is no guarantee that two genes in the same module share a common function.

This paper makes four contributions. First, it introduces the FISHNET algorithm and software. The software is freely available, easy to install, and easy to deploy across a wide range of computing environments. Second, it evaluates the replicability and reproducibility of FISHNET genes using 43 sets of GWAS gene-level summary statistics [14]. Evaluation was carried out on nine combinations of networks and algorithms for finding modules, identifying the most useful combinations. Third, using the best networks and the best module detection algorithm, it identifies FISHNET genes across 11 traits associated with cardiovascular risks in the Long Life Family Study (LLFS) cohort and performs the replication analysis in the Framingham Heart Study (FHS) cohort (See acknowledgements) [15, 16]. The results show that FISHNET genes have a better replication rate than non-FISHNET genes with similar P-values. Fourth, this paper introduces metrics to assess whether user inputs—network modules and gene-level summary statistics—violate the model assumptions.

Methods

Gene-level summary of genome-wide association studies

Gene-level summary statistics were obtained from 185 meta-analyses of GWAS collected for the 2019 Disease Module Identification DREAM Challenge [14]. The SNP-level summary statistics were aggregated to gene-level statistics using PASCAL [12]. PASCAL uses the sum of chi-squared approach to calculate a gene-level P-value. To create discovery and replication set pairs for replication analysis, we used only traits that had more than one study with completely or partially independent cohorts. Additionally, we removed studies where the genotyped SNPs did not cover all chromosomes in the genome. After these exclusions, we retained gene-level summary statistics from 43 GWAS. There were 17 traits with exactly two GWAS datasets: Bipolar disorder, BMI, BMI (male), BMI (female), Coronary artery disease, Fasting glucose, Height, Hip circumference (male), Hip circumference (female), Macular degeneration, Rheumatoid arthritis, Total cholesterol, Waist circumference (male), Waist circumference (female), Waist-hip ratio (male), Waist-hip ratio (female), and Schizophrenia. For High-density lipoprotein (HDL), Low-density lipoprotein (LDL), and Triglycerides, there were three GWAS studies, two of which we used as discovery sets, with the third as the replication set for both. The final dataset consisted of 23 discovery and replication set pairs (17 from the two-study traits and 6 from the three-study traits), each with gene-level GWAS summary statistics. The characteristics of each dataset can be found in Supplementary Table S1.

Long Life Family Study

The LLFS is a longitudinal family study that enrolled families enriched for exceptional longevity to discover genetic factors contributing to healthy aging. LLFS enrolled 4953 participants in 539 pedigrees, primarily of European ancestry (99%). The recruitment procedure and enrollment criteria of the LLFS participants have been previously described [17, 18]. The data generated in the study includes gene expression levels from blood and biomarkers of health and aging. We focused on 11 traits associated with cardiovascular risks spanning four categories: pulmonary (forced expiratory volume, forced vital capacity, and the ratio of the two), lipids (HDL, LDL, triglycerides, total cholesterol), anthropometric (BMI, BMI-adjusted waist), and cardiovascular (pulse, ankle-brachial index) [19–23].

Long Life Family Study RNA-seq and transcriptome wide association

RNA extraction and sequencing were carried out by the McDonnell Genome Institute at Washington University. Total RNA was extracted from PAXgene™ Blood RNA tubes using the Qiagen PreAnalytiX PAXgene Blood miRNA Kit (Qiagen, Valencia, CA). The RNA-Seq data were processed with the nf-core/rnaseq pipeline version 3.3 using STAR/RSEM and otherwise default settings (https://zenodo.org/records/5146005). The RNA-Seq data QC steps and the gene expression level adjustment model used in this study have been previously described by Acharya et al. [8] who performed TWAS on the same 11 traits using the data from the first clinical exam in LLFS. Since our previous publication, the dataset has grown. Depending on the trait, the number of participants with both RNA-Seq and trait data now ranges from 879 to 1667. The adjustment steps for all 11 traits are described in [8]. Supplementary Table S2 shows the characteristics of study participants for covariates and 11 cardiovascular risk traits.

For each trait, the adjusted gene expression residuals were used as a predictor and the adjusted trait residuals were used as a response variable in a linear mixed model implemented in MMAP [24]. A kinship matrix generated by MMAP from the LLFS pedigree was used to account for family relatedness. For traits with genomic inflation factor > 1.1, the P-values were adjusted by using BACON [25]. The same RNA-Seq processing and trait adjustment steps were applied for replication in the FHS dataset, where the number of participants with data for each trait ranged from n = 1080 to n = 1380.
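For reference, the genomic inflation factor is conventionally computed as the median observed 1-df chi-squared statistic divided by the median of the null chi-squared distribution (about 0.455). The sketch below illustrates that conventional definition using only the Python standard library. The function name is hypothetical, the input is assumed to be two-sided P-values in (0, 1], and the code is neither the BACON adjustment nor necessarily the exact computation used in this study.

```python
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Return the median observed 1-df chi-squared statistic over its null median.

    Assumes two-sided P-values strictly greater than 0 and at most 1.
    """
    std_normal = NormalDist()
    # A two-sided P-value p corresponds to |z| = Phi^{-1}(1 - p/2); chi2 = z^2.
    chi2_stats = [std_normal.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Under this definition, a value near 1 indicates P-values consistent with the uniform null, and a value above 1.1 would flag a trait for adjustment as described above.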

Module enrichment analysis and Gene Ontology over-representation analysis

First, in each selected gene–gene interaction network, network modules (highly connected subnetworks) were identified by three algorithms from the Disease Module Identification DREAM Challenge [14], designated as R2 (based on random walk), K1 (based on kernel clustering), and M2 (based on modularity optimization). The MONET software incorporates these three modularization algorithms. Each algorithm takes in a gene network and returns modules of highly interconnected genes. The source code and user documentation are available at https://github.com/BergmannLab/MONET. The modules identified by MONET in this work were from the STRING functional PPI network [26], the InWeb physical PPI network [27], and a gene co-expression network from Gene Expression Omnibus (GEO) [28, 29]. In addition to aggregating SNP-level P-values to gene level, the PASCAL software package provides separate functionality to aggregate gene-level P-values to module level [12]. As such, after identifying network modules, GWAS or TWAS gene-level summary statistics and sets of genes in network modules were fed into PASCAL’s module enrichment algorithm [12]. Figure 1A depicts the inputs and outputs of module enrichment analysis and GO over-representation analysis. After modules were identified, the connections between genes in modules were not used. PASCAL’s module enrichment algorithm output sets of genes in modules significantly enriched for genes with low P-values, adjusted for the total number of modules tested using Bonferroni correction. GO over-representation analysis was done on the set of genes in each enriched module by using WebGestaltR (version: 0.4.6) with the following configuration: (organism: hsapiens, method: ORA, enrichDatabase: GO Biological Process, FDRMethod: BH, FDRThreshold: 0.05) [13]. The affinity propagation feature in WebGestaltR was used to eliminate GO biological processes with highly overlapping member genes, thereby reducing the multiple testing burden and improving computational performance.

Figure 1.

Alt text: A workflow diagram in two parts. Panel A illustrates the input of gene-level P-values into module significance analysis, leading to significant modules and Gene Ontology results. Panel B displays a flowchart of the permutation-based computational pipeline used to statistically identify significant FISHNET genes.

(A) The first step in the FISHNET workflow. The gene-level P-values are input into module significance analysis. Module significance analysis outputs significant modules and their P-values. Gene ontology over-representation analysis identifies biological processes with significant over-representation among genes in each significant module. (B) The workflow used to identify FISHNET genes. For details of the permutation-based hypothesis testing, see Methods.

Finding Significant Hits in Networks algorithm

Figure 1B depicts the FISHNET algorithm. FISHNET is run separately for each combination of traits, networks, and module detection algorithms. Genes are considered caught in the FISHNET if they:

  1. Are in a network module unusually enriched for genes with low P-values (PASCAL’s module enrichment analysis). This passes only genes that work together with other trait-implicated genes. And,

  2. Are annotated by a GO biological process term that is enriched in their module (GO over-representation analysis). This passes only genes that work together as a part of a common biological process. And,

  3. Are among the top j genes ranked by P-values. j starts at 5% of genes. If all genes’ P-values are under the null model, they should be uniformly distributed, in which case the top 5% corresponds to a nominal P-value threshold of 0.05; if the P-value distribution is inflated, the top 5% will correspond to a P-value threshold lower than 0.05.

    a. If using the optional permutation-based hypothesis testing described below, j is reduced until specified significance criteria are met.
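The three criteria above amount to a set intersection, which can be sketched as follows. This is an illustration only, not the FISHNET implementation: the function and argument names are hypothetical, and the module enrichment and GO over-representation results are assumed to have been computed beforehand (for example by PASCAL and WebGestaltR) and passed in as plain Python sets.

```python
def candidate_fishnet_genes(p_values, enriched_modules, enriched_go_genes,
                            top_fraction=0.05):
    """Return genes meeting all three criteria without permutation testing.

    p_values:          dict mapping gene -> P-value
    enriched_modules:  dict mapping module id -> set of member genes, for
                       modules already found to be enriched for low P-values
    enriched_go_genes: dict mapping module id -> set of genes annotated with
                       a GO term that is enriched in that module
    """
    # Criterion 3: the top j genes ranked by P-value, with j = 5% by default.
    ranked = sorted(p_values, key=p_values.get)
    j = int(top_fraction * len(ranked))
    top_j = set(ranked[:j])

    hits = set()
    for module_id, members in enriched_modules.items():
        go_genes = enriched_go_genes.get(module_id, set())
        # Criteria 1 and 2 are evaluated within the same module.
        hits |= members & go_genes & top_j
    return hits
```

Note that criterion 2 is checked per module: a gene counts only if its GO term is enriched in the module that contains it.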

Permutation-based null model: empirical hypothesis testing

Null hypothesis: The number of candidate FISHNET genes is not substantially higher than expected at random.

PASCAL’s module enrichment analysis pipeline monotonically transforms the genes’ P-value distribution into a chi-squared distribution. This process relies only on the ranks (quantiles) of the genes when ranked by their P-values. Therefore, FISHNET works as follows (also shown in Fig. 1B):

  1. Randomly permute the original P-value ranks of genes M times. By default, M = 200.

  2. Run module enrichment and GO over-representation analyses for the original gene ranks and all permutations.

  3. Let A_i be the set of genes in permutation i that:

    • Are in a module of interacting genes that is significantly enriched for low P-values.

    • Are annotated with a biological process term enriched among genes in that module.

    Let A_0 (the candidate FISHNET genes) be the set of genes satisfying these criteria when using the true ranks.

  4. When using the permutation-based hypothesis test:

    j = 0.05 × number of genes.

    While j ≥ 10, do:

      B_j = the set of the top j genes with the smallest P-values.

      For i = 0 to M:

        5) C_{i,j} = A_i ∩ B_j

      # Compute the expected FDR for genes above rank j

        6) FDR_j = (1/M) Σ_{i=1..M} |C_{i,j}| / |C_{0,j}|

      q_j = quantile of |C_{0,j}| in |C_{1,j}|, |C_{2,j}|, |C_{3,j}|, …, |C_{M,j}|.

      If FDR_j ≤ 0.05 and q_j ≥ 0.99:

        Return FISHNET genes = A_0 ∩ B_j.

      Else: j = j − 10.
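The threshold search above can be sketched in Python as follows. This is an illustrative reading rather than the FISHNET implementation, and it rests on several assumptions: the function and argument names are hypothetical; the expected FDR is taken to be the mean permuted count divided by the observed count; the quantile is taken to be the fraction of permuted counts strictly below the observed count; and the top-j set for each run is drawn from that run's own ranking (passing the true ranking for every run uses the true ranks throughout).

```python
def select_fishnet_genes(rankings, candidates, fdr_max=0.05, quantile_min=0.99,
                         start_fraction=0.05, step=10, min_j=10):
    """Search for the largest j whose gene set passes both criteria.

    rankings[i]:   genes ordered by P-value (smallest first) in run i
    candidates[i]: set of genes passing the module and GO criteria in run i
    Index 0 is the true data; indices 1..M are the permutations.
    Returns (j, genes) for the first passing j, or None if no j passes.
    """
    n_perm = len(candidates) - 1
    j = int(start_fraction * len(rankings[0]))
    while j >= min_j:
        # Size of the intersection of candidates with the top-j set in every run.
        counts = [len(candidates[i] & set(rankings[i][:j]))
                  for i in range(n_perm + 1)]
        observed, null_counts = counts[0], counts[1:]
        if observed > 0:
            # Assumed FDR estimate: mean null count over observed count.
            fdr = (sum(null_counts) / n_perm) / observed
            # Assumed quantile: share of null counts strictly below observed.
            quantile = sum(c < observed for c in null_counts) / n_perm
            if fdr <= fdr_max and quantile >= quantile_min:
                return j, candidates[0] & set(rankings[0][:j])
        j -= step  # tighten the rank cutoff and try again
    return None
```

With the default M = 200 permutations and a quantile threshold of 0.99, this reading allows at most two permuted runs to match or exceed the observed count.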

Results

We developed the FISHNET algorithm to identify replicable gene-trait relationships missed by standard association analyses. It works by combining gene-level summary statistics with prior biological knowledge encoded in gene–gene interaction networks and gene function annotations. Among the genes with suggestive/significant P-values, FISHNET prioritizes genes that cluster in network modules and share common biological functions (see Fig. 1 and Methods). To evaluate FISHNET, we applied it to GWAS gene-level summary statistics from Choobdar et al. 2019 [14] (Supplementary Text 1, Supplementary Table S1) and TWAS summary statistics from the LLFS (Supplementary Text 2) and FHS cohorts.

Finding Significant Hits in Networks performance varies across networks and modularization algorithms

FISHNET was applied to 23 GWAS discovery-replication summary statistics pairs. Each pair was analyzed in combination with nine sets of network modules obtained by applying three modularization algorithms to three networks. Specifically, algorithms based on kernel clustering, modularity optimization, or random walk were applied to the STRING functional PPI network [26], InWeb physical PPI network [27], and a gene co-expression network [28, 29]. Characteristics of the module sets and associated GO terms are provided in Supplementary Table S6.

Each FISHNET output gene (hit) from each run can be uniquely identified by four factors: the gene, network, modularization algorithm, and trait used in the run. For convenience, we refer to these unique identifiers as genequads (Fig. 2A). If a genequad is identified by FISHNET, we refer to it as a FISHNET quad. Each network was evaluated on the reproducibility and replicability of all FISHNET quads based on that network (unioned over the other three factors). Likewise, modularization algorithms were evaluated on all FISHNET quads based on that algorithm, unioned over the other three factors. A FISHNET quad from a discovery set is reproduced if it is also a FISHNET quad in the replication set; it is replicated if its gene is Bonferroni significant for the same trait in the replication set.
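The two evaluation metrics can be made concrete with a short sketch. The names below are hypothetical, and the data layout (quads as 4-tuples, replication P-values keyed by gene and trait) is an assumption chosen for illustration; a gene missing from the replication data is treated as not replicated.

```python
def quad_rates(discovery_quads, replication_quads, replication_p,
               bonferroni_threshold):
    """Return (reproducibility rate, replication rate) for discovery quads.

    discovery_quads, replication_quads: sets of
        (gene, network, algorithm, trait) tuples output by FISHNET
    replication_p: dict mapping (gene, trait) -> P-value in the replication set
    """
    if not discovery_quads:
        raise ValueError("no discovery quads to evaluate")
    # Reproduced: the same quad is also a FISHNET quad in the replication set.
    reproduced = discovery_quads & replication_quads
    # Replicated: the gene is Bonferroni significant for that trait on replication.
    replicated = {
        quad for quad in discovery_quads
        if replication_p.get((quad[0], quad[3]), 1.0) <= bonferroni_threshold
    }
    n = len(discovery_quads)
    return len(reproduced) / n, len(replicated) / n


def quads_for(quads, position, value):
    """Subset of quads with a given network (position 1) or algorithm (position 2)."""
    return {quad for quad in quads if quad[position] == value}
```

Evaluating a network then means calling `quad_rates` on `quads_for(discovery_quads, 1, network_name)`, which pools that network's quads over genes, algorithms, and traits.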

Figure 2.

Alt text: A multi-panel figure consisting of a flow diagram and bar charts. The flow diagram shows modularization algorithms applied to networks, while the bar charts compare replication and reproducibility percentages across different combinations.

Replication and reproducibility rates across networks and modularization algorithms using 23 pairs of GWAS discovery and replication summary statistics. (A) Three modularization algorithms are applied to three networks to obtain nine sets of gene modules, each fed into FISHNET with gene-trait summary statistics. (B) GEO co-expression outperforms other networks in terms of replicability and reproducibility, while STRING functional PPI performs worst in both metrics. (C) Kernel clustering achieves the best balance of replication and reproducibility, while modularity optimization performs best in replication, and random walk performs best in reproducibility.

Across 23 GWAS discovery datasets, three networks, and three modularization algorithms, FISHNET identified 375 genequads, of which 162 (43.2%) reproduced and 200 (53.3%) replicated (Supplementary Table S3). Among networks, the GEO co-expression network had the best reproducibility and replication rate, followed by InWeb PPI (Fig. 2B). Among modularization algorithms, random walk had the best reproducibility rate and modularity optimization had the best replication rate (Fig. 2C). Kernel clustering had the best performance when both replicability and reproducibility were considered. The analyses below are based on the two best-performing networks, InWeb PPI and GEO co-expression, and the best modularization algorithm, kernel clustering.

For each network and modularization algorithm pair, we examined whether network characteristics (number of modules, average module size, number of unique genes in modules) affected the number of FISHNET quads. Across all network and modularization algorithm combinations, none of these characteristics showed a significant association with the number of FISHNET quads (Supplementary Table S6).

Finding Significant Hits in Networks identifies replicable gene-trait relationships missed by association analyses

We previously published results from association analyses of blood gene expression levels with 11 cardiovascular risk traits in the LLFS cohort [8]. For the current paper, we applied FISHNET to those summary statistics to identify replicable gene-trait relationships that did not reach significance thresholds. Across 11 traits, FISHNET identified 287 unique gene-trait relationships, 34 of which replicated in the FHS cohort (Table 1, Supplementary Table S4). Nineteen of the 34 were not Bonferroni significant in the LLFS cohort (Table 1, Supplementary Table S8). For pulse and Forced Vital Capacity (FVC) there were no significant hits in the original study, but both have one replicated FISHNET gene. For BMI, HDL, and Triglycerides, FISHNET identified 15 replicated gene-trait relationships that were genome-wide significant in the original studies and 17 that were not.

Table 1.

FISHNET identified replicated gene-trait relationships across five traits in LLFS, including both genes that meet the genome-wide significance threshold and those that do not.

| Genes | Trait | Bonferroni significant in LLFS (P ≤ 3.5 × 10−6) |
| --- | --- | --- |
| PTGER2, FFAR2, CX3CR1, HCK, BIRC3, UBE2J1, DUSP2 | BMI | Yes |
| TNF, LTB, FLT3, CD22, CLEC4E, MMP8, TLR5, CCR6, SERPINA1, EDA, IL18R1, PEA15 | BMI | No |
| MS4A3, SREBF2, GATA2 | HDL | Yes |
| CD180, GCA, RUNX1 | HDL | No |
| (none) | Pulse | Yes |
| ITGAM | Pulse | No |
| TCN1, MS4A3, SREBF2, RUNX1, GATA2 | Triglycerides | Yes |
| LTB, CD244 | Triglycerides | No |
| (none) | FVC | Yes |
| CD1D | FVC | No |

FISHNET identified an association between expression of LTB, the gene encoding lymphotoxin-beta, and BMI, which is strongly supported by experimental studies in mice. LTB, a member of the tumor necrosis factor family, regulates immune responses through the activation of the LTβR pathway [30, 31]. Inactivation of this pathway in Ltβr−/− mice confers resistance to diet-induced obesity, possibly through its effects on the gut microbiota [32]. The direction of this effect is consistent with our finding that expression of LTB is positively associated with higher BMI in humans. The mouse experiment supports the possibility that higher expression of LTB increases BMI by activating the LTβR pathway. Given that cytokine-mediated immune responses are key mechanisms in obesity-induced inflammation [33, 34], obesity may also induce expression of LTB in a positive feedback loop. One previous study showed upregulation of LTB in peripheral blood monocytes of 14 mildly obese Korean men but not in 12 moderately obese men [35]. However, there is no previous evidence for association between the expression level of LTB and BMI in the TWAS Atlas [36], nor is there any previous genetic evidence linking LTB to BMI in humans in the GWAS Catalog [37]. Although LTB did not reach genome-wide significance for BMI in our TWAS, FISHNET supported an association because: (i) LTB is in the same module of the co-expression network with other genes that show evidence of association with BMI, including SERPINA1, HCK, and CX3CR1, and (ii) LTB is annotated with GO terms involving cytokine production that are over-represented among genes in that module (Fig. 3A). There is no guarantee that genes with similar expression patterns will be involved in the same molecular processes, but the combined FISHNET criteria highlighted a relationship between LTB and CX3CR1, both of which are involved in immune signaling. 
Supporting this, LTB/LTβR signaling has been shown to induce expression of CX3CL1, a pro-inflammatory chemokine ligand for CX3CR1, thereby linking LTB activity to CX3CR1-mediated pro-inflammatory pathways [38].

Figure 3.

Alt text: Two panels showing the network sub-module diagram of FISHNET gene interactions within biological network sub-modules. Panel A displays a co-expression sub-module for BMI, illustrating the connectivity between specific nodes in the network. Panel B displays a protein–protein interaction (PPI) sub-module for HDL, illustrating the connections between FISHNET genes and mediator genes.

FISHNET genes interact with genome-wide significant genes in significant network modules. (A) A significant co-expression module for BMI contains 4 replicated FISHNET genes. Two of these (HCK and CX3CR1) are Bonferroni-significant in LLFS and two are not (LTB and SERPINA1). LTB directly interacts with CX3CR1 and HCK and, like CX3CR1, participates in cytokine production and cytokine biosynthetic processes. (B) A significant InWeb PPI module for HDL contains a replicated FISHNET gene (RUNX1) with 2 Bonferroni-significant and replicated FISHNET genes (GATA2, SREBF2). RUNX1 participates with GATA2 in the regulation of hemopoiesis and stem cell differentiation. RUNX1 is connected to both SREBF2 and GATA2 via two mediator genes.

FISHNET also identified a novel, positive association between RUNX1, the gene encoding the Runt-related transcription factor (TF) 1, and HDL. This finding is strongly supported by experimental evidence highlighting RUNX1’s role in promoting white fat browning, which in turn increases HDL cholesterol levels. The RUNX1 protein promotes white fat browning by binding to the promoters of the brown-adipose-tissue-specific genes Pgc-1α and Ucp-1 in inguinal white adipose tissue (iWAT) [39]. Deletion of RUNX1 (in mice lacking CDK6) reduces the expression level of Pgc-1α and Ucp-1, consequently inhibiting white-fat browning [39]. Furthermore, inducing white fat browning via Kaempferol (KPF) treatment in mice under both a high-fat diet and normal chow diet enhances RUNX1 protein levels and increases the expression of Pgc-1α and Ucp-1 in iWAT [40]. Chemical induction of white fat browning via KPF treatment has also been directly linked to the change in HDL-cholesterol levels. Specifically, when white fat browning is induced in mice with hypertriglyceridemia (Apoa5−/−), the cholesterol shifts from the triglycerides-rich lipoproteins to HDL, increasing the amount of HDL-cholesterol [41]. The role of RUNX1 in promoting HDL levels in mice is consistent with our finding that higher expression of RUNX1 is associated with higher HDL levels in humans.

Module significance analysis links RUNX1 to cholesterol production via interactions with SREBF2. A physical PPI module containing RUNX1 and SREBF2 is significantly enriched for genes with low P-values for association with HDL (Fig. 3B). RUNX1 is connected to SREBF2 via CEBPB and SRF (Fig. 3B). SREBF2 upregulates genes involved in cholesterol production, including HMG-CoA reductase, the target of the cholesterol-lowering statin drug family. Statins lower cholesterol by inhibiting HMG-CoA reductase [42, 43]. Interestingly, an intronic variant (rs2834707) in RUNX1 has been previously associated with HDL in the GWAS Catalog [37]. However, there is no prior association of RUNX1 expression with HDL in the TWAS Atlas [36].

Finding Significant Hits in Networks genes replicate at a higher rate across P-value and false discovery rate thresholds

We compared the replication rate of all genes satisfying different P-value thresholds with the replication rate of FISHNET genes above the same thresholds. In the summary statistics from TWAS of 11 cardiovascular risk traits in the LLFS cohort, the FISHNET genes had a higher replication rate than all genes satisfying the same P-value thresholds (Fig. 4A). Specifically, we set a series of P-value thresholds, starting with the genome-wide, Bonferroni-corrected significance threshold (p ≤ 3.5 × 10−6) and increasing by factors of 10 until we reached p ≤ 3.5 × 10−2. Genes were categorized into cumulative bins such that each bin includes all genes satisfying its threshold as well as those satisfying any lower thresholds. The replication rate of FISHNET genes at p ≤ 3.5 × 10−5, a factor of ten more liberal than the Bonferroni criterion, was similar to that of all Bonferroni significant genes. More generally, the replication rate of FISHNET genes at each threshold was similar to that of non-FISHNET genes at a one-order-of-magnitude more stringent threshold, a trend observed across all thresholds (Fig. 4A). The same analysis in 23 GWAS discovery datasets showed an even better result (Fig. 4B). Among genome-wide significant genes (p ≤ 3.5 × 10−6), those that were also FISHNET genes had a higher replication rate than the rest. We also compared the replication rate of all genes satisfying different FDR thresholds with that of FISHNET genes satisfying the same thresholds (Fig. 4C and D). In the LLFS data, the replication rate of FISHNET genes at each threshold was comparable to that of non-FISHNET genes at an FDR threshold four times more stringent (see the two bars in Fig. 4C). The results from GWAS gene-level summary statistics are even stronger (Fig. 4D). The evidence from both datasets suggests relaxing the FDR threshold by a factor of 4 for FISHNET genes will not compromise the replication rate.
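The cumulative-bin comparison described above can be sketched as follows. The function name and the tuple layout are hypothetical, chosen only to illustrate how each threshold's bin includes every gene that also satisfies a more stringent threshold.

```python
def replication_rate_by_threshold(genes, thresholds):
    """Compare replication rates of all genes and of FISHNET genes per threshold.

    genes:      iterable of (discovery_p, is_fishnet, replicated) tuples
    thresholds: P-value (or FDR) cutoffs, most stringent first
    Returns a list of (threshold, rate_all, rate_fishnet); a rate is None
    when no gene falls in the corresponding bin.
    """
    genes = list(genes)

    def rate(bin_members):
        if not bin_members:
            return None
        return sum(replicated for _, _, replicated in bin_members) / len(bin_members)

    results = []
    for threshold in thresholds:
        # Cumulative bin: every gene at or below this threshold.
        in_bin = [g for g in genes if g[0] <= threshold]
        fishnet_in_bin = [g for g in in_bin if g[1]]
        results.append((threshold, rate(in_bin), rate(fishnet_in_bin)))
    return results


# Thresholds from the Bonferroni cutoff up to 3.5e-2, in factors of 10.
P_THRESHOLDS = [3.5e-6 * 10 ** k for k in range(5)]
```

Because the bins are cumulative, the "all genes" rate at a given threshold already includes the FISHNET genes at that threshold.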

Figure 4.

Alt text: Four line graphs comparing FISHNET genes to a baseline. The y-axis shows replication percentages, and the x-axis shows P-value or FDR thresholds, with FISHNET genes consistently maintaining higher replication rates.

Replication percentage across P-value and FDR thresholds. The x-axis shows different P-value and FDR thresholds. The y-axis shows the percentage of genes satisfying the indicated threshold that replicated. (A) The replication percentage across P-value thresholds in the LLFS cohort (genome-wide significance threshold: p ≤ 3.5 × 10−6). (B) The replication percentage across P-value thresholds in the GWAS summary datasets (genome-wide significance threshold: p ≤ 2.9 × 10−6). (C) and (D) The replication percentage across FDR thresholds in the LLFS cohort and GWAS summary datasets, respectively. Across all P-value and FDR thresholds, FISHNET genes have a better replication rate than all genes satisfying the threshold.

Finding Significant Hits in Networks is not suitable for co-expression networks built from the same expression data used to generate the P-values

The intuition behind FISHNET is that, under the null hypothesis, gene P-values should be randomly distributed across the network. We hypothesized that this might not be the case when the gene P-values come from association of gene expression levels with traits and the network is generated from the same gene expression data. The reason is that genes in modules of co-expression networks are expected to have similar expression patterns in the data used to generate the network. If the same data are used for association with traits, genes with similar expression patterns across participants can be expected to have similar P-values. Therefore, genes in the same module may have similar P-values, even when those P-values are large and the genes are therefore unlikely to be associated with the trait. To test this hypothesis, we built a co-expression network from the 1810 LLFS gene expression samples by using GENIE3 (https://github.com/vahuynh/GENIE3) [44] (LLFS-net). Modules were identified using the K1 method and submitted to FISHNET, along with P-values for association of the same gene expression levels with 11 cardiovascular risk traits. Across the 11 traits, FISHNET identified 162 enriched module-trait pairs, compared to 38 for the co-expression network based on independent GEO data. A histogram of P-values across modules showed a U-shaped distribution with substantial over-representation of P-values near 1.0 (Fig. 5A), consistent with genes with large P-values (bottom ranked) clustering in network modules. The same histogram from the GEO network showed much less over-representation of large P-values (Fig. 5B). To further investigate this, we reversed the ranking of genes by P-values, so that the most significant genes had the largest P-values (bottom ranked) and the least significant ones had the smallest P-values (top ranked). 
Genes that are not truly associated with the trait, which have now been reassigned the smallest P-values, should not cluster into modules and so no significant modules should be found. However, when the modules were from LLFS-net, 5 significant modules were found after P-value reversal; when they were from the GEO network built from independent data, no significant modules were found. The FISHNET software outputs the number of significant modules after rank reversal as a diagnostic evaluation of the modules’ suitability for the summary statistics. We also developed an empirical statistical test, as an alternative to the number of significant modules upon rank reversal, to quantify the dependency between the summary statistics and network modules. The test is described in Online Supplement and the results of applying it to our data are shown in Supplementary Table S7.
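The rank-reversal diagnostic can be illustrated with a short sketch. It is not taken from the FISHNET code base: the function name `reverse_pvalue_ranks` is hypothetical, and the sketch assumes that reversal means reassigning the observed P-values in opposite rank order, which is one reading of the description above.

```python
from typing import Dict


def reverse_pvalue_ranks(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Give the gene with the i-th smallest P-value the i-th largest P-value.

    Assumption (not confirmed by the paper): reversal reuses the observed
    P-values and only changes which gene each value is attached to.
    """
    # Genes ordered from most significant to least significant.
    genes_ascending = sorted(pvalues, key=pvalues.get)
    # The same P-values, ordered from largest to smallest.
    values_descending = sorted(pvalues.values(), reverse=True)
    return dict(zip(genes_ascending, values_descending))


# Toy input with hypothetical gene labels.
toy = {"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.9}
reversed_toy = reverse_pvalue_ranks(toy)
```

The reversed P-values would then go through the same module-enrichment step as the original ones. If significant modules still appear, as with LLFS-net, the modules and the summary statistics are probably not independent.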

Figure 5.

Alt text: Two side-by-side frequency histograms. Histogram A shows a U-shaped distribution of module P-values for the LLFS co-expression network. Histogram B shows the distribution for the GEO co-expression network.

Comparison of module P-value distributions across two types of co-expression networks using LLFS TWAS summary statistics as input. (A) The distribution of module P-values from the LLFS co-expression network. The distribution has a prominent U-shape in this case. (B) The distribution of module P-values from the GEO co-expression network.

An alternative thresholding mechanism allows more control over Finding Significant Hits in Networks false discovery rate

When using empirical hypothesis testing, FISHNET only outputs a gene set if it can identify one that meets its thresholds on permutation-based FDR and quantile (Fig. 1). To search for such a set, it first identifies the set $S$ of all genes that (i) are in a module that is significantly enriched for low P-values and (ii) are annotated with a GO biological process term that is enriched among genes in that module. It then intersects $S$ with $T_k$, the set of all genes with the $k$ smallest P-values, where the default value of $k$ is 5% of the total number of genes, and tests $S \cap T_k$ to see whether it satisfies the FDR and quantile criteria. In all but one of the FISHNET runs reported above in which $S \cap T_k$ is not empty, $S \cap T_k$ does satisfy the criteria. But when it does not, FISHNET tests $S \cap T_{k'}$ for successively smaller values $k' < k$ until it finds an intersection that satisfies the criteria. In the run on male waist circumference, this process was necessary and resulted in a satisfactory intersection at a smaller value of $k$. Figure 6A shows the empirical FDR as a function of the percentage of genes included in $T_k$ for this trait, and in this case the line is monotonically decreasing as $k$ gets smaller. However, there is no guarantee that it will be: Fig. 6A also shows the FDR line for triglycerides, which bounces up and down with no significant trend. It is therefore possible that the FDR could fail for all values of $k$. To increase the likelihood of identifying a non-empty subset of $S$ that satisfies the criteria, we implemented an alternative approach.
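The search described in this paragraph can be sketched as a simple loop. This is an illustration under stated assumptions, not the FISHNET implementation: the function name `shrink_until_accepted`, the callback `passes_criteria` (standing in for the permutation-based FDR and quantile check), and the step size are all hypothetical, since the text does not say how quickly the top-gene cutoff is reduced.

```python
from typing import Callable, Dict, Optional, Set, Tuple


def shrink_until_accepted(
    candidates: Set[str],
    pvalues: Dict[str, float],
    passes_criteria: Callable[[Set[str]], bool],
    start_fraction: float = 0.05,   # default from the text: top 5% of genes
    step_fraction: float = 0.005,   # hypothetical step size
) -> Tuple[Set[str], Optional[int]]:
    """Intersect the candidate genes with the top-k genes for shrinking k.

    Returns the first non-empty intersection accepted by `passes_criteria`
    together with the k that produced it, or (set(), None) if none passes.
    """
    ranked = sorted(pvalues, key=pvalues.get)          # smallest P-value first
    n_genes = len(ranked)
    start_k = round(start_fraction * n_genes)
    step_k = max(1, round(step_fraction * n_genes))    # shrink by at least one gene
    for k in range(start_k, 0, -step_k):
        selected = candidates & set(ranked[:k])
        if selected and passes_criteria(selected):
            return selected, k
    return set(), None
```

Because the FDR need not decrease as the cutoff shrinks, the loop can finish without an accepted set, which is the failure case that motivates the alternative approach.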

Figure 6.

Alt text: Two line graphs demonstrating change in FDR as a function of two thresholding mechanisms. Panel A plots the top X% of genes selected against FDR for waist circumference and triglycerides. Panel B shows the relationship between module-level P-values and FDR for the alternative mechanism.

(A) The change in FDR as a function of top X% genes selected in the thresholding mechanism based on iteratively removing genes with the largest P-values for GWAS on male waist circumference and triglyceride levels. (B) The change in FDR as a function of module P-values in the alternative thresholding mechanism.

In the alternative approach, instead of removing genes with the largest P-values, we iteratively reduce the threshold on the P-value for modules, which is returned by PASCAL (module-based filter, Supplementary Fig. S1). Starting with modules that had Bonferroni-adjusted P-values ≤0.1, the candidate FISHNET gene set was iteratively reduced to genes within modules meeting increasingly stringent thresholds until the FDR and percentile criteria were met. Notably, using the module-based filter decreased FDR monotonically as module P-value thresholds became more stringent (Fig. 6B). On the LLFS TWAS summary statistics, this alternative method performed comparably to the original FISHNET pipeline in terms of replication rates (Supplementary Fig. S2A and B). On the 23 GWAS summary datasets from Choobdar et al. 2019 [14], FISHNET genes obtained by the alternative method (module-based filter) showed slightly lower replication rates compared to other genes with the smallest gene P-values while showing substantially higher replication rates for genes with larger P-values (Supplementary Fig. S3A and B).
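A minimal sketch of the module-based filter follows. Only the starting threshold of 0.1 comes from the text. The remaining threshold schedule, the function name `module_based_filter`, and the `passes_criteria` callback (standing in for the FDR and percentile check) are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


def module_based_filter(
    module_genes: Dict[str, Set[str]],
    module_padj: Dict[str, float],            # Bonferroni-adjusted module P-values
    candidates: Set[str],
    passes_criteria: Callable[[Set[str]], bool],
    thresholds: Iterable[float] = (0.1, 0.05, 0.01, 0.001),  # hypothetical schedule
) -> Tuple[Set[str], Optional[float]]:
    """Keep candidates in modules passing ever stricter P-value thresholds.

    Returns the first non-empty gene set accepted by `passes_criteria` and
    the threshold that produced it, or (set(), None) if no threshold works.
    """
    for threshold in sorted(thresholds, reverse=True):   # loosest threshold first
        kept: Set[str] = set()
        for module, genes in module_genes.items():
            if module_padj[module] <= threshold:
                kept |= set(genes) & candidates
        if kept and passes_criteria(kept):
            return kept, threshold
    return set(), None
```

Unlike the top-k loop, this variant drops whole modules at each step rather than individual genes.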

Overall, iteratively removing genes with large P-values outperformed the module-based filter across two datasets and ensured that FISHNET genes met the FDR and percentile rank criteria, establishing it as the preferred method in this work.

Discussion

Multiple testing correction approaches such as Bonferroni adjustment [1, 2] and FDR control [3, 4] reduce false positives at the cost of increasing false negatives. FISHNET offers a way to relax significance thresholding while maintaining the rate of replication in independent cohorts. Across 11 cardiovascular risk traits in the LLFS cohort, FISHNET identified 34 replicable gene-trait relationships, 19 of which were not genome-wide significant in TWAS according to standard thresholds.

A robust, multi-purpose tool for gene prioritization

An easy-to-install, easy-to-use implementation of FISHNET is available at https://brentlab.github.io/fishnet/. Given gene-level summary statistics and modules derived from a gene–gene interaction network, it generates (i) a prioritized list of FISHNET genes, (ii) a diagnostic evaluation of the modules’ suitability for the summary statistics, and (iii) a list of significant network modules along with enriched GO terms associated with module genes. FISHNET also outputs expected FDR and the quantile of the number of candidate FISHNET genes identified based on permutation analysis. By default, FISHNET accepts candidate genes with FDR ≤ 0.05 and quantile ≥99%, but users can change this to meet the needs of their application. Users can also customize the pipeline by defining the initial threshold on gene-trait P-values required for genes to be considered (default: top 5% of genes, ranked by P-values).
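The acceptance rule can be made concrete with a small sketch. The exact definitions FISHNET uses are given elsewhere in the paper, so the formulas below are assumptions: the empirical FDR is taken as the mean number of candidate genes found in permuted runs divided by the number found in the real run, and the quantile as the fraction of permuted runs that yield fewer candidate genes than the real run. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def permutation_summary(observed_count: int, null_counts: Sequence[int]) -> Tuple[float, float]:
    """Return (empirical FDR, quantile) under the assumed definitions above."""
    if observed_count <= 0:
        raise ValueError("observed_count must be positive")
    if not null_counts:
        raise ValueError("at least one permutation is required")
    expected_false = sum(null_counts) / len(null_counts)
    fdr = min(1.0, expected_false / observed_count)
    quantile = sum(c < observed_count for c in null_counts) / len(null_counts)
    return fdr, quantile


def accept(fdr: float, quantile: float, max_fdr: float = 0.05, min_quantile: float = 0.99) -> bool:
    """Apply the default thresholds stated in the text (FDR <= 0.05, quantile >= 99%)."""
    return fdr <= max_fdr and quantile >= min_quantile


# Toy example: 10 candidate genes observed, 200 permutations, one of which found 2 genes.
fdr, quantile = permutation_summary(10, [0] * 199 + [2])
decision = accept(fdr, quantile)
```

Users who change the thresholds would adjust `max_fdr` and `min_quantile` in this sketch.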

To assess FISHNET’s sensitivity to the choice of permutation-based thresholds, we varied the FDR cutoff from 0.01 to 0.10 and the quantile cutoff from 0.80 to 0.99 (see Online Supplement for details). Across this entire range of cutoffs, the total number of FISHNET genes barely changed (Supplementary Table S5), confirming the robustness of FISHNET. This is expected because FISHNET internally uses PASCAL to identify modules that contain a statistically significant clustering of genes with low P-values from gene-trait associations. It selects modules only if their module P-values, which PASCAL calculates using a chi-squared test, are less than 0.05 after Bonferroni correction for the number of modules in the network. Because this threshold is stringent, PASCAL rarely returns modules in which the clustering of low gene-level P-values occurs by chance. Thus, random permutation runs tend to return no significant modules, so the default thresholds on the permutation-based significance criteria are usually satisfied. The number of FISHNET genes is typically reduced only when users set thresholds that are much more stringent than the defaults, such as FDR < 0.01. The number of FISHNET genes also remained stable across different numbers of random permutations from 200 to 5000, indicating that the default setting of 200 permutations in FISHNET software is enough to get stable results.
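The module-selection rule described above is a standard Bonferroni correction, sketched below. The function name `significant_modules` and the input format are assumptions. The module P-values themselves would come from PASCAL, which is not called here.

```python
from typing import Dict


def significant_modules(module_pvalues: Dict[str, float], alpha: float = 0.05) -> Dict[str, float]:
    """Return modules whose Bonferroni-adjusted P-value is below alpha.

    The adjustment multiplies each raw P-value by the number of modules in
    the network and caps the result at 1.0.
    """
    n_modules = len(module_pvalues)
    adjusted = {m: min(1.0, p * n_modules) for m, p in module_pvalues.items()}
    return {m: p_adj for m, p_adj in adjusted.items() if p_adj < alpha}


# Toy example with three hypothetical modules: only "m1" survives correction.
selected = significant_modules({"m1": 0.0001, "m2": 0.02, "m3": 0.5})
```

With many modules in a network, a raw P-value must be very small to pass, which is why permuted runs rarely produce a significant module.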

Relationship to other methods for post-processing genome-wide association studies results

The goal of FISHNET is to identify a small set of exceptional, trait-influencing genes and to recommend them for replication studies or other follow-ups, even when their P-values from gene-trait association are not significant by conventional criteria. One step in the FISHNET process is to identify network modules that are enriched for genes with low P-values. FISHNET uses a method called PASCAL to do this, but there are many possible alternatives, including gene set enrichment analysis [45], over-representation analysis, and other methods reviewed in ref. [46]. Some of these methods could be substituted for PASCAL, but they are not comparable to FISHNET because they do not recommend individual genes for follow-up.

FISHNET makes use of P-values from gene-trait association together with external information about genes—gene–gene networks and curated gene function annotations. Many other methods aim to increase the power of GWAS by incorporating external information about individual SNPs, including SKAT [47], STAAR [48], and other methods reviewed in ref. [49]. Because these methods focus on individual SNPs, they cannot use the gene-level information that FISHNET uses. They also cannot be applied in situations where there are gene-level P-values but no SNP-level P-values, such as TWAS with measured gene expression levels or differential expression analysis. Two other methods, stratified FDR [50] and functional FDR [51], can be applied to individual SNPs or genes. However, the external information stratified FDR uses consists of non-overlapping gene categories, usually defined by ranges of a numerical variable. Functional FDR uses a numerical variable directly. Neither method can use the gene–gene networks or biological function annotations that FISHNET uses.

FISHNET also improves on traditional network-based methods for association analysis in both methodology and evaluation. One of the most popular approaches is to select previously known trait-associated genes as seed nodes and propagate their scores across local neighborhoods in the network to predict new gene-trait relationships [52, 53]. This approach creates a bias against detecting genes that affect the trait via pathways that are different from those of known genes. FISHNET uses all gene-level P-values, which inherently reduces pathway bias, and it identifies modules that are significantly enriched for genes with low P-values, even if those genes are far from significant when considered individually. Furthermore, FISHNET incorporates GO biological process annotations to identify genes that (i) interact in significant network modules and (ii) participate in a common biological process. In terms of evaluation, the most common evaluation metrics to validate network-based approaches are leave-one-out methodology [53, 54] and validation with genes from published drug target databases [55]. These metrics test the methods’ ability to perform well in recovering known gene-trait relationships. However, the ultimate goal is to uncover novel gene-trait relationships. To achieve this, we evaluated FISHNET by comparing the replication rate of FISHNET genes against those meeting standard Bonferroni or FDR thresholds.

Findings from Finding Significant Hits in Networks and recommendations

Our results suggest best practices for using FISHNET. Among the modularization algorithms tested, we recommend the kernel clustering method K1 [14] when prioritizing both reproducibility and replicability. The modularity optimization method, M2, performed best in replicability, while the random walk approach performed best in reproducibility (Fig. 2B). As an alternative, one could use several modularization algorithms. The analyst’s goal determines how to combine the results: taking the union of the identified genes will yield a broader set, whereas taking the intersection will produce a more conservative, core set of high-confidence genes. While we did not evaluate the replicability and reproducibility of FISHNET genes as a function of the number of modularization algorithms used, all three algorithms we tested performed well in at least one evaluation metric. This suggests that combining results from multiple algorithms is a promising strategy.
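Combining the outputs of several modularization algorithms reduces to set operations, as the following sketch shows. The function name `combine_gene_sets` and the gene labels are placeholders, and, as noted above, the replication behaviour of combined sets was not evaluated.

```python
from typing import Dict, Set


def combine_gene_sets(gene_sets: Dict[str, Set[str]], mode: str = "union") -> Set[str]:
    """Combine FISHNET gene sets obtained with different modularization algorithms.

    mode="union" gives the broader set; mode="intersection" gives the
    conservative core set found by every algorithm.
    """
    sets = [set(s) for s in gene_sets.values()]
    if not sets:
        return set()
    if mode == "union":
        return set().union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


# Placeholder results keyed by algorithm label.
results = {
    "K1": {"GENE_A", "GENE_B"},
    "M2": {"GENE_B", "GENE_C"},
    "random_walk": {"GENE_B"},
}
broad = combine_gene_sets(results, "union")
core = combine_gene_sets(results, "intersection")
```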

Among networks, FISHNET performed well on the InWeb physical PPI and GEO co-expression networks, but poorly on the STRING functional PPI network (Fig. 2A). We recommend using networks based on physical PPI and co-expression interactions over those based on other functional interactions. We also caution against using the same dataset to generate summary statistics and construct gene-based networks, as this can lead to unreliable FISHNET gene-trait relationships (Fig. 5A and B). Based on the results shown in Fig. 4, we recommend using a slightly relaxed Bonferroni or FDR-based threshold and prioritizing FISHNET genes that meet these relaxed criteria alongside the genes that are significant under the original threshold. This recovers gene-trait relationships missed by standard thresholds while maintaining the replication rate.

Limitations and opportunities for broader applications

There are many potential use cases for FISHNET that have not yet been validated. First, the gene-level P-values can come from any source, including RNA-Seq experiments (such as comparing cells treated with a drug to untreated cells) and CRISPR screens to identify genes that affect cellular traits [56, 57]. Second, FISHNET has been tested only on specific gene–gene networks, but others might give better, worse, or complementary results. For example, FISHNET was tested on co-expression networks, but these are neutral as to the molecular mechanism that causes each pair of linked genes to have similar expression patterns. A mechanistic alternative is networks that link TFs to their direct targets [58–61]. Modules from mechanistic gene regulatory networks could elucidate the specific TFs mediating gene–trait relationships. Third, FISHNET has only been tested on modules identified by three algorithms. Since different traits and networks performed optimally with modules from different algorithms, it might be valuable to test a broader range of modularization strategies. More generally, FISHNET uses only the set of genes in each module, not its internal connectivity. Thus, biologically coherent gene sets from any source could be used. For example, the sets of genes directly regulated by each TF could be used, in which case no modularization algorithm is required. Fourth, the final stage of FISHNET uses GO Biological Process terms to identify biological functions enriched among genes in significant modules, but gene sets from other sources could be used, too. For example, gene sets defined by drug target discovery databases such as DrugBank [62, 63], DisGeNET [64], and Open Targets [65, 66] could provide complementary insights and reveal new genes. The versatility of our implementation allows users to try out any of these sources for gene-level P-values, module gene sets, and functional gene sets. 
Future work validating FISHNET with new knowledge sources will greatly expand its applicability.

Key Points

  • Finding Significant Hits in Networks (FISHNET) integrates gene-level summary statistics with prior biological knowledge from network modules and functional annotations, and is specifically designed to identify replicable gene-trait associations that may be missed by traditional genome-wide significance thresholds.

  • FISHNET identified gene-trait associations that replicate at higher rates than other associations with similar P-values, enabling relaxation of significance thresholds without reducing replication rate.

  • Applied to the Long Life Family Study cohort, FISHNET identified 19 replicated gene-trait associations across 11 cardiovascular risk traits that were missed by transcriptome-wide association study and Bonferroni-corrected thresholds.

  • FISHNET software is freely available at https://brentlab.github.io/fishnet/.

Supplementary Material

Supplementary_materials_bbag061

Acknowledgements

We are grateful to the entire Long Life consortium, its participants, and its investigators, without whom this work would not have been possible. This work was supported by grant AG063893 from the National Institute on Aging.

The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (Contract No. N01-HC-25195, HHSN268201500001I and 75N92019D00031). Framingham Heart Study data were obtained from dbGaP accessions phs000007.v35.p16.c1, phs000007.v35.p16.c2, phs000974.v6.p5.c1, phs000974.v6.p5.c2. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI.

Contributor Information

Sandeep Acharya, Division of Computational and Data Sciences, Washington University, 1 Brookings Dr, St. Louis, MO 63130, United States.

Vaha Akbary Moghaddam, Division of Statistical Genomics, Washington University School of Medicine, 4515 McKinley Ave, St. Louis, MO 63110, United States.

Wooseok J Jung, Department of Computer Science and Engineering, Washington University, 1 Brookings Dr, St. Louis, MO 63130, United States.

Yu S Kang, Department of Computer Science and Engineering, Washington University, 1 Brookings Dr, St. Louis, MO 63130, United States.

Shu Liao, Department of Computer Science and Engineering, Washington University, 1 Brookings Dr, St. Louis, MO 63130, United States.

Michael A Province, Division of Statistical Genomics, Washington University School of Medicine, 4515 McKinley Ave, St. Louis, MO 63110, United States.

Michael R Brent, Department of Computer Science and Engineering, Washington University, 1 Brookings Dr, St. Louis, MO 63130, United States.

Data and code availability

The FISHNET pipeline is available at https://brentlab.github.io/fishnet/. The gene-level summary statistics from association analyses based on the LLFS cohort and the 2019 Disease Module Identification DREAM Challenge, along with the corresponding FISHNET outputs, are available in the Supplementary Files. The datasets used to generate the LLFS summary statistics and the inputs and outputs from FHS association analyses have not been deposited in public repositories due to data use constraints.

References

  • 1. Abdi  HE. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: SAGE Publications, 2007. [Google Scholar]
  • 2. Rice  TK, Schork  NJ, Rao  DC. Methods for handling multiple testing. Adv Genet  2008;60:293–308. 10.1016/S0065-2660(07)00412-9 [DOI] [PubMed] [Google Scholar]
  • 3. Korthauer  K, Kimes  PK, Duvallet  C. et al.  A practical guide to methods controlling false discoveries in computational biology. Genome Biol  2019;20:118. 10.1186/s13059-019-1716-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Benjamini  Y, Drai  D, Elmer  G. et al.  Controlling the false discovery rate in behavior genetics research. Behav Brain Res  2001;125:279–84. 10.1016/S0166-4328(01)00297-2 [DOI] [PubMed] [Google Scholar]
  • 5. Santanasto  AJ, Acharya  S, Wojczynski  MK. et al.  Whole genome linkage and association analyses identify DLG associated Protein-1 as a novel positional and biological candidate gene for muscle strength: the long life family study. J Gerontol A Biol Sci Med Sci  2024;79:glae144. 10.1093/gerona/glae144 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Wang  L, Wang  S, Anema  JA. et al.  Novel loci for triglyceride/HDL-C ratio longitudinal change among subjects without T2D. J Lipid Res  2024;66:100702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Feitosa  MF, Lin  SJ, Acharya  S. et al.  Discovery of genomic and transcriptomic pleiotropy between kidney function and soluble receptor for advanced glycation end products using correlated meta-analyses: the long life family study. Aging Cell  2024;23:e14261. 10.1111/acel.14261 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Acharya  S, Liao  S, Jung  WJ. et al.  A methodology for gene level omics-WAS integration identifies genes influencing traits associated with cardiovascular risks: the long life family study. Hum Genet  2024;143:1241–52. 10.1007/s00439-024-02701-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Montano  C, Taub  MA, Jaffe  A. et al.  Association of DNA methylation differences with schizophrenia in an epigenome-wide association study. JAMA Psychiatry  2016;73:506–14. 10.1001/jamapsychiatry.2016.0144 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Akbary Moghaddam  V, Acharya  S, Schwaiger-Haber  M. et al.  Gene-Embedded Multi-Modal Networks for Population-Scale Multi-Omics Discovery. bioRxiv. 2025.
  • 11. Gene Ontology Consortium, Aleksander  SA, Balhoff  J. et al.  The gene ontology knowledgebase in 2023. Genetics  2023;224:iyad031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Lamparter  D, Marbach  D, Rueedi  R. et al.  Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput Biol  2016;12:e1004714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Wang  J, Vasaikar  S, Shi  Z. et al.  WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res  2017;45:W130–7. 10.1093/nar/gkx356 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Choobdar  S, Ahsen  ME, Crawford  J. et al.  Assessment of network module identification across complex diseases. Nat Methods  2019;16:843–52. 10.1038/s41592-019-0509-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Splansky  GL, Corey  D, Yang  Q. et al.  The third generation cohort of the National Heart, Lung, and Blood Institute's Framingham heart study: design, recruitment, and initial examination. Am J Epidemiol  2007;165:1328–35. [DOI] [PubMed] [Google Scholar]
  • 16. Kannel  WB, Feinleib  M, McNamara  P. et al.  An investigation of coronary heart disease in families. The Framingham offspring study. Am J Epidemiol  1979;110:281–90. [DOI] [PubMed] [Google Scholar]
  • 17. Newman  AB, Glynn  NW, Taylor  CA. et al.  Health and function of participants in the long life family study: a comparison with other cohorts. Aging (Albany NY)  2011;3:63–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Wojczynski  MK, Jiuan Lin  S, Sebastiani  P. et al.  NIA long life family study: objectives, design, and heritability of cross-sectional and longitudinal phenotypes. J Gerontol A Biol Sci Med Sci  2022;77:717–27. 10.1093/gerona/glab333 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Barter  P, Gotto  AM, LaRosa  J. et al.  HDL cholesterol, very low levels of LDL cholesterol, and cardiovascular events. N Engl J Med  2007;357:1301–10. [DOI] [PubMed] [Google Scholar]
  • 20. Miller  M, Stone  NJ, Ballantyne  C. et al.  Triglycerides and cardiovascular disease: a scientific statement from the American Heart Association. Circulation  2011;123:2292–333. 10.1161/CIR.0b013e3182160726 [DOI] [PubMed] [Google Scholar]
  • 21. Flint  AJ, Rexrode  KM, Hu  FB. et al.  Body mass index, waist circumference, and risk of coronary heart disease: a prospective study among men and women. Obes Res Clin Pract  2010;4:e171–81. 10.1016/j.orcp.2010.01.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Ramalho  SHR, Shah  AM. Lung function and cardiovascular disease: a link. Trends Cardiovasc Med  2021;31:93–8. 10.1016/j.tcm.2019.12.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Korhonen  PE, Syvänen  KT, Vesalainen  RK. et al.  Ankle-brachial index is lower in hypertensive than in normotensive individuals in a cardiovascular risk population. J Hypertens  2009;27:2036–43. 10.1097/HJH.0b013e32832f4f54 [DOI] [PubMed] [Google Scholar]
  • 24. O’Connell, J. Mixed Model Analysis for Pedigrees and Populations (MMAP) [Github] 2017; Available from:  https://mmap.github.io/.
  • 25. van  Iterson  M, van Zwet  E., BIOS Consortium  et al.  Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biol  2017;18:19. 10.1186/s13059-016-1131-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Szklarczyk  D, Franceschini  A, Wyder  S. et al.  STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res  2015;43:D447–52. 10.1093/nar/gku1003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Li  T, Wernersson  R, Hansen  RB. et al.  A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods  2017;14:61–4. 10.1038/nmeth.4083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Barrett  T, Wilhite  SE, Ledoux  P. et al.  NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res  2013;41:D991–5. 10.1093/nar/gks1193 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Edgar  R, Domrachev  M, Lash  AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res  2002;30:207–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Brinkman  CC, Iwami  D, Hritzo  MK. et al.  Treg engage lymphotoxin beta receptor for afferent lymphatic transendothelial migration. Nat Commun  2016;7:12021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. van de  Pavert  SA, Mebius  RE. New insights into the development of lymphoid tissues. Nat Rev Immunol  2010;10:664–74. 10.1038/nri2832 [DOI] [PubMed] [Google Scholar]
  • 32. Upadhyay  V, Poroyko  V, Kim  TJ. et al.  Lymphotoxin regulates commensal responses to enable diet-induced obesity. Nat Immunol  2012;13:947–53. 10.1038/ni.2403 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Schmidt  FM, Weschenfelder  J, Sander  C. et al.  Inflammatory cytokines in general and central obesity and modulating effects of physical activity. PloS One  2015;10:e0121971. 10.1371/journal.pone.0121971 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Rodriguez-Hernandez  H, Simental-Mendía  LE, Rodríguez-Ramírez  G. et al.  Obesity and inflammation: epidemiology, risk factors, and markers of inflammation. Int J Endocrinol  2013;2013:678159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Jung  UJ, Seo  YR, Ryu  R. et al.  Differences in metabolic biomarkers in the blood and gene expression profiles of peripheral blood mononuclear cells among normal weight, mildly obese and moderately obese subjects. Br J Nutr  2016;116:1022–32. 10.1017/S0007114516002993 [DOI] [PubMed] [Google Scholar]
  • 36. Lu  M, Zhang  Y, Yang  F. et al.  TWAS atlas: a curated knowledgebase of transcriptome-wide association studies. Nucleic Acids Res  2023;51:D1179–87. 10.1093/nar/gkac821 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Sollis  E, Mosaku  A, Abid  A. et al.  The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res  2023;51:D977–85. 10.1093/nar/gkac1010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Lotzer  K, Döpping  S, Connert  S. et al.  Mouse aorta smooth muscle cells differentiate into lymphoid tissue organizer-like cells on combined tumor necrosis factor receptor-1/lymphotoxin beta-receptor NF-kappaB signaling. Arterioscler Thromb Vasc Biol  2010;30:395–402. 10.1161/ATVBAHA.109.191395 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Hou  X, Zhang  Y, Li  W. et al.  CDK6 inhibits white to beige fat transition by suppressing RUNX1. Nat Commun  2018;9:1023. 10.1038/s41467-018-03451-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Zhang  X, Hou  X, Xu  C. et al.  Kaempferol regulates the thermogenic function of adipocytes in high-fat-diet-induced obesity via the CDK6/RUNX1/UCP1 signaling pathway. Food Funct  2023;14:8201–16. 10.1039/d3fo00613a [DOI] [PubMed] [Google Scholar]
  • 41. Bartelt  A, John  C, Schaltenberg  N. et al.  Thermogenic adipocytes promote HDL turnover and reverse cholesterol transport. Nat Commun  2017;8:15010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Horton  JD, Goldstein  JL, Brown  MS. SREBPs: activators of the complete program of cholesterol and fatty acid synthesis in the liver. J Clin Invest  2002;109:1125–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Istvan  E. Statin inhibition of HMG-CoA reductase: a 3-dimensional view. Atheroscler Suppl  2003;4:3–8. [DOI] [PubMed] [Google Scholar]
  • 44. Huynh-Thu  VA, Irrthum  A, Wehenkel  L. et al.  Inferring regulatory networks from expression data using tree-based methods. PloS One  2010;5:e12776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Subramanian  A, Tamayo  P, Mootha  VK. et al.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A  2005;102:15545–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Chimusa  ER, Dalvie  S, Dandara  C. et al.  Post genome-wide association analysis: dissecting computational pathway/network-based approaches. Brief Bioinform  2019;20:690–700. 10.1093/bib/bby035 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Wu  MC, Lee  S, Cai  T. et al.  Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet  2011;89:82–93. 10.1016/j.ajhg.2011.05.029 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Li  X, Li  Z, Zhou  H. et al.  Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet  2020;52:969–83. 10.1038/s41588-020-0676-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Moore A, Marks JA, Quach BC. et al. Evaluating 17 methods incorporating biological function with GWAS summary statistics to accelerate discovery demonstrates a tradeoff between high sensitivity and high positive predictive value. Commun Biol 2023;6:1199. doi: 10.1038/s42003-023-05413-w
  • 50. Sun L, Craiu RV, Paterson AD. et al. Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genet Epidemiol 2006;30:519–30.
  • 51. Chen X, Robinson DG, Storey JD. The functional false discovery rate with applications to genomics. Biostatistics 2021;22:68–81. doi: 10.1093/biostatistics/kxz010
  • 52. Ata SK, Wu M, Fang Y. et al. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020;22:bbaa303.
  • 53. Visona G, Bouzigon E, Demenais F. et al. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024;25:bbae014. doi: 10.1093/bib/bbae014
  • 54. Zhang H, Ferguson A, Robertson G. et al. Benchmarking network-based gene prioritization methods for cerebral small vessel disease. Brief Bioinform 2021;22:bbab006. doi: 10.1093/bib/bbab006
  • 55. Picart-Armada S, Barrett SJ, Willé DR. et al. Benchmarking network propagation methods for disease gene identification. PLoS Comput Biol 2019;15:e1007276. doi: 10.1371/journal.pcbi.1007276
  • 56. Avey D, Sankararaman S, Yim AKY. et al. Single-cell RNA-seq uncovers a robust transcriptional response to morphine by glia. Cell Rep 2018;24:3619–3629.e4. doi: 10.1016/j.celrep.2018.08.080
  • 57. Lalli MA, Avey D, Dougherty JD. et al. High-throughput single-cell functional elucidation of neurodevelopmental disease-associated genes reveals convergent mechanisms altering neuronal differentiation. Genome Res 2020;30:1317–31. doi: 10.1101/gr.262295.120
  • 58. Abid D, Brent MR. NetProphet 3: a machine learning framework for transcription factor network mapping and multi-omics integration. Bioinformatics 2023;39:btad038. doi: 10.1093/bioinformatics/btad038
  • 59. Marbach D, Lamparter D, Quon G. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods 2016;13:366–70. doi: 10.1038/nmeth.3799
  • 60. Saha A, Kim Y, Gewirtz ADH. et al. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 2017;27:1843–58. doi: 10.1101/gr.216721.116
  • 61. Gerstein MB, Kundaje A, Hariharan M. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 2012;489:91–100. doi: 10.1038/nature11245
  • 62. Thorn CF, Klein TE, Altman RB. PharmGKB: the pharmacogenomics Knowledge Base. Methods Mol Biol 2013;1015:311–20.
  • 63. Wishart DS, Knox C, Guo AC. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006;34:D668–72. doi: 10.1093/nar/gkj067
  • 64. Pinero J, Ramírez-Anguita JM, Saüch-Pitarch J. et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res 2020;48:D845–55. doi: 10.1093/nar/gkz1021
  • 65. Koscielny G, An P, Carvalho-Silva D. et al. Open targets: a platform for therapeutic target identification and validation. Nucleic Acids Res 2017;45:D985–94. doi: 10.1093/nar/gkw1055
  • 66. Carvalho-Silva D, Pierleoni A, Pignatelli M. et al. Open targets platform: new developments and updates two years on. Nucleic Acids Res 2019;47:D1056–65. doi: 10.1093/nar/gky1133

Associated Data

Supplementary Materials

Supplementary_materials_bbag061

Data Availability Statement

The FISHNET pipeline is available at https://brentlab.github.io/fishnet/. The gene-level summary statistics from association analyses based on the LLFS cohort and the 2019 Disease Module Identification DREAM Challenge, along with the corresponding FISHNET outputs, are available in the Supplementary Files. The datasets used to generate the LLFS summary statistics and the inputs and outputs from FHS association analyses have not been deposited in public repositories due to data use constraints.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
