PLoS Comput Biol. 2022 Mar 9;18(3):e1009935. doi: 10.1371/journal.pcbi.1009935

Urgent need for consistent standards in functional enrichment analysis

Kaumadi Wijesooriya 1, Sameer A Jadaan 2, Kaushalya L Perera 1, Tanuveer Kaur 1, Mark Ziemann 1,*
Editor: Melissa L Kemp
PMCID: PMC8936487  PMID: 35263338

Abstract

Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. To ascertain the frequency of these issues in the literature, we performed a screen of 186 open-access research articles describing functional enrichment results. We find that 95% of analyses using over-representation tests did not implement an appropriate background gene list or did not describe this in the methods. Failure to perform p-value correction for multiple tests was identified in 43% of analyses. Many studies lacked detail in the methods section about the tools and gene sets used. An extension of this survey showed that these problems are not associated with journal or article level bibliometrics. Using seven independent RNA-seq datasets, we show misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.

Author summary

Functional enrichment analysis is a commonly used technique to identify trends in large-scale biological datasets. In biomedicine, functional enrichment analysis of gene expression data is frequently applied to identify disease and drug mechanisms. While enrichment tests were once primarily conducted with complicated computer scripts, web-based tools are becoming more widely used. Users can paste a list of genes into a website and receive enrichment results in a matter of seconds. Despite the popularity of these tools, there are concerns that statistical problems and incomplete reporting are compromising research quality. In this article, we conducted a systematic examination of published enrichment analyses and assessed whether (i) any statistical flaws were present and (ii) sufficient methodological detail was provided such that the study could be replicated. We found that lack of methodological detail and errors in statistical analysis were widespread, which undermines the reliability and reproducibility of these research articles. A set of best practices is urgently needed to raise the quality of published work.

Introduction

Since the turn of the millennium, high throughput “omics” techniques like microarrays and high throughput sequencing have brought with them a deluge of data. These experiments involve the measurement of thousands of genes simultaneously and can identify hundreds or even thousands of significant associations in a single experiment. Interpreting such data is extraordinarily challenging, as the sheer number of associations can be difficult to investigate in a gene-by-gene manner. Instead, many tools have been developed to summarize regulated gene expression profiles into simplified functional categories. These functional categories typically represent signaling or biochemical pathways, curated from information present in the literature, hence the name functional enrichment. The validity of functional enrichment analysis is dependent upon rigorous statistical methods as well as accurate and up-to-date gene functional annotations.

Two of the most frequently used databases of gene annotations are Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Both databases emerged around the time of public release of the first eukaryotic genomes, with the aim of systematically cataloging gene and protein function [1,2].

Widely used functional enrichment tools can be classified into two main categories: (i) over-representation analysis (ORA) and (ii) functional class scoring (FCS), and the most common application is in differential gene expression analysis. In ORA, differentially expressed genes (DEGs) meeting a significance and/or fold change threshold are queried against curated pathways (gene sets). A statistical test is performed to ascertain whether the number of DEGs belonging to a particular gene set is higher than that expected by random chance, as determined by comparison to a background gene list. These ORA tools can be stand-alone software packages or web services, and they use one or more statistical tests (eg: Fisher’s exact test, chi-square test) [3,4].
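
To make the ORA procedure concrete, the following is a minimal sketch in R of an over-representation test for a single gene set using Fisher’s exact test; the input vectors (deg, gene_set, background) are hypothetical placeholders, not objects from this study.

```r
# Minimal ORA sketch for one gene set using Fisher's exact test.
# 'deg', 'gene_set' and 'background' are hypothetical character vectors
# of gene identifiers.
ora_fisher <- function(deg, gene_set, background) {
  deg <- intersect(deg, background)            # DEGs must come from the background
  gene_set <- intersect(gene_set, background)  # restrict the set to measured genes
  a <- length(intersect(deg, gene_set))        # DEGs in the set
  b <- length(setdiff(deg, gene_set))          # DEGs outside the set
  c <- length(setdiff(gene_set, deg))          # set members that are not DEGs
  d <- length(background) - a - b - c          # all remaining background genes
  # one-sided test for over-representation of the gene set among DEGs
  fisher.test(matrix(c(a, b, c, d), nrow = 2), alternative = "greater")$p.value
}
```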

In the case of ORA for differential expression (eg: RNA-seq), a whole genome background is inappropriate because in any tissue, most genes are not expressed and therefore have no chance of being classified as DEGs. A good rule of thumb is to use a background gene list consisting of genes detected in the assay at a level where they have a chance of being classified as DEGs [5–7]. Using the whole genome background gene list may be suitable in cases where all genes have the capacity of being detected, for example in studies of genetic variation (eg: [8]). However, the problem becomes more acute when the proportion of measured genes/proteins is small, for example in proteomics and single-cell RNA-sequencing, where only a few thousand analytes are detected.
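
As an illustration of this point, a detected-gene background can be derived directly from the expression matrix before running ORA. This is a minimal sketch assuming a hypothetical counts matrix with genes as rows, using the same detection threshold applied later in this study (an average of at least 10 reads per sample).

```r
# 'counts' is a hypothetical RNA-seq count matrix (genes x samples) with
# gene identifiers as row names.
detected <- rowMeans(counts) >= 10          # detection threshold
background <- rownames(counts)[detected]    # detected genes only
# pass 'background' (not the whole genome) to the ORA test
```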

FCS tools involve giving each detected gene a differential expression score and then evaluating whether the scores are more positive or negative than expected by chance for each gene set. The popular Gene Set Enrichment Analysis (GSEA) tool uses permutation approaches to establish whether a gene set is significantly associated with higher or lower scores, either by permuting sample labels or by permuting genes in the differential expression profile [9].
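
As a simplified illustration of the gene-permutation idea (not GSEA’s weighted Kolmogorov-Smirnov statistic), the sketch below scores a gene set by the mean differential expression statistic of its members and builds a null distribution by repeatedly drawing random genes; all object names are hypothetical.

```r
# Toy gene-permutation FCS test. 'stats' is a hypothetical named vector of
# per-gene differential expression scores (eg: t-statistics); 'gene_set'
# is a character vector of member gene identifiers.
fcs_perm <- function(stats, gene_set, n_perm = 10000) {
  members <- names(stats) %in% gene_set
  observed <- mean(stats[members])                              # gene set score
  null <- replicate(n_perm, mean(sample(stats, sum(members))))  # permuted scores
  (1 + sum(abs(null) >= abs(observed))) / (n_perm + 1)          # two-sided empirical p
}
```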

From a user’s perspective, ORA is easier to conduct because it is as simple as pasting a list of gene names into a text box on a website. FCS tools are more difficult to use but are reported to have superior sensitivity in detecting subtle associations [10–13].

Although these are powerful tools to summarize complex genomics data, there are limitations. For example, many ORA and FCS approaches assume independence between genes, which is problematic, as genes of the same functional category are somewhat more likely to have correlated gene expression [14]. There is an ongoing debate as to whether ignoring non-independence is a reasonable simplifying assumption in functional enrichment analysis [15,16]. Another issue is the reporting of statistically significant enrichments where the observed effect size (enrichment score) is so small that it is unlikely to have any meaningful biological effect [17].

Furthermore, there are concerns that enrichment tools are not being correctly used. Previous publications have warned that inappropriate background set selection heavily influences enrichment results [5,6]. Timmons et al [18] highlight two cases where an inappropriate background list led to invalid enrichment results in published articles.

While the nominal p-value from the gene set test is appropriate when a single set is examined, functional enrichment analysis typically involves hundreds to thousands of parallel tests, one for each gene set in the library (eg: the MSigDB v7.4 library contains 32,284 sets [19]). False discovery rate (FDR) correction of enrichment p-values is therefore required to limit the number of false positives when performing so many concurrent tests [5–7,20].
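
In R, this correction is a one-line step; the sketch below assumes a hypothetical named vector of per-gene-set p-values and applies the Benjamini-Hochberg procedure.

```r
# 'pvals' is a hypothetical named vector of enrichment p-values,
# one per gene set tested.
fdr <- p.adjust(pvals, method = "BH")    # Benjamini-Hochberg FDR
significant <- names(fdr)[fdr < 0.05]    # report FDR, not nominal p-values
```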

Lack of methodological detail can severely weaken reproducibility. In 2001, minimum information about a microarray experiment (MIAME) guidelines were described [21] and rapidly adopted, resulting in higher reporting standards in publications describing microarray data. Reporting standards for computational biology have been proposed [22,23], but are not widely adopted. At a minimum, functional enrichment analysis reports should describe the methods used in such detail that the analysis could be replicated.

The main purpose of this work is to survey the frequency of methodological and reporting flaws in the literature, in particular: (i) inappropriate background gene set, (ii) lack of p-value adjustment for multiple comparisons and (iii) lack of essential methodological details. Secondly, we examine several RNA-seq datasets to evaluate the effect of such issues on the functional enrichment results obtained.

Results

Methodological and reporting deficiencies are widespread in articles describing functional enrichment analysis

A search of PubMed Central showed 2,941 open-access articles published in 2019 with the keywords “enrichment analysis”, “pathway analysis” or “ontology analysis”. From these, we randomly selected 200 articles for detailed methodological analysis. We excluded 14 articles from the screen because they did not present any enrichment analysis; these included articles describing novel enrichment analysis techniques or tools, review articles and conference abstracts. As some articles included more than one enrichment analysis, the dataset included 235 analyses from 186 articles; these data are available in S1 Table. A flow diagram of the survey is provided in Fig 1.

Fig 1. A summary of the survey of functional enrichment analyses.


The survey consists of two parts, with 200 and 1,300 PMC articles considered, respectively.

There were articles from 96 journals in the sample, with PLoS One, Scientific Reports, and PeerJ being the biggest contributors (S1 Fig). There were 18 different omics types, with gene expression array and RNA-seq being the most popular (S1 Fig). There were 31 different species under study, but Homo sapiens was the most common with 157 analyses (S1 Fig).

We recorded the use of 26 different gene set libraries, with GO and KEGG being the most frequently used (Fig 2A). There were 14 analyses where the gene set libraries used were not defined in the article. Only 18 analyses reported the version of the gene set library used (Fig 2B). There were 12 different statistical tests used, and the most commonly reported were Fisher’s exact, GSEA and hypergeometric tests, although the statistical test used was not reported for most analyses (Fig 2C). Fourteen analyses did not conduct any statistical test, as they only showed the number of genes belonging to different sets. Out of the 221 analyses that performed a statistical test, only 119 (54%) described correcting p-values for multiple testing (Fig 2D).

Fig 2. Findings from the survey of published enrichment analyses.


(A) Representation of different gene set libraries. (B) Proportion of analyses reporting gene set library version information. (C) Representation of different statistical tests. (D) Proportion of analyses conducting correction of p-values for multiple comparisons. (E) Representation of different software tools for enrichment analysis. (F) Proportion of analyses that reported version information for software used. (G) Background gene set usage and reporting. (H) Proportion of scripted analyses that provided software code. (I) Proportion of analyses that provided gene profiles. (J) Proportion of analyses with different methodological problems. (K) Reporting of GSEA parameters.

There were 50 different tools used to perform enrichment analysis, with DAVID and GSEA being the most common; while 15 analyses (6.4%) did not state what tool was used (Fig 2E). The version of the software used was provided in only 68 of 235 analyses (29%) (Fig 2F).

For analyses using ORA methods, we examined what background gene set was used (Fig 2G). This revealed that in most cases, the background list was not defined, or it was clear from the article methods section that no specific background list was used. In a few cases, a background list was mentioned but was inappropriate, for example using a whole genome background for an assay like RNA-seq. In only 8 of 197 cases (4.1%) was an appropriate background list described in the article.

Of the 47 analyses which used computer scripts, only 3 provided links to the code used for enrichment analysis (6.4%) (Fig 2H). For 93 of 235 analyses (40%), the corresponding gene lists/profiles were provided either in the supplement or in the article itself (Fig 2I).

Next, we quantified the frequency of methodological and reporting issues that would undermine the conclusions (Fig 2J). Lack of appropriate background was the most common issue (179 cases), followed by lack of FDR control (94), then lack of data shown (13), inference without test (11), and misinterpreted FDR values (2). Only 35 analyses (15%) did not exhibit any of these major methodological issues.

We also looked at studies performing GSEA, and whether three important analytical choices were described in the methods. These are (i) the gene weighting parameter, (ii) test type, ie: permuting sample labels or genes, and (iii) method used for ranking genes. These parameters were not stated in more than half of the analyses (Fig 2K).

Taken together, these data suggest methodological and reporting deficiencies are widespread in published functional enrichment analyses.

Methodological and reporting deficiencies occur irrespective of journal rank or article citations

To make an accurate assessment of association between analysis quality and bibliometrics, we required a larger sample of articles. We therefore screened a further 1,300 articles, bringing the total number of analyses described up to 1630; this dataset is available in S2 Table.

We then scored each analysis based on the presence or absence of methodological issues and included details. The median score was -4, with a mean of -3.5 and standard deviation of 1.4 (Fig 3A). Next, we assessed whether these analysis scores were associated with Scimago Journal Rank (SJR), a journal-level citation metric. There was a slight positive association between analysis score and SJR (Pearson r = 0.058, p = 0.036), but when viewed as boxplots of analysis score categories, there was no clear association between methodological rigor and SJR (Fig 3B).

Fig 3. Comparison of analysis scores to bibliometrics.


(A) Distribution of analysis scores. (B) Association of analysis scores with Scimago Journal Rank (SJR). (C) Journals with highest mean analysis scores. (D) Journals with lowest mean analysis scores. (E) Association of mean analysis score with SJR for journals with 5 or more analyses. (F) Association of analysis scores with accrued citations.

Next, we wanted to know which journals had the highest and lowest scores. Only journals with five or more analyses were included. The best scoring journals were Transl Psychiatry, Metabolites and J Exp Clin Cancer Res (Fig 3C), while the poorest were RNA Biol, Mol Med Rep, Cancer Med and World J Gastroenterol (Fig 3D), although we note that there was a wide variation between articles of the same journal.

Then we assessed for an association between mean analysis score and the SJR, for journals with five or more analyses (Fig 3E). Again, there was no association between mean analysis score and SJR (Pearson r = -0.012, p = 0.93). Next, we assessed whether there was any association between analysis scores and the number of citations received by the articles. After log transforming the citation data, there was no association between citations and analysis scores (Pearson r = 0.02, p = 0.39) (Fig 3F). These findings suggest that methodological issues are not limited to lower ranking journals or poorly cited articles.

Misuse of functional enrichment tools changes results substantially

To demonstrate whether functional enrichment analysis misuse affects results and downstream interpretation, we used an example RNA-seq dataset examining the effect of high glucose exposure on hepatocytes (SRA accession SRP128998). Out of 39,297 genes in the annotation set, 15,635 were above the detection threshold (≥10 reads per sample on average). Statistical analysis revealed 3,472 differentially expressed genes with 1,560 up-regulated and 1,912 down-regulated (FDR<0.05) due to high glucose exposure.

To quantify the effect of not performing correction for multiple testing on enrichment results, FCS and ORA were performed and filtered at the nominal p<0.05 and FDR<0.05 levels. The number of gene sets found with each approach is shown in Fig 4A. This revealed that if p-value adjustment is ignored, then 25% of FCS and 39% of ORA results would erroneously appear as significant.

Fig 4. Example enrichment analysis misuse.


(A) Number of differentially expressed gene sets for FCS and ORA methods using nominal p-value and FDR thresholds. (B) Euler diagram showing the overlap of differentially regulated gene sets with FCS and ORA methods (FDR<0.05). (C) Euler diagram shows the overlap of ORA results when using recommended or whole genome background. Whole genome background analysis is indicated with a *. (D) Jaccard index values for enrichment analysis results when conducted in different ways. “FDR” refers to a significance threshold of FDR<0.05. “Nominal” refers to a significance threshold of p<0.05. “ORA*nom” refers to a whole genome background used in tandem with the p<0.05 significance threshold. (E) Jaccard index values for enrichment analysis results when conducted in different ways for seven independent RNA-seq experiments.

The overlap of significant gene sets (FDR<0.05) identified with FCS and ORA methods is shown in Fig 4B, and is reflected by a Jaccard statistic of 0.65, indicating moderate concordance between these methods.
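
For reference, the Jaccard statistic used throughout this section is simply the size of the intersection over the size of the union of the two sets of significant gene sets; the input vectors in this sketch are hypothetical.

```r
# 'fcs_sig' and 'ora_sig' are hypothetical character vectors of gene set
# names reaching FDR < 0.05 with each method.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
jaccard(fcs_sig, ora_sig)   # 1 = identical results; 0 = no overlap
```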

We then performed ORA using a background list consisting of all genes in the annotation set, not just those detected by the assay (indicated as ORA*), which resulted in 139 up and 484 down regulated gene sets (FDR<0.05). The overlap of ORA with ORA* was relatively small, with a Jaccard statistic of 0.44 (Fig 4C). We then conducted ORA* without p-value adjustment (indicated as ORA*nom). As expected, the overlap between ORA and ORA*nom was very low, at 0.38 (Fig 4D).

This analysis was repeated with an additional six independent RNA-seq datasets for confirmation (Fig 4E and Table 1). The Jaccard values obtained confirm a consistent degree of similarity between FCS and ORA (mean Jaccard = 0.58). Across all studies, both the lack of p-value adjustment and an incorrect background affected results. Interestingly, the effect of ignoring p-value adjustment was more acute for ORA than for FCS. For ORA, the impact of an incorrect background was more severe than that of ignoring p-value adjustment. As expected, results were drastically impacted when both methodological errors were present; in that case, the Jaccard index was <0.38 when compared to ORA with the recommended procedure.

Table 1. Seven independent RNA-seq experiments used for functional enrichment analysis.

Detection threshold is an average of 10 reads per sample. Differentially expressed genes are defined as FDR<0.05 using DESeq2.

SRA accession and citation | Control datasets | Case datasets | Genes detected | Genes differentially expressed
SRP128998 [24] | GSM2932797, GSM2932798, GSM2932799 | GSM2932791, GSM2932792, GSM2932793 | 15,635 | 3,472
SRP038101 [25] | GSM1329862, GSM1329863, GSM1329864 | GSM1329859, GSM1329860, GSM1329861 | 13,926 | 3,589
SRP037718 [26] | GSM1326472, GSM1326473, GSM1326474 | GSM1326469, GSM1326470, GSM1326471 | 15,477 | 9,488
SRP096177 [27] | GSM2448985, GSM2448986, GSM2448987 | GSM2448982, GSM2448983, GSM2448984 | 15,607 | 5,150
SRP247621 [28] | GSM4300737, GSM4300738, GSM4300739 | GSM4300731, GSM4300732, GSM4300733 | 14,288 | 230
SRP253951 [29] | GSM4462339, GSM4462340, GSM4462341 | GSM4462336, GSM4462337, GSM4462338 | 15,182 | 8,588
SRP068733 [30] | GSM2044431, GSM2044432, GSM2044433 | GSM2044428, GSM2044429, GSM2044430 | 14,255 | 7,365

Discussion

Concerns have been raised that some articles describing enrichment analysis suffer from methodological problems [18], but this is the first systematic examination of the frequency of such issues in peer-reviewed publications.

In this sample of open-access research articles, we observed a bias toward tools that are easy to use. ORA tools that only require pasting lists of gene identifiers into a webpage (ie: DAVID, KOBAS and PANTHER) were collectively more popular than other solutions like GSEA (a stand-alone graphical user interface software for FCS) or any command line tool (consistent with another report [13]). This is despite ORA tools being reported to lack sensitivity in detecting subtle associations according to previous benchmarking studies [10–13].

Failing to properly describe the background gene list was the most common methodological issue (Fig 2J). In the seven RNA-seq examples examined here, using the inappropriate whole genome background gave results that were on average only 44% similar to results obtained using the correct background (Fig 4E).

The severe impact of selecting the wrong background on RNA-seq functional enrichment results arises because in RNA-seq, typically only a fraction of all annotated genes is detected. Table 1 indicates that only ~38% of genes were detected in the seven examples. In contrast, a modern microarray detects a larger proportion of genes [31], so the effect of using a whole genome background is less severe.

There are various approaches to define the background of an RNA-seq dataset [32]. The effect of different filtering methods on differential expression results has been investigated [33], providing some practical recommendations to avoid the sampling biases described by Timmons et al [18].

Although articles that used GSEA obtained higher methodology scores overall, they were not free of issues. For example, GSEA has several options that impact the results, including the ranking metric, gene weighting method and the permutation type (on samples or genes), which were not regularly reported in articles (Fig 2K), limiting reproducibility.

We scored a total of 1630 analyses, revealing only a small fraction that obtained a satisfactory score of zero or higher. The analysis scores we generated did not correlate with journal or article metrics. This suggests that methodological and reporting problems are not limited to lower ranked journals but are a more general problem.

These shortcomings are understandable, as some popular web tools do not accommodate background gene lists by design (eg: [34]). Moreover, some user guides gloss over the problems of background list selection and correction for multiple testing (eg: [35]), while other guides are written in such a way that they are difficult for the novice statistician to comprehend (eg: [36]). Certainly, the inconsistent nomenclature used in different articles and guides makes it difficult for beginners to grasp these concepts. Unfortunately, some of the best guides for enrichment analysis are paywalled (eg: [5,7]), which limits their accessibility. With this in mind, there is a need for a set of minimum standards for enrichment analysis that is open access and written for the target audience (life scientists with little expertise in statistics).

There are some limitations of this study that need to be recognized. Many open-access articles examined here are from lower-ranked journals that might not be representative of articles in paywalled journals. The articles included in this study contained keywords related to functional enrichment in the abstract, and it is plausible that articles in higher ranked journals contain such details in the abstract at lower rates. Those highly ranked specialist genomics journals are likely to have lower rates of problematic articles due to more knowledgeable editors and peer reviewers.

We also recognize the simplistic nature of the analysis scoring criteria. Clearly, the impact of each criterion is not the same; for example, the effect of ignoring FDR is likely more severe than omitting the version number of the tool used. This simplified scheme was used for practical reasons.

Further, it is difficult to ascertain whether these methodological issues invalidate the conclusions of these articles. We are currently working on a systematic large scale replication study to determine the reliability of these articles with corrected methods.

In conclusion, these results are a wake-up call for reproducibility and highlight the urgent need for minimum standards for functional enrichment analysis.

Methods

Survey of published enrichment analysis

We collated 2,941 articles in PubMed Central published in 2019 that had the keywords “enrichment analysis”, “pathway analysis” or “ontology analysis”. We initially sampled 200 of these articles randomly using the Unix “shuf” command. We then collected the following information from each article, searching the methods section and other parts of the article, including the supplement.

  • Journal name

  • Type of omics data

  • Gene set library used, and whether a version was reported

  • Statistical test used

  • Whether p-values were corrected for multiple comparisons

  • Software package used, and whether a version was reported

  • Whether an appropriate background gene set was used

  • Code availability

  • Whether gene profile was provided in the supplement

  • Whether the analysis had any major flaws that might invalidate the results. This includes:
    1. background gene set not stated or inappropriate background set used,
    2. lack of FDR correction,
    3. no enrichment data shown,
    4. inference without performing any statistical test, and
    5. misinterpreting p-values by stating results were significant when FDR values indicate they weren’t.

We excluded articles describing novel enrichment analysis techniques/tools, review articles and conference abstracts. Some articles presented the results of more than one enrichment analysis, so additional rows were added to the table to accommodate them. These data were entered into a Google Spreadsheet by a team of five researchers. The articles were cross-checked by another team member and any discrepancies were resolved.

For analyses using GSEA, we scanned the articles to identify whether key methodological steps were described, including (i) the gene weighting parameter, (ii) test type, ie: permuting sample labels or genes, and (iii) method used for ranking genes.

For assessment of enrichment analysis quality against journal metrics and citations, we required a larger sample, so we selected a further 1,300 articles from PMC for analysis. Results from this sample were not double-checked, so they may contain a small number of inaccuracies. We rated each analysis with a simple approach that deducted points for methodological problems and missing details, while awarding points for including extra information (Table 2).

Table 2. Scoring schema.

1 point deducted:
  • Gene set library origin not stated
  • Gene set library version not stated
  • Statistical test not stated
  • No statistical test conducted
  • No FDR correction conducted
  • App used not stated
  • App version not stated
  • Background list not defined
  • Inappropriate background list used

1 point awarded:
  • Code made available
  • Gene profile data provided
SJR data for 2020 were downloaded from the Scimago website (accessed 5th August 2021) and used to score journals by their citation metrics. Using NCBI’s Eutils API, we collected the number of citations each article accrued since publication (accessed 3rd December 2021). Citation data were log2 transformed prior to regression analysis. Pearson correlation tests were used to assess the association with the analysis scores we generated.
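
The correlation test described here amounts to the following; the vectors are hypothetical, and the +1 offset before log transformation is an assumption to accommodate articles with zero citations.

```r
# 'scores' and 'citations' are hypothetical numeric vectors, one value per
# analysis; the +1 offset avoids log2(0) for uncited articles.
cor.test(scores, log2(citations + 1), method = "pearson")
```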

Exemplifying functional enrichment analysis misuse

To demonstrate the effect of misusing functional enrichment analysis, a publicly available RNA-seq dataset (SRA accession SRP128998) was downloaded from DEE2 on 19th January 2022 [37]. These data consist of immortalized human hepatocytes cultured in standard (n = 3) or high glucose media (n = 3), first described by Felisbino et al [24]. Transcript level counts were aggregated to genes using the getDEE2 R package v1.2.0. Next, genes with an average of less than 10 reads per sample were omitted from downstream analysis. Differential expression statistical analysis was conducted with DESeq2 v1.32.0 [38] to identify genes altered by high glucose exposure. For gene set analysis, human Reactome gene sets [39] were downloaded in GMT format from the Reactome website (accessed 7th December 2021). FCS was performed using the mitch R package v1.4.1 with default settings, which uses a rank-ANOVA statistical test [10]. Differentially expressed genes with FDR<0.05 were used for ORA with the clusterProfiler R package (v4.0.5) enricher function, which implements a hypergeometric test [40]. No fold-change threshold was used to select genes for ORA. For ORA, two types of background gene sets were used: (i) detected genes, or (ii) all genes in the genome annotation set. For genes and gene sets, a false discovery rate adjusted p-value (FDR) of 0.05 was considered significant. Analyses were conducted in R version 4.1.2. To understand whether these results are consistent across other experiments, we repeated this analysis for an additional six independent published RNA-seq studies [25–30]. Details of the contrasts examined are shown in Table 1.
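
The pipeline described above can be summarized in the following hedged sketch. The count matrix, sample sheet and GMT file path are placeholders, and the calls to DESeq2, mitch and clusterProfiler follow their documented interfaces rather than the exact scripts used here (which are available in the GitHub repository).

```r
library(DESeq2)
library(mitch)
library(clusterProfiler)

# 'counts' is a placeholder gene-level count matrix (eg: aggregated from
# DEE2 transcript counts); 'coldata' is a placeholder sample sheet with a
# 'group' column (control vs high glucose).
counts <- counts[rowMeans(counts) >= 10, ]                # detection filter
dds <- DESeqDataSetFromMatrix(counts, coldata, design = ~ group)
res <- results(DESeq(dds))                                # differential expression

# FCS with mitch, using Reactome gene sets in GMT format
genesets <- gmt_import("ReactomePathways.gmt")
m <- mitch_import(as.data.frame(res), DEtype = "deseq2")
fcs <- mitch_calc(m, genesets)

# ORA with clusterProfiler's enricher, with detected genes as the universe
degs <- rownames(res)[which(res$padj < 0.05)]
gmt <- read.gmt("ReactomePathways.gmt")
ora <- enricher(degs, TERM2GENE = gmt, universe = rownames(counts))
```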

Supporting information

S1 Table. A survey of 186 articles describing functional enrichment results in TSV format.

(TSV)

S2 Table. A survey of 1300 articles describing functional enrichment results in TSV format.

(TSV)

S1 Fig. General information about the analyses that underwent screening.

(A) The most highly represented journals in the article set. (B) The most highly represented omics types used for enrichment analysis. (C) The most highly represented organisms under study.

(EPS)

Acknowledgments

We thank Drs Antony Kaspi (Walter and Eliza Hall Institute), Nick Wong and Anup Shah (Monash University) for comments on the manuscript. This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the NCRIS-funded Australian Research Data Commons (ARDC).

Data Availability

All data and code to support this work are available from GitHub (https://github.com/markziemann/SurveyEnrichmentMethods). We have also used Zenodo to assign a DOI to the repository: 10.5281/zenodo.5763096.

Funding Statement

The authors received no specific funding for this work.

References

1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. doi: 10.1038/75556
2. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27
3. Hosack DA, Dennis G Jr, Sherman BT, Lane HC, Lempicki RA. Identifying biological themes within lists of genes with EASE. Genome Biol. 2003;4:R70. doi: 10.1186/gb-2003-4-10-r70
4. Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA. Global functional profiling of gene expression. Genomics. 2003;81:98–104. doi: 10.1016/s0888-7543(02)00021-6
5. Rhee SY, Wood V, Dolinski K, Draghici S. Use and misuse of the gene ontology annotations. Nat Rev Genet. 2008;9:509–515. doi: 10.1038/nrg2363
6. Tipney H, Hunter L. An introduction to effective use of enrichment analysis software. Hum Genomics. 2010;4:202–206. doi: 10.1186/1479-7364-4-3-202
7. Tilford CA, Siemers NO. Gene set enrichment analysis. Methods Mol Biol. 2009;563:99–121. doi: 10.1007/978-1-60761-175-2_6
8. Cirillo E, Kutmon M, Gonzalez Hernandez M, Hooimeijer T, Adriaens ME, Eijssen LMT, et al. From SNPs to pathways: Biological interpretation of type 2 diabetes (T2DM) genome wide association study (GWAS) results. PLoS One. 2018;13:e0193515. doi: 10.1371/journal.pone.0193515
9. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102
10. Kaspi A, Ziemann M. Mitch: Multi-contrast pathway enrichment for multi-omics and single-cell profiling data. BMC Genomics. 2020;21:447. doi: 10.1186/s12864-020-06856-9
11. Zyla J, Marczyk M, Domaszewska T, Kaufmann SHE, Polanska J, Weiner J. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics. 2019;35:5146–5154. doi: 10.1093/bioinformatics/btz447
12. Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene set analysis: Challenges, opportunities, and future research. Front Genet. 2020;11:654. doi: 10.3389/fgene.2020.00654
13. Xie C, Jauhari S, Mora A. Popularity and performance of bioinformatics software: the case of gene set analysis. BMC Bioinformatics. 2021;22:191. doi: 10.1186/s12859-021-04124-5
14. Goeman JJ, Bühlmann P. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007;23:980–987. doi: 10.1093/bioinformatics/btm051
15. Irizarry RA, Wang C, Zhou Y, Speed TP. Gene set enrichment analysis made simple. Stat Methods Med Res. 2009;18:565–575. doi: 10.1177/0962280209351908
16. Tamayo P, Steinhardt G, Liberzon A, Mesirov JP. The limitations of simple gene set enrichment analysis assuming gene independence. Stat Methods Med Res. 2016;25:472–487. doi: 10.1177/0962280212460441
17. Karp PD, Midford PE, Caspi R, Khodursky A. Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics. BMC Genomics. 2021;22:191. doi: 10.1186/s12864-021-07502-8
18. Timmons JA, Szkop KJ, Gallagher IJ. Multiple sources of bias confound functional enrichment analysis of global -omics data. Genome Biol. 2015;16:186. doi: 10.1186/s13059-015-0761-7
19. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004
20. Hung JH, Yang TH, Hu Z, Weng Z, DeLisi C. Gene set enrichment analysis: performance evaluation and usage guidelines. Brief Bioinform. 2012;13:281–291. doi: 10.1093/bib/bbr049
21. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001;29:365–371. doi: 10.1038/ng1201-365
22. Tan TW, Tong JC, Khan AM, de Silva M, Lim KS, Ranganathan S. Advancing standards for bioinformatics activities: persistence, reproducibility, disambiguation and Minimum Information About a Bioinformatics investigation (MIABi). BMC Genomics. 2010;11:S27. doi: 10.1186/1471-2164-11-S4-S27
23. Peng RD. Reproducible research in computational science. Science. 2011;334:1226–1227. doi: 10.1126/science.1213847
24. Felisbino MB, Ziemann M, Khurana I, Okabe J, Al-Hasani K, Maxwell S, et al. Valproic acid influences the expression of genes implicated with hyperglycaemia-induced complement and coagulation pathways. Sci Rep. 2021;11:2163. doi: 10.1038/s41598-021-81794-4
25. Lund K, Cole JJ, VanderKraats ND, McBryan T, Pchelintsev NA, Clark W, et al. DNMT inhibitors reverse a specific signature of aberrant promoter DNA methylation and associated gene silencing in AML. Genome Biol. 2014;15:406. doi: 10.1186/s13059-014-0406-2
26. Rafehi H, Balcerczyk A, Lunke S, Kaspi A, Ziemann M, Kn H, et al. Vascular histone deacetylation by pharmacological HDAC inhibition. Genome Res. 2014;24:1271–1284. doi: 10.1101/gr.168781.113
27. Keating ST, Ziemann M, Okabe J, Khan AW, Balcerczyk A, El-Osta A. Deep sequencing reveals novel Set7 networks. Cell Mol Life Sci. 2014;71:4471–4486. doi: 10.1007/s00018-014-1651-y
28. Lopez Sanchez MIG, Van Bergen NJ, Kearns LS, Ziemann M, Liang H, Hewitt AW, et al. OXPHOS bioenergetic compensation does not explain disease penetrance in Leber hereditary optic neuropathy. Mitochondrion. 2020;54:113–121. doi: 10.1016/j.mito.2020.07.003
29. Blanco-Melo D, Nilsson-Payant BE, Liu WC, Uhl S, Hoagland D, Møller R, et al. Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19. Cell. 2020;181:1036–1045.e9. doi: 10.1016/j.cell.2020.04.026
30. Rafehi H, Kaspi A, Ziemann M, Okabe J, Karagiannis TC, El-Osta A. Systems approach to the pharmacological actions of HDAC inhibitors reveals EP300 activities and convergent mechanisms of regulation in diabetes. Epigenetics. 2017;12:991–1003. doi: 10.1080/15592294.2017.1371892
31. Sood S, Szkop KJ, Nakhuda A, Gallagher IJ, Murie C, Brogan RJ, et al. iGEMS: an integrated model for identification of alternative exon usage events. Nucleic Acids Res. 2016;44:e109. doi: 10.1093/nar/gkw263
32. Chung M, Bruno VM, Rasko DA, Cuomo CA, Muñoz JF, Livny J, et al. Best practices on the differential expression analysis of multi-species RNA-seq. Genome Biol. 2021;22:121. doi: 10.1186/s13059-021-02337-8
33. Sha Y, Phan JH, Wang MD. Effect of low-expression gene filtering on detection of differentially expressed genes in RNA-seq data. Annu Int Conf IEEE Eng Med Biol Soc. 2015;2015:6461–6464. doi: 10.1109/EMBC.2015.7319872
34. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–7. doi: 10.1093/nar/gkw377
35. Blake JA. Ten quick tips for using the gene ontology. PLoS Comput Biol. 2013;9:e1003343. doi: 10.1371/journal.pcbi.1003343
36. Bauer S. Gene-Category Analysis. Methods Mol Biol. 2017;1446:175–188. doi: 10.1007/978-1-4939-3743-1_13
37. Ziemann M, Kaspi A, El-Osta A. Digital expression explorer 2: a repository of uniformly processed RNA sequencing data. Gigascience. 2019;8:giz022. doi: 10.1093/gigascience/giz022
38. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8
39. Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48:D498–D503. doi: 10.1093/nar/gkz1031
40. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–287. doi: 10.1089/omi.2011.0118
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009935.r001

Decision Letter 0

Ilya Ioshikhes, Melissa L Kemp

6 Jan 2022

Dear Dr Ziemann,

Thank you very much for submitting your manuscript "Urgent need for consistent standards in functional enrichment analysis" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Melissa L. Kemp, Ph.D.

Associate Editor

PLOS Computational Biology

Ilya Ioshikhes

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Enrichment analysis is widely used to interpret high-throughput omics data in terms of functional categories and pathways. In this study the authors laboriously surveyed 1630 genomics papers to assess whether enrichment analyses are conducted properly. Of these papers, 186 were studied in detail as they were cross-checked by at least two team members. Their main finding is that only 15% of screened analyses are conducted and documented properly. The main issues in the rest are not using or reporting background genes (95%), failure to correct for multiple testing (43%), and pooling up- and down-regulated genes. The authors also analyzed an RNA-seq dataset to demonstrate how enrichment results differ when the analysis is done inappropriately. As the first large-scale survey of its kind, this article sounds the alarm, once again, on the widespread abuse of statistics by the biomedical and genomics research community. The incompetency and the carelessness call for urgent action to improve reproducibility.

Fig 3B and 3F could probably be presented more effectively using boxplots/violin plots, treating analysis score as categories. Overlapping dots, even with a color gradient, are hard to interpret.

Annotation databases for some online tools are not updated frequently. For example, the database for DAVID was last updated in 2016. See https://david.ncifcrf.gov/content.jsp?file=release.html The authors could discuss how obsolete annotations affect the results of enrichment analysis.

Reviewer #2: Wijesooriya et al. Urgent need for consistent standards in functional enrichment analysis

General comments

The Ziemann team address a fundamental scientific issue in the field of bioinformatics. Building on earlier commentaries and case examples they systematically address and quantify the extent of the ‘erroneous pathway p-value issue’ in biomedical research. Their conclusions are not surprising to this reviewer, yet these results make urgent and essential reading for all reviewers, journal editors and biomedical researchers working with OMIC data.

There is one main and easy-to-solve flaw in their article, and that pertains to an argument about combining (or not) up- and down-regulated genes. There is no logic or rule for this, and in biological pathways combining is completely acceptable, as you can have both positive and negative regulators in a list from one pathway, differentially expressed in the opposite direction. Thus the interpretation of the statistical work by Hong G et al 2014 is wrong – unsurprisingly, as they understand little biology based on a reading of their article.

Thus an up-regulation of a positive actor or down-regulation of a negative actor – in the same pathway – can equate to the same biochemical outcome. Splitting lists also actually leads to two other problems. First, with some technologies detecting down-regulation is more difficult (signal related), and second is an issue related to gene list size. It's well appreciated that small lists, especially if enrichment ratios + p-values, and/or bootstrapping are considered, are not sensitively profiled. Splitting a biologically relevant list into up and down impacts this issue in an unpredictable manner, depending on gene list size and content. This particular section needs to be rewritten or deleted as it's wrong. Your results – see below – actually stumble on this issue.

Specific comments

The majority of my comments are minor and relate to ensuring clarity of message and identification of any statement that could be misconstrued (or is not entirely accurate).

P3 ‘Instead, many tools have been developed to summarise gene profiles into simplified functional categories’

I would briefly mention the history behind the Gene Ontology Consortium and their project, to provide context as to how this catalogue of processes/pathways emerged. http://geneontology.org/docs/introduction-to-go-resource/

It's possibly informative to reflect on the fact that this grew out of a need to catalogue the genome (with the incorrect assumption that all genes were equally characterizable in a genome-wide study). So perhaps they never thought about a variable detectable background.

You should probably mention that the Gene Ontology Consortium current link to their “10 tips for GO” does not mention the word ‘background’ even once. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003343

This is a paragraph from a chapter of ‘The Gene Ontology Handbook’, 2017. They STILL don’t correctly explain the background bias issue.

“Such large changes in GO annotations can affect GO enrichment analyses, which are sensitive to the choice of background distribution (Chap. 13 [3]; [20]). For instance, Clarke et al. [21] have shown that changes in annotations contribute significantly to changes in overrepresented terms in GO analysis. To mitigate this problem, researchers should analyze their datasets using the most up-to-date version of the ontology and annotations, and ensure that the conclusions they draw hold across multiple recent releases. At the time of the writing of this chapter, DAVID, a popular GO analysis tool, had not been updated since 2009”

P3 ‘developed to summarise gene profiles into simplified functional categories’

Try – ‘developed to summarise regulated gene expression profiles into simplified functional categories’

P4 ‘A statistical test is performed to ascertain whether the number of DEGs belonging to a particular gene set is higher than that expected by random chance, as determined by comparison to a background gene list. These ORA tools can be stand-alone software packages or web services, and they use one or more statistical tests (eg: Fisher’s exact test, hypergeometric test) [1,2]’.

I have two comments here.

Most ignore the enrichment ratio (which helps inform about bias related to very large (usually uninformative) gene sets), e.g. a significant p-value and an ER of 1.1 is not a meaningful result (given all the sources of bias). I would explicitly mention that ‘ideally both a significant adjusted p-value and a robust enrichment ratio (ratio of regulated genes to total genes in that group) are sought’.

Second point is that I believe it is essential to mention the published objections to the use of Fisher’s exact test (or worse still, the hypergeometric test), in that they assume independence, and for members of gene sets that is not true. I would state there are “general concerns” about the weakness of the primary statistical methods. The reason I mention this is some colleagues won’t engage on the larger issue of background bias or lack of FDR use, because they object to the primary statistical method. Better to state you recognise their concerns with a sentence and citation.

A good review article to cite on the GO stats would be https://www.nature.com/articles/nrg2363

Also note this sentence from the review.

“In practice, a term would need to have a raw p-value less than 4 × 10^−7 for it to be significant at the 1% significance level. Other corrections, such as Holm’s [26] and false discovery rate [27], are less conservative but loss of power cannot be completely avoided (see Refs 28, 29 for further reviews). Hence, as a general rule, one can increase the power of the statistical analysis by performing the fewest possible number of tests.”

This review deals with things at a theoretical level and does not do the essential work you provide in your article – namely, establishing the scale of the problem. It is worrying, however, how little of this review from 2008 has been recognised, compared with the use of GO tools.

Hammering home that a p-value of 1 × 10^−6 BEFORE correction is likely to be required will provide some reality check to those using GO tools and the designers of software – as they are the MAIN problem today. If your tool does not give good results, people will not use it... that’s clearly an issue.

Secondly, many select EVERY GO class, or GO + KEGG + XYZ, when using online tools – this is clearly a flawed approach, yet software tools enable this mistake.

A further point that you may wish to consider specifically in relation to microarrays and RNAseq. The modern microarray provides greater coverage (See Sood et al Nucleic Acid research 2016 and Timmons et al Aging Cell 2019 and supplement), and are more sensitive (See Fig 2e, Peters TJ Bioinformatics 2019) when profiling individual human tissues than RNAseq.

RNA-seq has library issues (PCR bias and its incomplete nature mean a gene may not ‘appear’) and also a serious reproducibility problem (see Supplement S22 of the Sequencing Quality Control Consortium, Nature Biotechnology 2014, 32(9)), meaning that the certainty of the background becomes less clear than for a modern array processed with modern methods (see Mandelboum et al, PLoS Biology 2019).

Single Cell RNAseq becomes an even greater issue – here coverage is normally between 2000-5000 genes and heavily biased for high abundance genes esp. mitochondrial (See the Gene Ontology data buried in the supplement of Mereu et al Nature Biotechnology June 2020).

In short, for sequencing experiments the GO/Pathway background issue just became a far greater problem. This is important.

Page 6 “From these, we initially selected 200 articles for detailed methodological analysis.”

Please clarify why and how you selected the 200 for the summary chart arm of your flow chart.

I am curious that so few Nature Communications articles appear in your analysis. This is the largest single source of OMIC papers I encounter, and almost without exception there are no clear methods and there are obvious flaws in the GO/pathway analysis. The other journals that have a high frequency of flawed GO/pathway analysis are the American Journal of Physiology family of journals.

In that sense Fig S1A is not helpful, as it is not an unbiased or full representation of the sources of the problem – and might give the wrong impression.

P8 During this survey, we noticed some studies grouped up- and down-regulated gene lists together prior to ORA, a practice we were not expecting.

As mentioned above you have made a mistake in logic, regarding up and down-regulated gene lists and their combination. See above.

P11. Figure 3D is remarkable – BMC Bioinformatics being one of the lowest scoring journals. I am glad I resigned from their editorial board 8yr ago (for poor quality editorial processes).

P12. The RNA-seq data you analyse using only an FDR filter, with >20% of genes DE, raises issues about normalisation if such a large percentage of genes are genuinely DE. I personally would check implementing a modest FC filter on top of the p-value to enrich for true positives and confirm that you still have only moderate concordance. In short, if you put a lot of ‘junk’ in, you cannot expect reliable data out, and it's best to avoid any criticism of this analysis.

P12 “Interestingly, 26 gene sets were simultaneously up and downregulated with this approach.”

This is probably a reflection of the issue of splitting up- and down-regulated lists, as I mentioned. The direction does not come from GO; it comes from your assumption that up means “process up” and vice versa. This cannot be reliably concluded without detailed inspection as, as stated, loss of a negative regulator results in up-regulation of pathway function. You need to rethink this part of your paper, otherwise you are jeopardising the validity of your article. Likewise, you need to rethink how you cite the flawed conclusions made by Hong 2014.

14. Hong G, Zhang W, Li H, Shen X, Guo Z. Separate enrichment analysis of pathways for up- and downregulated genes. J R Soc Interface. 2014;11: 20130950.

Page 15

“This is despite ORA tools being reported to lack sensitivity when compared to FCS according to previous benchmarking studies [7-9]”

“Although analyses involving GSEA scored better overall, they were not free of issues. “

You should probably consider that FCS/GSEA represent a different set of biases (depending on the origin and date of the gene sets) than ORA GO-type analysis. Arguing that FCS is more sensitive is partly a reflection of using the KS statistic, and there are strong arguments against the robustness of KS (and several iterations trying to address them). On the other hand, being able to show that the accumulation of a large number of small changes in a pathway might be biologically sound is attractive. I would present + and – rather than just that FCS is ‘better’ based on sensitivity.

Methods

Note that the journal scoring metric is logical but ad hoc, and that it is not scaled by the magnitude of impact of each error on the results.

Human nature is to say “I followed almost all the criteria so I am doing well” when they could miss the most damaging rules, e.g. no FDR and inappropriate background. I’d state you are not implying that all scoring criteria are equal.

From peer reviewing, the number of times I have informed an author that they get “relevant pathways” (to their tissue/biology) because they compare with a genome-wide background, which creates fake enriched p-values – only to have this ignored and the fake enriched p-values published – is substantial.

Defining the correct RNA-seq background is particularly challenging, as many samples can have essentially zero counts for a particular gene (unrelated to group membership), and yet it is called detected by some % call. Proportion calls within groups (block design) are required. You might wish to write something more about the sources of background bias, or cite Timmons 2015, which lists them.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Xijin Ge

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009935.r003

Decision Letter 1

Ilya Ioshikhes, Melissa L Kemp

18 Feb 2022

Dear Dr Ziemann,

We are pleased to inform you that your manuscript 'Urgent need for consistent standards in functional enrichment analysis' has been provisionally accepted for publication in PLOS Computational Biology. The reviewers appreciated your attention to their critical comments and felt your revised, focused version would be an invaluable resource to the journal's readership.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Melissa L. Kemp, Ph.D.

Associate Editor

PLOS Computational Biology

Ilya Ioshikhes

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed my concerns. I also like that the authors chose to focus on specific issues rather than attacking many things at the same time. The manuscript is much improved.

Regarding the pooling of up- and down-regulated genes for ORA, my opinion differs slightly from that of Reviewer #2. I think that, in most cases, splitting up- and down-regulated genes is recommended. The two pitfalls of splitting listed by Reviewer #2 (difficulty in detecting down-regulated genes, and list size) are not frequently encountered in RNA-seq studies. Yes, a pathway can sometimes contain both positive and negative regulators. But pooling these gene lists introduces more noise and, as far as I can see, is not a common practice. I agree with many of the excellent, constructive points of Reviewer #2, but authors should be allowed to express their opinions.

Reviewer #2: Dear Authors

Thanks for your responses and adjustments. I think the article is now in excellent shape.

One technical point, which you are free to consider, is the fold-change filter. There are extensive studies from the older microarray field demonstrating the impact of a combined FC and p-value threshold on FDR control for individual genes (see the work around the time of Choe et al.). In terms of ontology analysis, you can observe this effect empirically yourself: if you have a large list motivated by p-values only (e.g. n = 1500) and you compare the GO profile obtained with and without an FC filter (e.g. down to n = 750), you will note a substantial impact on GO pathway enrichment (often losing large GO categories with modest enrichment ratios). An FC filter is also less arbitrary than p < 0.05, in that you decide to filter based on the technical performance of the platform. For an array, that could be a >10% shift in signal; for sequencing, it is probably closer to 20%.

An FC filter would impact your results, but it is up to you!
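
For readers wanting to try the reviewer's empirical check, a toy sketch follows: a minimal one-sided Fisher's exact ORA applied to a p-value-only list and to the same list after an FC filter. All inputs are hypothetical placeholders to be replaced with real data:

    from scipy import stats

    def ora_p(gene_list, gene_set, background):
        # One-sided Fisher's exact test for one gene set (minimal ORA)
        gl, bg = set(gene_list), set(background)
        gs = set(gene_set) & bg
        a = len(gl & gs)          # list genes inside the set
        b = len(gs - gl)          # set genes outside the list
        c = len(gl - gs)          # list genes outside the set
        d = len(bg) - a - b - c   # remaining background genes
        return stats.fisher_exact([[a, b], [c, d]], alternative="greater")[1]

    # Placeholder inputs mimicking the reviewer's scenario: a p-value-only
    # list versus the same list after an FC filter, tested against toy
    # gene sets on a detected background
    background = [f"g{i}" for i in range(2000)]
    go_sets = {"GO:term_A": background[:400], "GO:term_B": background[1000:1040]}
    p_only = background[:150]      # stands in for the p-value-only list
    p_plus_fc = background[:75]    # stands in for the FC-filtered list

    for term, genes in go_sets.items():
        print(term, ora_p(p_only, genes, background), ora_p(p_plus_fc, genes, background))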

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009935.r004

Acceptance letter

Ilya Ioshikhes, Melissa L Kemp

4 Mar 2022

PCOMPBIOL-D-21-02205R1

Urgent need for consistent standards in functional enrichment analysis

Dear Dr Ziemann,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. A survey of 186 articles describing functional enrichment results in TSV format.

    (TSV)

    S2 Table. A survey of 1300 articles describing functional enrichment results in TSV format.

    (TSV)

    S1 Fig. General information about the analyses that underwent screening.

    (A) The most highly represented journals in the article set. (B) The most highly represented omics types used for enrichment analysis. (C) The most highly represented organisms under study.

    (EPS)

    Attachment

    Submitted filename: Reviewer comments v2.pdf

    Data Availability Statement

    All data and code to support this work are available from GitHub (https://github.com/markziemann/SurveyEnrichmentMethods). We have also used Zenodo to assign a DOI to the repository: 10.5281/zenodo.5763096.

