netboxr: Automated discovery of biological process modules by network analysis in R

Eric Minwei Liu; Augustin Luna; Guanlan Dong; Chris Sander

doi:10.1371/journal.pone.0234669

PLoS One. 2020 Nov 2;15(11):e0234669. doi: 10.1371/journal.pone.0234669

Editor: Tao Huang

PMCID: PMC7605689 PMID: 33137091

Abstract

Summary

Large-scale sequencing projects, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), have generated high throughput sequencing and molecular profiling data sets, but it is still challenging to identify potentially causal changes in cellular processes in cancer as well as in other diseases in an automated fashion. We developed the netboxr package written in the R programming language, which makes use of the NetBox algorithm to identify candidate cancer-related functional modules. The algorithm makes use of a data-driven, network-based approach that combines prior knowledge with a network clustering algorithm, obviating the need for and the limitation of independently curated functionally labeled gene sets. The method can combine multiple data types, such as mutations and copy number alterations, leading to more reliable identification of functional modules. We make the tool available in the Bioconductor R ecosystem for applications in cancer research and cell biology.

Availability and implementation

The netboxr package is a free, open-source R package released under the GNU GPL-3 license. It is available at https://www.bioconductor.org/packages/release/bioc/html/netboxr.html

Introduction

Large-scale sequencing consortia such as The Cancer Genome Atlas (TCGA) [1] and the International Cancer Genome Consortium (ICGC) [2] provide detailed genomic alteration profiling in many cancer types. Many methods based on the recurrence of genomic alterations, i.e., the frequency of occurrence in sets of tumor samples, have been developed to identify alterations likely to be functional in oncogenesis or cancer progression, addressing an important question in the field of precision oncology [3]. However, due to the considerable patient-to-patient heterogeneity of the cancer genome, rare mutations in certain patients can still be involved in tumor development by affecting biological processes in ways similar to those of known cancer genes. One way to address the issue of the effect of rare mutations is to combine prior knowledge of genetic and molecular interactions with recurrence-based methods and thus increase the power of predictions despite relatively low recurrence counts. In this spirit, we have developed the NetBox algorithm that seeks to automate the identification of candidate oncogenic processes and involved genes, which allows the quantitative analysis of genomic alterations in the context of known signaling pathway connectivity [4]. The NetBox algorithm identifies potentially novel network modules by mapping genomic alterations onto a comprehensive prior-knowledge interaction network, containing nodes and their interactions (edges), and then identifying modules as clusters of connected nodes that are frequently affected by genomic alterations as a set. The aggregation of nodes into clusters overcomes the statistical problem of low counts for individual nodes. This is in contrast to methods such as gene set enrichment analysis (GSEA), a popular approach to associate a gene list with biological functions, which relies on curated, pre-defined clusters of genes and does not make use of the often known interactions between genes or gene products. 
Unlike GSEA, NetBox is not limited to nor influenced by curated gene sets for module discovery and overcomes the issue of the occurrence of genes in more than one of the curated gene sets (Fig 1). Instead, the NetBox algorithm derives network modules de novo, based on the alteration data in tumor samples, such that the identified modules can identify new functional gene groups that cross the boundaries of curated gene sets. For newly discovered modules, the functional annotation can then be derived from the annotation of the gene members, providing potentially novel hypotheses about cellular processes that matter in the system from which the alteration data is derived.

Fig 1 — A) The key difference between gene set enrichment analysis (GSEA)/hypergeometric tests and NetBox pathway discovery. Nodes are genes or gene products (red: altered in the dataset; black: unaltered). Edges are known interactions. Gene sets (top) typically do not contain interactions and if they do, these are not used in GSEA. Interactions are explicitly used in NetBox to infer functional modules. B) NetBox workflow for, e.g., alterations in cancer genomics data. Other types of data, such as mRNA expression profiles and proteomics profiles, can be equally accommodated as input.

To extend the use of NetBox, we have implemented the NetBox algorithm as a native R package, netboxr. The netboxr package provides users with access to the NetBox algorithm within the R ecosystem; because it uses common R data structures, its results can be passed directly to other R packages for visualization and secondary analyses. Here, we describe the use of netboxr to integrate various types of genomic alterations for the detection of potentially functional network modules, using glioblastoma multiforme (GBM) as an example, and highlight netboxr tutorial material for combining the netboxr package with additional R packages.

Methods

Implementation

The netboxr package implements the original NetBox algorithm for the discovery of pathway modules [4] using the R programming language and adds several functions to communicate with other packages in the R ecosystem and to integrate several input data types.

Base functionality and algorithm

netboxr takes genes that are significantly altered by mutations, copy number alteration associated with gene expression change, or possibly changes in other data types, as input for identification of pathway modules. As a first step in the analysis, the input group of altered genes is mapped onto an interaction network from a comprehensive knowledge base of interactions; sources for networks of interactions include Pathway Commons, accessed using the paxtoolsr package. In NetBox, to account for the incompleteness of current knowledge, candidate linker genes are defined as genes that do not have alterations but are direct neighbors of altered genes. In the second step, a hypergeometric test is used to determine the probability that a given candidate linker node has x or more interactions with nodes in the input gene list, Pr(X≥x), where x is the observed number of interactions between the candidate linker node and altered genes in the input list. This probability is taken as the p-value; significant p-values indicate that candidate linker genes are involved in the relevant biological processes along with the input genes. P-values of the candidate linker genes are corrected by the Benjamini-Hochberg method. Linker genes with an adjusted p-value equal to or less than 0.05 are counted as significantly connected linker genes. Third, a community detection algorithm (e.g., the Girvan–Newman edge betweenness algorithm) is applied to the extended network of connected altered genes and linker genes to identify network modules as connected clusters of genes; netboxr offers the edge-betweenness and leading eigenvector (for networks with large numbers of nodes) algorithms for this step. 
As newly configured modules lack functional labels, the modules identified by netboxr can then be passed to enrichment packages, such as the clusterProfiler package, to characterize the modules in terms of functional Gene Ontology (GO) terms (i.e., biological process, molecular function, or cellular component), to complement the functional annotation of individual genes with an overall functional label for the set of interacting genes, which we call pathway modules.
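The linker-gene step described above can be illustrated with a minimal Python sketch. This is not the netboxr implementation (netboxr is an R package); the function names, the toy network, and the gene labels are hypothetical, and the choice of population (all nodes other than the candidate linker) is an assumption made for illustration only.

```python
from math import comb


def hypergeom_tail(x, population, successes, draws):
    """Return Pr(X >= x) for a hypergeometric random variable X."""
    total = comb(population, draws)
    upper = min(draws, successes)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(x, upper + 1)
    ) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_linkers(network, altered, alpha=0.05):
    """network maps each gene to the set of its neighbours (undirected)."""
    altered = set(altered) & set(network)
    # Assumption: a linker's neighbours are drawn from all other nodes.
    population = len(network) - 1
    # Candidate linkers: unaltered genes with at least one altered neighbour.
    candidates = sorted(
        gene for gene, nbrs in network.items()
        if gene not in altered and nbrs & altered
    )
    pvalues = [
        hypergeom_tail(len(network[g] & altered), population,
                       len(altered), len(network[g]))
        for g in candidates
    ]
    adjusted = benjamini_hochberg(pvalues)
    return {g: q for g, q in zip(candidates, adjusted) if q <= alpha}


# Toy network with hypothetical gene names: L touches all three altered
# genes, while the hub H touches one altered gene and ten background genes.
edges = [("L", "A1"), ("L", "A2"), ("L", "A3"), ("H", "A1")]
edges += [("H", f"B{i}") for i in range(1, 11)]
toy_network = {}
for u, v in edges:
    toy_network.setdefault(u, set()).add(v)
    toy_network.setdefault(v, set()).add(u)
for i in range(11, 16):
    toy_network.setdefault(f"B{i}", set())  # isolated background genes

linkers = significant_linkers(toy_network, {"A1", "A2", "A3"})
print(linkers)
```

In this toy example only L is retained as a linker (adjusted p-value of roughly 0.002). The hub H has an altered neighbour but is not significant, because a node with many interactions is expected to touch an altered gene by chance.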

Assessment of statistical significance

Two statistical tests are performed on the identified network modules to assess the significance of the identified network (i.e., the identified network is the entirety of the network defined by the altered gene nodes, linker nodes, and their connecting edges). These tests were conducted in a similar manner as for the original NetBox algorithm [4]. To assess the level of global connectivity, an empirical p-value is calculated by determining the number of times the size of the largest connected component (the largest network component can be composed of multiple modules) identified from the same number of randomly selected genes equals or exceeds the size of the largest connected component from the list of altered genes in the data set. Next, a network modularity score is calculated [5]. This score represents the strength (or quality) of the division of a network into various modules and is defined as the edge fraction that is within given modules minus an expected fraction by randomly distributing the edges. To assess the statistical significance of the network modularity observed in the resulting network, we used a local rewiring algorithm where random networks are generated that maintain the same size and all genes maintain the same degree, but the choice of interaction partners is random. For each of these random networks, we calculate the network modularity score and calculate the average and standard deviation for a set of random networks. The observed modularity score is then converted into a z-score (and reported as a p-value) to measure the deviation of the observed network modularity from that of the random null model.
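The two quantities described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the netboxr code: random gene sets are drawn uniformly from all network nodes, linker genes are ignored, the module partition is supplied by the caller, and the degree-preserving rewiring null model for the modularity z-score is not implemented. All function and variable names are hypothetical.

```python
import random


def largest_component_size(network, nodes):
    """Size of the largest connected component induced by `nodes`."""
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first traversal restricted to `nodes`
            node = stack.pop()
            size += 1
            for nb in network[node] & nodes:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def global_connectivity_pvalue(network, altered, n_iter=1000, seed=0):
    """Fraction of random gene sets whose largest component is at least
    as large as the one observed for the altered genes."""
    rng = random.Random(seed)
    universe = sorted(network)
    observed = largest_component_size(network, altered)
    hits = sum(
        largest_component_size(network, rng.sample(universe, len(altered)))
        >= observed
        for _ in range(n_iter)
    )
    return hits / n_iter


def modularity(edges, membership):
    """Newman-Girvan modularity: for each module, the fraction of edges
    inside it minus the fraction expected if edges were placed at random."""
    m = len(edges)
    within, ends = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Toy network: a 50-gene path with the first five genes forming a clique.
toy = {f"g{i}": set() for i in range(50)}
pairs = [(i, i + 1) for i in range(49)]
pairs += [(i, j) for i in range(5) for j in range(i + 1, 5)]
for i, j in pairs:
    toy[f"g{i}"].add(f"g{j}")
    toy[f"g{j}"].add(f"g{i}")

altered_genes = [f"g{i}" for i in range(5)]
p_value = global_connectivity_pvalue(toy, altered_genes)

# Two triangles joined by one edge, split into their two natural modules.
tri_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
tri_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_score = modularity(tri_edges, tri_modules)
print(p_value, round(q_score, 3))
```

For the two-triangle example, the modularity is 6/7 − 1/2 ≈ 0.357. A z-score as described in the text would additionally require the mean and standard deviation of such scores over a set of degree-preserving random networks.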

Implementation details and integration in the R ecosystem

Beyond the base functions, netboxr includes several additions to simplify and expand its use. These include 1) instructions for retrieving and processing genomic alterations such as mutations and copy number alterations from data repositories such as cBioPortal [6, 7] or Genomic Data Commons (GDC) [3], 2) instructions to use pathway data (i.e., genes or gene products with interactions) via the paxtoolsr Pathway Commons package [8] or from resources such as the STRING pathway database [9], 3) functionality to switch between the discovery of modules using various algorithms for detection of network communities from the igraph package [10], and 4) guidance on functionally annotating netboxr-derived modules using the clusterProfiler package [11]. netboxr was implemented in R and can be installed through the BiocManager package manager from Bioconductor (bioconductor.org).

Use case

The package vignette (i.e., tutorial) provides step by step instructions for the usage of the package and exploration of the results. Additionally, within the vignette, we provide instructions for users to generate input data and retrieve input gene lists from publicly available studies on cBioPortal [6, 7].

As a specific use case, we tested the netboxr package using the list of altered genes from TCGA datasets and the prior-knowledge network data from the original NetBox paper [4]. The results reported here are comparable to those from Cerami et al. [4], although the unadjusted p-values for linker genes are not exactly the same. This is because the unadjusted p-values of linker genes in the original NetBox report [4] were calculated as the probability that a given candidate linker node has exactly x interactions, Pr(X = x), where x is the observed number of interactions between a candidate linker node and altered genes in the input list; netboxr instead uses the hypergeometric tail probability Pr(X≥x), as described in the Base functionality and algorithm section. The final number of linker genes, using a significance cutoff of 0.05 after FDR correction, is the same in netboxr and the original NetBox implementation. Using netboxr, we identify the PIK3R1 (Module M1) and RB1 (Module M2) modules, each with connected genes that are significantly altered as a set in the TCGA glioblastoma (GBM) cancer genomics data. Each of these modules is then annotated with brief descriptions through an enrichment step with the clusterProfiler Bioconductor package and gene annotations from the Gene Ontology (Fig 2; S1 File). Module M1 is related to AKT/PKB signaling while M2 is related to cell cycle regulation; brief descriptions of functions for other modules are shown in the figure. This module-driven exploration of the identified network allows a finer-grained understanding of the input gene list than an enrichment analysis over the entire input gene list, through 1) the topology of the identified network connections and 2) the annotation of specific modules (Fig 2, S1 File). Details for this example are provided in the vignette document in the netboxr package. 
Fig 2 demonstrates the use of the netboxr package to discover pathway modules from multiple genomic data types for the example of the TCGA glioblastoma multiforme (GBM) study.
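The difference between the two linker-gene probabilities mentioned in this use case can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the original text: N is the number of nodes a linker's neighbours can be drawn from, K is the number of altered genes among them, n is the degree of the candidate linker, and x is its observed number of altered neighbours.

```latex
% Point probability, as described for the original NetBox report
\Pr(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}

% Upper-tail probability, as described for netboxr
\Pr(X \ge x) = \sum_{i=x}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}

% Worked example with N = 19, K = 3, n = 3, x = 2:
\Pr(X = 2) = \frac{\binom{3}{2}\binom{16}{1}}{\binom{19}{3}} = \frac{48}{969} \approx 0.0495,
\qquad
\Pr(X \ge 2) = \frac{48 + 1}{969} = \frac{49}{969} \approx 0.0506
```

Because the tail sum always includes the point probability as its first term, Pr(X ≥ x) is never smaller than Pr(X = x), which is consistent with the unadjusted p-values differing between the two implementations.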

Fig 2 — The new NetBox algorithm implementation in R uses the igraph library to speed up module detection and visualization. Using mutation and copy number alteration data from TCGA [4], netboxr identified 10 pathway modules. Modules are functionally annotated with a brief description of the module genes using the clusterProfiler Bioconductor package (S1 File). For a more detailed understanding, users should also inspect the function of the genes contained in the modules. In this glioblastoma example, the largest module (M1, light orange background) contains genes related to the PIK3 pathway and functions related to AKT signaling (also known as “PKB signaling” in the Gene Ontology). The second-largest module (M2, light blue background) contains genes related to the TP53 and cell cycle pathways. These two algorithmically inferred main modules are consistent with the intuitively inferred signaling pathways in the original TCGA publication [12].

Conclusion

The netboxr R package facilitates data-driven network module discovery. Here we provide a use case that emphasizes analysis of cancer genomics data, but the methodology is applicable for other diseases or cell biological perturbation experiments with comparable large datasets covering genetic or molecular changes in genes or gene products. With the ease of installation in R bioinformatics environments, researchers can quickly use datasets of molecular measurements to identify pathway modules and form biological hypotheses on the functional role of cellular processes in cancer and other diseases, as well as in healthy tissues. As with any method, users should carefully consider the problems they attempt to address and understand the strengths and limitations before using a particular method. As for the contrast with enrichment analyses, which NetBox is distinct from and complementary to, Reimand et al. present a broader discussion of potential biases, including those in existing pathway databases [13]. The netboxr authors can be contacted to explore collaboration in module discovery in large scale perturbation-response datasets of interest.

Supporting information

S1 File

(DOCX)

Acknowledgments

We would like to thank The Cancer Genome Atlas community of researchers for providing the genomic data used in the netboxr package vignette.

Data Availability

The data underlying the results presented in the study are available from the netboxr package found at https://doi.org/10.18129/B9.bioc.netboxr.

Funding Statement

This research was supported by the US National Institutes of Health grant (U41 HG006623-02), the Ruth L. Kirschstein National Research Service Award (F32 CA192901), and through funding for the National Resource for Network Biology (NRNB) from the National Institute of General Medical Sciences (NIGMS -P41 GM103504). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Chang K, Creighton CJ, Davis C, Donehower L, Drummond J, Wheeler D, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45: 1113–1120. doi:10.1038/ng.2764
2. Hudson (Chairperson) TJ, Anderson W, Aretz A, Barker AD, Bell C, Bernabé RR, et al. International network of cancer genome projects. Nature. 2010;464: 993–998. doi:10.1038/nature08987
3. Jensen MA, Ferretti V, Grossman RL, Staudt LM. The NCI Genomic Data Commons as an engine for precision medicine. Blood. American Society of Hematology; 2017. pp. 453–459. doi:10.1182/blood-2017-03-735654
4. Cerami E, Demir E, Schultz N, Taylor BS, Sander C. Automated network analysis identifies core pathways in glioblastoma. PLoS One. 2010;5. doi:10.1371/journal.pone.0008918
5. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E—Stat Nonlinear, Soft Matter Phys. 2004;69: 026113. doi:10.1103/PhysRevE.69.026113
6. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio Cancer Genomics Portal: An open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2: 401–404. doi:10.1158/2159-8290.CD-12-0095
7. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6: pl1–pl1. doi:10.1126/scisignal.2004088
8. Rodchenkov I, Babur O, Luna A, Aksoy BA, Wong JV, Fong D, et al. Pathway Commons 2019 Update: Integration, analysis and exploration of pathway data. Nucleic Acids Res. 2020;48: D489–D497. doi:10.1093/nar/gkz946
9. Snel B, Lehmann G, Bork P, Huynen MA. String: A web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 2000;28: 3442–3444. doi:10.1093/nar/28.18.3442
10. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006;Complex Sy: 1695. Available: http://igraph.sf.net
11. Yu G, Wang LG, Han Y, He QY. ClusterProfiler: An R package for comparing biological themes among gene clusters. Omi A J Integr Biol. 2012;16: 284–287. doi:10.1089/omi.2011.0118
12. Brennan CW, Verhaak RGW, McKenna A, Campos B, Noushmehr H, Salama SR, et al. The Somatic Genomic Landscape of Glioblastoma. Cell. 2013;155: 462–477. doi:10.1016/j.cell.2013.09.034
13. Reimand J, Isserlin R, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc. 2019;14: 482–517. doi:10.1038/s41596-018-0103-9

PLoS One. doi: 10.1371/journal.pone.0234669.r001

Decision Letter 0

Tao Huang

23 Jul 2020

PONE-D-20-16149

NetBoxR: Automated Discovery of Biological Process Modules by Network Analysis in R

PLOS ONE

Dear Dr. Luna,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 06 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Tao Huang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. During our initial internal evaluation of your submission we noticed several typos in your manuscript.

Please note that PLOS ONE does not provide copyediting or proofs of accepted manuscripts.

We therefore recommend that you carefully review your manuscript and correct any errors at this time.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

'Funding

This research was supported by the US National Institutes of Health grant (U41 HG006623-02), the Ruth L. Kirschstein National Research Service Award (F32 CA192901), and through funding for the National Resource for Network Biology (NRNB) from the National Institute of General Medical Sciences (NIGMS -P41 GM103504).'

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

a. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

'The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

b. Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors developed an R package named NetBoxR that makes use of the NetBox algorithm to identify candidate cancer-related processes. The algorithm makes use of a network-based approach that combines prior knowledge with a network clustering algorithm. This program can combine multiple data types, such as mutations and copy number alterations, leading to more reliable identification of functional modules. This tool can be useful for the research community to identify the functional modules in cancer data analysis.

I have some minor concerns:

1. The paper says, “NetBoxR takes significantly altered genes from SNV mutation, copy number alteration associated with gene expression change, and merges them into a gene list for identification of pathway modules.” How can users define the “significantly altered genes from SNV mutation, copy number alteration”? As with finding differentially expressed genes from expression profiles, how should the variants be taken into account to find these genes? The authors may provide some guidance on choosing these genes for better performance in NetBoxR.

2. From the methodology side, there are differences between the GSEA algorithm and NetBox, and it seems that NetBox overcomes some shortcomings. I suggest that the authors compare the results from GSEA and NetBox to show the new findings from NetBox that cannot be obtained from GSEA.

3. In the online tutorial, the results from NetBox were used for GO enrichment. Why not perform the GO enrichment first, using the original gene list? What are the advantages of using NetBoxR as a pre-processing step before the GO enrichment?

4. The result figure (paper, online) shows some modules, but there is no annotation of these modules. The authors should show the functional annotation for the modules found by their program in the result figures.

5. Minor: the text in the original figure can be seen clearly, but I cannot read any words in the figure embedded in the PDF file.

Reviewer #2: Liu et al present the R package NetBoxR that implements the NetBox algorithm previously described in an article by Cerami et al (2010). This article presents little in terms of new methods, data or results; the main message is the implementation of the NetBox algorithm in R rather than a standalone command-line application.

The article is mostly written well and is easy to understand, at least if the reader is willing to refer to the Cerami et al work for some of the details. There are some sentences that could use rewording to make them clearer, see below.

In terms of methods, I am concerned mainly about two issues:

1. NetBox and NetBoxR evaluate the significance of the found network by focusing on the largest module only. It would be beneficial to implement a method that can determine significance of each module, not just the largest one.

2. The identified modules are based on literature networks, which presumably makes any enrichment analysis in literature sets biased. If I understood the methods correctly, this inflation is the result of not just the connections that come from the literature, but also because linker genes are derived from the literature and included in the modules. It does not appear that the package contains any functions that could quantify and correct the bias.

As for implementation, a cursory reading of the help files indicates that they need to be improved. Reading the help for geneConnector, one learns for example that the argument networkGraph is “an igraph graph object” but it’s unclear what it is supposed to represent. Similarly, argument communityMethod is described as “A string for community detection method c('ebc','lec')”. It is not clear what the shorthands mean. Arguments ‘directed’ and ‘keepIsolatedNodes’, although both logical, are described as “TRUE of FALSE” and “logic value”, respectively; neither has any information on what the TRUE and FALSE values actually determine. Similar comments apply to the description of the output of the function. The individual input arguments and output components need a description of what they represent, not just what type of R object is expected.

The authors say that the package provides “instructions for retrieving and processing genomic alterations such as SNV mutations and copy number alterations from data repositories such as cBioPortal (Cerami et al., 2012; Gao et al., 2013) or Genomic Data Commons (GDC) (Jensen et al., 2017)”. I did not find the instructions; perhaps I missed something in the vignette or in the help files?

Minor issues:

The authors refer to the package as NetBoxR, but the package name on Bioconductor is netboxr (all lowercase). This can cause confusion, e.g., BiocManager::install("NetBoxR") produces an error.

The sentence “ability to retrieve pathway data via the paxtoolsr package as available in ~20 data source aggregation Pathway Commons, (Luna et al., 2016) or STRING (Snel et al., 2000)” does not make sense.

Contrary to what the authors seem to imply, the package does not seem to be available for R-3.6.3; one apparently needs R-4.0.0 or higher.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-20-16149_comments.docx (12.8KB, docx)

PLoS One. 2020 Nov 2;15(11):e0234669. doi: 10.1371/journal.pone.0234669.r002

Author response to Decision Letter 0

21 Sep 2020

Thanks for the constructive feedback. We revised our manuscript accordingly. Please see our response as a separate document.

Attachment

Submitted filename: Response to reviewers.docx (20.7KB, docx)

PLoS One. doi: 10.1371/journal.pone.0234669.r003

Decision Letter 1

Tao Huang

14 Oct 2020

netboxr: Automated discovery of biological process modules by network analysis in R

PONE-D-20-16149R1

Dear Dr. Luna,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Tao Huang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: I did not find the response-to-comments file in the review system (I am not sure whether the authors forgot to submit it or something is wrong with the review system).

I have read through the paper, and I can see that the authors have substantially upgraded their netboxr package and refined the manuscript. It can be accepted.

One minor comment: There are too many keywords.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Attachment

Submitted filename: PONE-D-20-16149_R1_comments.docx (11.7KB, docx)

PLoS One. doi: 10.1371/journal.pone.0234669.r004

Acceptance letter

Tao Huang

22 Oct 2020

PONE-D-20-16149R1

netboxr: Automated discovery of biological process modules by network analysis in R

Dear Dr. Luna:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Tao Huang

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File (DOCX, 403.7KB)

Data Availability Statement

The data underlying the results presented in the study are available from the netboxr package found at https://doi.org/10.18129/B9.bioc.netboxr.

1. Chang K, Creighton CJ, Davis C, Donehower L, Drummond J, Wheeler D, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45: 1113–1120. doi:10.1038/ng.2764

2. Hudson (Chairperson) TJ, Anderson W, Aretz A, Barker AD, Bell C, Bernabé RR, et al. International network of cancer genome projects. Nature. 2010;464: 993–998. doi:10.1038/nature08987

3. Jensen MA, Ferretti V, Grossman RL, Staudt LM. The NCI Genomic Data Commons as an engine for precision medicine. Blood. American Society of Hematology; 2017. pp. 453–459. doi:10.1182/blood-2017-03-735654

4. Cerami E, Demir E, Schultz N, Taylor BS, Sander C. Automated network analysis identifies core pathways in glioblastoma. PLoS One. 2010;5. doi:10.1371/journal.pone.0008918

5. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E - Stat Nonlinear, Soft Matter Phys. 2004;69: 026113. doi:10.1103/PhysRevE.69.026113

6. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio Cancer Genomics Portal: An open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2: 401–404. doi:10.1158/2159-8290.CD-12-0095

7. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6: pl1–pl1. doi:10.1126/scisignal.2004088

8. Rodchenkov I, Babur O, Luna A, Aksoy BA, Wong J V., Fong D, et al. Pathway Commons 2019 Update: Integration, analysis and exploration of pathway data. Nucleic Acids Res. 2020;48: D489–D497. doi:10.1093/nar/gkz946

9. Snel B, Lehmann G, Bork P, Huynen MA. String: A web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 2000;28: 3442–3444. doi:10.1093/nar/28.18.3442

10. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006;Complex Sy: 1695. Available: http://igraph.sf.net

11. Yu G, Wang LG, Han Y, He QY. ClusterProfiler: An R package for comparing biological themes among gene clusters. Omi A J Integr Biol. 2012;16: 284–287. doi:10.1089/omi.2011.0118

12. Brennan CW, Verhaak RGW, McKenna A, Campos B, Noushmehr H, Salama SR, et al. The Somatic Genomic Landscape of Glioblastoma. Cell. 2013;155: 462–477. doi:10.1016/j.cell.2013.09.034

13. Reimand J, Isserlin R, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc. 2019;14: 482–517. doi:10.1038/s41596-018-0103-9
