Bioinformatics. 2020 Apr 9;36(12):3920–3921. doi: 10.1093/bioinformatics/btaa236

MONET: a toolbox integrating top-performing methods for network modularization

Mattia Tomasoni, Sergio Gómez, Jake Crawford, Weijia Zhang, Sarvenaz Choobdar, Daniel Marbach, Sven Bergmann
Editor: Pier Luigi Martelli
PMCID: PMC7320625  PMID: 32271874

Abstract

Summary

We define a disease module as a partition of a molecular network whose components are jointly associated with one or several diseases or risk factors thereof. Identification of such modules, across different types of networks, has great potential for elucidating disease mechanisms and establishing new powerful biomarkers. To this end, we launched the ‘Disease Module Identification (DMI) DREAM Challenge’, a community effort to build and evaluate unsupervised molecular network modularization algorithms. Here, we present MONET, a toolbox providing easy and unified access to the three top-performing methods from the DMI DREAM Challenge for the bioinformatics community.

Availability and implementation

MONET is a command line tool for Linux, based on Docker and Singularity containers; the core algorithms were written in R, Python, Ada and C++. It is freely available for download at https://github.com/BergmannLab/MONET.git.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Gene networks, such as protein interaction, signaling, gene co-expression and homology networks, provide scaffolds of linked genes. Subnetworks, or modules, comprise genes that normally act in concert but whose joint function may be disrupted if any member is missing or dysregulated. For disease modules, this disruption can lead to a disease phenotype. The identification of such modules is therefore useful for elucidating disease mechanisms and establishing new biomarkers and potential therapeutic targets. Yet, which methods work best to extract such modules from different types of networks is not well understood. This prompted us to initiate the ‘Disease Module Identification (DMI) DREAM Challenge’ (Choobdar et al., 2019), providing an unbiased and critical assessment of 75 contributed module identification methods. Our method evaluation used summary statistics from more than 200 disease-relevant genome-wide association studies (GWAS) in conjunction with our Pascal tool (Lamparter et al., 2016), avoiding the bias of using annotated molecular pathways.

The top-performing methods implemented novel algorithms that advanced the state-of-the-art, clearly outperforming off-the-shelf tools. We therefore decided to make the top three methods available to the bioinformatics community in a single user-friendly package. MONET is a command line tool based on Docker and Singularity virtualization technologies; it automatically installs the tool with all its dependencies inside a container, avoiding time-consuming and error-prone manual installations of computing environments and libraries. All computations then take place in this sandbox environment and, once the output is ready, all resources can be fully released, bringing the user’s machine back to its original state.

2 Methods and implementation

While our challenge was able to establish Kernel Clustering Optimization using the Diffusion State Distance (DSD) metric by Cao et al. (2014) (hereafter K1) as the overall winner, there were several strong competitors using entirely different approaches for the network modularization. Importantly, we observed that no single method was superior on all network types and that disease modules identified by different methods were often complementary (Choobdar et al., 2019).

2.1 K1: top method using kernel clustering

K1 is based on the DSD, a novel graph metric built on the premise that paths through low-degree nodes are stronger indications of functional similarity than paths that traverse high-degree nodes (Cao et al., 2014). The DSD metric is used to define a pairwise distance matrix between all nodes, to which a spectral clustering algorithm is applied. In parallel, dense bipartite sub-graphs are identified using standard graph techniques. Finally, results are merged into a single set of non-overlapping clusters.

BLOG: https://www.synapse.org/#!Synapse:syn7349492/wiki/407359.
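To make the distance underlying K1 more concrete, the following is a minimal sketch of a finite-step DSD computation. It assumes the common formulation in which each node is described by the expected number of visits a k-step random walk makes to every other node, and two nodes are compared by the L1 distance between these profiles. The function name `dsd_matrix`, the default `k=5` and the toy graph are illustrative assumptions; this is not the K1 implementation shipped with MONET, and the spectral clustering and bipartite sub-graph steps are omitted.

```python
def dsd_matrix(adj, k=5):
    """Return a k-step Diffusion State Distance matrix (illustrative sketch).

    adj is a symmetric adjacency matrix given as a list of lists; every
    node must have at least one edge.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    if any(d == 0 for d in degrees):
        raise ValueError("every node needs at least one edge")

    # Row-stochastic transition matrix of a simple random walk.
    P = [[adj[i][j] / degrees[i] for j in range(n)] for i in range(n)]

    # he[u][w]: expected number of visits to w within k steps when starting
    # at u (step 0 counts as one visit to u itself).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    he = [row[:] for row in power]
    for _ in range(k):
        power = [[sum(power[i][m] * P[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                he[i][j] += power[i][j]

    # DSD(u, v) is the L1 distance between the two visit profiles.
    return [[sum(abs(he[u][w] - he[v][w]) for w in range(n))
             for v in range(n)] for u in range(n)]


# Toy path graph 0 - 1 - 2 (hypothetical input).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
dsd = dsd_matrix(path, k=5)
```

In this toy graph the two end nodes share the same single neighbor, so their random walks coincide after the first step and their visit profiles differ only in the starting position. A matrix like `dsd` is what a clustering step would consume.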

2.2 M1: top method using modularity optimization

M1 employs an original technique named Multiresolution introduced by Arenas et al. (2008) to explore all topological scales at which modules may be found. The novelty of this approach relies on the introduction of a parameter, called resistance, which controls the aversion of nodes to form modules. Modularity (Arenas et al., 2007; Newman and Girvan, 2004) is optimized using an ensemble of algorithms: extremal optimization (Duch and Arenas, 2005), spectral optimization (Newman, 2006), fast algorithm (Newman, 2004), tabu search (Arenas et al., 2008) and fine-tuning by iterative repositioning of individual nodes in adjacent modules.

BLOG: https://www.synapse.org/#!Synapse:syn7352969/wiki/407384.
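As a rough illustration of how a resistance parameter can change the preferred scale of a partition, the sketch below evaluates modularity on a toy graph. It assumes that resistance is modeled as a self-loop of equal weight added to every node, which is our reading of Arenas et al. (2008). The function name `modularity`, the toy graph and the value `resistance=100.0` are illustrative assumptions, and the code only scores a given partition; it does not implement any of the optimization heuristics used by M1.

```python
def modularity(adj, communities, resistance=0.0):
    """Modularity of a partition, with resistance added as self-loops.

    adj is a symmetric weighted adjacency matrix (list of lists) and
    communities[i] is the module label of node i.
    """
    n = len(adj)
    # Assumption: resistance r acts as a self-loop of weight r on each node.
    a = [[adj[i][j] + (resistance if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    strength = [sum(row) for row in a]
    total = sum(strength)

    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # Observed weight minus the weight expected by chance.
                q += a[i][j] - strength[i] * strength[j] / total
    return q / total


# Toy graph: two triangles joined by a single edge (hypothetical input).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

triangles = [0, 0, 0, 1, 1, 1]
singletons = [0, 1, 2, 3, 4, 5]

q_triangles = modularity(adj, triangles)
q_singletons = modularity(adj, singletons)
q_triangles_r = modularity(adj, triangles, resistance=100.0)
q_singletons_r = modularity(adj, singletons, resistance=100.0)
```

Without resistance the two-triangle partition scores higher than isolated nodes, whereas with a large positive resistance the ordering flips in this toy case. Scanning the parameter in this way is one means of exploring modules at several topological scales.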

2.3 R1: top method using random walk

R1 is based on a variant of the Markov Cluster Algorithm known as balanced Multi-layer Regularized Markov Cluster Algorithm (bMLRMCL) (Satuluri et al., 2010), which scales well to large graphs and minimizes the number of oversized clusters. First, a pre-processing step is applied so that edges with low weights are discarded and all remaining edges are scaled to integer values. Then, bMLRMCL is applied iteratively on modules of size greater than a user-defined threshold.

BLOG: https://www.synapse.org/#!Synapse:syn7286597/wiki/406659.
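The control flow described for R1 (filter, rescale, then re-cluster oversized modules) can be sketched as follows. All names (`preprocess`, `recursive_cluster`, `split_by_weakest_edges`), the scaling rule and the threshold values are hypothetical. In particular, `split_by_weakest_edges` is only a toy stand-in for bMLRMCL, which is not reimplemented here.

```python
def preprocess(edges, min_weight, scale=100):
    """Drop edges below min_weight and rescale the rest to integers >= 1."""
    kept = [(u, v, w) for u, v, w in edges if w >= min_weight]
    if not kept:
        return []
    max_w = max(w for _, _, w in kept)
    return [(u, v, max(1, round(scale * w / max_w))) for u, v, w in kept]


def split_by_weakest_edges(nodes, edges):
    """Toy stand-in for a clustering step: remove the minimum-weight edges
    and return the connected components that remain."""
    neighbors = {node: set() for node in nodes}
    if edges:
        weakest = min(w for _, _, w in edges)
        for u, v, w in edges:
            if w > weakest:
                neighbors[u].add(v)
                neighbors[v].add(u)
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


def recursive_cluster(nodes, edges, cluster_fn, max_size):
    """Re-apply cluster_fn to every module larger than max_size."""
    final, pending = [], [set(nodes)]
    while pending:
        module = pending.pop()
        if len(module) <= max_size:
            final.append(module)
            continue
        inner = [(u, v, w) for u, v, w in edges if u in module and v in module]
        parts = cluster_fn(module, inner)
        if len(parts) <= 1:
            final.append(module)  # cannot be split further; keep as is
        else:
            pending.extend(parts)
    return final


raw_edges = [("a", "b", 0.9), ("c", "d", 0.8), ("b", "c", 0.3), ("a", "d", 0.05)]
scaled = preprocess(raw_edges, min_weight=0.1)
modules = recursive_cluster("abcd", scaled, split_by_weakest_edges, max_size=2)
```

Passing the clustering step in as `cluster_fn` keeps the size-threshold loop independent of the algorithm that actually splits a module.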

3 Performance

Figure 1 illustrates the performance of the MONET algorithms on simulated graphs with planted community structure, generated using the class of benchmark graphs proposed by Lancichinetti et al. (2008). Modularization performance is measured using Normalized Mutual Information (NMI). Experiments were carried out on regular desktop hardware. In accordance with performance evaluations within the DMI DREAM Challenge, K1, the winner, requires the most computational resources, with a runtime of about one day and the highest memory allocation when processing the Challenge inputs. M1, the runner-up, completed the Challenge in a few hours and displayed excellent performance on the simulated benchmark (even superior to K1, especially for an extremely high fraction of inter-module edges) and extremely low memory requirements. R1, the second runner-up, is the only method that requires parameters to be tuned (nine in total); nevertheless, we believe it is an excellent addition to our tool, as it performed close to K1/M1 on the benchmark, requires only moderate memory and has an extremely low run time (it completed the Challenge in under an hour). Please refer to the Supplementary Information for details about the execution time.
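For readers unfamiliar with NMI, the sketch below compares a predicted partition with a planted one. It assumes the normalization 2·I(A;B) / (H(A) + H(B)), which is one of several conventions in use; the text does not state which variant was used for Figure 1. The function name `nmi` and the label vectors are illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two partitions of the same nodes."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the contingency table of the two partitions.
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions put every node in a single module
    return 2 * mutual_info / (entropy_a + entropy_b)


planted = [0, 0, 1, 1]
same_up_to_relabeling = nmi(planted, [1, 1, 0, 0])
unrelated = nmi(planted, [0, 1, 0, 1])
```

The score depends only on which nodes are grouped together, not on the module labels, so a perfect recovery with renamed modules still scores 1, while a partition that is statistically independent of the planted one scores 0.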

Fig. 1.

Comparison of the MONET methods (K1, M1 and R1) against a baseline (Louvain) on simulated graphs with planted community structure. Left: clustering performance (NMI) as a function of the fraction of inter-module edges (mixing parameter). Right: memory requirements as a function of network size. Each point represents an average of the results obtained by performing a grid search over the following parameter space (at least two repetitions for each combination of parameters): number of nodes: 5k, 7k, 8k, 10k; average node degree: 15, 20, 25; exponent of the distribution of community sizes: 1, 2; and exponent of the distribution of node degrees: 2, 3.
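The parameter space listed in the caption can be enumerated with a few lines of Python. The dictionary keys are hypothetical names chosen for readability; only the values come from the caption, and the mixing parameter is left out because its sampled values are not listed.

```python
from itertools import product

# Values taken from the figure caption; key names are illustrative.
grid = {
    "nodes": [5000, 7000, 8000, 10000],
    "average_degree": [15, 20, 25],
    "community_size_exponent": [1, 2],
    "degree_exponent": [2, 3],
}

# One dictionary per combination of benchmark parameters.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
```

This yields 48 parameter combinations, each of which was run at least twice according to the caption.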

4 Installation and usage

MONET is extremely simple to install, uninstall and run. The only requirement is an existing installation of either Docker (Merkel, 2014) or Singularity (Kurtzer et al., 2017). For detailed instructions and information about usage and I/O formats, please refer to the README file in the GitHub repository.

$ git clone https://github.com/BergmannLab/MONET.git

$ cd MONET && ./install.sh

$ monet --help

$ monet --method=M1 --container=docker \
    --input=./input/network.txt --output=./output
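Since the three methods are often complementary, it can be convenient to run all of them on the same input. The sketch below only assembles the command lines, using the four flags shown above. The method names `K1` and `R1` and the container value `singularity` as accepted flag values are assumptions inferred from the text, and the per-method output directories are hypothetical; `monet --help` and the README remain the authoritative reference.

```python
import shlex

METHODS = ("K1", "M1", "R1")             # K1 and R1 as flag values are assumed
CONTAINERS = ("docker", "singularity")   # "singularity" as a flag value is assumed


def build_monet_command(method, container, input_path, output_dir):
    """Assemble a monet invocation from the flags shown in the usage example."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if container not in CONTAINERS:
        raise ValueError(f"unknown container: {container}")
    return [
        "monet",
        f"--method={method}",
        f"--container={container}",
        f"--input={input_path}",
        f"--output={output_dir}",
    ]


# One command per method, each writing to its own (hypothetical) directory.
commands = [
    build_monet_command(m, "docker", "./input/network.txt", f"./output/{m}")
    for m in METHODS
]
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once MONET is installed; the sketch prints the commands instead of executing them so that they can be inspected first.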

Funding

This work was supported by the Swiss National Science Foundation grant no. FN 310030_152724/1.

Conflict of Interest: none declared.

Supplementary Material

btaa236_Supplementary_Data

References

  1. Arenas A. et al. (2007) Size reduction of complex networks preserving modularity. N. J. Phys., 9, 176.
  2. Arenas A. et al. (2008) Analysis of the structure of complex networks at different resolution levels. N. J. Phys., 10, 053039.
  3. Cao M. et al. (2014) New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence. Bioinformatics, 30, i219–i227.
  4. Choobdar S. et al.; The DREAM Module Identification Challenge Consortium. (2019) Assessment of network module identification across complex diseases. Nat. Methods, 16, 843–852.
  5. Duch J., Arenas A. (2005) Community detection in complex networks using extremal optimization. Phys. Rev. E, 72, 027104.
  6. Kurtzer G.M. et al. (2017) Singularity: scientific containers for mobility of compute. PLoS One, 12, e0177459.
  7. Lamparter D. et al. (2016) Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput. Biol., 12, e1004714.
  8. Lancichinetti A. et al. (2008) Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78, 046110.
  9. Merkel D. (2014) Docker: lightweight linux containers for consistent development and deployment. Linux J., 239, 2.
  10. Newman M.E. (2004) Fast algorithm for detecting community structure in networks. Phys. Rev. E, 69, 066133.
  11. Newman M.E. (2006) Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA, 103, 8577–8582.
  12. Newman M.E., Girvan M. (2004) Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113.
  13. Satuluri V. et al. (2010) Markov clustering of protein interaction networks with improved balance and scalability. In: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology (BCB’10) ACM, New York, NY, USA, pp. 247–256.


Articles from Bioinformatics are provided here courtesy of Oxford University Press
