Abstract
Summary
We define a disease module as one part of a molecular network partition whose components are jointly associated with one or several diseases or risk factors thereof. Identifying such modules across different types of networks has great potential for elucidating disease mechanisms and establishing powerful new biomarkers. To this end, we launched the ‘Disease Module Identification (DMI) DREAM Challenge’, a community effort to build and evaluate unsupervised molecular network modularization algorithms. Here, we present MONET, a toolbox that gives the bioinformatics community easy and unified access to the three top-performing methods from the DMI DREAM Challenge.
Availability and implementation
MONET is a command line tool for Linux, based on Docker and Singularity containers; the core algorithms were written in R, Python, Ada and C++. It is freely available for download at https://github.com/BergmannLab/MONET.git.
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
Gene networks, such as protein interaction, signaling, gene co-expression and homology networks, provide scaffolds of linked genes. Subnetworks, or modules, comprise genes that normally act in concert but whose joint function may be disrupted if any of their members is missing or dysregulated. For disease modules, this disruption can lead to a disease phenotype. Identifying such modules is therefore useful for elucidating disease mechanisms and for establishing new biomarkers and potential therapeutic targets. Yet it is not well understood which methods work best to extract such modules from different types of networks. This prompted us to initiate the ‘Disease Module Identification (DMI) DREAM Challenge’ (Choobdar et al., 2019), which provided an unbiased and critical assessment of 75 contributed module identification methods. Our evaluation used summary statistics from more than 200 disease-relevant genome-wide association studies (GWAS) in conjunction with our Pascal tool (Lamparter et al., 2016), avoiding the bias of relying on annotated molecular pathways.
The top-performing methods implemented novel algorithms that advanced the state of the art and clearly outperformed off-the-shelf tools. We therefore decided to make the top three methods available to the bioinformatics community in a single user-friendly package. MONET is a command line tool based on the Docker and Singularity container technologies: it automatically installs each method with all its dependencies inside a container, avoiding time-consuming and error-prone manual installation of computing environments and libraries. All computations take place in this sandboxed environment, and once the output is ready, all resources can be fully released, returning the user’s machine to its original state.
2 Methods and implementation
Although our challenge established Kernel Clustering Optimization using the ‘Diffusion State Distance (DSD)’ metric of Cao et al. (2014) (hereafter K1) as the overall winner, several strong competitors used entirely different approaches to network modularization. Importantly, no single method was superior on all network types, and the disease modules identified by different methods were often complementary (Choobdar et al., 2019).
2.1 K1: top method using kernel clustering
K1 is based on the DSD, a graph metric described by Cao et al. (2014) that is built on the premise that paths through low-degree nodes are stronger indications of functional similarity than paths that traverse high-degree nodes. The DSD metric is used to compute a pairwise distance matrix between all nodes, to which a spectral clustering algorithm is applied. In parallel, dense bipartite subgraphs are identified using standard graph techniques. Finally, the results are merged into a single set of non-overlapping clusters.
BLOG: https://www.synapse.org/#!Synapse:syn7349492/wiki/407359.
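The first two K1 steps (DSD distances, then spectral clustering) can be illustrated on a toy graph. The sketch below is not the K1 code. It assumes a connected, undirected graph and computes the converged DSD in closed form: the accumulated difference of expected visit counts between walks started at u and v converges to (e_u - e_v)(I - P + W)^-1, where P is the random-walk transition matrix and every row of W is the walk's stationary distribution. It then performs only a two-way spectral split of the DSD matrix using a Gaussian kernel and the normalized graph Laplacian. The function names and the kernel bandwidth sigma are hypothetical choices. K1's multi-way spectral clustering and its bipartite-subgraph and merging steps are omitted.

```python
import numpy as np


def dsd_matrix(adj):
    """Converged Diffusion State Distance for a connected, undirected graph.

    DSD(u, v) is the L1 distance between the expected-visit-count vectors of
    random walks started at u and v, in the limit of infinitely long walks.
    """
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    P = A / deg[:, None]                 # row-stochastic random-walk transition matrix
    pi = deg / deg.sum()                 # stationary distribution of the walk
    W = np.tile(pi, (n, 1))              # every row of W equals pi
    H = np.linalg.inv(np.eye(n) - P + W)
    # Row differences H[u] - H[v] equal the limit of He^k(u) - He^k(v) as k grows
    return np.abs(H[:, None, :] - H[None, :, :]).sum(axis=2)


def spectral_bisection(dist, sigma):
    """Split nodes into two groups from a distance matrix (hypothetical helper)."""
    # Gaussian kernel turns distances into similarities
    S = np.exp(-np.asarray(dist) ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    d = S.sum(axis=1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(d, d))   # symmetric normalized Laplacian
    _, vecs = np.linalg.eigh(L)                          # eigenvalues in ascending order
    fiedler = vecs[:, 1] / np.sqrt(d)                    # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)


# Toy network: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

D = dsd_matrix(adj)
labels = spectral_bisection(D, sigma=2.0)
print(np.round(D, 3))
print(labels)
```

On this toy graph, the within-triangle distances (4/3 between nodes 0 and 1, 20/7 between nodes 0 and 2) are much smaller than the distance across the bridge (7 between nodes 2 and 3), so the split recovers the two triangles. The label values 0 and 1 themselves depend on the arbitrary sign of the eigenvector. The pairwise-difference step uses O(n^3) memory, which is acceptable for a sketch but not for genome-scale networks.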
2.2 M1: top method using modularity optimization
M1 employs a technique called Multiresolution, introduced by Arenas et al. (2008), to explore all topological scales at which modules may be found. The novelty of this approach lies in a parameter, called resistance, that controls the aversion of nodes to forming modules. Modularity (Arenas et al., 2007; Newman and Girvan, 2004) is optimized using an ensemble of algorithms: extremal optimization (Duch and Arenas, 2005), spectral optimization (Newman, 2006), a fast greedy algorithm (Newman, 2004), tabu search (Arenas et al., 2008) and fine-tuning by iteratively repositioning individual nodes into adjacent modules.
BLOG: https://www.synapse.org/#!Synapse:syn7352969/wiki/407384.
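The role of the resistance parameter and of the final fine-tuning step can be sketched as follows. The sketch follows the idea in Arenas et al. (2008) of adding a self-loop of weight r to every node before computing modularity. Positive r makes nodes more reluctant to join modules, and negative r favors larger modules. The function names are hypothetical. The refinement is a plain greedy local search that moves single nodes into neighboring modules; it does not include M1's extremal, spectral, fast greedy or tabu optimizers.

```python
import numpy as np


def modularity_with_resistance(adj, labels, r=0.0):
    """Modularity Q_r of a partition after adding a self-loop of weight r to each node.

    r = 0 gives the usual Newman-Girvan modularity.
    """
    A = np.asarray(adj, dtype=float) + r * np.eye(len(adj))
    two_m = A.sum()                              # total strength 2w + N*r
    if two_m <= 0:
        raise ValueError("resistance too negative: total strength must stay positive")
    labels = np.asarray(labels)
    k = A.sum(axis=1)                            # node strengths, self-loop included
    same = labels[:, None] == labels[None, :]    # delta(c_i, c_j)
    B = A - np.outer(k, k) / two_m               # modularity matrix
    return B[same].sum() / two_m


def refine_by_node_moves(adj, labels, r=0.0, max_sweeps=50):
    """Greedily move single nodes into neighboring modules while Q_r increases."""
    A = np.asarray(adj, dtype=float)
    labels = np.array(labels)
    best = modularity_with_resistance(A, labels, r)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(labels)):
            current = labels[i]
            # candidate modules: those of i's neighbors, excluding its own
            candidates = set(labels[A[i] > 0]) - {current}
            for c in candidates:
                labels[i] = c
                q = modularity_with_resistance(A, labels, r)
                if q > best + 1e-12:
                    best, current, improved = q, c, True
                labels[i] = current              # keep the best module found so far
        if not improved:
            break
    return labels, best


# Toy network: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

two_triangles = [0, 0, 0, 1, 1, 1]
singletons = list(range(6))
for r in (0.0, 100.0):
    print(r, modularity_with_resistance(adj, two_triangles, r),
          modularity_with_resistance(adj, singletons, r))

refined, q = refine_by_node_moves(adj, [0, 0, 1, 1, 1, 1], r=0.0)
print(refined, q)
```

At r = 0, the two-triangle partition has Q = 5/14 and beats the singleton partition. At r = 100, the singleton partition scores higher, which shows how a positive resistance pushes nodes apart. Starting from a partition that misplaces node 2, the greedy refinement moves it back and recovers Q = 5/14.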
2.3 R1: top method using random walk
R1 is based on a variant of the Markov Cluster Algorithm (MCL) known as balanced Multi-level Regularized Markov Clustering (bMLRMCL) (Satuluri et al., 2010), which scales well to large graphs and reduces the number of oversized clusters. First, a pre-processing step discards edges with low weights and scales all remaining edge weights to integer values. Then, bMLRMCL is applied iteratively to every module whose size exceeds a user-defined threshold.
BLOG: https://www.synapse.org/#!Synapse:syn7286597/wiki/406659.
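The two R1 stages can be sketched on a toy network: pre-processing, then repeated clustering of oversized modules. This is not bMLRMCL. To keep the example short, it uses the plain Markov Cluster Algorithm as a stand-in, with expansion by matrix squaring, inflation by elementwise powers and column normalization. Clusters are read off as connected components of the converged flow matrix. The regularization, balancing and multi-level coarsening of Satuluri et al. (2010) are not included. The weight threshold min_weight, the integer scale, the inflation value and max_size are hypothetical parameters, not R1's nine tuned parameters.

```python
import numpy as np


def preprocess(adj, min_weight, scale=100):
    """Drop edges lighter than min_weight and rescale the rest to integers in [1, scale]."""
    A = np.asarray(adj, dtype=float)
    keep = (A > 0) & (A >= min_weight)
    W = np.zeros(A.shape, dtype=int)
    if keep.any():
        W[keep] = np.maximum(1, np.rint(A[keep] / A[keep].max() * scale)).astype(int)
    return W


def _components(mask):
    """Connected components of the graph given by a boolean adjacency mask."""
    n = len(mask)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.nonzero(mask[u])[0]:
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def mcl(W, inflation=2.0, max_iter=100, tol=1e-8):
    """Plain MCL (a stand-in for bMLRMCL); returns one cluster label per node."""
    n = len(W)
    M = np.asarray(W, dtype=float) + np.eye(n)   # add self-loops, as is usual for MCL
    M /= M.sum(axis=0)                            # make columns stochastic
    for _ in range(max_iter):
        prev = M
        M = M @ M                                 # expansion
        M = M ** inflation                        # inflation
        M /= M.sum(axis=0)
        if np.abs(M - prev).max() < tol:
            break
    support = M > 1e-6
    # clusters = connected components of the symmetrized limit graph
    return _components(support | support.T)


def split_large_modules(W, max_size, **mcl_kwargs):
    """Cluster the network, then re-cluster every module larger than max_size."""
    pending = [np.arange(len(W))]
    modules = []
    while pending:
        nodes = pending.pop()
        labels = mcl(W[np.ix_(nodes, nodes)], **mcl_kwargs)
        groups = [nodes[labels == c] for c in np.unique(labels)]
        for g in groups:
            if len(g) > max_size and len(groups) > 1:
                pending.append(g)   # still too large, and the last split made progress
            else:
                modules.append(sorted(g.tolist()))
    return sorted(modules)


# Toy weighted network: two triangles joined by a weak edge 2-3
adj = np.zeros((6, 6))
for u, v, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.05)]:
    adj[u, v] = adj[v, u] = w

W = preprocess(adj, min_weight=0.1)
modules = split_large_modules(W, max_size=2)
print(modules)
```

Re-clustering stops when a module can no longer be split. This guarantees termination even when, as for the triangles here, a module stays above max_size.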
3 Performance
Figure 1 illustrates the performance of the MONET algorithms on simulated graphs with planted community structure, generated using the class of benchmark graphs proposed by Lancichinetti et al. (2008). Modularization performance is measured using Normalized Mutual Information (NMI). Experiments were carried out on regular desktop hardware. Consistent with the performance evaluations within the DMI DREAM Challenge, K1, the winner, requires the most computational resources, with a runtime of about one day and the highest memory allocation when processing the Challenge inputs. M1, the first runner-up, completed the Challenge in a few hours, has extremely low memory requirements and displayed excellent performance on the simulated benchmark (even superior to K1, especially when the fraction of inter-module edges is extremely high). R1, the second runner-up, is the only method that requires parameter tuning (nine parameters in total). Nevertheless, we believe it is an excellent addition to our tool: it performed close to K1 and M1 on the benchmark, requires only moderate memory and has an extremely low runtime (it completed the Challenge in under an hour). Please refer to the Supplementary Information for details about execution times.
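The NMI score used in this comparison can be computed between a planted partition and a detected one as sketched below. The text does not state which NMI variant was used, so the normalization 2 I(X;Y) / (H(X) + H(Y)), a common choice in community detection, is an assumption. In the Lancichinetti et al. (2008) benchmark, the fraction of inter-module edges corresponds to the mixing parameter mu, the fraction of each node's edges that leave its planted community. Such graphs can be generated in Python with, for example, networkx's LFR_benchmark_graph generator. The function name nmi is hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_true, labels_pred):
    """Normalized mutual information 2*I(X;Y) / (H(X) + H(Y)) between two partitions.

    Each argument lists one module label per node, in the same node order.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("both partitions must label the same non-empty node set")
    px = Counter(labels_true)
    py = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    h_x = -sum(c / n * log(c / n) for c in px.values())
    h_y = -sum(c / n * log(c / n) for c in py.values())
    mi = sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())
    if h_x + h_y == 0:
        return 1.0  # both partitions place all nodes in a single module
    return 2.0 * mi / (h_x + h_y)


planted = [0, 0, 1, 1]
print(nmi(planted, [5, 5, 9, 9]))   # same partition under different label names
print(nmi(planted, [0, 1, 0, 1]))   # partition independent of the planted modules
```

The first call should give 1, up to floating-point rounding, because NMI ignores label names. The second gives 0, because each planted module is split evenly across both detected modules.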
4 Installation and usage
MONET is extremely simple to install, uninstall and run. The only requirement is an installation of either Docker (Merkel, 2014) or Singularity (Kurtzer et al., 2017). For detailed instructions and information about usage and input/output formats, please refer to the README file in the GitHub repository.
$ git clone https://github.com/BergmannLab/MONET.git
$ cd MONET && ./install.sh
$ monet --help
$ monet --method=M1 --container=docker \
--input=./input/network.txt --output=./output
Funding
This work was supported by the Swiss National Science Foundation grant no. FN 310030_152724/1.
Conflict of Interest: none declared.
References
- Arenas A. et al. (2007) Size reduction of complex networks preserving modularity. N. J. Phys., 9, 176.
- Arenas A. et al. (2008) Analysis of the structure of complex networks at different resolution levels. N. J. Phys., 10, 053039.
- Cao M. et al. (2014) New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence. Bioinformatics, 30, i219–i227.
- Choobdar S. et al.; The DREAM Module Identification Challenge Consortium. (2019) Assessment of network module identification across complex diseases. Nat. Methods, 16, 843–852.
- Duch J., Arenas A. (2005) Community detection in complex networks using extremal optimization. Phys. Rev. E, 72, 027104.
- Kurtzer G.M. et al. (2017) Singularity: scientific containers for mobility of compute. PLoS One, 12, e0177459.
- Lamparter D. et al. (2016) Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput. Biol., 12, e1004714.
- Lancichinetti A. et al. (2008) Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78, 046110.
- Merkel D. (2014) Docker: lightweight linux containers for consistent development and deployment. Linux J., 239, 2.
- Newman M.E. (2004) Fast algorithm for detecting community structure in networks. Phys. Rev. E, 69, 066133.
- Newman M.E. (2006) Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA, 103, 8577–8582.
- Newman M.E., Girvan M. (2004) Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113.
- Satuluri V. et al. (2010) Markov clustering of protein interaction networks with improved balance and scalability. In: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology (BCB’10). ACM, New York, NY, USA, pp. 247–256.