Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2021 Dec 9;50(5):e27. doi: 10.1093/nar/gkab1200

Interpretation of network-based integration from multi-omics longitudinal data

Antoine Bodein 1, Marie-Pier Scott-Boyer 2, Olivier Perin 3, Kim-Anh Lê Cao 4, Arnaud Droit 5,
PMCID: PMC8934642  PMID: 34883510

Abstract

Multi-omics integration is key to fully understand complex biological processes in an holistic manner. Furthermore, multi-omics combined with new longitudinal experimental design can unreveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to face the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method on 3 case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package netOmics to readily suit any application.

INTRODUCTION

Cost reductions of DNA sequencing in addition to other high-throughput multi-omics technologies have revolutionized many research fields ranging from personalized medicine (1,2) to systems biology (3,4). These innovations have led to new biological insights and a better understanding of living organisms (5–7). Thus, enabling the assessment of most biological layers, this democratisation of high-throughput technologies has created large datasets representing different biomolecules that necessitate specific processing and statistical methods (8,9). Multi-omics trials typically collect different types of biomolecules (mRNA, proteins, metabolites, etc.) from the same biological samples with the ultimate goal of highlighting the interaction between biological layers that could be responsible for causing complex phenotype or diseases (2). Even more so most biological phenomenon involves complex interactions between layers that vary through time (10). Adapted multi-omics time-course methods to integrate and accurately capture interactions among those biological layers are thus now required of and fully capture interactions within and between omic layers.

To describe complex interactions and regulatory mechanisms behind biological systems, mathematical models, such as network, are built to interpret and reverse-engineer cellular functions. Networks are used to represent all relevant interactions taking place in a biological systems (11). In networks, molecules (genes, proteins, metabolites) are reduced to a series of nodes that are connected to each other by edges. Edges represent the pairwise relationships, interactions, between two molecules within the same network. Molecular networks have become extremely popular and have been used in every area of biology to model for example transcriptional regulation mechanisms, physical protein–protein interactions (12,13), or metabolic reactions (14). Networks come with valuable properties and useful topological features such as degree distribution to identify highly connected nodes or shortest paths which determine proximity between two nodes. On a different scale, network modularity defines sub-network units with highly connected nodes in respect to the rest of the network. These sub-networks, also known as modules, often share a similar function. Thus, the ‘guilt by association’ property assumes that known or unknown highly connected molecules should be functionally related (15).

Inference methods for network construction are often applied to a single omic layer to identify interaction between molecules. However, this does not directly elucidate interaction across multiple omic layers (16). To connect these layers, a first approach require prior knowledge of across omic molecular networks such as publicly-available databases (17). This approach is based on a legacy limited to model organisms and may not reflect the current biological condition. A second method use multivariate data-driven methods that statistically infers correlations between molecules based on multi-omics data. However, this approach may have many possible solutions. A combination of the two methods could improve multi-omics network construction.

The ultimate goal of multi-omics networks is to connect phenotypes to biological mechanisms and their regulators. Analysis of the interactome identifies direct neighbors and modules linked to a phenotype. However, direct neighbors can lead to false discoveries. Phenotypes and molecules may be linked by irrelevant interactions. Furthermore, our knowledge of the interactome is not complete and we can miss true interactions or interactions with more distant molecules (18). Based on the work of Page et al. (19), propagation algorithms recently became the state-of-the-art to investigate gene disease associations and also gene function prediction (18–20). From the known association, the signal is iteratively propagated through the network. When a steady state is reached, new nodes can be added to the initial association by their propagation score reflecting their proximity to the starting nodes. It thus highlights potential new phenotype-related targets. New advances in randoms walks algorithms allow to propagate the signal in heterogeneous multi-layered networks which improves association prediction (21).

In this paper, we propose to build hybrid multi-omics networks from longitudinal multi-omics data in order to facilitate the interpretation of multi-layers systems (Figure 1). This methodology is based in the first place on the modelling and clustering of expression profiles with similar behaviours over time. It relies on both accurate network reconstruction methods and knowledge-based reviewed interactions between either molecules of the same or different types. Finally, a random walk algorithm was used to identify and make new hypothesis about links between omics molecules and key biological functions or mechanisms. The main objectives of this method is to provide a versatile framework for multi-omics network-based integration but also to provide interpretation guidelines to explore these networks to further highlight key intra-omics and inter-omics mechanisms and interactions. We illustrate this approach through three case studies. These studies have different experimental designs with different omics data types, timepoints and organisms to demonstrate that the proposed approach is able to deal with a wide range of situations.

Figure 1.

Figure 1.

Overview of the proposed approach. (A) Description of the experimental design: the same biological material is sampled at several time points across several omic layers indicated in different colors. Each omic data is normalised using both platform-specific and time-specific normalisation steps. (B) Multi-Omics Network is built using both inference-based and knowledge-based methods to connect intra- and cross-layered biological features or molecules (mRNA, proteins, metabolites). Measured molecules are clustered into groups of similar expression profiles over time and corresponding nodes formed kinetic sub-networks. Over-representation analysis is performed to add an extra layer of functional annotation. Propagation analysis is performed on specific nodes of interest, called seeds (biological function, gene, protein, metabolite, etc.) to identify closely related molecules.

MATERIALS AND METHODS

This approach proposes pre-processing, modelling and clustering steps for multi-omics longitudinal data. It mainly emphasizes on network-based integration and multi-layered network exploration (Figure 2) using network propagation algorithms in order to provide new biological insights. We developed the R package netOmics which wraps the method proposed below. It was developed to simplify and reproduce the integration and interpretation steps. It provides documentation and practical guidelines to build and explore (longitudinal) multi-omics networks.

Figure 2.

Figure 2.

Workflow diagram illustrating the main steps of network based integration using longitudinal multi-omics data.

Multi-Omics longitudinal design

We define longitudinal multi-omics designs as follow. From the same biological sample, omics data are produced (RNA, proteins, etc.) at different timepoints. Raw data are processed to get (NxP) abundance tables by omics data type with samples in rows and molecules or biological features (RNA, proteins, metabolites, etc.) in columns. We call these tables blocks. In this framework, there is no need to have matching timepoints between blocks because we use a modelling step to interpolate missing timepoints and even out uneven designs.

Pre-processing of longitudinal multi-omics data

We assume each omics data is a raw count table resulting from bioinformatics quantification pipelines (22,23). Low counts are filtered and data are normalized according to the type of data in each table. We also applied a filter on time profiles and kept only molecules with the highest expression fold change between the lowest and highest point over the entire time course, as described in (24). For each case study, we adapted these filters to take into account platform-specific dynamic range of values.

timeOmics: modelling and clustering of longitudinal multi-omics data

The timeOmics approach (24) was used to cluster multi-omics molecules with similar expression profiles over time. The framework is based on two main steps:

  1. The first step uses a Linear Mixed Model Spline framework (25) to model every molecule over the time-course by taking into account the inter-individual variation. This framework tests different models and assigns the best model to each molecule according to a goodness of fit test. One of its benefits is to allow interpolation of missing timepoints and thus accommodate non-regular experimental designs with missing data.

  2. The second step clusters the modelled expression profiles in groups of similar expressions over time. This is performed using various multivariate projection-based methods implemented in mixOmics (26). With a 3-blocks omic design, we used multi-block Projection on Latent Structures (block PLS) to cluster time profiles from multi-omics datasets. Optimal number of clusters is determined by maximising the average silhouette coefficient.

The objective of these preliminary steps is to summarise each molecule with an expression pattern. This tag will then be used in the multi-omics network reconstruction to build cluster-specific sub-networks.

Network reconstruction

In order to build a multi-omics network, we started by building a map for each layer (genes, proteins, metabolites, etc) using a combination of both data-driven and knowledge-driven building methods. The method used is specific to the type of data and are described below. As mentioned in section 2.3, we kept the clustering information by building several sub-networks per kinetic cluster. We also built an entire network without the cluster labels.

Data-driven network reconstruction

For gene expression data, we relied on Gene Regulatory Network inference. This class of method tries to reverse engineer complex regulatory mechanisms in the organisms and infer relationship between genes. ARACNe (27) is a co-expression-based inference algorithm which identify most likely TF–targeted genes interaction by estimating mutual information, a similarity distance between pairs of transcript expression profile. We used ARACNe algorithm on gene expression profile to infer potential TF- targeted genes interaction from the gene expression dataset.

Knowledge-driven network

Some kind of interactions cannot be revealed by inference methods and has to be experimentally determined (e.g. protein–protein interactions, ChIP). We then relied on reviewed interactions found in specialized databases to connect molecules of the same type and also get cross-layered interactions (binding, enzyme, regulation) From the measured molecules in the dataset, we collected all possible interactions through targeted databases. To maximize cross-layered connectivity, we also included non-measured proteins or metabolites which were directly connected to measured molecules.

Protein interactions

Physical or functional protein–protein interaction (PPI) is one kind of interaction that is difficult to predict and PPI network inference algorithms for MS data are still in their infancy (28). For human proteomic data, we relied on the BioGRID database (29) which records >1.8 million proteins and genetics interactions from major model organisms. This database collects experimentally determined physical protein–protein interactions and also connects transcriptomics and proteomics layers with regulatory relationship (TF–gene interactions). For other model organisations, we can rely on more specialised or custom databases (30).

Metabolite interactions

We used the KEGG Pathway database (31) which records a collection of manually drawn metabolic pathways representing molecular interaction, reaction and relation networks for human and other model organisms. We used KEGG to link metabolite compounds involved in the same reactions. We also connect metabolites to genes and/or proteins if they are involved in the same biochemical enzymatic reaction thanks to KEGG Orthology database that links genes to high-level functions.

The objective of this building step is to provide an entire multi-omics network composed of three main layers and several sub-networks specific to kinetic clusters. The next steps will focus on the analysis of these multi-omics networks.

Enrichment analysis

Over representation analysis (ORA) helps to find enriched and meaningful biological insights from interacting biomolecules. This task was achieved using gProfiler2 (32), first on each kinetic cluster and then on all molecules from the entire network. We focused on the three Gene Ontology (GO) terms: Biological Process (BP), Molecular Function (MF) and Cellular Component (CC). P-values were corrected with gProfiler2 custom multiple testing correction algorithm (g:SCS) (33) and only significant terms were considered (g: SCS < 0.05). Size and significance of P-values distributions were compared between both clusters and entire network approaches. We also used Fisher’s combined probability test (34) for multi-omics P-values comparison.

Random walk

As described in (21), in an undirected graph G = (V, E), the random walk (RW) starts from a node (v0), called seed, and simulate a particle that randomly moves from one node vt to another vt + 1 following the probability distribution: Inline graphic

graphic file with name M0001a.gif (1)

where d(x) is the degree of the node x in the graph G. Valdeolivas et al. (21) also added the possibility to restart at the initial node to avoid dead ends in multi-layered networks. When a steady state is reached, the algorithm gives a probability score to each node of the network which represents the proximity of that node and the seed.

We then used the R package RandomWalkRestartMH (21) to apply random walk with restart algorithm on multi-omics network with three main purposes to guide interpretation. (i) RW can be used to identify multi-omics nodes and their interactions linked to mechanisms of interest (e.g. GO:BP). Therefore, a GO term node can be turned into a seed and RW can be performed from that starting point. Then, a sub-network with the top 25 closest nodes to that seed can be built. Naively, all significant GO term nodes were iteratively turned into seeds. We then screened sub-networks containing different types of molecules to highlight the multi-omics aspects of the integration. We applied this analysis on both kinetic cluster sub-networks and entire network. (ii) RW can be used for nodes function prediction. Similar as above, unlabelled nodes can be turned into seeds. We relied on gProfiler2 annotations to identify nodes without any known functions. For an unlabelled seed, a list of ranked nodes was produced and the closest GO term node was assigned to that seed. We repeated this for the three GO ontologies (BP, MF, CC) on the entire network. (iii) Combined to kinetic clusters, RW can locate regulatory mechanisms and find interacting nodes with different expression profiles in the entire network. Once again, each node can be turned into seed and sub-networks were built using the 10 closest nodes. Then we screened sub-networks with different cluster labels from the seeds that might reveal underlying regulatory mechanisms.

Data

In the following section, two published multi-omics case studies are presented. These applications have longitudinal multi-omics designs, but each block was analysed separately. We modelled these dataset with multi-layer networks to highlight the multi-omics interactions. Specific analysis steps for each example are described, along with specific databases used for each case study.

Case study 1: HeLa cell cycling study

Understanding the complex relationship between gene expression, translation product and protein abundance is the key to decrypt biological mechanisms. Genes undergo several steps of regulation before turning into proteins including transcription, translation, folding, post-translational modification and eventually degradation (35) studied the poor correlation between mRNA and related protein levels during cell cycling regulation using triplicate measurement of mRNA expression (microarray), translation product (PUNCH-P) and protein quantification (MS) from synchronized HeLa S3 cells. Authors sampled cells during phases G1, S and G2 (Figure 3).

Figure 3.

Figure 3.

HeLa cell cycling study: overview of the analysis: (A) Experimental design: three samples are collected for each steps G1, S and G2. For each sample, RNA, translation products and proteins are quantified. (B) Multi-Omics longitudinal clustering: Clustered expression profiles of mRNA, translation products and proteins. Each line represents the modelled abundance of a molecule during the cycle. 4 clusters were obtained using the timeOmics clustering approach. Cluster compositions are detailed in Table 2. (C) Multi-omics network layout: represents the connection between entities and their different types of interactions. The network was composed of four layers: a gene layer build from mRNA expression, a PPI layer build from measured proteins and BioGRID known interactions, a metabolite layer from KEGG pathways and GO term layer from enrichment analysis.

The authors were able to find clusters of patterns in the expression of genes and their products related to key functions of the cell cycle. These functions were up- or down-regulated at different stages. For example, Cell division, Cytokinesis, Spindle, Chromosome segregation and Microtubule based movement biological processes shared similar patterns and were down-regulated in G1/S and up-regulated in S/G2 transitions. Interestingly, multi-omics showed that a gene and its derivatives can have different expression patterns during a given time course, which highlights underlying molecular mechanisms, such as mRNA or protein degradation.

In this first case study, we intended to highlight the multi-omics interactions involved in the control of HeLa cell cycle during G1, S and G2 phases.

To do so, we filtered the RMA-normalized mRNA with a 2-Fold-Change filter. iBAQ normalized translatome products and proteins were filtered with a different 3-Fold-Change threshold because of the differences in platform-specific dynamic ranges.

We modelled the expression of every molecule with LMMS including the variations of the three replicates for each of the three timepoints. We built multi-omics clusters of expression profiles based on the direction of variation between each step. mRNA and proteins networks were built with ARACNe and BioGrid interaction databases, respectively. Protein coding and TF regulated information were used to connect these layers where UniProtID were converted into Gene Symbols (https://www.uniprot.org/uploadlists/) to ensure proper matching. We also included metabolite reactions from KEGG connected to protein enzymes to metabolites. We performed ORA with mRNA, translatome products and proteins against GO:BP, GO:MF and GO:CC terms for both clusters and entire set of molecules. Finally, we performed RW on this multi-omics network.

Case study 2: dynamic maize responses to aphid feeding

Maize (Zea mays) is one of the most productive cereal crops in the world. However, the plant is subject to numerous biotic attacks caused by herbivorous insects and it is therefore critical to understand the maize defense mechanisms in order to improve its productivity. Aviner et al. (35) studied the dynamic of maize response to aphid feeding and they found that mutants in benzoxazinoid biosynthesis and terpene synthases genes do affect aphid proliferation.

To measure gene expression changes over time, authors exposed five two-weeks maize plants (B73) to corn leaf aphid (Rhopalosiphum maidis) during four days (Figure 7). They also include five control maize plants with the same growing conditions minus the exposure to aphids. During this time course, they sampled the five exposed and five controls plants at six timepoints (i.e. 2, 4, 8, 24, 48, 96 h) and conducted gene expression profiling with RNA-seq, LC-TOF-MS nontargeted metabolite quantification as well as amino-acids, phospholipids and terpenes targeted metabolite quantification.

Figure 7.

Figure 7.

Dynamic response to maize aphid feeding study: (A) Experimental design: five samples are collected for each condition and for each timepoint (2,4,8,24,48,96 h). For each sample, RNA and targeted metabolites are quantified. (B) Multi-Omics longitudinal clustering: Clustered expression profiles of mRNA, TF–coding genes and metabolites. Each line represents the modelled abundance of a molecule during the time-course. (C) Multi-omics network layout: represents the connection between entities and their different types of interactions.

In the paper, the authors focused on the genes and metabolites involved in the maize response to aphids. We intended to go a little further by addressing the complex regulatory relationships that may exist between these multi-omic actors over time.

For this example, we focused on the first five timepoints (i.e. 2, 4, 8, 24, 48 h). We discarded genes which were not differentially expressed between exposed and control groups and those having an expression difference <2-fold over the entire time course. We split TF-coding genes from other transcripts to get a 3-omics-like experimental design. Finally, we discarded nontargeted metabolite as they were not annotated and performed longitudinal clustering with multiple block PLS. For network reconstruction, we used ARACNe on genes and TF. We used the Protein-Protein Interaction database for Maize (PPIM) (36) to identify protein-coding genes and extracted direct neighbor interactions. PPIM contains more than 2.7 millions interactions between protein, TF and gene interactions. These interactions are either predicted or experimentally determined from public databases such as UniProt (37), BioGrid (29), DIP (38), IntAct (39) and MINT (40) or using to text mining. We mainly focused on validated interactions and predicted ones qualified as high-confidence interactions that represents 155 845 interactions with top Inline graphic highest decision scores (36). Then we connected the measured metabolites and those involved in the same reaction to the genes/proteins using KEGG. We performed ORA with genes, TF and proteins on GO:BP, GO:MF and GO:CC terms. Finally, we performed RW on this multi-omics network.

Case study 3: diabetes seasonal study

Diabetes is the seventh leading cause of death in the world according to the WHO (41) and its prevalence is constantly increasing (42). The role of the integrative Human Microbiome Project (iHMP) is to study the impact of host-microbiome relationships on diabetes mechanisms of appearance and progression (43).

In the study from Integrative (44), 105 subjects with diabetes (Insulin Sensitive and Insulin Resistant) were followed over a period of more than 4 years. Each patient was sampled every 3 months and every 3–4 days during stress periods. This resulted in an average of 27 samples per patient. At each visit, 51 clinical tests were performed. From the blood samples, transcriptomics, proteomics, metabolomics, cytological and microbiome data (oral and gut) were produced.

Thanks to this complex design, Sailani et al. (45) identified several molecules linked to diabetes and important biological changes related to annual seasonality. They found different shift in expression in all omics molecules and they group them into two main expression profile clusters. We expect that network integration will provide a deeper understanding of the diabetes process and the role of host-microbiome interaction in that disease.

Since microbiome data is highly variable we decided to perform the modelling and integration on only one individual. We also reduced the time period to one year (7 timepoints) for the same reason. We performed the integration of transcriptomics, proteomics, metabolomics, cytokines, gut microbiota data and clinical variables. We applied a Fold-Change filter for each omics block, except the clinical variables, with a specific threshold. We modelled the data with LMMS and used a multiple block PLS to cluster the data. We applied the sparse multi-block PLS to identify a key signature per cluster. For network reconstruction, we applied the ARACNe algorithm (27) separately on transcriptomics, unlabelled metabolomics and clinical variables. We used the BioGRID interaction database (29) to build a PPI network from proteins and cytokines and used the information of TTRUST (46) and TF2DNA (47) to connect these molecules to the transcriptomics layer. We applied the SparCC algorithm (|ρ| ≥ 0.3) to build a microbiome network as recommended in (48). With limited knowledge about host-microbiota interactions, microbiota network were connected to each of the layers listed above by computing the spearman rank correlation on the expression data and kept only the highest interactions between microbiota and other layer (|ρ| ≥ 0.99). We followed the same procedure to connect the unlabelled metabolomics data and clinical variables to the other layers. Lastly, we performed an enrichment analysis with the transcriptomics, proteomics and cytokines molecules against Gene Ontology and we added the significant GO terms as a GO layer by connecting the significant GO terms to the corresponding RNA, proteins or cytokines. In addition, we performed a gene-related disease enrichment analysis with MedlineRanker (49) a data mining tool, which searches for publication abstracts in which genes were linked to diseases (MedlineRanker parameters: Min number of citations = 5; min number of genes significantly associated with a disease = 2; FDR:0.05). Like GO terms, disease terms were added to the network and connected to the related genes. Finally, we performed RW on this multi-omics network and the cluster specific sub-networks.

RESULTS

Case study 1: HeLa cell cycling study

In this example, we studied the HeLa cell cycling from Aviner et al. (35). This dataset was composed of three omic layers and three timepoints.

Pre-processing, modelling and clustering of time profile

HeLa cell dataset was assembled into a single table focused on proteins. After missing value removal, this dataset was composed of 6785 mRNA, 4102 translation products and 5023 proteins (Table 1). 448 mRNA, 2672 translation products and 4295 proteins remained after the Fold-Change filtering step. Finally, once all molecules were modelled over time and noisy profiles were removed, 446 mRNA, 2318 translation products and 4237 proteins remained.

Table 1.

HeLa cell cycling study: Initial number of mRNA, translation products and proteins, and remaining molecules after Fold-Change and noisy modelled profile filtering. A threshold of 2-FC was applied to mRNA and 3-FC to both proteins and translation products

Raw counts Fold change LMMS filter
mRNA 6785 448 446
Trans. products 4102 2672 2318
Proteins 5023 4295 4237

Time profile clustering

Once all the molecules were modelled over time and noisy profiles were removed, remaining expression profiles were clustered according to their differential expression value between two timepoints. This clustering resulted in 4 clusters (Table 2) with a silhouette coefficient of ssil = 0.75. We compared this clustering with the timeOmics approach with four clusters for similar parameters. timeOmics resulted in a lower silhouette coefficient (ssil = 0.63). In the first approach, Cluster 1 included the largest number of molecules (n = 4443). It is characterized by a decrease between the G1 and S phases and then an increase between S and G2/M. This seemed to be the main kinetic pattern for proteins since it contained the majority of them Inline graphic. Cluster 2 (n = 1444), included the largest amount of translation products (Inline graphic) and it was characterized by an increase in expression from the first to the last step. Cluster 3 (n = 958) showed an opposite pattern compared to the cluster 2 with a decrease across the overall time course. Finally Cluster 4 (n = 156) included the least number of molecules and no protein appeared to follow an increase and decrease pattern.

Table 2.

HeLa cell cycling study: clusters composition

mRNA Translation products Proteins
Cluster 1 233 187 4023
Cluster 2 81 1244 119
Cluster 3 35 828 95
Cluster 4 97 59 0

Multi-layered network reconstruction

The first layer to be reconstructed was the gene inference network from the mRNA. We used the ARACNe algorithm to build a network by kinetic cluster but also to build a entire network composed of all the mRNA. Supplementary Table S1 shows statistics about the sub-networks such as the number of connected / disconnected nodes and edges.

The second layer was the PPI network. Proteins were connected to each other using the BioGRID interaction database. As for the genes, PPI sub-networks were built for each kinetic cluster but we also built a entire PPI network composed of all the proteins. In addition, we included BioGRID proteins which were directly connected to the measured ones (first degree neighbours). Number of nodes and edges are detailed in Supplementary Table S1.

These first two layers were combined thanks to two types of links. First, protein-coding information linked 126 genes to their corresponding proteins. Second, TF-regulated information from TF2DNA (47) and TTRUST databases linked 57 proteins to 403 genes (16 846 interactions). In addition, KEGG pathway database was also used to link protein enzymes to metabolite reactions. This new layer was composed of 1595 metabolites connected to 2213 proteins (12 694 interactions).

Enrichment analysis

Over representation analysis (ORA) was performed on mRNA, translation products, proteins and first degree connected proteins against GO:BP, GO:MF and GO:CC. This approach was achieved both for each kinetic cluster and the entire network. The most significant GO terms found in clusters were strongly related to cell development RNA processing (GO:0006396), ribonucleoprotein complex biogenesis (GO:0022613), cellular component organization or biogenesis (GO:0071840) or RNA splicing, (GO:0008380). Full results of the ORA are detailed in Supplementary Table S2. With the largest number of molecules, proteins gathered the most significant GO terms. Table 3 details the number of enriched GO terms by omic types. Enriched terms were splited by ontologies. mRNA collected the fewest enriched terms in the three ontologies, followed by translation products and proteins. The first degree neighbours (extended proteins) collected the most terms, as they have the highest number of nodes in the network.

Table 3.

HeLa cell cycling study: number of unique significant enriched GO terms by omics type and by ontology

GO:BP GO:CC GO:MF
mRNA 255 82 54
translation products 331 124 70
Proteins 353 128 69
Extended Proteins 1490 390 280

In Figure 4, we compared the number of unique GO terms and the distribution of P-value between the global or cluster approach. With the exception of translation products which had more enriched term in clusters than the entire set, both approaches seemed to have a similar number of unique GO terms but some terms DNA replication (GO:0006260), mitotic cell cycle (GO:0000278) were shared between clusters. In addition the clustering approach gave much more significant terms than the global approach with the extended proteins. There was no significant difference in the P-value distributions however the clustering approach results in smaller p-values. In the Figure 4C, some GO terms were exclusively found in certain clusters such as: regulation of cell projection organization (GO:0031344) and nervous system development (GO:0007399) in Cluster 1, spindle midzone assembly (GO:0051255) and meiotic chromosome separation (GO:0051307) in Cluster 2, mitotic G2/M transition checkpoint (GO:0044818) and regulation of DNA endoreduplication (BP) GO:0032875 in Cluster 3, negative regulation of gene expression, epigenetic (GO:0045814) and regulation of chromatin silencing (GO:0031935) in Cluster 4. Moreover some terms were also only found in the entire set such as chromosome condensation (GO:0030261), regulation of cytokinesis (GO:0032465) and histone H3 acetylation (GO:0043966).

Figure 4.

Figure 4.

HeLa cell cycling study: Over Representation Analysis of mRNA, transcription product and protein term list. (A) Number of distinct go terms with or without clustering approach. (B) Histogram of P-values resulting from GO term enrichment by clustering approach. We separated the terms obtained from measured molecules (raw), and terms from with BioGRID first degree neighbours (extended). (C) Intersection of go terms by clusters. None of the approaches seemed to offer the most enriched go terms. Nevertheless, (C) shows that there were many terms not shared between clusters.

Finally, significant GO terms were added in the multi-layer network as another new layer and were connected to genes, proteins and metabolites. This last building step resulted in a multi-omics network composed of four layers. Table 4 describe network composition with the different type of node for each layer.

Table 4.

HeLa cell cycling study: multi-Omics network composition. Number of nodes by layer

Gene Proteins Extend Prot. Metabolite GO terms
Cluster 1 233 1813 22 387 1585 1576
Cluster 2 81 53 4726 591 1325
Cluster 3 35 50 4205 559 1328
Cluster 4 97 441
All 252 1916 22 582 1595 1590

Random walk

Molecules involved in specific mechanisms

The first purpose of RW applied to multi-omics network was to identify molecules involved in a specific mechanisms. Naively, each of the 2279 significantly enriched GO terms, (union of GO terms found in cluster and the entire network), can be a mechanism or a function of interest. Each of these GO terms was then turned into a seed. RW from each seed were used to generate sub-networks the 25 top closest nodes. In order to focus on multi-omics integration, we only focused on seeds that reached both genes and proteins. In a second step, we also included seeds that reached both genes, proteins and metabolites. Table 5 describe the number of GO terms (BP, MF, CC) where these conditions were met. 16 unique terms were linked to metabolite nodes (12 BP, 4 MF). Some terms were shared between clusters. These terms encompassed functions related to cell development such as histone lysine methylation GO:0034968 or regulation of mitotic nuclear division (GO:0007088). In Figure 5, we illustrate in detail the multi-omics interactions leading to the activation of this last biological process (GO:0007088). More general terms were also found: positive regulation of cellular metabolic process (GO:0031325), mRNA catabolic process (GO:0006402). Some MF terms seemed also related to cell development anaphase-promoting complex binding (GO:0010997), S-adenosylmethionine-dependent methyltransferase activity (GO:0008757). We also found such terms as response to ionizing radiation (GO:0010212) which could be an artifact of mass-spectrometry analysis. On the other hand, besides the term histone methyltransferase activity (H3-K27 specific) (GO:0046976) other shared terms seemed to be rather generic : methyltransferase activity (GO:0008168) or regulation of signal transduction (GO:0009966).

Table 5.

HeLa cell cycling study: (A) number of seeds that can reach both genes and proteins. (B) Same as A but with the addition of metabolites

A. GO:BP GO:CC GO:MF
Cluster 1 127 57 46
Cluster 2 98 59 35
Cluster 3 57 23 25
All 187 82 56
B. GO:BP GO:MF
Cluster 1 7 1
Cluster 2 2
Cluster 3 1 2
All 9 2
Figure 5.

Figure 5.

HeLa cell cycling study: Random walk with restart result from the seed GO:0007088 . The top 25 results are displayed. From the seed, the algorithm is able to associate genes, proteins and go terms. Thanks to the addition of knowledge-based layers, it can also access other proteins and metabolites which were not produced in the original study (Ext. neighbours).

Function prediction

The second purpose of RW was to predict the function of unannotated nodes (genes, proteins). In this perspective, 6 genes and 24 proteins were unannotated according to gProfiler. Each of these nodes was then turned into a seed. We then assigned to each node the closest GO term. Results of function prediction annotation are detailed in the Supplementary Table S3. The first biological processes which emerged for almost every unannotated nodes were cell cycle (GO:0007049) and cell division (GO:0051301) with protein binding (GO:0005515) or RNA binding (GO:0003723) molecular functions and were related to the nucleus (GO:0005634). Another interesting example was the uncharacterized protein DKFZp781F05101, which should play a role in the nucleosome formation (GO:0006334) and in particular in this study those containing the histone H3 variant CENP-A to form centromeric chromatin (GO:0034080).

Intersection between cluster

The last objective of the RW was to be combined with the kinetics cluster in order to identify connected molecules with different expression profiles. All the nodes representing a measured molecule were transformed into a seed. From the 2362 seeds, 438 were connected to molecules from other clusters We kept 47 seeds that can also reach GO term nodes. These results are detailed in Supplementary Table S3. We identified many biological processes terms related to the cell development. For example, the node labelled PIMREG was a protein coding gene belonging to Cluster 1. With RW, it was linked to Cell Division (GO:0051301), Cell Cycle (GO:0007049) and other GO:BP terms. PIMREG is a mitotic regulator which takes part in the control of metaphase-to-anaphase transition. It was linked to SETD1B (Cluster 2), an histone methyltransferase that tri-methylates histone H3 at Lysine 4 and contributes to epigenetic control of chromatin structure and gene expression. This is an active marker of up-regulated transcription and this may explain why the expression of the PIMREG gene is increasing between phases S and G2/M. In Figure 6PIMREG is also directly connected to the CDK5RAP2 gene, which has the same expression profile as PIMREG and plays a role in the formation and stability of microtubules, essential to the cell development.

Figure 6.

Figure 6.

HeLa cell cycling study: Random walk with restart result analysis from the node PIMREG. (A) Sub-network generated with top 10 RW closest nodes, colored by kinetic clusters. (B) Expression profiles of measured genes from (A). Each line represent the modelled expression of a measured gene.

Case study 2: dynamic maize responses to aphid feeding

Pre-processing, modelling and clustering of time profile

The maize dataset included a gene expression assay composed of 41 716 transcripts identified using the B73 maize reference genome (AGPv3.20), with 2254 transcripts identified as TF-coding genes and a targeted metabolite assay with 45 molecules measured. 1606 maize genes, 123 TF and all metabolites were differentially expressed between aphid and control groups for at least 1 time points. 331 genes, 36 TF and 29 metabolites were filtered after the 2 Fold-Change filtering step. Once all the molecules were modelled and noisy profiles were removed 1208 genes, 31 TF and 29 metabolites remained in the exposed group (Table 6).

Table 6.

Dynamic response to maize aphid feeding study: Initial number of genes, TF-coding genes and metabolites and after each pre-processing step (DE: differentially expressed molecules; FC: molecules above 2 Fold-Changes, LMMS filter: number of molecules after linear model filtering)

Raw count DE filter FC filter LMMS filter
Genes 39 462 1606 1295 1208
TF-coding genes 2254 123 75 42
Metabolites 45

Time profile clustering

Next step was the clustering with timeOmics using genes, TF-coding genes and targeted metabolites. This resulted in optimally 4 clusters (ssil = 0.82). Table 7 gives the cluster composition. Clusters 1 and 2 recorded most of the molecules Inline graphic. These were mainly composed of linear models representing an increase (cluster 1) or a decrease (cluster 2) in the time course. Cluster 3 Inline graphic was characterised by a rapid decrease followed by a slight increase from t = 24 h. Cluster 4 Inline graphic showed the opposite of cluster 3 with a rapid increase followed by a relatively steady state.

Table 7.

Dynamic response to maize aphid feeding study: number of gene, TF-coding genes and targeted metabolites per cluster identified with multi-block PLS longitudinal clustering

Cluster 1 Cluster 2 Cluster 3 Cluster 4
Genes 496 495 84 133
TF-coding genes 18 15 6 3
Metabolites 24 7 4 10

Multi-layered network reconstruction

The first layer to be reconstructed was the gene inference network from the mRNA and TF using ARACNe. We used unmodelled filtered data from aphid and control samples to build this first layer. As the first case study, we build sub-network for each cinetic cluster and a global network composed of all the mRNA/TF. Networks composition is detailed in Supplementary Table S4.

The second layer was the PPI network. Filtered PPIM database was used to build protein-protein interaction network from measured protein-coding genes. We also included proteins which were directly connected to the measured protein coding genes (first degree neighbours). Gene kinetic cluster sub-networks and global network were extended with PPI interactions.

The third network contained metabolic reactions. Metabolites converted to KEGG Compound ID were linked to each other with KEGG Pathways. Metabolites involved in the same metabolic reactions were also included in this layer. As for genes and proteins, kinetic cluster sub-networks and entire network were extended to connect genes and proteins to metabolites. Composition of multi-omics networks is detailed in Table 8.

Table 8.

Dynamic response to maize aphid feeding study: Multi-Omics network composition. Number of nodes by layer

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Entire Network
Genes 514 510 90 136 1250
Ext. Proteins 1473 2000 176 1102 2545
Metabolites (measured) 17 12 14 10 38
Metabolite (KEGG reactions) 77 85 65 46 250
GO terms 121 131 57 100 148

Enrichment analysis

ORA was performed on genes and proteins against GO:BP, GO:MF and GO:CC. This approach was achieved both for each cluster and combined network. We first performed ORA on measured molecules only and with PPIM’s first-degree neighbours. All the results of ORA analysis are detailed in Supplementary Table S5 and Table 9 summarise the number of unique GO terms by ontology and cluster. Cluster 1 with more genes gathered the largest number of significant terms. Terms specific to this cluster were molecular functions related to nucleotide bindings (GO:0035639, GO:0032555, GO:0017076, GO:0032553). Other clusters grouped terms related to bindings or kinase activity (GO:0043546, GO:0043424). Figure 8 represents the number of significant enriched GO terms and their p-value distribution. Measured molecules tended to produce more significant terms when ORA was performed on the entire set of genes rather than with the clustering. Nevertheless, with PPIM’s first-degree neighbours, there were more significant term in the clusters than in the entire set. Neighbours also gave smaller p-values. Lastly, ORA identified unique terms per cluster, not found in the entire set, and vice versa. Therefore, we found the terms mentioned for Cluster 1. Concerning the unique terms in the entire set, they included terms corresponding to defence response to other organism (GO:0098542) or other biological processes that can be linked to defence mechanisms such as lignin metabolic process (GO:0009808), interspecies interaction between organisms (GO:0044419), cellular response to chemical stimulus (GO:0070887) or transmembrane transport (GO:0055085). Stranger, defense response to oomycetes (GO:0002229) was also significantly enriched in the entire set.

Table 9.

Dynamic response to maize aphid feeding study: over representation analysis of mRNA term list by cluster. (A) Number of significant GO terms per cluster and entire network with genes. (B) Same as (A) with PPIM first degree neighbours

A. GO:BP GO:CC GO:MF
Cluster 1 4 5 17
Cluster 2 7 6 5
Cluster 3 - 1 -
Cluster 4 5 - 5
Entire network 37 5 31
B. GO:BP GO:CC GO:MF
Cluster 1 123 14 78
Cluster 2 124 627 586
Cluster 3 19 1 39
Cluster 4 100 -14 569
Entire Network 122 530 104
Figure 8.

Figure 8.

Dynamic response to maize aphid feeding study: Over Representation Analysis. (A) Number of distinct go terms with or without clustering and by approach. (B) Histogram of p-values resulting from GO term enrichment. (C) Intersection of go terms by clusters

Random walk

Like the HeLa cell cycling study example, RW was used on the multi-omics network with three purposes: Identify multi-omics molecules linked to mechanisms (GO term nodes as seed), function prediction (unlabelled nodes as seeds), and identify regulatory mechanisms with nodes between kinetic clusters.

Molecules involved in specific mechanisms

Each of the 89 significantly enriched GO terms was iteratively turned into a seed and subnetworks with the top 25 nodes were build. Within a multi-omics perspective, only three seeds were able to reach at least one gene and one metabolite: molybdopterin cofactor binding (GO:0043546), cell surface receptor signaling pathway (GO:0007166) and phenypropanoid metabolic process (GO:0009698). The latter is specific to plants and gathers information on cell development and defence mechanisms. Phenypropanoid metabolic process was found in clusters 2 and 4 as well as in entire network approach. Part of the shikimate pathway that allows the production of aromatic amino acids in plants, the phenylpropanoid pathway is known to be activated under abiotic stress including salinity, drought, heavy metals, pesticides, ultraviolet radiation and temperature stress (50). In the entire network, this node reached the metabolites L-Tyrosine (C00082) and L-Phenylalanine (C00079) which were both aromatic amino acids measured in the initial study. In addition, it reached non-measured metabolites such as trans-Cinnamate (C00423) and 4-Coumarate (C00811) which were connected to Tyrosine and Phenylalanine respectively. These connections were enabled by the L-phenylalanine ammonia-lyase and L-tyrosine ammonia-lyase reactions in which both measured and non-measured compounds were involved. In this same sub-network, we obtained several iso-enzymes involved in these reactions such as pal1,5,6,9. Interestingly, several gene models without functional annotation are connected to the network (GRMZM2G063679, GRMZM2G087259). RW from Phenypropanoid metabolic process on clusters 2 and 4 networks also returned Tyrosine and Tryptophan since they were classified in clusters 2 and 4 respectively and were involved in the same metabolic reaction. It also reached genes connected to similar pathways in both clusters: dhurrin biosynthesis acting as a plant defense compound and also suberin monomers, cuticular wax and cutin biosynthesis for plant development. Specificaly to cluster 4, one protein (GRMZM2G164036) was linked to DIBOA-glucoside biosynthesis, a benzoxazinoids dervived from indole that is a volatile compound used in parasitic defence (51).

Function prediction

Unlabelled gene nodes (n = 96) were turned into seeds and the closest GO terms were assigned to these nodes. 347 unlabelled nodes were disconnected from the network therefore it was not possible to apply the algorithm for these ones. These results are detailed in Supplementary Table S6. Many seeds were connected to various biosynthesis process such as cinnamic acid (GO:0009800), ceramide (GO:0046513), galactolipid (GO:0019375). We found unlabelled genes linked to defense response processes (GO:0006952) or jasmonic acid metabolic process (GO:0009694) which is an hormone that can be produce during biotic or abiotic stress. Other nodes were connected to various molecular functions such as hydrolase, chitinase, methyltransferase or bindings.

Cluster intersection

All measured nodes were turned into seeds and sub-neworks were build with the to 10 highest scored nodes from the entire network. 678 seeds were able to reach both nodes with different assigned kinetic cluster and GO term nodes. Results are detailed in Supplementary Table S6. We found molecular functions corresponding to different enzymatic activities (ATP binding, kinase, oxidoreductase ). Less frequently, we observed a toxin activity (GO:0090729) for the rip2 gene (GRMZM2G119705), also connected to defense response (GO:0006952). In Figure 9, gene AC209374.4_FG002, assigned to cluster 2, were indirectly connected to nodes from both clusters 1, 3 and 4. They were linked to DNA-binding transcription factor activity (GO:0003700). This regulatory mechanisms may be explained by the opposite expression profile as illustrated in Figure 9B.

Figure 9.

Figure 9.

Dynamic response to maize aphid feeding study: Random walk with restart result analysis from the node AC209374.4_FG002. (A) Sub-network generated with top 10 RW closest nodes, colored by kinetic clusters. (B) Expression profiles of measured genes from (A). Each line represent the modelled expression of a measured gene.

Case study 3: diabetes seasonal study

Pre-processing, modelling and clustering of time profile

The diabetes seasonal study dataset (Figure 10) was composed of 10 339 mRNA, 302 proteins, 66 Cytokines, 724 metabolites, 95 gut Operational Taxonomic Units (OTU) and 39 clinical variables 10. After the preprocessing step and the fold change filtering step (FC > 1.5), there were 775 mRNA, 30 proteins, 16 cytokines, 105 metabolites, 58 OTU left. No FC filter was applied to the clinical variables since the fold change for all these variables was below the threshold. The remaining molecules were modelled as a function of time over the entire year t = {48, 57, 61, 75, 85, 170, 295}.

Figure 10.

Figure 10.

Diabetes seasonal study: (A) experimental design: one individual was sampled for each timepoint (days: 48, 57, 61, 75, 8, 170, 295). For each sample, RNA, Proteins, Cytokines, Metabolites (untargeted), gut OTU and clinical variables are quantified. (B) Multi-Omics longitudinal clustering: Clustered expression profiles of RNA, Proteins, Cytokines, Metabolites (untargeted), gut OTU and clinical variables. Each line represents the modelled abundance of a molecule during the time-course. (C) Multi-omics network layout: represents the connection between entities and their different types of interactions.

The modelled time profiles from the 6 blocks were clustered using the multi-block PLS and resulted in two clusters (ssil = 0.26). The first cluster (Cluster 1) was composed of 293 molecules and characterised by an increase until the middle of the year (t = 75) then a decrease. The second cluster (Cluster 2) was composed of 730 molecules and had an opposite pattern to Cluster 1 with a decrease until the middle of the year then an increase. We used the sparse multi-block PLS to identify a signature of the cluster and the number of selected molecules by sparse cluster is displayed in Table 10.

Table 10.

Diabetes seasonal study: initial number molecules, number of remaining molecules after the fold-change filtering step, cluster compositions and sparse cluster composition by omics

RNA Proteins Cytokines Metabolites Gut OTU Clinical var.
Raw data 10 339 302 66 724 96 39
Fold change 775 30 16 105 58 39
Cluster 1 164 14 9 56 33 17
Cluster 2 611 16 7 49 25 22
(sparse) cluster 1 19 1 6 3 8 11
(sparse) cluster 2 38 2 4 1 2 15

Multi-layered network reconstruction

The first layer of the multi-omics network was the gene inference network from mRNA expression using ARACNe algorithm. As the first 2 case studies, we build sub-networks with cluster-specific mRNA and a general network with all the molecules. We also applied ARACNe to build networks of clinical variables and unlabelled metabolites.

As in the HeLa cell cycling case study, we build a PPI network with the BioGRID (29) interaction database with proteins and cytokines. This layer was connected to the mRNA layer using TFome information described above (TF2DNA (47), TTRUST (46)).

Specific to this case study, we build a microbiome correlation network from gut OTU using SparCC algorithm and cluster-specific sub-networks. We connected this layer to the other layer by computing the Spearman rank correlation and kept only the highest correlation between gut OTU and other molecules (|ρ| ≥ 0.99). With a correlation above the threshold, 58 OTU were linked to five RNA, one metabolite and two clinical variables (AG, EGFR) with a total of 132 interactions.

Unlike the other two case studies, the metabolites were not annotated, therefore we could not use the KEGG reaction database to connect this layer to genes and proteins. Instead, we applied the same method as the microbiome layer to connect the metabolite layer to the network. The 105 metabolites were linked to 11 mRNA, 8 OTU and 3 clinical variables (EGFR, LYM, ALKP), with a total of 231 interactions.

Finally, we applied the same method for the clinical variable layer. The 39 clinical variables were linked to 775 mRNA, 16 cytokines, 30 proteins, 105 metabolites and 58 OTU, with a total of 1026 interactions.

The network composition is detailed in Table 11.

Table 11.

Diabetes seasonal study: multi-omics network composition. Number of nodes by layer

Cluster 1 Cluster 2 Entire Network
RNA 169 621 775
Proteins (mesured) 11 17 23
Proteins (BioGRID interaction) 138 741 848
Cytokines (mesured) 3 4 8
Cytokines (BioGRID interaction) 11 20 29
Metabolites 56 49 105
Gut OTU 33 25 58
Clinical Var. core 18 22 39
GO terms 20 63 96
Disease 110 174 178

Enrichment analysis

ORA was performed on mRNA, proteins and cytokines data against Gene Ontology (BP, MF and CC). It was performed on both cluster-specific terms and the combined list of molecules. Unsurprisingly, with more molecules than Cluster 1, Cluster 2 gathered more significant GO terms (Table 12, Supplementary Table S7). Figure 11 represents the number of significant enriched GO terms and their P-value distribution. Measured molecules tended to produce more significant terms when ORA was performed on the entire set of genes rather than with the clustering. In Cluster 2, we found a lot of GO terms linked to cell organization and biogenesis such as cellular component organization or biogenesis GO:0071840, regulation of organelle organization GO:0033043, positive regulation of neurogenesis GO:0050769, regulation of organelle organization GO:0033043, cellular component biogenesis GO:0044085, membrane organization GO:0061024, actin cytoskeleton organization GO:0030036, cell-cell junction GO:0005911. These enriched terms suggest important changes in the mechanical properties of tissues such as skin. In winter, cold and low humidity increase skin permeability and epidermal thickening (45,52). This is consistent with the kinetic of cluster 2 which displayed a decrease in expression until in the winter and spring seasons and an increase in summer. In Cluster 1, we found more generic terms linked to cell activities such as translational initiation (GO:0006413), biological regulation (GO:0065007) or positive regulation of cellular process (GO:0048522). We also found the term viral infection (GO:0016032) which can also be explained by the seasonal nature of the data and can be increased in patients with diabetes (53).

Table 12.

Diabetes seasonal study: over representation analysis of RNA, proteins and cytokines term list by cluster. (A) Number of significant GO terms per cluster and entire network. (B) Same as (A) with first degree neighbours from BioGRID

A. GO:BP GO:CC GO:MF
Cluster 1 72 43 12
Cluster 2 135 128 69
Entire network 226 140 89
B. GO:BP GO:CC GO:MF
Cluster 1 229 70 19
Cluster 2 1204 249 151
Entire network 1356 263 162
Figure 11.

Figure 11.

Diabetes seasonal study: over representation analysis. (A) Number of distinct go terms with or without clustering and by approach. (B) Histogram of p-values resulting from GO term enrichment. (C) Intersection of go terms by clusters.

In addition to GO enrichment analysis, we performed a disease-related gene enrichment analysis using MedlineRanker in this case study (49). We found significantly enriched diseases such as Hemoglobinopathies (P-value = 1.7 10e–3) or Thalassemia (P-value = 3.3 10e–3) a blood-related genetic disorder. People with this hemoglobinopathy are likely to develop autoimmunity, insulin resistance and disease such as type 1 or 2 diabetes (54). We also found Renal Tubular Acidosis (P-value = 2.10e–2) (Supplementary Table S7) which is also known to be a diabetes-related disease (55,56).

Finally, we used this enrichment information to add two more layers to the network. Each GO or disease term was linked to the corresponding RNA, protein or cytokines used for the enrichment analysis. The final network was composed of 8 layers of molecules/terms and the final composition of the entire network and the kinetic cluster specific subnetworks is available in Table 11.

Random walk

In this case study, random walk was used to identify multi-omics interaction from diseases and biological functions (GO terms). It was also used to identify molecules between kinetic clusters. Unlike the other case studies, any gene or protein nodes were unannotated and we did not predict any function.

To identify molecules linked to biological mechanisms, the random walk starting points were significant GO terms and Disease nodes. After the random walk from each of those seeds, we ranked the nodes and extracted the top 15 first nodes related to the seed. We then had a list of 274 initial seeds (96 GO, 178 diseases). X seeds were able to reach at least two omic layers within the 15 closest nodes and the seeds were linked to a maximum of three different layers. For example, the disease node labelled Renal Tubular Acidosis was linked to one cytokine, three proteins and 12 RNA despite that this disease was only related to two known RNA (SLC4A1, CA2) in this network (Figure 12). In addition, in this sub-network, we detected the RNA MEIS1 which was found in the signature. The gene is a protein-coding gene from the family of homeobox genes (HOX). It is associated with restless legs syndrome (57). We did not find any relevant literature about a potential link between this gene and diabetes mellitus but associations between restless leg syndrome and diabetes mellitus are well documented (58–60).

Figure 12.

Figure 12.

Diabetes seasonal study: random walk with restart result from the seed ‘Acidosis, Renal Tubular’. Within the closest nodes, the algorithm were able to link RNA, proteins, a cytokine and a disease. The nodes are colored by omic type and by kinetic cluster.

RWR was also applied to identify seeds linked to both kinetic clusters. The disease node Renal Tubular Acidosis was connected to molecules which belong to both 2 kinetic clusters. We therefore have a clear demonstration of the benefits of building and annotating longitudinal multi-omics networks for the identification of potential mechanisms and revealing new interactions in diabetes mellitus.

DISCUSSION

Improvements and cost reductions in high-throughput omics technologies have resulted in the emergence of increasingly complex experimental designs that combine both multi-omics data and longitudinal sampling from the same biological samples. These complex designs are implemented to provide a comprehensive view of biological systems in order to identify complex relationships between omics layers and answer various biological questions such as disease subtyping, biomarker discovery, models and predictions (2,61–63). Appropriate integration of the multiple omics profiled must be performed to fully take advantage of the interactions between omics and their interdependencies. The challenge of multi-omics integration is to offer interpretation guidelines adapted to the multiplicity and heterogeneity of designs (data types, number of conditions, samples, timepoints, etc.). In order to identify cellular mechanisms with multi-omics regulation, we propose to use a combination of both data-driven and knowledge-driven integration methods from longitudinal multi-omic data to highlight interactions not only between molecules of the same biological type but also between omic layers linked to cellular mechanisms.

Multi-omics network integration in literature

Multi-omics networks are the key to illustrate topological relationships between different molecular species (64). The HetioNet network is currently one of the most impressive multi-omics networks. Composed of 18 different types of nodes (genes, SNPs, proteins, compounds, tissues, diseases) and 19 different types of edges between these layers (65). Its main objective is to provide a natural representation of the human system hierarchy and to propose genetic-disease association prediction. Although very comprehensive, some relationships evolve in time and the addition of these dynamic interactions as well as specific expression data should be very valuable to study complex disease mechanisms. On a smaller scale, MetPriCNet (66) is a 3-level network composed of genes, metabolites, diseases and associations between each cross-layer pair. Its objective is to prioritize candidate metabolites for known diseases or phenotypes. Although very useful for therapeutic target prediction, these networks should be connected to expression data to further address the needs of personalized medicine. In another paradigm, OmicsNet (67) offers to build customized multi-omics networks from a list of molecules of interest (genes, proteins, metabolites, TF, miRNA). It does not provide association prediction but the connection between the nodes is ensured by several databases (STRING, miRBase, KEGG, ENCODE, etc.). Its main advantage is to propose in a web browser a user friendly 3D representation of the resulting network. Like other networks, it is specifically designed for the human organism and still too few integration networks target other model organisms.

Like OmicsNet, MiBioMics (68) also offers a web service and an easy-to-use user interface. The method offers the user to directly upload their data, and performs pre-processing and multi-omics integration with multiple co-inertia analysis. It also offers network inference using WGCNA (69). Although the interface is dynamic, network exploration is limited to module detection.

A recent contribution to multi-omics networks with time series data was proposed in (70). From time series expression data, the authors defined time events with Minardo-Model. These events have significant shifts in expression on a specific time window. They identified groups of molecules with similar time events. The analysis of the successive time events resulted in an expected causal directed graph between the molecules. This was applied on a multi-omics dataset (phosphoproteomics, transcriptomics, and proteomics) where each block was analyzed independently and the results were combined during the final event ordering step. Complementary to our approach and if the experimental design allows it (Kaur et al. (70) used 7+ evenly spaced timepoints), one can easily imagine that the order of events can be embedded into another layer of the graph and provide unique interpretation results.

Interpretation guidelines for multi-omics networks: advantages and limitations

Ranging from expression modelling over time, multivariate clustering to network reconstruction, in the following section we discuss the pros and cons of the choices made at each stage of this methodology.

Data requirement and experimental design

Each experimental design is different, in terms of type of data, numbers of replicates, conditions and timepoints. As illustrated in the different case study, the framework can handle a large variety of data. Special care must be taken in designing the study. To improve the integration, interpretation and reduce false positive discoveries, one should study events with large variation over the same time period and when the variation between omics blocks is expected to happen on the same time scale (71). Normalisation can also impact the network building process, we recommend to using gold standard normalisation and preprocessing approaches appropriate for each omic layer. Longitudinal data and Multivariate clustering: our previous timeOmics framework (modeling and clustering step) was designed to identify highly associated temporal profiles between multiple and heterogeneous datasets. In order to maximise the method performance, we made recommendations about experimental design, including the number of data blocks (3) and evenly spaced time points (5–10) (24). In addition, the interpolation procedure which infer missing timepoints for uneven designs may have a large impact on the clustering result as demonstrated in (24). We also used cases when they were numerous timepoints or high inter-individual variability. A filtering step was implemented to increase interpretability, based on highly variable profiles within the time period above a set fold-change. This naive threshold can be refined with the addition of condition specific differential expression or other models according to the experimental design (25,45).

Network and kinetic clusters

In this framework, we have proposed to use multi-layer networks to help with the interpretation and the visualization of multi-omics interactions. Although we built a global gene network, we also used the inference method on each of the clusters separately. We first tried to extract cluster-specific sub-networks from the general network. The second approach is clustering first, and then building a network per cluster. We kept the second approach because it gave a higher number of connected nodes and edges than the network-first approach. Furthermore, with the addition of knowledge data for which we do not have temporal profiles, some nodes will be shared between clusters.

Network inference

We relied on ARACNe algorithm for gene inference network. More mathematically advanced alternatives exist (72) recently applied Gaussian graphical model with graphical lasso to infer gene interactions in 15 specific types of human cancer using RNA-Seq expression data from The Cancer Genome Atlas. Nevertheless, ARACNe is a well-cited, reliable coexpression-based inference method and it provides an easy to use implementation. As mentioned by (73), the inference problem is NP-complete and only the combination of the results of several methods might help uncover true interactions. Moreover, of omics data should influence the choice of the inference method. With microbiome abundance data, co-occurrence network inference is often applied to identify synergism or competition relationships between species and their environment (74,75). In terms of multi-omics interaction, the recent tool OmicsAnalyst (76) proposes an interesting improvement. This method used two filters to control correlation strengths for intra-omics and for inter-omics interactions.

Knowledge-based interaction

The proposed hybrid multi-omics networks were composed of expression data and knowledge links. Therefore, known links can be limited by incomplete databases. In the first case study, we merged BioGRID, TTRUST and TF2DNA to connect proteins to genes. However, more specialised databases could also be added to include more links. In the second case study, since maize is not as studied as humans, the PPIM database covers all known to date interactions for this organism and some unknown interactions may be missing. In addition, these links represent potential interactions that may append in the studies but not measured interactions and these networks are referred to as maps. An additional integration step would consist in constructing a dynamic signal flow where the directionality and intensity of the interactions would be represented in the form of flows or pathways. Unfortunately, this type of representation requires a time-series design using precise multiple doses of stimulation under the same condition (16). On the contrary, the use of erroneous databases can lead to false discoveries and thus we believe that only knowledge databases with trusted interactions should be used. Alternatively, overlapping interaction between several databases can be selected to improve predictions (77).

Multi-omics interpretation

The use of clusters enables kinetic-specific enrichment analyses to be performed. These results can be compared with those obtained from the entire network. Both approaches tended to provide the same number of significant terms are instead complementary. It was possible to identify terms unique to each approach and terms only found in a particular cluster. Addition of non-measured molecules in ORA provided more significant terms with lower p-values. However, these extra molecules were highly shared between cluster or were linked to similar functions. Thus it did not bring cluster specific enrichment. The sparse multi-block PLS allows the identification of a signature of kinetic clusters. This signature is often composed of a small list of molecules and does not always allow a positive biological enrichment. However, the use of this sparse signature adds additional information on the network and allows the identification of nodes of interest. A good interpretation strategy is to use these signature molecules as seeds in the propagation analysis or if any seed can reach these nodes of interest.

Propagation analysis

Although random walk propagation analysis is the state of the art for association prediction, its application on heterogeneous multi-layered networks is still in its early stages. Multi-layer network describes a network composed of several layers. The layers contain the same nodes but the edges connecting these nodes are different. For example, a gene network may have coexpression links on a first layer and PPI interactions on a second layer.

In addition, to depict the relationships between different species of molecules, it is recommended to use heterogeneous networks with a bipartite relationship that describe all possible interaction between two sets of nodes. With more links and more possibilities to fine-tune the transition between layers, more complex network structures would improve the prediction of the random walk (21). As complex biological networks have several types of molecules and several types of interactions, it becomes essential to use heterogeneous multi-layer networks to describe all the relations of the system. However, tools to build such networks are limited by the number of layers or the number of heterogeneous relations. Therefore, we were encouraged to use monoplex networks for each application. Although monoplex networks highlighted relevant interactions, comparison with heterogeneous multi-layer networks would have been worthwhile. Finally, merging data from different sources must be performed with care. Depending on the source of the data, the quality of the interactions can be heterogeneous and can lead to wrong predictions. To overcome this issue, it would be possible to use weighted edges according to the level of confidence granted to the inference algorithms or the given databases. However, using data with very heterogeneous quality scores can result in the highlighting of the same interactions and potential discoveries can be missed. It is recommended to mix inference approaches and databases to improve interaction predictions (78,79).

Contribution of network-based integration on public datasets

The reanalysis of multi-omics data with new methods can lead to a better understanding of the system. Both cases proposed in this paper are no exception. With the study of HeLa cells during the cell cycle, the authors were able to highlight functional groups, expressed by proteins, which may be over or under-regulated at different phases of the cycle. Combined with the other omics layers, they were able to compare the dynamics of an mRNA, its translation product and the associated protein and thus highlight the regulation mechanism during the cycle. Nevertheless, our methodology, based on network-based multi-omics integration could, in addition to answering the same questions, offer the advantage of going even further.

From a key function (Cell division, Chromosome segmentation, Telomere maintenance) it is possible to predict the molecules involved but also to know the events, interactions, regulation, which have allowed its expression. Moreover, by adding other molecular layers (e.g. metabolites) it is possible to observe the implications of this function at the system level.

The second case examining the response of maize to an aphid attack highlighted the dynamics of certain genes involved in the defense mechanisms, associated with the evolution of known metabolites. Although the initial analyses identified genes and metabolites that are highly variable with the presence of aphids, the authors did not take advantage of all the multi-layer interactions. Indeed, with a very targeted list of metabolites, having roles in defense mechanisms but also in other pathways, it is difficult to identify relevant multi-omic interactions. By adding other layers but also directly connected molecules, we have been able to considerably increase the diversity of the interactions that could be detected and thus increase the chances of identifying other potential targets to better understand maize defense mechanisms against aphids and possibly orient the development of future biological pesticides.

Finally with the third case study, we studied Type 2 diabetes with a more complex experimental design than the other case study with microbiome and clinical variables. This case study investigates the seasonality of omics expressions throughout the year. Aside from the GO terms, we enriched the network with a disease layer. This layer adds informative interactions regarding possible diabetes-related complications as well as the prediction of key genes in these mechanisms.

The widespread use of high-throughput technologies has enabled the profiling of multiple omics layers across multiple time points. This has opened the door to new study designs that can now address questions and identify novel biological mechanisms regulated by different biomolecular layers. However, one of the biggest challenges in this era of multi-omics is the integration and interpretation of the diverse large-scale omics data in a way that provides new and more complete biological insights.

To facilitate the interpretation of longitudinal multi-omics data we have proposed an analytic strategy that consists of multi-omics kinetic clustering and multi-layer network. The use of multi-layer networks can evolve and be enriched by the addition of new layers (for example miRNA, disease, drugs, environmental variables, microbiome) and their respective cross-layered interactions (gene–disease association, host-microbe interaction, miRNA-mRNA regulation) and also intra-layered interactions that could defined different kinds of interactions for the same layer (co-expression, functional relationship, expression delays).

The analysis of three multi-omics longitudinal studies in three previously published datasets demonstrates that new multi-layered interactions can be unravelled. This approach can be applied to many fields such as pharmacology or dermocosmetics where the addition of information and the exploration of multi-omics networks can lead to the discovery of new therapeutic targets. Eventually, and with highly controlled experimental designs, we expect that dynamic networks can be built to model an entire biological system. Therefore, this methodology will open new avenues for exploration and interpretation from multi-omic studies.

DATA AVAILABILITY

The netOmics R package is available at https://github.com/abodein/netOmics. HeLa Cell raw data for case study 1 can be found in (35). Maize raw data for case study 2 can be found in (80). Diabetes raw data for the case study 3 can be found in (45). Processed data and in-house scripts are available in a Github public repository: https://github.com/abodein/netOmics-case-studies.

Supplementary Material

gkab1200_Supplemental_Files

ACKNOWLEDGEMENTS

Author contributions: A.B. performed the analyses and wrote the manuscript. A.B., M.P.S.B. and K.A.L.C. revised the manuscript. All authors read and approved the submitted version.

Contributor Information

Antoine Bodein, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada.

Marie-Pier Scott-Boyer, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada.

Olivier Perin, Digital Sciences Department, L’Oréal Advanced Research, Aulnay-sous-bois, France.

Kim-Anh Lê Cao, Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia.

Arnaud Droit, Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

A.B., M.P.S.B. and A.D. were supported by Research and Innovation chair L’Oréal in Digital Biology; K.A.L.C. was supported in part by the National Health and Medical Research Council (NHMRC) Career Development fellowship [GNT1159458]. Funding for open access charge: Research and Innovation chair L’Oreal in Digital Biology.

Conflict of interest statement. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

REFERENCES

  • 1. Chin L., Andersen J.N., Futreal P.A.. Cancer genomics: from discovery science to personalized medicine. Nat. Med. 2011; 17:297. [DOI] [PubMed] [Google Scholar]
  • 2. Hasin Y., Seldin M., Lusis A.. Multi-omics approaches to disease. Genome Biol. 2017; 18:83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Van Assche R., Broeckx V., Boonen K., Maes E., De Haes W., Schoofs L., Temmerman L.. Integrating-omics: systems biology as explored through C. elegans research. J. Mol. Biol. 2015; 427:3441–3451. [DOI] [PubMed] [Google Scholar]
  • 4. Fondi M., Liò P.. Multi-omics and metabolic modelling pipelines: challenges and tools for systems microbiology. Microbiol. Res. 2015; 171:52–64. [DOI] [PubMed] [Google Scholar]
  • 5. Lloyd-Price J., Arze C., Ananthakrishnan A.N., Schirmer M., Avila-Pacheco J., Poon T.W., Andrews E., Ajami N.J., Bonham K.S., Brislawn C.J.et al.. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature. 2019; 569:655–662. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Sun Y.V., Hu Y.-J.. Integrative analysis of multi-omics data for discovery and functional studies of complex human diseases. Advances in Genetics. 2016; 93:Elsevier; 147–190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Huertas-Vazquez A., Leon-Mimila P., Wang J.. Relevance of multi-omics studies in cardiovascular diseases. Fron. Cardiovl. Med. 2019; 6:91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Palsson B., Zengler K.. The challenges of integrating multi-omic data sets. Nat. Chem. Biol. 2010; 6:787–789. [DOI] [PubMed] [Google Scholar]
  • 9. Huang S., Chaudhary K., Garmire L.X.. More is better: recent progress in multi-omics data integration methods. Front. Genet. 2017; 8:84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Krahmer J. Circadian abundance and modification of proteins in arabidopsis. 2016; Edinburgh: University of Edinburgh; PhD thesis. [Google Scholar]
  • 11. Barabasi A.-L., Oltvai Z.N.. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004; 5:101–113. [DOI] [PubMed] [Google Scholar]
  • 12. Jeong H., Mason S.P., Barabási A.-L., Oltvai Z.N.. Lethality and centrality in protein networks. Nature. 2001; 411:41–42. [DOI] [PubMed] [Google Scholar]
  • 13. Stelzl U., Worm U., Lalowski M., Haenig C., Brembeck F.H., Goehler H., Stroedicke M., Zenkner M., Schoenherr A., Koeppen S.et al.. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005; 122:957–968. [DOI] [PubMed] [Google Scholar]
  • 14. UniProt Consortium Uniprot: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019; 47:D506–D515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Zhu X., Gerstein M., Snyder M.. Getting connected: analysis and principles of biological networks. Genes Dev. 2007; 21:1010–1024. [DOI] [PubMed] [Google Scholar]
  • 16. Yugi K., Kubota H., Hatano A., Kuroda S.. Trans-omics: how to reconstruct biochemical networks across multiple ‘omic’ layers. Trends Biotechnol. 2016; 34:276–290. [DOI] [PubMed] [Google Scholar]
  • 17. Kim D., Joung J.-G., Sohn K.-A., Shin H., Park Y.R., Ritchie M.D., Kim J.H.. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J. Am. Med. Inform. Assn. 2015; 22:109–120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Cowen L., Ideker T., Raphael B.J., Sharan R.. Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet. 2017; 18:551. [DOI] [PubMed] [Google Scholar]
  • 19. Page L., Brin S., Motwani R., Winograd T.. The PageRank citation ranking: bringing order to the web. Technical report. 1999; Stanford InfoLab. [Google Scholar]
  • 20. Nguyen N.D., Wang D.. Multiview learning for understanding functional multiomics. PLOS Comput. Biol. 2020; 16:e1007677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Valdeolivas A., Tichit L., Navarro C., Perrin S., Odelin G., Levy N., Cau P., Remy E., Baudot A.. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics. 2019; 35:497–505. [DOI] [PubMed] [Google Scholar]
  • 22. Yalamanchili H.K., Wan Y.-W., Liu Z.. Data analysis pipeline for RNA-seq experiments: from differential expression to cryptic splicing. Curr. Protoc. Bioinformatics. 2017; 59:11–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Brown M., Dunn W.B., Ellis D.I., Goodacre R., Handl J., Knowles J.D., O’Hagan S., Spasić I., Kell D.B.. A metabolome pipeline: from concept to data to knowledge. Metabolomics. 2005; 1:39–51. [Google Scholar]
  • 24. Bodein A., Chapleur O., Droit A., Lê Cao K.-A.. A generic multivariate framework for the integration of microbiome longitudinal studies with other data types. Front. Genet. 2019; 10:944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Straube J., Gorse A.-D., Huang B.E., Lê Cao K.-A.. A linear mixed model spline framework for analysing time course ‘omics’ data. PloS one. 2015; 10:e0134540. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Rohart F., Gautier B., Singh A., Lê Cao K.-A.. mixOmics: an R package for omics feature selection and multiple data integration. PLoS Comput. Biol. 2017; 13:e1005752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Margolin A.A., Nemenman I., Basso K., Wiggins C., Stolovitzky G., Dalla Favera R., Califano A.. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006; 7:Springer; S7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Zheng M., Zhuo M.. PPI inference algorithms using MS data. International Conference on Big Data Analytics for Cyber-Physical-Systems. 2019; Springer; 931–936. [Google Scholar]
  • 29. Chatr-Aryamontri A., Breitkreutz B.-J., Oughtred R., Boucher L., Heinicke S., Chen D., Stark C., Breitkreutz A., Kolas N., O’Donnell L.et al.. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 2015; 43:D470–D478. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Klingström T., Plewczynski D.. Protein–protein interaction and pathway databases, a graphical review. Brief. Bioinform. 2011; 12:702–713. [DOI] [PubMed] [Google Scholar]
  • 31. Kanehisa M., Sato Y., Kawashima M., Furumichi M., Tanabe M.. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016; 44:D457–D462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Raudvere U., Kolberg L., Kuzmin I., Arak T., Adler P., Peterson H., Vilo J.. g: Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019; 47:W191–W198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Reimand J., Kull M., Peterson H., Hansen J., Vilo J.. g: Profiler—a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 2007; 35:W193–W200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Schröder M.S., Culhane A.C., Quackenbush J., Haibe-Kains B.. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics. 2011; 27:3206–3208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Aviner R., Shenoy A., Elroy-Stein O., Geiger T.. Uncovering hidden layers of cell cycle regulation through integrative multi-omic analysis. PLoS Genet. 2015; 11:e1005554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Zhu G., Wu A., Xu X.-J., Xiao P.-P., Lu L., Liu J., Cao Y., Chen L., Wu J., Zhao X.-M.. PPIM: a protein-protein interaction database for maize. Plant. Physio. 2016; 170:618–626. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Uniprot Consortium UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019; 47:D506–D515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Xenarios I., Rice D.W., Salwinski L., Baron M.K., Marcotte E.M., Eisenberg D.. DIP: the database of interacting proteins. Nucleic Acids Res. 2000; 28:289–291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Orchard S., Ammari M., Aranda B., Breuza L., Briganti L., Broackes-Carter F., Campbell N.H., Chavali G., Chen C., Del-Toro N.et al.. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014; 42:D358–D363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Licata L., Briganti L., Peluso D., Perfetto L., Iannuccelli M., Galeota E., Sacco F., Palma A., Nardozza A.P., Santonico E.et al.. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2012; 40:D857–D861. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Ayvazian R. A Strategy to Increase Type 2 Diabetes Health Education. 2021; Northridge: California State University; PhD thesis. [Google Scholar]
  • 42. Cho N., Shaw J., Karuranga S., Huang Y., da Rocha Fernandes J., Ohlrogge A., Malanda B.. IDF Diabetes Atlas: global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res. Clin. Pr. 2018; 138:271–281. [DOI] [PubMed] [Google Scholar]
  • 43. Integrative H., Proctor L.M., Creasy H.H., Fettweis J.M., Lloyd-Price J., Mahurkar A., Zhou W., Buck G.A., Snyder M.P., Strauss III J.F.et al.. The integrative human microbiome project. Nature. 2019; 569:641–648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Integrative H. The Integrative Human Microbiome Project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease. Cell Host Microbe. 2014; 16:276–289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Sailani M.R., Metwally A.A., Zhou W., Rose S.M.S.-F., Ahadi S., Contrepois K., Mishra T., Zhang M.J., Kidziński Ł., Chu T.J.et al.. Deep longitudinal multiomics profiling reveals two biological seasonal patterns in California. Nat. Commun. 2020; 11:4933–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Han H., Cho J.-W., Lee S., Yun A., Kim H., Bae D., Yang S., Kim C.Y., Lee M., Kim E.et al.. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 2018; 46:D380–D386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Pujato M., Kieken F., Skiles A.A., Tapinos N., Fiser A.. Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes. Nucleic Acids Res. 2014; 42:13500–13512. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Friedman J., Alm E.J.. Inferring correlation networks from genomic survey data. PLoS Comput Biol. 2012; 8:e1002687. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Fontaine J.-F., Barbosa-Silva A., Schaefer M., Huska M.R., Muro E.M., Andrade-Navarro M.A.. MedlineRanker: flexible ranking of biomedical literature. Nucleic Acids Res. 2009; 37:W141–W146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Sharma A., Shahzad B., Rehman A., Bhardwaj R., Landi M., Zheng B.. Response of phenylpropanoid pathway and the role of polyphenols in plants under abiotic stress. Molecules. 2019; 24:2452. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Gierl A., Frey M.. Evolution of benzoxazinone biosynthesis and indole production in maize. Planta. 2001; 213:493–498. [DOI] [PubMed] [Google Scholar]
  • 52. Balato N., Di Costanzo L., Patruno C., Patrì A., Ayala F.. Effect of weather and environmental factors on the clinical course of psoriasis. Occup. Environ. Med. 2013; 70:600–600. [DOI] [PubMed] [Google Scholar]
  • 53. Lönnrot M., Lynch K.F., Larsson H.E., Lernmark Å., Rewers M.J., Törn C., Burkhardt B.R., Briese T., Hagopian W.A., She J.-X., et al.. Respiratory infections are temporally associated with initiation of type 1 diabetes autoimmunity: the TEDDY study. Diabetologia. 2017; 60:1931–1940. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Barnard M., Tzoulis P.. Diabetes and thalassaemia. Thalassemia Rep. 2013; 3:e18. [Google Scholar]
  • 55. Reddy P. Clinical approach to renal tubular acidosis in adult patients. Int. J. Clin. Pract. 2011; 65:350–360. [DOI] [PubMed] [Google Scholar]
  • 56. Karet F.E. Mechanisms in hyperkalemic renal tubular acidosis. J. Am. Soc. Nephro. 2009; 20:251–254. [DOI] [PubMed] [Google Scholar]
  • 57. Sarayloo F., Dion P.A., Rouleau G.A.. MEIS1 and restless legs syndrome: a comprehensive review. Front. Neurol. 2019; 10:935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Akın S., Bölük C., Börü Ü.T., Taşdemir M., Gezer T., Şahbaz F.G., Keskin Ö.. Restless legs syndrome in type 2 diabetes mellitus. Primary Care Diabetes. 2019; 13:87–91. [DOI] [PubMed] [Google Scholar]
  • 59. Merlino G., Fratticci L., Valente M., Giudice A.D., Noacco C., Dolso P., Cancelli I., Scalise A., Gigli G.L.. Association of restless legs syndrome in type 2 diabetes: a case-control study. Sleep. 2007; 30:866–871. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Lopes L.A., de MM Lins C., Adeodato V.G., Quental D.P., De Bruin P.F., Montenegro R.M., de Bruin V.M.. Restless legs syndrome and quality of sleep in type 2 diabetes. Diabetes Care. 2005; 28:2633–2636. [DOI] [PubMed] [Google Scholar]
  • 61. Pinu F.R., Beale D.J., Paten A.M., Kouremenos K., Swarup S., Schirra H.J., Wishart D.. Systems biology and multi-omics integration: Viewpoints from the metabolomics research community. Metabolites. 2019; 9:76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Jamil I.N., Remali J., Azizan K.A., Muhammad N. A.N., Arita M., Goh H.-H., Aizat W.M.. Systematic Multi-Omics Integration (MOI) approach in plant systems biology. Front. Plant Sci. 2020; 11:944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Misra B.B., Langefeld C., Olivier M., Cox L.A.. Integrated omics: tools, advances and future approaches. J. Mol. Endocr. 2019; 62:R21–R45. [DOI] [PubMed] [Google Scholar]
  • 64. Lee B., Zhang S., Poleksic A., Xie L.. Heterogeneous multi-layered network model for omics data integration and analysis. Front. Genet. 2020; 10:1381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Himmelstein D.S., Lizee A., Hessler C., Brueggeman L., Chen S.L., Hadley D., Green A., Khankhanian P., Baranzini S.E.. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. Elife. 2017; 6:e26726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Yao Q., Xu Y., Yang H., Shang D., Zhang C., Zhang Y., Sun Z., Shi X., Feng L., Han J.et al.. Global prioritization of disease candidate metabolites based on a multi-omics composite network. Sci. Rep.-UK. 2015; 5:17201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Zhou G., Xia J.. OmicsNet: a web-based tool for creation and visual analysis of biological networks in 3D space. Nucleic Acids Res. 2018; 46:W514–W522. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Zoppi J., Guillaume J.-F., Neunlist M., Chaffron S.. MiBiOmics: an interactive web application for multi-omics data exploration and integration. BMC Bioinformatics. 2021; 22:6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Langfelder P., Horvath S.. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008; 9:559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Kaur S., Peters T.J., Yang P., Luu L. D.W., Vuong J., Krycer J.R., O’Donoghue S.I.. Temporal ordering of omics and multiomic events inferred from time-series data. NPJ Syst. Biol. Appl. 2020; 6:22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Bodein A., Scott-Boyer M.-P., Perin O., Lê Cao K.-A., Droit A.. timeOmics: an R package for longitudinal multi-omics data integration. Bioinformatics. 2021; 10.1093/bioinformatics/btab664. [DOI] [PubMed] [Google Scholar]
  • 72. Zhao H., Duan Z.-H.. Cancer genetic network inference using Gaussian graphical models. Bioinformatics Biol. Insigh. 2019; 13:1177932219839402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. De Smet R., Marchal K.. Advantages and limitations of current network inference methods. Nat. Rev. Microbiol. 2010; 8:717–729. [DOI] [PubMed] [Google Scholar]
  • 74. Ma B., Wang Y., Ye S., Liu S., Stirling E., Gilbert J.A., Faust K., Knight R., Jansson J.K., Cardona C.et al.. Earth microbial co-occurrence network reveals interconnection pattern across microbiomes. Microbiome. 2020; 8:82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. McGregor K., Labbe A., Greenwood C.M.. MDiNE: a model to estimate differential co-occurrence networks in microbiome studies. Bioinformatics. 2020; 36:1840–1847. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Zhou G., Ewald J., Xia J.. OmicsAnalyst: a comprehensive web-based platform for visual analytics of multi-omics data. Nucleic Acids Res. 2021; 49:W476–W482. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Bajpai A.K., Davuluri S., Tiwary K., Narayanan S., Oguru S., Basavaraju K., Dayalan D., Thirumurugan K., Acharya K.K.. Systematic comparison of the protein-protein interaction databases from a user’s perspective. J. Biomed. Inform. 2020; 103:103380. [DOI] [PubMed] [Google Scholar]
  • 78. Hill S.M., Heiser L.M., Cokelaer T., Unger M., Nesser N.K., Carlin D.E., Zhang Y., Sokolov A., Paull E.O., Wong C.K.et al.. Inferring causal molecular networks: empirical assessment through a community-based effort. Nat. Methods. 2016; 13:310–318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Mercatelli D., Scalambra L., Triboli L., Ray F., Giorgi F.M.. Gene regulatory network inference resources: A practical overview. Biochim. Biophys. Acta (BBA)-Gene Regul. Mech. 2020; 1863:194430. [DOI] [PubMed] [Google Scholar]
  • 80. Tzin V., Fernandez-Pozo N., Richter A., Schmelz E.A., Schoettner M., Schäfer M., Ahern K.R., Meihls L.N., Kaur H., Huffaker A.et al.. Dynamic maize responses to aphid feeding are revealed by a time series of transcriptomic and metabolomic assays. Plant Physio. 2015; 169:1727–1743. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

gkab1200_Supplemental_Files

Data Availability Statement

The netOmics R package is available at https://github.com/abodein/netOmics. HeLa Cell raw data for case study 1 can be found in (35). Maize raw data for case study 2 can be found in (80). Diabetes raw data for the case study 3 can be found in (45). Processed data and in-house scripts are available in a Github public repository: https://github.com/abodein/netOmics-case-studies.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES