Abstract
Most biological mechanisms involve more than one type of biomolecule, and hence do not operate solely at the level of the genome, transcriptome, proteome, metabolome or ionome. Datasets resulting from single-omic analysis are rapidly increasing in throughput and quality, rendering multi-omic studies feasible. These should offer a comprehensive, structured and interactive overview of a biological mechanism. However, combining single-omic datasets in a meaningful manner has so far proved challenging, and the discovery of new biological information lags behind expectation. One reason is that experiments conducted in different laboratories typically cannot be combined without restriction. A second is that the interpretation of multi-omic datasets is inherently challenging, as the biological datasets are heterogeneous not only for technical, but also for biological, chemical, and physical reasons. Here, multi-layer network theory and methods of artificial intelligence might contribute to solving these problems. For the efficient application of machine learning, however, biological datasets need to become more systematic, more precise, and much larger. We conclude our review with basic guidelines for the successful set-up of a multi-omic experiment.
Highlights
• Most biological mechanisms involve more than one class of biomolecule.
• Multi-omics approaches integrate information from multiple layers of biological data.
• The bottleneck remains in low sample numbers and strategies to make sense out of heterogeneous datasets.
• Multi-layer networks and artificial intelligence can dissect multi-omics data if large and systematic datasets are available.
Introduction
Many biological processes are highly dynamic, and their regulation as well as their functionality involve a multitude of interactions between the genome, epigenome, transcriptome, proteome, metabolome, and ionome [1, 2, 3]. Thus, in order to comprehensively understand a process of fundamental biological importance, it is critical not only to understand these biological layers as separate elements, but to dissect how they interact with one another (Figure 1).
An unprecedented pace in the development of ‘omic’ technologies, as well as increasing investment in research facilities, has greatly increased access to ‘genome-scale’ technologies across the biosciences. Studies involving several of these techniques (‘multi-omics’) have given rise to a new era in Systems Biology, but create the need to integrate and combine very different types of biological information. While the obvious need for a ‘multi-level’ biological analysis has created an anticipation that multi-omics is capable of revealing new biological mechanisms, it is becoming increasingly clear that our current methodological spectrum for the analysis of biological data, and the theoretical framework required to interpret the obtained information, are lagging far behind. Hence, a large number of high-quality datasets are created, only to be incompletely analysed, and much biological insight remains buried. In this review we highlight some new developments that aim to change this situation, and discuss typical pitfalls that are to be avoided in order to conduct a successful multi-omic study.
Challenges to combine multi-omic biological information
While none of the current omic technologies is perfect, some come considerably closer to providing a comprehensive picture of the biological layer they aim to address, whilst some others lag behind. Often, this has less to do with the state of the technological developments themselves, and more with huge differences in the chemical and physical complexity of each biological level (Figure 1).
1. The genome, according to the Central Dogma [4], is the basal layer of the cell and, at the same time, the biological layer most effectively captured by current omic technology. Being composed of strands of the four nucleotides, the genome is a linear, effectively digital sequence. By leveraging the intrinsic complementarity of base pairs (sequencing by synthesis), it has become possible to rapidly genotype an unprecedented number of samples at a relatively low cost [5, 6, 7, 8]. The ability to efficiently sequence genomes, and to predict RNA and protein sequences from them [5, 6, 7, 8], has effectively opened the door for multi-omic approaches. The digital nature of DNA sequences renders them the easiest form of biological ‘omic’ information to store in databases and share between labs.
DNA sequence information is static by nature and is not directly informative about the biological mechanisms encoded within it. ‘Epigenomics’, the genome-wide picturing of DNA modifications or chromatin structure in 2D and 3D [9, 10, 11], is progressing rapidly, but does not yet cover the comprehensive set of DNA modifications or structural elements. In order to obtain a comprehensive view of cellular heterogeneity, biological problems are increasingly investigated at the single-cell level. For an excellent review of the state of the technology and the remaining challenges, the reader is referred to [12].
2. The transcriptome was the first ‘functional’ molecular layer of the cell that was accessible on the genomic scale, and remains the dynamic layer with the best coverage [13]. Indeed, the rise of transcriptomics led to a plethora of biological discoveries, and transcriptomics was the technology that opened the door for the first series of real multi-omic studies, in which comparisons between DNA sequence and mRNA expression facilitated the identification of structural elements in genome and transcriptome [14]. Transcriptomic analysis therefore remains more frequently employed than the more ‘downstream omics’ such as proteomics and metabolomics: for most biologists it is the first contact with an ‘omic’ technology, and its data are still more easily analysed and shared. More recently, transcriptomics has been enjoying a revival, as it is in many cases applicable to single cells [15, 16, 17, 18].
3. The proteome is the primary ‘functional’ layer of the cell, bridging gene expression to phenotype, and is therefore of massive complexity [19, 20]. While the sequence of a protein can be (largely) derived from genome and transcriptome, the function of a protein depends on its concentration, folding, turnover, post-translational modifications, cellular localisation, and its binding to other proteins and metabolites. As a consequence of this complexity, no technology covers the proteome comprehensively in all its diversity. While most proteins can now routinely be quantified in a low number of samples, post-translational modifications and dynamic structural changes still fall short of being exhaustively quantifiable [21, 22, 23, 24, 25, 26]. Further challenges for the era of data driven biology concern sample throughput, which in proteomics remains considerably lower than in genomics, transcriptomics or metabolomics, and quantitative precision on large sample series, which is low compared to the other omic technologies.
4. The metabolome is the first cellular layer that is not directly encoded in the genome, but is instead a product of the functional spectrum of the proteome, in contact with the environment of the cell [27]. Therefore, the metabolome constitutes a ‘phenotype’ of the cell. Although the metabolome is downstream, the genome, transcriptome and proteome nonetheless consist of components made by the metabolome. Furthermore, the central components of the metabolome are better conserved across all organisms than genome, transcriptome and proteome, and can also be recapitulated by non-enzymatic chemistry; hence the metabolome is believed to be evolutionarily the oldest part of the cell [28, 29]. The metabolome was recognized as a key player in very early clinical research, and a similarly central role in the molecular biosciences has recently begun to be re-instated [30, 31]. Due to its enormous chemical complexity, however, the metabolome cannot be captured comprehensively by a single technology. Nevertheless, efforts to reconstruct metabolic networks on the basis of the rules of biochemistry have been fruitful on the cellular scale. Genome-scale reconstructions of the metabolic network form the basis for the prediction of cellular phenotypes such as gene essentiality and growth rate [32, 33, 34]. Interactions between the metabolome and the proteome are considered key to the identification of so far overlooked cellular mechanisms. A key challenge in generating large-scale metabolomics data for the era of data driven biology is to choose between quantifying a low number of metabolites at high precision [35] and quantifying a large number of metabolites at lower precision [36]; these are complementary approaches that picture different sets of biological mechanisms.
5. The ionome reflects the total elemental composition of the cell [37, 38]. Unlike other omic layers, the ionome is not produced by the cell, but rather is a consequence of transport and diffusion processes and the incorporation of elements into biomass [39]. Therefore, the total and relative cellular ionome is sensitive to the genetics of an organism, but also to all physiological changes with a significant impact on any form of membrane transport, intra- and extra-cellular pH, redox potential, ionic strength, nutrient supply, metabolic activity, cell size, membrane composition and potential, and to changes in organellar biology. If significantly altered, the ionome is expected to have a massive impact on the function of any biological system, as it determines the cellular reaction environment and, with it, all simultaneously occurring chemical processes. As a result, the ionome represents the convergence of the physiological changes originating across genome, transcriptome, proteome and metabolome [40]. However, although the ionome can be precisely measured, biological interest in ionomic data has remained moderate because its interpretation has so far been challenging.
6. The phenome is the sum of all organismal phenotypes, representing the top layer of omic applications [41]. While each database or functional-genetic screen collecting phenotypic information is in essence a form of ‘phenomics’, several studies have attempted the systematic generation of ‘phenomes’. These include the detection of the colony size of bacteria or fungi [42], the movement of C. elegans in different environments [43], and non-invasive studies of growth and photosynthesis in plants [44], to name a few. The main challenge of phenomics lies in recording the vast range of possible traits that emerge from the combination of environmental and genomic cues, superimposed by cell-to-cell and temporal heterogeneity [41].
The key challenge: how to render descriptive data predictive of function
It is noteworthy that many of the most successful applications of multi-omic studies addressed the functionality of metabolism (for some key papers see [45, 46, 47, 48, 49, 50, 51]) or very basic cellular processes such as translation [52, 53, 54] or transcription factor binding [55, 56] via the integration of separate omics experiments.
This is perhaps less surprising in the sense that metabolic networks function in a cross-layer manner, and hence depend on multi-omics data in principle. However, it is also of note that the field of metabolomic research has provided many research tools to facilitate the integration of multi-omic technologies. Moreover, the huge effort invested in reconstructing the topological organisation of the metabolic network on the genomic scale is starting to pay off [57, 58, 59]. Nonetheless, despite these successes, neither the metabolome nor most downstream phenotypes are so far predictable from multi-omic data [45, 50].
Why is the combination of different layers of biological data so complicated? Although none of proteome, metabolome, ionome and phenome can be fully captured at this time, the partial coverage that is already possible has resulted in a staggering degree of multidimensionality that is difficult to capture with the classically applied methods of biological data analysis [60]. New avenues are, however, opened by network science, which has historically contributed to the study of omics data by shedding light on the topology and organisation of biological networks such as metabolic-reaction networks, protein-protein interaction networks and genetic regulatory networks [61, 62]. Recently, a mathematical framework has emerged in network science which appears promising for the task of integrating multi-omic data: multi-layer networks [63, 64, 65]. Multi-layer networks are capable of describing systems in which interactions of different nature are involved. In its most general formulation, a multi-layer network is a network formed by several layers, each of which is a standard network. Each of these layers describes interactions/relations of a specific kind between the nodes of that layer.
The first class of multi-layer networks is that of multiplex networks [63]. A multiplex network comprises a common set of nodes, but the pattern of connections between them is layer-specific (Figure 2B). Because the set of nodes is shared between the layers, these network structures are suitable for the analysis of multi-omics data featuring either a single class of biomolecules or different classes of biomolecules for which a one-to-one correspondence between the classes can be established. For instance, the multiplex network framework was successfully applied to analyse multi-omic datasets from different types of tissues (gastric, lung, pancreas, colorectal) under normal and cancer conditions by considering co-expression data, protein-protein physical interactions, transcription factor co-targeting relations, and microRNA co-targeting relations. A consensus clustering algorithm was used to reveal multi-layer communities and to identify the candidate driver genes for the different types of cancers through an enrichment analysis of the communities [66].
In the second class – interconnected networks – nodes in different layers represent distinct types of objects, which can vary in number, and interactions between nodes from different layers can be described by inter-layer connections (Figure 2C). Interconnected networks can be used to model multi-omic data that involve biomolecules of different classes (e.g. proteomics and metabolomics data), and interactions/relations between biomolecules that belong to the same class (e.g. physical interactions between proteins or metabolic reactions between metabolites) or to different classes (e.g. proteins catalysing reactions between metabolites). For example, trans-omic networks are global biochemical networks that result from integrating measurements across the multiple omic layers of the genome, transcriptome, proteome and metabolome [65].
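The distinction between the two classes can be made concrete with a minimal data-structure sketch. It is only an illustration under stated assumptions: the class names, method names, layer names and node labels (`geneA`, `enzyme1`, `metX`, and so on) are hypothetical and are not taken from any of the cited studies or from an existing library.

```python
from collections import defaultdict


class MultiplexNetwork:
    """Shared node set; every layer carries its own pattern of edges."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.layers = defaultdict(set)  # layer name -> set of undirected edges

    def add_edge(self, layer, u, v):
        if u not in self.nodes or v not in self.nodes:
            raise ValueError("both endpoints must belong to the shared node set")
        self.layers[layer].add(frozenset((u, v)))

    def layers_linking(self, u, v):
        """Names of all layers in which u and v are connected."""
        edge = frozenset((u, v))
        return sorted(name for name, edges in self.layers.items() if edge in edges)


class InterconnectedNetwork:
    """Layer-specific node sets, with intra-layer and inter-layer edges."""

    def __init__(self):
        self.nodes = defaultdict(set)   # layer name -> nodes of that layer
        self.intra = defaultdict(set)   # layer name -> edges within the layer
        self.inter = set()              # edges between (layer, node) pairs

    def add_intra_edge(self, layer, u, v):
        self.nodes[layer].update((u, v))
        self.intra[layer].add(frozenset((u, v)))

    def add_inter_edge(self, layer_u, u, layer_v, v):
        if layer_u == layer_v:
            raise ValueError("inter-layer edges must connect two different layers")
        self.nodes[layer_u].add(u)
        self.nodes[layer_v].add(v)
        self.inter.add(frozenset(((layer_u, u), (layer_v, v))))

    def cross_layer_partners(self, layer, node):
        """Nodes in other layers that are directly linked to (layer, node)."""
        key = (layer, node)
        partners = []
        for edge in self.inter:
            if key in edge:
                (other,) = edge - {key}
                partners.append(other)
        return sorted(partners)


# Multiplex example: the same (hypothetical) genes, connected differently per layer.
multiplex = MultiplexNetwork(["geneA", "geneB", "geneC"])
multiplex.add_edge("coexpression", "geneA", "geneB")
multiplex.add_edge("protein_interaction", "geneA", "geneB")
multiplex.add_edge("coexpression", "geneB", "geneC")
shared_layers = multiplex.layers_linking("geneA", "geneB")

# Interconnected example: proteins and metabolites live in separate layers.
trans_omic = InterconnectedNetwork()
trans_omic.add_intra_edge("metabolome", "metX", "metY")  # a metabolic reaction
trans_omic.add_inter_edge("proteome", "enzyme1", "metabolome", "metX")
trans_omic.add_inter_edge("proteome", "enzyme1", "metabolome", "metY")
enzyme_targets = trans_omic.cross_layer_partners("proteome", "enzyme1")
```

In the multiplex case a pair of nodes can be queried across layers because the node set is shared, whereas in the interconnected case the inter-layer edges are the only route between biomolecules of different classes.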
Multi-layer networks as basis for the biological application of artificial intelligence
A multi-layer network representation containing all the functional interactions within and across all the omic layers of the cell for a given organism would fascinatingly and comprehensively describe our knowledge about that system. However, at the moment, we lack such a description because of the incompleteness of the data collected in the different omic fields and because of the limitations in the accuracy of the recorded and available data itself. This makes it necessary to incorporate other mechanistic models and/or data-driven machine learning algorithms into the multi-layer network models of the cell, in order to fill the gap in our empirical knowledge of how information flows across the omic layers [67, 68, 69].
Artificial intelligence (AI) based machine learning approaches allow complex functional relationships to be learned from data in an unbiased fashion without the need for a priori assumptions. The principle of a machine learning model is to train on one biological dataset and then use the detected patterns to predict another. AI has been a great success in computer science and, with regard to biological approaches, is especially appealing for building predictive models on the basis of biological networks when the underlying molecular mechanisms are unknown. Already, the multi-omics field has benefited greatly from machine learning, including for the analysis of genomics, proteomics and metabolomics datasets [70, 71, 72]. Recent applications range from clinical predictions in cancer therapy to personalised dietary interventions based on the prediction of postprandial glucose responses from DNA sequences obtained from stool microbiota [73].
Machine learning for typical biological applications can be divided into two major categories, namely supervised learning and unsupervised learning. In supervised learning, the goal of the algorithm is to learn a function that maps the set of features x_1, ..., x_n present in the training dataset to a response y. These features can represent any molecular signals, such as DNA sequence, the expression of genes, proteins or metabolites, or a set of pixels in imaging data. The response y can be anything of interest, from a disease class to the levels of transcripts to be predicted from DNA features. Conversely, in unsupervised learning, the aim is to infer a function that describes hidden structure in “unlabeled” examples, e.g. to identify common molecular patterns that form a cluster of samples, as typically used in “guilt by association” analyses for function prediction [35, 74].
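The contrast between the two categories can be sketched in a few lines. This is a toy illustration under stated assumptions, not a method from the cited work: the measurements and the labels "control" and "treated" are invented, a nearest-centroid classifier stands in for supervised learning, and a bare-bones k-means (with initial centroids chosen by hand through the hypothetical `init_rows` argument) stands in for unsupervised learning.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def column_means(rows):
    """Mean of every feature (column) across a list of samples (rows)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Supervised: the labels y decide which samples form each centroid."""
    return {label: column_means([row for row, lab in zip(X, y) if lab == label])
            for label in set(y)}


def predict(centroids, sample):
    """Assign a new sample to the label of the closest centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


def kmeans(X, init_rows, n_iter=10):
    """Unsupervised: clusters are found from the features alone (no labels)."""
    centroids = [list(X[i]) for i in init_rows]
    assignment = []
    for _ in range(n_iter):
        assignment = [min(range(len(centroids)),
                          key=lambda c: sq_dist(centroids[c], row))
                      for row in X]
        for c in range(len(centroids)):
            members = [row for row, a in zip(X, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = column_means(members)
    return assignment


# Invented toy data: six samples, two molecular features each.
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]]
y = ["control"] * 3 + ["treated"] * 3

# Supervised: learn from labelled samples, then predict an unseen sample.
centroids = fit_nearest_centroid(X, y)
predicted_label = predict(centroids, [4.9, 5.0])

# Unsupervised: group the same samples without ever seeing y.
clusters = kmeans(X, init_rows=(0, 5))
```

The supervised model returns a label from the training vocabulary, while the unsupervised one returns only anonymous cluster indices whose biological meaning has to be assigned afterwards.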
The challenges associated with integrative analysis of multi-omics datasets arise from the inherent heterogeneity of the data. Any unsupervised learning technique is ultimately based on the study of variation between the samples. However, different types of datasets often have different numbers of features. In addition, the typical degree of feature variation depends strongly on the nature of the data, whilst conventional unsupervised methods, such as PCA for dimensionality reduction or K-means for clustering, are insensitive to features with low inter-sample variation. These methods thus cannot be directly employed for comprehensive analysis of concatenated datasets, in which multiple sets of features of different types are matched to the same samples. To address this problem, alternative statistical approaches are being developed to deal specifically with multi-omics data [75]. For example, the contribution of each feature set can be weighted using multiple factor analysis [76], or the features from different sets can be modelled using a common set of Gaussian latent variables [77].
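Why naive concatenation fails, and how block weighting helps, can be illustrated with a small sketch. It rests on several assumptions: the two blocks and their values are invented, and the weighting used here (z-scoring each feature, then dividing each block by the square root of its number of features) is a deliberately simple stand-in that follows the same idea of balancing feature sets; it is not the weighting scheme of multiple factor analysis [76] itself.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Centre every feature (column) and scale it to unit variance."""
    scaled_cols = []
    for col in zip(*block):
        mu, sd = mean(col), pstdev(col)
        scaled_cols.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def total_variance(block):
    """Sum of the per-feature variances of a samples x features block."""
    return sum(pstdev(col) ** 2 for col in zip(*block))


def weight_block(block):
    """z-score the features, then divide by sqrt(number of features)
    so that the whole block carries a total variance of one."""
    z = zscore_columns(block)
    w = 1.0 / len(z[0]) ** 0.5
    return [[w * v for v in row] for row in z]


def concatenate(*blocks):
    """Join blocks measured on the same samples, row by row."""
    return [sum((list(row) for row in rows), []) for rows in zip(*blocks)]


# Invented example: four samples, measured in two omic blocks on very
# different numerical scales.
transcripts = [[120.0, 300.0, 50.0],
               [180.0, 260.0, 75.0],
               [90.0, 310.0, 40.0],
               [200.0, 240.0, 90.0]]
metabolites = [[0.10, 1.5],
               [0.12, 1.1],
               [0.09, 1.6],
               [0.15, 1.0]]

# Share of the total variance contributed by the transcript block.
raw_share = total_variance(transcripts) / total_variance(
    concatenate(transcripts, metabolites))

weighted_blocks = [weight_block(transcripts), weight_block(metabolites)]
weighted_share = total_variance(weighted_blocks[0]) / total_variance(
    concatenate(*weighted_blocks))
```

Before weighting, the transcript block accounts for almost all of the variance in the concatenated matrix, so a variance-driven method such as PCA would effectively ignore the metabolites; after weighting, both blocks contribute equally.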
Conventional supervised learning algorithms, e.g. linear regression, logistic regression, support vector machines (SVM) and decision trees, require a set of manually engineered features that represent one input layer and allow for the prediction of an output layer. Such architectures are typically called “shallow” and have been shown to be limited in their applications even when large datasets are available [78]. In contrast, “deep” architectures, or “deep learning” [79], are characterised by multiple hidden layers between the input and the output layer. In each layer, the information is passed on to each unit as a weighted sum of units from previous layers, followed by some (usually nonlinear) transformation, in order to obtain a new representation of the input [80]. Such architectures benefit from very large datasets, in which they find hidden structure and learn complex feature representations that can be used to make accurate predictions from biological data. Apart from accurate predictions, these “self-learned” feature representations have the potential to uncover complex molecular interactions that would have been missed in a conventional hypothesis-driven paradigm. A recent review provides more specific examples of the applications of deep learning in biology [81].
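The layer-by-layer computation described above (a weighted sum of the previous layer's units followed by a nonlinear transformation) can be written down as a minimal forward pass. This is a sketch under stated assumptions: the weights, biases and input values are arbitrary numbers chosen for illustration, the function names are hypothetical, and training (i.e. how the weights would be learned from data) is omitted entirely.

```python
def relu(z):
    """A common nonlinear transformation: negative inputs are set to zero."""
    return max(0.0, z)


def identity(z):
    """No transformation; used here for the output unit."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Each unit: activation(weighted sum of the previous layer + bias)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the input through every layer in turn."""
    representation = inputs
    for weights, biases, activation in layers:
        representation = dense_layer(representation, weights, biases, activation)
    return representation


# A tiny network: 3 input features -> 2 hidden units -> 1 output.
# All weights and biases are arbitrary illustrative values, not learned ones.
network = [
    ([[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]

features = [1.0, 2.0, 3.0]          # e.g. three measured molecular signals
hidden = dense_layer(features, *network[0])
output = forward(features, network)
```

The intermediate list `hidden` is the new representation of the input that the text refers to; a deep architecture simply stacks many such layers between input and output.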
One of the key remaining challenges is the interpretation of machine learning models. For instance, in biology, it is naive to aim for the identification of a “key regulator” protein when, in reality, all processes are multifactorial. Similarly, in machine learning, predictive features are multidimensional and usually stand in complex relationships with each other. As a result, one needs to be more careful when interpreting the resulting models. To facilitate the interpretation of machine learning models that achieve a prediction, and hence to use them to identify biological mechanisms, one can apply them to distinct biological networks. For instance, using metabolic networks one can integrate data to link gene expression to metabolism [50, 82, 83], or one can apply them to phosphoproteomics data to predict metabolite concentrations from kinase activity profiles [84]. In a similar fashion, one can abstract data to the known processes that are predictive of the outcome of interest. Applications of deep learning techniques are currently mostly limited to sequence and image analysis, and will reach their full potential when omics technologies increase throughput significantly. For instance, with the development of single-cell technologies for other omics, a single experiment will interrogate tens of thousands of individual cells, opening new horizons in quantitative biology. A key problem here is quantitative precision: artificial intelligence will only successfully detect biological patterns from features whose signal-to-noise ratio is sufficiently high. While genomics, metabolomics and ionomics achieve high precision on large datasets, single-cell mRNA sequencing and quantitative proteomics in particular need to improve in this respect. Once these problems are solved, the use of artificial intelligence in data driven systems biology through the integration of multi-omics will be a key next step in solving the genotype-phenotype problem.
Conclusions
The remarkable advances in imaging, sequencing, and mass spectrometry technologies have made large scale omics datasets increasingly available. Despite the huge amount of accessible data generated by single-omic experiments, the identification of novel biological mechanisms by combining them has not yet reached expectations. This is partially caused by the intrinsic difficulty of combining highly heterogeneous data, and by the fact that the supposedly ‘large’ datasets are still often small compared to what would be needed to work effectively using unsupervised learning approaches, and often fall short of the required precision. In order to achieve a high information content in multi-omic data, as a community we need to learn to adhere to standards, to conduct experiments in a highly systematic fashion, and to anticipate the integration of sophisticated mathematical approaches at the stage of experimental planning (Box 1). The analysis of multi-omic data via multi-layer networks and machine learning is highly promising. Biology is indeed in the process of becoming more closely attached to data science. Multi-omic science has the potential to transform our understanding of biological systems and to enable an excitingly fresh view of how a biological system functions.
Box 1. A beginner's guide to the design of a multi-omic experiment.
The rapid development of omics technologies has meant that it is now financially and technically feasible for many research groups to perform not only single omics experiments, but to produce multiple omics datasets. We summarize here some key elements to consider when attempting to conduct a multi-omic experiment for the first time.
1. Beware! The scientific part of multi-omic work starts, not ends, when the biological data is recorded. Whilst seemingly obvious, we feel that this is an important point to make given the number of multi-omics datasets published without comprehensive follow-up data analyses. Furthermore, please be aware that it has somehow become typical in single omic analyses to set arbitrary thresholds to classify what is up or downregulated and to perform enrichment analysis, whilst sample comparison is often based on correlation coefficients, similarity heatmaps, principal component analysis or even Venn diagrams of up or downregulated genes. There is nothing wrong with these analyses per se, but they will not suffice to interpret multi-omic datasets, owing to their heterogeneous nature. So book onto an R course, at least.
2. Be prepared to spend a long, long time with the data to understand it. On a related point, the power of multi-omics to generate mechanistic models comes from the integration of the different omics layers. The value of the network that can be constructed from multi-omics datasets is greater than that of the sum of the single omics layers analysed separately, as these networks often recapitulate the topology of the real biological network and hence can be used to construct mechanistic and predictive models of biological phenomena. This means, however, that one enters uncharted territory and typically cannot ‘outsource’ the interpretation of a multi-omic dataset to someone else, e.g. a bioinformatics facility.
3. Cost benefit analysis: one omics done well can be much better than many done badly. Whilst the availability of multiple omics platforms makes multi-omics an appealing option, resources are still limited and must be utilised most effectively. Omics studies are typified by a high number of molecular features for a very small number of samples (p >> n), and this creates challenges when machine learning techniques are applied. Furthermore, omics data are often noisy and burdened by batch effects, both of which become easier to deal with as the sample size is increased and replicate measurements are added. Thus, it is not always a bad idea to sacrifice the number of omics layers or molecular features measured for the sake of sample size and many replicates. Finally, please think about the precision of your method; this is crucial.
4. Make use of publicly available datasets, and by doing so, experience the beauty of reporting and data standards. Following the explosive development of omics platforms there is now a rapidly increasing amount of biological data available, much of which has not been fully analysed (or analysed at all). This is unlikely to change in the immediate future, and so we discourage conducting a multi-omics experiment in the hope that someone else will perform comprehensive computational analyses at a later point (which might never happen). Conversely, there are also challenges in using available datasets, particularly when integrating different datasets, as this requires highly specialised models in order to deal with batch effects and differences in experimental protocols. Another good reason for re-using others' datasets before creating your own is that it is quite instructive about how important it is to adhere to community standards concerning data types and reporting guidelines.
5. Good publication practice: Resist the temptation to overuse the power of the example, even if (at present) you (might still) convince Reviewers and Editors. The majority of multi-omic papers start with the generation of a multi-layer dataset, but then go on to pick out just a single example and finish the study as if it were a very classic molecular biology paper which was explicitly targeted towards this example from the beginning. In the end, such a strategy renders the beauty and the power of multi-layer biology obsolete. Moreover, as multi-omic studies become more widespread, it is becoming clear that such ‘nice stories’ often turn out to be nothing more than statistically insignificant correlations between two co-occurring phenomena in huge datasets. With more statistical knowledge available across the biological disciplines, the scientific alertness to distinguish causality from correlation is increasing rapidly, and the tide is turning. Multiple fields in other branches of science have already benefited from the application of data science techniques, such as machine learning, to generate predictive models of complex natural phenomena, as will be the case in the biosciences in the near future. In other words, multifactorial relationships are becoming increasingly accepted as scientific results, at the expense of a lower acceptance of scientific oversimplifications.
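The p >> n problem from point 3 and the chance correlations from point 5 can be illustrated with a small simulation. It is a sketch under stated assumptions: all values are pure random noise drawn from a standard normal distribution, the sample and feature counts are arbitrary, and the function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def best_chance_correlation(n_samples, n_features, rng):
    """Strongest |correlation| between a random 'phenotype' and any of
    n_features random 'molecular features'. Nothing here is biologically
    related: every value is independent noise."""
    phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, phenotype)))
    return best


# Same seed in both runs, so the phenotype and the first 10 features are
# identical; the second run merely screens many more noise features.
few_features = best_chance_correlation(n_samples=6, n_features=10,
                                       rng=random.Random(0))
many_features = best_chance_correlation(n_samples=6, n_features=5000,
                                        rng=random.Random(0))
```

With only a handful of samples, screening thousands of features is expected to turn up near-perfect correlations by chance alone, which is why a single striking example picked from a large dataset needs to be backed by appropriate multiple-testing correction or independent validation.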
Acknowledgements
We kindly thank our lab members Enrica Calvani, Clara Correia-Melo, Vadim Demichev, Joanna Segal, Gerbren Spoelstra, Julia Stanger, Christoph Messner and Michael Mülleder for contributions in writing this article. Work in the Ralser lab is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001134), the UK Medical Research Council (FC001134), and the Wellcome Trust (FC001134). A.Z. is a SciLifeLab fellow. ST is a Boehringer Ingelheim Fonds PhD fellow.
This review comes from a themed issue on Systems biology of model organisms (2017)
Edited by Jens Nielsen and Kiran Raosaheb Patil
References
- 1.Kurakin A. Scale-free flow of life: on the biology, economics, and physics of the cell. Theor Biol Med Model. 2009;6:6. doi: 10.1186/1742-4682-6-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Bensimon A., Heck A.J.R., Aebersold R. Mass spectrometry-based proteomics and network biology. Annu Rev Biochem. 2012;81:379–405. doi: 10.1146/annurev-biochem-072909-100424. [DOI] [PubMed] [Google Scholar]
- 3.Gutteridge A., Pir P., Castrillo J.I., Charles P.D., Lilley K.S., Oliver S.G. Nutrient control of eukaryote cell growth: a systems biology study in yeast. BMC Biol. 2010;8:68. doi: 10.1186/1741-7007-8-68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Crick F. Central dogma of molecular biology. Nature. 1970;227:561–563. doi: 10.1038/227561a0. [DOI] [PubMed] [Google Scholar]
- 5.Goodwin S., McPherson J.D., McCombie W.R. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17:333–351. doi: 10.1038/nrg.2016.49. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Adams D.J., Doran A.G., Lilue J., Keane T.M. The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes. Mamm Genome. 2015;26:403–412. doi: 10.1007/s00335-015-9579-6. [DOI] [PubMed] [Google Scholar]
- 7.International Human Genome Sequencing Consortium Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–945. doi: 10.1038/nature03001. [DOI] [PubMed] [Google Scholar]
- 8.Levy J. Sequencing the yeast genome: an international achievement. Yeast. 1994;10:1689–1706. doi: 10.1002/yea.320101304. [DOI] [PubMed] [Google Scholar]
- 9.Friedman N., Rando O.J. Epigenomics and the structure of the living genome. Genome Res. 2015;25:1482–1490. doi: 10.1101/gr.190165.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ntziachristos P., Abdel-Wahab O., Aifantis I. Emerging concepts of epigenetic dysregulation in hematological malignancies. Nat Immunol. 2016;17:1016–1024. doi: 10.1038/ni.3517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Schones D.E., Zhao K. Genome-wide approaches to studying chromatin modifications. Nat Rev Genet. 2008;9:179–191. doi: 10.1038/nrg2270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Gawad C., Koh W., Quake S.R. Single-cell genome sequencing: current state of the science. Nat Rev Genet. 2016;17:175–188. doi: 10.1038/nrg.2015.16. [DOI] [PubMed] [Google Scholar]
- 13.Sultan M., Schulz M.H., Richard H., Magen A., Klingenhoff A., Scherf M., Seifert M., Borodina T., Soldatov A., Parkhomchuk D. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008;321:956–960. doi: 10.1126/science.1160342. [DOI] [PubMed] [Google Scholar]
- 14. Spies D., Ciaudo C. Dynamics in transcriptomics: advancements in RNA-seq time course and downstream analysis. Comput Struct Biotechnol J. 2015;13:469–477. doi: 10.1016/j.csbj.2015.08.004.
- 15. Kolisko M., Boscaro V., Burki F., Lynn D.H., Keeling P.J. Single-cell transcriptomics for microbial eukaryotes. Curr Biol. 2014;24:R1081–R1082. doi: 10.1016/j.cub.2014.10.026.
- 16. Vallejos C.A., Risso D., Scialdone A., Dudoit S., Marioni J.C. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat Methods. 2017;14:565–571. doi: 10.1038/nmeth.4292.
- 17. Efroni I., Birnbaum K.D. The potential of single-cell profiling in plants. Genome Biol. 2016;17:65. doi: 10.1186/s13059-016-0931-2.
- 18. Kanter I., Kalisky T. Single cell transcriptomics: methods and applications. Front Oncol. 2015;5:53. doi: 10.3389/fonc.2015.00053.
- 19. Harper J.W., Bennett E.J. Proteome complexity and the forces that drive proteome imbalance. Nature. 2016;537:328–338. doi: 10.1038/nature19947.
- 20. Calvo S.E., Mootha V.K. The mitochondrial proteome and human disease. Annu Rev Genomics Hum Genet. 2010;11:25–44. doi: 10.1146/annurev-genom-082509-141720.
- 21. Chuh K.N., Batt A.R., Pratt M.R. Chemical methods for encoding and decoding of posttranslational modifications. Cell Chem Biol. 2016;23:86–107. doi: 10.1016/j.chembiol.2015.11.006.
- 22. Chuh K.N., Pratt M.R. Chemical methods for the proteome-wide identification of posttranslationally modified proteins. Curr Opin Chem Biol. 2015;24:27–37. doi: 10.1016/j.cbpa.2014.10.020.
- 23. Wiśniewski J.R. Mass spectrometry-based proteomics: principles, perspectives, and challenges. Arch Pathol Lab Med. 2008;132:1566–1569. doi: 10.5858/2008-132-1566-MSPPPA.
- 24. Schubert O.T., Röst H.L., Collins B.C., Rosenberger G., Aebersold R. Quantitative proteomics: challenges and opportunities in basic and applied research. Nat Protoc. 2017;12:1289–1294. doi: 10.1038/nprot.2017.040.
- 25. Uhrig R.G., Moorhead G.B. Plant proteomics: current status and future prospects. J Proteomics. 2013;88:34–36. doi: 10.1016/j.jprot.2013.01.018.
- 26. Feng Y., De Franceschi G., Kahraman A., Soste M., Melnik A., Boersema P.J., de Laureto P.P., Nikolaev Y., Oliveira A.P., Picotti P. Global analysis of protein structural changes in complex proteomes. Nat Biotechnol. 2014;32:1036–1044. doi: 10.1038/nbt.2999.
- 27. Zamboni N., Saghatelian A., Patti G.J. Defining the metabolome: size, flux, and regulation. Mol Cell. 2015;58:699–706. doi: 10.1016/j.molcel.2015.04.021.
- 28. Keller M.A., Kampjut D., Harrison S.A., Ralser M. Sulfate radicals enable a non-enzymatic Krebs cycle precursor. Nat Ecol Evol. 2017;1:0083. doi: 10.1038/s41559-017-0083.
- 29. Messner C.B., Driscoll P.C., Piedrafita G., De Volder M.F.L., Ralser M. Nonenzymatic gluconeogenesis-like formation of fructose 1,6-bisphosphate in ice. Proc Natl Acad Sci U S A. 2017;114:7403–7407. doi: 10.1073/pnas.1702274114.
- 30. Puchades-Carrasco L., Pineda-Lucena A. Metabolomics in pharmaceutical research and development. Curr Opin Biotechnol. 2015;35:73–77. doi: 10.1016/j.copbio.2015.04.004.
- 31. Bujak R., Struck-Lewicka W., Markuszewski M.J., Kaliszan R. Metabolomics for laboratory diagnostics. J Pharm Biomed Anal. 2015;113:108–120. doi: 10.1016/j.jpba.2014.12.017.
- 32. Wang Z., Danziger S.A., Heavner B.D., Ma S., Smith J.J., Li S., Herricks T., Simeonidis E., Baliga N.S., Aitchison J.D. Combining inferred regulatory and reconstructed metabolic networks enhances phenotype prediction in yeast. PLoS Comput Biol. 2017;13. doi: 10.1371/journal.pcbi.1005489.
- 33. Nikoloski Z., Perez-Storey R., Sweetlove L.J. Inference and prediction of metabolic network fluxes. Plant Physiol. 2015;169:1443–1455. doi: 10.1104/pp.15.01082.
- 34. Lewis N.E., Nagarajan H., Palsson B.O. Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat Rev Microbiol. 2012;10:291–305. doi: 10.1038/nrmicro2737.
- 35. Mülleder M., Calvani E., Alam M.T., Wang R.K., Eckerstorfer F., Zelezniak A., Ralser M. Functional metabolomics describes the yeast biosynthetic regulome. Cell. 2016;167:553–565.e12. doi: 10.1016/j.cell.2016.09.007.
- 36. Fuhrer T., Zampieri M., Sévin D.C., Sauer U., Zamboni N. Genomewide landscape of gene-metabolome associations in Escherichia coli. Mol Syst Biol. 2017;13:907. doi: 10.15252/msb.20167150.
- 37. Baxter I. Ionomics: studying the social network of mineral nutrients. Curr Opin Plant Biol. 2009;12:381–386. doi: 10.1016/j.pbi.2009.05.002.
- 38. Salt D.E., Baxter I., Lahner B. Ionomics and the study of the plant ionome. Annu Rev Plant Biol. 2008;59:709–733. doi: 10.1146/annurev.arplant.59.032607.092942.
- 39. Huang X.-Y., Salt D.E. Plant ionomics: from elemental profiling to environmental adaptation. Mol Plant. 2016;9:787–797. doi: 10.1016/j.molp.2016.05.003.
- 40. Yu D., Danku J.M.C., Baxter I., Kim S., Vatamaniuk O.K., Vitek O., Ouzzani M., Salt D.E. High-resolution genome-wide scan of genes, gene-networks and cellular systems impacting the yeast ionome. BMC Genomics. 2012;13:623. doi: 10.1186/1471-2164-13-623.
- 41. Houle D., Govindaraju D.R., Omholt S. Phenomics: the next challenge. Nat Rev Genet. 2010;11:855–866. doi: 10.1038/nrg2897.
- 42. Zackrisson M., Hallin J., Ottosson L.-G., Dahl P., Fernandez-Parada E., Ländström E., Fernandez-Ricaud L., Kaferle P., Skyman A., Stenberg S. Scan-o-matic: high-resolution microbial phenomics at a massive scale. G3. 2016;6:3003–3014. doi: 10.1534/g3.116.032342.
- 43. Yemini E., Jucikas T., Grundy L.J., Brown A.E.X., Schafer W.R. A database of Caenorhabditis elegans behavioral phenotypes. Nat Methods. 2013;10:877–879. doi: 10.1038/nmeth.2560.
- 44. Flood P.J., Kruijer W., Schnabel S.K., van der Schoor R., Jalink H., Snel J.F.H., Harbinson J., Aarts M.G.M. Phenomics for photosynthesis, growth and reflectance in Arabidopsis thaliana reveals circadian and long-term fluctuations in heritability. Plant Methods. 2016;12:14. doi: 10.1186/s13007-016-0113-y. An automated high-throughput phenotyping platform was applied to screen 1440 Arabidopsis plants multiple times per day, bringing a new level of systematic screening to plant phenomics.
- 45. Hackett S.R., Zanotelli V.R.T., Xu W., Goya J., Park J.O., Perlman D.H., Gibney P.A., Botstein D., Storey J.D., Rabinowitz J.D. Systems-level analysis of mechanisms regulating yeast metabolic flux. Science. 2016;354. doi: 10.1126/science.aaf2786. By incorporating multiple omics layers, this study elaborates how the metabolome and gene expression relate to metabolic flux in yeast.
- 46. Alam M.T., Zelezniak A., Mülleder M., Shliaha P., Schwarz R., Capuano F., Vowinckel J., Radmanesfahar E., Krüger A., Calvani E. The metabolic background is a global player in Saccharomyces gene expression epistasis. Nat Microbiol. 2016;1:15030. doi: 10.1038/nmicrobiol.2015.30.
- 47. Oliveira A.P., Ludwig C., Zampieri M., Weisser H., Aebersold R., Sauer U. Dynamic phosphoproteomics reveals TORC1-dependent regulation of yeast nucleotide and amino acid biosynthesis. Sci Signal. 2015;8:rs4. doi: 10.1126/scisignal.2005768. Time-resolved phosphoproteomics is used to detect the coordination of metabolic signaling.
- 48. Castrillo J.I., Zeef L.A., Hoyle D.C., Zhang N., Hayes A., Gardner D.C., Cornell M.J., Petty J., Hakes L., Wardleworth L. Growth control of the eukaryote cell: a systems biology study in yeast. J Biol. 2007;6:4. doi: 10.1186/jbiol54.
- 49. Stefely J.A., Kwiecien N.W., Freiberger E.C., Richards A.L., Jochem A., Rush M.J.P., Ulbrich A., Robinson K.P., Hutchins P.D., Veling M.T. Mitochondrial protein functions elucidated by multi-omic mass spectrometry profiling. Nat Biotechnol. 2016;34:1191–1197. doi: 10.1038/nbt.3683. Combining more than 3000 mass spectrometry measurements that reveal different omic layers, this study addresses mitochondrial protein function systematically.
- 50. Zelezniak A., Sheridan S., Patil K.R. Contribution of network connectivity in determining the relationship between gene expression and metabolite concentration changes. PLoS Comput Biol. 2014;10. doi: 10.1371/journal.pcbi.1003572.
- 51. Buescher J.M., Liebermeister W., Jules M., Uhr M., Muntel J., Botella E., Hessling B., Kleijn R.J., Le Chat L., Lecointe F. Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science. 2012;335:1099–1103. doi: 10.1126/science.1206871.
- 52. Zur H., Tuller T. Predictive biophysical modeling and understanding of the dynamics of mRNA translation and its evolution. Nucleic Acids Res. 2016;44:9031–9049. doi: 10.1093/nar/gkw764.
- 53. Chu D., Zabet N., von der Haar T. A novel and versatile computational tool to model translation. Bioinformatics. 2012;28:292–293. doi: 10.1093/bioinformatics/btr650.
- 54. Romano M.C., Thiel M., Stansfield I., Grebogi C. Queueing phase transition: theory of translation. Phys Rev Lett. 2009;102:198104. doi: 10.1103/PhysRevLett.102.198104.
- 55. Simicevic J., Schmid A.W., Gilardoni P.A., Zoller B., Raghav S.K., Krier I., Gubelmann C., Lisacek F., Naef F., Moniatte M. Absolute quantification of transcription factors during cellular differentiation using multiplexed targeted proteomics. Nat Methods. 2013;10:570–576. doi: 10.1038/nmeth.2441.
- 56. Zabet N.R., Adryan B. Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res. 2015;43:84–94. doi: 10.1093/nar/gku1269.
- 57. Jeong H., Tombor B., Albert R., Oltvai Z.N., Barabási A.L. The large-scale organization of metabolic networks. Nature. 2000;407:651–654. doi: 10.1038/35036627.
- 58. Thiele I., Swainston N., Fleming R.M.T., Hoppe A., Sahoo S., Aurich M.K., Haraldsdottir H., Mo M.L., Rolfsson O., Stobbe M.D. A community-driven global reconstruction of human metabolism. Nat Biotechnol. 2013;31:419–425. doi: 10.1038/nbt.2488.
- 59. Förster J., Famili I., Fu P., Palsson B.Ø., Nielsen J. Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res. 2003;13:244–253. doi: 10.1101/gr.234503.
- 60. Merico D., Gfeller D., Bader G.D. How to visually interpret biological data using networks. Nat Biotechnol. 2009;27:921–924. doi: 10.1038/nbt.1567.
- 61. Barabási A.-L., Oltvai Z.N. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5:101–113. doi: 10.1038/nrg1272.
- 62. Alon U. An introduction to systems biology: design principles of biological circuits. Chapman & Hall/CRC; 2014.
- 63. Kivelä M., Arenas A., Barthelemy M., Gleeson J.P., Moreno Y., Porter M.A. Multilayer networks. J Complex Netw. 2014;2:203–271.
- 64. Boccaletti S., Bianconi G., Criado R., del Genio C.I., Gómez-Gardeñes J., Romance M., Sendiña-Nadal I., Wang Z., Zanin M. The structure and dynamics of multilayer networks. Phys Rep. 2014;544:1–122. doi: 10.1016/j.physrep.2014.07.001.
- 65. Yugi K., Kubota H., Hatano A., Kuroda S. Trans-omics: how to reconstruct biochemical networks across multiple “omic” layers. Trends Biotechnol. 2016;34:276–290. doi: 10.1016/j.tibtech.2015.12.013.
- 66. Cantini L., Medico E., Fortunato S., Caselle M. Detection of gene communities in multi-networks reveals cancer drivers. Sci Rep. 2015;5:17386. doi: 10.1038/srep17386.
- 67. Yizhak K., Benyamini T., Liebermeister W., Ruppin E., Shlomi T. Integrating quantitative proteomics and metabolomics with a genome-scale metabolic network model. Bioinformatics. 2010;26:i255–i260. doi: 10.1093/bioinformatics/btq183.
- 68. Aggarwal K., Lee K.H. Functional genomics and proteomics as a foundation for systems biology. Brief Funct Genomic Proteomic. 2003;2:175–184. doi: 10.1093/bfgp/2.3.175.
- 69. Vert J.-P. Reconstruction of biological networks by supervised machine learning approaches. In: Elements of computational systems biology. John Wiley & Sons, Inc.; 2010. pp. 163–188.
- 70. Libbrecht M.W., Noble W.S. Machine learning applications in genetics and genomics. Nat Rev Genet. 2015;16:321–332. doi: 10.1038/nrg3920.
- 71. Tyanova S., Temu T., Sinitcyn P., Carlson A., Hein M.Y., Geiger T., Mann M., Cox J. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods. 2016;13:731–740. doi: 10.1038/nmeth.3901.
- 72. Acharjee A., Ament Z., West J.A., Stanley E., Griffin J.L. Integration of metabolomics, lipidomics and clinical data using a machine learning method. BMC Bioinforma. 2016;17:440. doi: 10.1186/s12859-016-1292-2.
- 73. Zeevi D., Korem T., Zmora N., Israeli D., Rothschild D., Weinberger A., Ben-Yacov O., Lador D., Avnit-Sagi T., Lotan-Pompan M. Personalized nutrition by prediction of glycemic responses. Cell. 2015;163:1079–1094. doi: 10.1016/j.cell.2015.11.001. This study dissects gut microbiome data with methods of artificial intelligence and achieves a prediction of the human postprandial glycemic response to different nutrient intake.
- 74. Piovesan D., Giollo M., Ferrari C., Tosatto S.C.E. Protein function prediction using guilty by association from interaction networks. Amino Acids. 2015;47:2583–2592. doi: 10.1007/s00726-015-2049-3.
- 75. Bersanelli M., Mosca E., Remondini D., Giampieri E., Sala C., Castellani G., Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinforma. 2016;17(Suppl 2):15. doi: 10.1186/s12859-015-0857-9.
- 76. de Tayrac M., Lê S., Aubry M., Mosser J., Husson F. Simultaneous analysis of distinct Omics data sets with integration of biological knowledge: multiple Factor Analysis approach. BMC Genomics. 2009;10:32. doi: 10.1186/1471-2164-10-32.
- 77. Shen R., Olshen A.B., Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics. 2009;25:2906–2912. doi: 10.1093/bioinformatics/btp543.
- 78. Bottou L., Curtis F.E., Nocedal J. Optimization methods for large-scale machine learning. arXiv [stat.ML]. 2016.
- 79. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003.
- 80. Li Y. Deep reinforcement learning: an overview. arXiv [cs.LG]. 2017.
- 81. Angermueller C., Pärnamaa T., Parts L., Stegle O. Deep learning for computational biology. Mol Syst Biol. 2016;12:878. doi: 10.15252/msb.20156651. Summarizes the current applications of deep learning in biology.
- 82. Patil K.R., Nielsen J. Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc Natl Acad Sci U S A. 2005;102:2685–2689. doi: 10.1073/pnas.0406811102.
- 83. Robinson J.L., Nielsen J. Integrative analysis of human omics data using biomolecular networks. Mol Biosyst. 2016;12:2953–2964. doi: 10.1039/c6mb00476h.
- 84. Gonçalves E., Raguz Nakic Z., Zampieri M., Wagih O., Ochoa D., Sauer U., Beltrao P., Saez-Rodriguez J. Systematic analysis of transcriptional and post-transcriptional regulation of metabolism in yeast. PLoS Comput Biol. 2017;13. doi: 10.1371/journal.pcbi.1005297. Describes an elegant machine learning approach for predicting metabolism from kinase activities.