BOX 1.
Terms, concepts, expressions, and definitions for readers venturing into multi-omics.
Term, concept, or expression | Definition |
--- | --- |
Multi-omics/panomics/integromics/integrated omics/polyomics/transomics/cross-omics | An approach that combines three or more different types of omics data to improve the understanding of systems regulatory biology, the molecular central dogma, and the genotype-phenotype relationship. |
Multi-table, Multi-block | Terms that emphasize the format of the data rather than its nature; popular in chemoinformatics, among other fields. They can, but need not, imply that the integrated tables (blocks) contain more features than observations. |
Multi-view | A term often used in ML for approaches that learn heterogeneity in the data and identify patterns across several "views" of the same object. By analogy with multiple cameras viewing an object from different angles, in the omics context the object can vary: a “cell,” an “organism,” or just a “genome” viewed via different seq* techniques. |
Multi-source | Datasets derived from multiple types of molecular assays. The term is used, for example, by the joint and individual variation explained (JIVE) tool (O’Connell and Lock, 2016) for EDA. |
Multi-modal | A term often used in omics for multiple measurement methods applied at the molecular level to gain holistic insight into cellular machinery (e.g., one cell at a time). It is also popular in drug repositioning, where it involves integrating more nuanced electronic health record (EHR) data. |
Central dogma of molecular biology | An explanation of the flow of genetic information within a biological system: from DNA to RNA (transcription), from RNA to protein (translation), and, as commonly extended in the omics context, to metabolites (enzyme catalysis). |
Machine learning (ML) method | An algorithm (a sequence of instructions) that learns from data, with applications including exploration and dimensionality reduction (unsupervised methods, e.g., principal component analysis (PCA) and matrix factorization) and classification or prediction (supervised or semi-supervised methods). |
Deep learning (DL) method | A subtype of ML that uses deep neural networks, composed of artificial neurons (units that aggregate or transform signals) arranged in layers; the depth of a DL model is the number of layers after the “input” layer, counting the “hidden” layers and the “output” layer (the input layer itself is excluded). |
Fusion (Baldwin et al., 2020) | A specific type of integration that applies a uniform method in a scalable manner to solve the biological problems targeted by the multi-omics measurements. |
Exploratory data analysis (EDA) | An approach, heavily used in statistics and data science, for the early steps of data analysis; it is often coupled with visualization. |
Matrix factorization | A class of ML algorithms based on matrix decomposition, i.e., representation of a data matrix by two or more matrices (factors) that can be multiplied together to obtain the original matrix (or its approximation). It can be used for classification, prediction, or exploration. |
Data heterogeneity | Systematic variation in the data that can be explained by the composition of the analyzed dataset; it encompasses both clinical heterogeneity (e.g., the presence of two groups with different genetic makeup due to ancestral differences, or different underlying etiologies of a disease) and technical heterogeneity (e.g., batch effects). |
Meta-data | A table of organized information and instructions that summarizes the properties of the data, making the data findable and usable for analysis within the same project or across multiple projects. |
Git | A version-control system for tracking changes in source code and other documents during software development. Platforms such as GitHub and GitLab are built on top of it. |
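To make the matrix factorization entry concrete, the following is a minimal sketch, assuming NumPy and a small synthetic matrix (the data, the rank `k`, and the helper name `low_rank_factors` are hypothetical and chosen for illustration only). It represents a samples-by-features matrix as the product of two factor matrices via truncated singular value decomposition (SVD). Truncated SVD is only one of many factorization methods and is not presented as the algorithm of any specific multi-omics tool.

```python
import numpy as np

def low_rank_factors(X, k):
    """Factor X (samples x features) into W (samples x k) and H (k x features)
    using a truncated SVD, so that W @ H approximates X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :k] * s[:k]  # left singular vectors scaled by singular values (sample scores)
    H = Vt[:k, :]         # top-k right singular vectors (feature loadings)
    return W, H

rng = np.random.default_rng(0)
# Hypothetical data: 6 samples x 10 features generated from 2 latent factors plus small noise
true_W = rng.normal(size=(6, 2))
true_H = rng.normal(size=(2, 10))
X = true_W @ true_H + 0.01 * rng.normal(size=(6, 10))

W, H = low_rank_factors(X, k=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape)  # (6, 2) (2, 10)
print(f"relative reconstruction error: {rel_error:.4f}")
```

With `k` equal to the smaller matrix dimension the product reproduces `X` exactly; smaller `k` yields the approximation mentioned in the definition, and the rows of `H` can then be inspected as patterns for exploration.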
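The layer-counting convention in the deep learning entry can be illustrated with a minimal sketch, assuming NumPy and a hypothetical fully connected network (the layer widths, the ReLU activation, and the helper names `init_mlp` and `forward` are illustrative assumptions, not taken from any specific tool). The input layer only holds the features and carries no weights, so the depth equals the number of weight layers: the hidden layers plus the output layer.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create (weights, bias) pairs for a fully connected network.
    layer_sizes lists the width of every layer, starting with the input layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Each layer aggregates its inputs (weighted sum plus bias) and transforms them
    (ReLU on hidden layers; the output layer is left linear in this sketch)."""
    for i, (weights, bias) in enumerate(params):
        x = x @ weights + bias
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical architecture: 100 input features, two hidden layers (64 and 16 units), 3 outputs
layer_sizes = [100, 64, 16, 3]
params = init_mlp(layer_sizes)
depth = len(params)     # input layer excluded, output layer included: 2 hidden + 1 output
print("depth:", depth)  # depth: 3
out = forward(params, np.ones((5, 100)))
print(out.shape)        # (5, 3)
```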