Genome Biology. 2025 Jul 25;26:224. doi: 10.1186/s13059-025-03675-7

MOTL: enhancing multi-omics matrix factorization with transfer learning

David P Hirst 1, Morgane Térézol 1, Laura Cantini 2, Paul Villoutreix 1, Matthieu Vignes 3, Anaïs Baudot 1,4,5
PMCID: PMC12291315  PMID: 40713657

Abstract

Joint matrix factorization is popular for extracting lower dimensional representations of multi-omics data but loses effectiveness with limited samples. Addressing this limitation, we introduce MOTL (Multi-Omics Transfer Learning), a framework that enhances MOFA (Multi-Omics Factor Analysis) by inferring latent factors for small multi-omics target datasets with respect to those inferred from a large heterogeneous learning dataset. We evaluate MOTL by designing simulated and real data protocols and demonstrate that MOTL improves the factorization of limited-sample multi-omics datasets when compared to factorization without transfer learning. When applied to actual glioblastoma samples, MOTL enhances delineation of cancer status and subtype.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13059-025-03675-7.

Keywords: Matrix factorization, Dimensionality reduction, Multi-omics, Data integration, Transfer learning, MOFA

Background

Omics data have transformed the study of biology and medicine by enabling high-throughput measurements of the activity and abundance of biological molecules and processes [1–3]. In recent years, the fields of biology and medicine have been revolutionized by the increased availability of multi-omics datasets [1, 4, 5]. A multi-omics dataset is comprised of multiple data matrices, each containing a different type of omics data (e.g., mRNA transcript counts, genomic mutations, DNA methylation prevalence). The integrative analysis of multi-omics data can provide a better understanding of a biological system than that obtained from the analysis of a single omics data matrix, as the complementary information contained in different omics enables a more comprehensive overview of the underlying biological system [6–11]. Additionally, using multiple omics can reveal insights into relationships between the different biological layers they represent [4, 5, 12]. Combining omics is also expected to reduce the impact of noise [6, 9, 12]. However, multi-omics data pose further analysis challenges beyond those encountered in single omics data analysis. These challenges include increased dimensionality, the presence of multiple data types, diverse sources of technological noise, and diverse ranges of variability. In this context, there has been an increased need for methods able to carry out integrative analysis of multiple omics.

The development of multi-omics analysis tools is an active area of research and a large variety of strategies have been proposed. These strategies encompass broad and overlapping categories, including Bayesian methods, network-based approaches, or dimensionality reduction techniques [4, 9], alongside more recent deep learning strategies [10, 11, 13]. A category of multi-omics analysis tools that is widely used is dimensionality reduction with matrix factorization. Matrix factorization infers a lower dimensional representation of the observed data, in which a sufficiently informative proportion of the original signal is retained [14]. It has proven to be computationally efficient, but also interpretable, and effective for the analysis of large datasets [6, 7, 9, 12, 15].

Most classical matrix factorization approaches were designed for the analysis of a single data matrix. Applying matrix factorization to a single omics matrix produces a score matrix and a weight matrix, both of which contain values for latent factors that are potentially associated with different sources of underlying biological signal. The values in the weight matrix ideally represent signal across the assayed biological features, and the values in the score matrix represent the signal across the samples. For a multi-omics dataset, one of the strategies is to jointly factorize multiple omics data matrices. Various methods are now available for this purpose [9]. Multi-omics joint matrix factorization methods typically produce a weight matrix for each omics, and either a shared score matrix or a combination of shared and omics specific score matrices. Many multi-omics matrix factorization methods are extensions of classical methods. For example, intNMF [16] extends non-negative matrix factorization to the multi-omics setting, and allows the user to determine the relative contribution of each omics to the extraction of joint signal. JIVE [17] extends principal component analysis to model both joint and omics specific signal. moCluster [18] extends canonical correlation analysis to produce a shared score matrix based on score matrices produced for each of two or more omics. MOFA [19], which is an extension of Factor Analysis, uses a Bayesian framework to account for the presence of multiple data types and to distinguish between joint and omics specific signal. Overall, the factors inferred by multi-omics matrix factorization can be used for clustering samples to reveal disease sub-types, for identifying molecular profiles and biomarkers associated with diseases, as well as for prediction of outcomes such as drug response and survival [6, 7, 9, 20, 21]. 
A challenge for matrix factorization is that it requires a large amount of observed data to produce a meaningful representation. However, there are cases where omics are measured from only a small number of samples, due to the rareness or cost of obtaining the data, and so there is a need for methods which help mitigate this challenge [14, 21, 22].

For a dataset generated from a small number of samples, transfer learning is a potential solution to the limited effectiveness of matrix factorization. Transfer learning is a machine learning approach in which information extracted from a large learning domain is used to improve the performance of a task applied to a smaller target domain [20–23]. It is assumed that the two domains share an overlapping latent space, allowing knowledge from the learning domain to be transferred to the application of the task to the target domain. Transfer learning has been successfully used in various machine learning applications, including image classification, text sentiment classification, and recommendation systems [21, 22, 24, 25]. In a transfer learning approach to omics matrix factorization, information inferred from the prior factorization of a learning dataset, comprised of a large number of samples from a heterogeneous set of biological conditions, is incorporated into the factorization of a small target dataset [21, 26]. It is assumed that if the latent factors inferred from the learning dataset represent common underlying biological processes, they should help improve the factorization of the target dataset [23].

The usefulness of transfer learning approaches to matrix factorization, for omics data analysis, has been demonstrated in contexts in which both the target and learning datasets were comprised of single matrices of omics data. In these cases, transfer learning was used to infer a score matrix for the target dataset by projecting it onto a weight matrix inferred from a learning dataset. In one study, Stein-O’Brien et al. [23] factorized a mouse single cell RNA-seq learning dataset with the Bayesian non-negative matrix factorization algorithm CoGAPS [27]. Then, they used the transfer learning tool projectR [28] to infer a score matrix for a human time course bulk RNA-seq dataset. The resulting factors were associated with known spatiotemporal differences across the samples. In another example, Davis-Marcisak et al. [29] factorized a mouse single cell RNA-seq learning dataset with CoGAPS, and then used projectR to infer a score matrix for bulk RNA-seq data from human cancer samples. They observed an association between a particular projectR factor and outcomes in metastatic melanoma. Taroni et al. [20] developed MultiPLIER, a transfer learning framework, which they demonstrated by first applying the non-negative matrix factorization algorithm PLIER [30] to a subset of Recount2 to infer a weight matrix. Recount2 is a compendium of RNA-seq data obtained from more than 70,000 human samples across more than 2000 studies [31]. Taroni et al. [20] then used a blood cell compendium of microarray gene expression data as the target dataset, for which they inferred a score matrix with MultiPLIER, as well as factorizing the target dataset directly with PLIER. For counts of a cell type of interest, MultiPLIER inferred a more highly correlated factor than was inferred by direct factorization of the target dataset.
They also used microarray gene expression data for 79 samples from a rare disease group called antineutrophil cytoplasmic autoantibody associated vasculitis (AAV) as a target dataset. There are no AAV samples in the Recount2 compendium, yet the MultiPLIER factors were positively correlated with their best match from factors inferred by direct factorization of the target dataset.

It has thus been demonstrated that the application of matrix factorization to a large, heterogeneous learning dataset can yield factors containing transferable information that is biologically relevant to target datasets from different organisms, diseases, cell types, and omics platforms. However, existing transfer learning approaches to matrix factorization have been designed for, and demonstrated on, datasets comprised of single omics data only. To the best of our knowledge, transfer learning approaches to joint multi-omics matrix factorization are currently lacking.

Here, we introduce MOTL (Multi-Omics Transfer Learning), a novel Bayesian transfer learning algorithm for multi-omics matrix factorization. MOTL is based on MOFA, a popular tool for integrative multi-omics analysis [19]. We first present the statistical framework and implementation of MOTL. Next, we propose two protocols, designed using simulated and real multi-omics datasets, for evaluating the performance of transfer learning approaches. We used these protocols to evaluate MOTL, and observed that, for a target multi-omics dataset comprised of a small number of samples, our transfer learning approach to matrix factorization is more effective than matrix factorization without transfer learning. Lastly, we showcase a practical use case of MOTL on a limited glioblastoma sample set, revealing an enhanced delineation of cancer status and subtype thanks to transfer learning.

Results

MOTL: a new transfer learning framework for multi-omics matrix factorization

We propose MOTL, a transfer learning approach to multi-omics matrix factorization. MOTL is based on MOFA [19], which uses variational Bayesian inference [32]. Consider a multi-omics target dataset, T, consisting of omics matrices T^(m), m = 1, ..., M. Each T^(m) = [t_nd^(m)] ∈ R^{N_t × D_m} contains data for N_t samples (rows) and D_m features (columns), where t_nd^(m) is the value for the nth sample and the dth feature of the mth matrix (see the Methods “Mathematical Notation” section for a summary of the mathematical notation used in this document). The features depend on which molecules were assayed to generate a given omics matrix; for example, the features for mRNA counts are genes, while those for DNA methylation are CpG sites.

We wish to jointly factorize the T^(m) into a matrix of sample scores, Z = [z_nk] ∈ R^{N_t × K}, and an omics-specific matrix of feature weights, W^(m) = [w_kd^(m)] ∈ R^{K × D_m}. The resulting lower-dimensional representation is based on K factors, which ideally represent underlying biological signals associated with some biological condition(s) of interest. z_nk is the score for the nth sample and the kth factor, while w_kd^(m) is the weight for the kth factor and the dth feature of the mth matrix. The kth column vector of Z, denoted z_:k, contains scores for factor k, while the nth row vector, z_n:, contains scores for sample n. The kth row vector of W^(m), denoted w_k:^(m), contains weights for factor k in the mth matrix, while the dth column vector, w_:d^(m), contains weights for feature d of the mth matrix.
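
As a concrete illustration of these dimensions, the following sketch (hypothetical shapes; Python/numpy used as a stand-in, since MOTL itself is implemented in R) builds a shared score matrix Z and one weight matrix per omics, and reconstructs each T^(m) as the product Z W^(m):

```python
import numpy as np

# Hypothetical dimensions: Nt = 10 samples, K = 5 factors, and two omics
# matrices with D1 = 100 and D2 = 80 features, respectively.
rng = np.random.default_rng(0)
Nt, K, dims = 10, 5, [100, 80]

Z = rng.normal(size=(Nt, K))                 # shared sample scores (Nt x K)
W = [rng.normal(size=(K, d)) for d in dims]  # omics-specific weights (K x Dm)

# Each omics matrix is approximated by the product of the shared score
# matrix and its own weight matrix.
T_hat = [Z @ Wm for Wm in W]
```

Note that the score matrix Z is common to all omics, which is what makes the factorization joint.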

We are concerned with the situation in which Nt is small, exacerbating the curse of dimensionality, and therefore, we expect to improve the factorization of T by employing a transfer learning approach (see Fig. 1). We do this transfer learning by incorporating values that have already been inferred from the prior factorization of a learning dataset, L, and we assume that the Bayesian matrix factorization algorithm MOFA [19] was used for factorizing L.

Fig. 1.

Fig. 1

Overview of MOTL, our transfer learning approach to joint multi-omics matrix factorization based on variational Bayesian inference. a A multi-omics learning dataset, L, consisting of M omics matrices, L^(m), m = 1, ..., M, is factorized with MOFA to infer a matrix of feature weights, W^(m), a vector of feature-wise intercepts, a^(m), and a vector of feature-wise precision parameter values, τ^(m), for each L^(m). b The feature weight, intercept, and precision parameter values, inferred from the factorization of L, are incorporated into the factorization of a multi-omics target dataset, T, for which MOTL infers a matrix of sample scores, Z, with variational inference

The learning dataset consists of omics matrices L^(m), m = 1, ..., M. Each L^(m) = [l_nd^(m)] ∈ R^{N_l × D_m} contains data for the same D_m features as T^(m), but for a different set of N_l > N_t samples. We hypothesize that if L is comprised of samples from a heterogeneous set of biological conditions, then the factorization of L will yield information that is relevant for the factorization of T.

MOTL is based on the variational Bayesian inference methodology used by MOFA (the Methods “The MOFA model” section). We have modified the MOFA algorithm to enable us to supplement the factorization of T by incorporating values already inferred from the prior factorization of L. For MOTL, we assume that each observed t_nd^(m) is a random variable, with a likelihood that is conditional on the vectors z_n: and w_:d^(m). We model continuous, counts, and binary data with the same likelihoods and link functions that MOFA uses. For observed continuous data, we thus assume a Gaussian likelihood, into which we include a feature-wise precision parameter, τ_d^(m), for each feature d of matrix m. For observed binary data, we assume a Bernoulli likelihood, and for observed counts data we assume a Poisson likelihood. In contrast to MOFA, MOTL does not center the input data during factorization fitting, as we want to incorporate an intercept that is compatible with the factorization of L. We therefore replace z_n: w_:d^(m) with a_d^(m) + z_n: w_:d^(m) in the likelihood, where a_d^(m) is the feature-wise intercept for feature d of matrix m. We infer the a_d^(m) values based on the MOFA factorization of L (the Methods “Application of MOTL to simulated, TCGA and glioblastoma multi-omics datasets” section). MOTL accepts missing t_nd^(m) values; it is therefore not necessary to remove features with missing values, or to perform imputation, before using MOTL.

In order to carry out a transfer learning approach to matrix factorization, MOTL uses the matrix of feature weights, W^(m), the vector of feature-wise intercepts, a^(m) = [a_d^(m)] ∈ R^{D_m}, and the vector of feature-wise precision parameter values, τ^(m) = [τ_d^(m)] ∈ R^{D_m}, inferred for each L^(m) with a prior MOFA factorization of L. Instead of modeling these as random variables, we treat them as constants. We aim to obtain point estimates of the z_nk values, for which we assume the same joint prior distribution as MOFA does,

p(Z) = ∏_{n=1}^{N_t} ∏_{k=1}^{K} Normal(z_nk | 0, 1)    (1)

MOTL obtains point estimates of znk values by approximating the joint posterior distribution p(Z|T) with a variational distribution:

q(Z) = ∏_{n=1}^{N_t} ∏_{k=1}^{K} q(z_nk) = ∏_{n=1}^{N_t} ∏_{k=1}^{K} Normal(z_nk | μ_nk, σ_nk²)    (2)

MOTL infers q(Z) iteratively. At each iteration, the value of each parameter is updated while all other parameter values are held fixed. MOTL optimizes the joint variational distribution by iterating until convergence. For each z_nk, the expected value, E_q[z_nk] = μ_nk, is used as the point estimate throughout and after model fitting. MOTL uses the same update equations for the parameters of q(z_nk) as MOFA, but with the inclusion of intercepts:

σ_nk² = ( Σ_{m=1}^{M} Σ_{d=1}^{D_m} τ_nd^(m) (w_kd^(m))² + 1 )^{−1}    (3)
μ_nk = σ_nk² Σ_{m=1}^{M} Σ_{d=1}^{D_m} τ_nd^(m) w_kd^(m) ( t̂_nd^(m) − a_d^(m) − Σ_{j≠k} z_nj w_jd^(m) )    (4)

where τ_nd^(m) is the precision for the nth sample and dth feature of the mth matrix, and t̂_nd^(m) denotes a (possibly) transformed observed data point (the Methods “The MOFA model” section). For observed data with an assumed Gaussian likelihood, a feature-wise precision, τ_d^(m), is used instead of τ_nd^(m), and there is no transformation, meaning t̂_nd^(m) = t_nd^(m). For observed data with an assumed non-Gaussian likelihood, MOTL transforms the data to yield Gaussian pseudo-data values, which it does not center. The transformation to Gaussian pseudo-data allows updates of q(Z) to be based on the assumption of Gaussian observed data. When MOFA transforms observed data with an assumed Bernoulli likelihood, it derives and uses a precision parameter, τ_nd^(m), for each sample and feature. For observed data with an assumed Poisson likelihood, it derives and uses a feature-wise precision, τ_d^(m). Thus, for Bernoulli observed data, MOTL initializes the τ_nd^(m) values with τ_d^(m) values, which are averages of the τ_nd^(m) values returned by the factorization of L; these are subsequently updated at each iteration of the algorithm. For Poisson observed data, MOTL uses the τ_d^(m) values obtained from the prior factorization of L, and holds them fixed.
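
For intuition, a minimal sketch of one round of these coordinate updates, written in Python/numpy under simplifying assumptions (Gaussian pseudo-data, feature-wise precisions held fixed, fixed transferred weights; variable names are illustrative, the actual implementation is the R package):

```python
import numpy as np

def update_z(T_hat, A, W, Tau, Z):
    """One sweep of coordinate updates for the variational means of Z.

    Illustrative sketch of Eqs. (3)-(4): T_hat, A, W, Tau are lists over the
    M omics matrices (pseudo-data, intercepts a_d, precisions tau_d, and
    transferred weights). Z has shape (Nt, K); factors updated earlier in
    the sweep are reused by later ones, as in coordinate ascent.
    """
    Nt, K = Z.shape
    for k in range(K):
        num = np.zeros(Nt)
        den = 0.0
        for Tm, am, Wm, taum in zip(T_hat, A, W, Tau):
            # Residual excluding factor k: t_hat - a - sum_{j != k} z_nj w_jd
            resid = Tm - am - Z @ Wm + np.outer(Z[:, k], Wm[k])
            num += resid @ (taum * Wm[k])        # sum_d tau_d w_kd * residual
            den += np.sum(taum * Wm[k] ** 2)     # sum_d tau_d w_kd^2
        sigma2 = 1.0 / (den + 1.0)               # Eq. (3); +1 from the N(0,1) prior
        Z[:, k] = sigma2 * num                   # Eq. (4): new mu_nk values
    return Z
```

On noiseless synthetic data with high precision, a few such sweeps drive the reconstruction error close to zero, with only the slight shrinkage induced by the prior remaining.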

To monitor convergence, we calculate the evidence lower bound (ELBO), which can be used to evaluate how well a variational distribution approximates a posterior distribution of interest. We calculate the ELBO with respect to Z:

ELBO(Z) = E_q[log p(T|Z)] + E_q[log p(Z)] − E_q[log q(Z)]    (5)

For T^(m) with an assumed non-Gaussian likelihood, we use the same lower bound for log p(t_nd^(m) | Z) as MOFA does. Maximizing this lower bound, coupled with the use of the t̂_nd^(m) values, allows updates of q(Z) based on the assumption of Gaussian observed data [33, 34]. We calculate the ELBO at regular intervals, and the number of iterations between each calculation is a user-defined parameter. We check for convergence based on the absolute change in the ELBO (from the previous check) as a percentage of the initial ELBO. The algorithm is deemed to have converged when a specified number of consecutive changes in the ELBO fall below a threshold. Both the threshold, and the required number of consecutive changes falling below this threshold, are user-defined parameters.
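
The convergence rule described above can be sketched as follows (function name, parameter names, and default values are illustrative, not MOTL's actual defaults):

```python
def has_converged(elbo_history, threshold_pct=0.0005, n_consecutive=2):
    """Return True when the absolute change in the ELBO between consecutive
    checks, expressed as a percentage of the initial ELBO, has stayed below
    `threshold_pct` for the last `n_consecutive` checks.
    """
    if len(elbo_history) < n_consecutive + 1:
        return False
    initial = abs(elbo_history[0])
    # Percentage change between each pair of consecutive ELBO checks
    pct_changes = [
        abs(curr - prev) / initial * 100
        for prev, curr in zip(elbo_history, elbo_history[1:])
    ]
    return all(c < threshold_pct for c in pct_changes[-n_consecutive:])
```

Requiring several consecutive sub-threshold changes, rather than a single one, guards against declaring convergence during a temporary plateau.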

We allow factors to be dropped during training, based on the fraction of variance explained:

R²_mk = 1 − [ Σ_{n=1}^{N_t} Σ_{d=1}^{D_m} ( t̂_nd^(m) − a_d^(m) − z_nk w_kd^(m) )² ] / [ Σ_{n=1}^{N_t} Σ_{d=1}^{D_m} ( t̂_nd^(m) − a_d^(m) )² ]    (6)

We drop the factor with the lowest R²_mk that does not have any R²_mk above the threshold, across the M omics matrices. We assess factors in this way after each round of updates. After convergence, the algorithm returns the Z and W^(m) matrices for the factors that have not been dropped.
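
A sketch of this factor-dropping rule, under the Gaussian pseudo-data assumption (the threshold value shown is illustrative):

```python
import numpy as np

def factor_to_drop(T_hat, A, Z, W, threshold=0.01):
    """Identify a candidate factor to drop, following Eq. (6).

    For each omics matrix m and factor k, compute the fraction of variance
    explained, R2[m, k]. A factor is droppable only if none of its per-omics
    R2 values exceed `threshold`; among droppable factors, the one with the
    lowest best R2 is flagged. Returns a factor index, or None.
    """
    M, K = len(T_hat), Z.shape[1]
    R2 = np.zeros((M, K))
    for m, (Tm, am, Wm) in enumerate(zip(T_hat, A, W)):
        total = np.sum((Tm - am) ** 2)
        for k in range(K):
            resid = np.sum((Tm - am - np.outer(Z[:, k], Wm[k])) ** 2)
            R2[m, k] = 1.0 - resid / total
    best_r2 = R2.max(axis=0)        # each factor's best R2 across omics
    k_min = int(np.argmin(best_r2))
    return k_min if best_r2[k_min] < threshold else None
```

A factor that explains meaningful variance in even one omics matrix is thus retained, which preserves omics-specific signal.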

MOTL is available as an open source R implementation [35].

Evaluation protocol using simulated multi-omics data

We first designed and implemented a transfer learning evaluation protocol based on simulated multi-omics datasets, which we generated from groundtruth factors (the Methods “Multi-omics data simulated with groundtruth factors” section). In each simulation instance, we generated a multi-omics dataset, Y, which we subsequently split into a target dataset, T, and a learning dataset, L. Y consisted of matrices of counts, continuous, and binary data. We generated each matrix, Y^(m), from a statistical distribution conditional on random matrices Z and W^(m), which each contained values for K groundtruth factors. The kth column vector of Z contained sample scores for the kth groundtruth factor. The kth row vector of W^(m) contained feature weights for that same factor. We varied the number of groundtruth factors across configurations, using K ∈ {20, 30}. We generated Z based on the group membership of samples. In each instance, we created two groups of five samples for the target dataset. The learning dataset samples belonged to either 20 or 40 differently sized groups of randomly selected sizes. For each groundtruth factor and group, the sample scores were generated using a mean parameter value that was common to all samples in the group. We induced heterogeneity by allowing the means to vary across groups and factors, randomly selecting each group mean, for a given groundtruth factor, from a pool of three possible values. We split each Y^(m) into T^(m) and L^(m), based on the sample groups used to generate Z. In each instance, T contained data for 10 samples, while the expected number of samples for L was either 400 or 1000.
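
The score-simulation step can be sketched as follows; the pool of group means and the standard deviation shown here are illustrative placeholders, not the values used in the protocol:

```python
import numpy as np

def simulate_scores(group_sizes, K, mean_pool=(-2.0, 0.0, 2.0), sd=1.0, rng=None):
    """Simulate a groundtruth score matrix Z from sample groups.

    For each group and each of the K groundtruth factors, a group mean is
    drawn from a small pool of possible values, and the scores of the
    group's samples are then drawn around that mean.
    """
    if rng is None:
        rng = np.random.default_rng()
    blocks = []
    for n_g in group_sizes:
        mu = rng.choice(mean_pool, size=K)  # one mean per factor, per group
        blocks.append(rng.normal(loc=mu, scale=sd, size=(n_g, K)))
    return np.vstack(blocks)
```

Because two groups may draw different means for the same factor, that factor becomes differentially active between them, which is exactly the signal the F1 evaluation looks for.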

For each simulation instance, we factorized L with MOFA (the Methods “Application of MOFA to simulated, TCGA and glioblastoma multi-omics datasets” section). We then factorized T with our transfer learning method MOTL (the Methods “Application of MOTL to simulated, TCGA and glioblastoma multi-omics datasets” section), incorporating output from the factorization of L. To benchmark the performance of MOTL, we also performed direct MOFA factorizations (i.e., factorization without transfer learning) of the T datasets. We evaluated both the MOTL and direct MOFA factorization of each T, and compared the overall performance of each approach. We evaluated factorizations of each T by calculating an F1 score (the Methods “Evaluation methods” section), to measure how well the factorization allowed us to uncover differentially active groundtruth factors underlying T. The kth groundtruth factor was differentially active for T if the mean parameter values used to simulate the sample scores, for that factor, differed between the two groups of target dataset samples. Factorization with MOTL led to higher F1 scores than direct MOFA factorization, indicating that the MOTL factorizations were more effective in uncovering differentially active latent signal from T datasets (Fig. 2a). This was observed across all simulation configurations, and the overall uplift in mean F1 score for MOTL, when compared to direct MOFA factorization, was 0.21 (p-value < 0.01, the Methods “Evaluation methods” section). We thus observed that transfer learning with MOTL was more effective in uncovering differentially active latent signal, when compared to direct MOFA factorization (without transfer learning) of T. Of note, MOTL was also more effective than direct factorization with the alternative multi-omics matrix factorization approaches intNMF and moCluster (Additional file 1: Fig. S1).

Fig. 2.

Fig. 2

Evaluation of factorizations of small simulated multi-omics target datasets. a The boxplots represent the F1 scores obtained for factorizations with and without MOTL transfer learning, for different simulation configuration settings. Simulation configurations varied in the number of groups of samples used for the learning dataset (Learning Groups), the number of groundtruth factors (K), or the standard deviation used to simulate z_nk values (sd). F1 scores take a value between 0 and 1, and higher values indicate better factorizations. Each boxplot is based on 30 F1 scores. The hinges of the boxes are the 25th and 75th percentiles, the middle lines are medians, the diamonds are the mean values, and the whiskers are either extreme values or extend 1.5 times the inter-quartile range from the hinge. b The line plots represent the average F1 score obtained from MOTL factorizations, for different simulation configuration settings, after permuting the values in proportions of the feature vectors in the W^(m) matrices obtained from the prior factorization of L. The dashed line is the average F1 score obtained with direct MOFA factorization for the simulation configuration

We next wanted to evaluate the robustness of MOTL when there is a decline in the overlap between the latent spaces of L and T (the Methods “Application of MOTL to simulated, TCGA and glioblastoma multi-omics datasets” section). We forced this decline in overlap by permuting feature vectors in the W^(m) matrices inferred from L datasets, based on a range of permutation proportions between 0 and 1. For each simulation instance, and permutation proportion, p, we created new W^(m) matrices by permuting the values in (p × 100)% of the feature vectors in the W^(m) matrices inferred from L. We then factorized T with MOTL, using the permuted W^(m) matrices, and calculated the F1 score. We observed that MOTL outperformed direct MOFA factorization even when there were large declines in the overlap between the latent spaces of L and T, and that the performance of MOTL tended to drop below that of direct MOFA factorization when the values in 80% or more of the feature vectors were permuted (Fig. 2b).
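
The permutation step can be sketched as follows, assuming the feature vectors are the columns of each W^(m) (function and parameter names are illustrative):

```python
import numpy as np

def permute_weight_features(W, p, rng):
    """Permute the values within (p * 100)% of the feature vectors of each
    weight matrix, degrading the overlap between the learning and target
    latent spaces while preserving each feature's set of weight values.
    """
    W_permuted = []
    for Wm in W:
        Wm = Wm.copy()
        D = Wm.shape[1]
        cols = rng.choice(D, size=int(round(p * D)), replace=False)
        for d in cols:
            Wm[:, d] = rng.permutation(Wm[:, d])  # shuffle weights for feature d
        W_permuted.append(Wm)
    return W_permuted
```

With p = 0 the transferred weights are untouched; with p = 1 every feature's weights are scrambled across factors, destroying the learned factor structure.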

Evaluation protocol using TCGA multi-omics data

We next designed and implemented a second transfer learning evaluation protocol, based on TCGA multi-omics data (the Methods “TCGA multi-omics data acquisition and pre-processing” section). We used four types of omics data: log2-transformed mRNA counts, log2-transformed miRNA counts, DNA methylation M-values, and simple nucleotide variation (SNV) binary data, which we obtained for 32 different cancer types. We created target datasets using data from three cancer types: acute myeloid leukemia (LAML), pancreatic adenocarcinoma (PAAD), and skin cutaneous melanoma (SKCM). We created these target datasets by first creating four reference datasets. Each reference dataset, R, contained multi-omics data for all samples from either two, or all three, of the cancer types. We then randomly split every R into non-overlapping target datasets, each containing only five samples per cancer type (Fig. 3a). We merged data from the remaining 29 cancer types into a learning dataset, L, which contained multi-omics data for 7217 samples.
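
The reference-to-target split can be sketched as follows; how leftover samples (those not filling a complete target dataset) are handled is an assumption of this sketch:

```python
import numpy as np

def split_reference(sample_ids_by_cancer, per_type=5, rng=None):
    """Split a reference dataset into non-overlapping target datasets.

    Samples are shuffled within each cancer type, then cut into consecutive
    chunks of `per_type` samples per type. Leftover samples that cannot fill
    a complete target dataset are simply left out (an assumption here).
    """
    if rng is None:
        rng = np.random.default_rng()
    shuffled = {c: rng.permutation(ids) for c, ids in sample_ids_by_cancer.items()}
    n_targets = min(len(ids) // per_type for ids in shuffled.values())
    targets = []
    for i in range(n_targets):
        target = []
        for ids in shuffled.values():
            target.extend(ids[i * per_type:(i + 1) * per_type])
        targets.append(target)
    return targets
```

Cutting consecutive chunks from a single shuffle guarantees that no sample appears in more than one target dataset.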

Fig. 3.

Fig. 3

Evaluations using TCGA multi-omics data. a We created TCGA target, T, datasets from two or three cancer types: For each combination of cancer types, we created a reference, R, dataset containing multi-omics data for all samples from the selected cancer types. We then randomly split R into non-overlapping T datasets, containing multi-omics data for five samples per cancer type. We did this for subsets of the set of cancer types {LAML, PAAD, SKCM}. In total we created, and split, four reference datasets, each of which contained multi-omics data for all samples from either two (LAML and PAAD, LAML and SKCM, PAAD and SKCM), or three (LAML, PAAD, and SKCM) cancer types. For datasets containing PAAD and SKCM samples only, we included mRNA, miRNA, DNA methylation, and SNV data. We did not include SNV data in datasets containing LAML samples, due to the sparsity of SNV data. b Comparison of factorization approaches applied to TCGA multi-omics datasets. Violin plots of F-measure values for weight matrix factors (FM_W), F-measure values for score matrix factors (FM_Z), and F1 scores (F1). For each evaluation score, higher values indicate better factorizations. Scores are plotted by factorization method and by the cancer types characterizing the target dataset samples. c Frequency with which differentially active groundtruth TCGA factors were true positives. A differentially active groundtruth factor was a true positive if it was predicted as being differentially active based on a factorization of a target dataset. Each bar represents the proportion of target datasets, for which the factorization led to the differentially active groundtruth factor being a true positive. Proportions are plotted by factorization method, and by the cancer types characterizing the reference and target dataset samples

We factorized L with MOFA (the Methods “Application of MOFA to simulated, TCGA and glioblastoma multi-omics datasets” section), based on which we used MOTL to factorize each T (the Methods “Application of MOTL to simulated, TCGA and glioblastoma multi-omics datasets” section). To benchmark the performance of MOTL, we also performed direct MOFA factorizations (without transfer learning) of the T datasets. In order to evaluate the factorizations of T datasets, we factorized the R datasets with MOFA and treated the resulting score, Z, and weight, W^(m), matrices as groundtruth factor matrices (the Methods “Evaluation methods” section). We were interested in how well the factorizations of T datasets uncovered the groundtruth, and we used F-measure values and F1 scores to evaluate this.

We calculated F-measure values to assess the correlation between factors inferred from each T, and the groundtruth factors obtained from the factorization of the corresponding R dataset (the Methods “Evaluation methods” section). We calculated F-measure values for weight matrices (FM_W), as well as for score matrices (FM_Z). The overall mean FM_W for MOTL was slightly lower (0.03 reduction, p-value < 0.01) than for direct MOFA factorizations of T datasets (Fig. 3b, column 1), the result of lower average relevance counterbalancing higher average recovery. We concluded from this that groundtruth W^(m) factors were more easily uncovered by the factors transferred from L than by direct MOFA factorization; however, despite factor trimming during MOTL factorization, some remaining transferred factors were less associated with groundtruth factors than those obtained with direct MOFA factorization. It is of note that the difference in average FM_W is attributable to the datasets containing LAML and PAAD samples only. If we exclude these, there is no difference in FM_W (p-value 0.59). The overall mean FM_Z for MOTL was 0.20 higher (p-value < 0.01, the Methods “Evaluation methods” section) than for direct MOFA factorizations (Fig. 3b, column 2). We thus observed that the Z factors obtained with MOTL, from T datasets, were overall more correlated with groundtruth factors than those obtained with direct MOFA factorization.

We also calculated F1 scores to measure how well the factorizations of T datasets uncovered differentially active groundtruth factors (the Methods “Evaluation methods” section). For each T dataset, the groundtruth factors were the factors obtained from factorization of the corresponding R dataset. We considered the kth groundtruth factor to be differentially active if the distribution of scores in the kth column vector of the groundtruth Z differed between the cancer types. We can simultaneously evaluate the Z and W^(m) factors, and assess the overall quality of factorizations, by checking for an uplift in F1 scores (Fig. 3b, column 3). MOTL yielded higher F1 scores than direct MOFA factorization (0.34 uplift, p-value < 0.01), meaning it was more effective in uncovering latent activity that varied across cancer types. Similarly to the evaluation using simulated data, MOTL was also more effective than direct factorization with the alternative multi-omics matrix factorization approaches intNMF and moCluster (Additional file 1: Fig. S2).

We next examined differentially active groundtruth factors, with an initial focus on the frequency with which these factors were true positives (Fig. 3c). For each factorization of a T dataset, a differentially active groundtruth factor was a true positive if it was predicted as being differentially active based on the factorization of T. The unique count of true positives was a component of each F1 score value (the Methods “Evaluation methods” section). We further performed a gene set enrichment analysis to identify the pathways and processes associated with differentially active groundtruth factors that were true positives (the Methods “Evaluation methods” section, Additional file 2: Table S1), and that explained at least 1% of the mRNA variance in R (Additional file 3: Table S2).

The factorization of the R dataset containing all LAML and PAAD samples yielded six groundtruth factors, of which two were differentially active: Factor 1 and Factor 3. Both of these factors were true positives for 100% of MOTL factorizations of T datasets containing subsets of five LAML and PAAD samples (Fig. 3c). In contrast, only one of these factors was a true positive for direct MOFA factorizations, and for just over half of the same T datasets (Fig. 3c). Factor 1 is significantly associated with developmental processes, cell communication, and immunity signaling. Factor 3 displays similar enrichments, with an additional specific enrichment related to the regulation of gene expression in beta cells (Additional file 2: Table S1).

The factorization of the R dataset containing all LAML and SKCM samples yielded 12 groundtruth factors, of which five were differentially active. Four of these five factors were true positives for MOTL factorizations for more than 80% of the T datasets; one factor was a true positive for just under half of the T datasets. Importantly, only two of these five groundtruth factors were true positives for direct MOFA factorizations, and only for a small proportion of the T datasets. Four of the five differentially active groundtruth factors that were true positives explained at least 1% of the mRNA variance in R: Factor 1, Factor 3, Factor 8, and Factor 9. We performed gene set enrichment analysis on these four factors. Factor 1 is significantly associated with extracellular organization, developmental processes, cell communication signaling, and Fc receptor-mediated immune processes (Additional file 2: Table S1). Factor 3 is significantly associated with the hematopoietic cell lineage, PI3K/AKT and G protein-coupled signaling, and chemokine, interleukin, and interferon signaling. Both Factors 1 and 3 are associated with keratinization and formation of the cornified envelope. Factors 8 and 9 do not present significant pathway enrichments beyond keratinization and processes already associated with the first two factors.

The factorization of the R dataset containing all PAAD and SKCM samples yielded 17 groundtruth factors, of which eight were differentially active. Seven of these eight factors were true positives for MOTL factorizations, six with high frequency (i.e., identified for more than 80% of the T target datasets). Only one differentially active groundtruth factor was frequently a true positive for direct MOFA factorization of the T target datasets. Factor 2 is related to B cell receptor signaling and Fc receptor-mediated immune processes, drug metabolism by cytochrome P450, and other metabolism-related processes. This Factor 2 is rarely uncovered by direct MOFA factorization. In contrast, Factor 4 is a true positive for MOTL for 100% of the target datasets and for direct MOFA factorizations for more than 75% of the target datasets. This factor is associated with developmental processes and cytokine-cytokine receptor interactions. Factor 7 is associated with keratinization and formation of the cornified envelope, Factor 9 with cell adhesion and migration, and Factor 10 with cytokine and chemokine signaling.

The factorization of the R dataset containing all LAML, PAAD and SKCM samples yielded 13 groundtruth factors, of which seven were differentially active. All seven of these differentially active groundtruth factors were true positives for MOTL factorization with high frequency, compared to one factor for direct MOFA factorization. These groundtruth factors, differentially active between all three cancer types, are associated with the same cellular processes and pathways identified when comparing the cancer types pairwise (Additional file 2: Table S1).

Overall, the factors that were differentially active when comparing two or all three cancer types reflect the different embryonic origins of the cancerous tissues, and highlight the importance of immunity and the microenvironment in cancer pathophysiology and response to treatment. In conclusion, matrix factorization with transfer learning using MOTL better uncovers differentially active groundtruth factors from target datasets containing only a small number of samples.

Application of MOTL to glioblastoma

Glioblastoma is a rare, heterogeneous, and aggressive cancer type. Multi-omics datasets offer an important opportunity to better characterize glioblastoma subtypes, identify biomarkers, and propose novel therapeutic options [36]. However, large collections of glioblastoma tissue samples are difficult to obtain due to the relative scarcity of the disease and the challenges involved in acquiring samples via invasive biopsies.

In [36], the authors conducted multi-omics profiling (mRNA expression, DNA methylation) of four normal brain samples and nine patient-derived glioblastoma stem cell (pd-GBSC) cultures. The nine cancer samples had been previously classified into three subtypes based on transcriptome signatures: classical (CL), proneural (PN), and mesenchymal (MS). Given the small number of samples, the authors devised a strategy based on analyzing this dataset in parallel with datasets gathered from the literature. We illustrate here how MOTL could help in analyzing such a dataset comprised of a limited number of samples.

We first applied a direct MOFA factorization (the Methods “Application of MOFA to simulated, TCGA and glioblastoma multi-omics datasets” section) to a target dataset comprised of the four normal and nine pd-GBSC samples (the Methods “Glioblastoma target dataset acquisition and pre-processing” section), revealing eight factors. Heatmap clustering of the samples, based on these factors, does not demonstrate clear grouping with respect to either cancer status or subtype (Fig. 4a). Next, we applied MOTL to the same target dataset (the Methods “Application of MOTL to simulated, TCGA and glioblastoma multi-omics datasets” section). In this case, we first created a TCGA learning dataset containing mRNA expression, miRNA expression, DNA methylation, and SNV data for samples from all 32 cancer types (the Methods “TCGA multi-omics data acquisition and pre-processing” section). It is noteworthy that this learning dataset did not contain data for glioblastoma, as there were no TCGA glioblastoma samples fulfilling our selection criteria (i.e., with complete 4-layer multi-omics profiles). We factorized this learning dataset with MOFA (the Methods “Application of MOFA to simulated, TCGA and glioblastoma multi-omics datasets” section), and then applied MOTL to the target dataset based on this factorization. MOTL transfer learning factorization revealed 25 factors. In this case, the heatmap clustering of the MOTL factors separates cancer and normal samples and also displays subgroups partially matching subtypes previously defined based on transcriptome signatures (Fig. 4b).

Fig. 4.
Heatmaps of factorizations and gene set enrichment analysis of glioblastoma data: The heatmaps are based on factorizations and subsequent gene set enrichment analysis of the target dataset comprised of multi-omics data (mRNA expression, DNA methylation) for all normal and patient-derived glioblastoma stem cell (pd-GBSC) samples. The multi-omics data was obtained from Santamarina-Ojeda et al. [36], who also provided subtypes for the cancer samples, previously defined from transcriptomic signatures: CL (classical), PN (proneural), MS (mesenchymal). a Heatmap of the score matrix, Z, inferred with direct MOFA factorization of the target dataset. The rows are the factors, the columns are the samples, and the cells are the row-wise centered and scaled factor values. The rows and columns were ordered with hierarchical clustering (complete-linkage). b Heatmap of the Z matrix inferred with MOTL factorization of the target dataset. c Reactome enrichment results for direct MOFA and MOTL factors. We performed gene set enrichment analysis on the direct MOFA factor and the 11 MOTL factors that were differentially active between normal and cancer samples, and that explained at least 1% of mRNA variance. This yielded a set of Reactome processes and pathways significantly associated with either the one direct MOFA factor or one or more of 10 different MOTL factors. The Reactome processes and pathways were filtered, clustered, and plotted with orsum. The colors represent the quartiles of enrichment significance for each factor (darker means more significant)

Further statistical tests revealed that only one of the eight factors identified by direct MOFA factorization was differentially active between normal and cancer samples, whereas 19 of the 25 factors obtained by MOTL transfer learning factorization were differentially active (the Methods “Evaluation methods” section). Gene set enrichment analysis using Reactome, GO:CC, GO:BP and KEGG (the Methods “Evaluation methods” section), focusing on differentially active factors that explained at least 1% of mRNA variance, revealed 715 processes and pathways associated with the direct MOFA factor, and 1061 processes and pathways associated with 11 MOTL factors (Additional file 4: Table S3). The overlap between the two sets of associated processes and pathways was 318, which is statistically significant (hypergeometric test, one-sided p-value < 0.01). We filtered and integrated these enrichment results (Fig. 4c; Additional file 1: Fig. S3–S5) using orsum [37] (the Methods “Evaluation methods” section). This analysis revealed that MOTL factors capture a broader spectrum of biological pathways than the direct MOFA approach, including for instance immune/inflammatory processes and lipid metabolism pathways.

We also applied both direct MOFA and MOTL factorizations to target datasets comprised of normal samples and samples from just a single cancer subtype. In all cases, the direct MOFA factorization yielded only one differentially active factor, whereas MOTL yielded 11 (CL vs normal), 16 (MS vs normal), and 12 (PN vs normal) differentially active factors. Focusing on the subset of differentially active factors that explained at least 1% of mRNA variance, we performed gene set enrichment analyses. We identified processes and pathways that were associated with only a single cancer subtype, such as fatty acid metabolism enrichment for the MS subtype and clathrin-mediated endocytosis for the PN subtype (Additional file 5: Table S4).

Discussion

We presented MOTL, which factorizes a multi-omics target dataset by incorporating latent factor values already inferred from the factorization of a multi-omics learning dataset. MOTL is intended for situations in which the target dataset comprises a limited number of samples. In our evaluations, the target datasets never contained data for more than 15 samples, as our primary concern was with target datasets considered too small for useful factor analysis. It would be insightful to extend the evaluations by using a larger range of target dataset sizes, in order to identify a crossover point at which transfer learning no longer enhances matrix factorization.

An assumption underlying our application of transfer learning is that there is an overlapping latent space stemming from shared biological signal. With target datasets containing few samples, the learning dataset is likely to represent a set of biological conditions that neither is entirely specific to, nor fully overlaps with the set of biological conditions represented by the target dataset. In our evaluation based on simulated data, we observed that MOTL performed well even when there were large declines in the overlap between the latent spaces underlying the target and learning datasets, except for the more extreme situation where there was an almost total absence of overlap. It would be relevant to perform a similar analysis using real data, investigating how similar the learning and target datasets need to be in order for informative shared factors to exist. For instance, it would be beneficial to use a measure of similarity, between a given learning and target dataset, which would predict the effectiveness of using a transfer learning approach to apply matrix factorization to the target dataset. The similarity between the learning and target datasets could potentially be data driven, based on measures such as optimal transport [38] or maximum mean discrepancy [39]. Alternatively, a relevant ontology [40, 41] could be used to assess the overall similarity between the biological conditions characterizing the learning and target datasets. It could also be interesting to quantify how heterogeneous (i.e., representing a large diversity of tissues, diseases, experimental conditions...) a learning dataset needs to be, in order to yield factors which can be relevant for a given target dataset.

To evaluate MOTL, we designed two evaluation protocols, based on simulated and real data. Importantly, these protocols can be reused to evaluate other transfer learning strategies for multi-omics data integration. For the evaluation protocol based on real data, we used TCGA, a public repository of multi-omics data with a large number of samples, representing various cancer and tissue types. We selected three different cancer types as references, from which to build target datasets. We did not include samples from these cancer types in the learning dataset. In this setting, it was unknown, prior to evaluation, whether there were latent factors common to the learning and target datasets. Yet, MOTL was effective in uncovering differentially active latent factors, demonstrating that latent factors can be shared across different cancer types. We applied MOTL to a glioblastoma use-case as a proof-of-concept, but we envisage MOTL as being a helpful tool in the study of rare diseases in general. Therefore a future extension of our work would be to evaluate the application of MOTL to target datasets with non-cancer rare disease samples, using factors inferred from the TCGA learning dataset. We foresee the results of such an evaluation being accompanied by measures of how similar the target datasets are to the TCGA learning dataset, allowing for guidance on when MOTL is a suitable tool for the analysis of other rare disease datasets.

With MOTL, we have designed a transfer learning framework that is compatible with a prior learning dataset factorization, as carried out with the MOFA Bayesian approach [19]. The appeal of a Bayesian framework is the flexibility with regard to the incorporation of prior information, and variational inference serves as a fast alternative to sampling methods. In the future, MOTL could be extended to allow information to be incorporated at other levels of the assumed hierarchy of latent variables. For example, instead of fixing the feature weight values, they could be treated as random variables by MOTL, with priors informed by the factorization of the learning dataset.

In addition to MOFA, there are numerous methods available for multi-omics matrix factorization [16, 17, 42, 43]. A future extension of our work could be a transfer learning framework matched to different multi-omics matrix factorization methods. Similarly, we foresee value in extending an existing transfer learning method that has been designed for single omics data [20, 28], so it can be applied in the multi-omics context. The evaluation of such a method, based on the factorization of a learning dataset with a variety of matrix factorization methods, would be informative.

A limitation of MOTL is that the factorization is restricted to features that were retained for the factorization of the learning dataset. A consequence is that some features which are highly variable in the target dataset may not contribute to the MOTL factorization. Therefore, a future extension could be to add flexibility to the MOTL workflow, so that all features that are highly variable in the target dataset contribute to the factorization, even if they were not retained for the factorization of the learning dataset. Adding flexibility in this way may enhance the weight matrices that are used by MOTL. In the evaluation based on TCGA data, we observed that factors from the groundtruth weight matrices were more easily uncovered using the weights transferred from the learning dataset than using those from direct MOFA factorization. However, the transferred factors were slightly less correlated with the factors from the groundtruth weight matrices than the direct MOFA factors were. Despite this, MOTL factorizations were more effective than direct MOFA factorizations in uncovering differentially active groundtruth latent factors. We attribute this superior performance to the fact that the factors in the score matrices inferred by MOTL showed higher correlations with factors from the groundtruth score matrices than the direct MOFA factors did. We thus expect that incorporating greater flexibility in the MOTL workflow, to retain all of the features that are highly variable in the target dataset, would further enhance the weight matrices and produce even more informative factors.

Recently, deep learning, generative, and foundation models have been tested for multi-omics data integration [10, 11, 13]. However, we did not identify benchmarks comparing linear MF-based methods with deep learning methods in the context of bulk multi-omics data integration, in particular for small target datasets. It will be interesting to follow developments in this area as more multi-omics data become available.

Conclusions

We presented MOTL, an approach for multi-omics matrix factorization with transfer learning, which infers latent factor values for a multi-omics target dataset comprised of a small number of samples. MOTL factorizes the target dataset by incorporating latent factor values already inferred from the factorization of a learning dataset. We designed two protocols, based on simulated and real multi-omics datasets, for evaluating the performance of multi-omics matrix factorization with transfer learning. We implemented these protocols to evaluate MOTL, and observed that MOTL was more effective in uncovering differentially active groundtruth latent factors than direct matrix factorization without transfer learning. We observed this result when comparing MOTL to direct factorization with MOFA, as well as with moCluster and intNMF.

Finally, the application of MOTL to a glioblastoma dataset, comprised of a small number of samples, revealed an enhanced delineation of cancer status and subtype thanks to transfer learning. We thus demonstrated, in the case of a multi-omics dataset comprised of a small number of samples, that MOTL can enhance the discovery of biological processes and pathways associated with a biological condition of interest. MOTL is accessible as an open source R implementation, as are the evaluation protocols we used in this study.

Methods

Mathematical notation

 

  • We denote matrices and datasets with bold capital letters: $\mathbf{Y}$

  • If $\mathbf{Y}$ denotes a matrix, we introduce it as $\mathbf{Y} = (y_{nd}) \in \mathbb{R}^{N \times D}$, for which:
    • there are $N$ rows and $D$ columns
    • $y_{nd}$ denotes the value in the $n$th row and the $d$th column
    • $\mathbf{y}_{n:}$ denotes the $n$th row vector, and $\mathbf{y}_{:d}$ denotes the $d$th column vector
  • If $\mathbf{Y}$ denotes a dataset comprised of multiple matrices, we specify this, and denote each of the matrices as $\mathbf{Y}^{(m)} = (y_{nd}^{(m)}) \in \mathbb{R}^{N \times D_m}$

  • We denote parameters for statistical distributions as non-bold, lower case letters. If the parameter is for a random variable stored in a matrix, we add indices. For example, $\tau_{nd}^{(m)}$ is a parameter for a random variable in the $n$th row and $d$th column of the $m$th matrix of some dataset. If $\tau_d^{(m)}$ is a parameter for the same matrix, then it is used for all values in the $d$th column.

The MOFA model

Consider a multi-omics dataset $\mathbf{Y}$ consisting of omics matrices $\mathbf{Y}^{(m)}$, $m = 1, \ldots, M$. Each $\mathbf{Y}^{(m)} = (y_{nd}^{(m)}) \in \mathbb{R}^{N \times D_m}$ contains data for $N$ samples (rows) and $D_m$ features (columns), where $y_{nd}^{(m)}$ is the value for the $n$th sample and the $d$th feature from the $m$th matrix. MOFA [44] assumes the existence of latent factors, and jointly factorizes each $\mathbf{Y}^{(m)}$ into a shared matrix of sample scores $\mathbf{Z} = (z_{nk}) \in \mathbb{R}^{N \times K}$, and an omics-specific matrix of feature weights $\mathbf{W}^{(m)} = (w_{kd}^{(m)}) \in \mathbb{R}^{K \times D_m}$. The $k$th column of $\mathbf{Z}$ contains scores for the $k$th factor, and the $k$th row of $\mathbf{W}^{(m)}$ contains corresponding weights for that factor.
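As a structural illustration of this joint factorization, the shapes involved can be sketched with numpy (the dimensions below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 3                     # samples, latent factors (hypothetical sizes)
D = [50, 40]                    # features per omics matrix

Z = rng.normal(size=(N, K))                  # shared sample scores
W = [rng.normal(size=(K, d)) for d in D]     # omics-specific feature weights

# each omics matrix is modeled through the low-rank product Z @ W[m]
Y_hat = [Z @ Wm for Wm in W]
```

Each reconstructed matrix has as many rows as there are samples and as many columns as that omics layer has features.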

MOFA assumes that each observed $y_{nd}^{(m)}$ is a random variable, characterized by a probability distribution conditional on a set of latent random variables $\boldsymbol{\beta}$. It is assumed that the joint likelihood $p(\mathbf{Y} \mid \boldsymbol{\beta})$ is equal to $\prod_{m=1}^{M} \prod_{n=1}^{N} \prod_{d=1}^{D_m} p(y_{nd}^{(m)} \mid \boldsymbol{\beta})$, and the choice of probability distribution depends on the type of observed data. A Gaussian likelihood is assumed for observed continuous data:

$$p(y_{nd}^{(m)} \mid \boldsymbol{\beta}) = \mathrm{Normal}\!\left(y_{nd}^{(m)} \,\middle|\, \mathbf{z}_{n:} \mathbf{w}_{:d}^{(m)},\, 1/\tau_d^{(m)}\right) \tag{7}$$

where $\mathbf{z}_{n:}$ is the vector of scores for the $n$th sample, $\mathbf{w}_{:d}^{(m)}$ is the vector of weights for the $d$th feature from the $m$th matrix, and $\tau_d^{(m)}$ is the precision for that feature. A Bernoulli likelihood is assumed for observed binary data and the logistic link function $\pi(x) = (1 + e^{-x})^{-1}$ is used:

$$p(y_{nd}^{(m)} \mid \boldsymbol{\beta}) = \mathrm{Bernoulli}\!\left(y_{nd}^{(m)} \,\middle|\, \pi\!\left(\mathbf{z}_{n:} \mathbf{w}_{:d}^{(m)}\right)\right) \tag{8}$$

A Poisson likelihood is assumed for observed count data and the link function $\lambda(x) = \log(1 + e^x)$ is used:

$$p(y_{nd}^{(m)} \mid \boldsymbol{\beta}) = \mathrm{Poisson}\!\left(y_{nd}^{(m)} \,\middle|\, \lambda\!\left(\mathbf{z}_{n:} \mathbf{w}_{:d}^{(m)}\right)\right) \tag{9}$$
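The two link functions map the real-valued low-rank product into each likelihood's parameter space; a minimal numpy transcription (function names are ours):

```python
import numpy as np

def pi(x):
    """Logistic link for the Bernoulli likelihood: π(x) = (1 + e^{-x})^{-1},
    mapping the real line to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def lam(x):
    """Softplus link for the Poisson likelihood: λ(x) = log(1 + e^x),
    mapping the real line to a positive rate."""
    return np.log1p(np.exp(x))
```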

The assumed joint prior distribution, $p(\boldsymbol{\beta})$, is comprised of independent priors: $z_{nk} \sim \mathrm{Normal}(0, 1)$; $w_{kd}^{(m)} = \hat{w}_{kd}^{(m)} s_{kd}^{(m)}$, with $\hat{w}_{kd}^{(m)} \sim \mathrm{Normal}(0, 1/\alpha_k^{(m)})$, $\alpha_k^{(m)} \sim \mathrm{Gamma}(10^{-14}, 10^{-14})$, $s_{kd}^{(m)} \sim \mathrm{Bernoulli}(\theta_k^{(m)})$, $\theta_k^{(m)} \sim \mathrm{Beta}(1, 1)$; and $\tau_d^{(m)} \sim \mathrm{Gamma}(10^{-14}, 10^{-14})$.

MOFA uses mean-field variational inference [32, 45, 46] to approximate the joint posterior distribution, $p(\boldsymbol{\beta} \mid \mathbf{Y})$, with a joint variational distribution factorized over $J$ disjoint groups of variables:

$$q(\boldsymbol{\beta}) = \prod_{j=1}^{J} q(\beta_j) = \prod_{n=1}^{N} \prod_{k=1}^{K} q(z_{nk}) \prod_{m=1}^{M} \prod_{k=1}^{K} q(\alpha_k^{(m)})\, q(\theta_k^{(m)}) \prod_{m=1}^{M} \prod_{d=1}^{D_m} q(\tau_d^{(m)}) \prod_{m=1}^{M} \prod_{d=1}^{D_m} \prod_{k=1}^{K} q(\hat{w}_{kd}^{(m)}, s_{kd}^{(m)}) \tag{10}$$

MOFA infers $q(\boldsymbol{\beta})$ iteratively until convergence. At each iteration, each $q(\beta_j)$ is updated as

$$q(\beta_j) \propto \exp\left\{\mathbb{E}_{q_{-j}}\!\left[\log p(\boldsymbol{\beta}, \hat{\mathbf{Y}})\right]\right\} \tag{11}$$

where $\mathbb{E}_{q_{-j}}$ denotes an expectation with respect to the joint variational distribution, after removing $q(\beta_j)$. The dataset $\hat{\mathbf{Y}}$ is derived by transformation of $\mathbf{Y}$. Observed data with a Gaussian assumed likelihood are transformed with feature-wise centering, which avoids the need to estimate intercepts. Observed data with a non-Gaussian assumed likelihood are transformed to derive Gaussian pseudo-data. The derivation of Gaussian pseudo-data occurs at each iteration, and is based on a new parameter, $\zeta_{nd}^{(m)}$, which is derived for each sample $n$ and feature $d$ from matrix $m$. For observed data with a Bernoulli assumed likelihood, a precision parameter, $\tau_{nd}^{(m)}$, is introduced for each sample and feature as part of the transformation:

$$\hat{y}_{nd}^{(m)} = \frac{2 y_{nd}^{(m)} - 1}{2 \tau_{nd}^{(m)}} \tag{12}$$
$$\tau_{nd}^{(m)} = 2 \lambda\!\left(\zeta_{nd}^{(m)}\right) \tag{13}$$
$$\left(\zeta_{nd}^{(m)}\right)^2 = \mathbb{E}_q\!\left[\left(\mathbf{z}_{n:} \mathbf{w}_{:d}^{(m)}\right)^2\right] \tag{14}$$

For observed data with a Poisson assumed likelihood, a precision parameter, $\tau_d^{(m)}$, is introduced for each feature as part of the transformation:

$$\hat{y}_{nd}^{(m)} = \zeta_{nd}^{(m)} - \frac{\pi\!\left(\zeta_{nd}^{(m)}\right)\left(1 - y_{nd}^{(m)} / \lambda\!\left(\zeta_{nd}^{(m)}\right)\right)}{\tau_d^{(m)}} \tag{15}$$
$$\zeta_{nd}^{(m)} = \mathbb{E}_q\!\left[\mathbf{z}_{n:} \mathbf{w}_{:d}^{(m)}\right] \tag{16}$$
$$\tau_d^{(m)} = 0.25 + 0.17 \times \max\!\left(\mathbf{y}_{:d}^{(m)}\right) \tag{17}$$

where $\mathbf{y}_{:d}^{(m)}$ is the vector of observed values for the $d$th feature from the $m$th matrix. For both Bernoulli and Poisson observed data, the $\hat{y}_{nd}^{(m)}$ values are centered at each iteration, and the $\zeta_{nd}^{(m)}$ values are derived using the factorization fit from the preceding iteration. MOFA monitors convergence with the evidence lower bound (ELBO), which is used to evaluate how well the variational distribution approximates the posterior distribution. The ELBO is calculated as:

$$\mathrm{ELBO}(\boldsymbol{\beta}) = \mathbb{E}_q\!\left[\log p(\mathbf{Y} \mid \boldsymbol{\beta})\right] + \mathbb{E}_q\!\left[\log p(\boldsymbol{\beta})\right] - \mathbb{E}_q\!\left[\log q(\boldsymbol{\beta})\right] \tag{18}$$

For $\mathbf{Y}^{(m)}$ with a non-Gaussian assumed likelihood, MOFA uses a lower bound for each $\log p(y_{nd}^{(m)} \mid \boldsymbol{\beta})$. Maximizing this lower bound, coupled with the use of the $\hat{y}_{nd}^{(m)}$ values, allows updates of $q(\boldsymbol{\beta})$ based on the assumption of Gaussian observed data [33, 34]. MOFA assesses convergence at regular intervals, based on the percentage change in the ELBO. MOFA also allows factors to be dropped during training, based on the fraction of variance explained for each matrix: after each iteration, MOFA identifies factors whose fraction of variance explained falls below a threshold for every omics matrix, and drops one of the identified factors.
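As an illustration of the pseudo-data step, the Poisson transformation of Eqs. (15)-(17) can be sketched as follows (a minimal numpy transcription, not the mofapy2 implementation; function names are ours):

```python
import numpy as np

def pi(x):   # logistic link, π(x) = (1 + e^{-x})^{-1}
    return 1.0 / (1.0 + np.exp(-x))

def lam(x):  # softplus link, λ(x) = log(1 + e^x)
    return np.log1p(np.exp(x))

def poisson_pseudodata(Y, Zeta):
    """Gaussian pseudo-data for Poisson observations.
    Y:    observed counts (N x D)
    Zeta: E_q[z_n: w_:d] from the preceding iteration (N x D)."""
    tau = 0.25 + 0.17 * Y.max(axis=0)                        # Eq. (17), per feature
    Y_hat = Zeta - pi(Zeta) * (1.0 - Y / lam(Zeta)) / tau    # Eq. (15)
    return Y_hat, tau
```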

Multi-omics data simulated with groundtruth factors

We simulated multi-omics datasets, from groundtruth factors, with various configurations. For each simulation configuration, we generated 30 instances of a multi-omics dataset, $\mathbf{Y}$, consisting of matrices $\mathbf{Y}^{(m)}$, $m = 1, 2, 3$. We split each $\mathbf{Y}$ into a target dataset, $\mathbf{T}$, and a learning dataset, $\mathbf{L}$. Each $\mathbf{Y}^{(m)} = (y_{nd}^{(m)}) \in \mathbb{R}^{N \times D_m}$ contained data for $N = N_t + N_l$ samples (rows) and $D_m = 2000$ features (columns), where $y_{nd}^{(m)}$ is the value for the $n$th sample and the $d$th feature from the $m$th matrix. $N_t$ is the number of samples subsequently belonging to $\mathbf{T}$, and $N_l$ is the number of samples belonging to $\mathbf{L}$. We generated each $\mathbf{Y}^{(m)}$ from a different statistical distribution, conditional on a random matrix of sample scores, $\mathbf{Z} = (z_{nk}) \in \mathbb{R}^{N \times K}$, and a random matrix of feature weights, $\mathbf{W}^{(m)} = (w_{kd}^{(m)}) \in \mathbb{R}^{K \times D_m}$. The $k$th column vector of $\mathbf{Z}$ contained sample scores for the $k$th groundtruth factor. The $k$th row vector of $\mathbf{W}^{(m)}$ contained feature weights for that same factor. We varied the number of groundtruth factors across configurations, using $K \in \{20, 30\}$.

We generated $\mathbf{Z}$ based on each sample being a member of a group. In each instance we created two groups of five samples belonging to $\mathbf{T}$, meaning $N_t$ was always equal to 10 samples. We allowed $N_l$ to vary across instances, with samples belonging to $\mathbf{L}$ being in groups of randomly selected sizes. We used either 20 learning groups with sizes drawn from $\{10, 20, 30\}$, or 40 groups with sizes drawn from $\{10, 25, 40\}$. For the $n$th sample and $k$th groundtruth factor, we generated the score as $z_{nk} \sim \mathrm{Normal}(\mu_{g(n)k}, \sigma_z)$, where $\mu_{g(n)k}$ is the mean parameter for groundtruth factor $k$, for the group that sample $n$ belonged to, $g(n)$. In each instance we selected $\mu_{g(n)k}$ randomly for each group and groundtruth factor, with probabilities $\Pr(3) = 1/8$, $\Pr(5) = 3/4$, $\Pr(7) = 1/8$. The $k$th groundtruth factor was differentially active for $\mathbf{T}$ if $\mu_{g(n)k}$ differed between the two target dataset groups. For all instances of a given simulation configuration, the same standard deviation parameter, $\sigma_z$, was shared by all groups and groundtruth factors. We varied the latent noise-to-signal ratio across our simulation configurations by using $\sigma_z \in \{0.5, 1.0\}$.

For the $k$th groundtruth factor, and the $d$th feature from the $m$th matrix, we generated the weight as $w_{kd}^{(m)} = \hat{w}_{kd}^{(m)} s_{kd}^{(m)}$. As such, each $w_{kd}^{(m)}$ was the product of a continuous random variable, $\hat{w}_{kd}^{(m)} \sim \mathrm{Normal}(\mu^{(m)}, \sigma_k^{(m)})$, and a binary random variable, $s_{kd}^{(m)} \sim \mathrm{Bernoulli}(\theta_k^{(m)})$. We specified $\mu^{(m)}$, the mean parameter for the $m$th matrix, with $\mu^{(1)} = 5$, $\mu^{(2)} = 0$, and $\mu^{(3)} = 0$. We generated $\sigma_k^{(m)}$, the standard deviation parameter for the $k$th groundtruth factor and the $m$th matrix, with $\sigma_k^{(1)} \sim \mathrm{Uniform}(0.5, 1.5)$, $\sigma_k^{(2)} \sim \mathrm{Uniform}(0.5, 1.5)$, and $\sigma_k^{(3)} \sim \mathrm{Uniform}(0.1, 0.2)$. We generated the sparsity for the $k$th groundtruth factor and the $m$th matrix, $1 - \theta_k^{(m)}$, with $\theta_k^{(m)} \sim \mathrm{Uniform}(0.15, 0.25)$.

We generated the values in each $\mathbf{Y}^{(m)}$ as:

$$\begin{aligned}
y_{nd}^{(1)} &\sim \mathrm{Poisson}\!\left(\log\!\left(1 + \exp\!\left(\mathbf{z}_{n:} \mathbf{w}_{:d}^{(1)}\right)\right)\right) \\
y_{nd}^{(2)} &\sim \mathrm{Normal}\!\left(\mathbf{z}_{n:} \mathbf{w}_{:d}^{(2)},\, \sigma_d\right), \quad \sigma_d \sim \mathrm{Uniform}(0.25, 0.75) \\
y_{nd}^{(3)} &\sim \mathrm{Bernoulli}\!\left(1 \,/\, \left(1 + \exp\!\left(-\mathbf{z}_{n:} \mathbf{w}_{:d}^{(3)}\right)\right)\right)
\end{aligned} \tag{19}$$

where $\mathbf{z}_{n:}$ is the vector of scores for the $n$th sample, $\mathbf{w}_{:d}^{(m)}$ is the vector of weights for the $d$th feature from the $m$th matrix, and $\sigma_d$ is the standard deviation for the $d$th feature.
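The generative scheme above can be sketched for one instance, shown here for the Gaussian matrix ($m = 2$) only, with variable names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(42)
N, K, D = 10, 20, 2000        # samples, groundtruth factors, features
sigma_z = 0.5                 # latent noise level for this configuration

# sample scores: two target groups of five samples, group means drawn
# from {3, 5, 7} with probabilities 1/8, 3/4, 1/8
groups = np.repeat([0, 1], 5)
mu = rng.choice([3.0, 5.0, 7.0], size=(2, K), p=[1 / 8, 3 / 4, 1 / 8])
Z = rng.normal(mu[groups], sigma_z)                       # N x K

# sparse weights w_kd = w_hat_kd * s_kd, with mean 0 for this matrix
sigma_k = rng.uniform(0.5, 1.5, size=K)
theta_k = rng.uniform(0.15, 0.25, size=K)
W = rng.normal(0.0, sigma_k[:, None], size=(K, D)) * \
    rng.binomial(1, theta_k[:, None], size=(K, D))        # K x D

# Gaussian omics matrix with feature-wise noise sigma_d ~ Uniform(0.25, 0.75)
sigma_d = rng.uniform(0.25, 0.75, size=D)
Y2 = rng.normal(Z @ W, sigma_d)                           # N x D
```

The Poisson and Bernoulli matrices would be generated analogously, applying the softplus and logistic links to `Z @ W` before sampling.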

We split each $\mathbf{Y}^{(m)}$ into $\mathbf{T}^{(m)} = (t_{nd}^{(m)}) \in \mathbb{R}^{N_t \times D_m}$, which contained values for the target group samples, and $\mathbf{L}^{(m)} = (l_{nd}^{(m)}) \in \mathbb{R}^{N_l \times D_m}$, which contained values for the learning group samples.

Before direct factorization with MOFA, we pre-processed simulated T and L datasets by removing features with zero variance across samples. Before factorization with transfer learning with MOTL, we pre-processed simulated T datasets by removing features that had zero variance across samples, or that had been removed from the corresponding L datasets.

TCGA multi-omics data acquisition and pre-processing

We used the R packages TCGAbiolinks (v.2.25.3) and SummarizedExperiment (v.1.28.0) to download and save TCGA mRNA expression, miRNA expression, DNA methylation, and simple nucleotide variation (SNV) data [4749]. The mRNA and miRNA expression data consisted of raw counts. The DNA methylation data consisted of CpG site β-values, which had been derived from HM450 array intensities with R package SeSAMe (v.1.16.0) [50]. The SNV data consisted of masked somatic mutation files.

We created four reference datasets, using data from three cancer types: acute myeloid leukemia (LAML), pancreatic adenocarcinoma (PAAD), and skin cutaneous melanoma (SKCM). Each reference dataset, R, contained multi-omics data for all samples from either two or all three of the cancer types. We did not include SNV data in R datasets containing LAML samples, due to the sparsity of SNV data for LAML. We only used samples that had data for all omics of interest, and only included one sample per study participant. We thus had multi-omics data for 134 LAML samples, 157 PAAD samples, and 435 SKCM samples. We then randomly split each R into non-overlapping target datasets. Each resulting target dataset, T, contained multi-omics data for five samples per cancer type.

For the evaluation protocol based on TCGA multi-omics data, we merged data from the remaining 29 cancer types into a learning dataset, L. For this L, we only used samples that had data for all four omics, and only included one sample per study participant. This L contained multi-omics data (mRNA, miRNA, DNA methylation, and SNV) for 7217 samples.

For the application of MOTL to the pd-GBSC target datasets, we created a new learning dataset by merging data from all 32 cancer types. This new learning dataset contained multi-omics data (mRNA, miRNA, DNA methylation, and SNV) for 7866 samples.

Before direct factorization with MOFA, we pre-processed R, T, and L datasets in the same way. For mRNA data, we removed genes that map to the Y chromosome. For both mRNA and miRNA, we removed genes if they had a count of zero in at least 90% of samples, or had zero variance across samples. We normalized mRNA and miRNA counts with the DESeq2 (v.1.38.0) R package [51], and log2(x+1) transformed the normalized counts. For DNA methylation data, we removed CpG sites that map to the X or Y chromosome, were masked during SeSAMe quality control, had missing values in at least 20% of samples, or had zero variance across samples. We converted DNA methylation β-values to M-values [52]. We included SNV records whose variant classification was either Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins, Missense_Mutation, Nonsense_Mutation, Nonstop_Mutation, Splice_Site, or Translation_Start_Site. We then created binary SNV matrices aggregated by gene and sample. We removed genes from SNV matrices if the mutation rate across samples was below 1%. We filtered all omics to include only the 5000 most variable features. We did not perform any batch effect correction on L datasets, in order to preserve biological signal [53]. We checked each R for batch effects with visualizations of UMAP coordinates [54]. We used the R package uwot (v.0.1.14) to derive UMAP coordinates from MOFA factorizations, and we did not observe the need to correct R datasets for batch effects.

Before factorization with transfer learning with MOTL, we pre-processed T datasets by removing all omics features that had zero variance across samples, or that had been removed from L during pre-processing. We used DESeq2 to normalize mRNA and miRNA counts with the geometric means from L, and then log2(x+1) transformed the normalized counts. We converted DNA methylation β-values to M-values, and converted SNV data to binary matrices after filtering on variant classification, as described previously.
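The two target-side transformations can be sketched as follows (a simplified numpy illustration: the β-value clipping and the handling of the geometric means are our assumptions, and DESeq2 itself excludes genes with a zero geometric mean when computing size factors):

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """DNA methylation β-values to M-values: M = log2(β / (1 − β)).
    β is clipped away from 0 and 1 for numerical stability (an assumption)."""
    b = np.clip(beta, eps, 1.0 - eps)
    return np.log2(b / (1.0 - b))

def normalize_with_reference(counts, ref_geo_means):
    """Median-of-ratios normalization (DESeq2-style), with the geometric
    means taken from the learning dataset so that T is scaled consistently
    with L. Assumes ref_geo_means > 0 for all genes."""
    size_factors = np.median(counts / ref_geo_means, axis=1)  # one per sample
    norm = counts / size_factors[:, None]
    return np.log2(norm + 1.0)                                # log2(x + 1)
```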

Glioblastoma target dataset acquisition and pre-processing

We created four pd-GBSC target datasets, based on multi-omics profiling conducted by Santamarina-Ojeda et al. [36] for four normal brain samples and nine patient-derived glioblastoma stem cell (pd-GBSC) cultures. The nine cancer samples had been previously classified into three subtypes thanks to transcriptome-based signatures: classical (CL), proneural (PN), and mesenchymal (MS). Each pd-GBSC target dataset contained mRNA expression and DNA methylation data for the four normal brain cortex samples, as well as either all nine cancer samples or just the samples from a subtype.

For factorization with transfer learning with MOTL, the pd-GBSC target datasets initially consisted of mRNA expression raw counts and DNA methylation β-values. Before factorization with MOTL, we pre-processed a pd-GBSC target dataset by removing all omics features that had zero variance across samples, or that had been removed from L during pre-processing. We used DESeq2 to normalize mRNA counts with the geometric means from L, and then log2(x+1) transformed the normalized counts. We converted DNA methylation β-values to M-values.

For direct MOFA factorization, without transfer learning, the pd-GBSC target datasets initially consisted of the same mRNA expression data, but already normalized and transformed by Santamarina-Ojeda et al. [36], and DNA methylation β-values. Before direct MOFA factorization, we pre-processed mRNA data by removing genes that map to the Y chromosome, had a count of zero in at least 90% of samples, or had zero variance across samples. We pre-processed DNA methylation data by removing CpG sites that had missing values in more than 20% of samples, or had zero variance across samples. We converted DNA methylation β-values to M-values. We filtered both omics to include only the 5000 most variable features.

Application of MOFA to simulated, TCGA, and glioblastoma multi-omics datasets

We factorized simulated target, T, and learning, L, datasets with the MOFA Python implementation mofapy2 (v.0.6.4). The number of factors we used for each MOFA factorization was equal to the lesser of the number of samples and the number of groundtruth factors that were differentially active when simulating the dataset. The kth groundtruth factor was differentially active for a dataset if the mean parameter, μg(n)k, for the sample scores for that factor, was not the same for all groups of samples in the dataset. We specified observed data likelihoods corresponding to those used for simulating Y(m) matrices. We set the maximum number of iterations to 10,000 to ensure convergence. For the remaining settings, we used the mofapy2 defaults, meaning that all datasets were feature-wise centered during factorization fitting.

We factorized pre-processed TCGA reference, R, target, T, and learning, L, datasets with the MOFA Python implementation mofapy2 (v.0.7.0). We specified Gaussian as the observed data likelihood for mRNA, miRNA, and DNA methylation data, and specified Bernoulli as the likelihood for SNV data. For the L datasets, we started the factorization with 100 factors and allowed factors to be dropped based on the fraction of variance explained, for which we set the threshold to 0.001. We set the threshold so low in order to retain factors that explained little of the variance in L, yet could be potentially relevant for transfer learning. For R datasets, we also started with 100 factors. For T datasets, we started with the maximum number of factors allowed by MOFA, which was either 10 factors (two cancer types) or 15 factors (three cancer types). For R and T datasets, we dropped factors based on a threshold of 0.01, in order to only retain relevant factors. For all TCGA datasets, we set the maximum number of iterations to 10,000, to ensure convergence, and the frequency of convergence checking to five, to ensure that the algorithm had stopped dropping factors before converging.

When saving the factorizations of simulated and TCGA L datasets, we set the expectations argument to all. We did this to ensure that the point estimate for each precision parameter was saved in addition to those that are saved by default.

We factorized the pre-processed pd-GBSC target datasets with the MOFA Python implementation mofapy2 (v.0.7.0). We specified Gaussian as the observed data likelihood for the mRNA and the DNA methylation data. We started with the maximum allowable number of factors and dropped factors based on a threshold of 0.01.

Application of MOTL to simulated, TCGA, and glioblastoma multi-omics datasets

We applied MOTL to simulated, TCGA, and pd-GBSC multi-omics target datasets. For each target dataset, we used point estimates of feature weight and precision values saved from the MOFA factorization of the corresponding learning, L, dataset. For observed data with a Gaussian or Poisson assumed likelihood, the transferred value of the precision for each feature, $\tau_d^{(m)}$, was held fixed throughout iterations of MOTL updates. For observed data with a Bernoulli assumed likelihood, we initialized the value of the precision for each sample and feature, $\tau_{nd}^{(m)}$, with a feature-wise average, $\bar{\tau}_d^{(m)}$, of the $\tau_{nd}^{(m)}$ values from the factorization of L. The precisions for Bernoulli observed data were then iteratively updated by MOTL. We estimated intercepts using likelihoods assumed for L, combined with outputs from the MOFA factorization of L. For Gaussian observed data we calculated the intercept for the dth feature, from the mth matrix, as $a_d^{(m)} = \frac{1}{N_l}\sum_{n=1}^{N_l} l_{nd}^{(m)}$, where $l_{nd}^{(m)}$ denotes an uncentered learning dataset value after pre-processing. For Poisson and Bernoulli observed data we obtained maximum likelihood estimates of $a_d^{(m)}$ values, for which we used the mle function from the R package stats4 (v.4.2.0). For Poisson observed data we initialized each estimate with $a_d^{(m)} = \log\left(-1 + \exp\left(\frac{1}{N_l}\sum_{n=1}^{N_l} l_{nd}^{(m)}\right)\right)$ and for Bernoulli observed data we initialized it with $a_d^{(m)} = \log\left(\frac{1}{N_l}\sum_{n=1}^{N_l} l_{nd}^{(m)} \left(1 - \frac{1}{N_l}\sum_{n=1}^{N_l} l_{nd}^{(m)}\right)^{-1}\right)$.
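The intercept estimates above have simple closed forms for the Gaussian case and for the Poisson and Bernoulli starting values. A minimal Python sketch (illustrative only; the subsequent maximum likelihood refinement with stats4's mle in R is not reproduced here):

```python
import numpy as np

def intercept_gaussian(l_col):
    # Closed-form intercept: feature-wise mean of uncentered L values.
    return float(l_col.mean())

def intercept_poisson_init(l_col):
    # Starting value log(exp(mean) - 1), i.e. the inverse softplus
    # of the feature-wise mean.
    return float(np.log(np.expm1(l_col.mean())))

def intercept_bernoulli_init(l_col):
    # Starting value logit(mean) for binary observed data.
    p = l_col.mean()
    return float(np.log(p / (1.0 - p)))
```

For a binary feature observed in three of four samples, for example, the Bernoulli starting value is logit(0.75) = log(3).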

To evaluate robustness, we also applied MOTL to each simulated target dataset after permuting the values in feature vectors of the weight matrices inferred from L. For each simulation instance, we permuted the following proportions of feature weight vectors: {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
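This perturbation can be sketched as follows, under the assumption that permuting a feature weight vector means shuffling its values across factors; this is our reading of the procedure, not the authors' exact code.

```python
import numpy as np

def permute_weight_features(W, proportion, rng=None):
    # W: features x factors weight matrix inferred from L.
    # Shuffle the values within a randomly selected proportion of
    # the feature (row) vectors; the remaining rows are untouched.
    rng = np.random.default_rng(rng)
    n_features = W.shape[0]
    n_permute = int(round(proportion * n_features))
    rows = rng.choice(n_features, size=n_permute, replace=False)
    W_perm = W.copy()
    for r in rows:
        W_perm[r] = rng.permutation(W_perm[r])
    return W_perm
```

With proportion 0.0 the matrix is returned unchanged; with proportion 1.0 every row keeps the same multiset of values in shuffled order.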

When checking the ELBO for convergence, we used 0.0005% as the threshold, which is the default for MOFA. The algorithm was stopped when the absolute change in ELBO was under this threshold for two consecutive checks, and we set the maximum number of iterations to 10,000 to be consistent with our application of MOFA. For TCGA and pd-GBSC target datasets, and for simulated target datasets factorized after permuting feature weight values, we allowed factors to be dropped based on a threshold of 0.01 for the fraction of variance explained. We checked the ELBO after every five iterations, to ensure that the algorithm had stopped dropping factors before converging.
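The stopping rule can be sketched as below. Normalizing the ELBO change by the first recorded ELBO value is our simplification for illustration; mofapy2's internal normalization of the ELBO change may differ.

```python
def elbo_converged(elbo_history, threshold_pct=0.0005):
    # Converged when the percentage change in ELBO is below the
    # threshold (0.0005%, the MOFA default) for two consecutive checks.
    if len(elbo_history) < 3:
        return False
    changes = [
        abs((elbo_history[i] - elbo_history[i - 1]) / elbo_history[0]) * 100
        for i in (-2, -1)
    ]
    return all(c < threshold_pct for c in changes)
```

In practice this check would be evaluated every five iterations, matching the convergence-checking frequency described above.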

Evaluation methods

Groundtruth factors: For each simulated T (the Methods “Multi-omics data simulated with groundtruth factors” section), the groundtruth factor values were contained in the corresponding simulated Z and W(m) matrices. The sample scores for the kth groundtruth factor were contained in z:k, the kth column vector of simulated Z. The feature weights for the mth matrix, for that same groundtruth factor, were contained in wk:(m), the kth row vector of simulated W(m). For each TCGA T (the Methods “TCGA multi-omics data acquisition and pre-processing” section), the groundtruth factors were based on the R dataset which we had split to create T. We factorized each R with MOFA, and treated the inferred z:k and wk:(m) vectors as groundtruth factors for each T that had been created by splitting R.

Differentially active groundtruth factors: For each simulated and TCGA T, groundtruth factor k was differentially active if the group means for groundtruth z:k differed between the target dataset groups. For each simulated T, this was the group mean, μg(n)k, used to simulate groundtruth z:k. For each TCGA T, the factorization of corresponding R provided groundtruth z:k and wk:(m) factor vectors. We performed either the Wilcoxon rank sum test (two cancer types), or the Kruskal-Wallis test (three cancer types), on each groundtruth z:k to determine if there was a statistically significant difference between the cancer types. We classed a groundtruth factor as differentially active if its BH-adjusted p-value was below 0.05.

Post-processing: We post-processed inferred and groundtruth W(m) matrices before evaluation. We scaled each feature vector, w:d(m), by its Frobenius norm. We then centered each factor vector, wk:(m), of scaled values separately for each m. We then concatenated wk:(m) vectors to produce a single vector, wk:, of centered and scaled feature weights for each factor k.
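The post-processing steps (per-feature scaling, per-omics centering of factor vectors, then concatenation across omics) can be sketched as below. We assume each weight matrix is stored as factors x features; for a single vector the Frobenius norm coincides with the Euclidean norm.

```python
import numpy as np

def postprocess_weights(W_list):
    # W_list: one weight matrix per omics, each factors x features.
    processed = []
    for W in W_list:
        # Scale each feature (column) vector by its Euclidean norm.
        norms = np.linalg.norm(W, axis=0)
        norms[norms == 0] = 1.0
        Ws = W / norms
        # Center each factor (row) vector separately within this omics.
        Ws = Ws - Ws.mean(axis=1, keepdims=True)
        processed.append(Ws)
    # Concatenate across omics: one combined weight vector per factor.
    return np.concatenate(processed, axis=1)
```

Each resulting factor vector sums to zero within every omics block by construction.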

Best hits: For each factorization of each simulated and TCGA T, we identified the best hits between the factor vectors inferred with the factorization of T, and the groundtruth factor vectors. For two sets of vectors $\{v_1, \ldots, v_{K_v}\}$ and $\{x_1, \ldots, x_{K_x}\}$, we define the best hit for vector $v_{k_v}$ as

$$\mathrm{BestHit}(v_{k_v}) = \operatorname*{argmax}_{x_{k_x}} \mathrm{cor}(v_{k_v}, x_{k_x}) \qquad (20)$$

where $\mathrm{cor}(v, x)$ is the Pearson correlation coefficient between vectors $v$ and $x$. We define the best hit for vector $x_{k_x}$ as

$$\mathrm{BestHit}(x_{k_x}) = \operatorname*{argmax}_{v_{k_v}} \mathrm{cor}(x_{k_x}, v_{k_v}) \qquad (21)$$

For each simulated T, we identified best hits between inferred and groundtruth wk: vectors. For each TCGA T, we identified best hits between inferred and groundtruth wk: vectors, as well as between inferred and groundtruth z:k vectors. We used shared features when calculating correlations for wk: vectors, and we used shared samples for z:k vectors. We calculated p-values for the correlations, and only considered correlations with a p-value <0.05 (two-sided alternative hypothesis) when identifying best hits.
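Best-hit identification amounts to an argmax over Pearson correlations, as in Eq. 20. A minimal sketch (the p-value filtering of correlations and the restriction to shared features or samples are omitted here):

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation between two equal-length vectors.
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_hits(V, X):
    # V: K_v x D inferred factor vectors; X: K_x x D groundtruth vectors.
    # Best hit for each inferred vector = index of the groundtruth
    # vector with the highest Pearson correlation.
    return [int(np.argmax([pearson(v, x) for x in X])) for v in V]
```

Swapping the roles of V and X gives the best hits for groundtruth vectors (Eq. 21).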

F-measure values: For each factorization of each TCGA T, we calculated an F-measure value to assess the overall correlation between factor vectors inferred with the factorization of T, and groundtruth factor vectors. We based this on the F-measure presented by Saelens et al. [55], which we adapted in order to assess correlations. For a given set of inferred factor vectors, $\{v_1, \ldots, v_{K_v}\}$, and a set of groundtruth factor vectors, $\{x_1, \ldots, x_{K_x}\}$, we calculated the F-measure as

$$FM = \frac{2}{\frac{1}{\mathrm{Relevance}} + \frac{1}{\mathrm{Recovery}}} \qquad (22)$$

where

$$\mathrm{Relevance} = \frac{1}{K_v} \sum_{k_v=1}^{K_v} \mathrm{cor}(v_{k_v}, \mathrm{BestHit}(v_{k_v})) \qquad (23)$$

and

$$\mathrm{Recovery} = \frac{1}{K_x} \sum_{k_x=1}^{K_x} \mathrm{cor}(x_{k_x}, \mathrm{BestHit}(x_{k_x})) \qquad (24)$$

Here, $\mathrm{cor}(v, x)$ is the Pearson correlation coefficient between vectors $v$ and $x$. We calculated F-measure values for sets of inferred and groundtruth $w_{k:}$ vectors, as well as for $z_{:k}$ vectors.
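Equations 22–24 combine into a short computation: Relevance averages each inferred vector's best-hit correlation, Recovery does the same for groundtruth vectors, and FM is their harmonic mean. An illustrative sketch (significance filtering of correlations again omitted):

```python
import numpy as np

def f_measure(V, X):
    # V: inferred factor vectors (K_v x D); X: groundtruth (K_x x D).
    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Relevance: average best-hit correlation over inferred vectors.
    relevance = np.mean([max(corr(v, x) for x in X) for v in V])
    # Recovery: average best-hit correlation over groundtruth vectors.
    recovery = np.mean([max(corr(x, v) for v in V) for x in X])
    # Harmonic mean of the two (Eq. 22).
    return 2.0 / (1.0 / relevance + 1.0 / recovery)
```

When the inferred and groundtruth sets coincide, every best-hit correlation is 1 and the F-measure is 1.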

F1 scores: We calculated F1 scores to evaluate the factorizations of simulated and TCGA T datasets:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Precision} = \frac{\mathrm{TruePositives}}{\mathrm{PredictedPositives}}, \qquad \mathrm{Recall} = \frac{\mathrm{TruePositives}}{\mathrm{ActualPositives}} \qquad (25)$$

ActualPositives were the groundtruth factors of T that were differentially active.

PredictedPositives were the groundtruth factors that were predicted as being differentially active, based on the factorization of T. We first performed either the Wilcoxon rank sum test (two groups), or the Kruskal-Wallis test (three groups), on each z:k vector inferred with the factorization of T, and classed factors with a p-value <0.05 as differentially active. For inferred factors classed as differentially active, we identified the best hits for their corresponding inferred wk: vectors. We selected these best hits from the set of groundtruth wk: vectors for T. If groundtruth wk: was selected as a best hit for a differentially active inferred factor, then groundtruth factor k was predicted as being differentially active.

TruePositives were the differentially active groundtruth factors that were predicted as being differentially active, based on the factorization of T.
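Given the sets defined above, Eq. 25 reduces to a few lines. The sketch below represents factors by their indices; treating the positives as index sets is our illustrative choice.

```python
def f1_score(actual_positives, predicted_positives):
    # actual_positives / predicted_positives: sets of groundtruth
    # factor indices; true positives are their intersection (Eq. 25).
    true_positives = actual_positives & predicted_positives
    if not actual_positives or not predicted_positives or not true_positives:
        return 0.0
    precision = len(true_positives) / len(predicted_positives)
    recall = len(true_positives) / len(actual_positives)
    return 2 * precision * recall / (precision + recall)
```

For example, with actual positives {0, 1, 2} and predicted positives {1, 2, 3}, precision and recall are both 2/3, giving an F1 score of 2/3.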

Statistical testing of differences between factorization methods: We calculated the differences in evaluation measures between factorization methods, and tested the statistical significance of these differences. To do this, we fit generalized least squares regressions with the R package nlme (v.3.1.157) [56]. We fit a single regression to model the F1 scores for simulated data. For TCGA data, we fit a separate regression for each evaluation measure. For each regression, we modeled yi=β0+diβd+fiβf+ϵi. The vector di=di1,...,diT indicates the simulation configuration, or cancer type, that yi relates to, and vector fi=fi1,...,fiM indicates the factorization method. Vectors βd=βd1,...,βdT and βf=βf1,...,βfM are estimated fixed effects and ϵi is the residual. We incorporated correlations between residuals from the same target dataset using the compound symmetry structure method. We calculated contrasts for the factorization method effects in βf using the R package emmeans (v.1.8.7) [57], and used Tukey-adjusted p-values for assessing statistical significance.

Differentially active factors from glioblastoma target datasets: We identified differentially active factors from the MOTL factorization of each pd-GBSC target dataset, as well as from the direct MOFA factorization (without transfer learning) of each pd-GBSC target dataset. We performed the Wilcoxon rank sum test on each z:k vector inferred with the factorization of a pd-GBSC target dataset. We classed factors with a BH-adjusted p-value <0.05 as differentially active between the normal samples and the cancer samples.

Gene set enrichment analysis: We used the R package fgsea (v.1.24.0) [58] to perform gene set enrichment analysis on differentially active groundtruth factors that were true positives for factorizations of TCGA T datasets. For each differentially active groundtruth TCGA factor k, we analyzed vector wk:(m) if the fraction of mRNA variance explained by k was >0.01, and where m corresponded to the mRNA matrix. We tested KEGG, Reactome, GO:BP, and GO:CC gene sets with sizes between 15 and 500 genes, obtained using the R package msigdbr (v.7.5.1). We used a BH-adjusted p-value cutoff of 0.01 for selecting enriched gene sets. We also performed gene set enrichment analysis on differentially active factors from the pd-GBSC target datasets, and used the same criteria as outlined above for differentially active groundtruth TCGA factors. We filtered, clustered, and plotted enrichment analysis results with the Python package orsum (v.1.8.0) [37]. We ran orsum with maxRepSize = 2000; maxTermSize = 3000; minTermSize = 15; numberOfTermsToPlot = 30.

Processing time

We used a Dell computer with 20 cores at 3 GHz and 64 GB of RAM to perform factorizations. Pre-processing and factorizing the L used in the TCGA evaluation protocol took 26,405 seconds (over seven hours). Hence, we have made the factorization of a large TCGA L dataset publicly available for transfer learning. It took an average of 37 seconds to pre-process a T dataset, comprised of four omics, and factorize it directly with MOFA. The average time increased to 134 seconds for MOTL.

Supplementary Information

13059_2025_3675_MOESM1_ESM.pdf (958.8KB, pdf)

Additional file 1. Supplementary methods and supplementary Fig. S1-S5.

13059_2025_3675_MOESM2_ESM.csv (1,015.8KB, csv)

Additional file 2. Table S1: gene sets associated with differentially active groundtruth TCGA factors.

13059_2025_3675_MOESM3_ESM.csv (3.6KB, csv)

Additional file 3. Table S2: the percentage of variance explained, by groundtruth TCGA factors, for each omics.

13059_2025_3675_MOESM4_ESM.csv (332.4KB, csv)

Additional file 4. Table S3: gene sets associated with factors differentially active between all pd-GBSC samples and normal samples.

13059_2025_3675_MOESM5_ESM.csv (1.3MB, csv)

Additional file 5. Table S4: gene sets associated with factors differentially active between all pd-GBSC samples and normal samples, as well as those associated with factors differentially active between pd-GBSC subtype samples and normal samples.

Acknowledgements

We would like to thank Carl Herrmann, Céline Chevalier, Kim-Anh Lê Cao, Lionel Spinelli, Olivia Angelin-Bonnet, and two anonymous reviewers for helpful feedback and discussions.

For the purpose of open access, the authors have applied a CC BY-NC-ND public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission.

Peer review information

Andrew Cosgrove was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.

Authors’ contributions

A.B., L.C., P.V., M.V., and D.P.H. conceived the project. D.P.H. conceived the model and evaluation protocols with input from all authors. D.P.H. and M.T. contributed to the code and implemented the model. D.P.H. generated the figures and results. D.P.H., A.B., and M.V. wrote the manuscript with feedback from all authors. A.B. and M.V. supervised the project. All authors reviewed and approved the final manuscript.

Funding

This research was supported by l’Agence Nationale de la Recherche (ANR), project ANR-21-CE45-0001-01, by France 2030 PEPR Digital Health managed by ANR, project ANR-22-PESN-0013, by the Royal Society of New Zealand, Catalyst project CSG-MAU1902 and by the Association Française contre les Myopathies (AFM).

Data availability

An open source (GPL-3.0 license) R implementation of MOTL is available at [35] (https://github.com/david-hirst/MOTL), along with the code for the evaluation protocols. The version of the code that can be used to reproduce the analyses in this study has been archived on Zenodo [59] (https://doi.org/10.5281/zenodo.15486697). We obtained the TCGA [48] data used in this study from the GDC data portal [60] (https://portal.gdc.cancer.gov/) with the TCGAbiolinks R package [47]. We assembled the glioblastoma target dataset with data archived on Zenodo [61] (https://doi.org/10.5281/zenodo.7380252). The factorization fit of the full TCGA learning dataset, used for the application of MOTL to the glioblastoma target dataset, is available at Zenodo [62] (https://doi.org/10.5281/zenodo.10848217).

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

David P. Hirst, Email: david.hirst@univ-amu.fr

Anaïs Baudot, Email: anais.baudot@univ-amu.fr.

References

  • 1.Conesa A, Beck S. Making multi-omics data accessible to researchers. Sci Data. 2019;6(1):251. 10.1038/s41597-019-0258-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Dermitzakis ET. From gene expression to disease risk. Nat Genet. 2008;40(5):492–3. 10.1038/ng0508-492. [DOI] [PubMed] [Google Scholar]
  • 3.Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, et al. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinforma. 2018;19(2):286–302. 10.1093/bib/bbw114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Subramanian I, Verma S, Kumar S, Jere A, Anamika K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinforma Biol Insights. 2020;14. 10.1177/1177932219899051. [DOI] [PMC free article] [PubMed]
  • 5.Huang S, Chaudhary K, Garmire LX. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front Genet. 2017;8:84. 10.3389/fgene.2017.00084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 2018;46(20):10546–62. 10.1093/nar/gky889. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Pierre-Jean M, Deleuze JF, Le Floch E, Mauger F. Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration. Brief Bioinforma. 2020;21(6):2011–30. 10.1093/bib/bbz138. [DOI] [PubMed] [Google Scholar]
  • 8.Huang Y, Du C, Xue Z, Chen X, Zhao H, Huang L. What Makes Multi-Modal Learning Better than Single (Provably). In: Advances in Neural Information Processing Systems. vol. 34. Curran Associates, Inc.; 2021. pp. 10944–56. https://papers.nips.cc/paper_files/paper/2021/file/5aa3405a3f865c10f420a4a7b55cbff3-Paper.pdf.
  • 9.Cantini L, Zakeri P, Hernandez C, Naldi A, Thieffry D, Remy E, et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun. 2021;12(1):124. 10.1038/s41467-020-20430-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Ballard J, Wang Z, Li W, Shen L, Long Q. Deep learning-based approaches for multi-omics data integration and analysis. BioData Min. 2024;17(1):38. 10.1186/s13040-024-00391-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Baião AR, Cai Z, Poulos RC, Robinson PJ, Reddel RR, Zhong Q, et al. A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches. arXiv. 2025. 10.48550/arXiv.2501.17729.
  • 12.Chauvel C, Novoloaca A, Veyre P, Reynier F, Becker J. Evaluation of integrative clustering methods for the analysis of multi-omics data. Brief Bioinforma. 2020;21(2):541–52. 10.1093/bib/bbz015. [DOI] [PubMed] [Google Scholar]
  • 13.Wen Y, Zheng L, Leng D, Dai C, Lu J, Zhang Z, et al. Deep Learning-Based Multiomics Data Integration Methods for Biomedical Application. Adv Intell Syst. 2023;5(5):2200247. 10.1002/aisy.202200247. [Google Scholar]
  • 14.Stein-O’Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, et al. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet. 2018;34(10):790–805. 10.1016/j.tig.2018.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Tini G, Marchetti L, Priami C, Scott-Boyer MP. Multi-omics integration–a comparison of unsupervised clustering methodologies. Brief Bioinforma. 2019;20(4):1269–79. 10.1093/bib/bbx167. [DOI] [PubMed] [Google Scholar]
  • 16.Chalise P, Fridley BL. Integrative clustering of multi-level ‘omic data based on non-negative matrix factorization algorithm. PLoS ONE. 2017;12(5):e0176278. 10.1371/journal.pone.0176278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lock EF, Hoadley KA, Marron JS, Nobel AB. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann Appl Stat. 2013;7(1):523–42. 10.1214/12-AOAS597. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Meng C, Helm D, Frejno M, Kuster B. moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets. J Proteome Res. 2016;15(3):755–65. 10.1021/acs.jproteome.5b00824. [DOI] [PubMed] [Google Scholar]
  • 19.Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, et al. Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol. 2018;14(6):e8124. 10.15252/msb.20178124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Taroni JN, Grayson PC, Hu Q, Eddy S, Kretzler M, Merkel PA, et al. MultiPLIER: A Transfer Learning Framework for Transcriptomics Reveals Systemic Features of Rare Disease. Cell Syst. 2019;8(5):380-394.e4. 10.1016/j.cels.2019.04.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Banerjee J, Taroni JN, Allaway RJ, Prasad DV, Guinney J, Greene C. Machine learning in rare disease. Nat Methods. 2023;1–12. 10.1038/s41592-023-01886-z. [DOI] [PubMed]
  • 22.Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big Data. 2016;3(1):9. 10.1186/s40537-016-0043-6. [Google Scholar]
  • 23.Stein-O’Brien GL, Clark BS, Sherman T, Zibetti C, Hu Q, Sealfon R, et al. Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species. Cell Syst. 2019;8(5):395-411.e8. 10.1016/j.cels.2019.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Veeramachaneni SD, Pujari AK, Padmanabhan V, Kumar V. A Maximum Margin Matrix Factorization based Transfer Learning Approach for Cross-Domain Recommendation. Appl Soft Comput. 2019;85:105751. 10.1016/j.asoc.2019.105751. [Google Scholar]
  • 25.Dong A, Li Z, Zheng Q. Transferred Subspace Learning Based on Non-negative Matrix Factorization for EEG Signal Classification. Front Neurosci. 2021;15. 10.3389/fnins.2021.647393. [DOI] [PMC free article] [PubMed]
  • 26.Peng M, Li Y, Wamsley B, Wei Y, Roeder K. Integration and transfer learning of single-cell transcriptomes via cFIT. Proc Natl Acad Sci. 2021;118(10):e2024383118. 10.1073/pnas.2024383118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Fertig EJ, Ding J, Favorov AV, Parmigiani G, Ochs MF. CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data. Bioinformatics. 2010;26(21):2792–3. 10.1093/bioinformatics/btq503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Sharma G, Colantuoni C, Goff LA, Fertig EJ, Stein-O’Brien G. projectR: an R/Bioconductor package for transfer learning via PCA, NMF, correlation and clustering. Bioinformatics. 2020;36(11):3592–3. 10.1093/bioinformatics/btaa183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Davis-Marcisak EF, Fitzgerald AA, Kessler MD, Danilova L, Jaffee EM, Zaidi N, et al. Transfer learning between preclinical models and human tumors identifies a conserved NK cell activation signature in anti-CTLA-4 responsive tumors. Genome Med. 2021;13(1). 10.1186/s13073-021-00944-5. [DOI] [PMC free article] [PubMed]
  • 30.Mao W, Zaslavsky E, Hartmann BM, Sealfon SC, Chikina M. Pathway-level information extractor (PLIER) for gene expression data. Nat Methods. 2019;16(7):607–10. 10.1038/s41592-019-0456-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, et al. Reproducible RNA-seq analysis using recount2. Nat Biotechnol. 2017;35(4):319–21. 10.1038/nbt.3838. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Blei DM, Kucukelbir A, McAuliffe JD. Variational Inference: A Review for Statisticians. arXiv. 2017. 10.48550/arXiv.1601.00670.
  • 33.Jaakkola TS, Jordan MI. Bayesian parameter estimation via variational methods. Stat Comput. 2000;10(1):25–37. 10.1023/A:1008932416310. [Google Scholar]
  • 34.Seeger M, Bouchard G. Fast Variational Bayesian Inference for Non-Conjugate Matrix Factorization Models. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics. PMLR; 2012. pp. 1012–8. https://proceedings.mlr.press/v22/seeger12.html.
  • 35.Hirst DP, Térézol M, Cantini L, Villoutreix P, Vignes M, Baudot A. MOTL: enhancing multi-omics matrix factorization with transfer learning. Github. 2025. https://github.com/david-hirst/MOTL.
  • 36.Ozisik O, Térézol M, Baudot A. orsum: a Python package for filtering and comparing enrichment analyses using a simple principle. BMC Bioinformatics. 2022;23(1):293. 10.1186/s12859-022-04828-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Bunne C, Schiebinger G, Krause A, Regev A, Cuturi M. Optimal transport for single-cell and spatial omics. Nat Rev Methods Prim. 2024;4(1):58. 10.1038/s43586-024-00334-2. [Google Scholar]
  • 38.Pan SJ, Tsang IW, Kwok JT, Yang Q. Domain Adaptation via Transfer Component Analysis. IEEE Trans Neural Netw. 2011;22(2):199–210. 10.1109/TNN.2010.2091281. [DOI] [PubMed] [Google Scholar]
  • 39.Gargano MA, Matentzoglu N, Coleman B, Addo-Lartey EB, Anagnostopoulos A, Anderton J, et al. The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Res. 2024;52(D1):D1333–46. 10.1093/nar/gkad1005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V, et al. Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(D1):D940–6. 10.1093/nar/gkr972. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Rohart F, Gautier B, Singh A, Cao KAL. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol. 2017;13(11):e1005752. 10.1371/journal.pcbi.1005752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Mo Q, Shen R, Guo C, Vannucci M, Chan KS, Hilsenbeck SG. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics (Oxford, England). 2018;19(1):71–86. 10.1093/biostatistics/kxx017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 2020;21(1):111. 10.1186/s13059-020-02015-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Grimmer J. An Introduction to Bayesian Inference via Variational Approximations. Political Anal. 2011;19(1):32–47. [Google Scholar]
  • 45.Fox CW, Roberts SJ. A tutorial on variational Bayesian inference. Artif Intell Rev. 2012;38(2):85–95. 10.1007/s10462-011-9236-8. [Google Scholar]
  • 46.Silva TC, Colaprico A, Olsen C, D’Angelo F, Bontempi G, Ceccarelli M, et al. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages. F1000Research. 2016;5:1542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Hutter C, Zenklusen JC. The Cancer Genome Atlas: Creating Lasting Value beyond Its Data. Cell. 2018;173(2):283–5. 10.1016/j.cell.2018.03.042. [DOI] [PubMed] [Google Scholar]
  • 48.Mounir M, Lucchetta M, Silva TC, Olsen C, Bontempi G, Chen X, et al. New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx. PLoS Comput Biol. 2019;15(3):e1006701. 10.1371/journal.pcbi.1006701. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Zhou W, Triche TJ Jr, Laird PW, Shen H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 2018;46(20):e123. 10.1093/nar/gky691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 2010;11(1):587. 10.1186/1471-2105-11-587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Lee AJ, Park Y, Doing G, Hogan DA, Greene CS. Correcting for experiment-specific variability in expression compendia can remove underlying signals. GigaScience. 2020;9(11). 10.1093/gigascience/giaa117. [DOI] [PMC free article] [PubMed]
  • 53.McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv. 2020. 10.48550/arXiv.1802.03426.
  • 54.Saelens W, Cannoodt R, Saeys Y. A comprehensive evaluation of module detection methods for gene expression data. Nat Commun. 2018;9(1):1090. 10.1038/s41467-018-03424-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. Statistics and Computing. New York: Springer; 2000. [Google Scholar]
  • 56.Searle SR, Speed FM, Milliken GA. Population Marginal Means in the Linear Model: An Alternative to Least Squares Means. Am Stat. 1980;34(4):216–21. [Google Scholar]
  • 57.Sergushichev AA. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv. 2016. 10.1101/060012. Accessed 30 Mar 2021.
  • 58.Santamarina-Ojeda P, Tejedor JR, Pérez RF, López V, Roberti A, Mangas C, et al. Multi-omic integration of DNA methylation and gene expression data reveals molecular vulnerabilities in glioblastoma. Mol Oncol. 2023;17(9):1726–43. 10.1002/1878-0261.13479. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Hirst DP, Térézol M, Cantini L, Villoutreix P, Vignes M, Baudot A. david-hirst/MOTL: MOTL: enhancing multi-omics matrix factorization with transfer learning (v1.0.0). Zenodo. 2025. 10.5281/zenodo.15486697.
  • 60.Grossman RL, Heath AP, Ferretti V, Varmus HE, Lowy DR, Kibbe WA, et al. Toward a Shared Vision for Cancer Genomic Data. N Engl J Med. 2016;375(12):1109–12. 10.1056/nejmp1607591. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Santamarina-Ojeda P, Tejedor JR, Pérez RF, López V, Roberti A, Mangas C, et al. Multi-omic integration of DNA methylation and gene expression data reveals molecular vulnerabilities in glioblastoma (processed data) (v1.0) [Data set]. Zenodo. 2023. 10.5281/zenodo.7380252. [DOI] [PMC free article] [PubMed]
  • 62.Hirst DP, Térézol M, Cantini L, Villoutreix P, Vignes M, Baudot A. MOTL: enhancing multi-omics matrix factorization with transfer learning [Data set]. Zenodo. 2024. 10.5281/zenodo.10848217.
