2022 Apr 23;23(3):bbac133. doi: 10.1093/bib/bbac133

Table 1.

Transfer learning, tools and techniques

| Name/acronym, reference | Source domain | Target domain | Input of the predictors | Output of the predictors | Transfer method; regression or classification task? | Availability, advantages and disadvantages (results/accomplishments) |
| --- | --- | --- | --- | --- | --- | --- |
| Semisupervised transfer learning [9] | Application-area-specific mouse phenotype-outcome-labeled gene expression data | Human gene expression data | Human gene expression data | Human phenotype data (and subsequently DEGs and enriched pathways inferred from these) | Transductive: supervised modeling (mouse) amended iteratively by semisupervised retraining (adding unlabeled human data); classification task | MATLAB code available from www.mathworks.com/matlabcentral/fileexchange/69718-semisupervised-learning-functions. Compared favorably in various metrics to other machine learning methods such as kNN, SVM and RF |
| XGSEA [18] | GO (or similar) gene sets and enrichment scores, e.g. from mouse or zebrafish | GO (or similar) gene sets and enrichment scores, e.g. from human | Gene expression data from source species used to calculate enrichment scores | Gene sets significantly associated in target species | Transductive: domain adaptation followed by prediction of significantly associated gene sets; regression task: logistic on P-values, linear on enrichment scores or linear on positive and negative enrichment scores separately | Code available at https://github.com/LiminLi-xjtu/XGSEA. Compared favorably in various metrics to three naïve methods also proposed in the paper. In the reported case study, XGSEA produced a smaller but more focused list of significant GO terms than the best-performing naïve method. Depending on the needs of a study, this could be an advantage or a disadvantage for further interpretation |
| FIT [19] | Precompiled datasets of mouse gene expression | Precompiled datasets of human gene expression | Mouse gene expression | Human gene expression for matching condition, genes with high effect size | Unsupervised (dimensionality reduction): gene-level lasso regression; follow-up classification task to identify high-effect genes | Available at http://www.mouse2man.org; includes a pre-test for transferability; compared favorably to predictions based only on mouse data |
| Translatable components regression (TransComp-R) [20] | Human gene expression data (pretreatment), human drug response data | Mouse proteomics data | Human gene expression (pretreatment) and drug response data (the latter are given, not to be predicted) | Mouse proteins (and corresponding pathway enrichments) with association to human drug response | Unsupervised (feature representation): PCA-based regression | MATLAB code available from https://de.mathworks.com/matlabcentral/fileexchange/77987-transcompr. Experimental verification of a gene predicted to be involved in resistance to treatment; apparently no other benchmarking |
| Pathway RespOnsive GENes (PROGENy) [21] and Discriminant Regulon Expression Analysis (DoRothEA) [22] | Two curated resources from human data, one of footprint pathway perturbations (PROGENy) and one of footprint regulons (transcription factor–target interactions in DoRothEA), and human–mouse orthologs | The mouse equivalent of the source | Mouse gene expression data | Mouse pathway activity (PROGENy) or transcription factor activity and enrichment (DoRothEA) | Transductive: supervised prediction of mouse pathways (PROGENy) and regulons (DoRothEA); regression task | Both tools are available as R (Bioconductor) and Python packages; for usage examples see https://github.com/saezlab/transcriptutorial; no benchmarking is described by the authors |
| Adversarial Inductive Transfer Learning (AITL) [12] | In vitro (cell line) gene expression and quantitative outcome (IC50) data | In vivo (patient) gene expression and qualitative outcome (yes/no) data | In vitro gene expression data (GDSC) | In vivo outcomes (TCGA) | Inductive: adversarial domain adaptation and multi-task learning (predicting outcomes for both source and target) using deep neural nets; classification task in the target domain | Code available at https://github.com/hosseinshn/AITL; performance benchmarked against six other methods (see main text) and found to perform best |
| Patient Response Estimation Corrected by Interpolation of Subspace Embeddings (PRECISE) [24] | Gene expression data from preclinical models (cell lines, patient-derived xenografts) and drug response | Human gene expression data | Human gene expression data | Human drug response | Transductive: similarity-based identification of shared mechanisms between large datasets from preclinical models and a small number of human samples, focused on cancer; regression task | Available as a Python package; example protocols provided as Jupyter notebooks; see https://github.com/NKI-CCB/PRECISE; outperformed two state-of-the-art approaches (ridge regression on either the raw or the ComBat-corrected gene expression data) at retrieving associations between known biomarkers and drug responses |
| Transfer variational autoencoder, trVAE [30] | Gene expression data (cell line) or image data (or similar) under a specific (first) condition | Gene expression data or image data (or similar) under a different (second) condition | Data under the first condition and a label specifying the second condition | Data transformed to the second condition | Transductive: based on an autoencoder neural net; regression-like task when applied to expression data | Available from https://github.com/theislab/trvae_reproducibility; benchmarked against six other tools (see main text) and found to perform best |
| MultiPLIER [15] | Preprocessed disease-related datasets of human gene expression, highlighting latent variables (LVs; characteristic patterns of correlated genes) | Human (rare disease) gene expression data | Human (rare disease) gene expression data | Characteristic expression patterns of correlated genes | Unsupervised (feature representation): constrained matrix factorization highlighting LVs, then projection of input into latent space; neither regression nor classification | PLIER is available at https://github.com/wgmao/PLIER; MultiPLIER is available from https://github.com/greenelab/multi-plier, with a summary of additional dependencies also described in the accompanying paper. A Docker image is provided to reproduce the analyses; no benchmarking is described by the authors |
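
To make the transfer method in the first row of Table 1 more concrete (a supervised model trained on labeled mouse data, then amended iteratively by retraining with unlabeled human data), the following is a minimal Python sketch of generic self-training. It is not the MATLAB implementation from [9]: the nearest-centroid classifier is a stand-in chosen for brevity, the function names (`centroids`, `predict`, `self_train`) and the `min_margin` threshold are hypothetical, and the two-gene toy data are invented for illustration only.

```python
import math

def centroids(samples, labels):
    """Mean expression vector for each class label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, margin): the nearest centroid's label and how much
    closer it is than the runner-up (a crude confidence score)."""
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(mouse_x, mouse_y, human_x, min_margin=1.0, max_rounds=10):
    """Hypothetical self-training loop: start from the labeled mouse data,
    then repeatedly add confidently pseudo-labeled human samples and refit."""
    train_x, train_y = list(mouse_x), list(mouse_y)
    pending = list(range(len(human_x)))  # human samples not yet pseudo-labeled
    for _ in range(max_rounds):
        cents = centroids(train_x, train_y)
        confident = []
        for i in pending:
            label, margin = predict(cents, human_x[i])
            if margin >= min_margin:
                confident.append((i, label))
        if not confident:
            break  # nothing new to add, so the model has stopped changing
        for i, label in confident:
            train_x.append(human_x[i])
            train_y.append(label)
        added = {i for i, _ in confident}
        pending = [i for i in pending if i not in added]
    # Final model labels every human sample, including low-confidence ones.
    cents = centroids(train_x, train_y)
    return [predict(cents, x)[0] for x in human_x]

# Invented toy data: two "genes" per sample.
mouse_x = [[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]]
mouse_y = ["control", "control", "disease", "disease"]
human_x = [[0.5, 0.4], [3.0, 3.1], [1.6, 1.5], [3.4, 3.3], [1.2, 1.0]]

# Baseline: the mouse-only model applied directly to the human samples.
mouse_only = [predict(centroids(mouse_x, mouse_y), x)[0] for x in human_x]
# Adapted model: retrained with pseudo-labeled human samples.
labels = self_train(mouse_x, mouse_y, human_x)
print(mouse_only)
print(labels)
```

Comparing `mouse_only` with `labels` shows how the class boundary can shift once human samples enter the training set. In practice the classifier, the confidence criterion and the stopping rule would all need to be chosen and validated for the data at hand.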