2019 Aug 26;47:607–615. doi: 10.1016/j.ebiom.2019.08.027

Table 1.

Recent ML tools and applications across various aspects of translational medicine, with the key results and challenges of each application.

Category	Application	ML technique(s)	Key result(s) or advantage(s)	Challenge(s)
Drug discovery	Designing chemical compounds (retrosynthetic process)	Deep neural networks (DNNs) and Monte Carlo tree search [9]	30× quicker than traditional computer-aided methods [70]	1. Scarcity of training data 2. Stronger, but slower-reasoning, algorithms should be developed for this application
	Designing chemical compounds (de novo drug design)	Deep recurrent neural network (RNN) [6]	Generates new, isofunctional chemical entities	Requires appropriate predicted bioactivity that has been validated
		Generative deep learning (based on RNNs) [7]	Does not require similarity searching or external scoring, and new molecular structures are generated immediately	User has to decide when training should be stopped
		Reinforcement Learning for Structural Evolution (ReLeaSE; two DNNs, generative and predictive) [8]	Simpler to use than traditional methods	Only available for a single-task regime; needs to be extended to optimise several target properties together
	Drug screening	Random Forest and ChemVec [11]	Highest accuracy when compared to 3 other algorithms	1. Improve feature representation using deep learning 2. Experimental validation required
Imaging	Cell microscopy and histopathology	Bayesian matrix factorisation method, Macau [25]	Predictive performance comparable with that of DNNs	1. Current results for this method are based on a single HTI screen 2. Requires an adequately sized compound library for training the model
	Cell microscopy and histopathology	Gradient Boosting [27]	Reduction of disturbances to the cells, making sample preparation quicker and cheaper	Deep learning techniques should be tried to improve the model
	Defining relationships between morphology and genomic features	Inception v3 (based on convolutional neural networks) [22]	Capable of distinguishing between 3 types of histopathological images, predicting mutational status of 6 genes	Current data may not fully represent the heterogeneity of tissues
Genomic medicine	Biomarker discovery	Elastic net regression [33]	Identified BRAF and NRAS mutations in cell lines as being among the top predictors of drug sensitivity for a MEK inhibitor	Technique does not allow for comparison between drugs
		Unsupervised hierarchical clustering (part of ACME analysis) [30]	Identified an association between BRAF-mutant cell lines of the skin lineage and sensitivity to the MEK inhibitor	1. Distance metric and linkage criteria must be specified 2. Does not scale well
		Spectral clustering by Similarity Network Fusion (SNF) [34]	Identification of new tumour subtypes by utilising mRNA and methylation signatures	Prospective studies required to determine accuracy
	Integrating different modalities of data	iCluster [44]	Identified potentially novel subtypes of breast and lung cancers on top of subgroups characterised by concordant DNA copy number alterations and gene expression in an automated way	Only focuses on array data
		Kernel Learning Integrative Clustering (KLIC) [46]	Compared to Cluster-Of-Cluster Analysis (COCA), KLIC adds more detailed information about data from each dataset into the last clustering step and is able to merge datasets having various levels of noise, giving more weight to more significant ones	Only tested on simulated datasets
		Spectral clustering by SNF [43]	Identification of new medulloblastoma subtypes	1. Larger cohort size needed for validation 2. Current analysis of samples is bulk analysis
		Affinity Network Fusion (ANF) and semi-supervised learning [47]	Performs similarly or better when compared to SNF, less computationally demanding, generalises better	Results on four cancer types only (known disease types) and not yet validated on additional experimental data
		Clusternomics [48]	Outperforms existing methods and derived clusters with clinical meaning and significant differences in survival outcomes when tested on real-world data [44,49,50,51]	Comparison of performance to other methods on real-world data
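
The two challenges listed for unsupervised hierarchical clustering in Table 1 (the distance metric and linkage criterion must be specified, and the method does not scale well) can be illustrated with a minimal sketch. This is not the ACME analysis from [30]; it is a naive agglomerative clustering in plain Python, and the function names, the `profiles` toy data and the parameter names are all hypothetical choices made for illustration.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerative(points, n_clusters, metric=euclidean, linkage="average"):
    """Naive agglomerative clustering (illustrative sketch only).

    Returns a list of clusters, each a sorted list of point indices.
    Both `metric` and `linkage` must be chosen by the user.
    """
    # The linkage criterion decides how point-to-point distances are
    # combined into a single cluster-to-cluster distance.
    combine = {
        "single": min,
        "complete": max,
        "average": lambda d: sum(d) / len(d),
    }[linkage]

    clusters = [[i] for i in range(len(points))]
    # Every merge step rescans all cluster pairs and all point pairs,
    # so the cost grows roughly cubically with the number of points.
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            dists = [metric(points[p], points[q])
                     for p in clusters[i] for q in clusters[j]]
            score = combine(dists)
            if best is None or score < best[0]:
                best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: six samples described by two made-up features.
profiles = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
            (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]

print(agglomerative(profiles, n_clusters=2))
print(agglomerative(profiles, n_clusters=2, metric=manhattan, linkage="complete"))
```

On well-separated toy data like this, every metric and linkage combination recovers the same two groups. On noisier data the choice can change the result, which is why it has to be specified explicitly. Established libraries use more efficient algorithms than this naive loop.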