Table 1.
Recent ML tools and applications in various aspects of translational medicine, with the key results and challenges of each application.
| Category | Application | ML technique(s) | Key result(s) or advantage(s) | Challenge(s) |
|---|---|---|---|---|
| Drug discovery | Designing chemical compounds (retrosynthetic process) | Deep neural networks (DNNs) and Monte Carlo tree search [9] | 30× quicker than traditional computer-aided methods [70] | |
| | Designing chemical compounds (de novo drug design) | Deep recurrent neural network (RNN) [6] | Generates new, isofunctional chemical entities | Requires appropriate predicted bioactivity that has been validated |
| | | Generative deep learning (based on RNNs) [7] | Does not require similarity searching or external scoring; new molecular structures are generated immediately | User has to decide when training should be stopped |
| | | Reinforcement Learning for Structural Evolution (ReLeaSE; two DNNs, one generative and one predictive) [8] | Simpler to use than traditional methods | Only available for a single-task regime; extending it to optimise several target properties together still requires development |
| | Drug screening | Random Forest and ChemVec [11] | Highest accuracy when compared to 3 other algorithms | |
| Imaging | Cell microscopy and histopathology | Bayesian matrix factorisation method, Macau [25] | Predictive performance comparable with that of DNNs | |
| | | Gradient Boosting [27] | Reduction of disturbances to the cells, making sample preparation quicker and cheaper | Deep learning techniques should be tried to improve the model |
| | Defining relationships between morphology and genomic features | Inception v3 (based on convolutional neural networks) [22] | Capable of distinguishing between 3 types of histopathological images, predicting mutational status of 6 genes | Current data may not fully represent the heterogeneity of tissues |
| Genomic medicine | Biomarker discovery | Elastic net regression [33] | BRAF and NRAS mutations in cell lines were identified among the top predictors of drug sensitivity for a MEK inhibitor | Technique does not allow for comparison between drugs |
| | | Unsupervised hierarchical clustering (part of ACME analysis) [30] | Identified an association between BRAF-mutant cell lines of the skin lineage and sensitivity to the MEK inhibitor | |
| | | Spectral clustering by Similarity Network Fusion (SNF) [34] | Identification of new tumour subtypes by utilising mRNA and methylation signatures | Prospective studies required to determine accuracy |
| | Integrating different modalities of data | iCluster [44] | Identified potentially novel subtypes of breast and lung cancers on top of subgroups characterised by concordant DNA copy number alterations and gene expression in an automated way | Only focuses on array data |
| | | Kernel Learning Integrative Clustering (KLIC) [46] | Compared to Cluster-Of-Cluster Analysis (COCA), KLIC adds more detailed information about data from each dataset into the last clustering step and is able to merge datasets having various levels of noise, giving more weight to more significant ones | Only tested on simulated datasets |
| | | Spectral clustering by SNF [43] | Identification of new medulloblastoma subtypes | |
| | | Affinity Network Fusion (ANF) and semi-supervised learning [47] | Performs similarly or better when compared to SNF, less computationally demanding, generalises better | Results on four cancer types only (known disease types) and not yet validated on additional experimental data |
| | | Clusternomics [48] | Outperforms existing methods and derives clusters with clinical meaning and significant differences in survival outcomes when tested on real-world data [44,49,50,51] | Comparison of performance to other methods on real-world data |
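
To make the elastic net entry under biomarker discovery more concrete, the sketch below fits an elastic net by coordinate descent and ranks features by the size of their coefficients. It is an illustration only, not the implementation or data of the cited study [33]: the data are synthetic, the placeholder names `GENE_A`, `GENE_B` and `GENE_C`, the effect sizes, and the values of `alpha` and `l1_ratio` are all assumptions chosen for the example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent.
    Features and response are centred, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    w = [0.0] * p
    residual = [v - y_mean for v in y]  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_w
    return w


# Synthetic example (assumption): binary mutation indicators for 200 cell lines,
# with drug sensitivity driven only by the first two features.
rng = random.Random(0)
genes = ["BRAF", "NRAS", "GENE_A", "GENE_B", "GENE_C"]
X = [[float(rng.random() < 0.3) for _ in genes] for _ in range(200)]
y = [2.0 * row[0] + 1.5 * row[1] + rng.gauss(0.0, 0.3) for row in X]

coefs = fit_elastic_net(X, y)
ranking = sorted(zip(genes, coefs), key=lambda gc: abs(gc[1]), reverse=True)
for gene, coef in ranking:
    print(f"{gene}: {coef:.3f}")
```

Because the L1 term can set coefficients exactly to zero while the L2 term stabilises correlated features, the features with the largest absolute coefficients can be read as the top predictors. In this synthetic setting those should be the two features that actually drive the response.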