TABLE 2.
Examples of metabolomics studies utilizing ML algorithms.
# | Author | Journal | Publication year | Area of investigation | ML algorithm used | Brief description | Findings | DOI |
---|---|---|---|---|---|---|---|---|
1 | Shen et al | Cell | 2020 | COVID-19 | Random Forest | Identification of severe COVID-19 cases based on molecular signatures of proteins and metabolites | Severity identification was conducted on 18 non-severe and 13 severe patients. Identified 29 important variables (22 proteins, 7 metabolites), with 1 patient incorrectly classified. Model was tested on an independent cohort of 10 patients; all severe patients were correctly identified except 1 | doi: 10.1016/j.cell.2020.05.032. Epub 2020 May 28. PMID: 32492406; PMCID: PMC7254001 |
2 | Han et al | Nature | 2021 | Human gut microbiota | Random Forest | Identification of distinct metabolites to differentiate between different taxonomic groups | The model revealed subsets of chemical features that are highly conserved and predictive of taxonomic identification (e.g., over-representation of amino acid metabolism) | doi: 10.1038/s41586-021-03707-9. Epub 2021 Jul 14. PMID: 34262212; PMCID: PMC8939302 |
3 | Liang et al | Cell | 2020 | Human pregnancy metabolome | Linear regression | Untargeted metabolomic profiling and identification of metabolic changes in human pregnancy | Detection of many of the previously reported pregnancy-associated metabolite profiles; >95% of the pregnancy-associated metabolites are previously unreported | doi: 10.1016/j.cell.2020.05.002. PMID: 32589958; PMCID: PMC7327522 |
4 | Hogan et al | EBioMedicine | 2021 | Influenza | Gradient boosted decision trees and random forest | Untargeted metabolomics approach for diagnosis of influenza infection | Untargeted metabolomics identified 3,318 ion features for further investigation. The described LC/Q-TOF method, in conjunction with a machine learning model, differentiated influenza-positive from influenza-negative samples with sensitivity and specificity over 0.9 | doi: 10.1016/j.ebiom.2021.103546. Epub 2021 Aug 19. PMID: 34419924; PMCID: PMC8385175 |
5 | Bifarin et al | J Proteome Res | 2021 | Renal Cell Carcinoma | Partial Least Squares; Random Forest Recursive feature elimination; K-NN | A 10-metabolite panel predicted Renal Cell Carcinoma within the test cohort with 88% accuracy | A total of 7,147 metabolites were narrowed down to a series of 10 and tested with 4 ML algorithms, all of which were able to correctly identify RCC status with high accuracy in the test cohort | doi: 10.1021/acs.jproteome.1c00213. Epub 2021 Jun 23. PMID: 34161092 |
6 | Tiedt et al | Ann Neurology | 2020 | Ischemic Stroke | Random Forest classification; Linear discriminant analysis; logistic regression; K-NN; naive Bayes; SVM | Identified 4 metabolites showing high accuracy in differentiating between Ischemic stroke and Stroke Mimics | Levels of 41 metabolites showed significant association with Ischemic stroke compared to controls. Top 4 metabolites show high accuracy in differentiating between stroke and mimics | https://doi.org/10.1002/ana.25859 |
7 | Liu et al | Mol Metab | 2021 | Diabetic kidney disease | Linear discriminant analysis; SVM; Random Forest; Logistic regression | Serum integrative omics provide stable and accurate biomarkers for early warning and diagnosis of Diabetic Kidney Disease | Combination of α2-macroglobulin, cathepsin D, and CD324 could serve as a surrogate protein biomarker using 4 different ML methods | doi: 10.1016/j.molmet.2021.101367. Epub 2021 Nov 1. PMID: 34737094; PMCID: PMC8609166 |
8 | Oh et al | Cell Metab | 2020 | Cirrhosis | Random Forest | Comparison of gut microbiome dysregulation in differentiating between advanced fibrosis and cirrhosis | Identified a core set of gut microbiome that could be used as a universal non-invasive test for cirrhosis | doi: 10.1016/j.cmet.2020.06.005. PMID: 32610095; PMCID: PMC7822714 |
9 | Delafiori et al | Anal Chem | 2021 | COVID-19 | ADA tree boosting; Gradient tree boosting; Random forest; partial least squares; SVM | Combine ML with mass spectrometry to detect COVID-19 in plasma samples within minutes | Diagnosis can be derived from raw data with 96% specificity and 83% sensitivity | doi: 10.1021/acs.analchem.0c04497. Epub 2021 Jan 20. PMID: 33471512; PMCID: PMC8023531 |
10 | Jung et al | Biomed Pharmacother | 2021 | Coronary artery disease | Logistic regression | 10-year risk prediction model based on 5 selected serum metabolites | Provided initial evidence that blood xanthine and uric acid levels play different roles in the development of machine learning models for primary/secondary prevention or diagnosis of CAD. Purine-related metabolites in blood are applicable to machine learning model development for CAD risk prediction and diagnosis | doi: 10.1016/j.biopha.2021.111621. Epub 2021 May 10. PMID: 34243599 |
11 | Wallace et al | J Pathol | 2020 | Cancer | Linear discriminant analysis | Comparison between metabolic profile of tumor patients and the predictive ability of machine learning algorithm to interpret metabolite data | Application of machine learning algorithms to metabolite profiles improved predictive ability for hard-to-interpret cases of head and neck paragangliomas (99.2%) | doi: 10.1002/path.5472. Epub 2020 Jul 1. PMID: 32462735; PMCID: PMC7548960 |
12 | Kouznetsova et al | Metabolomics | 2019 | Bladder cancer | Logistic regression | Elucidate the biomarkers including metabolites and corresponding genes for different stages of Bladder cancer, show their distinguishing and common features, and create a machine-learning model for classification of stages of Bladder cancer | The best performing model was able to predict metabolite class with an accuracy of 82.54%. The same model was applied to three separate sets of metabolites obtained from public sources, one set of the late-stage metabolites and two sets of the early-stage metabolites. The model was better at predicting early-stage metabolites with accuracies of 72% (18/25) and 95% (19/20) on the early sets, and an accuracy of 65.45% (36/55) on the late-stage metabolite set. | doi: 10.1007/s11306-019-1555-9. PMID: 31222577 |
13 | Murata et al | Breast Cancer Res Treat | 2019 | Breast Cancer | Multiple logistic regression | Combinations of salivary metabolomics and machine learning methods show potential for non-invasive screening of breast cancer | Polyamines were identified to be significantly elevated in saliva of breast cancer patients | doi: 10.1007/s10549-019-05330-9. Epub 2019 Jul 8. PMID: 31286302 |
14 | Liu et al | BMC Genomics | 2016 | Major Depressive Disorder | SVM; Random Forest | Identifying the metabolomics signature of major depressive disorder subtypes | ∼80% accuracy in classification of melancholic depression | doi: 10.1186/s12864-016-2953-2. PMID: 27549765; PMCID: PMC4994306 |