Front Genet. 2022 Nov 24;13:1017340. doi: 10.3389/fgene.2022.1017340

TABLE 2.

Examples of metabolomics studies utilizing ML algorithms.

| # | Author | Journal | Publication year | Area of investigation | ML algorithm(s) used | Brief description | Findings | DOI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Shen et al | Cell | 2020 | COVID-19 | Random Forest | Identification of severe COVID-19 cases based on molecular signatures of proteins and metabolites | Severity identification was conducted on 18 non-severe and 13 severe patients. Identified 29 important variables (22 proteins, 7 metabolites) -> incorrect classification of 1 patient. Model was tested on an independent cohort of 10 patients -> all severe patients correctly identified except 1 | doi: 10.1016/j.cell.2020.05.032. Epub 2020 May 28. PMID: 32492406; PMCID: PMC7254001 |
| 2 | Han et al | Nature | 2021 | Human gut microbiota | Random Forest | Identification of distinct metabolites to differentiate between different taxonomic groups | The model revealed subsets of chemical features that are highly conserved and predictive of taxonomic identification, e.g., over-representation of amino acid metabolism | doi: 10.1038/s41586-021-03707-9. Epub 2021 Jul 14. PMID: 34262212; PMCID: PMC8939302 |
| 3 | Liang et al | Cell | 2020 | Human pregnancy metabolome | Linear regression | Untargeted metabolomic profiling and identification of metabolic changes in human pregnancy | Detection of many of the previously reported pregnancy-associated metabolite profiles; >95% of the pregnancy-associated metabolites are previously unreported | doi: 10.1016/j.cell.2020.05.002. PMID: 32589958; PMCID: PMC7327522 |
| 4 | Hogan et al | EBioMedicine | 2021 | Influenza | Gradient boosted decision trees; Random Forest | Untargeted metabolomics approach for diagnosis of influenza infection | Untargeted metabolomics identified 3,318 ion features for further investigation. The described LC/Q-TOF method, in conjunction with a machine learning model, was able to differentiate between influenza samples (pos/neg) with sensitivity and specificity over 0.9 | doi: 10.1016/j.ebiom.2021.103546. Epub 2021 Aug 19. PMID: 34419924; PMCID: PMC8385175 |
| 5 | Bifarin et al | J Proteome Res | 2021 | Renal Cell Carcinoma | Partial Least Squares; Random Forest Recursive feature elimination; K-NN | A 10-metabolite panel predicted Renal Cell Carcinoma (RCC) within the test cohort with 88% accuracy | A total of 7,147 metabolites were narrowed down to a series of 10 and tested with 4 ML algorithms, all of which were able to correctly identify RCC status with high accuracy in the test cohort | doi: 10.1021/acs.jproteome.1c00213. Epub 2021 Jun 23. PMID: 34161092 |
| 6 | Tiedt et al | Ann Neurology | 2020 | Ischemic Stroke | Random Forest classification; Linear discriminant analysis; Logistic regression; K-NN; Naive Bayes; SVM | Identified 4 metabolites showing high accuracy in differentiating between ischemic stroke and stroke mimics | Levels of 41 metabolites showed significant association with ischemic stroke compared to controls. Top 4 metabolites show high accuracy in differentiating between stroke and mimics | https://doi.org/10.1002/ana.25859 |
| 7 | Liu et al | Mol Metab | 2021 | Diabetic kidney disease | Linear discriminant analysis; SVM; Random Forest; Logistic regression | Serum integrative omics provide stable and accurate biomarkers for early warning and diagnosis of diabetic kidney disease | Combination of α2-macroglobulin, cathepsin D, and CD324 could serve as a surrogate protein biomarker using 4 different ML methods | doi: 10.1016/j.molmet.2021.101367. Epub 2021 Nov 1. PMID: 34737094; PMCID: PMC8609166 |
| 8 | Oh et al | Cell Metab | 2020 | Cirrhosis | Random Forest | Comparison of gut microbiome dysregulation in differentiating between advanced fibrosis and cirrhosis | Identified a core set of gut microbiome that could be used as a universal non-invasive test for cirrhosis | doi: 10.1016/j.cmet.2020.06.005. PMID: 32610095; PMCID: PMC7822714 |
| 9 | Delafiori et al | Anal Chem | 2021 | COVID-19 | ADA tree boosting; Gradient tree boosting; Random Forest; Partial least squares; SVM | Combine ML with mass spectrometry to diagnose COVID-19 from plasma samples within minutes | Diagnosis can be derived from raw data with specificity 96%, sensitivity 83% | doi: 10.1021/acs.analchem.0c04497. Epub 2021 Jan 20. PMID: 33471512; PMCID: PMC8023531 |
| 10 | Jung et al | Biomed Pharmacother | 2021 | Coronary artery disease | Logistic regression | 10-year risk prediction model based on 5 selected serum metabolites provided initial evidence that blood xanthine and uric acid levels play different roles in the development of machine learning models for primary/secondary prevention or diagnosis of CAD | Purine-related metabolites in blood are applicable to machine learning model development for CAD risk prediction and diagnosis | doi: 10.1016/j.biopha.2021.111621. Epub 2021 May 10. PMID: 34243599 |
| 11 | Wallace et al | J Pathol | 2020 | Cancer | Linear discriminant analysis | Comparison between metabolic profile of tumor patients and the predictive ability of machine learning algorithm to interpret metabolite data | Application of machine learning algorithms to metabolite profiles improved predictive ability for hard-to-interpret cases of head and neck paragangliomas (99.2%) | doi: 10.1002/path.5472. Epub 2020 Jul 1. PMID: 32462735; PMCID: PMC7548960 |
| 12 | Kouznetsova et al | Metabolomics | 2019 | Bladder cancer | Logistic regression | Elucidate the biomarkers, including metabolites and corresponding genes, for different stages of bladder cancer, show their distinguishing and common features, and create a machine-learning model for classification of stages of bladder cancer | The best performing model was able to predict metabolite class with an accuracy of 82.54%. The same model was applied to three separate sets of metabolites obtained from public sources: one set of late-stage metabolites and two sets of early-stage metabolites. The model was better at predicting early-stage metabolites, with accuracies of 72% (18/25) and 95% (19/20) on the early sets and an accuracy of 65.45% (36/55) on the late-stage metabolite set | doi: 10.1007/s11306-019-1555-9. PMID: 31222577 |
| 13 | Murata et al | Breast Cancer Res Treat | 2019 | Breast Cancer | Multiple logistic regression | Combinations of salivary metabolomics and machine learning methods show potential for non-invasive screening of breast cancer | Polyamines were identified to be significantly elevated in saliva of breast cancer patients | doi: 10.1007/s10549-019-05330-9. Epub 2019 Jul 8. PMID: 31286302 |
| 14 | Liu et al | BMC Genomics | 2016 | Major Depressive Disorder | SVM; Random Forest | Identifying the metabolomics signature of major depressive disorder subtypes | ∼80% accuracy in classification of melancholic depression | doi: 10.1186/s12864-016-2953-2. PMID: 27549765; PMCID: PMC4994306 |
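
Several entries in the table report accuracy, sensitivity, and specificity. As a reading aid, the following minimal Python sketch shows how such figures are derived from raw counts. The accuracy counts are the ones quoted for Kouznetsova et al. (entry 12); the helper names `accuracy_pct` and `sensitivity_specificity`, the rounding to two decimals, and the confusion-matrix counts in the second part are hypothetical and chosen only for illustration. They are not taken from any of the cited studies.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals (rounding is an assumption)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# (correct, total) counts as quoted in the table for Kouznetsova et al. (entry 12).
reported = {
    "early-stage set 1": (18, 25),
    "early-stage set 2": (19, 20),
    "late-stage set": (36, 55),
}
results = {name: accuracy_pct(c, t) for name, (c, t) in reported.items()}
for name, pct in results.items():
    print(f"{name}: {pct}%")

# Hypothetical confusion-matrix counts, NOT from any study in the table.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The three computed accuracies match the percentages given in the table (72%, 95%, and 65.45%), which confirms that the reported figures are plain ratios of correctly predicted metabolites to set size.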