Figure - PMC

2024 Oct 28;25(21):11559. doi: 10.3390/ijms252111559

© 2024 by the authors.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Machine learning-based plasma lipid discovery for breast cancer detection. Matched plasma from the same sample set as in Figure 2A (n = 256) was analyzed with the same machine learning biomarker discovery pipeline as in Figure 3B to identify the plasma signature panel and develop the predictive model. (A) Average prediction of each model for individual samples across 2000 runs of leave-group-out cross-validation (LGOCV). (B) Lipids that were consistently selected as important by the Boruta algorithm across all runs. The cutoff between the top 20 and the remaining 10 lipids is indicated with a dotted line. Lipids from the EV23 panel are indicated with red text and bars. (C–F) Results using the top 20 lipids from (B) as variables with (C) the indicated models or (D–F) the ensemble model, trained using LGOCV (20% test, 80% train) repeated 2000 times. (C) Test performance summary of the three models with the highest sensitivity. (D) Boxplots showing the distribution of performance metrics, with the interquartile range indicated. (E) Average ROC curve and AUC. (F) Certainty level of predictions. High: complete model agreement; medium: greater than 80% model agreement; low: less than 80% model agreement. The proportions (%) of high, medium, and low predictions are indicated. (G) Sensitivity analysis of the plasma ensemble model with varying numbers of lipids. The violin plots represent the distribution of ensemble model accuracy when the top 14 to 30 lipids from (B) are used as variables. Horizontal lines within each violin represent the 0.05, 0.5, and 0.95 quantiles of prediction accuracy. The signature size with the best accuracy and the fewest lipids is indicated by a pink density curve. LID, lipid identifier.
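
The certainty levels described for panel (F) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `certainty_level`, the class labels, and the use of majority-vote agreement are assumptions made for illustration. The caption does not say how a prediction with exactly 80% agreement is classified, so the sketch assigns it to "low" as a further assumption.

```python
from collections import Counter


def certainty_level(votes):
    """Classify ensemble certainty from the per-model class votes for one sample.

    Hypothetical helper: agreement is taken to be the fraction of models
    voting for the majority class (an assumption, not stated in the caption).
    """
    if not votes:
        raise ValueError("votes must be non-empty")
    label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement == 1.0:
        level = "high"      # complete model agreement
    elif agreement > 0.8:
        level = "medium"    # greater than 80% agreement
    else:
        level = "low"       # below 80%; exactly 80% placed here by assumption
    return label, agreement, level


# Illustrative votes from ten hypothetical models for three samples.
samples = [
    ["cancer"] * 10,
    ["cancer"] * 9 + ["control"],
    ["cancer"] * 7 + ["control"] * 3,
]
levels = [certainty_level(v)[2] for v in samples]
```

Counting how often each level occurs across all samples would give proportions of the kind reported in panel (F).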