
[Preprint]. 2024 Apr 26:2024.04.22.590547. [Version 1] doi: 10.1101/2024.04.22.590547

This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.

Fig 3. A. The data QC workflow comprises three steps. Step 1, handling missing values: proteins with missing values in more than 50% of the samples were removed, and differences in sample loading were adjusted by ratio calculation followed by log2 transformation. This yielded 9,734 proteins across 280 samples. Step 2, identification and removal of outliers: iterative principal component analysis (PCA) was used to detect and remove sample outliers. Three rounds of PCA removed two outliers, leaving 9,734 proteins across 278 samples. Step 3, batch effect removal: regression was applied to remove batch effects from the 9,734 proteins in the 278 samples.

B and C. Multidimensional scaling (MDS) plots of sample variation (B) before batch correction and (C) after batch regression. Emory (red) and Mayo (green) samples form distinct clusters, with some samples scattered before batch regression (B). After batch regression (C), Emory (red) and Mayo (green) samples group more cohesively, and the dispersion seen in panel B is reduced.

D and E. Variance partition analysis of the proteomic samples. Violin plots show the distribution of variance explained in overall protein abundance (D) before and (E) after batch correction. The y-axis gives the percentage of variance explained, and the x-axis lists the factors age, sex, race, diagnosis, residuals, and batch. Each data point represents one protein, and the names of the top-ranked proteins are labeled next to their points. As in Fig 2D, batch accounted for a large share of the variance in the proteomic profile before correction. Panel E shows the same factors after batch correction, with a substantial reduction in the variance associated with batch. After correction, age, sex, race, AD diagnosis, and other individual characteristics (residuals) remain influential factors shaping protein abundance patterns.
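Steps 1 and 2 of the QC workflow in panel A can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: proteins are rows and samples are columns; the caption does not say which reference the loading ratios are taken against, so the hypothetical `log2_ratio_normalize` divides each sample by its own median; and the outlier rule (|z| > 4 on PC1 or PC2, repeated until no sample is flagged) is a hypothetical threshold. All function names are illustrative.

```python
import numpy as np

def filter_missing(x, max_missing_frac=0.5):
    """Step 1a: drop proteins (rows) missing in more than max_missing_frac of samples."""
    missing_frac = np.isnan(x).mean(axis=1)
    keep = missing_frac <= max_missing_frac
    return x[keep], keep

def log2_ratio_normalize(x):
    """Step 1b (assumed form): ratio of each value to its sample's median, then log2.
    NaNs are ignored when computing the median and propagate to the output."""
    col_median = np.nanmedian(x, axis=0)
    return np.log2(x / col_median)

def pca_scores(x, n_components=2):
    """PCA scores per sample via SVD, using only proteins with no missing values."""
    complete = x[~np.isnan(x).any(axis=1)]
    samples = complete.T                           # samples x proteins
    centered = samples - samples.mean(axis=0)      # center each protein
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def iterative_pca_outliers(x, n_sd=4.0, max_rounds=10):
    """Step 2: flag samples whose PC1 or PC2 score is more than n_sd SDs from the
    mean, drop them, and repeat PCA until no sample is flagged (hypothetical rule)."""
    sample_idx = np.arange(x.shape[1])
    removed = []
    for _ in range(max_rounds):
        scores = pca_scores(x[:, sample_idx])
        z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
        outlier = (np.abs(z) > n_sd).any(axis=1)
        if not outlier.any():
            break
        removed.extend(sample_idx[outlier].tolist())   # original column indices
        sample_idx = sample_idx[~outlier]
    return sample_idx, removed
```

Re-running PCA after each removal matters because a single extreme sample can dominate PC1 and mask milder outliers, which only become visible once it is gone; this is one plausible reading of why several rounds were needed.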

Together, these results show that the regression substantially reduced batch-related variability in the proteomic data.
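The batch regression of Step 3, and the batch term shown in panels D and E, can be illustrated with the sketch below. It assumes an ordinary least-squares model per protein with one-hot batch indicators plus optional biological covariates (for example age or diagnosis) kept in the model so that their signal is not removed; only the fitted batch term is subtracted. `batch_variance_fraction` is a one-way R² used as a simple stand-in for the linear mixed models fitted by the variancePartition R package. The caption names neither the regression model nor the variance partition tool, and all function names here are hypothetical.

```python
import numpy as np

def regress_out_batch(x, batch, covariates=None):
    """Subtract the fitted batch effect from each protein (row) of x.

    x: proteins x samples log2 matrix without missing values.
    batch: per-sample batch labels (at least two levels).
    covariates: optional samples x k array of effects to protect.
    """
    batch = np.asarray(batch)
    levels = np.unique(batch)
    # One indicator column per non-reference batch; levels[0] is the reference.
    batch_cols = np.column_stack([(batch == lv).astype(float) for lv in levels[1:]])
    parts = [np.ones((len(batch), 1)), batch_cols]
    if covariates is not None:
        parts.append(np.asarray(covariates, dtype=float))
    design = np.hstack(parts)
    coef, *_ = np.linalg.lstsq(design, x.T, rcond=None)   # terms x proteins
    n_b = batch_cols.shape[1]
    batch_effect = batch_cols @ coef[1:1 + n_b]            # samples x proteins
    return x - batch_effect.T

def batch_variance_fraction(x, batch):
    """Per-protein fraction of variance explained by batch alone (one-way R^2)."""
    batch = np.asarray(batch)
    grand_mean = x.mean(axis=1, keepdims=True)
    fitted = np.zeros_like(x)
    for lv in np.unique(batch):
        cols = batch == lv
        fitted[:, cols] = x[:, cols].mean(axis=1, keepdims=True)
    between = ((fitted - grand_mean) ** 2).sum(axis=1)
    total = ((x - grand_mean) ** 2).sum(axis=1)
    return between / total
```

Comparing `batch_variance_fraction` before and after `regress_out_batch` mirrors the drop in batch-associated variance between panels D and E. With batch as the only term, each batch mean is shifted to the reference batch's mean, so the batch fraction falls to essentially zero. With protected covariates it falls sharply but not exactly to zero.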