Figure - PMC


[Preprint]. 2024 Apr 26:2024.04.22.590547. [Version 1] doi: 10.1101/2024.04.22.590547


This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.


Fig 2. A. The QC workflow, illustrated in the flowchart, has three main steps.
Step 1. Pre-processing for missing values: only proteins with missing values in fewer than 50% of samples were retained, and each protein's abundance was expressed as a ratio to the total protein abundance of its sample to adjust for differences in sample loading. This left 9180 proteins across 1105 samples. The data were then log2-transformed.
Step 2. Outlier detection and removal: iterative principal component analysis (PCA) was used to identify and remove sample outliers. After multiple rounds of PCA, 19 outliers were removed, leaving 9180 proteins across 1086 samples.
Step 3. Batch effect regression: variance attributable to batch was mitigated by regressing batch out of the 9180 proteins across the 1086 samples.

B and C. Multidimensional scaling (MDS) plots showing variation among samples (B) before and (C) after regressing out batch effects. In B, dimensions 1 and 2 reveal distinct clusters of samples by site (Emory, red; Mount Sinai, blue; Rush, purple; Mayo, green), with some scatter among samples. In C, samples from all four sites cluster together after batch correction, indicating that variance due to batch was successfully removed (note the change in scale from B to C). The correction reduces the dispersion seen in B, harmonizing the sample distribution across sites.

D and E. Variance partition analysis estimating the percentage of variance in protein abundance explained by each experimental factor.
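The three QC steps in panel A can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the caption does not give the outlier criterion, the number of PCs inspected, how missing values were handled during PCA, or which covariates (if any) were kept in the batch model. Here, as assumptions, missing values are mean-imputed only for the PCA, a sample is flagged when its score on PC1 or PC2 lies more than 3 SD from the mean, and batch is regressed out on its own by centering each protein within batch and adding back its overall mean (equivalent to keeping the residuals of an OLS fit on batch indicators). The function names `preprocess`, `pca_scores`, `remove_outliers`, and `regress_batch` are hypothetical.

```python
import numpy as np


def preprocess(abundance, max_missing=0.5):
    """Step 1: missingness filter, total-abundance normalization, log2.

    abundance: proteins x samples array of raw abundances (NaN = missing).
    Returns (log2 matrix of retained proteins, boolean mask of retained proteins).
    """
    keep = np.isnan(abundance).mean(axis=1) < max_missing  # missing in < 50% of samples
    kept = abundance[keep]
    # Ratio to each sample's total abundance (assumption: total over retained proteins).
    totals = np.nansum(kept, axis=0, keepdims=True)
    return np.log2(kept / totals), keep


def pca_scores(x, n_pcs=2):
    """Sample scores on the first n_pcs principal components of a proteins x samples matrix."""
    # Assumption: NaNs are filled with the protein's mean for the PCA only.
    filled = np.where(np.isnan(x), np.nanmean(x, axis=1, keepdims=True), x)
    centered = filled - filled.mean(axis=1, keepdims=True)  # center each protein
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_pcs].T * s[:n_pcs]  # samples x n_pcs


def remove_outliers(x, n_sd=3.0, max_rounds=10):
    """Step 2: iterative PCA; returns a boolean mask of the samples kept."""
    keep = np.ones(x.shape[1], dtype=bool)
    for _ in range(max_rounds):
        scores = pca_scores(x[:, keep])
        z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
        flagged = (np.abs(z) > n_sd).any(axis=1)  # assumed rule: |z| > 3 on PC1 or PC2
        if not flagged.any():
            break
        keep[np.flatnonzero(keep)[flagged]] = False  # drop flagged samples, then redo PCA
    return keep


def regress_batch(x, batch):
    """Step 3: remove per-batch protein means while keeping each protein's overall mean.

    batch: 1-D NumPy array with one batch label per sample (column).
    """
    out = np.empty_like(x)
    grand = np.nanmean(x, axis=1, keepdims=True)
    for b in np.unique(batch):
        cols = batch == b
        out[:, cols] = x[:, cols] - np.nanmean(x[:, cols], axis=1, keepdims=True) + grand
    return out
```

Calling `preprocess`, then `remove_outliers` on its output, then `regress_batch` on the retained samples mirrors the 1105-sample to 1086-sample flow in panel A. One design caveat: if diagnosis or other biological factors are unevenly distributed across batches, regressing out batch alone can also remove part of their signal, which is why such covariates are commonly kept in the model.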
Violin plots before (D) and after (E) batch correction show the distribution of explained variance across proteins. The y-axis shows the percentage of variance explained, and the x-axis shows the contributing factors: age, sex, race, diagnosis, batch, and residuals. Before correction (D), batch explains a noticeable share of the variance and shapes the overall proteomic profile. Panel E shows the same factors after batch correction: the variance associated with batch is substantially reduced, falling to near 0%, whereas age, sex, race, AD diagnosis, and residual (individual-level, unmodeled) variation still explain part of the variance in protein abundance. Each point in the violin plots represents one protein, with the corresponding protein name shown next to it. Together, these panels show that the correction removes batch-related variability from the proteomic data.
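Panels D and E rest on a per-protein decomposition of variance across factors such as age, sex, race, diagnosis, batch, and residuals. The caption does not state which model was fitted, so the sketch below swaps in a deliberately simple ordinary-least-squares version: each factor's share is the drop in R^2 when that factor is removed from the full model (its unique contribution), and the residual share is 1 - R^2 of the full model. This is not the linear mixed-model formulation used by dedicated variance-partitioning tools, and with correlated factors the shares need not sum to exactly 100%. The function names and the factor encoding (string labels treated as categorical) are hypothetical.

```python
import numpy as np


def design_matrix(factors, exclude=None):
    """Intercept + one column per numeric factor + dummy columns per categorical factor."""
    n = len(next(iter(factors.values())))
    cols = [np.ones(n)]
    for name, values in factors.items():
        if name == exclude:
            continue
        values = np.asarray(values)
        if values.dtype.kind in "OUSb":  # categorical: one dummy per non-reference level
            cols += [(values == lv).astype(float) for lv in np.unique(values)[1:]]
        else:
            cols.append(values.astype(float))
    return np.column_stack(cols)


def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - np.var(y - X @ beta) / np.var(y)


def variance_fractions(y, factors):
    """Share of one protein's variance uniquely explained by each factor (drop-one R^2)."""
    ok = ~np.isnan(y)  # fit on observed values only
    y = y[ok]
    factors = {k: np.asarray(v)[ok] for k, v in factors.items()}
    full = r_squared(y, design_matrix(factors))
    fractions = {name: full - r_squared(y, design_matrix(factors, exclude=name))
                 for name in factors}
    fractions["residuals"] = 1.0 - full
    return fractions


def partition_all(matrix, factors):
    """Apply variance_fractions to every protein (row) of a proteins x samples matrix."""
    return [variance_fractions(row, factors) for row in matrix]
```

Each dictionary returned by `partition_all` supplies one point per violin, one per protein. Running it on data before and after subtracting per-batch means should show the batch share collapsing toward zero while the other factors keep their shares, which is the qualitative pattern described for panel E.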