2021 May 9;35(8):109173. doi: 10.1016/j.celrep.2021.109173


© 2021 The Author(s)

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Differential statistics of immune repertoires across cohorts

(A) The distribution of the log probability to observe a sequence $σ$ in the periphery $\log_{10} P_{post} (σ)$ is shown as a normalized probability density function (PDF) for inferred naive progenitors of clonal lineages in cohorts of healthy individuals and the mild, moderate, and severe cohorts of individuals with COVID-19. Full lines show distributions averaged over individuals (biological replicates; Data S1) in each cohort, and shading indicates regions containing one standard deviation of variation among individuals within a cohort.

(B) Clustering of cohorts based on their pairwise Jensen-Shannon divergences ($D_{JS}$) as a measure of differential selection on cohorts (STAR Methods).
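As an illustration of the quantity used in (B), the following minimal sketch computes the standard Jensen-Shannon divergence, $D_{JS}(P, Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)$ with $M = \frac{1}{2}(P + Q)$, for every pair of cohorts. It assumes small discrete distributions; the cohort values below are hypothetical toy numbers, not data from the study, and the study's own estimate is made on selection models over sequences (STAR Methods) rather than on a short probability vector.

```python
import math

def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence D_KL(p || q) for discrete distributions."""
    # Terms with p_i == 0 contribute nothing, by the usual convention.
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)

# Hypothetical toy distributions over four categories (assumption, not study data).
cohorts = {
    "healthy": [0.40, 0.30, 0.20, 0.10],
    "mild": [0.38, 0.30, 0.21, 0.11],
    "severe": [0.10, 0.20, 0.30, 0.40],
}

# Pairwise divergence table, the kind of input a clustering step would take.
names = list(cohorts)
pairwise = {
    (a, b): js_divergence(cohorts[a], cohorts[b]) for a in names for b in names
}
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (disjoint support), which makes values comparable across cohort pairs.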

(C) The bar graph shows how incorporating different features into a SONIA selection model contributes to the fractional $D_{JS}$ between models trained on different cohorts. The error bars show the standard deviation of these estimates, using five independent sets of 100,000 generated BCRs for each selection model (STAR Methods).

(D–F) Logo plots show the expected differences in the log-selection factors for amino acid usage, $⟨ Δ \log Q_{cohort} (a) ⟩ = ⟨ \log Q_{cohort} (a) - \log Q_{healthy} (a) ⟩$ , for the (D) mild, (E) moderate, and (F) severe COVID-19 cohorts. The expectation values $⟨ \cdot ⟩$ are evaluated on the mixture distribution $\frac{1}{2} (P_{post}^{cohort} + P_{post}^{healthy})$ . Positively charged amino acids (lysine, K; arginine, R; and histidine, H) are shown in blue, and negatively charged amino acids (aspartate, D, and glutamate, E) are shown in red. All other amino acids are shown in gray. Positions along the HCDR3 are shown up to 10 residues starting from the 3′ (positive values) and 5′ ends (negative values).
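The expectation over the mixture distribution $\frac{1}{2} (P_{post}^{cohort} + P_{post}^{healthy})$ can be approximated by averaging a function separately over samples drawn from each distribution and then taking the equal-weight mean of the two averages. The sketch below shows this under simplifying assumptions: the function name `mixture_expectation`, the toy log-selection factors, and the single-residue "sequences" are all hypothetical and are not taken from the SONIA software or the study.

```python
def mixture_expectation(f, samples_cohort, samples_healthy):
    """Monte Carlo estimate of <f> under (P_cohort + P_healthy) / 2.

    Each sample list is assumed to be drawn from its own distribution,
    so the two sample means are combined with equal weight 1/2.
    """
    mean_cohort = sum(f(s) for s in samples_cohort) / len(samples_cohort)
    mean_healthy = sum(f(s) for s in samples_healthy) / len(samples_healthy)
    return 0.5 * mean_cohort + 0.5 * mean_healthy

# Hypothetical log-selection factors per residue (toy values, not study data).
log_q_cohort = {"K": 0.2, "D": -0.1}
log_q_healthy = {"K": 0.0, "D": 0.0}

def delta_log_q(residue):
    """Difference log Q_cohort(a) - log Q_healthy(a) for one residue a."""
    return log_q_cohort[residue] - log_q_healthy[residue]

# Toy samples: each "sequence" is reduced to a single residue for brevity.
samples_cohort = ["K", "K", "D", "K"]
samples_healthy = ["K", "D"]

expected_delta = mixture_expectation(delta_log_q, samples_cohort, samples_healthy)
```

Averaging within each sample set first keeps the two distributions equally weighted even when the sets differ in size, as they do in this toy example.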

(G) The bar graph shows the mean difference between the log-selection factors for IGHV gene usage for the mild (green), moderate (yellow), and severe (red) COVID-19 cohorts. The mean differences are computed using the mixture distribution $\frac{1}{2} (P_{post}^{cohort} + P_{post}^{healthy})$ and then averaged over the 30 independently trained SONIA models for each cohort. Error bars show the standard deviation of these estimates across the inferred SONIA models (STAR Methods).
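The bar heights and error bars in (G) amount to a mean and a standard deviation taken across independently trained models. A minimal sketch of that summary step follows; the gene labels and estimates are hypothetical placeholders, and the use of the sample standard deviation (rather than the population form) is an assumption, since the legend does not specify which was used.

```python
import statistics

def summarize_across_models(estimates_by_gene):
    """Return {gene: (mean, standard deviation)} across per-model estimates."""
    return {
        gene: (statistics.mean(values), statistics.stdev(values))
        for gene, values in estimates_by_gene.items()
    }

# Hypothetical per-model estimates of the mean difference in log-selection
# factors (three models shown here for brevity; placeholder gene labels).
estimates_by_gene = {
    "gene_A": [0.1, 0.2, 0.3],
    "gene_B": [-0.4, -0.4, -0.4],
}

summary = summarize_across_models(estimates_by_gene)
```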