2021 Apr 1;118(14):e2023141118. doi: 10.1073/pnas.2023141118

Copyright © 2021 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

Fig. 4. Cell type- and tissue-specific selection on TCRs. (A) Jensen–Shannon divergences ($D_{JS}$) (Eq. 8) computed from models trained on different subrepertoires are shown. (B) Difference in the marginal probability for amino acid composition along the CDR3, $P_{post}^{CD8}(a) - P_{post}^{CD4}(a)$, between CD8⁺ and CD4⁺ Tconv (Left) and the mean difference in the corresponding log-selection factors for amino acid usage, $\Delta \log Q = \log Q^{CD8} - \log Q^{CD4}$ (Right), are shown (the mean is taken over the distribution $(P_{post}^{CD8} + P_{post}^{CD4})/2$). The negatively charged amino acids (Aspartate, D, and Glutamate, E) and the positively charged amino acids (Lysine, K, and Arginine, R) are indicated in red and blue, respectively. Other amino acids are shown in gray. (C) Maximum likelihood inference of the fraction of CD8⁺ TCRs in mixed repertoires of conventional CD4⁺ T cells (Tconvs) and CD8⁺ cells from spleen (Eq. 4) is shown. Each repertoire comprises $5 \times 10^{3}$ unique TCRs. (D) Same as C but for a mixture of Tconv and Treg TCRs. (E) Mean squared error of the inferred sample fraction from C as a function of sample size $N$, averaged over all fractions, using models of increasing complexity: “$Q_{VJL}$” is a linear model with features for CDR3 length and VJ usage only, “linear” is the linear SONIA model, and “deep” is the full soNNia model (Fig. 1C). (F) ROC for classifying individual sequences as coming from CD8⁺ cells or from CD4⁺ Tconvs from spleen, using the log-likelihood ratios. Curves are generated by varying the threshold in Eq. 5. The accuracy of the classifier is compared with that of a traditional logistic classifier inferred on the same set of features as our selection models. The training set for the logistic classifier has $N = 3 \times 10^{5}$ Tconv CD4⁺ and $N = 8.7 \times 10^{4}$ CD8⁺ TCRs, and the test set has $N = 2 \times 10^{4}$ CD4⁺ and $N = 2 \times 10^{4}$ CD8⁺ TCR sequences.
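
The mixture-fraction inference in panels C to E can be illustrated with a minimal sketch. Eq. 4 is not reproduced on this page, so the likelihood below is an assumption: each sequence is taken to be drawn from $f\,P^{CD8} + (1-f)\,P^{CD4}$, and $f$ is chosen to maximize the summed log-likelihood, which depends on the data only through the per-sequence log-likelihood ratio $r = \log P^{CD8} - \log P^{CD4}$. The function names are hypothetical, and the ratios come from a toy pair of unit-variance Gaussians (means +1 and -1, for which $r = 2x$) standing in for the trained selection models; no real TCR data or soNNia calls are involved.

```python
import math
import random


def mixture_log_likelihood(f, log_ratios):
    """Log-likelihood of fraction f, up to a constant independent of f.

    Assumes each item is drawn from f * P_a + (1 - f) * P_b, with
    log_ratios[i] = log P_a(x_i) - log P_b(x_i).
    """
    return sum(math.log(f * math.exp(r) + (1.0 - f)) for r in log_ratios)


def infer_fraction(log_ratios, tol=1e-6):
    """Maximize the mixture log-likelihood over f in [0, 1].

    The objective is concave in f, so a ternary search is sufficient.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mixture_log_likelihood(m1, log_ratios) < mixture_log_likelihood(m2, log_ratios):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def simulate_log_ratios(n, true_fraction, rng):
    """Toy stand-in for model scores (an assumption, not TCR data).

    Class a ~ Normal(+1, 1), class b ~ Normal(-1, 1); for these two
    densities the log-likelihood ratio of a draw x is exactly 2 * x.
    """
    ratios = []
    for _ in range(n):
        mu = 1.0 if rng.random() < true_fraction else -1.0
        x = rng.gauss(mu, 1.0)
        ratios.append(2.0 * x)
    return ratios


if __name__ == "__main__":
    rng = random.Random(0)
    ratios = simulate_log_ratios(5000, 0.3, rng)
    print("inferred fraction:", infer_fraction(ratios))
```

In this toy setting the same ratio $r$ could also be thresholded to classify individual items, which is the idea behind the ROC curve in panel F; sweeping the threshold traces out the curve.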