Figure - PMC

2024 Oct 7;121(42):e2408696121. doi: 10.1073/pnas.2408696121

Copyright © 2024 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

Fig. 1. Overview of analysis methodology. (A) Sketch of the T cell receptor (TCR) structure highlighting the V, CDR3, and J regions and their interaction with MHC-bound peptides. The TCR is composed of two chains, most commonly the $α$ and $β$ chains. Each chain is generated by V(D)J recombination during T cell development, which combines a V (variable), a J (joining), and a C (constant) gene, with the addition of a D (diversity) gene in the $β$ chain. Within each chain, the CDR1 and CDR2 amino acid loops are encoded by the V gene, while the CDR3 region lies at the V(D)J junction, which is further diversified by the random insertion and deletion of nucleotides at gene template junctions. (B) An abstracted view of TCR sequence space. The set $B$ includes all possible TCRs. The subsets $S_i$ represent TCRs specific to particular ligands. (C) Sequencing TCRs from either the whole repertoire or epitope-specific subsets yields samples from their respective distributions. (D) The number of sequence pairs that match in a particular feature may then be recorded to compute a probability of coincidence. The negative logarithm of the probability of coincidence gives a measure of the entropy of the feature. Our information-theoretic approach quantifies the change in entropy between background TCRs and sets of specific TCRs for different features (Top to Bottom). Features that show a large reduction in entropy (Bottom) are the most informative for predicting the epitope specificity of a sequence.
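
The pair-counting idea in panel (D) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy V-gene labels, and the choice of base-2 logarithms are assumptions made here for illustration, and the sketch omits any finite-sampling corrections a real analysis might apply. It counts the fraction of sequence pairs that share a feature value, takes the negative logarithm as an entropy estimate, and reports the drop in entropy from a background sample to an epitope-specific sample.

```python
from collections import Counter
from math import log2


def coincidence_probability(values):
    """Fraction of distinct unordered pairs whose feature values match."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two sequences to form a pair")
    counts = Counter(values)
    # A value seen c times contributes c * (c - 1) / 2 matching pairs.
    matching_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return matching_pairs / total_pairs


def coincidence_entropy(values):
    """Entropy estimate in bits: negative log2 of the coincidence probability."""
    p_c = coincidence_probability(values)
    if p_c == 0:
        raise ValueError("no coincidences observed; the estimate is undefined")
    return -log2(p_c)


def entropy_reduction(background, specific):
    """Entropy lost when going from background to epitope-specific TCRs."""
    return coincidence_entropy(background) - coincidence_entropy(specific)


# Hypothetical toy data: one V-gene label per sequenced TCR.
background_v = ["V1", "V2", "V3", "V4", "V1", "V2", "V3", "V4"]
specific_v = ["V1", "V1", "V1", "V2"]

reduction = entropy_reduction(background_v, specific_v)
print(f"background entropy: {coincidence_entropy(background_v):.3f} bits")
print(f"specific entropy:   {coincidence_entropy(specific_v):.3f} bits")
print(f"entropy reduction:  {reduction:.3f} bits")
```

In this toy example the specific set is dominated by a single V gene, so its coincidence probability is higher and its entropy lower than in the background. Under the caption's reasoning, a feature with a larger reduction of this kind would be the more informative one for predicting specificity.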