Trimer-guided embedding for TCRs and derivation of RFUs
(A) Method workflow. The first three steps construct the trimer-embedding Euclidean space, and the last two steps describe how repertoire functional units (RFUs) are defined.
(B) Massive clustering of TCRs from patients with diverse health conditions based on CDR3 amino acid sequence similarity.
(C) Illustration of replaceable trimers from small TCR clusters.
(D) Illustration of the trimer substitution matrix, in which each entry is the number of times the row trimer is replaced by the column trimer within a TCR cluster.
(E) Derivation of an approximately isometric embedding for each trimer by multidimensional scaling of the trimer substitution matrix in (D).
(F) Representation of each CDR3 sequence in the high-dimensional Euclidean space by averaging the embeddings of all its consecutive trimers.
(G) RFU definition by pooling 1.2 million TCRs from 120 individuals, shown as a t-distributed stochastic neighbor embedding (t-SNE) plot. Colors denote distinct clusters, with cluster centroids assigned by k-means.
See also Figure S1 and Tables S1 and S2.
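
The averaging step in (F) and the centroid-based grouping in (G) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the two-dimensional trimer vectors, and the centroids are hypothetical toy values chosen for illustration. In the workflow above, the trimer vectors would come from the multidimensional scaling in (E) and the centroids from k-means on the pooled TCRs.

```python
from math import dist


def consecutive_trimers(cdr3):
    """Return all overlapping 3-residue windows of a CDR3 amino acid sequence."""
    return [cdr3[i:i + 3] for i in range(len(cdr3) - 2)]


def embed_cdr3(cdr3, trimer_vectors):
    """Embed a CDR3 as the component-wise mean of its trimer vectors."""
    vectors = [trimer_vectors[t] for t in consecutive_trimers(cdr3)]
    if not vectors:
        raise ValueError("CDR3 must contain at least three residues")
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical toy inputs (assumptions, not values from the study).
TOY_TRIMER_VECTORS = {
    "CAS": [0.0, 0.0],
    "ASS": [3.0, 0.0],
    "SSL": [0.0, 3.0],
}
TOY_CENTROIDS = [[1.0, 1.0], [10.0, 10.0]]

point = embed_cdr3("CASSL", TOY_TRIMER_VECTORS)  # mean of the CAS, ASS, SSL vectors
rfu_index = nearest_centroid(point, TOY_CENTROIDS)
print(point, rfu_index)
```

Because every CDR3 is reduced to the mean of its trimer vectors, sequences of different lengths map to points of the same dimension, which is what allows a fixed set of centroids to be applied across the whole pooled repertoire.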