Proceedings of the National Academy of Sciences of the United States of America. 2016 Jun 3;113(25):E3529–E3537. doi: 10.1073/pnas.1601012113

Diversity and divergence of the glioma-infiltrating T-cell receptor repertoire

Jennifer S Sims a, Boris Grinshpun b, Yaping Feng b,c, Timothy H Ung a,d, Justin A Neira a, Jorge L Samanamud a, Peter Canoll e, Yufeng Shen b,f,g,1, Peter A Sims b,g,h,1, Jeffrey N Bruce a,1
PMCID: PMC4922177  PMID: 27261081

Significance

High-throughput sequencing of T-cell receptor (TCR) repertoires provides a high-dimensional biomarker for monitoring the immune system. We applied this approach, measuring the extent to which the TCR repertoires of T-cell populations infiltrating malignant brain tumors diverge from their peripheral blood. Our analytical strategy separates the statistical properties of the repertoire derived from VJ cassette combination usage from the VJ-independent contribution that reflects the antigen-binding component of the receptor. We discovered a TCR signature strongly inversely correlated with the VJ-independent divergence between the peripheral and tissue-infiltrating repertoires of these patients. Importantly, this signature is detectable in peripheral blood and could serve as a means of noninvasively monitoring immune response in patients.

Keywords: T-cell receptor, immunoprofiling, glioma, glioblastoma, immunooncology

Abstract

Although immune signaling has emerged as a defining feature of the glioma microenvironment, how the underlying structure of the glioma-infiltrating T-cell population differs from that of the blood from which it originates has been difficult to measure directly in patients. High-throughput sequencing of T-cell receptor (TCR) repertoires (TCRseq) provides a population-wide statistical description of how T cells respond to disease. We have defined immunophenotypes of whole repertoires based on TCRseq of the α- and β-chains from glioma tissue, nonneoplastic brain tissue, and peripheral blood from patients. Using information theory, we partitioned the diversity of these TCR repertoires into that from the distribution of VJ cassette combinations and diversity due to VJ-independent factors, such as selection due to antigen binding. Tumor-infiltrating lymphocytes (TILs) possessed higher VJ-independent diversity than nonneoplastic tissue, stratifying patients according to tumor grade. We found that the VJ-independent components of tumor-associated repertoires diverge more from their corresponding peripheral repertoires than T-cell populations in nonneoplastic brain tissue, particularly for low-grade gliomas. Finally, we identified a “signature” set of TCRs whose use in peripheral blood is associated with patients exhibiting low TIL divergence and is depleted in patients with highly divergent TIL repertoires. This signature is detectable in peripheral blood, and therefore accessible noninvasively. We anticipate that these immunophenotypes will be foundational to monitoring and predicting response to antiglioma vaccines and immunotherapy.


The potential for immunotherapy to alleviate progression and recurrence in glioma has inspired intense study of the immunological phenotypes underlying the regulation and selection of tissue-infiltrating lymphocytes (TILs). Motivated by the prospect of targeted therapy, previous studies of glioma have undertaken large-scale genomic and gene expression analysis. These studies revealed distinct phenotypic states or subtypes that stratify gliomas and resemble different glial lineages (1–3). Although immunological gene expression classifications have been associated with clinical outcomes and prognosis (4, 5), precision immunotherapy will ultimately rely on manipulation of the T-cell population that infiltrates gliomas and its underlying repertoire of T-cell receptors (TCRs). However, the structure and intertumoral heterogeneity of the TIL population in gliomas have not been described. Here, we use whole repertoire sequencing of TCRs from glioma tissue and matched peripheral blood to discover novel immunological phenotypes with implications for noninvasive patient monitoring.

Local antitumor potential is, in part, a function of the TCR repertoire expressed by T cells that surveil the CNS beyond the blood–brain barrier. Through somatic V(D)J recombination, including random nucleotide insertion, in each T cell, the α- and β-chain loci each encode a complementarity-determining region 3 (CDR3) domain that interacts directly with target epitopes in the heterodimeric receptor. The ability to mount an adaptive immune response relies on the diversity of binding specificities conferred by these receptors, which determine functional activation, clonal expansion, and selection for individual T cells. Methods for repertoire-wide amplification and deep sequencing now allow massively parallel repertoire profiling (6) of the TCRs in whole populations of T cells (TCRseq) (7–9). Besides providing unprecedented insight into the determinants of TCR diversity in healthy individuals, previous applications of this strategy have provided new understanding of xenoreactivity in transplants, infection in human patients and animal models, and therapeutic response, residual disease, and relapse in cancer (for a review, see ref. 10).

We applied TCRseq to peripheral blood and tumor tissue samples from both low- and high-grade glioma patients. By combining V and J cassette use and amino acid sequence information, we describe the diversity and divergence of these populations. We found previously undescribed phenotypes characterized by the diversity of clonotypes in the TIL repertoire and its divergence from the peripheral repertoire. Importantly, we also discovered a signature set of TCRs in the peripheral repertoire that reflect its divergence from the TIL repertoire. These immunological phenotypes defined by the TCR repertoire offer new insights into intertumoral heterogeneity among low- and high-grade glioma patients and may provide useful insights for noninvasive monitoring of glioma progression and response to immunotherapy.

Results

TCRseq of Tumor-Infiltrating and Peripheral Repertoires in Glioma Patients.

We prepared α- and β-chain TCRseq libraries from biopsies of glioma and nonneoplastic brain tissue along with peripheral blood mononuclear cells (PBMCs) procured during surgery to compare TILs with the peripheral population from which they were derived (Fig. 1A). Subjects included three low-grade glioma (LGG), eight primary glioblastoma (GBM), and three nonglioma patients from which we obtained nonneoplastic brain tissue (Table S1). Briefly, from the total RNA of each 2.5 million PBMC sample or the mRNA of each fresh-frozen tissue biopsy, we used reverse transcription amplicon-rescue multiplexed PCR (arm-PCR) to produce dsDNA sequencing libraries for TCRα and TCRβ (Fig. 1A) (7, 11, 12), which were subjected to 2 × 220- or 2 × 250-base paired-end sequencing. After mapping the merged reads to V and J cassette genes in the reference genome and in silico translation, we described each TCRα and TCRβ repertoire of the PBMC and TIL of each patient as (i) the repertoire of VJ cassette combinations, (ii) the repertoire of antigen-binding amino acid motifs encoded by the CDR3 region (CDR3 amino acid sequences), and (iii) the repertoire of clonotypes—which includes all observed unique combinations of CDR3 amino acid sequences and VJ cassette combinations. To avoid overcounting as a result of, for example, sequencing error, reads were filtered on the basis of their nucleotide similarity to other reads (SI Materials and Methods).
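The three repertoire views described above (VJ cassette combinations, CDR3 amino acid sequences, and clonotypes) can be illustrated with a minimal Python sketch. It assumes each productive, error-filtered read has already been reduced to a `(V, J, CDR3)` tuple; the function name `tabulate_repertoires` and the toy reads are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

def tabulate_repertoires(reads):
    """Count reads per VJ combination, per CDR3 motif, and per clonotype.

    reads: iterable of (v_cassette, j_cassette, cdr3_amino_acids) tuples,
    one tuple per productive read (assumed input format).
    """
    vj = Counter()
    cdr3 = Counter()
    clonotypes = Counter()
    for v, j, aa in reads:
        vj[(v, j)] += 1              # (i) VJ cassette combination
        cdr3[aa] += 1                # (ii) CDR3 amino acid sequence
        clonotypes[(v, j, aa)] += 1  # (iii) clonotype = VJ + CDR3
    return vj, cdr3, clonotypes

# Toy reads for illustration only (not real data).
toy_reads = [
    ("TRBV5-1", "TRBJ2-7", "CASSLGQGYEQYF"),
    ("TRBV5-1", "TRBJ2-7", "CASSLGQGYEQYF"),
    ("TRBV5-1", "TRBJ2-7", "CASSPDRGYEQYF"),
    ("TRBV7-9", "TRBJ1-1", "CASSLGQGYEQYF"),
]
vj_counts, cdr3_counts, clonotype_counts = tabulate_repertoires(toy_reads)
```

In this toy input the same CDR3 motif appears with two different VJ combinations, so it counts as one CDR3 sequence but two clonotypes.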

Fig. 1.


TCRα and TCRβ repertoires from brain tissue and matched peripheral blood. (A) Gene-specific reverse transcription and PCR of the CDR3 region using pan-repertoire primers (V-J/C) was performed on mRNA of tissue biopsies and total RNA of ∼1 × 10^6 T cells, followed by bead purification and second amplification incorporating sequencing adapters. Libraries were purified by gel electrophoresis (example extracted regions in green) and sequenced (PE250). Following error analysis and read filtration (SI Materials and Methods), merged reads were translated in silico, and productive reads were tabulated by V and J cassette identity, amino acid CDR3 motif, and the combination of VJ cassette combinations and amino acid CDR3 used to identify unique clonotypes. (B) Correlation between the abundance detected in two sequencing runs of a single TCRseq library (L06 TIL TCRβ) of VJ combinations and CDR3 motifs. (C) Correlation between the abundance in two aliquots of PBMC from the same draw (H15) of VJ cassette combinations and CDR3 motifs. Pearson correlation coefficients between VJ cassette combination abundance (R = 0.97 for TCRα and 0.97 for TCRβ) and CDR3 motif abundance (R = 0.64 for TCRα and 0.72 for TCRβ). (D) The number of clonotypes (the unique combination of a V and J cassette pair with a CDR3 amino acid motif) observed in the PBMC of patients. Pearson correlation between log10(clonotypes) of TCRα and TCRβ is shown (R = 0.83, P = 2.4 × 10^−4). Label colors indicate clinical status (GBM, red; LGG, green; and nonneoplastic, black). (E) The Spearman correlation (ρ) of VJ combination frequencies between each TIL sample and its paired PBMC was calculated, and the median (ρ) of its correlation with all PBMCs was subtracted. Resulting values for each TIL library are displayed as a histogram. (F) Heat maps of the number of clonotypes (colorbars) occurring at frequencies (x axis) for TCRα (Left) and TCRβ (Right) chains of the PBMC and TIL of each patient (y axes). Unpopulated frequency bins are displayed in gray.

Table S1.

Clinical and histopathological classification of subjects

ID Pathology diagnosis Age, y M/F WBC DEX PTEN (IHC) p53 (IHC) EGFR (IHC) EGFR vIII IDH1 R132H MGMT methyl.
N01 Nonneoplastic (normal cortex) 79 M 6.2 0 N/A N/A N/A N/A N/A N/A
N02 Nonneoplastic (normal cortex) 45 M 9.5 0 N/A N/A N/A N/A N/A N/A
N03 Nonneoplastic (diffuse amyloid, consistent with aging) 80 M 8.4 0 N/A N/A N/A N/A N/A N/A
L04 Astrocytoma, grade II 33 F 4.5 0 (+) N/A N/A
L05 Oligodendroglioma, grade II 59 F 8.3 0 (+) (+) N/A + N/A
L06 Oligodendroglioma, grade II 25 M 15.8 5 (+) (+) + + +
G07 Glioblastoma, grade IV 67 M 17.2 4 (+) + + + +
G08 Glioblastoma, oligodendroglial component, grade IV 68 M 13.3 1 (+) (+) + + +
G09 Glioblastoma, grade IV 57 M 9.4 0 +/− + +
G10 Glioblastoma, oligodendroglial component, grade IV 36 M 16.7 0 (+) + + +
G11 Glioblastoma, grade IV 53 M 14.2 5 (+) + + N/A
G12 Glioblastoma, grade IV 78 F 18.2 1 (+) (+) + + +
G13 Glioblastoma, grade IV 53 M 11.4 1 (+) + N/A
G14 Glioblastoma, grade IV 60 F 12.1 1 (+) + + +
H15 None (healthy subject) 57 M N/A N/A PBMC only

DEX: Days on dexamethasone steroid therapy before surgery. Immunohistochemistry (IHC) and PCR genotyping were performed through the Department of Pathology at New York Presbyterian Hospital per diagnostic standards. EGFR (epidermal growth factor receptor) expression by IHC: + expression, (+) weak expression, − no expression. EGFRvIII (truncation mutation) by PCR: + mutation detected, − no vIII mutation detected. IDH1-R132H (isocitrate dehydrogenase R132H point mutation) by IHC: (+) mutation detected, (−) mutation not detected. M/F: male/female. MGMT (O6-methylguanine-DNA methyltransferase) gene promoter methylation by bisulfite DNA sequencing: + methylation detected, − methylation not detected. N/A, data not available. PTEN (phosphatase and tensin homolog) expression by IHC: + positive, − negative, (+) weak positive, +/− mixture. p53 (expression) by IHC: + positive, − negative, (+) weak positive. WBC: white blood cell count, performed by clinical laboratory at New York Presbyterian Hospital on the day of or day before surgery for glioma patients, up to 1 wk prior for nonneoplastic patients.

The quantitative reproducibility of the TCR repertoires detected by this platform was assessed first between sequencing runs of the same prepared library (Fig. 1B), yielding strong correlations for both VJ cassette combinations and CDR3 amino acid sequences, as expected, and in different aliquots of PBMC from the same blood draw, which were similarly well-correlated at the level of VJ cassette combinations. Because different PBMC aliquots will sample different TCRs, particularly for low-abundance clones, correlations between aliquots at the CDR3 amino acid sequence level were strong (Fig. 1C) but significantly lower than those of the technical replicates in Fig. 1B.

The total number of clonotypes observed was strongly correlated between TCRα and TCRβ across patients in PBMC libraries (Fig. 1D, Pearson R = 0.83, P = 2.4 × 10^−4). Although most PBMC repertoires possessed at least one clonotype at a very high frequency (>5%, Dataset S1), our results agreed with previous observations in healthy individuals (13, 14) that the vast majority of clones appear at low frequencies (Fig. 1F): only 19 or fewer clonotypes appeared at a frequency >1% in the PBMC of any patient. Nevertheless, the vast majority of reads came from high-frequency clonotypes (Fig. S1 C and D). In the TIL repertoires, representing T cells infiltrating various regions of the brain parenchyma (Fig. S2A), the number of clonotypes detected was 2- to ∼100-fold lower than in their respective PBMC libraries (Dataset S1 and Fig. S1B), with no significant correlation to sequencing coverage (Fig. S1A). Consistent with shared patient origins, VJ cassette combination use in TIL repertoires was more strongly correlated with that of their matching PBMC repertoires than with those of the overall cohort (Fig. 1E).

Fig. S1.


Distribution of clonotypes and sequencing read abundance. (A) For TIL repertoires (Top) and PBMC repertoires (Bottom), the relationship between the number of reads and the number of unique clonotypes is plotted for TCRα (red) and TCRβ (blue), with Pearson correlations shown (TIL TCRα R = 0.21, P = 0.47 and TCRβ R = 0.18, P = 0.53; PBMC TCRα R = 0.53, P = 0.05 and TCRβ R = 0.42, P = 0.13). (B) The number of clonotypes (unique combinations of VJ and CDR3 amino acid motifs) observed in the TILs of patients. The Pearson correlation between the number of TCRα and TCRβ clonotypes across patients was R = 0.96 (P = 4.3 × 10^−8), and that of log10(clonotypes) was R = 0.63 (P = 0.015, shown). Label colors indicate clinical status (GBM, red; LGG, green; and nonneoplastic, black). (C) Read abundance is distributed across clonotype frequencies; that is, although most unique clonotypes occur at the lowest frequencies (Fig. 1F), the lowest-frequency clonotypes do not comprise the largest fraction of the repertoire for either the PBMC or TIL repertoires. Heat maps of the fraction of reads from bins of clonotype frequencies (x axes) for TCRα (Left) and TCRβ (Right) chains of the PBMC and TIL of each patient (y axes). Unpopulated frequency bins are displayed in gray, and all others contain the fraction of reads indicated by each colorbar. (D) For each patient, lines are plotted for the cumulative fraction of reads (0–1, y axes) across the clonotype frequency bins, illustrating the proportion of the overall repertoire included upon filtering for clonotype frequency at these levels.

Fig. S2.


Clinical features of T-cell populations. (A) Immunoperoxidase staining for CD3 labels infiltrating T lymphocytes scattered throughout the tumor tissue (patient L06, 20× objective shown, inset images zoomed 2×). (B) The Spearman correlation between VJ-independent divergence (JSMΔ,corr, TCRα and TCRβ averaged) and peripheral blood leukocyte count (white blood cell titer, in millions of cells per mL) was not significant (rho = 0.04, P = 0.90). (C) The Spearman correlation between VJ-independent divergence (JSMΔ,corr, TCRα and TCRβ averaged) and duration of corticosteroid treatment (dexamethasone) before surgery was calculated for all patients (rho = 0.23, P = 0.43) and for only the group who received dexamethasone (DEX+, rho = 0.37, P = 0.29); neither correlation was significant. Patients who did not receive dexamethasone at all before surgery (DEX−) are shown with hollow markers. (D) The Spearman correlation was calculated between JSMΔ,corr of TCRα and TCRβ (averaged) and the expression of markers of the vascular endothelium (normalized counts by RNA-seq): PECAM-1 (rho = 0.48, P = 0.09), HBB (rho = 0.21, P = 0.46), CDH5 (rho = −0.43, P = 0.12), and VEGFA (rho = 0.29, P = 0.32).

Diversity of the Glioma-Infiltrating TCR Repertoire.

The vast diversity of the TCR repertoire and its responsiveness to stimuli through amplification of individual T-cell clones serve both as the basis for cell-mediated adaptive immunity and as a readout of those responses in an individual (15, 16). We used information theory to describe the TCR repertoires of the TIL and PBMC at a population-wide level. As a measure of diversity, we calculated the Shannon entropy, H—a function of the number of unique elements (e.g., VJ cassette combinations) in the population and their frequencies—for VJ cassette combinations, CDR3 amino acid sequences, and clonotypes (Dataset S2). As described previously, the entropy of each individual TCR can be described as the sum of the contributions from different parts of its sequence (17). Because the entropy of the repertoire is a separable function of those TCRs, we sought to partition the diversity of the population into components that reflect the factors affecting the distribution of these distinct components of the receptor (18), simplified as (i) the genome-encoded V and J cassette sequences and (ii) the VJ-independent sequence. These components are affected differentially by many factors (for a review see ref. 19). For example, epigenetically biased cassette use (20), age (21), and postdevelopmental enrichment for functional lineages (e.g., CD8+) (22) strongly influence the frequency of VJ cassette combinations in T-cell populations. However, as described in Fig. 2A, the binding affinity between CDR3 and an antigen depends on amino acids that are either not encoded by or not specific to the V or J cassette used by the receptor (the VJ-independent component). Thus, we describe the entropy of the clonal population, Hclonotypes as the sum of these contributions:

$$H_{\mathrm{clonotypes}} = H_{VJ} + H_{\Delta}, \tag{1}$$

where HVJ is the entropy of the distribution of VJ cassette combinations and HΔ is the entropy from the VJ-independent component. HVJ varied among patients and between the PBMC and TIL of individuals (Fig. 2B and Fig. S6), with smaller contributions from VJ-independent diversity among TIL than in PBMC (Fig. 2C and Dataset S2). HΔ was highly correlated between TCRα and TCRβ across patients for both the PBMC and TIL repertoires (Fig. 2D) and saturated at the coverage of our TCRseq libraries (Fig. S3 A and B). TILs from nonneoplastic tissue were distinguished by a very low proportion of Hclonotypes contributed by HΔ (average 5.7% for TCRα and 15.2% for TCRβ) compared with those from glioma tissue, with an average of 15.8% for TCRα and 31.8% for TCRβ (Fig. 2D, Top and Fig. S3C), although this trend was not observed in the PBMC (Fig. 2D, Bottom and Fig. S3D).
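The partition in Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming that the VJ-independent entropy is obtained by subtracting the VJ entropy from the clonotype entropy and that entropies are in bits (log base 2); the function names and the toy counts are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Shannon entropy (bits, by assumption) of a mapping item -> read count."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)

def entropy_partition(clonotype_counts):
    """Return (H_clonotypes, H_VJ, H_delta) for counts keyed by (V, J, CDR3)."""
    vj_counts = Counter()
    for (v, j, _cdr3), n in clonotype_counts.items():
        vj_counts[(v, j)] += n  # marginalize clonotypes onto VJ combinations
    h_clonotypes = shannon_entropy(clonotype_counts)
    h_vj = shannon_entropy(vj_counts)
    return h_clonotypes, h_vj, h_clonotypes - h_vj

# Toy repertoire: two VJ combinations, one of which encodes two CDR3 motifs.
toy = {("V1", "J1", "A"): 2, ("V1", "J1", "B"): 2, ("V2", "J1", "C"): 4}
h_clonotypes, h_vj, h_delta = entropy_partition(toy)
```

For the toy repertoire the clonotype entropy is 1.5 bits and the VJ entropy is 1 bit, leaving 0.5 bits for the VJ-independent component. Because the VJ distribution is a coarse-graining of the clonotype distribution, the difference cannot be negative.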

Fig. 2.


Entropy-based dissection of TCR diversity. (A) The components of the Shannon entropy of a single CDR3 sequence are displayed, reflecting the VJ-dependent (light gray) and VJ-independent (dark gray) origins of the information given by each residue, including the addition of random nucleotides upon the joining of cassettes and the replacement of some cassette-encoded nucleotides with random bases (dashed lines). Although all amino acids in the CDR3 motif participate in antigen binding (dotted connectors), some are determined entirely by V and J cassette sequence, whereas others are entirely independent. (B) The frequencies of VJ combinations in select PBMC (Left) and TIL (Right) TCRα repertoires are displayed in circular plots, with frequency of each V or J cassette represented by its arc length and that of the VJ cassette combination by the width of the joining ribbon. (C) The entropy of the clonotype repertoire (Hclonotype) for the PBMC (Left) and TIL (Right) of each patient (x axis) is displayed as the sum of the entropy of the VJ repertoire (HVJ, light gray) and the VJ-independent entropy (HΔ, dark gray). (D) For each patient, the VJ-independent entropy (HΔ) of the TIL (Top) and PBMC (Bottom) repertoires was calculated, as well as the Pearson correlations between TCRα (x axis) and TCRβ (y axis) across patients (PBMC R = 0.86, P = 7.6 × 10^−5; TIL R = 0.92, P = 3.2 × 10^−6). Label colors indicate clinical status (GBM, red; LGG, green; and nonneoplastic, black). (E) To illustrate the relationship between the VJ-independent entropy of the clonotype repertoire and the CDR3 amino acid diversity of each VJ cassette combination, these combinations were plotted for each patient according to the number of CDR3 motifs encoded (x axis) and the normalized Shannon entropy of those CDR3 (y axis), with colors of each VJ cassette combination indicating clinical status of the patient (GBM, red; LGG, green; and nonneoplastic, black; TCRα and TCRβ are displayed together).

Fig. S6.


Circular visualization of V and J cassette repertoire distributions. Circular plots for each pair of PBMC and TIL TCRα (Left) and TCRβ (Right) repertoires were generated using a customized Circos package then grouped by subject and color-coded by clinical status (black, nonneoplastic; green, LGG; and red, GBM). V cassettes appear in the upper arc of each circle and are colored red to green; J cassettes in the lower arc of the circle are colored purple to blue (see color legends), with the arc length of each V or J cassette on the edge representing its frequency and the ribbon between them the frequency of the VJ combination.

Fig. S3.


VJ-independent entropy (HΔ) of TIL and PBMC repertoires. The error-filtered, merged paired-end reads of each PBMC (Left) and TIL (Right) TCRseq library were downsampled to 0.1%, 1%, and 10–90% (x axis). (A) The VJ-independent entropy (HΔ = Hclonotype − HVJ, y axes) is graphed for each down-sampled TCRα and TCRβ repertoire. (B) The number of unique clonotypes detected (y axes) at each sample size (x axes) is graphed for each down-sampled TCRα and TCRβ repertoire. (C) The VJ-independent diversity (HΔ) of the TCRα (Left) and TCRβ (Right) TIL repertoires is plotted by clinical status (GBM, red; LGG, green; and nonneoplastic, black), with brackets and P values indicating the significance by two-sample t test of the difference between (i) nonneoplastic and all glioma patients, (ii) nonneoplastic and GBM patients, and (iii) nonneoplastic and LGG patients. (D) The VJ-independent diversity (HΔ) of the TCRα (Left) and TCRβ (Right) PBMC repertoires is plotted by clinical status, with the significance of differences between clinical groups shown. (E) For each VJ combination in each TIL (C) or PBMC (D) repertoire, the number and frequency of each CDR3 amino acid motif encoded was used to calculate the entropy of the CDR3 distribution (HCDR3) for that VJ combination. The mean HCDR3 for all VJ combinations in each repertoire (y axes) is compared with the HΔ for that repertoire. Pearson correlation was significant across patients for both TCRα (red; PBMC, Right R = 0.85, P = 1.1 × 10^−4; TIL, Left R = 0.85, P = 1.3 × 10^−4) and TCRβ (blue; PBMC, Right R = 0.88, P = 3.2 × 10^−5; TIL, Left R = 0.94, P = 5.3 × 10^−7).

Although HVJ, HΔ, and Hclonotypes represent the diversity of the whole TCR repertoire, we can use the same formalism to assess the diversity within each VJ cassette combination. A given combination of a V and a J cassette will often encode multiple CDR3s in the repertoire, which gives that cassette combination a certain amount of entropy. For each VJ combination observed in each PBMC and TIL repertoire, we calculated the entropy imparted by the diversity of CDR3 amino acid sequences encoded (Fig. 2E). This analysis is analogous to spectratyping the products of certain cassette-specific primer pairs to assess clonal diversity (23). We observed both higher numbers (x axis) and broader distributions (normalized HCDR3, y axis) of CDR3 amino acid sequences encoded by each VJ combination in glioma tissue than in nonneoplastic tissue (Fig. 2E, Top), consistent with the comparatively higher HΔ of glioma TIL repertoires (Fig. S3E). Neither the HΔ (Fig. 2D, Bottom) nor the distribution of CDR3 motifs among VJ cassette combinations in the PBMC repertoires stratified the patients by clinical status (Fig. 2E, Bottom). The repertoire-wide property HΔ, as a measure of the component of CDR3 diversity most closely tied to antigen binding function, facilitates quantitative comparison between TCR repertoires. Here, we observed increasing VJ-independent diversity in the TIL repertoires, but not the matched PBMC, of patients with increasing glioma grade, suggestive of the diversified antigen targeting associated with active T-cell responses in both cancer and infectious diseases (24–26).
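The per-combination analysis described above can be sketched as follows. The text does not state how HCDR3 was normalized, so this sketch assumes division by log2 of the number of CDR3 motifs encoded by the VJ combination (giving a value between 0 and 1); that normalization, the function names, and the toy counts are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(counts):
    """Shannon entropy in bits of a mapping item -> read count."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)

def per_vj_cdr3_entropy(clonotype_counts):
    """For each VJ combination, return (n_motifs, H_CDR3, normalized H_CDR3).

    clonotype_counts: mapping (V, J, CDR3) -> read count.
    Normalization by log2(n_motifs) is an assumption, not the paper's stated method.
    """
    by_vj = defaultdict(Counter)
    for (v, j, cdr3), n in clonotype_counts.items():
        by_vj[(v, j)][cdr3] += n
    result = {}
    for vj, cdr3_counts in by_vj.items():
        n_motifs = len(cdr3_counts)
        h = shannon_entropy(cdr3_counts)
        h_norm = h / math.log2(n_motifs) if n_motifs > 1 else 0.0
        result[vj] = (n_motifs, h, h_norm)
    return result

toy = {("V1", "J1", "A"): 2, ("V1", "J1", "B"): 2, ("V2", "J1", "C"): 4}
per_vj = per_vj_cdr3_entropy(toy)
```

In the toy repertoire, the V1/J1 combination encodes two equally frequent motifs (normalized entropy 1), whereas V2/J1 encodes a single motif (entropy 0).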

VJ-Independent Divergence of TIL and PBMC Repertoires Distinguishes Glioma Patients.

We sought to determine how the composition of the TIL repertoire differed from the peripheral blood and whether these differences might distinguish T cells infiltrating tumor compared with nonneoplastic tissue. Comparison of VJ cassette combination and CDR3 amino acid sequence use between the PBMC and TIL of each patient revealed that more than half of each patient’s composite VJ cassette combination repertoire was present in both tissues (average 59.5% for TCRα and 71.9% for TCRβ, Fig. S4A), whereas very few CDR3 amino acid sequences were shared (average 1.9% for TCRα and 1.6% for TCRβ, Fig. S4B), as expected given the larger potential diversity of the CDR3 amino acid sequences (17) and limited sampling in each experiment.
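The sharing comparison above amounts to set arithmetic on the items observed in each tissue. The sketch below assumes that the "composite" repertoire is the union of items seen in PBMC and TIL and that the shared percentage is the intersection divided by that union; this reading, the function name, and the toy sets are assumptions.

```python
def overlap_summary(pbmc_items, til_items):
    """Count PBMC-only, TIL-only, and shared items (VJ combinations or CDR3 motifs)."""
    pbmc, til = set(pbmc_items), set(til_items)
    shared = pbmc & til
    composite = pbmc | til  # assumed definition of the composite repertoire
    return {
        "pbmc_only": len(pbmc - til),
        "til_only": len(til - pbmc),
        "shared": len(shared),
        "shared_fraction": len(shared) / len(composite) if composite else 0.0,
    }

# Toy VJ combinations for one hypothetical patient.
toy_pbmc_vj = [("V1", "J1"), ("V2", "J1"), ("V3", "J2")]
toy_til_vj = [("V1", "J1"), ("V2", "J1"), ("V4", "J2")]
summary = overlap_summary(toy_pbmc_vj, toy_til_vj)
```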

Fig. S4.


Divergence of PBMC and TIL repertoires and clustering of patients by peripheral blood CDR3. (A and B) Number of VJ cassette combinations (A) and CDR3 amino acid motifs (B) observed in PBMC repertoire only (blue), TIL repertoire only (red), or both TIL and PBMC (shared, purple) for each patient (x axis). (Inset) This number on a log10 scale to illustrate the values for shared and TIL-only CDR3 motifs. (C and D) The Jensen–Shannon divergence (JS) between the PBMC (y axis) and TIL (x axis) VJ cassette combination repertoires (C) or CDR3 repertoires (D) is shown in the heat maps (colorbars), with sample pairs from each patient annotated (black boxes). Each square in the heat map represents the average of JS(PBMC,TIL) for TCRα and TCRβ. (E) The VJ-independent divergence, JSMΔ,corr(PBMC,TIL), of the TCRα (Left) and TCRβ (Right) repertoires of each patient is plotted by clinical status (GBM, red; LGG, green; and nonneoplastic, black), with brackets and P values indicating the significance by two-sample t test of the difference between (i) nonneoplastic and all glioma patients, (ii) nonneoplastic and GBM patients, and (iii) nonneoplastic and LGG patients. (F) MDS of patients using the frequency of the combined top use CDR3 motifs in PBMC (11,638 TCRα, 13,561 TCRβ). The scaled distance between each patient is displayed along coordinates 1 and 2 (Left), 2 and 3 (Center), and 1 and 3 (Right). Colors represent the fraction of signature CDR3 motifs (average of TCRα and TCRβ) observed in each patient’s PBMC. (G) Correlation between the number of clonotypes and the fraction of signature CDR3 motifs (Fig. 4) observed in PBMC libraries for TCRα (Left, Pearson correlation R = 0.69, P = 0.006) and TCRβ (Right, Pearson correlation R = 0.82, P = 3.1 × 10^−4). (H) The VJ-independent divergence, JSMΔ,corr(PBMC,TIL), of the two patient groups observed by hierarchical clustering in Fig. 4 was compared by Wilcoxon rank sum test for both TCRα (Left) and TCRβ (Right), with P = 0.002 in both cases.

To resolve the differences between the PBMC and TIL, we calculated Jensen–Shannon divergence (JS) and its related metric (JSM), derived from the joint entropy of two distributions, which characterize the difference in the frequencies of their members (Dataset S2). JS and JSM are bounded by zero (when two distributions are identical) and one (where two distributions are completely nonoverlapping) and have been widely applied to the comparison of TCR repertoires (13, 14, 27). Inferring that the distinctiveness of the TIL population might therefore be resolved by separating the VJ-dependent and VJ-independent components, we described the components of the divergence of the TIL clonotypes repertoire from the PBMC, JSclonotypes as follows:

$$JS_{\mathrm{clonotypes}}(\mathrm{PBMC},\mathrm{TIL}) = JS_{VJ}(\mathrm{PBMC},\mathrm{TIL}) + JS_{\Delta}(\mathrm{PBMC},\mathrm{TIL}), \tag{2}$$

where JSVJ is the Jensen–Shannon divergence of their VJ combination repertoires and JSΔ represents the VJ-independent divergence between PBMC and TIL (detailed in SI Materials and Methods). These divergences were computed for each PBMC–TIL pair for TCRα and TCRβ (Fig. 3A, Fig. S4 C and D, and Dataset S2).
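The decomposition in Eq. 2 can be illustrated with a short Python sketch. The exact procedure is given in SI Materials and Methods, which is not reproduced here, so the sketch assumes the standard Jensen–Shannon divergence with base-2 logarithms (bounded by 0 and 1) and takes the VJ-independent term as the difference between the clonotype-level and VJ-level divergences. Function names and toy counts are hypothetical.

```python
import math
from collections import Counter

def _entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def js_divergence(a_counts, b_counts):
    """Jensen-Shannon divergence (base 2) between two count mappings."""
    total_a, total_b = sum(a_counts.values()), sum(b_counts.values())
    keys = list(set(a_counts) | set(b_counts))
    p = [a_counts.get(k, 0) / total_a for k in keys]
    q = [b_counts.get(k, 0) / total_b for k in keys]
    m = [0.5 * (x + y) for x, y in zip(p, q)]  # mixture distribution
    return _entropy(m) - 0.5 * (_entropy(p) + _entropy(q))

def vj_marginal(clonotype_counts):
    """Collapse (V, J, CDR3) counts onto (V, J) combinations."""
    vj = Counter()
    for (v, j, _cdr3), n in clonotype_counts.items():
        vj[(v, j)] += n
    return vj

def js_partition(pbmc, til):
    """Return (JS_clonotypes, JS_VJ, JS_delta), with JS_delta taken by subtraction."""
    js_clonotypes = js_divergence(pbmc, til)
    js_vj = js_divergence(vj_marginal(pbmc), vj_marginal(til))
    return js_clonotypes, js_vj, js_clonotypes - js_vj

# Toy pair: identical VJ usage, partially overlapping CDR3 motifs.
toy_pbmc = {("V1", "J1", "A"): 1, ("V1", "J1", "B"): 1}
toy_til = {("V1", "J1", "A"): 1, ("V1", "J1", "C"): 1}
js_clonotypes, js_vj, js_delta = js_partition(toy_pbmc, toy_til)
```

In the toy pair the VJ distributions are identical, so all of the divergence (0.5) is assigned to the VJ-independent component.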

Fig. 3.


VJ-independent divergence distinguishes TIL of glioma from nonneoplastic tissue. (A) The VJ-dependent (light gray) and -independent (dark gray) Jensen–Shannon divergence, JS(PBMC,TIL), was calculated for each patient (x axis). (B) Statistical divergence between PBMC and TIL due to population size was simulated using a control population (PBMC′), randomly sampled from the PBMC repertoire, containing the same number of clonotypes as the TIL (NTIL), and the JSMΔ between PBMC and PBMC′ was subtracted from JSMΔ between the PBMC and TIL, yielding JSMΔ,corr(PBMC,TIL). (C) Correlation between JSMΔ,corr(PBMC,TIL) of TCRα and TCRβ across patients (Pearson R = 0.86, P = 8.5 × 10^−5). (D) The average JSMΔ,corr of the α and β chains is plotted with colors indicating clinical status (GBM, red; LGG, green; and nonneoplastic, black). (E) The JSMΔ,corr(PBMC,TIL) (average of TCRα and TCRβ) of each patient was compared with the difference between the VJ-independent diversity HΔ of the TIL and of the PBMC (Pearson R = 0.84, P = 1.7 × 10^−4).

Some of the divergence between PBMC and TIL repertoires can be explained simply by the differences in repertoire size. To correct for this and extract differences due to composition, we calculated JSclonotypes, JSVJ, and thus JSΔ between the PBMC repertoire and a down-sampled repertoire (PBMC′) containing the same number of clonotypes as the TIL repertoire, randomly sampled from the PBMC distribution (Fig. 3B). The comparison between the VJ-independent divergence of PBMC–TIL and PBMC–PBMC′ could be made using the JSM, which is a distance metric defined as

$$JSM = \sqrt{JS}. \tag{3}$$

Thus, we write the VJ-independent Jensen–Shannon divergence metric between the PBMC and TIL, corrected for population size, as

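$$JSM_{\Delta,\mathrm{corr}}(\mathrm{PBMC},\mathrm{TIL}) = JSM_{\Delta}(\mathrm{PBMC},\mathrm{TIL}) - JSM_{\Delta}(\mathrm{PBMC},\mathrm{PBMC}'), \tag{4}$$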

where $JSM_{\Delta}(\mathrm{PBMC},\mathrm{TIL}) = \sqrt{JS_{\Delta}(\mathrm{PBMC},\mathrm{TIL})}$ and $JSM_{\Delta}(\mathrm{PBMC},\mathrm{PBMC}') = \sqrt{JS_{\Delta}(\mathrm{PBMC},\mathrm{PBMC}')}$ for the down-sampled control population PBMC′ (per Eqs. 2 and 3). Because $JSM_{\Delta,\mathrm{corr}} = 0$ when the PBMC repertoire is compared with a random sample of its own TCR distribution, positive values indicate more divergence associated with the VJ-independent component than expected from random selection.
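The size correction in Eq. 4 can be sketched as below. The text specifies only that PBMC′ is randomly sampled from the PBMC distribution and contains the same number of clonotypes as the TIL repertoire; this sketch assumes reads are drawn with replacement, in proportion to PBMC frequency, until that number of distinct clonotypes has been observed. That sampling scheme, the single random draw, and all function names are assumptions, and the TIL repertoire is assumed to be nonempty.

```python
import math
import random
from collections import Counter

def _entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def js_divergence(a_counts, b_counts):
    """Jensen-Shannon divergence (base 2) between two count mappings."""
    total_a, total_b = sum(a_counts.values()), sum(b_counts.values())
    keys = list(set(a_counts) | set(b_counts))
    p = [a_counts.get(k, 0) / total_a for k in keys]
    q = [b_counts.get(k, 0) / total_b for k in keys]
    m = [0.5 * (x + y) for x, y in zip(p, q)]
    return _entropy(m) - 0.5 * (_entropy(p) + _entropy(q))

def vj_marginal(clonotype_counts):
    vj = Counter()
    for (v, j, _cdr3), n in clonotype_counts.items():
        vj[(v, j)] += n
    return vj

def jsm_delta(a, b):
    """Square root of the VJ-independent divergence (Eqs. 2 and 3)."""
    js_delta = js_divergence(a, b) - js_divergence(vj_marginal(a), vj_marginal(b))
    return math.sqrt(max(js_delta, 0.0))  # clamp tiny negative rounding errors

def downsample_to_n_clonotypes(pbmc, n_target, rng):
    """Draw reads from the PBMC distribution until n_target distinct clonotypes appear."""
    keys = list(pbmc)
    weights = [pbmc[k] for k in keys]
    n_target = min(n_target, len(keys))  # cannot exceed the PBMC clonotype count
    sample = Counter()
    while len(sample) < n_target:
        sample[rng.choices(keys, weights=weights, k=1)[0]] += 1
    return sample

def jsm_delta_corrected(pbmc, til, seed=0):
    """Eq. 4: JSM_delta(PBMC, TIL) minus JSM_delta(PBMC, PBMC')."""
    rng = random.Random(seed)
    pbmc_prime = downsample_to_n_clonotypes(pbmc, len(til), rng)
    return jsm_delta(pbmc, til) - jsm_delta(pbmc, pbmc_prime)

# Toy repertoires sharing VJ usage but no CDR3 motifs.
toy_pbmc = {("V1", "J1", "A"): 5, ("V1", "J1", "B"): 3, ("V1", "J1", "C"): 2}
toy_til = {("V1", "J1", "X"): 4, ("V1", "J1", "Y"): 1}
corrected = jsm_delta_corrected(toy_pbmc, toy_til, seed=1)
```

A single draw of PBMC′ is noisy; averaging the control term over several seeds would be a natural refinement, although the text does not say whether the authors did so.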

To evaluate the VJ-independent divergence between peripheral blood and TIL populations, differentially exposed to local and tumor antigens, we calculated JSMΔ,corr(PBMC,TIL) for both TCRα and TCRβ of all patients (Fig. 3C and Dataset S2). VJ-independent divergence was below expectation (negative) for the three nonneoplastic patients, whereas the divergence of all glioma TIL was above expectation (positive), with high Pearson correlation between the α- and β-chains (R = 0.86, P = 8.5 × 10^−5). The GBM patients displayed heterogeneity, with VJ-independent divergence ranging from slightly to strongly above expectation, and all three LGG patients exhibited high divergence (Fig. 3D). JSMΔ,corr(PBMC,TIL) had no significant correlation with overall white blood cell count in peripheral blood or presurgical steroid therapy (Fig. S2 B and C), suggesting that this variability did not reflect lymphopenia or steroid-induced immunosuppression. Furthermore, no significant relationship was found between VJ-independent divergence and gene expression markers of vascularization (Fig. S2D). Although higher divergence of glioma TIL overall (Fig. S4E) supported the inferred link between the binding-associated TCR repertoire and novel tumor antigens, JSMΔ,corr(PBMC,TIL) also allowed quantitative comparison of VJ-independent divergence between patients, revealing that the TILs of LGG patients were among the most divergent, despite the lower clinical grade of tumor.

To confirm our interpretation that this divergence metric, JSMΔ,corr, represents the degree to which the VJ-independent diversity of the TIL repertoire exceeds that predicted by its PBMC, we compared JSMΔ,corr(PBMC,TIL) to the difference between the VJ-independent diversity of the TIL and of the PBMC [HΔ(TIL) − HΔ(PBMC)] for each patient (Fig. 3E). These values correlated strongly (Pearson R = 0.84, P = 1.7 × 10⁻⁴), indicating that high JSMΔ,corr reflects diversity in the local T-cell population that (i) distinguishes it from the peripheral blood and (ii) is associated with its VJ-independent component. Thus, the proportions of diversity and divergence associated with the amino acid binding function of the TIL TCR repertoire, rather than its underlying VJ cassette combination distribution, may provide insight into differential selective pressure in the local tumor microenvironment and may indicate its capacity to respond to antigens endemic to the tissue.

A Common CDR3 Signature in Peripheral Blood Is Associated with Divergence of TIL from PBMC Repertoire.

Perturbations to the peripheral T-cell population have been linked to intratumoral immune status in glioma progression and therapy (28–31), implying the potential for noninvasive clinical monitoring through TCRseq. Therefore, we investigated the properties of the peripheral blood repertoires in the context of clinical status, in particular, whether TCRseq-defined immunophenotypes of the TIL corresponded to detectable signatures in the PBMC. We identified the 1,000 highest-abundance amino acid CDR3 motifs (C…FGXG) from the peripheral blood of each patient, compiling an inclusive top use list for TCRα (11,368) and TCRβ (13,561) with frequencies ≥10⁻⁵. For both chains, hierarchical clustering of a binary matrix indicating the presence or absence of these CDR3 amino acid sequences across PBMC samples segregated the patients into two major groups: patients distinguished by common use of a subset of CDR3 amino acid sequences (1,242 TCRα, 84 TCRβ, Dataset S3) and a cluster in which these same sequences were less commonly observed (Fig. 4 A and B, dendrograms). The patient groupings were consistent between the two TCR chains (Fig. 4 A and B, zoomed panels). Similar patient clustering resulted from multidimensional scaling (MDS) of the frequencies of the top 1,000 CDR3 amino acid sequences (Fig. S4F). These “signature” CDR3s were expressed in a distinctively high number of patients compared with the total top use sets (Fig. 4 A and B, histograms).

Fig. 4.


Use of highly shared CDR3 motifs in peripheral TCR repertoire predicts VJ-independent divergence of PBMC and TIL. (A and B) Hierarchical clustering of top use TCRα and TCRβ CDR3 amino acid motifs. The top 1,000 CDR3 amino acid motifs from each of the 14 PBMC repertoires were compiled into a high-use list of 11,368 TCRα and 13,561 TCRβ with frequencies ≥10⁻⁵. Patients were hierarchically clustered by the presence or absence of these CDR3 motifs (y axes) across samples (x axes), revealing a subset of subjects (group 2) in which a cluster of 1,242 TCRα and 84 TCRβ motifs were commonly used (red dendrogram clusters) and a cluster of subjects in which these motifs were less frequently observed (group 1). Histograms show the number of samples in which each CDR3 was observed, with the 1,242 signature TCRα (median = 9) and 84 TCRβ (median = 8) highlighted in red, and the total top use set shown in black (median = 1, both chains). (C) The correlation between the VJ-independent entropy (HΔ) of PBMC repertoires and the fraction of the signature CDR3s they contain (average of TCRα and TCRβ for each patient) is plotted (Pearson R = 0.93, P = 1.7 × 10⁻⁶). (D) The correlation between VJ-independent divergence, JSMΔ,corr(PBMC,TIL), and the VJ-independent entropy (HΔ) of the PBMC (average of TCRα and TCRβ) is plotted (Pearson R = −0.71, P = 0.0041). (E) The VJ-independent divergence, JSMΔ,corr(PBMC,TIL), of each patient is plotted against the fraction of signature CDR3 motifs observed in the PBMC repertoire (average of TCRα and TCRβ), revealing significant anticorrelation (Pearson R = −0.79, P = 7.2 × 10⁻⁴).

The occurrence of these signature amino acid CDR3s correlated with the total number of clonotypes observed in the PBMC repertoires (Fig. S4G), which is a function of both the number of cells included in the library and the intrinsic diversity of the repertoire from which they are drawn. Because there was little variation in the number of cells used to produce each PBMC library (Dataset S1), we hypothesized that the high fractional use of these CDR3s reflected repertoire diversity, particularly VJ-independent entropy. Indeed, the fraction of signature CDR3s observed in each PBMC repertoire was significantly correlated with its HΔ (Fig. 4C, Pearson correlation R = 0.93, P = 1.7 × 10⁻⁶), tying the presence of these motifs to VJ-independent diversity. VJ-independent divergence, JSMΔ,corr, was strongly anticorrelated with HΔ(PBMC) (Fig. 4D, Pearson correlation R = −0.71, P = 0.0041) and was significantly lower among patient group 2, with high use of the signature CDR3s (Fig. S4H). The relationship between the use of this signature subset of CDR3 motifs and HΔ(PBMC) allows this diversity signature to function as a peripherally accessible correlate of the divergence between the TIL and PBMC repertoires (Fig. 4E, Pearson correlation R = −0.79, P = 7.2 × 10⁻⁴).

In our dataset, the “signature” CDR3 motifs were strongly represented in healthy individuals (Fig. S5 D and E), which cluster in group 2. Previous studies have characterized the TCR repertoires in the peripheral blood of healthy individuals (13), and so we tested whether these datasets would also cluster with group 2. Fig. S5 F–I shows that 6/6 TCRβ profiles and 5/6 TCRα profiles from healthy individuals obtained in a previously reported study cluster with group 2 through expression of CDR3s that strongly overlap with our signature CDR3 motifs. Thus, the relationship between signature CDR3 use and a highly diverse peripheral repertoire (high HΔ), which coincided with low VJ-independent divergence in our tissue-paired cohort, was robust among the PBMC of healthy subjects. Not only is this signature of repertoire diversity and low PBMC-TIL divergence amenable to noninvasive monitoring, it requires only detection and identification of the CDR3s, rather than precise quantification, at a sampling depth readily accessible from peripheral blood.

Fig. S5.


Distribution of CDR3 motifs of the low-divergence signature and previously published specificities. (A) Overlap among 10,691 public TCRβ CDR3 amino acid motifs described by Britanova et al. (21), 37,463 TCRβ CDR3 compiled from several studies for association with common pathogens, and our signature set (74 TCRβ CDR3, SI Materials and Methods). (B) (Left) Significantly higher numbers of signature (Top), public (Middle), and pathogen-associated CDR3s (Bottom) were observed in the TCRβ PBMC repertoires of group 2 patients than of group 1 patients (P = 1 × 10⁻³ for all CDR3 sets, Wilcoxon rank sum). (Right) The top use TCRβ CDR3 motifs (as in Fig. 4B) were filtered on the signature set (Top), the Britanova et al. (21) public set (Middle), and the pathogen-associated CDR3 set (Bottom) and patients hierarchically clustered by their presence and absence. (C) Among CDR3 motifs previously associated with targeting myelin-basic protein in MS patients (9 TCRα, 81 TCRβ; SI Materials and Methods), six were observed in our cohort (frequency ≥10⁻⁵), as indicated by patient (x axis) and tissue (blue, PBMC; red, TIL; and purple, both). (D) The fraction of the signature CDR3 (1,241 TCRα, 74 TCRβ) observed in PBMC of group 1 (black), group 2 (red), and additional healthy human subjects (blue), as well as the public and pathogen-associated sets (gray). (E) Signature CDR3 motifs were observed in TCRβ repertoires of healthy individuals sequenced using a different method in two studies (black lines, SI Materials and Methods) at frequencies ≥10⁻⁶ (heat map). (F and G) The top 1,000 most abundant CDR3 motifs (frequency ≥10⁻⁵) were compiled from the 14-patient cohort, subject H15, and additional healthy subjects (16,564 TCRα, 20,395 TCRβ) and hierarchically clustered by presence or absence of these CDR3 motifs (y axes) across samples (x axes). (H and I) Overlap between the TCRα and TCRβ CDR3 clusters stratifying the 21 PBMC repertoires (F and G) and the signature sets (Left). Hierarchical clustering of the paired brain tissue cohort with respect to the presence or absence of these overlapping CDR3 motifs recapitulated the initial group 1 and group 2 (Right).

A recent study from Britanova et al. (21) identified CDR3 motifs that are shared among many individuals and associated with diverse repertoires. Given that the signature CDR3 motifs in our study are also associated with the diverse repertoires of noncancer patients and certain GBM patients, we reasoned that there might be significant overlap between these two sets of CDR3 motifs. Because the previous study reported only TCRβ motifs, we compared them to our TCRβ signature CDR3s (Table S2, Dataset S3, and SI Materials and Methods). Indeed, 62/74 signature CDR3 motifs from our study appear in the Britanova et al. (21) list (∼83%), whereas only 632/13,693 top clones that are not among the signature CDR3s appear in the list (<5%). Although we found that the Britanova et al. (21) CDR3 motifs did not cluster our patients as effectively as the signature CDR3s reported here (Fig. S5), the remarkable overlap between these two sets of motifs provides valuable insight into the potential generality of these clones for assessing immune response to disease.

Table S2.

Enrichment of previously reported TCRβ CDR3 motif sets among “top usage” TCRβ CDR3 motifs, reported for both signature and nonsignature motifs

| TCRβ CDR3 motifs | Group 2 signature CDR3 motifs | Nonsignature CDR3 motifs | Total top TCRβ CDR3 motifs |
|---|---|---|---|
| “Public” clones (10,691 motifs, ref. 21) | 62/74 | 632/13,693 | 694/13,767 |
| Tetanus toxoid (12,865 motifs) | 6/74 | 71/13,693 | 77/13,767 |
| M. tuberculosis (1,823 motifs) | 2/74 | 12/13,693 | 14/13,767 |
| Influenza A (8 motifs) | 0/74 | 2/13,693 | 2/13,767 |
| HSV (3 motifs) | 0/74 | 7/13,693 | 7/13,767 |
| C. albicans (22,506 motifs) | 20/74 | 137/13,693 | 157/13,767 |
| CMV (193 motifs) | 0/74 | 7/13,693 | 7/13,767 |
| EBV (65 motifs) | 0/74 | 6/13,693 | 6/13,767 |
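
The enrichment percentages quoted in the text follow directly from the counts in Table S2. As a minimal sketch (the dictionary layout and function name are ours, chosen only for illustration), the signature and nonsignature fractions can be recomputed as follows:

```python
# Counts copied from Table S2:
# (hits among the 74 signature motifs, hits among the 13,693 nonsignature motifs)
TABLE_S2 = {
    "Public clones (ref. 21)": (62, 632),
    "Tetanus toxoid": (6, 71),
    "M. tuberculosis": (2, 12),
    "C. albicans": (20, 137),
}
N_SIGNATURE = 74
N_NONSIGNATURE = 13_693


def enrichment(hits_signature, hits_nonsignature):
    """Return (% among signature, % among nonsignature, fold difference)."""
    pct_sig = 100.0 * hits_signature / N_SIGNATURE
    pct_non = 100.0 * hits_nonsignature / N_NONSIGNATURE
    return pct_sig, pct_non, pct_sig / pct_non


for name, (sig, non) in TABLE_S2.items():
    pct_sig, pct_non, fold = enrichment(sig, non)
    print(f"{name}: {pct_sig:.1f}% vs {pct_non:.2f}% ({fold:.0f}-fold)")
```

This reproduces only the descriptive fractions; it does not attempt a significance test, which the table does not report.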

One potential source of shared or “public” CDR3 motifs among individuals arises from exposure to common pathogens. There have been several reports of CDR3 motifs associated with pathogens such as Clostridium tetani (toxoid), Candida albicans, Mycobacterium tuberculosis, HSV, CMV, EBV, and influenza A (32–40). We investigated the overlap of the signature CDR3 motifs with these pathogen-associated motifs for TCRβ (Dataset S3). We found enrichment of our signature CDR3 motifs in those associated with tetanus toxoid (8.1% among “signature,” 0.5% among nonsignature), M. tuberculosis (2.7% vs. 0.08%), and C. albicans (27% vs. 1%) (Table S2). Similar to the “public” motifs from Britanova et al. (21), these pathogen-associated motifs were strongly represented in the PBMC repertoires (Dataset S4) but did not cluster our patients as effectively as the signature CDR3 motifs identified here (Fig. S5). Exposure to common pathogens could nonetheless contribute to the prevalence of these CDR3 motifs in the population.

SI Materials and Methods

Lymphocyte Sampling Estimates.

Estimates of the fraction of glioma cellularity comprised by TILs vary, with 4–10% reported by Hitchcock and Morris (49), although differences in tumor grade and type as well as methodology have produced some ambiguity with regard to total numbers as well as relative CD8+ and CD4+ frequencies. Tumor tissue samples from which mRNA was isolated in this study were ≤300 mg, or ∼1 × 10⁶ total cells, predicted to contain ≤1 × 10⁵ lymphocytes.

Up to ∼1 × 10⁶ unique TCRβ sequences have been observed among ∼1–3 × 10⁷ T cells from 20 mL of human blood, with sequencing coverage saturating at <2 × 10⁸ (53). Although exhaustive sequencing of a subject may reveal additional, extremely low-frequency TCRs, we sought statistical representation of rare clones without fully saturating their sequence identities, given limitations on available blood volumes in the context of craniotomies. Thus, 2.5 × 10⁶ PBMCs were included in each library, with a goal of sequencing up to ∼1 × 10⁶ T cells, with the exception of G14 (∼7 × 10⁵ PBMCs) due to the availability of material, and we targeted 4.5–10× sequence coverage of ∼1 × 10⁶ peripheral T cells (Dataset S1).

Although long-term corticosteroid treatment is known to decrease stimulatory T-cell counts, the patients included in this study received dexamethasone steroid therapy for six or fewer days before surgery, and WBCs measured in their diagnostic laboratory tests did not indicate a relationship between leukocyte count and the number of days on dexamethasone before surgery among glioma patients (Spearman rho = 0.31, P = 0.36). Our major entropy-based repertoire phenotype, VJ-independent divergence [JSMΔ,corr(PBMC,TIL), discussed below] did not correlate with either WBC (Spearman correlation rho = 0.04, P = 0.90, Fig. S2B) or days on dexamethasone (rho = 0.23, P = 0.43, Fig. S2C).

Read Filtering by Sequence Error Rate.

Because sequencing error inevitably generates spurious unique CDR3 sequences by mutating the joining regions between cassettes for which we have no reference sequence, the procedure described above will inevitably result in overcounting the number of unique clones observed in each sample. To correct for this, we first estimated the error rate within a given sample based on the median mismatch rate across the V- and J-cassette sequences, for which we have a reference. The number of spurious reads n containing E errors generated by a clone with N reads can be estimated by the Poisson distribution

$$n(E,N,\epsilon) = N\,\frac{(\epsilon L)^{E}\,e^{-\epsilon L}}{E!},$$ [S1]

where L is the length of the nonreference sequence and ε is the error rate. For each observed clone, we estimated n for E = 1, 2, and 3 (beyond this there were negligible spurious reads given our coverage). We then searched the dataset for sequences that had a Hamming distance H = 1, 2, or 3 from the clone of interest and removed n(E = H) reads associated with these sequences. If n(E = H) exceeded the number of reads with a Hamming distance H, then all of the reads with a Hamming distance H were discarded.
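
To make the correction concrete, the following minimal Python sketch evaluates Eq. S1 and applies the cap described above. The function names and the numerical values in the example (error rate, length, read counts) are hypothetical and chosen only for illustration; whether the expected count was rounded before subtraction is not specified here, so the sketch leaves it as a real number.

```python
import math


def expected_spurious_reads(n_errors, clone_reads, error_rate, length):
    """Eq. S1: expected reads carrying exactly n_errors errors,
    generated by a clone with clone_reads reads."""
    lam = error_rate * length  # expected errors per read in the nonreference region
    return clone_reads * lam ** n_errors * math.exp(-lam) / math.factorial(n_errors)


def reads_to_remove(n_errors, clone_reads, error_rate, length, neighbor_reads):
    """Reads to subtract from sequences at Hamming distance n_errors.

    If the expectation exceeds the reads actually observed at that
    distance, all of them are discarded (the cap described in the text).
    """
    expected = expected_spurious_reads(n_errors, clone_reads, error_rate, length)
    return min(neighbor_reads, expected)


# Hypothetical example: a clone with 10,000 reads, error rate 0.5% per base,
# 40 nonreference bases, and 50 reads observed at each Hamming distance.
for h in (1, 2, 3):
    print(h, reads_to_remove(h, 10_000, 0.005, 40, neighbor_reads=50))
```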

V- and J-Cassette Mapping and Clonotype Resolution.

Reads were initially mapped to the GRCh37 human reference genome, with the β-chain on chromosome 7 masked (location 7q34 91557–667340) and replaced by a patch from GenBank (GRCh37.p8), which offered the most up-to-date sequence at the time. Our V- and J-cassette reference contained the set of amino acid sequences and CDR3-associated motifs, including pseudogenes, provided by the ImMunoGeneTics (54) information system, for a comprehensive analysis. Sequences were translated into all three reading frames, and the correct CDR3 sequence was identified based on the conserved C…FGXG motif spanning the cassettes in frame. For the TCRα and TCRβ chains, we report (i) the repertoire of VJ cassette combinations, (ii) the repertoire of amino acid motifs encoded by the CDR3 region (CDR3 amino acid sequences), and (iii) the repertoire of clonotypes, defined as all unique combinations of CDR3 amino acid sequences and VJ cassette combinations (Dataset S1). The distributions contain a high number of low-frequency clonotypes (Fig. 1F) but reads distributed over a wide range of clonotype frequencies (Fig. S1 C and D), giving rise to distributions in which an average of 86.7% of TCRα reads and 82.1% TCRβ reads map to the top 10% of clonotypes in each repertoire.
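
The frame-selection step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the regular expression stands in for the conserved C…[F/W]GXG motif and does not use the ImMunoGeneTics cassette references, and the read in the example is a hypothetical sequence constructed for illustration.

```python
import re

# Standard genetic code, with codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

# Simplified stand-in for the conserved motif: a Cys, then any residues
# without a stop codon, ending in [F/W]GXG.
CDR3_PATTERN = re.compile(r"C[^*]*?[FW]G[^*]G")


def translate(dna, frame):
    """Translate dna from offset frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def find_cdr3(dna):
    """Return (frame, motif) for each reading frame whose translation has the motif."""
    hits = []
    for frame in range(3):
        match = CDR3_PATTERN.search(translate(dna, frame))
        if match:
            hits.append((frame, match.group(0)))
    return hits


# Hypothetical read: one leading base, then codons spelling CASSRGSYEQYFGPG.
read = "A" + "TGTGCCAGCAGCCGTGGCAGCTACGAGCAGTACTTCGGGCCGGGC"
print(find_cdr3(read))
```

In this example only the second reading frame (offset 1) contains the motif, which is how the in-frame CDR3 is distinguished from the two out-of-frame translations.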

We note that using the unique combination of VJ and CDR3 amino acid motif from the reference-corrected nucleotide sequences to identify clonotype minimizes the impact of single-nucleotide differences arising from sequence error but disallows polymorphism in the portions of the CDR3 that are mapped—this clonotype identifier thus provides a conservative estimate of species distinguished by single-nucleotide variation, only calling a clonotype unique if the mutated site is nonsilent and occurs in the cassette-joining region.

Circular Visualization of V and J Cassette Use.

Visual plots of the relative abundance of V and J cassettes, and their combinations, for each TCR repertoire were generated using the Circos software package (55). The plot construction consists of karyotype and link files generated in Python and includes a custom color palette with predefined colors for each cassette. Band size is determined by individual cassette abundance in the sample, and ribbon width and transparency depend on abundance of the VJ cassette combination. Template CONF files and Python scripts are available at https://github.com/bgrinshpun/CircosVJ.

Calculating VJ-Dependent and -Independent Diversity and Divergence.

We selected Shannon entropy (H) as our fundamental measure of the diversity within each TCR population, which has been widely used in previous TCR repertoire studies, for its inclusion of both richness (number of members) and diversity (evenness of distribution). The entropy of the nucleotide sequence of any individual TCR may be described as the sum of the entropy contributed from different sources: the V- and J-cassette sequences, nucleotides deleted from these upon joining, and random nucleotides inserted at this join (17, 18). We calculated entropy for each TCR repertoire at the level of CDR3 amino acid motifs, VJ cassette combinations, and clonotypes (Dataset S2). All three repertoire entropies HCDR3, HVJ, and Hclonotype were very stable to downsampling to <40% the size of the original libraries (Fig. S3 A and B).

We were interested in computing the distance between two populations of T cells in PBMC (P) and TIL (Q). Specifically, we would like to estimate the contribution to this distance that is independent of VJ combination use. We can define the Shannon entropy of a population as follows:

$$H(P) = -\sum_i p_i \log_2 p_i.$$ [S2]

We can also define the total entropy Hclonotype such that each member i of the population is a clone that is uniquely identified by a VJ cassette combination and CDR3 amino acid sequence. Similarly, we can define HVJ such that each member j of the population is uniquely identified by a VJ cassette combination. There are many amino acids in each clone that are not strictly defined by the VJ cassette combination due to the insertions in the joining region between cassettes and deletions at nonjoining ends of each cassette. We can take advantage of the additivity of entropies implied by Eq. S2 to estimate the VJ-independent contribution HΔ:

$$H_{\Delta} \approx H_{\mathrm{clonotype}} - H_{VJ}.$$ [S3]
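
A minimal Python sketch of Eqs. S2 and S3 is given here for orientation. It assumes, purely for illustration, that a repertoire is stored as a dictionary mapping (V cassette, J cassette, CDR3 amino acid sequence) to a read count; the function names and the toy repertoire are hypothetical. When every clonotype is a (VJ, CDR3) pair, the difference computed this way equals the conditional entropy of the CDR3 given the VJ combination, by the chain rule for entropy.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Eq. S2: Shannon entropy in bits of a collection of counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_components(clonotype_counts):
    """Return (H_clonotype, H_VJ, H_delta) for a dict {(v, j, cdr3): reads}."""
    vj_counts = Counter()
    for (v, j, _cdr3), reads in clonotype_counts.items():
        vj_counts[(v, j)] += reads  # collapse clonotypes onto VJ combinations
    h_clonotype = shannon_entropy(clonotype_counts.values())
    h_vj = shannon_entropy(vj_counts.values())
    return h_clonotype, h_vj, h_clonotype - h_vj  # Eq. S3


# Toy repertoire: four equally abundant clonotypes spread over two VJ combinations.
toy = {
    ("V1", "J1", "CASSAF"): 10,
    ("V1", "J1", "CASSBF"): 10,
    ("V2", "J1", "CASSCF"): 10,
    ("V2", "J1", "CASSDF"): 10,
}
print(entropy_components(toy))
```

For the toy repertoire, two bits of clonotype entropy split into one bit from the VJ distribution and one bit that is VJ independent.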

Having defined these entropies, we will use them to compute the Jensen–Shannon divergence (JS), generally defined as

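$$JS(P,Q) = \frac{1}{2}\sum_i p_i \log_2 p_i + \frac{1}{2}\sum_i q_i \log_2 q_i - \sum_i \left(\frac{p_i+q_i}{2}\,\log_2\!\left(\frac{p_i+q_i}{2}\right)\right),$$ [S4]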

which can be rewritten for JSclonotype, JSVJ, and JSΔ and in terms of their entropies:

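$$JS_{\mathrm{clonotype}}(P,Q) = H_{\mathrm{clonotype}}\!\left(\frac{p_i+q_i}{2}\right) - \frac{1}{2}H_{\mathrm{clonotype}}(P) - \frac{1}{2}H_{\mathrm{clonotype}}(Q)$$ [S5]

$$JS_{VJ}(P,Q) = H_{VJ}\!\left(\frac{p_i+q_i}{2}\right) - \frac{1}{2}H_{VJ}(P) - \frac{1}{2}H_{VJ}(Q)$$ [S6]

$$JS_{\Delta}(P,Q) = H_{\Delta}\!\left(\frac{p_i+q_i}{2}\right) - \frac{1}{2}H_{\Delta}(P) - \frac{1}{2}H_{\Delta}(Q).$$ [S7]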

TIL repertoires possessed consistently lower VJ cassette combination divergence, JSVJ(P,Q), from their paired PBMC than from the PBMC of the other patients (Fig. S4C), similar to these comparisons made by Spearman correlation in Fig. 1E. The divergence in amino acid CDR3 motifs between paired TIL and PBMC, JSCDR3(P,Q), were likewise much lower than those between the amino acid CDR3 repertoires of unpaired samples (Fig. S4D), motivating comparison of the VJ-independent component of divergence, JSΔ(P,Q), across patients.

Using Eq. S3, we can rewrite Eq. S7 as

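$$JS_{\Delta}(P,Q) = H_{\mathrm{clonotype}}\!\left(\frac{p_i+q_i}{2}\right) - \frac{1}{2}H_{\mathrm{clonotype}}(P) - \frac{1}{2}H_{\mathrm{clonotype}}(Q) - H_{VJ}\!\left(\frac{p_i+q_i}{2}\right) + \frac{1}{2}H_{VJ}(P) + \frac{1}{2}H_{VJ}(Q),$$ [S8]

which simplifies to

$$JS_{\Delta}(P,Q) = JS_{\mathrm{clonotype}}(P,Q) - JS_{VJ}(P,Q).$$ [S9]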

Hence, we can estimate the divergence between the brain tissue and blood T-cell populations that is VJ cassette combination-independent by taking advantage of the additivity of entropies. One disadvantage of Eq. S9 is that the estimate of JSΔ that results is not a metric. The square root of the Jensen–Shannon divergence, known as the Jensen–Shannon divergence metric (JSM), is a metric and can be treated as a distance (56). We can define the VJ-independent component of the estimated metric as

$$\mathrm{JSM}_{\Delta}(P,Q) = \sqrt{JS_{\mathrm{clonotype}}(P,Q) - JS_{VJ}(P,Q)},$$ [S10]

which we report in Dataset S2. We also make some sampling corrections to the divergence. In particular, we correct for the fact that the brain tissue harbors significantly fewer clones than the blood by down-sampling the blood population to the size of the brain population, yielding a distribution P′ in which the number of clones in P′ equals the number of clones in the observed brain tissue population Q. We then assess the divergence between the down-sampled blood population and the blood population. We then report the corrected metric JSMΔ,corr:

$$\mathrm{JSM}_{\Delta,\mathrm{corr}}(P,Q) = \sqrt{JS_{\mathrm{clonotype}}(P,Q) - JS_{VJ}(P,Q)} - \sqrt{JS_{\mathrm{clonotype}}(P,P') - JS_{VJ}(P,P')}.$$ [S11]
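
The chain from Eq. S4 to Eq. S11 can be sketched in a few lines of Python. This is an illustrative reading rather than the analysis code: repertoires are assumed to be dictionaries mapping (V, J, CDR3) to integer read counts, all function names are hypothetical, and the down-sampling rule (drawing reads without replacement until the number of distinct clonotypes matches that of Q) is our assumption about how P′ could be constructed.

```python
import math
import random
from collections import Counter


def entropy(freqs):
    """Shannon entropy in bits of an iterable of frequencies (Eq. S2)."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def js(p_counts, q_counts):
    """Jensen-Shannon divergence written as H(M) - H(P)/2 - H(Q)/2 (Eqs. S5, S6)."""
    p, q = normalize(p_counts), normalize(q_counts)
    mix = [(p.get(k, 0.0) + q.get(k, 0.0)) / 2 for k in set(p) | set(q)]
    return entropy(mix) - entropy(p.values()) / 2 - entropy(q.values()) / 2


def to_vj(counts):
    """Collapse (v, j, cdr3) clonotype counts onto (v, j) cassette combinations."""
    vj = Counter()
    for (v, j, _cdr3), reads in counts.items():
        vj[(v, j)] += reads
    return vj


def jsm_delta(p_counts, q_counts):
    """Eq. S10: square root of JS_clonotype - JS_VJ."""
    js_delta = js(p_counts, q_counts) - js(to_vj(p_counts), to_vj(q_counts))
    return math.sqrt(max(js_delta, 0.0))  # clamp guards against rounding error only


def downsample_to_clone_count(counts, n_clones, rng):
    """ASSUMPTION: draw reads without replacement until n_clones clonotypes are seen."""
    reads = [clone for clone, n in counts.items() for _ in range(n)]
    rng.shuffle(reads)
    sample = Counter()
    for clone in reads:
        sample[clone] += 1
        if len(sample) >= n_clones:
            break
    return sample


def jsm_delta_corr(p_counts, q_counts, rng):
    """Eq. S11: JSM_delta(P, Q) minus JSM_delta(P, P')."""
    p_prime = downsample_to_clone_count(p_counts, len(q_counts), rng)
    return jsm_delta(p_counts, q_counts) - jsm_delta(p_counts, p_prime)


# Toy PBMC (P) and TIL (Q) repertoires with made-up counts.
pbmc = {("V5", "J2", "CASSAF"): 50, ("V5", "J2", "CASSBF"): 30, ("V7", "J1", "CASSCF"): 20}
til = {("V5", "J2", "CASSAF"): 10, ("V7", "J2", "CASSDF"): 40}
print(jsm_delta_corr(pbmc, til, random.Random(0)))
```

Because collapsing clonotypes onto VJ combinations is a coarse-graining, the Jensen–Shannon divergence cannot increase under it, so the quantity under the square root is nonnegative apart from floating-point error.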

Comparisons of these properties between patient groups (nonneoplastic = 3, LGG = 3, GBM = 8) were performed in MATLAB by two-sample t test (“ttest2”) with unequal variance (Welch’s t test).
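
For readers working outside MATLAB, the statistic underlying this comparison can be sketched with the Python standard library. The function name and the example values are hypothetical; the sketch returns only the t statistic and the Welch–Satterthwaite degrees of freedom, and a P value would additionally require the t-distribution (for instance via `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    var_a = variance(a) / len(a)  # squared standard error of each group mean
    var_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up values for two groups of unequal size (not data from this study).
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0, 8.0]
print(welch_t(group_a, group_b))
```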

Histopathology.

Immunoperoxidase staining with anti-CD3 antibody (Dako Biosciences, manufacturer’s protocol) and hematoxylin counterstain was performed on 4-µm-thick sections of formalin-fixed, paraffin-embedded tissue (Fig. S2A), as described and shown for GBM in our previous studies (30). Micrographs were taken using ImageJ software.

Gene Expression Profiling.

We investigated the relationship between JSMΔ,corr(PBMC,TIL) and vascularization of the brain parenchyma, which could lead to the inclusion of T cells among the TIL libraries representative of the circulating, not infiltrating, population. Total RNA from the tissue RNA isolation (described above) with a high RNA integrity score (>7) was processed by the Columbia Genome Center and sequenced (30 million single-end 100 base reads) on an Illumina HiSeq 2500. Demultiplexed sequence data were mapped to the human genome and transcriptome (hg19, Illumina iGenomes annotation) using Tophat 2. Uniquely mapped reads were quantified using HTSeq. Normalized counts are reported for genes of interest, specifically, marker genes associated with vascular endothelium or whole blood: platelet endothelial cell adhesion molecule (PECAM-1), hemoglobin (HBB), vascular endothelium cadherin (CDH5), and vascular endothelial growth factor A (VEGFA). Correlations of the RNA expression levels of these marker genes with JSMΔ,corr(PBMC,TIL) across patients are shown in Fig. S2D. The raw sequence reads in fastq format along with a table of read counts for each sample have been deposited in the Gene Expression Omnibus database (accession no. GSE79338).

Clustering by Peripheral CDR3 Amino Acid Motifs.

We compiled the 1,000 highest-abundance TCRα and TCRβ CDR3 motifs from each peripheral blood library with frequency ≥10⁻⁵ and created binary matrices of the presence or absence of these 11,368 TCRα and 13,561 TCRβ CDR3 motifs across all PBMC samples, then performed hierarchical clustering (Euclidean distance) in MATLAB (R2014b), producing the clustergrams in Fig. 4 A and B. The upper cluster of the CDR3 motif dendrograms (red brackets) contained 1,242 TCRα and 84 TCRβ, which we designated “signature” CDR3 motifs (Dataset S3). The fraction of these signature motifs present in each PBMC repertoire was tabulated and correlated with the number of clonotypes in PBMC, JSMΔ,corr(PBMC,TIL), and HΔ(PBMC) in MATLAB (Fig. S4G). VJ-independent divergence, JSMΔ,corr(PBMC,TIL), of the two groups of patients defined by the hierarchical clustering was significantly different for both chains (Fig. S4H).
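
The construction of the presence/absence matrix that feeds the clustering can be sketched in Python as follows. The data layout (a dictionary of per-sample motif frequencies), the function names, and the toy samples are hypothetical. The sketch also assumes that a motif counts as present in a sample when it is observed there at or above the frequency threshold, which is one possible reading of the procedure. It stops at the pairwise Euclidean distances; a linkage step (for example `scipy.cluster.hierarchy.linkage`) would be applied to them to obtain dendrograms.

```python
import math


def top_motifs(freqs, n_top=1000, min_freq=1e-5):
    """Highest-abundance motifs of one sample that pass the frequency threshold."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return {motif for motif, f in ranked[:n_top] if f >= min_freq}


def presence_matrix(samples, n_top=1000, min_freq=1e-5):
    """Return (motifs, {sample: 0/1 row}) over the union of per-sample top motifs.

    ASSUMPTION: 'present' means observed in the sample at frequency >= min_freq.
    """
    tops = [top_motifs(f, n_top, min_freq) for f in samples.values()]
    motifs = sorted(set().union(*tops))
    matrix = {
        name: [1 if freqs.get(m, 0.0) >= min_freq else 0 for m in motifs]
        for name, freqs in samples.items()
    }
    return motifs, matrix


def euclidean(u, v):
    """Euclidean distance between two binary rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy samples with made-up CDR3 frequencies.
samples = {
    "P1": {"CASSAF": 0.6, "CASSBF": 0.4},
    "P2": {"CASSAF": 1.0},
}
motifs, matrix = presence_matrix(samples)
print(motifs, matrix, euclidean(matrix["P1"], matrix["P2"]))
```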

MDS was performed using the frequencies of the 11,368 TCRα and 13,561 TCRβ top use CDR3 motifs, using the “cmdscale” function (MATLAB 2014b) on the pairwise Euclidean distances between the log2(frequencies). These distances are displayed in Fig. S4F, demonstrating the similarity of patient clustering by amino acid CDR3 frequency to the hierarchical clustering shown in Fig. 4, and the correlation between the fraction of signature CDR3 motifs and the frequency-based MDS clustering.

Comparison of Signature CDR3 Amino Acid Motifs to Public and Target-Associated TCRs.

Shared CDR3 motif use may arise through intrinsic, independently arising commonalities in repertoire generation across individuals, bolstered by selection on a broad repertoire of autoantigens during development (15), or may represent convergence, due to positive selection by common pathogens over the lifespan (33).

To understand the use of the signature CDR3 set, anticorrelated with VJ-independent divergence, JSMΔ,corr(PBMC,TIL), we explored the commonality of these CDR3 motifs among healthy human subjects, and their association with known disease reactivities. We compared the CDR3 amino acid motifs in our divergence-anticorrelated signature to a set of public CDR3 motifs (10,691 TCRβ) whose abundance across 39 healthy blood donors correlated with overall repertoire diversity and anticorrelated with age (21) and to a set of 37,463 unique TCRβ amino acid sequences reported in association with microbial pathogens (Fig. S5A and Dataset S3):

  • i) Memory T cells identified by tetramer binding to CMV or EBV antigens (57);

  • ii) TCRβ chains statistically associated with CMV-positive patients (58);

  • iii) TCRs isolated from blood and synovial fluid with binding affinity for dominant EBV epitopes (34);

  • iv) TCRs associated with EBV, CMV, influenza A, and HSV (15, 33, 35–38), as highlighted in ref. 39; and

  • v) Helper T cells from healthy donors differentially expanded ex vivo in response to stimulation with C. albicans (22,507, n = 5), M. tuberculosis (1,822, n = 2), or tetanus toxoid (12,865, n = 4). A list of CDR3s compiled from those observed in only one of these stimulus groups is given in ref. 32.

CMV-specific CDR3s were of particular interest due to their reported role in graft rejection and antitumor immunity (59), evidence that expansion of CMV-specific T cells leads to diversity “depletion” in the repertoire (40), and relevance to T-cell surveillance in the CNS. For collective analysis of these differently processed datasets, all CDR3 amino acid sequences were truncated to include only the first residue of the J gene-encoded 3′ [F/W]GXG motif, as was the most common annotation of the minimal CDR3 motifs in the literature and filtered for length greater than five residues, leaving 1,241 unique TCRα and 74 unique TCRβ in our divergence signature set.
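
The harmonization step can be sketched in Python as follows. The function names are hypothetical, and two details are our assumptions rather than stated procedure: when the [F/W]GXG pattern occurs more than once the last occurrence is taken as the J-encoded motif, and sequences without the pattern are left unchanged on the presumption that they are already in minimal form.

```python
import re

# J gene-encoded 3' motif: Phe or Trp, Gly, any residue, Gly.
J_MOTIF = re.compile(r"[FW]G[A-Z]G")


def truncate_cdr3(seq):
    """Trim a CDR3 so that it ends at the first residue of the [F/W]GXG motif."""
    matches = list(J_MOTIF.finditer(seq))
    if not matches:
        return seq  # ASSUMPTION: already annotated in minimal form
    return seq[: matches[-1].start() + 1]


def harmonize(motifs, min_len=6):
    """Truncate every motif and keep those longer than five residues."""
    return {t for t in map(truncate_cdr3, motifs) if len(t) >= min_len}


print(truncate_cdr3("CASSRGSYEQYFGPG"))
print(harmonize({"CASSRGSYEQYFGPG", "CASSRGSYEQYF", "CASSFGPG"}))
```

Two differently annotated copies of the same motif collapse to a single entry after truncation, which is what allows sets from differently processed studies to be intersected.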

The overlap between the TCRβ CDR3 motifs of our low-divergence associated signature, the public CDR3 set of Britanova et al. (21), and the pathogen-associated CDR3 set is shown in Fig. S5A. Of the 74 signature CDR3s, 25 occurred in both of the other TCRβ sets. Notably, although the pathogen-associated set was more than three times larger than the public set, the signature set shared more CDR3s with the public set (37 CDR3s) than with the pathogen-associated set (three CDR3s), which does not support the hypothesis that the abundant use of signature CDR3s results from selection by the pathogen-specific immunogens described in these studies.

The number of signature, public, and pathogen-associated CDR3s were all significantly higher in the group 2 patients than in group 1 (Fig. S5B, Left). For all three CDR3 motif sets, this number was positively correlated with both the total number of CDR3s and the VJ-independent diversity of the PBMC repertoire. Correlation with the number of signature CDR3 was slightly stronger for VJ-independent diversity (R = 0.91, P = 8.5 × 10⁻⁶) than for the total number of CDR3 amino acid motifs (R = 0.86, P = 9.5 × 10⁻⁵), whereas this relationship was reversed for the public CDR3 set [Pearson correlation with number of CDR3 R = 0.93, P = 1.7 × 10⁻⁶; with HΔ(PBMC) R = 0.83, P = 2.8 × 10⁻⁴] and the pathogen-associated set [Pearson correlation with number of CDR3 R = 0.94, P = 8.1 × 10⁻⁷; with HΔ(PBMC) R = 0.81, P = 4.4 × 10⁻⁴]. Using the three sets of CDR3 amino acid motifs to filter the top use TCRβ CDR3s used for hierarchical clustering in Fig. 4B (74 signature CDR3s, 694 public CDR3s, and 270 pathogen-associated CDR3s, annotated in Dataset S3) and clustering their presence or absence revealed that the low-divergence signature set, but not the larger public set or the pathogen-associated set, stratified the patients into group 1 and group 2 (Fig. S5B, Right).

We investigated the presence in our PBMC and TIL repertoires of CDR3 amino acid motifs known to bind CNS antigens. From two published sets of CNS-autoreactive TCRs (60, 61), we compiled a candidate list of 9 TCRα and 81 TCRβ observed in multiple sclerosis (MS) patients, not through their association with viral disease, many of which were validated through cloning and expression to bind epitopes of myelin basic protein (annotated in Dataset S3). Only six of these CDR3 amino acid motifs were observed in our 14 patients at a frequency >10⁻⁵ (Fig. S5C), and none were among the signature CDR3 set. Despite having been discovered solely in the context of MS, three were also among the pathogen-associated set: CSVGTGGTNEKLFF with EBV specificity (34) and CASSRGSYEQYF expanded by tetanus toxoid (32), both public, and CASSIRSSYEQYF identified in influenza response (39). The fourth MS-associated TCRβ motif, CASSQDSNTEAFF, not present in any curated set, was present in both group 1 and group 2 PBMC, but not their TIL repertoires. Because this set of CNS-associated TCRs was small and poorly represented among both the patient repertoires and comparative CDR3 sets, it remains undetermined whether CNS-binding CDR3s overall are enriched among high- or low-divergence patients, but they do not seem to be a major component of the signature CDR3 set.

The fraction of signature CDR3s was high for both healthy subject H15 (Dataset S1) and the productive TCRα and TCRβ repertoires of three pairs of healthy identical twins [21, 27, and 46 y old, all female (13), as processed by the MiTCR software package (62)] (Fig. S5D), similar to the low-divergence group 2 patients (red) and distinct from the high-divergence group 1 patients (black). The public set of 10,691 CDR3 described by Britanova et al. (21) contained 83.8% of the TCRβ signature set, whereas the 37,463 pathogen-associated CDR3s contained only 39.2%, and pathogen-associated CDR3s were not enriched among group 1 patients. Additionally, TCR repertoires from healthy human peripheral blood that were sequenced using a library preparation method without multiplex-primed amplification at both higher (53) and lower (9) cell and sequencing coverage and a different sequencing platform and analysis pipeline contained >50 of the signature motifs in their exhaustive-depth libraries (Fig. S5E, Male 1, Male 2, Female; frequency >10⁻⁶) (53), and high-frequency use of fewer CDR3 motifs in the lower coverage blood repertoire from the prior related study (Fig. S5E, Male 2009) (9).

Unsupervised hierarchical clustering (as described above) was performed on the top 1,000 amino acid CDR3s of 21 PBMC repertoires: the 14-patient paired-sample cohort, healthy subject H15, and the six healthy repertoires from the study by Zvyagin et al. (13) (16,564 TCRα, 20,395 TCRβ, frequency ≥10⁻⁵). The PBMC repertoires of all additional healthy individuals (with the exception of TwA1 TCRα chain) clustered within group 2 (Fig. S5 G and H), consistent with their high use of signature CDR3 motifs and the low JSMΔ,corr(PBMC,TIL) of the nonglioma patients from our patient cohort. The CDR3 clusters whose use stratified groups 1 and 2 in this expanded cohort of PBMC samples included 850/1,241 (68.5%) of the signature TCRα and 51/74 (68.9%) of the signature TCRβ (Fig. S5 H and I, Left). Clustering the paired brain tissue cohort, with its known high- and low-divergence phenotypes, on the presence or absence of the overlapping set produced the same segregation of low-divergence group 2 patients from high-divergence group 1 patients (Fig. S5 H and I, Right).

Together, these results find the low-divergence signature CDR3 motif set—correlated with high VJ-independent diversity in PBMC and heavily represented among group 2 GBM patients, nontumor patients, and healthy subjects—to reflect intrinsic amino acid repertoire diversity, as suggested by its correlation with HΔ(PBMC), and with similarity to previously described public CDR3s (21), but distinct in its cohesive depletion among the high VJ-independent divergence patients constituting group 1.

Discussion

Monitoring the strength and specificity of antitumor T-cell reactivity remains a crucial but elusive component of precision immunotherapy. Whereas the immune response itself provides selective pressure and has been associated with both effective tumor clearance and immunoediting of the cancer cell population (41), the combination of the identities and distribution of tumor-responsive TCRs represents a “footprint” of the conditions they face, such as neoantigenic load and dysfunctional immune signaling. Thus, identifying and tracking such T-cell clones has risen to priority as a potential high-dimensional biomarker for tumor development and personalized predictor of the efficacy of immunotherapeutic interventions in cancer (42, 43).

High-throughput sequencing (TCRseq) provides population-wide profiles at a resolution that connects clonal identification to the selectable, functional binding properties of the receptor. Here, we have separated the Shannon entropy of the TCR repertoire into contributions from the diversity of the VJ cassette combination distribution and VJ-independent diversity. The VJ-dependent component is strongly influenced by developmental and lineage restriction, whereas the VJ-independent component includes selection for the antigen-binding affinity of the CDR3 sequence (18, 19). In glioma, these statistical attributes indicated that TILs, especially those of GBM, possess greater VJ-independent diversity than those of nonneoplastic tissue and diverge from their respective PBMC repertoires on the basis of this VJ-independent diversity.

These TCR repertoire phenotypes were consistent with known characteristics of brain TILs. In healthy CNS tissue, surveilling lymphocytes are limited to a relatively small, functionally restricted subpopulation of T cells, which may be expanded following injury or infection, with varying expansion of antigen specificity depending on microenvironmental immune signaling (see review in ref. 44). Our observation of high divergence between PBMC and TIL repertoires in LGG and some GBM provides evidence of tumor-associated T-cell responses in glioma. This overall approach may offer new insight into the coevolution of antitumor reactivity with tumor progression (41), both independently of and synergistically with gene expression-based immune profiling (45).

Although noninvasive, longitudinal monitoring would have tremendous utility in glioma, efforts to describe a TCR signature predictive of intratumoral immune status have focused on TCR sequences with tumor specificity (46–48). Although our initial calculation of VJ-independent divergence, JSMΔ,corr, required both peripheral blood and biopsied tumor tissue, we discovered a signature set of CDR3 amino acid sequences whose high use in peripheral blood distinguished those patients with low JSMΔ,corr from those with high JSMΔ,corr, providing a predictor of PBMC–TIL divergence that could be assayed noninvasively. This signature set of CDR3s overlaps not only with previously reported CDR3 motifs associated with TCRβ diversity (21), but also with CDR3s found to be associated with certain common pathogens. Thus, instead of detecting the presence of tumor-reactive TCRs, our signature set reflects peripheral diversity that is depleted in glioma patients with high PBMC–TIL divergence (JSMΔ,corr). Depletion of these motifs distinguishes those patients from healthy and tumor-free individuals as well as from low-divergence GBM patients, reminiscent of repertoire narrowing observed following strong T-cell responses to certain diseases (10, 16, 19). Our ability to infer high VJ-independent TIL divergence based on a repertoire-wide shift away from the properties of healthy PBMC, not on a shift toward antitumor TCRs, circumvents confounding variability in the antigenic specificities of these TCRs in GBM and other heterogeneous tumors. Furthermore, this signature requires only detection and identification of certain CDR3s, not precise quantification of their frequency, and thus may be translatable across TCRseq platforms.

Direct analysis of repertoire-wide diversity and divergence based on the functionally distinct components of the receptor provides an orthogonal immunophenotype to both the neoantigen load of the tumor and immune signaling in the microenvironment. Diagnosing the capacity of adequately stimulated T cells to recognize tumor antigens and respond locally could provide key guidance for rational combination of immunotherapies, and TCRseq of the peripheral repertoire could enable longitudinal monitoring of systemic and intratumoral responses to therapy, empowering precision treatment for glioma.

Materials and Methods

Clinical Sample Procurement.

Tissue samples of nonneoplastic brain tissue and of low-grade and high-grade glioma were cryofrozen and stored in liquid nitrogen, along with concurrent peripheral blood cells, immediately following surgical resections performed at Columbia University Medical Center through The Neurological Institute, as were tissue samples from nontumor control subjects. Glioma patients included subjects with WHO grade II oligodendrogliomas and astrocytomas and with WHO grade IV glioblastoma upon initial (not recurrent) diagnosis. Surgical specimens collected under Columbia University Institutional Review Board protocol #AAAA4666 (not considered human subjects research) were coded and released as deidentified tissue samples to researchers according to Columbia University Office of Human Research Protection guidelines. Additional healthy peripheral blood was obtained through separate volunteer donation (Table S1).

RNA Isolation and TCR Repertoire Amplification.

Total RNA was isolated from cryofrozen human tissue samples using the TissueLyzer system with Qiazol and steel beads (Qiagen). For TCR library preparation, mRNA was isolated using magnetic oligo-dT Dynabeads (Life Technologies), according to manufacturer protocol, and final concentration determined by Qubit (Life Technologies). Tumor tissue samples in this study were ≤300 mg, or ∼1 × 10⁶ total cells, predicted to contain ≤1 × 10⁵ lymphocytes [4–10% of cellularity in glioma tissue (49), and less in nonneoplastic brain tissue]. Total RNA was obtained from 2.5 × 10⁶ PBMCs for each library (the only notable exception was G14, ∼7 × 10⁵ PBMC due to the availability of material) using the RNeasy system (Qiagen) according to manufacturer instructions, with a goal of including up to ∼1 × 10⁶ T cells (Dataset S1, “Input”).

We used the commercially available iRepertoire platform (7) for nested amplicon arm-PCR (11) of the CDR3 of the human TCRα and TCRβ chains and addition of adaptors for Illumina platform sequencing. Reverse transcription of 100–400 ng of mRNA (tissue libraries) or total RNA (PBMC libraries) was conducted with a one-step reverse transcription and amplification kit (Qiagen) according to the manufacturer’s protocol. The PCR product was purified using AmpureX-100 magnetic beads (Agencourt), and secondary amplification of 40% of the resulting product was performed (Multiplex PCR Kit, Qiagen), allowing addition of Illumina adapter sequences (manufacturer’s protocol). Libraries were purified by agarose gel electrophoresis, cutting between 200–400 bp (Fig. 1A, predicted amplicon size 210–310 bp), extracted (Qiagen), and sequenced as described below.

Analysis of High-Throughput Sequencing Data.

Using an Illumina MiSeq, we obtained 220–250 nt paired-end reads (95 ± 2% reads passing filter, averaged over runs). We targeted 4.5–10× sequence coverage of ∼1 × 10⁶ peripheral T cells (SI Materials and Methods).

Raw paired-end fastq files were first demultiplexed based on the internal 6-nt barcode sequences added during library construction. The paired reads were then merged using FLASH 1.2.11 (flash -M 250 -O) (50) and aligned to the human genome (GRCh37) using the Burrows–Wheeler Aligner (bwa mem) (51). Reads mapping to the T-cell receptor loci (TRA and TRB) were then extracted, associated with V- and J-cassettes, and translated in silico in all three reading frames using the genetic code. Reading frames containing a C…FGXG amino acid motif that was uninterrupted by a stop codon were identified as productive CDR3 amino acid sequences. For each demultiplexed sample, all V- and J-cassettes were then reference-corrected, and the number of reads identified with each unique combination of V and J cassettes encoding a CDR3 amino acid sequence was counted. Reads were filtered by Hamming distance as thresholded by the sequence error rate in the mapped regions of the sample (detailed in SI Materials and Methods).
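The three-frame translation and motif search described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the regular expression, and the example read are hypothetical, and the pattern simply reports the first stop-free C…FGXG stretch in each frame, whereas a production pipeline would anchor on the conserved cysteine of the aligned V cassette.

```python
import re

# Standard genetic code in the conventional TCAG codon order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Illustrative pattern (an assumption): C ... F G X G with no stop codon in between.
CDR3_PATTERN = re.compile(r"C[^*]*?FG[^*]G")


def translate(nt, frame):
    """Translate `nt` from offset `frame` (0, 1, or 2); unknown codons become 'X'."""
    nt = nt.upper()
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_productive_cdr3(nt):
    """Return (frame, motif) for each reading frame with a stop-free C...FGXG motif."""
    hits = []
    for frame in range(3):
        match = CDR3_PATTERN.search(translate(nt, frame))
        if match:
            hits.append((frame, match.group(0)))
    return hits


# Hypothetical merged read: one leading base, then codons for CASSLGQGYEQYFGPG.
read = "A" + "TGTGCCAGCAGTTTAGGGCAGGGCTACGAGCAGTACTTCGGGCCGGGC"
print(find_productive_cdr3(read))
```

For this toy read, only the frame at offset 1 yields a motif (CASSLGQGYEQYFGPG). Introducing a stop codon inside the stretch removes the hit, which is the sense in which a reading frame is treated as nonproductive.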

Shannon Entropy-Derived Statistics.

We selected Shannon entropy (H) as our fundamental measure of diversity (52), which has been widely used in previous TCR repertoire studies for its inclusion of both richness (number of members) and diversity (evenness of distribution) and its applicability to more complex models of repertoire size (17, 18). We calculated entropy for the CDR3 amino acid motif, VJ cassette combination, and clonotype repertoire of each library (Dataset S2) and described the entropy of the VJ-dependent (genome-encoded cassette sequence) and VJ-independent (all other sources of sequence diversity and bias) contributions as separable components of each clonotype repertoire. Saturation of the entropy properties of the repertoires was demonstrated by in silico downsampling of the reads of each PBMC and TIL library, yielding HΔ (Fig. S3A) for numbers of reads and clonotypes (Fig. S3B) spanning approximately two orders of magnitude for TCRα and TCRβ. Similarly, we calculated the Jensen–Shannon divergence (JS) between each TIL repertoire and its paired PBMC repertoire; JS has been used to compare the peripheral blood repertoires of individuals (13). The square root of JS, the Jensen–Shannon divergence metric (JSM), is then corrected for divergence due to the size of the TIL population, allowing us to report the size-corrected, VJ-independent divergence metric, JSMΔ,corr. These calculations are detailed in SI Materials and Methods. All entropy characterization of the repertoires was performed using code in Python 2.7.5, tabulated and displayed using MATLAB (R2014b).
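As a rough illustration of these quantities, the sketch below computes the Shannon entropy of a clonotype table, splits it into a VJ component and a remainder, and computes the Jensen–Shannon divergence and its square root between two repertoires. It rests on stated assumptions: a clonotype is taken to be a (V, J, CDR3) triple, the VJ-independent term is taken as the clonotype entropy minus the VJ entropy (which by the chain rule equals the conditional entropy of the CDR3 given its VJ combination), and the size correction that yields JSMΔ,corr is not reproduced because it is defined in SI Materials and Methods. All function names are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def shannon_entropy(counts):
    """Shannon entropy in bits of a collection of nonnegative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def entropy_components(clonotype_counts):
    """Split clonotype entropy into VJ and VJ-independent parts.

    `clonotype_counts` maps (v, j, cdr3) -> read count.
    Returns (h_total, h_vj, h_delta) with h_delta = h_total - h_vj (assumed definition).
    """
    vj_counts = Counter()
    for (v, j, _cdr3), n in clonotype_counts.items():
        vj_counts[(v, j)] += n
    h_total = shannon_entropy(clonotype_counts.values())
    h_vj = shannon_entropy(vj_counts.values())
    return h_total, h_vj, h_total - h_vj


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence in bits between two count tables (0 to 1)."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    js = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)
        if p > 0:
            js += 0.5 * p * log2(p / m)
        if q > 0:
            js += 0.5 * q * log2(q / m)
    return js


def js_metric(p_counts, q_counts):
    """Square root of the Jensen-Shannon divergence (uncorrected for repertoire size)."""
    return sqrt(max(js_divergence(p_counts, q_counts), 0.0))


# Toy repertoire with hypothetical cassette names and CDR3 labels.
toy = {("V1", "J1", "CDR3a"): 2, ("V1", "J1", "CDR3b"): 2, ("V2", "J1", "CDR3c"): 4}
print(entropy_components(toy))
```

In the toy table, half of the reads carry one VJ combination with two equally used CDR3s and half carry another with a single CDR3, so the total entropy of 1.5 bits splits into 1.0 bit from VJ usage and 0.5 bit from the remainder.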

Clustering by CDR3 Amino Acid Motifs in Peripheral Blood.

We compiled the 1,000 highest-abundance TCRα or TCRβ CDR3 amino acid motifs (including [C…FGXG], the amino acids relevant to CDR3-region binding, both VJ-encoded and VJ-independent) with a frequency of ≥10⁻⁵ from each peripheral blood library into a top use list for each chain across all patients. Two binary matrices, reflecting the presence or absence of each TCRα or TCRβ CDR3 motif in each sample, were hierarchically clustered (Euclidean distance) in MATLAB (R2014b), producing the clustergrams in Fig. 4 A and B. The distinct upper clusters of each CDR3 motif dendrogram (in red in Fig. 4, 1,241 TCRα and 74 TCRβ) were designated signature CDR3 motifs. The fraction of these signature CDR3 motifs present in each PBMC repertoire was tabulated, and Pearson correlations of this fraction with other properties of the patient repertoires [e.g., JSMΔ,corr(PBMC,TIL)] in Fig. 4 were calculated in MATLAB (R2014b). MDS was performed using the frequencies of the same top use TCRα and TCRβ CDR3 motifs (combined list) in MATLAB using the “cmdscale” function. This associated the same “group 2” patients as the hierarchical CDR3 motif clustering (Fig. S4F), onto which the fraction of signature CDR3 motifs in each PBMC repertoire was projected in color to illustrate the relationship between their fractional use and patient clustering by MDS.
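The bookkeeping behind this analysis can be sketched in Python, although the analysis itself was run in MATLAB. The sketch builds the pooled top use list, the binary presence/absence matrix, the Euclidean distance between samples, the per-sample signature fraction, and a Pearson correlation. The function names and toy data are hypothetical, "present" is assumed to mean any nonzero frequency in the sample, and the hierarchical clustering and MDS steps are left to a dedicated routine that would consume the distances.

```python
from math import sqrt


def top_motifs(freqs, n=1000, min_freq=1e-5):
    """Return up to `n` most abundant motifs in one sample with frequency >= min_freq."""
    kept = sorted(
        (m for m, f in freqs.items() if f >= min_freq),
        key=lambda m: (-freqs[m], m),
    )
    return kept[:n]


def presence_matrix(repertoires, n=1000, min_freq=1e-5):
    """Build a motif-by-sample 0/1 matrix over the pooled top use list.

    `repertoires` maps sample name -> {motif: frequency}.
    """
    samples = sorted(repertoires)
    motifs = sorted({m for s in samples for m in top_motifs(repertoires[s], n, min_freq)})
    matrix = [[1 if repertoires[s].get(m, 0) > 0 else 0 for s in samples] for m in motifs]
    return motifs, samples, matrix


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def signature_fraction(freqs, signature):
    """Fraction of the signature motifs that are present in one sample."""
    return sum(1 for m in signature if freqs.get(m, 0) > 0) / len(signature)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: two hypothetical PBMC samples with made-up motif labels.
repertoires = {
    "P1": {"CAAF": 0.5, "CBBF": 0.3, "CRAREF": 1e-7},
    "P2": {"CAAF": 0.6, "CDDF": 0.4},
}
motifs, samples, matrix = presence_matrix(repertoires)
columns = list(zip(*matrix))  # one presence/absence vector per sample
print(motifs, euclidean(columns[0], columns[1]))
print(signature_fraction(repertoires["P2"], {"CAAF", "CBBF"}))
```

Because only presence or absence enters the matrix, the result does not depend on precise frequency estimates once a motif clears the threshold in at least one sample.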

Supplementary Material

Supplementary File
pnas.1601012113.sd01.xlsx (56.3KB, xlsx)
Supplementary File
pnas.1601012113.sd02.xlsx (54.5KB, xlsx)
Supplementary File
Supplementary File
pnas.1601012113.sd04.xlsx (51.6KB, xlsx)

Acknowledgments

We thank Erin Bush and Drs. Xiaojun Feng and Xiaoyun Sun (JP Sulzberger Columbia Genome Center) for their technical assistance and Dr. Tao Su (Herbert Irving Comprehensive Cancer Center Molecular Pathology Shared Resource) and Dr. Anthony Sireci and Samantha Cano (Department of Pathology) for sharing instrumentation. We also thank the surgeons and clinical staff of the Columbia Neurological Institute, particularly Dr. Michael B. Sisti and Dr. Guy M. McKhann II, for procuring surgical specimens. J.N.B. and P.C. were supported by an anonymous donor. This work was supported by NIH/National Institute of Biomedical Imaging and Bioengineering Grant K01EB016071 (to P.A.S.), funds from the Columbia Department of Systems Biology and the JP Sulzberger Columbia Genome Center (Y.S.), and NIH/National Institute of Neurological Disorders and Stroke Grant R01NS066955 (to P.C.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequencing data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE79338).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1601012113/-/DCSupplemental.

References

  • 1. Phillips HS, et al. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell. 2006;9(3):157–173. doi: 10.1016/j.ccr.2006.02.019.
  • 2. Verhaak RG, et al.; Cancer Genome Atlas Research Network. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell. 2010;17(1):98–110. doi: 10.1016/j.ccr.2009.12.020.
  • 3. Ivliev AE, ’t Hoen PA, Sergeeva MG. Coexpression network analysis identifies transcriptional modules related to proastrocytic differentiation and sprouty signaling in glioma. Cancer Res. 2010;70(24):10060–10070. doi: 10.1158/0008-5472.CAN-10-2465.
  • 4. Doucette T, et al. Immune heterogeneity of glioblastoma subtypes: Extrapolation from the cancer genome atlas. Cancer Immunol Res. 2013;1(2):112–122. doi: 10.1158/2326-6066.CIR-13-0028.
  • 5. Murat A, et al. Modulation of angiogenic and inflammatory response in glioblastoma by hypoxia. PLoS One. 2009;4(6):e5947. doi: 10.1371/journal.pone.0005947.
  • 6. Weinstein JA, Jiang N, White RA, 3rd, Fisher DS, Quake SR. High-throughput sequencing of the zebrafish antibody repertoire. Science. 2009;324(5928):807–810. doi: 10.1126/science.1170020.
  • 7. Wang C, et al. High throughput sequencing reveals a complex pattern of dynamic interrelationships among human T cell subsets. Proc Natl Acad Sci USA. 2010;107(4):1518–1523. doi: 10.1073/pnas.0913939107.
  • 8. Boyd SD, et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med. 2009;1(12):12ra23. doi: 10.1126/scitranslmed.3000540.
  • 9. Freeman JD, Warren RL, Webb JR, Nelson BH, Holt RA. Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing. Genome Res. 2009;19(10):1817–1824. doi: 10.1101/gr.092924.109.
  • 10. Woodsworth DJ, Castellarin M, Holt RA. Sequence analysis of T-cell repertoires in health and disease. Genome Med. 2013;5(10):98. doi: 10.1186/gm502.
  • 11. Han J, et al. Simultaneous amplification and identification of 25 human papillomavirus types with Templex technology. J Clin Microbiol. 2006;44(11):4157–4162. doi: 10.1128/JCM.01762-06.
  • 12. Yang Y, et al. Distinct mechanisms define murine B cell lineage immunoglobulin heavy chain (IgH) repertoires. eLife. 2015;4:e09083. doi: 10.7554/eLife.09083.
  • 13. Zvyagin IV, et al. Distinctive properties of identical twins’ TCR repertoires revealed by high-throughput sequencing. Proc Natl Acad Sci USA. 2014;111(16):5980–5985. doi: 10.1073/pnas.1319389111.
  • 14. Putintseva EV, et al. Mother and child T cell receptor repertoires: Deep profiling study. Front Immunol. 2013;4:463. doi: 10.3389/fimmu.2013.00463.
  • 15. Zarnitsyna VI, Evavold BD, Schoettle LN, Blattman JN, Antia R. Estimating the diversity, completeness, and cross-reactivity of the T cell repertoire. Front Immunol. 2013;4:485. doi: 10.3389/fimmu.2013.00485.
  • 16. Merkenschlager J, Kassiotis G. Narrowing the gap: Preserving repertoire diversity despite clonal selection during the CD4 T cell response. Front Immunol. 2015;6:413. doi: 10.3389/fimmu.2015.00413.
  • 17. Murugan A, Mora T, Walczak AM, Callan CG Jr. Statistical inference of the generation probability of T-cell receptors from sequence repertoires. Proc Natl Acad Sci USA. 2012;109(40):16161–16166. doi: 10.1073/pnas.1212755109.
  • 18. Elhanati Y, Murugan A, Callan CG Jr, Mora T, Walczak AM. Quantifying selection in immune receptor repertoires. Proc Natl Acad Sci USA. 2014;111(27):9875–9880. doi: 10.1073/pnas.1409572111.
  • 19. Turner SJ, Doherty PC, McCluskey J, Rossjohn J. Structural determinants of T-cell receptor bias in immunity. Nat Rev Immunol. 2006;6(12):883–894. doi: 10.1038/nri1977.
  • 20. Ndifon W, et al. Chromatin conformation governs T-cell receptor Jβ gene segment usage. Proc Natl Acad Sci USA. 2012;109(39):15865–15870. doi: 10.1073/pnas.1203916109.
  • 21. Britanova OV, et al. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. J Immunol. 2014;192(6):2689–2698. doi: 10.4049/jimmunol.1302064.
  • 22. Emerson R, et al. Estimating the ratio of CD4+ to CD8+ T cells using high-throughput sequence data. J Immunol Methods. 2013;391(1-2):14–21. doi: 10.1016/j.jim.2013.02.002.
  • 23. Gregersen PK, Hingorani R, Monteiro J. Oligoclonality in the CD8+ T-cell population. Analysis using a multiplex PCR assay for CDR3 length. Ann N Y Acad Sci. 1995;756:19–27. doi: 10.1111/j.1749-6632.1995.tb44479.x.
  • 24. Robert L, et al. CTLA4 blockade broadens the peripheral T-cell receptor repertoire. Clin Cancer Res. 2014;20(9):2424–2432. doi: 10.1158/1078-0432.CCR-13-2648.
  • 25. Tumeh PC, et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature. 2014;515(7528):568–571. doi: 10.1038/nature13954.
  • 26. Radebe M, et al. Broad and persistent Gag-specific CD8+ T-cell responses are associated with viral control but rarely drive viral escape during primary HIV-1 infection. AIDS. 2015;29(1):23–33. doi: 10.1097/QAD.0000000000000508.
  • 27. Morris H, et al. Tracking donor-reactive T cells: Evidence for clonal deletion in tolerant kidney transplant patients. Sci Transl Med. 2015;7(272):272ra10. doi: 10.1126/scitranslmed.3010760.
  • 28. Kmiecik J, et al. Elevated CD3+ and CD8+ tumor-infiltrating immune cells correlate with prolonged survival in glioblastoma patients despite integrated immunosuppressive mechanisms in the tumor microenvironment and at the systemic level. J Neuroimmunol. 2013;264(1-2):71–83. doi: 10.1016/j.jneuroim.2013.08.013.
  • 29. Okada H, Khoury SJ. Type17 T-cells in central nervous system autoimmunity and tumors. J Clin Immunol. 2012;32(4):802–808. doi: 10.1007/s10875-012-9686-z.
  • 30. Waziri A, et al. Preferential in situ CD4+CD56+ T cell activation and expansion within human glioblastoma. J Immunol. 2008;180(11):7673–7680. doi: 10.4049/jimmunol.180.11.7673.
  • 31. Fong B, et al. Monitoring of regulatory T cell frequencies and expression of CTLA-4 on T cells, before and after DC vaccination, can predict survival in GBM patients. PLoS One. 2012;7(4):e32614. doi: 10.1371/journal.pone.0032614.
  • 32. Becattini S, et al. T cell immunity. Functional heterogeneity of human memory CD4+ T cell clones primed by pathogens or vaccines. Science. 2015;347(6220):400–406. doi: 10.1126/science.1260668.
  • 33. Venturi V, et al. TCR beta-chain sharing in human CD8+ T cell responses to cytomegalovirus and EBV. J Immunol. 2008;181(11):7853–7862. doi: 10.4049/jimmunol.181.11.7853.
  • 34. Lim A, et al. Frequent contribution of T cell clonotypes with public TCR features to the chronic response against a dominant EBV-derived epitope: Application to direct detection of their molecular imprint on the human peripheral T cell repertoire. J Immunol. 2000;165(4):2001–2011. doi: 10.4049/jimmunol.165.4.2001.
  • 35. McCluskey J, Kanaan C, Diviney M. Nomenclature and serology of HLA class I and class II alleles. Curr Protoc Immunol. 2003;52(1S):A.1S.1–A.1S.8. doi: 10.1002/0471142735.ima01s52.
  • 36. Wucherpfennig KW, Strominger JL. Molecular mimicry in T cell-mediated autoimmunity: Viral peptides activate human T cell clones specific for myelin basic protein. Cell. 1995;80(5):695–705. doi: 10.1016/0092-8674(95)90348-8.
  • 37. Tosato G, Cohen JI. Generation of Epstein-Barr virus (EBV)-immortalized B cell lines. Curr Protoc Immunol. 2007;Chap 7:Unit 7.22.
  • 38. Bhaduri-McIntosh S, Rotenberg MJ, Gardner B, Robert M, Miller G. Repertoire and frequency of immune cells reactive to Epstein-Barr virus-derived autologous lymphoblastoid cell lines. Blood. 2008;111(3):1334–1343. doi: 10.1182/blood-2007-07-101907.
  • 39. Lossius A, et al. High-throughput sequencing of TCR repertoires in multiple sclerosis reveals intrathecal enrichment of EBV-reactive CD8+ T cells. Eur J Immunol. 2014;44(11):3439–3452. doi: 10.1002/eji.201444662.
  • 40. Suessmuth Y, et al. CMV reactivation drives posttransplant T-cell reconstitution and results in defects in the underlying TCRβ repertoire. Blood. 2015;125(25):3835–3850. doi: 10.1182/blood-2015-03-631853.
  • 41. Angelova M, et al. Characterization of the immunophenotypes and antigenomes of colorectal cancers reveals distinct tumor escape mechanisms and novel targets for immunotherapy. Genome Biol. 2015;16:64. doi: 10.1186/s13059-015-0620-6.
  • 42. Drake CG, Lipson EJ, Brahmer JR. Breathing new life into immunotherapy: review of melanoma, lung and kidney cancer. Nat Rev Clin Oncol. 2014;11(1):24–37. doi: 10.1038/nrclinonc.2013.208.
  • 43. Sims JS, Ung TH, Neira JA, Canoll P, Bruce JN. Biomarkers for glioma immunotherapy: The next generation. J Neurooncol. 2015;123(3):359–372. doi: 10.1007/s11060-015-1746-9.
  • 44. Carson MJ, Doose JM, Melchior B, Schmid CD, Ploix CC. CNS immune privilege: Hiding in plain sight. Immunol Rev. 2006;213:48–65. doi: 10.1111/j.1600-065X.2006.00441.x.
  • 45. Guan X, et al. Molecular subtypes of glioblastoma are relevant to lower grade glioma. PLoS One. 2014;9(3):e91216. doi: 10.1371/journal.pone.0091216.
  • 46. Sensi M, Parmiani G. Analysis of TCR usage in human tumors: A new tool for assessing tumor-specific immune responses. Immunol Today. 1995;16(12):588–595. doi: 10.1016/0167-5699(95)80082-4.
  • 47. Gros A, et al. PD-1 identifies the patient-specific CD8+ tumor-reactive repertoire infiltrating human tumors. J Clin Invest. 2014;124(5):2246–2259. doi: 10.1172/JCI73639.
  • 48. Linnemann C, et al. High-throughput identification of antigen-specific TCRs by TCR gene capture. Nat Med. 2013;19(11):1534–1541. doi: 10.1038/nm.3359.
  • 49. Hitchcock ER, Morris CS. Mononuclear cell infiltration in central portions of human astrocytomas. J Neurosurg. 1988;68(3):432–437. doi: 10.3171/jns.1988.68.3.0432.
  • 50. Magoč T, Salzberg SL. FLASH: Fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957–2963. doi: 10.1093/bioinformatics/btr507.
  • 51. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–595. doi: 10.1093/bioinformatics/btp698.
  • 52. Shannon CE. A mathematical theory of communication. AT&T Tech J. 1948;27(3):379–423.
  • 53. Warren RL, et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 2011;21(5):790–797. doi: 10.1101/gr.115428.110.
  • 54. Lefranc MP, et al. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 2015;43(Database issue):D413–D422. doi: 10.1093/nar/gku1056.
  • 55. Krzywinski M, et al. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–1645. doi: 10.1101/gr.092759.109.
  • 56. Endres DM, Schindelin JE. A new metric for probability distributions. IEEE T Inform Theory. 2003;49(7):1858–1860.
  • 57. Klarenbeek PL, et al. Deep sequencing of antiviral T-cell responses to HCMV and EBV in humans reveals a stable repertoire that is maintained for many years. PLoS Pathog. 2012;8(9):e1002889. doi: 10.1371/journal.ppat.1002889.
  • 58. Emerson R, et al. Immunosequencing reveals diagnostic signatures of chronic viral infection in T cell memory. bioRxiv. September 10, 2015. doi: 10.1101/026567.
  • 59. Dziubianau M, et al. TCR repertoire analysis by next generation sequencing allows complex differential diagnosis of T cell-related pathology. Am J Transplant. 2013;13(11):2842–2854. doi: 10.1111/ajt.12431.
  • 60. Hong J, et al. A common TCR V-D-J sequence in V beta 13.1 T cells recognizing an immunodominant peptide of myelin basic protein in multiple sclerosis. J Immunol. 1999;163(6):3530–3538.
  • 61. Junker A, et al. Multiple sclerosis: T-cell receptor expression in distinct brain regions. Brain. 2007;130(Pt 11):2789–2799. doi: 10.1093/brain/awm214.
  • 62. Bolotin DA, et al. Next generation sequencing for TCR repertoire profiling: Platform-specific features and correction algorithms. Eur J Immunol. 2012;42(11):3073–3083. doi: 10.1002/eji.201242517.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences