Abstract
Disruption of the class I human leukocyte antigen (HLA) molecules has important implications for immune evasion and tumor evolution. We developed major histocompatibility complex loss of heterozygosity (LOH), allele-specific mutation and measurement of expression and repression (MHC Hammer). We identified extensive variability in HLA allelic expression and pervasive HLA alternative splicing in normal lung and breast tissue. In lung TRACERx and lung and breast TCGA cohorts, 61% of lung adenocarcinoma (LUAD), 76% of lung squamous cell carcinoma (LUSC) and 35% of estrogen receptor-positive (ER+) cancers harbored class I HLA transcriptional repression, while HLA tumor-enriched alternative splicing occurred in 31%, 11% and 15% of LUAD, LUSC and ER+ cancers. Consistent with the importance of HLA dysfunction in tumor evolution, in LUADs, HLA LOH was associated with metastasis and LUAD primary tumor regions seeding a metastasis had a lower effective neoantigen burden than non-seeding regions. These data highlight the extent and importance of HLA transcriptomic disruption, including repression and alternative splicing in cancer evolution.
Subject terms: Breast cancer, Genome informatics, Lung cancer, Tumour immunology
Major histocompatibility complex (MHC) loss of heterozygosity, allele-specific mutation and measurement of expression and repression (MHC Hammer) detects disruption to human leukocyte antigens due to mutations, loss of heterozygosity, altered gene expression or alternative splicing. Applied to lung and breast cancer datasets, the tool shows that these aberrations are common across cancer and can have clinical implications.
Main
Emerging data have highlighted the importance of considering cancer evolution in the context of a predatory immune microenvironment1–3. Key mediators of the cytotoxic T cell response in cancer are neoantigens, cancer-cell-specific alterations resulting in mutant peptides capable of eliciting a T cell-mediated, human leukocyte antigen (HLA)-restricted immune response. A mutation can only result in a neoantigen if the associated mutant peptide is presented on HLA molecules to the T cell receptor. Therefore, disruption of HLA molecules has important implications for immune evasion.
Disruption to antigen-presenting machinery occurs across many cancer types4–7. Our previous work has revealed that HLA loss of heterozygosity (LOH), whereby one allele is somatically lost, occurs in 40% of non-small cell lung cancer (NSCLC) primary tumors7. A pan-cancer study has suggested that transcriptomic downregulation of HLA genes occurs frequently4. However, subtle transcriptomic alterations, such as alternative splicing events and allele-specific repression, have been poorly studied in cancer. Alternative splicing, which has been reported in non-cancer tissue and cancer cell lines, can result in a non-functional HLA molecule or, in the case of exon 5 skipping, soluble isoforms of the HLA molecule8–13. Furthermore, understanding HLA expression in tumor-adjacent normal tissue is of critical importance when attributing any change in HLA expression as a tumor-specific phenomenon.
Here we present major histocompatibility complex loss of heterozygosity, allele-specific mutation and measurement of expression and repression (MHC Hammer), a computational toolkit to accurately determine allele-specific mutations, LOH, allelic expression, allelic repression and alternative splicing of the class I HLA genes. We use MHC Hammer to investigate HLA expression in normal tissue and to evaluate genomic and transcriptomic disruption in tumor evolution in multiple cohorts, including 421 patients with NSCLC in the multiregional TRACERx421 dataset14–16 (Extended Data Fig. 1), 945 patients with NSCLC and 972 patients with breast cancer in the Cancer Genome Atlas (TCGA) dataset17–19 (Extended Data Fig. 2) and 489 normal lung and 397 normal breast samples from the Genotype-Tissue Expression (GTEx) dataset20 (Extended Data Fig. 3).
Results
A pipeline to evaluate HLA disruption
To evaluate the extent of genomic and transcriptomic HLA disruption, we developed MHC Hammer, advancing our LOHHLA algorithm7 (Fig. 1). The tool has the following four major components: (1) identifying allele-specific HLA somatic mutations, (2) calculating HLA LOH, (3) evaluating HLA allele-specific repression and (4) identifying allele-specific HLA alternative splicing. MHC Hammer is provided as a Nextflow pipeline (https://github.com/McGranahanLab/mhc-hammer) (Methods; Supplementary Note and Supplementary Fig. 1).
HLA allele-specific expression in normal tissue
We first evaluated HLA allelic expression and alternative splicing in normal lung and breast tissue using data from the GTEx project20. This dataset includes 489 lung and 397 breast tissue samples from 645 healthy individuals, of whom 241 have both lung and breast tissue samples available (Methods; Extended Data Fig. 3).
We found that in both normal lung and breast tissue, HLA-B had the highest median expression (lung, 440.3 and breast, 227.4 reads per kilobase per million mapped reads (RPKM)), followed by HLA-C (lung, 371.9 and breast, 177.1 RPKM), then HLA-A (lung, 289.4 and breast, 162.0 RPKM; Fig. 2a). A wide range of HLA expression was observed across the three class I genes (Fig. 2a). In individuals with both lung and breast tissue samples, HLA gene expression was higher in lung tissue in 175/198 (88%), 186/204 (91%) and 190/203 (94%) of cases for HLA-A, HLA-B and HLA-C, respectively (Fig. 2b).
Significant HLA allelic imbalance (AIB) was pervasive: 273/388 (70%), 183/418 (44%) and 305/407 (75%) of normal lung tissue samples exhibited statistically significant AIB in expression in HLA-A, HLA-B and HLA-C, respectively, while 266/329 (81%), 178/343 (52%) and 255/328 (78%) of normal breast tissue samples exhibited statistically significant AIB expression in HLA-A, HLA-B and HLA-C, respectively (Fig. 2c and Supplementary Note).
This observed expression imbalance may be due in part to the combination of HLA alleles that an individual harbors. Indeed, we observed evidence of a relationship between allelic expression and the allele type, consistent with previous reports21,22 (P < 2 × 10−16, one-way analysis of variance; Fig. 2d). From the alleles for which we had >30 GTEx individuals with lung and/or breast samples, we found that in lung, HLA-A*03:01, HLA-B*15:01 and HLA-C*08:02 had the lowest expression across the three genes, while HLA-A*24:02, HLA-B*40:01 and HLA-C*04:01 had the highest. In breast, HLA-A*03:01, HLA-B*15:01 and HLA-C*07:02 had the lowest expression across the three genes, whereas HLA-A*24:02, HLA-B*08:01 and HLA-C*04:01 had the highest.
HLA alternative splicing in normal tissue samples
Given the role that HLA alternative splicing could have in HLA presentation, we next used MHC Hammer to investigate the prevalence of HLA alternative splicing in the GTEx normal tissue cohort. In our cohort, MHC Hammer identified complete exon skipping, partial exon skipping and partial intron retention in the HLA alleles, but we did not observe evidence for complete intron retention in any HLA allele (Extended Data Fig. 4 and Supplementary Note).
Alternative splicing in the HLA alleles was frequent in the GTEx normal tissue cohort; 466/483 (97%) of normal lung and 339/392 (87%) of normal breast samples harbored at least one alternative splicing event. Exon 5 skipping was the most frequent event in both lung and breast tissue, occurring in 368/483 (76%) of lung and 270/392 (69%) of breast samples, followed by partial retention of intron 5, occurring in 271/483 (56%) of lung and 180/392 (46%) of breast tissue samples (Fig. 2e,f). Skipping of exon 5 has been shown to result in a soluble HLA molecule due to the absence of the transmembrane domain10–12. We also observed HLA alternative splicing events in exons or introns 2, 3 or 4, which could result in an unstable HLA molecule potentially unable to present antigens to the immune system8,9,13 in 267/483 (55%) of normal lung tissue samples and 147/392 (38%) of normal breast tissue samples (Fig. 2e,f).
To estimate the relative abundance of the novel (alternatively spliced) transcripts, we calculated a ‘novel transcript proportion’ (Supplementary Note). Most splicing events occurred with a low novel transcript proportion, with 1,667/1,863 (90%) events in the lung and 956/1,118 (86%) events in the breast occurring with a novel transcript proportion of less than 0.1 (Fig. 2g).
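The quantity summarized above can be illustrated with a minimal sketch. The exact definition used by MHC Hammer is given in the Supplementary Note; the formula below (reads supporting the novel junction divided by reads supporting either the novel or the canonical junction) and the function names are assumptions made purely for illustration.

```python
def novel_transcript_proportion(novel_reads: int, canonical_reads: int) -> float:
    """Assumed definition: share of junction-supporting reads that support
    the novel (alternatively spliced) junction."""
    total = novel_reads + canonical_reads
    if total == 0:
        raise ValueError("no reads support either junction")
    return novel_reads / total


def fraction_below(proportions, threshold=0.1):
    """Fraction of splicing events whose proportion is strictly below threshold."""
    if not proportions:
        raise ValueError("no events supplied")
    return sum(p < threshold for p in proportions) / len(proportions)
```

Applied to a list of per-event proportions, `fraction_below` gives the kind of summary reported in the text (the share of events with a novel transcript proportion of less than 0.1).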
We next compared the splicing events in breast and lung tissues from the same individual. In total, 43% of all alternative splicing events occurred in both breast and lung tissue, while 46% occurred only in the lung sample and just 11% occurred only in the breast sample. When restricting to events that occurred in both tissues from the same individual, we observed a high concordance between the novel transcript proportions (Fig. 2h). However, for the majority of the alternative splicing events that introduced a premature termination codon (PTC), the novel transcript proportion was higher in lung than in breast tissue (Extended Data Fig. 5). This observation could possibly be driven by differences in the rate of nonsense-mediated decay (NMD) in different tissues23,24.
Finally, we investigated whether certain alleles were enriched for specific alternative splicing events. We focused on alternative splicing events in the 23 HLA alleles that were present in >30 GTEx individuals with breast and/or lung tissue. In lung, 10/105 alternative splicing events in these 23 alleles occurred in more than 50% of samples with the allele. In breast, 8/81 of alternative splicing events occurred in more than 50% of samples with the allele. For example, inframe complete exon 5 skipping in HLA-C*04:01:01:01 occurred in 100% of breast and lung samples (lung: 94/94, breast: 64/64) with this allele, which supports the results described in a previous study12. We also observed inframe partial intron 5 retention in HLA-C*03:04:01:01 in 100% of breast and lung samples with this allele (lung, 70/70 and breast, 57/57; Supplementary Table 1).
These data suggest that the HLA alleles are subject to widespread expression imbalance and alternative splicing in normal tissue and that total HLA gene expression is strongly influenced by the combination of HLA alleles that a person harbors. These data emphasize the importance of controlling for HLA allelic expression in normal tissue when assessing transcriptional alterations in tumors.
HLA genomic disruption in lung and breast cancer
In the TRACERx421 cohort, LOH of the class I HLA genes was frequent, occurring in 75/235 (32%) of lung adenocarcinoma (LUAD), 76/132 (58%) of lung squamous cell carcinoma (LUSC) and 13/44 (30%) of other NSCLC histological subtype primary tumors, consistent with our previous findings7 (Extended Data Fig. 6). In keeping with this, the rate of HLA LOH in the TCGA lung cohort was 65/245 (27%) for LUAD tumors and 104/267 (39%) for LUSC tumors (Extended Data Fig. 6). In TCGA, triple-negative breast cancer (TNBC) had the highest rate of HLA LOH (17/61 (28%)), followed by estrogen receptor negative (ER−; 7/32 (22%)) and estrogen receptor positive (ER+; 60/402 (15%); Extended Data Fig. 6). By contrast, high-impact damaging mutations in the HLA genes were relatively rare, occurring in only 5/411 (1.2%) tumors in the TRACERx421 cohort, 2/514 (0.4%) tumors in the TCGA breast cohort and were not observed in the TCGA lung cohort.
Transcriptional repression of class I HLA alleles in tumors
We next investigated whether there was evidence of additional disruption of HLA alleles through transcriptional repression in tumor regions. Given the heterogeneity observed in normal HLA allelic expression, we measured tumor HLA repression with reference to the patient-matched tumor-adjacent normal sample (Supplementary Note). We were able to evaluate transcriptional repression in 49 LUAD and 29 LUSC tumors in the TRACERx421 cohort and in 13 LUAD, 27 LUSC and 34 ER+ breast cancers from the TCGA cohort. We did not detect any high-impact damaging HLA mutations in any tumor with a tumor-adjacent normal sample.
We identified extensive transcriptional repression of the HLA alleles that could not be explained by LOH or damaging mutations in both the lung and breast tumors (Fig. 3a,b and Supplementary Fig. 2). In the TRACERx421 cohort, 30/49 (61%) of LUAD and 22/29 (76%) of LUSC tumors harbored transcriptional repression of at least one HLA allele not caused by LOH. Taken together, just 13/49 (27%) of LUAD and 2/29 (7%) of LUSC tumors exhibited no LOH or repression in any class I HLA gene (Fig. 3b). These results were consistent in the TCGA LUAD and LUSC cohorts (Supplementary Fig. 2). In contrast, 19/34 (56%) of ER+ breast cancers exhibited no damaging mutations, LOH or repression in any class I HLA gene (Fig. 3b).
HLA genomic biallelic loss (that is, homozygous deletion) was an uncommon event, occurring in only 11/411 (3%) of TRACERx421 NSCLCs, 7/512 (1%) of TCGA NSCLCs and 17/514 (3%) of TCGA breast cancers. To investigate biallelic transcriptional repression, we restricted our analysis to HLA genes with no evidence for genomic alterations. Biallelic transcriptional repression of a given HLA gene was relatively frequent, occurring in 24/43 (56%), 11/16 (69%) and 9/32 (28%) of LUAD, LUSC and ER+ breast tumors (Fig. 3c). However, while homozygous deletion will necessarily impact both alleles equally, we found evidence of unequal biallelic repression of alleles, with 14/24 (58%), 8/11 (73%) and 4/9 (44%) of LUAD, LUSC and ER+ tumors harboring at least one HLA gene with AIB in the tumor but not the normal (or vice versa).
The impact of HLA disruption on neoantigen presentation
To investigate the impact of HLA LOH and transcriptional repression on the predicted number of neoantigens presented to the immune system, we quantified, for each tumor region, the number of different alleles when considering (1) neither LOH nor repression, (2) LOH or (3) LOH and repression. When accounting for LOH and repression, 39/132 (30%) of LUAD tumor regions, 3/90 (3%) of LUSC tumor regions and 18/36 (50%) of ER+ breast tumors had all six intact HLA alleles, while 9/132 (7%) of LUAD tumor regions, 18/90 (20%) of LUSC tumor regions and 3/36 (8%) of ER+ breast tumors had all six alleles disrupted at the genomic and transcriptomic levels (Fig. 3d). On average, 28.2% and 52.3% of putative neoantigens were predicted to bind exclusively to alleles subject to LOH or repression in LUAD and LUSC, respectively (Extended Data Fig. 7).
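The counting described above can be sketched as follows. This is an illustrative outline rather than the MHC Hammer implementation: the data structures (a list of allele names, and a mapping from each neoantigen to the set of alleles it is predicted to bind) and the function names are hypothetical.

```python
def intact_alleles(alleles, loh=(), repressed=()):
    """Return the alleles that are subject to neither LOH nor repression."""
    disrupted = set(loh) | set(repressed)
    return [a for a in alleles if a not in disrupted]


def fraction_exclusive_to_disrupted(binders_by_neoantigen, disrupted):
    """Fraction of neoantigens predicted to bind only to disrupted alleles.

    binders_by_neoantigen: dict mapping a neoantigen id to the set of
    alleles it is predicted to bind (hypothetical input format).
    """
    if not binders_by_neoantigen:
        raise ValueError("no neoantigens supplied")
    disrupted = set(disrupted)
    exclusive = sum(
        1
        for binders in binders_by_neoantigen.values()
        if binders and set(binders) <= disrupted
    )
    return exclusive / len(binders_by_neoantigen)
```

A neoantigen that binds at least one intact allele is not counted as lost, which matches the "bind exclusively to alleles subject to LOH or repression" wording in the text.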
Mechanisms of HLA repression
The predominant modulators of HLA class I transcription are the NOD-like receptor (NLR) proteins NLRC5 and CIITA. The HLA promoter region also contains the tumor necrosis factor (TNF)-stimulated promoter site, EnhA and the IFNG-stimulated response element, ISRE25. In tumors without any genomic HLA disruption, we observed a significant positive correlation (Pearson’s r ≥ 0.3 and P ≤ 0.01) between total HLA expression and the expression of NLRC5 and CIITA in LUAD, LUSC and ER+ tumors from both the TRACERx421 and TCGA cohorts, as well as between total HLA expression and IFNG expression in the TRACERx LUAD and TCGA LUAD and LUSC cohorts. We only observed a significant positive correlation between total HLA expression and TNF in the TCGA LUSC cohort (Extended Data Fig. 8).
We next investigated whether there was a relationship between tumor–normal changes in the expression of these genes and the likelihood of the tumor region having allelic repression unexplained by genomic disruption. In LUAD, LUSC and ER+ breast cancer, samples with allelic transcriptional repression had a significantly lower tumor-to-normal ratio of NLRC5 than those without transcriptional repression. The same was true for CIITA and IFNG in LUAD and LUSC tumors, but not ER+ breast tumors (Fig. 3e). We only observed a significant relationship with TNF in LUADs (Supplementary Fig. 3).
Previous work has identified methylation as a mechanism that can influence HLA allelic expression21,26. To investigate the role of hypermethylation in HLA transcriptional repression, we used methylation array data from the TCGA cohorts. We observed distinct patterns of methylation across the HLA genes in the LUAD, LUSC and breast tumors, with the gene body having the highest methylation in HLA-A and HLA-B in both tumor and normal tissues. In HLA-C, the region 1,500–200 bp upstream of the transcriptional start site (termed TSS1500) had the highest level of methylation, followed by the gene body, in both the tumor and normal tissues (Supplementary Fig. 4). We observed a significant negative correlation (Pearson’s r ≤ −0.3 and P ≤ 0.01) between HLA-B gene expression and the degree of methylation in both the TSS1500 and the gene body region in the TCGA LUAD, LUSC and ER+ tumor samples (Supplementary Fig. 5–7).
Taken together, these data suggest that in the LUAD, LUSC and breast tumors, changes in the expression of NLRC5, CIITA and IFNG, as well as hypermethylation, may play a role in the repression of HLA.
HLA alternative splicing in breast and lung cancer
Given the pervasive nature of alternative splicing in normal tissue, we evaluated whether splicing events occurred at significantly higher frequency in the tumor (‘tumor-enriched’) or, conversely, at significantly lower frequency in the tumor (‘tumor depleted’; Supplementary Note). Both tumor-enriched and tumor-depleted HLA alternative splicing events were common, with 30.6%, 10.7% and 14.7% of LUAD, LUSC and ER+ breast tumors harboring at least one tumor-enriched alternative splicing event and 22.6%, 21.4% and 14.7% of LUAD, LUSC and ER+ tumors harboring at least one tumor-depleted alternative splicing event (Fig. 4a).
The most frequent tumor-enriched alternative splicing events included inframe partial intron 5 retention and inframe complete exon 3 skipping (Fig. 4b). Changes to the sequence that encodes exon 3 could result in altered peptide binding or an unstable HLA molecule8,13. In contrast, inframe complete exon 5 skipping was observed more frequently as a tumor-depleted event in LUAD and ER+ tumors (Fig. 4b). Skipping of exon 5 has been demonstrated in previous studies to result in a soluble HLA molecule10–12. For each alternative splicing event, the tumor-to-normal change in the novel transcript proportion is shown (Supplementary Note and Fig. 4b).
The introduction or deletion of nucleotide sequences due to alternative splicing could result in a frameshift and/or the introduction of a PTC in the resulting transcript. In LUAD, LUSC and ER+ tumors, all complete exon skipping events were inframe (Fig. 4c). In contrast, we observed partial exon skipping events and partial intron retention events that were inframe, or that resulted in a frameshift, with and without the introduction of a PTC (Fig. 4c).
Given that tumor samples reflect an admixture of cancer cells and non-cancer cells, to estimate the fraction of alternatively spliced transcripts in the cancer cells, we scaled the novel transcript proportion of the tumor-enriched events by the estimated purity of the tumor region (Supplementary Note). Although there were outliers, the purity-scaled novel transcript proportion was less than 0.25 in most cases (mean = 0.24 and range = 0.006–1; Fig. 4d). These data suggest either one or both of the following are occurring: within each cancer cell, both the canonical and novel transcripts are being transcribed, or only a subset of cancer cells harbor the novel transcript.
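One simple way to think about this purity scaling is sketched below. The actual procedure is described in the Supplementary Note; the version here is an assumption made for illustration. It supposes that the novel transcript is produced only by cancer cells and that cancer and non-cancer cells contribute HLA transcripts in proportion to their cell fractions, so the observed proportion is divided by purity and capped at 1.

```python
def purity_scaled_proportion(novel_proportion: float, purity: float) -> float:
    """Assumed scaling: observed proportion divided by tumor purity, capped at 1.

    Relies on the simplifying assumptions that only cancer cells produce the
    novel transcript and that all cells express the allele at a similar level.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 0 <= novel_proportion <= 1:
        raise ValueError("novel_proportion must be in [0, 1]")
    return min(1.0, novel_proportion / purity)
```

Under these assumptions, an observed proportion of 0.1 in a region with purity 0.5 corresponds to a proportion of about 0.2 within the cancer cells.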
To further evaluate the rate of tumor-enriched alternative splicing observed in HLA alleles and whether this is higher or lower than expected, we considered the rate of somatic alternative splicing across all protein-coding genes (Methods). We found that lung cancer genes had a higher rate of alternative splicing than other protein-coding genes (P = 9.1 × 10−3; Fig. 4e). In addition, from the set of 49 lung cancer genes, we found that HLA-C had the fourth, HLA-B had the seventh and HLA-A had the eighth highest frequency of alternative splicing (Fig. 4e).
Consistent with the selection of alternative splicing events, we observed that LUAD tumor regions without HLA LOH or repression were enriched for tumor-enriched alternative splicing events (LUAD, P = 1.9 × 10−3) compared to regions that harbored either HLA LOH or repression (Fig. 4f). We did not see this enrichment with LUSC or ER+ tumors. This suggests that in LUAD, tumor-enriched alternative splicing may offer an alternative means to disrupt HLA presentation during tumor evolution.
To further investigate the importance of HLA alternative splicing in tumor evolution, we compared the total number of neoantigens predicted to bind to alleles with or without tumor-enriched HLA alternative splicing. We first quantified the number of neoantigens predicted to bind to the intact HLA alleles for each tumor region. We then compared the neoantigen count in alleles that exhibited tumor-enriched alternative splicing versus those that did not, excluding alleles with HLA LOH. HLA alleles exhibiting tumor-enriched HLA alternative splicing in LUAD tumors were associated with a higher neoantigen count compared to alleles without evidence of tumor-enriched splicing (P = 3 × 10−5; Fig. 4g), suggesting that in LUAD tumors, tumor-enriched alternative splicing of HLA alleles may be selected to reduce antigen presentation. We did not observe this enrichment in LUSC tumors.
HLA disruption and tumor evolution
To understand when HLA LOH, transcriptional repression and somatic alternative splicing occur during NSCLC evolution, we considered the heterogeneity of these events using the multiregion TRACERx cohort. We defined an HLA disruption event as ubiquitous if it occurred in all of the primary tumor regions, and heterogeneous otherwise. In both LUAD and LUSC tumors, tumor-enriched alternative splicing events were the most heterogeneous (LUAD = 82.6% and LUSC = 71.4%), followed by repression (LUAD = 56.2% and LUSC = 52.2%) and then LOH events (LUAD = 51.6% and LUSC = 47.1%; Fig. 5a).
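The ubiquitous versus heterogeneous definition above translates directly into a set comparison. The sketch below is illustrative only; region labels and function names are hypothetical.

```python
def classify_event(regions_with_event, primary_regions):
    """Label an HLA disruption event as 'ubiquitous' if it is present in
    every primary tumor region, and 'heterogeneous' otherwise."""
    with_event = set(regions_with_event)
    all_regions = set(primary_regions)
    if not with_event:
        raise ValueError("event is not present in any region")
    if not with_event <= all_regions:
        raise ValueError("event reported in a region not in the tumor")
    return "ubiquitous" if with_event == all_regions else "heterogeneous"


def percent_heterogeneous(labels):
    """Percentage of events labeled 'heterogeneous'."""
    if not labels:
        raise ValueError("no events supplied")
    return 100 * sum(label == "heterogeneous" for label in labels) / len(labels)
```

Computing `percent_heterogeneous` separately for LOH, repression and tumor-enriched splicing events gives the type of per-mechanism comparison reported in the text.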
In 20/65 (30.8%) of TRACERx LUAD and LUSC tumors with HLA disruption, we observed convergence upon disruption of the same allele through alternative mechanisms, with genomic loss, transcriptional repression and/or alternative splicing of the same allele occurring in different regions of the same tumor. We observed ten tumors with convergence upon genomic loss and transcriptional repression of the same allele in separate regions, nine tumors with transcriptional repression and alternative splicing of the same allele in separate regions and one tumor with genomic loss, transcriptional repression and alternative splicing of the same allele in different regions (Fig. 5b,c). Conceivably, this could either reflect positive selection within individual tumors or be a consequence of the high rate of HLA disruption through diverse mechanisms.
The tumor microenvironment can shape tumor evolution1. We therefore investigated the relationship between the immune infiltrate and the presence of HLA disruption, using the Danaher in silico immune deconvolution method to estimate the amount of CD8 T cell infiltrate27. We observed a significant positive correlation between total HLA expression and CD8 T cell infiltrate (LUAD, P = 1.6 × 10−27 and r = 0.48; LUSC, P = 3.7 × 10−15 and r = 0.44; ER+, P = 1.3 × 10−24 and r = 0.42; Supplementary Fig. 8). We observed that LUAD and LUSC tumor regions with allelic HLA transcriptional repression had lower levels of infiltrating CD8 T cells compared to those without (LUAD, P = 1.3 × 10−6; LUSC, P = 0.015 and ER+, P = 0.25; Fig. 5d). Conversely, HLA alternative splicing was associated with elevated CD8 T cell levels in LUAD tumors (LUAD, P = 3 × 10−6; LUSC, P = 0.32 and ER+, P = 0.54; Extended Data Fig. 9). No clear relationship between HLA LOH and total HLA expression was observed in LUAD, LUSC or ER+ tumors, indicating dosage compensation may occur following allelic HLA copy number loss (Fig. 5e).
Finally, we endeavored to understand whether disruption of the HLA alleles through LOH, repression or alternative splicing might have a role in the evolution of lung cancer metastasis. We found that LUAD tumors harboring HLA LOH were more likely to metastasize than those without HLA LOH (LUAD, P = 0.02 and LUSC, P = 0.81; Fig. 5f). To investigate this further, we considered the neoantigen burden of primary tumor regions that seeded metastasis compared to those that did not, with the metastasis-seeding regions being defined in our previously published work15. We found that the standard neoantigen burden did not distinguish seeding from non-seeding regions (Extended Data Fig. 10a). However, when we restricted our neoantigen count to only include neoantigens predicted to bind to intact HLA alleles, not subject to loss or repression, we observed that LUAD tumor regions that seeded metastasis had a lower effective neoantigen burden than those that did not (LUAD, P = 0.01 and LUSC, P = 0.81; Fig. 5g), which was not the case when we only considered HLA loss (Extended Data Fig. 10b). Taken together, these data suggest that disruption of the HLA alleles could have an important role in tumor metastasis.
Discussion
Neoantigen presentation via HLA molecules is crucial to achieving an antitumor immune response. Previous studies have illustrated that different mechanisms of HLA disruption are common across cancers4–7. Here we developed MHC Hammer, a tool to investigate the prevalence of four mechanisms of genomic and transcriptomic disruption of the HLA alleles in lung and breast cancer—mutations, LOH, repression and alternative splicing.
While damaging HLA mutations were rare in our cohorts, LOH, repression and tumor-enriched alternative splicing of the HLA alleles were pervasive. From the patients with tumor-adjacent normal samples, just 27% of LUAD, 7% of LUSC and 56% of ER+ tumors had no HLA disruption, while 30.6%, 10.7% and 14.7% of LUAD, LUSC and ER+ tumors exhibited tumor-enriched alternative splicing events. The lower rate of HLA LOH and repression observed in ER+ breast tumors may reflect the lower tumor mutational burden (TMB) in ER+ breast tumors compared to NSCLC28.
We observed differences in the patterns of HLA disruption in the NSCLC tumors—LUSC tumors were characterized by almost universal HLA disruption, while LUAD tumors exhibited less frequent HLA disruption. In LUADs, we observed an enrichment for alternative splicing in alleles without LOH or repression, a higher likelihood of tumor-enriched alternative splicing in alleles with a higher neoantigen burden and finally an association between HLA LOH and metastasis. This may reflect different selective pressures in these cancer types and the propensity for HLA disruption through diverse mechanisms.
One limitation of our method is that it requires a patient-matched tumor-adjacent normal tissue sample to determine HLA repression and tumor-enriched alternative splicing. This is due to the high variability observed in HLA allelic expression and the high prevalence of HLA alternative splicing in the normal tissue samples.
Alternative splicing of the class I HLA alleles has been observed in non-cancer tissue cohorts and in cancer cell lines8–13. However, HLA alternative splicing in large cohorts of normal and tumor tissue has not been described before, due in part to the lack of a high-throughput bioinformatics tool capable of measuring HLA alternative splicing.
HLA alternative splicing affecting exons or introns 2–4 could result in an unstable HLA molecule. For example, partial exon 3 skipping in an HLA-A allele in non-cancer tissue has been shown to result in the absence of cell-surface expression13. An HLA-A allele with complete exon 3 skipping continued to be expressed on the cell surface but as an immature glycoprotein unable to present peptides8. This immature molecule could potentially act as a decoy allele by inhibiting NK cells via its receptor ligands without presenting neoantigens to CD8 T cells. Alternative splicing resulting in exon 5 skipping has been shown to result in a soluble HLA allele10–12. Persistent presentation of neoantigens via soluble HLA molecules to the T cell receptor, without costimulatory or accessory signals, could lead to immune tolerance or T cell exhaustion. It has been shown that soluble class I HLA molecules can induce apoptosis in CD8 T cells and NK cells29.
The majority of the detected tumor-enriched alternative splicing events were present with a purity-adjusted novel transcript proportion <0.25. This could reflect NMD; PTC-induced NMD has been shown to reduce mRNA levels by up to 90% in a study of an HLA-A allele23. Alternatively, HLA alternative splicing may be a transient event, or the alternative splicing observed in the lung and breast tumors in this study may simply reflect transcriptional noise. Therefore, further work is required to establish the role of alternative splicing in lung and breast cancer, as well as investigate its prevalence in other cancer types.
It is possible that the underlying mechanisms of HLA repression and alternative splicing events are epigenetic. Supporting this, we found a significant negative correlation between methylation and HLA-B gene expression. This could have important clinical implications, as previous studies have illustrated the reversible nature of HLA epigenetic modifications26 and the importance of this reversibility in immunotherapy response30.
Further work is warranted to explore the extent to which HLA alternative splicing and repression represent a pan-cancer immune evasion mechanism. As more pre-therapy and post-therapy data emerge, it will be possible to investigate the extent to which HLA alternative splicing and repression develop during treatment and at immunotherapy resistance and the extent to which these processes might inform therapeutic strategies.
Our results may also have implications for vaccine- and T cell-based therapeutic approaches, which seek to exploit neoantigens. Our results suggest that it may be important to consider not just whether putative neopeptides bind the repertoire of HLA alleles but also the copy number, expression and splicing characteristics of each allele. Indeed, MHC Hammer may be used to help determine which set of predicted neoantigens are most likely to elicit an effective T cell response.
In conclusion, MHC Hammer enables accurate estimation of allele-specific HLA disruption, revealing that it is a common feature of NSCLC and ER+ breast cancer that facilitates immune escape and cancer evolution.
Methods
The TRACERx421 data
The TRACERx421 samples used in the study have been described in previously published manuscripts14,16. The design of the TRACERx study has been approved by an independent research ethics committee (13/LO/1546) and the ClinicalTrials.gov number is NCT01888601. Informed consent for entry into the TRACERx study was mandatory and obtained from every patient. The purity and ploidy estimates, histological subtypes, lung cancer genes and phylogenetic trees used in this study were taken from a previous TRACERx study16. Transcripts per million (TPM) estimates were taken from a previously published TRACERx study14. The classification of primary tumors that did and did not metastasize, as well as the classification of seeding regions, was taken from a previously published TRACERx study15. Only primary tumor and non-lymph node regions with purity and ploidy estimates were used in this study. The consort diagram of the TRACERx samples used in the study is shown in Extended Data Fig. 1.
The TCGA cohort
MHC Hammer was run on lung and breast samples from the TCGA dataset17–19. We implemented the following thresholds for a sample to be included in our study:
Whole-exome sequencing (WES) samples with fewer than 5,000,000 paired aligned reads or an alignment rate of less than 0.8 were excluded.
RNA sequencing (RNA-seq) samples with an alignment rate of less than 0.6 were excluded.
Formalin-fixed paraffin-embedded (FFPE) samples and metastatic samples were excluded.
Tumor samples without a purity and ploidy solution were excluded.
Samples without a matched WES germline sample that passed our filters were excluded.
In the cases where a TCGA sample was sequenced multiple times, we selected a single sequencing run to use in our study. For the WES tumor samples, we prioritized choosing a non-whole-genome amplification (non-WGA) sample over a WGA sample31, and then prioritized the samples by the number of paired and aligned reads. For the WES germline samples, we also prioritized non-WGA over WGA, then blood samples over solid tissue samples and finally prioritized the samples by the number of paired and aligned reads. For the RNA-seq samples, we chose the sample with the highest number of paired and aligned reads.
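The WES thresholds and the prioritization order described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the `WesRun` record, its field names and the decision to apply the quality thresholds before ranking are assumptions made for the example.

```python
from dataclasses import dataclass

MIN_PAIRED_ALIGNED_READS = 5_000_000  # WES read threshold from the text
MIN_WES_ALIGNMENT_RATE = 0.8          # WES alignment-rate threshold from the text


@dataclass(frozen=True)
class WesRun:
    """One sequencing run of a sample (hypothetical record layout)."""
    run_id: str
    is_wga: bool                 # True if whole-genome amplified
    paired_aligned_reads: int
    alignment_rate: float
    from_blood: bool = False     # only meaningful for germline runs


def passes_wes_qc(run: WesRun) -> bool:
    """Apply the WES read-count and alignment-rate thresholds."""
    return (run.paired_aligned_reads >= MIN_PAIRED_ALIGNED_READS
            and run.alignment_rate >= MIN_WES_ALIGNMENT_RATE)


def pick_tumor_run(runs):
    """Prefer non-WGA runs, then the run with the most paired aligned reads."""
    eligible = [r for r in runs if passes_wes_qc(r)]
    if not eligible:
        return None
    # False sorts before True, so non-WGA runs come first.
    return min(eligible, key=lambda r: (r.is_wga, -r.paired_aligned_reads))


def pick_germline_run(runs):
    """Prefer non-WGA, then blood over solid tissue, then the most reads."""
    eligible = [r for r in runs if passes_wes_qc(r)]
    if not eligible:
        return None
    return min(eligible,
               key=lambda r: (r.is_wga, not r.from_blood, -r.paired_aligned_reads))
```

Encoding the priorities as a sort key keeps the order of the criteria explicit: each element of the tuple is consulted only when the preceding ones tie.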
The purity and ploidy solutions for the TCGA cohort were estimated using ASCAT32 and taken from https://github.com/VanLoo-lab/ascat/tree/master/ReleasedData/TCGA_SNP6_hg38. The methylation array data and TPM data used in this study were downloaded from Genomic Data Commons (GDC). The lung histological subtypes (LUAD and LUSC) were taken from GDC, and the breast subtypes (ER+, ER- and TNBC) were taken from cBioPortal.
The consort diagram for the TCGA cohort is shown in Extended Data Fig. 2.
The GTEx cohort
We ran MHC Hammer on the normal lung and breast RNA-seq samples from the GTEx dataset20. We implemented the following thresholds for a sample to be included in our study:
- WES germline samples with fewer than 5,000,000 paired aligned reads or an alignment rate of less than 0.8 were excluded.
- RNA-seq samples with an alignment rate of less than 0.6 were excluded.
- Samples without a matched WES germline sample that passed our filters were excluded.
The consort diagram for the GTEx cohort is shown in Extended Data Fig. 3.
Validation of allele-specific HLA alternative splicing
To validate our HLA alternative splicing pipeline, we used allele-specific PCR amplification. We performed this for four tumor regions and one normal sample from two patients (CRUK0061_SU_N01, CRUK0061_SU_T1-R1, CRUK0061_SU_T1-R2, CRUK0733_SU_T1-R2 and CRUK0733_SU_T1-R6). RNA-seq data were available for four of these samples (CRUK0061_SU_N01, CRUK0061_SU_T1-R1, CRUK0733_SU_T1-R2 and CRUK0733_SU_T1-R6), and MHC Hammer identified exon 5 skipping in an HLA-C allele in all four samples.
To amplify each allele, we used allele-specific primers that have been described previously12, and the fragment sizes were confirmed via agarose gel electrophoresis (Supplementary Fig. 1a). These PCR products were then cloned using a TA cloning kit (Invitrogen), and the wild-type and novel alternatively spliced transcripts were subsequently validated by Sanger sequencing (Supplementary Fig. 1b).
Neoantigen calls
Patient-specific HLA haplotype predictions were obtained using HLA-HD33 (version 1.2.1). NetMHCpan4.1 (ref. 34) was run on 9–11-mer neopeptides derived from nonsynonymous mutations across the TRACERx421 cohort, taking into account patient-specific HLA types. An eluted ligand rank cutoff of 0.5 was applied to define whether a peptide was predicted to bind a specific HLA type. An observed nonsynonymous mutation was deemed a neoantigen binding to a specific HLA type if at least one of its neopeptides was considered a binder.
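The binder rule above can be expressed compactly. The sketch below is illustrative only: the tuple layout of the predictions is hypothetical (it is not the NetMHCpan output format), and treating the cutoff as a strict inequality is an assumption, because the text does not state how a rank of exactly 0.5 is handled.

```python
from collections import defaultdict

EL_RANK_CUTOFF = 0.5  # eluted ligand rank cutoff from the text


def is_binder(el_rank: float, cutoff: float = EL_RANK_CUTOFF) -> bool:
    """Assumption: a peptide is a binder when its EL rank is strictly below the cutoff."""
    return el_rank < cutoff


def neoantigen_hla_map(predictions):
    """Map each mutation to the HLA types for which it counts as a neoantigen.

    `predictions` is an iterable of (mutation_id, hla_type, peptide, el_rank)
    tuples, a hypothetical layout chosen for this example. A mutation is a
    neoantigen for an HLA type if at least one of its neopeptides binds it.
    """
    binding = defaultdict(set)
    for mutation_id, hla_type, _peptide, el_rank in predictions:
        if is_binder(el_rank):
            binding[mutation_id].add(hla_type)
    return dict(binding)
```

Mutations with no binding neopeptide are simply absent from the returned mapping.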
Danaher estimates of CD8+ T cell infiltration
The amount of CD8+ T cell infiltration was estimated using the Danaher method27. To do this, TPM values of the CD8A and CD8B genes were first log2-transformed, and the mean log2 value across the two genes was taken for each sample.
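As a worked illustration of this score, a minimal sketch is given below. The pseudocount added before the log2 transform is an assumption made so that samples with zero TPM are defined; the text does not state whether one was used.

```python
import math


def cd8_score(tpm_cd8a: float, tpm_cd8b: float, pseudocount: float = 1.0) -> float:
    """Mean of the log2-transformed TPM values of CD8A and CD8B.

    `pseudocount` is an assumption of this sketch, not a value from the study.
    """
    log_a = math.log2(tpm_cd8a + pseudocount)
    log_b = math.log2(tpm_cd8b + pseudocount)
    return (log_a + log_b) / 2
```

For example, with TPM values of 3 and 7 and the assumed pseudocount of 1, the score is (log2 4 + log2 8) / 2 = 2.5.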
Calling alternative splicing in all protein-coding genes
To call alternative splicing in all protein-coding genes, we used the STAR aligner with a two-pass alignment35,36 and the GRCh38 reference sequence to generate a set of splice junctions in the TRACERx samples. Novel splice junctions were defined as those not present in the GRCh38 RefSeq GTF file (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/). To be considered as present in a tumor region, a novel splice junction required at least 20 uniquely mapping reads in any region from the tumor and at least two uniquely mapping reads in the given region. To be considered somatic, the novel splice junction could not be present in the patient-matched normal sample.
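The presence and somatic rules for a single novel splice junction can be sketched as follows. The input layout (a dictionary of uniquely mapping read counts per tumor region) is hypothetical, and interpreting 'not present in the patient-matched normal sample' as zero supporting reads is an assumption of this example.

```python
def call_novel_junction(region_counts, normal_count,
                        min_reads_any_region=20, min_reads_this_region=2):
    """Apply the read-support rules to one novel junction in one tumor.

    region_counts: dict mapping tumor region name to the number of uniquely
        mapping reads that span the junction (hypothetical layout).
    normal_count: uniquely mapping reads in the patient-matched normal sample.
    Returns (present_by_region, is_somatic).
    """
    # At least one region of the tumor must reach the higher threshold.
    supported = any(c >= min_reads_any_region for c in region_counts.values())
    # A region carries the junction if the tumor is supported and the region
    # itself has at least the lower number of reads.
    present = {region: supported and count >= min_reads_this_region
               for region, count in region_counts.items()}
    # Assumption: "not present in the normal" means zero supporting reads.
    is_somatic = supported and normal_count == 0
    return present, is_somatic
```

The two-level threshold lets a junction that is well supported in one region be called in other regions of the same tumor at lower depth.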
The MHC Hammer pipeline
See Supplementary Note for a detailed overview of the MHC Hammer pipeline. The following MHC Hammer parameters were used to generate the data in this study:
- The HLA reference files were created using the ImMunoGeneTics (IMGT) database version 3.38 (ref. 37).
- The library size was estimated as the number of paired and aligned reads in the input BAM files (include_unmapped_reads_in_library_size = FALSE).
- The HLA FASTQ files were created by filtering the input BAM files to include all unmapped reads, reads that mapped to chromosome 6 or any alternate contig or reads that contained a 30-mer sequence from the IMGT database (unmapped_reads = TRUE, contig_reads = TRUE, fish_reads = TRUE).
- When filtering the HLA allele BAM files, reads with more than one mismatch to the patient-specific reference were removed (max_mismatch = 1).
- In the estimation of allelic copy number and DNA AIB, filtered SNPs required a read depth of at least 30 in TRACERx (min_depth = 30) or 5 in TCGA (min_depth = 5).
- After the first STAR alignment, splice junctions required at least two supporting reads to be included in the cohort of splice junctions in the second STAR alignment (uniq_num_across_junc = 2).
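The read-selection rule used to build the HLA FASTQ files (unmapped reads, chromosome 6 or alternate-contig reads, and reads sharing a 30-mer with the IMGT database) can be illustrated with a simple k-mer lookup. This is a sketch under stated assumptions rather than the MHC Hammer implementation: the function names are hypothetical, the contig naming ('chr6' or '6', and an '_alt' suffix for alternate contigs) is assumed, and reverse-complement matching is omitted for brevity.

```python
def build_kmer_index(reference_seqs, k=30):
    """Collect every k-mer present in the reference allele sequences."""
    kmers = set()
    for seq in reference_seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def read_is_fished(read_seq, kmers, k=30):
    """True if the read contains at least one reference k-mer."""
    read_seq = read_seq.upper()
    return any(read_seq[i:i + k] in kmers
               for i in range(len(read_seq) - k + 1))


def keep_read(is_unmapped, contig, read_seq, kmers, k=30):
    """Keep a read if any of the three selection criteria applies."""
    # Assumed contig naming: 'chr6' or '6', and '_alt' suffix for alternate contigs.
    on_target_contig = contig is not None and (
        contig in ("chr6", "6") or contig.endswith("_alt"))
    return is_unmapped or on_target_contig or read_is_fished(read_seq, kmers, k)
```

Storing the reference k-mers in a set makes each lookup constant time on average, so the cost per read grows only with the read length.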
Filters implemented for HLA WES analysis
To be included in the HLA DNA analysis, including HLA copy number, AIB and LOH calls, a gene must pass the following filters:
- Have at least ten SNPs that pass the minimum read depth of 30 in the TRACERx samples or 5 in the TCGA samples.
- Both alleles of the gene must have an expected depth of at least 10. The expected depth estimates the depth of the reads that are coming solely from the cancer cells (see Supplementary Note for more details).
- The 95% confidence interval in the allelic copy number, calculated using the R function t.test, must be less than 2.5.
Filters implemented for somatic HLA mutations
HLA allelic mutations were classified as high-impact and damaging if the Ensembl Variant Effect Predictor (VEP)38 consequence included at least one of ‘stop_gained’, ‘frameshift_variant’, ‘start_lost’ or ‘stop_lost’. To be considered in our analysis, mutations had to be classified as ‘PASS’ using the Genome Analysis ToolKit (GATK)39 FilterMutectCalls function, have ten reads supporting the alternate allele, and fall in a sample and gene that passed the MHC Hammer WES filters.
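The mutation rules above amount to a set-membership test plus three conditions. In the sketch below the function names and argument layout are hypothetical, and reading 'ten reads supporting the alternate allele' as 'at least ten' is an assumption of this example.

```python
DAMAGING_CONSEQUENCES = frozenset(
    {"stop_gained", "frameshift_variant", "start_lost", "stop_lost"})


def is_high_impact(vep_consequences) -> bool:
    """True if any VEP consequence term of the mutation is in the damaging set."""
    return not DAMAGING_CONSEQUENCES.isdisjoint(vep_consequences)


def keep_mutation(filter_status: str, alt_reads: int,
                  gene_passes_wes_filters: bool, min_alt_reads: int = 10) -> bool:
    """Apply the inclusion criteria for somatic HLA mutations.

    Assumption: 'ten reads' is interpreted as a minimum of ten.
    """
    return (filter_status == "PASS"
            and alt_reads >= min_alt_reads
            and gene_passes_wes_filters)
```

Keeping the impact classification separate from the inclusion filter mirrors the text, in which a mutation can be included in the analysis without being classified as damaging.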
Filters implemented for HLA RNA analysis
To be included in the HLA RNA analysis, including RNA AIB, allelic expression, allelic repression and alternative splicing, a gene must pass the DNA analysis filters and meet the following criteria:
- Have at least ten SNPs in the exon sequence.
- Have no more than 50% of reads mapping to both alleles of the same gene.
- Have no more than 5% of reads mapping to multiple HLA genes.
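Because the RNA filters build on the DNA filters, the two sets of gene-level checks can be sketched together. The `GeneQC` record and its field names are hypothetical, and the copy-number criterion is interpreted here as the width of the 95% confidence interval, which is an assumption about the wording in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneQC:
    """Per-gene, per-sample quality metrics (hypothetical record layout)."""
    n_snps_passing_depth: int        # SNPs at or above the cohort minimum depth
    expected_depth_allele1: float
    expected_depth_allele2: float
    cn_ci_width: float               # assumed: width of the 95% CI of allelic copy number
    n_exonic_snps: int
    frac_reads_both_alleles: float   # reads mapping to both alleles of the gene
    frac_reads_multi_gene: float     # reads mapping to more than one HLA gene


def passes_dna_filters(q: GeneQC) -> bool:
    """Filters for HLA copy number, AIB and LOH calls."""
    return (q.n_snps_passing_depth >= 10
            and q.expected_depth_allele1 >= 10
            and q.expected_depth_allele2 >= 10
            and q.cn_ci_width < 2.5)


def passes_rna_filters(q: GeneQC) -> bool:
    """RNA filters, which also require the DNA filters to pass."""
    return (passes_dna_filters(q)
            and q.n_exonic_snps >= 10
            and q.frac_reads_both_alleles <= 0.50
            and q.frac_reads_multi_gene <= 0.05)
```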
Filters implemented for HLA alternative splicing
We excluded from our analysis any novel splice junction detected in the first or last exons of an HLA gene. To be included in our analysis, novel splice junctions had to be classified as one of the following: complete exon skipping, partial exon skipping, partial intron retention or complete intron retention. In addition, we applied the following depth filters:
- GTEx cohort: to be included in our analysis, a novel splice junction required at least two uniquely mapping reads in that sample. In addition, the novel splice junction needed to be identified in another sample from the same patient with at least 20 uniquely mapping reads.
- TRACERx and TCGA cohorts: to be included in our analysis, a novel splice junction needed to be classified as either ‘tumor-enriched’ or ‘tumor-depleted’ and also be identified in at least one tumor region or matched normal from the patient with at least 20 uniquely mapping reads.
Statistical information
All statistical tests were performed in R (v.4.3.3). No statistical methods were used to predetermine the sample size. Tests involving comparisons of distributions were done using a two-tailed Wilcoxon test (wilcox.test). Tests involving the comparison of groups were done using a two-tailed Fisher’s exact test (fisher.test). The correlation was tested using Pearson’s correlation coefficient (cor.test).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-024-01883-8.
Acknowledgements
We gratefully acknowledge the individuals who participated in the TRACERx, TCGA and GTEx studies. We are grateful for assistance from the scientific computing facilities at the Francis Crick Institute. We thank all site personnel, investigators, funders and industry partners who supported the generation of the data within this study. The results published here are in part based on data generated by the TCGA Research Network and the GTEx study. The GTEx Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. The TRACERx study (ClinicalTrials.gov registration: NCT01888601) is sponsored by University College London (UCL/12/0279) and has been approved by an independent Research Ethics Committee (13/LO/1546). TRACERx is funded by Cancer Research UK (C11496/A17786) and coordinated through the Cancer Research UK and UCL Cancer Trials Centre, which has a core grant from CRUK (C444/A15953). This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2041), the UK Medical Research Council (CC2041) and the Wellcome Trust (CC2041). This work was also supported by the Cancer Research UK Lung Cancer Centre of Excellence and the CRUK City of London Centre Award (C7893/A26233), as well as the UCL Experimental Cancer Medicine Centre. C.P. is supported by the Breast Cancer Research Foundation (BCRF 23-157). M.M.L. is supported by the Rosetrees Trust. F.G.C. is supported by the Cancer Research UK Brain Tumor Award (A28592). C.M.-R. is supported by the Rosetrees Trust (M630) and the Wellcome Trust. K.K.D. is supported by funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie (grant agreement 101024529) and supported by a ZonMW Rubicon fellowship (20-45200-98-20102). N.K. is supported by the Breast Cancer Research Foundation (BCRF 23-157), the Rosetrees Trust and Cancer Research UK. K.L. is funded by the UK Medical Research Council (MR/V033077/1), the Rosetrees Trust and Cotswold Trust (A2437), the Royal Marsden Cancer Charity (thanks to the R. Russell family and Macfarlanes donations) and the Melanoma Research Alliance and Cancer Research UK (C69256/A30194). P.V.L. is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2008), the UK Medical Research Council (CC2008) and the Wellcome Trust (CC2008); is a CPRIT Scholar in Cancer Research and acknowledges CPRIT grant support (RR210006). M.J.-H. is a CRUK Career Establishment Awardee and has received funding from CRUK, IASLC International Lung Cancer Foundation, Lung Cancer Research Foundation, Rosetrees Trust, UKI NETs, National Institute for Health Research (NIHR) and NIHR UCLH Biomedical Research Centre. S.A.Q. is funded by a CRUK Senior Cancer Research Fellowship (C36463/A22246), CRUK Biotherapeutic Program grant (C36463/A20764) and Cancer Research UK Brain Tumor Award (A28592). C.S. is a Royal Society Napier Research Professor (RSRP\R\210001). His work is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2041), the UK Medical Research Council (CC2041) and the Wellcome Trust (CC2041). For Open Access, the author has applied a CC BY public copyright license to any author-accepted manuscript version arising from this submission. C.S. is funded by Cancer Research UK (TRACERx (C11496/A17786), PEACE (C416/A21999) and CRUK Cancer Immunotherapy Catalyst Network); Cancer Research UK Lung Cancer Centre of Excellence (C11496/A30025); the Rosetrees Trust, Butterfield and Stoneygate Trusts; Novo Nordisk Foundation (ID16584); Royal Society Professorship Enhancement Award (RP/EA/180007 and RF\ERE\231118); NIHR University College London Hospitals Biomedical Research Centre; the Cancer Research UK-University College London Centre; Experimental Cancer Medicine Centre; the Breast Cancer Research Foundation (US; BCRF-22-157); Cancer Research UK Early Detection and Diagnosis Primer Award (grant EDDPMA-Nov21/100034); the Mark Foundation for Cancer Research Aspire Award (grant 21-029-ASP) and ASPIRE Phase II award (grant 23-034-ASP). C.S. is in receipt of an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement 835297). N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (grant 211179/Z/18/Z) and also receives funding from Cancer Research UK, Rosetrees and the NIHR BRC at University College London Hospitals and the CRUK University College London Experimental Cancer Medicine Centre.
Author contributions
C.P. performed pipeline development, analysis and wrote the manuscript. T.P.J. performed pipeline development. M.M.L. performed analysis and wrote the manuscript. F.G.C., J.L., M.V.G. and A.R. designed and performed wet lab validation. O.P., C.M.-R., R.B., K.K.D., J.R.M.B. and R.R. provided bioinformatic support. N.K., K.L., R.S., D.A.M., P.V.L., M.J.-H. and S.A.Q. supervised the study. C.S. and N.M. supervised the study and wrote the manuscript.
Peer review
Peer review information
Nature Genetics thanks Justin Gainor and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
The WES and RNA-seq data used during this study have been deposited at the European Genome–Phenome Archive, which is hosted by the European Bioinformatics Institute and the Centre for Genomic Regulation under the accession code EGAS00001006494. Access is controlled by the TRACERx data access committee. Details on how to apply for access are available on the linked page. The TRACERx data are available under controlled access so that patient privacy and data confidentiality are maintained while promoting and encouraging impactful scientific discovery. The data access committee aims to reply to requests within 1 week.
Code availability
Code to run the MHC Hammer pipeline can be found at https://github.com/McGranahanLab/mhc-hammer.
Code to recreate the manuscript figures can be found at https://zenodo.org/records/13388455 (ref. 40).
Competing interests
C.P. holds a patent pending in determining HLA disruption (PCT/EP2023/059039). K.K.D. provided consultancy services to Achilles Therapeutics UK. N.K. receives research support from AstraZeneca. K.L. has a patent on InDel burden and CPI response pending and speaker fees from Roche tissue diagnostics, research funding from CRUK TDL–Ono–LifeArc alliance, Genesis Therapeutics and consulting roles with Ellipses Pharma, Monopteros and Kynos Therapeutics. D.A.M. reports speaker fees from Eli Lilly, AstraZeneca and Takeda Pharmaceuticals; consultancy fees from AstraZeneca, Thermo Fisher Scientific, Takeda Pharmaceuticals, Amgen, Janssen, MIM Software, Bristol Myers Squibb and Eli Lilly; and has received educational support from Takeda Pharmaceuticals and Amgen. M.J.-H. has received funding from CRUK, NIH National Cancer Institute, IASLC International Lung Cancer Foundation, Lung Cancer Research Foundation, Rosetrees Trust, UKI NETs and NIHR. M.J.-H. has consulted for, and is a member of, the Achilles Therapeutics Scientific Advisory Board and Steering Committee and has received speaker honoraria from Pfizer, Astex Pharmaceuticals, Oslo Cancer Cluster, Bristol Myers Squibb and Genentech. M.J.-H. is listed as a co-inventor on a European patent application relating to methods to detect lung cancer (PCT/US2017/028013). This patent has been licensed to commercial entities, and under terms of employment, M.J.-H. is due a share of any revenue generated from such license(s) and is also listed as a co-inventor on the GB priority patent application (GB2400424.4) with the title ‘Treatment and Prevention of Lung Cancer’. C.S. acknowledges grants from AstraZeneca, Boehringer-Ingelheim, Bristol Myers Squibb, Pfizer, Roche-Ventana, Invitae (previously Archer Dx; collaboration in minimal residual disease sequencing technologies), Ono Pharmaceutical and Personalis. He is the chief investigator for the AZ MeRmaiD 1 and 2 clinical trials and is the Steering Committee Chair. He is also the co-chief investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL’s Scientific Advisory Board. He receives consultant fees from Achilles Therapeutics (also a SAB member), Bicycle Therapeutics (also a SAB member), Genentech, Medicxi, China Innovation Centre of Roche (CICoR; formerly Roche Innovation Centre Shanghai), Metabomed (until July 2022), Relay Therapeutics (SAB member), Saga Diagnostics (SAB member) and the Sarah Cannon Research Institute. He has received honoraria from Amgen, AstraZeneca, Bristol Myers Squibb, GlaxoSmithKline, Illumina, MSD, Novartis, Pfizer and Roche-Ventana. He has previously held stock options in Apogen Biotechnologies and GRAIL; currently has stock options in Epic Bioscience, Bicycle Therapeutics, and Relay Therapeutics; and has stock options and is cofounder of Achilles Therapeutics. He declares a patent application for methods to detect lung cancer (PCT/US2017/028013); targeting neoantigens (PCT/EP2016/059401); identifying patient response to immune checkpoint blockade (PCT/EP2016/071471); methods for lung cancer detection (US20190106751A1); identifying patients who respond to cancer treatment (PCT/GB2018/051912); determining HLA LOH (PCT/GB2018/052004); predicting survival rates of patients with cancer (PCT/GB2020/050221); and methods and systems for tumor monitoring (PCT/EP2022/077987). He is an inventor of a European patent application (PCT/GB2017/053289) relating to assay technology to detect tumor recurrence. This patent has been licensed to a commercial entity, and under their terms of employment, he is due a revenue share of any revenue generated from such license(s). N.M. has stock options in and has consulted for Achilles Therapeutics and holds a European patent in determining HLA LOH (PCT/GB2018/052004), a patent pending in determining HLA disruption (PCT/EP2023/059039), and is a co-inventor to a patent to identify responders to cancer treatment (PCT/GB2018/051912).
The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A list of authors and their affiliations appears at the end of the paper.
Contributor Information
Charles Swanton, Email: charles.swanton@crick.ac.uk.
Nicholas McGranahan, Email: nicholas.mcgranahan.10@ucl.ac.uk.
TRACERx Consortium:
Heather Cheyne, Mohammed Khalil, Shirley Richardson, Tracey Cruickshank, Eric Lim, Hugo J. W. L. Aerts, Tom L. Kaufmann, Matthew R. Huska, Babu Naidu, Gareth A. Wilson, Rachel Rosenthal, Andrew Rowan, Chris Bailey, Claudia Lee, Emma Colliver, Katey S. S. Enfield, Mark S. Hill, Mihaela Angelova, Oriol Pich, Dhruva Biswas, Clare Puttick, Roberto Vendramin, Cian Murphy, Maria Zagorulya, Thomas P. Jones, Michelle M. Leung, Nicholas McGranahan, Carla Castignani, Elizabeth Larose Cadieux, Jeanette Kittel, Kerstin Haase, Kexin Koh, Rachel Scott, Gurdeep Matharu, Jacqui A. Shaw, Allan Hackshaw, Camilla Pilotti, Rachel Leslie, Anne-Marie Hacker, Sean Smith, Aoife Walker, Christopher Abbosh, Corentin Richard, Cristina Naceur-Lombardelli, Francisco Gimeno-Valiente, Krupa Thakkar, Mariana Werner Sunderland, Monica Sivakumar, Nnennaya Kanu, Ieva Usaite, Sadegh Saghafinia, Selvaraju Veeriah, Sharon Vanloo, Bushra Mussa, Michalina Magala, Elizabeth Keene, Emilia L. Lim, James R. M. Black, Maise Al Bakir, Ariana Huebner, Kristiana Grigoriadis, Takahiro Karasaki, Alexander M. Frankell, Crispin T. Hiley, Sophia Ward, Sian Harries, Olivia Lucas, David A. Moore, Nicolai J. Birkbak, Carlos Martínez-Ruiz, Kerstin Thol, Robert Bentham, Wing Kin Liu, Abigail Bunkum, Sonya Hessey, Martin D. Forster, Siow Ming Lee, Mariam Jamal-Hanjani, Despoina Karagianni, Sergio A. Quezada, Supreet Kaur Bola, Kevin Litchfield, Charles Swanton, John Le Quesne, Khalid AbdulJabbar, Catarina Veiga, Simone Zaccaria, Jonathan Tugwood, Caroline Dive, Zoltan Szallasi, Miklos Diossy, Teresa Marafioti, Elaine Borg, Mary Falzon, Reena Khiroya, Peter Van Loo, Karl S. Peggs, Gillian Price, Gary Royle, Charles-Antoine Collins-Fekete, Dionysis Papadatos-Pastos, James Wilson, Tanya Ahmad, Sarah Benafif, Judith Cave, Keith M. Kerr, Thomas B. K. Watkins, Roberto Salgado, Alexander James Procter, Asia Ahmed, Magali N. Taylor, Arjun Nair, David Lawrence, Davide Patrini, Colin R. Lindsay, Fiona H. Blackhall, Yvonne Summers, Matthew G. Krebs, Emma Nye, Richard Kevin Stone, Hanyun Zhang, Jerome Nicod, Alan Kirk, Mo Asif, Rocco Bilancia, Nikos Kostoulas, Jennifer Whiteley, Mathew Thomas, Akshay J. Patel, David Chuter, Mairead MacKenzie, Roland F. Schwarz, Andrew Kidd, Francesco Fraioli, Paul Ashford, Zoltan Kaplar, Jonas Demeulemeester, Claire Wilson, Michael J. Shackcloth, Sam M. Janes, Neal Navani, Ricky M. Thakrar, Angela Leek, Jack Davies Hodgkinson, Nicola Totton, Antonio Paiva-Correia, Stephan Beck, Miljana Tanic, Craig Dick, Lily Robinson, Peter Russell, Paulo De Sousa, Simon Jordan, Alexandra Rice, Hilgardt Raubenheimer, Harshil Bhayani, Lyn Ambrose, Anand Devaraj, Hemangi Chavan, Sofina Begum, Silviu I. Buderi, Daniel Kaniu, Mpho Malima, Sarah Booth, Nadia Fernandes, Pratibha Shah, Chiara Proli, Andrew G. Nicholson, Ekaterini Boleti, Madeleine Hewish, Kevin G. Blyth, Jason F. Lester, Anshuman Chaturvedi, Pedro Oliveira, Katherine D. Brown, Mathew Carter, Alastair Magness, Clare E. Weeden, Eva Grönroos, Jacki Goldman, Mickael Escudero, Philip Hobson, Stefan Boeing, Tamara Denner, Vittorio Barbè, Wei-Ting Lu, William Hill, Yutaka Naito, Zoe Ramsden, George Kassiotis, Imran Noorani, Anca Grapa, Aiman Alzetani, Yinyin Yuan, Xiaoxi Pan, Jack French, Kayleigh Gilbert, Angela Dwornik, Angeliki Karamani, Benny Chain, David R. Pearce, Felip Gálvez-Cancino, Georgia Stavrou, Gerasimos-Theodoros Mastrokalos, Helen L. Lowe, Ignacio Garcia Matos, James L. Reading, John A. Hartley, Kayalvizhi Selvaraju, Kezhong Chen, Leah Ensell, Mansi Shah, Maria Litovchenko, Piotr Pawlik, Samuel Gamble, Seng Kuong Anakin Ung, Victoria Spanswick, Yin Wu, Jayant K. Rane, Othman Al-Sawaf, Olga Chervova, Emilie Martinoni Hoogenboom, Fleur Monk, James W. Holding, Junaid Choudhary, Kunal Bhakhri, Pat Gorman, Robert C. M. Stephens, Maria Chiara Pisciella, Steve Bandula, Yien Ning Sophia Wong, Aya Osman, Mandeesh Sangha, Gerald Langman, Helen Shackleford, Madava Djearaman, Gary Middleton, Serena Chee, Patricia Georg, Amrita Bajaj, Apostolos Nakas, Azmina Sodha-Ramdeen, Mohamad Tufail, Molly Scotland, Rebecca Boyles, Sridhar Rathinam, Domenic Marrone, Sean Dulloo, Dean A. Fennell, Sarah Danson, Elaine Smith, Eustace Fontaine, Felice Granato, Juliette Novasio, Kendadai Rammohan, Leena Joseph, Paul Bishop, Rajesh Shah, Vijay Joshi, and Philip Crosbie
Extended data is available for this paper at 10.1038/s41588-024-01883-8.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-024-01883-8.
References
- 1. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).
- 2. Bentham, R. et al. Using DNA sequencing data to quantify T cell fraction and therapy response. Nature 597, 555–560 (2021).
- 3. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
- 4. Schaafsma, E., Fugle, C. M., Wang, X. & Cheng, C. Pan-cancer association of HLA gene expression with cancer prognosis and immunotherapy efficacy. Br. J. Cancer 125, 422–432 (2021).
- 5. Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).
- 6. Momburg, F. et al. Loss of HLA-A,B,C and de novo expression of HLA-D in colorectal cancer. Int. J. Cancer 37, 179–184 (1986).
- 7. McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell 171, 1259–1271 (2017).
- 8. Dai, Z.-X., Zhang, G.-H., Zhang, X.-H., Zhu, J.-W. & Zheng, Y.-T. A splice variant of HLA-A with a deletion of exon 3 expressed as nonmature cell-surface glycoproteins forms a heterodimeric structure with full-length HLA-A. Hum. Immunol. 75, 234–238 (2014).
- 9. Wang, Z., Marincola, F. M., Rivoltini, L., Parmiani, G. & Ferrone, S. Selective histocompatibility leukocyte antigen (HLA)-A2 loss caused by aberrant pre-mRNA splicing in 624MEL28 melanoma cells. J. Exp. Med. 190, 205–215 (1999).
- 10. Krangel, M. S. Secretion of HLA-A and -B antigens via an alternative RNA splicing pathway. J. Exp. Med. 163, 1173–1190 (1986).
- 11. Yang, D. & Le, J. Targeted amplification of alternatively spliced transcripts of major histocompatibility complex class I heavy chain. J. Immunol. Methods 176, 265–270 (1994).
- 12. Ehlers, F. A. I. et al. Polymorphic differences within HLA-C alleles contribute to alternatively spliced transcripts lacking exon 5. HLA 100, 232–243 (2022).
- 13. Reinders, J. et al. Identification of HLA-A*0111N: a synonymous substitution, introducing an alternative splice site in exon 3, silenced the expression of an HLA-A allele. Hum. Immunol. 66, 912–920 (2005).
- 14. Martínez-Ruiz, C. et al. Genomic–transcriptomic evolution in lung cancer and metastasis. Nature 616, 543–552 (2023).
- 15. Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534–542 (2023).
- 16. Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525–533 (2023).
- 17. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
- 18. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
- 19. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
- 20. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
- 21. Ramsuran, V. et al. Epigenetic regulation of differential HLA-A allelic expression levels. Hum. Mol. Genet. 24, 4268–4275 (2015).
- 22. Johansson, T., Yohannes, D. A., Koskela, S., Partanen, J. & Saavalainen, P. HLA RNA sequencing with unique molecular identifiers reveals high allele-specific variability in mRNA expression. Front. Immunol. 12, 629059 (2021).
- 23. Watanabe, Y., Magor, K. E. & Parham, P. Exon 5 encoding the transmembrane region of HLA-A contains a transitional region for the induction of nonsense-mediated mRNA decay. J. Immunol. 167, 6901–6911 (2001).
- 24. Rivas, M. A. et al. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 348, 666–669 (2015).
- 25. Carey, B. S., Poulton, K. V. & Poles, A. Factors affecting HLA expression: a review. Int. J. Immunogenet. 46, 307–320 (2019).
- 26. Rodems, T. S. et al. Reversible epigenetic alterations regulate class I HLA loss in prostate cancer. Commun. Biol. 5, 897 (2022).
- 27. Danaher, P. et al. Gene expression markers of tumor infiltrating leukocytes. J. Immunother. Cancer 5, 18 (2017).
- 28. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
- 29. Contini, P. et al. Soluble HLA-A,-B,-C and -G molecules induce apoptosis in T and NK CD8+ cells and inhibit cytotoxic T cell activity through CD8 ligation. Eur. J. Immunol. 33, 125–134 (2003).
- 30. Garrido, C. et al. Immunotherapy eradicates metastases with reversible defects in MHC class I expression. Cancer Immunol. Immunother. 60, 1257–1268 (2011).
- 31. Buckley, A. R. et al. Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls. BMC Genomics 18, 458 (2017).
- 32. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
- 33. Kawaguchi, S., Higasa, K., Shimizu, M., Yamada, R. & Matsuda, F. HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data. Hum. Mutat. 38, 788–797 (2017).
- 34. Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, W449–W454 (2020).
- 35. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
- 36. Veeneman, B. A., Shukla, S., Dhanasekaran, S. M., Chinnaiyan, A. M. & Nesvizhskii, A. I. Two-pass alignment improves novel splice junction quantification. Bioinformatics 32, 43–49 (2016).
- 37. Lefranc, M.-P. et al. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 43, D413–D422 (2014).
- 38. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
- 39. Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. 1st edn (O'Reilly Media, 2020).
- 40. Puttick, C. MHC Hammer publication code and data. Zenodo https://zenodo.org/records/13388455 (2024).