Skip to main content
PLOS ONE logoLink to PLOS ONE
. 2021 Jun 28;16(6):e0251630. doi: 10.1371/journal.pone.0251630

Framing the potential of public frameshift peptides as immunotherapy targets in colon cancer

Ide T Spaanderman 1,2, Fleur S Peters 2, Aldo Jongejan 3, Egbert J W Redeker 4, Cornelis J A Punt 1,2,3,4,5, Adriaan D Bins 1,2,*
Editor: Jianjiong Gao6
PMCID: PMC8238217  PMID: 34181673

Abstract

Approximately 15% of Colon Cancers are Microsatellite Instable (MSI). Frameshift Peptides (FPs) formed in MSI Colon Cancer are potential targets for immunotherapeutic strategies. Here we comprehensively characterize the mutational landscape of 71 MSI Colon Cancer patients from the cancer genome atlas (TCGA). We confirm that the mutations in MSI Colon Cancers are frequently frameshift deletions (23% in MSI; 1% in microsatellite stable), We find that these mutations cluster at specific locations in the genome which are mutated in up to 41% of the patients. We filter these for an adequate variant allele frequency, a sufficient mean mRNA level and the formation of a Super Neo Open Reading Frame (SNORF). Finally, we check the influence of Nonsense Mediated Decay (MMD) by comparing RNA and DNA sequencing results. Thereby we identify a set of 20 NMD-escaping Public FPs (PFPs) that cover over 90% of MSI Colon, 62.2% of MSI Endometrial and 58.8% of MSI Stomach cancer patients and 3 out of 4 Lynch patients in the TCGA-COAD. This underlines the potential for PFP directed immunotherapy, both in a therapeutic and a prophylactic setting in multiple types of MSI cancers.

Introduction

Protein products derived from mutated regions of DNA are a source of neo-antigens that can drive immune activation against cancer and convey susceptibility to immunotherapy [14]. Specifically, frameshift insertions and deletions (INDELS) are able to drive the anti-tumor immune response [5,6]. INDELS in coding DNA result in frameshifted RNA containing neo open reading frames (NORFS) that, once translated, give rise to completely non-self, out-of-frame protein products. These out-of-frame proteins can be degraded into peptides and presented in the context of multiple MHC alleles. Recognition of these MHC-peptide complexes by CD4+ or CD8+ T-cells initializes an anti-tumor immune response and leads to an influx of T cells into the tumor microenvironment [1,7,8], Longer NORFS, consisting of at least 50 aminoacids, so called super NORFS (SNORFS), have a higher chance to be recognized as a neo-antigens [9]. Importantly, NORFS are subject to degradation on a RNA level by the Nonsense Mediated Decay (NMD) pathway [10]. NMD can be a limiting factor in tumor immunity, which can be resolved by direct targeting of the NMD pathway [11,12]. In line with this, the presence of NMD escaping SNORFS has been shown to be a strong predictor of immunotherapy response and survival in multiple cancer types [9].

Compared to other mutation signatures, INDELS are overrepresented in the exome of microsatellite instable (MSI) tumors [5,13]. Microsatellites are regions in the DNA with a repeated nucleotide motif. These regions are prone to polymerase slippage, release and subsequently incorrect re-annealing during DNA synthesis. This leads to single stand insertions and deletions in the nascent DNA strand [14]. In healthy cells, the mismatch repair (MMR) system recognizes and repairs these damages. However, MSI tumors have a deficient MMR system (dMMR) due to germline and/or acquired somatic mutations in one or multiple MMR genes (MLH1, MSH2, MSH6, PMS2, EPCAM) or by hyper methylation of the MSH1 promotor [15,16]. Approximately 15% of colon cancers are considered MSI [17]. These dMMR colon cancers have high numbers of mutations, specifically INDELS in microsatellites.

Here we comprehensively assess frameshift mutations in colon cancer samples sequenced within the genome atlas (TCGA-COAD) [18]. We focus on the TGCA-COAD series as this contains a large subset of MSI tumors. Using a novel in silico approach (Fig 1), we establish that colon cancer frameshift mutations cluster to specific loci in the colon cancer exome. Based on DNA and RNA sequencing data we investigate whether these mutations lead to SNORFS and subsequently whether these SNORFS are affected by NMD. Finally, we identify a set of frequently occurring frameshift peptides (FPs), that we name public frameshift peptides (PFPs). These PFPs are potential targets for immunotherapy strategies in an adjuvant setting, as they obviate the need for time-consuming personalization of vaccines. Moreover, PFPs can also be found in all four (Lynch) patients with a germline MMR mutation in the TCGA-COAD. This highlights the possibility for PFP specific vaccination in a prophylactic setting as well.

Fig 1. Schematic overview of in silico pipeline.

Fig 1

TCGA-COAD: The cancer genome atlas colorectal adenocarcinoma database; Varscan: Somatic variant calling; Mutect: Somatic variant calling; EBFilter: Bayesian filtering for somatic variants; Mpileup: Location specific read calling; RSEM: Transcription quantification; VAF: Variant allele frequency; ENSEMBL Perl API: Programmable interface for Ensembl genome browser; GATK: Genome analysis toolkit; Haplotypecaller: Genetic variant calling; CNNScoreVariants: Germline variant filtering; Functotator: Clinvar based annotator of genetic variants; PTC: Premature termination codon; MS: Microsatellite.

Results

Frameshift mutations cluster to specific genomic loci in MSI colon cancer

In line with previous reports in literature 71 out of 461 (15%) of TCGA-COAD patients are MSI [17]. The majority of these patient are sporadic MSI (n = 67, 94.4%), but a small number of patients have germline mutations in the mismatch repair pathway and are therefore considered Lynch patients (n = 4, 5.6%). As expected, in general MSI colon cancer patients have a higher mutational burden than microsatellite stable (MSS) colon cancer patients (MSI: median = 2974, n = 71; MSS: median = 218, n = 272; p < 0.0001) (Fig 2A) and a larger percentage of their mutations consists of deletions or insertions (MSI: SNP = 72%, INS = 5%, DEL = 23%’; MSS: SNP = 97%, INS 1%, DEL = 1%) (Fig 2B). Almost all of these INDELS are frameshift mutations (INS: Frameshift 99%; DEL: Frameshift 98%) (Fig 2C). Contrary to single nucleotide polymorphisms (SNPs), these INDELS are preferentially located in coding microsatellites (cMS). (Median cMS length on mutation locus INS = 7; DEL = 7; SNP = 1, p < 0.0001) (Fig 2D). This highlights the role of the dysfunctional MMR system in the mutation signature of MSI colon cancer. As expected, a subset of mutations cluster to specific loci in the MSI colon cancer exome, similar to the well documented BRAF SNP at chr7:140753336 [19], it seems probable that the most prevalent mutations also play a role in tumorigenesis, (Fig 3A). 13 of the 14 most prevalent mutations are frameshift mutations located in a cMS. Strikingly, the most frequently occurring frameshift mutation is present in 46% of the MSI patients. 123 frameshift mutations occur in more than 10% of the patients (Fig 3B). Notably 82.5% of the deletions that occur in more than 10% of patients are listed in the candidate cancer gene atlas and 42.5% are listed as potential drivers in colon cancer [20] (S1 Table). Interestingly, when comparing the tendency to cluster among DEL, INS and SNPs mutations, clustering (with the exception of the BRAF SNP) is only clearly present for DEL mutations (median percentage of patients with a mutation on exactly the same locus in the genome; DEL = 14.08%, INS = 4.23%, SNP = 4.23%, p < 0.0001) (Fig 3C).

Fig 2. Mutational profile for MSS and MSI in TCGA-COAD.

Fig 2

A) Difference in tumor mutational burden as number of mutations per patient between MSI (median = 2974, n = 71) and MSS patients (median = 218, n = 272, p < 0.0001). B) Mutation types as percentage of mutations between MSI (DEL = 23.15%, INS = 5.06%, SNP = 71.19%, n = 71) and MSS patients (DEL = 1.21%, INS = 1.34%, SNP = 97.4%, n = 218) C) Percentage of frameshift mutations for deletion (Frameshift = 98.69%, In_Frame = 1.31%, n = 53851) and insertions (Frameshift = 99.07%, In_Frame = 0.93%, n = 13172) in MSI patients (n = 71). D) Difference in microsatellite length in coding mononucleotide repeats on location of mutation in the genome between mutation subtypes, DEL (median = 7, n = 4960), INS (median = 7, n = 2384), SNP (median = 3, n = 1859, p < 0.0001) in MSI patients (n = 71). Microsatellite length of 3 mononucleotide repeats or less cannot be considered true microsatellites.

Fig 3. Mutation clustering in MSI TCGA-COAD.

Fig 3

A) Waterfall graph depicting the 15 most frequently occurring mutation hotspots by Gene Identifier in MSI patients. Light grey mutations are SNPs. Light blue mutations are DELs. Other mutation types do not occur within the 15 most frequently occurring mutation hotspot. B) Percentage of most frequently occurring mutation loci. C) Difference in mutation frequency as percentage of patients with mutation on locus between mutation types for the top 100 most frequently occurring mutations, DEL (median = 14.08%, n = 100), INS (median = 4.23%, n = 100), SNP (median = 4.23%, n = 100, p < 0.0001) in MSI patients (n = 71).

Nonsense mediated decay escape

As previously mentioned, NORFS derived from INDELS are potentially subject to NMD. NMD is triggered by the formation of a premature termination codon (PTC), that is prone to occur in out-of-frame NORFS [10]. Such a PTC can make the mutated transcript susceptible to degradation by NMD. One major rule that defines this is the location of the PTC. If such a PTC is located at least 50 base pairs before the last exon-exon junction, the mRNA is considered sensitive to degradation by NMD. If the PTC is located behind this point the transcript is considered NMD resistant [21]. A complete set of rules can be applied to predict whether a mutation leads to a NMD sensitive PTC [11]. In the MSI cancers of the TGCA-COAD this concept holds true, as the median Z-score for mutated transcripts, the difference between mutated mRNA transcript level and the mean transcript level, with a NMD sensitive PTC is significantly lower than that of a mutated transcripts with a NMD resistant PTC (median SENS = -0.16; RES = 0.14; p < 0.0001) (Fig 4A). Nonetheless the NMD effect is limited in an absolute sense and clearly heterogeneous as a large portion of potential sensitive mutations seem to escape NMD based on the Z-score of mRNA transcript levels. However, the fact that a single specific mutation can be mapped to multiple transcripts (isoforms), might limit the sensitivity for detection of transcript specific NMD signals.

Fig 4. Nonsense mediated decay escape in MSI TCGA-COAD.

Fig 4

Approximation of Nonsense Mediated Decay(NMD) on mutated transcripts either predicted as NMD sensitive (SENS) or NMD resistant (RES), based on the location of the premature termination codon (PTC) by NMDdetective A) NMD defined as a z-score of transcript levels for RES (median = 0.14, n = 973) and SENS (median = -0.16, n = 2224, p < 0.0001). B) NMD defined as normalized VAF-ratio: the allele frequency (VAF) as percentage RNA VAF of DNA VAF for RES (median = 100%, n = 685) and SENS (median = 57.9%, n = 1324, p < 0.0001).

To negate this problem, we checked the Variant Allele Frequency (VAF) for each mutated locus in the DNA and RNA. The VAF is the percentage of mutated reads in all reads on that specific locus. We calculated the percentage percentile difference between DNA and RNA by dividing the DNA VAF by the RNA VAF: the VAF-ratio. This VAF ratio is transcript independent and therefore a much more sensitive and accurate measurement of NMD (S1 Fig). On average, for NMD sensitive mutations the variant allele frequency (VAF) in RNA sequencing data is decreased by 42.1% compared to the DNA VAF (median VAF-ratio 57.9 n = 1324), which is significantly more than the NMD resistant mutations (median VAF-ratio 100%, n = 685; p < 0.0001) (Fig 4B). This clearly confirms that NMD is an important factor in the formation of neo-antigens from NORFS in MSI colon cancer. Nonetheless roughly 75% of NMD sensitive NORFS still escape NMD, with a VAF-ratio of 75% or more. This presents a clear opportunity for PFP formation.

NMD escape by public frameshift peptides

Besides the potential NMD impact of PTCs, PTCs also can limit NORF length to a size that prohibits presentation of the full size of the frameshifted region in the context of MHC. Peptides that are presented in MHCI are generally at least 9 amino acids in length [22], but MHCII binding peptides are highly variable in length [23]. In the NORFS identified by us in MSI cancers of the TGCA-COAD, a PTC occurs on average after 14 amino acids (S2 Fig). Therefore, we selected potential PFP by discarding NORFS with a length below 10 amino acids. Hereby, we identified 90 SNORFS occurring in more than 5% of MSI patients. Relying on our own experience that a set of 20 PFPs is feasible to incorporate in a multivalent vaccine [23], we discarded the 70 least frequently occurring SNORFS (Table 1). The remaining 20 frequently occurring SNORFS have a mean NORF length of 28 amino acids, a mean VAF-ratio of 67.93% and a mean mRNA level of 2.30 (Log2 TPM) for the most abundantly available corresponding transcript (isoform). Most importantly, at least one of these 20 SNORFS is present in the exome of 93% of MSI samples in the TGCA-COAD. A limited set of 10 SNORFS, as has been used in a personalized RNA vaccination setting [24], would still cover 74.6% of these patients (S3 Fig). This highlights the potential for non-personalized immunotherapy that targets PFP in MSI colon cancer patients. Notably, 25% of these 20 SNORFS are listed as potential drivers in colon cancer [20].

Table 1. Public Frameshift Proteins (PFP).

POSITION (HG38) GENE MS LENGTH MS BASE TYPE NMD PTC DISTANCE (AMINO ACIDS) DELTAVAF (MEAN) TRANSCRIPT MRNA LEVEL (LOG2 TPM) CUMULATIVE MSI PATIENTS COVERED (N %) cUMULaTIVE lynch pATIENTS COVERED (N, %)
17:58357799 RNF43 7 G Frameshift_Del SENS 39 85.73% 2.37 33 (46.5%)
12:121804751 SETD1B 8 C Frameshift_Del RES 28 51.65% 2.30 36 (50.7%)
2:97083987 FAHD2B 7 C Frameshift_Del RES 23 57.86% 1.25 39 (54.9%)
x:56565440 UBQLN2 7 C Frameshift _Del RES 71 103.06% 3.30 44 (62.0%) 1 (25%)
9:134053406 BRD3 8 C Frameshift _Del RES 84 111.84% 3.09 45 (63.4%)
4:94252758 SMARCAD1 8 A Frameshift _Del SENS 24 77.61% 1.90 47 (66.0%)
20:32202148 PLAGL2 7 C Frameshift _Del RES 32 53.33% 2.64 47 (66.2%)
10:88922388 STAMBPL1 8 A Frameshift _Del RES 20 98.90% 2.44 50 (70.4%)
x:130056035 BCORL1 7 C Frameshift _Del RES 20 129.17% 1.22 52 (73.2%)
1:156745311 HDGF 8 T Frameshift _Del SENS 56 66.77% 6.11 53 (74.6%)
4:2696449 FAM193A 7 A Frameshift _Del SENS 23 57.85% 2.53 54 (76.1%) 1 (25%)
12:49040708 KMT2D 5 C Frameshift _Del SENS 28 96.06% 2.29 58 (81.7%) 3 (75%)
21:14966008 NRIP1 8 A Frameshift _Del RES 22 66.73% 2.22 58 (81.7%) 3 (75%)
22:17818165 MICAL3 7 C Frameshift _Del SENS 104 68.38% 1.72 59 (83.1%)
13:52474898 CKAP2 8 A Frameshift _Del RES 14 66.05% 3.24 63 (88.7%)
14:23276100 HOMEZ 8 T Frameshift _Del RES 12 98.17% 2.11 64 (90.1%)
19:13772153 MRI1 6 C Frameshift _Del RES 24 61.77% 2.15 65 (91.5%) 3 (75%)
16:14252442 MRTFB 7 C Frameshift _Del SENS 42 121.43% 1.94 65 (91.5%)
1:27550357 AHDC1 7 A Frameshift _Del RES 54 52.91% 1.90 66 (93.0%)
2:43225372 ZFP36L2 6 G Frameshift _Del RES 41 67.48% 6.38 66 (93.0%)

Properties of the 20 most frequently occurring mutations with SNORF of at least 10 Amino Acids and DeltaVAF of 50%. MS: Microsatellite. Bold: Cumulative numbers of MSI patient covered by mutation (n = 71). Germline patients n = 4. NMD: Predicted NDM status based on PTC location in transcript by NMDetective. DeltaVAF: Mean RNA-VAF percentage of DNA-VAF for all mutated patients. Expression: log2(TPM + 1) for most abundant corresponding transcript.

PFP set covers germline mismatch repair deficient tumors in TCGA-COAD

Lynch patients carry a heterozygous germline mutation in an MMR gene, resulting in a 50–80% life-time chance to develop colon cancer [25]. MSI colon cancer tumorigenesis in Lynch patients may be different than that of sporadic MSI colon cancer, leading to expression of different PFPs. At the same time, prophylactic immunotherapy targeting PFP may be beneficial for these patients. Therefore we evaluated whether the PFP set may potentially be applicable to patients with a germline MMR mutation. Using normal blood DNA sequencing data, we identified 4 MSI TCGA-COAD patients (5.6%) with a pathogenic germline dMMR mutation. This percentage is in line with earlier reports of germline dMMR mutations in literature [25]. In overlap with sporadic MSI colon cancers, three out of four of these germline mutated dMMR tumors encodes at least 1 PFP from the 20-valent PFP-set. Colon Cancer samples with a germline MMR mutation from two patients only have 1 mutation in one of the identified hotspots and 1 germline mutated MSI sample has 4 hotspot mutations (Table 1). This supports the potential of the PFP-set in a prophylactic vaccination setting for Lynch patients.

PFP set overlaps with majority of MSI stomach and MSI endometrial tumors

As large subsets of endometrial (TCGA-UCEC) and stomach tumors (TCGA-STAD) are also microsatellite unstable, the defined PFP might also be applicable in these patients. Recently it has been reported that frequently occurring INDEL mutations between MSI patients overlap, independent of tissue origin, suggesting a role in tumor evolution [26]. There indeed is a significant overlap between INDELS in the 10 most frequently mutated INDELS in these MSI cancer types (15–25%) and to a lesser extend in the top 100 most frequently mutated INDELS (13–14%) (Fig 5A). Most importantly this overlap does occur in both mutations predicted as NMD sensitive and resistant (Fig 5B), which provides potentially overlapping SNORFS between MSI cancer types. Ultimately the 20 PFP set defined in MSI colon cancer covers 62.2% of MSI endometrial (n = 74) and 58.8% of MSI stomach (n = 34) cancer patients.

Fig 5. INDEL mutation overlap in MSI tumors.

Fig 5

Overlap between INDEL mutations in MSI colon (COAD), MSI endometrial (UCEC) and MSI stomach (STAD) TCGA datasets A) differences in overlap for top 10, 20, 100 most frequent or all INDEL mutations B) overlap in top 100 most frequent mutated INDEL mutations split in NMD SENS and RES predicted mutations by NMDetective.

Discussion

Based on DNA and RNA sequencing data from MSI colon cancer patients in the TGCA we identified a subset of frameshift mutations that cluster to distinct cMS in the genome, produce SNORFS that evade NMD, and have sufficient length for presentation in MHC context. Based on this, we established a PFP-set of 20 frameshift peptides that cover 93% of individual MSI patients. Such a set of PFPs has potential for non-personalized cancer-specific immunotherapy strategies directed against MSI colon cancer, especially in the adjuvant setting. In addition, this set of PFPs also covers three out of four Lynch patients in the TGCA-COAD, which supports potential application in a prophylactic vaccination setting for patients with Lynch syndrome. Clearly, as the sporadic MSI colon cancer mutation signature may differ from that of Lynch patients, target discovery for Lynch patients requires a larger patient cohort than the 4 patients identified in the TCGA-COAD.

We did not predict MHC binding for the selected PFPs. Currently MHC class I binding predictions tools only achieve approximately 40% sensitivity [27]. Although new methods have recently been published [28], especially the prediction of MHC class II binding remains very challenging, whilst MHC class II presented neo-antigens seem to drive anti-tumor immunity [28,29]. Furthermore, the above average length of the PFPs and the fact that they are non-self on any position makes adequate presentation within a naturally polymorphic MHC context more likely. Also, the concept of non-personalized immunotherapy would be compromised by MHC matching. In this context it is noteworthy that the loss of MHC-I on colon cancer cells could potentially limit the effectiveness of immunotherapy that target PFPs in the same way it can limit the effectiveness of checkpoint inhibition [30].

Finally, since this work is based on in silico analysis of sequencing data we could not determine the occurrence of PFPs on a protein level. Nonetheless, based on our extensive filtering for NMD signals in RNA sequencing data we deem it likely that a majority of the selected SNORFS will be translated and lead to presentable PFPs. Even SNORS with a DeltaVAF indicating a limited effect of NMD tend to harbor sufficient mutated mRNA. Also it has been shown that a first so-called pioneering round of translation, can already be adequate for neo-antigen formation [31]. In that regard our higher VAF threshold and strict NMD evasion filtering may have missed relevant PFPs. Furthermore, although we utilized novel NMD prediction algorithms [11], the availability of RNA sequencing data enables an unbiased determination of mutated RNA transcript levels, independent of PTC localization. In order to assess the PFP presentation on a protein level we are currently collecting patients samples in the ATAPEMBRO trial [32], which we aim to screen for specific PFP T-cell immunoreactivity by IFN-gamma Elispot.

In conclusion we identified a set of 20 DEL mutation hotspots in coding DNA microsatellites that give rise to adequately transcribed, NMD-escaping SNORFS and subsequently can produce PFPs. Likely, these SNORFs are involved in colon cancer tumorigenesis. The 20 most common PFPs cover 93% of MSI patients and 3 out of 4 patients that harbor a pathogenic germline MMR mutation in the TCGA-COAD. Furthermore these PFPs cover 62.2% of endometrial and 58.8% of stomach MSI patients. Therefore, these PFPs are potential targets for immunotherapy strategies, not only in in colon, but also in other MSI cancers.

Methods

Data acquisition, storage and computational analysis

All DNA and RNA sequencing data and corresponding patient characteristics, microsatellite panel and MLH1 promotor methylation data was collected from the Cancer Genome Atlas Colon Adenocarcinoma(TCGA-COAD). Uterine Corpus Endometrial Carcinoma (TCGA-UCEC) and Stomach Adenocarcinoma (TCGA-STAD) databases [18]. Access to private data was granted by the National Cancer Institute Data Access Committee (NCIDAC) from the U.S. National Institute of Health trough the database of Genotypes and Phenotypes (dbGaP) portal. According to dbGaP guidelines private data was stored on the pre-approved internal Amsterdam UMC data-infrastructure. Computations on large public TCGA datasets were performed on the SurfSara high performance computing cloud (grant provided by the Dutch government).

Somatic mutation calling from DNA sequencing files

Somatic mutations were called on aligned DNA reads by Mutect [33] and Varscan [34] separately with default quality control parameters. Those calls were independently filtered by empirical Bayesian mutation filtering using EBFilter [35] to reduce the false positive rate. Finally, the overlapping filtered mutations between these two methods for each individual patient were selected as true positive somatic mutations. VAF was calculated as percentage of total DNA reads that contains the mutation. A VAF cutoff of 20% was used.

Microsatellite status and tumor mutational burden

Microsatellite status was retrieved directly from the TCGA-COAD dataset and computationally verified by MSIpred [36]. Tumor mutational burden (TMB) was calculated by dividing the total amount of exonic mutations by the length of the captured exome sequence for the particular sequencing experiment. A limited number of high TMB outliers in the MSS group can be identified (n = 5, 1.28%). These patients all have either one or two mutations in the POLE1 or POLE2 gene, well known for causing tumor hypermutation [37,38].

Nonsense mediated decay escape by RNA and DNA mutated reads comparison (VAF-ratio)

Counts for mutated reads were collected by performing Mpileup [39] on matched RNA and DNA tumor aligned sequencing reads for each of the specific genomic position of the somatic mutations selected prior. Mutations with less than 10 mutated reads in the DNA were discarded, although this number was very limited due to the stringent selection of mutations. No lower limit was placed on the RNA read counts since this would hinder the detection of complete nonsense mediated decay (NMD). NMD is measured by the difference in percentage between the fraction of mutated reads in the DNA compared to the RNA (or ΔVAF), were the fraction of mutated reads in the DNA is set as 100%.

VAFratio(locusxforpatienty)=MutatedReadsRNAx(y)TotalReadsRNAx(y)MutatedReadsDNAx(y)TotalReadsDNAx(y)×100

transcripts that have a VAF-ratio of 50% or higher a considered NMD-escaping. For comparison between sensitive and resistant transcripts, all VAF scores are normalized against the median of all resistant transcripts

ExampleNMDresistant:150200350400×100=0.750.875×100=0.85×100=85%
ExampleNMDsensitive:50200350400×100=0.250.875×100=0.28×100=28%

Transcript mRNA levels and Zscore

Accurate transcript quantification from RNA sequencing data is calculated by RSEM [40] and mapped to Ensembl [41] genome browser transcripts from reference genome GRCh38.p13. Absolute mRNA transcript levels per million (TPM) are log_2 transformed. The z-score is calculated for each individual mutated transcript. The Z-score of the mutated transcript in a specific patient is calculated by subtracting the mean mRNA level from the same (wild-type) transcript in all (MSI) patients from the mRNA level of the mutated transcript in the patient and subsequently dividing it by the standard deviation of this same transcript in all (MSI) patients.

Zscore(mutatedtranscriptxforpatienty)=mRNAx(y)μmRNAxσmRNAx

Due to the internal control, the same transcript in the same type of patients, copy number alteration and gene expression variations do not influence this score. Nonetheless, because of the above factors, Z-scores can only be reliably be interpreted if calculated within a population with a single molecular cancer subtype, such as MSI Colon cancer or MSS Colon cancer. Transcript with a mean mRNA level less or equal than 1.0 are removed from the analysis, since these are considered not expressed in colon cancer tissue.

Microsatellite and premature termination codon calculation

A proprietary Perl script run with GNU parallel utilizes the Ensembl Perl API to collect transcript sequences based on the position, allele, type and length of inputted true positive somatic mutations. Using this sequence, it calculates the number of mono-, di- or multi-nucleotide repeats on the mutation position. Subsequently it constructs the new reading frame after mutation and checks the RNA sequence until the occurrence of a premature termination codon. Finally, it records the position of this PTC in the transcript and relative to the mutational site.

Germline mutation calling from normal tissue DNA sequencing files

Aligned DNA read files were spliced to incorporate only genomic regions of the mismatch repair genes MLH1, MSH2, MSH6, PMS2 and EPCAM. Germline mutation calling on spliced aligned reads was performed using the GATK germline short variant discovery pipeline and using the GATK [42] best practices. The workflow subsequently utilizes Haplotypecaller, CNNScoreVariants and FilterVariantTrances. Mutation calls with QD < 10 and read count < 10 were discarded. Mutational calls were functionally annotated employing Funcotator [42]. Variants of unknown significance were manually scored by a laboratory specialist clinical geneticist on a 1–5 scale according to the American College of Medical Genetics and Genomics (ACMG) guidelines [43], based on mutation characteristics, amino-acid change and taking patient clinical data, somatic mutations and MLH1 promotor methylation data into account.

Downstream analysis

Statistical analysis were performed using R 3.4.4. in Rstudio 1.2.1335 (Rstudio, INC) with the tidyverse package enabled. Difference between means are calculated by a Mann Whitney U test for comparison between 2 groups and a Kruskall Wallis test for comparison between more than 2 groups. Graphs were generated using ggplot2.

Supporting information

S1 Fig. Difference in NMD measurement.

Density map showing difference between Z-score and VAF-ratio approach for the measurement of Nonsense Mediated Decay (NMD) in NMD RES and SENS mutations. Separation of NMD RES and SENS is clearer with VAF-ratio.

(DOCX)

S2 Fig. Premature Termination Codon (PTC) distance.

PTC distance for mutation in MSI_H. Median is shown as dotted line (median = 14 Amino Acids). Dashed line shows 20 Amino Acids.

(DOCX)

S3 Fig. Patient coverage for top 30 frameshift mutations.

Percentage of patients covered by n most frequently occurring frameshift mutations with adequate expression, a SNORF and NMD-escape.

(DOCX)

S1 Table. Candidate cancer genes.

List with genes described in the candidate cancer gene database that are part of the top 100 most occurring mutations in MSI colon cancer. Rank: Relative rank assigned to CIS in study.

(DOCX)

Acknowledgments

We would like to thank Prof. dr. L. Vermeulen for proofreading the manuscript.

Code availability

The proprietary Perl computer code used to calculate PTC position in coding microsatellites is available at https://github.com/ITSpaanderman/PTC_predictor, including run instructions and test samples. GATK pipeline information is available at https://gatk.broadinstitute.org/hc/en-us. RSEM computer code is made available by its author at https://github.com/deweylab/RSEM.

Data Availability

TCGA-data is available trough the TCGA initiative at https://portal.gdc.cancer.gov/. The proprietary Perl computer code used to calculate PTC position in coding microsatellites is available at https://github.com/ITSpaanderman/PTC_predictor, including run instructions and test samples. GATK pipeline information is available at https://gatk.broadinstitute.org/hc/en-us. RSEM computer code is made available by its author at https://github.com/deweylab/RSEM.

Funding Statement

A.D. Bins received the Dutch Cancer Society (KWF) Grant 21923. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.McGranahan N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016). doi: 10.1126/science.aaf1490 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Maletzki C., Schmidt F., Dirks W. G., Schmitt M. & Linnebacher M. Frameshift-derived neoantigens constitute immunotherapeutic targets for patients with microsatellite-instable haematological malignancies: frameshift peptides for treating MSI+ blood cancers. Eur. J. Cancer Oxf. Engl. 1990 49, 2587–2595 (2013). doi: 10.1016/j.ejca.2013.02.035 [DOI] [PubMed] [Google Scholar]
  • 3.Brown S. D. et al. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Genome Res. 24, 743–750 (2014). doi: 10.1101/gr.165985.113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Le D. T. et al. PD-1 Blockade in Tumors with Mismatch-Repair Deficiency. N. Engl. J. Med. 372, 2509–2520 (2015). doi: 10.1056/NEJMoa1500596 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Turajlic S. et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 18, 1009–1021 (2017). doi: 10.1016/S1470-2045(17)30516-8 [DOI] [PubMed] [Google Scholar]
  • 6.Mandal R. et al. Genetic diversity of tumors with mismatch repair deficiency influences anti–PD-1 immunotherapy response. Science 364, 485–491 (2019). doi: 10.1126/science.aau0447 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Rooij N. van et al. Tumor Exome Analysis Reveals Neoantigen-Specific T-Cell Reactivity in an Ipilimumab-Responsive Melanoma. J. Clin. Oncol. 31, e439–e442 (2013). doi: 10.1200/JCO.2012.47.7521 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Gubin M. M. et al. Checkpoint blockade cancer immunotherapy targets tumour-specific mutant antigens. Nature 515, 577–581 (2014). doi: 10.1038/nature13988 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Litchfield K. et al. Escape from nonsense mediated decay associates with anti-tumor immunogenicity. bioRxiv 823716 (2019) doi: 10.1101/823716 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Kervestin S. & Jacobson A. NMD: a multifaceted response to premature translational termination. Nat. Rev. Mol. Cell Biol. 13, 700–712 (2012). doi: 10.1038/nrm3454 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Lindeboom R. G. H., Vermeulen M., Lehner B. & Supek F. The impact of nonsense-mediated mRNA decay on genetic disease, gene editing and cancer immunotherapy. Nat. Genet. 51, 1645–1651 (2019). doi: 10.1038/s41588-019-0517-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Pastor F., Kolonias D., Giangrande P. H. & Gilboa E. Induction of tumour immunity by targeted inhibition of nonsense-mediated mRNA decay. Nature 465, (2010). doi: 10.1038/nature08999 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Lin E. I. et al. Mutational profiling of colorectal cancers with microsatellite instability. Oncotarget 6, 42334 (2015). doi: 10.18632/oncotarget.5997 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Boyer J. C. et al. Sequence dependent instability of mononucleotide microsatellites in cultured mismatch repair proficient and deficient mammalian cells. Hum. Mol. Genet. 11, 707–713 (2002). doi: 10.1093/hmg/11.6.707 [DOI] [PubMed] [Google Scholar]
  • 15.Jiricny J. The multifaceted mismatch-repair system. Nat. Rev. Mol. Cell Biol. 7, 335–346 (2006). doi: 10.1038/nrm1907 [DOI] [PubMed] [Google Scholar]
  • 16.Li G.-M. Mechanisms and functions of DNA mismatch repair. Cell Res. 18, 85–98 (2008). doi: 10.1038/cr.2007.115 [DOI] [PubMed] [Google Scholar]
  • 17.Guinney J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015). doi: 10.1038/nm.3967 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.National Institute of Health. The Cancer Genome Atlas. The Cancer Genome Atlas, National Institute of Health, Bethesda, U.S. http://cancergenome.nih.gov/.
  • 19.Wu M. et al. MSI status is associated with distinct clinicopathological features in BRAF mutation colorectal cancer: A systematic review and meta-analysis. Pathol.—Res. Pract. 216, 152791 (2020). [DOI] [PubMed] [Google Scholar]
  • 20.Abbott K. L. et al. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice. Nucleic Acids Res. 43, D844–D848 (2015). doi: 10.1093/nar/gku770 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lindeboom R. G. H., Supek F. & Lehner B. The rules and impact of nonsense-mediated mRNA decay in human cancers. Nat. Genet. 48, 1112–1118 (2016). doi: 10.1038/ng.3664 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Trolle T. et al. The length distribution of class I restricted T cell epitopes is determined by both peptide supply and MHC allele specific binding preference. J. Immunol. Baltim. Md 1950 196, 1480–1487 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.van den Berg J. H., Oosterhuis K., Schumacher T. N. M., Haanen J. B. A. G. & Bins A. D. Intradermal vaccination by DNA tattooing. Methods Mol. Biol. Clifton NJ 1143, 131–140 (2014). doi: 10.1007/978-1-4939-0410-5_9 [DOI] [PubMed] [Google Scholar]
  • 24.Sahin U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017). doi: 10.1038/nature23003 [DOI] [PubMed] [Google Scholar]
  • 25.Jasperson K. W., Tuohy T. M., Neklason D. W. & Burt R. W. Hereditary and Familial Colon Cancer. Gastroenterology 138, 2044–2058 (2010). doi: 10.1053/j.gastro.2010.01.054 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Ballhausen A. et al. The shared frameshift mutation landscape of microsatellite-unstable cancers suggests immunoediting during tumor evolution. Nat. Commun. 11, 4740 (2020). doi: 10.1038/s41467-020-18514-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Bonsack M. et al. Performance evaluation of MHC class-I binding prediction tools based on an experimentally validated MHC-peptide binding dataset. Cancer Immunol. Res. (2019) doi: 10.1158/2326-6066.CIR-18-0584 [DOI] [PubMed] [Google Scholar]
  • 28.Alspach E. et al. MHC-II neoantigens shape tumour immunity and response to immunotherapy. Nature 574, 696–701 (2019). doi: 10.1038/s41586-019-1671-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kreiter S. et al. Mutant MHC class II epitopes drive therapeutic immune responses to cancer. Nature 520, 692–696 (2015). doi: 10.1038/nature14426 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Garrido F., Aptsiauri N., Doorduijn E. M., Garcia Lora A. M. & van Hall T. The urgent need to recover MHC class I in cancers for effective immunotherapy. Curr. Opin. Immunol. 39, 44–51 (2016). doi: 10.1016/j.coi.2015.12.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Apcher S., Daskalogianni C. & Fåhraeus R. Pioneer translation products as an alternative source for MHC-I antigenic peptides. Mol. Immunol. 68, 68–71 (2015). doi: 10.1016/j.molimm.2015.04.019 [DOI] [PubMed] [Google Scholar]
  • 32.Pembrolizumab With Ataluren in Patients With Metastatic pMMR and dMMR Colorectal Carcinoma or Metastatic dMMR Endometrial Carcinoma: the ATAPEMBRO Study—Full Text View—ClinicalTrials.gov. https://clinicaltrials.gov/ct2/show/NCT04014530.
  • 33.Cibulskis K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013). doi: 10.1038/nbt.2514 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Koboldt D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinforma. Oxf. Engl. 25, 2283–2285 (2009). doi: 10.1093/bioinformatics/btp373 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Shiraishi Y. et al. An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data. Nucleic Acids Res. 41, e89–e89 (2013). doi: 10.1093/nar/gkt126 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Wang C. & Liang C. MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine. Sci. Rep. 8, 17546 (2018). doi: 10.1038/s41598-018-35682-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Mur P. et al. Role of POLE and POLD1 in familial cancer. Genet. Med. 1–12 (2020) doi: 10.1038/s41436-020-0922-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Campbell B. B. et al. Comprehensive Analysis of Hypermutation in Human Cancer. Cell 171, 1042–1056.e10 (2017). doi: 10.1016/j.cell.2017.09.048 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011). doi: 10.1093/bioinformatics/btr509 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Li B. & Dewey C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011). doi: 10.1186/1471-2105-12-323 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Zerbino D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018). doi: 10.1093/nar/gkx1098 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.McKenna A. et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). doi: 10.1101/gr.107524.110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Richards S. et al. Standards and Guidelines for the Interpretation of Sequence Variants: A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. Off. J. Am. Coll. Med. Genet. 17, 405–424 (2015). doi: 10.1038/gim.2015.30 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Jianjiong Gao

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

28 Jul 2020

PONE-D-20-19308

Framing the potential of public frameshift peptides as immunotherapy targets in colon cancer

PLOS ONE

Dear Dr. Bins,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 11 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Jianjiong Gao

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please describe in more detail what categories of private data you collected and how another researcher would obtain the data.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This manuscript by Bins et al. highlights the potential of using SNORFS as a non-personalized immunotherapy in MSI colon cancer patients, as 10 SNORFS would still cover >=75% of the patients. It uses a large TCGA database with a reasonable number of MSI tumors (76 out of 461) colon cancers, although I found it a bit unusual that they did not consider other MSI cancer types in TCGA (point #1 below). They applied stringent and well-reasoned filters to the data (see however concern #2). However it seems that not all known NMD rules are represented therefore the NMD efficiency is not estimated as accurately as it could be, and this might be improved to bring the method to the state-of-the-art (see #3). One weakness of the study -- which I do not necessarily expect them to address, but they should at least discuss -- is that there was no replication/validation in an independent dataset. Overall their conclusions seem well supported with data and well presented. The very high recurrence of some deletions in MSI colon cancer is indeed potentially a source of near-universal neoantigens that is worth reporting.

Major remarks:

1. In addition to colon cancers, also the uterine and stomach cancers in TCGA are commonly MSI. The same analysis could be used to catalogue public frameshift peptides in those cancer types as well, with minimal alterations to their pipeline. It is a missed opportunity that this was not done, particularly so because it is of interest to compare the neopeptide repertoire across tissues (does it vary more than across individuals?)

2. Line 73: "This transcript independent method is a much more sensitive and accurate measurement of NMD" (I suppose they mean than their Z-score method: please make that clear and also explain Z-score better in Methods). Their Z-score does not appear control for copy number altered segments, nor for the confounding of gene expression by molecular subtype, even though these steps were perfomed in previous work (eg. PMID: 27618451 that they cite). Either refine the Z-score calculation, or omit it completely from the paper stating you rely on the VAF-ratio method only.

2b. Related to the above: According to VAF-ratio definition, higher values indicate that PTC-containing RNA is more degraded (by NMD). How do they explain that VAF ratio for NMD-sensitive transcripts is smaller (34.3%) than NMD-resistant transcripts (59.9%)? This may need clarifying. It was also not clear to me how NMD-resistant transcripts are downregulated by 40%, however the difference between sensitive and resistant transcripts downregulation is quite small (65-40%). I am a bit worried that they might not be classifying NMD-resistant and sensitive transcripts properly (perhaps because of the #3 below).

3. As I understood, they only consider one NMD rule to assess if a PTC will or not trigger NMD (line 61): last exon + 50nt rule, which accounts for only a part of the known of the NMD-escaping variance (PMID: 27618451). It is a bit puzzling why they ignored the other rules because they have referenced this paper. That the other NMD-escape rules might have an effect is consistent with what they write (line 66): "Nonetheless the NMD effect is limited in an absolute sense and clearly heterogeneous as a large portion of potential sensitive mutations seem to escape NMD based on the Z-score of mRNA transcript levels.". One way to address this is to use predictions from previous NMD prediction methods (eg PMID: 31659324), which also consider these other rules.

Minor.

- Line 139: "Furthermore the availability of RNA sequencing data also negated the need for novel NMD prediction algorithms." Please clarify.

- Table 1: CUMALITIVE MSI PATIENTS COVERED / CUMALITIVE LYNCH PATIENTS COVERED (N, %) > CUMULATIVE

- Line 12: "presence of NMD escaping SNORFS has been shown to be a strong predictor of immunotherapy response and survival in multiple cancer types" Consider citing PMID: 20463739 and/or PMID: 31659324

- Line 13: Better explain difference between NORFS and SNORFS. Which is the nt/aa length boundary?

- Line 39: Statistics reported for Lynch and sporadic cases together? Clarify.

- Line 50: "46% of the patients" Did they mean "46% of the MSI patients?"

- Line 55: Appears a bit contradictory with sentence at line 46-47. Basically, in line 46 one gets the idea that all 3 types of mutation cluster to specific loci, but then it is said that INS and SNP hardly cluster. Consider rephrasing.

- Line 83: When they describe peptide size-limitation for loading into the MHC complex, they should mention some references. They claim that 10 aas is not enough, but 14 might be (as I understood the text), there should be a source cited for this.

- Line 87: "Assuming that a set of 20 PFPs is feasible to incorporate in a multivalent vaccine". What is this assumption based on? There is no reference.

- Line 92: SNOFRS > SNORFS

- Line 119: Cleary > Clearly

- Line 136: Alsco > Also,

- Line 140: FPF > PFP

- Line 162: Clarify that this is about DNA and not RNA (it is stated in the title but not in the text)

- Line 178: 2log > log_2

- Line 191: typo "incorporated"

Reviewer #2: Spaanderman et al present an analysis of public frameshifted peptides, in microsatellite instable colorectal cancer. The work is timely in nature, given the broad scientific and clinical interest in novel anti-cancer immunotherapeutics. Spaanderman and colleagues take a novel approach to this challenge, focusing on public neoantigens, recurrently found in a high proportion of patients. This avoids the costs and excessive effort of personalized vaccines, which are private to each patient. In this report the authors provide a concise but highly robust analysis of the most promising frameshift peptide targets in CRC. They show a cocktail vaccine of ~20 peptides could cover up to ~90% of MSI CRC patients, raising the prospect of off-the-shelf therapeutics or even prophylactic utility in Lynch syndrome cases. A good level of technical validation is presented in the paper, with targets assessed for mutation frequency in both sporadic & hereditary CRC, as well as mRNA expression levels, neo open reading frame length and susceptibility to NMD degradation. Minor corrections are suggested below, however this manuscript is of broad interested and should be supported for timely publication.

Minor comments:

1. The introduction lacks important detail on the immunogenicity data supporting MSI peptides as promising therapeutic targets, e.g. capability to elicit T cell response, e.g. https://pubmed.ncbi.nlm.nih.gov/23561850/. In addition, clinical data from MSI checkpoint inhibitor trials should be cited, where immunogenicity was demonstrated to be driven strongly by frameshift events, e.g. https://science.sciencemag.org/content/357/6349/409. Lastly, recent pre-clinical data from MSI models, which again show enriched immunogenicity (immune editing) directed towards indels should be referenced, e.g. https://science.sciencemag.org/content/364/6439/485

2. In figure 3b could the frequency of the top 15 mutations also be reported for other major MSI histologies (e.g. endometrium (UCEC), stomach (STAD))? MSI cases from these histologies are reported in sup data 1 of: https://www.nature.com/articles/ncomms15180#Sec22. If the % of MSI STAD and UCEC cases covered can be added this will help broaden the relevance of this work to multiple cancer types.

3. Could the authors comment on the extreme outliers in the MSS group in Fig1 (TMB ~15000-20000) – are these potentially misclassified as MSS by TCGA? A comment to explain this should be added to the methods.

4. Figure 3C is difficult to interpret – from Line55 – how is “similar specific mutation” defined? Within a certain +/- bp distance? This definition should be given in the main text for clarity purposes.

5. Regarding statistical tests, given TMB and other measures in the manuscript are likely not normally distributed, they should consider replacing t-tests with non-parametric tests (e.g. Mann Whitney U test). Similarly Kruskal-wallis test should be considered for Fig.3C.

6. Line24 typo – should be “assess”

7. Panel Fig. 2C – percentages on the y-axis are presented to two decimal places, which is a bit unnecessary, please reduce to 1 or no decimal places.

8. Figure 3 – gene ACVR2A appears to be offset versus the others, can this be corrected.

9. Line42-45 – This sentence is not clear and could be split into two, and also made the numerical data on line45 (2d) refers to microsatellite length.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 28;16(6):e0251630. doi: 10.1371/journal.pone.0251630.r002

Author response to Decision Letter 0


24 Jan 2021

Response to reviewer questions

Major remarks

Where applicable questions have been bundled. Reviewer #1 is I and #2 II

1. Analysis in STAD and UCEC MSI

I. In addition to colon cancers, also the uterine and stomach cancers in TCGA are commonly MSI. The same analysis could be used to catalogue public frameshift peptides in those cancer types as well, with minimal alterations to their pipeline. It is a missed opportunity that this was not done, particularly so because it is of interest to compare the neopeptide repertoire across tissues (does it vary more than across individuals?)

II. In figure 3b could the frequency of the top 15 mutations also be reported for other major MSI histologies (e.g. endometrium (UCEC), stomach (STAD))? MSI cases from these histologies are reported in sup data 1 of: https://www.nature.com/articles/ncomms15180#Sec22. If the % of MSI STAD and UCEC cases covered can be added this will help broaden the relevance of this work to multiple cancer types.

- This is a very important suggestion from both reviewers and therefore we would sincerely like to thank both reviewers for this question. It has led to an important and insightfull new analysis. In order to accommodate this analysis we added a new subsectionat the end of the results section and created a new figure (F5). This analysis shows that there is definite overlap between INDELS MSI cancer types, but most important the PFP subset also covers 62.2% of endometrial MSI and 58.8% of stomach MSI patients (from TCGA-COAD and TCGA-STAD). These new findings implicate that frameshift peptide vaccination could be a potential treatment in all MSI cancer types.

2. Clarification of Zscore

I. Line 73: "This transcript independent method is a much more sensitive and accurate measurement of NMD" (I suppose they mean than their Z-score method: please make that clear and also explain Z-score better in Methods). Their Z-score does not appear control for copy number altered segments, nor for the confounding of gene expression by molecular subtype, even though these steps were perfomed in previous work (eg. PMID: 27618451 that they cite). Either refine the Z-score calculation, or omit it completely from the paper stating you rely on the VAF-ratio method only.

Related to the above: According to VAF-ratio definition, higher values indicate that PTC-containing RNA is more degraded (by NMD). How do they explain that VAF ratio for NMD-sensitive transcripts is smaller (34.3%) than NMD-resistant transcripts (59.9%)? This may need clarifying. It was also not clear to me how NMD-resistant transcripts are downregulated by 40%, however the difference between sensitive and resistant transcripts downregulation is quite small (65-40%). I am a bit worried that they might not be classifying NMD-resistant and sensitive transcripts properly (perhaps because of the #3 below).

- We thank reviewer #1 for pointing out that our explanation of the Zscore and VAF-ratio concepts are insufficient in the manuscript. The transcript independent method is actually the VAF-ratio, where a lower VAF ratio indicates NMD. We have edited both the main text, the legend of figure 3 and the methods section to better reflect the above. As stated below the NMD classification has been replaced by NMDetective algoritm by Lindenboom et al. (PMID: 31659324) Considering the downregulation of NMD resistant mutation this is also a good questions, and we are thankfull that the reviewer made us realize that the explanation for this fact does not become very clear. Most likely this downregulation is due to the effect of the VAF-ratio method, that relies on the difference between the DNA and RNA VAF. The fact that DNA contains two alleles, whereas just one is copied to the RNA it becomes expected that even in the situation of NMD resistant genes the mean VAF-ratio is approximately 50%. We have added a clarification of this in the methods section.

- As the Zscore is calculated by determining the mRNA transcript level of an individual mutated transcript in an individual Patient which is then related to the mRNA levels in the same transcript in the rest of the patient group we feel that there is an adequate internal control that negates the influence of copy number alteration and gene expression variations.

3. NMD SENS or RES classification

I. As I understood, they only consider one NMD rule to assess if a PTC will or not trigger NMD (line 61): last exon + 50nt rule, which accounts for only a part of the known of the NMD-escaping variance (PMID: 27618451). It is a bit puzzling why they ignored the other rules because they have referenced this paper. That the other NMD-escape rules might have an effect is consistent with what they write (line 66): "Nonetheless the NMD effect is limited in an absolute sense and clearly heterogeneous as a large portion of potential sensitive mutations seem to escape NMD based on the Z-score of mRNA transcript levels.". One way to address this is to use predictions from previous NMD prediction methods (eg PMID: 31659324), which also consider these other rules.

- Indeed this is a very valid point. The NMD prediction has been reclassified in the manuscript using the NMD detective algorthm by lindeboom et al. (PMID: 31659324). This has not altered the conclusions in this manuscript. Even after applying these complete NMD rules, the NMD effect remains limited and heterogeneous on the RNA level even in the NMD sensitive transcripts.

Minor remarks

Reviewer #1

1. Line 139: "Furthermore the availability of RNA sequencing data also negated the need for novel NMD prediction algorithms." Please clarify.

- As suggested we implemented the relatively novel NMD algorithm by Lindeboom et al. in order to determine more precisely if a Frameshift mutation should be considered NMD resistant or Sensitive. Nonetheless the availability of RNA data for these same patients makes it possible to determine mRNA transcript levels from each mutated transcripts and the RNA variant allele frequency for each mutated transcript. These values are a direct and thereby more accurate representation of the degradation and remaining quantity of mutated RNA. This in the end provides the best insight in potential for neo-epitope formation and does not necessarily rely on NMD prediction algorithms.

2. Table 1: CUMALITIVE MSI PATIENTS COVERED / CUMALITIVE LYNCH PATIENTS COVERED (N, %) > CUMULATIVE

- Corrected

3. Line 12: "presence of NMD escaping SNORFS has been shown to be a strong predictor of immunotherapy response and survival in multiple cancer types" Consider citing PMID: 20463739 and/or PMID: 31659324

- Thank you for suppling the additional references. An additional line briefly stating this has been added to the introduction. Both references are included

4. Line 13: Better explain difference between NORFS and SNORFS. Which is the nt/aa length boundary?

- Different thresholds for SNORFS have been defined. The paper cited defines SNORFS as NORFS with a minimal length of 50 amino acids. Regarding the SNORFS defined in our manuscript we choose a cutoff of at least 10 amino acids (as stated in Results). This lower cutoff is based on the minimal out of frame length that is required for full out of frame presentation in the context of MHC.

5. Line 39: Statistics reported for Lynch and sporadic cases together? Clarify.

- The statistics represent all MSI patients, including Lynch patients. This has been clarified in the manuscript.

6. Line 50: "46% of the patients" Did they mean "46% of the MSI patients?"

- This is correct and corrected in the manuscript.

7. Line 55: Appears a bit contradictory with sentence at line 46-47. Basically, in line 46 one gets the idea that all 3 types of mutation cluster to specific loci, but then it is said that INS and SNP hardly cluster. Consider rephrasing.

- Corrected. With exception for the well documented BRAF SNP only frameshift deletions tent to cluster.

8. Line 83: When they describe peptide size-limitation for loading into the MHC complex, they should mention some references. They claim that 10 aas is not enough, but 14 might be (as I understood the text), there should be a source cited for this.

- More context is added in section NMD escape by public frameshift citing two articles for both HMCI as MCHII binding peptide length.

9. Line 87: "Assuming that a set of 20 PFPs is feasible to incorporate in a multivalent vaccine". What is this assumption based on? There is no reference.

- This is based on our own experience with DNA vaccination techniques. Refrence to book chapter on DNA vaccination has been added

10. Line 92: SNOFRS > SNORFS

- Corrected

11. Line 119: Cleary > Clearly

- Corrected

12. Line 136: Alsco > Also,

- Corrected

13. Line 140: FPF > PFP

- Corrected

14. Line 162: Clarify that this is about DNA and not RNA (it is stated in the title but not in the text)

- Corrected

15. Line 178: 2log > log_2

- Corrected

16. Line 191: typo "incorporated"

- Corrected

Reviewer #2

1. The introduction lacks important detail on the immunogenicity data supporting MSI peptides as promising therapeutic targets, e.g. capability to elicit T cell response, e.g. https://pubmed.ncbi.nlm.nih.gov/23561850/. In addition, clinical data from MSI checkpoint inhibitor trials should be cited, where immunogenicity was demonstrated to be driven strongly by frameshift events, e.g. https://science.sciencemag.org/content/357/6349/409. Lastly, recent pre-clinical data from MSI models, which again show enriched immunogenicity (immune editing) directed towards indels should be referenced, e.g. https://science.sciencemag.org/content/364/6439/485

- Thank you for your input on this very relevant prior work. Some of the work was already referenced in the manuscript, non-referenced work has been implemented and referenced.https://pubmed.ncbi.nlm.nih.gov/23561850/ is already referenced in the introduction (ref 2)

- https://science.sciencemag.org/content/357/6349/409 is already referenced in the introduction (ref 4)

- https://science.sciencemag.org/content/364/6439/485 has been added as a reference, confirming the importance of mismatch-repair in anti-PD1 response.

2. Could the authors comment on the extreme outliers in the MSS group in Fig1 (TMB ~15000-20000) – are these potentially misclassified as MSS by TCGA? A comment to explain this should be added to the methods.

- This is a very interesting question, and we believe that indeed it would be something that might be unclear to many readers. These outliers most likely represent a specific group of MSS patients with a mutation in polη, that has been described in literature to lead to high TMB, but no increased indel formation. A new section regarding MSI/MSS, TMB and polη has been added to the Methods.

3. Figure 3C is difficult to interpret – from Line55 – how is “similar specific mutation” defined? Within a certain +/- bp distance? This definition should be given in the main text for clarity purposes.

- Thank you for pointing out that this is not clear. Itrefers to mutations that occur on the exact same location in the genome andthis clarification has been added to the manuscript.

4. Regarding statistical tests, given TMB and other measures in the manuscript are likely not normally distributed, they should consider replacing t-tests with non-parametric tests (e.g. Mann Whitney U test). Similarly Kruskal-wallis test should be considered for Fig.3C.

- Thank you for this valid point. We took your suggestion and recalculated the statistics if needed using a Mann-Whitney U test when comparing two groups and a Kruskall Wallis test when comparing multiple groups. This did not alter the conclusions in our manuscript

5. Line24 typo – should be “assess”

- corrected

6. Panel Fig. 2C – percentages on the y-axis are presented to two decimal places, which is a bit unnecessary, please reduce to 1 or no decimal places.

- corrected

7. Figure 3 – gene ACVR2A appears to be offset versus the others, can this be corrected.

- corrected

8. Line42-45 – This sentence is not clear and could be split into two, and also made the numerical data on line45 (2d) refers to microsatellite length.

- Sentence has been split and altered to clarify the message. Text has been added before numerical data to indicate that this refers to the microsatellite length on the mutation locus.

Decision Letter 1

Jianjiong Gao

10 Feb 2021

PONE-D-20-19308R1

Framing the potential of public frameshift peptides as immunotherapy targets in colon cancer

PLOS ONE

Dear Dr. Bins,

Thank you for submitting your manuscript to PLOS ONE. The reviewers were satisfied with the revision. However, reviewer 1 has requested several minor revisions to the text to clarify certain aspects of their work. Therefore, we invite you to submit a revised version of the manuscript that addresses them.

Please submit your revised manuscript by Mar 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Jianjiong Gao

Academic Editor

PLOS ONE

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have revised the manuscript and have largely adressed reviewer concerns. I would request some minor revisions to the text to clarify certain aspects of their work to the reader.

1- 2.Clarification of Zscore

"As the Zscore is calculated by determining the mRNA transcript level of an individual mutated transcript in an individual Patient which is then related to the mRNA levels in the same transcript in the rest of the patient group we feel that there is an adequate internal control that negates the influence of copy number alteration and gene expression variations."

They should consider discussing the caveats of this approach better in the text. Probably as they say this would work fine for the copy number alterations, as the patients are all MSI and from the same cancer types. But it might be less appropriate with respect to the gene expression variation between individuals, for instance if the patients are from distinct gene expression clusters/subtypes within the cancer (I do appreciate however that MSI is typically considered a single subtype within a given cancer type). Moreover, the results are compared between three cancer types, and the global tissue-specific gene expression variation is not accounted for.

2- 2.Could the authors comment on the outliers in the MSS group in Fig 1 (TMB ~15000-20000) – are these potentially misclassified as MSS by TCGA? A comment to explain this should be added to the methods.

" -This is a very interesting question, and we believe that indeed it would be something that might be unclear to many readers. These outliers most likely represent a specific group of MSS patients with a mutation in polη, that has been described in literature to lead to high TMB, but no increased indel formation. A new section regarding MSI/MSS, TMB and polη has been added to the Methods."

Note that they are probably referring to POLE (epsilon) not polη (eta) -- two very different things.

They may consider citing https://www.sciencedirect.com/science/article/pii/S009286741731142X or https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5849393/

or any other paper that previously reported extreme mutation burden in polymerase epsilon mutated tumors.

In addition, instead of qualifying them as "most likely" bearing a mutation in POLE, they should check if these tumors in fact bear one of the causal mutations.

3- Line 92 "we discarded the 70 least frequently occurring SNORFS (table 1)." and

Line 150: "In order to assess the PFP presentation on a protein level we are currently collecting patients samples in the ATAPEMBRO trial, which we aim to screen for specific PFP T-cell immunoreactivity by IFN-gamma Elispot."

With respect to using those SNORFS as a potential vaccine - would it make sense to consider substituting some of the NMD-SENS predicted from the top 20 SNORFS, by other NMD-RES or with a higher stringent deltaVAF threshold, even though they are less frequent? Of course, the NMD prediction is not always accurate, but it is improved with the VAF ratio. For instance, take the HDGF gene: it has a log2(TPM) of 6.11 (very highly expressed) but the deltaVAF is 66% which is not much different from 50% and is slightly less than the mean (67.93%), so it may not really be an NMD-escaping gene. Please discuss this and/or other similar examples.

4- As a consideration for future work and/or something to mention in the Discussion, they may consider checking if these SNORFS have NMD-inducing features of NMD endogenous targets (~5-20% of genes), some of which are different from the PTC position rules: 5' upstream ORF, intron in 3' UTR, long 3' UTR etc.

5- There may be issues with clarity of descriptions of the VAFratio method. In methods, it would help if they could show the formula used for VAFratio calculation along with some examples (one NMD-sensitive and one NMD-resistant). It probably does not make much sense to show the z-score formula (which they finally don't seem to use) while not showing the VAFratio formula. From the current description, one could infer that the VAF-ratio is higher for NMD-sensitive transcripts than for NMD-resistant transcripts, but they state: "The transcript independent method is actually the VAF-ratio, where a lower VAF ratio indicates NMD", which seems to be suggesting the opposite. Please take care to clarify this.

Also, "the RNA VAF only relies on the single allele that has been translated," and in the revisions-paragraph: "The fact that DNA contains two alleles, whereas just one is copied to the RNA" may be misleading, since both alleles are likely transcribed (while one may be degraded and the other no).

~end~

Reviewer #2: The authors have addressed all the comments of this reviewer, and have added additional figures, explanations, and corrections of minor points.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 28;16(6):e0251630. doi: 10.1371/journal.pone.0251630.r004

Author response to Decision Letter 1


28 Apr 2021

Response to reviewer questions

Final remarks

Reviewer #2 had no more comments. All comments below are concerning reviewer 1.

1. Clarification of Zscore

I. They should consider discussing the caveats of this approach better in the text. Probably as they say this would work fine for the copy number alterations, as the patients are all MSI and from the same cancer types. But it might be less appropriate with respect to the gene expression variation between individuals, for instance if the patients are from distinct gene expression clusters/subtypes within the cancer (I do appreciate however that MSI is typically considered a single subtype within a given cancer type). Moreover, the results are compared between three cancer types, and the global tissue-specific gene expression variation is not accounted for.

- First off al we thank reviewer 1 for this question and have tried to clarify the use of the Zscore within the text, stating that as suggested by the reviewer, this method is only suitable if used within a single population with the same molecular cancer subtype, as we did in our analysis, to avoid problems with copy number variation and gene expression variation.

2. POLE MSS patients

I. Could the authors comment on the outliers in the MSS group in Fig 1 (TMB ~15000-20000) – are these potentially misclassified as MSS by TCGA? A comment to explain this should be added to the methods.We thank reviewer #1 for pointing out that our explanation of the Zscore and VAF-ratio concepts are insufficient in the manuscript. The transcript independent method is actually the VAF-ratio, where a lower VAF ratio indicates NMD. We have edited both the main text, the legend of figure 3 and the methods section to better reflect the above. As stated below the NMD classification has been replaced by NMDetective algoritm by Lindenboom et al. (PMID: 31659324) Considering the downregulation of NMD resistant mutation this is also a good questions, and we are thankfull that the reviewer made us realize that the explanation for this fact does not become very clear. Most likely this downregulation is due to the effect of the VAF-ratio method, that relies on the difference between the DNA and RNA VAF. The fact that DNA contains two alleles, whereas just one is copied to the RNA it becomes expected that even in the situation of NMD resistant genes the mean VAF-ratio is approximately 50%. We have added a clarification of this in the methods section.

- We agree that with reviewer 1 that merely hinting at POLE for these patients is not sufficient. Therefore we have looked in detail at the mutational profile of these patients and concluded that these are indeed POLE patients. We have stated the exact muations in the text.

3. SNORFS as vaccine

I. With respect to using those SNORFS as a potential vaccine - would it make sense to consider substituting some of the NMD-SENS predicted from the top 20 SNORFS, by other NMD-RES or with a higher stringent deltaVAF threshold, even though they are less frequent?

2

- We see reviewer 1 has a valid point regarding the selection of the the SNORFS, but we believe that selecting on higher deltaVAF thresholds only makes sense for vaccination in a personalized vaccination strategy. In our aim for a single vaccination for a single cancer subgroup we aim to select more frequently occurring mutations with a sufficient deltaVAF over less frequently occurring mutations with a higher deltaVAF. We have tried to clarify this in the discussion.

4. NMD-inducing features of SNORFS

I. As a consideration for future work and/or something to mention in the Discussion, they may consider checking if these SNORFS have NMD-inducing features of NMD endogenous targets (~5-20% of genes), some of which are different from the PTC position rules: 5' upstream ORF, intron in 3' UTR, long 3' UTR etc.

- We thank reviewer 1 for this suggestion. This seems to be a very interesting avenue of further research and we are considering implementing it in our new projects. Nonetheless we feel that it might further complicate our current manuscript and therefore decided not to include it.

5. VAFratio calculation

I. There may be issues with clarity of descriptions of the VAFratio method. In methods, it would help if they could show the formula used for VAFratio calculation along with some examples (one NMD-sensitive and one NMD-resistant). It probably does not make much sense to show the z-score formula (which they finally don't seem to use) while not showing the VAFratio formula. From the current description, one could infer that the VAF-ratio is higher for NMD-sensitive transcripts than for NMD-resistant transcripts, but they state: "The transcript independent method is actually the VAF-ratio, where a lower VAF ratio indicates NMD", which seems to be suggesting the opposite. Please take care to clarify this. Also, "the RNA VAF only relies on the single allele that has been translated," and in the revisions-paragraph: "The fact that DNA contains two alleles, whereas just one is copied to the RNA" may be misleading, since both alleles are likely transcribed (while one may be degraded and the other no).We thank reviewer 1 for this suggestion. This seems to be a very interesting avenue of further research and we are considering implementing it in our new projects. Nonetheless we feel that it might further complicate our current manuscript and therefore decided not to include it.

- Thanks to reviewer 1 we have realized that the VAFratio can be cause for some confusion when reading our manuscript. We have tried to clarify our use by two measures. First off all, as suggested by reviewer 1, we have added the VAFratio formula to the methods section of the manuscript, including examples of NMD escaping and non-escaping transcripts. Furthermore we have altered figure 3b in order to not create confusion about the signs of NMD in the NMD resistant transcripts, by normalizing the data to the NMD resistant mutations. The prior decrease in NMD resistant mutations is probably caused by many factors including the inaccuracy of the NMD prediction algorithm that is based on mRNA transcript level information instead of mutation based mRNA information and therefore is

prone to missing exon skipping for example. We thinks this approach clarifies are underlying message that NMD sensitive transcripts are degraded more often, but that there is a substantial portion of mutated transcripts that does not get degraded by NMD and therefore can yield neo-epitopes.

Attachment

Submitted filename: Plos One Reviewer response rebuttal.pdf

Decision Letter 2

Jianjiong Gao

30 Apr 2021

Framing the potential of public frameshift peptides as immunotherapy targets in colon cancer

PONE-D-20-19308R2

Dear Dr. Bins,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jianjiong Gao

Academic Editor

PLOS ONE

Acceptance letter

Jianjiong Gao

28 May 2021

PONE-D-20-19308R2

Framing the potential of public frameshift peptides as immunotherapy targets in colon cancer

Dear Dr. Bins:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jianjiong Gao

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Difference in NMD measurement.

    Density map showing difference between Z-score and VAF-ratio approach for the measurement of Nonsense Mediated Decay (NMD) in NMD RES and SENS mutations. Separation of NMD RES and SENS is clearer with VAF-ratio.

    (DOCX)

    S2 Fig. Premature Termination Codon (PTC) distance.

    PTC distance for mutation in MSI_H. Median is shown as dotted line (median = 14 Amino Acids). Dashed line shows 20 Amino Acids.

    (DOCX)

    S3 Fig. Patient coverage for top 30 frameshift mutations.

    Percentage of patients covered by n most frequently occurring frameshift mutations with adequate expression, a SNORF and NMD-escape.

    (DOCX)

    S1 Table. Candidate cancer genes.

    List with genes described in the candidate cancer gene database that are part of the top 100 most occurring mutations in MSI colon cancer. Rank: Relative rank assigned to CIS in study.

    (DOCX)

    Attachment

    Submitted filename: Plos One Reviewer response rebuttal.pdf

    Data Availability Statement

    TCGA-data is available trough the TCGA initiative at https://portal.gdc.cancer.gov/. The proprietary Perl computer code used to calculate PTC position in coding microsatellites is available at https://github.com/ITSpaanderman/PTC_predictor, including run instructions and test samples. GATK pipeline information is available at https://gatk.broadinstitute.org/hc/en-us. RSEM computer code is made available by its author at https://github.com/deweylab/RSEM.


    Articles from PLoS ONE are provided here courtesy of PLOS

    RESOURCES