BioMed Research International. 2019 Mar 17;2019:6763596. doi: 10.1155/2019/6763596

Cross-platform Data Analysis Reveals a Generic Gene Expression Signature for Microsatellite Instability in Colorectal Cancer

Anna Pačínková 1,2, Vlad Popovici 2
PMCID: PMC6441508  PMID: 31008109

Abstract

The dysfunction of the DNA mismatch repair system results in microsatellite instability (MSI). MSI plays a central role in the development of multiple human cancers. In colon cancer, despite being associated with resistance to 5-fluorouracil treatment, MSI is a favourable prognostic marker. In gastric and endometrial cancers, its prognostic value is less well established. Nevertheless, recognising MSI tumours may be important for predicting the therapeutic effect of immune checkpoint inhibitors. Several gene expression signatures have been trained on microarray data sets to understand the regulatory mechanisms underlying microsatellite instability in colorectal cancer. A wealth of expression data already exists in the form of microarray data sets; however, RNA-seq has become the routine technology for transcriptome analysis. The new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. In colon cancer, its estimated performance was (i) AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq and (ii) AUC = 0.95, 95% CI = (0.92 – 0.97) on microarrays. The 25-gene expression signature was also validated in two independent microarray colon cancer data sets. Despite being derived from colorectal cancer, the signature maintained good performance on RNA-seq and microarray gastric cancer data sets (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, the classifier still discriminated MSI from MSS cases in RNA-seq endometrial cancer data (AUC = 0.71, 95% CI = (0.62 – 0.81)). These results indicate that the new signature removes platform-specific differences while preserving the underlying biological differences between the MSI and MSS phenotypes in colon cancer samples.

1. Introduction

Microsatellite instability (MSI) refers to a genetic abnormality found in many human cancers. Microsatellites are short tandem repeats of 1–6 base pairs per unit. Spontaneous mismatches or indels (insertions and deletions) in microsatellites may occur during DNA replication. Such errors are normally recognised and repaired by the mismatch repair (MMR) system. Cells with defective MMR gene function accumulate microsatellite repeats of abnormal length, resulting in the microsatellite-unstable phenotype.

The traditional approach to identifying patients with MSI uses a recommended panel of five markers, also known as the Bethesda Panel [1]. However, a variety of other marker panels have been developed to assess MSI [2, 3]. Instability detected in ≥ 30% of the tested markers is designated MSI-high (MSI-H), instability detected in fewer than 30% (but at least one) of the tested markers is termed MSI-low (MSI-L), and the absence of instability is termed microsatellite stability (MSS). Although the microsatellite-unstable (MSI) phenotype has been reported in diverse human cancers (e.g., colon, gastric, and endometrial), it is most frequently associated with colon cancer. Approximately 15% of sporadic colon cancers manifest the MSI phenotype [4]. MSI colon tumours have characteristic molecular biomarkers such as silencing of the MLH1 promoter by hypermethylation [5]. Other well-known contributors to MSI in colon cancer are MSH2, MSH6, MLH3, and PMS2 [6, 7].
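The MSI-H/MSI-L/MSS thresholds above, together with the pooling of MSI-L and MSS into a single class used for binary classification in this study, can be written as a small rule. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that per-marker stable/unstable calls are already available.

```python
def msi_status(n_unstable: int, n_tested: int) -> str:
    """Assign MSI-H / MSI-L / MSS from the number of unstable markers in a panel.

    Assumption: marker calls (stable/unstable) were made upstream; only the
    30% rule stated in the text is applied here.
    """
    if n_tested <= 0 or not 0 <= n_unstable <= n_tested:
        raise ValueError("expected 0 <= n_unstable <= n_tested and n_tested > 0")
    # Integer comparison (n_unstable / n_tested >= 0.30) avoids rounding at exactly 30%.
    if 10 * n_unstable >= 3 * n_tested:
        return "MSI-H"
    if n_unstable > 0:
        return "MSI-L"
    return "MSS"


def binary_label(status: str) -> int:
    """Binary target for classification: MSI-H -> 1; MSI-L and MSS pooled -> 0."""
    return 1 if status == "MSI-H" else 0


# Five-marker panel: 2/5 unstable (40%) -> MSI-H, 1/5 (20%) -> MSI-L.
for k in range(6):
    status = msi_status(k, 5)
    print(k, status, binary_label(status))
```

With a five-marker panel, this rule means two or more unstable markers are needed for MSI-H.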

In colon cancer, despite being associated with resistance to 5-fluorouracil treatment [8], MSI is a favourable prognostic marker [9, 10]. In gastric and endometrial cancer, its prognostic value is less well established. Nevertheless, recognising MSI tumours is clinically important, for example for predicting the therapeutic effect of immune checkpoint inhibitors.

Nowadays, RNA-seq is the technology of choice for gene expression analysis. Despite the benefits of RNA-seq, a wealth of expression data already exists in the form of microarray data sets. Moreover, microarray data sets were used in several studies to derive gene expression signatures that help explain the regulatory mechanisms underlying microsatellite instability in colorectal cancer [11–15]. Therefore, an MSI gene expression signature that removes platform-specific differences while preserving the underlying biological differences between the MSI and MSS phenotypes would be beneficial. Although MSI testing exists, it is not routinely performed on all cases; hence, a transcriptional signature may complement available clinical features with information on MSI status.

We performed a binary classification between MSI and MSS cases. Since MSS and MSI-L tumours share similar clinicopathologic features [16, 17], the MSS and MSI-L populations were pooled into a single class. The new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. A simple nearest-centroid classifier was built, and its performance, measured as the area under the ROC curve (AUC), was estimated using 10-fold cross-validation. The final classifier was validated on independent data sets representing colon, gastric, and endometrial cancers. Pathway enrichment analysis was performed against MSigDB gene collections.

2. Materials and Methods

2.1. Patients and Samples

The development set consisted of n = 552 colon cancer samples, of which n = 175 were profiled by RNA-seq in TCGA [18] (development cohort A1) and n = 377 by Affymetrix microarrays (GEO accession number GSE39582 [19]) (development cohort A2).

The GSE39582 data set comprises two independent subsets; the second subset (n = 87) was used as independent validation cohort B1. Another independent colon cancer validation cohort, B2 (n = 136), was profiled on Affymetrix microarrays (data set GSE41258 from the GEO database [19]).

The gastric cancer set consisted of n = 369 samples, of which n = 335 were from TCGA RNA-seq [18] (cohort C1) and n = 34 from Affymetrix microarrays (GEO accession number GSE13911 [19]) (cohort C2). The endometrial cancer set consisted of n = 116 TCGA RNA-seq samples [18] (cohort D1).

A brief summary of all data sets can be found in Table 1.

Table 1.

Summary of all data sets used in the analysis. MSI microsatellite instability; MSS microsatellite stability; HG-U133A Human Genome U133A 2.0 platform; HG-U133Plus Human Genome U133 Plus 2.0 platform.

Cohort             Tissue        Platform                    MSS   MSI   Source
A1 (development)   colon         RNA-seq                     140    35   TCGA
A2 (development)   colon         microarray (HG-U133Plus)    318    59   GSE39582
B1 (validation)    colon         microarray (HG-U133Plus)     77    10   GSE39582
B2 (validation)    colon         microarray (HG-U133A)       107    29   GSE41258
C1                 gastric       RNA-seq                     281    54   TCGA
C2                 gastric       microarray (HG-U133Plus)     18    16   GSE13911
D1                 endometrial   RNA-seq                      64    52   TCGA

2.2. RNA-Seq and Microarray Data Analysis

Gene expression data were processed following standard practices in the field, as described below.

In the RNA-seq data sets, genes with low counts across all libraries were filtered out before further analysis. Read counts were normalised using the Trimmed Mean of M-values (TMM) procedure [25]. Differential expression analysis was performed with the generalised linear model framework of edgeR [26], with batch effects included in the model. Genes with an absolute log2 fold change > 1 and a Benjamini-Hochberg adjusted p value < 0.05 [21] were considered differentially expressed.
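The thresholding step can be sketched as follows. This minimal Python example assumes that per-gene log2 fold changes and raw p values have already been produced (for example by the edgeR GLM fit); it does not reproduce TMM normalisation or the GLM itself, and the function names are hypothetical. The adjustment is the standard Benjamini-Hochberg step-up procedure, the same procedure as R's p.adjust(method = "BH").

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up: running minimum taken from the largest p value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def select_degs(genes, log2_fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with |log2 FC| > fc_cutoff and BH-adjusted p < alpha.

    Assumption: log2_fc and pvalues come from an upstream differential
    expression fit (e.g. edgeR); this function only applies the cut-offs.
    """
    padj = benjamini_hochberg(pvalues)
    keep = (np.abs(np.asarray(log2_fc, dtype=float)) > fc_cutoff) & (padj < alpha)
    return [g for g, k in zip(genes, keep) if k]


print(select_degs(["A", "B", "C", "D"], [2.0, -1.5, 0.5, 3.0], [0.001, 0.01, 0.001, 0.2]))
```

Both criteria must hold: gene C is significant but has too small a fold change, and gene D has a large fold change but does not survive the adjustment.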

Outlier arrays were filtered out using (i) 2D images for diagnosing spatial bias and (ii) Normalised Unscaled Standard Errors (NUSE; arrays with median NUSE ≤ 1.035 were retained), both computed with the affyPLM Bioconductor package [27]. Gene expression measurements were normalised using the Robust Multiarray Average (RMA) procedure [28] and quantile normalisation.
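Quantile normalisation, which RMA performs between background correction and median-polish summarisation, can be illustrated on a small matrix. This is a minimal sketch, not the Bioconductor implementation used in the study; it assumes a genes x samples matrix of intensities and breaks ties arbitrarily, whereas dedicated implementations may treat ties differently.

```python
import numpy as np


def quantile_normalise(x):
    """Quantile-normalise a (genes x samples) matrix.

    After normalisation every sample (column) has the same empirical
    distribution: the mean of the sorted columns.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # within-sample rank of each gene
    reference = np.sort(x, axis=0).mean(axis=1)        # mean value at each quantile
    return reference[ranks]                            # ties are broken arbitrarily here


x = np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]], dtype=float)
print(quantile_normalise(x))
```

Each gene keeps its rank within its sample, so only the scale of the samples is harmonised, not the ordering of genes.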

Two types of Affymetrix human gene expression arrays were used in this study: Human Genome U133A 2.0 (HG-U133A) and Human Genome U133 Plus 2.0 (HG-U133Plus). They differ in the number of probe sets on the chip: HG-U133A comprises more than 22,000 probe sets, whereas HG-U133Plus comprises more than 54,000.

2.3. Construction of the Gene Expression Signature for MSI Status

For the analysis, the MSI-low and MSS (microsatellite stable) populations were pooled into a single class. Using four published MSI gene expression signatures trained exclusively on microarray data sets [11–14], we identified a core MSI gene list. First, we retained genes common to both platforms and then identified differentially expressed genes (DEGs) between MSI and MSS samples in the RNA-seq development cohort A1. The new gene expression signature was defined as the intersection of these DEGs and the core MSI gene list. To minimise redundancy, genes with an absolute Pearson's correlation coefficient > 0.75 in either cohort A1 or cohort A2 were pruned: if the expression levels of two genes were highly correlated, only one randomly selected representative of the pair was kept in the signature. The gene expression signature was used to construct a nearest-centroid classifier based on cosine similarity. For each sample, a score was computed as the difference between the sample's cosine distances to the MSI and MSS class centroids and was used to predict MSI status: if the score exceeded an optimised threshold, the sample was classified as MSI. We did not construct more sophisticated classifiers, to allow direct comparison with published signatures trained exclusively on microarray data sets.
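The construction steps and the classifier can be sketched as follows. This is a minimal Python sketch under assumptions the text does not fix: expression is standardised gene-wise within each cohort before centroids are computed, the redundancy filter walks gene pairs greedily, the score is oriented so that positive values favour MSI, and the threshold defaults to 0. All function and class names are hypothetical; the authors' R code (see Data Availability) is the reference implementation.

```python
import numpy as np


def build_signature(core_genes, degs, common_genes):
    """Candidate signature: core MSI genes that are DEGs and measured on both platforms."""
    degs, common = set(degs), set(common_genes)
    return [g for g in core_genes if g in degs and g in common]


def drop_correlated(genes, expr_sets, cutoff=0.75, seed=0):
    """Greedy redundancy filter over gene pairs.

    expr_sets: list of (genes x samples) arrays with rows aligned to `genes`
    (e.g. cohorts A1 and A2). If two retained genes have |Pearson r| > cutoff
    in any set, one of the two, chosen at random, is dropped.
    """
    rng = np.random.default_rng(seed)
    corr = np.max([np.abs(np.corrcoef(e)) for e in expr_sets], axis=0)
    keep = list(range(len(genes)))
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if i in keep and j in keep and corr[i, j] > cutoff:
                keep.remove(int(rng.choice([i, j])))
    return [genes[k] for k in keep]


def standardise(X):
    """Gene-wise z-scores within one cohort (X: samples x genes). Assumed preprocessing."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


class CosineCentroidClassifier:
    """Nearest-centroid classifier using cosine distance (1 - cosine similarity)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.c_msi = X[y == 1].mean(axis=0)  # y = 1: MSI
        self.c_mss = X[y == 0].mean(axis=0)  # y = 0: MSS and MSI-L pooled
        return self

    @staticmethod
    def _cos_dist(X, c):
        return 1.0 - X @ c / (np.linalg.norm(X, axis=1) * np.linalg.norm(c))

    def score(self, X):
        """Positive when a sample is closer to the MSI centroid than to the MSS one."""
        X = np.asarray(X, dtype=float)
        return self._cos_dist(X, self.c_mss) - self._cos_dist(X, self.c_msi)

    def predict(self, X, threshold=0.0):
        # Assumption: the optimised threshold is supplied by the caller.
        return (self.score(X) > threshold).astype(int)
```

In use, X would hold the standardised expression of the retained signature genes, and the threshold would be tuned rather than left at 0.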

2.4. Performance Evaluation of the Gene Expression Signature for MSI Status

The performance of the classifier was estimated using 10-fold cross-validation. The area under the receiver operating characteristic curve (AUC) was used as the main performance index, and 95% confidence intervals (CIs) were computed using DeLong's method [20] as implemented in the pROC R package [29]. The gene expression signature was validated on two independent colon cancer data sets: cohorts B1 and B2.
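A 10-fold cross-validation of the nearest-centroid score can be sketched as below. Several details are assumptions rather than statements of the authors' procedure: folds are stratified by class, a single AUC is computed over the pooled out-of-fold scores (rather than averaged over folds), and the cosine-centroid scorer is a hypothetical re-implementation. The AUC is computed as the Mann-Whitney probability that an MSI sample scores higher than an MSS sample.

```python
import numpy as np


def auc(scores, y):
    """AUC as P(score_MSI > score_MSS), with ties counted as 1/2."""
    s, y = np.asarray(scores, dtype=float), np.asarray(y)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def stratified_folds(y, k=10, seed=0):
    """Assign each sample to one of k folds, spreading each class evenly (assumed)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.empty(y.size, dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(idx.size) % k
    return folds


def cosine_centroid_fit(X, y):
    """Fit class centroids; return a scorer (> 0 means closer to the MSI centroid)."""
    c_msi, c_mss = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

    def cos_dist(A, c):
        return 1.0 - A @ c / (np.linalg.norm(A, axis=1) * np.linalg.norm(c))

    return lambda A: cos_dist(A, c_mss) - cos_dist(A, c_msi)


def cv_auc(X, y, fit=cosine_centroid_fit, k=10, seed=0):
    """Score each held-out fold with a model fitted on the remaining folds,
    then compute one AUC over the pooled out-of-fold scores."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    folds = stratified_folds(y, k, seed)
    scores = np.empty(y.size)
    for f in range(k):
        test = folds == f
        scorer = fit(X[~test], y[~test])
        scores[test] = scorer(X[test])
    return auc(scores, y), scores
```

Because centroids are refitted in every fold, no sample contributes to the centroid that scores it.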

Only microarray data sets were used for independent validation, because no independent, publicly available colon cancer RNA-seq data set with annotated MSI status was available.

Besides validation on independent colon cancer samples, we evaluated the performance of the gene expression signature on gastric and endometrial cancer samples (cohorts C1, C2, and D1).

2.5. Comparison with Published Signatures Trained Exclusively on Microarray Data Sets

The gene expression signature performance was also compared with published MSI gene expression signatures trained exclusively on microarray data sets.

Giacomini et al. [11] developed a 7-gene expression signature using a custom microarray. The signature was trained on colon cancer cell lines and included one probe for noncoding RNA.

Kruhøffer et al. [12] constructed a 9-gene expression signature capable of separating MSI and MSS samples, using both sporadic and hereditary nonpolyposis colorectal tumours. Gene expression was measured on the Affymetrix Human Genome U133A 2.0 array.

Lanza et al. [13] identified a signature of 27 differentially expressed genes, including eight miRNAs. The miRNAs were excluded from our analysis, so the remaining 19 genes were used for comparison with the new gene expression signature. Hybridisation was performed on the Human 18.5k Expression Bioarray.

Tian et al. [14] developed a 64-gene expression signature for detecting the MSI phenotype using the Agilent 44K oligonucleotide array. The signature included probes that do not map to a known gene as well as probes that map to multiple genes.

Samples were classified in the same way as before, with the genes of the new signature replaced by the genes of each published signature. DeLong's test [20] was used to compare the AUC of the new gene expression signature with those of the published MSI gene signatures trained exclusively on microarray data sets. To detect potential multicollinearity among the genes of these published signatures, correlation analysis was performed in the RNA-seq development cohort A1, with correlation measured as the absolute value of Pearson's correlation coefficient.
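DeLong's method, used both for the AUC confidence intervals and for the paired comparison of signatures evaluated on the same samples, can be sketched as follows. This minimal Python version computes the structural components directly in O(mn) time; it is not the pROC implementation, the CI is a plain Wald interval on the AUC scale that is clipped to [0, 1] here by assumption, and the function names are hypothetical. In Table 3, the resulting p values are additionally Benjamini-Hochberg adjusted.

```python
import math
from statistics import NormalDist

import numpy as np


def _placements(scores, y):
    """DeLong structural components: V10 for each positive, V01 for each negative."""
    s, y = np.asarray(scores, dtype=float), np.asarray(y)
    pos, neg = s[y == 1], s[y == 0]
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_auc_ci(scores, y, level=0.95):
    """AUC with a Wald-type CI from DeLong's variance estimate (clipped to [0, 1])."""
    v10, v01 = _placements(scores, y)
    auc = v10.mean()
    var = v10.var(ddof=1) / v10.size + v01.var(ddof=1) / v01.size
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    return auc, max(0.0, auc - half), min(1.0, auc + half)


def delong_test(scores_a, scores_b, y):
    """Two-sided DeLong test for two correlated AUCs computed on the same samples."""
    a10, a01 = _placements(scores_a, y)
    b10, b01 = _placements(scores_b, y)
    cov = np.cov(np.vstack([a10, b10])) / a10.size + np.cov(np.vstack([a01, b01])) / a01.size
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 1e-15:  # zero variance of the difference (e.g. identical scores)
        return 0.0, 1.0
    z = (a10.mean() - b10.mean()) / math.sqrt(var_diff)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

Because both signatures score the same patients, the covariance term matters: ignoring it would treat the two AUCs as independent and usually make the test conservative.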

2.6. Functional Interpretation and Pathway Enrichment Analysis of the Gene Expression Signature

A functional and biological interpretation of the 25-gene expression signature was obtained from the Database for Annotation, Visualization and Integrated Discovery (DAVID) version 6.8 [30].

To identify pathways enriched in the gene expression signature, pathway enrichment analysis was performed against MSigDB gene collections [31] using the pathEnrich R function [32] (Benjamini-Hochberg adjusted p value < 0.05 [21]).
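A common way to test a gene list for enrichment is a one-sided hypergeometric (over-representation) test followed by Benjamini-Hochberg adjustment. The sketch below assumes that this is what the enrichment step amounts to; it is not taken from the pathEnrich function, whose exact test and gene universe have not been checked here, and all names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes in the universe, K in the set, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (input order preserved)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def enrichment(signature, gene_sets, universe):
    """One-sided over-representation test of `signature` in each gene set.

    Assumption: gene sets and signature are restricted to a common universe.
    Returns (set name, BH-adjusted p value, overlapping genes) per gene set.
    """
    universe = set(universe)
    sig = set(signature) & universe
    names, pvals, overlaps = [], [], []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = sorted(sig & members)
        names.append(name)
        overlaps.append(overlap)
        pvals.append(hypergeom_sf(len(overlap), len(universe), len(members), len(sig)))
    return list(zip(names, bh_adjust(pvals), overlaps))
```

The "Genes in overlap" column of Table 4 corresponds to the third element returned for each gene set.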

2.7. Statistical and Survival Analysis

All statistical analyses and survival analysis were performed in R (version 3.3.1; [33]).

The prognostic value of the gene expression signature was assessed by fitting Cox regression models in the stage II and III subpopulations of cohorts A1 and A2 (Benjamini-Hochberg adjusted p value < 0.05 [21]).

3. Results

3.1. Construction and Performance Evaluation of the Gene Expression Signature for MSI Status

We identified a new 25-gene expression signature (see Methods) (Table 2; Figure 1). In 10-fold cross-validation, the classifier achieved AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq cohort A1 and AUC = 0.95, 95% CI = (0.92 – 0.97) on microarray cohort A2 (Table 3). The 25-gene expression signature was also validated in two independent microarray data sets: cohort B1 (AUC = 0.92, 95% CI = (0.81 – 1.00)) and cohort B2 (AUC = 0.80, 95% CI = (0.70 – 0.90)). Only 17 genes of the signature could be used in cohort B2 (probes for the other eight genes were not available). We deliberately included validation cohort B2 to show that the classifier also works with older versions of Affymetrix microarrays. Because the microsatellite-unstable phenotype is observed in many cancers, we also asked whether the signature could identify MSI cases in gastric and endometrial cancer samples. The 25-gene expression signature performed well in gastric cancer on both RNA-seq and microarray data (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, the classifier still discriminated MSI from MSS cases in RNA-seq endometrial cancer samples (AUC = 0.71, 95% CI = (0.62 – 0.81)) (Table 3).

Table 2.

List of genes in the 25-gene expression signature.

Entrez gene ID Gene symbol Gene description
7138 TNNT1 troponin T1, slow skeletal type
8875 VNN2 vanin 2
81786 TRIM7 tripartite motif containing 7
8744 TNFSF9 tumor necrosis factor superfamily member 9
10551 AGR2 anterior gradient 2, protein disulphide isomerase family member
200916 RPL22L1 ribosomal protein L22 like 1
2786 GNG4 G protein subunit gamma 4
25984 KRT23 keratin 23
23305 ACSL6 acyl-CoA synthetase long-chain family member 6
7125 TNNC2 troponin C2, fast skeletal type
357 SHROOM2 shroom family member 2
54749 EPDR1 ependymin related 1
1820 ARID3A AT-rich interaction domain 3A
10656 KHDRBS3 KH RNA binding domain containing, signal transduction associated 3
2686 GGT7 gamma-glutamyltransferase 7
57477 SHROOM4 shroom family member 4
4292 MLH1 mutL homolog 1
85407 NKD1 naked cuticle homolog 1
29842 TFCP2L1 transcription factor CP2 like 1
10451 VAV3 vav guanine nucleotide exchange factor 3
80183 RUBCNL RUN and cysteine rich domain containing beclin 1 interacting protein like
430 ASCL2 achaete-scute family bHLH transcription factor 2
8313 AXIN2 axin 2
5326 PLAGL2 PLAG1 like zinc finger 2
222171 PRR15 proline rich 15

Figure 1.


The 25-gene expression signature profile. (a) RNA-seq development cohort A1 (n = 175), (b) microarray development cohort A2 (n = 377). MSI microsatellite instability; MSS microsatellite stability.

Table 3.

Performance of the 25-gene expression signature and the published signatures trained exclusively on microarray data sets. The AUC was used as the main performance index, and 95% CIs were computed using DeLong's method [20]. DeLong's test [20] was used to compare the AUC of each published signature with that of the 25-gene expression signature on a given cohort (Benjamini-Hochberg adjusted p value < 0.05 [21]). ∗ significantly better performance than the 25-gene expression signature; ∗∗ significantly worse performance than the 25-gene expression signature; 25-gene expr.sig.: the proposed 25-gene expression signature; AUC: area under the receiver operating characteristic curve; CI: confidence interval.

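Columns, left to right: A1 colon development (RNA-seq); A2 colon development (microarray); B1 colon validation (microarray); B2 colon validation (microarray); C1 gastric (RNA-seq); C2 gastric (microarray); D1 endometrial (RNA-seq)
Signature A1 A2 B1 B2 C1 C2 D1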
25-gene expr.sig. 0.94 0.95 0.92 0.80 0.90 0.83 0.71
CI (0.90 – 0.97) CI (0.92 – 0.97) CI (0.81 – 1.00) CI (0.70 – 0.90) CI (0.85 – 0.94) CI (0.69 – 0.97) CI (0.62 – 0.81)

Giacomini et al. [11] 0.67∗∗ 0.56∗∗ 0.55∗∗ 0.69 0.63∗∗ 0.53∗∗ 0.47∗∗
CI (0.58 – 0.76) CI (0.49 – 0.64) CI (0.35 – 0.75) CI (0.59 – 0.79) CI (0.56 – 0.71) CI (0.33 – 0.73) CI (0.36 – 0.58)
Kruhøffer et al. [12] 0.88 0.99∗ 0.92 0.81 0.74∗∗ 0.85 0.62
CI (0.82 – 0.95) CI (0.98 – 1.00) CI (0.75 – 1.00) CI (0.70 – 0.92) CI (0.67 – 0.81) CI (0.70 – 1.00) CI (0.52 – 0.72)
Lanza et al. [13] 0.96 0.92∗∗ 0.90 0.78 0.82∗∗ 0.70 0.63
CI (0.92 – 0.99) CI (0.89 – 0.95) CI (0.82 – 0.99) CI (0.70 – 0.87) CI (0.76 – 0.87) CI (0.52 – 0.89) CI (0.53 – 0.73)
Tian et al. [14] 0.97 0.96∗ 0.95 0.82 0.89 0.88 0.71
CI (0.95 – 1.00) CI (0.94 – 0.98) CI (0.86 – 1.00) CI (0.72 – 0.92) CI (0.84 – 0.95) CI (0.75 – 1.00) CI (0.61 – 0.80)

3.2. Comparison with Published Signatures Trained Exclusively on Microarray Data Sets

The performance of the 25-gene expression signature was compared with that of published signatures trained exclusively on microarray data sets (Table 3, Figure S1). The 25-gene expression signature performed significantly better than the Giacomini et al. [11] signature on all cohorts except B2. On RNA-seq cohort C1, it performed significantly better than the Giacomini et al. [11], Kruhøffer et al. [12], and Lanza et al. [13] signatures. In the microarray development cohort A2, the AUCs of the Kruhøffer et al. [12] and Tian et al. [14] signatures were significantly better than that of the 25-gene expression signature.

In contrast, the AUCs of the Giacomini et al. [11] and Lanza et al. [13] signatures were significantly worse than that of the 25-gene expression signature on the same cohort.

In general, the Tian et al. [14] signature was accurate in all cohorts, including the RNA-seq development cohort A1. We therefore performed correlation analysis to detect potential multicollinearity among its genes in cohort A1. A high correlation between expression levels indicates a strong relationship between genes and introduces redundancy into the signature. In the RNA-seq development cohort A1, 15 genes of the Tian et al. [14] signature had an absolute Pearson's correlation coefficient higher than 0.75 with at least one other gene of the signature (Figure 2). These results suggest high redundancy of this signature in RNA-seq cohort A1.
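The count reported above can, in principle, be obtained with a short helper that flags genes whose absolute correlation with at least one other signature gene exceeds 0.75. This is a hypothetical Python sketch assuming a genes x samples expression matrix for cohort A1 with rows aligned to the gene names.

```python
import numpy as np


def highly_correlated_genes(expr, genes, cutoff=0.75):
    """Genes whose |Pearson r| with at least one other gene exceeds `cutoff`.

    expr: (genes x samples) matrix with rows aligned to `genes`.
    """
    r = np.abs(np.corrcoef(np.asarray(expr, dtype=float)))
    np.fill_diagonal(r, 0.0)  # ignore each gene's correlation with itself
    return [g for g, row in zip(genes, r) if row.max() > cutoff]


expr = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
print(highly_correlated_genes(expr, ["g1", "g2", "g3"]))
```

Applied to the 64 genes of a signature, len() of the returned list gives the number of genes involved in at least one highly correlated pair.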

Figure 2.


Correlation plot of genes from the Tian et al. [14] signature with highly correlated expression levels (absolute value of Pearson's correlation coefficient > 0.75) in RNA-seq development cohort A1. The colour key on the right shows the value of Pearson's correlation coefficient.

The intersection of the 25-gene expression signature and the published signatures is shown in Figure 3.

Figure 3.


Intersection of the 25-gene expression signature and published microarray gene expression signatures used to construct the core MSI (microsatellite instability) gene list. 25-gene expr.sig.: the proposed 25-gene expression signature.

3.3. Functional Interpretation and Pathway Enrichment Analysis of the Gene Expression Signature

A functional and biological interpretation of the 25-gene expression signature was obtained from the DAVID database. Tumour-suppressor genes (MLH1 and RUBCNL), a protooncogene (AGR2), and genes reported to be linked with colon cancer (EPDR1, MLH1, and AXIN2) were enriched in the signature. The signature also comprised multiple genes related to oncogenic signaling pathways, such as the EGFR (VAV3), AKT (TNFSF9 and GNG4), and WNT (AXIN2 and NKD1) signaling pathways. GNG4 and VAV3 are involved in the chemokine signaling pathway, which activates downstream signaling pathways such as MAPK. The 25-gene expression signature also encompasses genes associated with cell differentiation, growth, adhesion, and migration.

We also carried out pathway enrichment analysis against MSigDB gene collections [31]. Three MSigDB gene sets were significantly enriched in the new 25-gene expression signature (Table 4), supporting the association of the signature with the colon cancer MSI phenotype. VAV3, ACSL6, GNG4, and KRT23 overlap the gene set defined as “downregulated genes discriminating between MSI and MSS colon cancers” [22]. Koinuma et al. [23] reported that epigenetic silencing of AXIN2 is specifically associated with carcinogenesis in MSI colorectal tumours, which is consistent with our results.

Table 4.

Pathway enrichment analysis of the proposed 25-gene expression signature against MSigDB gene collections. MSigDB molecular signatures database.

MSigDB gene set name adj. p value Genes in overlap
Watanabe colon cancer MSI vs MSS down [22] 0.005 VAV3, ACSL6, GNG4, KRT23
Koinuma colon cancer MSI down [23] 0.045 AXIN2, MLH1
Sansom WNT pathway require MYC [24] 0.045 AXIN2, NKD1, ASCL2

3.4. Proposed Gene Expression Signature and Prognosis

We assessed the prognostic value of each gene of the proposed 25-gene expression signature by fitting Cox regression models to identify potential drivers of the prognostic effect. Two endpoints were tested in the stage II and III subpopulation of cohort A2: relapse-free survival (RFS, n = 301) and overall survival (OS, n = 304). Because of limitations of the TCGA data set, only the OS endpoint was tested in the stage II and III subpopulation of cohort A1 (n = 115).

It is well known that patients with MSI tumours have a more favourable prognosis than those with MSS tumours. However, the prognostic value of the proposed 25-gene expression signature in the tested colon cancer subpopulations was not statistically significant. This suggests that, rather than being a prognostic gene set, the new 25-gene expression signature captures the underlying biological differences between the MSI and MSS phenotypes.

4. Discussion

Carcinogenesis is a multistep process during which genetic and epigenetic alterations determine the malignant transformation of the cell. The molecular profile of a tumour is a key determinant of clinical outcome. Therefore, precise detection of MSI status is needed to guide treatment strategies. A single MSI gene expression signature that can be used regardless of platform allows researchers to take advantage of all available microarray and RNA-seq data sets.

The main objective of this study was to identify a gene expression signature for MSI prediction in colon cancer that could be applied to both microarray and RNA-seq data sets. We developed a new 25-gene expression signature that predicts the MSI phenotype in colon cancer with high accuracy. Interestingly, the signature also performs well in gastric cancer and retains discriminative ability in endometrial cancer. From a biological perspective, this supports the idea that the MSI gene expression pattern is comparable across various cancers, pointing towards similar regulatory pathways.

The performance of the 25-gene expression signature was also compared with that of published MSI gene expression signatures trained exclusively on microarray data sets. The proposed signature performed better than Giacomini et al.'s [11] signature on most cohorts. Although Lanza et al.'s [13] signature originally consisted of both mRNAs and miRNAs, we showed that its mRNA genes alone can distinguish the MSI and MSS colon cancer phenotypes. Tian et al.'s [14] signature was accurate in all cohorts, including the RNA-seq development cohort A1; however, the correlation analysis revealed high redundancy of this signature in that RNA-seq cohort. Therefore, we propose the new 25-gene expression signature as a core cross-platform pattern that may form the basis of an MSI phenotype classifier across multiple cancers.

The functional annotation and the pathway enrichment analysis of the 25 genes in the new gene expression signature support its association with the colon cancer MSI phenotype.

Two tumour-suppressor genes and one protooncogene were enriched in the signature. The AXIN2 gene is associated with the WNT signaling pathway and is a direct repressor of the MYC protooncogene [34]. AXIN2 was silenced in the MSI subgroup, possibly as a result of the promoter methylation frequently observed in MSI colon cancer patients. Interestingly, AXIN2 was also identified as one of 36 genes that contribute to the distinction between MSI-L and MSI-H samples [35]. The RPL22L1 gene was previously identified as MSI-specific in gastric cancer [36] and as a gene specific to the colon cancer CIMP-H subtype (characterised by enrichment for MSI, right-sided location, and mucinous histology) [37].

It should also be mentioned that the MLH1 gene was previously identified as part of a gene list able to differentiate mismatch repair-deficient from mismatch repair-proficient colorectal cancer samples [15].

In the microarray development cohort A2, MSI colon cancer samples with downregulated MLH1 expression form a compact cluster. In contrast, MSI samples without MLH1 silencing cluster together with some MSS samples (see dendrograms in Figure 1). Most of these MSS samples were misclassified as MSI by the proposed 25-gene expression signature. A similar pattern was observed in the RNA-seq development cohort A1. Although these samples were annotated as microsatellite stable, we hypothesise that their DNA mismatch repair system may be disrupted in a way similar to that of MSI samples without MLH1 silencing.

5. Conclusion

We present a new 25-gene expression signature able to identify MSI cases in colon cancer with consistently strong performance across microarray and RNA-seq platforms. This indicates that the new MSI gene expression signature removes platform-specific differences while preserving the underlying biological differences between the MSI and MSS phenotypes in colon cancer samples. The performance of the signature on both RNA-seq and microarray data sets was compared with that of published MSI gene signatures trained exclusively on microarray data sets. The pathway enrichment analysis results support the association of the 25-gene expression signature with the colon cancer MSI phenotype. Moreover, the new signature captures common gene activation patterns in colon, gastric, and endometrial cancers, suggesting that a common expression-based cross-platform test is feasible.

Acknowledgments

This research was supported by the European Community's Seventh Framework Programme under grant agreement no. 602901 MerCuRIC and by the RECETOX Research Infrastructure (LM2015051 and CZ.02.1.01/0.0/0.0/16_013/0001761). Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042) is greatly appreciated.

Data Availability

The R code is freely available at https://github.com/bioinfo-recetox/Cross_platform_MSI_signature.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials


Supplementary Figure S1: receiver operating characteristic curves of the proposed 25-gene expression signature and the published signatures trained exclusively on microarray data sets. Supplementary Figure S2: comparison of receiver operating characteristic curves of the proposed 25-gene expression signature in RNA-seq cohorts A1, C1, and D1 normalised with different normalisation methods. Supplementary Table S3: performance of the 25-gene expression signature in RNA-seq cohorts A1, C1, and D1 normalised with different normalisation methods.

References

  • 1.Boland C. R., Thibodeau S. N., Hamilton S. R., et al. A National Cancer Institute workshop on microsatellite instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Research. 1998;58(22):5248–5257. [PubMed] [Google Scholar]
  • 2.Umar A., Boland C. R., Terdiman J. P., et al. Revised bethesda guidelines for hereditary nonpolyposis colorectal cancer (lynch syndrome) and microsatellite instability. Journal of the National Cancer Institute. 2004;96:261–268. doi: 10.1093/jnci/dji158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Nowak J. A., Yurgelun M. B., Bruce J. L., et al. Detection of mismatch repair deficiency and microsatellite instability in colorectal adenocarcinoma by targeted next-generation sequencing. The Journal of Molecular Diagnostics. 2017;19(1):84–91. doi: 10.1016/j.jmoldx.2016.07.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Rustgi A. K. The genetics of hereditary colon cancer. Genes & Development. 2007;21(20):2525–2538. doi: 10.1101/gad.1593107.
  • 5. Herman J. G., Umar A., Polyak K., et al. Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma. Proceedings of the National Academy of Sciences of the United States of America. 1998;95(12):6870–6875. doi: 10.1073/pnas.95.12.6870.
  • 6. Kane M. F., Loda M., Gaida G. M., et al. Methylation of the hMLH1 promoter correlates with lack of expression of hMLH1 in sporadic colon tumors and mismatch repair-defective human tumor cell lines. Cancer Research. 1997;57(5):808–811.
  • 7. Peltomäki P. Deficient DNA mismatch repair: A common etiologic factor for colon cancer. Human Molecular Genetics. 2001;10(7):735–740. doi: 10.1093/hmg/10.7.735.
  • 8. Koopman M., Kortman G. A. M., Mekenkamp L., et al. Deficient mismatch repair system in patients with sporadic advanced colorectal cancer. British Journal of Cancer. 2009;100(2):266–273. doi: 10.1038/sj.bjc.6604867.
  • 9. Sankila R., Aaltonen L. A., Jarvinen H. J., Mecklin J.-P. Better survival rates in patients with MLH1-associated hereditary colorectal cancer. Gastroenterology. 1996;110(3):682–687. doi: 10.1053/gast.1996.v110.pm8608876.
  • 10. Samowitz W. S., Curtin K., Ma K. N., et al. Microsatellite instability in sporadic colon cancer is associated with an improved prognosis at the population level. Cancer Epidemiology, Biomarkers & Prevention. 2001;10:917–923.
  • 11. Giacomini C. P., Leung S. Y., Chen X., et al. A gene expression signature of genetic instability in colon cancer. Cancer Research. 2005;65(20):9200–9205. doi: 10.1158/0008-5472.CAN-04-4163.
  • 12. Kruhøffer M., Jensen J. L., Laiho P., et al. Gene expression signatures for colorectal cancer microsatellite status and HNPCC. British Journal of Cancer. 2005;92(12):2240–2248. doi: 10.1038/sj.bjc.6602621.
  • 13. Lanza G., Ferracin M., Gafà R., et al. mRNA/microRNA gene expression profile in microsatellite unstable colorectal cancer. Molecular Cancer. 2007;6:54. doi: 10.1186/1476-4598-6-54.
  • 14. Tian S., Roepman P., Popovici V., et al. A robust genomic signature for the detection of colorectal cancer patients with microsatellite instability phenotype and high mutation frequency. The Journal of Pathology. 2012;228(4):586–595. doi: 10.1002/path.4092.
  • 15. Zhang T.-M., Huang T., Wang R.-F. Cross talk of chromosome instability, CpG island methylator phenotype and mismatch repair in colorectal cancer. Oncology Letters. 2018;16(2):1736–1746. doi: 10.3892/ol.2018.8860.
  • 16. Ribic C. M., Sargent D. J., Moore M. J., et al. Tumor microsatellite-instability status as a predictor of benefit from fluorouracil-based adjuvant chemotherapy for colon cancer. The New England Journal of Medicine. 2003;349(3):247–257. doi: 10.1056/NEJMoa022289.
  • 17. Hong S. P., Min B. S., Kim T. I., et al. The differential impact of microsatellite instability as a marker of prognosis and tumour response between colon cancer and rectal cancer. European Journal of Cancer. 2012;48(8):1235–1243. doi: 10.1016/j.ejca.2011.10.005.
  • 18. The Cancer Genome Atlas. http://cancergenome.nih.gov.
  • 19. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo.
  • 20. DeLong E. R., DeLong D. M., Clarke-Pearson D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–845. doi: 10.2307/2531595.
  • 21. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological). 1995;57(1):289–300.
  • 22. Watanabe T., Kobunai T., Toda E., et al. Distal colorectal cancers with microsatellite instability (MSI) display distinct gene expression profiles that are different from proximal MSI cancers. Cancer Research. 2006;66(20):9804–9808. doi: 10.1158/0008-5472.CAN-06-1163.
  • 23. Koinuma K., Yamashita Y., Liu W., et al. Epigenetic silencing of AXIN2 in colorectal carcinoma with microsatellite instability. Oncogene. 2006;25(1):139–146. doi: 10.1038/sj.onc.1209009.
  • 24. Sansom O. J., Meniel V. S., Muncan V., et al. Myc deletion rescues Apc deficiency in the small intestine. Nature. 2007;446(7136):676–679. doi: 10.1038/nature05674.
  • 25. Robinson M. D., Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology. 2010;11(3):R25. doi: 10.1186/gb-2010-11-3-r25.
  • 26. Robinson M. D., McCarthy D. J., Smyth G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi: 10.1093/bioinformatics/btp616.
  • 27. Brettschneider J., Collin F., Bolstad B. M., Speed T. P. Quality assessment for short oligonucleotide microarray data. Technometrics. 2008;50(3):241–264. doi: 10.1198/004017008000000334.
  • 28. Irizarry R. A., Hobbs B., Collin F., et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–264. doi: 10.1093/biostatistics/4.2.249.
  • 29. Robin X., Turck N., Hainard A., et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77.
  • 30. Huang D. W., Sherman B. T., Lempicki R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols. 2009;4(1):44–57. doi: 10.1038/nprot.2008.211.
  • 31. Liberzon A., Subramanian A., Pinchback R., Thorvaldsdóttir H., Tamayo P., Mesirov J. P. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–1740. doi: 10.1093/bioinformatics/btr260.
  • 32. Subramanian A., Tamayo P., Mootha V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
  • 33. The R Project for Statistical Computing. https://www.r-project.org.
  • 34. Rennoll S. A., Konsavage W. M., Yochum G. S. Nuclear AXIN2 represses MYC gene expression. Biochemical and Biophysical Research Communications. 2014;443(1):217–222. doi: 10.1016/j.bbrc.2013.11.089.
  • 35. Chen L., Pan X., Hu X., et al. Gene expression differences among different MSI statuses in colorectal cancer. International Journal of Cancer. 2018;143(7):1731–1740. doi: 10.1002/ijc.31554.
  • 36. D'Errico M., de Rinaldis E., Blasi M. F., et al. Genome-wide expression profile of sporadic gastric cancers with microsatellite instability. European Journal of Cancer. 2009;45(3):461–469. doi: 10.1016/j.ejca.2008.10.032.
  • 37. Budinska E., Popovici V., Tejpar S., et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. The Journal of Pathology. 2013;231(1):63–76. doi: 10.1002/path.4212.

Associated Data

Supplementary Materials

Supplementary Figure S1: receiver operating characteristic curves of the proposed 25-gene expression signature and the published signatures trained exclusively on microarray data sets. Supplementary Figure S2: comparison of receiver operating characteristic curves of the proposed 25-gene expression signature in RNA-seq cohorts A1, C1, and D1 normalised with different normalisation methods. Supplementary Table S3: performance of the 25-gene expression signature in RNA-seq cohorts A1, C1, and D1 normalised with different normalisation methods.

Data Availability Statement

The R code is freely available at https://github.com/bioinfo-recetox/Cross_platform_MSI_signature.


Articles from BioMed Research International are provided here courtesy of Wiley