SUMMARY
Metastatic progression of colorectal adenocarcinoma (CRC) remains poorly understood and poses significant challenges for treatment. To overcome these challenges, we performed multiomics analyses of primary CRC and liver metastases. Genomic alterations, such as structural variants or copy number alterations, were enriched in oncogenes and tumor suppressor genes and increased in metastases. Unsupervised mass spectrometry-based proteomics of 135 primary and 123 metastatic CRCs uncovered distinct proteomic subtypes, three each for primary and metastatic CRCs, respectively. Integrated analyses revealed that hypoxia, stemness, and immune signatures characterize these 6 subtypes. Hypoxic CRC harbors high epithelial-to-mesenchymal transition features and metabolic adaptation. CRC with a stemness signature shows high oncogenic pathway activation and alternative telomere lengthening (ALT) phenotype, especially in metastatic lesions. Tumor microenvironment analysis shows immune evasion via modulation of major histocompatibility complex (MHC) class I/II and antigen processing pathways. This study characterizes both primary and metastatic CRCs and provides a large proteogenomics dataset of metastatic progression.
Graphical Abstract

In brief
Tanaka et al. conduct proteogenomic characterization of 154 primary and 142 metastatic colorectal cancers and find 6 subtypes with 3 functional signatures (hypoxia, stemness, and immune). These subtypes shed light on the molecular progression from primary to metastatic cancer, highlighting proteomic plasticity and features of metastatic disease.
INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer in both men and women and the third leading cause of cancer-related death in the United States.1 Despite early diagnosis and treatment, many CRCs recur, and once metastases occur, CRCs are currently difficult to cure.2 Which molecular signatures and pathways are shared between primary-site CRC (pCRC) and metastatic CRC (mCRC), and which distinguish them? Are mCRCs heterogeneous, with molecular features that can be used to classify them into unique subtypes for better risk stratification? Answers to these questions require in-depth molecular characterization of both pCRC and mCRC.
Increasing efforts have been devoted to cataloging a spectrum of cancers at multiomics levels, and datasets from various consortia, such as TCGA (The Cancer Genome Atlas) and CPTAC (Clinical Proteomic Tumor Analysis Consortium), are expanding.3,4 CRC has been profiled by various genomic and proteomic approaches, including whole-genome sequencing, RNA sequencing (RNA-seq), global and specific proteomic sequencing, and integrated multiomics analysis.5–10 However, a comprehensive molecular definition of CRC is far from complete, and multiomics comparison of pCRC and mCRC using large cohorts of mCRC is particularly scant.
In this study, we analyzed 142 liver mCRCs and 154 pCRCs as well as 78 normal colon and 14 normal liver tissues by an integrated multiomics approach (Figure S1). We first obtained whole genomes, transcriptomes, and proteomes from 16 patient-matched triplets of mCRC, pCRC, and benign colonic mucosa (Figure S1). Based on these initial results, we then focused on deep proteome sequencing of 340 tissue samples, most of which were also annotated with Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) sequencing.11 We discovered distinct proteomic signatures in pCRC and mCRC. Selected protein markers from differential expression and pathway enrichment analyses were independently validated by immunohistochemistry (IHC) using a tissue microarray (TMA) of 220 additional CRC samples. Overall, this study presents the largest combined proteomics dataset of pCRC and mCRC to date and a rich proteogenomics resource for the cancer research community.
RESULTS
Structural variants (SVs) enriched in oncogenes/tumor suppressor genes
Whole-genome sequencing identified 16,757 SVs in 32 CRC samples (Figure 1A), including 9,183 (54.8%) insertions, 3,436 (20.5%) translocations, 2,188 (13.1%) deletions, 1,158 (6.9%) inversions, 792 (4.7%) duplications, and 7,586 SV breakpoints in 2,984 genes (5’ UTR, CDS, 3’ UTR). Gene-specific SV events were not significantly different between pCRC and mCRC. When we focused on the 723 cancer-associated genes (CAGs) curated by the COSMIC cancer gene census,12 SVs affected 138 (19.1%) CAGs, but only 4.9% (2,846/58,290) of non-CAGs were affected (Figure 1B). mCRC had slightly higher SV counts in CAGs (median, 15.0) than pCRC (median, 10.5) (Figure 1C). Oncogenes had more frequent SV events in mCRC than pCRC (Figure S2A), and deletion of tumor suppressor genes showed a slight increase in mCRC (Figure S2B).
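The enrichment of SVs in CAGs can be checked directly from the counts quoted above. The following is a minimal sketch, assuming a Pearson chi-square test on a 2 × 2 table without continuity correction; the exact test settings used for Figure 1B are not stated in the text, and the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts quoted in the text: SV-affected genes among CAGs and non-CAGs.
cag_total, cag_hit = 723, 138
non_cag_total, non_cag_hit = 58290, 2846

chi2, p_value = chi_square_2x2(cag_hit, cag_total - cag_hit,
                               non_cag_hit, non_cag_total - non_cag_hit)
cag_rate = cag_hit / cag_total
non_cag_rate = non_cag_hit / non_cag_total
print(f"CAG rate {cag_rate:.1%}, non-CAG rate {non_cag_rate:.1%}, "
      f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Under these assumptions, the two proportions reproduce the reported 19.1% and 4.9%, and the p value falls well below the reported threshold.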
Figure 1. SV landscape and downstream effect on molecular pathways.

(A) SV event number with gene involvement. All samples have more than 100 structural events, ranging from 104 to 666 per sample. There is no significant difference in SV event number between pCRC and mCRC.
(B) SV event frequency of CAGs and non-CAGs. Among 723 CAGs, 19.1% (138 CAGs) are affected by SVs. However, only 4.9% (2,846) of non-CAGs (58,290 genes) are affected by SVs (chi-square test, p < 10E–5). CAG, cancer-associated gene.
(C) Boxplot of SV-affected CAGs comparing pCRC (n = 16) and mCRC (n = 16). While not statistically significant, mCRC has slightly more SV events than pCRC (Wilcoxon test).
(D) Oncoprint of 30 recurrently (>8 times) SV-affected genes in cohort 1 (16 pCRC, 16 mCRC). Among these, 11 genes (*) overlap with commonly SV-affected genes reported in the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. DUP, duplication; DEL, deletion; INV, inversion; INS, insertion; BND, translocation; MULTI, multiple SV events.
(E) Copy number boxplot of the top 12 SV-affected genes grouped by SV status. Only 3 genes (FHIT, RBFOX1, and PRKN) show a significant copy number decrease in the SV-affected group (Wilcoxon test). CCSER1 shows a decrease, but it is not statistically significant. The main SV event for these 4 genes is deletion. The remaining 8 genes have different SV event types.
(F) mRNA expression levels (log2 transcript per million [TPM] values) of genes in (E). LINGO1 and CAM2B were not available in the RNA-seq data. Note that mRNA expression does not highly correlate with copy number status. MACROD2, TTC28, and CCSER1 show significant or close-to-significant expression decreases in the SV-affected group (Wilcoxon test).
(G) Number of genome-wide SV events grouped by FHIT locus SV status (FHIT altered, n = 16; FHIT non-altered, n = 16). The FHIT locus SV-affected group shows significantly higher genome-wide SV events than the non-affected group. This is consistent with a report that loss of FHIT induces genome instability.13
(H) Normalized enrichment score boxplot of single-sample GSEA obtained from transcriptome (“RNA”) and proteome (“Protein”) data (Wilcoxon test). The FHIT locus SV-affected group shows significantly higher double-strand break signatures in several gene set terms at the mRNA and protein levels. This suggests that FHIT function is strongly connected to DNA double-strand repair mechanisms. This supports previously reported phenotypes in FHIT loss.13
(I) Boxplot of TMB (per MB) between the FHIT locus SV-affected group and non-affected group. Concordant with FHIT function, TMB is significantly higher in the FHIT locus SV-affected group than in the non-affected group.
An Oncoprint of the top 30 recurrently SV-affected genes and copy numbers of the top 12 genes are shown in Figures 1D–1F. Of these, 11 genes (MACROD2, FHIT, CCSER1, RBFOX1, PTPRD, WWOX, CSMD1, PDE4D, PRKG1, GPC6, and NAALADL2) were among the top 21 SV-affected genes in common fragile sites (CFSs) reported in the ICGC/TCGA 2,658 whole-genome sequencing (WGS) cohort across 38 tumor types.14 CFSs are specific chromosomal regions that preferentially form gaps or breaks on metaphase chromosomes under conditions that impede DNA synthesis, and two loci (FRA3B/FHIT and FRA16D/WWOX) in the human genome are most prone to form lesions. FHIT was the third most frequently affected CFS-related gene in our cohort. FHIT is a tumor suppressor that regulates the apoptotic pathway and cell cycle,15 and loss of FHIT expression induces genome instability and correlates with various clinical features, such as invasiveness and poor outcome.13,16 Although FHIT mRNA expression did not significantly change between SV-affected and SV-nonaffected samples (Figure 1F), SV events, tumor mutation burden (TMB), and DNA double-strand repair process/response pathway scores were higher in the FHIT-SV-affected group at the RNA and protein levels (Figures 1G–1I). In addition, cell cycle and oncogenic pathways, such as MYC, were upregulated in the FHIT-SV-affected group at both the mRNA and protein levels (Figure S2C).
Recurrent somatic copy number alterations (SCNAs) of metastases involve known oncogenes and tumor suppressors
Recurrent SCNAs at chromosomal arm level are similar to those reported previously3,17,18 (Figure S2D). Recurrent arm-level amplifications in both pCRC and mCRC occurred in 8q, 13p, 13q, 20p, and 20q, which harbor several common oncogenes, such as MYC and FLT3. Recurrent arm-level deletions in both pCRC and mCRC occurred in 18q, which harbors several common tumor suppressor genes, such as SMAD2/4. Arm-level amplifications of 7p and 7q and deletions of 17p and 22q were significantly recurrent only in mCRC. These loci harbor MET and TP53, suggesting that these genomic alterations enhance metastatic tumor progression.
There were 35 unique focal amplifications (27 in pCRC, 27 in mCRC, and 19 shared) (Figure 2A) and 8 unique focal deletions (4 in pCRC, 5 in mCRC, and 1 shared) (Figure 2B). Focal-level events specific to mCRC involved several known CAG loci, such as MYC (8q24.21) or FHIT (3p14.2). Recurrent focal peaks involved 831 genes in pCRC and 1,173 genes in mCRC (Figure 2C), with 648 shared between pCRC and mCRC and 525 specific to mCRC. Molecular complex detection (MCODE) analysis of mCRC-specific genes revealed several functional clusters (Figure S2E), including the major histocompatibility complex (MHC) class I/antigen-presenting pathway and phosphatidylinositol 3-kinase (PI3K)-Akt signaling. When we focused on CAGs, 7 genes were specific to pCRC, 11 were specific to mCRC, and 12 were shared by both (Figure 2C). Several known CAGs of CRC, such as EGFR, were included in the shared gene list. MYC, an established oncogene of CRC, was specific to mCRC by focal peak analysis (Figures 2C and 2D). In the GENIE cohort (public v.10.1) with MSK-IMPACT targeted panel cancer gene sequencing data of 83 pCRC-mCRC paired samples, currently the largest publicly available primary-metastasis-paired sequenced cohort, MYC amplification had the third-highest average frequency in metastases, which agrees with our finding19 (Figure S2F).
Figure 2. Recurrent somatic copy number aberration and integrated genomic alterations.

(A) Focal peak amplification plot. The left side shows pCRC, and the right side shows mCRC. Peaks with cytoband loci are statistically recurrent events in this cohort 1 (FDR < 0.05). Cytoband loci with a red color are shared between pCRC and mCRC. Cytoband loci with a black color denote specific events not shared between pCRC and mCRC.
(B) Focal peak deletion plot. Cytoband loci with a red color are shared between pCRC and mCRC. Cytoband loci with a black color denote specific events not shared between pCRC and mCRC.
(C) A Venn diagram of genes that are involved in recurrent focal peaks. Red-colored genes are involved in focal peak amplification, and blue-colored genes are involved in focal peak deletion. Recurrent focal peaks of pCRC involve 831 genes. Recurrent focal peaks of mCRC involve 1,173 genes. Among CAGs, 7 genes are specific to pCRC, and 11 genes are specific to mCRC.
(D) A network plot of GOBP enrichment results in pCRC-mCRC shared CAGs (12 genes) and mCRC-specific CAGs (11 genes). GOBP enrichment analysis of mCRC-specific genes shows β-catenin-related pathway enrichment and stemness enrichment. See also Figure 3.
(E) Oncoprint of integrated genomic alterations with 10 or more events in cohort 1. Among 39 genes in this plot, 24 genes (61%) are mainly affected by SVs. Among genes with a greater than 50% genomic alteration rate, 4 genes (MACROD2, TTC28, FHIT, and LINGO1) of 7 genes are mainly affected by SVs.
(F) Co-occurrence and mutual exclusiveness analysis of the top 19 genes that are involved in 12 samples or more. Shown is the co-occurrence exclusiveness score (log10(odds value)).
(G) Among the 15 genes that were mainly affected by small SNVs, indels, or other alterations, 12 genes were quantified at the mRNA level. mRNA expression is generally downregulated in the genomically altered group.
Integrated SNV, SV, and SCNA calls reveal recurrent events of tumor-suppressive genes
To understand multilevel genomic alterations, we summarized genomic alterations from single-nucleotide variants (SNVs) to large-scale genomic SVs of the top 39 recurrent genes that harbor genomic alterations in over 31% of samples (Figure 2E). Among these 39, 24 genes (61%) were mainly affected by SVs, including large insertions, translocations, or deletions, and several of these genes have been reported to have tumor suppressor roles; e.g., MACROD2, FHIT, and CCSER1.20–22 WGS analysis revealed that CRC had frequent SV events in some well-known genes. Among the 15 genes that were mainly affected by small SNVs, insertions or deletions (indels), or other alterations (Figure S2G), 7 genes have tumor-suppressive functions (APC, TP53, FLRT2,23,24 NRK,25 ADGRL3, GABRG3,26 and GUCY1A2,27). APC had the highest count of nonsense mutations. TP53 had various mutation types but was mainly affected by missense mutations and frameshift deletions. The most frequent mutation type for FLRT2, ADGRL3, NRG1, GABRG3, STS, and GUCY1A2 was 3’ UTR mutation. KRAS, CFAP47, MUC16, and TTN were largely affected by missense mutations. Co-occurrence and exclusiveness analysis revealed that FLRT2 and MACROD2 were mutually exclusive, whereas APC had significantly high co-occurrence with NRK and FHIT (Figure 2F). To elucidate the impact of genomic alterations on mRNA expression, we compared transcript expression between genomically altered and non-genomically altered groups (Figure 2G). Among the 15 genes that are mainly affected by SNVs or indels, mRNAs of 12 genes were quantified in our dataset. Although the effects of single-nucleotide mutations in the 3’ UTR on mRNA expression and stability are still understudied, mRNAs with 3’ UTR mutations have been reported to be susceptible to decay,28 supporting downregulation of FLRT2, ADGRL3, NRG1, STS, and GUCY1A2 in the genomically altered group. On the other hand, the important oncogene KRAS is affected only by missense mutations, resulting in constitutively active protein. No structural event was found in KRAS. The KRAS mRNA increase in the genomically altered group is compatible with its oncogenic role.
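The co-occurrence and exclusiveness score reported in Figure 2F is a log10 odds value. A minimal sketch of such a score for a single gene pair follows, assuming per-sample boolean alteration calls and a Haldane-Anscombe pseudocount of 0.5 to keep the ratio finite when a cell is zero; the pseudocount, the function name `log10_odds_ratio`, and the toy calls are illustrative assumptions, not the authors' implementation.

```python
import math

def log10_odds_ratio(altered_a, altered_b, pseudocount=0.5):
    """log10 odds ratio for two per-sample alteration calls (lists of booleans)."""
    pairs = list(zip(altered_a, altered_b))
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    # Pseudocount (assumption) avoids division by zero for empty cells.
    odds = ((both + pseudocount) * (neither + pseudocount)) / (
        (only_a + pseudocount) * (only_b + pseudocount))
    return math.log10(odds)

# Hypothetical calls for 8 samples (not data from this study).
gene_x = [True, True, True, True, False, False, False, False]
gene_y = list(gene_x)               # altered in the same samples as gene_x
gene_z = [not v for v in gene_x]    # altered only where gene_x is not

co_occurring = log10_odds_ratio(gene_x, gene_y)
exclusive = log10_odds_ratio(gene_x, gene_z)
print(f"co-occurring pair: {co_occurring:.2f}, exclusive pair: {exclusive:.2f}")
```

Positive values indicate co-occurrence and negative values indicate mutual exclusiveness; assessing significance would require an additional test (for example, Fisher's exact test), which is omitted here.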
Transcriptome and proteome analyses reveal hypoxia, stemness, and oncogenic signatures
RNA-seq quantified 20,791 gene transcripts, and proteomics quantified 4,204 proteins in the discovery cohort. The mean correlation coefficient between mRNA and protein abundance was 0.49, similar to the 0.47 reported for a previous CPTAC CRC cohort.4 Unsupervised uniform manifold approximation and projection (UMAP) shows clear separation between normal and tumor samples, with the proteome showing better separation than the transcriptome (Figure S3A). Differential transcriptome expression analysis identified 2,846 significantly differentially expressed genes (DEGs) with 2-fold change or greater between pCRC and normal tissue (1,465 up and 1,381 down) and 3,313 significant DEGs between mCRC and normal tissue (1,616 up and 1,697 down) (Table S1). Differential proteome expression analysis found 675 significantly differentially expressed proteins (DEPs) with 2-fold change or greater between pCRC and normal tissue (454 up and 221 down) and 826 significant DEPs between mCRC and normal tissue (546 up and 280 down) (Table S1). A previous CPTAC study of CRC (pCRC only) reported that 31 proteins were upregulated in pCRC relative to normal colon tissue.7 Among the 31 upregulated proteins in the CPTAC study, 22 proteins were quantified in our cohort 1. All 21 proteins except one protein (IGF2BP3), 95.2% of quantified proteins, showed significant upregulation (≥2-fold change) in our dataset. Thus, our dataset is highly concordant with this previous report.
To identify common signatures in transcriptome and proteome, we plotted heatmaps using significant DEGs and DEPs with 4-fold or greater expression changes in any pairwise comparison (normal vs. pCRC vs. mCRC). These 1,220 mRNAs and 312 proteins were separated by k-nearest neighbor (k-NN) clustering into 3 signatures: signature 1 (normal high, pCRC/mCRC low), signature 2 (normal low, pCRC/mCRC high), and signature 3 (normal/pCRC low, mCRC high) (Figure 3A; Table S1). Pathway enrichment analysis by Metascape29 revealed various dysregulated pathways in CRC, including cell adhesion, epithelial-to-mesenchymal transition (EMT), mesenchymal stem cell differentiation, adaptive immune system, and drug metabolism pathways. Signature 1 of the transcriptome and the proteome revealed consistent downregulation of cell adhesion, extracellular matrix (ECM) organization, and increased EMT in pCRC and mCRC. Signature 2 of the transcriptome and the proteome showed upregulated oncogenic pathways. Signature 3 of the transcriptome and the proteome revealed concurrent upregulation of drug metabolism in mCRC.
Figure 3. Transcriptome and proteome analyses.

(A) Heatmap of the transcriptome and proteome using significant DEGs and DEPs with 4-fold change (pCRC vs. normal colon or mCRC vs. normal colon). Row order was clustered by the k-NN method with k = 3. Significant pathways are shown on the right side of the heatmap, as obtained from the Metascape web tool. Signature 1 is upregulated only in normal colon and downregulated in both pCRC and mCRC. Signature 2 is upregulated in both pCRC and mCRC and downregulated in normal colon. Signature 3 is specifically upregulated in mCRC. Signatures 1 and 2 are CRC common signatures regardless of tumor site. Signature 3 may have an important role in metastatic progression of CRC.
(B) A balloon plot of canonical pathway enrichment (IPA software). The Z score is the predicted pathway activation status. Gray color means no directional information in the IPA knowledge database. Even though the individual input gene/protein list for IPA has significant enrichment of pathway genes/proteins, if a calculated confidence level of the entire pathway directional value remains under statistical significance, then IPA assigns a gray color. At the RNA level, many oncogenic pathways, such as the ERK/MAPK pathway and AKT pathway, are enriched in mCRC. In addition, hypoxic pathways are also enriched in mCRC. At the protein level, the LXR/RXR pathway is strongly enriched in mCRC.
(C) Boxplots of immunohistochemistry scores for cohort 1 (Wilcoxon test). IHC assessment of tumor cells confirms enrichment of the hypoxia signature, stemness signature, common oncogenic signature, and RXR signature in mCRC. Normal colon, n = 16; pCRC, n = 16; mCRC, n = 16. Note: only CD44 and L1CAM were quantified in proteomics data. We did not perform direct correlation analyses between IHC score and proteome quantification because the IHC score is derived from tumor cell staining only, whereas proteomics data are derived from bulk tissue analysis (including cancer cells, stroma, inflammatory cells, etc.).
(D) A heatmap of upstream regulators that govern RNA signature 2. Boxplots on the right show gene perturbation effect scores for 53 colon cancer cell lines when genes are edited by CRISPR-Cas9. Inhibition of these genes may have antitumor effects. * denotes 10 cell cycle-related genes that are annotated as GOBP_CELL_CYCLE (GO:0007049).
Signaling pathway enrichment analysis by Ingenuity Pathway Analysis (IPA)30 using 2-fold or greater abundance changes between pCRC and mCRC identified several oncogenic pathways enriched in mCRC; e.g., ERK/mitogen-activated protein kinase (MAPK) and PTEN signaling (Figure 3B). We validated these pathways by IHC scoring of key regulators, such as AKT, mTOR, and ERK (Figure 3C). At the proteome level, the liver X receptor (LXR)/RXR signaling pathway was the most significant (Figure 3B), and IHC confirmed RXR upregulation and LXR downregulation in CRC (Figure 3C). LXR is a nuclear receptor that induces apoptosis and AKT repression when activated31 and is downregulated in cancer. Phosphorylated RXRα, a nuclear receptor and heterodimer partner of PPARγ, has been reported to accumulate in colon cancer tissue and colon cancer cell lines. Accumulation of RXRα impairs PPARγ signaling and results in tumor growth and apoptosis inhibition.32 IHC scoring of cohorts 1 and 2 showed RXR upregulation and LXR downregulation (Figures 3C and S3B, respectively). These results support the pathway findings above and suggest that RXR and LXR may be new therapeutic targets in mCRC as well as pCRC. HIF1α signaling was the second most significant pathway at the transcriptome level, and we confirmed protein overexpression in CRC by IHC (Figure 3C). IHC assessment of our large validation cohort (Figure S1) confirmed enrichment of these key signatures in mCRC (Figure S3B).
Upstream regulator analysis using IPA identified 18 transcriptional regulators that were upregulated in mCRC, including TCF7L2 and SOX2 (Figure S3C). TCF7L2 participates in angiogenesis and metastasis (Figure S3D). Its gene inactivation in mouse models and intestinal organoids results in depletion of stem and progenitor cells.33–36 SOX2 is a cancer stem cell-related gene in colon cancer, and its network promotes stem cell development and viability (Figure S3D). These findings support stemness enrichment in mCRC.
To elucidate how transcriptome-based consensus molecular subtypes (CMSs)37 are distributed in our cohort 1, we assigned a CMS to each sample using the published R CMS classifier37 and our transcriptome data of cohort 1 (Figure S3E). As expected, there was no CMS1 sample because this subtype is strongly associated with MSI cancer, and our cohort 1 comprises only non-MSI cancers. In addition, subtype distribution between primary and metastatic sites was similar to the previous report. Although statistical testing is difficult due to the small sample number of cohort 1, metastatic samples showed a higher proportion of CMS4. This is concordant with the fact that CMS4 has significantly shorter overall and relapse-free survival compared with CMS1–CMS3.37
To find shared upstream regulators in pCRC and mCRC that are potential inhibitory targets, we performed upstream regulator analyses in IPA software using genes in RNA signature 2 (significantly upregulated in both pCRC and mCRC compared with normal colon tissue) that possess average CRISPR-Cas9 gene effect scores in 53 colon cancer cell lines below −0.5 (a lower value means stronger dependency on a gene) as derived from the DepMap database.38,39 This analysis identified 17 targetable genes, with 10 being cell cycle related (Figure 3D). Since these genes were upregulated in both pCRC and mCRC and showed strong dependency (low gene effect scores), inhibitory molecules against these targets may be promising, especially for late-stage CRC treatment.
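The dependency filter described above amounts to averaging each gene's effect score across cell lines and keeping genes whose mean falls below −0.5. A minimal sketch follows, assuming the scores are already loaded into a dictionary keyed by gene; the function name `dependent_genes`, the gene labels, and the toy scores are hypothetical and are not DepMap values.

```python
from statistics import mean

def dependent_genes(gene_effects, threshold=-0.5):
    """Return genes whose mean gene effect score across cell lines is below threshold."""
    return sorted(gene for gene, scores in gene_effects.items()
                  if mean(scores) < threshold)

# Hypothetical gene effect scores across three cell lines (lower = stronger dependency).
toy_effects = {
    "GENE_A": [-1.2, -0.9, -1.1],
    "GENE_B": [-0.1, 0.05, -0.2],
    "GENE_C": [-0.6, -0.4, -0.7],
}

candidates = dependent_genes(toy_effects)
print(candidates)
```

In the study, the same kind of cutoff is applied to 53 colon cancer cell lines and restricted to genes in RNA signature 2 before the IPA upstream regulator analysis.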
Proteome-based clustering separates pCRC and mCRC into unique subtypes
Since the discovery cohort showed the promise of proteogenomics for profiling pCRC and mCRC and delineating metastatic progression, we sequenced a large cohort of 340 tissue samples (126 mCRCs, 138 pCRCs, 62 normal colonic mucosae, and 14 normal liver tissues) and quantified 5,377 proteins across the entire cohort (Figure S1). Unsupervised UMAP analysis highlights robust separation of pCRC from mCRC (Figure S4A). Benign colonic mucosa is distinct but more closely related to pCRC than mCRC, and benign liver parenchyma forms its own very distinct cluster. After removing 6 outliers, we performed joint consensus clustering of all tumor samples and found 6 distinct subtypes as an ideal separation (Figures S4B–S4E). Although all 258 pCRC and mCRC samples were analyzed together in an unbiased manner, unsupervised clustering separated 135 pCRC profiles into 3 clusters (P1–P3) and 123 mCRC profiles into 3 clusters (M1–M3) (Figure 4A). These findings demonstrate that pCRC and mCRC can be differentially subtyped based on their proteome profiles.
Figure 4. Unsupervised clustering of cohort 2 reveals 6 subtypes.

(A) A summary heatmap of clinical attributes and proteome data (top 250 proteins with largest mean absolute deviation within the cohort). IHC scores, stemness score, hypoxia score, and immune score are also shown. Gray color means a missing value.
(B) A bar plot of the mutation frequency for each proteome subtype (Fisher’s exact test). TP53 mutations are significantly enriched in M2/3 compared with M1. KRAS mutations are significantly enriched in M1 compared with M2/3. PIK3CA mutations are significantly enriched in P2 compared with P1/3.
(C) A bar plot of right-sided CRC frequency in each proteome subtype. Right-sided CRC has a significantly lower frequency in the P1 subtype than in the P2/3 subtypes.
(D) A Sankey plot of 68 matched pCRC-mCRC samples from cohort 2. This plot shows that proteomes of CRC have high plasticity during metastatic progression.
Next, we asked whether these proteome-based subtypes correlate with specific cancer gene mutations. We obtained clinical MSK-IMPACT gene panel sequencing data on 198 samples in this proteome cohort (Table S2). Mutation frequencies of APC, TP53, KRAS, PIK3CA, NRAS, and BRAF in each proteome subtype are shown in Figure 4B. TP53 mutations are significantly more frequent in M2 and M3 than in M1. Conversely, KRAS mutations are significantly more frequent in M1 than M2 or M3. PIK3CA mutations are significantly more frequent in P2 than other pCRCs. In addition, we observed that right-sided pCRCs, which tend to have poorer clinical outcomes,40 appear to be enriched in P2/P3 primary tumors and M1/M2 metastases (Figure 4C). To elucidate the possible influence of treatment effects, we performed differential expression (DE) analysis between untreated pCRC (n = 68) and treated pCRC (n = 67) and untreated mCRC (n = 31) and treated mCRC (n = 92). DE analysis of pCRC found only 33 protein groups with a false discovery rate (FDR) of less than 0.05 and log2-fold change (log2FC) absolute value over 1 (Figure S4F). DE analysis of mCRC found no statistical difference between untreated and treated samples (Figure S4G). We also checked sample distribution of the proteomic subtype based on prior treatment status and found no statistical difference (chi-square test, p = 0.24) in sample distribution between untreated and treated samples (Figure S4H). This suggests that prior treatment status does not have a strong effect on tumor proteome and proteomic subtypes. We then asked whether there are survival time differences between subtypes, but proteome subtypes did not show a statistical difference in survival time, possibly due to the fact that our cohort is, by design, enriched in stage IV patients.
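The DE selection criteria used here (FDR below 0.05 and absolute log2FC above 1) can be sketched as follows. This is a minimal illustration, assuming Benjamini-Hochberg adjustment of per-protein p values; the text does not state which FDR procedure was used, and the function names, protein labels, and toy values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_de(names, log2_fc, p_values, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep entries with adjusted p < fdr_cutoff and |log2FC| > lfc_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return [name for name, lfc, q in zip(names, log2_fc, fdr)
            if q < fdr_cutoff and abs(lfc) > lfc_cutoff]

# Hypothetical protein groups (not data from this study).
names = ["prot_a", "prot_b", "prot_c", "prot_d", "prot_e"]
log2_fc = [1.5, -2.0, 0.3, 2.5, 1.1]
p_values = [0.001, 0.01, 0.02, 0.5, 0.03]

fdr = benjamini_hochberg(p_values)
selected = select_de(names, log2_fc, p_values)
print(selected)
```

In this toy example, `prot_c` is excluded for its small fold change and `prot_d` for its high adjusted p value, which mirrors how both criteria must hold for a protein group to count as differentially expressed.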
To understand how CRC subtypes progress from primary to metastasis, we created a Sankey plot using 68 matched pCRC-mCRC pairs (Figure 4D). Surprisingly, there was no simple connection between pCRC and mCRC subtypes. Primary tumors from each of the pCRC P1–P3 subtypes can progress to metastases of any of the mCRC M1–M3 subtypes. This finding indicates that CRC displays proteomic subtype plasticity during progression, whereas underlying mutations in major oncogenic genes change little between pairs of pCRC and mCRC. This also suggests that proteomic profiling of both primary and metastatic cancers may be necessary for better disease stratification and therapy selection.
Since the discovery cohort identified hypoxia, stemness, and immune signatures as potential proteomic signatures (Figures 3 and S2), we evaluated their relevance in the proteomic subtypes (Figures 4A and S4I–S4K). For pCRC, subtype P1 is low in stemness, P2 is low in hypoxia, whereas P3 is not easily defined. For mCRC, subtype M1 is high in hypoxia, M2 is high in stemness but low in immune score, whereas M3 is moderate in these features. To characterize our proteomic subtypes of cohort 2 in detail, we then conducted differential analyses between one subtype vs. the other subtypes at the pathway level based on ssGSEA scores of the MSigDB 50 Hallmark gene sets, which cover well-known cancer-related pathways (Figure S4L). Cell cycle-related pathways (E2F, DNA repair, MYC targets, etc.) and stemness-related features (MYC targets) were significantly enriched in P2 and M2, whereas these pathways were downregulated in P1 and M1. The PI3K and Wnt/β-catenin pathways were highly enriched in P2. EMT, which is an important feature of clinically aggressive cancer, was significantly enriched in P1 and M1. Inflammatory signatures, such as the interferon γ (IFNγ) response, were enriched in M1. Even with systematic pathway analyses, P3 is difficult to define. M3 showed significant enrichment of hypoxia, glycolysis, and reactive oxygen species pathways. Thus, independent systematic Hallmark gene set analyses of cohort 2 suggested that hypoxia/EMT, stemness, oncogenic, and inflammatory signatures are of interest for further investigation, consistent with what was found in cohort 1.
The hypoxia signature is associated with EMT and metabolic reprogramming
To characterize CRC with a hypoxia signature, we focused on EMT and metabolic adaptation since these are key features observed in cancer under hypoxic conditions, facilitating metastasis.41,42 First, we investigated the correlation between the proteomic hypoxia signature and EMT and metabolic adaptation pathways (Figure 5A). As expected, the HIF1α IHC score, EMT, and metabolic pathways significantly correlated with the Hallmark hypoxia score. The protein level of P4HA1, a stabilizer of HIF1α,43 also positively correlated with the hypoxia score and the HIF1α IHC score (Figure S5A). In addition, P4HA1 protein levels were significantly higher in mCRC compared with pCRC (Figure S5B). High P4HA1 expression correlates with poor prognosis and increased metastasis in murine CRC PDX models.44,45 Taken together, these results suggest that P4HA1 is an important stabilizer of HIF1α, especially for mCRC. Other important features of hypoxia in cancer, such as vascularization scores (HALLMARK_ANGIOGENESIS and VEGF_A_UP.V1_UP), were also significantly correlated with the proteomic hypoxia signature, concordant with previous reports.46
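The correlations in Figure 5A are Spearman coefficients between the hypoxia score and other per-sample scores. A minimal sketch of the coefficient follows, computed as the Pearson correlation of average ranks (ties share the mean of their positions); the function names and the toy score vectors are hypothetical and are not values from this cohort.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample scores.
hypoxia_score = [0.1, 0.4, 0.35, 0.8, 0.6]
emt_score = [1.0, 2.1, 1.9, 9.0, 3.0]

rho = spearman(hypoxia_score, emt_score)
print(f"Spearman rho = {rho:.2f}")
```

Because the coefficient depends only on ranks, it is insensitive to the scale of the scores being compared, which is convenient when relating ssGSEA scores to IHC scores or protein abundances.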
Figure 5. Hypoxia signature of CRC.

(A) A heatmap showing hypoxia signature correlation with EMT signature and metabolic reprogramming. The bar chart on the right side of the heatmap shows correlation values between the hypoxia score and the corresponding score. * denotes statistical significance (Spearman correlation coefficient). mCRC generally has a high hypoxia score (shown as “Tissue site” annotation bar). The Hallmark hypoxia score significantly correlates with known hypoxia-related features, such as angiogenesis or the EMT signature.
(B) A volcano plot of DE analysis between hypoxia-high CRC (n = 129, right) and hypoxia-low CRC (n = 129, left). DE analysis identifies 805 significantly dysregulated proteins with an over 2-fold abundance change.
(C) Boxplots of EMT markers. Vimentin and E-cadherin were significantly dysregulated in the hypoxia-high group.
(D) Boxplots of IHC scores of EMT-inducing transcription factors. Only ZEB1 is significantly upregulated in the hypoxia-high group. 104 hypoxia-high group samples and 110 hypoxia-low group samples were assessed.
(E) A correlation plot of TGF-β signaling and the EMT signature, showing significant positive correlation.
(F) Boxplots of non-Smad TGF-β signaling pathway member expression. The PTEN/AKT/mTOR and MAPK pathways are significantly enriched in the hypoxia-high group.
(G) A correlation plot of non-Smad TGF-β signaling and the EMT signature. pmTOR and pERK have significant positive correlations with the EMT signature. The direction of PTEN/Akt pathway activation also matches the EMT signature but is not statistically significant.
(H) A GSE plot of metabolic reprogramming-related pathways in cohort 2 (hypoxia-high/low). Aerobic glycolysis and fatty acid metabolic processes are positively enriched in the hypoxia-high group. Conversely, oxidative phosphorylation and the TCA cycle are negatively enriched in the hypoxia-high group.
(I) A bar plot of glucose transporter expression ratios (hypoxia-high/low). All glucose transporters are upregulated in the hypoxia-high group. * denotes statistical significance in the DE analysis shown in (B).
(J) A bar plot of glycolysis-related enzyme expression ratios (hypoxia-high/low). Glycolysis-related enzymes show various expressional trends, but as a pathway, glycolysis is upregulated in the hypoxia-high group (H). * denotes statistical significance in the DE analysis shown in (B).
(K) A bar plot of TCA cycle-related enzyme expression ratios (hypoxia-high/low). TCA cycle enzymes are downregulated in the hypoxia-high group. * denotes statistical significance in the DE analysis in (B).
(L) A chart of metabolic reprogramming in CRC under hypoxia. Cancer cells utilize the aerobic glycolysis process to produce biologic energy.
DE analysis between CRCs with a low hypoxia score and CRCs with a high hypoxia score (median hypoxia score as threshold) revealed 805 significantly dysregulated proteins with 2-fold or greater abundance change (304 upregulated, 501 downregulated; Figure 5B; Table S3). Gene set enrichment analysis (GSEA) of Hallmark genes confirmed EMT enrichment in the hypoxia-high group and found DNA repair and cell cycle pathways to be downregulated in the hypoxia-high group (Figure S5C). Independent CPTAC proteome data confirm this result (Figure S5D; see the GSEA results for c2 and Hallmark gene sets in Table S3). Downregulation of cell cycle activity under hypoxic conditions is plausible due to low energy and nutrient supply and concordant with a previous report.47 A hypoxic environment induces downregulation of DNA repair processes, resulting in increased mutagenesis.48,49 Although specific gene sets of single- or double-strand repair mechanisms were not significant in our cohort 2, in the CPTAC CRC dataset, both single-strand and double-strand break repair (including homologous recombination and non-homologous end joining) mechanisms were significantly downregulated in the hypoxia-high group (Figure S5E). Consistent with this, TMB was higher in the hypoxia-high CPTAC group (Figure S5F).
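The grouping and fold-change step of such a DE analysis can be sketched as below: samples are split at the median score, and proteins with at least a 2-fold mean abundance difference (|log2 fold change| ≥ 1) are flagged. This is only a sketch under assumptions: the data layout, function names, protein labels, and values are hypothetical, and the statistical test used to call significance is deliberately left out because it is not specified in this passage.

```python
from statistics import mean, median

def median_split(scores):
    """Label each sample 'high' or 'low' using the cohort median as threshold.

    With an even number of samples and distinct scores this yields two
    equally sized groups (e.g., 129 vs. 129 for 258 samples).
    """
    cut = median(scores.values())
    return {s: ("high" if v > cut else "low") for s, v in scores.items()}

def log2_fold_changes(log2_abundance, groups):
    """Return protein -> mean(log2 high) - mean(log2 low)."""
    result = {}
    for protein, by_sample in log2_abundance.items():
        high = [v for s, v in by_sample.items() if groups[s] == "high"]
        low = [v for s, v in by_sample.items() if groups[s] == "low"]
        result[protein] = mean(high) - mean(low)
    return result

# Toy data (made up): four samples, three placeholder proteins.
hypoxia = {"s1": 0.1, "s2": 0.2, "s3": 0.8, "s4": 0.9}
log2_abundance = {
    "PROT_A": {"s1": 10.0, "s2": 10.5, "s3": 12.0, "s4": 12.5},
    "PROT_B": {"s1": 14.0, "s2": 13.5, "s3": 12.0, "s4": 12.5},
    "PROT_C": {"s1": 20.0, "s2": 20.2, "s3": 20.1, "s4": 20.3},
}

groups = median_split(hypoxia)
fold = log2_fold_changes(log2_abundance, groups)
# |log2FC| >= 1 corresponds to a 2-fold or greater abundance change.
two_fold = {p for p, fc in fold.items() if abs(fc) >= 1.0}
print(sorted(two_fold))
```

In a full analysis, the fold-change filter would be combined with a per-protein test and multiple-testing correction before a protein is reported as significantly dysregulated.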
Since EMT appeared to be enriched in hypoxia-high CRC (Figures 5A and 5C), we performed IHC assessment of EMT-inducing transcription factors (ZEB1, TWIST, and SNAI). This analysis showed that ZEB1 was significantly upregulated in hypoxia-high tumors, but SNAI1/2 and TWIST1 showed no difference (Figure 5D). Discovery cohort 1 transcriptome data and Pan-Cancer Atlas CRC transcriptome datasets support our finding (Figures S5G and S5H).
Next, we investigated a possible crosstalk between the EMT signature and transforming growth factor β (TGF-β) signaling.50 The EMT signature score showed significant positive correlation with TGF-β signaling (Figures 5A and 5E). However, SMAD2 and SMAD4 expression showed negative correlation with the EMT signature score. We therefore focused on non-SMAD-dependent TGF-β signaling; i.e., the MAPK and PI3K/AKT pathways.51 We found the PTEN/AKT/mTOR and MAPK pathways to be activated in hypoxia-high proteomic subgroup CRCs (Figure 5F). Comparing expression of these proteins against the Hallmark EMT signature score shows the same trend (Figure 5G). These findings were confirmed by CPTAC data, except for MAPK signaling (Figures S5I–S5K). These observations indicate that the hypoxia-induced EMT signature has crosstalk with TGF-β signaling via a non-SMAD-dependent axis. These findings raise the possibility of inhibitory therapy against TGF-β signaling, especially for hypoxia-high mCRC with EMT features (Figure 5A).41,52
We then examined how CRC may adapt metabolically to hypoxic conditions; i.e., low energy supply. We focused on ATP-producing pathways; i.e., glycolysis, the TCA cycle, and fatty acid oxidation. We performed a GSEA between hypoxia-high and hypoxia-low groups and found that glycolysis and fatty acid metabolic pathways were enriched in hypoxia-high tumors, while oxidative phosphorylation and TCA cycle pathways were downregulated (Figure 5H). Consistent with this gene set enrichment result, protein expression of glucose transporters and glycolysis-related enzymes was increased in the hypoxia-high vs. the hypoxia-low group (Figures 5I and 5J). TCA cycle enzyme expression levels were downregulated in hypoxia-high CRCs (Figure 5K). The CPTAC CRC dataset showed similar pathway enrichment results (Figure S5L). In addition, mCRC showed aerobic glycolysis enrichment compared with pCRC (Figure S5M). Cancers can adapt to preferentially use aerobic glycolysis over the TCA cycle to produce ATP even though the TCA cycle produces ATP more efficiently (the Warburg effect).53 Our proteomics findings confirm that hypoxia-subtype CRC cells shift to ATP production via aerobic glycolysis, especially in hypoxia-prone metastatic lesions, resulting in adaptation to the hypoxic cancer microenvironment during metastatic progression (Figure 5L).
Cancer stemness correlates with oncogenic pathway activation, telomere maintenance, and the drug resistance phenotype
To characterize proteome-based cancer stemness of pCRC and mCRC in a systematic manner, rather than relying on single protein markers such as CD44 or SOX2, we calculated a comprehensive proteomic stemness score and performed DE analyses between stemness score high and low groups (median stemness score as threshold). This analysis identified 1,089 proteins that were significantly dysregulated with 2-fold or greater abundance changes, comprising 860 up- and 229 downregulated proteins (Figure 6A; Table S4). GSEA with Hallmark pathways showed that oncogenic pathways (e.g., MYC targets) were upregulated in stemness-high tumors (Figure 6B). Upstream regulator analysis of DEGs between stemness-low and -high groups also ranked MYC as the most enriched transcription factor in stemness-high tumors (Figure S6A), which is validated by the CPTAC dataset (Figure S6B) and cohort 2 dataset with MSigDB c6 gene set analysis (oncogenic pathway gene sets) (Figure S6C). IHC validated MYC overexpression in stemness-high tumors (Figure 6C). In addition, G2M checkpoint and cell cycle pathways were activated in stemness-high tumors (Figures 6B and 6D; Table S4), which is validated by the increased expression of the cell proliferation marker Ki67 by IHC (Figure 6E). MYC upregulation in CRC with a high stemness score is consistent with the observation that MYC is one of the transcription factors inducing pluripotency in embryonic stem cells.54 Since cancers with high stemness properties often display therapeutic drug resistance, we investigated the expression of drug transporters. Most of the 15 drug transporter proteins quantified in our proteome dataset were highly upregulated in stemness-high CRCs; e.g., ABCB7, ABCC1, ABCD3, ABCE1, and ABCF3 (Figure 6F).
Figure 6. Characteristics of cancer stemness.

(A) DE analysis between stemness-low (n = 129) and stemness-high groups (n = 129) in cohort 2, identifying 1,089 significantly dysregulated proteins with over 2-fold abundance changes.
(B) A bar plot of GSEA with Hallmark gene sets comparing stemness-low and stemness-high groups in cohort 2. All significant results (q < 0.05) are shown. The MYC target set is the top enriched term in the stemness-high group. In addition, cell cycle and DNA repair processes are significantly enriched in the stemness-high group.
(C) A boxplot of the MYC IHC score. IHC scores were available for 111 stemness-high samples and 103 stemness-low samples. MYC expression is significantly higher in the stemness-high group than in the stemness-low group (Wilcoxon test).
(D) A GSE plot of cell cycle-related Kyoto Encyclopedia of Genes and Genomes (KEGG) and REACTOME terms, showing cell cycle up-regulation in the stemness-high group with statistical significance.
(E) A boxplot of the Ki67 index, showing significantly greater tumor cell proliferation in the stemness-high group than in the stemness-low group (Wilcoxon test).
(F) A bar plot of ABC transporter protein expression ratios between stemness-low and stemness-high groups in cohort 2. All quantified ABC transporters except ABCA6 are upregulated in the stemness-high group, suggesting drug resistance. * denotes statistical significance in the DE analysis shown in (A).
(G) A bar plot of GSEA results with the MSigDB c2 and c5 gene sets. Telomere-related terms with statistical significance are shown.
(H) Bar plot of GSEA results with MSigDB REACTOME gene sets. DNA damage response-related terms with statistical significance are shown.
(I) A correlation plot of the TEL score and stemness score shows significant positive correlation (Spearman correlation coefficient).
(J) A correlation plot of the ALT score and stemness score shows that the ALT score has stronger correlation with the stemness score compared with the correlation between the TEL score and the stemness score (Spearman correlation coefficient).
(K) Boxplot of the TEL score by tissue type, showing that only pCRC has high TEL activity (Wilcoxon test).
(L) Boxplot of the ALT score by tissue type, showing that both pCRC and mCRC have higher ALT activity than normal tissue (Wilcoxon test).
(M) Boxplot of the ALT/TEL score ratio by tissue type and proteome subtype, showing that M2 has a significantly higher ratio than pCRC subtypes (Wilcoxon test).
(N) Boxplot of the stemness score by tissue type and proteome subtype. M2 has a significantly higher stemness score than pCRC, consistent with (M) and (J) (Wilcoxon test).
(O) Boxplot of ATRX and DAXX IHC scores by tissue type, showing that both molecules are significantly downregulated in mCRC (Wilcoxon test).
Since stemness-high CRC showed upregulation of DNA repair pathways and telomere maintenance pathways (Figures 6B, 6G, and 6H), we examined telomere maintenance mechanisms using the TelNet database55 and the ssGSEA algorithm. Both the TEL (telomerase-dependent lengthening) and ALT (alternative telomere lengthening) pathways correlated positively with stemness scores, with ALT showing stronger correlation with stemness (Figures 6I and 6J). The TEL pathway appeared to be upregulated only in pCRC, but the ALT pathway was enriched in both pCRC and mCRC (Figures 6K and 6L). CPTAC proteome and GSE50760 transcriptome datasets confirmed these findings (Figures S6D and S6E). ALT/TEL score ratios showed that the M2 subtype has relatively higher ALT pathway scores (Figure 6M), consistent with M2 having the highest stemness score among mCRCs (Figure 6N). ALT-high tumors, such as sarcomas, have been associated with DAXX/ATRX downregulation,56 and, concordantly, we found loss of these proteins in mCRC, especially in subtype M2 (Figures 6O, S6F, and S6G).
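How a single-sample, rank-based gene set score of the kind produced by ssGSEA behaves can be illustrated with the simplified sketch below. It follows the general idea of a rank-weighted running sum (hits weighted by rank to the power alpha, misses weighted uniformly) but is not the reference implementation and omits normalization; the gene names, gene sets, and expression values are placeholders rather than TelNet content.

```python
def ssgsea_like_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict gene -> abundance in this sample.
    gene_set:   set of gene names (e.g., a telomere maintenance pathway).
    Returns the summed difference between the rank-weighted cumulative
    distribution of set members and the cumulative distribution of
    non-members, walking from the highest to the lowest abundance.
    """
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    in_set = [g in gene_set for g in genes]
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, genes")
    # The top-ranked gene receives rank n, the lowest-ranked gene rank 1.
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(in_set)]
    total_weight = sum(weights)
    miss_step = 1.0 / (n - n_hits)
    p_hit = p_miss = score = 0.0
    for weight, hit in zip(weights, in_set):
        if hit:
            p_hit += weight / total_weight
        else:
            p_miss += miss_step
        score += p_hit - p_miss
    return score

# Toy sample with placeholder genes g1 (highest) to g6 (lowest).
sample = {"g1": 6.0, "g2": 5.0, "g3": 4.0, "g4": 3.0, "g5": 2.0, "g6": 1.0}
pathway_up = {"g1", "g2"}    # hypothetical set ranked at the top
pathway_down = {"g5", "g6"}  # hypothetical set ranked at the bottom

score_up = ssgsea_like_score(sample, pathway_up)
score_down = ssgsea_like_score(sample, pathway_down)
print(f"top-ranked set: {score_up:.2f}, bottom-ranked set: {score_down:.2f}")
```

Scores of this kind can be negative, so any ratio between two pathway scores (such as ALT/TEL) requires scores on a positive scale; how the scores were scaled before taking the ratio is not stated in this passage.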
Immune-cold tumors are characterized by suppression of antigen processing pathways and poor survival, especially in the metastasis setting
To elucidate cancer immune evasion mechanisms of immune-cold tumors, we performed integrated proteomics analyses assisted by IHC assessment of immune cell infiltration of tumor tissue. The ESTIMATE immune score57 correlated well with CD3 (a general lymphocyte marker) IHC assessment. MSI CRC samples were significantly enriched in the immune-hot group (median immune score as threshold, chi-square test, p = 0.0199). In addition, the immune score positively correlated with antigen-presenting machinery (APM), including MHC class I and II pathways (Figure 7A). DE analysis between immune-cold and immune-hot tumors (median immune score as threshold) identified 790 DEPs with 2-fold or greater abundance changes, including 473 up- and 317 downregulated proteins in immune-cold tumors (Figure S7A; Table S5). GSEA confirmed significant downregulation of antigen-presenting pathways (Figure 7B). MHC class I/II protein expression by tumor cells correlated with CD4+ and CD8+ T cell counts, CIITA (a transcription regulator of MHC class II), and IRF1 (a transcription regulator of MHC class I molecules and CIITA) (Figure 7C). Analysis of the CPTAC CRC cohort supports these findings (Figures S7B and S7C). To investigate a possible regulatory effect of regulatory T cells or infiltrating macrophages on the immune microenvironment, we analyzed FOXP3+ (a regulatory T cell marker) and CD68+ (a pan-macrophage marker) cell counts relative to the immune score, but no significant correlation was observed. IRF1 protein expression correlated positively with a number of proteins involved in antigen processing (including the immunoproteasome), such as TAP1, TAP2, TAPBP, ERAP1, PSMB8, and PSMB9, while catalytic subunits of the conventional proteasome (PSMB5, PSMB6, and PSMB7) were negatively correlated (Figure 7D).
Concordant with a report that the IRF1 and CIITA transcription factors are regulated by the IFNγ-JAK-STAT pathway,58 IRF1 and CIITA showed positive correlations with these proteins and pathways (Figure 7C), suggesting that stimulation of IFNγ signaling may have therapeutic value against immune-cold CRC.59–61 Among the 6 proteome-based subtypes, M1 mCRC had the lowest protein expression of HLA-ABC, HLA-DP/DQ/DR, CD3, and IRF1 (Figure 7E). Based on IHC results, the M1 subtype is immune cold, but, interestingly, M1 also has the highest hypoxia score, as described earlier (Figure S4I). This phenotype suggests that hypoxic conditions may actively suppress immune defenses in the tumor microenvironment. In contrast, the P3 subtype (pCRC) appeared to be immune hot with significantly higher protein expression of IRF1, CIITA, and HLA-DP/DQ/DR (Figure 7E).
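The chi-square test used above for MSI enrichment in the immune-hot group operates on a 2 × 2 contingency table (MSI vs. non-MSI by immune-hot vs. immune-cold). A minimal sketch is shown here; the counts are invented for illustration and do not reproduce the reported p value, and no continuity correction is applied because the text does not state whether one was used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction. For 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = immune-hot / immune-cold, cols = MSI / non-MSI.
stat, p = chi_square_2x2(20, 80, 10, 90)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

For tables with small expected counts, an exact test would usually be preferred over the chi-square approximation.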
Figure 7. Characterization of immune-cold tumors.

(A) A heatmap with immune signature and clinical attributes. The immune score significantly correlates with IHC immune infiltrate scores and immune-related pathways, except for CD4 and FOXP3 IHC scores. * denotes statistical significance.
(B) A GSE plot with antigen processing and presenting pathways from KEGG and REACTOME terms. Both antigen processing pathways are strongly downregulated in the immune-cold group.
(C) A correlation plot (Spearman correlation coefficient) of IHC scores that involve the antigen processing machinery. Protein expression of STAT is derived from mass spectrometry data. The IFNγ response score is derived from ssGSEA analysis with HALLMARK_INTERFERON_GAMMA_RESPONSE. *p < 0.05, **p < 0.01, ***p < 0.001. Clear correlations of CD4-MHC class II and CD8-MHC class II are shown with statistical significance. In addition, the IFNγ pathway and STAT1 positively correlate with their direct targets (IRF1 and CIITA) and with downstream targets (MHC class I and II).
(D) A correlation plot (Spearman correlation coefficient) with IRF1 IHC score and antigen processing machinery protein expression. *p < 0.05, **p < 0.01, ***p < 0.001. IRF1 expression positively correlates with immunoproteasome subunits but negatively correlates with catalytic conventional proteasome subunits.
(E) Boxplots of IHC analysis results for HLA-ABC, HLA-DP/DQ/DR, CD3, CD4, CD8, IRF1, and CIITA. 22 P1, 42 P2, 51 P3, 16 M1, 53 M2, and 30 M3 samples were assessed. M1 shows the lowest expression of HLA-ABC (MHC class I) among all proteome subtypes. Although the infiltrating lymphocyte profile (CD3, CD4, and CD8) shows no significant differences, M1 has the lowest CD3+ lymphocyte count and lower median counts of CD8+ lymphocytes than other subtypes. This is concordant with the lowest expression of HLA-ABC (MHC class I) in the M1 subtype. In addition, IRF1, a key transcription initiator of MHC class I, shows the lowest expression in M1.
(F) Kaplan-Meier survival curve analysis stratified by CD4 expression level of pCRC.
(G) Kaplan-Meier survival curve analysis stratified by CD8 expression level of pCRC.
(H) Kaplan-Meier survival curve analysis stratified by CD4 expression level of mCRC.
(I) Kaplan-Meier survival curve analysis stratified by CD8 expression level of mCRC.
We assessed patient outcome stratification based on immune profiles. pCRC with a high density of CD4+ or CD8+ T cells had slightly better clinical outcomes in overall survival (Figures 7F and 7G), whereas mCRC with a high density of T cells had significantly better outcomes (Figures 7H and 7I). This observation underlines the need for assessment of metastatic tissue biopsies for better patient stratification.
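The survival curves compared here are Kaplan-Meier estimates computed separately for each T cell density group. The sketch below shows the estimator itself on made-up follow-up data; the function name and the toy values are assumptions, and the cutoff used to define high and low T cell density is not specified in this passage.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve

# Toy follow-up data in months (made up): 1 = death, 0 = censored.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Running the estimator once per group (for example, high vs. low CD8+ density) gives the curves that are then compared, typically with a log-rank test.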
DISCUSSION
This study provides a large and comprehensive proteomics and proteogenomics resource for studying both pCRC and mCRC. We examined two cohorts of patients with mCRC: a multiomics discovery cohort (cohort 1) and a large proteogenomics cohort with an independent TMA validation cohort (cohort 2). Our findings in this study are summarized as follows. (1) SV events are significantly enriched in CAGs compared with non-CAGs and show higher frequency in mCRC. (2) Recurrent SCNA of mCRC involves many known oncogenes and tumor suppressors and characterizes immune signature enrichment in mCRC. (3) Integrated SNV, SV, and SCNA calls identify recurrently affected tumor suppressors and do not show significant differences in frequency between pCRC and mCRC. (4) Integrated transcriptomics and proteomics uncover distinct signatures of mCRC compared with pCRC, including hypoxia, cancer stemness, and immune signatures. (5) Transcriptomics highlight 17 druggable candidate targets that are concomitantly upregulated in both pCRC and mCRC. (6) Unsupervised clustering of CRC proteomes reveals 3 subtypes of pCRC and 3 subtypes of mCRC that are characterized by hypoxia, stemness, and immune signatures. (7) Hypoxia-high CRC, such as mCRC, has an enriched EMT signature and dynamically shifted metabolic processes. (8) Stemness-high CRC displays higher oncogenic pathway activation and telomere lengthening via the ALT pathway. (9) Immune-cold CRC evades host immune surveillance via downregulating MHC class I and immunoproteasome machinery.
SV analysis of the ICGC/TCGA 2,658 WGS cohort across 38 tumor types focused mainly on primary tumor tissue.14 Detailed SV analysis with multiomics assessment of matched pCRC and mCRC is still understudied. In cohort 1, we discovered that SV events were enriched in CAGs (oncogenes and tumor suppressors) and that mCRC showed a slightly higher SV event frequency. MACROD2, which is the most common SV-affected gene in our cohort 1, is frequently lost in CRC in the TCGA cohort, and loss of MACROD2 impairs both DNA single-strand and double-strand damage responses via PARP1 dysregulation, resulting in chromosomal instability.20 TTC28 is also a recurrently SV-affected gene in colon cancer3,62 and esophageal squamous cell carcinoma.63 Although TTC28 function in cancer cells is not well studied, TTC28 deficiency suppresses microtubule dynamics and mid-zone microtubule assembly in zebrafish,64 and TTC28 knockdown in mammalian cells shows an abnormally doubled nuclear phenotype,65 suggesting possible effects on chromosomal instability. Loss of FHIT, MACROD2, and TTC28 occurred in a mutually exclusive manner, suggesting these genes have similar biological effects on carcinogenesis and progression. These findings suggest that SV events (more than SNVs or indels) occur in specific human genome loci and facilitate tumor progression in CRC.
Recurrent SCNA also found CNAs enriched in CAGs, and mCRC-specific events involved MYC, a well-established oncogene, and immune signatures (MHC class I, antigen-presenting pathway). Concordantly, pan-cancer analyses have shown decreased MHC class I and related gene expression changes in metastatic lesions.66,67
To understand multidimensional genomic changes, we created an Oncoprint of pCRC and mCRC that integrates SNVs, SVs, and SCNA calls for each gene in a single plot. Such an integrated analysis, missing in a previous TCGA CRC study,3 reveals frequent occurrence of SV events in certain genes, such as KRAS or TP53. However, the integrated analysis found no significant difference at the single-gene level between pCRC and mCRC. In contrast, transcriptome and proteome analyses showed a much better signature separation (Figure 3). Both transcriptomes and proteomes of CRCs revealed similar clusters of oncogenic, hypoxia, and stemness signatures.
Previous efforts to molecularly subtype CRC with clinical significance have been made by using transcriptome (not proteome) data of pCRC. One of these efforts is the Consensus Molecular Subtypes (CMS) approach.37 As shown in Figure S3E, our cohort 1 has similar CMS distribution, with the CMS4 subtype enriched in mCRC.37 However, these subtypes were derived from transcriptome data of pCRC only. No molecular subtyping of mCRC at transcriptome or proteome levels has been done previously. In addition, no study has classified primary and metastatic samples together based on proteome data. Unsupervised clustering of pCRC and mCRC together unambiguously separated pCRC into 3 subtypes (P1–P3) and mCRC into 3 subtypes (M1–M3). Based on pathway enrichment analyses and key signature analyses, P1 has hypoxia-medium/stemness-low features, P2 has hypoxia-low features, M1 has hypoxia-high and low antigen presentation features, M2 has stemness-high features, and P3 and M3 are of mixed type. A prior clinical proteomics study of pCRC only reported 4 proteomic subtypes (MSI, CIN, mesenchymal, and other).7 According to their characterization, our P1/M1 subtypes are close to the mesenchymal type, and a subset of our P3 subtype is close to MSI. A trajectory analysis of 68 matched pCRC and mCRC discovered no apparent connection between pCRC subtypes and mCRC subtypes, suggesting that CRC is capable of high proteomic plasticity during metastatic progression to adapt to a new tumor microenvironment. This may have significant clinical-therapeutic implications because it supports the need for assessing mCRC directly (rather than making predictions based on pCRC samples) to derive metastatic signature-specific therapeutic targets for patients. Such proteomic trajectory analyses with a high number of patients have not been done previously, but further confirmatory studies will be needed.
We demonstrate that hypoxia-high mCRC shows a high EMT signature, high vascularization, and an oncogenic activation signature, such as high PTEN-Akt-mTOR signaling, and greatly shifts to the anaerobic glycolysis process instead of the TCA cycle. These processes enable mCRC to survive under low nutrient supply conditions. Inhibition of metabolic adaptation blocked CRC progression.68,69 These results suggest targeting glycolytic processes and vascularization of mCRC as possible treatments.
Many genomic, transcriptomic, and proteomic signatures have been associated with cancer stemness and are connected to oncogenic pathways that orchestrate cancer progression and growth.70–72 We found oncogenic pathway enrichment, such as MYC targets or cell cycle regulators, in stemness-high CRC. Furthermore, we identified telomere lengthening mechanisms connected to cancer stemness features and relative ALT pathway enrichment in mCRC. This may open a way to new therapeutic development that targets telomere lengthening, especially the ALT pathway.
Recent advances in cancer immunotherapy have led to impressive gains in survival for many patients, including those with metastatic disease.73,74 Our study demonstrates that immune-cold CRC, including mCRC, may evade host immune surveillance via downregulation of MHC class I and antigen processing machinery. This could suggest that upregulation of these pathways via molecular modulators, such as IFNγ, may make immune-cold tumors again visible to the host immune system, resulting in better tumor progression control. In addition, the superiority of T cell assessment in mCRC, not pCRC, for patient outcome prediction suggests that there are metastatic immune properties that are currently not well leveraged for treatment.
In summary, our uniquely large proteomic and proteogenomic study provides a powerful resource for the study of both pCRC and mCRC and the unique features that drive metastatic progression.
Limitations of the study
This study has several limitations that warrant follow-up studies. Our study provides primarily observational analyses and lacks mechanistic in vitro or in vivo experiments. While we found 6 proteomic subtypes, which characterize pCRC and mCRC, all mCRCs were from liver metastases (the most common site of CRC metastases). The second most common metastatic organ of CRC is the lung, whose tissue microenvironment is different from that of the liver. Therefore, further studies focusing on other metastatic sites will be needed. In addition, it is generally difficult to completely exclude therapeutic effects on analysis results of mCRC because of frequent pre-treatment history.
STAR★METHODS
RESOURCE AVAILABILITY
Lead contact
Further information and requests should be directed to and will be fulfilled by the lead contact, Michael H. Roehrl (michael_roehrl@bidmc.harvard.edu).
Materials availability
This study did not generate new reagents.
Data and code availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD034575 (cohort 1) and PXD031705 (cohort 2). Uploaded file lists are available in Tables S9 and S10. WGS and transcriptome data were deposited to the European Genome-phenome Archive (EGAS00001006464, EGAS00001006465).
This study analyzed 3 publicly available datasets for validation, including CPTAC CRC proteome data (via the LinkedOmics website91; 97 primary CRCs, 100 normal tissues),7 CRC transcriptome data of the Pan-Cancer Atlas project (via the TCGAbiolinks R package92; 646 primary CRCs, 51 normal tissues),3,76 and CRC transcriptome data of paired samples (GSE50760 via the GEO website; 18 normal/primary CRC/liver metastatic CRC triplets).75
This paper does not report original code.
Any additional information required to reanalyze the data reported in this study is available from the lead contact upon request.
EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
The study was approved by the Institutional Review Board of Memorial Sloan Kettering Cancer Center. Frozen tissue samples and formalin-fixed paraffin-embedded (FFPE) tissue blocks were retrieved from the Precision Pathology Biobanking Center of Memorial Sloan Kettering Cancer Center. Clinical data of these samples, including age and gender, were retrieved from medical records anonymously and are summarized in Table S6. Cohort design and sample size are summarized in Figure S1.
METHOD DETAILS
Clinical specimens and pathological data
We examined tissues from two cohorts of CRC patients. Discovery cohort 1 consists of matched sets of primary tumors, liver metastatic tumors, normal colonic mucosae, and normal liver tissues from 16 patients. The analysis and validation cohort 2 consists of 340 fresh frozen tissues, including matched tumors and normal tissues. There is no sample overlap between cohort 1 and cohort 2. Clinical data, such as patient demographics, treatment history, recurrence status, targeted sequencing results, mismatch repair enzyme expression status, and histologic type, were retrieved from medical records. Tumor content ratios for all samples were verified by gastrointestinal subspecialty pathologists. Clinicopathologic features and prognosis information, including gender, age, pTNM, anatomic site, chemotherapy status, and outcome, are summarized in Table S6.
MSK-IMPACT targeted cancer gene sequencing
To perform MSK-IMPACT sequencing,11 genomic DNA from primary or metastatic colorectal tumors was extracted using the Qiagen DNeasy Tissue kit and the EZ1 Advanced XL system (Qiagen). Extracted DNA was sheared using the Covaris E200 instrument (Covaris). Custom DNA probes were designed for targeted sequencing of all exons and selected introns of 505 genes. Probes were synthesized using the NimbleGen SeqCap EZ library custom oligo system and biotinylated. Sequencing libraries were prepared using the KAPA HTP protocol (Kapa Biosystems) and the Biomek FX system (Beckman Coulter) and sequenced using Illumina HiSeq 2500 to high, uniform coverage (>500× median coverage). All classes of genomic alterations, including substitutions, indels, copy number alterations, and rearrangements, were determined and called against the patient’s matched normal sample. Clinical sequencing results, including MSK-IMPACT, MassARRAY, or Sanger sequencing, were available for 198 samples from MSKCC clinical records (Table S2). Structural variant analysis results are not available from clinical MSK-IMPACT data.
Tissue proteome extraction
Aliquots of 5 mg of frozen tissue were lysed with 200 μL lysis buffer containing 8 M urea, 0.1 M ammonium bicarbonate, phosphatase inhibitors 2 and 3 (Sigma), and protease inhibitors (Roche). The tissue mixture was homogenized with 12 1-min cycles of sonication at 120 W power (FB120, Fisher Scientific) and intermittent cooling. After centrifugation at 14,000 × g for 30 min at 4°C, the supernatant containing all soluble proteins was collected. Proteome extraction from FFPE tissue was performed according to a previous proteome analysis report of FFPE tissue with modifications.93 Ten 10-μm thick sections were cut from FFPE blocks, and tumor areas were dissected according to H&E staining of adjacent sections. After dewaxing, the tissue samples were lysed with 100 mM Tris and 5% SDS, sonicated, incubated at 98°C for 20 min, and then incubated at 80°C for 2 h. After centrifugation, the supernatant containing all soluble proteins was collected. Protein concentrations were determined by BCA assays (Pierce).
Mass spectrometry sample preparation
Aliquots of 50 μg of the extracted proteomes from frozen tissue were reduced with 5 mM dithiothreitol at 56°C for 30 min and alkylated with 11 mM iodoacetamide at room temperature for 30 min in the dark. Instead of trypsin digestion alone, samples were then digested with trypsin and Lys-C (Promega) at a 1:50 (w/w) ratio to total input sample protein at 37°C for 12 h to reduce missed cleavage sites.94 We have used this approach in multiple prior studies.95,96 The digestion was stopped by the addition of trifluoroacetic acid to a final concentration of 1%. The mixture was centrifuged at 14,000 × g for 10 min, and the supernatant was collected and desalted on a lab-made C18 StageTip. Desalted peptides were dried in a SpeedVac concentrator, dissolved in 10–15 μL of 3% acetonitrile/0.1% formic acid, and stored at −80°C. Proteome (100 μg from each sample) from FFPE tissue was double digested (trypsin/Lys-C) and processed using the S-Trap system (ProtiFi) according to the manufacturer’s manual. Then, samples were desalted on a lab-made C18 StageTip. Desalted peptides were dried in a SpeedVac vacuum concentrator, re-dissolved in 10–15 μL of 3% acetonitrile/0.1% formic acid, and stored at −80°C until single-shot mass spectrometry analysis. We did not perform offline fractionation. We verified high measurement reproducibility of the quality-control samples across the mass spectrometry runs (Figures S1D and S1E).
Proteome sequencing by mass spectrometry
Approximately 1 μg of peptides derived from frozen tissue was injected into a 50-cm C18 capillary column mounted to an Easy-nLC 1200 system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific). Peptides were eluted over a 200-min gradient in 2–35% buffer B (0.1% (v/v) formic acid, 99.9% (v/v) acetonitrile) and buffer A (0.1% (v/v) formic acid, 99.9% (v/v) HPLC-grade water) at a flow rate of 300 nL/min. MS data were acquired with an automatic switch between a full scan and 10 data-dependent MS/MS scans. MS/MS scans were acquired at a resolution of 15,000 at 200 m/z with an ion target value of 5 × 10⁴, maximum injection time of 100 ms, and dynamic exclusion for 15 s in centroid mode. Desalted peptides (1 μg) from FFPE tissue were dissolved in 3% acetonitrile/0.1% formic acid and were injected onto a C18 capillary column (Peptide BEH, 1.7 μm × 75 μm × 250 mm) on a nanoACQUITY UPLC system (Waters), which was coupled to the Q Exactive Plus mass spectrometer (Thermo Scientific) via a Proxeon 2 nano electrospray source. Peptides were eluted with a non-linear 200-min gradient of 2–35% buffer B (0.1% (v/v) formic acid, 100% acetonitrile) at a flow rate of 300 nL/min. After each gradient, the column was washed with 90% buffer B for 5 min and re-equilibrated with 98% buffer A (0.1% formic acid, 100% HPLC-grade water). MS data were acquired with an automatic switch between a full scan and 10 data-dependent MS/MS scans (TopN method). Target value for the full scan MS spectra was 1 × 10⁶ ions in the 380–1600 m/z range with a maximum injection time of 50 ms and resolution of 70,000 at 200 m/z with data collected in profile mode. Precursors were selected using a 1.5 m/z isolation width. Precursors were fragmented by higher-energy C-trap dissociation (HCD) with a normalized collision energy of 27 eV. MS/MS scans were acquired at a resolution of 17,500 at 200 m/z with an ion target value of 5 × 10⁴, maximum injection time of 50 ms, dynamic exclusion for 15 s, and data collected in centroid mode.
Whole genome sequencing (WGS)
DNA was extracted from FFPE tissue sections using the in-house developed SeqPlus method and eluted in 100 μL of TE buffer. DNA was quantified using the Qubit DNA HS assay (Thermo Fisher). DNA libraries were generated from 200 ng of DNA using the Illumina TruSeq Nano library preparation protocol. Library quality was assessed using the Roche KAPA Library Quantification kit, Qubit dsDNA HS assay, and Agilent D1000 screen tapes. Libraries were pooled and sequenced on an Illumina HiSeq X instrument, targeting a coverage of 70× in tumor and 30× in normal tissue. FASTQ generation was performed using BCL2FastQ, adapter trimming using Skewer, and QC assessment using FastQC.
Genomic data analysis
For alignment and for detection of single nucleotide variants (SNVs) and insertions/deletions (INDELs), we followed the Genome Analysis Toolkit (GATK, version 4.1.9) workflow.81 We mapped qualified paired-end WGS reads to the human reference genome (hg38) with BWA (0.7.17-r1188).82 The BAM files were sorted and duplicates were marked using Picard tools (version 2.23.8), and base quality score recalibration was performed using BaseRecalibrator. SNVs and INDELs were called from tumor and matched-normal pairs using MuTect2 from GATK with the following filtration criteria: min-allele-fraction 0.05, min-median-read-position 10, min-reads-per-strand 2, unique-alt-read-count 5. The sequence variants were then annotated using the Funcotator function of GATK.
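To illustrate how the four filtration thresholds act together on a candidate call, the following Python sketch applies them to a plain record. It is only an illustration: in the pipeline described here the filtering is done by GATK itself, and the record fields (`allele_fraction`, `alt_reads_forward`, and so on) as well as the reading of each criterion as a minimum that must be met are assumptions made for this example.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_ALLELE_FRACTION = 0.05
MIN_MEDIAN_READ_POSITION = 10
MIN_READS_PER_STRAND = 2
MIN_UNIQUE_ALT_READS = 5


@dataclass
class CandidateCall:
    """Hypothetical per-variant summary; field names are illustrative only."""
    allele_fraction: float
    median_read_position: int
    alt_reads_forward: int
    alt_reads_reverse: int
    unique_alt_reads: int


def failed_filters(call):
    """Return the names of the thresholds that the call does not meet."""
    failed = []
    if call.allele_fraction < MIN_ALLELE_FRACTION:
        failed.append("min-allele-fraction")
    if call.median_read_position < MIN_MEDIAN_READ_POSITION:
        failed.append("min-median-read-position")
    # Assumed interpretation: alt-supporting reads are required on both strands.
    if min(call.alt_reads_forward, call.alt_reads_reverse) < MIN_READS_PER_STRAND:
        failed.append("min-reads-per-strand")
    if call.unique_alt_reads < MIN_UNIQUE_ALT_READS:
        failed.append("unique-alt-read-count")
    return failed


def passes(call):
    """A call is kept only if it meets every threshold."""
    return not failed_filters(call)
```

Returning the list of failed criteria, rather than a bare boolean, makes it easy to tabulate which threshold removes the most candidates.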
Somatic structural variant and copy number alteration analyses used the BAM files processed in the somatic mutation detection pipeline. These BAM files were further processed and annotated with GRIDSS (v2.9.4),83 COBALT (v1.8),84 AMBER (v3.5),84 PURPLE (v2.51),84 and AnnotSV (v3.1 online).85,97 In brief, somatic structural variants were called by GRIDSS, and somatic copy number alterations with ploidy were called by PURPLE, which integrated B-allele frequencies from AMBER, read depth ratios from COBALT, somatic variants from the GATK pipeline, and structural variants from GRIDSS. Structural variants were then annotated by AnnotSV.
Recurrent somatic copy number alterations (SCNAs) were identified by Genomic Identification of Significant Targets in Cancer (GISTIC, version 2.0.23)86 to determine which SCNA regions were amplified or deleted significantly more often than expected by chance in the cohort, at a q value threshold of 0.05. The following parameters were used: Amplification Threshold = 0.3, Deletion Threshold = 0.3, Cap Values = 1.5, Confidence Level = 0.90, Join Segment Size = 4, Arm Level Peel Off = 1, Maximum Sample Segments = 10,000, Sample normalization method = mean. Default values were used for all other parameters.
Tissue microarray construction
Tissue microarrays were constructed from FFPE blocks of 118 primary CRC and 102 liver metastatic CRC tissue samples. Three 1-mm cores from each tissue paraffin block were drilled out and transferred to tissue array blocks using a TMA arrayer (3DHistech). Tumor and normal areas were selected based on rigorous review of individual histologic slides and electronic image-based coring target area selection.
QUANTIFICATION AND STATISTICAL ANALYSIS
Protein identification, quantification, and differential expression analysis
Label-free protein quantification was carried out with MaxQuant.77,98 The match-between-runs option was used, and deamidation (NQ) was added to the default variable modifications for frozen sample analysis. Two peptides were required to calculate a protein level ratio. After quantification, protein expression data were processed with Perseus software. Only proteins with valid expression values in more than 50% of samples in at least one group (normal colon mucosa, primary colorectal cancer, liver metastatic colorectal cancer, or normal liver) were kept, and missing values were imputed from a shifted Gaussian distribution, followed by differential expression analyses in Perseus.78,99 Imputed and non-imputed protein expression data are shown in Table S7.
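The valid-value filter and the shifted-Gaussian imputation can be sketched in a few lines of Python. This is a minimal illustration and not the Perseus implementation: the imputation width (0.3 standard deviations) and down-shift (1.8 standard deviations) are assumed here because the text does not state them, the strict "more than 50%" cutoff is one reading of the filter, and the function names are hypothetical.

```python
import random
import statistics


def keep_protein(values_by_group, min_valid_fraction=0.5):
    """Keep a protein if more than half of the samples in at least one group
    have a valid (non-missing) value. Missing values are represented as None."""
    for values in values_by_group.values():
        if not values:
            continue
        valid = sum(v is not None for v in values)
        if valid / len(values) > min_valid_fraction:
            return True
    return False


def impute_sample(log_intensities, width=0.3, down_shift=1.8, rng=None):
    """Replace missing values in one sample with draws from a Gaussian that is
    narrower than, and shifted below, the observed distribution of that sample.
    Requires at least two observed values to estimate the standard deviation."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    observed = [v for v in log_intensities if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [
        v if v is not None else rng.gauss(mean - down_shift * sd, width * sd)
        for v in log_intensities
    ]
```

Drawing imputed values from the low end of each sample's distribution reflects the assumption that values are missing mainly because the protein was below the detection limit.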
RNA sequencing
Total RNA was extracted from FFPE tissue sections using the Maxwell RSC RNA FFPE kit (Promega). RNA was quantified using the Qubit RNA HS assay (Thermo Fisher). Libraries were prepared from 2000 ng of input RNA using the Illumina TruSeq stranded RNA with ribo-zero depletion protocol. The quantity and quality of the libraries were assessed using the Roche KAPA Library Quantification kit, Qubit dsDNA HS assay, and Agilent D1000 screen tapes. Libraries were pooled and sequenced on an Illumina HiSeq X instrument, targeting 50 million reads per sample. Read quality was assessed with FastQC and Picard QC programs. Trimmed raw reads were aligned to the human genome version hg38 using STAR (v2.7.5a)79 with default parameters, and gene annotation was added using GENCODE 36. The gene-level count output of STAR was used as input for edgeR (v3.30.3),80 followed by normalization and differential expression analysis.
Immunohistochemistry (IHC)
Four-μm sections were cut from FFPE tissue blocks. Paraffin was removed with xylene, and antigens were retrieved by heat-mediated epitope retrieval. Tissue sections were stained using a Leica BOND-MAX IHC stainer. The staining intensity of individual tumor cells was scored 0–3 and averaged across 3 independent tissue cores per case. The total weighted IHC score (IHC H-score, with a theoretical range of 0–300) of a sample was calculated by multiplying the expression intensity of individual tumor areas (score, 0–3) by their relative contribution (0–100%) to total tumor area and then summing these numbers. All tissue samples were independently scored by two pathologists. In cases of discrepancy, the cases were reviewed again until a consensus score was reached. Representative microscopic images are shown in Figure S8.
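As a worked example of the H-score arithmetic, the Python sketch below multiplies each intensity grade by the percentage of tumor area showing that grade and sums the products. The function names are hypothetical, and averaging the per-core H-scores to obtain a case-level value is an assumption about how the three cores were combined.

```python
def h_score(areas):
    """Compute an IHC H-score (0-300) from (intensity, percent_of_tumor_area) pairs.

    intensity is a grade from 0 to 3; the percentages must sum to 100.
    """
    total_percent = sum(percent for _, percent in areas)
    if abs(total_percent - 100) > 1e-6:
        raise ValueError("area percentages must sum to 100")
    for intensity, _ in areas:
        if not 0 <= intensity <= 3:
            raise ValueError("intensity must be between 0 and 3")
    return sum(intensity * percent for intensity, percent in areas)


def case_score(core_scores):
    """Average the H-scores of the tissue cores from one case (assumed here)."""
    return sum(core_scores) / len(core_scores)


# 10% negative, 20% weak, 30% moderate, 40% strong staining:
# 0*10 + 1*20 + 2*30 + 3*40 = 200
example = h_score([(0, 10), (1, 20), (2, 30), (3, 40)])
```

A tumor with uniformly strong staining reaches the maximum of 300, and a completely negative tumor scores 0.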
Unsupervised consensus clustering of proteomic data
The normalized protein expression matrix of 135 pCRC and 123 mCRC was analyzed using the ConsensusClusterPlus R package with the following parameters: maxK = 12, reps = 1,000 bootstraps, pItem = 0.8, pFeature = 1, clusterAlg = “hc”, distance = “euclidean”.87 To obtain representative proteome signatures, we applied uniform manifold approximation and projection (UMAP) to the normalized protein expression matrix and reduced it to 70 dimensions.88 To simplify clustering, we performed joint unsupervised clustering of primary and metastatic CRC. The number of clusters was determined mainly by four factors: (i) the average pairwise consensus matrix within consensus clusters, (ii) the delta plot of the relative change in the area under the cumulative distribution function (CDF) curve, (iii) the tracking plot, and (iv) cluster consensus values. We selected the consensus matrix with k = 6, which yielded the cleanest separation among clusters with a sufficient number of samples in each subtype (Figures S4B–S4E).
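The quantities behind criteria (i) and (iv) can be made concrete with a short Python sketch. Consensus clustering repeatedly subsamples the items (here with pItem = 0.8), clusters each subsample, and records for every pair of items the fraction of runs in which both were sampled and fell into the same cluster. The sketch below assumes the per-run cluster labels are already available and omits the clustering step itself; the function names and input layout are illustrative and do not reproduce the ConsensusClusterPlus interface.

```python
from itertools import combinations


def consensus_matrix(runs, n_items):
    """Build the consensus matrix from resampling runs.

    runs: list of dicts mapping item index -> cluster label, one dict per run,
          containing only the items sampled in that run.
    Entry (i, j) is the fraction of runs sampling both i and j in which they
    were assigned to the same cluster.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                matrix[i][j] = 1.0
            elif sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix


def cluster_consensus(matrix, members):
    """Mean pairwise consensus among the members of one cluster."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return float("nan")  # a singleton cluster has no pairs to average
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)
```

A stable partition gives consensus values close to 0 or 1, so a well-chosen k shows high cluster consensus for every cluster.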
Gene set enrichment analysis (GSEA)
Fold change values of gene expression between specified groups were calculated by edgeR and Perseus. We used clusterProfiler (v2.1.2)89 with a ranked gene list ordered by fold-change value. We conducted enrichment tests of GOBP, Hallmark, KEGG, and Reactome gene sets obtained from the Molecular Signatures Database (MSigDB). For functional characterization of each sample by single sample GSEA (ssGSEA), we calculated normalized enrichment scores (NES) of cancer-relevant gene sets by projecting the matrices of normalized RNA expression (TPM values) and protein expression100 onto GOBP, Hallmark, KEGG, and Reactome pathway gene sets using the ssGSEA implementation available at https://github.com/broadinstitute/ssGSEA2.0 with the following parameters: sample.norm.type = “log”, weight = 0.75, statistic = “area.under.RES”, output.score.type = “NES”, nperm = 1000, min.overlap = 3, correl.type = “z.score”.
Co-occurrence and mutual exclusiveness analysis
For given gene lists, we performed pairwise Fisher’s exact tests and estimated odds ratios using the fisher.test function in R. Odds ratios were log-transformed, with positive values indicating co-occurrence and negative values indicating exclusiveness.
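The pairwise test can be sketched in plain Python as follows. For each gene pair, samples are counted into a 2 × 2 table (both altered, only the first, only the second, neither), a two-sided Fisher's exact p value is computed from the hypergeometric distribution, and the odds ratio is log-transformed. Two simplifications are assumptions of this sketch and differ from R's fisher.test: the odds ratio is the sample odds ratio with a 0.5 pseudocount rather than the conditional maximum-likelihood estimate, and the logarithm is taken to base 2 because the text does not specify the base.

```python
from itertools import combinations
from math import comb, log2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]:
    sum the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


def log_odds_ratio(a, b, c, d, pseudocount=0.5):
    """log2 sample odds ratio; the pseudocount avoids division by zero."""
    return log2(((a + pseudocount) * (d + pseudocount))
                / ((b + pseudocount) * (c + pseudocount)))


def pairwise_cooccurrence(altered, samples):
    """altered: dict gene -> set of altered samples; samples: all sample IDs.
    Returns {(gene1, gene2): (log odds ratio, p value)}."""
    all_samples = set(samples)
    results = {}
    for g1, g2 in combinations(sorted(altered), 2):
        s1, s2 = altered[g1] & all_samples, altered[g2] & all_samples
        a = len(s1 & s2)                    # both altered
        b = len(s1 - s2)                    # only g1 altered
        c = len(s2 - s1)                    # only g2 altered
        d = len(all_samples - s1 - s2)      # neither altered
        results[(g1, g2)] = (log_odds_ratio(a, b, c, d),
                             fisher_exact_two_sided(a, b, c, d))
    return results
```

A positive log odds ratio then marks a pair that tends to be altered in the same samples, and a negative value marks a pair that tends to be mutually exclusive.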
Protein-protein interaction analysis
Pathway enrichments of protein-protein networks were assessed by Metascape.29 The analysis was carried out with the following databases: STRING,101 BioGrid,102 OmniPath,103 and InWeb_IM.104 Only physical interactions in STRING (physical score >0.132) and BioGrid were used. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. Densely connected network components were identified by the Molecular Complex Detection (MCODE) algorithm.90
Proteome-based stemness scores
Following a published computational method for mRNA-based stemness scores,105 we first built a predictive model using one-class logistic regression (OCLR)106 with leave-one-out cross validation and tested the accuracy of the model with the pluripotent stem cell samples (ESC and iPSC) from the Progenitor Cell Biology Consortium (PCBC) dataset.107,108 To ensure compatibility with our cohort, we mapped the gene names from Ensembl IDs to Human Genome Organization (HUGO) gene symbols and dropped genes that had no mapping. The resulting training matrix contained 12,953 mRNA expression values measured across all available PCBC samples. To calculate mRNA-based stemness scores, we used TPM (transcripts per million) mRNA expression values from our cohort and available normalized expression values from the GSE50760 transcriptome datasets. To calculate protein expression-based signatures, since no proteome data of pluripotent stem cell samples are available in the PCBC dataset, we downloaded CPTAC/TCGA colorectal cancer datasets for which both mRNA-based stemness scores and proteome data are available.4 After merging these datasets, samples from 56 patients had both stemness scores and proteome data, from which we calculated correlation coefficients between colorectal cancer mRNA-based stemness scores and mass spectrometry-based protein expression. We selected 997 genes that showed significant positive correlation, which we considered colorectal cancer stemness markers. Using these proteins, we trained the prediction model again and applied the model to our proteome dataset to obtain proteome-based stemness scores (Table S8).
Immune scores
Immune scores were calculated from the normalized protein expression matrix of 258 CRC samples using the ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data) package in R.57 Immune scores of the CPTAC cohort and the GSE50760 cohort were calculated based on normalized proteome expression and mRNA expression data, respectively. To assess the immune cell profile of CRC samples, we performed IHC with immune cell markers (CD3, CD4, CD8, FOXP3, and CD68) and counted positive cells in 3 tissue cores for each sample. The average number of positive cells per TMA core was used for downstream analysis.
Supplementary Material
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Rabbit polyclonal anti-HIF1a | Sigma-Aldrich | Cat. # HPA001275; RRID: AB_1079057 |
| Mouse monoclonal anti-CD44 | Cell Signaling | Cat. # 3570; RRID: AB_2076465 |
| Mouse monoclonal anti-SOX2 | Cell Signaling | Cat. # 14962; RRID: AB_2798664 |
| Rabbit monoclonal anti-SOX9 | Cell Signaling | Cat. # 82630; RRID: AB_2665492 |
| Rabbit monoclonal anti-c-Myc | Abcam | Cat. # ab32072; RRID: AB_731658 |
| Mouse monoclonal anti-L1CAM | BioLegend | Cat. # SIG-3911; RRID: N/A |
| Rabbit monoclonal anti-pAKT | Abcam | Cat. # ab81283; RRID: AB_2224551 |
| Rabbit monoclonal anti-p-mTOR | Cell Signaling | Cat. # 2976; RRID: AB_490932 |
| Rabbit monoclonal anti-pERK | Cell Signaling | Cat. # 4370; RRID: AB_2315112 |
| Mouse monoclonal anti-LXR | Santa Cruz Biotechnology | Cat. # sc-271064; RRID: AB_10611071 |
| Mouse monoclonal anti-RXR | Santa Cruz Biotechnology | Cat. # sc-46659; RRID: AB_2184877 |
| Mouse monoclonal anti-Ki-67 | Dako | Cat. #M7240; RRID: N/A |
| Rabbit monoclonal anti-pRb | Cell Signaling | Cat. # 8516; RRID: AB_11178658 |
| Rabbit monoclonal anti-PTEN | Cell Signaling | Cat. # 9559; RRID: AB_390810 |
| Mouse monoclonal anti-CD3 | Leica Biosystems | Cat. # NCL-L-CD3-565; RRID: AB_563541 |
| Rabbit monoclonal anti-CD4 | Sigma-Aldrich | Cat. # 104R-1; RRID: N/A |
| Mouse monoclonal anti-CD8 | Dako | Cat. #M7103; RRID: N/A |
| Mouse monoclonal anti-FOXP3 | Abcam | Cat. # ab20034; RRID: AB_445284 |
| Mouse monoclonal anti-CD68 | Dako | Cat. #M0876; RRID: N/A |
| Rabbit monoclonal anti-SNAI | Abcam | Cat. # ab85936; RRID: AB_1925448 |
| Mouse monoclonal anti-Twist | Abcam | Cat. # ab175430; RRID: AB_2928035 |
| Rabbit monoclonal anti-ZEB1 | Sigma-Aldrich | Cat. # HPA027524; RRID: AB_1844977 |
| Rabbit polyclonal anti-ATRX | Sigma-Aldrich | Cat. # HPA001906; RRID: AB_1078249 |
| Rabbit monoclonal anti-DAXX | Abcam | Cat. # ab32140; RRID: AB_731847 |
| Rabbit monoclonal anti-IRF1 | Abcam | Cat. # ab243895; RRID: AB_2832955 |
| Mouse monoclonal anti-CIITA | Santa Cruz Biotechnology | Cat. # sc-13556; RRID: AB_627261 |
| Rabbit monoclonal anti-HLA class I ABC | Abcam | Cat. # ab225636; RRID: N/A |
| Mouse monoclonal anti-HLA class DR + DP + DQ | Abcam | Cat. # ab7856; RRID: AB_306142 |
| Biological Samples | ||
| Colorectal cancer tissue including normal adjacent tissue | Precision Pathology Biobanking Center of Memorial Sloan Kettering Cancer Center | N/A |
| Chemicals, Peptides, and Recombinant Proteins | ||
| Urea | Sigma-Aldrich | Cat. #U0631 |
| Ammonium bicarbonate | Sigma-Aldrich | Cat. # 09830 |
| Phosphatase Inhibitor Cocktail 2 | Sigma-Aldrich | Cat. #P5726 |
| Phosphatase Inhibitor Cocktail 3 | Sigma-Aldrich | Cat. #P0044 |
| Protease inhibitors, EDTA-free | Roche | Cat. # 04693150001 |
| DL-Dithiothreitol | Sigma-Aldrich | Cat. # 43815 |
| Trypsin | Promega | Cat. #V5111 |
| Lys-C | Promega | Cat. #V167 |
| Acetonitrile | Fisher chemical | Cat. # A955 |
| Acetic acid | Fisher chemical | Cat. # A113 |
| Empore SPE Disks | Sigma-Aldrich | Cat. # 66883-U |
| Trizma base | Sigma-Aldrich | Cat. # 77-86-1 |
| Sodium dodecyl sulfate | Sigma-Aldrich | Cat. #L3771 |
| Formic acid | Fisher chemical | Cat. # A117-10X1AMP |
| S-Trap | ProtiFi | Cat. #C02-mini |
| Iodoacetamide | Sigma-Aldrich | Cat. #I1149 |
| Critical Commercial Assays | ||
| TruSeq Stranded Total RNA Library Prep Kit with Ribo-Zero Gold | Illumina | Cat. # RS-122-2301 |
| Illumina TruSeq Nano library | Illumina | Cat. # 20015964 |
| BCA Protein Assay Kit | ThermoFisher Scientific | Cat. # A53225 |
| Deposited Data | ||
| Proteome data of cohort 1 | This paper | PXD034575 |
| Proteome data of cohort 2 | This paper | PXD031705 |
| WGS data and transcriptome data of cohort 1 | This paper | EGAS00001006464, EGAS00001006465 |
| Proteogenomic analysis data of colorectal cancer from CPTAC | (Vasaikar et al.)7 | http://www.linkedomics.org/data_download/CPTAC-COAD/ |
| Transcriptome data of 18 paired colon cancer (normal-primary-metastasis) | (Kim et al.)75 | GSE50760 |
| Colon cancer transcriptome data of PanCancerAtlas project | (Hoadley et al.)76 | N/A |
| Software and Algorithms | ||
| MaxQuant (v1.6.4) | (Cox et al.)77 | https://maxquant.org/maxquant/ |
| Perseus (v1.6.15) | (Tyanova et al.)78 | https://maxquant.org/perseus/ |
| STAR (v2.7.5b) | (Dobin et al.)79 | https://github.com/alexdobin/STAR |
| edgeR (v3.30.3) | (Robinson et al.)80 | https://bioconductor.org/packages/release/bioc/html/edgeR.html |
| GATK version 4.1.9 | (McKenna et al.)81 | https://github.com/broadinstitute/gatk |
| BWA (v0.7.17-r1188) | (Li et al.)82 | https://github.com/lh3/bwa |
| picard (v2.23.8) | Broad Institute | http://broadinstitute.github.io/picard/ |
| GRIDSS (v2.9.4) | (Cameron et al.)83 | https://github.com/PapenfussLab/gridss |
| COBALT (v1.8) | (Cameron et al.)84 | https://github.com/hartwigmedical/hmftools/blob/master/cobalt |
| AMBER (v3.5) | (Cameron et al.)84 | https://github.com/hartwigmedical/hmftools/blob/master/amber |
| PURPLE (v2.51) | (Cameron et al.)84 | https://github.com/hartwigmedical/hmftools/blob/master/purple |
| AnnotSV (v3.1 online) | (Geoffroy et al.)85 | https://lbgi.fr/AnnotSV/ |
| GISTIC2.0 (v2.0.23) | (Mermel et al.)86 | https://github.com/broadinstitute/gistic2 |
| ConsensusClusterPlus (v1.52.0) | (Wilkerson et al.)87 | https://git.bioconductor.org/packages/ConsensusClusterPlus |
| umap (v0.2.7.0) | (McInnes et al.)88 | https://github.com/tkonopka/umap |
| clusterProfiler (v2.1.2) | (Yu et al.)89 | https://github.com/YuLab-SMU/clusterProfiler |
| single sample GSEA | Broad Institute | https://github.com/broadinstitute/ssGSEA2.0 |
| Metascape | (Zhou et al.)29 | https://metascape.org/gp/index.html#/main/step1 |
| MCODE | (Bader et al.)90 | ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE |
| ESTIMATE (v1.0.13) | (Yoshihara et al.)57 | https://sourceforge.net/projects/estimateproject/ |
| R (v4.0.2) | R Development Core Team | https://www.R-project.org/ |
Highlights.
Distinct proteogenomic subtypes of colorectal cancer characterize primaries and metastases
A hypoxic signature was enriched in metastases along with metabolic reprogramming
A cancer stemness signature with ALT feature was elevated in metastases
Immune-cold cancer showed suppression of antigen presentation, especially in metastases
ACKNOWLEDGMENTS
M.H.A.R. acknowledges grants from the NIH/NCI (R21 CA251992, R21 CA263262, and U01 CA263986), a Cycle for Survival Equinox Innovation grant, an investigator grant from the Neuroendocrine Tumor Research Foundation (NETRF), and support from the Farmer Family Foundation. Parts of the study were supported by an MSKCC NCI Cancer Center Support Grant (P30 CA008748). The funding bodies were not involved in the design of the study or the collection, analysis, and interpretation of data.
Footnotes
SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.celrep.2024.113810.
DECLARATION OF INTERESTS
D.S.K. was a consultant for and is now employed by Paige.AI. P.G.B. and J.G. are employees of and hold equity interests in Genuity Science. J.Y.W. is the founder of Curandis. M.H.A.R. is a member of the scientific advisory boards of Azenta Life Sciences and Universal DX. None of these companies had any role in design, execution, data analysis, or any other aspect of this study.
REFERENCES
- 1. Siegel RL, Miller KD, Fuchs HE, and Jemal A (2022). Cancer statistics, 2022. CA. Cancer J. Clin 72, 7–33. 10.3322/caac.21708.
- 2. Biller LH, and Schrag D (2021). Diagnosis and Treatment of Metastatic Colorectal Cancer: A Review. JAMA 325, 669–685. 10.1001/jama.2021.0106.
- 3. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337. 10.1038/nature11252.
- 4. Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, Chambers MC, Zimmerman LJ, Shaddox KF, Kim S, et al. (2014). Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387. 10.1038/nature13438.
- 5. Mendelaar PAJ, Smid M, van Riet J, Angus L, Labots M, Steeghs N, Hendriks MP, Cirkel GA, van Rooijen JM, Ten Tije AJ, et al. (2021). Whole genome sequencing of metastatic colorectal cancer reveals prior treatment effects and specific metastasis features. Nat. Commun 12, 574. 10.1038/s41467-020-20887-6.
- 6. Stodolna A, He M, Vasipalli M, Kingsbury Z, Becq J, Stockton JD, Dilworth MP, James J, Sillo T, Blakeway D, et al. (2021). Clinical-grade whole-genome sequencing and 3’ transcriptome analysis of colorectal cancer patients. Genome Med. 13, 33. 10.1186/s13073-021-00852-8.
- 7. Vasaikar S, Huang C, Wang X, Petyuk VA, Savage SR, Wen B, Dou Y, Zhang Y, Shi Z, Arshad OA, et al. (2019). Proteogenomic Analysis of Human Colon Cancer Reveals New Therapeutic Opportunities. Cell 177, 1035–1049.e19. 10.1016/j.cell.2019.03.030.
- 8. Li C, Sun YD, Yu GY, Cui JR, Lou Z, Zhang H, Huang Y, Bai CG, Deng LL, Liu P, et al. (2020). Integrated Omics of Metastatic Colorectal Cancer. Cancer Cell 38, 734–747.e9. 10.1016/j.ccell.2020.08.002.
- 9. Imperial R, Ahmed Z, Toor OM, Erdoğan C, Khaliq A, Case P, Case J, Kennedy K, Cummings LS, Melton N, et al. (2018). Comparative proteogenomic analysis of right-sided colon cancer, left-sided colon cancer and rectal cancer reveals distinct mutational profiles. Mol. Cancer 17, 177. 10.1186/s12943-018-0923-9.
- 10. Muzny DM, Bainbridge MN, Chang K, Dinh HH, Drummond JA, Fowler G, Kovar CL, Lewis LR, Morgan MB, Newsham IF, et al. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337. 10.1038/nature11252.
- 11. Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, Chandramohan R, Liu ZY, Won HH, Scott SN, et al. (2015). Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J. Mol. Diagn 17, 251–264. 10.1016/j.jmoldx.2014.12.006.
- 12. Sondka Z, Bamford S, Cole CG, Ward SA, Dunham I, and Forbes SA (2018). The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705. 10.1038/s41568-018-0060-1.
- 13. Saldivar JC, Miuma S, Bene J, Hosseini SA, Shibata H, Sun J, Wheeler LJ, Mathews CK, and Huebner K (2012). Initiation of genome instability and preneoplastic processes through loss of Fhit expression. PLoS Genet. 8, e1003077. 10.1371/journal.pgen.1003077.
- 14. Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, Khurana E, Waszak S, Korbel JO, Haber JE, et al. (2020). Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121. 10.1038/s41586-019-1913-9.
- 15. Sard L, Accornero P, Tornielli S, Delia D, Bunone G, Campiglio M, Colombo MP, Gramegna M, Croce CM, Pierotti MA, and Sozzi G (1999). The tumor-suppressor gene FHIT is involved in the regulation of apoptosis and in cell cycle control. Proc. Natl. Acad. Sci. USA 96, 8489–8492. 10.1073/pnas.96.15.8489.
- 16. Saldivar JC, Shibata H, and Huebner K (2010). Pathology and biology associated with the fragile FHIT gene and gene product. J. Cell. Biochem 109, 858–865. 10.1002/jcb.22481.
- 17. Ishaque N, Abba ML, Hauser C, Patil N, Paramasivam N, Huebschmann D, Leupold JH, Balasubramanian GP, Kleinheinz K, Toprak UH, et al. (2018). Whole genome sequencing puts forward hypotheses on metastasis evolution and therapy in colorectal cancer. Nat. Commun 9, 4782. 10.1038/s41467-018-07041-z.
- 18. Priestley P, Baber J, Lolkema MP, Steeghs N, de Bruijn E, Shale C, Duyvesteyn K, Haidari S, van Hoeck A, Onstenk W, et al. (2019). Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216. 10.1038/s41586-019-1689-y.
- 19. (2017). AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831. 10.1158/2159-8290.Cd-17-0151.
- 20. Sakthianandeswaren A, Parsons MJ, Mouradov D, MacKinnon RN, Catimel B, Liu S, Palmieri M, Love C, Jorissen RN, Li S, et al. (2018). MACROD2 Haploinsufficiency Impairs Catalytic Activity of PARP1 and Promotes Chromosome Instability and Growth of Intestinal Tumors. Cancer Discov. 8, 988–1005. 10.1158/2159-8290.Cd-17-0909.
- 21. Waters CE, Saldivar JC, Hosseini SA, and Huebner K (2014). The FHIT gene product: tumor suppressor and genome “caretaker”. Cell. Mol. Life Sci 71, 4577–4587. 10.1007/s00018-014-1722-0.
- 22. Santoliquido BM, Frenquelli M, Contadini C, Bestetti S, Gaviraghi M, Barbieri E, De Antoni A, Albarello L, Amabile A, Gardini A, et al. (2021). Deletion of a pseudogene within a fragile site triggers the oncogenic expression of the mitotic CCSER1 gene. Life Sci. Alliance 4, e202101019. 10.26508/lsa.202101019.
- 23. Guo X, Song C, Fang L, Li M, Yue L, and Sun Q (2020). FLRT2 functions as Tumor Suppressor gene inactivated by promoter methylation in Colorectal Cancer. J. Cancer 11, 7329–7338. 10.7150/jca.47558.
- 24. Bae H, Kim B, Lee H, Lee S, Kang HS, and Kim SJ (2017). Epigenetically regulated Fibronectin leucine rich transmembrane protein 2 (FLRT2) shows tumor suppressor activity in breast cancer cells. Sci. Rep 7, 272. 10.1038/s41598-017-00424-0.
- 25. Yanagawa T, Denda K, Inatani T, Fukushima T, Tanaka T, Kumaki N, Inagaki Y, and Komada M (2016). Deficiency of X-Linked Protein Kinase Nrk during Pregnancy Triggers Breast Tumor in Mice. Am. J. Pathol 186, 2751–2760. 10.1016/j.ajpath.2016.06.005.
- 26. Yan L, Gong YZ, Shao MN, Ruan GT, Xie HL, Liao XW, Wang XK, Han QF, Zhou X, Zhu LC, et al. (2020). Distinct diagnostic and prognostic values of γ-aminobutyric acid type A receptor family genes in patients with colon adenocarcinoma. Oncol. Lett 20, 275–291. 10.3892/ol.2020.11573.
- 27. Wilson C, Lin JE, Li P, Snook AE, Gong J, Sato T, Liu C, Girondo MA, Rui H, Hyslop T, and Waldman SA (2014). The paracrine hormone for the GUCY2C tumor suppressor, guanylin, is universally lost in colorectal cancer. Cancer Epidemiol. Biomarkers Prev 23, 2328–2337. 10.1158/1055-9965.Epi-14-0440.
- 28. Geissler R, and Grimson A (2016). A position-specific 3’UTR sequence that accelerates mRNA decay. RNA Biol. 13, 1075–1077. 10.1080/15476286.2016.1225645.
- 29. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, and Chanda SK (2019). Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun 10, 1523. 10.1038/s41467-019-09234-6.
- 30. Krämer A, Green J, Pollard J Jr., and Tugendreich S (2014). Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523–530. 10.1093/bioinformatics/btt703.
- 31. Derangère V, Chevriaux A, Courtaut F, Bruchard M, Berger H, Chalmin F, Causse SZ, Limagne E, Végran F, Ladoire S, et al. (2014). Liver X receptor β activation induces pyroptosis of human and murine colon cancer cells. Cell Death Differ. 21, 1914–1924. 10.1038/cdd.2014.117.
- 32. Yamazaki K, Shimizu M, Okuno M, Matsushima-Nishiwaki R, Kanemura N, Araki H, Tsurumi H, Kojima S, Weinstein IB, and Moriwaki H (2007). Synergistic effects of RXR alpha and PPAR gamma ligands to inhibit growth in human colon cancer cells–phosphorylated RXR alpha is a critical target for colon cancer management. Gut 56, 1557–1563. 10.1136/gut.2007.129858.
- 33. Korinek V, Barker N, Moerer P, van Donselaar E, Huls G, Peters PJ, and Clevers H (1998). Depletion of epithelial stem-cell compartments in the small intestine of mice lacking Tcf-4. Nat. Genet 19, 379–383. 10.1038/1270.
- 34. Angus-Hill ML, Elbert KM, Hidalgo J, and Capecchi MR (2011). T-cell factor 4 functions as a tumor suppressor whose disruption modulates colon cell proliferation and tumorigenesis. Proc. Natl. Acad. Sci. USA 108, 4914–4919. 10.1073/pnas.1102300108.
- 35. van Es JH, Haegebarth A, Kujala P, Itzkovitz S, Koo BK, Boj SF, Korving J, van den Born M, van Oudenaarden A, Robine S, and Clevers H (2012). A critical role for the Wnt effector Tcf4 in adult intestinal homeostatic self-renewal. Mol. Cell Biol 32, 1918–1927. 10.1128/mcb.06288-11.
- 36. Hrckulak D, Janeckova L, Lanikova L, Kriz V, Horazna M, Babosova O, Vojtechova M, Galuskova K, Sloncova E, and Korinek V (2018). Wnt Effector TCF4 Is Dispensable for Wnt Signaling in Human Cancer Cells. Genes 9, 439. 10.3390/genes9090439.
- 37. Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, Marisa L, Roepman P, Nyamundanda G, Angelino P, et al. (2015). The consensus molecular subtypes of colorectal cancer. Nat. Med 21, 1350–1356. 10.1038/nm.3967.
- 38. Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, Dharia NV, Montgomery PG, Cowley GS, Pantel S, et al. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet 49, 1779–1784. 10.1038/ng.3984.
- 39. Ghandi M, Huang FW, Jané-Valbuena J, Kryukov GV, Lo CC, McDonald ER 3rd, Barretina J, Gelfand ET, Bielski CM, Li H, et al. (2019). Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508. 10.1038/s41586-019-1186-3.
- 40. Baran B, Mert Ozupek N, Yerli Tetik N, Acar E, Bekcioglu O, and Baskin Y (2018). Difference Between Left-Sided and Right-Sided Colorectal Cancer: A Focused Review of Literature. Gastroenterol. Res 11, 264–273. 10.14740/gr1062w.
- 41.Rankin EB, and Giaccia AJ (2016). Hypoxic control of metastasis. Science 352, 175–180. 10.1126/science.aaf4405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Eales KL, Hollinshead KER, and Tennant DA (2016). Hypoxia and metabolic adaptation of cancer cells. Oncogenesis 5, e190. 10.1038/oncsis.2015.50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Xiong G, Stewart RL, Chen J, Gao T, Scott TL, Samayoa LM, O’Connor K, Lane AN, and Xu R (2018). Collagen prolyl 4-hydroxylase 1 is essential for HIF-1α stabilization and TNBC chemoresistance. Nat. Commun 9, 4456. 10.1038/s41467-018-06893-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Tanaka A, Zhou Y, Shia J, Ginty F, Ogawa M, Klimstra DS, Hendrickson RC, Wang JY, and Roehrl MH (2020). Prolyl 4-hydroxylase alpha 1 protein expression risk-stratifies early stage colorectal cancer. Oncotarget 11, 813–824. 10.18632/oncotarget.27491. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Agarwal S, Behring M, Kim HG, Bajpai P, Chakravarthi BVSK, Gupta N, Elkholy A, Al Diffalha S, Varambally S, and Manne U (2020). Targeting P4HA1 with a Small Molecule Inhibitor in a Colorectal Cancer PDX Model. Transl. Oncol 13, 100754. 10.1016/j.tranon.2020.100754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Muz B, de la Puente P, Azab F, and Azab AK (2015). The role of hypoxia in cancer progression, angiogenesis, metastasis, and resistance to therapy. Hypoxia 3, 83–92. 10.2147/hp.S93413. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Ortmann B, Druker J, and Rocha S (2014). Cell cycle progression in response to oxygen levels. Cell. Mol. Life Sci 71, 3569–3582. 10.1007/s00018-014-1645-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Yuan J, Narayanan L, Rockwell S, and Glazer PM (2000). Diminished DNA repair and elevated mutagenesis in mammalian cells exposed to hypoxia and low pH. Cancer Res. 60, 4372–4376. [PubMed] [Google Scholar]
- 49.Coquelle A, Rozier L, Dutrillaux B, and Debatisse M (2002). Induction of multiple double-strand breaks within an hsr by meganuclease I-SceI expression or fragile site activation leads to formation of double minutes and other chromosomal rearrangements. Oncogene 21, 7671–7679. 10.1038/sj.onc.1205880.
- 50.Deshmukh AP, Vasaikar SV, Tomczak K, Tripathi S, den Hollander P, Arslan E, Chakraborty P, Soundararajan R, Jolly MK, Rai K, et al. (2021). Identification of EMT signaling cross-talk and gene regulatory networks by single-cell RNA sequencing. Proc. Natl. Acad. Sci. USA 118, e2102050118. 10.1073/pnas.2102050118.
- 51.Zhang YE (2009). Non-Smad pathways in TGF-beta signaling. Cell Res. 19, 128–139. 10.1038/cr.2008.328.
- 52.Huynh LK, Hipolito CJ, and Ten Dijke P (2019). A Perspective on the Development of TGF-β Inhibitors for Cancer Treatment. Biomolecules 9, 743. 10.3390/biom9110743.
- 53.Liberti MV, and Locasale JW (2016). The Warburg Effect: How Does it Benefit Cancer Cells? Trends Biochem. Sci 41, 211–218. 10.1016/j.tibs.2015.12.001.
- 54.Young RA (2011). Control of the embryonic stem cell state. Cell 144, 940–954. 10.1016/j.cell.2011.01.032.
- 55.Braun DM, Chung I, Kepper N, Deeg KI, and Rippe K (2018). TelNet - a database for human and yeast genes involved in telomere maintenance. BMC Genet. 19, 32. 10.1186/s12863-018-0617-8.
- 56.Liau JY, Lee JC, Tsai JH, Yang CY, Liu TL, Ke ZL, Hsu HH, and Jeng YM (2015). Comprehensive screening of alternative lengthening of telomeres phenotype and loss of ATRX expression in sarcomas. Mod. Pathol 28, 1545–1554. 10.1038/modpathol.2015.114.
- 57.Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, Treviño V, Shen H, Laird PW, Levine DA, et al. (2013). Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun 4, 2612. 10.1038/ncomms3612.
- 58.Der SD, Zhou A, Williams BR, and Silverman RH (1998). Identification of genes differentially regulated by interferon alpha, beta, or gamma using oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 95, 15623–15628. 10.1073/pnas.95.26.15623.
- 59.van den Elsen PJ, Holling TM, van der Stoep N, and Boss JM (2003). DNA methylation and expression of major histocompatibility complex class I and class II transactivator genes in human developmental tumor cells and in T cell malignancies. Clin. Immunol 109, 46–52. 10.1016/s1521-6616(03)00200-6.
- 60.Mora-García M.d.L., Duenas-González A, Hernández-Montes J, De la Cruz-Hernández E, Pérez-Cárdenas E, Weiss-Steider B, Santiago-Osorio E, Ortíz-Navarrete VF, Rosales VH, Cantú D, et al. (2006). Up-regulation of HLA class-I antigen expression and antigen-specific CTL response in cervical cancer cells by the demethylating agent hydralazine and the histone deacetylase inhibitor valproic acid. J. Transl. Med 4, 55. 10.1186/1479-5876-4-55.
- 61.Angell TE, Lechner MG, Jang JK, LoPresti JS, and Epstein AL (2014). MHC class I loss is a frequent mechanism of immune escape in papillary thyroid cancer that is reversed by interferon and selumetinib treatment in vitro. Clin. Cancer Res 20, 6034–6044. 10.1158/1078-0432.CCR-14-0879.
- 62.Pitkänen E, Cajuso T, Katainen R, Kaasinen E, Välimäki N, Palin K, Taipale J, Aaltonen LA, and Kilpivaara O (2014). Frequent L1 retrotranspositions originating from TTC28 in colorectal cancer. Oncotarget 5, 853–859. 10.18632/oncotarget.1781.
- 63.Chang J, Tan W, Ling Z, Xi R, Shao M, Chen M, Luo Y, Zhao Y, Liu Y, Huang X, et al. (2017). Genomic analysis of oesophageal squamous-cell carcinoma identifies alcohol drinking-related mutation signature and genomic alterations. Nat. Commun 8, 15290. 10.1038/ncomms15290.
- 64.Chen J, Castelvecchi GD, Li-Villarreal N, Raught B, Krezel AM, McNeill H, and Solnica-Krezel L (2018). Atypical Cadherin Dachsous1b Interacts with Ttc28 and Aurora B to Control Microtubule Dynamics in Embryonic Cleavages. Dev. Cell 45, 376–391.e5. 10.1016/j.devcel.2018.04.009.
- 65.Izumiyama T, Minoshima S, Yoshida T, and Shimizu N (2012). A novel big protein TPRBK possessing 25 units of TPR motif is essential for the progress of mitosis and cytokinesis. Gene 511, 202–217. 10.1016/j.gene.2012.09.061.
- 66.Cindy Yang SY, Lien SC, Wang BX, Clouthier DL, Hanna Y, Cirlan I, Zhu K, Bruce JP, El Ghamrasni S, Iafolla MAJ, et al. (2021). Pan-cancer analysis of longitudinal metastatic tumors reveals genomic alterations and immune landscape dynamics associated with pembrolizumab sensitivity. Nat. Commun 12, 5137. 10.1038/s41467-021-25432-7.
- 67.Dhatchinamoorthy K, Colbert JD, and Rock KL (2021). Cancer Immune Evasion Through Loss of MHC Class I Antigen Presentation. Front. Immunol 12, 636568. 10.3389/fimmu.2021.636568.
- 68.Carr RM, Qiao G, Qin J, Jayaraman S, Prabhakar BS, and Maker AV (2016). Targeting the metabolic pathway of human colon cancer overcomes resistance to TRAIL-induced apoptosis. Cell Death Dis. 2, 16067. 10.1038/cddiscovery.2016.67.
- 69.La Vecchia S, and Sebastián C (2020). Metabolic pathways regulating colorectal cancer initiation and progression. Semin. Cell Dev. Biol 98, 63–70. 10.1016/j.semcdb.2019.05.018.
- 70.Kim J, Woo AJ, Chu J, Snow JW, Fujiwara Y, Kim CG, Cantor AB, and Orkin SH (2010). A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 143, 313–324. 10.1016/j.cell.2010.09.010.
- 71.Ben-Porath I, Thomson MW, Carey VJ, Ge R, Bell GW, Regev A, and Weinberg RA (2008). An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat. Genet 40, 499–507. 10.1038/ng.127.
- 72.Kim J, and Orkin SH (2011). Embryonic stem cell-specific signatures in cancer: insights into genomic regulatory networks and implications for medicine. Genome Med. 3, 75. 10.1186/gm291.
- 73.Robert C, Schachter J, Long GV, Arance A, Grob JJ, Mortier L, Daud A, Carlino MS, McNeil C, Lotem M, et al. (2015). Pembrolizumab versus Ipilimumab in Advanced Melanoma. N. Engl. J. Med 372, 2521–2532. 10.1056/NEJMoa1503093.
- 74.Larkin J, Chiarion-Sileni V, Gonzalez R, Grob JJ, Cowey CL, Lao CD, Schadendorf D, Dummer R, Smylie M, Rutkowski P, et al. (2015). Combined Nivolumab and Ipilimumab or Monotherapy in Untreated Melanoma. N. Engl. J. Med 373, 23–34. 10.1056/NEJMoa1504030.
- 75.Kim SK, Kim SY, Kim JH, Roh SA, Cho DH, Kim YS, and Kim JC (2014). A nineteen gene-based risk score classifier predicts prognosis of colorectal cancer patients. Mol. Oncol 8, 1653–1666. 10.1016/j.molonc.2014.06.016.
- 76.Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, Shen R, Taylor AM, Cherniack AD, Thorsson V, et al. (2018). Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 173, 291–304.e6. 10.1016/j.cell.2018.03.022.
- 77.Cox J, and Mann M (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol 26, 1367–1372. 10.1038/nbt.1511.
- 78.Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, Mann M, and Cox J (2016). The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740. 10.1038/nmeth.3901.
- 79.Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. 10.1093/bioinformatics/bts635.
- 80.Robinson MD, McCarthy DJ, and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140. 10.1093/bioinformatics/btp616.
- 81.McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, and DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. 10.1101/gr.107524.110.
- 82.Li H, and Durbin R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. 10.1093/bioinformatics/btp324.
- 83.Cameron DL, Schröder J, Penington JS, Do H, Molania R, Dobrovic A, Speed TP, and Papenfuss AT (2017). GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res. 27, 2050–2060. 10.1101/gr.222109.117.
- 84.Cameron DL, Baber J, Shale C, Papenfuss AT, Valle-Inclan JE, Besselink N, Cuppen E, and Priestley P (2019). 10.1101/781013.
- 85.Geoffroy V, Guignard T, Kress A, Gaillard JB, Solli-Nowlan T, Schalk A, Gatinois V, Dollfus H, Scheidecker S, and Muller J (2021). AnnotSV and knotAnnotSV: a web server for human structural variations annotations, ranking and analysis. Nucleic Acids Res. 49, W21–W28. 10.1093/nar/gkab402.
- 86.Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, and Getz G (2011). GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41. 10.1186/gb-2011-12-4-r41.
- 87.Wilkerson MD, and Hayes DN (2010). ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–1573. 10.1093/bioinformatics/btq170.
- 88.McInnes L, and Healy J (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
- 89.Yu G, Wang LG, Han Y, and He QY (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287. 10.1089/omi.2011.0118.
- 90.Bader GD, and Hogue CWV (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinf. 4, 2. 10.1186/1471-2105-4-2.
- 91.Vasaikar SV, Straub P, Wang J, and Zhang B (2018). LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Res. 46, D956–D963. 10.1093/nar/gkx1090.
- 92.Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, Sabedot TS, Malta TM, Pagnotta SM, Castiglioni I, et al. (2016). TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71. 10.1093/nar/gkv1507.
- 93.Marchione DM, Ilieva I, Devins K, Sharpe D, Pappin DJ, Garcia BA, Wilson JP, and Wojcik JB (2020). HYPERsol: High-Quality Data from Archival FFPE Tissue for Clinical Proteomics. J. Proteome Res 19, 973–983. 10.1021/acs.jproteome.9b00686.
- 94.Hakobyan A, Schneider MB, Liesack W, and Glatter T (2019). Efficient Tandem LysC/Trypsin Digestion in Detergent Conditions. Proteomics 19, e1900136. 10.1002/pmic.201900136.
- 95.Tanaka A, Wang JY, Shia J, Zhou Y, Ogawa M, Hendrickson RC, Klimstra DS, and Roehrl MH (2020). DEAD-box RNA helicase protein DDX21 as a prognosis marker for early stage colorectal cancer with microsatellite instability. Sci. Rep 10, 22085. 10.1038/s41598-020-79049-9.
- 96.Tanaka A, Wang JY, Shia J, Zhou Y, Ogawa M, Hendrickson RC, Klimstra DS, and Roehrl MHA (2020). Maspin as a Prognostic Marker for Early Stage Colorectal Cancer With Microsatellite Instability. Front. Oncol 10, 945. 10.3389/fonc.2020.00945.
- 97.Geoffroy V, Herenger Y, Kress A, Stoetzel C, Piton A, Dollfus H, and Muller J (2018). AnnotSV: an integrated tool for structural variations annotation. Bioinformatics 34, 3572–3574. 10.1093/bioinformatics/bty304.
- 98.Cox J, Neuhauser N, Michalski A, Scheltema RA, Olsen JV, and Mann M (2011). Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res 10, 1794–1805. 10.1021/pr101065j.
- 99.Tusher VG, Tibshirani R, and Chu G (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116–5121. 10.1073/pnas.091062498.
- 100.Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, Schinzel AC, Sandy P, Meylan E, Scholl C, et al. (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112. 10.1038/nature08460.
- 101.Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, et al. (2019). STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613. 10.1093/nar/gky1131.
- 102.Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, and Tyers M (2006). BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539. 10.1093/nar/gkj109.
- 103.Türei D, Korcsmáros T, and Saez-Rodriguez J (2016). OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967. 10.1038/nmeth.4077.
- 104.Li T, Wernersson R, Hansen RB, Horn H, Mercer J, Slodkowicz G, Workman CT, Rigina O, Rapacki K, Stærfeldt HH, et al. (2017). A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods 14, 61–64. 10.1038/nmeth.4083.
- 105.Malta TM, Sokolov A, Gentles AJ, Burzykowski T, Poisson L, Weinstein JN, Kamińska B, Huelsken J, Omberg L, Gevaert O, et al. (2018). Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell 173, 338–354.e15. 10.1016/j.cell.2018.03.034.
- 106.Sokolov A, Paull EO, and Stuart JM (2016). One-Class Detection of Cell States in Tumor Subtypes. Pac. Symp. Biocomput 21, 405–416.
- 107.Salomonis N, Dexheimer PJ, Omberg L, Schroll R, Bush S, Huo J, Schriml L, Ho Sui S, Keddache M, Mayhew C, et al. (2016). Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cells from the Progenitor Cell Biology Consortium. Stem Cell Rep. 7, 110–125. 10.1016/j.stemcr.2016.05.006.
- 108.Daily K, Ho Sui SJ, Schriml LM, Dexheimer PJ, Salomonis N, Schroll R, Bush S, Keddache M, Mayhew C, Lotia S, et al. (2017). Molecular, phenotypic, and sample-associated data to describe pluripotent stem cell lines and derivatives. Sci. Data 4, 170030. 10.1038/sdata.2017.30.
Associated Data
Data Availability Statement
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD034575 (cohort 1) and PXD031705 (cohort 2). Uploaded file lists are available in Tables S9 and S10. WGS and transcriptome data were deposited in the European Genome-phenome Archive (EGAS00001006464, EGAS00001006465).
This study analyzed 3 publicly available datasets for validation: CPTAC CRC proteome data (via the LinkedOmics website91; 97 primary CRCs, 100 normal tissues),7 CRC transcriptome data of the PanCancerAtlas project (via the TCGAbiolinks R package92; 646 primary CRCs, 51 normal tissues),3,76 and CRC transcriptome data of paired samples (GSE50760 via the GEO website; 18 normal/primary CRC/liver metastatic CRC triplets).75
This paper does not report original code.
Any additional information required to reanalyze the data reported in this study is available from the lead contact upon request.
